talk-data.com

Topic

Polars

data_manipulation data_analysis rust

Activity Trend: peak of 13 activities per quarter, 2020-Q1 to 2026-Q2

Top Events

- SciPy 2025 · 5
- PyData Berlin 2025 · 3
- O'Reilly Data Science Books · 3
- Data Engineering Central Podcast · 3
- PyData Paris 2025 · 2
- PyData London 2025 · 2
- DataTopics: All Things Data, AI & Tech · 2
- PyData Seattle 2025 · 2
- PyConDE & PyData Berlin 2023 · 2
- PyData Amsterdam 2025 · 2
- Databricks DATA + AI Summit 2023 · 2
- O'Reilly Data Engineering Books · 1

Top Speakers

- Marco Gorelli (Narwhals) · 4
- Dr. Jeroen Janssens (Posit) · 3
- Thijs Nieuwdorp (VodafoneZiggo) · 2
- Daniel Beach · 2
- Thomas Bierhance · 1
- Bernardo Dionisi · 1
- Brodie Vidrine · 1
- Guen Prawiroatmodjo · 1
- Vyas Ramasubramani · 1
- Ritchie Vink (Polars) · 1
- Oz Katz (Treeverse) · 1
- Joris Bekkers · 1

Activities

4 activities · Newest first

Time Series Analysis with Python Cookbook - Second Edition

2026-01-23 · O'Reilly Data Science Books

book

by Tarek A. Atwan

AI/ML Pandas Python PyTorch TensorFlow data data-science data-science-tasks statistics time-series

Perform time series analysis and forecasting confidently with this Python code bank and reference manual. Purchase of the print or Kindle book includes a free PDF eBook.

Key Features
- Explore up-to-date forecasting and anomaly detection techniques using statistical, machine learning, and deep learning algorithms
- Learn different techniques for evaluating, diagnosing, and optimizing your models
- Work with a variety of complex data with trends, multiple seasonal patterns, and irregularities

Book Description
To use time series data to your advantage, you need to be well-versed in data preparation, analysis, and forecasting. This fully updated second edition includes chapters on probabilistic models and signal processing techniques, as well as new content on transformers. Additionally, you will leverage popular libraries and their latest releases, covering Pandas, Polars, Sktime, statsmodels, StatsForecast, Darts, and Prophet for time series, with new and relevant examples.

You'll start by ingesting time series data from various sources and formats, and learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods. Further, you'll explore forecasting using classical statistical models (Holt-Winters, SARIMA, and VAR). You'll learn practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns.

You'll then move into more advanced topics such as building ML and DL models using TensorFlow and PyTorch, and explore probabilistic modeling techniques. In this part, you'll also learn how to evaluate, compare, and optimize models, making sure that you finish this book well-versed in wrangling data with Python.

What you will learn
- Understand what makes time series data different from other data
- Apply imputation and interpolation strategies to handle missing data
- Implement an array of models for univariate and multivariate time series
- Plot interactive time series visualizations using hvPlot
- Explore state-space models and the unobserved components model (UCM)
- Detect anomalies using statistical and machine learning methods
- Forecast complex time series with multiple seasonal patterns
- Use conformal prediction for constructing prediction intervals for time series

Who this book is for
This book is for data analysts, business analysts, data scientists, data engineers, and Python developers who want practical Python recipes for time series analysis and forecasting techniques. Fundamental knowledge of Python programming is a prerequisite. Prior experience working with time series data to solve business problems will also help you to better utilize and apply the different recipes in this book.

DuckDB: Up and Running

2024-12-12 · O'Reilly Data Science Books

book

by Wei-Meng Lee

Analytics CSV Data Analytics DuckDB JSON Pandas Parquet Python SQL data data-science data-science-tools

DuckDB, an open source in-process database created for OLAP workloads, provides key advantages over more mainstream OLAP solutions: It's embeddable and optimized for analytics. It also integrates well with Python and is compatible with SQL, giving you the performance and flexibility of SQL right within your Python environment. This handy guide shows you how to get started with this versatile and powerful tool.

Author Wei-Meng Lee takes developers and data professionals through DuckDB's primary features and functions, best practices, and practical examples of how you can use DuckDB for a variety of data analytics tasks. You'll also dive into specific topics, including how to import data into DuckDB, work with tables, perform exploratory data analysis, visualize data, perform spatial analysis, and use DuckDB with JSON files, Polars, and JupySQL.

- Understand the purpose of DuckDB and its main functions
- Conduct data analytics tasks using DuckDB
- Integrate DuckDB with pandas, Polars, and JupySQL
- Use DuckDB to query your data
- Perform spatial analytics using DuckDB's spatial extension
- Work with a diverse range of data including Parquet, CSV, and JSON

Data Engineering for Machine Learning Pipelines: From Python Libraries to ML Pipelines and Cloud Platforms

2024-09-27 · O'Reilly Data Engineering Books

book

by Pavan Kumar Narayanan

AI/ML Airflow Analytics API AWS Azure Cloud Computing Data Analytics Data Engineering Data Quality GCP Kafka +9 more

This book covers modern data engineering functions and important Python libraries, to help you develop state-of-the-art ML pipelines and integration code.

The book begins by explaining data analytics and transformation, delving into the Pandas library, its capabilities, and nuances. It then explores emerging libraries such as Polars and CuDF, providing insights into GPU-based computing and cutting-edge data manipulation techniques. The text discusses the importance of data validation in engineering processes, introducing tools such as Great Expectations and Pandera to ensure data quality and reliability. The book delves into API design and development, with a specific focus on leveraging the power of FastAPI. It covers authentication, authorization, and real-world applications, enabling you to construct efficient and secure APIs using FastAPI. Also explored is concurrency in data engineering, examining Dask's capabilities from basic setup to crafting advanced machine learning pipelines. The book includes development and delivery of data engineering pipelines using leading cloud platforms such as AWS, Google Cloud, and Microsoft Azure. The concluding chapters concentrate on real-time and streaming data engineering pipelines, emphasizing Apache Kafka and workflow orchestration in data engineering. Workflow tools such as Airflow and Prefect are introduced to seamlessly manage and automate complex data workflows.

What sets this book apart is its blend of theoretical knowledge and practical application, a structured path from basic to advanced concepts, and insights into using state-of-the-art tools. With this book, you gain access to cutting-edge techniques and insights that are reshaping the industry. This book is not just an educational tool. It is a career catalyst, and an investment in your future as a data engineering expert, poised to meet the challenges of today's data-driven world.

What You Will Learn
- Elevate your data wrangling jobs by utilizing the power of both CPU and GPU computing, and learn to process data using Pandas 2.0, Polars, and CuDF at unprecedented speeds
- Design data validation pipelines, construct efficient data service APIs, develop real-time streaming pipelines and master the art of workflow orchestration to streamline your engineering projects
- Leverage concurrent programming to develop machine learning pipelines and get hands-on experience in development and deployment of machine learning pipelines across AWS, GCP, and Azure

Who This Book Is For
Data analysts, data engineers, data scientists, machine learning engineers, and MLOps specialists

Polars Cookbook

2024-08-23 · O'Reilly Data Science Books

book

by Yuki Kakegawa

Analytics Big Data Cloud Computing Data Analytics Microsoft NumPy Pandas Python data data-science data-science-tools

Dive into the world of data analysis with the Polars Cookbook. This book, ideal for data professionals, covers practical recipes to manipulate, transform, and analyze data using the Python Polars library. You'll learn both the fundamentals and advanced techniques to build efficient and scalable data workflows.

What this Book will help me do
- Master the basics of Python Polars including installation and setup.
- Perform complex data manipulation like pivoting, grouping, and joining.
- Handle large-scale time series data for accurate analysis.
- Understand data integration with libraries like pandas and numpy.
- Optimize workflows for both on-premise and cloud environments.

Author(s)
Yuki Kakegawa is an experienced data analytics consultant who has collaborated with companies such as Microsoft and Stanford Health Care. His passion for data led him to create this detailed guide on Polars. His expertise ensures you gain real-world, actionable insights from every chapter.

Who is it for?
This book is perfect for data analysts, engineers, and scientists eager to enhance their efficiency with Python Polars. If you are familiar with Python and tools like pandas but are new to Polars, this book will upskill you. Whether handling big data or optimizing code for performance, the Polars Cookbook has the guidance you need to succeed.