talk-data.com

Topic: Pandas
Tags: data_manipulation · data_analysis · python
187 tagged activities

Activity Trend: peak of 17 activities/quarter (2020-Q1 to 2026-Q1)

Activities (187 · Newest first)

Time Series Analysis with Python Cookbook - Second Edition

Perform time series analysis and forecasting confidently with this Python code bank and reference manual. Purchase of the print or Kindle book includes a free PDF eBook.

Key Features
• Explore up-to-date forecasting and anomaly detection techniques using statistical, machine learning, and deep learning algorithms
• Learn different techniques for evaluating, diagnosing, and optimizing your models
• Work with a variety of complex data with trends, multiple seasonal patterns, and irregularities

Book Description
To use time series data to your advantage, you need to be well-versed in data preparation, analysis, and forecasting. This fully updated second edition includes chapters on probabilistic models and signal processing techniques, as well as new content on transformers. You will also leverage popular libraries and their latest releases, including pandas, Polars, sktime, statsmodels, statsforecast, Darts, and Prophet, with new and relevant time series examples. You'll start by ingesting time series data from various sources and formats, and learn strategies for handling missing data, dealing with time zones and custom business days, and detecting anomalies using intuitive statistical methods. From there, you'll explore forecasting with classical statistical models (Holt-Winters, SARIMA, and VAR), and learn practical techniques for handling non-stationary data, using power transforms, ACF and PACF plots, and decomposing time series data with multiple seasonal patterns. The book then moves into more advanced topics, such as building ML and DL models using TensorFlow and PyTorch, and explores probabilistic modeling techniques. Along the way, you'll also learn how to evaluate, compare, and optimize models, ensuring you finish this book well-versed in wrangling time series data with Python.

What you will learn
• Understand what makes time series data different from other data
• Apply imputation and interpolation strategies to handle missing data
• Implement an array of models for univariate and multivariate time series
• Plot interactive time series visualizations using hvPlot
• Explore state-space models and the unobserved components model (UCM)
• Detect anomalies using statistical and machine learning methods
• Forecast complex time series with multiple seasonal patterns
• Use conformal prediction to construct prediction intervals for time series

Who this book is for
This book is for data analysts, business analysts, data scientists, data engineers, and Python developers who want practical Python recipes for time series analysis and forecasting techniques. Fundamental knowledge of Python programming is a prerequisite, and prior experience working with time series data to solve business problems will help you better apply the recipes in this book.
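As a taste of the classical models mentioned above, here is a minimal, hedged sketch (not a recipe from the book) fitting a Holt-Winters model with statsmodels on synthetic monthly data:

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Synthetic monthly series with trend and yearly seasonality (made-up data).
idx = pd.date_range("2020-01-01", periods=48, freq="MS")
seasonal = 10 * np.sin(2 * np.pi * idx.month / 12)
y = pd.Series(100 + 0.5 * np.arange(48) + seasonal, index=idx)

# Additive Holt-Winters: level + trend + 12-month seasonal component.
model = ExponentialSmoothing(y, trend="add", seasonal="add",
                             seasonal_periods=12).fit()
print(model.forecast(6))  # six months ahead
```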

Analyzing how patterns evolve over time in multi-dimensional datasets is challenging—traditional time-series methods often struggle with interpretability when comparing multiple entities across different scales. This talk introduces a clustering-based framework that transforms continuous data into categorical trajectories, enabling intuitive visualization and comparison of temporal patterns.

What & Why: The method combines quartile-based categorization with modified Hamming distance to create interpretable "trajectory fingerprints" for entities over time. This approach is particularly valuable for policy analysis, economic comparisons, and any domain requiring longitudinal pattern recognition.

Who: Data scientists and analysts working with temporal datasets, policy researchers, and anyone interested in comparative analysis across entities with different scales or distributions.

Type: Technical presentation with practical implementation examples using Python (pandas, scikit-learn, matplotlib). Moderate mathematical content balanced with intuitive visualizations.

Takeaway: Attendees will learn a novel approach to temporal pattern analysis that bridges the gap between complex statistical methods and accessible, policy-relevant insights. You'll see practical implementations analyzing 60+ years of fiscal policy data across 8 countries, with code available for adaptation to your own datasets.
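A minimal sketch of the idea as described in the abstract, on hypothetical data; the talk's "modified" Hamming distance is not specified, so a plain Hamming distance is shown:

```python
import numpy as np
import pandas as pd

# Hypothetical example: rows = entities, columns = years, values = a fiscal indicator.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.normal(size=(8, 60)),
                    index=[f"country_{i}" for i in range(8)],
                    columns=range(1960, 2020))

# Quartile-based categorization: within each year, map values to categories 0-3,
# so entities become comparable regardless of scale.
trajectories = data.apply(lambda col: pd.qcut(col, 4, labels=False))

def hamming(a: pd.Series, b: pd.Series) -> float:
    """Share of time steps where the two category trajectories differ.
    A 'modified' variant might, e.g., down-weight adjacent quartiles."""
    return float((a != b).mean())

print(hamming(trajectories.loc["country_0"], trajectories.loc["country_1"]))
```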

Efficient Time-Series Forecasting with Thousands of Local Models on Databricks

In industries like energy and retail, forecasting often requires local models, because each time series has unique behavior. However, training and managing thousands of such models presents scalability and operational challenges. This talk shows how we scaled local models on Databricks by leveraging the Pandas API on Spark, and shares practical lessons on storage, reuse, and scaling to make this approach efficient when it's truly needed.
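One common way to train many local models with pandas-style code on Spark (not necessarily the speakers' exact setup) is grouping by series and applying a per-group pandas function; the data and the naive per-series "model" below are purely illustrative:

```python
import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input: one row per (series_id, ds, y) observation.
sdf = spark.createDataFrame(
    pd.DataFrame({"series_id": ["a"] * 5 + ["b"] * 5,
                  "ds": list(range(5)) * 2,
                  "y": [1.0, 2, 3, 4, 5, 5, 4, 3, 2, 1]})
)

def fit_and_forecast(pdf: pd.DataFrame) -> pd.DataFrame:
    # One tiny "local model" per series: a naive mean forecast stands in
    # for whatever estimator each series actually needs.
    return pd.DataFrame({"series_id": [pdf["series_id"].iloc[0]],
                         "forecast": [pdf["y"].mean()]})

forecasts = (sdf.groupBy("series_id")
                .applyInPandas(fit_and_forecast,
                               schema="series_id string, forecast double"))
forecasts.show()
```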

In this episode, Tristan Handy sits down with Chang She — a co-creator of Pandas and now CEO of LanceDB — to explore the convergence of analytics and AI engineering. The team at LanceDB is rebuilding the data lake from the ground up with AI as a first principle, starting with a new AI-native file format called Lance. Tristan traces Chang's journey from being one of the original contributors to the pandas library to building a new infrastructure layer for AI-native data. Learn why vector databases alone aren't enough, why agents require new architecture, and how LanceDB is building an AI lakehouse for the future. For full show notes and to read 6+ years of back issues of the podcast's companion newsletter, head to https://roundup.getdbt.com. The Analytics Engineering Podcast is sponsored by dbt Labs.

Traditional subgraph isomorphism algorithms like VF2 rely on sequential tree search that can't leverage parallel computing. This talk introduces Δ-Motif, a data-centric approach that transforms graph matching into data operations using Python's data science stack. Δ-Motif decomposes graphs into small "motifs" and reconstructs matches from them. By representing graphs as tabular data with RAPIDS cuDF and Pandas, we achieve 10-595X speedups over VF2 without custom GPU kernels. I'll demonstrate practical applications from social networks to quantum computing, and show when GPU acceleration provides the biggest benefits for graph analysis problems. Perfect for data scientists working with network analysis, recommendation systems, or pattern matching at scale.
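A toy, hedged illustration of the data-centric idea (not the Δ-Motif implementation itself): represent the graph as an edge table and find a small motif with joins instead of tree search:

```python
import pandas as pd

# A tiny directed graph as an edge table.
edges = pd.DataFrame({"src": [0, 1, 1, 2], "dst": [1, 2, 3, 0]})

# Match a simple motif (directed 2-paths a->b->c) with a self-join:
# join edge (a,b) to edge (b,c) on the shared middle vertex.
two_paths = edges.merge(edges, left_on="dst", right_on="src",
                        suffixes=("_ab", "_bc"))
two_paths = two_paths.rename(columns={"src_ab": "a", "dst_ab": "b",
                                      "dst_bc": "c"})
print(two_paths[["a", "b", "c"]])
# The same merge runs essentially unchanged on RAPIDS cuDF for GPU acceleration.
```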

Most AI pipelines still treat models like Python UDFs, just another function bolted onto Spark, Pandas, or Ray. But models aren’t functions: they’re expensive, stateful, and difficult to configure. In this talk, we’ll explore why this mental model breaks at scale and share practical patterns for treating models as first-class citizens in your pipelines.
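A minimal sketch of the contrast the talk draws (names and costs are hypothetical): a model wrapped as a plain function silently reloads its state on every call, while a first-class, stateful wrapper pays the load cost once per worker:

```python
import time

class DummyModel:
    """Stand-in for an expensive-to-load model (hypothetical)."""
    def __init__(self):
        time.sleep(0.1)  # simulate slow weight loading
    def predict(self, x):
        return x * 2

# Anti-pattern: the model behaves like a plain UDF, so every call
# pays the full load cost again.
def naive_udf(x):
    return DummyModel().predict(x)

# First-class pattern: state (the loaded model) lives across calls.
class Predictor:
    def __init__(self):
        self.model = DummyModel()  # loaded once per worker
    def __call__(self, x):
        return self.model.predict(x)

predict = Predictor()
print([predict(i) for i in range(3)])
```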

This talk presents a formal methodology for constructing a Multi-Modal Knowledge Graph for a smart city, addressing data privacy and heterogeneity by using entirely synthetic data. We demonstrate a Python pipeline that leverages Large Language Models for text generation and knowledge extraction, Pandas for sensor data simulation, and rdflib for graph construction. The result is a robust, privacy-preserving foundation for a Cognitive Digital Twin, enabling advanced urban analytics.
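A hedged sketch of the pandas-plus-rdflib portion of such a pipeline (the namespace and sensor values are made up, and the LLM steps are omitted):

```python
import pandas as pd
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF

# Simulate synthetic sensor readings with pandas (values are fabricated).
readings = pd.DataFrame({
    "sensor_id": ["s1", "s2"],
    "temperature": [21.5, 19.8],
})

EX = Namespace("http://example.org/smartcity/")  # hypothetical namespace
g = Graph()
for _, row in readings.iterrows():
    sensor = EX[row["sensor_id"]]
    g.add((sensor, RDF.type, EX.Sensor))
    g.add((sensor, EX.temperature, Literal(row["temperature"])))

print(g.serialize(format="turtle"))
```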

This session explores how to model forest structure and quantify ecosystem services using Python and Earth observation data (GEDI, Sentinel-1, Sentinel-2, SRTM), combining data preprocessing, machine learning (Random Forest), and SHAP interpretation to understand variable importance. We estimate canopy height and aboveground biomass and map forest ecosystem services for monitoring and climate research, illustrated by a case study combining GEDI, Sentinel, and SRTM data.
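A minimal, hedged sketch of the Random Forest plus SHAP step, on synthetic stand-ins for the satellite-derived predictors:

```python
import numpy as np
import shap
from sklearn.ensemble import RandomForestRegressor

# Hypothetical features standing in for GEDI/Sentinel/SRTM-derived predictors.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))           # e.g. backscatter, NDVI, elevation, slope
y = X[:, 1] * 3 + rng.normal(size=200)  # canopy height, driven mostly by feature 1

model = RandomForestRegressor(n_estimators=100).fit(X, y)

# SHAP attributes each prediction to the input variables, revealing which
# predictors drive the height/biomass estimates.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X)
print(np.abs(shap_values).mean(axis=0))  # global importance per feature
```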

Python is at the core of our analytics platform, which processes over 8,000 game records daily, each approximately 500 MB in size. Over the past two years, we have accumulated more than 200 TB of data, equivalent to 1,600 years of game time from over 7 million players—and our goal is to increase this user count tenfold. This talk will cover how we transitioned from Go and C++ parsers connected via PyBind to data frames in Python, how our analyses evolved from Pandas to Polars, and why we migrated our backend from Django to FastAPI. Finally, we will share our real-world experience with performance optimization, leveraging RabbitMQ, Redis, and process monitoring in an environment where Python bridges the worlds of game data and AI analysis.
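For readers unfamiliar with the pandas-to-Polars migration mentioned above, here is an illustrative (made-up) example of the same aggregation in both libraries:

```python
import pandas as pd
import polars as pl

# The kind of rewrite a pandas-to-Polars migration involves (toy data).
records = {"player": ["a", "a", "b"], "score": [10, 20, 30]}

pd_out = pd.DataFrame(records).groupby("player")["score"].mean()
pl_out = (pl.DataFrame(records)
            .group_by("player")
            .agg(pl.col("score").mean()))
print(pd_out, pl_out, sep="\n")
```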

The introduction of ES|QL in Elasticsearch makes it easier to search and analyze large datasets.

ES|QL returns its results in tabular form as JSON, CSV, and also in the Apache Arrow format, a compact dataframe format that allows data exchange without deserialization and is natively supported by the Python library Pandas.

This integration opens up new possibilities for exploring data with the tools data analysts already use, and for easily integrating aggregation pipelines into applications.

After a brief overview of ES|QL, we will interactively explore a dataset with ES|QL, Arrow, and Pandas in a Jupyter notebook. And a small benchmark will show you how efficient the Arrow format is compared to JSON!
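A hedged sketch of the notebook workflow; the exact elasticsearch-py call and the format="arrow" parameter are assumptions that may differ between client versions:

```python
import pyarrow as pa
from elasticsearch import Elasticsearch

# Hypothetical local cluster; index and query are illustrative.
client = Elasticsearch("http://localhost:9200")

# Assumption: ask the ES|QL endpoint for an Apache Arrow IPC stream.
resp = client.esql.query(
    query="FROM sample_data | STATS avg_price = AVG(price) BY category",
    format="arrow",
)

# Read the Arrow stream and hand the table to Pandas without
# per-row deserialization.
table = pa.ipc.open_stream(resp.body).read_all()
df = table.to_pandas()
print(df.head())
```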

Modern data engineering leverages Python to build robust, scalable, end-to-end workflows. In this talk, we will cover how Snowflake offers you a flexible environment for developing Python data pipelines, performing transformations at scale, and orchestrating and deploying your pipelines. Topics we’ll cover include:

• Ingest: Data source APIs, and Snowflake file ingestion to read and ingest data of any format as files arrive, including sources outside Snowflake
• Develop: Packaging (artifact repo), Python runtimes, IDEs (Notebook, VS Code)
• Transform: Snowpark pandas, UDFs, UDAFs (see the sketch below)
• Deploy: Tasks, Notebook scheduling
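A minimal, hedged sketch of the Snowpark pandas piece (connection parameters and table/column names are placeholders):

```python
import modin.pandas as pd
import snowflake.snowpark.modin.plugin  # noqa: F401 -- routes modin.pandas through Snowflake
from snowflake.snowpark import Session

# Placeholder credentials; fill in your own connection parameters.
session = Session.builder.configs({
    "account": "<account>", "user": "<user>", "password": "<password>",
}).create()

# pandas-style API whose execution is pushed down into Snowflake.
df = pd.read_snowflake("MY_TABLE")        # hypothetical table name
print(df.groupby("CATEGORY").size())      # hypothetical column name
```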

Getting Started with Taipy

Share your machine learning models, create chatbots, and build and deploy insightful dashboards speedily using Taipy with this hands-on book featuring real-world application examples from multiple industries. Free with your book: DRM-free PDF version + access to Packt's next-gen Reader.

Key Features
• Create visually compelling, interactive data applications with Taipy
• Bring predictive models to end users and create data pipelines to compare scenarios with what-if analyses
• Go beyond prototypes to build and deploy production-ready applications using the cloud provider of your choice
• Purchase of the print or Kindle book includes a free PDF eBook in full color

Book Description
While data analysts, data scientists, and BI experts have the tools to analyze data, build models, and create compelling visuals, they often struggle to translate these insights into practical, user-friendly applications that help end users answer real-world questions, such as identifying revenue trends, predicting inventory needs, or detecting fraud, without wading through complex code. This book is a comprehensive guide to overcoming this challenge. It teaches you how to use Taipy, a powerful open-source Python library, to build intuitive, production-ready data apps quickly and efficiently. Instead of creating prototypes that nobody uses, you'll learn how to build fast applications that process large amounts of data for multiple users and deliver measurable business impact. Taipy does the heavy lifting to enable your users to visualize their KPIs, interact with charts and maps, and compare scenarios for better decision-making. You'll learn to use Taipy to build apps that make your data accessible and actionable in production environments like the cloud or Docker. By the end of this book, you won't just understand Taipy: you'll be able to transform your data skills into impactful solutions that address real-world needs and deliver valuable insights. Email sign-up and proof of purchase required.

What you will learn
• Explore Taipy, its use cases, and how it's different from other projects
• Discover how to create visually appealing interactive apps that display KPIs, charts, and maps
• Understand how to compare scenarios to make better decisions
• Connect Taipy applications to several data sources and services
• Develop apps for diverse use cases, including chatbots, dashboards, ML apps, and maps
• Deploy Taipy applications on different types of servers and services
• Master advanced concepts for simplifying and accelerating your development workflow

Who this book is for
If you're a data analyst, data scientist, or BI analyst looking to build production-ready data apps entirely in Python, this book is for you. If your scripts and models sit idle because non-technical stakeholders can't use them, this book shows you how to turn them into full applications fast with Taipy, so your work delivers real business value. It's also valuable for developers and engineers who want to streamline their data workflows and build UIs in pure Python.
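For orientation, a minimal Taipy page (not an example from the book); Taipy renders this Markdown-like template as an interactive web app:

```python
from taipy.gui import Gui

value = 50
page = """
# Hello Taipy
Move the slider:
<|{value}|slider|>
Value is <|{value}|>.
"""

if __name__ == "__main__":
    # Starts a local web server and binds `value` to the slider.
    Gui(page).run()
```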

Do your predictive models age badly? One update to your packages (pandas, scikit-learn, lightgbm…) and a production failure is guaranteed…

With Scoring.AI, take back full control and guarantee their longevity. Our innovative tool builds high-performing scores and automatically translates their deployment into pure Python code, based only on Pandas and NumPy.

The result?

Total portability: your models run in production independently of the packages and tools used to build them

Simplified maintenance: IT teams can update their technical stack without any risk of breakage

Greater ownership and transparency: readable, auditable code that is easy to deploy, even in constrained environments

Through concrete cases and a live demo, explore how to free your models from their software dependencies and guarantee their long-term survival. Because a good model is a model that lasts!
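To make the idea concrete, a hedged sketch of what "pure NumPy scoring" can look like (illustrative weights, not Scoring.AI's actual generated code):

```python
import numpy as np

# Once a model is reduced to its coefficients, scoring needs only NumPy,
# so it no longer depends on the training library's version.
COEFS = np.array([0.8, -1.2, 0.3])   # hypothetical fitted weights
INTERCEPT = -0.5

def score(features: np.ndarray) -> np.ndarray:
    """Logistic score in pure NumPy: portable across package upgrades."""
    z = features @ COEFS + INTERCEPT
    return 1.0 / (1.0 + np.exp(-z))

print(score(np.array([[1.0, 0.0, 2.0]])))
```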

tesa SE is a global adhesive manufacturing company. In its highly automated tape production process, abnormal quality and efficiency events must be detected with very short latency to avoid high costs.

Utilizing Snowflake's machine learning capabilities, tesa SE monitors various KPIs that indicate a correct production process.

tesa's newest innovative use case aims to decrease waste during production using anomaly detection models, which are trained on Snowflake and used for inference on-edge for optimal latency.

The machine learning model pipeline components are built and served using Snowflake features such as the Snowflake CLI, Snowpark pandas, and other Snowflake capabilities to streamline the overarching ML process.
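As a hedged illustration of the train-centrally, infer-on-edge pattern (not tesa's actual pipeline), an IsolationForest on synthetic KPI readings:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

# Train an anomaly detector on historical KPI readings, then ship the
# fitted model to the edge for low-latency inference. Data is synthetic.
rng = np.random.default_rng(1)
kpis_train = rng.normal(loc=100, scale=5, size=(1000, 3))

detector = IsolationForest(contamination=0.01).fit(kpis_train)

# On-edge: score incoming readings; -1 flags an abnormal production event.
new_batch = np.array([[101, 99, 102], [140, 60, 180]])
print(detector.predict(new_batch))
```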

How to do real TDD in data science? A journey from pandas to polars with pelage!

In the world of data, inconsistencies and inaccuracies often present a major challenge to extracting valuable insights, yet the number of robust tools and practices to address these issues remains limited. In particular, TDD remains difficult to practice in data science, even though it is a standard in classic software development, partly because of poorly adapted tools and frameworks.

To address this issue we released Pelage, an open-source Python package that facilitates data exploration and testing, built on Polars' intuitive syntax and speed. Pelage empowers data scientists and analysts to streamline data transformation, enhance data quality, and improve code clarity.

We will demonstrate, in a test-first approach, how you can use this library in a meaningful data science workflow to gain greater confidence in your data transformations.

See website: https://alixtc.github.io/pelage/
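A hedged sketch of pipe-based checks in the style of the pelage docs; function names follow the project site, but exact signatures may differ between versions:

```python
import polars as pl
import pelage as plg

df = pl.DataFrame({"id": [1, 2, 3], "amount": [10.0, 20.0, 30.0]})

# Checks pass the frame through unchanged or raise, so they chain with pipe,
# letting validations live inside the transformation itself.
validated = (
    df.pipe(plg.has_no_nulls)        # fail fast on missing values
      .pipe(plg.unique, "id")        # "id" must be a unique key
      .with_columns((pl.col("amount") * 1.2).alias("amount_with_tax"))
)
print(validated)
```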

Advanced Polars: Lazy Queries and Streaming Mode

Do you find yourself struggling with Pandas' limitations when handling massive datasets or real-time data streams?

Discover Polars, the lightning-fast DataFrame library built in Rust. This talk presents two advanced features of the next-generation dataframe library: lazy queries and streaming mode.

Lazy evaluation in Polars allows you to build complex data pipelines without the performance bottlenecks of eager execution. By deferring computation, Polars optimises your queries using techniques like predicate and projection pushdown, reducing unnecessary computations and memory overhead. This leads to significant performance improvements, particularly with datasets larger than your system’s physical memory.
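A minimal lazy-query sketch (the CSV path is hypothetical); nothing executes until collect(), and the optimizer pushes the filter and column selection into the scan:

```python
import polars as pl

lazy = (
    pl.scan_csv("events.csv")             # hypothetical file path
      .filter(pl.col("country") == "DE")  # predicate pushdown
      .select(["country", "revenue"])     # projection pushdown
      .group_by("country")
      .agg(pl.col("revenue").sum())
)
print(lazy.explain())   # inspect the optimized query plan
df = lazy.collect()     # execute
```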

Polars' LazyFrames form the foundation of the library’s streaming mode, enabling efficient streaming pipelines, real-time transformations, and seamless integration with various data sinks.
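A short streaming sketch using a sink (file names are illustrative); sinks let Polars stream from source to output without materializing the full result in memory:

```python
import polars as pl

# Process a larger-than-memory CSV in chunks and write straight to Parquet.
(
    pl.scan_csv("big_events.csv")
      .filter(pl.col("revenue") > 0)
      .sink_parquet("clean_events.parquet")  # streaming sink
)
```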

This session will explore use cases and technical implementations of both lazy queries and streaming mode. We’ll also include live-coding demonstrations to introduce the tool, showcase best practices, and highlight common pitfalls.

Attendees will walk away with practical knowledge of lazy queries and streaming mode, ready to apply these tools in their daily work as data engineers or data scientists.