talk-data.com

Event

PyData Berlin 2025

2025-09-01 – 2025-09-03 PyData

Activities tracked

99

Sessions & talks

Showing 76–99 of 99 · Newest first

Automating Content Creation with LLMs: A Journey from Manual to AI-Driven Excellence

2025-09-01
talk

In the fast-paced realm of travel experiences, GetYourGuide encountered the challenge of maintaining consistent, high-quality content across its global marketplace. Manual content creation by suppliers often resulted in inconsistencies and errors, negatively impacting conversion rates. To address this, we leveraged large language models (LLMs) to automate content generation, ensuring uniformity and accuracy. This talk will explore our innovative approach, including the development of fine-tuned models for generating key text sections and the use of the GPT Function Calling API for structured data. A pivotal aspect of our solution was the creation of an LLM evaluator to detect and correct hallucinations, thereby improving factual accuracy. Through A/B testing, we demonstrated that AI-driven content led to fewer defects and increased bookings. Attendees will gain insights into training data refinement, prompt engineering, and deploying AI at scale, offering valuable lessons for automating content creation across industries.

More than DataFrames: Data Pipelines with the Swiss Army Knife DuckDB

2025-09-01
talk

Most Python developers reach for Pandas or Polars when working with tabular data—but DuckDB offers a powerful alternative that’s more than just another DataFrame library. In this tutorial, you’ll learn how to use DuckDB as an in-process analytical database: building data pipelines, caching datasets, and running complex queries with SQL—all without leaving Python. We’ll cover common use cases like ETL, lightweight data orchestration, and interactive analytics workflows. You’ll leave with a solid mental model for using DuckDB effectively as the “SQLite for analytics.”

The EU AI Act: Unveiling Lesser-Known Aspects, Implementation Entities, and Exemptions

2025-09-01
talk
Adrin Jalali (scikit-learn and Fairlearn)

The EU AI Act is already partly in effect, and its first provisions prohibit certain AI systems. After going through the basics, we cover some of the less-discussed aspects of the Act, introducing the entities involved in its implementation and showing how many high-risk government and law enforcement use cases are excluded!

Lunch Break

2025-09-01
talk

PyLadies & Empowered in Tech Lunch

2025-09-01
talk

Join PyLadies & Empowered in Tech for a special lunch event aimed at fostering community. Enjoy meaningful conversations and networking opportunities.

Accessible Data Visualizations

2025-09-01
talk

Data visualizations often exclude users with visual impairments and temporary or situational constraints. Many regulations (European Accessibility Act, Americans with Disabilities Act) now mandate inclusive digital content. Our research provides practical solutions (optimized color palettes, supplementary patterns, and alternative formats) implemented in popular libraries like Bokeh and Vega-Altair. These techniques, available through our open-source cusy Design System, create visualizations that reach broader audiences while meeting compliance requirements and improving comprehension for all users.

Democratizing Experimentation: How GetYourGuide Built a Flexible and Scalable A/B Testing Platform

2025-09-01
talk

At GetYourGuide, we transformed experimentation from a centralized, closed system into a democratized, self-service platform accessible to all analysts, engineers, and product teams. In this talk, we'll share our journey to empower individuals across the company to define metrics, create dimensions, and easily extend statistical methods. We'll discuss how we built a Python-based Analyzer toolkit enabling standardized, reusable calculations, and how our experimentation platform provides ad-hoc analytical capabilities through a flexible API. Attendees will gain practical insights into creating scalable, maintainable, and user-friendly experimentation infrastructure, along with access to our open-source sequential testing implementation.

Democratizing Digital Maps: How Protomaps Changes the Game

2025-09-01
talk
API

Digital mapping has long been dominated by commercial providers, creating barriers of cost, complexity, and privacy concerns. This talk introduces Protomaps, an open-source project that reimagines how web maps are delivered and consumed. Using the innovative PMTiles format – a single-file approach to vector tiles – Protomaps eliminates complex server infrastructure while reducing bandwidth usage and improving performance. We'll explore how this technology democratizes cartography by making self-hosted maps accessible without API keys, usage quotas, or recurring costs. The presentation will demonstrate implementations with Leaflet and MapLibre, showcase customization options, and highlight cases where Protomaps enables privacy-conscious, offline-capable mapping solutions. Discover how this technology puts mapping control back in the hands of developers while maintaining the rich experiences modern applications demand.

Exploring Millions of High-dimensional Datapoints in the Browser for Early Drug Discovery

2025-09-01
talk

The visual exploration of large, high-dimensional datasets presents significant challenges in data processing, transfer, and rendering for engineering in various industries. This talk will explore innovative approaches to harnessing massive datasets for early drug discovery, with a focus on interactive visualizations. We will demonstrate how our team at Bayer utilizes a modern tech stack to efficiently navigate and analyze millions of data points in a high-dimensional embedding space. Attendees will gain insights into overcoming performance challenges, optimizing data rendering, and developing user-friendly tools for effective data exploration. We aim to demonstrate how these technologies can transform the way we interact with complex datasets in engineering applications and eventually allow us to find the needle in a multidimensional haystack.

A Beginner's Guide to State Space Modeling

2025-09-01
talk

State Space Models (SSMs) are powerful tools for time series analysis, widely used in finance, economics, ecology, and engineering. They allow researchers to encode structural behavior into time series models, including trends, seasonality, autoregression, and irregular fluctuations, to name just a few. Many workhorse time series models, including ARIMA, VAR, and ETS, are special cases of the general state space framework.

In this practical, hands-on tutorial, attendees will learn how to leverage PyMC's new state-space modeling capabilities (pymc_extras.statespace) to build, fit, and interpret Bayesian state space models.

Starting from fundamental concepts, we'll explore several real-world use cases, demonstrating how SSMs help tackle common time series challenges, such as handling missing observations, integrating external regressors, and generating forecasts.
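
To make the state space idea concrete, here is a minimal sketch of one of the simplest such models, the local level model, filtered with a Kalman filter in plain Python. It is not the pymc_extras.statespace API and is not taken from the tutorial; the function name, parameters, and sample series are hypothetical and chosen only for illustration. It also shows one common way to handle a missing observation: skip the update step and carry the prediction forward.

```python
from typing import Optional, Sequence


def local_level_filter(
    observations: Sequence[Optional[float]],
    obs_var: float,
    state_var: float,
    init_mean: float = 0.0,
    init_var: float = 1e6,
) -> list[tuple[float, float]]:
    """Kalman filter for the local level model (illustrative sketch).

    y_t  = mu_t + eps_t,      eps_t ~ N(0, obs_var)
    mu_t = mu_{t-1} + eta_t,  eta_t ~ N(0, state_var)

    Returns the filtered (mean, variance) of mu_t for each time step.
    """
    mean, var = init_mean, init_var
    filtered = []
    for y in observations:
        # Predict: the level follows a random walk, so only the variance grows.
        var = var + state_var
        if y is not None:
            # Update: blend prediction and observation using the Kalman gain.
            gain = var / (var + obs_var)
            mean = mean + gain * (y - mean)
            var = (1.0 - gain) * var
        # A missing observation (None) skips the update and keeps the prediction.
        filtered.append((mean, var))
    return filtered


# Hypothetical series with one missing value at t=2.
series = [10.2, 9.8, None, 10.5, 10.1]
for t, (m, v) in enumerate(local_level_filter(series, obs_var=0.5, state_var=0.1)):
    print(f"t={t} level={m:.3f} var={v:.3f}")
```

At the missing step the filtered mean stays where it was and the variance grows by state_var, which is how the filter expresses increased uncertainty. A Bayesian treatment such as the one the tutorial describes would additionally place priors on obs_var and state_var rather than fixing them.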

Beyond Linear Funnels: Visualizing Conditional User Journeys with Python

2025-09-01
talk

Optimizing user funnels is a common task for data analysts and data scientists. In the real world, funnels are not always linear: often, the next step depends on earlier responses or actions. This results in complex funnels that can be tricky to analyze. I’ll introduce an open-source Python library I developed that analyzes and visualizes non-linear, conditional funnels by utilizing Graphviz and Streamlit. It calculates conversion rates, drop-offs, and time spent on each step, and highlights bottlenecks by color. Attendees will learn how to quickly explore complex user journeys and generate insightful funnel data.
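
As a rough illustration of the kind of calculation the abstract describes, the sketch below treats a conditional funnel as a directed graph and computes per-edge conversion rates and per-step drop-offs with the standard library only. It is not the speaker's library, whose API is not shown in the abstract; the journey data and function names are hypothetical, and it assumes each user visits a given step at most once.

```python
from collections import Counter

# Hypothetical journeys: each list is the ordered steps one user completed.
journeys = [
    ["start", "quiz", "plan_a", "checkout"],
    ["start", "quiz", "plan_b"],
    ["start", "quiz", "plan_a"],
    ["start"],
    ["start", "quiz", "plan_b", "checkout"],
]


def funnel_stats(paths):
    """Count visits per step and transitions per edge across all journeys."""
    visits = Counter()
    edges = Counter()
    for path in paths:
        visits.update(path)
        # Consecutive pairs of steps form the edges of the funnel graph.
        edges.update(zip(path, path[1:]))
    return visits, edges


def conversion_rates(visits, edges):
    """Share of users at the source step who continued along each edge."""
    return {edge: n / visits[edge[0]] for edge, n in edges.items()}


def drop_offs(visits, edges):
    """Users who reached a step but took no further step.

    For a terminal step such as "checkout" this is the number of completions.
    """
    outgoing = Counter()
    for (src, _dst), n in edges.items():
        outgoing[src] += n
    return {step: n - outgoing[step] for step, n in visits.items()}


visits, edges = funnel_stats(journeys)
for (src, dst), rate in conversion_rates(visits, edges).items():
    print(f"{src} -> {dst}: {rate:.0%}")
print(drop_offs(visits, edges))
```

The edge dictionary maps directly onto a graph description, so the same counts could be handed to a graph renderer such as Graphviz to color bottleneck edges.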

🛰️➡️🧑‍💻: Streamlining Satellite Data for Analysis-Ready Outputs

2025-09-01
talk

I will share how our team built an end-to-end system to transform raw satellite imagery into analysis-ready datasets for use cases like vegetation monitoring, deforestation detection, and identifying third-party activity. We streamlined the entire pipeline from automated acquisition and cloud storage to preprocessing that ensures spatial, spectral, and temporal consistency. By leveraging Prefect for orchestration, Anyscale Ray for scalable processing, and the open source STAC standard for metadata indexing, we reduced processing times from days to near real-time. We addressed challenges like inconsistent metadata and diverse sensor types, building a flexible system capable of supporting large-scale geospatial analytics and AI workloads.

Coffee Break

2025-09-01
talk

PyData 2077: a data science future retrospective

2025-09-01
talk

From: Chrono-Regulatory Commission, Temporal Enforcement Division
To: PyData Berlin Organising Committee
Subject: Citation #TMP-2077-091 - Unauthorised Spacetime Disturbance

Dear Committee,
Our temporal monitoring systems have detected an unauthorised chronological anomaly emanating from your facility (Berliner Congress Center, coordinates 52.52068°N, 13.416451°E) scheduled to manifest on September 1st at 9:20 a.m.

Opening Session

2025-09-01
talk

Opening Session for PyData Berlin 2025

Registration & Coffee

2025-09-01
talk
