talk-data.com

Activities & events

Program Synthesis (PS) is the task of automatically generating logical procedures or source code from a small set of input-output examples. While LLMs and agents dominate current AI conversations, they often struggle with precise reasoning tasks like these, where smaller, well-structured PS models can succeed. In this talk, we’ll walk through the end-to-end development of a PS system, covering dataset representation using graph structures, model architectures, and tree search algorithms. The working example is the generation of procedural textures for 3D modeling, but the methodology is domain-agnostic. Participants will leave with a deeper understanding of PS, its real-world potential, and the trade-offs between different architectural approaches. The session is designed for practitioners with a solid understanding of ML concepts and some familiarity with neural network architectures such as transformers and CNNs.
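To make the search-and-verify idea concrete, here is a minimal sketch of enumerative program synthesis: a breadth-first search over expression trees in a tiny, hypothetical arithmetic DSL (`TERMINALS`, `OPERATORS`, and all function names are illustrative, not from the talk). It returns the shallowest program consistent with every input-output example. The talk's system uses graph representations and learned models to guide the tree search; this version is unguided and only shows the basic loop such models accelerate.

```python
from typing import Optional, Union

# An expression is either a terminal string or a tuple (op, left, right).
Expr = Union[str, tuple]

# Hypothetical toy DSL: one input variable, two constants, three binary operators.
TERMINALS = ["x", "1", "2"]
OPERATORS = {
    "+": lambda a, b: a + b,
    "-": lambda a, b: a - b,
    "*": lambda a, b: a * b,
}


def evaluate(expr: Expr, x: int) -> int:
    """Evaluate an expression tree for the input value x."""
    if isinstance(expr, str):
        return x if expr == "x" else int(expr)
    op, left, right = expr
    return OPERATORS[op](evaluate(left, x), evaluate(right, x))


def to_str(expr: Expr) -> str:
    """Render an expression tree as fully parenthesized infix text."""
    if isinstance(expr, str):
        return expr
    op, left, right = expr
    return f"({to_str(left)} {op} {to_str(right)})"


def synthesize(examples: list[tuple[int, int]], max_depth: int = 2) -> Optional[Expr]:
    """Return the shallowest program matching every (input, output) example,
    or None if no program exists up to max_depth."""
    levels: list[list[Expr]] = [list(TERMINALS)]  # levels[d] holds trees of depth d
    for depth in range(max_depth + 1):
        # Verify: test each candidate of this depth against all examples.
        for expr in levels[depth]:
            if all(evaluate(expr, x) == y for x, y in examples):
                return expr
        if depth == max_depth:
            break
        # Expand: build depth+1 trees from children of depth <= `depth`,
        # requiring at least one child at exactly `depth`.
        next_level: list[Expr] = []
        for op in OPERATORS:
            for dl in range(depth + 1):
                for dr in range(depth + 1):
                    if max(dl, dr) != depth:
                        continue
                    for left in levels[dl]:
                        for right in levels[dr]:
                            next_level.append((op, left, right))
        levels.append(next_level)
    return None


program = synthesize([(1, 3), (2, 5), (3, 7)])
print(to_str(program))  # (x + (x + 1))
```

The candidate space grows exponentially with depth (3, then 27, then 2,673 trees here), which is why learned models that rank or prune candidates matter in practice.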

AI/ML LLM
Jean Carlo Machado – Data Science Manager @ GetYourGuide

In this talk, I will walk through how building data products is evolving with modern AI development tools. I’ll take you through a small end-to-end product I built in my free time, covering everything from design and frontend development to data collection and, ultimately, data science components. The project is available at https://stateoftheartwithai.com/

AI/ML Data Collection Data Science
The Zen of Claude Code 2025-11-19 · 19:30
Vlad Gheorghe – AI Engineer @ SceneMind.ai

This talk traces the evolution of AI agents from the ambitious but complex AutoGPT experiment in 2023, through Cursor's gradual "autonomy slider" approach (2023-2025), to the elegant simplicity of Claude Code in 2025. The presentation argues that we have moved from highly complex systems built on multiple agents, embeddings, and cloud infrastructure, which were groundbreaking but often struggled with basic tasks, to "The Zen of Claude Code": a simple terminal-based agent that achieves excellent performance by embracing the Bitter Lesson.

AI/ML Cloud Computing
Functional Reproducibility 2025-11-19 · 19:30
Robin Gower – Freelance data scientist

Have you ever written the perfect data analysis? Does it still run unchanged 6 months later? Can your colleagues run it without you? Just because your analysis is executable doesn’t mean its results are reproducible. Data ages. Libraries change. Machines differ. Servers go down. Bits rot. Entropy is inescapable. We can learn how to engineer reproducibility by drawing on techniques from functional programming and the MLOps movement.
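One functional-programming technique that fits this argument is to write each analysis step as a pure function and key its result by a content hash of everything it depends on. The sketch below assumes the inputs are raw bytes, the parameters are JSON-serializable, and the environment is recorded as a dictionary of pinned versions; `fingerprint`, `mean_above`, `run`, and the version strings are hypothetical, not taken from the talk.

```python
import hashlib
import json


def fingerprint(data: bytes, params: dict, env: dict) -> str:
    """Content-addressed key for one analysis run.

    The key changes whenever the input bytes, the parameters, or the
    recorded environment (e.g. pinned library versions) change.
    """
    h = hashlib.sha256()
    h.update(hashlib.sha256(data).digest())
    # sort_keys makes the encoding independent of dict insertion order.
    h.update(json.dumps(params, sort_keys=True).encode("utf-8"))
    h.update(json.dumps(env, sort_keys=True).encode("utf-8"))
    return h.hexdigest()


def mean_above(data: bytes, params: dict) -> float:
    """A pure analysis step: the output depends only on the arguments
    (no clock, no network, no global state)."""
    values = [float(v) for v in data.decode("utf-8").split(",")]
    kept = [v for v in values if v > params["threshold"]]
    return sum(kept) / len(kept)


_results: dict[str, float] = {}


def run(data: bytes, params: dict, env: dict) -> float:
    """Memoize results by fingerprint: identical inputs map to the same key,
    so a stored result can be reused or re-verified later."""
    key = fingerprint(data, params, env)
    if key not in _results:
        _results[key] = mean_above(data, params)
    return _results[key]


# Hypothetical pinned environment, recorded alongside the result.
ENV = {"python": "3.12", "numpy": "2.1.0"}
print(run(b"2,4,6,8", {"threshold": 3}, ENV))  # 6.0
```

Hashing the environment as well as the data means a library upgrade produces a new key rather than silently reusing a result computed under different conditions.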

MLOps
Yashasvi Misra – Data Engineer @ Pure Storage, Igor Kvachenok – Master’s student in Data Science @ Leuphana University of Lüneburg, Selim Nowicki – Founder @ Distill Labs, Mehdi Ouazza – guest, Gülsah Durmaz – Architect & Developer

At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

  • Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
  • Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
  • Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
  • Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
  • Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok – Master’s student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.

Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki – Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups such as Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today’s ML tooling and dbt’s impact on analytics.

Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz – Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.

Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra – Data Engineer at Pure Storage and community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day work in data engineering with her passion for ethical ML.

Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza – Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.

Connect: https://www.linkedin.com/in/mehd-io/

AI/ML Analytics Data Engineering Data Science dbt DuckDB Kubernetes LLM MLOps Motherduck Python
DataTalks.Club
Event PyData Berlin 2025 2025-09-03
Closing Session 2025-09-03 · 13:10

Free-floating carsharing systems struggle to balance vehicle supply and demand, which often results in inefficient fleet distribution and reduced vehicle utilization. This talk explores how data scraping can be used to model vehicle demand and user behavior, enabling targeted incentives to encourage self-balancing vehicle flows.

Using information scraped from a major mobility provider over multiple months, the presentation provides spatiotemporal analyses and machine learning results to determine whether it's practically possible to offer low-friction discounts that lead to improved fleet balance.
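A minimal sketch of how scraped availability data can be turned into a demand signal, assuming each scrape is a snapshot mapping available vehicle IDs to a zone (the snapshot format, zone names, and the `surplus_zones` rule are hypothetical; the talk does not specify its pipeline). A vehicle that disappears and later reappears in a different zone is counted as a trip, and the per-zone net flow shows where vehicles accumulate or drain.

```python
from collections import Counter


def infer_trips(snapshots: list[dict[str, str]]) -> list[tuple[str, str]]:
    """snapshots: scraped {vehicle_id: zone} maps of *available* vehicles,
    ordered by scrape time. A vehicle seen again in a different zone is
    recorded as one (origin, destination) trip."""
    last_zone: dict[str, str] = {}
    trips: list[tuple[str, str]] = []
    for snapshot in snapshots:
        for vehicle, zone in snapshot.items():
            previous = last_zone.get(vehicle)
            if previous is not None and previous != zone:
                trips.append((previous, zone))
            last_zone[vehicle] = zone
    return trips


def net_flow(trips: list[tuple[str, str]]) -> dict[str, int]:
    """Arrivals minus departures per zone: positive means vehicles pile up,
    negative means the zone is being drained."""
    flow: Counter = Counter()
    for origin, destination in trips:
        flow[origin] -= 1
        flow[destination] += 1
    return dict(flow)


def surplus_zones(flow: dict[str, int], min_surplus: int = 1) -> list[str]:
    """Hypothetical targeting rule: zones where a discount on outgoing
    trips could help rebalance the fleet."""
    return sorted(zone for zone, f in flow.items() if f >= min_surplus)


snapshots = [
    {"car1": "A", "car2": "A", "car3": "B"},
    {"car2": "A", "car3": "B"},                # car1 is in use
    {"car1": "C", "car2": "A"},                # car1 parked in C; car3 in use
    {"car1": "C", "car2": "A", "car3": "C"},   # car3 parked in C
]
flow = net_flow(infer_trips(snapshots))
print(flow, surplus_zones(flow))
```

This inference has known blind spots: round trips that end in the same zone are invisible, and operator relocations look like customer trips, so real analyses need to filter or model both.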

AI/ML
Kubeflow pipelines meet uv 2025-09-03 · 12:20

Kubeflow is a platform for building and deploying portable and scalable machine learning (ML) workflows using containers on Kubernetes-based systems.

Together, we will code a simple Kubeflow pipeline and show how to test it locally. As a bonus, we will explore one way to avoid dependency hell using uv, a modern dependency management tool.

AI/ML Kubernetes

When a new requirement appears, whether it's document storage, pub/sub messaging, distributed queues, or even full-text search, Postgres can often handle it without introducing more infrastructure.

This talk explores how to leverage Postgres' native features like JSONB, LISTEN/NOTIFY, queueing patterns and vector extensions to build robust, scalable systems without increasing infrastructure complexity.

You'll learn practical patterns that extend Postgres just far enough, keeping systems simpler, more maintainable, and easier to operate, especially in small to medium projects or freelancing setups, where Postgres often already forms a critical part of the stack.

Postgres might not replace everything forever, but it can often get you much further than you think.

postgresql Pub/Sub

Energy infrastructure is vulnerable to damage from erosion or third-party interference, which often takes the form of unsanctioned construction. In this talk, we discuss our experiences using deep learning algorithms powered by large foundation models to monitor for changes in bi-temporal, very-high-resolution satellite imagery.
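One simple way to use a foundation model for bi-temporal change detection is to embed each image tile at both dates and flag tiles whose embeddings diverge. The sketch below assumes the per-tile embeddings have already been produced by some encoder (not shown), and uses a fixed cosine-similarity threshold of 0.9 chosen for illustration; the talk's actual models and decision rule are not described here, and production systems often train a dedicated change-detection head instead of thresholding.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two embedding vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def changed_tiles(
    before: dict[str, list[float]],
    after: dict[str, list[float]],
    threshold: float = 0.9,  # illustrative value, would be tuned on labeled pairs
) -> list[str]:
    """Flag tiles present at both dates whose embeddings are less similar
    than `threshold`, i.e. candidate changes for a human to review."""
    common = before.keys() & after.keys()
    return sorted(t for t in common if cosine_similarity(before[t], after[t]) < threshold)


# Toy 3-dimensional "embeddings" standing in for real encoder outputs.
before = {"tile_1": [1.0, 0.0, 0.0], "tile_2": [0.0, 1.0, 0.0]}
after = {"tile_1": [0.99, 0.05, 0.0], "tile_2": [1.0, 0.0, 0.0]}
print(changed_tiles(before, after))  # ['tile_2']
```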

Managing who can see or do what with your data is a fundamental challenge, especially as applications and data grow in complexity. Traditional role-based systems often lack the granularity that modern data platforms need. Fine-Grained Authorization (FGA) addresses this by controlling access at the level of individual resources. In this 90-minute hands-on tutorial, we will explore implementing FGA with OpenFGA, an open-source authorization engine inspired by Google's Zanzibar. Attendees will learn the core concepts of Relationship-Based Access Control (ReBAC) and get practical experience defining authorization models, writing relationship tuples, and performing authorization checks with the OpenFGA Python SDK. Bring a laptop ready to code, and learn how to build secure, flexible permission systems for your data applications.
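To illustrate the ReBAC concepts the tutorial covers, here is a toy, in-memory check over relationship tuples in the Zanzibar-style `(user, relation, object)` shape, including a userset (`group:eng#member`) and one implied relation (every editor is a viewer). This is not the OpenFGA Python SDK and makes no claims about its API; OpenFGA evaluates such checks server-side against an authorization model. The tuples, `IMPLIED_BY`, and `check` are hypothetical.

```python
# Relationship tuples: (user, relation, object).
# "group:eng#member" is a userset meaning "every member of group:eng".
TUPLES = frozenset({
    ("user:anne", "member", "group:eng"),
    ("group:eng#member", "viewer", "document:roadmap"),
    ("user:bob", "editor", "document:roadmap"),
})

# Hypothetical authorization model: a relation and the relations that imply it.
IMPLIED_BY = {"viewer": ["editor"]}  # every editor is also a viewer


def check(user, relation, obj, tuples=TUPLES, seen=None) -> bool:
    """Return True if `user` has `relation` on `obj`, following usersets
    and implied relations recursively."""
    seen = set() if seen is None else seen
    key = (user, relation, obj)
    if key in seen:  # guard against cycles in the relationship graph
        return False
    seen.add(key)
    for u, r, o in tuples:
        if o != obj or r != relation:
            continue
        if u == user:  # direct relationship
            return True
        if "#" in u:  # userset: recurse into "object#relation"
            userset_obj, userset_rel = u.split("#", 1)
            if check(user, userset_rel, userset_obj, tuples, seen):
                return True
    # Implied relations, e.g. editor => viewer.
    return any(check(user, r, obj, tuples, seen) for r in IMPLIED_BY.get(relation, []))


print(check("user:anne", "viewer", "document:roadmap"))   # True (via group:eng)
print(check("user:bob", "viewer", "document:roadmap"))    # True (editor implies viewer)
print(check("user:carol", "viewer", "document:roadmap"))  # False
```

The key ReBAC idea shown here is that permissions are derived by walking relationships rather than looked up in a fixed role table, which is what makes per-resource granularity possible.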

Python
Lunch Break 2025-09-03 · 10:30

Clear documentation is crucial for the success of open-source libraries, but it’s often hard to get right. In this talk, I’ll share our experience applying the Diataxis documentation framework to improve two HoloViz ecosystem libraries, hvPlot and Panel. Attendees will come away with practical insights on applying Diataxis and strengthening documentation for their own projects.

Dat Tran – guest @ Priceloop, Dennis Schmidt

In aviation, search isn’t simple: people use abbreviations, slang, and technical terms that make exact matching tricky. We started with just Postgres, aiming for something that worked. Over time, we added semantic embeddings and reranking. We tackled filter complexity, slow index builds, embedding updates, and much more. Along the way, we learned a lot about making AI search fast, accurate, and actually usable for our users. It’s been a journey, full of turbulence, but worth the landing.
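Two building blocks often used for this kind of domain search are sketched below, under stated assumptions: query-side abbreviation expansion with a hand-curated glossary (the entries are illustrative), and reciprocal rank fusion (RRF) to merge a keyword result list with a semantic-embedding result list. The talk does not say it uses RRF; it is shown as one common way to combine the two retrieval paths before a reranker. `k=60` is the commonly used default.

```python
import re

# Illustrative glossary; a real one would be curated from actual user queries.
ABBREVIATIONS = {
    "apu": "auxiliary power unit",
    "mel": "minimum equipment list",
    "aog": "aircraft on ground",
}


def expand_query(query: str) -> str:
    """Lowercase, tokenize, and replace known abbreviations with full terms."""
    tokens = re.findall(r"\w+", query.lower())
    return " ".join(ABBREVIATIONS.get(t, t) for t in tokens)


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked lists of document IDs.

    score(d) = sum over lists of 1 / (k + rank of d in that list), ranks
    starting at 1; ties are broken by document ID for deterministic output.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda d: (-scores[d], d))


print(expand_query("APU fault, MEL?"))
# auxiliary power unit fault minimum equipment list

keyword_hits = ["doc3", "doc1", "doc7"]   # e.g. from Postgres full-text search
semantic_hits = ["doc1", "doc4", "doc3"]  # e.g. from a vector index
print(reciprocal_rank_fusion([keyword_hits, semantic_hits]))
# ['doc1', 'doc3', 'doc4', 'doc7']
```

RRF only uses ranks, so it avoids calibrating keyword scores against embedding similarities, which live on different scales.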

AI/ML postgresql

This talk introduces a new business model supported by a network of digital activists who form a collective force for protecting humanity, enabling digitally aware users to reclaim control over their data.

Docling, an open-source package, is rapidly becoming the de facto standard for document parsing and export in the Python community. It earned close to 30,000 GitHub stars in less than a year and is now part of the LF AI & Data Foundation. Docling is redefining document AI with its ease of use and speed. In this session, we’ll introduce Docling and its features, including its use with various generative AI frameworks and protocols (e.g., MCP).

AI/ML GenAI GitHub Linux Python

API calls suck! Okay, not all of them. But building AI features that rely on third-party APIs can bring a lot of trouble. In this talk, you'll learn how to use web technologies to become more independent.

AI/ML API