talk-data.com

Activities & events

TidyTuesday 2026-01-27 · 23:00

Join R-Ladies Ottawa for a casual evening of programming on Tuesday, January 27th. We'll be participating in TidyTuesday, a weekly data visualization challenge organized by the R for Data Science community.

What is TidyTuesday?

Every week, a new dataset is posted online on the TidyTuesday GitHub repo, and folks from around the world create data visualizations using the dataset. It's an opportunity to put your programming skills into practice using real-world data in a way that's fun! It's also a great way for everyone to learn from each other, by sharing their visualizations and code.

What will the dataset be?

Even we don't know that (yet)! We'll have to wait until the day before the event to know what data we'll be working with. If you're interested in seeing some past datasets, take a look at the examples below, or visit the TidyTuesday GitHub repo to see all of the datasets dating back to 2018.

Examples from past TidyTuesdays:

Do I have to use R?

No! You can use any programming language or visualization software that you want. In fact, Python users from around the globe participate in "PyDyTuesday" on a weekly basis.

Who is this event for?

No previous programming experience is required to participate, and we'll have experienced programmers in the room who can help you get started (or unstuck), if needed.

...But if you want to get the most out of the event, a good way to prepare is to watch the recording of the introduction to data visualization workshop we hosted back in 2024. :)

What should I bring?

  • Please bring a laptop so you can code along. We recommend that you have RStudio or another IDE (such as VS Code or Positron) installed ahead of time, but we can help you get one installed if needed!
  • Come ready to learn, share, and contribute to a safe and welcoming community!

How will this event work?

  • First few minutes of the event: Introductions, and taking a look at the dataset together as a group.
  • Time to create a data visualization using the language or software of your choice, either on your own or with a (new) friend! Grab a free snack while you're at it :)
  • Last ~30 minutes of the event: Show and tell session for anyone who would like to share their creation with the group.

What else do I need to know?

This event (like all R-Ladies events) is totally FREE to attend.

The event will take place at Bayview Yards, which is located just a few steps away from the Bayview O-Train station. There is also a free parking lot available for those who are driving. You can find us in the "Training Room", which is on the second floor of the Bayview Yards building.

This is an in-person event with limited space! Please only RSVP if you are able to attend in-person!

***Please note that the mission of R-Ladies is to increase gender diversity in the R community. This event is intended to provide a safe space for women and gender minorities. We ask for male allies to be invited by and accompanied by a woman or gender minority.***

We’re grateful to be part of the Bayview Meetups initiative and extend our thanks to Bayview Yards for generously providing the venue space.

TidyTuesday
PyData at BeSecure 2025 2025-12-08 · 08:30

As a community partner, we invite you to sign up for the BeSecure Community Stage, a joint initiative of many technical communities in 3city!

*Some of the talks are in Polish, others in English.

📍 Where: Amber Expo, Gdańsk
📅 When: 8 December 2025
📝 Registration: https://codeme.pl/besecure/

Events cooperating: Hackerspace Trójmiasto, TJUG, PyGda, PyData, Gdańsk Embedded Meetup, TRUG, Golang Trójmiasto, MLGdańsk, WordUp Gdynia, Nerds Coding Gang.

👉 Admission to the Community Stage is free; simply complete the registration form. The stage explores different perspectives through talks from the local community on topics ranging from hardware and systems, through backend, to Python, data science, and machine learning.

Community Stage Agenda

  • 09:30–09:45 – Official opening 🇵🇱
  • 09:50–10:20 – Maciej Wierzbowski 🇵🇱 Prywatność pod lupą: Hacking WiFi i Bluetooth w praktyce (Privacy Under the Magnifying Glass: Hacking WiFi and Bluetooth in Practice)
  • 10:25–10:45 – Wojciech Kargul 🇬🇧 Cloudflare Outage: The Day Rust Broke the Internet
  • 10:45–11:00 – Coffee break ☕
  • 11:00–11:50 – Adam Bien 🇬🇧 Shared responsibility or beyond the firewall? Cloud security for Enterprise Java Developers
  • 11:50–12:20 – Dorota Kozłowska 🇬🇧 Social engineering for Covert Access Engagements
  • 12:20–12:35 – Coffee break ☕
  • 12:35–13:15 – Przemysław Michalak 🇵🇱 Podstawy hackowania sprzętu (Hardware Hacking Basics)
  • 13:20–13:55 – Jakub Rachoń 🇵🇱 Java: czy WORA wciąż aktualna? (Java: Is WORA Still Relevant?)
  • 14:00–15:00 – Lunch 🍽️
  • 15:00–15:30 – Mateusz Bełczowski 🇵🇱 Przegląd zagrożeń w ekosystemie Pythona (An Overview of Threats in the Python Ecosystem)
  • 15:35–16:25 – Łukasz Langa 🇬🇧 Permacomputing and Python
  • 16:30–16:45 – Coffee break ☕
  • 16:45–17:00 – Lightning Talks – still open to anyone 🇵🇱/🇬🇧
  • 17:00–18:00 – Lightning Talks (continuation) 🇵🇱/🇬🇧

Other stages at BeSecure (extended ticket required):

  • Main Stage
  • Business Stage
  • Workshop Stage
  • Public & Skills Stage

Use the 25% discount code PyData_BeSecure-25%

As a community partner, we still have a few free admission codes for the other stages. PM us!

🎟️ Tickets and more details: https://codeme.pl/besecure/

PyData at BeSecure 2025
Yashasvi Misra – Data Engineer @ Pure Storage, Igor Kvachenok – Master’s student in Data Science @ Leuphana University of Lüneburg, Selim Nowicki – Founder @ Distill Labs, Mehdi Ouazza – guest, Gülsah Durmaz – Architect & Developer

At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

  • Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
  • Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
  • Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
  • Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
  • Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok
Master’s student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.

Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki
Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups like Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today’s ML tooling and dbt’s impact on analytics.

Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz
Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.

Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra
Data Engineer at Pure Storage, community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day in data engineering with her passion for ethical ML.

Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza
Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.

Connect: https://www.linkedin.com/in/mehd-io/

AI/ML Analytics Data Engineering Data Science dbt DuckDB Kubernetes LLM MLOps Motherduck Python
DataTalks.Club

This is a PAID event. REGISTER HERE - https://lu.ma/psj48l0t

As a gesture of our appreciation for being part of the ODSC Community, we are offering a 20% discount on the 6-week Bootcamp pass.

Apply code - COMMUNITY-20- to save more.

Level Up Your AI Skills This Fall! 🚀

Join us for an intensive 6-week virtual AI Bootcamp, a fantastic prelude to the renowned ODSC AI West Bootcamp in October! This isn't just any bootcamp; it's your chance to build a strong foundation in AI from the comfort of your home, all before experiencing the full, immersive 4-day event in person.

🚀And here's a pro-tip:

If you secure a pass for the ODSC AI West Bootcamp, you'll gain free access to this 6-week virtual training. It's the perfect way to maximize your learning experience and hit the ground running at ODSC West.


🚀 Curious to learn more? Head over to https://odsc.ai/west/bootcamp/ for all the details. Don't forget to use code COMMUNITYWest2025 at checkout for an extra discount!


If you choose to enroll in the 6-Week Virtual AI Bootcamp as a standalone event:

Over 6 weeks, gain a comprehensive understanding of AI, from foundational concepts in coding and machine learning to LLMs, AI Agents, and RAG. Here is the list of sessions you will be attending:

  • AI & Machine Learning Modeling - September 9th, 2025
  • Vibe Coding with AI (NEW) - September 11th, 2025
  • Machine Learning Data Prep with Python - September 16th, 2025
  • Introduction to Machine Learning - September 18th, 2025
  • Large Language Models & Fine-Tuning - September 25th, 2025
  • Introduction to RAG - October 2nd, 2025
  • Introduction to AI Agents - October 9th, 2025
  • Build and Launch Your AI Project - October 16th, 2025

Regardless of your current skill level, this AI bootcamp will help transform you from AI novice to confident practitioner 🚀

If you’ve missed any, don’t worry: you still have access to the 13 sessions that have passed, including our On-Demand courses:

  • Data & Generative AI Literacy
  • Data Wrangling With SQL
  • Linear Algebra
  • Statistics and Hypothesis Testing
  • Introduction to Math for Data Science

How it works 🌍 :

  • Each course is 2.5 hours long and includes extra materials
  • The primer series is taught live and then available on demand.
  • If you miss the live course, each session is available on-demand as soon as you register.
  • Each course includes exercises to improve learning outcomes.
  • Coding exercises allow you to learn hands-on skills.
  • Learn at your own pace. Courses can be taken alongside additional Ai+ courses - aiplus.training

Your Instructor🚀:

Sheamus McGovern, Founder and Engineer | ODSC AI

Sheamus McGovern is the founder of ODSC AI (The Open Data Science Conference). He is also a software architect, data engineer, and AI expert. He started his career in finance by building stock and bond trading systems and risk assessment platforms and has worked for numerous financial institutions and quant hedge funds. Over the last decade, Sheamus has consulted with dozens of companies and startups to build leading-edge data-driven applications in finance, healthcare, eCommerce, and venture capital. He holds degrees from Northeastern University, Boston University, Harvard University, and a CQF in Quantitative Finance.

Some useful links:

  • Get free access to more talks/trainings like this at Ai+ Training platform: https://aiplus.training/
  • ODSC blog: https://opendatascience.com/
  • Slack Channel: https://hubs.li/Q038cQBy0
  • Code of conduct: https://odsc.ai/code-of-conduct/

Virtual 6-Week AI Bootcamp 2025

TL;DR Learn how to turn your Python functions into interactive web applications using open-source tools. By the end, each of us will have deployed a portfolio (or store) with multiple web applications and learned how to reproduce it easily later on.

Tell me more Work not shown is work lost. Many excellent scientists and engineers are not always adept at showcasing their work. This results in many interesting scientific ideas that have never been brought to light.

However, using today's tools, one no longer has to leave the Python ecosystem to create classy, complete prototypes using modern data visualization and web development tools. With over five years of experience building and presenting data solutions at huge science companies, we show it doesn't have to be challenging. We provide a walkthrough of the primary web application frameworks and showcase Fast Dash, an open-source Python library that we built to address specific prototyping needs.

This tutorial is designed for all data professionals who value the ability to quickly convert their scientific code into web applications. Participants will learn about the leading frameworks, their strengths and limitations, and a decision flowchart for picking the best one for a given task. We will go through some day-to-day applications and hands-on Python coding throughout the session. Whether you bring your use-cases and datasets, or pick from our suggestions, you'll have a reproducible portfolio (app store) of deployed web applications by the end!
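As a rough illustration of the idea behind turning a Python function into a web application, the sketch below uses only the standard library to read a function's type hints and derive a simple form description from them. It is not Fast Dash's implementation or API; `WIDGETS`, `describe_form`, and the example function `shout` are hypothetical names made up for this sketch.

```python
import inspect
from typing import get_type_hints

# Hypothetical mapping from Python types to generic form widgets.
WIDGETS = {str: "text", int: "number", float: "number", bool: "checkbox"}


def describe_form(func):
    """Describe the inputs and output of `func` based on its type hints."""
    hints = get_type_hints(func)
    signature = inspect.signature(func)
    inputs = []
    for name, param in signature.parameters.items():
        # Fall back to a plain text field when the type is unknown.
        field = {"name": name, "widget": WIDGETS.get(hints.get(name), "text")}
        if param.default is not inspect.Parameter.empty:
            field["default"] = param.default
        inputs.append(field)
    return {
        "title": func.__name__,
        "inputs": inputs,
        "output": WIDGETS.get(hints.get("return"), "text"),
    }


def shout(message: str, times: int = 2) -> str:
    """Example function a user might want to expose as an app."""
    return " ".join([message.upper()] * times)


form = describe_form(shout)
print(form)
```

A real framework would render such a description as input components and call the function when the user submits the form; that part is deliberately left out here.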

DataViz Python
SciPy 2025

Join us for a special PyData x R-Ladies x Rome R Users Group EVENT!

IN PERSON at Istituto Nazionale di Geofisica e Vulcanologia - INGV.IT
Registration for in person: https://www.meetup.com/pydata-roma-capitale/events/308302003
⚠️ Remember to RSVP using your full name for security reasons and bring a valid ID to show at the entrance. Otherwise, you will not be allowed to enter the premises! ⚠️

ONLINE Attendees can use the zoom link on this page.

We’ll kick off the day with a warm welcome from R-Ladies & Rome R Users communities, followed by an inspiring opening talk and a series of dynamic 5-minute lightning talks from community members. Whether you’re joining in person or tuning in remotely, we’re excited to celebrate the power of open-source languages, data science, and community.

Schedule:

  • 18:00 🚪 Doors open
  • 18:15 🙌 Welcome from PyData Roma Capitale, R-Ladies Rome, and R User Group
  • 18:30 🎤 "Speaking Many Languages: Finding Power and Joy in R and Python" – Federica Gazzelloni
  • 19:00 ⚡ Lightning Talks (open mic, up to 5 minutes!)
  • 19:30 🤝 Networking: share insights, spark new collaborations
  • 20:00 👋 Close & goodnight (or if you want, join us for a multi-community dinner outside the venue!)

Opening Talk – Speaking Many Languages: Finding Power and Joy in R and Python
By Federica Gazzelloni

When I first started my journey in data science, I believed I had to “choose a side”: R or Python. But over time, I discovered that speaking more than one programming language isn’t about choosing sides; it’s about expanding perspectives. In this talk, I’ll share a bit of my personal story: how I came to learn both R and Python, what each has taught me, and how switching between them has enriched my work and opened unexpected doors. Along the way, I’ll walk through a few simple, side-by-side examples in both languages to highlight their strengths and unique styles. Whether you’re deeply rooted in one language or just starting out, I hope this talk leaves you feeling empowered to learn broadly, build confidently, and connect across communities.

Speaking Many Languages: Finding Power and Joy in R and Python
Humble Data Workshop 2025-06-08 · 13:45
Hugh Evans – Developer Advocate @ Imply

Learn Python for Data Science in this Beginners’ Day Workshop

Would you like to learn to code but don’t know where to start? Taking your first steps in programming can seem like an impossible task, so we’ve decided to put on a workshop to show beginners how it can be done and share our passion for the world of data science!

Apply to be a student: https://forms.gle/2cvNyRK8c8pNnpnz5

Data Science Python
PyData London 2025

PyCon US 2025 is coming to Pittsburgh this May 14–22, and PyData Pittsburgh is thrilled to be part of it! We’re hosting the Hometown Heroes Hatchery track on Saturday, May 17—a half-day event inside the conference celebrating the incredible work of Python developers, researchers, educators, and technologists from across our city. As part of PyCon’s Hatchery initiative, this track will feature presentations and lightning talks that highlight the creativity and impact of Pittsburgh’s Python community.

If you're attending PyCon US 2025, we invite the PyData Pittsburgh community to join us at the Hometown Heroes track—come connect, engage, and help showcase the strength of our local tech scene.

Please note: you must be registered for PyCon US 2025 to attend this event, and all attendees and speakers are responsible for securing their own tickets. You can find registration details for the conference here: https://us.pycon.org/2025/attend/information/.

HOMETOWN HEROES HATCHERY PROGRAM - May 17th

TALK SCHEDULE:

Decoding Spatial Biology with Python: Multi-Modal Insights into Breast Cancer Progression
Time: 01:45 PM - 02:15 PM
Speakers: Alex C. Chang, CMU-Pitt (Graduate Student PhD, Computational Biology) and Brent Schlegel, University of Pittsburgh School of Medicine (Graduate Student PhD, Integrative Systems Biology)

Python has rapidly become a cornerstone of scientific computing, computational biology, and bioinformatics due to its ease of use and scalability for handling large datasets—qualities that are critical in today’s “big data” era of clinical and translational research. As computational resources and data collection methods continue to expand, we are now empowered to ask larger and more clinically relevant questions that enable us to dissect complex biological systems with unprecedented detail. However, this surge in data complexity brings new challenges, from the integration of diverse data modalities to the need for sophisticated analytical methods capable of untangling intricate biological signals from background noise. In this talk, we describe how Python not only meets these challenges but also drives innovation through the development of novel bioinformatics tools like CITEgeist—a case study in harnessing Python’s capabilities for multi-modal spatial transcriptomics. Biological datasets often face challenges of high sparsity and noise. CITEgeist harnesses Python’s robust ecosystem to provide an efficient, scalable pipeline that deconvolutes messy spatial signals into actionable, clinically relevant features.

Exploring Energy Burden in Pittsburgh Neighborhoods with Python
Time: 02:30 PM - 03:00 PM
Speakers: Ling Almoubayyed, SmithGroup, Inc. (Project Manager) and Husni Almoubayyed, Carnegie Learning

National-level energy studies consistently find that energy burdens are a significant challenge, and that lower-income neighborhoods sometimes end up paying more for energy in cities including Pittsburgh. Using Python, we were able to extract and analyze data on energy consumption in the City of Pittsburgh, along with real-estate and geographic information system (GIS) data to compare trends in energy usage and burden across Pittsburgh neighborhoods, and across different housing types. We present statistical analyses and Python visualizations describing these trends across different features such as housing price, size, and neighborhood.

Bottling Tesla's Solar: A Solar Dashboard with Python
Time: 03:15 PM - 03:45 PM
Speaker: Christopher Pitstick (Sr. SWE)

Tesla's Powerwall/Inverter solar ecosystem is powerful yet notoriously opaque. For home labbers, extracting meaningful data can be daunting, but not impossible. In this talk, I'll share my journey of developing a custom solar dashboard using Grafana and PyPowerwall, navigating the quirks and closed nature of Tesla's ecosystem along the way. The backend is all Python, so I will demo my server code and dashboard to show how I was able to find hundreds of kilowatt hours in lost solar production. We'll also do a deep dive into the way I altered the Python server code to query multiple inverters at the same time with complex iptables rules. This presentation may conclude with the value of installing solar on your home, and how self-monitoring is a critical component of every nerd's arsenal.

Strategies for Eliciting Structured Outputs from LLMs
Time: 03:50 PM - 03:55 PM
Speaker: Utkarsh Tripathi, Solventum (Machine Learning Engineer)

This lightning talk will provide a concise yet comprehensive overview of techniques for extracting structured, predictable outputs from Large Language Models. I will compare and demonstrate multiple state-of-the-art libraries (such as BAML, Instructor, LangChain, and SGLang), explain how they work under the hood, and show how to use pydantic, dataclasses, and similar tools to get structured outputs. We will explore practical examples of JSON schema enforcement, markdown formatting directives, and template-based approaches that dramatically improve downstream processing capabilities. The presentation will include code snippets and prompt templates that participants can immediately implement in their own projects.
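To make the idea of schema enforcement concrete, here is a minimal sketch that checks a model's JSON reply against a standard-library dataclass. It is not taken from the talk and does not use BAML, Instructor, LangChain, SGLang, or pydantic; the `Ticket` schema, the `parse_reply` helper, and the sample reply string are hypothetical.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class Ticket:
    """Hypothetical schema the model is asked to fill in."""
    title: str
    priority: int


def parse_reply(reply: str) -> Ticket:
    """Parse a JSON reply and check it against the Ticket schema.

    Raises ValueError when the reply is not valid JSON, is not an object,
    has missing or extra keys, or has a value of the wrong type.
    """
    try:
        data = json.loads(reply)
    except json.JSONDecodeError as exc:
        raise ValueError(f"not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    expected = {f.name: f.type for f in fields(Ticket)}
    if set(data) != set(expected):
        raise ValueError(f"expected keys {sorted(expected)}, got {sorted(data)}")
    for name, typ in expected.items():
        # Exact type match, so a JSON boolean is not accepted as an int.
        if type(data[name]) is not typ:
            raise ValueError(f"{name!r} should be {typ.__name__}")
    return Ticket(**data)


ticket = parse_reply('{"title": "Fix login", "priority": 2}')
print(ticket)
```

The libraries named in the abstract each handle this step in their own way; the point of the sketch is only that a reply is accepted once it matches a declared schema and rejected with a clear error otherwise.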

Does Generative AI Know Statistics? Time: 03:55 PM - 04:00 PM Speaker: Louis Luangkesorn, Highmark Health (Lead Data Scientist)

Generative AI has promise to impact many fields of endeavor. But experience has shown that it often has problems with nuance and context. This talk discusses some experiences using Generative AI as an aid in applied analytics and walks through an example that illustrates working around its weaknesses and taking advantage of its capabilities.

Demystifying How Animal Behavior Affects Disease Spread Using Python Time: 04:00 PM - 04:05 PM Speaker: Carolyn Tett, University of Pittsburgh (Research Technician)

Not all individuals contribute equally to disease spread. During COVID-19, social distancing reduced transmission for some, while high-contact individuals increased disease spread. Preventative measures for massive disease outbreaks, however, cannot rely solely on data from rare epidemic events. Instead, disease ecologists study animal models to understand how host behavior theoretically drives disease outbreaks. Tracking animal movement and interactions is essential for identifying transmission-relevant behaviors. In lab experiments, video recordings provide an abundance of behavioral data, now efficiently processed through automation, and coding languages like Python enable large-scale data analysis. The Stephenson Lab at the University of Pittsburgh uses Raspberry Pis to autonomously record guppies infected with an ectoparasite. These parasites transmit primarily through instances of close contact between hosts. Through autonomous video recordings, we generated 1,300 hours of footage—equivalent to 54 consecutive days of observation. Given that each video captures six guppies, manually tracking behavior would take tens of billions of days. Instead, animal tracking software reduces this processing time to a mere few months.

The Many-Colored Functions of Async Python Time: 04:15 PM - 04:45 PM Speaker: Bryan C. Mills, Duolingo (Senior Software Engineer)

You might think of functions in async Python in terms of “synchronous” and “async”, but the possibility of binding objects (such as Locks) to the asyncio event loop adds a whole new dimension to consider. We'll examine six vibrant kinds of functions and how they interact! This talk will examine code examples of how to adapt each kind of function to call other kinds, suggest design patterns that minimize the complexity of dealing with different kinds (such as non-blocking context managers), and examine patterns or libraries to safely synchronize concurrent calls involving multiple kinds of function.
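The abstract does not list the six kinds of function, so the sketch below does not try to reproduce the speaker's taxonomy. It only shows three common adaptations using standard `asyncio` calls: a coroutine calling blocking code, synchronous code starting a coroutine, and an `asyncio.Lock` created and used on the loop that runs it. All function names are made up for this example.

```python
import asyncio
import time


def blocking_io(x: int) -> int:
    """Plain synchronous function that blocks its thread."""
    time.sleep(0.01)
    return x * 2


async def async_step(x: int) -> int:
    """Coroutine function: it must be awaited on a running event loop."""
    await asyncio.sleep(0.01)
    return x + 1


async def pipeline(x: int) -> int:
    # async -> sync: run the blocking call in a worker thread so the loop stays free.
    doubled = await asyncio.to_thread(blocking_io, x)
    # async -> async: a plain await.
    return await async_step(doubled)


async def guarded(lock: asyncio.Lock, x: int) -> int:
    # The lock is only ever used on the loop that runs this coroutine.
    async with lock:
        return await pipeline(x)


async def main() -> list:
    lock = asyncio.Lock()  # created while the loop is running
    return await asyncio.gather(*(guarded(lock, i) for i in range(3)))


def run_from_sync() -> list:
    # sync -> async: start a fresh event loop, run to completion, then close it.
    return asyncio.run(main())


results = run_from_sync()
```

Creating the lock inside `main()` rather than at module level keeps it tied to the loop that `asyncio.run` starts, which is the kind of loop-binding concern the abstract mentions.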

Automated Dependency Inference and its Applications Time: 05:00 PM - 05:30 PM Speaker: Jason R. Coombs, Microsoft (Principal Software Engineer)

Last summer, I launched the Coherent Software Development System (https://bit.ly/coherent-system) with the principle that one should not have to repeat themselves when developing more than one Python project. One of the key innovations of that system is coherent.deps, a system for deriving package dependencies from the imports that a project or script uses. I'll explore some of the background motivations from Google's monorepo, some prior art at Meta, and some of the approaches that failed (AI-based inference) before going into the details of the implementation (AST parsing, world-readable MongoDB database, Big Table query to PyPI downloads). I'll additionally talk about some of the applications of this generalized library (coherent.build, pip-run), some of the maintenance challenges (expensive query, refresh interval), and possible other applications (on-demand dependency loader).
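The AST-parsing step mentioned in this abstract can be sketched with the standard `ast` module. This is an illustration of the general technique, not the coherent.deps implementation; the helper name `imported_top_level_names` and the sample source are assumptions made for the example.

```python
import ast


def imported_top_level_names(source: str) -> set:
    """Return the top-level module names that `source` imports absolutely."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                names.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 marks a relative import, which is not an external dependency.
            if node.level == 0 and node.module:
                names.add(node.module.split(".")[0])
    return names


sample = """
import os.path
import requests
from numpy.linalg import norm
from . import sibling
"""

found = imported_top_level_names(sample)
```

Import names are not always the same as distribution names on PyPI (for example, `yaml` is provided by `PyYAML`), so a real tool still needs a lookup from one to the other; per the abstract, that is where the database comes in.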

SPEAKER BIOS:

Alex C. Chang Alexander Chih-Chieh Chang is a fourth-year MSTP student in the CMU-Pitt Computational Biology Ph.D. Program, mentored by Drs. Lee and Oesterreich. He earned a BS/BA in Chemical and Biomolecular Engineering/Sociology from Johns Hopkins University in 2021. Previously, during his undergraduate research in the lab of Rong Li, Ph.D., he conducted large-scale genomic screens to study proteomic dysregulation and spent a gap year in the lab of Manish Aghi, MD, PhD, studying breast cancer metastasis to the brain. Currently, as a computational biologist and medical student, he coordinates the Hope for OTHERS tissue donation program in the Lee-Oesterreich Lab and computational research projects in breast cancer metastasis and genomic evolution.

Brent Schlegel Brent Schlegel is a first-year PhD student in Integrative Systems Biology at the University of Pittsburgh School of Medicine, co-mentored by Drs. Adrian Lee and Steffi Oesterreich. He earned his AS in Mathematics and Sciences from CCAC (2019) and a BS in Computational Biology from Pitt (2021). Most recently, he worked as a Bioinformatics Analyst at the UPMC Children’s Hospital of Pittsburgh, where he specialized in the integrative analysis of large, complex biomedical datasets. Now, Brent combines data science, computational modeling, and multi-omic integration to tackle the systems biology of invasive lobular breast cancer, using patient-derived organoid models and leveraging “big data” to uncover hidden patterns and drive innovation in diagnosis and treatment.

Ling Almoubayyed Ling is an experienced architecture and urban designer with extensive project management expertise. Specializing in urban design, planning, community engagement, and spatial analysis, she has successfully led projects ranging from individual buildings to comprehensive urban districts. Ling uses evidence-based design with data gathered through stakeholder engagement to identify the best design solutions to create built environments. She is currently a Project Manager with SmithGroup.

Husni Almoubayyed Husni Almoubayyed is the Director of AI at Pittsburgh-based education technology company Carnegie Learning. Husni uses machine learning and data science methods to conduct research in education, specifically in topics such as personalization, equity, and predictive analytics. Prior to his work in education technology, Husni acquired a Ph.D. in Astrophysics from Carnegie Mellon University, where he worked on mitigating biases in astronomical data to advance understanding of dark energy. Needless to say, Python is Husni's favorite programming language, and PyCon is one of his favorite events of the year!

Christopher Pitstick Christopher, a passionate software engineer who installed solar panels on his home in 2024, quickly immersed himself in system analysis to optimize performance—expertise that directly inspired this presentation. His programming journey began at age 12 with QBasic, igniting a lifelong passion that led to roles at industry giants including Microsoft, Amazon, and Argo AI before joining his current position at Latitude. Throughout his career, Christopher has mastered multiple programming languages from C++ to Perl and Python, approaching coding both as a profession and personal passion. As a dedicated neurodiversity advocate, he regularly shares his experiences through public speaking engagements, raising awareness and empowering others in the tech community.

Utkarsh Tripathi Utkarsh Tripathi is a Machine Learning Engineer at Solventum, Inc., where he works on Solventum™ Fluency Align™ and Solventum™ Fluency Direct™ : AI-powered clinical documentation tools that leverage conversational and generative AI, along with ambient intelligence, to automate medical documentation. These solutions help reduce administrative work and physician burnout, while improving the overall patient care experience. Utkarsh holds degrees in Electrical Engineering, Chemistry, and Computer Science from BITS Pilani and the University of Chicago.

Louis Luangkesorn Dr. Louis Luangkesorn is a Lead Data Scientist at Highmark Health where he works on projects applying statistical, predictive, operations research, and Generative AI models in use cases involving human resources and healthcare. He has contributed code to SciPy and a book appendix porting a simulation textbook's examples to SimPy.

Carolyn Tett Carolyn is an ecologist who specializes in animal behavior and disease ecology. She works with guppies and their ectoparasites to better understand how host contact rate and physiological status impact disease spread. She captures guppy behaviors on video and uses Python to automate the video processing. Using these outputs, she quantifies guppy social metrics and runs statistical models to predict behavior-mediated parasite spread.

Bryan C. Mills Bryan maintains Python core services at Duolingo, and was formerly a maintainer on the Go project at Google.

Jason R. Coombs Jason's been a passionate contributor to Python and open source software since the 90's, is a core contributor to Python, and maintains hundreds of packages in PyPI.

PyCon 2025 Special Event: Hometown Heroes Hatchery Program

Weekly Milestone

📅 Week 6 Focus: Storytelling & Presenting Your Work This week, we’re zooming in on one of the most underrated skills in data science: communicating your project effectively.

👉 You can start anytime — the program is designed in a loop, so each theme comes back every 7 weeks. This week’s focus is storytelling and presentation, but feel free to jump in wherever you are. Your project, your pace.


🚀 Build & Learn: Data Science Meetup --- With Coffee 💡 Always wanted to build a data science project but struggle to start? Or just looking for a structured, motivating space to learn and create? This isn’t just a casual meetup—it’s a community-driven program designed to help you go from idea → working project → portfolio-ready presentation in 7 weeks.

🛠 What This Meetup Is About
✅ Guided learning and setup help: perfect if you’re new to data science or Python.
✅ A structured weekly challenge: each week has a focus, with clear milestones and community check-ins.
✅ Support and accountability: work alongside others, ask questions, and stay motivated through our Discord server.
✅ A final showcase day: present your project, get feedback, and celebrate your progress.
✅ In-person AND online participation: join us at the café or follow along remotely via Discord.

📌 You’ll walk away with: 🎯 A working project to add to your portfolio, GitHub, or resume 🎯 definitely some caffeine in your system ☕ 💡 Not sure what to build? Ever had a question that stuck in your head—something you wish you could map, analyze, or visualize? Now’s your chance. 🕵️ Can I map out corporate and government corruption based on their interactions? 🎶 Can I find movies based on emotions instead of genres? 💸 Are all skincare products really just the same thing in different bottles? 🗺️ Can I optimize my travels by balancing flexibility with smart planning? 🤖 Can I build an AI that rewrites history in different storytelling styles? (More ideas at the event! You can also bring your own!)

Who’s Hosting? I’m Lindsey, a senior data scientist working on AI, causal inference, and data products. I’ve built models for fraud detection, uplift modeling, and LLM applications.
📅 When? Saturday, May 3, 11:00 AM - 1:00 PM
📍 Where? Octopus Bar, Pestalozzistraße 5-8, 13187 Berlin
💻 Bring: Your laptop, an idea, or just curiosity!
👩‍💻 No experience needed, just curiosity. Grab a coffee, meet cool people, and work on something fun.

Build & Learn: Data Science with Coffee

Details

🚀 Build & Learn: Data Science Meetup --- With Coffee 💡 Always wanted to build a data science project but struggle to start? Or just looking for a structured, motivating space to learn and create? This isn’t just a casual meetup—it’s a community-driven program designed to help you go from idea → working project → portfolio-ready presentation in 7 weeks.

🛠 What This Meetup Is About
✅ Guided learning and setup help: perfect if you’re new to data science or Python.
✅ A structured weekly challenge: each week has a focus, with clear milestones and community check-ins.
✅ Support and accountability: work alongside others, ask questions, and stay motivated through our Discord server.
✅ A final showcase day: present your project, get feedback, and celebrate your progress.
✅ In-person AND online participation: join us at the café or follow along remotely via Discord.

📌 You’ll walk away with: 🎯 A working project to add to your portfolio, GitHub, or resume 🎯 definitely some caffeine in your system ☕ 💡 Not sure what to build? Ever had a question that stuck in your head—something you wish you could map, analyze, or visualize? Now’s your chance. 🕵️ Can I map out corporate and government corruption based on their interactions? 🎶 Can I find movies based on emotions instead of genres? 💸 Are all skincare products really just the same thing in different bottles? 🗺️ Can I optimize my travels by balancing flexibility with smart planning? 🤖 Can I build an AI that rewrites history in different storytelling styles? (More ideas at the event! You can also bring your own!)

Who’s Hosting? I’m Lindsey, a senior data scientist working on AI, causal inference, and data products. I’ve built models for fraud detection, uplift modeling, and LLM applications.
📅 When? Saturday, April 19, 11:00 AM - 1:00 PM
📍 Where? Octopus Bar, Pestalozzistraße 5-8, 13187 Berlin
💻 Bring: Your laptop, an idea, or just curiosity!
👩‍💻 No experience needed, just curiosity. Grab a coffee, meet cool people, and work on something fun.

Build & Learn: Data Science with Coffee
PyData Leeds: March Meet-up 2025-03-25 · 17:30

PyData Leeds is back and we're very excited to bring you the March Meet-up. We've got a full schedule with 2 presentations, it's going to be great!

PyData Leeds brings together people who are passionate about Python, Data & Engineering for evenings focussed around learning and networking.

Schedule:
Date: Tuesday 25th March 2025
Time: 17:30
Location: Parallax Offices, The Elbow Rooms, 64 Call Lane, Leeds, LS1 6DT

Agenda:

17:30: Networking and Refreshments

18:00: Welcome & Icebreaker

18:15: Jakub Szamuk, Software Engineer - 'Purr-mission Granted: Machine Vision in the Real World'
In an era where LLMs and machine learning are transforming industries, how do we bring this tech into a real product - quickly? This talk explores the journey of building Purr-mission Granted, a heavily over-engineered machine-vision catflap. From concept to working prototype in just one day, we will dive into the challenges of gathering training data and lessons learned in implementing machine vision in a physical product. Whether you're an AI enthusiast, maker, or just a pet owner tired of surprise deliveries, this talk aims to help inspire you to start bringing this exciting new technology into your own projects.

19:00: Suze Hawkins, Lead Data Scientist & Magda Nowakowska, Senior Data Scientist - 'Data Science Without Data: Building Models When Real Data is Scarce'
What do you do when you're faced with a data science problem, but there’s no real data available? Sometimes, access is restricted due to privacy, legal constraints, or simply because it hasn’t been collected yet. However, being able to test and experiment ideas quickly is an important aspect of the development to production cycle - often as a proof of concept to secure the necessary approvals or access to real data. In this talk, we’ll explore practical strategies for tackling machine learning challenges when starting from scratch.

19:45: Wrap-up & Drinks

If you have been before, we look forward to seeing you again and if you're coming along for the first time, we're excited to meet you and for you to join the Leeds PyData Community.

Connect with us on Meetup, Discord or Twitter.

PyData Leeds is a strictly professional event, as such professional behaviour is expected.

PyData Leeds is a chapter of PyData, an educational program of NumFOCUS and thus abides by the NumFOCUS Code of Conduct - https://pydata.org/code-of-conduct.html

PyData Leeds: March Meet-up

We are excited to finally have the first ClickHouse Meetup in the vibrant city of Delhi! Join the ClickHouse crew, from Singapore and from different cities in India, for an engaging day of talks, food, and discussion with your fellow database enthusiasts.

But here's the deal: to secure your spot, make sure you register ASAP!

🗓️ Agenda:

  • 10:30 AM: Registration & Networking
  • 11:05 AM: Welcome & Opening
  • 11:10 AM: Introduction to ClickHouse by Rakesh Puttaswamy, Solution Architect @ ClickHouse
  • 11:25 AM: ClickPipes Overview and demo by Kunal Gupta, Sr. Software Engineer @ ClickHouse
  • 11:40 AM: Optimizing Log Management with Clickhouse: Cost-Effective & Scalable Solutions by Pushpender Kumar, DevOps Architect @ OLX India
  • 12:10 PM: ClickHouse at Physics Wallah: Empowering Real-Time Analytics at Scale by Utkarsh G. Srivastava, Software Development Engineer III @ Physics Wallah
  • 12:40 PM: FabFunnel & ClickHouse: Delivering Real-Time Marketing Analytics by Anmol Jain, SDE-2 (Full stack Developer) and Siddhant Gaba, SDE-2 (Python), @ Idea Clan
  • 1:10 PM: From SQL to AI: Building Intelligent Applications with ClickHouse and LangDB by Matteo Pelati, Co-founder, LangDB.ai
  • 1:40 PM: Lunch & Networking

If anyone from the community is interested in sharing a talk at future meetups, complete this CFP form and we’ll be in touch.

🎤 Session Details: Introduction to ClickHouse Discover the secrets behind ClickHouse's unparalleled efficiency and performance. Rakesh will give an overview of different use cases for which global companies are adopting this groundbreaking database to transform data storage and analytics.

Speaker: Rakesh Puttaswamy, Solution Architect @ ClickHouse Rakesh Puttaswamy is a Solution Architect with ClickHouse, working with users across India, with over 12 years of experience in data architecture, big data, data science, and software engineering. Rakesh helps organizations design and implement cutting-edge data-driven solutions. With deep expertise in a broad range of databases and data warehousing technologies, he specializes in building scalable, innovative solutions to enable data transformation and drive business success.

🎤 Session Details: ClickPipes Overview and demo ClickPipes is a powerful integration engine that simplifies data ingestion at scale, making it as easy as a few clicks. With an intuitive onboarding process, setting up new ingestion pipelines takes just a few steps—select your data source, define the schema, and let ClickPipes handle the rest. Designed for continuous ingest, it automates pipeline management, ensuring seamless data flow without manual intervention. In this talk, Kunal will demo the Postgres CDC connector for ClickPipes, enabling seamless, native replication of Postgres data to ClickHouse Cloud in just a few clicks—no external tools needed for fast, cost-effective analytics.

Speaker: Kunal Gupta, Sr. Software Engineer @ ClickHouse Kunal Gupta is a Senior Software Engineer at ClickHouse, joining through the acquisition of PeerDB in 2024, where he played a pivotal role as a founding engineer. With several years of experience in architecting scalable systems and real-time applications, Kunal has consistently driven innovation and technical excellence. Previously, he was a founding engineer for new solutions at ICICIdirect and at AsknBid Tech, leading high-impact teams and advancing code analysis, storage solutions, and enterprise software development.

🎤 Session Details: Optimizing Log Management with Clickhouse: Cost-Effective & Scalable Solutions Efficient log management is essential in today's cloud-native environments, yet traditional solutions like ElasticSearch often face scalability issues, high costs, and performance limitations. This talk will begin with an overview of common logging tools and their challenges, followed by an in-depth look at ClickHouse's architecture. We will compare ClickHouse with ElasticSearch, focusing on improvements in query performance, storage efficiency, and overall cost-effectiveness.

A key highlight will be OLX India's migration to ClickHouse, detailing the motivations behind the shift, the migration strategy, key optimizations, and the resulting 50% reduction in log storage costs. By the end of this talk, attendees will gain a clear understanding of when and how to leverage ClickHouse for log management, along with best practices for optimizing performance and reducing operational costs.

Speaker: Pushpender Kumar, DevOps Architect @ OLX India Born and raised in Bijnor, moved to Delhi to stay ahead in the race of life. Currently working as a DevOps Architect at OLX India, specializing in cloud infrastructure, Kubernetes, and automation with over 10 years of experience. Successfully optimized log storage costs by 50% using Clickhouse, bringing scalability and efficiency to large-scale logging systems. Passionate about cloud optimization, DevOps hiring, and performance engineering.

🎤 Session Details: ClickHouse at Physics Wallah: Empowering Real-Time Analytics at Scale This session explores how Physics Wallah revolutionized its real-time analytics capabilities by leveraging ClickHouse. We'll delve into the journey of implementing ClickHouse to efficiently handle large-scale data processing, optimize query performance, and power diverse use cases such as user activity tracking and engagement analysis. By enabling actionable insights and seamless decision-making, this transformation has significantly enhanced the learning experience for millions of users.

Today, more than five customer-facing products at Physics Wallah are powered by ClickHouse, serving over 10 million students and parents, including 1.5 million Daily Active Users. Our in-house ClickHouse cluster, hosted and managed within our EKS infrastructure on AWS Cloud, ingests more than 10 million rows of data daily from various sources. Join us to learn about the architecture, challenges, and key strategies behind this scalable, high-performance analytics solution.

Speaker: Utkarsh G. Srivastava, Software Development Engineer III @ Physics Wallah As a versatile Software Engineer with over 7 years of experience in the IT industry, I have had the privilege of taking on diverse roles, with a primary focus on backend development, data engineering, infrastructure, DevOps, and security. Throughout my career, I have played a pivotal role in transformative projects, consistently striving to craft innovative and effective solutions for customers in the SaaS space.

🎤 Session Details: FabFunnel & ClickHouse: Delivering Real-Time Marketing Analytics We are a performance marketing company that relies on real-time reporting to drive data-driven decisions and maximize campaign effectiveness. As our client base expanded, we encountered significant challenges with our reporting system—frequent data updates meant handling large datasets inefficiently, leading to slow query execution and delays in delivering insights. This bottleneck hindered our ability to provide timely optimizations for ad campaigns. To address these issues, we needed a solution that could handle rapid data ingestion and querying at scale without the overhead of traditional refresh processes. In this talk, we’ll share how we transformed our reporting infrastructure to achieve real-time insights, enhancing speed, scalability, and efficiency in managing large-scale ad performance data.

Speakers: Anmol Jain, SDE-2 (Full stack Developer), & Siddhant Gaba, SDE-2 (Python) @ Idea Clan From competing as a national table tennis player to building high-performance software, Anmol Jain brings a unique mix of strategy and problem-solving to tech. With 3+ years of experience at Idea Clan, they play a key role in scaling Lookfinity and FabFunnel, managing multi-million-dollar ad spends every month. Specializing in ClickHouse, React.js, and Node.js, Anmol focuses on real-time data processing and scalable backend solutions. At this meet-up, they’ll share insights on solving reporting challenges and driving real-time decision-making in performance marketing.

Siddhant Gaba is an SDE II at Idea Clan, with expertise in Python, Java, and C#, specializing in scalable backend systems. With four years of experience working with FastAPI, PostgreSQL, MongoDB, and ClickHouse, he focuses on real-time analytics, database optimization, and distributed systems. Passionate about high-performance computing, asynchronous APIs, and system design, he aims to advance real-time data processing. Outside of work, he enjoys playing volleyball. At this meetup, he will share insights on how ClickHouse transformed real-time reporting and scalability.

🎤 Session Details: From SQL to AI: Building Intelligent Applications with ClickHouse and LangDB As AI becomes a driving force behind innovation, building applications that seamlessly integrate AI capabilities with existing data infrastructures is critical.

In this session, we explore the creation of agentic applications using ClickHouse and LangDB. We will introduce the concept of an AI gateway, explaining its role in connecting powerful AI models with the high-performance analytics engine of ClickHouse. By leveraging LangDB, we demonstrate how to directly interact with AI functions as User-Defined Functions (UDFs) in ClickHouse, enabling developers to design and execute complex AI workflows within SQL.

Additionally, we will showcase how LangDB facilitates deep visibility into AI function behaviors and agent interactions, providing tools to analyze and optimize the performance of AI-driven logic. Finally, we will highlight how ClickHouse, powered by LangDB APIs, can be used to evaluate and refine the quality of LLM responses, ensuring reliable and efficient AI integrations.

Speaker: Matteo Pelati, Co-founder, LangDB.ai Matteo Pelati is a seasoned software engineer with over two decades of experience, specializing in data engineering for the past ten years. He is the co-founder of LangDB, a company based in Singapore building the fastest Open Source AI Gateway. Before founding LangDB, he was part of the early team at DataRobot, where he contributed to scaling their product for enterprise clients. Subsequently, he joined DBS Bank where he built their data platform and team from the ground up. Prior to starting LangDB, Matteo led the data group for Asia Pacific and data engineering at Goldman Sachs.

ClickHouse Delhi/Gurgaon Meetup - March 2025

Welcome to the PyData Berlin March meetup!

We would like to welcome you all starting from 18:45. There will be food and drinks. The talks begin around 19:30, and the doors will close at 19:30. Make sure to arrive on time!

*** Important!! *** Please keep in mind that there is a BVG strike on this day, affecting U-Bahn, trams, and buses. S-Bahn and regional trains will work.

Please provide your first and last name for the registration because this is required for the venue's entry policy. If you cannot attend, please cancel your spot so others are able to join as the space is limited.

Host: Bonial is excited to welcome you to this month's version of PyData. ************************************************************************** The Lineup for the evening

Talk 1: Extract structured product & deal information from PDFs on scale via LLM Abstract: Bonial shows hundreds of thousands of offers from local brick-and-mortar retailers on its platform; a subset of this content is retrieved from PDF files. In this talk I’ll explain how we leverage LLMs to parse unstructured PDF files to create content on our platform.

Speaker: Philipp Johannis has been part of Bonial for 12 years. He established and leads the Data Department, which consists of multiple Analytics, Engineering & Data Science teams, and is currently serving as Head of Data. He focuses on improving the data platform and enabling and supporting the development of various data driven products such as personalisation and traffic management.

Talk 2: Airweave, an Open-Source Tool To Turn Any App Into Accessible Agent Knowledge Abstract: The talk will be an introduction to Airweave, which is an open-source Python tool that helps agent developers turn app data into accessible knowledge for AI agents. It connects to any app, database, URL, or API and structures the data for retrieval. Airweave automates authentication, ingestion, enrichment, mapping, and syncing to vector stores and graph databases of choice. It has a search layer for agents out-of-the-box and allows extension of the platform with minimal code. Developers can use Airweave via our web UI, REST API, or SDKs.

Speakers: Lennert Jansen and Rauf Akdemir are the creators of Airweave AI. Lennert is an AI Engineer & Researcher with a background in Applied Statistics and Deep Learning for NLP. Before Airweave, he worked on AI & Bayesian Statistics at Amazon, IBM, and the University of Amsterdam. Rauf is a CS graduate from Technical University of Delft, with strong engineering experience in productionising ML & data infrastructure in both start-ups and enterprise.

Lightning talks There will be slots for 2-3 Lightning Talks (3-5 Minutes for each). Kindly let us know if you would like to present something at the start of the meetup :)

*** NumFOCUS Code of Conduct THE SHORT VERSION Be kind to others. Do not insult or put down others. Behave professionally. Remember that harassment and sexist, racist, or exclusionary jokes are not appropriate for NumFOCUS. All communication should be appropriate for a professional audience including people of many different backgrounds. Sexual language and imagery are not appropriate. NumFOCUS is dedicated to providing a harassment-free community for everyone, regardless of gender, sexual orientation, gender identity, and expression, disability, physical appearance, body size, race, or religion. We do not tolerate harassment of community members in any form. Thank you for helping make this a welcoming, friendly community for all. If you haven't yet, please read the detailed version here: https://numfocus.org/code-of-conduct ***

PyData Berlin 2025 March Meetup

This event has PAID and FREE Passes. More info you may find here - https://lu.ma/7ochmq77 Pre-Registration via lu.ma is REQUIRED.

Time series forecasting is more than just predicting future trends - it’s a critical skill for industries ranging from finance to healthcare, retail, and beyond. Join us for a one-day virtual event packed with expert-led workshops designed to equip you with the latest AI-driven and classical forecasting techniques.

What’s on the agenda?

  • 12.00 pm ET - Talk - Jeff Tackes, Global Head of Forecasting at Kraft Heinz, and Hamed Alikhani, PhD, Data Scientist at Kraft Heinz - 30 min
  • 12.30 pm ET - Talk - Marco Peixeiro, Applied AI Scientist, Nixtla - 30 min
  • 1.00 pm ET - Workshop - John Mount, PhD, Principal Consultant, Win Vector LLC - 1 h
  • 2.00 pm ET - Training - Jeffrey Yau, Former Global Head of Data Science and Engineering at Amazon Music - 2 h

Talk#1 details: Topic: Optimizing Forecast Stability and Accuracy

In this talk, we introduce a novel approach leveraging genetic algorithms to optimize both forecast stability and accuracy, creating a dynamically weighted ensemble that balances these competing objectives and delivering better accuracy than any single base model. By incorporating past model performance into our evolutionary framework, we iteratively evolve an ensemble that minimizes large forecast swings while maintaining or improving overall accuracy. We demonstrate how this method systematically adjusts model weights based on historical deviations and performance metrics, solving a key business challenge.
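To make the idea of evolving ensemble weights concrete, here is a minimal genetic-algorithm sketch using only the standard library. It is not the speakers' method: the fitness function (mean absolute error plus a penalty on period-to-period swings in the blended forecast), the `stability_penalty` value, the operators, and the toy data are all assumptions chosen for illustration.

```python
import random


def normalize(weights):
    """Scale weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]


def fitness(weights, forecasts, actuals, stability_penalty=0.5):
    """Lower is better: accuracy term plus an assumed proxy for instability."""
    ens = [sum(w * f[t] for w, f in zip(weights, forecasts))
           for t in range(len(actuals))]
    mae = sum(abs(e - a) for e, a in zip(ens, actuals)) / len(actuals)
    swing = sum(abs(ens[t] - ens[t - 1]) for t in range(1, len(ens))) / (len(ens) - 1)
    return mae + stability_penalty * swing


def evolve(forecasts, actuals, pop_size=30, generations=40, seed=0):
    rng = random.Random(seed)
    n = len(forecasts)
    # Seed the population with equal weights so the result is never worse
    # than the simple average under this fitness function.
    pop = [[1.0 / n] * n]
    pop += [normalize([rng.random() + 1e-9 for _ in range(n)])
            for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, forecasts, actuals))
        parents = pop[: pop_size // 2]  # selection: keep the better half
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]  # crossover: average
            child = [max(1e-9, x + rng.gauss(0, 0.05)) for x in child]  # mutation
            children.append(normalize(child))
        pop = parents + children
    return min(pop, key=lambda w: fitness(w, forecasts, actuals))


# Toy data, invented for the example: three base models over six periods.
actuals = [10, 12, 11, 13, 12, 14]
forecasts = [
    [11, 11, 12, 12, 13, 13],
    [9, 14, 9, 15, 10, 16],
    [10, 12, 10, 13, 12, 15],
]
best = evolve(forecasts, actuals)
```

Because the better half of the population is carried over unchanged each generation, the best score never gets worse from one generation to the next; how the talk's framework incorporates past model performance is not specified in the abstract and is not modeled here.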

Talk#2 details: Topic: State of Foundation Models For Time Series Forecasting

First, we explore the core concepts of foundation models, such as pretraining, transfer learning and fine-tuning. Second, we take a look at the advantages and disadvantages of foundation models in time series forecasting. While they can speed up the modeling and inference process, they might also not be the best solution for a particular project, meaning that we must still have a certain expertise to use them correctly and compare them with other methods. Then, we explore some of the major contributions to the field, including TimeGPT, Chronos, Moirai and TimesFM. We quickly discover their architectures, their capabilities and their limitations. Finally, we see TimeGPT in action to demonstrate how a foundation model can be used and how it compares to traditional methods.

Training details: Topic: Unlocking the Future with AI-Driven Time Series Forecasting

Time series forecasting is the science of predicting future events based on historical data, a practice with applications that permeate our daily lives. Consider demand and inventory planning, where forecasting enables businesses to anticipate customer needs, ensuring optimal product availability while minimizing costs.

Workshop details: Topic: Forecasting the Future Using Time Series

Time series forecasting remains a specialty topic focused on "predicting the future". Because of this, you really want to use a package that is tuned for your use case and built to deal with the difficulties inherent in time series forecasting. The speaker will share a simplified problem notation that helps you survey the available solution offerings and succeed with time series packages in R and Python.

Additionally, with a Paid Pass for the Time Series event you will receive an Ai+ Premium Annual Subscription - https://hubs.li/H0Zycsf0. It gives access to dozens of on-demand sessions, Gen AI & LLMs certification, a 5-week AI Bootcamp, extra discounts to attend ODSC conferences and more.

ODSC Links:

  • Get free access to more talks/trainings like this at Ai+ Training platform: https://hubs.li/H0Zycsf0
  • ODSC blog: https://opendatascience.com/
  • Facebook: https://www.facebook.com/OPENDATASCI
  • Twitter: https://twitter.com/_ODSC & @odsc
  • LinkedIn: https://www.linkedin.com/company/open-data-science
  • Slack Channel: https://hubs.li/Q02ZkDV90
  • Code of conduct: https://odsc.com/code-of-conduct/

Virtual event: "Time Series Mastery"
