talk-data.com

Topic

Data Science

machine_learning statistics analytics

1516 tagged

Activity Trend

68 peak/qtr (2020-Q1 to 2026-Q1)

Activities

1516 activities · Newest first

Who’s the most clutch quarterback in NFL history — Tom Brady, Patrick Mahomes, Aaron Rodgers, or someone completely unexpected? We’ll use Python + Data Science to figure it out.

👉 Try Sphinx for free - https://www.sphinx.ai

⏱️ TIMESTAMPS
00:00 - Who’s the most clutch QB?
00:40 - Python + Sphinx AI: analyzing 1M NFL plays
02:00 - Defining “clutch” in football (data-driven approach)
03:15 - “TV Clutch” Top 10
07:50 - Using AI to process play-by-play data
11:10 - Advanced Clutch Factor
17:00 - Advanced Top 10
24:30 - Build your own analysis

🔗 RESOURCES & LINKS
💌 Join 20k+ aspiring data analysts — https://www.datacareerjumpstart.com/newsletter
🎯 Free Training: How to Land Your First Data Job — https://www.datacareerjumpstart.com/training
👩‍💻 Accelerator Program: Data Analytics Accelerator — https://www.datacareerjumpstart.com/daa
💼 Interview Prep Tool: Interview Simulator — https://www.datacareerjumpstart.com/interviewsimulator

📱 CONNECT WITH AVERY
🎥 YouTube: @averysmith
🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith
📸 Instagram: https://instagram.com/datacareerjumpstart
🎵 TikTok: https://www.tiktok.com/@verydata
💻 Website: https://www.datacareerjumpstart.com

📱 CONNECT WITH SPHINX
🐦 Twitter/X - https://x.com/getsphinx
🔗 LinkedIn - https://www.linkedin.com/company/sphinx-ml/

Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://www.datacareerjumpstart.com/daa
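The episode's "data-driven definition of clutch" could be sketched in plain Python. The play-record fields, the high-leverage cutoff, and the completion-rate delta below are illustrative assumptions, not the episode's actual methodology:

```python
# Toy sketch of a "clutch" metric: compare a QB's completion rate in
# high-leverage situations (final 2 minutes, one-score game) to his
# overall rate. Field names and thresholds are hypothetical.

def is_high_leverage(play):
    """A play counts as 'clutch' if under 2 minutes remain and the score is within 8."""
    return play["seconds_left"] <= 120 and abs(play["score_diff"]) <= 8

def completion_rate(plays):
    attempts = [p for p in plays if p["is_pass"]]
    if not attempts:
        return 0.0
    return sum(p["complete"] for p in attempts) / len(attempts)

def clutch_delta(plays):
    """Positive delta = the QB completes more often when it matters most."""
    clutch = [p for p in plays if is_high_leverage(p)]
    return completion_rate(clutch) - completion_rate(plays)

plays = [
    {"is_pass": True, "complete": True,  "seconds_left": 90,  "score_diff": -3},
    {"is_pass": True, "complete": True,  "seconds_left": 45,  "score_diff": -3},
    {"is_pass": True, "complete": False, "seconds_left": 800, "score_diff": 10},
    {"is_pass": True, "complete": True,  "seconds_left": 600, "score_diff": 4},
]
print(round(clutch_delta(plays), 2))  # 0.25
```

Scaling the same idea to a million real plays is where the tooling discussed in the episode comes in.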

What happens when marketing teams spend countless hours on manual campaign analysis while missing critical market opportunities? In this session, discover how AI is transforming marketing from a cost centre into a revenue-driving powerhouse. You'll see how Snowflake's Cortex AI enables marketers to automatically classify campaign assets, analyse multimodal performance data, and generate personalised content at scale—all without waiting for post-campaign analysis. This is marketing analytics reimagined—where AI democratizes data science, accelerates decision-making, and turns every campaign into a learning opportunity that drives immediate business impact.

How data science and the next wave of open-source innovation are closing the €50B efficiency gap in Enterprise AI.

Today, 75% of data science output is lost to fragmented data, scattered tooling, manual workflows, and poor reproducibility. Yet nearly every data scientist relies on scikit-learn — the backbone of modern AI/ML.

We’ll unpack the root causes of inefficiency in enterprise data science — and show how open-source tools are unlocking performance, reproducibility, and strategic autonomy at scale.

Learn how Trade Republic builds its analytical data stack as a modern, real-time Lakehouse with ACID guarantees. Using Debezium for change data capture, we stream database changes and events into our data lake. We leverage Apache Iceberg to ensure interoperability across our analytics platform, powering operational reporting, data science, and executive dashboards.
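The core of a CDC pipeline like the one described is applying change events to downstream state. A minimal sketch, assuming a simplified Debezium-style envelope (the `op` codes c/u/d are Debezium's; the flat event shape and in-memory "table" are illustrative, and Iceberg writes are out of scope):

```python
# Apply Debezium-style change events (op: c=create, u=update, d=delete)
# to an in-memory table keyed by primary key.

def apply_change(table, event):
    key = event["key"]
    if event["op"] in ("c", "u"):
        table[key] = event["after"]   # upsert the new row image
    elif event["op"] == "d":
        table.pop(key, None)          # delete: drop the row
    return table

events = [
    {"op": "c", "key": 1, "after": {"id": 1, "balance": 100}},
    {"op": "u", "key": 1, "after": {"id": 1, "balance": 250}},
    {"op": "c", "key": 2, "after": {"id": 2, "balance": 50}},
    {"op": "d", "key": 2, "after": None},
]

table = {}
for e in events:
    apply_change(table, e)
print(table)  # {1: {'id': 1, 'balance': 250}}
```

In the Lakehouse setting, the same upsert/delete semantics are expressed as Iceberg merge operations with ACID guarantees rather than dict mutations.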

BlaBlaCar saved more than €1 million per year by ending the outsourcing of text content moderation and building "Sphinx", an internal tool built on Vertex AI. Raphaël Berly, data science chapter lead, will walk us through the why and how of the embeddings used to represent text, quality measurement with generative AI models, and the main lessons that took this project from idea to deployment on the interfaces of millions of users in under a year.
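One common way to use embeddings for moderation, sketched here with tiny hand-made vectors: flag a text whose embedding sits closer to a centroid of known-bad examples than to a clean centroid. This nearest-centroid rule is an illustrative assumption, not Sphinx's actual model; in the talk's setting the embeddings would come from a model served on Vertex AI:

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

# Toy 2-D "embeddings" of labeled examples
toxic_centroid = centroid([[0.9, 0.1], [0.8, 0.2]])
clean_centroid = centroid([[0.1, 0.9], [0.2, 0.8]])

def moderate(embedding):
    """Assign the label of the nearer centroid."""
    if cosine(embedding, toxic_centroid) > cosine(embedding, clean_centroid):
        return "toxic"
    return "clean"

print(moderate([0.85, 0.15]))  # toxic
print(moderate([0.15, 0.85]))  # clean
```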

How to do real TDD in data science? A journey from pandas to polars with pelage!

In the world of data, inconsistencies and inaccuracies often present a major challenge to extracting valuable insights. Yet the number of robust tools and practices to address these issues remains limited. In particular, TDD, although a standard in classic software development, remains difficult to practice in data science, partly because of poorly adapted tools and frameworks.

To address this issue we released Pelage, an open-source Python package to facilitate data exploration and testing, which relies on Polars' intuitive syntax and speed. Pelage helps data scientists and analysts streamline data transformation, enhance data quality, and improve code clarity.

We will demonstrate, in a test-first approach, how you can use this library in a meaningful data science workflow to gain greater confidence for your data transformations.

See website: https://alixtc.github.io/pelage/
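The test-first workflow the talk describes can be illustrated in plain Python, without Pelage's actual API (Pelage wraps comparable checks behind Polars pipe syntax; the check and transformation below are hypothetical stand-ins): write the expectation on the data first, then make the transformation satisfy it.

```python
# A chainable data check: raise on violation, return the data on success,
# so checks compose like pipeline steps.

def has_no_nulls(rows, column):
    """The 'test': fail loudly if any row is missing a value in `column`."""
    bad = [i for i, r in enumerate(rows) if r.get(column) is None]
    if bad:
        raise ValueError(f"null {column!r} in rows {bad}")
    return rows

def fill_missing_country(rows, default="unknown"):
    """The transformation written to make the check pass."""
    return [{**r, "country": r.get("country") or default} for r in rows]

rows = [{"user": "a", "country": "FR"}, {"user": "b", "country": None}]
cleaned = has_no_nulls(fill_missing_country(rows), "country")
print([r["country"] for r in cleaned])  # ['FR', 'unknown']
```

Running the check on the raw `rows` first (and watching it fail) is the "red" step; the fill function is the "green" one.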

Building Data Science Tools for Sustainable Transformation

The current AI hype, driven by generative AI and particularly large language models, is creating excitement, fear, and inflated expectations. In this keynote, we'll explore geographic & mobility data science tools (such as GeoPandas and MovingPandas) to transform this hype into sustainable and positive development that empowers users.

Abstract: In this talk, Claire will share how she went from a career in data science to founding an AI startup backed by Y Combinator: what are the steps along this path? How do you find "the idea," how do you find a co-founder, how do you get your first clients and funding? How do you get into Y Combinator? She will also share her vision on the next data skills to acquire: how to go from data science to AI engineering, and how to build and evaluate agentic AI.

Optimal Transport in Python: A Practical Introduction with POT

Optimal Transport (OT) is a powerful mathematical framework with applications in machine learning, statistics, and data science. This talk introduces the Python Optimal Transport toolbox (POT), an open-source library designed to efficiently solve OT problems. Attendees will learn the basics of OT, explore real-world use cases, and gain hands-on experience with POT (https://pythonot.github.io/) .
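One special case builds intuition for what POT solves in general: for two 1-D samples of equal size with uniform weights, the optimal transport plan is simply "sort both samples and match them in order". The pure-Python sketch below covers only that case; POT (`import ot`) handles general cost matrices and weights:

```python
# Wasserstein-1 distance between two equal-size 1-D empirical
# distributions with uniform weights: sorted matching is optimal.

def ot_cost_1d(xs, ys):
    assert len(xs) == len(ys), "equal sample sizes assumed"
    return sum(abs(x - y) for x, y in zip(sorted(xs), sorted(ys))) / len(xs)

print(ot_cost_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```

Every point moves one unit to the right, so the average transport cost is 1.0.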

Think you need a fancy degree to start a career in data? Think again. In this episode of Data Career School, Amlan Mohanty breaks down exactly how you can launch a successful data career and land your first job in data analytics, data science, or business intelligence without a traditional degree. Discover how to build in-demand skills, create a portfolio that gets noticed, and land your first data job using practical, actionable strategies. Whether you’re self-taught, switching careers, or just curious about the data field, this episode gives you the perfect roadmap to break into a data career.

At PyData Berlin, community members and industry voices highlighted how AI and data tooling are evolving across knowledge graphs, MLOps, small-model fine-tuning, explainability, and developer advocacy.

  • Igor Kvachenok (Leuphana University / ProKube) combined knowledge graphs with LLMs for structured data extraction in the polymer industry, and noted how MLOps is shifting toward LLM-focused workflows.
  • Selim Nowicki (Distill Labs) introduced a platform that uses knowledge distillation to fine-tune smaller models efficiently, making model specialization faster and more accessible.
  • Gülsah Durmaz (Architect & Developer) shared her transition from architecture to coding, creating Python tools for design automation and volunteering with PyData through PyLadies.
  • Yashasvi Misra (Pure Storage) spoke on explainable AI, stressing accountability and compliance, and shared her perspective as both a data engineer and active Python community organizer.
  • Mehdi Ouazza (MotherDuck) reflected on developer advocacy through video, workshops, and branding, showing how creative communication boosts adoption of open-source tools like DuckDB.

Igor Kvachenok Master’s student in Data Science at Leuphana University of Lüneburg, writing a thesis on LLM-enhanced data extraction for the polymer industry. Builds RDF knowledge graphs from semi-structured documents and works at ProKube on MLOps platforms powered by Kubeflow and Kubernetes.

Connect: https://www.linkedin.com/in/igor-kvachenok/

Selim Nowicki Founder of Distill Labs, a startup making small-model fine-tuning simple and fast with knowledge distillation. Previously led data teams at Berlin startups like Delivery Hero, Trade Republic, and Tier Mobility. Sees parallels between today’s ML tooling and dbt’s impact on analytics.

Connect: https://www.linkedin.com/in/selim-nowicki/

Gülsah Durmaz Architect turned developer, creating Python-based tools for architectural design automation with Rhino and Grasshopper. Active in PyLadies and a volunteer at PyData Berlin, she values the community for networking and learning, and aims to bring ML into architecture workflows.

Connect: https://www.linkedin.com/in/gulsah-durmaz/

Yashasvi (Yashi) Misra Data Engineer at Pure Storage, community organizer with PyLadies India, PyCon India, and Women Techmakers. Advocates for inclusive spaces in tech and speaks on explainable AI, bridging her day-to-day in data engineering with her passion for ethical ML.

Connect: https://www.linkedin.com/in/misrayashasvi/

Mehdi Ouazza Developer Advocate at MotherDuck, formerly a data engineer, now focused on building community and education around DuckDB. Runs popular YouTube channels ("mehdio DataTV" and "MotherDuck") and delivered a hands-on workshop at PyData Berlin. Blends technical clarity with creative storytelling.

Connect: https://www.linkedin.com/in/mehd-io/

In this episode, we talk with Daniel, an astrophysicist turned machine learning engineer and AI ambassador. Daniel shares his journey bridging astronomy and data science, how he leveraged live courses and public knowledge sharing to grow his skills, and his experiences working on cutting-edge radio astronomy projects and AI deployments. He also discusses practical advice for beginners in data and astronomy, and insights on career growth through community and continuous learning.

TIMECODES
00:00 Lunar eclipse story and Daniel’s astronomy career
04:12 Electromagnetic spectrum and MEERKAT data explained
10:39 Data analysis and positional cross-correlation challenges
15:25 Physics behind radio star detection and observation limits
16:35 Radio astronomy’s advantage and machine learning potential
20:37 Radio astronomy progress and Daniel’s ML journey
26:00 Python tools and experience with ZoomCamps
31:26 Intel internship and exploring LLMs
41:04 Sharing progress and course projects with orchestration tools
44:49 Setting up Airflow 3.0 and building data pipelines
47:39 AI startups, training resources, and NVIDIA courses
50:20 Student access to education, NVIDIA experience, and beginner astronomy programs
57:59 Skills, projects, and career advice for beginners
59:19 Starting with data science or engineering
1:00:07 Course sponsorship, data tools, and learning resources

Connect with Daniel
LinkedIn - /egbodaniel

Connect with DataTalks.Club:
Join the community - https://datatalks.club/slack.html
Subscribe to our Google calendar to have all our events in your calendar - https://calendar.google.com/calendar/...
Check other upcoming events - https://lu.ma/dtc-events
GitHub: https://github.com/DataTalksClub
LinkedIn - /datatalks-club
Twitter - /datatalksclub
Website - https://datatalks.club/

Resource Monitoring and Optimization with Metaflow

Metaflow is a powerful workflow management framework for data science, but optimizing its cloud resource usage still involves guesswork. We have extended Metaflow with a lightweight resource tracking tool that automatically monitors CPU, memory, GPU, and more, then recommends the most cost-effective cloud instance type for future runs. A single line of code can save you from overprovisioned costs or painful job failures!
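The "single line of code" idea behind such a tracker can be sketched with the standard library alone. The decorator below is a hypothetical illustration, not the actual Metaflow extension (which also tracks CPU and GPU): it records wall time and peak Python heap usage per step, numbers a recommender could later map to a cheap-enough instance type.

```python
import time
import tracemalloc
from functools import wraps

def track_resources(fn):
    """Report wall time and peak traced memory for one pipeline step."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        tracemalloc.start()
        t0 = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            elapsed = time.perf_counter() - t0
            _, peak = tracemalloc.get_traced_memory()
            tracemalloc.stop()
            print(f"{fn.__name__}: {elapsed:.3f}s, peak {peak / 1e6:.1f} MB")
    return wrapper

@track_resources
def build_features(n):
    # Stand-in for a real feature-engineering step
    return [i * i for i in range(n)]

features = build_features(100_000)
print(len(features))
```

Applying `@track_resources` is the one-line change; everything else stays untouched, which is what makes the approach cheap to adopt across existing flows.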

This talk explores how data science helps balance energy systems in the face of demand volatility, generation volatility, and the push for sustainability. We’ll dive into two technical case studies: churn prediction using survival models, and the design of a high-availability real-time trading system on Databricks. These examples illustrate how data can support operational resilience and sustainability efforts in the energy sector.
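The building block behind survival-model churn prediction is the Kaplan-Meier estimator: S(t), the probability a customer "survives" (has not churned) past time t, estimated while correctly handling customers who are still active (censored). A minimal sketch with toy durations:

```python
def kaplan_meier(durations, churned):
    """Return [(t, S(t))] at each observed churn time (product-limit estimator)."""
    at_risk = len(durations)
    surv, out = 1.0, []
    for t in sorted(set(durations)):
        events = sum(1 for d, c in zip(durations, churned) if d == t and c)
        if events:
            surv *= 1 - events / at_risk                 # KM product-limit step
            out.append((t, surv))
        at_risk -= sum(1 for d in durations if d == t)   # drop churned + censored
    return out

# Months observed; True = churned at that time, False = still active (censored)
durations = [2, 3, 3, 5, 8, 8]
churned   = [True, True, False, True, False, False]
for t, s in kaplan_meier(durations, churned):
    print(f"S({t}) = {s:.3f}")
```

The censored customers still count toward the at-risk denominator until they drop out of observation, which is exactly what a naive churn rate gets wrong.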

Ken Jee has spent a decade in sports analytics, working at the intersection of data science and athlete performance. Now, he's building The Exponential Athlete, a podcast dedicated to exploring what makes athletes reach their highest potential. In this show, Ken shares:

- His 10-year journey in sports analytics and the lessons data can, and can't, teach us about performance.
- How his background in data science set him up to successfully launch The Exponential Athlete.
- The limits of analytics — why diagnosis is easy, but decision-making is complex.
- How mental visualization (seeing success before it happens) plays a crucial role in athletic and personal excellence.
- The intersection of training philosophy, psychology, and data in shaping elite performers.

Whether you're passionate about sports, data science, entrepreneurship, or personal growth, this episode offers practical insights you can apply immediately.

🤝 Follow Ken on LinkedIn!

Register for free to be part of the next live session: https://bit.ly/3XB3A8b

Follow us on Socials: LinkedIn | YouTube | Instagram (Mavens of Data) | Instagram (Maven Analytics) | TikTok | Facebook | Medium | X/Twitter

This presentation provides an overview of how NVIDIA RAPIDS accelerates data science and data engineering workflows end-to-end. Key topics include leveraging RAPIDS for machine learning, large-scale graph analytics, real-time inference, hyperparameter optimization, and ETL processes. Case studies demonstrate significant performance improvements and cost savings across various industries using RAPIDS for Apache Spark, XGBoost, cuML, and other GPU-accelerated tools. The talk emphasizes the impact of accelerated computing on modern enterprise applications, including LLMs, recommenders, and complex data processing pipelines.

Energy flexibility is playing an increasingly fundamental role in the UK energy market. With the adoption of renewable energy sources such as EVs, solar panels, and domestic and commercial batteries, the number of flexible assets is soaring, making aggregation and flexibility trading far more complex and requiring vast amounts of data modelling and forecasting. To address this real-world challenge and meet the needs of scaling energy demand in the UK, Flexitricity adopted MLOps best practices.

The session will cover:

- The complex technical challenge of energy flexibility in 2025.

- The critical requirement to invest in technology and skillsets.

- A real-life view of how machine learning operations (MLOps) scaled Flexitricity’s data science model development.

- How innovations in technology can support and optimise delivering on energy flexibility. 

The audience will gain insight into:

- The challenge of building data science models to keep up with scaling demand.

- How MLOps best practices can be adopted to drive efficiency and increase data science experiments to 10000+ per year.

- Lessons learned from adopting MLOps pipelines.