talk-data.com

Topic: Data Science
Tags: machine_learning, statistics, analytics
235 tagged activities

Activity Trend: peak of 68 activities per quarter (2020-Q1 to 2026-Q1)

Activities
235 activities · Newest first

How AI Is Transforming Data Careers — A Panel Discussion

AI is transforming data careers. Roles once centered on modeling and feature engineering are evolving into positions that involve building AI products, crafting prompts, and managing workflows shaped by automation and augmentation. In this panel discussion, ambassadors from Women in Data Science (WiDS) share how they have adapted through this shift—turning personal experiments into company practices, navigating uncertainty, and redefining their professional identities. They’ll also discuss how to future-proof your career by integrating AI into your daily work and career growth strategy. Attendees will leave with a clearer view of how AI is reshaping data careers and practical ideas for how to evolve their own skills, direction, and confidence in an era where AI is not replacing, but redefining, human expertise.

Accelerating Geospatial Analysis with GPUs

Geospatial analysis often relies on raster data, n‑dimensional arrays where each cell holds a spatial measurement. Many raster operations, such as computing indices, statistical analysis, and classification, are naturally parallelizable and ideal for GPU acceleration.

This talk demonstrates an end‑to‑end GPU‑accelerated semantic segmentation pipeline for classifying satellite imagery into multiple land cover types. Starting with cloud-hosted imagery, we will process data in chunks, compute features, train a machine learning model, and run large-scale predictions. This process is accelerated with the open-source RAPIDS ecosystem, including Xarray, cuML, and Dask, often requiring only minor changes to familiar data science workflows.

Attendees who work with raster data or other parallelizable, computationally intensive workflows will benefit most from this talk, which focuses on GPU acceleration techniques. While the talk draws from geospatial analysis, key geospatial concepts will be introduced for beginners. The methods demonstrated can be applied broadly across domains to accelerate large-scale data processing.
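
A rough sketch of the kind of GPU-accelerated pipeline the abstract describes, using cuML's scikit-learn-like API. The synthetic bands, class count, and feature choices are illustrative assumptions, not the presenters' actual workflow (which also uses Xarray and Dask for chunked, cloud-hosted imagery):

    import cupy as cp
    from cuml.ensemble import RandomForestClassifier

    # Synthetic stand-ins for two spectral bands of one raster tile (rows x cols).
    red = cp.random.random((1024, 1024)).astype(cp.float32)
    nir = cp.random.random((1024, 1024)).astype(cp.float32)

    # Per-pixel feature engineering runs entirely on the GPU, e.g. an NDVI-like index.
    ndvi = (nir - red) / (nir + red + 1e-6)
    features = cp.stack([red.ravel(), nir.ravel(), ndvi.ravel()], axis=1)

    # Hypothetical labels for a small training subset (e.g. digitized land-cover samples).
    train_idx = cp.random.permutation(features.shape[0])[:5000]
    train_labels = cp.random.randint(0, 4, size=5000).astype(cp.int32)  # 4 land-cover classes

    clf = RandomForestClassifier(n_estimators=100)
    clf.fit(features[train_idx], train_labels)

    # Large-scale prediction over every pixel, reshaped back onto the raster grid.
    predicted = clf.predict(features).reshape(red.shape)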

Planning Hockey Careers With Python

How can data science help young athletes navigate their careers? In this talk, I’ll share my experience building a career path planner for aspiring ice hockey players. The project combines player performance data, career path patterns, and predictive modeling to suggest possible development paths and milestones. Along the way, I’ll discuss the challenges of messy sports data and communicating insights in a way that resonates with non-technical users like coaches, parents, and players.

From €1M License to In-House Success: How We Built a Real-Time Recommendation System and Saved Millions Doing It

When we at Bol decided to personalize campaign banners, we did what many companies do: bought an expensive solution. As a software engineering team with zero data science experience, we integrated a third-party recommender system for €1 million annually, built the cloud infrastructure, and waited for results. After our first season, the data told a harsh truth—the third-party tool wasn't delivering value proportional to its cost. We faced a crossroads: accept mediocrity or build our own solution from scratch, tailored to our requirements and architecture. We'll walk you through our journey of building a more intelligent and flexible recommendation system from the ground up, and how this journey saved us over a million euros per year. We will share the incremental steps that shaped this transition, alongside the valuable lessons learned along the way.

Powering Personalization with Data Science at Target with Samantha Schumacher

At Target, creating relevant guest experiences at scale takes more than great creative — it takes great data. In this session, we’ll explore how Target’s Data Science team is using first-party data, machine learning, and GenAI to personalize marketing across every touchpoint.

You’ll hear how we’re building intelligence into the content supply chain, turning unified customer signals into actionable insights, and using AI to optimize creative, timing, and messaging — all while navigating a privacy-first landscape. Whether it’s smarter segmentation or real-time decisioning, we’re designing for both scale and speed.

Lessons from the Front Lines of Public Sector Data Science with Martha Norrick

As the Chief Analytics Officer for New York City, I witnessed firsthand how data science and AI can transform public service delivery while navigating the unique challenges of government implementation. This talk will share real-world examples of successful data science initiatives in the government context, from predictive analytics for fire department risk modeling to machine learning models that improve social service targeting.

However, government data science isn't just about technical skill—it's about accountability, equity, and transparency. I'll discuss critical pitfalls including algorithmic bias, privacy concerns, and the importance of explainable AI in public decision-making.

We'll explore how traditional data science skills must be adapted for the public sector context, where stakeholders include not just internal teams but taxpayers, elected officials, and community advocates.

Whether you're a data scientist considering public service or a government professional seeking to leverage analytics, this session will provide practical insights into building data capacity that serves the public interest while maintaining democratic values and citizen trust.

Bridging Accessibility and AI: Sign Language Recognition & Inclusive Design with Sheida Rashidi

As AI continues to shape human-computer interaction, there’s a growing opportunity and responsibility to ensure these technologies serve everyone, including people with communication disabilities. In this talk, I will present my ongoing work in developing a real-time American Sign Language (ASL) recognition system, and explore how integrating accessible design principles into AI research can expand both usability and impact.

The core of the talk will cover the Sign Language Recogniser project (available on GitHub), in which I used MediaPipe Studio together with TensorFlow, Keras, and OpenCV to train a model that classifies ASL letters from hand-tracking features.

I’ll share the methodology: data collection, feature extraction via MediaPipe, model training, and demo/testing results. I’ll also discuss challenges encountered, such as dealing with gesture variability, lighting and camera differences, latency constraints, and model generalization.

Beyond the technical implementation, I’ll reflect on the broader implications: how accessibility-focused AI projects can promote inclusion, how design decisions affect trust and usability, and how women in AI & data science can lead innovation that is both rigorous and socially meaningful. Attendees will leave with actionable insights for building inclusive AI systems, especially in domains involving rich human modalities such as gesture or sign.
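
A condensed sketch of the kind of pipeline described above, pairing MediaPipe hand tracking with a small Keras classifier. This is not the project's actual code; the input image, letter count, and model architecture are illustrative assumptions:

    import cv2
    import mediapipe as mp
    import numpy as np
    from tensorflow import keras

    # Extract 21 hand landmarks (x, y, z) from one image with MediaPipe Hands.
    mp_hands = mp.solutions.hands
    frame = cv2.imread("sample_sign.jpg")                  # hypothetical input image
    rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
        results = hands.process(rgb)

    if results.multi_hand_landmarks:
        lm = results.multi_hand_landmarks[0]
        features = np.array([[p.x, p.y, p.z] for p in lm.landmark]).flatten()  # 63 values

    # A small dense classifier over the 63 landmark features (26 ASL letters assumed).
    model = keras.Sequential([
        keras.layers.Input(shape=(63,)),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(26, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    # model.fit(X_train, y_train, ...) would then run on the collected landmark dataset.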

The Elephant in the Room between Data Collection and Data Science with Katya Kovalenko

Whether you call it wrangling, cleaning, or preprocessing, data prep is often the most expensive and time-consuming part of the analytical pipeline. It may involve converting data into machine-readable formats, integrating many datasets, or detecting outliers, and it can be a large source of error if done manually. A lack of machine-readable or integrated data limits connectivity across fields as well as data accessibility, sharing, and reuse, making it a significant contributor to research waste.

For students, it is perhaps the greatest barrier to adopting quantitative tools and advancing their coding and analytical skills. AI tools are available for automating the cleanup and integration, but due to the one-of-a-kind nature of these problems, these approaches still require extensive human collaboration and testing. I review some of the common challenges in data cleanup and integration, approaches for understanding dataset structures, and strategies for developing and testing workflows.
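
As a small illustration of the kind of cleanup step discussed here, consider a sketch in pandas; the column names, values, and outlier rule are hypothetical:

    import pandas as pd

    # Hypothetical messy field data: stray whitespace, a non-numeric placeholder,
    # and a value that is probably a unit-entry error.
    df = pd.DataFrame({
        "site": ["A", "B", "B ", "C"],
        "weight_g": ["12.3", "11.8", "N/A", "1180"],
    })

    # Convert to machine-readable types and normalise categorical values.
    df["site"] = df["site"].str.strip()
    df["weight_g"] = pd.to_numeric(df["weight_g"], errors="coerce")

    # A simple, testable outlier rule (IQR fence) instead of manual inspection.
    q1, q3 = df["weight_g"].quantile([0.25, 0.75])
    fence = 1.5 * (q3 - q1)
    df["weight_outlier"] = (df["weight_g"] < q1 - fence) | (df["weight_g"] > q3 + fence)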

Accessible by Design: Redefining AI Inclusion with Valerie Lockhart

AI has the potential to transform learning, work, and daily life for millions of people, but only if we design with accessibility at the core. Too often, disabled people are underrepresented in datasets, creating systemic barriers that ripple through models and applications. This talk explores how data scientists and technologists can mitigate bias, from building synthetic datasets to fine-tuning LLMs on accessibility-focused corpora. We’ll look at opportunities in multimodal AI: voice, gesture, AR/VR, and even brain-computer interfaces, that open new pathways for inclusion. Beyond accuracy, we’ll discuss evaluation metrics that measure usability, comprehension, and inclusion, and why testing with humans is essential to closing the gap between model performance and lived experience. Attendees will leave with three tangible ways to integrate accessibility into their own work through datasets, open-source tools, and collaborations. Accessibility is not just an ethical mandate, it’s a driver of innovation, and it begins with thoughtful, human-centered data science.

Redefining Marketing Measurement in the Era of Open-Source Innovation with Koel Ghosh

In a rapidly evolving advertising landscape where data, technology, and methodology converge, the pursuit of rigorous yet actionable marketing measurement is more critical—and complex—than ever. This talk will showcase how modern marketers and applied data scientists employ advanced measurement approaches—such as Marketing Mix Modeling (frequentist and Bayesian) and robust experimental designs, including randomized control trials and synthetic control-based counterfactuals—to drive causal inference in advertising effectiveness for meaningful business impact.

The talk will also address emergent aspects of applied marketing science, namely open-source methodologies, digital commerce platforms, and the use of artificial intelligence. Innovations from industry giants like Google and Meta, as well as open-source communities exemplified by PyMC-Marketing, have democratized access to methodological advances. The emergence of digital commerce platforms such as Amazon and Walmart, and the rich data they bring, is transforming how customer journeys and campaign effectiveness are measured across channels. Artificial intelligence is accelerating every facet of the data science workflow, from streamlining processes like coding, modeling, and rapid prototyping (“vibe coding”) to enabling the integration of neural networks and deep learning techniques into traditional MMM toolkits. Collectively, these developments provide new and easy ways to experiment quickly and to learn the complex nonlinear dynamics and hidden patterns in marketing data.

Bringing these threads together, the talk will show how Ovative Group—a media and marketing technology firm—integrates domain expertise, open-source solutions, strategic partnerships, and AI automation into comprehensive measurement solutions. Attendees will gain practical insights on bridging academic rigor with business relevance, empowering careers in applied data science, and helping organizations turn marketing analytics into clear, actionable strategies.
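
For readers unfamiliar with the modeling side, here is a minimal Bayesian MMM sketch written in plain PyMC rather than PyMC-Marketing; the single channel, synthetic data, adstock length, and priors are all illustrative assumptions:

    import numpy as np
    import pymc as pm
    import pytensor.tensor as pt

    # Synthetic weekly spend for a single channel and an observed KPI (illustration only).
    rng = np.random.default_rng(0)
    n_weeks = 104
    spend = rng.gamma(2.0, 50.0, size=n_weeks)
    revenue = 200 + 3.0 * np.sqrt(spend) + rng.normal(0, 10, size=n_weeks)

    def geometric_adstock(x, alpha, n, l_max=8):
        """Carry-over: spend in one week keeps contributing in later weeks."""
        lagged = [pt.concatenate([pt.zeros(i), x[: n - i]]) * alpha**i for i in range(l_max)]
        return pt.sum(pt.stack(lagged), axis=0)

    with pm.Model() as mmm:
        alpha = pm.Beta("adstock_alpha", 2, 2)        # carry-over decay
        lam = pm.HalfNormal("saturation_lam", 1.0)    # diminishing returns
        beta = pm.HalfNormal("channel_beta", 5.0)
        intercept = pm.Normal("intercept", 200, 50)
        sigma = pm.HalfNormal("sigma", 20)

        adstocked = geometric_adstock(pt.as_tensor_variable(spend), alpha, n_weeks)
        saturated = 1 - pt.exp(-lam * adstocked)      # simple saturation curve
        mu = intercept + beta * saturated

        pm.Normal("revenue", mu=mu, sigma=sigma, observed=revenue)
        idata = pm.sample(1000, tune=1000, chains=2)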

Operationalizing Responsible AI and Data Science in Healthcare with Nasibeh Zanirani Farahani

As healthcare organizations accelerate their adoption of AI and data-driven systems, the challenge lies not only in innovation but in responsibly scaling these technologies within clinical and operational workflows. This session examines the technical and governance frameworks required to translate AI research into reliable and compliant real-world applications. We will explore best practices in model lifecycle management, data quality assurance, bias detection, regulatory alignment, and human-in-the-loop validation, grounded in lessons from implementing AI solutions across complex healthcare environments. Emphasizing cross-functional collaboration among clinicians, data scientists, and business leaders, the session highlights how to balance technical rigor with clinical relevance and ethical accountability. Attendees will gain actionable insights into building trustworthy AI pipelines, integrating MLOps principles in regulated settings, and delivering measurable improvements in patient care, efficiency, and organizational learning.

Unleash the power of dbt on Google Cloud: BigQuery, Iceberg, DataFrames and beyond

The data world has long been divided, with data engineers and data scientists working in silos. This fragmentation creates a long, difficult journey from raw data to machine learning models. We've unified these worlds through the Google Cloud and dbt partnership. In this session, we'll show you an end-to-end workflow that simplifies the data-to-AI journey. The availability of dbt Cloud on Google Cloud Marketplace streamlines getting started, and its integration with BigQuery's new Apache Iceberg tables creates an open foundation. We'll also highlight how BigQuery DataFrames' integration with dbt Python models lets you perform complex data science at scale, all within a single, streamlined process. Join us to learn how to build a unified data and AI platform with dbt on Google Cloud.
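
To make the dbt Python model integration concrete, here is a rough sketch of what such a model file might look like. The model and column names are hypothetical, and the "bigframes" submission method should be checked against the dbt-bigquery documentation for your version:

    # models/customer_features.py — a dbt Python model (sketch, not official sample code).
    def model(dbt, session):
        dbt.config(materialized="table", submission_method="bigframes")

        # dbt.ref returns a DataFrame backed by BigQuery, so the pandas-like
        # operations below run in the warehouse rather than locally.
        orders = dbt.ref("stg_orders")
        features = orders.groupby("customer_id")["amount"].sum().reset_index()
        return features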

How to do real TDD in data science? A journey from pandas to polars with pelage!

In the world of data, inconsistencies and inaccuracies often present a major challenge to extracting valuable insights. Yet the number of robust tools and practices to address those issues remains limited. In particular, the practice of TDD, a standard in classic software development, remains difficult in data science, partly because of poorly adapted tools and frameworks.

To address this issue we released Pelage, an open-source Python package to facilitate data exploration and testing, built on Polars' intuitive syntax and speed. Pelage helps data scientists and analysts streamline data transformations, enhance data quality, and improve code clarity.

We will demonstrate, in a test-first approach, how you can use this library in a meaningful data science workflow to gain greater confidence in your data transformations.

See website: https://alixtc.github.io/pelage/
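
As a taste of the test-first style the talk advocates, here is a small sketch using plain Polars with a pytest-style test; Pelage's chainable checks can replace the manual assertions inside the pipeline itself. Column names and the transformation are hypothetical:

    import polars as pl

    def clean_orders(df: pl.DataFrame) -> pl.DataFrame:
        """Transformation under test: drop null amounts and add a total column."""
        return (
            df.drop_nulls("amount")
              .with_columns((pl.col("amount") * pl.col("quantity")).alias("total"))
        )

    def test_clean_orders_has_no_nulls_and_expected_columns():
        # The test is written first, TDD-style, and pins down the expected shape.
        raw = pl.DataFrame({"amount": [10.0, None], "quantity": [2, 3]})
        out = clean_orders(raw)
        assert out["amount"].null_count() == 0
        assert "total" in out.columns
        assert out["total"].to_list() == [20.0]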

Building Data Science Tools for Sustainable Transformation

The current AI hype, driven by generative AI and particularly large language models, is creating excitement, fear, and inflated expectations. In this keynote, we'll explore geographic & mobility data science tools (such as GeoPandas and MovingPandas) to transform this hype into sustainable and positive development that empowers users.

Optimal Transport in Python: A Practical Introduction with POT

Optimal Transport (OT) is a powerful mathematical framework with applications in machine learning, statistics, and data science. This talk introduces the Python Optimal Transport toolbox (POT), an open-source library designed to efficiently solve OT problems. Attendees will learn the basics of OT, explore real-world use cases, and gain hands-on experience with POT (https://pythonot.github.io/).
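
A minimal example of what getting started with POT can look like, using two small synthetic point clouds; the sizes and distributions are illustrative:

    import numpy as np
    import ot  # Python Optimal Transport (POT)

    # Two small empirical distributions in 2D with uniform weights.
    rng = np.random.default_rng(0)
    xs = rng.normal(0.0, 1.0, size=(50, 2))
    xt = rng.normal(3.0, 1.0, size=(60, 2))
    a = np.full(50, 1 / 50)
    b = np.full(60, 1 / 60)

    # Cost matrix (squared Euclidean by default) and exact OT plan via linear programming.
    M = ot.dist(xs, xt)
    plan = ot.emd(a, b, M)        # exact transport plan
    cost = np.sum(plan * M)       # squared Wasserstein-2 cost

    # Entropy-regularized alternative (Sinkhorn), often faster for larger problems.
    plan_sinkhorn = ot.sinkhorn(a, b, M, reg=0.1)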

Resource Monitoring and Optimization with Metaflow

Metaflow is a powerful workflow management framework for data science, but optimizing its cloud resource usage still involves guesswork. We have extended Metaflow with a lightweight resource tracking tool that automatically monitors CPU, memory, GPU, and more, then recommends the most cost-effective cloud instance type for future runs. A single line of code can save you from the cost of overprovisioning or from painful job failures!
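
For context, resource requests in Metaflow are declared per step with the @resources decorator; the sketch below uses placeholder numbers that a tracking tool like the one described would help you right-size (the extension itself is not shown here):

    from metaflow import FlowSpec, step, resources

    class TrainFlow(FlowSpec):

        @step
        def start(self):
            self.next(self.train)

        @resources(cpu=4, memory=16000)  # requested CPU cores and memory (MB) for this step
        @step
        def train(self):
            # ... training code whose real CPU/memory/GPU footprint would be monitored ...
            self.next(self.end)

        @step
        def end(self):
            pass

    if __name__ == "__main__":
        TrainFlow()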

What Works: Practical Lessons in Applying Privacy-Enhancing Technologies (PET) in Data Science

Privacy-Enhancing Technologies (PETs) promise to bridge the gap between data utility and privacy — but how do they perform in practice? In this talk, we’ll share real-world insights from our hands-on experience testing and implementing leading PET solutions across various data science use cases. We explored tools such as differential privacy libraries, homomorphic encryption frameworks, federated learning, and multi-party computation. Some lived up to their promise — others revealed critical limitations. You’ll walk away with a clear understanding of which PET solutions work best for which types of data and analysis, what trade-offs to expect, and how to set realistic goals when integrating PETs into your workflows. This session is ideal for data professionals and decision-makers who are navigating privacy risks while still wanting to innovate responsibly.
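
As a small, self-contained illustration of one PET building block mentioned above, here is the Laplace mechanism from differential privacy implemented by hand; the dataset and bounds are synthetic:

    import numpy as np

    rng = np.random.default_rng(0)
    salaries = rng.uniform(30_000, 120_000, size=1_000)  # toy sensitive dataset

    def dp_mean(values, epsilon, lower, upper):
        """Differentially private mean via the Laplace mechanism."""
        clipped = np.clip(values, lower, upper)
        true_mean = clipped.mean()
        # Sensitivity of the mean of n values bounded in [lower, upper].
        sensitivity = (upper - lower) / len(clipped)
        noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
        return true_mean + noise

    print(dp_mean(salaries, epsilon=1.0, lower=30_000, upper=120_000))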

Data science in containers: the good, the bad, and the ugly

If we want to run data science workloads (e.g. using TensorFlow, PyTorch, and others) in containers (for local development or production on Kubernetes), we need to build container images. Doing that with a Dockerfile is fairly straightforward, but is it the best method? In this talk, we'll take a well-known speech-to-text model (Whisper) and show various ways to run it in containers, comparing the outcomes in terms of image size and build time.

Narwhals: enabling universal dataframe support

Ever tried passing a Polars DataFrame to a data science library and found that it...just works? No errors, no panics, no noticeable overhead, just...results? This is becoming increasingly common in 2025, yet only 2 years ago, it was mostly unheard of. So, what changed? A large part of the answer is: Narwhals.

Narwhals is a lightweight compatibility layer between dataframe libraries which lets your code work seamlessly across Polars, pandas, PySpark, DuckDB, and more! And it's not just a theoretical possibility: with ~30 million monthly downloads and set as a required dependency of Altair, Bokeh, Marimo, Plotly, Shiny, and more, it's clear that it's reshaping the data science landscape. By the end of the talk, you'll understand why writing generic dataframe code was such a headache (and why it isn't anymore), how Narwhals works and how its community operates, and how you can use it in your projects today. The talk will be technical yet accessible and light-hearted.
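
A brief sketch of what dataframe-agnostic code with Narwhals can look like; the helper function and column names are illustrative:

    import pandas as pd
    import polars as pl
    import narwhals as nw

    def add_total_column(df_native):
        # Wrap whatever the caller passed (pandas, Polars, ...), operate through
        # the Narwhals API, then hand back the caller's native type.
        df = nw.from_native(df_native)
        result = df.with_columns((nw.col("price") * nw.col("quantity")).alias("total"))
        return result.to_native()

    print(add_total_column(pd.DataFrame({"price": [1.0, 2.0], "quantity": [3, 4]})))
    print(add_total_column(pl.DataFrame({"price": [1.0, 2.0], "quantity": [3, 4]})))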

PyData 2077: a data science future retrospective

From: Chrono-Regulatory Commission, Temporal Enforcement Division
To: PyData Berlin Organising Committee
Subject: Citation #TMP-2077-091 - Unauthorised Spacetime Disturbance

Dear Committee,

Our temporal monitoring systems have detected an unauthorised chronological anomaly emanating from your facility (Berliner Congress Center, coordinates 52.52068°N, 13.416451°E) scheduled to manifest on September 1st at 9:20 a.m.