Data Science

Practical Statistics for Data Scientists, 3rd Edition

2026-08-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Andrew Bruce , Peter Bruce , Peter Gedeck

Python data data-science data-science-tasks statistics

Statistical methods are a key part of data science, yet few data scientists have formal statistical training. Courses and books on basic statistics rarely cover the topic from a data science perspective. And many data science resources incorporate statistical methods but lack a deeper statistical perspective. If you're familiar with the R or Python programming languages and have some exposure to statistics, this quick reference bridges the gap in an accessible, readable format.

High Performance Spark, 2nd Edition

2026-05-25 · O'Reilly Data Engineering Books O'Reilly Amazon

book

by Rachel Warren , Holden Karau (Fight Health Insurance) , Adi Polak (Treeverse)

AI/ML Kubernetes PySpark PyTorch Spark apache-spark data data-engineering

Apache Spark is amazing when everything clicks. But if you haven't seen the performance improvements you expected or still don't feel confident enough to use Spark in production, this practical book is for you. Authors Holden Karau, Rachel Warren, and Anya Bida walk you through the secrets of the Spark code base, and demonstrate performance optimizations that will help your data pipelines run faster, scale to larger datasets, and avoid costly antipatterns. Ideal for data engineers, software engineers, data scientists, and system administrators, the second edition of High Performance Spark presents new use cases, code examples, and best practices for Spark 3.x and beyond. This book gives you a fresh perspective on this continually evolving framework and shows you how to work around bumps on your Spark and PySpark journey. With this book, you'll learn how to: Accelerate your ML workflows with integrations including PyTorch Handle key skew and take advantage of Spark's new dynamic partitioning Make your code reliable with scalable testing and validation techniques Make Spark high performance Deploy Spark on Kubernetes and similar environments Take advantage of GPU acceleration with RAPIDS and resource profiles Get your Spark jobs to run faster Use Spark to productionize exploratory data science projects Handle even larger datasets with Spark Gain faster insights by reducing pipeline running times

Magic Quadrant Power Session: Analytics and BI and Data Science and Machine Learning Platforms

2026-03-09 · gartner-data-analytics-us-2026

talk

by Kjell Carlsson (Gartner)

AI/ML Analytics BI

Analytics and BI platforms and data Science and machine learning platforms are important technologies that drive insight-driven decision making and allow AI systems to be built and operationalized throughout the enterprise. This session unpacks the Magic Quadrants of both markets and gives inisght on the trends that you should be aware of.

ML and Generative AI in the Data Lakehouse

2026-01-25 · O'Reilly AI & ML Books O'Reilly Amazon

book

by Bennie Haelen

AI/ML Data Engineering Data Lakehouse Databricks GenAI LLM ai-ml artificial-intelligence-ai data generative-ai

In today's race to harness generative AI, many teams struggle to integrate these advanced tools into their business systems. While platforms like GPT-4 and Google's Gemini are powerful, they aren't always tailored to specific business needs. This book offers a practical guide to building scalable, customized AI solutions using the full potential of data lakehouse architecture. Author Bennie Haelen covers everything from deploying ML and GenAI models in Databricks to optimizing performance with best practices. In this must-read for data professionals, you'll gain the tools to unlock the power of large language models (LLMs) by seamlessly combining data engineering and data science to create impactful solutions. Learn to build, deploy, and monitor ML and GenAI models on a data lakehouse architecture using Databricks Leverage LLMs to extract deeper, actionable insights from your business data residing in lakehouses Discover how to integrate traditional ML and GenAI models for customized, scalable solutions Utilize open source models to control costs while maintaining model performance and efficiency Implement best practices for optimizing ML and GenAI models within the Databricks platform

Learn Data Science Using SAS Studio : From Clicks to Code

2026-01-01 · O'Reilly Data Science Books O'Reilly Amazon

book

by Engy Fouda

Analytics Marketing Python SAS analytics-platforms data data-science

Do you want to create data analysis reports without writing a line of code? This book introduces SAS Studio, a free, web-based data science product for educational and non-commercial purposes. The power of SAS Studio lies in its visual, point-and-click user interface, which generates SAS code. It is easier to learn SAS Studio than to learn R and Python to accomplish data cleaning, statistics, and visualization tasks. The book includes a case study analyzing the data required to predict the results of presidential elections in the state of Maine for 2016 and 2020. In addition to the presidential elections, the book provides real-life examples, including analyses of stock, oil, and gold prices, crime, marketing, and healthcare. You will see data science in action and how easily it can be performed using complicated tasks and visualizations in SAS Studio. You will learn, step by step, how to perform visualizations, including creating maps. In most cases, you will not need a line of code as you work with the SAS Studio graphical user interface. The book includes explanations of the code that SAS Studio generates automatically. You will learn how to edit this code to perform more complicated advanced tasks. What You Will Learn Become familiar with the SAS Studio IDE. How to create essential visualizations. Know the fundamental statistical analysis required in most data science and analytics reports. Clean the most common dataset problems Learn linear and logistic regression for data prediction and analysis. Write programs in SAS. How to analyze data and get insights from it for decision-making. Learn character, numeric, date, time, and datetime functions and typecasting. Who This Book Is For A general audience of people who are new to data science, students, and data analysts and scientists who are new to SAS. No prior programming or statistical knowledge is required.

Bioinformatics with Python Cookbook - Fourth Edition

2025-12-19 · O'Reilly Data Science Books O'Reilly Amazon

book

by Shane Brubaker

AI/ML Cloud Computing Python bioinformatics data data-science data-science-domains

Bioinformatics with Python Cookbook provides a practical, hands-on approach to solving computational biology challenges with Python, enabling readers to analyze sequencing data, leverage AI for bioinformatics applications, and design robust computational pipelines. What this Book will help me do Perform comprehensive sequence analysis using Python libraries for refined data interpretation. Configure and run bioinformatics workflows on cloud environments for scalable solutions. Apply advanced data science practices to analyze and visualize bioinformatics data. Explore the integration of AI tools in processing multimodal biological datasets. Understand and utilize bioinformatics databases for research and development. Author(s) Shane Brubaker is an experienced computational biologist and software developer with a strong background in bioinformatics and Python programming. With years of experience in data analysis and software engineering, Shane has authored numerous solutions for real-world bioinformatics issues. He brings a practical, example-driven teaching approach, aimed at empowering readers to apply techniques effectively in their work. Who is it for? This book is suitable for bioinformatics professionals, data scientists, and software engineers with moderate experience seeking to expand their computational biology knowledge. Readers should have basic understanding of biology, programming, and cloud tools. By engaging with this book, learners can advance their skills in Python and bioinformatics to address complex biological data challenges effectively.

How AI Is Transforming Data Careers — A Panel Discussion

2025-12-10 · PyData Boston 2025 Watch

talk

by Chuxin Liu , Gayathri Ramanathan

AI/ML

AI is transforming data careers. Roles once centered on modeling and feature engineering are evolving into positions that involve building AI products, crafting prompts, and managing workflows shaped by automation and augmentation. In this panel discussion, ambassadors from Women in Data Science (WiDS) share how they have adapted through this shift—turning personal experiments into company practices, navigating uncertainty, and redefining their professional identities. They’ll also discuss how to future-proof your career by integrating AI into your daily work and career growth strategy. Attendees will leave with a clearer view of how AI is reshaping data careers and practical ideas for how to evolve their own skills, direction, and confidence in an era where AI is not replacing, but redefining, human expertise.

Building a small end-to-end product with AI: personal learnings and experiences :)

2025-12-10 · 🎄 PyData Berlin 2025 December Meetup 🎄

talk

by Jean Carlo Machado (GetYourGuide)

AI/ML Data Collection

In this talk, I will walk through how building data products is evolving with modern AI development tools. I’ll take you through a small end-to-end product I built in my free time—covering everything from design, to frontend development, to data collection, and ultimately to building data science components. Here is the link to the project https://stateoftheartwithai.com/

Accelerating Geospatial Analysis with GPUs

2025-12-10 · PyData Boston 2025 Watch

talk

by Jaya Venkatesh , Jacob Tomlinson , Naty Clementi

AI/ML Cloud Computing

Geospatial analysis often relies on raster data, n‑dimensional arrays where each cell holds a spatial measurement. Many raster operations, such as computing indices, statistical analysis, and classification, are naturally parallelizable and ideal for GPU acceleration.

This talk demonstrates an end‑to‑end GPU‑accelerated semantic segmentation pipeline for classifying satellite imagery into multiple land cover types. Starting with cloud-hosted imagery, we will process data in chunks, compute features, train a machine learning model, and run large-scale predictions. This process is accelerated with the open-source RAPIDS ecosystem, including Xarray, cuML, and Dask, often requiring only minor changes to familiar data science workflows.

Attendees who work with raster data or other parallelizable, computationally intensive workflows will benefit most from this talk, which focuses on GPU acceleration techniques. While the talk draws from geospatial analysis, key geospatial concepts will be introduced for beginners. The methods demonstrated can be applied broadly across domains to accelerate large-scale data processing.

The Lifecycle of a Jupyter Environment: From Exploration to Production-Grade Pipelines

2025-12-09 · PyData Boston 2025

talk

by Dawn Wages

AI/ML ETL/ELT Spark

Most data science projects start with a simple notebook—a spark of curiosity, some exploration, and a handful of promising results. But what happens when that experiment needs to grow up and go into production?

This talk follows the story of a single machine learning exploration that matures into a full-fledged ETL pipeline. We’ll walk through the practical steps and real-world challenges that come up when moving from a Jupyter notebook to something robust enough for daily use.

We’ll cover how to:

Set clear objectives and document the process from the beginning
Break messy notebook logic into modular, reusable components
Choose the right tools (Papermill, nbconvert, shell scripts) based on your workflow—not just the hype
Track environments and dependencies to make sure your project runs tomorrow the way it did today
Handle data integrity, schema changes, and even evolving labels as your datasets shift over time

And as a bonus: bring your results to life with interactive visualizations using tools like PyScript, Voila, and Panel + HoloViz

Planning Hockey Careers With Python

2025-12-09 · PyData Eindhoven 2025 Watch

talk

by Jaroslav Bezdek

Python

How can data science help young athletes navigate their careers? In this talk, I’ll share my experience building a career path planner for aspiring ice hockey players. The project combines player performance data, career path patterns, and predictive modeling to suggest possible development paths and milestones. Along the way, I’ll discuss the challenges of messy sports data and communicating insights in a way that resonates with non-technical users like coaches, parents, and players.

From €1M License to In-House Success: How We Built a Real-Time Recommendation System and Saved Millions Doing It

2025-12-09 · PyData Eindhoven 2025 Watch

talk

by ALI KOHAN

Cloud Computing

When we at Bol decided to personalize campaign banners, we did what many companies do: bought an expensive solution. As a software engineering team with zero data science experience, we integrated a third-party recommender system for €1 million annually, built the cloud infrastructure, and waited for results. After our first season, the data told a harsh truth—the third-party tool wasn't delivering value proportional to its cost. We faced a crossroads: accept mediocrity or build our own solution from scratch, tailored to our requirements and architecture. We'll walk you through our journey of building a more intelligent and flexible recommendation system from the ground up, and how this journey saved us over a million euros per year. We will share the incremental steps that shaped our journey, alongside the valuable lessons learned along the way

Joshua Starmer on StatQuest, Storytelling, Next-Gen Learning & His Iconic BAMs

2025-12-05 · Future of Data and AI Listen

podcast_episode

by Joshua Starmer (StatQuest)

AI/ML Spark

Before StatQuest became the go-to learning companion for millions of AI and ML practitioners… Before the “BAM! Double BAM! Triple BAM!” became a teaching tool that many learners adore...

There was just one guy in a genetics lab, trying desperately to explain his data analysis to coworkers so they didn't think he was working magic.

In this deeply personal and inspiring episode, Joshua Starmer (CEO & Founder | StatQuest) shares the real story behind his rise — a journey shaped by strategy, struggle, blunt feedback, and a relentless desire to make complicated ideas simple.

What you’ll discover: 🔹How Josh went from helping colleagues in a genetics lab to becoming a renowned educator, treasuring his first 9 views and 2 subscribers as a big win. 🔹How early feedback Josh received as a kid became a quiet spark — motivating him to improve how he explained things and ultimately shaping the teaching style millions now rely on. 🔹How his method for breaking down complex topics with unique tools like his iconic BAM! help make learning lighter and less intimidating. 🔹His thoughts on AI tutors, avatars, and interactive learning and how ethics, bias, and hallucinations relate to next-gen learning.

This is more than a conversation about statistics, data science, AI, education, or YouTube. It’s the story of a researcher who never imagined starting a learning platform, yet became one of the most trusted teachers in statistics and machine learning—turning frustration into clarity, confusion into curiosity, and small beginnings into a massive global impact.

📌 If you’ve ever struggled with PCA, logistic regression, K-means clustering, neural networks, or any tricky stats and ML concepts… chances are StatQuest made it click. Now, hear from the creator himself about what goes on behind the scenes. Now you’ll finally understand how he made it click.

🔹A must-listen for: AI/ML learners, data scientists, educators, content creators, self-taught enthusiasts, and anyone who’s faced the fear of “I’m not good at explaining things.”Prepare to walk away inspired — and with a renewed belief that clarity is a superpower anyone can learn.

Mentoring session with Olivia Maly

2025-11-26 · Speed Mentoring Event

mentoring session

by Olivia Maly

Analytics leadership

Data Science Leader

#333 Creating an AI-First Data Team with Bilal Zia, Head of Data Science & Analytics at DuoLingo

2025-11-24 · DataFramed Listen

podcast_episode

by Bilal Zia (Duolingo) , Richie (DataCamp)

AI/ML Analytics

Data science leadership is about more than just technical expertise—it’s about building trust, embracing AI, and delivering real business impact. As organizations evolve toward AI-first strategies, data teams have an unprecedented opportunity to lead that transformation. But how do you turn a traditional analytics function into an AI-driven powerhouse that drives decision-making across the business? What’s the right structure to balance deep technical specialization with seamless business integration? From building credibility through high-impact forecasting to creating psychological safety around AI adoption, effective data leadership today requires both technical rigor and visionary communication. The landscape is shifting fast, but with the right approach, data science can stand as a true pillar of innovation alongside engineering, product, and design. Bilal Zia is currently the Head of Data Science & Analytics at Duolingo, an EdTech company whose mission is to develop the best education in the world and make it universally available. Previously, he spent two years helping to build and lead an interdisciplinary Central Science team at Amazon, comprising economists, data and applied scientists, survey specialists, user researchers, and engineers. Before that, he spent fifteen years in the Research Department of the World Bank in Washington, D.C., pursuing an applied academic career. He holds a Ph.D. in Economics from the Massachusetts Institute of Technology, and his interests span economics, data science, machine learning/AI, psychology, and user research. In the episode, Richie and Bilal explore rebuilding an underperforming data team, fostering trust with leadership, embedding data scientists within product teams, leveraging AI for productivity, the future of synthetic A/B testing, and much more. Links Mentioned in the Show: DuolingoDuolingo Blog: How machine learning supercharged our revenue by millions of dollarsConnect with BilalAI-Native Course: Intro to AI for WorkRelated Episode: The Future of Data & AI Education Just Arrived with Jonathan Cornelissen & Yusuf SaberRewatch RADAR AI New to DataCamp? Learn on the go using the DataCamp mobile appEmpower your business with world-class data and AI skills with DataCamp for business

Interactive Session: Design patterns for digital engineering in an agentic world

2025-11-20 · Microsoft Ignite 2025

breakout

by Aaron Luk (NVIDIA) , Christoph Berlin (Microsoft)

Data Engineering Microsoft

Explore how open standards and design patterns enable scalable, repeatable digital twins that aggregate heterogeneous operational data onto physically accurate 3D objects. Experience how Microsoft and NVIDIA apply OpenUSD, CloudEvents, OpenTelemetry, and other open standards to bridge data engineering and data science. Learn how these patterns unify real-world data into actionable digital twins—extensible across domains, systems, and protocols.

The Future of IT Ops: AI-Driven Tools You Can Build Today

2025-11-20 · Microsoft Ignite 2025

theater

by Pierre Roman (Microsoft)

AI/ML

Discover how IT and operations teams can harness AI to create custom tools that fit their unique needs. In this 25-minute demo, we’ll show practical ways to integrate AI into workflows, automate tasks, and build adaptive solutions—no data science required. Walk away with actionable insights to transform IT operations today.

Data Meets Art

2025-11-20 · UVA Data Points Listen

podcast_episode

by Nathalie Miebach (School of Data Science, University of Virginia) , Alex Gates (University of Virginia)

Here we explore the intersections of data, art, and storytelling. Our guest, Nathalie Miebach, is an internationally-recognized data artist and the School of Data Science’s inaugural Artist-in-Residence.

Using materials like reed and paper, she transforms complex datasets into woven sculptures and musical scores, inviting us to view and even hear data in new ways. Joining her is Alex Gates, assistant professor of data science at the University of Virginia research examines how patterns of connection shape creativity, innovation, and discovery.

Together, they discuss what happens when data meets art.

Chapters (00:00:01) - Data Points: When Art Meets Science(00:00:46) - Ian and Nicole: Introduction(00:06:18) - How Stories Get Made(00:09:59) - Basket Weaving Visualizing Data(00:20:33) - Wonders of the World(00:25:47) - Data and Artist Residency(00:27:50) - Breaking Habits in Creativity(00:30:06) - What is Data Science: Craftsmanship?(00:34:50) - How Art Affects Our Understanding of Data

AI-Powered Data Science in Fabric: Copilot Hacks You Need to Know

2025-11-19 · Microsoft Ignite 2025

theater

by David Patrick (DSA, Inc.)

AI/ML Analytics BI Microsoft Fabric Power BI

Discover how Copilot is reshaping the data science experience in Microsoft Fabric and Power BI. From transforming raw data into actionable insights to generating stunning visualizations and reports, Copilot brings powerful new capabilities to your fingertips. You'll learn how to unlock Copilot in Fabric, explore what it can do, and ensure it's ready to elevate your analytics game. These tips and tricks will help you harness AI to work smarter, faster, and with more impact.

Info Session on Mentor-Led Data Science Internship Program

2025-11-16 · Info Session on Mentor-Led Data Science Internship Program

talk

by Dr. Murat Baday (Magnimind Academy) , Dr. Yasin Ceran (KAIST)

Information session discussing Magnimind Academy's mentor-led data science internship program.

talk-data.com

Activity Trend

Top Events

Top Speakers

Practical Statistics for Data Scientists, 3rd Edition

High Performance Spark, 2nd Edition

Magic Quadrant Power Session: Analytics and BI and Data Science and Machine Learning Platforms

ML and Generative AI in the Data Lakehouse

Learn Data Science Using SAS Studio : From Clicks to Code

Bioinformatics with Python Cookbook - Fourth Edition

How AI Is Transforming Data Careers — A Panel Discussion

Building a small end-to-end product with AI: personal learnings and experiences :)

Accelerating Geospatial Analysis with GPUs

The Lifecycle of a Jupyter Environment: From Exploration to Production-Grade Pipelines

Planning Hockey Careers With Python

From €1M License to In-House Success: How We Built a Real-Time Recommendation System and Saved Millions Doing It

Joshua Starmer on StatQuest, Storytelling, Next-Gen Learning & His Iconic BAMs

Mentoring session with Olivia Maly

#333 Creating an AI-First Data Team with Bilal Zia, Head of Data Science & Analytics at DuoLingo

Interactive Session: Design patterns for digital engineering in an agentic world

The Future of IT Ops: AI-Driven Tools You Can Build Today

Data Meets Art

AI-Powered Data Science in Fabric: Copilot Hacks You Need to Know

Info Session on Mentor-Led Data Science Internship Program