talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

117

Filtering by: Delta

Sessions & talks

Showing 101–117 of 117 · Newest first

How an Open, Scalable and Secure Data Platform is Powering Quick Commerce Swiggy's AI

2025-06-10
talk
Vasan Vembu Srini (Databricks), Akash Agarwal (Swiggy)

Swiggy, India's leading quick commerce platform, serves ~13 million users across 653 cities, with 196,000 restaurant partners and 17,000 SKUs. To handle this scale, Swiggy developed a secure, scalable AI platform processing millions of predictions per second. The tech stack includes Apache Kafka for real-time streaming, Apache Spark on Databricks for analytics and ML, and Apache Flink for stream processing. The Lakehouse architecture on Delta ensures data reliability, while Unity Catalog enables centralized access control and auditing. These technologies power critical AI applications like demand forecasting, route optimization, personalized recommendations, predictive delivery SLAs, and generative AI use cases. Key takeaway: This session explores building a data platform at scale, focusing on cost efficiency, simplicity, and speed, empowering Swiggy to seamlessly support millions of users and AI use cases.

How Data Sharing is Transforming Healthcare: Real World Insights

2025-06-10
talk
John Wollman (Komodo Health, Inc.), Mark Lee (Databricks)

In today’s rapidly evolving healthcare landscape, the ability to securely and efficiently share data is critical to driving better patient outcomes, operational efficiencies, and groundbreaking research. In this session, Komodo Health will explore how Delta Sharing unlocks new opportunities across the life sciences ecosystem, providing access to de-identified longitudinal patient data without compromising patient privacy. We will share insights into customers' experiences leveraging de-identified patient data to reduce the burden of disease while improving the overall patient experience. Attendees will learn practical approaches to sharing data compliantly in life sciences.

Redesigning Kaizen's Cloud Data Lake for the Future

2025-06-10
talk
Triantafyllos Tsakmakis (Kaizen Gaming), Nikolaos Michail (Kaizen Gaming)

At Kaizen Gaming, data drives our decision-making, but rapid growth exposed inefficiencies in our legacy cloud setup — escalating costs, delayed insights and scalability limits. Operating in 18 countries with 350M daily transactions (1PB+), shared quotas and limited cost transparency hindered efficiency. To address this, we redesigned our cloud architecture with Data Landing Zones, a modular framework that decouples resources, enabling independent scaling and cost accountability. Automation streamlined infrastructure, reduced overhead and enhanced FinOps visibility, while Unity Catalog ensured governance and security. Migration challenges included maintaining stability, managing costs and minimizing latency. A phased approach, Delta Sharing, and Databricks Asset Bundles simplified transitions. The result: faster insights, improved cost control and reduced onboarding time, fostering innovation and efficiency. We share our transformation, offering insights for modern cloud optimization.

AI-Powered Marketing Data Management: Solving the Dirty Data Problem with Databricks

2025-06-10
talk
Steven Kostrzewski (Acxiom), Ankur Jain (Acxiom)

Marketing teams struggle with ‘dirty data’ — incomplete, inconsistent, and inaccurate information that limits campaign effectiveness and reduces the accuracy of AI agents. Our AI-powered marketing data management platform, built on Databricks, solves this with anomaly detection, ML-driven transformations and the built-in Acxiom Referential Real ID Graph with Data Hygiene. We’ll showcase how Delta Lake, Unity Catalog and Lakeflow Declarative Pipelines power our multi-tenant architecture, enabling secure governance and 75% faster data processing. Our privacy-first design ensures compliance with GDPR, CCPA and HIPAA through role-based access, encryption key management and fine-grained data controls. Join us for a live demo and Q&A, where we’ll share real-world results and lessons learned in building a scalable, AI-driven marketing data solution with Databricks.
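Anomaly detection is named above as one pillar of the platform. As a rough illustration of the underlying idea (not Acxiom's actual implementation), a minimal z-score check flags records that deviate sharply from the rest of a metric's distribution; the function name and sample numbers below are invented:

```python
from statistics import mean, stdev

def zscore_anomalies(values, threshold=3.0):
    """Flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Campaign click counts with one obviously corrupted record.
clicks = [120, 135, 128, 119, 142, 131, 9999]
print(zscore_anomalies(clicks, threshold=2.0))
```

Real pipelines would compute statistics per segment and over rolling windows, but the pass/fail rule is the same shape.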

Lakeflow Connect: Smarter, Simpler File Ingestion With the Next Generation of Auto Loader

2025-06-10
talk
Sandip Agarwala (Databricks), Chavdar Botev (Databricks)

Auto Loader is the definitive tool for ingesting data from cloud storage into your lakehouse. In this session, we’ll unveil new features and best practices that simplify every aspect of cloud storage ingestion. We’ll demo out-of-the-box observability for pipeline health and data quality, walk through improvements for schema management, introduce a series of new data formats and unveil recent strides in Auto Loader performance. Along the way, we’ll provide examples and best practices for optimizing cost and performance. Finally, we’ll introduce a preview of what’s coming next — including a REST API for pushing files directly to Delta, a UI for creating cloud storage pipelines and more. Join us to help shape the future of file ingestion on Databricks.
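Auto Loader's key behavior is checkpointed file discovery: each run ingests only files it has not seen before. A toy pure-Python sketch of that idea follows (not the actual API; on Databricks the entry point is `spark.readStream.format("cloudFiles")`):

```python
def ingest_new_files(all_files, checkpoint):
    """Return only files not yet processed, then record them in the checkpoint.

    Mimics, very loosely, how Auto Loader tracks discovered files so each
    file is ingested exactly once across runs.
    """
    new = [f for f in sorted(all_files) if f not in checkpoint]
    checkpoint.update(new)
    return new

seen = set()
print(ingest_new_files({"a.json", "b.json"}, seen))            # first run: both files
print(ingest_new_files({"a.json", "b.json", "c.json"}, seen))  # later run: only the new one
```

The real implementation persists this state durably (so it survives restarts) and can use cloud storage notifications instead of listing, but the exactly-once contract is the same.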

Machine Learning Model Deployment

2025-06-10
talk

This course is designed to introduce three primary machine learning deployment strategies and illustrate the implementation of each strategy on Databricks. Following an exploration of the fundamentals of model deployment, the course delves into batch inference, offering hands-on demonstrations and labs for utilizing a model in batch inference scenarios, along with considerations for performance optimization. The second part of the course comprehensively covers pipeline deployment, while the final segment focuses on real-time deployment. Participants will engage in hands-on demonstrations and labs, deploying models with Model Serving and utilizing the serving endpoint for real-time inference. By mastering deployment strategies for a variety of use cases, learners will gain the practical skills needed to move machine learning models from experimentation to production. This course shows you how to operationalize AI solutions efficiently, whether that means automating decisions in real time or integrating intelligent insights into data pipelines.
Pre-requisites: Familiarity with the Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, intermediate-level knowledge of Python (e.g. common Python libraries for DS/ML like Scikit-Learn), and awareness of model deployment strategies.
Labs: Yes
Certification Path: Databricks Certified Machine Learning Associate
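The batch-inference pattern described above boils down to loading a model once and scoring whole batches in bulk. A minimal sketch with a hypothetical stand-in model (in practice you would load a registered model, e.g. via `mlflow.pyfunc.load_model`):

```python
# Hypothetical stand-in for a registered model; the rule and field names
# are invented purely for illustration.
class ChurnModel:
    def predict(self, rows):
        return [1 if r["inactive_days"] > 30 else 0 for r in rows]

def batch_score(model, batch):
    """Score a whole batch at once: load the model once, predict in bulk."""
    preds = model.predict(batch)
    return [dict(r, prediction=p) for r, p in zip(batch, preds)]

batch = [{"user": "u1", "inactive_days": 45}, {"user": "u2", "inactive_days": 3}]
print(batch_score(ChurnModel(), batch))
```

Scoring in bulk rather than row by row is what makes batch inference cheap: model load and vectorized prediction costs amortize over the whole batch.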

The Hitchhiker's Guide to Delta Lake Streaming in an Agentic Universe

2025-06-10
talk
Scott Haines (Nike)

As data engineering continues to evolve, the shift from batch-oriented to streaming-first has become standard across the enterprise. The reality is these changes have been taking shape for the past decade — we just now also happen to be standing on the precipice of true disruption through automation, the likes of which we could only dream about before. Yes, AI agents and LLMs are already a large part of our daily lives, but we (as data engineers) are ultimately on the front lines ensuring that the future of AI is powered by consistent, just-in-time data — and Delta Lake is critical to help us get there. This session will provide you with best practices learned the hard way by one of the authors of Delta Lake: The Definitive Guide, including:
- Writing generic applications as components
- Workflow automation tips and tricks
- Tips and tricks for Delta clustering (liquid, Z-order, and classic)
- Future-facing: leveraging metadata for agentic pipelines and workflow automation
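For readers unfamiliar with the Z-ordering mentioned above: the technique interleaves the bits of several columns so rows that are close in all of them sort near each other on disk, which is what enables file skipping. A minimal illustrative sketch (not Delta's actual implementation):

```python
def z_order_key(x, y, bits=8):
    """Interleave the bits of two column values into a single Z-order key.

    Sorting rows by this key keeps records that are close in BOTH columns
    physically close together, so per-file min/max stats become selective.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
        key |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
    return key

print(z_order_key(3, 5))
```

Liquid clustering replaces this static curve with incrementally maintained clustering, but the goal (multi-column data locality) is the same.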

ThredUp’s Journey with Databricks: Modernizing Our Data Infrastructure

2025-06-10
talk
Aniket Mane (ThredUp Inc.), Chintan Patel (ThredUp)

Building an AI-ready data platform requires strong governance, performance optimization, and seamless adoption of new technologies. At ThredUp, our Databricks journey began with a need for better data management and evolved into a full-scale transformation powering analytics, machine learning, and real-time decision-making. In this session, we’ll cover:
- Key inflection points: moving from legacy systems to a modernized Delta Lake foundation
- Unity Catalog’s impact: improving governance, access control, and data discovery
- Best practices for onboarding: ensuring smooth adoption for engineering and analytics teams
- What’s next: Serverless SQL and conversational analytics with Genie
Whether you’re new to Databricks or scaling an existing platform, you’ll gain practical insights on navigating the transition, avoiding pitfalls, and maximizing AI and data intelligence.

Unlocking Industrial Intelligence with AVEVA and Agnico Eagle

2025-06-10
talk
Ray Yip (Agnico Eagle Mines Limited), Bry Dillon (AVEVA)

Industrial data is the foundation for operational excellence, but sharing and leveraging this data across systems presents significant challenges. Fragmented approaches create delays in decision-making, increase maintenance costs, and erode trust in data quality. This session explores how the partnership between AVEVA and Databricks addresses these issues through CONNECT, which integrates directly with Databricks via Delta Sharing. By accelerating time to value, eliminating data wrangling, ensuring high data quality, and reducing maintenance costs, this solution drives faster, more confident decision-making and greater user adoption. We will showcase how Agnico Eagle Mines—the world’s third-largest gold producer with 10 mines across Canada, Australia, Mexico, and Finland—is leveraging this capability to overcome data intelligence barriers at scale. With this solution, Agnico Eagle is making insights more accessible and actionable across its entire organization.

Ursa: Augment Your Lakehouse With Kafka-Compatible Data Streaming Capabilities

2025-06-10
talk
Gaurav Saxena (Automotive Industry), Sijie Guo (StreamNative)

As data architectures evolve to meet the demands of real-time GenAI applications, organizations increasingly need systems that unify streaming and batch processing while maintaining compatibility with existing tools. The Ursa Engine offers a Kafka-API-compatible data streaming engine built on Lakehouse (Iceberg and Delta Lake). Designed to seamlessly integrate with data lakehouse architectures, Ursa extends your lakehouse capabilities by enabling streaming ingestion, transformation and processing — using a Kafka-compatible interface. In this session, we will explore how Ursa Engine augments your existing lakehouses with Kafka-compatible capabilities. Attendees will gain insights into Ursa Engine architecture and real-world use cases of Ursa Engine. Whether you're modernizing legacy systems or building cutting-edge AI-driven applications, discover how Ursa can help you unlock the full potential of your data.

Databricks Data Privacy

2025-06-09
talk

In this course, you’ll learn how to apply patterns to securely store and delete personal information for data governance and compliance on the Data Intelligence Platform. We’ll cover topics like storing sensitive data appropriately to simplify granting access and processing deletes, processing deletes to ensure compliance with the right to be forgotten, performing data masking, and configuring fine-grained access control to grant appropriate privileges to sensitive data.
Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.), and beginner experience with Lakeflow Declarative Pipelines and streaming workloads.
Labs: Yes
Certification Path: Databricks Certified Data Engineer Professional
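As a rough sketch of two masking patterns in the spirit of the topics above (salted pseudonymization that preserves joins, and coarse masking for display), with no claim about the course's actual code:

```python
import hashlib

def pseudonymize(value, salt):
    """Replace a PII value with a salted one-way hash.

    The same input always maps to the same token (joins still work),
    but the original value cannot be recovered from the token alone.
    """
    return hashlib.sha256((salt + value).encode()).hexdigest()[:16]

def mask_email(email):
    """Coarse masking for display: keep the domain, hide the local part."""
    local, _, domain = email.partition("@")
    return local[0] + "***@" + domain

print(pseudonymize("alice@example.com", salt="s3cret"))
print(mask_email("alice@example.com"))
```

On Databricks itself this logic would typically live in column masks or dynamic views governed by Unity Catalog, so the raw values never leave the governed layer.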

Machine Learning Model Development

2025-06-09
talk

In this course, you’ll learn how to develop traditional machine learning models on Databricks. We’ll cover topics like using popular ML libraries, executing common tasks efficiently with AutoML and MLflow, harnessing Databricks' capabilities to track model training, leveraging feature stores for model development, and implementing hyperparameter tuning. Additionally, the course covers AutoML for rapid and low-code model training, ensuring that participants gain practical, real-world skills for streamlined and effective machine learning model development in the Databricks environment.
Pre-requisites: Familiarity with the Databricks workspace and notebooks, familiarity with Delta Lake and the Lakehouse, and intermediate-level knowledge of Python (e.g. common Python libraries for DS/ML like Scikit-Learn, fundamental ML algorithms like regression and classification, model evaluation with common metrics).
Labs: Yes
Certification Path: Databricks Certified Machine Learning Associate
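Hyperparameter tuning, mentioned above, reduces in its simplest form to searching a parameter grid for the best validation score (AutoML and libraries like Hyperopt search far more cleverly). A toy sketch with an invented objective standing in for "train a model, return its validation score":

```python
from itertools import product

def grid_search(train_eval, grid):
    """Try every hyperparameter combination and keep the best score."""
    best_params, best_score = None, float("-inf")
    for combo in product(*grid.values()):
        params = dict(zip(grid.keys(), combo))
        score = train_eval(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective: peaks at depth=4, lr=0.1 (purely illustrative).
def train_eval(params):
    return -(params["depth"] - 4) ** 2 - (params["lr"] - 0.1) ** 2

grid = {"depth": [2, 4, 6], "lr": [0.01, 0.1, 1.0]}
print(grid_search(train_eval, grid))
```

With MLflow in the loop, each `train_eval` call would log its params and metrics to a run so every trial stays comparable and reproducible.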

Databricks Performance Optimization

2025-06-09
talk

In this course, you’ll learn how to optimize workloads and physical layout with Spark and Delta Lake and analyze the Spark UI to assess performance and debug applications. We’ll cover topics like streaming, liquid clustering, data skipping, caching, Photon, and more.
Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), and intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.).
Labs: Yes
Certification Path: Databricks Certified Data Engineer Professional
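Data skipping, one of the topics listed, works by pruning files whose per-column min/max statistics cannot match a query's filter. An illustrative pure-Python sketch (file names and statistics are hypothetical, and Delta's real stats live in the transaction log):

```python
def prune_files(file_stats, column, lo, hi):
    """Keep only files whose [min, max] range for `column` can overlap [lo, hi].

    Files whose range falls entirely outside the predicate are never opened,
    which is the essence of Delta's data-skipping optimization.
    """
    return [
        f for f, stats in file_stats.items()
        if not (stats[column][1] < lo or stats[column][0] > hi)
    ]

# Hypothetical per-file stats: {file: {column: (min, max)}}
stats = {
    "part-000.parquet": {"event_date": (1, 10)},
    "part-001.parquet": {"event_date": (11, 20)},
    "part-002.parquet": {"event_date": (21, 30)},
}
print(prune_files(stats, "event_date", lo=12, hi=18))
```

This is also why layout choices like liquid clustering matter: the tighter each file's min/max ranges, the more files a selective filter can skip.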

Databricks Streaming and Lakeflow Declarative Pipelines

2025-06-09
talk

In this course, you’ll learn how to incrementally process data to power analytic insights with Structured Streaming and Auto Loader, and how to apply design patterns for designing workloads to perform ETL on the Data Intelligence Platform with Lakeflow Declarative Pipelines. First, we’ll cover topics including ingesting raw streaming data, enforcing data quality, implementing CDC, and exploring and tuning state information. Then, we’ll cover options to perform a streaming read on a source, requirements for end-to-end fault tolerance, options to perform a streaming write to a sink, and creating an aggregation and watermark on a streaming dataset.
Pre-requisites: Ability to perform basic code development tasks using the Databricks workspace (create clusters, run code in notebooks, use basic notebook operations, import repos from git, etc.), intermediate programming experience with SQL and PySpark (extract data from a variety of file formats and data sources, apply a number of common transformations to clean data, reshape and manipulate complex data using advanced built-in functions), intermediate programming experience with Delta Lake (create tables, perform complete and incremental updates, compact files, restore previous versions, etc.), and beginner experience with streaming workloads and familiarity with Lakeflow Declarative Pipelines.
Labs: No
Certification Path: Databricks Certified Data Engineer Professional
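The watermark concept covered here bounds how late an event may arrive and still be processed, which is what lets the engine discard old aggregation state. A simplified sketch of the rule (not the actual Structured Streaming code, where it is set via `withWatermark`):

```python
def apply_watermark(events, delay):
    """Drop events older than (max event time seen so far) minus the delay.

    Mirrors how Structured Streaming bounds state: late data arriving within
    the watermark is still processed; anything later is discarded.
    """
    kept, max_ts = [], float("-inf")
    for ts, value in events:  # events arrive in processing order
        max_ts = max(max_ts, ts)
        if ts >= max_ts - delay:
            kept.append((ts, value))
    return kept

stream = [(100, "a"), (105, "b"), (101, "c"), (90, "late")]
print(apply_watermark(stream, delay=10))
```

The trade-off the course explores is exactly this `delay` knob: a larger watermark tolerates later data but forces the engine to hold aggregation state longer.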

Data Ingestion with Lakeflow Connect

2025-06-09
talk

In this course, you’ll learn how to ingest data efficiently with Lakeflow Connect and manage that data. Topics include ingestion with built-in connectors for SaaS applications, databases and file sources, as well as ingestion from cloud object storage, and batch and streaming ingestion. We'll cover the new connector components, setting up the pipeline, validating the source and mapping to the destination for each type of connector. We'll also cover how to ingest data with batch to streaming ingestion into Delta tables, using the UI with Auto Loader, automating ETL with Lakeflow Declarative Pipelines, or using the API. This will prepare you to deliver the high-quality, timely data required for AI-driven applications by enabling scalable, reliable, and real-time data ingestion pipelines. Whether you're supporting ML model training or powering real-time AI insights, these ingestion workflows form a critical foundation for successful AI implementation.
Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, groupby, join, etc.), beginner programming experience with Python (syntax, conditions, loops, functions), and beginner programming experience with the Spark DataFrame API (configure DataFrameReader and DataFrameWriter to read and write data, express query transformations using DataFrame methods and Column expressions, etc.).
Labs: No
Certification Path: Databricks Certified Data Engineer Associate

Data Modeling Strategies

2025-06-09
talk

This course offers a deep dive into designing data models within the Databricks Lakehouse environment and understanding the data product lifecycle. Participants will learn to align business requirements with data organization and model design, leveraging Delta Lake and Unity Catalog for defining data architectures, along with techniques for data integration and sharing.
Prerequisites: Foundational knowledge equivalent to Databricks Certified Data Engineer Associate and familiarity with many topics covered in Databricks Certified Data Engineer Professional. Experience with: basic SQL queries and table creation on Databricks; Lakehouse architecture fundamentals (medallion layers); Unity Catalog concepts (high-level). [Optional] Familiarity with data warehousing concepts (dimensional modeling, 3NF, etc.) is beneficial but not mandatory.
Labs: Yes

Get started with Data Engineering

talk

In this course, you will learn basic skills that will allow you to use the Databricks Data Intelligence Platform to perform a simple data engineering workflow and support data warehousing endeavors. You will be given a tour of the workspace and be shown how to work with objects in Databricks such as catalogs, schemas, volumes, tables, compute clusters and notebooks. You will then follow a basic data engineering workflow to perform tasks such as creating and working with tables, ingesting data into Delta Lake, transforming data through the medallion architecture, and using Databricks Workflows to orchestrate data engineering tasks. You’ll also learn how Databricks supports data warehousing needs through the use of Databricks SQL, DLT, and Unity Catalog.
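The medallion flow described above (raw bronze, cleaned silver, aggregated gold) can be sketched in a few lines of plain Python; the records, field names, and layer functions below are invented purely for illustration:

```python
# Hypothetical raw records landing in the bronze layer.
bronze = [
    {"order_id": "1", "amount": "19.99", "country": "US"},
    {"order_id": "2", "amount": "bad", "country": "US"},
    {"order_id": "3", "amount": "5.00", "country": "DE"},
]

def to_silver(rows):
    """Silver: cleaned, typed records; malformed rows are dropped."""
    out = []
    for r in rows:
        try:
            out.append({**r, "amount": float(r["amount"])})
        except ValueError:
            pass  # in practice, quarantine bad rows rather than discard silently
    return out

def to_gold(rows):
    """Gold: business-level aggregate, here revenue per country."""
    totals = {}
    for r in rows:
        totals[r["country"]] = totals.get(r["country"], 0.0) + r["amount"]
    return totals

print(to_gold(to_silver(bronze)))
```

On the platform itself each layer would be a Delta table and each arrow a job or pipeline step, with Workflows orchestrating the bronze-to-gold progression.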