Data + AI Summit 2025

Gen AI Evaluation and Governance

2025-06-10

talk

AI/ML Databricks GenAI RAG Cyber Security Vector DB

This course introduces learners to evaluating and governing GenAI (generative artificial intelligence) systems. First, learners will explore the meaning behind and motivation for building evaluation and governance/security systems. Next, the course will connect evaluation and governance systems to the Databricks Data Intelligence Platform. Third, learners will be introduced to a variety of evaluation techniques for specific components and types of applications. Finally, the course will conclude with an analysis of evaluating entire AI systems with respect to performance and cost. Pre-requisites: Familiarity with prompt engineering, and experience with the Databricks Data Intelligence Platform. Additionally, knowledge of retrieval-augmented generation (RAG) techniques including data preparation, embeddings, vectors, and vector databases Labs: Yes Certification Path: Databricks Certified Generative AI Engineer Associate

Getting Started With Lakeflow Connect

2025-06-10 Watch

talk

Peter Pogorski (Databricks) , Giselle Goicochea (Databricks)

AI/ML Analytics CI/CD Databricks Google Analytics postgresql

Hundreds of customers are already ingesting data with Lakeflow Connect from SQL Server, Salesforce, ServiceNow, Google Analytics, SharePoint, PostgreSQL and more to unlock the full power of their data. Lakeflow Connect introduces built-in, no-code ingestion connectors from SaaS applications, databases and file sources to help unlock data intelligence. In this demo-packed session, you’ll learn how to ingest ready-to-use data for analytics and AI with a few clicks in the UI or a few lines of code. We’ll also demonstrate how Lakeflow Connect is fully integrated with the Databricks Data Intelligence Platform for built-in governance, observability, CI/CD, automated pipeline maintenance and more. Finally, we’ll explain how to use Lakeflow Connect in combination with downstream analytics and AI tools to tackle common business challenges and drive business impact.

Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions

2025-06-10 Watch

talk

Dane Corneil (NVIDIA) , Yev Meyer (NVIDIA)

AI/ML Data Quality Databricks LLM

A big challenge in LLM development and synthetic data generation is ensuring data quality and diversity. While data incorporating varied perspectives and reasoning traces consistently improves model performance, procuring such data remains impossible for most enterprises. Human-annotated data struggles to scale, while purely LLM-based generation often suffers from distribution clipping and low entropy. In a novel compound AI approach, we combine LLMs with probabilistic graphical models and other tools to generate synthetic personas grounded in real demographic statistics. The approach allows us to address major limitations in bias, licensing, and persona skew of existing methods. We release the first open-source dataset aligned with real-world distributions and show how enterprises can leverage it with Gretel Data Designer (now part of NVIDIA) to bring diversity and quality to model training on the Databricks platform, all while addressing model collapse and data provenance concerns head-on.

Lakeflow Connect: Smarter, Simpler File Ingestion With the Next Generation of Auto Loader

2025-06-10 Watch

talk

Sandip Agarwala (Databricks) , Chavdar Botev (Databricks)

API Cloud Computing Cloud Storage Data Lakehouse Data Quality Databricks

Auto Loader is the definitive tool for ingesting data from cloud storage into your lakehouse. In this session, we’ll unveil new features and best practices that simplify every aspect of cloud storage ingestion. We’ll demo out-of-the-box observability for pipeline health and data quality, walk through improvements for schema management, introduce a series of new data formats and unveil recent strides in Auto Loader performance. Along the way, we’ll provide examples and best practices for optimizing cost and performance. Finally, we’ll introduce a preview of what’s coming next — including a REST API for pushing files directly to Delta, a UI for creating cloud storage pipelines and more. Join us to help shape the future of file ingestion on Databricks.

Machine Learning Model Deployment

2025-06-10

talk

AI/ML Data Lakehouse Databricks Delta Python Scikit-learn

This course is designed to introduce three primary machine learning deployment strategies and illustrate the implementation of each strategy on Databricks. Following an exploration of the fundamentals of model deployment, the course delves into batch inference, offering hands-on demonstrations and labs for utilizing a model in batch inference scenarios, along with considerations for performance optimization. The second part of the course comprehensively covers pipeline deployment, while the final segment focuses on real-time deployment. Participants will engage in hands-on demonstrations and labs, deploying models with Model Serving and utilizing the serving endpoint for real-time inference. By mastering deployment strategies for a variety of use cases, learners will gain the practical skills needed to move machine learning models from experimentation to production. This course shows you how to operationalize AI solutions efficiently, whether it's automating decisions in real-time or integrating intelligent insights into data pipelines. Pre-requisites: Familiarity with Databricks workspace and notebooks, familiarity with Delta Lake and Lakehouse, intermediate level knowledge of Python (e.g. common Python libraries for DS/ML like Scikit-Learn, awareness of model deployment strategies) Labs: Yes Certification Path: Databricks Certified Machine Learning Associate

Measuring User Adoption and KPIs for Data Products Using Databricks

2025-06-10

talk

Grant Stubblefield (Kythera Labs)

Databricks KPI

In this session, attendees will learn how to leverage Databricks' system tables to measure user adoption and track key performance indicators (KPIs) for data products. The session will focus on how organizations can use system tables to analyze user behavior, assess engagement with data products and identify usage trends that can inform product development. By measuring KPIs such as user retention, frequency of use and data queries, organizations can optimize their data products for better performance and ROI.

Open Source Unity Catalog: Getting Started, Best Practices and Governance at Scale

2025-06-10 Watch

talk

Tathagata Das (Databricks) , Ben Wilson (Databricks)

AI/ML

How to use UC OSS, what features are available, and intro to the ecosystem. We'll dive into the latest release and get hands-on with demos for working with your UC data and AI assets — including tables, volumes, models and AI functions.

Scaling Sales Excellence: How Databricks Uses Its Own Tech to Train GTM Teams

2025-06-10 Watch

talk

Sergio Ballesteros Solanas (Databricks)

AI/ML Databricks GenAI GTM

In this session, discover how Databricks leverages the power of Gen AI, MosaicML, Model Serving and Databricks Apps to revolutionize sales enablement. We’ll showcase how we built an advanced chatbot that equips our go-to-market team with the tools and knowledge needed to excel in customer-facing interactions. This AI-driven solution not only trains our salespeople but also enhances their confidence and effectiveness in demonstrating the transformative potential of Databricks to future customers. Attendees will gain insights into the architecture, development process and practical applications of this innovative approach. The session will conclude with an interactive demo, offering a firsthand look at the chatbot in action. Join us to explore how Databricks is using its own platform to drive sales excellence through cutting-edge AI solutions.

Self-Improving Agents and Agent Evaluation With Arize & Databricks ML Flow

2025-06-10 Watch

talk

Aparna Dhinakaran (Arize)

AI/ML Databricks

As autonomous agents become increasingly sophisticated and widely deployed, the ability for these agents to evaluate their own performance and continuously self-improve is essential. However, the growing complexity of these agents amplifies potential risks, including exposure to malicious inputs and generation of undesirable outputs. In this talk, we'll explore how to build resilient, self-improving agents. To drive self-improvement effectively, both the agent and the evaluation techniques must simultaneously improve with a continuously iterating feedback loop. Drawing from extensive real-world experiences across numerous productionized use cases, we will demonstrate practical strategies for combining tools from Arize, Databricks MLflow and Mosaic AI to evaluate and improve high-performing agents.

Sponsored by: Lovelytics | Predict and Mitigate Asset Risk: Unlock Geospatial Analytics with GenAI

State Street Uses Databricks as a Cybersecurity Lakehouse for Threat Intelligence & Real-Time Alerts

2025-06-10 Watch

talk

Paul Signorelli (Databricks) , Ajish George (State Street)

Data Lakehouse Databricks Cyber Security

Organizations face the challenge of managing vast amounts of data to combat emerging threats. The Databricks Data Intelligence platform represents a paradigm shift in cybersecurity at State Street, providing a comprehensive solution for managing and analyzing diverse security data. Through its partnership with Databricks, State Street has created a capability to: Efficiently manage structured and unstructured data. Scale up to analyze 50 petabytes of data in real-time. Ingest and parse data for critical security data streams. Build advanced cybersecurity data products and use automation & orchestration to streamline cybersecurity operations. By leveraging these capabilities, State Street has positioned itself as a leader in the financial services industry when it comes to cybersecurity.

The Future of Real Time Insights with Databricks and SAP

2025-06-10 Watch

talk

Alejandro Saucedo (Zalando SE) , Jon Levine (JPL) (Databricks) , Olaf Melchior (Zalando SE)

AI/ML Analytics Databricks ETL/ELT SAP SQL

Tired of waiting on SAP data? Join this session to see how Databricks and SAP make it easy to query business-ready data—no ETL. With Databricks SQL, you’ll get instant scale, automatic optimizations, and built-in governance across all your enterprise analytics data. Fast and AI-powered insights from SAP data are finally possible—and this is how.

The Hitchhiker's Guide to Delta Lake Streaming in an Agentic Universe

2025-06-10 Watch

talk

Scott Haines (Nike)

AI/ML Data Engineering Delta LLM Data Streaming

As data engineering continues to evolve the shift from batch-oriented to streaming-first has become standard across the enterprise. The reality is these changes have been taking shape for the past decade — we just now also happen to be standing on the precipice of true disruption through automation, the likes of which we could only dream about before. Yes, AI Agents and LLMs are already a large part of our daily lives, but we (as data engineers) are ultimately on the frontlines ensuring that the future of AI is powered by consistent, just-in-time data — and Delta Lake is critical to help us get there. This session will provide you with best practices learned the hard way by one of the authors of The Delta Lake Definitive Guide including: Guide to writing generic applications as components Workflow automation tips and tricks Tips and tricks for Delta clustering (liquid, z-order, and classic) Future facing: Leveraging metadata for agentic pipelines and workflow automation

The Lakeflow Effect

2025-06-10 Watch

talk

Josue Bogran (JosueBogran.com & zeb.co) , Bilal Aslam (Databricks)

Databricks

Lakeflow brings much excitement, simplicity and unification to Databricks’ engineering experience. Databricks’ Bilal Aslam (Sr. Director of Product Management) and Josue A. Bogran (Databricks MVP & content creator) provide an overview of the history of Lakeflow, current value to your organization and the direction its capabilities are going toward. The session covers: What is Lakeflow? Differences and similarities between Lakeflow Declarative Pipelines Overview of current Lakeflow Connect, Pipelines and Jobs capabilities How to get started What's Next? The session will also provide you with an opportunity to ask questions to the team behind Lakeflow.

ThredUp’s Journey with Databricks: Modernizing Our Data Infrastructure

2025-06-10 Watch

talk

Aniket Mane (ThredUp Inc.) , Chintan Patel (Thredup)

AI/ML Analytics Data Management Databricks Delta SQL

Building an AI-ready data platform requires strong governance, performance optimization, and seamless adoption of new technologies. At ThredUp, our Databricks journey began with a need for better data management and evolved into a full-scale transformation powering analytics, machine learning, and real-time decision-making. In this session, we’ll cover: Key inflection points: Moving from legacy systems to a modernized Delta Lake foundation Unity Catalog’s impact: Improving governance, access control, and data discovery Best practices for onboarding: Ensuring smooth adoption for engineering and analytics teams What’s next? Serverless SQL and conversational analytics with Genie Whether you’re new to Databricks or scaling an existing platform, you’ll gain practical insights on navigating the transition, avoiding pitfalls, and maximizing AI and data intelligence.

Transforming Credit Analytics With a Compliant Lakehouse at Rabobank

2025-06-10 Watch

talk

Taras Chaikovskyi (Databricks) , Floris Hendriks (Rabobank)

Analytics Data Lakehouse Hive

This presentation outlines Rabobank Credit analytics transition to a secure, audit-ready data architecture using Unity Catalog (UC), addressing critical regulatory challenges in credit analytics for IRB and IFRS9 regulatory modeling. Key technical challenges included legacy infrastructure (Hive metastore, ADLS mounts using Service Principals and Credential passthrough) lacking granular access controls, data access auditing and limited visibility into lineage, creating governance and compliance gaps. Details cover a framework for phased migration to UC. Outcomes include data lineage mapping demonstrating compliance with regulatory requirements, granular role based access control and unified audit trails. Next steps involve a lineage visualization toolkit (custom app for impact analysis and reporting) and lineage expansion to incorporate upstream banking systems.

Transforming Government With Data and AI: Singapore GovTech's Journey With Databricks

2025-06-10 Watch

talk

Sachin Tonk (GovTech)

AI/ML Analytics BI Data Analytics Data Management Data Quality

GovTech is an agency in the Singapore Government focused on tech for good. The GovTech Chief Data Office (CDO) has built the GovTech Data Platform with Databricks at the core. As the government tech agency, we safeguard national-level government and citizen data. A comprehensive data strategy is essential to uplifting data maturity. GovTech has adopted the service model approach where data services are offered to stakeholders based on their data maturity. Their maturity is uplifted through partnership, readying them for more advanced data analytics. CDO offers a plethora of data assets in a “data restaurant” ranging from raw data to data products, all delivered via Databricks and enabled through fine-grained access control, underpinned by data management best practices such as data quality, security and governance. Within our first year on Databricks, CDO was able to save 8,000 man-hours, democratize data across 50% of the agency and achieve six-figure savings through BI consolidation.

Unlocking Industrial Intelligence with AVEVA and Agnico Eagle

2025-06-10 Watch

talk

Ray Yip (Agnico Eagle Mines Limited) , Bry Dillon (AVEVA)

Data Quality Databricks Delta

Industrial data is the foundation for operational excellence, but sharing and leveraging this data across systems presents significant challenges. Fragmented approaches create delays in decision-making, increase maintenance costs, and erode trust in data quality. This session explores how the partnership between AVEVA and Databricks addresses these issues through CONNECT, which integrates directly with Databricks via Delta Sharing. By accelerating time to value, eliminating data wrangling, ensuring high data quality, and reducing maintenance costs, this solution drives faster, more confident decision-making and greater user adoption. We will showcase how Agnico Eagle Mines—the world’s third-largest gold producer with 10 mines across Canada, Australia, Mexico, and Finland—is leveraging this capability to overcome data intelligence barriers at scale. With this solution, Agnico Eagle is making insights more accessible and actionable across its entire organization.

Ursa: Augment Your Lakehouse With Kafka-Compatible Data Streaming Capabilities

2025-06-10 Watch

talk

Gaurav Saxena (Automotive Industry) , Sijie Guo (StreamNative)

AI/ML API Data Lakehouse Delta GenAI Iceberg

As data architectures evolve to meet the demands of real-time GenAI applications, organizations increasingly need systems that unify streaming and batch processing while maintaining compatibility with existing tools. The Ursa Engine offers a Kafka-API-compatible data streaming engine built on Lakehouse (Iceberg and Delta Lake). Designed to seamlessly integrate with data lakehouse architectures, Ursa extends your lakehouse capabilities by enabling streaming ingestion, transformation and processing — using a Kafka-compatible interface. In this session, we will explore how Ursa Engine augments your existing lakehouses with Kafka-compatible capabilities. Attendees will gain insights into Ursa Engine architecture and real-world use cases of Ursa Engine. Whether you're modernizing legacy systems or building cutting-edge AI-driven applications, discover how Ursa can help you unlock the full potential of your data.

Validating Clinical Trial Platforms on Databricks

2025-06-10 Watch

talk

Kamesh Raghavendra (Purgo AI) , Neha Pande (Databricks)

AI/ML Databricks GenAI

Clinical Trial Data is undergoing a renaissance with new insights and data sources being added daily. The speed of new innovations and modalities that are found within trials poses an existential dilemma for 21CFR Part 11 compliance. In these validated environments, new components and methods need to be tested for reproducibility and restricted data access. In classical systems, this validation process would often have taken three months or more due to the manual validation process via validation scripts like Installation Qualification (IQ) and Operational Qualification (OQ) scripts. In conjunction with Databricks, Purgo AI has developed a new technology leveraging generative AI to automate the execution of IQ and OQ scripts and has drastically reduced the amount of time for validating Databricks from three months to less than a day. This drastic speedup of validation will enable the continuous flow of new ideas and implementations for clinical trials.

DevConnect Meetup

2025-06-10

talk

Jonathan Hsieh (LanceDB) , Cathy Yin (Databricks) , Denny Lee (Databricks) , Andrew Shieh (Databricks) , Ziyi Yang (Databricks) , Andy Konwinski (Databricks) , Asfandyar Qureshi (Databricks) , Yuki Watanabe (Databricks) , Brandon Cui (Databricks) , Andrew Drozdov (Databricks) , Anand Kannappan (Patronus AI) , Harsh Panchal (Databricks) , Tomu Hirata (Databricks) , Daya Khudia (Databricks) , Jose Javier Gonzalez (Databricks) , Jasmine Collins (Databricks) , MAHESWARAN SATHIAMOORTHY (Bespoke Labs) , Matei Zaharia (Databricks) , Jonathan Chang (Databricks) , Alexander Trott (Databricks) , Tejas Sundaresan (Databricks) , Pallavi Koppol (Databricks) , Jonathan Frankle (Databricks) , Erich Elsen (Databricks) , Ivan Zhou (Databricks) , Davis Blalock , Gayathri Murali (META)

https://bit.ly/devconnectdais

Advanced Machine Learning Operations

2025-06-09

talk

AI/ML CI/CD Data Lakehouse Databricks Git GitHub

The course is designed to cover advanced concepts and workflows in machine learning operations. It starts by introducing participants to continuous integration (CI) and continuous development (CD) workflows within machine learning projects, guiding them through the deployment of a sample CI/CD workflow using Databricks in the first section. Moving on to the second part, participants delve into data and model testing, where they actively create tests and automate CI/CD workflows. Finally, the course concludes with an exploration of model monitoring concepts, demonstrating the use of Lakehouse Monitoring to oversee machine learning models in production settings. Pre-requisites: Familiarity with Databricks workspace and notebooks; knowledge of machine learning model development and deployment with MLflow (e.g. intermediate-level knowledge of traditional ML concepts, development with CI/CD, the use of Python and Git for ML projects with popular platforms like GitHub) Labs: Yes Certification Path: Databricks Certified Machine Learning Professional

AI/BI for Data Analysts

2025-06-09

talk

AI/ML Analytics BI Databricks SQL

In this course, you’ll learn how to use the features Databricks provides for business intelligence needs: AI/BI Dashboards and AI/BI Genie. As a Databricks Data Analyst, you will be tasked with creating AI/BI Dashboards and AI/BI Genie Spaces within the platform, managing the access to these assets by stakeholders and necessary parties, and maintaining these assets as they are edited, refreshed, or decommissioned over the course of their lifespan. This course intends to instruct participants on how to design dashboards for business insights, share those with collaborators and stakeholders, and maintain those assets within the platform. Participants will also learn how to utilize AI/BI Genie Spaces to support self-service analytics through the creation and maintenance of these environments powered by the Databricks Data Intelligence Engine. Pre-requisites: The content was developed for participants with these skills/knowledge/abilities: A basic understanding of SQL for querying existing data tables in Databricks. Prior experience or basic familiarity with the Databricks Workspace UI. A basic understanding of the purpose and use of statistical analysis results. Familiarity with the concepts around dashboards used for business intelligence. Labs: Yes

Build Data Pipelines with Lakeflow Declarative Pipelines

2025-06-09

talk

AI/ML API Cloud Computing Data Engineering Databricks Python

In this course, you’ll learn how to define and schedule data pipelines that incrementally ingest and process data through multiple tables on the Data Intelligence Platform, using Lakeflow Declarative Pipelines in Spark SQL and Python. We’ll cover topics like how to get started with Lakeflow Declarative Pipelines, how Lakeflow Declarative Pipelines tracks data dependencies in data pipelines, how to configure and run data pipelines using the Lakeflow Declarative Pipelines. UI, how to use Python or Spark SQL to define data pipelines that ingest and process data through multiple tables on the Data Intelligence Platform, using Auto Loader and Lakeflow Declarative Pipelines, how to use APPLY CHANGES INTO syntax to process Change Data Capture feeds, and how to review event logs and data artifacts created by pipelines and troubleshoot syntax.By streamlining and automating reliable data ingestion and transformation workflows, this course equips you with the foundational data engineering skills needed to help kickstart AI use cases. Whether you're preparing high-quality training data or enabling real-time AI-driven insights, this course is a key step in advancing your AI journey.Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks), cloud computing concepts (virtual machines, object storage, etc.), production experience working with data warehouses and data lakes, intermediate experience with basic SQL concepts (select, filter, groupby, join, etc), beginner programming experience with Python (syntax, conditions, loops, functions), beginner programming experience with the Spark DataFrame API (Configure DataFrameReader and DataFrameWriter to read and write data, Express query transformations using DataFrame methods and Column expressions, etc.)Labs: NoCertification Path: Databricks Certified Data Engineer Associate

talk-data.com

Top Topics

Top Speakers

Gen AI Evaluation and Governance

Getting Started With Lakeflow Connect

Improve AI Training With the First Synthetic Personas Dataset Aligned to Real-World Distributions

Lakeflow Connect: Smarter, Simpler File Ingestion With the Next Generation of Auto Loader

Machine Learning Model Deployment

Measuring User Adoption and KPIs for Data Products Using Databricks

Open Source Unity Catalog: Getting Started, Best Practices and Governance at Scale

Scaling Sales Excellence: How Databricks Uses Its Own Tech to Train GTM Teams

Self-Improving Agents and Agent Evaluation With Arize & Databricks ML Flow

Sponsored by: Lovelytics | Predict and Mitigate Asset Risk: Unlock Geospatial Analytics with GenAI

Sponsored by: Qlik | Turning Data into Business Impact: How to Build AI-Ready, Trusted Data Products on Databricks

State Street Uses Databricks as a Cybersecurity Lakehouse for Threat Intelligence & Real-Time Alerts

The Future of Real Time Insights with Databricks and SAP

The Hitchhiker's Guide to Delta Lake Streaming in an Agentic Universe

The Lakeflow Effect

ThredUp’s Journey with Databricks: Modernizing Our Data Infrastructure

Transforming Credit Analytics With a Compliant Lakehouse at Rabobank

Transforming Government With Data and AI: Singapore GovTech's Journey With Databricks

Unlocking Industrial Intelligence with AVEVA and Agnico Eagle

Ursa: Augment Your Lakehouse With Kafka-Compatible Data Streaming Capabilities

Validating Clinical Trial Platforms on Databricks

DevConnect Meetup

Advanced Machine Learning Operations

AI/BI for Data Analysts

Build Data Pipelines with Lakeflow Declarative Pipelines