talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

89

Filtering by: SQL

Sessions & talks

Showing 26–50 of 89 · Newest first

How to Migrate from Teradata to Databricks SQL

2025-06-11 Watch
talk
Fabien Contaminard (Databricks), Mehran Golestaneh (Databricks)

Storage and processing costs of your legacy Teradata data warehouses impact your ability to deliver. Migrating your legacy Teradata data warehouse to the Databricks Data Intelligence Platform can accelerate your data modernization journey. In this session, learn the top strategies for completing this data migration. We will cover data type conversion, basic to complex code conversions, validation and reconciliation best practices, and how to use Databricks natively hosted LLMs to assist with migration activities. See before-and-after architectures of customers who have migrated, and learn about the benefits they realized.
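
To make the "data type conversion" step concrete, here is a minimal, hypothetical sketch (table and column names are invented; the mappings shown are common Teradata-to-Databricks conversions, not an excerpt from the session):

```sql
-- Hypothetical Teradata source table:
--   CREATE TABLE sales (id BYTEINT, amount DECIMAL(18,2), created_ts TIMESTAMP(6));
-- A possible Databricks SQL equivalent, applying common type mappings
-- (Teradata BYTEINT -> TINYINT; Teradata TIMESTAMP(6) -> TIMESTAMP):
CREATE TABLE sales (
  id         TINYINT,
  amount     DECIMAL(18, 2),
  created_ts TIMESTAMP
);
```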

How We Turned 200+ Business Users Into Analysts With AI/BI Genie

2025-06-11 Watch
talk
Thomas Russell (Databricks)

AI/BI Genie has transformed self-service analytics for the Databricks Marketing team. This user-friendly conversational AI tool empowers marketers to perform advanced data analysis using natural language — no SQL required. By reducing reliance on data teams, Genie increases productivity and enables faster, data-driven decisions across the organization. But realizing Genie’s full potential takes more than just turning it on. In this session, we’ll share the end-to-end journey of implementing Genie for over 200 marketing users, including lessons learned, best practices and the real business impact of this Databricks-on-Databricks solution. Learn how Genie democratizes data access, enhances insight generation and streamlines decision-making at scale.

Unity Catalog Lakeguard: Secure and Efficient Compute for Your Enterprise

2025-06-11 Watch
talk
Scott Van Woudenberg (Databricks), Jakob Mund (Databricks)

Modern data workloads span multiple sources — data lakes, databases, apps like Salesforce and services like cloud functions. But as teams scale, secure data access and governance across shared compute becomes critical. In this session, learn how to confidently integrate external data and services into your workloads using Spark and Unity Catalog on Databricks. We'll explore compute options like serverless, clusters, workflows and SQL warehouses, and show how Unity Catalog’s Lakeguard enforces fine-grained governance — even when compute is shared concurrently by multiple users. Walk away ready to choose the right compute model for your team’s needs — without sacrificing security or efficiency.

What’s New with Databricks Assistant: From Exploration to Production

2025-06-11 Watch
talk
Samantha Banchik (Databricks), Gal Oshri (Databricks)

Databricks Assistant helps you get from initial exploration all the way to production faster and easier than ever. In this session, we'll show you how Assistant simplifies and accelerates common workflows, boosting your productivity across notebooks and the SQL editor. You'll get practical tips, see end-to-end examples in action, and hear about the latest capabilities we're excited about. We'll also discuss how we're continually improving Assistant to make your development experience faster, more contextual and more customizable. Join us to discover how to get the most out of Databricks Assistant and empower your team to build better and faster.

Accelerating Data Transformation: Best Practices for Governance, Agility and Innovation

2025-06-11 Watch
lightning_talk
Kevin Wilson (NCS Australia)

In this session, we will share NCS’s approach to implementing a Databricks Lakehouse architecture, focusing on key lessons learned and best practices from our recent implementations. By integrating Databricks SQL Warehouse, the DBT Transform framework and our innovative test automation framework, we’ve optimized performance and scalability, while ensuring data quality. We’ll dive into how Unity Catalog enabled robust data governance, empowering business units with self-serve analytical workspaces to create insights while maintaining control. Through the use of solution accelerators, rapid environment deployment and pattern-driven ELT frameworks, we’ve fast-tracked time-to-value and fostered a culture of innovation. Attendees will gain valuable insights into accelerating data transformation, governance and scaling analytics with Databricks.

Bridging Big Data and AI: Empowering PySpark With Lance Format for Multi-Modal AI Data Pipelines

2025-06-11 Watch
lightning_talk
Allison Wang (Databricks), Lu Qiu (LanceDB)

PySpark has long been a cornerstone of big data processing, excelling in data preparation, analytics and machine learning tasks within traditional data lakes. However, the rise of multimodal AI and vector search introduces challenges beyond its capabilities. Spark’s new Python data source API enables integration with emerging AI data lakes built on the multi-modal Lance format. Lance delivers unparalleled value with its zero-copy schema evolution capability and robust support for large record-size data (e.g., images, tensors, embeddings), simplifying multimodal data storage. Its advanced indexing for semantic and full-text search, combined with rapid random access, brings high-performance AI data analytics up to the level of SQL. By unifying PySpark's robust processing capabilities with Lance's AI-optimized storage, data engineers and scientists can efficiently manage and analyze the diverse data types required for cutting-edge AI applications within a familiar big data framework.

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals

2025-06-11
talk
Frank Munz (Databricks)

This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts:

- Introduction to Lakeflow and the Data Intelligence Platform
- Getting started with Lakeflow Declarative Pipelines for declarative data pipelines in SQL using Streaming Tables and Materialized Views
- Mastering Databricks Workflows with advanced control flow and triggers
- Understanding serverless compute
- Data governance and lineage with Unity Catalog
- Generative AI for Data Engineers: Genie and Databricks Assistant

We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.

Hands-On Learning: Build Custom Data Intelligence Apps on Databricks

2025-06-11
talk
Justin DeBrabant (Databricks), Giran Moodley (Databricks), Ivan Trusov (Databricks)

Want to learn how to build your own custom data intelligence applications directly in Databricks? In this workshop, we’ll guide you through a hands-on tutorial for building a Streamlit web app that leverages many of the key products at Databricks as building blocks. You’ll integrate a live DB SQL warehouse, use Genie to ask questions in natural language, and embed AI/BI dashboards for interactive visualizations. In addition, we’ll discuss key concepts and best practices for building production-ready apps, including logging and observability, scalability, different authorization models, and deployment. By the end, you'll have a working AI app—and the skills to build more.

Lakeflow Connect: Easy, Efficient Ingestion From Databases

2025-06-11 Watch
talk
Peter Pogorski (Databricks), Bret Grantham (Databricks)

Lakeflow Connect streamlines the ingestion of incremental data from popular databases like SQL Server and PostgreSQL. In this session, we’ll review best practices for networking, security, minimizing database load, monitoring and more — tailored to common industry scenarios. Join us to gain practical insights into Lakeflow Connect's functionality so that you’re ready to build your own pipelines. Whether you're looking to optimize data ingestion or enhance your database integrations, this session will provide you with a deep understanding of how Lakeflow Connect works with databases.

Retail Genie: No-Code AI Apps for Empowering BI Users to be Self-Sufficient

2025-06-11 Watch
talk
Harish Rajagopalan (Databricks), Siddhesh Pore (Databricks)

Explore how Databricks AI/BI Genie revolutionizes retail analytics, empowering business users to become self-reliant data explorers. This session highlights no-code AI apps that create a conversational interface for retail data analysis. Genie spaces harness NLP and generative AI to convert business questions into actionable insights, bypassing complex SQL queries. We'll showcase retail teams effortlessly analyzing sales trends, inventory and customer behavior through Genie's intuitive interface. Witness real-world examples of AI/BI Genie's adaptive learning, enhancing accuracy and relevance over time. Learn how this technology democratizes data access while maintaining governance via Unity Catalog integration. Discover Retail Genie's impact on decision-making, accelerating insights and cultivating a data-driven retail culture. Join us to see the future of accessible, intelligent retail analytics in action.

Selectively Overwrite Data With Delta Lake’s Dynamic Insert Overwrite

2025-06-11 Watch
lightning_talk
Bart Samwel (Databricks), Thang Long Vu (Databricks)

Dynamic Insert Overwrite is an important Delta Lake feature that allows fine-grained updates by selectively overwriting specific rows, eliminating the need for full-table rewrites. For example, this capability is essential for:

- DBT-Databricks incremental models/workloads, enabling efficient data transformations by processing only new or updated records
- ETL for Slowly Changing Dimension (SCD) Type 2

In this lightning talk, we will:

- Introduce Dynamic Insert Overwrite: understand its functionality and how it works
- Explore key use cases: learn how it optimizes performance and reduces costs
- Share best practices: discover practical tips for leveraging this feature on Databricks, including on the cutting-edge Serverless SQL Warehouses
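
As a rough illustration of the feature being discussed, a dynamic overwrite on a partitioned Delta table might look like this (table and column names are hypothetical, and the session itself may cover different syntax):

```sql
-- In dynamic mode, only the partitions present in the incoming query's
-- result are replaced; all other partitions are left untouched.
SET spark.sql.sources.partitionOverwriteMode = dynamic;

INSERT OVERWRITE TABLE events PARTITION (event_date)
SELECT id, payload, event_date
FROM   staged_events;
```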

Using Clean Rooms for Privacy-Centric Data Collaboration

2025-06-11 Watch
talk
DJ Sharkey (Databricks), Nikhil Gaekwad (Databricks)

Databricks Clean Rooms make privacy-safe collaboration possible for data, analytics, and AI — across clouds and platforms. Built on Delta Sharing, Clean Rooms enable organizations to securely share and analyze data together in a governed, isolated environment — without ever exposing raw data. In this session, you’ll learn how to get started with Databricks Clean Rooms and unlock advanced use cases including:

- Cross-platform collaboration and joint analytics
- Training machine learning and AI models
- Enforcing custom privacy policies
- Analyzing unstructured data
- Incorporating proprietary libraries in Python and SQL notebooks
- Auditing clean room activity for compliance

Whether you're a data scientist, engineer or data leader, this session will equip you to drive high-value collaboration while maintaining full control over data privacy and governance.

What’s New in Security and Compliance on the Databricks Data Intelligence Platform

2025-06-11 Watch
talk
Filippo Seracini (Databricks), Suresh Thiru (Databricks)

In this session, we’ll walk through the latest advancements in platform security and compliance on Databricks — from networking updates to encryption, serverless security and new compliance certifications across AWS, Azure and Google Cloud. We’ll also share our roadmap and best practices for how to securely configure workloads on Databricks SQL Serverless, Unity Catalog, Mosaic AI and more — at scale. If you're building on Databricks and want to stay ahead of evolving risk and regulatory demands, this session is your guide.

Your Wish is AI Command — Get to Grips With Databricks Genie

2025-06-11 Watch
talk
Simon Whiteley (Advancing Analytics)

Picture the scene — you're exploring a deep, dark cave looking for insights to unearth when, in a burst of smoke, Genie appears and offers you not three but unlimited data wishes. This isn't a folk tale; it's the growing wave of Generative BI that is going to be a part of analytics platforms. Databricks Genie is a tool powered by a SQL-writing LLM that redefines how we interact with data. We'll look at the basics of creating a new Genie room, scoping its data tables and asking questions. We'll help it out with some complex pre-defined questions and ensure it has the best chance of success. We'll give the tool a personality, set some behavioural guidelines and prepare some hidden easter eggs for our users to discover. Generative BI is going to be a fundamental part of the analytics toolset used across businesses. If you're using Databricks, you should be aware of Genie; if you're not, you should be planning your Generative BI roadmap — and this session will answer your wishes.

Enterprise Cost Management for Data Warehousing with Databricks SQL

2025-06-11 Watch
talk
Patrick Yang (Databricks), Joo Ho Yeo (Databricks)

This session shows you how to gain visibility into your Databricks SQL spend and ensure cost efficiency. Learn about the latest features to gain detailed insights into Databricks SQL expenses so you can easily monitor and control your costs. Find out how you can enable attribution to internal projects, understand the Total Cost of Ownership, set up proactive controls and find ways to continually optimize your spend.
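
One concrete way to get this kind of visibility is by querying Databricks system tables; here is a hedged sketch (the `system.billing.usage` table is real, but the exact SKU filter you need will depend on your workspace):

```sql
-- Daily DBU consumption attributed to SQL warehouse SKUs.
SELECT usage_date,
       SUM(usage_quantity) AS dbus
FROM   system.billing.usage
WHERE  sku_name LIKE '%SQL%'
GROUP  BY usage_date
ORDER  BY usage_date;
```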

Sponsored by: RowZero | Spreadsheets in the modern data stack: security, governance, AI, and self-serve analytics

2025-06-11 Watch
lightning_talk
Breck Fresen (Row Zero)

Despite the proliferation of cloud data warehousing, BI tools, and AI, spreadsheets are still the most ubiquitous data tool. Business teams in finance, operations, sales, and marketing often need to analyze data in the cloud data warehouse but don't know SQL and don't want to learn BI tools. AI tools offer a new paradigm but still haven't broadly replaced the spreadsheet. With new AI tools and legacy BI tools providing business teams access to data inside Databricks, security and governance are put at risk. In this session, Row Zero CEO Breck Fresen will share examples and strategies data teams are using to support secure spreadsheet analysis at Fortune 500 companies, and the future of spreadsheets in the world of AI. Breck is a former Principal Engineer from AWS S3 and was part of the team that wrote the S3 file system. He is an expert in storage, data infrastructure, cloud computing, and spreadsheets.

Turn Genie Into an Agent Using Conversation APIs

2025-06-11 Watch
talk
Prithvi Kannan (Databricks), Hanlin Sun (Databricks)

Transform your AI/BI Genie into a text-to-SQL powerhouse using the Genie Conversation APIs. This session explores how Genie functions as an intelligent agent, translating natural language queries into SQL to accelerate insights and enhance self-service analytics. You'll learn practical techniques for configuring agents, optimizing queries and handling errors — ensuring Genie delivers accurate, relevant responses in real time. A must-attend for teams looking to level up their AI/BI capabilities and deliver smarter analytics experiences.

AI Meets SQL: Leverage GenAI at Scale to Enrich Your Data

2025-06-10 Watch
talk
Sid Taneja (Databricks), Youngbin Kim (Databricks)

This session is repeated. Integrating AI into existing data workflows can be challenging, often requiring specialized knowledge and complex infrastructure. In this session, we'll share how SQL users can leverage AI/ML to access large language models (LLMs) and traditional machine learning directly from within SQL, simplifying the process of incorporating AI into data workflows. We will demonstrate how to use Databricks SQL for natural language processing, traditional machine learning, retrieval augmented generation and more. You'll learn about best practices and see examples of solving common use cases such as opinion mining, sentiment analysis, forecasting and other common AI/ML tasks.
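
For a flavor of what "AI from within SQL" looks like, here is a minimal sketch using Databricks SQL AI functions (the table, column and model endpoint names are placeholders; substitute ones available in your workspace):

```sql
SELECT review_text,
       -- Built-in task-specific AI function:
       ai_analyze_sentiment(review_text) AS sentiment,
       -- General-purpose LLM call via ai_query (endpoint name is an example):
       ai_query('databricks-meta-llama-3-1-70b-instruct',
                CONCAT('Summarize in one sentence: ', review_text)) AS summary
FROM   product_reviews;
```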

Real-Time Mode Technical Deep Dive: How We Built Sub-300 Millisecond Streaming Into Apache Spark™

2025-06-10 Watch
talk
Siying Dong (Databricks), Jerry Peng (Databricks)

Real-time mode is a new low-latency execution mode for Apache Spark™ Structured Streaming. It can consistently provide p99 latencies less than 300 milliseconds for a broad set of stateless and stateful streaming queries. Our talk focuses on the technical aspects of making this possible in Spark. We’ll dive into the core architecture that enables these dramatic latency improvements, including a concurrent stage scheduler and a non-blocking shuffle. We’ll explore how we maintained Spark’s fault-tolerance guarantees, and we’ll also share specific optimizations we made to our streaming SQL operators. These architectural improvements have already enabled Databricks customers to build workloads with latencies up to 10x lower than before. Early adopters in our Private Preview have successfully implemented real-time enrichment pipelines and feature engineering for machine learning — use cases that were previously impossible at these latencies.

Geospatial Insights With Databricks SQL: Techniques and Applications

2025-06-10 Watch
lightning_talk
Michael Johns (Databricks), Kent Marten (Databricks)

Spatial data is increasingly important, but working with it can be complex. In this session, we’ll explore how Databricks SQL supports spatial analysis and helps analysts and engineers get more value from location-based data. We’ll cover what’s coming in the Public Preview of Spatial SQL, when and how to use the new Geometry and Geography data types, and practical use cases for H3. You’ll also learn about common challenges with spatial data and how we're addressing them, along with a look at the near-term roadmap.
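
As a taste of the H3 support mentioned, a typical aggregation pattern in Databricks SQL looks like this (table and column names are hypothetical):

```sql
-- h3_longlatash3 maps a point (longitude first) to its H3 cell at the
-- given resolution; grouping by cell gives a simple density summary.
SELECT h3_longlatash3(longitude, latitude, 9) AS h3_cell,
       COUNT(*)                               AS pings
FROM   device_pings
GROUP  BY ALL;
```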

From Days to Seconds — Reducing Query Times on Large Geospatial Datasets by 99%

2025-06-10 Watch
talk
Chris Crawford (Databricks), Hobson Bryan (Global Water Security Center)

The Global Water Security Center translates environmental science into actionable insights for the U.S. Department of Defense. Prior to incorporating Databricks, responding to these requests required querying approximately five hundred thousand raster files representing over five hundred billion points. By leveraging lakehouse architecture, Databricks Auto Loader, Spark Streaming, Databricks Spatial SQL, H3 geospatial indexing and Databricks Liquid Clustering, we were able to drastically reduce our “time to analysis” from multiple business days to a matter of seconds. Now, our data scientists execute queries on pre-computed tables in Databricks, resulting in a “time to analysis” that is 99% faster, giving our teams more time for deeper analysis of the data. Additionally, we’ve incorporated Databricks Workflows, Databricks Asset Bundles, Git and GitHub Actions to support CI/CD across workspaces. We completed this work in close partnership with Databricks.

Geo-Powering Insights: The Art of Spatial Data Integration and Visualization

2025-06-10 Watch
talk
Mathieu Pelletier (Databricks)

In this presentation, we will explore how to leverage Databricks' SQL engine to efficiently ingest and transform geospatial data. We'll demonstrate the seamless process of connecting to external systems such as ArcGIS to retrieve datasets, showcasing the platform's versatility in handling diverse data sources. We'll then delve into the power of Databricks Apps, illustrating how you can create custom geospatial dashboards using various frameworks like Streamlit and Flask, or any framework of your choice. This flexibility allows you to tailor your visualizations to your specific needs and preferences. Furthermore, we'll highlight the Databricks Lakehouse's integration capabilities with popular dashboarding tools such as Tableau and Power BI. This integration enables you to combine the robust data processing power of Databricks with the advanced visualization features of these specialized tools.

SQL-First ETL: Building Easy, Efficient Data Pipelines With Lakeflow Declarative Pipelines

2025-06-10 Watch
talk
Paul Lappas (Databricks), Ritwik Yadav (Databricks), Meixian Li (Databricks)

This session explores how SQL-based ETL can accelerate development, simplify maintenance and make data transformation more accessible to both engineers and analysts. We'll walk through how Databricks Lakeflow Declarative Pipelines and Databricks SQL warehouse support building production-grade pipelines using familiar SQL constructs. Topics include:

- Using streaming tables for real-time ingestion and processing
- Leveraging materialized views to deliver fast, pre-computed datasets
- Integrating with tools like dbt to manage batch and streaming workflows at scale

By the end of the session, you’ll understand how SQL-first approaches can streamline ETL development and support both operational and analytical use cases.
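
The two constructs named in the abstract can be sketched in a few lines of SQL (object names and the source path are hypothetical):

```sql
-- Streaming table: incremental ingestion from files landing in a volume.
CREATE OR REFRESH STREAMING TABLE raw_orders
AS SELECT * FROM STREAM read_files('/Volumes/demo/landing/orders/');

-- Materialized view: a pre-computed aggregate refreshed incrementally.
CREATE OR REPLACE MATERIALIZED VIEW daily_revenue
AS SELECT order_date, SUM(amount) AS revenue
   FROM   raw_orders
   GROUP  BY order_date;
```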

Unifying Data Delivery: Using Databricks as Your Enterprise Serving Layer

2025-06-10 Watch
talk
Ivan Spiriev (The World Bank), Ivan Donev (The World Bank)

This session will take you on our journey of integrating Databricks as the core serving layer in a large enterprise, demonstrating how you can build a unified data platform that meets diverse business needs. We will walk through the steps for constructing a central serving layer by leveraging Databricks’ SQL Warehouse to efficiently deliver data to analytics tools and downstream applications. To tackle low latency requirements, we’ll show you how to incorporate an interim scalable relational database layer that delivers sub-second performance for hot data scenarios. Additionally, we’ll explore how Delta Sharing enables secure and cost-effective data distribution beyond your organization, eliminating silos and unnecessary duplication for a truly end-to-end centralized solution. This session is perfect for data architects, engineers and decision-makers looking to unlock the full potential of Databricks as a centralized serving hub.

Using Catalogs for a Well-Governed and Efficient Data Ecosystem

2025-06-10 Watch
talk
Kajal Woods (Capital One Financial), Jim Lebonitte (Capital One)

The ability to enforce data management controls at scale and reduce the effort required to manage data pipelines is critical to operating efficiently. Capital One has scaled its data management capabilities and invested in platforms to help address this need. In the past couple of years, the role of “the catalog” in a data platform architecture has transitioned from just providing SQL to providing a full suite of capabilities that can help solve this problem at scale. This talk will give insight into how Capital One is thinking about leveraging Databricks Unity Catalog to help tackle these challenges.