Data + AI Summit 2025

Sponsored by: Sigma | Moving from On-premises to Unified Business Intelligence with Databricks & Sigma

SQL-Based ETL: Options for SQL-Only Databricks Development

2025-06-10 Watch

talk

Dustin Vannoy (Databricks)

Analytics Databricks dbt Delta ETL/ELT Git

Using SQL for data transformation is a powerful way for an analytics team to create their own data pipelines. However, relying on SQL often comes with tradeoffs such as limited functionality, hard-to-maintain stored procedures or skipping best practices like version control and data tests. Databricks supports building high-performing SQL ETL workloads. Attend this session to hear how Databricks supports SQL for data transformation jobs as a core part of your Data Intelligence Platform. In this session we will cover 4 options to use Databricks with SQL syntax to create Delta tables:
- Lakeflow Declarative Pipelines: a declarative ETL option to simplify batch and streaming pipelines
- dbt: an open-source framework to apply engineering best practices to SQL-based data transformations
- SQLMesh: an open-core product to easily build high-quality and high-performance data pipelines
- SQL notebook jobs: a combination of Databricks Workflows and parameterized SQL notebooks

Top Performance and Cost Optimizations for Lakeflow Declarative Pipelines

2025-06-10

talk

Steven Yu (Databricks)

Lakeflow Declarative Pipelines simplifies pipeline development and management — but how do you optimize for performance and cost? In this session, we’ll explore practical strategies for tuning Lakeflow Declarative Pipelines, including when and how to use autoscaling, Photon and different node types. We'll also cover how to monitor resource usage and decide when serverless is the right choice. You'll learn best practices drawn from real-world customer implementations, along with an overview of the latest performance enhancements available in serverless Lakeflow Declarative Pipelines.

Tracing the Path of a Row Through a GPU-Enabled Query Engine on the Grace-Blackwell Architecture

2025-06-10 Watch

talk

Thomas Graves (NVIDIA) , Clemens Lutz (NVIDIA)

Analytics Data Analytics Spark SQL

Grace-Blackwell is NVIDIA’s most recent GPU system architecture. It addresses a key concern of query engines: fast data access. In this session, we will take a close look at how GPUs can accelerate data analytics by tracing how a row flows through a GPU-enabled query engine. Query engines read large data from CPU memory or from disk. On Blackwell GPUs, a query engine can rely on hardware-accelerated decompression of compact formats. The Grace-Blackwell system takes data access performance even further, by reading data at up to 450 GB/s across its CPU to GPU interconnect. We demonstrate full end-to-end SQL query acceleration using GPUs in a prototype query engine using industry standard benchmark queries. We compare the results to existing CPU solutions. Using Apache Spark™ and the RAPIDS Accelerator for Apache Spark, we demonstrate the impact GPU acceleration has on the performance of SQL queries at the 100TB scale using NDS, a suite that simulates real-world business scenarios.

Transforming Financial Intelligence with FactSet Structured and Unstructured Data and Delta Sharing

2025-06-10 Watch

talk

Kristen Clark (FactSet) , Keon Shahab (Databricks)

AI/ML Databricks Delta GenAI

Join us to explore the dynamic partnership between FactSet and Databricks, transforming data accessibility and insights. Discover the launch of FactSet’s Structured DataFeeds via Delta Sharing on the Databricks Marketplace, enhancing access to crucial financial data insights. Learn about the advantages of streamlined data delivery and how this integration empowers data ecosystems. Beyond structured data, explore the innovative potential of vectorized data sharing of unstructured content such as news, transcripts, and filings. Gain insights into the importance of seamless vectorized data delivery to support GenAI applications and how FactSet is preparing to simplify client GenAI workflows with AI-ready data. Experience a demo that showcases the complete journey from data delivery to actionable GenAI application responses in a real-world Financial Services scenario. See firsthand how FactSet is simplifying client GenAI workflows with AI-ready data that drives faster, more informed financial decisions.

Transforming HP’s Print ELT Reporting with GenIT: Real-Time Insights Tool Powered by Databricks AI

2025-06-10 Watch

talk

Weiwei Hu (HP)

AI/ML DataViz Databricks ETL/ELT GenAI SQL

Timely and actionable insights are critical for staying competitive in today’s fast-paced environment. At HP Print, manual reporting for executive leadership (ELT) has been labor-intensive, hindering agility and productivity. To address this, we developed the Generative Insights Tool (GenIT) using Databricks Genie and Mosaic AI to create a real-time insights engine automating SQL generation, data visualization, and narrative creation. GenIT delivers instant insights, enabling faster decisions, greater productivity, and improved consistency while empowering leaders to respond to printer trends. With automated querying, AI-powered narratives, and a chatbot, GenIT reduces inefficiencies and ensures quality data and insights. Our roadmap integrates multi-modal data, enhances chatbot functionality, and scales globally. This initiative shows how HP Print leverages GenAI to improve decision-making, efficiency, and agility, and we will showcase this transformation at the Databricks AI Summit.

Unify Your Data and Governance With Lakehouse Federation

2025-06-10 Watch

talk

Zeashan Pappa (Databricks) , Fuat Can Efeoglu (Databricks)

AI/ML Analytics Data Lakehouse Hive Snowflake SQL

In today's data landscape, organizations often grapple with fragmented data spread across various databases, data warehouses and catalogs. Lakehouse Federation addresses this challenge by enabling seamless discovery, querying, and governance of distributed data without the need for duplication or migration. This session will explore how Lakehouse Federation integrates external data sources like Hive Metastore, Snowflake, SQL Server and more into a unified interface, providing consistent access controls, lineage tracking and auditing across your entire data estate. Learn how to streamline analytics and AI workloads, enhance compliance and reduce operational complexity by leveraging a single, cohesive platform for all your data needs.

Unlocking AI Value: Build AI Agents on SAP Data in Databricks

2025-06-10 Watch

talk

Qi Su (Databricks)

AI/ML Databricks Delta ETL/ELT SAP

Discover how enterprises are turning SAP data into intelligent AI. By tapping into contextual SAP data through Delta Sharing on Databricks - no messy ETL needed - they’re accelerating AI innovation and business insights. Learn how they:
- Build domain-specific AI that can reason on private SAP data
- Deliver data intelligence to power insights for business leaders
- Govern and secure their new unified data estate

Unlock Your Use Cases: A Deep Dive on Structured Streaming’s New TransformWithState API

2025-06-10 Watch

talk

Angela Chu (Databricks) , Anish Shrigondekar (Databricks)

API Spark Data Streaming

Don’t you just hate telling your customers “No”? “No, I can’t get you the data that quickly”, or “No that logic isn’t possible to implement” really aren’t fun to say. But what if you had a tool that would allow you to implement those use cases? What if it was in a technology you were already familiar with — say, Spark Structured Streaming? There is a brand new arbitrary stateful operations API called TransformWithState, and after attending this deep dive you won’t have to say “No” anymore. During this presentation we’ll go through some real-world use cases and build them step-by-step. Everything from state variables, process vs. event time, watermarks, timers, state TTL, and even how you can initialize state with the checkpoint of another stream. Unlock your use cases with the power of Structured Streaming’s TransformWithState!
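As a rough illustration of the concepts the abstract lists (state variables, event time, watermarks, timers and state TTL), the following is a minimal plain-Python sketch. It is not the TransformWithState API and does not use Spark; the class `KeyedCounter`, its methods and the TTL semantics are hypothetical names and assumptions chosen only to show how per-key state with an expiry can behave.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class KeyState:
    count: int       # state variable: events seen for this key
    expires_at: int  # event-time timestamp at which the state times out


class KeyedCounter:
    """Toy per-key counter whose state expires `ttl` time units after the last update.

    Hypothetical illustration only; not Spark's API.
    """

    def __init__(self, ttl: int) -> None:
        self.ttl = ttl
        self.state: Dict[str, KeyState] = {}

    def process(self, key: str, event_time: int) -> int:
        """Handle one event and return the running count for its key."""
        existing = self.state.get(key)
        if existing is not None and event_time >= existing.expires_at:
            existing = None  # state has timed out, so start over
        count = (existing.count if existing else 0) + 1
        # Every update pushes the expiry forward, like re-registering a timer.
        self.state[key] = KeyState(count, event_time + self.ttl)
        return count

    def advance_watermark(self, watermark: int) -> List[str]:
        """Evict every key whose expiry is at or before the watermark."""
        expired = [k for k, s in self.state.items() if s.expires_at <= watermark]
        for k in expired:
            del self.state[k]
        return expired


counter = KeyedCounter(ttl=10)
first = counter.process("a", event_time=0)
second = counter.process("a", event_time=5)
other = counter.process("b", event_time=6)
evicted = counter.advance_watermark(15)
restarted = counter.process("a", event_time=20)
```

In this sketch the watermark plays the role of a timer trigger: once it passes a key's expiry, the state is dropped and the next event for that key starts from a fresh count.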

Using Databricks to Power News Sentiment, a Capital IQ Pro Application

2025-06-10 Watch

talk

Debbie Connolly (S&P Global)

Databricks ETL/ELT SQL

The News Sentiment application enhances the discoverability of news content through our flagship platform, Capital IQ Pro. We processed news articles for 10,000+ public companies through entity recognition, along with a series of proprietary financial sentiment models to assess whether the news was positive or negative, as well as its significance and relevance to the company. We built a database containing over 1.5 million signals and operationalized the end-to-end ETL as a daily Workflow on Databricks. The development process included model training and selection. We utilized training data from our internal financial analysts to train Google’s T5-Flan to create our proprietary sentiment model and two additional models. Our models are deployed on Databricks Model-Serving as serverless endpoints that can be queried on-demand. The last phase of the project was to develop a UI, in which we utilized Databricks serverless SQL warehouses to surface this data in real-time.

Advanced RAG Overview — Thawing Your Frozen RAG Pipeline

2025-06-10 Watch

talk

James Lin (Experian) , Jason Li (Experian)

Data Lakehouse Databricks RAG

The most common RAG systems rely on a frozen RAG system — one where there’s a single embedding model and single vector index. We’ve achieved a modicum of success with that, but when it comes to increasing accuracy for production systems there is only so much this approach solves. In this session we will explore how to move from the frozen systems to adaptive RAG systems which produce more tailored outputs with higher accuracy. Databricks services: Lakehouse, Unity Catalog, Mosaic, Sweeps, Vector Search, Agent Evaluation, Managed Evaluation, Inference Tables

AI Agents for Marketing: Leveraging Mosaic AI to Create a Multi-Purpose Agentic Marketing Assistant

2025-06-10 Watch

talk

Sailesh Bharathwaaj Krishnamurthy (7-Eleven Inc)

AI/ML Databricks Marketing

Marketing professionals build campaigns, create content and use effective copywriting to tell a good story to promote a product/offer. All of this requires a thorough and meticulous process for every individual campaign. In order to assist marketing professionals at 7-Eleven, we built a multi-purpose assistant that could:
- Use campaign briefs to generate campaign ideas and taglines
- Do copy-writing for marketing content
- Verify images for messaging accuracy
- Answer general questions and browse the web as a generic assistant

We will walk you through how we created multiple agents as different personas with LangGraph and Mosaic AI to create a chat assistant that assumes a different persona based on the user query. We will also explain our evaluation methodology in choosing models and prompts and how we implemented guardrails for high reliability with sensitive marketing content. This assistant by 7-Eleven was showcased at the Databricks booth at NRF earlier this year.

AI/BI Driving Speed to Value in Supply Chain

2025-06-10 Watch

talk

Adrian McClure (Conagra Brands) , Heather Cooley (Conagra Brands)

AI/ML Analytics BI Data Science Databricks

Conagra is a global food manufacturer with $12.2B in revenue, 18K+ employees and 45+ plants in the US, Canada and Mexico. Conagra's Supply Chain organization is heavily focused on delivering results in productivity, waste reduction, inventory rationalization, safety and customer service levels. By migrating the Supply Chain reporting suite to Databricks over the past 2 years, Conagra's Supply Chain Analytics & Data Science team has been able to deliver new AI solutions which complement traditional BI platforms and lay the foundation for additional AI/ML applications in the future. With Databricks Genie integrated within traditional BI reports, Conagra Supply Chain users can now go from insight to action faster and with fewer clicks, enabling speed to value in a complex Supply Chain. The Databricks platform also allows the team to curate data products to be consumed by traditional BI applications today as well as the ability to rapidly scale for the AI/ML applications of tomorrow.

Best Practices for Building User-Facing AI Systems on Databricks

2025-06-10 Watch

talk

Jyotsna Bharadwaj (Databricks) , Arthur Dooner (Databricks)

AI/ML Databricks GenAI Cyber Security

This session is repeated. Integrating AI agents into business systems requires tailored approaches for different maturity levels (crawl-walk-run) that balance scalability, accuracy and usability. This session addresses the critical challenge of making AI agents accessible to business users. We will explore four key integration methods:
- Databricks apps: The fastest way to build and run applications that leverage your data, with the full security and governance of Databricks
- Genie: Tool enabling non-technical users to gain data insights on Structured Data through natural language queries
- Chatbots: Combine real-time data retrieval with generative AI for contextual responses and process automation
- Batch inference: Scalable, asynchronous processing for large-scale AI tasks, optimizing efficiency and cost

We'll compare these approaches, discussing their strengths, challenges and ideal use cases to help businesses select the most suitable integration strategy for their specific needs.

Breaking Silos: Enabling Databricks-Snowflake Interoperability With Iceberg and Unity Catalog

2025-06-10 Watch

talk

Mohit Kumar (T-Mobile) , Geoffrey Freeman (T-Mobile)

API Databricks Delta Iceberg Cyber Security Snowflake

As data ecosystems grow more complex, organizations often struggle with siloed platforms and fragmented governance. In this session, we’ll explore how our team made Databricks the central hub for cross-platform interoperability, enabling seamless Snowflake integration through Unity Catalog and the Iceberg REST API. We’ll cover:
- Why interoperability matters and the business drivers behind our approach
- How Unity Catalog and UniForm simplify interoperability, allowing Databricks to expose an Iceberg REST API for external consumption
- Technical deep dive into data sharing, query performance, and access control across Databricks and Snowflake
- Lessons learned and best practices for building a multi-engine architecture while maintaining governance and efficiency

By leveraging UniForm, Delta, and Iceberg, we created a flexible, vendor-agnostic architecture that bridges Databricks and Snowflake without compromising performance or security.

Building Responsible and Resilient AI: The Databricks AI Governance Framework

2025-06-10 Watch

talk

Abhi Arikapudi (Databricks) , David Wells (Databricks)

AI/ML Databricks GenAI Cyber Security

GenAI & machine learning are reshaping industries, driving innovation and redefining business strategies. As organizations embrace these technologies, they face significant challenges in managing AI initiatives effectively, such as balancing innovation with ethical integrity, operational resilience and regulatory compliance. This presentation introduces the Databricks AI Governance Framework (DAGF), a practical framework designed to empower organizations to navigate the complexities of AI. It provides strategies for building scalable, responsible AI programs that deliver measurable value, foster innovation and achieve long-term success. By examining the framework's five foundational pillars — AI organization, ethics, legal and regulatory compliance, transparency and interpretability, AI operations and infrastructure and AI security — this session highlights how AI governance aligns programs with the organization's strategic goals, mitigates risks and builds trust across stakeholders.

Creating LLM Judges to Measure Domain-Specific Agent Quality

2025-06-10 Watch

talk

Samraj Moorjani (Databricks) , Nikhil Thorat (Databricks)

AI/ML LLM

This session is repeated. Measuring the effectiveness of domain-specific AI agents requires specialized evaluation frameworks that go beyond standard LLM benchmarks. This session explores methodologies for assessing agent quality across specialized knowledge domains, tailored workflows, and task-specific objectives. We'll demonstrate practical approaches to designing robust LLM judges that align with your business goals and provide meaningful insights into agent capabilities and limitations. Key session takeaways include:
- Tools for creating domain-relevant evaluation datasets and benchmarks that accurately reflect real-world use cases
- Approach for creating LLM judges to measure domain-specific metrics
- Strategies for interpreting those results to drive iterative improvement in agent performance

Join us to learn how proper evaluation methodologies can transform your domain-specific agents from experimental tools to trusted enterprise solutions with measurable business value.

Driving Databricks Platform With Revenue Intelligence ROI

2025-06-10 Watch

talk

Joel Fuernsinn (Veeam)

Databricks

Demonstrating a real ROI is key to driving executive and stakeholder buy-in for major technology changes. At Veeam, we aligned our Databricks Platform change with projects to increase sales pipeline and improve customer retention. By delivering targeted improvements on those critical business metrics, we created positive ROI in short order while at the same time setting the foundation for long term Databricks Platform success. This session targets data and business leaders looking to understand how they can turn their infrastructure change into a business revenue driver.

Empowering Healthcare Insights: A Unified Lakehouse Approach With Databricks

2025-06-10 Watch

talk

Bianca Stratulat (BJSS) , Mike Dobing (Databricks)

AWS Azure Data Lake Data Lakehouse Databricks Iceberg

NHS England is revolutionizing healthcare research by enabling secure, seamless access to de-identified patient data through the Federated Data Platform (FDP). Despite vast data resources spread across regional and national systems, analysts struggle with fragmented, inconsistent datasets. Enter Databricks: powering a unified, virtual data lake with Unity Catalog at its core — integrating diverse NHS systems while ensuring compliance and security. By bridging AWS and Azure environments with a private exchange and leveraging the Iceberg connector to interface with Palantir, analysts gain scalable, reliable and governed access to vital healthcare data. This talk explores how this innovative architecture is driving actionable insights, accelerating research and ultimately improving patient outcomes.

GPU Accelerated Spark Connect

2025-06-10 Watch

talk

Gera Shegalov (NVIDIA) , Erik Ordentlich (NVIDIA)

AI/ML API ETL/ELT Cyber Security Spark SQL

Spark Connect, first included for SQL/DataFrame API in Apache Spark 3.4 and recently extended to MLlib in 4.0, introduced a new way to run Spark applications over a gRPC protocol. This has many benefits, including easier adoption for non-JVM clients, version independence from applications and increased stability and security of the associated Spark clusters. The recent Spark Connect extension for ML also included a plugin interface to configure enhanced server-side implementations of the MLlib algorithms when launching the server. In this talk, we shall demonstrate how this new interface, together with Spark SQL’s existing plugin interface, can be used with NVIDIA GPU-accelerated plugins for ML and SQL to enable no-code change, end-to-end GPU acceleration of Spark ETL and ML applications over Spark Connect, with optimal performance up to 9x at 80% cost reduction compared to CPU baselines.

How an Open, Scalable and Secure Data Platform is Powering Quick Commerce Swiggy's AI

2025-06-10 Watch

talk

Vasan Vembu Srini (Databricks) , Akash Agarwal (Swiggy)

AI/ML Analytics Flink Data Lakehouse Databricks Delta

Swiggy, India's leading quick commerce platform, serves ~13 million users across 653 cities, with 196,000 restaurant partners and 17,000 SKUs. To handle this scale, Swiggy developed a secure, scalable AI platform processing millions of predictions per second. The tech stack includes Apache Kafka for real-time streaming, Apache Spark on Databricks for analytics and ML, and Apache Flink for stream processing. The Lakehouse architecture on Delta ensures data reliability, while Unity Catalog enables centralized access control and auditing. These technologies power critical AI applications like demand forecasting, route optimization, personalized recommendations, predictive delivery SLAs, and generative AI use cases. Key Takeaway: This session explores building a data platform at scale, focusing on cost efficiency, simplicity, and speed, empowering Swiggy to seamlessly support millions of users and AI use cases.

How Data Sharing is Transforming Healthcare: Real World Insights

2025-06-10 Watch

talk

John Wollman (Komodo Health, Inc.) , Mark Lee (Databricks)

Delta

In today’s rapidly evolving healthcare landscape, the ability to securely and efficiently share data is critical to driving better patient outcomes, operational efficiencies, and groundbreaking research. In this session, Komodo Health will explore how Delta Sharing unlocks new opportunities across the life sciences ecosystem by making de-identified longitudinal patient data available without compromising patient privacy. We will share insights into customers' experiences leveraging de-identified patient data to reduce the burden of disease while improving the overall patient experience. Attendees will learn practical approaches to compliantly share data in life sciences.

How to Get the Most Out of Your BI Tools on Databricks

2025-06-10 Watch

talk

Kyle Hale (Databricks)

AI/ML Analytics BI Databricks DWH Power BI

Unlock the full potential of your BI tools with Databricks. This session explores how features like Photon, Databricks SQL, Liquid Clustering, AI/BI Genie and Publish to Power BI enhance performance, scalability and user experience. Learn how Databricks accelerates query performance, optimizes data layouts and integrates seamlessly with BI tools. Gain actionable insights and best practices to improve analytics efficiency, reduce latency and drive better decision-making. Whether migrating from a data warehouse or optimizing an existing setup, this talk provides the strategies to elevate your BI capabilities.

Introduction to Databricks SQL

2025-06-10 Watch

talk

Himanshu Raja (Databricks) , Pearl Ubaru (Databricks)

Databricks DWH SQL

This session is repeated. If you are brand new to Databricks SQL and want to get a lightning tour of this intelligent data warehouse, this session is for you. Learn about the architecture of Databricks SQL. Then we’ll show how simple, streamlined interfaces are making it easier for analysts, developers, admins and business users to get their jobs done and questions answered. We’ll show how easy it is to create a warehouse, get data, transform it and build queries and dashboards. By the end of the session, you’ll be able to build a Databricks SQL warehouse in 5 minutes.

Lakeflow Connect: The Game-Changer for Complex Event-Driven Architectures

2025-06-10 Watch

talk

Giancarlo Costa (European Food Safety Authority) , Jeroen De Clercq (delaware) , Tim Bal (delaware)

AI/ML Data Quality React

In 2020, Delaware implemented a state-of-the-art, event-driven architecture for EFSA, enabling a highly decoupled system landscape, presented at the Data + AI Summit 2021. By centrally brokering events in near real-time, consumer applications react instantly to events from producer applications as they occur. Event producers are decoupled from consumers via a publisher/subscriber mechanism. Over the past years, we noticed some drawbacks. The processing of these custom events, primarily aimed at process integration, did not cover all edge cases; data quality was not always optimal due to missing events; and we needed to create complex logic for SCD2 tables. Lakeflow Connect allows us to extract the data directly from the source without the complex architecture in between, avoiding data loss and thus data quality issues, and with some simple adjustments, an SCD2 table is created automatically. Lakeflow Connect allows us to create more efficient and intelligent data provisioning.
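For readers unfamiliar with SCD2 (slowly changing dimension, type 2), the following is a minimal plain-Python sketch of the bookkeeping such a table needs: each change closes the current version of a row and opens a new one. It is not the speakers' implementation and not how Lakeflow Connect builds its tables; the `Version` record, the `apply_change` helper and the sample values are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Version:
    key: str
    value: str
    valid_from: date
    valid_to: Optional[date] = None  # None marks the current (open) version


def apply_change(history: List[Version], key: str, value: str, as_of: date) -> None:
    """Apply one source change to an SCD2 history, in place."""
    current = next(
        (v for v in history if v.key == key and v.valid_to is None), None
    )
    if current is not None:
        if current.value == value:
            return  # nothing changed, so keep the open version as it is
        current.valid_to = as_of  # close the previous version
    history.append(Version(key, value, as_of))  # open the new version


history: List[Version] = []
apply_change(history, "supplier-1", "Parma", date(2024, 1, 1))
apply_change(history, "supplier-1", "Parma", date(2024, 2, 1))     # no-op
apply_change(history, "supplier-1", "Brussels", date(2024, 3, 1))
```

Even this toy version shows why missing events hurt: a change that never arrives leaves the old version open, so the history silently reports stale values as current.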
