talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

715

Sessions & talks

Showing 601–625 of 715 · Newest first

Sponsored by: Monte Carlo | The Illusion of Done: Why the Real Work for AI Starts in Production

2025-06-10 Watch
lightning_talk
Shane Murray (Monte Carlo)

Your model is trained. Your pilot is live. Your data looks AI-ready. But for most teams, the toughest part of building successful AI starts after deployment. In this talk, Shane Murray and Ethan Post share lessons from the development of Monte Carlo’s Troubleshooting Agent – an AI assistant that helps users diagnose and fix data issues in production. They’ll unpack what it really takes to build and operate trustworthy AI systems in the real world, including: The Illusion of Done – Why deployment is just the beginning, and what breaks in production; Lessons from the Field – A behind-the-scenes look at the architecture, integration, and user experience of Monte Carlo’s agent; Operationalizing Reliability – How to evaluate AI performance, build the right team, and close the loop between users and model. Whether you're scaling RAG pipelines or running LLMs in production, you’ll leave with a playbook for building data and AI systems you—and your users—can trust.

Sponsored by: Prophecy | Taming Industry-Specific Data Sets: How to Simplify Access and Collaboration to FHIR and Beyond

2025-06-10 Watch
lightning_talk
Itai Weiss (Prophecy)

Industry-standard data formats can streamline data exchange, but they come with significant complexity and can seem ‘impossible’. One example is FHIR (Fast Healthcare Interoperability Resources), the healthcare data standard with a broad scope of 180 entities. With its widespread adoption, FHIR promises to speed the delivery of insights that improve patient experiences, optimize operations, and drive better outcomes. Yet working with FHIR data is no small task. Its complexity and extensions push it beyond the reach of most analysts. Instead, it lands on data engineers, who must load, transform, and keep up with changes. This creates bottlenecks, slows down insights, and places a heavy maintenance burden on already backlogged data engineering teams. Learn how to tame FHIR for analysts. If you want to speed time-to-insight, reduce engineering effort, and enable true cross-functional collaboration on industry-specific data, this is a session you don’t want to miss.

Transforming Title Insurance With Databricks Batch Inference

2025-06-10 Watch
talk
Madhu Kolli (First American Financial), Prabhaker Narsina (First American Financial)

Join us as we explore how First American Data & Analytics, a leading property-centric information provider, revolutionized its data extraction processes using batch inference on the Databricks Platform. Discover how it overcame the challenges of extracting data from millions of historical title policy images and reduced project timelines by 75%. Learn how First American optimized its data processing capabilities, reduced costs by 70% and enhanced the efficiency of its title insurance processes, ultimately improving the home-buying experience for buyers, sellers and lenders. This session will delve into the strategic integration of AI technologies, highlighting the power of collaboration and innovation in transforming complex data challenges into scalable solutions.

Unleash the Power of Automated Data Governance: Classify, Tag and Protect Your Data — Effortlessly

2025-06-10 Watch
talk
Zeashan Pappa (Databricks), Kristen Wilder (Databricks)

Struggling to keep up with data governance at scale? Join us to explore how automated data classification, tag policies and ABAC streamline access control while enhancing security and compliance. Get an exclusive look at the new Governance Hub, built to give your teams deeper visibility into data usage, access patterns and metadata — all in one place. Whether you're managing thousands or millions of assets, discover how to classify, tag and protect your data estate effortlessly with the latest advancements in Unity Catalog.

You Mean I Can Talk to My Data? Reimagining How KPMG Engages Data Using AI|BI Genie

2025-06-10 Watch
lightning_talk
Dennis Tally (KPMG)

“I don’t want to spend time filtering through another dashboard — I just need an answer now.” We’ve all experienced the frustration of wading through dashboards, yearning for immediate answers. Traditional reports and visualizations, though essential, often complicate the process for decision-makers. The digital enterprise demands a shift towards conversational, natural language interactions with data. At KPMG, AI|BI Genie is reimagining our approach by allowing users to inquire about data just as they would consult a knowledgeable colleague, obtaining precise and actionable insights instantly. Discover how the KPMG Contract to Cash team leverages AI|BI Genie to enhance data engagement, drive insights and foster business growth. Join us to see AI|BI Genie in action and learn how you can transform your data interaction paradigm.

A Comprehensive Guide to Streaming on the Data Intelligence Platform

2025-06-10 Watch
talk
Indrajit Roy (Databricks), Ray Zhu (Databricks)

This session is repeated. Is stream processing the future? We think so — and we’re building it with you using the latest capabilities in Apache Spark™ Structured Streaming. If you're a power user, this session is for you: we’ll demo new advanced features, from state transformations to real-time mode. If you prefer simplicity, this session is also for you: we’ll show how Lakeflow Declarative Pipelines simplifies managing streaming pipelines. And if you’re somewhere in between, we’ve got you covered — we’ll explain when to use your own streaming jobs versus Lakeflow Declarative Pipelines.

AI/BI Dashboards and AI/BI Genie: Dashboards and Last-Mile Analytics Made Simple

2025-06-10 Watch
talk
Josue Bogran (JosueBogran.com & zeb.co), Youssef Mrini (Databricks)

Databricks announced two new features in 2024: AI/BI Dashboards and AI/BI Genie. Dashboards is a redesigned dashboarding experience for your regular reporting needs, while Genie provides a natural language experience for your last-mile analytics. In this session, Databricks Solutions Architect and content creator Youssef Mrini will present alongside Databricks MVP and content creator Josue A. Bogran on how you can get the most value from these tools for your organization. Content covered includes: the necessary setup, including Unity Catalog, permissions and compute; building out a dashboard with AI/BI Dashboards; creating and training an AI/BI Genie workspace to reliably deliver answers; and when to use Dashboards or Genie versus other tools such as Power BI, Tableau, Sigma, ChatGPT, etc. Fluff-free, full of practical tips, and geared to help you deliver immediate impact with these new Databricks capabilities.

Best Practices to Mitigate AI Security Risks

2025-06-10 Watch
talk
Arun Pamulapati (Databricks), Samrat Ray (Databricks)

This session is repeated. AI is transforming industries, enhancing customer experiences and automating decisions. As organizations integrate AI into core operations, robust security is essential. The Databricks Security team collaborated with top cybersecurity researchers from OWASP, Gartner, NIST, HITRUST and Fortune 100 companies to evolve the Databricks AI Security Framework (DASF) to version 2.0. In this session, we’ll cover an AI security architecture using Unity Catalog, MLflow, egress controls, and AI gateway. Learn how security teams, AI practitioners and data engineers can secure AI applications on Databricks. Walk away with: a reference architecture for securing AI applications; a worksheet with AI risks and controls mapped to industry standards like MITRE, OWASP, NIST and HITRUST; and a DASF AI assistant tool to test your AI security.

Building AI Models In Health Care Using Semi-Synthetic Data

2025-06-10 Watch
talk
Holden Karau (Fight Health Insurance INC)

Regulated or restricted fields like Health Care make collecting training data complicated. We all want to do the right thing, but how? This talk will look at how Fight Health Insurance used de-identified public and proprietary information to create a semi-synthetic training set for use in fine-tuning machine learning models to power Fight Paperwork. We'll explore how to incorporate the latest "reasoning" techniques in fine tuning as well as how to make models that you can afford to serve — think single GPU inference instead of a cluster of A100s. In addition to the talk we have the code used in a public GitHub repo — although it is a little rough, so you might want to use it more as a source of inspiration rather than directly forking it.
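As a rough illustration of the de-identification step this abstract describes, here is a minimal pure-Python sketch. The patterns, tokens, and sample record are invented for illustration, and real health-care de-identification (e.g., under HIPAA) requires far more than a few regexes:

```python
import re

# Minimal, illustrative de-identification pass (NOT sufficient for real
# compliance): replace common direct identifiers with category tokens so
# the text can seed a semi-synthetic training set.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
]

def deidentify(text: str) -> str:
    """Replace direct identifiers with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

record = "Claim denied for J. Doe, DOB 04/12/1987, contact jdoe@example.com or 555-867-5309."
print(deidentify(record))
# Claim denied for J. Doe, DOB [DATE], contact [EMAIL] or [PHONE].
```

The placeholder tokens preserve the shape of the text, which matters when the scrubbed records are later used to generate synthetic variants for fine-tuning.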

Building Knowledge Agents to Automate Document Workflows

2025-06-10 Watch
talk
Jerry Liu (LlamaIndex)

This session is repeated. One of the biggest promises for LLM agents is automating all knowledge work over unstructured data — we call these "knowledge agents". To date, while there are fragmented tools around data connectors, storage and agent orchestration, AI engineers have trouble building and shipping production-grade agents beyond basic chatbots. In this session, we first outline the highest-value knowledge agent use cases we see being built and deployed at various enterprises. These are: multi-step document research, automated document extraction, and report generation. We then define the core architectural components around knowledge management and agent orchestration required to build these use cases. By the end you'll not only have an understanding of the core technical concepts, but also an appreciation of the ROI you can generate for end-users by shipping these use cases to production.

Building Real-Time Sport Model Insights with Spark Structured Streaming

2025-06-10 Watch
talk
Aaron Hope (DraftKings), Ethan Summers (DraftKings)

In the dynamic world of sports betting, precision and adaptability are key. Sports traders must navigate risk management, limitations of data feeds, and much more to prevent small model miscalculations from causing significant losses. To ensure accurate real-time pricing of hundreds of interdependent markets, traders provide key inputs such as player skill-level adjustments while maintaining precise correlations. Black-box models aren’t enough: constant feedback loops drive informed, accurate decisions. Join DraftKings as we showcase how we expose real-time metrics from our simulation engine to empower traders with deeper insights into how their inputs shape the model. Using Spark Structured Streaming, Kafka, and Databricks dashboards, we transform raw simulation outputs into actionable data. This transparency into our engines enables fine-grained control over pricing, leading to more accurate odds, a more efficient sportsbook, and an elevated customer experience.

Composing High-Accuracy AI Systems With SLMs and Mini-Agents

2025-06-10 Watch
talk
Sharon Zhou (Lamini)

This session is repeated. For most companies, building compound AI systems remains aspirational. LLMs are powerful but imperfect, and their non-deterministic nature makes steering them to high accuracy a challenge. In this session, we’ll demonstrate how to build compound AI systems using SLMs and highly accurate mini-agents that can be integrated into agentic workflows. You'll learn about breakthrough techniques, including: memory RAG, an embedding algorithm that reduces hallucinations by using embed-time compute to generate contextual embeddings, improving indexing and retrieval; and memory tuning, a fine-tuning algorithm that reduces hallucinations by using a Mixture of Memory Experts (MoME) to specialize models with proprietary data. We’ll also share real-world examples (text-to-SQL, factual reasoning, function calling, code analysis and more) across various industries. With these building blocks, we’ll demonstrate how to create high-accuracy mini-agents that can be composed into larger AI systems.

Comprehensive Data Management and Governance With Azure Data Lake Storage

2025-06-10 Watch
talk
James Baker (Microsoft), Santhosh Pillai (Microsoft Corporation)

Given that data is the new oil, it must be treated as such. Organizations that pursue greater insight into their businesses and their customers must manage, govern, protect and observe the use of the data that drives these insights in an efficient, cost-effective, compliant and auditable manner, without degrading access to that data. Azure Data Lake Storage offers many features that allow customers to apply such controls and protections to their critical data assets. Understanding how these features behave (their granularity, cost and scale implications, and the degree of control or protection they apply) is essential to implementing a data lake that reflects the value contained within. In this session, we will discuss the various data protection, governance and management capabilities available now and upcoming in ADLS, including how deep integration with Azure Databricks can provide more comprehensive, end-to-end coverage for these concerns, yielding a highly efficient and effective data governance solution.

Data Modeling 101 for Data Lakehouse Demystified

2025-06-10 Watch
talk

This session is repeated. In today’s data-driven world, the Data Lakehouse has emerged as a powerful architectural paradigm that unifies the flexibility of data lakes with the reliability and structure of traditional data warehouses. However, organizations must adopt the right data modeling techniques to unlock its full potential and ensure scalability, maintainability and efficiency. This session is designed for beginners looking to demystify the complexities of data modeling for the lakehouse and make informed design decisions. We’ll break down Medallion Architecture, explore key data modeling techniques and walk through the maturity stages of a successful data platform — transitioning from raw, unstructured data to well-organized, query-efficient models.
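As a loose sketch of the Medallion Architecture progression the session covers, from raw bronze through validated silver to query-efficient gold, here is a minimal pure-Python analogy. On an actual lakehouse each layer would be a Delta table and the transforms would run in Spark or a declarative pipeline; the record fields and function names here are invented for illustration:

```python
# Bronze: raw, as-ingested records (duplicates, bad rows, strings everywhere).
bronze = [
    {"order_id": "1", "amount": "19.99", "region": "west"},
    {"order_id": "1", "amount": "19.99", "region": "west"},  # duplicate
    {"order_id": "2", "amount": "oops", "region": "east"},   # bad amount
    {"order_id": "3", "amount": "5.00", "region": "east"},
]

def to_silver(rows):
    """Silver: deduplicate, enforce types, drop rows that fail validation."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # in practice you'd quarantine these, not drop them
        seen.add(row["order_id"])
        silver.append({"order_id": row["order_id"], "amount": amount,
                       "region": row["region"]})
    return silver

def to_gold(rows):
    """Gold: a query-efficient aggregate (revenue per region)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

print(to_gold(to_silver(bronze)))  # {'west': 19.99, 'east': 5.0}
```

The point of the pattern is that each layer has one job: bronze preserves everything as landed, silver enforces the contract, and gold serves the consumers.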

Delta Lake and the Data Mesh

2025-06-10 Watch
talk
KyJah Keys (Nextdata)

Delta Lake has proven to be an excellent storage format. Coupled with the Databricks platform, the storage format has shined as a component of a distributed system on the lakehouse. The pairing of Delta and Spark provides an excellent platform, but users often struggle to perform comparable work outside of the Spark ecosystem. Tools such as delta-rs, Polars and DuckDB have brought access to users outside of Spark, but they are only building blocks of a larger system. In this 40-minute talk we will demonstrate how users can use data products on the Nextdata OS data mesh to interact with the Databricks platform to drive Delta Lake workflows. Additionally, we will show how users can build autonomous data products that interact with their Delta tables both inside and outside of the lakehouse platform. Attendees will learn how to integrate the Nextdata OS data mesh with the Databricks platform as both an external and integral component.

From Metadata to Agents: Building the future of content understanding with Coactive AI + Databricks

2025-06-10
talk
Augusto Moreno (NBC Universal), William Gaviria Rojas (Coactive AI)

Media enterprises generate vast amounts of visual content, but unlocking its full potential requires multimodal AI at scale. Coactive AI and NBCUniversal’s Corporate Decision Sciences team are transforming how enterprises discover and understand visual content. We explore how Coactive AI and Databricks — from Delta Sharing to Genie — can revolutionize media content search, tagging and enrichment, enabling new levels of collaboration. Attendees will see how this AI-powered approach fuels AI workflows, enhances BI insights and drives new applications — from automating cut sheet generation to improving content compliance and recommendations. By structuring and sharing enriched media metadata, Coactive AI and NBCU are unlocking deeper intelligence and laying the groundwork for agentic AI systems that retrieve, interpret and act on visual content. This session will showcase real-world examples of these AI agents and how they can reshape future content discovery and media workflows.

How Corning Harnesses Unity Catalog for Enhanced FinOps Maturity and Cost Optimization

2025-06-10 Watch
talk
Jibreal Hamenoo (Corning Incorporated), Matthew Kuehn (Databricks)

We will explore how leveraging Databricks' Unity Catalog has accelerated our FinOps maturity, enabling us to optimize platform utilization and achieve significant cost reductions. By implementing Unity Catalog, we've gained comprehensive visibility and governance over our data assets, leading to more informed decision-making and efficient resource allocation. Learn how Corning discovered actionable insights and applied best practices for utilizing Unity Catalog to streamline data management, enhance financial operations and drive substantial savings, and how you can do the same within your organization.

Migrating Legacy SAS Code to Databricks Lakehouse: What We Learned Along the Way

2025-06-10 Watch
talk
Dmitriy Alergant (Tier One Analytics Inc.), Matt Adams (PacificSource Health Plans)

At PacificSource Health Plans, a US health insurance company, we are on a successful multi-year journey to migrate our entire data and analytics ecosystem to a Databricks Enterprise Data Warehouse (lakehouse). A particular obstacle on this journey was a reporting data mart that relied on copious amounts of legacy SAS code applying sophisticated business logic transformations for membership, claims, premiums and reserves. This core data mart was driving many of our critical reports and analytics. In this session we will share the unique and somewhat unexpected challenges and complexities we encountered in migrating this legacy SAS code: how our partner (T1A) leveraged automation technology (Alchemist) and some unique approaches to reverse engineer (analyze), instrument, translate, migrate, validate and reconcile these jobs, and what lessons we learned and carried forward from this migration effort.

Orchestration With Lakeflow Jobs

2025-06-10 Watch
talk
Saad Ansari (Databricks), Anthony Podgorsak (Databricks)

This session is repeated. Curious about orchestrating data pipelines on Databricks? Join us for an introduction to Lakeflow Jobs (formerly Databricks Workflows) — an easy-to-use orchestration service built into the Databricks Data Intelligence Platform. Lakeflow Jobs simplifies automating your data and AI workflows, from ETL pipelines to machine learning model training. In this beginner-friendly session, you'll learn how to: build and manage pipelines using a visual approach; monitor workflows and rerun failures with repair runs; automate tasks like publishing dashboards or ingesting data using Lakeflow Connect; add smart triggers that respond to new files or table updates; and use built-in loops and conditions to reduce manual work and make workflows more dynamic. We’ll walk through common use cases, share demos and offer tips to help you get started quickly. If you're new to orchestration or just getting started with Databricks, this session is for you.
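To make the orchestration concepts in this abstract concrete (dependency ordering, skipping downstream tasks when a parent fails, and repair runs that rerun only what failed), here is a hypothetical toy scheduler in plain Python. It is not the Lakeflow Jobs API; all names and the shape of the interface are invented for the sketch:

```python
# Toy task orchestrator illustrating dependency ordering and "repair runs".

def run_dag(tasks, deps, results=None):
    """Run zero-arg callables in dependency order.

    tasks:   name -> callable
    deps:    name -> list of upstream task names
    results: outcomes of a previous run; a "repair run" passes these in
             so tasks that already succeeded are not executed again.
    """
    results = dict(results or {})
    pending = [name for name in tasks if results.get(name) != "success"]
    progressed = True
    while pending and progressed:
        progressed = False
        for name in list(pending):
            upstream = deps.get(name, [])
            if any(results.get(u) in ("failed", "skipped") for u in upstream):
                results[name] = "skipped"   # a parent failed: don't run
            elif all(results.get(u) == "success" for u in upstream):
                try:
                    tasks[name]()
                    results[name] = "success"
                except Exception:
                    results[name] = "failed"
            else:
                continue                    # upstream not finished yet
            pending.remove(name)
            progressed = True
    return results

calls = []
tasks = {
    "ingest": lambda: calls.append("ingest"),
    "transform": lambda: 1 / 0,             # fails on the first run
    "publish": lambda: calls.append("publish"),
}
deps = {"transform": ["ingest"], "publish": ["transform"]}

first = run_dag(tasks, deps)
# first: ingest succeeded, transform failed, publish was skipped

tasks["transform"] = lambda: calls.append("transform")  # "fix" the task
repair = run_dag(tasks, deps, results=first)            # repair run
# only transform and publish rerun; ingest is not executed again
```

The repair-run behavior (passing the previous run's results back in) is the interesting part: a real orchestrator persists those outcomes so a failed pipeline can resume without recomputing expensive upstream steps.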

Revolutionizing Data Insights and the Buyer Experience at GM Financial with Cloud Data Modernization

2025-06-10 Watch
talk
Latha Subramanian (GM Financial), Rick Whitford (Deloitte Consulting, LLP)

Deloitte and GM (General Motors) Financial have collaborated to design and implement a cutting-edge cloud analytics platform, leveraging Databricks. In this session, we will explore how we overcame challenges including dispersed and limited data capabilities, high-cost hardware and outdated software, with a strategic and comprehensive approach. With the help of Deloitte and Databricks, we were able to develop a unified Customer360 view, integrate advanced AI-driven analytics, and establish robust data governance and cyber security measures. Attendees will gain valuable insights into the benefits realized, such as cost savings, enhanced customer experiences, and broad employee upskilling opportunities. Unlock the impact of cloud data modernization and advanced analytics in the automotive finance industry and beyond with Deloitte and Databricks.

Securing Data Collaboration: A Deep Dive Into Security, Frameworks, and Use Cases

2025-06-10 Watch
talk
El Ghali Benchekroun (Databricks), Bilal Obeidat (Databricks), Bhavin Kukadia (Databricks)

This session will focus on the security aspects of Databricks Delta Sharing, Databricks Cleanrooms and Databricks Marketplace, providing an exploration of how these solutions enable secure and scalable data collaboration while prioritizing privacy. Highlights: use cases — understand how Delta Sharing facilitates governed, real-time data exchange across platforms and how Cleanrooms support multi-party analytics without exposing sensitive information; security internals — dive into Delta Sharing's security frameworks; dynamic views — learn about fine-grained security controls; privacy-first Cleanrooms — explore how Cleanrooms enable secure analytics while maintaining strict data privacy standards; private exchanges — explore the role of private exchanges using Databricks Marketplace in securely sharing custom datasets and AI models with specific partners or subsidiaries; network security & compliance — review best practices for network configurations and compliance measures.

Simplifying Training and GenAI Finetuning Using Serverless GPU Compute

2025-06-10 Watch
talk
Tejas Sundaresan (Databricks)

The last year has seen rapid progress in open source GenAI models and frameworks. This talk covers best practices for custom training and OSS GenAI fine-tuning on Databricks, powered by the newly announced Serverless GPU Compute. We’ll cover how to use Serverless GPU Compute to power AI training and GenAI fine-tuning workloads, with framework support for libraries like LLM Foundry, Composer, Hugging Face, and more. Lastly, we’ll cover how to leverage MLflow and the Databricks Lakehouse to streamline the end-to-end development of these models. Key takeaways include: how Serverless GPU Compute saves customers valuable developer time and overhead when dealing with GPU infrastructure; best practices for training custom deep learning models (forecasting, recommendation, personalization) and fine-tuning OSS GenAI models on GPUs across the Databricks stack; leveraging distributed GPU training frameworks (e.g., PyTorch, Hugging Face) on Databricks; and streamlining the path to production for these models. Join us to learn about the newly announced Serverless GPU Compute and the latest updates to GPU training and fine-tuning on Databricks!

Sponsored by: Amperity | Transforming Guest Experiences: GoTo Foods’ Data Journey with Amperity & Databricks

2025-06-10 Watch
talk
Brett Newcome (GoTo Foods), Manuel Valdes (GoTo Foods)

GoTo Foods, the platform company behind brands like Auntie Anne’s, Cinnabon, Jamba, and more, set out to turn a fragmented data landscape into a high-performance customer intelligence engine. In this session, CTO Manuel Valdes and Director of Marketing Technology Brett Newcome share how they unified data using Databricks Delta Sharing and Amperity’s Customer Data Cloud to speed up time to market. As part of GoTo’s broader strategy to support its brands with shared enterprise tools, the team: unified loyalty, catering, and retail data into one customer view; cut campaign lead times from weeks to hours; activated audiences in real time without straining engineering; and unlocked new revenue through smarter segmentation and personalization.

Sponsored by: Cognizant | Toyota Utilizes a Unified Lakehouse Approach with Databricks

2025-06-10
talk
Rajesh Emani (Toyota Motors North America), Satish Hegde (Cognizant)

Toyota, the world’s largest automaker, sought to accelerate time-to-data and empower business users with secure data collaboration for faster insights. Partnering with Cognizant, they established a Unified Data Lake, integrating SOX principles and Databricks Unity Catalog to ensure compliance and security. Additionally, they developed a Data Scanner solution to automatically detect non-sensitive data and accelerate data ingestion. Join this dynamic session to discover how they achieved it.

Sponsored by: Microsoft | Leverage the power of the Microsoft Ecosystem with Azure Databricks

2025-06-10 Watch
talk
Anavi Nahar (Microsoft)

Join us for this insightful session to learn how you can leverage the power of the Microsoft ecosystem along with Azure Databricks to take your business to the next level. Azure Databricks is a fully integrated, native, first-party solution on Microsoft Azure. Databricks and Microsoft continue to actively collaborate on product development, ensuring tight integration, optimized performance, and a streamlined support experience. Azure Databricks offers seamless integrations with Power BI, Azure OpenAI, Microsoft Purview, Azure Data Lake Storage (ADLS) and Foundry. In this session, you’ll learn how you can leverage the deep integration between Azure Databricks and Microsoft solutions to empower your organization to do more with your data estate. You’ll also get an exclusive sneak peek into the product roadmap.