talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

715

Sessions & talks

Showing 376–400 of 715 · Newest first

What’s New in Security and Compliance on the Databricks Data Intelligence Platform

2025-06-11 Watch
talk
Filippo Seracini (Databricks) , Suresh Thiru (Databricks)

In this session, we’ll walk through the latest advancements in platform security and compliance on Databricks — from networking updates to encryption, serverless security and new compliance certifications across AWS, Azure and Google Cloud. We’ll also share our roadmap and best practices for how to securely configure workloads on Databricks SQL Serverless, Unity Catalog, Mosaic AI and more — at scale. If you're building on Databricks and want to stay ahead of evolving risk and regulatory demands, this session is your guide.

What’s new with Collaboration: Delta Sharing, Clean Room, Marketplace and the Ecosystem

2025-06-11 Watch
talk
Tao Tao (Databricks) , Harish Gaur (Databricks)

Databricks continues to redefine how organizations securely and openly collaborate on data. With new innovations like Clean Rooms for multi-party collaboration, Sharing for Lakehouse Federation, cross-platform view sharing and Databricks Apps in the Marketplace, teams can now share and access data more easily, cost-effectively and across platforms — whether or not they’re using Databricks. In this session, we’ll deliver live demos of key capabilities that power this transformation:

Delta Sharing: The industry’s only open protocol for seamless cross-platform data sharing
Databricks Marketplace: A central hub for discovering and monetizing data and AI assets
Clean Rooms: A privacy-preserving solution for secure, multi-party data collaboration

Join us to see how these tools enable trusted data sharing, accelerate insights and drive innovation across your ecosystem. Bring your questions and walk away with practical ways to put these capabilities into action today.

Your Wish is AI Command — Get to Grips With Databricks Genie

2025-06-11 Watch
talk
Simon Whiteley (Advancing Analytics)

Picture the scene — you're exploring a deep, dark cave looking for insights to unearth when, in a burst of smoke, Genie appears and offers you not three but unlimited data wishes. This isn't a folk tale; it's the growing wave of Generative BI that is going to be a part of analytics platforms. Databricks Genie is a tool powered by a SQL-writing LLM that redefines how we interact with data. We'll look at the basics of creating a new Genie room, scoping its data tables and asking questions. We'll help it out with some complex pre-defined questions and ensure it has the best chance of success. We'll give the tool a personality, set some behavioural guidelines and prepare some hidden easter eggs for our users to discover. Generative BI is going to be a fundamental part of the analytics toolset used across businesses. If you're using Databricks, you should be aware of Genie; if you're not, you should be planning your Generative BI roadmap, and this session will answer your wishes.

Summit Live: What's New With Databricks and AI

2025-06-11 Watch
talk
Ari Kaplan (Databricks) , Jonathan Frankle (Databricks)

Databricks has been innovating in the AI landscape faster than ever before. Hear what's new directly from Jonathan Frankle, Chief AI Scientist at Databricks, who co-founded MosaicML and now leads the AI research team.

Over Architected: LIVE

2025-06-11
lightning_talk
Holly Smith (Databricks) , Nick Karpov (Databricks)

Join a live recording of the Over Architected Databricks podcast with Nick and Holly as they take the hottest features for the coming week and try to shoehorn them into one architecture. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

Learn How the Virtue Foundation Saves Lives by Optimizing Health Care Delivery Across the Globe

2025-06-11 Watch
lightning_talk
Joan LaRovere (Virtue Foundation)

The Virtue Foundation uses cutting-edge techniques in AI to optimize global health care delivery to save lives. With Unity Catalog as a foundation, they are using advanced Gen AI with model serving, vector search and MLflow to radically change how they map volunteer health resources to the right locations and facilities. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

Welcome Reception

2025-06-11
talk

Join us at our Welcome Reception at the Expo to socialize with your fellow data professionals while enjoying food and drinks.

Advanced Data Access Control for the Exabyte Era: Scaling with Purpose

2025-06-11 Watch
talk
Arpan Ghosh (Databricks) , Shuting Zhang (Databricks)

As data-driven companies scale from small startups to global enterprises, managing secure data access becomes increasingly complex. Traditional access control models fall short at enterprise scale, where dynamic, purpose-driven access is essential. In this talk, we explore how our “Just-in-Time” Purpose-Based Access Control (PBAC) platform addresses the evolving challenges of data privacy and compliance, maintaining least privilege while ensuring productivity. Using features like Unity Catalog, Delta Sharing & Databricks Apps, the platform delivers real-time, context-aware data governance. Leveraging JIT PBAC keeps your data secure, your engineers productive, your legal & security teams happy and your organization future-proof in the ever-evolving compliance landscape.
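To make the "Just-in-Time" purpose-based model concrete, here is a toy sketch in plain Python: access is granted against a declared purpose and expires automatically, so standing privileges never accumulate. This is an illustration of the concept only, not Databricks' platform; the names `AccessBroker` and `PurposeGrant` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

@dataclass
class PurposeGrant:
    principal: str
    table: str
    purpose: str
    expires_at: datetime

class AccessBroker:
    """Toy just-in-time, purpose-based access check (illustrative only)."""

    def __init__(self):
        self._grants: list[PurposeGrant] = []

    def grant(self, principal, table, purpose, ttl_minutes=30):
        # Access is granted just-in-time, scoped to a stated purpose and a TTL,
        # preserving least privilege without blocking day-to-day work.
        g = PurposeGrant(principal, table, purpose,
                         datetime.utcnow() + timedelta(minutes=ttl_minutes))
        self._grants.append(g)
        return g

    def can_read(self, principal, table, purpose):
        now = datetime.utcnow()
        return any(g.principal == principal and g.table == table
                   and g.purpose == purpose and g.expires_at > now
                   for g in self._grants)

broker = AccessBroker()
broker.grant("eng@example.com", "main.sales.orders", "fraud_investigation")
print(broker.can_read("eng@example.com", "main.sales.orders", "fraud_investigation"))  # True
print(broker.can_read("eng@example.com", "main.sales.orders", "marketing"))            # False
```

The same request succeeds or fails depending on the declared purpose, which is the core idea behind context-aware, purpose-scoped governance.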

A Practitioner’s Guide to Databricks Serverless

2025-06-11 Watch
talk

This session is repeated. Databricks Serverless revolutionizes data engineering and analytics by eliminating the complexities of infrastructure management. This talk will provide an overview of this powerful serverless compute option, highlighting how it enables practitioners to focus solely on building robust data pipelines. We'll explore the core benefits, including automatic scaling, cost optimization and seamless integration with the Databricks ecosystem. Learn how serverless workflows simplify the orchestration of various data tasks, from ingestion to dashboards, ultimately accelerating time-to-insight and boosting productivity. This session is ideal for data engineers, data scientists and analysts looking to leverage the agility and efficiency of serverless computing in their data workflows.

Databricks as the Backbone of MLOps: From Orchestration to Inference

2025-06-11 Watch
talk
Reinier Veral (Globe Telecom) , Cyd Kristoff Redelosa (Globe Telecom)

As machine learning (ML) models scale in complexity and impact, organizations must establish a robust MLOps foundation to ensure seamless model deployment, monitoring and retraining. In this session, we’ll share how we leverage Databricks as the backbone of our MLOps ecosystem — handling everything from workflow orchestration to large-scale inference. We’ll walk through our journey of transitioning from fragmented workflows to an integrated, scalable system powered by Databricks Workflows. You’ll learn how we built an automated pipeline that streamlines model development, inference and monitoring while ensuring reliability in production. We’ll also discuss key challenges we faced, lessons learned and best practices for organizations looking to operationalize ML with Databricks.

Delta and Databricks as a Performant Exabyte-Scale Application Backend

2025-06-11 Watch
lightning_talk
Scott Schenkein (Capital One Financial)

The Delta Lake architecture promises to provide a single, highly functional, high-scale copy of data that can be leveraged by a variety of tools to satisfy a broad range of use cases. To date, most use cases have focused on interactive data warehousing, ETL, model training, and streaming. Real-time access is generally delegated to costly and sometimes difficult-to-scale NoSQL, indexed storage, and domain-specific specialty solutions, which provide limited functionality compared to Spark on Delta Lake. In this session, we will explore the Delta data-skipping and optimization model and discuss how Capital One leveraged it along with Databricks Photon and Spark Connect to implement a real-time web application backend. We’ll share how we built a highly functional, performant, and cost-effective security information and event management user experience (SIEM UX).
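The data-skipping model the session refers to rests on per-file column statistics: Delta tracks min/max values for each data file, so a query can prune files whose value ranges cannot contain matching rows. A minimal sketch of that pruning idea in plain Python (the file names and statistics below are invented for illustration):

```python
# Toy illustration of Delta-style data skipping: each data file carries
# min/max statistics per column, and a range query only scans files whose
# [min, max] interval overlaps the predicate's range.

files = [
    {"path": "part-000.parquet", "min_ts": 100, "max_ts": 199},
    {"path": "part-001.parquet", "min_ts": 200, "max_ts": 299},
    {"path": "part-002.parquet", "min_ts": 300, "max_ts": 399},
]

def files_to_scan(files, lo, hi):
    """Keep only files whose min/max range overlaps the query range [lo, hi]."""
    return [f["path"] for f in files if f["max_ts"] >= lo and f["min_ts"] <= hi]

# A narrow range query touches only the middle file:
print(files_to_scan(files, 250, 260))  # ['part-001.parquet']
```

Clustering related values into the same files (e.g. via `OPTIMIZE ... ZORDER BY`) tightens these per-file ranges, which is what makes skipping effective enough to serve low-latency workloads.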

Empowering Business Users With Databricks — Integrating AI/BI Genie With Microsoft Teams

2025-06-11 Watch
talk
Nathan Sundararajan (Rooms To Go) , Ryan Bates (Databricks)

In this session, we'll explore how Rooms To Go enhances organizational collaboration by integrating AI/BI Genie with Microsoft Teams. Genie enables warehouse employees and members of the sales team to interact with data using natural language, simplifying data exploration and analysis. By connecting Genie to Microsoft Teams, we bring real-time data insights directly to a user’s phone. We'll provide a comprehensive overview on setting up this integration as well as a demo of how the team uses it daily. Attendees will gain practical knowledge to implement this integration, empowering their teams to access and interact with data seamlessly within Microsoft Teams.

Enterprise Cost Management for Data Warehousing with Databricks SQL

2025-06-11 Watch
talk
Patrick Yang (Databricks) , Joo Ho Yeo (Databricks)

This session shows you how to gain visibility into your Databricks SQL spend and ensure cost efficiency. Learn about the latest features to gain detailed insights into Databricks SQL expenses so you can easily monitor and control your costs. Find out how you can enable attribution to internal projects, understand the Total Cost of Ownership, set up proactive controls and find ways to continually optimize your spend.

Enterprise Financial Crime Detection: A Lakehouse Framework for FATF, Basel III, and BSA Compliance

2025-06-11 Watch
talk
Deepak Khetpal (Tiger Analytics) , Surya Sai Turaga (Databricks)

We will present a framework for FinCrime detection leveraging the Databricks lakehouse architecture, specifically how institutions can achieve both the data flexibility and the ACID transaction guarantees essential for FinCrime monitoring. The framework incorporates advanced ML models for anomaly detection, pattern recognition and predictive analytics, while maintaining the clear data lineage and audit trails required by regulatory bodies. We will also discuss specific improvements in false-positive reduction, detection speed and regulatory reporting turnaround, and delve into how the architecture addresses specific FATF recommendations, Basel III risk management requirements and BSA compliance obligations, particularly in transaction monitoring and Suspicious Activity Report (SAR) filing. The ability to handle structured and unstructured data while maintaining data quality and governance makes the framework particularly valuable for large financial institutions dealing with complex, multi-jurisdictional compliance requirements.

From Imperative to Declarative Paradigm: Rebuilding a CI/CD Infrastructure Using Hatch and DABs

2025-06-11 Watch
talk
Luigi Di Tacchio (FreeWheel, a Comcast Company)

Building and deploying PySpark pipelines to Databricks should be effortless. However, our team at FreeWheel has, for the longest time, struggled with a convoluted and hard-to-maintain CI/CD infrastructure. It followed an imperative paradigm, demanding that every project implement custom scripts to build artifacts and deploy resources, resulting in redundant boilerplate code and awkward interactions with the Databricks REST API. We set our minds on rebuilding it from scratch, following a declarative paradigm instead. We will share how we eliminated thousands of lines of code from our repository, created a fully configuration-driven infrastructure onto which projects can be easily onboarded, and improved the quality of our codebase using Hatch and Databricks Asset Bundles (DABs) as our tools of choice. In particular, DABs have made deploying across our three environments a breeze and have allowed us to quickly adopt new features as soon as they are released by Databricks.
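For readers unfamiliar with Databricks Asset Bundles, a minimal and heavily abbreviated `databricks.yml` gives a feel for the declarative shape the talk describes. The bundle, job, and workspace names below are hypothetical, and a real bundle would also declare compute for the job:

```yaml
# Hypothetical, abbreviated Databricks Asset Bundle: one job wired to a
# wheel artifact, deployed per-target instead of via custom REST-API scripts.
bundle:
  name: my_pipeline

artifacts:
  default:
    type: whl
    build: hatch build   # Hatch builds the wheel declaratively

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: main
          python_wheel_task:
            package_name: my_pipeline
            entry_point: main
          libraries:
            - whl: ../dist/*.whl

targets:
  dev:
    default: true
  prod:
    workspace:
      host: https://example.cloud.databricks.com
```

With a file like this, `databricks bundle deploy -t prod` replaces the per-project build-and-deploy scripting the imperative setup required.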

Future of Anti-Cheat With Riot Games

2025-06-11 Watch
talk
Carly Taylor (Databricks) , Phillip Koskinas (Riot Games)

As online gaming evolves, so do cheating methods that exploit client-server vulnerabilities. Traditional anti-cheat, such as kernel-level drivers and runtime detections, has long been the primary defense. However, advanced cheats like Direct Memory Access (DMA) exploits and AI-powered Computer Vision (CV) hacks increasingly render client-side detection ineffective. This presentation examines the escalating arms race between cheat creators and developers, highlighting client-side limitations. With CV cheats mimicking human behavior, anti-cheat must shift toward server-side, data-driven detection. By leveraging AI, machine learning, and behavioral analytics to analyze player patterns, input anomalies, and decision inconsistencies, future solutions can move beyond static detection to adaptive security models, ensuring fair play at scale. The session will also include real-life examples from Riot Games’ anti-cheat efforts, specifically insights and case studies from the development and operation of Riot Vanguard, to illustrate how these strategies are applied in practice.

GenAI-Powered Shopping Assistant for Prada e-Commerce Search Bar

2025-06-11
talk
Simone Giordani (Data Reply IT) , Maria Paola Tatulli (Prada Group)

Prada has developed an advanced solution leveraging Mosaic AI to offer an interactive, natural-language product discovery capability that improves its e-commerce search bar. The backbone is a 70B-parameter model and a vector store, which work together with additional filtering and AI components to suggest not only the perfect outfit for each occasion but also alternative options and similar items.

Generating Zero-Shot Hard-Case Hallucinations: A Synthetic and Open Data Approach

2025-06-11 Watch
lightning_talk
Eric Tramel (NVIDIA)
LLM

We present a novel framework for designing and inducing controlled hallucinations in long-form content generation by LLMs across diverse domains. The purpose is to create fully-synthetic benchmarks and mine hard cases for iterative refinement of zero-shot hallucination detectors. We will first demonstrate how Gretel Data Designer (now part of NVIDIA) can be used to design realistic, high-quality long-context datasets across various domains. Second, we will describe our reasoning-based approach to hard-case mining. Specifically, our methodology relies on chain-of-thought-based generation of both faithful and deceptive question-answer pairs based upon long-context samples. Subsequently, a consensus labeling & detector framework is employed to filter synthetic examples to zero-shot hard cases. The result of this process is a fully-automated system, operating under open data licenses such as Apache-2.0, for the generation of hallucinations at the edge-of-capabilities for a target LLM to detect.

Integrating AI With Data: A Unified Strategy for Business

2025-06-11 Watch
talk

In the modern business landscape, AI and data strategies can no longer operate in isolation. To drive meaningful outcomes, organizations must align these critical components within a unified framework tied to overarching business objectives. This presentation explores the necessity of integrating AI and data strategies, emphasizing the importance of high-quality data, scalable architectures and robust governance. Attendees will learn three essential steps that need to be taken:

Recognize that AI requires the right data to succeed
Prioritize data quality and architecture
Establish strong governance practices

Additionally, the talk will highlight the cultural shift required to bridge IT and business silos, fostering roles that combine technical and business expertise. We’ll dive into specific practical steps that can be taken to ensure an organization has a cohesive and blended AI and data strategy, using specific case examples.

Maximize Retail Data Insights in Genie with DeltaSharing via Crisp’s Collaborative Commerce Platform

2025-06-11 Watch
lightning_talk
Julienne Biglin (Crisp)

Crisp streamlines a brand’s data ingestion across 60+ retail sources to build a foundation of sales and inventory intelligence on Databricks. Data is normalized and analysis-ready, and integrates seamlessly with AI tools such as Databricks’ Genie and Blueprints. This session will provide an overview of the Crisp retail data platform and how our semantic layer and normalized, harmonized data sets can help drive powerful insights for supply chain, BI/analytics and data science teams.

Meet Goose, an Open Source AI Agent

2025-06-11 Watch
talk
Bradley Axen (Block)

goose is an open source AI agent framework that allows anyone to connect language model output to real-world action. Released in January by Block (the company made up of Square, Cash App, Afterpay and TIDAL), its use cases range from vibe coding to connecting all of the internal apps and services an enterprise uses. It can be powered by any language model that has tool-calling capabilities. goose's modular design allows it to connect with any system through simple extensions. Built on the open Model Context Protocol (developed with Anthropic), goose transforms natural language into actions across various tools and services. Whether integrating with platforms like Jira and GitHub, or executing system commands and scripts, its plug-and-play architecture means anyone can extend goose's capabilities to suit their needs. Finally, goose has both a command line interface and a desktop app — it isn't limited to an IDE to start connecting to MCP servers and building powerful agentic workflows.

One-Stop Machine Translation Solution in Game Domain From Real-Time UGC Content to In-Game Text

2025-06-11 Watch
talk
Junxuan Huang (Tencent)

We present Level Infinite AI Translation, a translation engine developed by Tencent, tailored specifically for the gaming industry. The primary challenge in game machine translation (MT) lies in accurately interpreting the intricate context of game texts, effectively handling terminology and adapting to the highly diverse translation formats and stylistic requirements across different games. Traditional MT approaches cannot effectively address these challenges due to their weak context representation ability and lack of common knowledge. Leveraging large language models and related technologies, our engine is crafted to capture the subtleties of localized language expression while ensuring optimization for domain-specific terminology, jargon and required formats and styles. To date, the engine has been successfully implemented in 15 international projects, translating over one billion words across 23 languages, and has demonstrated cost savings exceeding 25% for partners.

Real-Time Analytics Pipeline for IoT Device Monitoring and Reporting

2025-06-11 Watch
talk
Nayan Sharma (CKDelta) , Padraic Kirrane (CK Delta)

This session will show how we implemented a solution to support high-frequency data ingestion from smart meters. We implemented a robust API endpoint that interfaces directly with IoT devices. This API processes messages in real time from millions of distributed IoT devices and meters across the network. The architecture leverages cloud storage as a landing zone for the raw data, followed by a streaming pipeline built on Lakeflow Declarative Pipelines. This pipeline implements a multi-layer medallion architecture to progressively clean, transform and enrich the data. The pipeline operates continuously to maintain near real-time data freshness in our gold layer tables. These datasets connect directly to Databricks Dashboards, providing stakeholders with immediate insights into their operational metrics. This solution demonstrates how modern data architecture can handle high-volume IoT data streams while maintaining data quality and providing accessible real-time analytics for business users.
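The bronze-to-silver-to-gold medallion flow described above can be sketched in plain Python to show the layering idea. The real pipeline runs on Lakeflow Declarative Pipelines over streaming data; the records and function names here are invented for illustration:

```python
# Toy medallion-architecture sketch: raw readings (bronze) are cleaned and
# typed (silver), then aggregated into dashboard-ready metrics (gold).

raw_messages = [  # bronze: raw IoT meter readings as they arrive
    {"meter_id": "m1", "kwh": "1.5", "ts": "2025-06-11T00:00:00"},
    {"meter_id": "m1", "kwh": "2.0", "ts": "2025-06-11T01:00:00"},
    {"meter_id": "m2", "kwh": "bad", "ts": "2025-06-11T00:30:00"},  # malformed
]

def to_silver(rows):
    """Clean and type raw rows, dropping records that fail validation."""
    out = []
    for r in rows:
        try:
            out.append({"meter_id": r["meter_id"],
                        "kwh": float(r["kwh"]),
                        "ts": r["ts"]})
        except ValueError:
            continue  # drop (or quarantine) malformed readings
    return out

def to_gold(rows):
    """Aggregate consumption per meter for the reporting layer."""
    totals: dict[str, float] = {}
    for r in rows:
        totals[r["meter_id"]] = totals.get(r["meter_id"], 0.0) + r["kwh"]
    return totals

silver = to_silver(raw_messages)
print(to_gold(silver))  # {'m1': 3.5}
```

In the streaming version, each layer is a continuously updated table, so the gold aggregates stay near real time as new meter messages land.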

Scaling Trust in BI: How Bolt Manages Thousands of Metrics Across Databricks, dbt, and Looker

2025-06-11 Watch
talk
Silja Märdla (Bolt) , Sarah Levy (Euno)

Managing metrics across teams can feel like everyone’s speaking a different language, which often leads to loss of trust in numbers. Based on a real-world use case, we’ll show you how to establish a governed source of truth for metrics that works at scale and builds a solid foundation for AI integration. You’ll explore how Bolt.eu’s data team governs consistent metrics for different data users and leverages Euno’s automations to navigate the overlap between Looker and dbt. We’ll cover best practices for deciding where your metrics belong and how to optimize engineering and maintenance workflows across Databricks, dbt and Looker. For curious analytics engineers, we’ll dive into thinking in dimensions & measures vs. tables & columns and determining when pre-aggregations make sense. The goal is to help you contribute to a self-serve experience with consistent metric definitions, so business teams and AI agents can access the right data at the right time without endless back-and-forth.