Data + AI Summit 2025

Securing Databricks Using Databricks as SIEM

2025-06-11

talk

Kishore Prabakaran Fernando (Databricks) , Yugi Reddy (Databricks)

Data Collection Databricks Cyber Security

Securing Databricks using Databricks as SIEM showcases our approach on how we leverage Databricks product capabilities to prevent and mitigate security risks for Databricks. It demonstrates how Databricks can serve as a powerful Security Information and Event Management (SIEM) platform, offering advanced capabilities for data collection and threat detection. This session explores data collection from diverse data sources and real-time threat detection.

Simplified Delta Sharing With Network Security

2025-06-11 Watch

networking

Krishna Puttaswamy (Databricks) , Samrat Ray (Databricks)

Cloud Computing Delta Cyber Security

Delta Sharing enables cross-domain sharing of data assets for collaboration. A practical concern providers and recipients face in doing so is the need to manually configure network and storage firewalls. This is particularly challenging for large-scale providers and recipients with strict compliance requirements. In this talk, we will describe our solution to fully eliminate these complexities. This enhances user experience, scalability and security, facilitating seamless data collaboration across diverse environments and cloud platforms.

Simplify Data Ingest and Egress with the New Python Data Source API

2025-06-11 Watch

talk

Craig Lukasik (Databricks)

API Data Engineering Databricks Python Spark

Data engineering teams are frequently tasked with building bespoke ingest and/or egress solutions for myriad custom, proprietary, or industry-specific data sources or sinks. Many teams find this work cumbersome and time-consuming. Recognizing these challenges, Databricks interviewed numerous companies across different industries to better understand their diverse data integration needs. This comprehensive feedback led us to develop the Python Data Source API for Apache Spark™.

Sponsored by: Google Cloud | Unleash the power of Gemini for Databricks

Transforming Data Governance for Multimodal Data at Amgen With Databricks

2025-06-11 Watch

talk

Jaison Dominic (Amgen) , Jinesh Kunjumon (AMGEN)

Data Governance Databricks GenAI Fabric

Amgen is advancing its Enterprise Data Fabric to securely manage sensitive multimodal data, such as imaging and research data, across formats.Databricks is already the de facto standard for governance on structured data, and Amgen seeks to extend it for unstructured multi modal data too. This approach will also allow Amgen to standardize its GenAI projects on Databricks. Key priorities include: Centralized data access: establishing a unified, secure access control system Enhanced traceability: implementing detailed processes for transparency and accountability Consistent access standards: ensuring uniform data access privilege experience User support: providing flexible access for diverse stakeholders Comprehensive auditing: enabling thorough permission audits and data usage tracking Learn strategies for implementing a comprehensive multimodal data governance framework using Databricks, as we share our experience on standardizing data governance for GenAI use cases.

Turn Genie Into an Agent Using Conversation APIs

2025-06-11 Watch

talk

Prithvi Kannan (Databricks) , Hanlin Sun (Databricks)

AI/ML Analytics API BI SQL

Transform your AI/BI Genie into a text-to-SQL powerhouse using the Genie Conversation APIs. This session explores how Genie functions as an intelligent agent, translating natural language queries into SQL to accelerate insights and enhance self-service analytics. You'll learn practical techniques for configuring agents, optimizing queries and handling errors — ensuring Genie delivers accurate, relevant responses in real time. A must-attend for teams looking to level up their AI/BI capabilities and deliver smarter analytics experiences.

Unlocking Access: Simplifying Identity Management at Scale With Databricks

2025-06-11 Watch

talk

Keegan Dubbs (Databricks) , Hari Selvarajan (Databricks)

Azure Cloud Computing Databricks GCP

Effective Identity and Access Management (IAM) is essential for securing enterprise environments while enabling innovation and collaboration. As companies scale, ensuring users have the right access without adding administrative overhead is critical. In this session, we’ll explore how Databricks is simplifying identity management by integrating with customers’ Identity Providers (IDPs). Learn about Automatic Identity Management in Azure Databricks, which eliminates SCIM for Entra ID users and ensures scalable identity provisioning for other IDPs. We'll also cover externally managed groups, PIM integration and upcoming enhancements like a bring-your-own-IDP model for Google Cloud. Through a customer success story and live demo, see how Databricks is making IAM more scalable, secure and user-friendly.

Industry Forum Networking Reception

2025-06-11

networking

AI/ML

Data + AI Summit brings together thousands of leaders from across the industry ecosystem, sharing ideas and practical applications for data and AI. At 5:00pm on Tuesday, June 10 following the Industry & Solution Forums, join us in Moscone West Level 2 for a networking event (refreshments provided) and connect with industry peers!

Building Trustworthy AI at Northwestern Mutual: Guardrail Technologies and Strategies

2025-06-10 Watch

lightning_talk

Nicholas Brathwaite (Northwestern Mutual)

AI/ML Databricks LLM

This intermediate-level presentation will explore the various methods we've leveraged within Databricks to deliver and evaluate guardrail models for AI safety. From prompt engineering with custom built frameworks to hosting models served from the market place and beyond. We've utilized GPU within clusters to fine-tune and run large open sourced models at inference such as Llama Guard 3.1 and generate synthetic datasets based on questions we've received from production.

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework

2025-06-10 Watch

lightning_talk

Gowri Shankar (Doordash) , Chao Wang (DoorDash)

Analytics Data Governance Delta Marketing

The "Doordash Customer 360 Data Store" represents a foundational step in centralizing and managing customer profile to enable targeting and personalized customer experiences built on Delta Lake. This presentation will explore the initial goals and architecture of the Customer 360 Data Store, its journey to becoming a robust entity management framework, and the challenges and opportunities encountered along the way. We will discuss how the evolution addressed scalability, data governance and integration needs, enabling the system to support dynamic and diverse use cases, including customer lifecycle analytics, marketing campaign targeting using segmentation. Attendees will gain insight into key design principles, technical innovations and strategic decisions that transformed the system into a flexible platform for entity management, positioning it as a critical enabler of data-driven growth at Doordash. Audio for this session is delivered in the conference mobile app, you must bring your own headphones to listen.

No Time for the Dad Bod: Automating Life with AI and Databricks

2025-06-10 Watch

lightning_talk

Sean Falconer (Confluent)

AI/ML Data Management Databricks Delta GenAI

Life as a father, tech leader, and fitness enthusiast demands efficiency. To reclaim my time, I’ve built AI-driven solutions that automate everyday tasks—from research agents that prep for podcasts to multi-agent systems that plan meals—all powered by real-time data and automation. This session dives into the technical foundations of these solutions, focusing on event-driven agent design and scalable patterns for robust AI systems. You’ll discover how Databricks technologies like Delta Lake, for reliable and scalable data management, and DSPy, for streamlining the development of generative AI workflows, empower seamless decision-making and deliver actionable insights. Through detailed architecture diagrams and a live demo, I’ll showcase how to design systems that process data in motion to tackle complex, real-world problems. Whether you’re an engineer, architect, or data scientist, you’ll leave with practical strategies to integrate AI-driven automation into your workflows.

Scaling Data Quality at Zillow: Migrating and Enhancing Data Quality Systems on Databricks

2025-06-10

lightning_talk

Laura Zhou (Zillow) , Firas Farah (Databricks)

Data Lakehouse Data Quality Databricks

Zillow has well-established, comprehensive systems for defining and enforcing data quality contracts and detecting anomalies.In this session, we will share how we evaluated Databricks’ native data quality features and why we chose Lakeflow Declarative Pipelines expectations for Lakeflow Declarative Pipelines, along with a combination of enforced constraints and self-defined queries for other job types. Our evaluation considered factors such as performance overhead, cost and scalability. We’ll highlight key improvements over our previous system and demonstrate how these choices have enabled Zillow to enforce scalable, production-grade data quality.Additionally, we are actively testing Databricks’ latest data quality innovations, including enhancements to lakehouse monitoring and the newly released DQX project from Databricks Labs.In summary, we will cover Zillow’s approach to data quality in the lakehouse, key lessons from our migration and actionable takeaways.

Sponsored by: Actian | Beyond the Lakehouse: Unlocking Enterprise-Wide AI-Ready Data with Unified Metadata Intelligence

Traditional MDM is Dead. How Next-Generation Data Products are Winning the Enterprise

2025-06-10 Watch

lightning_talk

Dan Onions (Quantexa)

AI/ML Databricks

Organizations continue to struggle under the weight of data that still exists across multiple siloed sources, leaving data teams caught between their crumbling legacy data foundations and the race to build new AI and data-driven applications. Modern enterprises are quickly pivoting to data products that simplify and improve reusable data pipelines by joining data at massive scale and publishing it for internal users and the applications that drive business outcomes. Learn how Quantexa with Databricks enables an internal data marketplace to deliver the value that traditional data platforms never could.

Unlocking the Power of Iceberg: Our Journey to a Unified Lakehouse on Databricks

2025-06-10 Watch

lightning_talk

Tomer Sabag (LSports)

Data Engineering Data Lakehouse Databricks Iceberg

This session showcases our journey of adopting Apache Iceberg™ to build a modern lakehouse architecture and leveraging Databricks advanced Iceberg support to take it to the next level. We’ll dive into the key design principles behind our lakehouse, the operational challenges we tackled and how Databricks enabled us to unlock enhanced performance, scalability and streamlined data workflows. Whether you’re exploring Apache Iceberg™ or building a lakehouse on Databricks, this session offers actionable insights, lessons learned and best practices for modern data engineering.

Accelerate End-to-End Multi-Agents on Databricks and DSPy

2025-06-10 Watch

talk

Austin Choi (Databricks)

AI/ML Databricks ETL/ELT GenAI LLM

A production-ready GenAI application is more than the framework itself. Like ML, you need a unified platform to create an end-to-end workflow for production quality applications.Below is an example of how this works on Databricks: Data ETL with Lakeflow Declarative Pipelines and jobs Data storage for governance and access with Unity Catalog Code development with Notebooks Agent versioning and metric tracking with MLflow and Unity Catalog Evaluation and optimizations with Mosaic AI Agent Framework and DSPy Hosting infrastructure with monitoring with Model Serving and AI Gateway Front-end apps using Databricks Apps In this session, learn how to build agents to access all your data and models through function calling. Then, learn how DSPy enables agent interaction with each other to ensure the question is answered correctly. We will demonstrate a chatbot, powered by multiple agents, to be able to answer questions and reason answers the base LLM does not know and very specialized topics.ow and very specialized topics.

AI Meets SQL: Leverage GenAI at Scale to Enrich Your Data

2025-06-10 Watch

talk

Sid Taneja (Databricks) , Youngbin Kim (Databricks)

AI/ML Databricks GenAI LLM NLP RAG

This session is repeated. Integrating AI into existing data workflows can be challenging, often requiring specialized knowledge and complex infrastructure. In this session, we'll share how SQL users can leverage AI/ML to access large language models (LLMs) and traditional machine learning directly from within SQL, simplifying the process of incorporating AI into data workflows. We will demonstrate how to use Databricks SQL for natural language processing, traditional machine learning, retrieval augmented generation and more. You'll learn about best practices and see examples of solving common use cases such as opinion mining, sentiment analysis, forecasting and other common AI/ML tasks.

talk-data.com

Top Topics

Top Speakers

Securing Databricks Using Databricks as SIEM

Simplified Delta Sharing With Network Security

Simplify Data Ingest and Egress with the New Python Data Source API

Sponsored by: Google Cloud | Unleash the power of Gemini for Databricks

Sponsored by: Hightouch | Unleashing AI at PetSmart: Using AI Decisioning Agents to Drive Revenue

Sponsored by: Onehouse | Open By Default, Fast By Design: One Lakehouse That Scales From BI to AI

Sponsored by: RowZero | Spreadsheets in the modern data stack: security, governance, AI, and self-serve analytics

Sponsored by: Snowplow | Snowplow Signals: Powering Tomorrow’s Customer Experiences on Databricks

Sponsored by: Tealium | Personalizing Experiences and Improving Engagement with a Modernized Data Infrastructure

Transforming Data Governance for Multimodal Data at Amgen With Databricks

Turn Genie Into an Agent Using Conversation APIs

Unlocking Access: Simplifying Identity Management at Scale With Databricks

Industry Forum Networking Reception

Building Trustworthy AI at Northwestern Mutual: Guardrail Technologies and Strategies

Doordash Customer 360 Data Store and its Evolution to Become an Entity Management Framework

No Time for the Dad Bod: Automating Life with AI and Databricks

Scaling Data Quality at Zillow: Migrating and Enhancing Data Quality Systems on Databricks

Sponsored by: Actian | Beyond the Lakehouse: Unlocking Enterprise-Wide AI-Ready Data with Unified Metadata Intelligence

Sponsored by: Alation | Better Together: Enterprise Catalog with Databricks & Alation at American Airlines

Sponsored by: Fivetran | Raw Data to Real-Time Insights: How Dropbox Revolutionized Data Ingestion

Sponsored by: Slalom | Nasdaq's Journey from Fragmented Customer Data to AI-Ready Insights

Traditional MDM is Dead. How Next-Generation Data Products are Winning the Enterprise

Unlocking the Power of Iceberg: Our Journey to a Unified Lakehouse on Databricks

Accelerate End-to-End Multi-Agents on Databricks and DSPy

AI Meets SQL: Leverage GenAI at Scale to Enrich Your Data