talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 Databricks Summit

Activities tracked

715

Sessions & talks

Showing 576–600 of 715 · Newest first

Deploying Databricks Asset Bundles (DABs) at Scale

2025-06-10 Watch
talk
Saad Ansari (Databricks), Pieter Noordhuis (Databricks)

This session is repeated. Managing data and AI workloads in Databricks can be complex. Databricks Asset Bundles (DABs) simplify this by enabling declarative, Git-driven deployment workflows for notebooks, jobs, Lakeflow Declarative Pipelines, dashboards, ML models and more. Join the DABs team for a deep dive and learn about:

The basics: understanding Databricks Asset Bundles. Declare, define and deploy assets, follow best practices, use templates and manage dependencies.
CI/CD & governance: automate deployments with GitHub Actions/Azure DevOps, manage dev vs. prod differences, and ensure reproducibility.
What's new and what's coming up: AI/BI Dashboard support, Databricks Apps support, a Pythonic interface and workspace-based deployment.

If you're a data engineer, ML practitioner or platform architect, this talk will provide practical insights to improve reliability, efficiency and compliance in your Databricks workflows.
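The declarative workflow described above centers on a `databricks.yml` bundle file. A minimal sketch of what such a file can look like (the bundle and resource names here are hypothetical, and the real schema supports many more fields):

```yaml
# databricks.yml: a minimal, illustrative bundle (names are hypothetical)
bundle:
  name: example_etl

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ./notebooks/etl.py
```

Deploying to a target is then a single CLI call such as `databricks bundle deploy -t dev`, which a CI system like GitHub Actions can invoke on every merge.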

Empowering Fundraising With AI: A Journey With Databricks Mosaic AI

2025-06-10 Watch
lightning_talk
Amina Alavi (Doctors Without Borders)

Artificial Intelligence (AI) is more than a corporate tool; it’s a force for good. At Doctors Without Borders/Médecins Sans Frontières (MSF), we use AI to optimize fundraising, ensuring that every dollar raised directly supports life-saving medical aid worldwide. With Databricks, Mosaic AI and Unity Catalog, we analyze donor behavior, predict giving patterns and personalize outreach, increasing contributions while upholding ethical AI principles. This session will showcase how AI maximizes fundraising impact, enabling faster crisis response and resource allocation. We’ll explore predictive modeling for donor engagement, secure AI governance with Unity Catalog and our vision for generative AI in fundraising, leveraging AI-assisted storytelling to deepen donor connections. AI is not just about efficiency; it’s about saving lives. Join us to see how AI-driven fundraising is transforming humanitarian aid on a global scale.

Federated Data Analytics Platform

2025-06-10 Watch
talk
Rohit Mathews (Databricks), Neo Ni (Databricks)

Are you struggling to keep up with rapid business changes that demand constant updates to your data pipelines? Is your data engineering team growing rapidly just to manage this complexity? Databricks was not immune to this challenge either. Managing our BI with contributions from hundreds of Product Engineering Teams across the company while maintaining central oversight and quality posed significant hurdles. Join us to learn how we developed a config-driven data pipeline framework using Metric Store and UC Metrics that helped us reduce engineering effort — achieving the work of 100 classical data engineers with just two platform engineers.

Getting Data AI Ready: Testimonial of Good Governance Practices Constructing Accurate Genie Spaces

2025-06-10 Watch
talk
Arvindram Krishnamoorthy (T-Mobile), Brian Schober (T-Mobile)

Genie Rooms have played an integral role in democratizing important datasets like cell tower and lease information. However, to ensure that this exciting new release from Databricks was configured as optimally as possible from development to deployment, we needed additional scaffolding around governance. In this talk we will describe the four main components we used in conjunction with the Genie Room to build a successful product, and will provide generalizable lessons to help others get the most out of it. At the core is a declarative, metadata-driven approach to creating UC tables, deployed on a robust framework. Second, a platform that efficiently crowdsourced targeted feedback from different user groups. Third, a tool that balances the LLM’s creativity with human wisdom. And finally, a platform that enforces our principle of separating storage from compute to manage access to the room at a fine-grained level and enables a whole host of interesting use cases.

Graph-Powered Observability Data Analysis in Databricks With Credential Vending

2025-06-10 Watch
lightning_talk
Eric Sun (Coinbase), Danfeng Xu (PuppyGraph)

Observability data (logs, metrics and traces) captures the complex interactions within modern distributed systems. A graph query engine on top of Databricks enables complex traversal of massive observability data, helping users trace service dependencies, analyze upstream/downstream impacts, and uncover recurring error patterns, making it easier to diagnose issues and optimize system performance. A critical challenge in handling observability data is managing dynamic RBAC for the sensitive system telemetry. This session explains how Coinbase leverages credential vending, a method for issuing short-lived credentials to enable fine-grained, secure access to observability data stored in Databricks without long-lived secrets. Key takeaways:

Querying Databricks tables as graph structures without ETLing data out
Secure access management with credential vending
A practical graph-based incident analysis solution at Coinbase, with insights on how PuppyGraph enables this application

Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.
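The credential-vending pattern described above can be sketched in a few lines: a broker issues a short-lived, scoped token instead of handing out a long-lived secret. This is an illustration of the concept only, not Unity Catalog's actual API; the function names, scope format and TTL are invented:

```python
import secrets
import time

def vend_credential(principal: str, table: str, ttl_seconds: int = 900) -> dict:
    """Issue a scoped, short-lived credential for reading one table."""
    return {
        "principal": principal,
        "scope": f"read:{table}",
        "token": secrets.token_urlsafe(16),
        "expires_at": time.time() + ttl_seconds,
    }

def is_valid(cred: dict, table: str, now=None) -> bool:
    """A credential is usable only for its scope and only before expiry."""
    now = time.time() if now is None else now
    return cred["scope"] == f"read:{table}" and now < cred["expires_at"]
```

Because every credential expires quickly and names a single scope, a leaked token is far less damaging than a leaked long-lived secret.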

Harnessing Databricks for Advanced LLM Time-Series Models in Healthcare Forecasting

2025-06-10 Watch
lightning_talk
Yunlong Wang (IQVIA)

This research introduces a groundbreaking method for healthcare time-series forecasting using a Large Language Model (LLM) foundation model. By leveraging a comprehensive dataset of over 50 million IQVIA time-series trends, which includes data on procedure demands, sales and prescriptions (TRx), alongside publicly available data spanning two decades, the model aims to significantly enhance predictive accuracy in various healthcare applications. The model's transformer-based architecture incorporates self-attention mechanisms to effectively capture complex temporal dependencies within historical time-series trends, offering a sophisticated approach to understanding patterns, trends and cyclical variations.

Harnessing Real-Time Data and AI for Retail Innovation

2025-06-10 Watch
talk
Lorenz Verzosa (Databricks), Tristen Wentling (Databricks)

This talk explores using advanced data processing and generative AI techniques to revolutionize the retail industry. Using Databricks, we will discuss how cutting-edge technologies enable real-time data analysis and machine learning applications, creating a powerful ecosystem for large-scale, data-driven retail solutions. Attendees will gain insights into architecting scalable data pipelines for retail operations and implementing advanced analytics on streaming customer data. Discover how these integrated technologies drive innovation in retail, enhancing customer experiences, streamlining operations and enabling data-driven decision-making. Learn how retailers can leverage these tools to gain a competitive edge in the rapidly evolving digital marketplace, ultimately driving growth and adaptability in the face of changing consumer behaviors and market dynamics.

HIPAA Without the Headache at Hinge Health: Simple PHI Governance With Fine Grain Access Control

2025-06-10 Watch
talk
Alex Owen (Databricks), Veera Mukkanagoudar (Hinge Health)

Hinge Health faced challenges in hiring global data teams due to the complexities of PHI (Protected Health Information) governance. Unable to hire without compliant data sharing, and unable to scale PHI governance without a larger team, we overcame this chicken-and-egg challenge by adopting Unity Catalog's fine-grained access control. In this session, we will share our journey migrating to Unity Catalog, securing PHI with row filters and column masks, lessons learned, and how our efforts surpassed our own expectations. This session equips data teams with strategies for HIPAA compliance without compromising flexibility and collaboration. Hinge Health is the leading digital MSK clinic, serving 11M+ members and 500+ employer health plans, offering virtual physical therapy to reduce pain, surgeries and opioid use.

How Danone Enhanced Global Data Sharing with Delta Sharing

2025-06-10 Watch
talk
Yohan Baselto (Danone), Gergő Pásztor (Databricks)

Learn how Danone, a global leader in the food industry, improved its data-sharing processes using Delta Sharing, an open protocol developed by Databricks. This session will explore how Danone migrated from a traditional hub-and-spoke model to a more efficient and scalable data-sharing approach that works seamlessly across regions and platforms. We’ll discuss practical concepts such as in-region and cross-region data sharing, fine-grained access control, data discovery, and the implementation of data contracts. You’ll also hear about the strategies Danone uses to deliver governed data efficiently while maintaining compliance with global regulations. Additionally, we’ll discuss a cost comparison between direct data access and replication. Finally, we’ll share insights into the challenges faced by global organizations in managing data sharing at scale and how Danone addressed these issues. Attendees will gain practical knowledge on building a reliable and secure data-sharing framework for international collaboration.

How Nubank improves Governance, Security and User Experience with Unity Catalog

2025-06-10 Watch
talk

At Nubank, we successfully migrated to Unity Catalog, addressing the needs of our large-scale data environment with 3,000 active users, over 4,000 notebooks and jobs, and 1.1 million tables, including sensitive PII data. Our primary objectives were to enhance data governance, security and user experience. Key points:

Comprehensive data access monitoring and control implementation
Enhanced security measures for handling PII and sensitive data
Efficient migration of 4,000+ notebooks and jobs to the new system
Improved cataloging and governance for 1.1 million tables
Implementation of a robust access controls and permissions model
Optimized user experience and productivity through centralized data management

This migration significantly improved our data governance capabilities, enhanced security measures and provided a more user-friendly experience for our large user base, ultimately leading to better control and utilization of our vast data resources.

Implementing GreenOps in Databricks: A Practical Guide for Regulated Environments

2025-06-10 Watch
talk
Dave Ruijter (Blue Rocket (ABN Amro))

Join us on a technical journey into GreenOps at ABN AMRO Bank using Databricks system tables. We'll explore security, implementation challenges and best-practice verification, with practical examples and actionable reports. Discover how to optimize resource usage, ensure compliance and maintain agility. We'll discuss best practices, potential pitfalls and the nuanced 'it depends' scenarios, offering a comprehensive guide for intermediate to advanced practitioners.

Italgas’ AI Factory and the Future of Gas Distribution

2025-06-10 Watch
talk
Nicola Giorcelli (Cluster Reply), Serena Delli (Italgas)

At Italgas, Europe’s leading gas distributor by both network size and number of customers, we are spearheading digital transformation through a state-of-the-art, fully fledged Databricks Data Intelligence Platform.

Achieved a 50% cost reduction and 20% performance boost migrating from Azure Synapse to Databricks SQL
Deployed 41 ML/GenAI models in production, with 100% of workloads governed by Unity Catalog
Empowered 80% of employees with self-service BI through Genie Dashboards
Enabled natural language queries for control-room operators analyzing network status

The future of gas distribution is data-driven: predictive maintenance, automated operations and real-time decision making are now realities. Our AI Factory isn't just digitizing infrastructure; it's creating a more responsive, efficient and sustainable gas network that anticipates needs before they arise.

Kafka Forwarder: Simplifying Kafka Consumption at OpenAI

2025-06-10 Watch
talk
Jigar Bhati (OpenAI)

At OpenAI, Kafka fuels real-time data streaming at massive scale, but traditional consumers struggle under the burden of partition management, offset tracking, error handling, retries, Dead Letter Queues (DLQ), and dynamic scaling — all while racing to maintain ultra-high throughput. As deployments scale, complexity multiplies. Enter Kafka Forwarder — a game-changing Kafka Consumer Proxy that flips the script on traditional Kafka consumption. By offloading client-side complexity and pushing messages to consumers, it ensures at-least-once delivery, automated retries, and seamless DLQ management via Databricks. The result? Scalable, reliable and effortless Kafka consumption that lets teams focus on what truly matters. Curious how OpenAI simplified self-service, high-scale Kafka consumption? Join us as we walk through the motivation, architecture and challenges behind Kafka Forwarder, and share how we structured the pipeline to seamlessly route DLQ data into Databricks for analysis.
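The core forwarder idea above can be sketched as a tiny push loop: the proxy, not the client, owns retries and dead-lettering. This is an illustration of the pattern only; the function names and retry policy are invented, and OpenAI's actual implementation is not public:

```python
def forward(messages, deliver, max_retries=3):
    """Push each message to `deliver`; after max_retries failed attempts,
    route the message to a dead-letter queue (DLQ) for later analysis."""
    dlq = []
    for msg in messages:
        for attempt in range(max_retries):
            try:
                deliver(msg)
                break  # at-least-once delivery achieved for this message
            except Exception:
                if attempt == max_retries - 1:
                    dlq.append(msg)  # retries exhausted: dead-letter it
    return dlq
```

Consumers supply only the `deliver` callback; partition management, offsets and DLQ routing stay inside the proxy, which is what lets teams treat Kafka consumption as self-service.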

LanceDB: A Complete Search and Analytical Store for Serving Production-scale AI Applications

2025-06-10 Watch
talk
Zero Qu (Databricks), Chang She (LanceDB)

If you're building AI applications, chances are you're solving a retrieval problem somewhere along the way. This is why vector databases are popular today. But if we zoom out from just vector search, serving AI applications also requires handling KV workloads like a traditional feature store, as well as analytical workloads to explore and visualize data. This means that building an AI application often requires multiple data stores, which means multiple data copies, manual syncing, and extra infrastructure expenses. LanceDB is the first and only system that supports all of these workloads in one system. Powered by Lance columnar format, LanceDB completely breaks open the impossible triangle of performance, scalability, and cost for AI serving. Serving AI applications is different from previous waves of technology, and a new paradigm demands new tools.

No-Code Change in Your Python UDF for Arrow Optimization

2025-06-10 Watch
lightning_talk
Hyukjin Kwon (Databricks)

Apache Spark™ has introduced Arrow-optimized APIs such as Pandas UDFs and the Pandas Functions API, providing high performance for Python workloads. Yet, many users continue to rely on regular Python UDFs due to their simple interface, especially when advanced Python expertise is not readily available. This talk introduces a powerful new feature in Apache Spark that brings Arrow optimization to regular Python UDFs. With this enhancement, users can leverage performance gains without modifying their existing UDFs — simply by enabling a configuration setting or toggling a UDF-level parameter. Additionally, we will dive into practical tips and features for using Arrow-optimized Python UDFs effectively, exploring their strengths and limitations. Whether you’re a Spark beginner or an experienced user, this session will allow you to achieve the best of both simplicity and performance in your workflows with regular Python UDFs.
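As a sketch of the two opt-in paths the session describes (this is a fragment, not a standalone script: it assumes an active SparkSession named `spark` and a DataFrame `df` with a `name` column, and config names should be verified against your Spark version's documentation):

```python
from pyspark.sql.functions import udf

# Option 1: opt in globally, leaving existing regular UDF code untouched
spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")

# Option 2: opt in per UDF via the useArrow parameter
@udf(returnType="string", useArrow=True)
def to_upper(s):
    return s.upper() if s is not None else None

df.select(to_upper("name")).show()
```

In both cases the UDF body itself is unchanged; only the serialization path between the JVM and Python switches to Arrow.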

Optimize Cost and User Value Through Model Routing AI Agent

2025-06-10 Watch
talk
Aditya Gautam (Meta)

Each LLM has unique strengths and weaknesses, and there is no one-size-fits-all solution. Companies strive to balance cost reduction with maximizing the value of their use cases by considering factors such as latency, multi-modality, API costs, user need and prompt complexity. Model routing helps optimize performance and cost while improving scalability and user satisfaction. This session gives an overview of training cost-effective models using AI gateway logs, user feedback, prompts and model features to design an intelligent model-routing AI agent. It covers different strategies for model routing, deployment in Mosaic AI, re-training, and evaluation through A/B testing and end-to-end Databricks workflows. It will also delve into the details of training data collection, feature engineering, prompt formatting, custom loss functions, architectural modifications, addressing cold-start problems, query embedding generation and clustering through VectorDB, and RL policy-based exploration.
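The routing idea above reduces to matching each prompt to the cheapest model that can handle it. A toy sketch, where the model names, costs and the complexity heuristic are all invented for illustration (a production router would learn these from gateway logs and feedback, as the talk describes):

```python
MODELS = [
    # (name, cost per 1k tokens, max complexity it handles well)
    ("small-fast", 0.1, 3),
    ("mid-tier", 0.5, 6),
    ("large-frontier", 2.0, 10),
]

def complexity(prompt: str) -> int:
    """Crude proxy: longer, more question-laden prompts score higher."""
    score = len(prompt.split()) // 20 + prompt.count("?")
    return min(score, 10)

def route(prompt: str) -> str:
    """Pick the cheapest model whose capability covers the prompt."""
    c = complexity(prompt)
    for name, _cost, max_c in MODELS:
        if c <= max_c:
            return name
    return MODELS[-1][0]  # fall back to the most capable model
```

Since MODELS is ordered by cost, the first model whose capability ceiling covers the estimated complexity is also the cheapest adequate choice.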

Reimagining Data Governance and Access at Atlassian

2025-06-10 Watch
lightning_talk
Gerald Nakhle (Atlassian)

Atlassian is rebuilding its central lakehouse from the ground up to deliver a more secure, flexible and scalable data environment. In this session, we’ll share how we leverage Unity Catalog for fine-grained governance and supplement it with Immuta for dynamic policy management, enabling row and column level security at scale. By shifting away from broad, monolithic access controls toward a modern, agile solution, we’re empowering teams to securely collaborate on sensitive data without sacrificing performance or usability. Join us for an inside look at our end-to-end policy architecture, from how data owners declare metadata and author policies to the seamless application of access rules across the platform. We’ll also discuss lessons learned on streamlining data governance, ensuring compliance, and improving user adoption. Whether you’re a data architect, engineer or leader, walk away with actionable strategies to simplify and strengthen your own governance and access practices.

Revolutionizing Cybersecurity: SCB's Journey to a Self-Managed SIEM

2025-06-10 Watch
talk
Lavy Stokhamer (Standard Chartered Bank)

Join us to explore how Standard Chartered Bank's (SCB) groundbreaking strategy is reshaping the future of the cybersecurity landscape by replacing traditional SIEM with a cutting-edge Databricks solution, achieving remarkable business outcomes:

80% reduction in time to detect incidents
92% faster threat investigation
35% cost reduction
60% better detection accuracy
Significant enhancements in threat detection and response metrics
Substantial increase in ML-driven use cases

This session unveils SCB's journey to a distributed, multi-cloud lakehouse architecture that unlocks unprecedented performance and commercial optimization. Explore why a unified data and AI platform is becoming the cornerstone of next-generation, self-managed SIEM solutions for forward-thinking organizations in this era of AI-powered banking transformation.

Scaling Data Intelligence at NAB: Balancing Innovation with Enterprise-Grade Governance

2025-06-10 Watch
talk
Tom McMeekin (Databricks), Daniel Antoinette (National Australia Bank)

In this session, discover how National Australia Bank (NAB) is reshaping its data and AI strategy by positioning data as a strategic enabler. Driven by a vision to unlock data like electricity—continuous and reliable—NAB has established a scalable foundation for data intelligence that balances agility with enterprise-grade control. We'll delve into the key architectural, security, and governance capabilities underpinning this transformation, including Unity Catalog, Serverless, Lakeflow and GenAI. The session will highlight NAB's adoption of Databricks Serverless, platform security controls like private link, and persona-based data access patterns. Attendees will walk away with practical insights into building secure, scalable, and cost-efficient data platforms that fuel innovation while meeting the demands of compliance in highly regulated environments.

Sponsored by: Accenture & Avanade | Enterprise Data Journey for The Standard Insurance Leveraging Databricks on Azure and AI Innovation

2025-06-10 Watch
lightning_talk
Sumanta Paul (Accenture)

Modern insurers require agile, integrated data systems to harness AI. This framework for a global insurer uses Azure Databricks to unify legacy systems into a governed lakehouse medallion architecture (bronze/silver/gold layers), eliminating silos and enabling real-time analytics. The solution employs:

Medallion architecture for incremental data quality improvement
Unity Catalog for centralized governance, row/column security and audit compliance
Azure encryption and confidential computing for data mesh security
Automated ingestion, semantic and DevOps pipelines for scalability

By combining Databricks’ distributed infrastructure with Azure’s security, the insurer achieves regulatory compliance while enabling AI-driven innovation (e.g., underwriting, claims). The framework establishes a future-proof foundation for mergers and acquisitions (M&A) and cross-functional data products, balancing governance with agility.

Sponsored by: Astronomer | Unlocking the Future of Data Orchestration: Introducing Apache Airflow® 3

2025-06-10 Watch
talk
Vikram Koka (Astronomer)

Airflow 3 is here, bringing a new era of flexibility, scalability and security to data orchestration. This release makes building, running and managing data pipelines easier than ever. In this session, we will cover the key benefits of Airflow 3, including:

(1) Ease of use: Airflow 3 rethinks the user experience, from an intuitive, upgraded UI to DAG versioning and scheduler-integrated backfills that let teams manage pipelines more effectively than ever before.
(2) Stronger security: By decoupling task execution from direct database connections, Airflow 3 enforces task isolation and minimal-privilege access. This meets stringent compliance standards while reducing the risk of unauthorized data exposure.
(3) Ultimate flexibility: Run tasks anywhere, anytime, with remote execution and event-driven scheduling. Airflow 3 is designed for global, heterogeneous modern data environments, with an architecture that supports everything from edge and hybrid-cloud to GPU-based deployments.

Sponsored by: AVEVA | CONNECT and Databricks IT-OT Convergence for Industrial Intelligence at Scale

2025-06-10
talk
Glenn Moffett (AVEVA), John Baier (AVEVA)

Industrial organizations are unlocking new possibilities through the partnership between AVEVA and Databricks. The seamless, no-code, zero-copy solution—powered by Delta Sharing and CONNECT—enables companies to combine IT and OT data effortlessly. By bridging the gap between operational and enterprise data, businesses can harness the power of AI, data science, and business intelligence at an unprecedented scale to drive innovation. In this session, explore real-world applications of this integration, including how industry leaders are using CONNECT and Databricks to boost efficiency, reduce costs, and advance sustainability—all without fragmented point solutions. You’ll also see a live demo of the integration, showcasing how secure, scalable access to trusted industrial data is enabling new levels of industrial intelligence across sectors like mining, manufacturing, power, and oil and gas.

Sponsored by: Capital One Software | How to Manage High-Quality, Secure Data and Cost Visibility for AI

2025-06-10
talk
Laura Case (Capital One Software), Yudhish Batra (Capital One)

Companies need robust data management capabilities to build and deploy AI. Data needs to be easy to find, understandable, and trustworthy. And it’s even more important to secure data properly from the beginning of its lifecycle, otherwise it can be at risk of exposure during training or inference. Tokenization is a highly efficient method for securing data without compromising performance. In this session, we’ll share tips for managing high-quality, well-protected data at scale that are key for accelerating AI. In addition, we’ll discuss how to integrate visibility and optimization into your compute environment to manage the hidden cost of AI — your data.
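The vault-style tokenization mentioned above swaps sensitive values for random tokens, with only the vault able to map tokens back. A minimal sketch of the concept (this is illustrative only, not Capital One's product; the class and token format are invented, and a real vault would persist and encrypt its mappings):

```python
import secrets

class TokenVault:
    """Swap sensitive values for opaque tokens; only the vault can reverse."""
    def __init__(self):
        self._forward = {}   # value -> token (same value reuses its token)
        self._reverse = {}   # token -> value

    def tokenize(self, value: str) -> str:
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        return self._reverse[token]
```

Because the token carries no information about the original value, tokenized columns can flow through training and analytics pipelines without exposing the underlying data.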

Sponsored by: Domo | Behind the Brand: How Sol de Janeiro Powers Amazon Ops with Databricks + DOMO

2025-06-10 Watch
talk
Caio Pimenta (Sol de Janeiro)

How does one of the world’s fastest-growing beauty brands stay ahead of Amazon’s complexity and scale retail with precision? At Sol de Janeiro, we built a real-time Amazon Operations Hub—powered by Databricks and DOMO—that drives decisions across inventory, profitability, and marketing ROI. See how the Databricks Lakehouse and DOMO dashboards work together to simplify workflows, surface actionable insights, and enable smarter decisions across the business—from frontline operators to the executive suite. In this session, you’ll get a behind-the-scenes look at how we unified trillions of rows from NetSuite, Amazon, Shopify, and carrier systems into a single source of truth. We’ll show how this hub streamlined cross-functional workflows, eliminated manual reporting, and laid the foundation for AI-powered forecasting and automation.

Sponsored by: Firebolt | The Power of Low-latency Data for AI Apps

2025-06-10 Watch
lightning_talk
Cole Bowden (Firebolt)

Retrieval-augmented generation (RAG) has transformed AI applications by grounding responses with external data. It can be better still. By pairing RAG with low-latency SQL analytics, you can enrich responses with instant insights, leading to a more interactive and insightful user experience built on fresh, data-driven intelligence. In this talk, we’ll demo how low-latency SQL combined with an AI application can deliver speed, accuracy and trust.
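The pairing described above can be sketched as two lookups feeding one prompt: retrieved text grounds the answer, and a fast analytical query injects a fresh metric. Everything here is invented for illustration (the document store, the metric store standing in for a low-latency SQL result, and the prompt shape):

```python
DOCS = {
    "returns": "Items may be returned within 30 days.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

METRICS = {"orders_today": 1284}  # stands in for a low-latency SQL result

def retrieve(query: str) -> str:
    """Naive keyword retrieval over the document store."""
    for key, text in DOCS.items():
        if key in query.lower():
            return text
    return ""

def build_prompt(query: str) -> str:
    """Combine grounding context and a live metric into one LLM prompt."""
    context = retrieve(query)
    live = f"orders_today={METRICS['orders_today']}"
    return f"Context: {context}\nLive data: {live}\nQuestion: {query}"
```

The point of the pattern is freshness: the metric is queried at prompt-construction time, so the model's answer reflects the current state of the data rather than whatever was embedded at indexing time.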