talk-data.com

Topic: Databricks

Tags: big_data, analytics, spark

509 activities tagged

Activity Trend: peak of 515 activities per quarter (2020-Q1 to 2026-Q1)

Activities

Showing results filtered by: Data + AI Summit 2025
Sponsored by: Sigma | Flogistix by Flowco, and the Role of Data in Responsible Energy Production

As global energy demands continue to rise, organizations must boost efficiency while staying environmentally responsible. Flogistix uses Sigma and Databricks to build a unified data architecture for real-time, data-driven decisions in vapor recovery systems. With Sigma on the Databricks Data Intelligence Platform, Flogistix gains precise operational insights and identifies optimization opportunities that reduce emissions, streamline workflows, and meet industry regulations. This empowers everyone, from executives to field mechanics, to drive sustainable resource production. Discover how advanced analytics are transforming energy practices for a more responsible future.

Apache Iceberg with Unity Catalog at HelloFresh

Table formats like Delta Lake and Iceberg have been game changers for pushing lakehouse architecture into modern enterprises. The acquisition of Tabular added Iceberg to the Databricks ecosystem, an open format that was already well supported by processing engines across the industry. At HelloFresh we are building a lakehouse architecture that integrates many touchpoints and technologies all across the organization. As such, we chose Iceberg as the table format to bridge the gaps in our decentrally managed tech landscape. We are leveraging Unity Catalog as our Iceberg REST catalog of choice for storing metadata and managing tables. In this talk we will outline our architectural setup between Databricks, Spark, Flink and Snowflake, and will explain the native Unity Catalog Iceberg REST catalog as well as catalog federation towards connected engines. We will highlight the impact on our business and discuss the advantages and lessons learned from our early adopter experience.
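For illustration, here is a minimal sketch of how an external Spark job might be pointed at Unity Catalog's Iceberg REST endpoint. The workspace host, token, catalog name and table names are placeholders rather than HelloFresh's actual configuration, and the exact REST path should be checked against the current Databricks documentation.

# Sketch: an external Spark session using Unity Catalog as an Iceberg REST catalog.
# Host, token and catalog name are placeholders; verify the REST path for your workspace.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("uc-iceberg-rest-demo")
    # Iceberg runtime and SQL extensions
    .config("spark.jars.packages", "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.6.1")
    .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    # Register a Spark catalog named "uc" backed by the UC Iceberg REST endpoint
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    .config("spark.sql.catalog.uc.uri", "https://<workspace-host>/api/2.1/unity-catalog/iceberg")
    .config("spark.sql.catalog.uc.token", "<personal-access-token>")
    .config("spark.sql.catalog.uc.warehouse", "<uc_catalog_name>")
    .getOrCreate()
)

# Tables registered in Unity Catalog are then addressable as uc.<schema>.<table>
spark.sql("SELECT * FROM uc.analytics.orders LIMIT 10").show()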

AT&T AutoClassify: Unified Multi-Head Binary Classification From Unlabeled Text

We present AT&T AutoClassify, a novel end-to-end system for automatic multi-head binary classification from unlabeled text data, built jointly by AT&T's Chief Data Office (CDO) and Databricks Professional Services. Our approach automates the challenge of creating labeled datasets and training multi-head binary classifiers with minimal human intervention. Starting only from a corpus of unlabeled text and a list of desired labels, AT&T AutoClassify leverages advanced natural language processing techniques to automatically mine relevant examples from raw text, fine-tune embedding models and train individual classifier heads for multiple true/false labels. This solution can reduce LLM classification costs by 1,000x, making it operationally cost-efficient. The end result is a highly optimized, low-cost model servable in Databricks, capable of taking raw text and producing multiple binary classifications. An example use case using call transcripts will be examined.
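As a rough illustration of the multi-head idea (not AT&T's actual system), the following PyTorch sketch shares one text-embedding input across independent binary heads, one per label; the label names, dimensions and targets are invented for the example.

# Minimal sketch of multi-head binary classification over shared text embeddings.
import torch
import torch.nn as nn

class MultiHeadBinaryClassifier(nn.Module):
    def __init__(self, embedding_dim: int, labels: list[str]):
        super().__init__()
        self.labels = labels
        # One small binary head per label, all consuming the same embedding
        self.heads = nn.ModuleDict({
            label: nn.Sequential(nn.Linear(embedding_dim, 128), nn.ReLU(), nn.Linear(128, 1))
            for label in labels
        })

    def forward(self, embeddings: torch.Tensor) -> dict[str, torch.Tensor]:
        # Returns one logit per label; apply a sigmoid at inference time
        return {label: head(embeddings).squeeze(-1) for label, head in self.heads.items()}

# Training fragment: each head is optimized with its own binary cross-entropy loss
model = MultiHeadBinaryClassifier(embedding_dim=768, labels=["escalation", "billing_issue"])
criterion = nn.BCEWithLogitsLoss()
embeddings = torch.randn(32, 768)                                         # stand-in for fine-tuned embeddings
targets = {l: torch.randint(0, 2, (32,)).float() for l in model.labels}   # stand-in for mined labels
logits = model(embeddings)
loss = sum(criterion(logits[l], targets[l]) for l in model.labels)
loss.backward()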

Bayada’s Snowflake-to-Databricks Migration: Transforming Data for Speed & Efficiency

Bayada is transforming its data ecosystem by consolidating Matillion+Snowflake and SSIS+SQL Server into a unified Enterprise Data Platform powered by Databricks. Using Databricks' Medallion architecture, this platform enables seamless data integration, advanced analytics and machine learning across critical domains like general ledger, recruitment and activity-based costing. Databricks was selected for its scalability, real-time analytics and ability to handle both structured and unstructured data, positioning Bayada for future growth. The migration aims to reduce data processing times by 35%, improve reporting accuracy and cut reconciliation efforts by 40%. Operational costs are projected to decrease by 20%, while real-time analytics is expected to boost efficiency by 15%. Join this session to learn how Bayada is leveraging Databricks to build a high-performance data platform that accelerates insights, drives efficiency and fosters innovation organization-wide.

Beyond Simple RAG: Unlocking Quality, Scale and Cost-Efficient Retrieval With Mosaic AI Vector Search

This session is repeated. Mosaic AI Vector Search is powering high-accuracy retrieval systems in production across a wide range of use cases — including RAG applications, entity resolution, recommendation systems and search. Fully integrated with the Databricks Data Intelligence Platform, it eliminates pipeline maintenance by automatically syncing data from source to index. Over the past year, customers have asked for greater scale, better quality out-of-the-box and cost-efficient performance. This session delivers on those needs — showcasing best practices for implementing high-quality retrieval systems and revealing major product advancements that improve scalability, efficiency and relevance. What you'll learn:
- How to optimize Vector Search with hybrid retrieval and reranking for better out-of-the-box results
- Best practices for managing vector indexes with minimal operational overhead
- Real-world examples of how organizations have scaled and improved their search and recommendation systems
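As a minimal sketch of a hybrid query, the snippet below uses the open databricks-vectorsearch client; the endpoint, index and column names are placeholders, and parameter names should be verified against the client version in use.

# Sketch of a hybrid (keyword + vector) query with the databricks-vectorsearch client.
from databricks.vector_search.client import VectorSearchClient

client = VectorSearchClient()  # picks up workspace auth from the environment
index = client.get_index(
    endpoint_name="shared-endpoint",
    index_name="main.search.product_docs_index",
)

results = index.similarity_search(
    query_text="how do I rotate service credentials?",
    columns=["doc_id", "title", "chunk_text"],
    num_results=10,
    query_type="HYBRID",   # combine keyword and vector scores instead of pure ANN
)
# Optional second stage: rerank the candidates with a cross-encoder before returning the top few.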

Building a Seamless Multi-Cloud Platform for Secure Portable Workloads

There are many challenges to making a data platform truly a platform: something that hides complexity. Data engineers and scientists want a simple, intuitive abstraction that lets them focus on their work, not on where it runs to stay compliant, which credentials it uses to access data, or how it generates operational telemetry. At Databricks we've developed a data-centric approach to workload development and deployment that enables data workers to stop doing migrations and instead develop with confidence. Attend this session to learn how to run simple, secure and compliant global multi-cloud workloads at scale on Databricks.

Chaos to Clarity: Secure, Scalable, and Governed SaaS Ingestion through Lakeflow Connect and more

Ingesting data from SaaS systems sounds straightforward—until you hit API limits, miss SLAs, or accidentally ingest PII. Sound familiar? In this talk, we’ll share how Databricks evolved from scrappy ingestion scripts to a unified, secure, and scalable ingestion platform. Along the way, we’ll highlight the hard lessons, the surprising pitfalls, and the tools that helped us level up. Whether you’re just starting to wrangle third-party data or looking to scale while handling governance and compliance, this session will help you think beyond pipelines and toward platform thinking.

Crafting Business Brilliance: Leveraging Databricks SQL for Next-Gen Applications

At Haleon, we've leveraged Databricks APIs and serverless compute to develop customer-facing applications for our business. This innovative solution enables us to efficiently deliver SAP invoice and order management data through front-end applications developed and served via our API Gateway. The Databricks lakehouse architecture has been instrumental in eliminating the friction associated with directly accessing SAP data from operational systems, while enhancing our performance capabilities. Our system achieved response times of less than 3 seconds from the API call, with ongoing efforts to optimise this performance. This architecture not only streamlines our data and application ecosystem but also paves the way for integrating GenAI capabilities with robust governance measures for our future infrastructure. The implementation of this solution has yielded significant benefits, including a 15% reduction in customer service costs and a 28% increase in productivity for our customer support team.
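For context, the snippet below sketches the kind of call a gateway-fronted application can make against Databricks SQL using the SQL Statement Execution API; the host, token, warehouse ID, table and parameters are placeholders, not Haleon's actual service.

# Sketch: synchronous query via the Databricks SQL Statement Execution API.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<personal-access-token>"

resp = requests.post(
    f"{HOST}/api/2.0/sql/statements",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "warehouse_id": "<sql-warehouse-id>",
        "statement": "SELECT invoice_id, status, amount FROM sales.invoices WHERE customer_id = :cid",
        "parameters": [{"name": "cid", "value": "C-1042"}],
        "wait_timeout": "10s",   # block briefly for small, interactive queries
    },
    timeout=30,
)
resp.raise_for_status()
payload = resp.json()
if payload["status"]["state"] == "SUCCEEDED":
    rows = payload["result"]["data_array"]   # rows returned as lists of strings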

Deploying Databricks Asset Bundles (DABs) at Scale

This session is repeated. Managing data and AI workloads in Databricks can be complex. Databricks Asset Bundles (DABs) simplify this by enabling declarative, Git-driven deployment workflows for notebooks, jobs, Lakeflow Declarative Pipelines, dashboards, ML models and more. Join the DABs team for a deep dive and learn about:
- The basics: understanding Databricks Asset Bundles; declare, define and deploy assets, follow best practices, use templates and manage dependencies
- CI/CD & governance: automate deployments with GitHub Actions/Azure DevOps, manage dev vs. prod differences, and ensure reproducibility
- What's new and what's coming up: AI/BI Dashboard support, Databricks Apps support, a Pythonic interface and workspace-based deployment
If you're a data engineer, ML practitioner or platform architect, this talk will provide practical insights to improve reliability, efficiency and compliance in your Databricks workflows.
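As a minimal, hypothetical sketch (not material from the session), a bundle is defined in a databricks.yml along these lines, with a job resource and separate dev and prod targets; all names and paths are placeholders.

# Minimal databricks.yml sketch
bundle:
  name: nightly_reporting

resources:
  jobs:
    refresh_report:
      name: refresh-report-${bundle.target}
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ./notebooks/refresh_report.py

targets:
  dev:
    mode: development   # prefixes resource names and deploys into the developer's folder
    default: true
  prod:
    mode: production
    workspace:
      root_path: /Shared/bundles/nightly_reporting

It is validated and rolled out with the Databricks CLI, e.g. databricks bundle validate, databricks bundle deploy -t prod, and databricks bundle run refresh_report.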

Empowering Fundraising With AI: A Journey With Databricks Mosaic AI

Artificial Intelligence (AI) is more than a corporate tool; it’s a force for good. At Doctors Without Borders/Médecins Sans Frontières (MSF), we use AI to optimize fundraising, ensuring that every dollar raised directly supports life-saving medical aid worldwide. With Databricks, Mosaic AI and Unity Catalog, we analyze donor behavior, predict giving patterns and personalize outreach, increasing contributions while upholding ethical AI principles. This session will showcase how AI maximizes fundraising impact, enabling faster crisis response and resource allocation. We’ll explore predictive modeling for donor engagement, secure AI governance with Unity Catalog and our vision for generative AI in fundraising, leveraging AI-assisted storytelling to deepen donor connections. AI is not just about efficiency; it’s about saving lives. Join us to see how AI-driven fundraising is transforming humanitarian aid on a global scale.

Federated Data Analytics Platform

Are you struggling to keep up with rapid business changes that demand constant updates to your data pipelines? Is your data engineering team growing rapidly just to manage this complexity? Databricks was not immune to this challenge either. Managing our BI with contributions from hundreds of Product Engineering Teams across the company while maintaining central oversight and quality posed significant hurdles. Join us to learn how we developed a config-driven data pipeline framework using Metric Store and UC Metrics that helped us reduce engineering effort — achieving the work of 100 classical data engineers with just two platform engineers.
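The Metric Store and UC Metrics internals discussed in the session are not public, but the following purely hypothetical Python sketch illustrates what "config-driven" can mean in practice: a declarative metric definition that a small framework compiles into SQL, so product teams contribute configs rather than bespoke pipelines.

# Illustrative only: a declarative metric config expanded into an aggregation query.
metric = {
    "name": "weekly_active_workspaces",
    "source": "main.telemetry.workspace_events",     # hypothetical table
    "measure": "COUNT(DISTINCT workspace_id)",
    "time_column": "event_date",
    "grain": "week",
    "filters": ["event_type = 'job_run'"],
}

def compile_metric(m: dict) -> str:
    """Expand one metric config into SQL the platform can schedule and materialize."""
    where = " AND ".join(m["filters"]) or "TRUE"
    return (
        f"SELECT date_trunc('{m['grain']}', {m['time_column']}) AS period, "
        f"{m['measure']} AS {m['name']} "
        f"FROM {m['source']} WHERE {where} GROUP BY 1"
    )

print(compile_metric(metric))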

Getting Data AI Ready: Testimonial of Good Governance Practices Constructing Accurate Genie Spaces

Genie Rooms have played an integral role in democratizing important datasets like Cell Tower and Lease Information. However, in order to ensure that this exciting new release from Databricks was configured as optimally as possible from development to deployment, we needed additional scaffolding around governance. In this talk we will describe the four main components we used in conjunction with the Genie Room to build a successful product and will provide generalizable lessons to help others get the most out of this object. At the core is a declarative, metadata-driven approach to creating UC tables, deployed on a robust framework. Second, a platform that efficiently crowdsourced targeted feedback from different user groups. Third, a tool that balances the LLM's creativity with human wisdom. And finally, a platform that enforces our principle of separating storage from compute to manage access to the room at a fine-grained level and enables a whole host of interesting use cases.
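As a small, hypothetical illustration of the declarative-metadata idea (table and column names are invented), rich Unity Catalog comments are the kind of scaffolding that helps a Genie space answer accurately, and UC grants provide fine-grained access control; this assumes a Databricks session where spark is available.

# Sketch: table and column comments in Unity Catalog plus a fine-grained grant.
spark.sql("""
    CREATE TABLE IF NOT EXISTS main.network.cell_towers (
        tower_id      STRING         COMMENT 'Unique identifier for the cell tower',
        lease_expiry  DATE           COMMENT 'Date the current site lease expires',
        monthly_rent  DECIMAL(10, 2) COMMENT 'Lease cost in USD per month'
    )
    COMMENT 'One row per cell tower, refreshed nightly from the lease system'
""")

# Access to the data behind the Genie space is then governed with standard UC grants
spark.sql("GRANT SELECT ON TABLE main.network.cell_towers TO `field-analysts`")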

Graph-Powered Observability Data Analysis in Databricks With Credential Vending

Observability data — logs, metrics, and traces — captures the complex interactions within modern distributed systems. A graph query engine on top of Databricks enables complex traversal of massive observability data, helping users trace service dependencies, analyze upstream/downstream impacts, and uncover recurring error patterns, making it easier to diagnose issues and optimize system performance. A critical challenge in handling observability data is managing dynamic RBAC for the sensitive system telemetry. This session explains how Coinbase leverages credential vending, a method for issuing short-lived credentials to enable fine-grained, secure access to observability data stored in Databricks without long-lived secrets. Key takeaways:
- Querying Databricks tables as graph structures without ETLing data out
- Secure access management with credential vending
- A practical graph-based incident analysis solution at Coinbase, with insights on how PuppyGraph enables this application
Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.
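For orientation, the sketch below shows the general shape of a credential-vending request against Unity Catalog's temporary table credentials API; the host, token and table UUID are placeholders, and the exact endpoint and response fields should be confirmed in the current Unity Catalog REST API documentation rather than taken from this example.

# Sketch: requesting short-lived, table-scoped credentials from Unity Catalog.
import requests

HOST = "https://<workspace-host>"
TOKEN = "<service-principal-token>"

resp = requests.post(
    f"{HOST}/api/2.1/unity-catalog/temporary-table-credentials",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={"table_id": "<table-uuid>", "operation": "READ"},
    timeout=30,
)
resp.raise_for_status()
creds = resp.json()
# e.g. creds["aws_temp_credentials"] carries a short-lived key/secret/session token that an
# external engine (here, a graph query engine) can use until creds["expiration_time"].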

Harnessing Databricks for Advanced LLM Time-Series Models in Healthcare Forecasting

This research introduces a groundbreaking method for healthcare time-series forecasting using a Large Language Model (LLM) foundation model. By leveraging a comprehensive dataset of over 50 million IQVIA time-series trends, which includes data on procedure demands, sales and prescriptions (TRx), alongside publicly available data spanning two decades, the model aims to significantly enhance predictive accuracy in various healthcare applications. The model's transformer-based architecture incorporates self-attention mechanisms to effectively capture complex temporal dependencies within historical time-series trends, offering a sophisticated approach to understanding patterns, trends and cyclical variations.
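The IQVIA model itself is not public, but a generic transformer forecaster along the lines described might look like the following PyTorch sketch, with self-attention over a window of past values and a linear head that emits the forecast horizon; all dimensions are illustrative.

# Generic sketch of a transformer-style time-series forecaster.
import torch
import torch.nn as nn

class TimeSeriesTransformer(nn.Module):
    def __init__(self, context_len=104, horizon=13, d_model=128, n_heads=8, n_layers=4):
        super().__init__()
        self.input_proj = nn.Linear(1, d_model)                   # scalar value -> model dim
        self.pos_embed = nn.Parameter(torch.zeros(1, context_len, d_model))
        layer = nn.TransformerEncoderLayer(d_model, n_heads, dim_feedforward=4 * d_model,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)
        self.head = nn.Linear(d_model, horizon)                   # forecast from the last position

    def forward(self, x):                                         # x: (batch, context_len, 1)
        h = self.encoder(self.input_proj(x) + self.pos_embed)
        return self.head(h[:, -1, :])                             # (batch, horizon)

# e.g. two years of weekly TRx history (104 points) -> one quarter ahead (13 weeks)
model = TimeSeriesTransformer()
forecast = model(torch.randn(32, 104, 1))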

Harnessing Real-Time Data and AI for Retail Innovation

This talk explores using advanced data processing and generative AI techniques to revolutionize the retail industry. Using Databricks, we will discuss how cutting-edge technologies enable real-time data analysis and machine learning applications, creating a powerful ecosystem for large-scale, data-driven retail solutions. Attendees will gain insights into architecting scalable data pipelines for retail operations and implementing advanced analytics on streaming customer data. Discover how these integrated technologies drive innovation in retail, enhancing customer experiences, streamlining operations and enabling data-driven decision-making. Learn how retailers can leverage these tools to gain a competitive edge in the rapidly evolving digital marketplace, ultimately driving growth and adaptability in the face of changing consumer behaviors and market dynamics.

How Danone Enhanced Global Data Sharing with Delta Sharing

Learn how Danone, a global leader in the food industry, improved its data-sharing processes using Delta Sharing, an open protocol developed by Databricks. This session will explore how Danone migrated from a traditional hub-and-spoke model to a more efficient and scalable data-sharing approach that works seamlessly across regions and platforms. We’ll discuss practical concepts such as in-region and cross-region data sharing, fine-grained access control, data discovery, and the implementation of data contracts. You’ll also hear about the strategies Danone uses to deliver governed data efficiently while maintaining compliance with global regulations. Additionally, we’ll discuss a cost comparison between direct data access and replication. Finally, we’ll share insights into the challenges faced by global organizations in managing data sharing at scale and how Danone addressed these issues. Attendees will gain practical knowledge on building a reliable and secure data-sharing framework for international collaboration.
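As a point of reference, the recipient side of Delta Sharing can be as small as the sketch below, which uses the open-source delta-sharing Python client; the profile file and the share, schema and table names are placeholders rather than Danone's actual shares.

# Sketch: consuming a Delta Share with the open-source client.
import delta_sharing

profile = "danone_recipient.share"          # credential file issued by the data provider
client = delta_sharing.SharingClient(profile)
print(client.list_all_tables())             # discover what the provider has shared

# Load one shared table directly into pandas, with no replication into the recipient platform
url = f"{profile}#sales_share.emea.daily_orders"
df = delta_sharing.load_as_pandas(url)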

Implementing GreenOps in Databricks: A Practical Guide for Regulated Environments

Join us on a technical journey into GreenOps at ABN AMRO Bank using Databricks system tables. We'll explore security, implementation challenges and best-practice verification, with practical examples and actionable reports. Discover how to optimize resource usage, ensure compliance and maintain agility. We'll discuss best practices, potential pitfalls and the nuanced 'it depends' scenarios, offering a comprehensive guide for intermediate to advanced practitioners.
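As an illustration of the kind of query such reports build on, the sketch below aggregates DBU consumption from the system.billing.usage system table; it assumes a Databricks session where spark is available, and the column names should be verified against the documented system-table schema.

# Sketch: monthly DBU consumption per workspace and SKU from system tables.
usage_by_sku = spark.sql("""
    SELECT
        workspace_id,
        sku_name,
        DATE_TRUNC('month', usage_date) AS month,
        SUM(usage_quantity)             AS dbus
    FROM system.billing.usage
    WHERE usage_date >= DATEADD(MONTH, -3, CURRENT_DATE())
    GROUP BY workspace_id, sku_name, DATE_TRUNC('month', usage_date)
    ORDER BY dbus DESC
""")
usage_by_sku.show()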

Italgas’ AI Factory and the Future of Gas Distribution

At Italgas, Europe's leading gas distributor by both network size and number of customers, we are spearheading digital transformation through a state-of-the-art, fully fledged Databricks Data Intelligence Platform.
- Achieved a 50% cost reduction and a 20% performance boost migrating from Azure Synapse to Databricks SQL
- Deployed 41 ML/GenAI models in production, with 100% of workloads governed by Unity Catalog
- Empowered 80% of employees with self-service BI through Genie Dashboards
- Enabled natural language queries for control-room operators analyzing network status
The future of gas distribution is data-driven: predictive maintenance, automated operations, and real-time decision making are now realities. Our AI Factory isn't just digitizing infrastructure — it's creating a more responsive, efficient, and sustainable gas network that anticipates needs before they arise.

Kafka Forwarder: Simplifying Kafka Consumption at OpenAI

At OpenAI, Kafka fuels real-time data streaming at massive scale, but traditional consumers struggle under the burden of partition management, offset tracking, error handling, retries, Dead Letter Queues (DLQ), and dynamic scaling — all while racing to maintain ultra-high throughput. As deployments scale, complexity multiplies. Enter Kafka Forwarder — a game-changing Kafka Consumer Proxy that flips the script on traditional Kafka consumption. By offloading client-side complexity and pushing messages to consumers, it ensures at-least-once delivery, automated retries, and seamless DLQ management via Databricks. The result? Scalable, reliable and effortless Kafka consumption that lets teams focus on what truly matters. Curious how OpenAI simplified self-service, high-scale Kafka consumption? Join us as we walk through the motivation, architecture and challenges behind Kafka Forwarder, and share how we structured the pipeline to seamlessly route DLQ data into Databricks for analysis.
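The sketch below is a generic rendering of the forwarder pattern rather than OpenAI's implementation: consume, push each message to a subscriber with bounded retries, route failures to a dead-letter topic, and commit offsets only afterwards to preserve at-least-once delivery. Topic names, brokers and the delivery call are placeholders.

# Generic consumer-proxy pattern: push with retries, DLQ on failure, commit afterwards.
from kafka import KafkaConsumer, KafkaProducer

consumer = KafkaConsumer("orders", bootstrap_servers="broker:9092",
                         group_id="forwarder", enable_auto_commit=False)
producer = KafkaProducer(bootstrap_servers="broker:9092")

MAX_ATTEMPTS = 3

def push_to_subscriber(value: bytes) -> None:
    """Deliver one message to the downstream consumer (HTTP endpoint, queue, etc.)."""
    ...  # placeholder for the actual delivery call

for msg in consumer:
    for attempt in range(1, MAX_ATTEMPTS + 1):
        try:
            push_to_subscriber(msg.value)
            break
        except Exception:
            if attempt == MAX_ATTEMPTS:
                producer.send("orders.dlq", msg.value)   # DLQ data can land in Databricks for analysis
    consumer.commit()  # commit only after delivery or DLQ routing, giving at-least-once semantics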

Optimize Cost and User Value Through Model Routing AI Agent

Each LLM has unique strengths and weaknesses, and there is no one-size-fits-all solution. Companies strive to balance cost reduction with maximizing the value of their use cases by considering various factors such as latency, multi-modality, API costs, user need, and prompt complexity. Model routing helps in optimizing performance and cost along with enhanced scalability and user satisfaction. Overview of cost-effective models training using AI gateway logs, user feedback, prompt, and model features to design an intelligent model-routing AI agent. Covers different strategies for model routing, deployment in Mosaic AI, re-training, and evaluation through A/B testing and end-to-end Databricks workflows. Additionally, it will delve into the details of training data collection, feature engineering, prompt formatting, custom loss functions, architectural modifications, addressing cold-start problems, query embedding generation and clustering through VectorDB, and RL policy-based exploration.