talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 Databricks Summit

Activities tracked

715

Sessions & talks

Showing 576–600 of 715 · Newest first

Deploying Databricks Asset Bundles (DABs) at Scale

2025-06-10 Watch
talk
Saad Ansari (Databricks), Pieter Noordhuis (Databricks)

This session is repeated. Managing data and AI workloads in Databricks can be complex. Databricks Asset Bundles (DABs) simplify this by enabling declarative, Git-driven deployment workflows for notebooks, jobs, Lakeflow Declarative Pipelines, dashboards, ML models and more. Join the DABs team for a deep dive and learn about:

The basics: understanding Databricks Asset Bundles. Declare, define and deploy assets, follow best practices, use templates and manage dependencies.
CI/CD & governance: automate deployments with GitHub Actions/Azure DevOps, manage dev vs. prod differences, and ensure reproducibility.
What's new and what's coming up: AI/BI Dashboard support, Databricks Apps support, a Pythonic interface and workspace-based deployment.

If you're a data engineer, ML practitioner or platform architect, this talk will provide practical insights to improve reliability, efficiency and compliance in your Databricks workflows.
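The declarative workflow described above centers on a `databricks.yml` bundle file. A minimal sketch of what such a file can look like (the bundle and resource names here are hypothetical, and the real schema supports many more fields):

```yaml
# databricks.yml: a minimal, illustrative bundle (names are hypothetical)
bundle:
  name: example_etl

targets:
  dev:
    mode: development
    default: true
  prod:
    mode: production

resources:
  jobs:
    nightly_etl:
      name: nightly-etl
      tasks:
        - task_key: run_notebook
          notebook_task:
            notebook_path: ./notebooks/etl.py
```

Deploying to a target is then a single CLI call such as `databricks bundle deploy -t dev`, which a CI system like GitHub Actions can invoke on every merge.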

Empowering Fundraising With AI: A Journey With Databricks Mosaic AI

2025-06-10 Watch
lightning_talk
Amina Alavi (Doctors Without Borders)

Artificial Intelligence (AI) is more than a corporate tool; it’s a force for good. At Doctors Without Borders/Médecins Sans Frontières (MSF), we use AI to optimize fundraising, ensuring that every dollar raised directly supports life-saving medical aid worldwide. With Databricks, Mosaic AI and Unity Catalog, we analyze donor behavior, predict giving patterns and personalize outreach, increasing contributions while upholding ethical AI principles. This session will showcase how AI maximizes fundraising impact, enabling faster crisis response and resource allocation. We’ll explore predictive modeling for donor engagement, secure AI governance with Unity Catalog and our vision for generative AI in fundraising, leveraging AI-assisted storytelling to deepen donor connections. AI is not just about efficiency; it’s about saving lives. Join us to see how AI-driven fundraising is transforming humanitarian aid on a global scale.

Federated Data Analytics Platform

2025-06-10 Watch
talk
Rohit Mathews (Databricks), Neo Ni (Databricks)

Are you struggling to keep up with rapid business changes that demand constant updates to your data pipelines? Is your data engineering team growing rapidly just to manage this complexity? Databricks was not immune to this challenge either. Managing our BI with contributions from hundreds of Product Engineering Teams across the company while maintaining central oversight and quality posed significant hurdles. Join us to learn how we developed a config-driven data pipeline framework using Metric Store and UC Metrics that helped us reduce engineering effort — achieving the work of 100 classical data engineers with just two platform engineers.

Getting Data AI Ready: Testimonial of Good Governance Practices Constructing Accurate Genie Spaces

2025-06-10 Watch
talk
Arvindram Krishnamoorthy (T-Mobile), Brian Schober (T-Mobile)

Genie Rooms have played an integral role in democratizing important datasets like cell tower and lease information. However, to ensure that this exciting new release from Databricks was configured as optimally as possible from development to deployment, we needed additional scaffolding around governance. In this talk we will describe the four main components we used in conjunction with the Genie Room to build a successful product, and will provide generalizable lessons to help others get the most out of it. At the core is a declarative, metadata-driven approach to creating UC tables, deployed on a robust framework. Second, a platform that efficiently crowdsourced targeted feedback from different user groups. Third, a tool that balances the LLM’s creativity with human wisdom. And finally, a platform that enforces our principle of separating storage from compute to manage access to the room at a fine-grained level and enables a whole host of interesting use cases.

Graph-Powered Observability Data Analysis in Databricks With Credential Vending

2025-06-10 Watch
lightning_talk
Eric Sun (Coinbase), Danfeng Xu (PuppyGraph)

Observability data (logs, metrics and traces) captures the complex interactions within modern distributed systems. A graph query engine on top of Databricks enables complex traversal of massive observability data, helping users trace service dependencies, analyze upstream/downstream impacts, and uncover recurring error patterns, making it easier to diagnose issues and optimize system performance. A critical challenge in handling observability data is managing dynamic RBAC for the sensitive system telemetry. This session explains how Coinbase leverages credential vending, a method for issuing short-lived credentials to enable fine-grained, secure access to observability data stored in Databricks without long-lived secrets. Key takeaways:

Querying Databricks tables as graph structures without ETLing data out
Secure access management with credential vending
A practical graph-based incident analysis solution at Coinbase, with insights on how PuppyGraph enables this application

Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.
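The credential-vending pattern described above can be sketched in a few lines: a broker issues a short-lived, scoped token instead of handing out a long-lived secret. This is an illustration of the concept only, not Unity Catalog's actual API; the function names, scope format and TTL are invented:

```python
import secrets
import time

def vend_credential(principal: str, table: str, ttl_seconds: int = 900) -> dict:
    """Issue a scoped, short-lived credential for reading one table."""
    return {
        "principal": principal,
        "scope": f"read:{table}",
        "token": secrets.token_urlsafe(16),
        "expires_at": time.time() + ttl_seconds,
    }

def is_valid(cred: dict, table: str, now=None) -> bool:
    """A credential is usable only for its scope and only before expiry."""
    now = time.time() if now is None else now
    return cred["scope"] == f"read:{table}" and now < cred["expires_at"]
```

Because every credential expires quickly and names a single scope, a leaked token is far less damaging than a leaked long-lived secret.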

Harnessing Databricks for Advanced LLM Time-Series Models in Healthcare Forecasting

2025-06-10 Watch
lightning_talk
Yunlong Wang (IQVIA)

This research introduces a groundbreaking method for healthcare time-series forecasting using a Large Language Model (LLM) foundation model. By leveraging a comprehensive dataset of over 50 million IQVIA time-series trends, which includes data on procedure demands, sales and prescriptions (TRx), alongside publicly available data spanning two decades, the model aims to significantly enhance predictive accuracy in various healthcare applications. The model's transformer-based architecture incorporates self-attention mechanisms to effectively capture complex temporal dependencies within historical time-series trends, offering a sophisticated approach to understanding patterns, trends and cyclical variations.

Harnessing Real-Time Data and AI for Retail Innovation

2025-06-10 Watch
talk
Lorenz Verzosa (Databricks), Tristen Wentling (Databricks)

This talk explores using advanced data processing and generative AI techniques to revolutionize the retail industry. Using Databricks, we will discuss how cutting-edge technologies enable real-time data analysis and machine learning applications, creating a powerful ecosystem for large-scale, data-driven retail solutions. Attendees will gain insights into architecting scalable data pipelines for retail operations and implementing advanced analytics on streaming customer data. Discover how these integrated technologies drive innovation in retail, enhancing customer experiences, streamlining operations and enabling data-driven decision-making. Learn how retailers can leverage these tools to gain a competitive edge in the rapidly evolving digital marketplace, ultimately driving growth and adaptability in the face of changing consumer behaviors and market dynamics.

HIPAA Without the Headache at Hinge Health: Simple PHI Governance With Fine Grain Access Control

2025-06-10 Watch
talk
Alex Owen (Databricks), Veera Mukkanagoudar (Hinge Health)

Hinge Health faced challenges in hiring global data teams due to the complexities of PHI (Protected Health Information) governance. Unable to hire without compliant data sharing, and unable to scale PHI governance without a larger team, we overcame this chicken-and-egg challenge by adopting Unity Catalog's fine-grained access control. In this session, we will share our journey migrating to Unity Catalog, securing PHI with row filters and column masks, lessons learned, and how our efforts surpassed our own expectations. This session equips data teams with strategies for HIPAA compliance without compromising flexibility and collaboration. Hinge Health is the leading digital MSK clinic, serving 11M+ members and 500+ employer health plans, offering virtual physical therapy to reduce pain, surgeries and opioid use.

How Danone Enhanced Global Data Sharing with Delta Sharing

2025-06-10 Watch
talk
Yohan Baselto (Danone), Gergő Pásztor (Databricks)

Learn how Danone, a global leader in the food industry, improved its data-sharing processes using Delta Sharing, an open protocol developed by Databricks. This session will explore how Danone migrated from a traditional hub-and-spoke model to a more efficient and scalable data-sharing approach that works seamlessly across regions and platforms. We’ll discuss practical concepts such as in-region and cross-region data sharing, fine-grained access control, data discovery, and the implementation of data contracts. You’ll also hear about the strategies Danone uses to deliver governed data efficiently while maintaining compliance with global regulations. Additionally, we’ll discuss a cost comparison between direct data access and replication. Finally, we’ll share insights into the challenges faced by global organizations in managing data sharing at scale and how Danone addressed these issues. Attendees will gain practical knowledge on building a reliable and secure data-sharing framework for international collaboration.

How Nubank improves Governance, Security and User Experience with Unity Catalog

2025-06-10 Watch
talk

At Nubank, we successfully migrated to Unity Catalog, addressing the needs of our large-scale data environment with 3,000 active users, over 4,000 notebooks and jobs, and 1.1 million tables, including sensitive PII data. Our primary objectives were to enhance data governance, security and user experience. Key points:

Comprehensive data access monitoring and control implementation
Enhanced security measures for handling PII and sensitive data
Efficient migration of 4,000+ notebooks and jobs to the new system
Improved cataloging and governance for 1.1 million tables
Implementation of a robust access controls and permissions model
Optimized user experience and productivity through centralized data management

This migration significantly improved our data governance capabilities, enhanced security measures and provided a more user-friendly experience for our large user base, ultimately leading to better control and utilization of our vast data resources.

Implementing GreenOps in Databricks: A Practical Guide for Regulated Environments

2025-06-10 Watch
talk
Dave Ruijter (Blue Rocket (ABN Amro))

Join us on a technical journey into GreenOps at ABN AMRO Bank using Databricks system tables. We'll explore security, implementation challenges and best-practice verification, with practical examples and actionable reports. Discover how to optimize resource usage, ensure compliance and maintain agility. We'll discuss best practices, potential pitfalls and the nuanced 'it depends' scenarios, offering a comprehensive guide for intermediate to advanced practitioners.

Italgas’ AI Factory and the Future of Gas Distribution

2025-06-10 Watch
talk
Nicola Giorcelli (Cluster Reply), Serena Delli (Italgas)

At Italgas, Europe’s leading gas distributor by both network size and number of customers, we are spearheading digital transformation through a state-of-the-art, fully fledged Databricks Data Intelligence Platform.

Achieved a 50% cost reduction and 20% performance boost migrating from Azure Synapse to Databricks SQL
Deployed 41 ML/GenAI models in production, with 100% of workloads governed by Unity Catalog
Empowered 80% of employees with self-service BI through Genie Dashboards
Enabled natural language queries for control-room operators analyzing network status

The future of gas distribution is data-driven: predictive maintenance, automated operations and real-time decision making are now realities. Our AI Factory isn't just digitizing infrastructure; it's creating a more responsive, efficient and sustainable gas network that anticipates needs before they arise.

Kafka Forwarder: Simplifying Kafka Consumption at OpenAI

2025-06-10 Watch
talk
Jigar Bhati (OpenAI)

At OpenAI, Kafka fuels real-time data streaming at massive scale, but traditional consumers struggle under the burden of partition management, offset tracking, error handling, retries, Dead Letter Queues (DLQ), and dynamic scaling — all while racing to maintain ultra-high throughput. As deployments scale, complexity multiplies. Enter Kafka Forwarder — a game-changing Kafka Consumer Proxy that flips the script on traditional Kafka consumption. By offloading client-side complexity and pushing messages to consumers, it ensures at-least-once delivery, automated retries, and seamless DLQ management via Databricks. The result? Scalable, reliable and effortless Kafka consumption that lets teams focus on what truly matters. Curious how OpenAI simplified self-service, high-scale Kafka consumption? Join us as we walk through the motivation, architecture and challenges behind Kafka Forwarder, and share how we structured the pipeline to seamlessly route DLQ data into Databricks for analysis.
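The core forwarder idea above can be sketched as a tiny push loop: the proxy, not the client, owns retries and dead-lettering. This is an illustration of the pattern only; the function names and retry policy are invented, and OpenAI's actual implementation is not public:

```python
def forward(messages, deliver, max_retries=3):
    """Push each message to `deliver`; after max_retries failed attempts,
    route the message to a dead-letter queue (DLQ) for later analysis."""
    dlq = []
    for msg in messages:
        for attempt in range(max_retries):
            try:
                deliver(msg)
                break  # at-least-once delivery achieved for this message
            except Exception:
                if attempt == max_retries - 1:
                    dlq.append(msg)  # retries exhausted: dead-letter it
    return dlq
```

Consumers supply only the `deliver` callback; partition management, offsets and DLQ routing stay inside the proxy, which is what lets teams treat Kafka consumption as self-service.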

LanceDB: A Complete Search and Analytical Store for Serving Production-scale AI Applications

2025-06-10 Watch
talk
Zero Qu (Databricks), Chang She (LanceDB)

If you're building AI applications, chances are you're solving a retrieval problem somewhere along the way. This is why vector databases are popular today. But if we zoom out from just vector search, serving AI applications also requires handling KV workloads like a traditional feature store, as well as analytical workloads to explore and visualize data. This means that building an AI application often requires multiple data stores, which means multiple data copies, manual syncing, and extra infrastructure expenses. LanceDB is the first and only system that supports all of these workloads in one system. Powered by Lance columnar format, LanceDB completely breaks open the impossible triangle of performance, scalability, and cost for AI serving. Serving AI applications is different from previous waves of technology, and a new paradigm demands new tools.

No-Code Change in Your Python UDF for Arrow Optimization

2025-06-10 Watch
lightning_talk
Hyukjin Kwon (Databricks)

Apache Spark™ has introduced Arrow-optimized APIs such as Pandas UDFs and the Pandas Functions API, providing high performance for Python workloads. Yet, many users continue to rely on regular Python UDFs due to their simple interface, especially when advanced Python expertise is not readily available. This talk introduces a powerful new feature in Apache Spark that brings Arrow optimization to regular Python UDFs. With this enhancement, users can leverage performance gains without modifying their existing UDFs — simply by enabling a configuration setting or toggling a UDF-level parameter. Additionally, we will dive into practical tips and features for using Arrow-optimized Python UDFs effectively, exploring their strengths and limitations. Whether you’re a Spark beginner or an experienced user, this session will allow you to achieve the best of both simplicity and performance in your workflows with regular Python UDFs.
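As a sketch of the two opt-in paths the session describes (this is a fragment, not a standalone script: it assumes an active SparkSession named `spark` and a DataFrame `df` with a `name` column, and config names should be verified against your Spark version's documentation):

```python
from pyspark.sql.functions import udf

# Option 1: opt in globally, leaving existing regular UDF code untouched
spark.conf.set("spark.sql.execution.pythonUDF.arrow.enabled", "true")

# Option 2: opt in per UDF via the useArrow parameter
@udf(returnType="string", useArrow=True)
def to_upper(s):
    return s.upper() if s is not None else None

df.select(to_upper("name")).show()
```

In both cases the UDF body itself is unchanged; only the serialization path between the JVM and Python switches to Arrow.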

Optimize Cost and User Value Through Model Routing AI Agent

2025-06-10 Watch
talk
Aditya Gautam (Meta)

Each LLM has unique strengths and weaknesses, and there is no one-size-fits-all solution. Companies strive to balance cost reduction with maximizing the value of their use cases by considering factors such as latency, multi-modality, API costs, user need and prompt complexity. Model routing helps optimize performance and cost while improving scalability and user satisfaction. This session gives an overview of training cost-effective models using AI gateway logs, user feedback, prompts and model features to design an intelligent model-routing AI agent. It covers different strategies for model routing, deployment in Mosaic AI, re-training, and evaluation through A/B testing and end-to-end Databricks workflows. It will also delve into the details of training data collection, feature engineering, prompt formatting, custom loss functions, architectural modifications, addressing cold-start problems, query embedding generation and clustering through VectorDB, and RL policy-based exploration.
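The routing idea above reduces to matching each prompt to the cheapest model that can handle it. A toy sketch, where the model names, costs and the complexity heuristic are all invented for illustration (a production router would learn these from gateway logs and feedback, as the talk describes):

```python
MODELS = [
    # (name, cost per 1k tokens, max complexity it handles well)
    ("small-fast", 0.1, 3),
    ("mid-tier", 0.5, 6),
    ("large-frontier", 2.0, 10),
]

def complexity(prompt: str) -> int:
    """Crude proxy: longer, more question-laden prompts score higher."""
    score = len(prompt.split()) // 20 + prompt.count("?")
    return min(score, 10)

def route(prompt: str) -> str:
    """Pick the cheapest model whose capability covers the prompt."""
    c = complexity(prompt)
    for name, _cost, max_c in MODELS:
        if c <= max_c:
            return name
    return MODELS[-1][0]  # fall back to the most capable model
```

Since MODELS is ordered by cost, the first model whose capability ceiling covers the estimated complexity is also the cheapest adequate choice.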

Reimagining Data Governance and Access at Atlassian

2025-06-10 Watch
lightning_talk
Gerald Nakhle (Atlassian)

Atlassian is rebuilding its central lakehouse from the ground up to deliver a more secure, flexible and scalable data environment. In this session, we’ll share how we leverage Unity Catalog for fine-grained governance and supplement it with Immuta for dynamic policy management, enabling row and column level security at scale. By shifting away from broad, monolithic access controls toward a modern, agile solution, we’re empowering teams to securely collaborate on sensitive data without sacrificing performance or usability. Join us for an inside look at our end-to-end policy architecture, from how data owners declare metadata and author policies to the seamless application of access rules across the platform. We’ll also discuss lessons learned on streamlining data governance, ensuring compliance, and improving user adoption. Whether you’re a data architect, engineer or leader, walk away with actionable strategies to simplify and strengthen your own governance and access practices.

Revolutionizing Cybersecurity: SCB's Journey to a Self-Managed SIEM

2025-06-10 Watch
talk
Lavy Stokhamer (Standard Chartered Bank)

Join us to explore how Standard Chartered Bank's (SCB) groundbreaking strategy is reshaping the future of the cybersecurity landscape by replacing traditional SIEM with a cutting-edge Databricks solution, achieving remarkable business outcomes:

80% reduction in time to detect incidents
92% faster threat investigation
35% cost reduction
60% better detection accuracy
Significant enhancements in threat detection and response metrics
Substantial increase in ML-driven use cases

This session unveils SCB's journey to a distributed, multi-cloud lakehouse architecture that unlocks unprecedented performance and commercial optimization. Explore why a unified data and AI platform is becoming the cornerstone of next-generation, self-managed SIEM solutions for forward-thinking organizations in this era of AI-powered banking transformation.

Scaling Data Intelligence at NAB: Balancing Innovation with Enterprise-Grade Governance

2025-06-10 Watch
talk
Tom McMeekin (Databricks), Daniel Antoinette (National Australia Bank)

In this session, discover how National Australia Bank (NAB) is reshaping its data and AI strategy by positioning data as a strategic enabler. Driven by a vision to unlock data like electricity—continuous and reliable—NAB has established a scalable foundation for data intelligence that balances agility with enterprise-grade control. We'll delve into the key architectural, security, and governance capabilities underpinning this transformation, including Unity Catalog, Serverless, Lakeflow and GenAI. The session will highlight NAB's adoption of Databricks Serverless, platform security controls like private link, and persona-based data access patterns. Attendees will walk away with practical insights into building secure, scalable, and cost-efficient data platforms that fuel innovation while meeting the demands of compliance in highly regulated environments.

Sponsored by: Accenture & Avanade | Enterprise Data Journey for The Standard Insurance Leveraging Databricks on Azure and AI Innovation

2025-06-10 Watch
lightning_talk
Sumanta Paul (Accenture)

Modern insurers require agile, integrated data systems to harness AI. This framework for a global insurer uses Azure Databricks to unify legacy systems into a governed lakehouse medallion architecture (bronze/silver/gold layers), eliminating silos and enabling real-time analytics. The solution employs:

Medallion architecture for incremental data quality improvement
Unity Catalog for centralized governance, row/column security and audit compliance
Azure encryption and confidential computing for data mesh security
Automated ingestion, semantic and DevOps pipelines for scalability

By combining Databricks’ distributed infrastructure with Azure’s security, the insurer achieves regulatory compliance while enabling AI-driven innovation (e.g., underwriting, claims). The framework establishes a future-proof foundation for mergers and acquisitions (M&A) and cross-functional data products, balancing governance with agility.

Sponsored by: Astronomer | Unlocking the Future of Data Orchestration: Introducing Apache Airflow® 3

2025-06-10 Watch
talk
Vikram Koka (Astronomer)

Airflow 3 is here, bringing a new era of flexibility, scalability and security to data orchestration. This release makes building, running and managing data pipelines easier than ever. In this session, we will cover the key benefits of Airflow 3, including:

(1) Ease of use: Airflow 3 rethinks the user experience, from an intuitive, upgraded UI to DAG versioning and scheduler-integrated backfills that let teams manage pipelines more effectively than ever before.
(2) Stronger security: By decoupling task execution from direct database connections, Airflow 3 enforces task isolation and minimal-privilege access. This meets stringent compliance standards while reducing the risk of unauthorized data exposure.
(3) Ultimate flexibility: Run tasks anywhere, anytime, with remote execution and event-driven scheduling. Airflow 3 is designed for global, heterogeneous modern data environments, with an architecture that supports everything from edge and hybrid-cloud to GPU-based deployments.

Sponsored by: AVEVA | CONNECT and Databricks IT-OT Convergence for Industrial Intelligence at Scale

2025-06-10
talk
Glenn Moffett (AVEVA), John Baier (AVEVA)

Industrial organizations are unlocking new possibilities through the partnership between AVEVA and Databricks. The seamless, no-code, zero-copy solution—powered by Delta Sharing and CONNECT—enables companies to combine IT and OT data effortlessly. By bridging the gap between operational and enterprise data, businesses can harness the power of AI, data science, and business intelligence at an unprecedented scale to drive innovation. In this session, explore real-world applications of this integration, including how industry leaders are using CONNECT and Databricks to boost efficiency, reduce costs, and advance sustainability—all without fragmented point solutions. You’ll also see a live demo of the integration, showcasing how secure, scalable access to trusted industrial data is enabling new levels of industrial intelligence across sectors like mining, manufacturing, power, and oil and gas.

Sponsored by: Capital One Software | How to Manage High-Quality, Secure Data and Cost Visibility for AI

2025-06-10
talk
Laura Case (Capital One Software), Yudhish Batra (Capital One)

Companies need robust data management capabilities to build and deploy AI. Data needs to be easy to find, understandable, and trustworthy. And it’s even more important to secure data properly from the beginning of its lifecycle, otherwise it can be at risk of exposure during training or inference. Tokenization is a highly efficient method for securing data without compromising performance. In this session, we’ll share tips for managing high-quality, well-protected data at scale that are key for accelerating AI. In addition, we’ll discuss how to integrate visibility and optimization into your compute environment to manage the hidden cost of AI — your data.
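The vault-style tokenization mentioned above swaps sensitive values for random tokens, with only the vault able to map tokens back. A minimal sketch of the concept (this is illustrative only, not Capital One's product; the class and token format are invented, and a real vault would persist and encrypt its mappings):

```python
import secrets

class TokenVault:
    """Swap sensitive values for opaque tokens; only the vault can reverse."""
    def __init__(self):
        self._forward = {}   # value -> token (same value reuses its token)
        self._reverse = {}   # token -> value

    def tokenize(self, value: str) -> str:
        if value not in self._forward:
            token = "tok_" + secrets.token_hex(8)
            self._forward[value] = token
            self._reverse[token] = value
        return self._forward[value]

    def detokenize(self, token: str) -> str:
        return self._reverse[token]
```

Because the token carries no information about the original value, tokenized columns can flow through training and analytics pipelines without exposing the underlying data.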

Sponsored by: Domo | Behind the Brand: How Sol de Janeiro Powers Amazon Ops with Databricks + DOMO

2025-06-10 Watch
talk
Caio Pimenta (Sol de Janeiro)

How does one of the world’s fastest-growing beauty brands stay ahead of Amazon’s complexity and scale retail with precision? At Sol de Janeiro, we built a real-time Amazon Operations Hub—powered by Databricks and DOMO—that drives decisions across inventory, profitability, and marketing ROI. See how the Databricks Lakehouse and DOMO dashboards work together to simplify workflows, surface actionable insights, and enable smarter decisions across the business—from frontline operators to the executive suite. In this session, you’ll get a behind-the-scenes look at how we unified trillions of rows from NetSuite, Amazon, Shopify, and carrier systems into a single source of truth. We’ll show how this hub streamlined cross-functional workflows, eliminated manual reporting, and laid the foundation for AI-powered forecasting and automation.

Sponsored by: Firebolt | The Power of Low-latency Data for AI Apps

2025-06-10 Watch
lightning_talk
Cole Bowden (Firebolt)

Retrieval-augmented generation (RAG) has transformed AI applications by grounding responses with external data. It can be better still. By pairing RAG with low-latency SQL analytics, you can enrich responses with instant insights, leading to a more interactive and insightful user experience built on fresh, data-driven intelligence. In this talk, we’ll demo how low-latency SQL combined with an AI application can deliver speed, accuracy and trust.
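The pairing described above can be sketched as two lookups feeding one prompt: retrieved text grounds the answer, and a fast analytical query injects a fresh metric. Everything here is invented for illustration (the document store, the metric store standing in for a low-latency SQL result, and the prompt shape):

```python
DOCS = {
    "returns": "Items may be returned within 30 days.",
    "shipping": "Standard shipping takes 3-5 business days.",
}

METRICS = {"orders_today": 1284}  # stands in for a low-latency SQL result

def retrieve(query: str) -> str:
    """Naive keyword retrieval over the document store."""
    for key, text in DOCS.items():
        if key in query.lower():
            return text
    return ""

def build_prompt(query: str) -> str:
    """Combine grounding context and a live metric into one LLM prompt."""
    context = retrieve(query)
    live = f"orders_today={METRICS['orders_today']}"
    return f"Context: {context}\nLive data: {live}\nQuestion: {query}"
```

The point of the pattern is freshness: the metric is queried at prompt-construction time, so the model's answer reflects the current state of the data rather than whatever was embedded at indexing time.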