talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 Databricks Summit

Activities tracked

715

Sessions & talks

Showing 551–575 of 715 · Newest first

Power BI and Databricks: Practical Best Practices

2025-06-10 Watch
talk
Hobbs Hobbs (Databricks)

This session is repeated. Power BI has long been the dominant BI tool in the market. In this session, we'll discuss how to get the most out of Power BI and Databricks, beginning with high-level architecture and moving down into detailed how-to guides for troubleshooting common failure points. At the end, you'll receive a cheat sheet that summarizes those best practices in an easy-to-reference format.

Scaling Data Governance: How Unity Catalog is Empowering Picpay's Data Governance Strategy

2025-06-10 Watch
talk
Lucas Morelato (PicPay) , Gustavo Tadao Okida (PicPay)

With massive data volume and complexity, scaling data governance became a significant challenge. Centralizing metadata management, ensuring regulatory compliance and controlling data access across multiple platforms turned out to be critical to maintaining efficiency and trust.

Scaling Demand Forecasting at Nikon: Automating Camera Accessories Sales Planning with Databricks

2025-06-10 Watch
talk
Heya Ouyang (Nikon Corporation)

At Nikon, camera accessories are essential in meeting the diverse needs of professional photographers worldwide, making their timely availability a priority. Forecasting accessories, however, presents unique challenges including dependencies on parent products, sparse demand patterns, and managing predictions for thousands of items across global subsidiaries. To address this, we leveraged Databricks' unified data and AI platform to develop and deploy an automated, scalable solution for accessory sales planning. Our solution employs a hybrid approach that auto-selects the best algorithm from a suite of ML and time-series models, incorporating anomaly detection and methods to handle sparse and low-demand scenarios. MLflow is utilized to automate model logging and versioning, enabling efficient management and scalable deployment. The framework includes data preparation, model selection and training, performance tracking, prediction generation, and output processing for downstream systems.

Scaling XGBoost With Spark Connect ML on Grace Blackwell

2025-06-10 Watch
talk
Bobby Wang (NVIDIA) , Jiaming Yuan (NVIDIA Semiconductor Co., Ltd)

XGBoost is one of the leading off-the-shelf gradient-boosting algorithms for analyzing tabular datasets. Unlike deep learning, gradient-boosted decision trees require the entire dataset to be in memory for efficient model training. To overcome this limitation, XGBoost features a distributed out-of-core implementation that fetches data in batches, which benefits significantly from the latest NVIDIA GPUs and NVLink-C2C's ultra-high bandwidth. In this talk, we will share our work on optimizing XGBoost using the Grace Blackwell superchip. The fast chip-to-chip link between the CPU and the GPU enables XGBoost to scale up without compromising performance. Our work has effectively increased XGBoost's training capacity to over 1.2TB on a single node. The approach is scalable to GPU clusters using Spark, enabling XGBoost to handle terabytes of data efficiently. We will demonstrate combining XGBoost's out-of-core algorithms with the new Spark Connect ML in Spark 4.0 for large model training workflows.
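The core of the out-of-core approach is that training statistics are accumulated batch by batch, so the full dataset never needs to be resident in memory. A minimal plain-Python sketch of that access pattern (an illustration of the idea only, not XGBoost's actual external-memory DataIter API):

```python
# Sketch of the out-of-core pattern: statistics are accumulated batch by
# batch, so only one chunk of the dataset is resident in memory at a time.
# This illustrates the idea; it is not XGBoost's actual DataIter API.

def batched(rows, batch_size):
    """Yield fixed-size chunks, standing in for fetching batches from disk."""
    for i in range(0, len(rows), batch_size):
        yield rows[i:i + batch_size]

def histogram_sums(batches, n_bins=4, lo=0.0, hi=1.0):
    """Accumulate per-bin counts incrementally, one batch at a time,
    the same access pattern histogram-based tree building relies on."""
    counts = [0] * n_bins
    width = (hi - lo) / n_bins
    for batch in batches:
        for x in batch:
            b = min(int((x - lo) / width), n_bins - 1)
            counts[b] += 1
    return counts

data = [i / 100 for i in range(100)]           # pretend this lives on disk
out_of_core = histogram_sums(batched(data, batch_size=16))
in_memory = histogram_sums([data])             # single full-dataset "batch"
assert out_of_core == in_memory                # identical statistics
```

Because the per-batch accumulation is exact, the histograms (and hence the trees built from them) match the in-memory result; the batch size only controls peak memory.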

Spark 4.0 and Delta 4.0 For Streaming Data

2025-06-10 Watch
talk
Bryce Bartmann (Shell)

Real-time data is among the most important datasets for any data and AI platform, across any industry. Spark 4.0 and Delta 4.0 include new features that make ingestion and querying of real-time data better than ever before, such as:
- Python custom data sources, for simple ingestion of streaming and batch time-series data sources
- Spark Variant types, for managing the variable data types and JSON payloads that are common in the real-time domain
- Delta liquid clustering, for simple data clustering without the overhead or complexity of partitioning
In this presentation you will learn how data teams can leverage these latest features to build industry-leading, real-time data products using Spark and Delta, with real-world examples and metrics of the improvements they bring to performance and processing of real-time data.

Spark Connect: Flexible, Local Access to Apache Spark at Scale

2025-06-10 Watch
talk
James Malone (Databricks)

What if you could run Spark jobs without worrying about clusters, versions and upgrades? Did you know Spark has this functionality built in today? Join us to dig into this functionality, Spark Connect, and how it works: abstracting Spark clusters away in favor of the DataFrame API and unresolved logical plans. You will learn some of the cool things Spark Connect unlocks, including:
- Moving from thinking about clusters to just thinking about jobs
- Making Spark code more portable and platform-agnostic
- Enabling support for languages such as Go

Unlocking the Future of Dairy Farming: Leveraging Data Marketplaces at Lely

2025-06-10 Watch
talk
Simon Krejci (Lely) , Bulut Ficici (Lely)

Lely, a Dutch company specializing in dairy farming robotics, helps farmers with advanced solutions for milking, feeding and cleaning. This session explores Lely’s implementation of an Internal Data Marketplace, built around Databricks' Private Exchange Marketplace. The marketplace serves as a central hub for data teams and business users, offering seamless access to data, analytics and dashboards. Powered by Delta Sharing, it enables secure, private listing of data products across business domains, including notebooks, views, models and functions. This session covers the pros and cons of this approach, best practices for setting up a data marketplace and its impact on Lely’s operations. Real-world examples and insights will showcase the potential of integrating data-driven solutions into dairy farming. Join us to discover how data innovation drives the future of dairy farming through Lely’s experience.

AI Agents in Action: Structuring Unstructured Data on Demand With Databricks and Unstructured

2025-06-10 Watch
lightning_talk
Christopher Maddock (Unstructured)

LLM agents aren’t just answering questions — they’re running entire workflows. In this talk, we’ll show how agents can autonomously ingest, process and structure unstructured data using Unstructured, with outputs flowing directly into Databricks. Powered by the Model Context Protocol (MCP), agents can interface with Unstructured’s full suite of capabilities — discovering documents across sources, building ephemeral workflows and exporting structured insights into Delta tables. We’ll walk through a demo where an agent responds to a natural language request, dynamically pulls relevant documents, transforms them into usable data and surfaces insights — fast. Join us for a sneak peek into the future of AI-native data workflows, where LLMs don’t just assist — they operate.

Breaking Silos: Cigna’s Journey to Seamless Data Sharing with Delta Sharing

2025-06-10 Watch
lightning_talk
Jay Ehlen (Evernorth Health Services) , Nick De Young (The Cigna Group)

As data ecosystems grow increasingly complex, the ability to share data securely, seamlessly, and in real time has become a strategic differentiator. In this session, Cigna will showcase how Delta Sharing on Databricks has enabled them to modernize data delivery, reduce operational overhead, and unlock new market opportunities. Learn how Cigna achieved significant savings by streamlining operations, compute, and platform overhead for just one use case. Explore how decentralizing data ownership—transitioning from hyper-centralized teams to empowered product owners—has simplified delivery and accelerated innovation. Most importantly, see how this modern open data-sharing framework has positioned Cigna to win contracts they previously couldn’t, by enabling real-time, cross-organizational data collaboration with external partners. Join us to hear how Cigna is using Delta Sharing not just as a technical enabler, but as a business catalyst.

Cracking Complex Documents with Databricks Mosaic AI

2025-06-10 Watch
lightning_talk
Gavi Regunath (Advancing Analytics)

In this session, we will share how we are transforming the way organizations process unstructured and non-standard documents using Mosaic AI and agentic patterns within the Databricks ecosystem. We have developed a scalable pipeline that turns complex legal and regulatory content into structured, tabular data. We will walk through the full architecture, which includes Unity Catalog for secure and governed data access, Databricks Vector Search for intelligent indexing and retrieval, and Databricks Apps to deliver clear insights to business users. The solution supports multiple languages and formats, making it suitable for teams working across different regions. We will also discuss some of the key technical challenges we addressed, including handling parsing inconsistencies, grounding model responses and ensuring traceability across the entire process. If you are exploring how to apply GenAI and large language models, this session is for you. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

From Spaghetti Bowl Pipeline to Lakeflow Declarative Pipelines Efficiency

2025-06-10
lightning_talk
Peter Jones (Intermountain Healthcare)

In today's data-driven world, the ability to efficiently manage and transform data is crucial for any organization. This presentation will explore the process of converting a complex and messy workflow into a clean and simple Lakeflow Declarative Pipeline at a large integrated health system, Intermountain Health. Alteryx is a powerful tool for data preparation and blending, but as workflows grow in complexity, they can become difficult to manage and maintain. Lakeflow Declarative Pipelines, on the other hand, offers a more democratized, streamlined and scalable approach to data engineering, leveraging the power of Apache Spark and Delta Lake. We will begin by examining a typical legacy workflow, identifying common pain points such as tangled logic, performance bottlenecks and maintenance challenges. Next, we will demonstrate how to translate this workflow into a Lakeflow Declarative Pipeline, highlighting key steps such as data transformation, validation and delivery.

Learn to Program Not Write Prompts with DSPy

2025-06-10 Watch
lightning_talk
Austin Choi (Databricks)

Writing prompts for our GenAI applications is tedious, and the results are unmaintainable. A proper software development lifecycle requires proper testing and maintenance, something incredibly difficult to do on a block of text. Our current prompt engineering best practices have largely been manual trial and error, testing which of our prompts work well in certain situations. This process worsens as our prompts become more complex, adding multiple tasks and functionality within one long, singular prompt. Enter DSPy, your PROGRAMMATIC way of building GenAI applications. Learn how DSPy allows you to modularize your prompt into modules and enforce typing through signatures. Then, utilize state-of-the-art algorithms to optimize the prompts and weights against your evaluation datasets, just like machine learning! We will compare DSPy to a restaurant to help illustrate and demo DSPy’s capabilities. It's time to start programming, rather than prompting, again!
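DSPy's actual API differs, but its central move, replacing one long prompt string with typed signatures and composable modules, can be illustrated in plain Python. The Signature and Module classes below are invented for illustration, not DSPy's real classes:

```python
from dataclasses import dataclass

# Illustration of DSPy's core idea in plain Python: a typed "signature"
# declares a task's inputs and outputs, and a "module" owns one task's
# prompt, so prompts become small testable units instead of one long
# string. These classes are invented; DSPy's real API differs.

@dataclass
class Signature:
    inputs: tuple
    outputs: tuple

class Module:
    def __init__(self, signature, template):
        self.signature = signature
        self.template = template  # the optimizable "weight" of the module

    def render(self, **kwargs):
        missing = set(self.signature.inputs) - set(kwargs)
        if missing:  # enforce the declared inputs at the boundary
            raise ValueError(f"missing inputs: {missing}")
        return self.template.format(**kwargs)

summarize = Module(
    Signature(inputs=("document",), outputs=("summary",)),
    "Summarize in one sentence: {document}",
)
prompt = summarize.render(document="DSPy modularizes prompts.")
```

An optimizer in this picture would rewrite each module's `template` against an evaluation set, which is only tractable because each module does exactly one task.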

Petabyte-Scale On-Chain Insights: Real-Time Intelligence for the Next-Gen Financial Backbone

2025-06-10 Watch
lightning_talk
Leo Liang (CipherOwl Inc)

We’ll explore how CipherOwl Inc. constructed a near real-time, multi-chain data lakehouse to power anti-money laundering (AML) monitoring at petabyte scale. We will walk through the end-to-end architecture, which integrates cutting-edge open-source technologies and AI-driven analytics to handle massive on-chain data volumes seamlessly. Off-chain intelligence complements this to meet rigorous AML requirements. At the core of our solution is ChainStorage, an open-source project started by Coinbase that provides robust blockchain data ingestion and block-level serving. We enhanced it with Apache Spark™ and Apache Arrow™ for high-throughput processing and efficient data serialization, backed by Delta Lake and Kafka. For the serving layer, we employ StarRocks to deliver lightning-fast SQL analytics over vast datasets. Finally, our system incorporates machine learning and AI agents for continuous data curation and near real-time insights, which are crucial for tackling on-chain AML challenges.

Sponsored by: Deloitte | Accelerating Biopharmaceutical Breakthroughs with an Innovative Enterprise Data Strategy

2025-06-10 Watch
lightning_talk
Shri Chary (Deloitte)

In the rapidly evolving life sciences and healthcare industry, leveraging data-as-a-product is crucial for driving innovation and achieving business objectives. Join us to explore how Deloitte is revolutionizing data strategy solutions by overcoming challenges such as data silos, poor data quality, and lack of real-time insights with the Databricks Data Intelligence Platform. Learn how effective data governance, seamless data integration, and scalable architectures support personalized medicine, regulatory compliance, and operational efficiency. This session will highlight how these strategies enable biopharma companies to transform data into actionable insights, accelerate breakthroughs and enhance life sciences outcomes.

Sponsored by: Neo4j | Get Your Data AI-Ready: Knowledge Graphs & GraphRAG for GenAI Success

2025-06-10 Watch
lightning_talk
Pramod Borkar (Neo4j)

Enterprise-grade GenAI needs a unified data strategy for accurate, reliable results. Learn how knowledge graphs make structured and unstructured data AI-ready while enabling governance and transparency. See how GraphRAG (retrieval-augmented generation with knowledge graphs) drives real success: companies like Klarna have deployed GenAI chatbots grounded in knowledge graphs, improving productivity and trust, while a major gaming company achieved 10x faster insights. We’ll share real examples and practical steps for successful GenAI deployment.
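As a toy sketch of the GraphRAG retrieval step (the graph contents below are invented for illustration), grounding a response means pulling an entity's neighborhood of facts from the knowledge graph before generation, so every claim traces back to an explicit edge:

```python
# Toy GraphRAG retrieval step: before generating an answer, fetch the
# entity's neighborhood of (subject, relation, object) facts from a
# knowledge graph so the LLM is grounded in explicit, traceable
# relationships. Graph contents are invented for illustration.

graph = {
    "Klarna": [("industry", "fintech"), ("deployed", "GenAI chatbot")],
    "GenAI chatbot": [("grounded_in", "knowledge graph")],
}

def retrieve_neighborhood(entity, depth=1):
    """Collect triples reachable within `depth` hops of the entity."""
    triples, frontier = [], [entity]
    for _ in range(depth):
        next_frontier = []
        for node in frontier:
            for rel, obj in graph.get(node, []):
                triples.append((node, rel, obj))
                next_frontier.append(obj)
        frontier = next_frontier
    return triples

# Triples like these would be serialized into the prompt as context.
context = retrieve_neighborhood("Klarna", depth=2)
```

In a production system the graph lives in a database such as Neo4j and retrieval is a graph query rather than a Python walk, but the grounding step has this shape.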

Sponsored by: SAP | SAP and Databricks Open a Bold New Era of Data and AI

2025-06-10 Watch
lightning_talk
H Nair (SAP)

SAP and Databricks have formed a landmark partnership that brings together SAP's deep expertise in mission-critical business processes and semantically rich data with Databricks' industry-leading capabilities in AI, machine learning, and advanced data engineering. From curated, SAP-managed data products to zero-copy Delta Sharing integration, discover how SAP Business Data Cloud empowers data and AI professionals to build AI solutions that unlock unparalleled business insights using trusted business data.

Sponsored by: Sigma | Flogistix by Flowco, and the Role of Data in Responsible Energy Production

2025-06-10 Watch
lightning_talk
Ali Sylvester (Flogistix) , Danny Burrows (Flowco)

As global energy demands continue to rise, organizations must boost efficiency while staying environmentally responsible. Flogistix uses Sigma and Databricks to build a unified data architecture for real-time, data-driven decisions in vapor recovery systems. With Sigma on the Databricks Data Intelligence Platform, Flogistix gains precise operational insights and identifies optimization opportunities that reduce emissions, streamline workflows, and meet industry regulations. This empowers everyone, from executives to field mechanics, to drive sustainable resource production. Discover how advanced analytics are transforming energy practices for a more responsible future.

AI and Genie: Analyzing Healthcare Improvement Opportunities

2025-06-10 Watch
talk
Jay Sharma (Premier Inc) , Tim Riddle (Premier Inc)

This session is repeated. Improving healthcare impacts us all. We highlight how Premier Inc. took risk-adjusted patient data from more than 1,300 member hospitals across America, applying a natural language interface using AI/BI Genie to allow our users to discover new insights. The stakes are high: each new insight surfaced represents potential care improvement and lives positively impacted. Using Genie and our AI-ready data in Unity Catalog, our team was able to stand up a Genie instance in three short days, bypassing the cost and time of custom modeling and application development. Additionally, Genie allowed our internal teams to generate complex SQL as much as 10 times faster than writing it by hand. As Genie and lakehouse apps continue to advance rapidly, we are excited to leverage these features by introducing Genie to as many as 20,000 users across hundreds of hospitals. This will support our members’ ongoing mission to enhance the care they provide to the communities they serve.

Apache Iceberg with Unity Catalog at HelloFresh

2025-06-10 Watch
talk
Max Schultze (HelloFresh) , Adam Komisarek (HelloFresh)

Table formats like Delta Lake and Iceberg have been game changers for pushing lakehouse architecture into modern enterprises. The acquisition of Tabular added Iceberg to the Databricks ecosystem, an open format that was already well supported by processing engines across the industry. At HelloFresh we are building a lakehouse architecture that integrates many touchpoints and technologies across the organization, so we chose Iceberg as the table format to bridge the gaps in our decentrally managed tech landscape. We are leveraging Unity Catalog as our Iceberg REST catalog of choice for storing metadata and managing tables. In this talk we will outline our architectural setup between Databricks, Spark, Flink and Snowflake and explain the native Unity Iceberg REST catalog, as well as catalog federation towards connected engines. We will highlight the impact on our business and discuss the advantages and lessons learned from our early-adopter experience.
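For readers curious what such a setup looks like, the sketch below shows the general shape of the Spark configuration an external engine would use to read Unity Catalog tables over its Iceberg REST interface. The property names, endpoint path and auth mechanism here are assumptions to verify against current Databricks and Apache Iceberg documentation; the workspace URL and token are placeholders:

```python
# Sketch of Spark catalog properties for Unity Catalog's Iceberg REST
# interface. Property names, the endpoint path, and the auth mechanism
# are assumptions to verify against current Databricks and Iceberg
# docs; <workspace-url> and the token are placeholders.

iceberg_rest_conf = {
    "spark.sql.catalog.unity": "org.apache.iceberg.spark.SparkCatalog",
    "spark.sql.catalog.unity.type": "rest",
    "spark.sql.catalog.unity.uri":
        "https://<workspace-url>/api/2.1/unity-catalog/iceberg",
    "spark.sql.catalog.unity.token": "<personal-access-token>",
}
# Each pair would be passed via spark-submit --conf or SparkSession config.
```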

AT&T AutoClassify: Unified Multi-Head Binary Classification From Unlabeled Text

2025-06-10 Watch
talk
Hien Lam (AT&T) , Colton Peltier (Databricks)

We present AT&T AutoClassify, built jointly by AT&T's Chief Data Office (CDO) and Databricks Professional Services: a novel end-to-end system for automatic multi-head binary classification of unlabeled text data. Our approach automates the challenge of creating labeled datasets and training multi-head binary classifiers with minimal human intervention. Starting only from a corpus of unlabeled text and a list of desired labels, AT&T AutoClassify leverages advanced natural language processing techniques to automatically mine relevant examples from raw text, fine-tune embedding models and train individual classifier heads for multiple true/false labels. This approach can reduce LLM classification costs by 1,000x, making it highly efficient in operational cost. The end result is a highly optimized, low-cost model, servable in Databricks, capable of taking raw text and producing multiple binary classifications. An example use case using call transcripts will be examined.
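The multi-head shape of such a system, one shared text representation feeding several independent binary heads, can be sketched in plain Python. The seed keywords and labels below are invented, and the real system mines examples and fine-tunes embedding models rather than matching keywords:

```python
# Sketch of the multi-head pattern: one shared text representation feeds
# several independent binary heads, one per label. Seed keywords and
# labels are invented; the real system mines training examples and
# fine-tunes embeddings rather than matching keywords.

def embed(text):
    """Stand-in shared representation: a lowercase bag of words."""
    return set(text.lower().split())

class KeywordHead:
    """One binary head, weakly supervised by seed keywords for its label."""
    def __init__(self, seed_keywords):
        self.keywords = set(seed_keywords)

    def predict(self, features):
        return bool(features & self.keywords)

heads = {                      # multiple true/false labels, one head each
    "billing": KeywordHead({"invoice", "charge", "refund"}),
    "outage":  KeywordHead({"down", "outage", "disconnected"}),
}

def classify(text):
    features = embed(text)     # computed once, shared across all heads
    return {label: head.predict(features) for label, head in heads.items()}

result = classify("Customer asked for a refund after the outage")
```

The cost advantage in the abstract comes from this structure: the expensive representation is computed once per document, and each extra label adds only a tiny head rather than another LLM call.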

Bayada’s Snowflake-to-Databricks Migration: Transforming Data for Speed & Efficiency

2025-06-10 Watch
talk
Venkatesh Guruprasad (BAYADA Home Health Care) , PradeepKumar jain Vimalraj (Tredence Inc) , Elaine O'Neill (BAYADA Home Health Care)

Bayada is transforming its data ecosystem by consolidating Matillion+Snowflake and SSIS+SQL Server into a unified Enterprise Data Platform powered by Databricks. Using Databricks' Medallion architecture, this platform enables seamless data integration, advanced analytics and machine learning across critical domains like general ledger, recruitment and activity-based costing. Databricks was selected for its scalability, real-time analytics and ability to handle both structured and unstructured data, positioning Bayada for future growth. The migration aims to reduce data processing times by 35%, improve reporting accuracy and cut reconciliation efforts by 40%. Operational costs are projected to decrease by 20%, while real-time analytics is expected to boost efficiency by 15%. Join this session to learn how Bayada is leveraging Databricks to build a high-performance data platform that accelerates insights, drives efficiency and fosters innovation organization-wide.

Beyond Simple RAG: Unlocking Quality, Scale and Cost-Efficient Retrieval With Mosaic AI Vector Search

2025-06-10 Watch
talk
Ankit Vij (Databricks) , Adam Gurary (Databricks)

This session is repeated. Mosaic AI Vector Search is powering high-accuracy retrieval systems in production across a wide range of use cases — including RAG applications, entity resolution, recommendation systems and search. Fully integrated with the Databricks Data Intelligence Platform, it eliminates pipeline maintenance by automatically syncing data from source to index. Over the past year, customers have asked for greater scale, better quality out-of-the-box and cost-efficient performance. This session delivers on those needs — showcasing best practices for implementing high-quality retrieval systems and revealing major product advancements that improve scalability, efficiency and relevance. What you’ll learn:
- How to optimize Vector Search with hybrid retrieval and reranking for better out-of-the-box results
- Best practices for managing vector indexes with minimal operational overhead
- Real-world examples of how organizations have scaled and improved their search and recommendation systems
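Hybrid retrieval in this sense blends a lexical relevance score with a vector-similarity score and then reranks by the combination. A minimal plain-Python sketch, where both scorers are deliberately crude stand-ins for BM25 and a learned embedding model, and the corpus is invented:

```python
import math

# Minimal hybrid-retrieval sketch: a crude lexical score and a crude
# bag-of-words "embedding" cosine are blended, then documents are
# reranked by the combined score. Both scorers are stand-ins for BM25
# and a learned embedding model; the corpus is invented.

docs = [
    "delta sharing enables secure data collaboration",
    "vector search powers retrieval for rag applications",
    "unity catalog governs tables and models",
]

def lexical_score(query, doc):
    q, d = set(query.split()), set(doc.split())
    return len(q & d) / len(q)            # fraction of query terms matched

def embedding(text):
    vec = {}
    for word in text.split():             # bag-of-words stand-in embedding
        vec[word] = vec.get(word, 0) + 1
    return vec

def cosine(a, b):
    dot = sum(a[k] * b.get(k, 0) for k in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def hybrid_search(query, k=2, alpha=0.5):
    """Blend lexical and vector scores, then rerank by the combination."""
    scored = [
        (alpha * lexical_score(query, d)
         + (1 - alpha) * cosine(embedding(query), embedding(d)), d)
        for d in docs
    ]
    return [d for _, d in sorted(scored, reverse=True)[:k]]

top = hybrid_search("retrieval for rag")
```

The `alpha` weight is the key tuning knob: it trades exact-term precision against semantic recall, which is why out-of-the-box quality improves when the platform handles that blending for you.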

Building a Seamless Multi-Cloud Platform for Secure Portable Workloads

2025-06-10 Watch
talk
James Burns (Databricks) , Scott Reynolds (Databricks)

There are many challenges in making a data platform actually a platform: something that hides complexity. Data engineers and scientists are looking for a simple and intuitive abstraction so they can focus on their work, not on where it runs to maintain compliance, what credentials it uses to access data, or how it generates operational telemetry. At Databricks we’ve developed a data-centric approach to workload development and deployment that enables data workers to stop doing migrations and instead develop with confidence. Attend this session to learn how to run simple, secure and compliant global multi-cloud workloads at scale on Databricks.

Chaos to Clarity: Secure, Scalable, and Governed SaaS Ingestion through Lakeflow Connect and more

2025-06-10 Watch
talk
Krishna Bhupatiraju (Databricks) , Prashant Gupta (Databricks)

Ingesting data from SaaS systems sounds straightforward—until you hit API limits, miss SLAs, or accidentally ingest PII. Sound familiar? In this talk, we’ll share how Databricks evolved from scrappy ingestion scripts to a unified, secure, and scalable ingestion platform. Along the way, we’ll highlight the hard lessons, the surprising pitfalls, and the tools that helped us level up. Whether you’re just starting to wrangle third-party data or looking to scale while handling governance and compliance, this session will help you think beyond pipelines and toward platform thinking.

Crafting Business Brilliance: Leveraging Databricks SQL for Next-Gen Applications

2025-06-10 Watch
talk
Mohammad Shalchi (Haleon) , Wasim Ahmad (Databricks)

At Haleon, we've leveraged Databricks APIs and serverless compute to develop customer-facing applications for our business. This innovative solution enables us to efficiently deliver SAP invoice and order management data through front-end applications developed and served via our API Gateway. The Databricks lakehouse architecture has been instrumental in eliminating the friction associated with directly accessing SAP data from operational systems, while enhancing our performance capabilities. Our system achieved response times of less than 3 seconds from the API call, with ongoing efforts to optimise this performance. This architecture not only streamlines our data and application ecosystem but also paves the way for integrating GenAI capabilities with robust governance measures for our future infrastructure. The implementation of this solution has yielded significant benefits, including a 15% reduction in customer service costs and a 28% increase in productivity for our customer support team.