talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

105

Filtering by: Data Lakehouse

Sessions & talks

Showing 1–25 of 105 · Newest first

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

2025-06-12 Watch
lightning_talk
Ricky D'Sa (Addepar)

Addepar possesses an enormous private-investment data set, with 40% of the $7T in assets on the platform allocated to alternatives. Leveraging the Addepar Data Lakehouse (ADL), built on Databricks, we have built a scalable data pipeline that assesses millions of private fund investment cash flows and translates them into a private fund benchmarks data offering. Investors on the Addepar platform can leverage this data, seamlessly integrated with their portfolio investments, to obtain actionable investment insights. At a high level, the offering consists of extensive data aggregation, filtering, and construction logic that updates dynamically for clients through Databricks job workflows. The derived dataset has gone through several iterations with investment strategists and academics, who accessed it via Delta Sharing tables. Irrespective of the data source, the pipeline coalesces all relevant cash flow activity against a unique identifier before constructing the benchmarks.
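
The coalescing step described in the abstract can be sketched in plain Python. The record shape and field names (fund_id, date, amount) are illustrative assumptions, not Addepar's actual schema:

```python
from collections import defaultdict

def coalesce_cash_flows(records):
    """Group raw cash-flow records from any source by a unique fund
    identifier, summing same-day flows — a stand-in for the coalescing
    step described in the talk (field names are hypothetical)."""
    by_fund = defaultdict(lambda: defaultdict(float))
    for rec in records:
        by_fund[rec["fund_id"]][rec["date"]] += rec["amount"]
    # Return per-fund flows sorted by date, ready for benchmark math.
    return {fund: sorted(flows.items()) for fund, flows in by_fund.items()}

flows = coalesce_cash_flows([
    {"fund_id": "F1", "date": "2024-01-02", "amount": -100.0},  # capital call
    {"fund_id": "F1", "date": "2024-01-02", "amount": -50.0},   # same-day call
    {"fund_id": "F1", "date": "2024-06-30", "amount": 180.0},   # distribution
])
```

In production this would be a Spark job on ADL; the sketch only shows the shape of the aggregation.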

Sponsored by: C2S Technologies Inc. | Qbeast: Lakehouse Acceleration as a Service

2025-06-12 Watch
lightning_talk

While modern lakehouse architectures and open table formats provide flexibility, they are often challenging to manage: data layouts, clustering, and small files all need attention for efficiency. Qbeast’s platform-independent, patented multi-column indexing optimizes lakehouse data layout, accelerates queries, and sharply reduces compute cost — without disrupting existing architectures. Qbeast also handles high-cardinality clustering and supports incremental updates. Join us to explore how Qbeast enables efficient, scalable, AI-ready data infrastructure — reducing compute costs independent of data platform and compute engine.
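
Qbeast's patented index is proprietary, but the general idea of multi-column clustering can be illustrated with a toy Z-order (bit-interleaving) key. This is a generic technique, not Qbeast's algorithm:

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Toy Z-order key: interleave the bits of two column values so rows
    close in both dimensions get nearby keys. Illustrates multi-column
    clustering in general; Qbeast's actual index works differently."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)       # x occupies even bit slots
        z |= ((y >> i) & 1) << (2 * i + 1)   # y occupies odd bit slots
    return z

# Sorting by the interleaved key co-locates rows that are close in BOTH
# columns, which helps data skipping on filters over either column.
rows = [(3, 5), (0, 0), (3, 4), (1, 1)]
clustered = sorted(rows, key=lambda r: interleave_bits(*r))
```

Linear sort keys over one column only help filters on that column; an interleaved key spreads skipping benefit across all indexed columns.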

Welcome Lakehouse, from a DWH transformation to an M&A data sharing

2025-06-12 Watch
lightning_talk
Gianfranco Arena (DXC Technology)

At DXC, we helped our customer Fastweb with their "Welcome Lakehouse" project, a data warehouse transformation from on-premises to Databricks on AWS. But the implementation became something more. Thanks to features such as Lakehouse Federation and Delta Sharing, from the first day of the Fastweb+Vodafone merger we have been able to connect two different platforms with ease and let the business focus on the value of data rather than on IT integration. This session will feature our customer Alessandro Gattolin of Fastweb, who will speak about the experience.

Daft and Unity Catalog: A Multimodal/AI-Native Lakehouse

2025-06-12 Watch
talk
Jay Chia (Eventual)

Modern data organizations have moved beyond big data analytics to also incorporate advanced AI/ML data workloads. These workflows often involve multimodal datasets containing documents, images, long-form text, embeddings, URLs and more. Unity Catalog is an ideal solution for organizing and governing this data at scale. When paired with the Daft open source data engine, you can build a truly multimodal, AI-ready data lakehouse. In this session, we’ll explore how Daft integrates with Unity Catalog’s core features (such as volumes and functions) to enable efficient, AI-driven data lakehouses. You will learn how to ingest and process multimodal data (images, text and videos), run AI/ML transformations and feature extractions at scale, and maintain full control and visibility over your data with Unity Catalog’s fine-grained governance.

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

2025-06-12 Watch
talk
Jonathan Brito (Databricks)

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.

Rust and Lakehouse Format — Ask Us Anything

2025-06-12
lightning_talk
Denny Lee (Databricks), Robert Pack (Databricks), Tyler Croy (Scribd, Inc.)

Join us for an in-depth Ask Me Anything (AMA) on how Rust is revolutionizing Lakehouse formats like Delta Lake and Apache Iceberg through projects like delta-rs and iceberg-rs! Discover how Rust’s memory safety, zero-cost abstractions and fearless concurrency unlock faster development and higher-performance data operations. Whether you’re a data engineer, Rustacean or Lakehouse enthusiast, bring your questions on how Rust is shaping the future of open table formats!

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

2025-06-12 Watch
lightning_talk
Gabriel Lyons (DataHub)

While Databricks powers your data lakehouse, DataHub delivers the critical context layer connecting your entire ecosystem. We'll demonstrate how DataHub extends Unity Catalog to provide comprehensive metadata intelligence across platforms. With DataHub's real-time platform you can:

- Cut AI model time-to-market with unified REST and GraphQL APIs that ensure models train on reliable and compliant data from across platforms, with complete lineage tracking
- Decrease data incidents by 60% using an event-driven architecture that instantly propagates changes across systems
- Transform data discovery from days to minutes with AI-powered search and natural language interfaces

Leaders use DataHub to transform Databricks data into integrated insights that drive business value. See our demo of syncback technology—detecting sensitive data and enforcing Databricks access controls automatically—plus our AI assistant that enhances LLMs with cross-platform metadata.

Sponsored by: definity | How You Could Be Saving 50% of Your Spark Costs

2025-06-12 Watch
lightning_talk
Roy Daniel (definity)

Enterprise lakehouse platforms are rapidly scaling – and so are complexity and cost. After monitoring over 1B vCore-hours across Databricks and other Apache Spark™ environments, we consistently saw resource waste, preventable data incidents, and painful troubleshooting. Join this session to discover how definity’s unique full-stack observability provides job-level visibility in-motion, unifying infrastructure performance, pipeline execution, and data behavior, and see how enterprise teams use definity to easily optimize jobs and save millions – while proactively ensuring SLAs, preventing issues, and simplifying RCA.

Eliminate Hops in Your Streaming Architecture with Zerobus, Part of Lakeflow Connect

2025-06-12
talk
Victoria Bukta (Databricks), Nikola Obradovic (Databricks)

In this session, we’ll introduce Zerobus Direct Write API, part of Lakeflow Connect, which enables you to push data directly to your lakehouse and simplify ingestion for IoT, clickstreams, telemetry, and more. We’ll start with an overview of the ingestion landscape to date. Then, we'll cover how you can “shift left” with Zerobus, embedding data ingestion into your operational systems to make analytics and AI a core component of the business, rather than an afterthought. The result is a significantly simpler architecture that scales your operations, using this new paradigm to skip unnecessary hops. We'll also highlight one of our early customers, Joby Aviation, and how they use Zerobus. Finally, we’ll provide a framework to help you understand when to use Zerobus versus other ingestion offerings—and we’ll wrap up with a live Q&A so that you can hit the ground running with your own use cases.

Sponsored by: Soda Data Inc. | Clean Energy, Clean Data: How Data Quality Powers Decarbonization

2025-06-12 Watch
lightning_talk
Robert Young (BDO Canada)

Drawing on BDO Canada’s deep expertise in the electricity sector, this session explores how clean energy innovation can be accelerated through a holistic approach to data quality. Discover BDO’s practical framework for implementing data quality and rebuilding trust in data through a structured, scalable approach. BDO will share a real-world example of monitoring data at scale—from high-level executive dashboards to the details of daily ETL and ELT pipelines. Learn how they leveraged Soda’s data observability platform to unlock near-instant insights, and how they moved beyond legacy validation pipelines with built-in checks across their production Lakehouse. Whether you're a business leader defining data strategy or a data engineer building robust data products, this talk connects the strategic value of clean data with actionable techniques to make it a reality.
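
Soda's checks are declared in its own configuration language (SodaCL) and evaluated by its platform; as a rough stand-in, here is what one such built-in check (a null-rate threshold) amounts to in plain Python, with invented meter data and a hypothetical 5% threshold:

```python
def null_rate(rows, column):
    """Fraction of rows where `column` is missing — the kind of metric a
    data-quality monitor evaluates against a threshold. Pure-Python
    stand-in, not Soda's actual implementation."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)

readings = [
    {"meter_id": "m1", "kwh": 12.5},
    {"meter_id": "m2", "kwh": None},   # failed sensor read
    {"meter_id": "m3", "kwh": 9.1},
]
rate = null_rate(readings, "kwh")
check_passed = rate <= 0.05  # hypothetical quality threshold: at most 5% nulls
```

In a real deployment such checks run inside the production pipelines, with failures surfaced to dashboards rather than computed ad hoc.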

Building Responsible AI Agents on Databricks

2025-06-12 Watch
talk
Pavithra Rao (Databricks), Yassine Essawabi (Databricks)

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AI/BI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.
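
As a rough sketch of the fairness monitoring described above, here is a minimal demographic selection-rate check in plain Python. The group labels, decisions, and four-fifths threshold are illustrative assumptions, not Lakehouse Monitoring's actual API:

```python
def selection_rates(decisions):
    """Approval rate per demographic group — the sort of fairness metric
    the talk describes tracking, computed here by hand for illustration."""
    totals, approved = {}, {}
    for group, ok in decisions:
        totals[group] = totals.get(group, 0) + 1
        approved[group] = approved.get(group, 0) + (1 if ok else 0)
    return {g: approved[g] / totals[g] for g in totals}

decisions = [("A", True), ("A", True), ("A", False),
             ("B", True), ("B", False), ("B", False)]
rates = selection_rates(decisions)
# Four-fifths rule: alert if any group's approval rate falls below
# 80% of the highest group's rate.
alert = min(rates.values()) < 0.8 * max(rates.values())
```

A monitoring system would evaluate this continuously over fresh decisions and route the alert to an on-call channel.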

Get the Most Out of Your Delta Lake

2025-06-12 Watch
lightning_talk
Youssef Mrini (Databricks)

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

2025-06-12 Watch
talk
Tim Kessler (Redox, Inc.), Matthew Giglia (Databricks)

The direct Redox & Databricks integration can streamline your interoperability workflows, from responding in record time to preauthorization requests to letting attending physicians know, in near real time from ADTs, about a change in risk for sepsis and readmission. Data engineers will learn how to create fully streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once the data is available in the Lakehouse, AI/BI Dashboards and agentic frameworks help write FHIR messages back to Redox for direct pushdown to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL warehouses. We'll also use the Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox-integrated EMRs, and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.
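
The talk's pipeline parses FHIR bundles with SQL and the VARIANT type on Databricks; the same path navigation can be sketched with Python's stdlib json module. The bundle fragment below follows the FHIR Bundle structure but its payload values are invented:

```python
import json

# Minimal FHIR bundle fragment (structure per the FHIR spec; values invented).
bundle_json = """
{
  "resourceType": "Bundle",
  "type": "message",
  "entry": [
    {"resource": {"resourceType": "Patient", "id": "p1"}},
    {"resource": {"resourceType": "Observation", "id": "o1",
                  "code": {"text": "sepsis risk"}}}
  ]
}
"""

def resources_of_type(bundle: dict, resource_type: str):
    """Pull all resources of one type out of a bundle — the same kind of
    path navigation VARIANT columns allow with SQL operators."""
    return [
        e["resource"] for e in bundle.get("entry", [])
        if e.get("resource", {}).get("resourceType") == resource_type
    ]

bundle = json.loads(bundle_json)
observations = resources_of_type(bundle, "Observation")
```

On Databricks the equivalent extraction would be a SQL expression over a VARIANT column rather than Python; the sketch only shows the shape of the data.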

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

2025-06-12 Watch
lightning_talk
Alex Dean (Snowplow)

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved:

- Cross-platform data unification: a unified view of player actions across web, mobile and in-game
- Real-time analytics: streaming event data into Delta Lake for dynamic game balancing and engagement
- Scalable infrastructure: supporting terabytes of data during launches and live events
- AI & ML use cases: churn prediction and personalized in-game recommendations

This session explores Supercell’s data journey and AI-driven player engagement strategies.

ClickHouse and Databricks for Real-Time Analytics

2025-06-11 Watch
talk
Melvyn Peignon (ClickHouse)

ClickHouse is a C++-based, column-oriented database built for real-time analytics. While it has its own internal storage format, the rise of open lakehouse architectures has created a growing need for seamless interoperability. In response, we have developed integrations with your favorite lakehouse ecosystem to enhance compatibility, performance and governance. From integrating with Unity Catalog to embedding the Delta Kernel into ClickHouse, this session will explore the key design considerations behind these integrations, their benefits to the community, the lessons learned and future opportunities for improved compatibility and seamless integration.

End-to-End Interoperable Data Platform: How Bosch Leverages Databricks Supply Chain Consolidation

2025-06-11 Watch
talk
Satish Karunakaran (Robert Bosch GmbH), Marc-Alexander Frey (Robert Bosch GmbH)

This session will showcase Bosch’s journey in consolidating supply chain information using the Databricks platform. It will dive into how Databricks not only acts as the central data lakehouse but also integrates seamlessly with transformative components such as dbt and Large Language Models (LLMs). The talk will highlight best practices, architectural considerations, and the value of an interoperable platform in driving actionable insights and operational excellence across complex supply chain processes. Key topics and sections:

- Introduction & business context: a brief overview of Bosch’s supply chain challenges and the need for a consolidated data platform; the strategic importance of data-driven decision-making in a global supply chain environment
- Databricks as the core data platform
- Integrating dbt for transformation
- Leveraging LLMs for enhanced insights

Extending the Lakehouse: Power Interoperable Compute With Unity Catalog Open APIs

2025-06-11 Watch
talk
Tathagata Das (Databricks), Michelle Leon (Databricks)

The lakehouse is built for storage flexibility, but what about compute? In this session, we’ll explore how Unity Catalog enables you to connect and govern multiple compute engines across your data ecosystem. With open APIs and support for the Iceberg REST Catalog, UC lets you extend access to engines like Trino, DuckDB, and Flink while maintaining centralized security, lineage, and interoperability. We will show how you can get started today working with engines like Apache Spark and Starburst to read and write to UC managed tables with some exciting demos. Learn how to bring flexibility to your compute layer—without compromising control.

Hands-on Learning: Databricks SQL in Action: Intelligent Data Warehousing, Analytics and BI Workshop (repeat)

2025-06-11
workshop
Pearl Ubaru (Databricks)

Most organizations run complex cloud data architectures that silo applications, users and data. Join this interactive hands-on workshop to learn how Databricks SQL allows you to operate a multi-cloud lakehouse architecture that delivers data warehouse performance at data lake economics — with up to 12x better price/performance than traditional cloud data warehouses. Here’s what we’ll cover:

- How Databricks SQL fits in the Data Intelligence Platform, enabling you to operate a multicloud lakehouse architecture that delivers data warehouse performance at data lake economics
- How to manage and monitor compute resources, data access and users across your lakehouse infrastructure
- How to query directly on your data lake using your tools of choice or the built-in SQL editor and visualizations
- How to use AI to increase productivity when querying, completing code or building dashboards

Ask your questions during this hands-on lab, and the Databricks experts will guide you.

HP's Data Platform Migration Journey: Redshift to Lakehouse

2025-06-11 Watch
talk
Isaac Chan (HP Inc.), Kavya Atmakuri (HP Inc.)

HP Print's data platform team took on a migration from a monolithic, shared AWS Redshift resource to a modular and scalable data ecosystem on the Databricks lakehouse. The result was 30–40% cost savings, scalable and isolated resources for different data consumers and ETL workloads, and performance optimization for a variety of query types. The migration surfaced technical challenges and learnings around the ETL migrations with dbt, new Databricks features like Liquid Clustering, Predictive Optimization, Photon and SQL serverless warehouses, managing multiple teams on Unity Catalog, and more. This presentation dives into both the business and technical sides of this migration. Come along as we share our key takeaways from this journey.

Summit Live: OLTP for the Lakehouse

2025-06-11 Watch
talk
Dave Nettleton (Databricks)

Analytical and operational use cases are starting to converge, and AI-assisted applications are accelerating the trend. Most applications require a transactional OLTP database to power their data. Hear from a Databricks expert on the latest developments and our strategy for bringing operational data into the lakehouse.

How Serverless Empowered Nationwide to Build Cost-Efficient and World Class BI

2025-06-11 Watch
lightning_talk
Ananya Ghosh (Nationwide)

Databricks’ Serverless compute streamlines infrastructure setup and management, delivering unparalleled performance and cost optimization for Data and BI workflows. In this presentation, we will explore how Nationwide is leveraging Databricks’ serverless technology and unified governance through Unity Catalog to build scalable, world-class BI solutions. Key features like AI/BI Dashboards, Genie, Materialized Views, Lakehouse Federation and Lakehouse Apps, all powered by serverless, have empowered business teams to deliver faster, scalable and smarter insights. We will show how Databricks’ serverless technology is enabling Nationwide to unlock new levels of efficiency and business impact, and how other organizations can adopt serverless technology to realize similar benefits.

Developing the Dreamers of Data + AI’s Future: How 84.51˚ builds upskilling to accelerate adoption

2025-06-11 Watch
talk
Michael Carrico (84.51)

“Once an idea has taken hold of the brain it's almost impossible to eradicate. An idea that is fully formed — fully understood — that sticks, right in there somewhere.” The Data Scientists and Engineers at 84.51˚ utilize the Databricks Lakehouse for a wide array of tasks, including data exploration, analysis, machine learning operations, orchestration, automated deployments and collaboration. In this talk, 84.51˚’s Data Science Learning Lead, Michael Carrico, will share their approach to upskilling a diverse workforce to support the company’s strategic initiatives. This approach includes creating tailored learning experiences for a variety of personas using content curated in partnership with Databricks’ educational offerings. Then he will demonstrate how he puts his 11 years of data science and engineering experience to work by using the Databricks Lakehouse not just as a subject, but also as a tool to create impactful training experiences and a learning culture at 84.51˚.

Iceberg Table Format Adoption and Unified Metadata Catalog Implementation in Lakehouse Platform

2025-06-11 Watch
talk
Ruotian Wang (DoorDash), Sergey Zavgorodni (DoorDash)

DoorDash's data organization is actively adopting the lakehouse paradigm. This presentation describes a methodology for migrating classic data warehouse and data lake platforms to a unified lakehouse solution. The objectives of this effort include:

- Elimination of excessive data movement
- Seamless integration and consolidation of the query engine layers, including Snowflake, Databricks, EMR and Trino
- Query performance optimization
- Abstracting away the complexity of underlying storage layers and table formats
- A strategic, justified decision on the unified metadata catalog used across various compute platforms

Multi-Statement Transactions: How to Improve Data Consistency and Performance

2025-06-11 Watch
lightning_talk
Franco Patano (Databricks)

Multi-statement transactions bring the atomicity and reliability of traditional databases to modern data warehousing on the lakehouse. In this session, we’ll explore real-world patterns enabled by multi-statement transactions — including multi-table updates, deduplication pipelines and audit logging — and show how Databricks ensures atomicity and consistency across complex workflows. We’ll also dive into demos and share tips for getting started and migrating to this feature in Databricks SQL.
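
The feature discussed is specific to Databricks SQL, but the atomicity pattern itself (several statements committing or rolling back as one unit) can be sketched with Python's stdlib sqlite3. Table names and amounts are invented:

```python
import sqlite3

# A multi-table update that must land atomically: debit one account,
# credit another, and record an audit row — all or nothing.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE accounts (id TEXT PRIMARY KEY, balance REAL);
    CREATE TABLE audit_log (event TEXT);
    INSERT INTO accounts VALUES ('a', 100.0), ('b', 0.0);
""")

try:
    with conn:  # one transaction: all three statements commit or none do
        conn.execute("UPDATE accounts SET balance = balance - 40 WHERE id = 'a'")
        conn.execute("UPDATE accounts SET balance = balance + 40 WHERE id = 'b'")
        conn.execute("INSERT INTO audit_log VALUES ('transfer a->b 40')")
except sqlite3.Error:
    pass  # on any failure the whole transfer rolls back

balances = dict(conn.execute("SELECT id, balance FROM accounts"))
```

Without the enclosing transaction, a failure between the two UPDATEs would leave the books unbalanced; that is the gap multi-statement transactions close on the lakehouse.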

Sponsored by: Dataiku | Engineering Trustworthy AI Agents with LLM Mesh + Mosaic AI

2025-06-11 Watch
lightning_talk
Dmitri Ryssev (Dataiku)

AI agent systems hold immense promise for automating complex tasks and driving intelligent decision‑making, but only when they are engineered to be both resilient and transparent. In this session we will explore how Dataiku’s LLM Mesh pairs with Databricks Mosaic AI to streamline the entire lifecycle: ingesting and preparing data in the Lakehouse, prompt engineering LLMs hosted on Mosaic AI Model Serving Endpoints, visually orchestrating multi‑step chains, and monitoring them in real time. We’ll walk through a live demo of a Dataiku flow that connects to a Databricks hosted model, adds automated validation, lineage, and human‑in‑the‑loop review, then exposes the agent via Dataiku's Agent Connect interface. You’ll leave with actionable patterns for setting guardrails, logging decisions, and surfacing explanations—so your organization can deploy trustworthy domain‑specific agents faster & safer.