Data + AI Summit 2025

Thursday Keynote (Virtual Replay)

2025-06-13

keynote

AI/ML Databricks

Be first to witness the latest breakthroughs from Databricks and share the success of innovative data and AI companies.

Summit Live: A Conversation With AI influencer Josue Bogran

2025-06-12 Watch

talk

Josue Bogran (JosueBogran.com & zeb.co)

AI/ML

Josue is well known for his practical perspectives on the data and AI landscape. We'll talk about what he is seeing in the market, his take on product feature updates, and some humor mixed in.

Summit Live: Women In Data and AI Conversation

2025-06-12 Watch

talk

Lisa Cohen (Anthropic) , Kate Ostbye (Pfizer) , Holly Smith (Databricks) , Pallavi Koppol (Databricks)

AI/ML Databricks

Each year at Summit, Women in Data and AI holds a half day of in-person discussions on empowerment, the Women in Data and AI Breakfast, and networking with like-minded professionals and trailblazers. For this virtual discussion, hear from Kate Ostbye (Pfizer), Lisa Cohen (Anthropic), Pallavi Koppol and Holly Smith (Databricks) about navigating challenges, celebrating successes, and inspiring one another as we champion diversity and innovation in data together, and about how to get involved year-round.

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

2025-06-12 Watch

lightning_talk

Ricky D'Sa (Addepar)

Data Lakehouse Databricks Delta

Addepar possesses an enormous private investment data set, with 40% of the $7T assets on the platform allocated to alternatives. Leveraging the Addepar Data Lakehouse (ADL), built on Databricks, we have built a scalable data pipeline that assesses millions of private fund investment cash flows and translates them into a private fund benchmarks data offering. Investors on the Addepar platform can use this data, seamlessly integrated against their portfolio investments, to obtain actionable investment insights. At a high level, this data offering consists of extensive data aggregation, filtering, and construction logic that dynamically updates for clients through Databricks job workflows. This derived dataset has gone through several iterations with investment strategists and academics who leveraged Delta-shared tables. Irrespective of the data source, the data pipeline coalesces all relevant cash flow activity against a unique identifier before constructing the benchmarks.

Declarative Pipelines — Ask Us Anything

2025-06-12

lightning_talk

Denny Lee (Databricks) , Sandy Ryza (Databricks) , Xiao Li (Databricks)

ETL/ELT SQL

Join us for an insightful Ask Me Anything (AMA) session on Declarative Pipelines — a powerful approach to simplify and optimize data workflows. Learn how to define data transformations using high-level, SQL-like semantics, reducing boilerplate code while improving performance and maintainability. Whether you're building ETL processes, feature engineering pipelines, or analytical workflows, this session will cover best practices, real-world use cases and how Declarative Pipelines can streamline your data applications. Bring your questions and discover how to make your data processing more intuitive and efficient!

Route to Success: Scalable Routing Agents With Databricks and DSPy

2025-06-12 Watch

lightning_talk

Luis Moros (Databricks)

AI/ML Databricks GenAI

As companies increasingly adopt Generative AI, they're faced with a new challenge: managing multiple AI assistants. What if you could have a single, intuitive interface that automatically directs questions to the best assistant for the task? Join us to discover how to implement a flexible Routing Agent that streamlines working with multiple AI Assistants. We'll show you how to leverage Databricks and DSPy 3.0 to simplify adding this powerful pattern to your system. We'll dive into the essential aspects, including: using DSPy optimizers to maximize correct route selections; optimizing smaller models to reduce latency; creating stateful interactions; designing for growth and adaptability to support tens or hundreds of AI Assistants; ensuring authorized access to AI Assistants; and tracking performance in production environments. We'll share real-world examples that you can apply today. You'll leave with the knowledge to make your AI system run smoothly and efficiently.

Sponsored by: C2S Technologies Inc. | Qbeast: Lakehouse Acceleration as a Service

Welcome Lakehouse, from a DWH transformation to a M&A data sharing

2025-06-12 Watch

lightning_talk

Gianfranco Arena (DXC Technology)

AWS Data Lakehouse Databricks Delta DWH

At DXC, we helped our customer Fastweb with their "Welcome Lakehouse" project, a data warehouse transformation from on-premises to Databricks on AWS. But the implementation became something more. Thanks to features such as Lakehouse Federation and Delta Sharing, from the first day of the Fastweb+Vodafone merger we have been able to connect two different platforms with ease and let the business focus on the value of data rather than on IT integration. This session will feature our customer Alessandro Gattolin of Fastweb, who will talk about the experience.

Achieve Your Mission With AI-Driven Decisions

2025-06-12 Watch

talk

Shannon Bisselink (Databricks) , Spencer Schaefer (Federal Gov (VA) / Lunar Analytics (Ai)) , Suresh Kaudi (World Bank) , Andrew Hahn (Databricks)

AI/ML Cyber Security

Government leaders overwhelmingly recognize the potential benefits of AI as critical to long-term strategic goals of efficiency, but implementation challenges and security concerns could be obstacles to success.

Advanced Governance and Auth With Databricks Apps

2025-06-12 Watch

talk

Andre Furlan Bueno (Databricks) , Doug Judice (Addepar)

API Databricks

Explore advanced governance and authentication patterns for building secure, enterprise-grade apps with Databricks Apps. Learn how to configure complex permissions and manage access control using Unity Catalog. We’ll dive into “on-behalf-of-user” authentication — allowing agents to enforce user-specific access controls — and cover API-based authentication, including PATs and OAuth flows for external integrations. We’ll also highlight how Addepar uses these capabilities to securely build and scale applications that handle sensitive financial data. Whether you're building internal tools or customer-facing apps, this session will equip you with the patterns and tools to ensure robust, secure access in your Databricks apps.

AI Evaluation from First Principles: You Can't Manage What You Can't Measure

2025-06-12 Watch

talk

Pallavi Koppol (Databricks) , Jonathan Frankle (Databricks)

AI/ML Databricks GenAI LLM

Is your AI evaluation process holding back your system's true potential? Many organizations struggle with improving GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.

Key Takeaways:
- Practical approaches for establishing reliable metrics for subjective evaluations
- Techniques for calibrating LLM judges to enable cost-effective, scalable assessment
- Actionable frameworks for evaluation systems that evolve with your AI capabilities

Whether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.

Automating Taxonomy Generation With Compound AI on Databricks

2025-06-12 Watch

talk

Allistair Cota (Lovelytics) , Sudhir Gajre (Lovelytics)

AI/ML API Databricks LLM

Taxonomy generation is a challenge across industries such as retail, manufacturing and e-commerce. Incomplete or inconsistent taxonomies can lead to fragmented data insights, missed monetization opportunities and stalled revenue growth. In this session, we will explore a modern approach to solving this problem by leveraging the Databricks platform to build a scalable compound AI architecture for automated taxonomy generation. The first half of the session will walk you through the business significance and implications of taxonomy, followed by a technical deep dive into building a compound AI architecture for taxonomy implementation on the Databricks platform. We will walk attendees through the anatomy of taxonomy generation, showcasing an innovative solution that combines multimodal and text-based LLMs, internal data sources and external API calls. This ensemble approach ensures more accurate, comprehensive and adaptable taxonomies that align with business needs.

Beyond Chatbots: Building Autonomous Insurance Applications With Agentic AI Framework

2025-06-12 Watch

talk

Amit Kumar Jha (Databricks) , Marcela Granados (Databricks)

AI/ML BI Data Governance Databricks

The insurance industry is at the crossroads of digital transformation, facing challenges from market competition and customer expectations. While conventional ML applications have historically provided capabilities in this domain, the emergence of Agentic AI frameworks presents a revolutionary opportunity to build truly autonomous insurance applications. We will address issues related to data governance and quality while discussing how to monitor, evaluate and fine-tune models. We'll demonstrate the application of the agentic framework in the insurance context and how these autonomous agents can work collaboratively to handle complex insurance workflows, from submission intake and risk evaluation to expedited quote generation. This session demonstrates how to architect intelligent insurance solutions using Databricks Mosaic AI agentic core components, including Unity Catalog, Playground, model evaluation/guardrails, privacy filters, AI functions and AI/BI Genie.

Breaking Up With Spark Versions: Client APIs, AI-Powered Automatic Updates, and Dependency Management for Databricks Serverless

2025-06-12 Watch

talk

Justin Breese (Databricks)

AI/ML API Databricks Spark

This session explains how we've made Apache Spark™ versionless for end users by introducing a stable client API, environment versioning and automatic remediation. These capabilities have enabled auto-upgrade of hundreds of millions of workloads with minimal disruption for Serverless Notebooks and Jobs. We'll also introduce a new approach to dependency management using environments. Admins will learn how to speed up package installation with Default Base Environments, and users will see how to manage custom environments for their own workloads.

Daft and Unity Catalog: A Multimodal/AI-Native Lakehouse

2025-06-12 Watch

talk

Jay Chia (Eventual)

AI/ML Analytics Big Data Data Analytics Data Lakehouse

Modern data organizations have moved beyond big data analytics to also incorporate advanced AI/ML data workloads. These workflows often involve multimodal datasets containing documents, images, long-form text, embeddings, URLs and more. Unity Catalog is an ideal solution for organizing and governing this data at scale. When paired with the Daft open source data engine, you can build a truly multimodal, AI-ready data lakehouse. In this session, we’ll explore how Daft integrates with Unity Catalog’s core features (such as volumes and functions) to enable efficient, AI-driven data lakehouses. You will learn how to ingest and process multimodal data (images, text and videos), run AI/ML transformations and feature extractions at scale, and maintain full control and visibility over your data with Unity Catalog’s fine-grained governance.

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

2025-06-12 Watch

talk

Jonathan Brito (Databricks)

AWS AWS Glue Data Lakehouse Databricks Delta Hive

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.

Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios

2025-06-12 Watch

talk

Wenwen Xie (Databricks) , Arthur Dooner (Databricks)

AI/ML API LLM

In enterprise AI, Evaluation-Driven Development (EDD) ensures reliable, efficient systems by embedding continuous assessment and improvement into the AI development lifecycle. High-quality evaluation datasets are created using techniques like document analysis, synthetic data generation via Mosaic AI’s synthetic data generation API, SME validation, and relevance filtering, reducing manual effort and accelerating workflows. EDD focuses on metrics such as context relevance, groundedness, and response accuracy to identify and address issues like retrieval errors or model limitations. Custom LLM judges, tailored to domain-specific needs like PII detection or tone assessment, enhance evaluations. By leveraging tools like Mosaic AI Agent Framework, Agent Evaluation, and MLflow, EDD automates data tracking, streamlines workflows, and quantifies improvements, transforming AI development to deliver scalable, high-performing systems that drive measurable organizational value.

From Apache Airflow to Lakeflow Jobs: A Guide for Workflow Modernization

2025-06-12 Watch

talk

James Malone (Databricks) , Roland Fäustlin (Databricks)

Airflow ETL/ELT

This is an overview of migrating from Apache Airflow to Lakeflow Jobs for modern data orchestration. It covers key differences, best practices and practical examples of transitioning from traditional Airflow DAGs orchestrating legacy systems to declarative, incremental ETL pipelines with Lakeflow. Attendees will gain actionable tips on how to improve efficiency, scalability and maintainability in their workflows.

Got Metrics? Build a Metric Store — A Tour of Developing Metrics Through UC Metric Views

2025-06-12 Watch

talk

Amit Pahwa (Databricks) , Cristian Figueroa (Databricks)

AI/ML BI Databricks

I have metrics, you have metrics — we all have metrics. But the real problem isn’t having metrics, it’s that the numbers never line up, leading to endless cycles of reconciliation and confusion. Join us as we share how our Data Team at Databricks tackled this fundamental challenge in Business Intelligence by building an internal Metric Store — creating a single source of truth for all business metrics using the newly-launched UC Metric Views. Imagine a world where numbers always align, metric definitions are consistently applied across the organization and every metric comes with built-in ML-based forecasting, AI-powered anomaly detection and automatic explainability. That’s the future we’ve built — and we’ll show you how you can get started today.

Iceberg Geo Type: Transforming Geospatial Data Management at Scale

2025-06-12 Watch

talk

Szehon Ho (Databricks) , Jia Yu (Wherobots Inc.)

Analytics Data Management DWH Iceberg Spark

The Apache Iceberg™ community is introducing native geospatial type support, addressing key challenges in managing geospatial data at scale, including fragmented formats and inefficiencies in storing large spatial datasets. This talk will delve into the origins of the Iceberg geo type, its specification design and future goals. We will examine its impact on both communities: it introduces a standard data warehouse storage layer to the geospatial community and enables optimized geospatial analytics for Iceberg users. We will also present a live demonstration of the Iceberg geo data type with Apache Sedona™ and Apache Spark™, showcasing how it simplifies and accelerates geospatial analytics workflows and queries. Finally, we will provide an in-depth look at its current capabilities, outline the roadmap for future developments, and offer a perspective on its role in advancing geospatial data management in the industry.

Lakeflow Observability: From UI Monitoring to Deep Analytics

2025-06-12 Watch

talk

Saad Ansari (Databricks) , Theresa Hammer (Databricks)

Analytics Data Quality Databricks

Monitoring data pipelines is key to reliability at scale. In this session, we’ll dive into the observability experience in Lakeflow, Databricks’ unified DE solution, from intuitive UI monitoring to advanced event analysis, cost observability and custom dashboards. We’ll walk through the revamped UX for Lakeflow observability, showing how to: monitor runs and task states, dependencies and retry behavior in the UI; set up alerts for job and pipeline outcomes and failures; use pipeline and job system tables for historical insights; explore run events and event logs for root cause analysis; analyze metadata to understand and optimize pipeline spend; and build custom dashboards using system tables to track performance, data quality, freshness, SLAs and failure trends, and drive automated alerting based on real-time signals. This session will help you unlock full visibility into your data workflows.

Latest Innovations in AI/BI Dashboards and Genie

2025-06-12 Watch

talk

Miranda Luna (Databricks) , Chao Cai (Databricks)

AI/ML Analytics BI Databricks

Discover how the latest innovations in Databricks AI/BI Dashboards and Genie are transforming self-service analytics. This session offers a high-level tour of new capabilities that empower business users to ask questions in natural language, generate insights faster and make smarter decisions. Whether you're a long-time Databricks user or just exploring what's possible with AI/BI, you'll walk away with a clear understanding of how these tools are evolving — and how to leverage them for greater business impact.

Low-Emission Oil & Gas: Engineering the Balance Between Clean and Reliable

2025-06-12 Watch

talk

Krishanu Roy (bp) , Jay Yoon (NOV) , Srinivas Chandolu (BP) , Ali Marzban (NOV)

AI/ML Analytics GenAI Cyber Security

Join two energy industry leaders as they showcase groundbreaking applications of AI and data solutions in modern oil and gas operations. NOV demonstrates how their Generative AI pipeline revolutionized drilling mud report processing, automating the analysis of 300 reports daily with near-perfect accuracy and real-time analytics capabilities. BP shares how Unity Catalog has transformed their enterprise-wide data strategy, breaking down silos while maintaining robust governance and security. Together, these case studies illustrate how AI and advanced analytics are enabling cleaner, more efficient energy operations while maintaining the reliability demanded by today's market.

Performance Best Practices for Fast Queries, High Concurrency, and Scaling on Databricks SQL

2025-06-12 Watch

talk

Mostafa Mokhtar (Databricks) , Jeremy Lewallen (Databricks)

Databricks DWH SQL

Data warehousing in enterprise and mission-critical environments needs special consideration for price/performance. This session will explain how Databricks SQL addresses the most challenging requirements for high-concurrency, low-latency performance at scale. We will also cover the latest advancements in resource-based scheduling, autoscaling and caching enhancements that allow for seamless performance and workload management.

Revolutionizing Insurance: How to Drive Growth and Innovation

2025-06-12 Watch

talk

Anindita Mahapatra (Databricks) , Porter Orr (The Standard Insurance Company) , Kranthi Nekkalapu (Suncorp) , Adrien de Nazelle (Oliver Wyman)

AI/ML Analytics Data Analytics Data Modelling

The insurance industry is rapidly evolving as advances in data and artificial intelligence (AI) drive innovation, enabling more personalized customer experiences, streamlined operations, and improved efficiencies. With powerful data analytics and AI-driven solutions, insurers can automate claims processing, enhance risk management, and make real-time decisions. Leveraging insights from large and complex datasets, organizations are delivering more customer-centric products and services than ever before. Key takeaways: real-world applications of data and AI in claims automation, underwriting, and customer engagement; how predictive analytics and advanced data modeling help anticipate risks and meet customer needs; and personalization of policies, optimized pricing, and more efficient workflows for greater ROI. Discover how data and AI are fueling growth, improving protection, and shaping the future of the insurance industry!

talk-data.com
