talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

715

Sessions & talks

Showing 651–675 of 715 · Newest first

Laying Data and AI Foundations for the Agentic Future at P&G

2025-06-10
talk
Alfredo Colas (Procter & Gamble)

In today's rapidly evolving digital landscape, organizations must prioritize robust data architectures and AI strategies to remain competitive. In this session, we will explore how Procter & Gamble (P&G) has embarked on a transformative journey to digitize its operations via scalable data, analytics and AI platforms, establishing a strong foundation for data-driven decision-making and the emergence of agentic AI. Join us as we delve into the comprehensive architecture and platform initiatives undertaken at P&G to create scalable and agile data platforms that unleash BI/AI value. We will discuss our approach to implementing data governance and semantics, ensuring data integrity and accessibility across the organization. By leveraging advanced analytics and Business Intelligence (BI) tools, we will illustrate how P&G harnesses data to generate actionable insights at scale, all while maintaining security and speed.

Let the LLM Write the Prompts: An Intro to DSPy in Compound AI Pipelines

2025-06-10 Watch
talk
Drew Breunig (Overture Maps Foundation)

Large Language Models (LLMs) excel at understanding messy, real-world data, but integrating them into production systems remains challenging. Prompts can be unruly to write, vary by model and can be difficult to manage in the large context of a pipeline. In this session, we'll demonstrate incorporating LLMs into a geospatial conflation pipeline, using DSPy. We'll discuss how DSPy works under the covers and highlight the benefits it provides pipeline creators and managers.
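The session's core idea, moving prompt text out of hand-written strings and into declarative task signatures that the framework expands, can be sketched in plain Python. This is not DSPy's actual API (which centers on `dspy.Signature` and modules like `dspy.Predict`); the `Signature` class and `build_prompt` helper below are hypothetical stand-ins, and the conflation example mirrors the geospatial matching use case only loosely.

```python
from dataclasses import dataclass, field

@dataclass
class Signature:
    """A declarative task spec: what goes in, what comes out.

    Hypothetical stand-in for a DSPy-style signature; the framework,
    not the engineer, turns this spec into actual prompt text."""
    instructions: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)

def build_prompt(sig: Signature, **values: str) -> str:
    """Expand a signature plus concrete input values into prompt text."""
    lines = [sig.instructions, ""]
    for name in sig.inputs:
        lines.append(f"{name}: {values[name]}")
    for name in sig.outputs:
        lines.append(f"{name}:")  # fields the model is asked to fill in
    return "\n".join(lines)

# A conflation-flavored example: decide if two place records match.
match_sig = Signature(
    instructions="Decide whether the two place records refer to the same entity.",
    inputs=["record_a", "record_b"],
    outputs=["is_match", "rationale"],
)

prompt = build_prompt(
    match_sig,
    record_a="Cafe Milano, 123 Main St",
    record_b="Caffè Milano, 123 Main Street",
)
print(prompt)
```

Because the prompt is generated from the signature, swapping models or tuning instructions means editing one declaration rather than every prompt string scattered through the pipeline.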

Leveraging Databricks Unity Catalog for Enhanced Data Governance in Unipol

2025-06-10 Watch
talk
Beniamino Del Pizzo (Unipol S.p.A.), Giovanni Cinquepalmi (Data Reply)

In the contemporary landscape of data management, organizations are increasingly faced with the challenges of data segregation, governance and permission management, particularly when operating within complex structures such as holding companies with multiple subsidiaries. Unipol comprises seven subsidiary companies, each with a diverse array of workgroups, resulting in a large number of operational groups overall. This intricate organizational structure necessitates a meticulous approach to data management, particularly regarding the segregation of data and the assignment of precise read-and-write permissions tailored to each workgroup. The challenge lies in ensuring that sensitive data remains protected while enabling seamless access for authorized users. This talk demonstrates how Unity Catalog has become a pivotal tool in the daily use of the data platform, offering a unified governance solution that supports data management across diverse AWS environments.

Marketing Data + AI Leaders Forum

2025-06-10 Watch
talk
Dan Morris (Databricks), Calen Holbrooks (Airtable), Elizabeth Dobbs (Databricks), David Geisinger (Deloitte), Kristen Brophy (ThredUp), Joyce Hwang (Dropbox), Zeynep Inanoglu Ozdemir (Atlassian Pty Ltd.), Bryan Saftler (Databricks), Alex Dean (Snowplow), Derek Slager (Amperity), Rick Schultz (Databricks), Bryce Peake (Domino's), Julie Foley Long (Grammarly)

Join us Tuesday, June 10th, 9:10 AM–12:10 PM PT. Hosted by Databricks CMO Rick Schultz, this forum features executives and speakers from PetSmart, Valentino, Domino’s, Airtable, Dropbox, ThredUp, Grammarly, Deloitte, and more. Come for actionable strategies and real-world examples: hear from marketing experts on how to build data- and AI-driven marketing organizations, and learn how Databricks Marketing supercharges impact using the Data Intelligence Platform, scaling personalization, building more efficient campaigns, and empowering marketers to self-serve insights.

Redesigning Kaizen's Cloud Data Lake for the Future

2025-06-10 Watch
talk
Triantafyllos Tsakmakis (Kaizen Gaming), Nikolaos Michail (Kaizen Gaming)

At Kaizen Gaming, data drives our decision-making, but rapid growth exposed inefficiencies in our legacy cloud setup — escalating costs, delayed insights and scalability limits. Operating in 18 countries with 350M daily transactions (1PB+), shared quotas and limited cost transparency hindered efficiency. To address this, we redesigned our cloud architecture with Data Landing Zones, a modular framework that decouples resources, enabling independent scaling and cost accountability. Automation streamlined infrastructure, reduced overhead and enhanced FinOps visibility, while Unity Catalog ensured governance and security. Migration challenges included maintaining stability, managing costs and minimizing latency. A phased approach, Delta Sharing, and Databricks Asset Bundles simplified transitions. The result: faster insights, improved cost control and reduced onboarding time, fostering innovation and efficiency. We share our transformation, offering insights for modern cloud optimization.

Responsible AI at Scale: Balancing Democratization and Regulation in the Financial Sector

2025-06-10 Watch
talk
Aman Thind (State Street)

We partnered with Databricks to pioneer a new standard in the financial sector's enterprise AI, balancing rapid AI democratization with strict regulatory and security requirements. At the core is our Responsible AI Gateway, enforcing jailbreak prevention and compliance on every LLM query. Real-time observability, powered by Databricks, calculates risk and accuracy metrics, detecting issues before escalation. Leveraging Databricks' model hosting ensures scalable LLM access, fortifying security and efficiency. We built frameworks to democratize AI without compromising guardrails. Operating in a regulated environment, we showcase how Databricks enables democratization and responsible AI at scale, offering best practices for financial organizations to harness AI safely and efficiently.

Simplifying Data Pipelines With Lakeflow Declarative Pipelines: A Beginner’s Guide

2025-06-10 Watch
talk
Matt Jones (Databricks), Brad Turnbaugh (84.51)

As part of the new Lakeflow data engineering experience, Lakeflow Declarative Pipelines makes it easy to build and manage reliable data pipelines. It unifies batch and streaming, reduces operational complexity and ensures dependable data delivery at scale — from batch ETL to real-time processing. Lakeflow Declarative Pipelines excels at declarative change data capture, batch and streaming workloads, and efficient SQL-based pipelines. In this session, you’ll learn how we’ve reimagined data pipelining with Lakeflow Declarative Pipelines, including:

- A brand new pipeline editor that simplifies transformations
- Serverless compute modes to optimize for performance or cost
- Full Unity Catalog integration for governance and lineage
- Reading/writing data with Kafka and custom sources
- Monitoring and observability for operational excellence
- “Real-time Mode” for ultra-low-latency streaming

Join us to see how Lakeflow Declarative Pipelines powers better analytics and AI with reliable, unified pipelines.

Sponsored by: Atlan | How Fox & Atlan are Partnering to Make Metadata a Common System of Trust, Context, and Governance

2025-06-10 Watch
talk
Prukalpa Sankar (Atlan), Oliver Gomes (Fox Corporation)

With hundreds of millions viewing broadcasts from news to sports, Fox relies on a sophisticated and trusted architecture ingesting 100+ data sources, carefully governed to improve UX across products, drive sales and marketing, and ensure KPI tracking. Join Oliver Gomes, VP of Enterprise and Data Platform at Fox, and Prukalpa Sankar of Atlan to learn how true partnership helps their team navigate opportunities from Governance to AI. To govern and democratize their multi-cloud data platform, Fox chose Atlan to make data accessible and understandable for more users than ever before. Their team then used a data product approach to create a shared language using context from sources like Unity Catalog at a single point of access, no matter the underlying technology. Now, Fox is defining an ambitious future for Metadata. With Atlan and Iceberg driving interoperability, their team prepares to build a “control plane”, creating a common system of trust and governance.

Sponsored by: dbt Labs | Empowering the Enterprise for the Next Era of AI and BI

2025-06-10 Watch
talk
Elias DeFaria (dbt Labs)

The next era of data transformation has arrived. AI is enhancing developer workflows, enabling downstream teams to collaborate effectively through governed self-service. Additionally, SQL comprehension is producing detailed metadata that boosts developer efficiency while ensuring data quality and cost optimization. Experience this firsthand with dbt’s data control plane, a centralized platform that provides organizations with repeatable, scalable, and governed methods to succeed with Databricks in the modern age.

Sponsored by: EY | Navigating the Future: Knowledge-Powered Insights on AI, Information Governance, Real-Time Analytics

2025-06-10 Watch
talk
Linh Nguyen (Edward Jones), Felix Chang (EY)

In an era where data drives strategic decision-making, organizations must adapt to the evolving landscape of business analytics. This session will focus on three pivotal themes shaping the future of data management and analytics in 2025. Join our panel of experts, including a Business Analytics Leader, Head of Information Governance, and Data Science Leader, as they explore:

- Knowledge-Powered AI: Discover trends in Knowledge-Powered AI and how these initiatives can revolutionize business analytics, with real-world examples of successful implementations.
- Information Governance: Explore the role of information governance in ensuring data integrity and compliance. Our experts will discuss strategies for establishing robust frameworks that protect organizational assets.
- Real-Time Analytics: Understand the importance of real-time analytics in today’s fast-paced environment. The panel will highlight how organizations can leverage real-time data for agile decision-making.

Unity Catalog Managed Tables: Faster Queries, Lower Costs, Effortless Data Management

2025-06-10 Watch
talk
Elizabeth Bowman (Databricks), Sirui Sun (Databricks)

What if you could simplify data management, boost performance, and cut costs, all at once? Join us to discover how Unity Catalog managed tables can slash your storage costs, supercharge query speeds, and automate optimizations with AI on the Data Intelligence Platform. Experience seamless interoperability with third-party clients, and be among the first to preview our new game-changing tool that makes moving to UC managed tables effortless. Don’t miss this exciting session that will redefine your data strategy!

Using Identity Security With Unity Catalog for Faster, Safer Data Access

2025-06-10 Watch
talk
Siddharth Bhai (Databricks), Kelly Albano (Databricks)

Managing authentication effectively is key to securing your data platform. In this session, we’ll explore best practices from Databricks for overcoming authentication challenges, including token visibility, MFA/SSO, CI/CD token federation and risk containment. Discover how to map your authentication maturity journey while maximizing security ROI. We'll showcase new capabilities like access token reports for improved visibility, streamlined MFA implementation and secure SSO with token federation. Learn strategies to minimize token risk through TTL limits, scoped tokens and network policies. You'll walk away with actionable insights to enhance your authentication practices and strengthen platform security on Databricks.

Startup Forum

2025-06-10
talk
Dan Tobin (Databricks), Guy Fighel (Hetz Ventures), Steve Sobel (Databricks), Andrew Ferguson (Databricks), Sri Tikkireddy (Databricks), Aaron Jacobson (NEA), George Webster (Zigguratum Inc), Nima Alidoust (Tahoe Therapeutics), Sarah Catanzaro (Amplify Partners), Atindriyo Sanyal (Galileo)

Hear from VC leaders, startup founders and early stage customers building on Databricks around what they are seeing in the market and how they are scaling their early stage companies on Databricks. This event is a must see for VCs, founders and those interested in the early stage company ecosystem.

Accelerating Analytics: Integrating BI and Partner Tools to Databricks SQL

2025-06-10 Watch
talk
Fuat Can Efeoglu (Databricks), Toussaint Webb (Databricks)

This session is repeated. Did you know that you can integrate with your favorite BI tools directly from Databricks SQL? You don’t even need to stand up an additional warehouse. This session shows the integrations with Microsoft Power Platform, Power BI, Tableau and dbt so you can have a seamless integration experience. Directly connect your Databricks workspace with Fabric and Power BI workspaces or Tableau to publish and sync data models, with defined primary and foreign keys, between the two platforms.

Advanced JSON Schema Handling and Event Demuxing

2025-06-10 Watch
talk
Dattatraya Walake (Databricks), Murali Talluri (Databricks)

This session explores advanced JSON schema handling (inference and evolution) and event demultiplexing. Topics include:

- How from_json is currently used today and its challenges
- How to use Variant for rapidly changing schemas
- How from_json in Lakeflow Declarative Pipelines with a primed schema helps simplify schema handling
- Demultiplexing patterns for scalable stream processing
- Simplifying event demuxing with Lakeflow Declarative Pipelines
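Stripped of the Spark and Lakeflow machinery, demultiplexing is just routing a mixed event stream into per-type outputs. The stdlib-only sketch below illustrates the pattern; in the session's setting each bucket would instead be its own streaming table fed from one source, and the `type` field name here is an illustrative assumption.

```python
import json
from collections import defaultdict

def demux(raw_events):
    """Split a mixed stream of JSON events into per-type buckets.

    In a real pipeline each bucket would be a separate streaming table;
    here it is simply a dict keyed by the event's 'type' field."""
    buckets = defaultdict(list)
    for raw in raw_events:
        event = json.loads(raw)
        # Events with an unknown or missing type go to a catch-all bucket
        # for later inspection or schema inference.
        buckets[event.get("type", "_unknown")].append(event)
    return buckets

stream = [
    '{"type": "click", "url": "/home"}',
    '{"type": "purchase", "amount": 9.99}',
    '{"type": "click", "url": "/cart"}',
    '{"ts": 1718000000}',  # no type field
]
buckets = demux(stream)
print({k: len(v) for k, v in buckets.items()})
```

The per-type split is what lets each downstream consumer evolve its own schema independently instead of forcing one wide schema over every event.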

AI-Powered Marketing Data Management: Solving the Dirty Data Problem with Databricks

2025-06-10 Watch
talk
Steven Kostrzewski (Acxiom), Ankur Jain (Acxiom)

Marketing teams struggle with ‘dirty data’ — incomplete, inconsistent, and inaccurate information that limits campaign effectiveness and reduces the accuracy of AI agents. Our AI-powered marketing data management platform, built on Databricks, solves this with anomaly detection, ML-driven transformations and the built-in Acxiom Referential Real ID Graph with Data Hygiene. We’ll showcase how Delta Lake, Unity Catalog and Lakeflow Declarative Pipelines power our multi-tenant architecture, enabling secure governance and 75% faster data processing. Our privacy-first design ensures compliance with GDPR, CCPA and HIPAA through role-based access, encryption key management and fine-grained data controls. Join us for a live demo and Q&A, where we’ll share real-world results and lessons learned in building a scalable, AI-driven marketing data solution with Databricks.

Boosting Data Science and AI Productivity With Databricks Notebooks

2025-06-10 Watch
talk
Vijay Raghavan (Thumbtack), Jason Cui (Databricks)

This session is repeated. Want to accelerate your team's data science workflow? This session reveals how Databricks Notebooks can transform your productivity through an optimized environment designed specifically for data science and AI work. Discover how notebooks serve as a central collaboration hub where code, visualizations, documentation and results coexist seamlessly, enabling faster iteration and development. Key takeaways:

- Leveraging interactive coding features including multi-language support, command-mode shortcuts and magic commands
- Implementing version control best practices through Git integration and notebook revision history
- Maximizing collaboration through commenting, sharing and real-time co-editing capabilities
- Streamlining ML workflows with built-in MLflow tracking and experiment management

You'll leave with practical techniques to enhance your notebook-based workflow and deliver AI projects faster with higher-quality results.

CI/CD for Databricks: Advanced Asset Bundles and GitHub Actions

2025-06-10 Watch
talk
Dustin Vannoy (Databricks)

This session is repeated. Databricks Asset Bundles (DABs) provide a way to use the command line to deploy and run a set of Databricks assets — like notebooks, Python code, Lakeflow Declarative Pipelines and workflows. To automate deployments, you create a deployment pipeline that uses the power of DABs along with other validation steps to ensure high-quality deployments. In this session you will learn how to automate CI/CD processes for Databricks while following best practices to keep deployments easy to scale and maintain. After a brief explanation of why Databricks Asset Bundles are a good option for CI/CD, we will walk through a working project including advanced variables, target-specific overrides, linting, integration testing and automatic deployment upon code review approval. You will leave the session clear on how to build your first GitHub Action using DABs.
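As a rough illustration of the variables and target-specific overrides the abstract mentions, a minimal `databricks.yml` might look like the fragment below. Bundle name, catalog names, and workspace URLs are placeholders, and this is a simplified sketch rather than a complete bundle configuration; consult the DABs documentation for the full schema.

```yaml
bundle:
  name: my_pipeline_bundle

variables:
  catalog:
    description: Target Unity Catalog for this deployment
    default: dev_catalog

targets:
  dev:
    default: true            # `databricks bundle deploy` uses this target
    workspace:
      host: https://example-dev.cloud.databricks.com
  prod:
    mode: production          # stricter deployment behavior for prod
    workspace:
      host: https://example-prod.cloud.databricks.com
    variables:
      catalog: prod_catalog   # target-specific override of the variable
```

In a CI/CD setup, a GitHub Actions job would typically run `databricks bundle validate` on pull requests and `databricks bundle deploy -t prod` after review approval.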

Databricks, the Good, the Bad and the Ugly

2025-06-10 Watch
talk
Holly Smith (Databricks)

Databricks is the bestest platform ever where everything is perfect and nothing else could ever make it any better, right? …right? You and I know, this is not true. Don’t get me wrong, there are features that I absolutely love, but there are also some that require powering through the papercuts. And then there are those that I pretend don’t exist. I’ll be opening up to give my honest take on three of each category, why I do (or don’t) like them, and then telling you which talks to attend to find out more.

Databricks Without Disruption: A Deep Dive on Catalog Federation with Hive Metastore, Glue, and Snowflake

2025-06-10
talk
John Spencer (Databricks), Milos Stojanovic (Databricks)

You shouldn’t have to sacrifice data governance just to leverage the tools your business needs. In this session, we will give practical tips on how you can cut through the data sprawl and get a unified view of your data estate in Unity Catalog without disrupting existing workloads. We will walk through how to set up federation with Glue, Hive Metastore, and other catalogs like Snowflake, and show you how powerful new tools help you adopt Databricks at your own pace with no downtime and full interoperability.

Data Management and Governance With UC

2025-06-10
talk

In this course, you'll learn concepts and perform labs that showcase workflows using Unity Catalog, Databricks' unified and open governance solution for data and AI. We'll start off with a brief introduction to Unity Catalog, discuss fundamental data governance concepts, and then dive into a variety of topics including using Unity Catalog for data access control, managing external storage and tables, data segregation, and more.

Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks); cloud computing concepts (virtual machines, object storage, etc.); production experience working with data warehouses and data lakes; intermediate experience with basic SQL concepts (select, filter, groupby, join, etc.); beginner programming experience with Python (syntax, conditions, loops, functions); beginner programming experience with the Spark DataFrame API (configure DataFrameReader and DataFrameWriter to read and write data, express query transformations using DataFrame methods and Column expressions, etc.).

Labs: Yes
Certification Path: Databricks Certified Data Engineer Associate

Deploy Workloads with Lakeflow Jobs (previously Databricks Workflows)

2025-06-10
talk

In this course, you’ll learn how to orchestrate data pipelines with Lakeflow Jobs (previously Databricks Workflows) and schedule dashboard updates to keep analytics up-to-date. We’ll cover topics like getting started with Lakeflow Jobs, how to use Databricks SQL for on-demand queries, and how to configure and schedule dashboards and alerts to reflect updates to production data pipelines.

Pre-requisites: Beginner familiarity with the Databricks Data Intelligence Platform (selecting clusters, navigating the Workspace, executing notebooks); cloud computing concepts (virtual machines, object storage, etc.); production experience working with data warehouses and data lakes; intermediate experience with basic SQL concepts (select, filter, groupby, join, etc.); beginner programming experience with Python (syntax, conditions, loops, functions); beginner programming experience with the Spark DataFrame API (configure DataFrameReader and DataFrameWriter to read and write data, express query transformations using DataFrame methods and Column expressions, etc.).

Labs: No
Certification Path: Databricks Certified Data Engineer Associate

Easy Ways to Optimize Your Databricks Costs

2025-06-10 Watch
talk
Youssef Mrini (Databricks), Yassine Essawabi (Databricks)

In this session, we will explore effective strategies for optimizing costs on the Databricks platform, a leading solution for handling large-scale data workloads. Databricks, known for its open and unified approach, offers several tools and methodologies to ensure users can maximize their return on investment (ROI) while managing expenses efficiently. Key points:

- Understanding usage with AI/BI tools
- Organizing costs with tagging
- Setting up budgets
- Leveraging System Tables

By the end of this session, you will have a comprehensive understanding of how to leverage Databricks' built-in tools for cost optimization, ensuring that your data and AI projects not only deliver value but do so in a cost-effective manner. This session is ideal for data engineers, financial analysts, and decision-makers looking to enhance their organization’s efficiency and financial performance through strategic cost management on Databricks.

Elevating Data Quality Standards With Databricks DQX

2025-06-10 Watch
talk
Marcin Wojtyczka (Databricks), Neha Milak (Databricks)

Join us for an introductory session on Databricks DQX, a Python-based framework designed to validate the quality of PySpark DataFrames. Discover how DQX can empower you to proactively tackle data quality challenges, enhance pipeline reliability and make more informed business decisions with confidence. Traditional data quality tools often fall short by providing limited, actionable insights, relying heavily on post-factum monitoring, and being restricted to batch processing. DQX overcomes these limitations by enabling real-time quality checks at the point of data entry, supporting both batch and streaming data validation and delivering granular insights at the row and column level. If you’re seeking a simple yet powerful data quality framework that integrates seamlessly with Databricks, this session is for you.
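The row-level, point-of-entry idea behind the framework can be illustrated without Spark. The sketch below is not DQX's actual API; `apply_checks` and the rule names are made up for illustration. It only mimics the shape of per-row checks that flag and keep bad records (quarantine-style) instead of silently dropping them.

```python
def apply_checks(rows, checks):
    """Annotate each row with the names of the quality checks it fails.

    Mimics the idea of row-level quality results: failing rows are
    retained and flagged rather than dropped, so downstream consumers
    can decide how to handle them."""
    out = []
    for row in rows:
        failures = [name for name, check in checks.items() if not check(row)]
        out.append({**row, "_failed_checks": failures})
    return out

# Two illustrative rules: a value constraint and a completeness constraint.
checks = {
    "amount_non_negative": lambda r: r.get("amount", 0) >= 0,
    "has_user_id": lambda r: r.get("user_id") is not None,
}

rows = [
    {"user_id": "u1", "amount": 10.0},
    {"user_id": None, "amount": -5.0},
]
result = apply_checks(rows, checks)
print([r["_failed_checks"] for r in result])
```

Applied inside a streaming or batch write path, the same pattern gives the granular, per-row and per-column insight the abstract describes, rather than an after-the-fact pass/fail on the whole table.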

From Code Completion to Autonomous Software Engineering Agents

2025-06-10 Watch
talk
Kilian Lieret (Princeton University)

As language models have advanced, they have moved beyond code completion and are beginning to tackle software engineering tasks in a more autonomous, agentic way. However, evaluating agentic capabilities is challenging. To address this, we first introduce SWE-bench, a benchmark built from real GitHub issues that has become the standard for assessing AI’s ability to resolve complex software tasks in large codebases. We will discuss the current state of the field, the limitations of today’s models, and how far we still are from truly autonomous AI developers. Next, we will explore the fundamentals of agents based on hands-on demonstrations with SWE-agent, a simple yet powerful agent framework designed for software engineering but adaptable to a variety of domains. By the end of this session, you will have a clear understanding of the current frontier of agentic AI in software engineering, the challenges ahead and how you can experiment with AI agents in your own workflows.