Data + AI Summit 2025

Lakehouse to Powerhouse: Reckitt's Enterprise AI Transformation Story

2025-06-11 Watch

talk

Tom Martin (Boston Consulting Group), Tewfik Bedreddine (Reckitt)

AI/ML Data Lakehouse GenAI Marketing

In this presentation, we showcase Reckitt’s journey to develop and implement a state-of-the-art Gen AI platform, designed to transform enterprise operations starting with the marketing function. We will explore the unique technical challenges encountered and the innovative architectural solutions employed to overcome them. Attendees will gain insights into how cutting-edge Gen AI technologies were integrated to meet Reckitt’s specific needs. This session will not only highlight the transformative impacts on Reckitt’s marketing operations but also serve as a blueprint for AI-driven innovation in the Consumer Goods sector, demonstrating a successful model of partnership in technology and business transformation.

Managing Databricks at Scale

2025-06-11 Watch

talk

Vikas Ranjan (T-Mobile)

AI/ML Data Governance Databricks Delta

T-Mobile’s leadership in 5G innovation and its rapid growth in the fixed wireless business have led to an exponential increase in data, reaching hundreds of terabytes daily. This session explores how T-Mobile uses Databricks to manage this data efficiently, focusing on scalable architecture with Delta Lake, auto-scaling clusters, performance optimization through data partitioning and caching, and comprehensive data governance with Unity Catalog. Additionally, it covers cost management, collaborative tools and AI-driven productivity tools, highlighting how these strategies empower T-Mobile to innovate, streamline operations and maximize data impact across network optimization, community support, energy management and more.

Multi-Format, Multi-Table, Multi-Statement Transactions on Unity Catalog

2025-06-11 Watch

talk

Prakhar Jain (Databricks), Michelle Leon (Databricks)

Data Lakehouse Databricks Delta Iceberg

Get a first look at multi-statement transactions in Databricks. In this session, we will dive into their capabilities, exploring how multi-statement transactions enable atomic updates across multiple tables in your data pipelines, ensuring data consistency and integrity for complex operations. We will also share how we are enabling unified transactions across Delta Lake and Iceberg with Unity Catalog — powering our vision for an open and interoperable lakehouse.
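The atomicity described above can be illustrated with a minimal sketch. It is an analogy only: it uses Python's built-in `sqlite3` module rather than Databricks, since the abstract does not give the actual SQL syntax, and the tables `orders` and `inventory` and the helper `place_order` are hypothetical names chosen for illustration.

```python
import sqlite3

def place_order(conn, item, qty):
    """Write to two tables so that both changes commit or neither does."""
    try:
        # As a context manager, the connection commits on success and
        # rolls back if an exception escapes the block.
        with conn:
            conn.execute("INSERT INTO orders (item, qty) VALUES (?, ?)", (item, qty))
            cur = conn.execute(
                "UPDATE inventory SET stock = stock - ? WHERE item = ? AND stock >= ?",
                (qty, item, qty),
            )
            if cur.rowcount == 0:
                # Raising here undoes the INSERT above as well.
                raise ValueError("insufficient stock")
        return True
    except ValueError:
        return False

# Hypothetical schema and seed data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (item TEXT, qty INTEGER)")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 5)")
conn.commit()

ok = place_order(conn, "widget", 3)       # both tables change
failed = place_order(conn, "widget", 10)  # neither table changes
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
stock = conn.execute("SELECT stock FROM inventory WHERE item = 'widget'").fetchone()[0]
```

The second call inserts an order row and then fails the stock check; because both statements share one transaction, the order row is rolled back along with it.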

Scaling Generative AI: Batch Inference Strategies for Foundation Models

2025-06-11 Watch

talk

Andrew Shieh (Databricks), Ankit Mathur (Databricks)

AI/ML Databricks GenAI LLM

Curious how to apply resource-intensive generative AI models across massive datasets without breaking the bank? This session reveals efficient batch inference strategies for foundation models on Databricks. Learn how to architect scalable pipelines that process large volumes of data through LLMs, text-to-image models and other generative AI systems while optimizing for throughput, cost and quality.

Key takeaways:
- Implementing efficient batch processing patterns for foundation models using AI functions
- Optimizing token usage and prompt engineering for high-volume inference
- Balancing compute resources between CPU preprocessing and GPU inference
- Techniques for parallel processing and chunking large datasets through generative models
- Managing model weights and memory requirements across distributed inference tasks

You'll discover how to process any scale of data through your generative AI models efficiently.
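The chunking and parallel-processing takeaway can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the approach presented in the session: `fake_model` is a hypothetical stand-in for a batched model call, and `chunked` and `batch_infer` are illustrative helper names.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

def chunked(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

def fake_model(prompts):
    """Hypothetical stand-in for a batched model call (one request per batch)."""
    return [p.upper() for p in prompts]

def batch_infer(records, batch_size=4, workers=2):
    """Split records into batches and run the batches through the model in parallel."""
    batches = list(chunked(records, batch_size))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map returns results in input order, so outputs line up with records.
        results = pool.map(fake_model, batches)
        return [out for batch_out in results for out in batch_out]

records = [f"review {i}" for i in range(10)]
outputs = batch_infer(records)
```

Batching amortizes per-request overhead, while the worker count bounds how many requests are in flight at once; both values here are arbitrary and would need tuning against a real endpoint.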

Serverless as the New "Easy Button": How HP Inc. Used Serverless to Turbocharge Their Data Pipeline

2025-06-11 Watch

talk

Matthew Wright (Zahlen Solutions LLC), Jason Hart (Zahlen Solutions)

Adobe Analytics Analytics AWS Databricks Spark

How do you wrangle over 8TB of granular “hit-level” website analytics data with hundreds of columns, all while eliminating the overhead of cluster management, decreasing runtime and saving money? In this session, we’ll dive into how we helped HP Inc. use Databricks serverless compute and Lakeflow Declarative Pipelines to streamline Adobe Analytics data ingestion while making it faster, cheaper and easier to operate. We’ll walk you through our full migration story — from managing unwieldy custom-defined AWS-based Apache Spark™ clusters to spinning up Databricks serverless pipelines and workflows with on-demand scalability and near-zero overhead. If you want to simplify infrastructure, optimize performance and get more out of your Databricks workloads, this session is for you.

Simon + Denny - Unfiltered & Unscripted

2025-06-11

talk

Denny Lee (Databricks), Simon Whiteley (Advancing Analytics)

AI/ML

Two industry veterans have been debating data architecture, tearing apart trends and tinkering with tech for decades and they’re bringing the conversation live — and you’re in control. Got a burning question about lake structures or internal performance? Worried about AI taking over the world? Want straight-talking opinions on the latest hype? Need real-world advice from the people who the experts get advice from? Want to get the juicy behind-the-scenes gossip about any announcements and shockwaves from the Keynotes? This is your chance to have your questions answered! Submit your questions ahead of time or bring them on the day — no topic is off-limits (though there's always a risk of side quests into coffee, sci-fi, or the quirks of English weather). Come for the insights, stay for the chaos.

Smashing Silos, Shaping the Future: Data for All in the Next-Gen Ecosystem

2025-06-11 Watch

talk

Michael Flynn (Rivian)

AI/ML BI Cloud Computing Databricks

A successful data strategy requires the right platform and the ability to empower the broader user community by creating simple, scalable and secure patterns that lower the barrier to entry while ensuring robust data practices. Guided by the belief that everyone is a data person, we focus on breaking down silos, democratizing access and enabling distributed teams to contribute through a federated "data-as-a-product" model. We’ll share the impact and lessons learned in creating a single source of truth on Unity Catalog, consolidated from diverse sources and cloud platforms. We’ll discuss how we streamlined governance with Databricks Apps, Workflows and native capabilities, ensuring compliance without hindering innovation. We’ll also cover how we maximize the value of that catalog by leveraging semantics to enable trustworthy, AI-driven self-service in AI/BI dashboards and downstream apps. Come learn how we built a next-gen data ecosystem that empowers everyone to be a data person.

Sponsored by: DataNimbus | Building an AI Platform in 30 Days and Shaping the Future with Databricks

Tech Industry Session: Optimizing Costs and Controls to Democratize Data and AI

2025-06-11 Watch

talk

Miranda Luna (Databricks), Anup Segu (YipitData), Vivek Srivastava (OT Technology, LLC)

AI/ML Analytics BI Data Lakehouse Databricks

Join us for this session focused on how leading tech companies are enabling data intelligence across their organizations while maintaining cost efficiency and governance. Hear about the successes and challenges that arise when Databricks empowers thousands of users, from engineers to business teams, by providing scalable tools for AI, BI and analytics.

Topics include:
- Combining AI/BI and Lakehouse Apps to streamline workflows and accelerate insights
- Implementing system tables, tagging and governance frameworks for granular control
- Democratizing data access while optimizing costs for large-scale analytical workloads

Hear from customers and Databricks experts, followed by a customer panel featuring industry leaders. Gain insights into how Databricks helps tech innovators scale their platforms while maintaining operational excellence.

Telco Reimagined: Real-World Journeys in Data and AI for Customer Experience Transformation

2025-06-11 Watch

talk

Russell Marks (AT&T), Emma Hartwell (Optus), Adam Hudson (Plume Design, Inc.), Mark Austin (AT&T), Nevash Pillay (Databricks)

AI/ML Databricks

How are today’s leading telecom operators transforming customer experience at scale with data and AI? Join us for an inspiring fireside chat with senior leaders from Optus, Plume and AT&T as they share their transformation stories, from the first steps to major milestones and the tangible business impact achieved with Databricks’ Data Intelligence Platform. You’ll hear firsthand how these forward-thinking CSPs are driving measurable outcomes through unified data, machine learning and AI. Discover the high-impact use cases they’re prioritizing (such as proactive care and hyper-personalization) and gain insight into their bold vision for the future of customer experience in telecom. Whether you're just beginning your AI journey or scaling to new heights, this session offers an authentic look at what’s working, what’s next and how data and AI are helping telecoms lead in a competitive landscape.

The Upcoming Apache Spark 4.1: The Next Chapter in Unified Analytics

2025-06-11 Watch

talk

DB Tsai (Databricks), Xiao Li (Databricks)

Analytics API Data Quality ETL/ELT PySpark Python

Apache Spark has long been recognized as the leading open-source unified analytics engine, combining a simple yet powerful API with a rich ecosystem and top-notch performance. In the upcoming Spark 4.1 release, the community reimagines Spark to excel at both massive cluster deployments and local laptop development. We’ll start with new single-node optimizations that make PySpark even more efficient for smaller datasets. Next, we’ll delve into a major “Pythonizing” overhaul — simpler installation, clearer error messages and Pythonic APIs. On the ETL side, we’ll explore greater data source flexibility (including the simplified Python Data Source API) and a thriving UDF ecosystem. We’ll also highlight enhanced support for real-time use cases, built-in data quality checks and the expanding Spark Connect ecosystem — bridging local workflows with fully distributed execution. Don’t miss this chance to see Spark’s next chapter!

Unity Catalog Lakeguard: Secure and Efficient Compute for Your Enterprise

2025-06-11 Watch

talk

Scott Van Woudenberg (Databricks), Jakob Mund (Databricks)

Cloud Computing Databricks Cloud Functions Cyber Security Spark SQL

Modern data workloads span multiple sources: data lakes, databases, apps like Salesforce and services like cloud functions. But as teams scale, secure data access and governance across shared compute become critical. In this session, learn how to confidently integrate external data and services into your workloads using Spark and Unity Catalog on Databricks. We'll explore compute options like serverless, clusters, workflows and SQL warehouses, and show how Unity Catalog’s Lakeguard enforces fine-grained governance, even when multiple users share the same compute concurrently. Walk away ready to choose the right compute model for your team’s needs, without sacrificing security or efficiency.

What’s New with Databricks Assistant: From Exploration to Production

2025-06-11 Watch

talk

Samantha Banchik (Databricks), Gal Oshri (Databricks)

Databricks SQL

Databricks Assistant helps you get from initial exploration all the way to production faster and easier than ever. In this session, we'll show you how Assistant simplifies and accelerates common workflows, boosting your productivity across notebooks and the SQL editor. You'll get practical tips, see end-to-end examples in action, and hear about the latest capabilities we're excited about. We'll also discuss how we're continually improving Assistant to make your development experience faster, more contextual and more customizable. Join us to discover how to get the most out of Databricks Assistant and empower your team to build better and faster.

Summit Live: How Databricks Uses Databricks

2025-06-11 Watch

talk

Bruce Wong (Databricks)

AI/ML Data Lakehouse Databricks

Ever wonder how Databricks operates its own enterprise lakehouse, where all employees and all teams inside use data and AI to solve problems and guide our decisions? Bruce Wong, head of data platforms, will talk about how his team leverages Databricks itself.

Accelerating Data Transformation: Best Practices for Governance, Agility and Innovation

2025-06-11 Watch

lightning_talk

Kevin Wilson (NCS Australia)

Analytics Data Governance Data Lakehouse Data Quality Databricks dbt

In this session, we will share NCS’s approach to implementing a Databricks Lakehouse architecture, focusing on key lessons learned and best practices from our recent implementations. By integrating Databricks SQL Warehouse, the dbt transformation framework and our own test automation framework, we’ve optimized performance and scalability while ensuring data quality. We’ll dive into how Unity Catalog enabled robust data governance, empowering business units with self-serve analytical workspaces to create insights while maintaining control. Through the use of solution accelerators, rapid environment deployment and pattern-driven ELT frameworks, we’ve fast-tracked time-to-value and fostered a culture of innovation. Attendees will gain valuable insights into accelerating data transformation, governance and scaling analytics with Databricks.

Bridging Big Data and AI: Empowering PySpark With Lance Format for Multi-Modal AI Data Pipelines

2025-06-11 Watch

lightning_talk

Allison Wang (Databricks), Lu Qiu (LanceDB)

AI/ML Analytics API Big Data Data Analytics Lance

PySpark has long been a cornerstone of big data processing, excelling in data preparation, analytics and machine learning tasks within traditional data lakes. However, the rise of multimodal AI and vector search introduces challenges beyond its capabilities. Spark’s new Python Data Source API enables integration with emerging AI data lakes built on the multimodal Lance format. Lance offers zero-copy schema evolution and robust support for data with large record sizes (e.g., images, tensors and embeddings), simplifying multimodal data storage. Its advanced indexing for semantic and full-text search, combined with fast random access, enables high-performance AI data analytics on par with SQL. By unifying PySpark's robust processing capabilities with Lance's AI-optimized storage, data engineers and scientists can efficiently manage and analyze the diverse data types required for cutting-edge AI applications within a familiar big data framework.

Driving Trusted Insights With AI/BI and Unity Catalog Metric Views

2025-06-11 Watch

lightning_talk

Fuat Can Efeoglu (Databricks)

AI/ML Analytics BI

Deliver trusted, high-performance insights by incorporating Unity Catalog metric views and business semantics into your AI/BI workflows. This session dives into the architecture and best practices for defining reusable metrics, implementing governance and enhancing query performance in AI/BI Dashboards and Genie. Learn how to manage business semantics effectively to ensure data consistency while empowering business users with governed, self-service analytics. Ideal for teams looking to streamline analytics at scale, this session provides practical strategies for driving data accuracy and governance.

Inscape Smart TV Data: Unlocking Consumption and Competitive Intelligence

2025-06-11 Watch

lightning_talk

Rich Guinness (Vizio Inscape)

Databricks Data Streaming

With VIZIO's Inscape viewership data now available in the Databricks Marketplace, our expansive dataset has never been easier to access. With real-time availability, flexible integrations, and secure, governed sharing, it's built for action. Join our team as we explore the full depth of this data across both linear and streaming TV, showcasing real-world use cases such as measuring the incremental reach of streaming or matching to first- and third-party data for ROI analyses. We will also walk through a competitive intelligence example based on share-of-voice analysis. This session will show you how to turn Inscape data into a strategic advantage.

Reducing Transaction Conflicts in Databricks—Fundamentals and Applications at Asana

2025-06-11 Watch

lightning_talk

Dima Kamalov (Asana)

Databricks

When using ACID-guaranteed transactions on Databricks concurrently, we can run into transaction conflicts. This talk discusses the basics of concurrent transaction functionality in Databricks—what happens when various combinations of INSERT, UPDATE and MERGE INTO happen concurrently. We discuss how table isolation level, partitioning and deletion vectors affect this. We also mention how Asana used an intermediate blind append stage to support several hundred concurrent transaction updates into the same table.
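The conflict behavior and the staging idea mentioned above can be sketched with a toy model. This is a deliberately simplified illustration, not Delta Lake's actual commit protocol and not Asana's implementation: `ToyTable`, `StagingLog` and `merge_staged` are hypothetical names, and the writers run sequentially to keep the example deterministic.

```python
class ConflictError(Exception):
    """Raised when a commit is based on a stale snapshot."""

class ToyTable:
    """Toy versioned table with optimistic concurrency (not Delta Lake's real protocol)."""
    def __init__(self):
        self.version = 0
        self.rows = {}

    def commit_update(self, read_version, changes):
        # An update depends on what it read, so it fails if anyone committed since.
        if read_version != self.version:
            raise ConflictError(f"read v{read_version}, table is at v{self.version}")
        self.rows.update(changes)
        self.version += 1

class StagingLog:
    """Append-only staging area: writers never read it, so appends never conflict."""
    def __init__(self):
        self.entries = []

    def append(self, changes):
        self.entries.append(changes)

def merge_staged(table, staging):
    """A single writer folds all staged changes into the target in one commit."""
    merged = {}
    for changes in staging.entries:
        merged.update(changes)
    table.commit_update(table.version, merged)
    staging.entries.clear()

# Direct path: three writers start from the same snapshot; only the first commit wins.
direct = ToyTable()
snapshot = direct.version
conflicts = 0
for i in range(3):
    try:
        direct.commit_update(snapshot, {f"k{i}": i})
    except ConflictError:
        conflicts += 1

# Staged path: the same three writers blind-append, then one merge applies everything.
target, staging = ToyTable(), StagingLog()
for i in range(3):
    staging.append({f"k{i}": i})
merge_staged(target, staging)
```

The trade-off in this sketch is latency for throughput: staged changes only become visible in the target after the merge step runs.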

Searching for Meaning in the Age of AI

2025-06-11 Watch

lightning_talk

Bryan McCann (You.com)

AI/ML Computer Science

Bryan McCann, You.com’s co-founder and CTO, shares his journey from studying philosophy and meaning to the Stanford Computer Science Department, where he worked on groundbreaking AI research alongside Richard Socher. Right now, AI is reshaping everything we hold dear: our jobs, creativity and identities. It’s also our greatest source of inspiration. The Age of AI is simultaneously a Renaissance, an Enlightenment, an Industrial Revolution and a likely source of humanity’s greatest existential crisis. To surmount this, Bryan will discuss how he uses AI responses as new starting points rather than answers, how he builds teams like neural networks optimized for learning, and how the answer to our meaning crisis may be for humans to be more like AI. Exploring AI’s impact on politics, economics, healthcare, education and culture, Bryan asserts that we must all take part in authoring humanity’s new story: AI can inspire us to become something new, rather than merely replace what we are now.

Sponsored by: Accenture & Avanade | Reinventing State Services with Databricks: AI-Driven Innovations in Health and Transportation


Sponsored by: Genpact | Powering Change at GE Vernova: Inside One of the World’s Largest Databricks Migrations

Sponsored by: Google Cloud | Building Powerful Agentic Ecosystems with Google Cloud's A2A

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

Sponsored by: Infosys | AI-Driven Growth: Expedite Potential of Agentic AI and Drive Beyond Customer Experience and Operational Efficiency
