talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 Databricks Summit

Activities tracked

715

Sessions & talks

Showing 226–250 of 715 · Newest first

Master Schema Translations in the Era of Open Data Lake

2025-06-11 Watch
lightning_talk
Eric Sun (Coinbase)

Unity Catalog puts a variety of schemas into a centralized repository. Now the developer community wants more productivity and automation for schema inference, translation, evolution and optimization, especially for ingestion and reverse-ETL scenarios that involve more code generation. Coinbase Data Platform attempts to pave a path with "Schemaster", which interacts with the data catalog through a (proposed) metadata model to make schema translation and evolution more manageable across some of the popular systems, such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, Postgres... This Lightning Talk covers 4 areas: (1) the complexity and caveats of schema differences among these systems; (2) the proposed field-level metadata model and 2 translation patterns, point-to-point vs hub-and-spoke; (3) why data profiling should be augmented to enhance schema understanding and translation; (4) how to integrate it with ingestion and reverse-ETL in a Databricks-oriented ecosystem. Takeaway: standardize schema lineage & translation.
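The trade-off between the two translation patterns named in this abstract can be sketched with a simple count. This is an illustration only, assuming one translator per ordered pair of systems in the point-to-point case, and one translator into and one out of a shared metadata model in the hub-and-spoke case. The function names are hypothetical and are not part of Schemaster.

```python
# Illustrative count of schema translators needed under each pattern.
# Function names are hypothetical; this is not Schemaster's API.

def point_to_point_translators(n_systems: int) -> int:
    """One translator per ordered (source, target) pair of distinct systems."""
    return n_systems * (n_systems - 1)


def hub_and_spoke_translators(n_systems: int) -> int:
    """One translator into the shared model and one out of it, per system."""
    return 2 * n_systems


# The seven systems named in the abstract.
systems = ["Delta", "Iceberg", "Snowflake", "Kafka", "MongoDB", "DynamoDB", "Postgres"]
n = len(systems)
print(point_to_point_translators(n))  # 42
print(hub_and_spoke_translators(n))   # 14
```

Under these assumptions the point-to-point count grows quadratically with the number of systems, while the hub-and-spoke count grows linearly.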

Multi-Statement Transactions: How to Improve Data Consistency and Performance

2025-06-11 Watch
lightning_talk
Franco Patano (Databricks)

Multi-statement transactions bring the atomicity and reliability of traditional databases to modern data warehousing on the lakehouse. In this session, we’ll explore real-world patterns enabled by multi-statement transactions, including multi-table updates, deduplication pipelines and audit logging, and show how Databricks ensures atomicity and consistency across complex workflows. We’ll also dive into demos and share tips for getting started with this feature, and migrating to it, in Databricks SQL.
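The atomicity property described here can be illustrated with a small analogy. The sketch below uses Python's built-in `sqlite3` module rather than Databricks SQL, so the syntax is not the feature's own; the table names and the `transfer` helper are hypothetical. It shows a multi-table update with audit logging where either every statement applies or none do.

```python
import sqlite3

# Analogy only: SQLite stands in for the warehouse; table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute("CREATE TABLE audit_log (account_id INTEGER, delta INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()


def transfer(conn, src, dst, amount):
    """Update two tables atomically: either all statements apply or none do."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            conn.execute("INSERT INTO audit_log VALUES (?, ?)", (dst, amount))
            # This statement violates the CHECK constraint when funds are short.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
            conn.execute("INSERT INTO audit_log VALUES (?, ?)", (src, -amount))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, 1, 2, 30)       # succeeds: both tables are updated
failed = transfer(conn, 1, 2, 500)  # fails midway: earlier statements are rolled back
balances = conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()
audit_rows = conn.execute("SELECT COUNT(*) FROM audit_log").fetchone()[0]
```

After the failed transfer, the credit to the destination account and its audit row are undone along with everything else in the transaction, so the two tables never disagree.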

Somebody Set Up Us the Bomb: Identifying List Bombing of End Users in an Email Anti-Spam Context

2025-06-11 Watch
lightning_talk
Doug Sibley (Cisco Talos)

Traditionally, spam emails are messages a user does not want, containing some kind of threat like phishing. Because of this, detection systems can focus on malicious content or sender behavior. List bombing upends this paradigm. By abusing public forms such as marketing signups, attackers can fill a user's inbox with high volumes of legitimate mail. These emails don't contain threats, and each sender is following best practices to confirm the recipient wants to be subscribed, but the net effect for an end user is their inbox being flooded with dozens of emails per minute. This talk covers how we explored and implemented detection of this attack in our company's anti-spam telemetry: reading from and writing to Kafka, Delta table streaming for ETL workflows, multi-table liquid clustering design for efficient table joins, curating gold tables to speed up critical queries and using Delta tables as an auditable integration point for interacting with external services.
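Because each individual message in a list-bombing attack looks legitimate, one way to reason about detection is per-recipient volume across many distinct senders. The sketch below is a hypothetical heuristic, not the method presented in the talk: the function name, the 60-second window and the threshold of 20 distinct senders are all assumptions chosen to loosely match "dozens of emails per minute".

```python
from collections import defaultdict, deque


def flag_list_bombing(events, window_seconds=60, min_distinct_senders=20):
    """Return recipients who got mail from many distinct senders within one window.

    events: iterable of (timestamp_seconds, recipient, sender_domain),
    sorted by timestamp. Thresholds are illustrative assumptions.
    """
    recent = defaultdict(deque)  # recipient -> deque of (timestamp, sender_domain)
    flagged = set()
    for ts, recipient, sender in events:
        window = recent[recipient]
        window.append((ts, sender))
        # Drop events that have fallen out of the sliding window.
        while window and window[0][0] <= ts - window_seconds:
            window.popleft()
        if len({s for _, s in window}) >= min_distinct_senders:
            flagged.add(recipient)
    return flagged


# Synthetic data: one recipient is signed up to 25 different lists in 25 seconds,
# another receives three messages from a single sender.
victim_events = [(i, "victim@example.com", f"list{i}.example") for i in range(25)]
normal_events = [(t, "normal@example.com", "shop.example") for t in (0, 10, 20)]
events = sorted(victim_events + normal_events)
flagged = flag_list_bombing(events)
```

Counting distinct senders rather than raw message volume keeps a single chatty but legitimate sender from triggering the flag.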

Sponsored by: Dataiku | Engineering Trustworthy AI Agents with LLM Mesh + Mosaic AI

2025-06-11 Watch
lightning_talk
Dmitri Ryssev (Dataiku)

AI agent systems hold immense promise for automating complex tasks and driving intelligent decision‑making, but only when they are engineered to be both resilient and transparent. In this session we will explore how Dataiku’s LLM Mesh pairs with Databricks Mosaic AI to streamline the entire lifecycle: ingesting and preparing data in the Lakehouse, prompt engineering LLMs hosted on Mosaic AI Model Serving Endpoints, visually orchestrating multi‑step chains, and monitoring them in real time. We’ll walk through a live demo of a Dataiku flow that connects to a Databricks‑hosted model, adds automated validation, lineage, and human‑in‑the‑loop review, then exposes the agent via Dataiku's Agent Connect interface. You’ll leave with actionable patterns for setting guardrails, logging decisions, and surfacing explanations, so your organization can deploy trustworthy domain‑specific agents faster and more safely.

Sponsored by: Hightouch | Earning $66 Million with AI: How XP Overhauled Customer Acquisition with Databricks and Hightouch

2025-06-11 Watch
lightning_talk
Marcelo Duarte (XP Inc.)

XP is one of the largest financial institutions in Brazil, and they didn’t reach that scale by moving slowly. After getting inspired at Data + AI Summit in 2024, they sprang into action to overhaul their advertising strategy using their first-party data in Databricks. Just a year later, they’ve achieved remarkable success: they’ve unlocked $66 million in incremental revenue from advertising, with the same budget as before. In this session, XP will share the tactical steps they took to bring a first-party data and AI strategy to customer acquisition, including how they built predictive models for customer quality and connected Databricks to their ecosystem through Hightouch’s Composable CDP. If you’re supporting an advertising team or are looking for real strategies to take home from this conference that can transform your business, this session is for you.

Sponsored by: Insight Enterprises | Unity Catalog Agent Assistant

2025-06-11 Watch
lightning_talk
Stephen Lundall (Insight)
SQL

Insight will explore a multi-agent system built with LangGraph designed to alleviate the challenges faced by data analysts inundated with requests from business users. This innovative solution empowers users who lack SQL skills to easily access insights from specific Unity Catalog datasets. Discover how the Unity Catalog Agent Assistant streamlines data requests, enhances collaboration, and ultimately drives better decision-making across your organization.

Streamlining AI Application Development With Databricks Apps

2025-06-11 Watch
lightning_talk
Domonkos Pal (Hiflylabs Zrt.)

Think Databricks is just for data and models? Think again. In this session, you’ll see how to build and scale a full-stack AI app capable of handling thousands of queries per second entirely on Databricks. No extra cloud platforms, no patchwork infrastructure. Just one unified platform with native hosting, LLM integration, secure access, and built-in CI/CD. Learn how Databricks Apps, along with services like Model Serving, Jobs, and Gateways, streamline your architecture, eliminate boilerplate, and accelerate development, from prototype to production.

Streamlining DSPy Development: Track, Debug, and Deploy With MLflow

2025-06-11 Watch
lightning_talk
Chen Qian (Databricks)

DSPy is a framework for authoring GenAI applications with automatic prompt optimization, while MLflow provides powerful MLOps tooling to track, monitor, and productize machine learning workflows. In this lightning talk, we demonstrate how to integrate MLflow with DSPy to bring full observability to your DSPy development. We’ll walk through how to track DSPy module calls, evaluations, and optimizers using MLflow’s tracing and autologging capabilities. By the end, you'll see how combining these two tools makes it easier to debug, iterate, and understand your DSPy workflows, then deploy your DSPy program — end to end.

Unlock Agentic AI for Insurance With Deloitte & Databricks

2025-06-11 Watch
lightning_talk

In an era where insights-driven decision-making is paramount, the insurance industry stands at the cusp of a major technological revolution. This session will delve into how Agentic AI (AI agents that act autonomously to achieve critical goals) can be leveraged to transform insurance operations (underwriting, claims, services), enhance customer experiences and drive strategic growth.

Summit Live: Partners - Hear From Key Companies on Adding Value

2025-06-11 Watch
talk
Ari Kaplan (Databricks)

The Databricks ecosystem has 5,000+ partners, who help enable you to leverage Databricks to unify all your data and AI workloads for more meaningful insights. Hear from some of the leading cloud, technology, and consulting partners.

SAP and Databricks: Building Your Lakehouse Reference Architecture

2025-06-11 Watch
talk
Qi Su (Databricks) , Niclas Schlautkoetter (SAP SE)

SAP is the world's 3rd-largest publicly traded software company by revenue, and recently launched the joint SAP Databricks "Business Data Cloud". See how it all works from a practitioner's perspective, including reference architecture, demo, and example customers. See firsthand how the powerful suite of SAP applications benefits from a joint Databricks solution, with data being more easily governed, discovered, shared, and used for AI/ML.

Apache Spark — Ask Us Anything

2025-06-11
lightning_talk
Allison Wang (Databricks) , Jules Damji (Databricks) , DB Tsai (Databricks)

Join us for an interactive Ask Me Anything (AMA) session on the latest advancements in Apache Spark 4, including Spark Connect — the new client-server architecture enabling seamless integration with IDEs, notebooks and custom applications. Learn about performance improvements, enhanced APIs and best practices for leveraging Spark’s next-generation features. Whether you're a data engineer, Spark developer or big data enthusiast, bring your questions on architecture, real-world use cases and how these innovations can optimize your workflows. Don’t miss this chance to dive deep into the future of distributed computing with Spark!

Cost-Effective Data Architecture and AI Practice With Databricks at FunPlus

2025-06-11 Watch
lightning_talk
Chao Chen (FunPlus)

This session covers FunPlus's journey to building a cost-effective and efficient data platform with Databricks: how FunPlus leveraged Databricks to tackle key challenges and enhance data engineering and ML efficiency, along with best practices and their impact on game development and operations.

Deploying Unity Catalog OSS on Kubernetes: Simplifying Infrastructure Management

2025-06-11 Watch
lightning_talk
Vasilii Bulatov (Nebius)

In modern data infrastructure, efficient and scalable data governance is essential for ensuring security, compliance, and accessibility. This session explores how to deploy Unity Catalog OSS on Kubernetes, leveraging its cloud-agnostic nature and efficient resource management. Helm simplifies Unity Catalog deployment by providing a streamlined installation process along with straightforward configuration and credentials management. The session will cover why Kubernetes is the ideal platform, provide a technical breakdown of Unity Catalog on Kubernetes, and include a live showcase of its seamless deployment process. By the end, participants will be able to confidently configure and deploy Unity Catalog OSS in their preferred Kubernetes environment and integrate it into their existing infrastructure.

Multi-Agents in Production: How to Orchestrate Effective Agents

2025-06-11 Watch
lightning_talk

Agents solve complex problems, but building them can be hard. This talk will walk through techniques to build effective multi-agent networks in production deterministically, reducing the compounding error rates across each AI call.
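The "compounding error rates" mentioned in this abstract can be made concrete with a short calculation. This is a minimal sketch, assuming each AI call in a sequential chain succeeds independently with the same probability; the function name and the 95% figure are illustrative assumptions, not numbers from the talk.

```python
def chain_success_rate(per_call_success: float, n_calls: int) -> float:
    """Probability that every call in a sequential chain succeeds, assuming independence."""
    return per_call_success ** n_calls


# How end-to-end reliability decays as more calls are chained together.
for n in (1, 5, 10, 20):
    print(n, round(chain_success_rate(0.95, n), 3))
```

Under these assumptions, a chain of 10 calls that are each 95% reliable completes without error only about 60% of the time, which is why reducing the number of non-deterministic steps matters.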

Sponsored by: Prophecy | Ready for GenAI? Survey Says Governed Self-Service Is the New Playbook for Data Teams

2025-06-11 Watch
lightning_talk
Mitesh Shah (Prophecy)

Are data teams ready for AI? Prophecy’s exclusive survey, “The Impact of GenAI on Data Teams”, gives the clearest picture yet of GenAI’s potential in data management, and what’s standing in the way. The top two obstacles? Poor governance and slow access to high-quality data. The message is clear: Modernizing your data platform with Databricks is essential. But it’s only the beginning. To unlock the power of AI and analytics, organizations must deliver governed, self-service access to clean, trusted data. Traditional data prep tools introduce risks around security, quality, and cost. It’s no wonder data leaders cited data transformation as the area where GenAI will make the biggest impact. To deliver what’s needed, teams need to shift to governed self-service, where data analysts and scientists move fast while staying within IT’s guardrails. Join us to learn more details from the survey and how leading organizations are ahead of the curve, using GenAI to reshape how data gets done.

Sponsored by: Snorkel AI | Evaluating and Improving Performance of Agentic Systems

2025-06-11 Watch
lightning_talk
Chris Borg (Snorkel AI)

GenAI systems are evolving beyond basic information retrieval and question answering, becoming sophisticated agents capable of managing multi-turn dialogues and executing complex, multi-step tasks autonomously. However, reliably evaluating and systematically improving their performance remains challenging. In this session, we'll explore methods for assessing the behavior of LLM-driven agentic systems, highlighting techniques and showcasing actionable insights to identify performance bottlenecks and to create better-aligned, more reliable agentic AI systems.

Sponsored by: ThoughtSpot | Supercharge Your Databricks Investment with DataSpot

2025-06-11
lightning_talk
Samuel Weick (ThoughtSpot)

Join us to see how the powerful combination of ThoughtSpot's agentic analytics platform and the Databricks Data Intelligence Platform is changing the game for data-driven organizations. We'll demonstrate how DataSpot breaks down technical barriers to insight. You'll learn how to get trusted, real-time answers thanks to the seamless integration between ThoughtSpot's semantic layer and Databricks Unity Catalog. This session is for anyone looking to leverage data more effectively, whether you're a business leader seeking AI-driven insights, a data scientist building models in Python, or a product owner creating intelligent applications.

Three Big Unlocks to AI Interoperability with Databricks

2025-06-11 Watch
lightning_talk
Shiyi Pickrell (Expedia)

The ability of different AI systems to collaborate is more critical than ever. From traditional ML development to fine-tuning GenAI models, Databricks delivers the stability, cost-optimization and productivity Expedia Group (EG) needs. Learn how to unlock the full potential of AI interoperability with Databricks. AI acceleration: Discover how Databricks acts as a central hub, helping to scale AI model training and prediction generation to deliver high-quality insights for customers. Cross-platform synergy: Learn how EG seamlessly integrated Databricks' powerful features into its ecosystem, streamlining workflows and accelerating time to market. Scalable deployment: Understand how Databricks' stability and reliability increased efficiency in prototyping and running scalable production workloads. Join Shiyi Pickrell to understand the future of AI interoperability, how it’s generating business value and how it’s driving the next generation of AI-powered travel experiences.

Unleash Your Content: AI-Powered Metadata for Targeting, Personalization and Brand Safety

2025-06-11 Watch
lightning_talk
William Gaviria Rojas (Coactive AI)

In an era of skyrocketing content volumes, companies are sitting on huge libraries of video, images and audio, just waiting to be leveraged to power targeted advertising and recommendations, as well as reinforce brand safety. Coactive AI will show how fast and accurate AI-driven metadata enrichment, combined with Databricks Unity Catalog and lakehouse, is accelerating and optimizing media workflows. Learn how leading brands are using content metadata to: unlock new revenue through contextual advertising; drive personalization at scale; enhance brand safety with detailed, scene-level analysis; build unified taxonomies that fuel cross-functional insights; and transform content from a static asset into a dynamic engine for growth, engagement and compliance.

Unlocking Enterprise Potential: Key Insights from P&G's Deployment of Unity Catalog at Scale

2025-06-11 Watch
lightning_talk

This session will explore P&G's implementation of Databricks Unity Catalog (UC) to enhance data governance, reduce data redundancy and improve the developer experience through the enablement of a Lakehouse architecture. The presentation will cover: the distinction between data treated as a product and standard application data, highlighting how UC's structure maximizes the value of data in P&G's data lake; real-life examples from two years of using Unity Catalog, demonstrating benefits such as improved governance, reduced waste and enhanced data discovery; and challenges related to disaster recovery and external data access, along with our collaboration with Databricks to address these issues. Sharing our experience can provide valuable insights for organizations planning to adopt Unity Catalog on an enterprise scale.

Adobe’s Security Lakehouse: OCSF, Data Efficiency and Threat Detection at Scale

2025-06-11 Watch
talk
Bharat Gamini (Adobe) , Andrew Krioukov (Antimatter)

This session will explore how Adobe uses a sophisticated data security architecture built on the Databricks Data Intelligence Platform, along with the Open Cybersecurity Schema Framework (OCSF), to enable scalable, real-time threat detection across more than 10 PB of security data. We’ll compare different approaches to OCSF implementation and demonstrate how Adobe processes massive security datasets efficiently — reducing query times by 18%, maintaining 99.4% SLA compliance, and supporting 286 security users across 17 teams with over 4,500 daily queries. By using Databricks' Platform for serverless compute, scalable architecture, and LLM-powered recommendations, Adobe has significantly improved processing speed and efficiency, resulting in substantial cost savings. We’ll also highlight how OCSF enables advanced cross-tool analytics and automation, streamlining investigations. Finally, we’ll introduce Databricks’ new open-source OCSF toolkit for scalable security data normalization and invite the community to contribute.

Build AI-Powered Applications Natively on Databricks

2025-06-11 Watch
talk
Andre Furlan Bueno (Databricks)

Discover how to build and deploy AI-powered applications natively on the Databricks Data Intelligence Platform. This session introduces best practices and a standard reference architecture for developing production-ready apps using popular frameworks like Dash, Shiny, Gradio, Streamlit and Flask. Learn how to leverage agents for orchestration and explore primary use cases supported by Databricks Apps, including data visualization, AI applications, self-service analytics and data quality monitoring. With serverless deployment and built-in governance through Unity Catalog, Databricks Apps enables seamless integration with your data and AI models, allowing you to focus on delivering impactful solutions without the complexities of infrastructure management. Whether you're a data engineer or an app developer, this session will equip you with the knowledge to create secure, scalable and efficient applications within a Databricks environment.

Cost Management Foundations: The First 100 Days Checklist

2025-06-11 Watch
talk
Sadhana Bala (Databricks) , Piyush Singh (Databricks)

In this session you'll learn how to onboard to Databricks in a way that ensures you can effectively measure and manage the ROI of using Databricks down the line. We will show you how to set up workspaces and compute, decide on a tagging strategy and use policies to enforce best practices and make future you a happy camper.

Data Strategy in Motion: What Successful Organizations Get Right

2025-06-11 Watch
talk
Robin Sutara (Databricks)

Join Robin Sutara, Field CDO for Databricks, as she discusses creating a robust data strategy for organizational change in an ecosystem that is under constant transformation. Attendees will learn best practices from Databricks customers for successful data strategy, including business alignment, people and culture, democratization, governance, and measurement as vital strategic aspects. Understanding these elements will help you drive more data and AI transformation success within your organization.