talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

715

Sessions & talks

Showing 526–550 of 715 · Newest first

Patients Are Waiting...Accelerating Healthcare Innovation with Data, AI and Agents

2025-06-10 Watch
talk
JZTS (Jonatan Selsing) (Novo Nordisk) , Christian Sørensen (Novo Nordisk) , Thomas Larsen (Novo Nordisk A/S)

This session is repeated. In an era of exponential data growth, organizations across industries face common challenges in transforming raw data into actionable insights. This presentation showcases how Novo Nordisk is pioneering new approaches to clinical data management and AI-driven insight generation. Using our clinical trials platform FounData, built on Databricks, we demonstrate how proper data architecture enables advanced AI applications. We'll introduce a multi-agent AI framework that revolutionizes data interaction, combining specialized AI agents to guide users through complex datasets. While our focus is on clinical data, these principles apply across sectors – from manufacturing to financial services. Learn how democratizing access to data and AI capabilities can transform organizational efficiency while maintaining governance. Through this real-world implementation, participants will gain insights on building scalable data architectures and leveraging multi-agent AI frameworks for responsible innovation.

Petrobras MLOps Transformation With MLflow and Databricks

2025-06-10
talk
Bruno Guberfain do Amaral (Petrobras) , Luiz Carrossoni Neto (Databricks)

As a global energy leader, Petrobras relies on machine learning to optimize operations, but manual model deployment and validation processes once created bottlenecks that delayed critical insights. In this session, we’ll reveal how we revolutionized our MLOps framework using MLflow, Databricks Asset Bundles (DABs) and Unity Catalog to:
- Replace error-prone manual validation with automated metric-driven workflows
- Reduce model deployment timelines from days to hours
- Establish granular governance and reproducibility across production models
Discover how we enabled data scientists to focus on innovation, not infrastructure, through standardized pipelines while ensuring compliance and scalability in one of the world’s most complex energy ecosystems.
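The metric-driven promotion gate the abstract alludes to can be sketched, independent of MLflow's APIs, as a simple comparison of candidate metrics against the production baseline. The metric names and the promote-only-on-no-regression policy below are illustrative assumptions, not Petrobras' actual criteria:

```python
def should_promote(candidate: dict, production: dict,
                   higher_is_better=("accuracy", "f1"),
                   lower_is_better=("rmse",)) -> bool:
    """Return True only when the candidate model matches or beats
    the production model on every tracked metric."""
    for metric in higher_is_better:
        if candidate[metric] < production[metric]:
            return False
    for metric in lower_is_better:
        if candidate[metric] > production[metric]:
            return False
    return True

# Candidate improves accuracy and rmse, ties on f1 -> promoted
candidate = {"accuracy": 0.91, "f1": 0.88, "rmse": 0.12}
production = {"accuracy": 0.89, "f1": 0.88, "rmse": 0.15}
print(should_promote(candidate, production))  # True
```

In an MLflow-based setup, a check like this would typically gate a model registry alias update or stage transition inside the CI/CD pipeline.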

Smart Vehicles, Secure Data: Recreating Vehicle Environments for Privacy-Preserving Machine Learning

2025-06-10 Watch
talk
Frankie Cancino (Mercedes-Benz R&D)

As connected vehicles generate vast amounts of personal and sensitive data, ensuring privacy and security in machine learning (ML) processes is essential. This session explores how Trusted Execution Environments (TEEs) and Azure Confidential Computing can enable privacy-preserving ML in cloud environments. We’ll present a method to recreate a vehicle environment in the cloud, where sensitive data remains private throughout model training, inference and deployment. Attendees will learn how Mercedes-Benz R&D North America builds secure, privacy-respecting personalized systems for the next generation of connected vehicles.

Swimming at Our Own Lakehouse: How Databricks Uses Databricks

2025-06-10 Watch
talk
Alan Jackoway (Databricks) , Bruce Wong (Databricks)

This session is repeated. Peek behind the curtain to learn how Databricks processes hundreds of petabytes of data across every region and cloud where we operate. Learn how Databricks leverages data and AI to scale and optimize every aspect of the company, from facilities and legal to sales and marketing and, of course, product research and development. This session is a high-level tour inside Databricks to see how data and AI enable us to be a better company. We will go into the architecture behind internal use cases like business analytics and SIEM, as well as customer-facing features like system tables and Assistant. We will also cover how our data flows are produced and how we maintain security and privacy while operating a large multi-cloud, multi-region environment.

Toyota: Maximizing Business Value and Ensuring Data Privacy with Databricks in Connected Vehicles

2025-06-10
talk
Yoshihiro Oe (TOYOTA MOTOR CORPORATION) , Satoshi Kuramitsu (Databricks)

As global data privacy regulations tighten, balancing user data protection with maximizing its business value is crucial. This presentation explores how integrating Databricks into our connected-vehicle data platform enhances both governance and business outcomes. We’ll highlight a case where migrating from EMR to Databricks improved deletion performance and cut costs by 99% with Delta Lake. This shift not only ensures compliance with data-privacy regulations but also maximizes the potential of connected-vehicle data. We are developing a platform that balances compliance with business value and sets a global standard for data usage, inviting partners to join us in building a secure, efficient mobility ecosystem.

Trillions of Data Records, Zero Bottlenecks for Investor Decision-Making

2025-06-10 Watch
talk
Ryan Sullivan (J. Goldman & Co., L.P.) , Carlos Capellan (J. Goldman & Co., L.P.)

In finance, every second counts. That’s why the Data team at J. Goldman & Co. needed to transform trillions of real-time market data records into a single, actionable insight — instantly, and without waiting on development resources. By modernizing their internal data platform with a scalable architecture, they built a streamlined, web-native alternative data interface that puts live market data directly in the hands of investment teams. With Databricks’ computational power and Unity Catalog’s secure governance, they eliminated bottlenecks and achieved the fastest possible time-to-market for critical investor decisions. Learn how J. Goldman & Co. innovates with Databricks and Sigma to:
- Ensure live, scalable data access across trillions of records in a flexible UI
- Empower non-technical teams with true self-service data exploration

Trust You Can Measure: Data Quality Standards in The Lakehouse

2025-06-10 Watch
talk
Amit Pahwa (Databricks) , Sergiy Kanyshchev (Databricks)

Do you trust your data? If you’ve ever struggled to figure out which datasets are reliable, well-governed, or safe to use, you’re not alone. At Databricks, our own internal lakehouse faced the same challenge—hundreds of thousands of tables, but no easy way to tell which data met quality standards. In this talk, the Databricks Data Platform team shares how we tackled this problem by building the Data Governance Score—a way to systematically measure and surface trust signals across the entire lakehouse. You’ll learn how we leverage Unity Catalog, governed tags, and enforcement to drive better data decisions at scale. Whether you're a data engineer, platform owner, or business leader, you’ll leave with practical ideas on how to raise the bar for data quality and trust in your own data ecosystem.
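A governance score of the kind described can be illustrated as a weighted checklist over catalog metadata: does the table have an owner, a description, commented columns? The signals and weights below are hypothetical, not the actual Databricks scorecard:

```python
def governance_score(table: dict, weights=None) -> float:
    """Score a table 0-100 on simple trust signals: an owner,
    a table comment, and the fraction of commented columns."""
    weights = weights or {"has_owner": 40, "has_comment": 20, "column_comments": 40}
    score = 0.0
    if table.get("owner"):
        score += weights["has_owner"]
    if table.get("comment"):
        score += weights["has_comment"]
    cols = table.get("columns", [])
    if cols:
        commented = sum(1 for c in cols if c.get("comment"))
        score += weights["column_comments"] * commented / len(cols)
    return round(score, 1)

table = {"owner": "data-platform", "comment": "Daily revenue rollup",
         "columns": [{"name": "day", "comment": "UTC date"}, {"name": "rev"}]}
print(governance_score(table))  # 80.0 -- one of two columns is commented
```

In practice the inputs would come from Unity Catalog metadata and governed tags, and the score would be surfaced next to each table so consumers can compare trust at a glance.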

Unlocking Streaming Power: How SEGA Wins With Lakeflow Declarative Pipelines

2025-06-10
talk
Felix Baker (SEGA Europe Limited) , Craig Porteous (Advancing Analytics)

Streaming data is hard and costly — that's the default opinion, but it doesn’t have to be. In this session, discover how SEGA simplified complex streaming pipelines and turned them into a competitive edge. SEGA sees over 40,000 events per second. That's no easy task, but enabling personalised gaming experiences for over 50 million gamers drives a huge competitive advantage. If you’re wrestling with streaming challenges, this talk is your next checkpoint. We’ll unpack how Lakeflow Declarative Pipelines helped SEGA, from automated schema evolution and simple data quality management to seamless streaming reliability. Learn how Lakeflow Declarative Pipelines drives value by transforming chaos emeralds into clarity, delivering results for a global gaming powerhouse. We'll step through the architecture, approach and challenges we overcame. Join Craig Porteous, Microsoft MVP from Advancing Analytics, and Felix Baker, Head of Data Services at SEGA Europe, for a fast-paced, hands-on journey into Lakeflow Declarative Pipelines’ unique powers.

Analyst Roadmap to Databricks: From SQL to End-to-End BI

2025-06-10 Watch
lightning_talk
Jake Duckers (Spencer Gifts)

Analysts often begin their Databricks journey by running familiar SQL queries in the SQL Editor, but that’s just the start. In this session, I’ll share the roadmap I followed to expand beyond ad-hoc querying: from SQL Editor and notebook-driven development to scheduled data pipelines producing interactive dashboards — all powered by Databricks SQL and Unity Catalog. You’ll learn how to organize tables with primary-key/foreign-key relationships and create table and column comments to form the semantic model, utilizing DBSQL features like RELY constraints. I’ll also show how parameterized dashboards can be set up to empower self-service analytics and feed into Genie Spaces. Attendees will walk away with best practices for starting out with building a robust BI platform on Databricks, including tips for table design and metadata enrichment. Whether you’re a data analyst or BI developer, this talk will help you unlock powerful, AI-enhanced analytics workflows.
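The table-design steps described above (attaching comments and declaring informational primary and foreign keys with RELY) can be sketched as metadata-driven DDL generation. The table and column names are hypothetical; the generated statements follow Databricks SQL's documented constraint syntax:

```python
def semantic_model_ddl(table, comment, pk, fks=None):
    """Generate COMMENT and constraint DDL for one table.
    `fks` maps a local column to the 'parent_table(parent_column)' it references."""
    short = table.split(".")[-1]
    stmts = [
        f"COMMENT ON TABLE {table} IS '{comment}';",
        f"ALTER TABLE {table} ADD CONSTRAINT {short}_pk PRIMARY KEY ({pk}) RELY;",
    ]
    for col, ref in (fks or {}).items():
        stmts.append(f"ALTER TABLE {table} ADD CONSTRAINT {short}_{col}_fk "
                     f"FOREIGN KEY ({col}) REFERENCES {ref};")
    return stmts

for stmt in semantic_model_ddl("retail.orders", "One row per order", "order_id",
                               {"customer_id": "retail.customers(customer_id)"}):
    print(stmt)
```

Keeping this metadata in one place means the same definitions that document the tables also declare the relationships tools like Genie use to join them.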

Building Reliable Agentic AI on Databricks

2025-06-10 Watch
lightning_talk
Barr Moses (Monte Carlo)

Agentic AI is the next evolution in artificial intelligence, with the potential to revolutionize the industry. However, its potential is matched only by its risk: without high-quality, trustworthy data, agentic AI can be exponentially dangerous. Join Barr Moses, CEO and Co-Founder of Monte Carlo, to explore how to leverage Databricks' powerful platform to ensure your agentic AI initiatives are underpinned by reliable, high-quality data. Barr will share:
- How data quality impacts agentic AI performance at every stage of the pipeline
- Strategies for implementing data observability to detect and resolve data issues in real time
- Best practices for building robust, error-resilient agentic AI models on Databricks
- Real-world examples of businesses harnessing Databricks' scalability and Monte Carlo’s observability to drive trustworthy AI outcomes
Learn how your organization can deliver more reliable agentic AI and turn the promise of autonomous intelligence into a strategic advantage. Audio for this session is delivered in the conference mobile app; you must bring your own headphones to listen.

From Overwhelmed to Empowered: How SAP is Democratizing Data & AI with Databricks to Solve Problems

2025-06-10 Watch
lightning_talk

Scaling the adoption of Data & AI within enterprises is critical for driving transformative business outcomes. Learn how the SAP Experience Garage, SAP’s largest internal enablement and innovation driver, is turning all employees into data enthusiasts through the integration of Databricks technologies. The SAP Experience Garage platform brings together colleagues with various levels of data knowledge and skills in one seamless space. Here, they can explore and use tangible datasets and data science/AI tooling from Databricks, enablement capabilities, and collaborative features to tackle real business-related challenges and create prototypes that find their way into SAP’s ecosystem.

Sponsored by: Immuta | Agentic Impact to Secure Data Provisioning

2025-06-10 Watch
lightning_talk
Matthew Vogt (Immuta)

As AI, internal data marketplaces, and self-service access become more popular, data teams must rethink how they securely govern and provision data at scale. Success depends on provisioning data in a way that balances security, compliance, and innovation, and promotes data-driven decision making when decision makers are AI agents. In this session, we'll discuss how you can:
- Launch and manage effective and secure data provisioning
- Secure your AI initiatives
- Scale your data governors through agentic AI
Join us to learn how to navigate the complexities of modern data environments, and start putting your data to work faster.

Sponsored by: Infosys | Agentic AI Governance: Shaping a Responsible Future

2025-06-10 Watch
lightning_talk
Anup Kumar Bose (Infosys Limited)

Agentic AI represents a quantum leap beyond generative AI — enabling systems to make autonomous decisions and act independently. While this unlocks transformative potential, it also brings complex governance challenges. This session explores novel risks, practical strategies and proven Data & AI governance frameworks for governing agentic AI at scale.

Sponsored by: Salesforce | Elevate Agentforce experience with Databricks Agents and Zero Copy

2025-06-10 Watch
lightning_talk
Alex Correa (Salesforce)

See how Agentforce connects with Databricks to create a seamless, intelligent workspace. With zero copy integration, users can access real-time Databricks data without moving or duplicating it. Explore how Agentforce automatically delegates tasks to a Databricks agent and enables end-to-end execution without leaving the flow of work, eliminating the swivel-chair effect. Together, these capabilities power a unified, cross-platform experience that drives faster decisions and smarter outcomes.

The JLL Training and Upskill Program for Our Warehouse Migration to Databricks

2025-06-10 Watch
lightning_talk

Databricks Odyssey is JLL’s bespoke training program designed to upskill and prepare data professionals for a new world of data lakehouse. Based on the concepts of learn, practice and certify, participants earn points, moving through five levels by completing activities with business application of Databricks key features. Databricks Odyssey facilitates cloud data warehousing migration by providing best practice frameworks, ensuring efficient use of pay-per-compute platforms. JLL/T Insights and Data fosters a data culture through learning programs that develop in-house talent and create career pathways. Databricks Odyssey offers:
- JLL-specific hands-on learning
- Gamified 'level up' approach
- Practical, applicable skills
Benefits include:
- Improved platform efficiency
- Enhanced data accuracy and client insights
- Ongoing professional development
- Potential cost savings through better utilization

“What I Wish I Had Known in My Last SOC.” Confessions of a Cybersecurity Executive

2025-06-10 Watch
lightning_talk
Bruce Hembree (Zigguratum Inc)

In Bruce’s career in cyber warfare and enterprise cybersecurity, he worked on many of the highest profile botnet and nation state takedowns in history. He also helped build the tech in one of the world’s most advanced SOCs. Bruce will explain what he learned from that experience and why it prompted him to leave early retirement, sell his beloved sports car and co-found ziggiz. We all know there’s more data than ever. Anyone close to cybersecurity also knows that SIEMs, typically at the center of enterprise cybersecurity operations, have become too expensive even at the highest levels of government and Fortune 100s.

Why You Should Move to Lakeflow Declarative Pipelines Serverless

2025-06-10 Watch
lightning_talk
Nandini N (Databricks)

Lakeflow Declarative Pipelines Serverless offers a range of benefits that make it an attractive option for organizations looking to optimize their ETL (extract, transform, load) processes. Key benefits of Lakeflow Declarative Pipelines Serverless:
- Automatic infrastructure management
- Unified batch and streaming
- Cost and performance optimization
- Simplified configuration
- Granular observability
By moving to Lakeflow Declarative Pipelines Serverless, organizations can achieve faster, more reliable, and cost-effective data pipeline management, ultimately driving better business insights and outcomes.

Accelerating Model Development and Fine-Tuning on Databricks with TwelveLabs

2025-06-10 Watch
talk
Wenwen Gao (NVIDIA) , Aiden Lee (Twelve Labs, Inc)

Scaling large language models (LLMs) and multimodal architectures requires efficient data management and computational power. NVIDIA NeMo Framework Megatron-LM on Databricks is an open source solution that integrates GPU acceleration and advanced parallelism with Databricks Delta Lakehouse, streamlining workflows for pre-training and fine-tuning models at scale. This session highlights context parallelism, a unique NeMo capability for parallelizing over sequence lengths, making it ideal for video datasets with large embeddings. Through the case study of TwelveLabs’ Pegasus-1 model, learn how NeMo empowers scalable multimodal AI development, from text to video processing, setting a new standard for LLM workflows.

A Unified Solution for Data Management and Model Training With Apache Iceberg and Mosaic Streaming

2025-06-10 Watch
talk
Zilong Zhou (ByteDance)

This session introduces ByteDance’s challenges in data management and model training, and shows how they are addressed with Magnus (enhanced Apache Iceberg) and Byted Streaming (customized Mosaic Streaming). Magnus uses Iceberg’s branch/tag to manage massive datasets/checkpoints efficiently. With enhanced metadata and a custom C++ data reader, Magnus achieves optimal sharding, shuffling and data loading. Flexible table migration, detailed metrics and built-in full-text indexes on Iceberg tables further ensure training reliability. When training with ultra-large datasets, ByteDance faced scalability and performance issues. Given Mosaic Streaming's scalability in distributed training and its clean code structure, the team chose and customized it to resolve challenges like slow startup, high resource consumption, and limited data source compatibility. In this session, we will explore Magnus and Byted Streaming, discuss their enhancements and demonstrate how they enable efficient and robust distributed training.

Building Dashboards as a Production-Grade Data Product

2025-06-10
talk
Caleb Priester (Zillow)

At Zillow, we have accelerated the volume and quality of our dashboards by leveraging a modern SDLC with version control and CI/CD. In the past three months, we have released 32 production-grade dashboards and shared them securely across the organization while cutting error rates in half over that span. In this session, we will provide an overview of how we utilize Databricks asset bundles and GitLab CI/CD to create performant dashboards that can be confidently used for mission-critical operations. As a concrete example, we'll then explore how Zillow's Data Platform team used this approach to automate our on-call support analysis, leveraging our dashboard development strategy alongside Databricks LLM offerings to create a comprehensive view that provides actionable performance metrics alongside AI-generated insights and action items from the hundreds of requests that make up our support workload.

Comprehensive Data Warehouse Migrations to Databricks SQL

2025-06-10 Watch
talk
Simon Eligulashvili (Databricks) , Sundar Shankar (Databricks)

This session is repeated. Databricks has a free, comprehensive solution for migrating legacy data warehouses from a wide range of source systems. See how we accelerate migrations from legacy data warehouses to Databricks SQL, achieving 50% faster migration than traditional methods. We'll cover the tool’s automated migration process:
- Discovery: Source system profiling
- Assessment: Legacy code analysis
- Conversion: Advanced code transpilation
- Reconciliation: Data validation
This comprehensive approach increases the predictability of migration projects, allowing businesses to plan and execute migrations with greater confidence.

Democratizing Data in a Regulated Industry: Best Practices and Outcomes With J.P. Morgan Payments

2025-06-10 Watch
talk
Narayan Raj (JPMorgan Chase)

Join our 2024 Databricks Disruptor award winners for a session on how they leveraged the Databricks and AWS platforms to build an internal technology marketplace in the highly regulated banking industry, empowering end-users to innovate and own their data sets while maintaining strict compliance. In this talk, leaders from the J.P. Morgan Payments Data team share how they’ve done it — from keeping customer needs at the center of all decision-making to promoting a culture of experimentation. They’ll also expand upon how the J.P. Morgan Payments product team now leverages the data platform they’ve built to create customer products, including Cash Flow Intelligence.

FinOps at Scale: Best Practices for Cost-Efficient Growth on Databricks

2025-06-10 Watch
talk
Sadhana Bala (Databricks) , Cody Davis (Databricks)

This session is repeated. You’ve seen your usage grow on Databricks, across departments, use cases, product lines and users. What can you do to ensure your end-users (data practitioners) of the platform remain cost-efficient and productive, while staying accountable to your budget? We’ll discuss spend monitoring, chargeback models and developing a culture of cost efficiency by using Databricks tools.
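A chargeback model of the kind discussed can be sketched as tag-based allocation of DBU usage to cost centers. The record shape, tag name, and DBU price below are illustrative assumptions; in practice the records would come from Databricks' system billing tables:

```python
from collections import defaultdict

def chargeback(usage_records, dbu_price: float):
    """Aggregate DBU usage per cost-center tag and convert it to dollars.
    Records without a cost_center tag fall into an 'untagged' bucket."""
    totals = defaultdict(float)
    for rec in usage_records:
        team = rec.get("tags", {}).get("cost_center", "untagged")
        totals[team] += rec["dbus"]
    return {team: round(dbus * dbu_price, 2) for team, dbus in totals.items()}

records = [
    {"dbus": 120.0, "tags": {"cost_center": "marketing"}},
    {"dbus": 80.0, "tags": {"cost_center": "finance"}},
    {"dbus": 40.0, "tags": {}},  # untagged spend surfaces for follow-up
]
print(chargeback(records, dbu_price=0.55))
# {'marketing': 66.0, 'finance': 44.0, 'untagged': 22.0}
```

Surfacing an explicit untagged bucket is a common FinOps tactic: unowned spend becomes visible, which itself drives tagging compliance.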

How Databricks Powers Real-Time Threat Detection at Barracuda XDR

2025-06-10 Watch
talk
Alex Dangel (Barracuda Networks) , Merium Khalid (Barracuda Networks)

As cybersecurity threats grow in volume and complexity, organizations must efficiently process security telemetry for best-in-class detection and mitigation. Barracuda’s XDR platform is redefining security operations by layering advanced detection methodologies over a broad range of supported technologies. Our vision is to deliver unparalleled protection through automation, machine learning and scalable detection frameworks, ensuring threats are identified and mitigated quickly. To achieve this, we have adopted Databricks as the foundation of our security analytics platform, providing greater control and flexibility while decoupling from traditional SIEM tools. By leveraging Lakeflow Declarative Pipelines, Spark Structured Streaming and detection-as-code CI/CD pipelines, we have built a real-time detection engine that enhances scalability, accuracy and cost efficiency. This session explores how Databricks is shaping the future of XDR through real-time analytics and cloud-native security.
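The detection-as-code idea mentioned above can be illustrated, stripped of the Spark and streaming machinery, as rules defined in plain code that a pipeline evaluates over events; versioned in Git, such rules ship through the same CI/CD as any other code. Both rules below are hypothetical examples:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class Detection:
    """A detection-as-code rule: a name, a severity, and a predicate over events."""
    name: str
    severity: str
    matches: Callable[[dict], bool]

RULES = [
    Detection("impossible_travel", "high",
              lambda e: e.get("type") == "login" and e.get("geo_velocity_kmh", 0) > 900),
    Detection("brute_force", "medium",
              lambda e: e.get("type") == "login" and e.get("failed_attempts", 0) >= 10),
]

def detect(events: Iterable[dict]):
    """Yield (rule name, event) pairs for every rule each event trips."""
    for event in events:
        for rule in RULES:
            if rule.matches(event):
                yield rule.name, event

events = [
    {"type": "login", "user": "a", "failed_attempts": 12},
    {"type": "login", "user": "b", "geo_velocity_kmh": 1500},
    {"type": "logout", "user": "a"},
]
print([name for name, _ in detect(events)])  # ['brute_force', 'impossible_travel']
```

In a streaming deployment the same predicates would run inside a Structured Streaming job over the telemetry tables, with CI tests asserting each rule's behavior on fixture events before it reaches production.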

Let's Save Tons of Money With Cloud-Native Data Ingestion!

2025-06-10 Watch
talk
Tyler Croy (Scribd, Inc.)

Delta Lake is a fantastic technology for quickly querying massive data sets, but first you need those massive data sets! In this session we will dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose and more. By using off-the-shelf open source tools like kafka-delta-ingest, oxbow and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, reliable, and most importantly: cheaper. No jobs needed! Attendees will learn how to use third-party tools in concert with a Databricks and Unity Catalog environment to provide a highly efficient and available data platform. This architecture will be presented in the context of AWS but can be adapted for Azure, Google Cloud Platform or even on-premise environments.