talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

715

Sessions & talks

Showing 601–625 of 715 · Newest first

Sponsored by: Monte Carlo | The Illusion of Done: Why the Real Work for AI Starts in Production

2025-06-10 Watch
lightning_talk
Shane Murray (Monte Carlo)

Your model is trained. Your pilot is live. Your data looks AI-ready. But for most teams, the toughest part of building successful AI starts after deployment. In this talk, Shane Murray and Ethan Post share lessons from the development of Monte Carlo’s Troubleshooting Agent – an AI assistant that helps users diagnose and fix data issues in production. They’ll unpack what it really takes to build and operate trustworthy AI systems in the real world, including: The Illusion of Done – Why deployment is just the beginning, and what breaks in production; Lessons from the Field – A behind-the-scenes look at the architecture, integration, and user experience of Monte Carlo’s agent; Operationalizing Reliability – How to evaluate AI performance, build the right team, and close the loop between users and model. Whether you're scaling RAG pipelines or running LLMs in production, you’ll leave with a playbook for building data and AI systems you—and your users—can trust.

Sponsored by: Prophecy | Taming Industry-Specific Data Sets: How to Simplify Access and Collaboration to FHIR and Beyond

2025-06-10 Watch
lightning_talk
Itai Weiss (Prophecy)

Industry-standard data formats can streamline data exchange, but they come with significant complexity and can seem ‘impossible’. One example is FHIR (Fast Healthcare Interoperability Resources), the healthcare data standard with a broad scope of 180 entities. With its widespread adoption, FHIR promises to speed the delivery of insights that improve patient experiences, optimize operations, and drive better outcomes. Yet working with FHIR data is no small task. Its complexity and extensions push it beyond the reach of most analysts. Instead, it lands on data engineers, who must load, transform, and keep up with changes. This creates bottlenecks, slows down insights, and places a heavy maintenance burden on already backlogged data engineering teams. Learn how to tame FHIR for analysts. If you want to speed time-to-insight, reduce engineering effort, and enable true cross-functional collaboration on industry-specific data, this is a session you don’t want to miss.

Transforming Title Insurance With Databricks Batch Inference

2025-06-10 Watch
talk
Madhu Kolli (First American Financial), Prabhaker Narsina (First American Financial)

Join us as we explore how First American Data & Analytics, a leading property-centric information provider, revolutionized its data extraction processes using batch inference on the Databricks Platform. Discover how it overcame the challenges of extracting data from millions of historical title policy images and reduced project timelines by 75%. Learn how First American optimized its data processing capabilities, reduced costs by 70% and enhanced the efficiency of its title insurance processes, ultimately improving the home-buying experience for buyers, sellers and lenders. This session will delve into the strategic integration of AI technologies, highlighting the power of collaboration and innovation in transforming complex data challenges into scalable solutions.

Unleash the Power of Automated Data Governance: Classify, Tag and Protect Your Data — Effortlessly

2025-06-10 Watch
talk
Zeashan Pappa (Databricks), Kristen Wilder (Databricks)

Struggling to keep up with data governance at scale? Join us to explore how automated data classification, tag policies and ABAC streamline access control while enhancing security and compliance. Get an exclusive look at the new Governance Hub, built to give your teams deeper visibility into data usage, access patterns and metadata — all in one place. Whether you're managing thousands or millions of assets, discover how to classify, tag and protect your data estate effortlessly with the latest advancements in Unity Catalog.

You Mean I Can Talk to My Data? Reimagining How KPMG Engages Data Using AI|BI Genie

2025-06-10 Watch
lightning_talk
Dennis Tally (KPMG)

“I don’t want to spend time filtering through another dashboard — I just need an answer now.” We’ve all experienced the frustration of wading through dashboards, yearning for immediate answers. Traditional reports and visualizations, though essential, often complicate the process for decision-makers. The digital enterprise demands a shift towards conversational, natural language interactions with data. At KPMG, AI|BI Genie is reimagining our approach by allowing users to inquire about data just as they would consult a knowledgeable colleague, obtaining precise and actionable insights instantly. Discover how the KPMG Contract to Cash team leverages AI|BI Genie to enhance data engagement, drive insights and foster business growth. Join us to see AI|BI Genie in action and learn how you can transform your data interaction paradigm.

A Comprehensive Guide to Streaming on the Data Intelligence Platform

2025-06-10 Watch
talk
Indrajit Roy (Databricks), Ray Zhu (Databricks)

This session is repeated. Is stream processing the future? We think so — and we’re building it with you using the latest capabilities in Apache Spark™ Structured Streaming. If you're a power user, this session is for you: we’ll demo new advanced features, from state transformations to real-time mode. If you prefer simplicity, this session is also for you: we’ll show how Lakeflow Declarative Pipelines simplifies managing streaming pipelines. And if you’re somewhere in between, we’ve got you covered — we’ll explain when to use your own streaming jobs versus Lakeflow Declarative Pipelines.

AI/BI Dashboards and AI/BI Genie: Dashboards and Last-Mile Analytics Made Simple

2025-06-10 Watch
talk
Josue Bogran (JosueBogran.com & zeb.co), Youssef Mrini (Databricks)

Databricks announced two new features in 2024: AI/BI Dashboards and AI/BI Genie. Dashboards is a redesigned dashboarding experience for your regular reporting needs, while Genie provides a natural language experience for your last-mile analytics. In this session, Databricks Solutions Architect and content creator Youssef Mrini will present alongside Databricks MVP and content creator Josue A. Bogran on how you can get the most value from these tools for your organization. Content covered includes: the necessary setup, including Unity Catalog, permissions and compute; building out a dashboard with AI/BI Dashboards; creating and training an AI/BI Genie workspace to reliably deliver answers; and when to use Dashboards or Genie versus other tools such as Power BI, Tableau, Sigma, ChatGPT, etc. Fluff-free, full of practical tips, and geared to help you deliver immediate impact with these new Databricks capabilities.

Best Practices to Mitigate AI Security Risks

2025-06-10 Watch
talk
Arun Pamulapati (Databricks), Samrat Ray (Databricks)

This session is repeated. AI is transforming industries, enhancing customer experiences and automating decisions. As organizations integrate AI into core operations, robust security is essential. The Databricks Security team collaborated with top cybersecurity researchers from OWASP, Gartner, NIST, HITRUST and Fortune 100 companies to evolve the Databricks AI Security Framework (DASF) to version 2.0. In this session, we’ll cover an AI security architecture using Unity Catalog, MLflow, egress controls, and AI gateway. Learn how security teams, AI practitioners and data engineers can secure AI applications on Databricks. Walk away with: a reference architecture for securing AI applications; a worksheet with AI risks and controls mapped to industry standards like MITRE, OWASP, NIST and HITRUST; and a DASF AI assistant tool to test your AI security.

Building AI Models In Health Care Using Semi-Synthetic Data

2025-06-10 Watch
talk
Holden Karau (Fight Health Insurance INC)

Regulated or restricted fields like Health Care make collecting training data complicated. We all want to do the right thing, but how? This talk will look at how Fight Health Insurance used de-identified public and proprietary information to create a semi-synthetic training set for use in fine-tuning machine learning models to power Fight Paperwork. We'll explore how to incorporate the latest "reasoning" techniques in fine tuning as well as how to make models that you can afford to serve — think single GPU inference instead of a cluster of A100s. In addition to the talk we have the code used in a public GitHub repo — although it is a little rough, so you might want to use it more as a source of inspiration rather than directly forking it.
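As a rough illustration of the de-identification step this abstract describes, here is a minimal pure-Python sketch. The patterns, tokens, and sample record are invented for illustration, and real health-care de-identification (e.g., under HIPAA) requires far more than a few regexes:

```python
import re

# Minimal, illustrative de-identification pass (NOT sufficient for real
# compliance): replace common direct identifiers with category tokens so
# the text can seed a semi-synthetic training set.
PATTERNS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
    (re.compile(r"\b\(?\d{3}\)?[ .-]?\d{3}[ .-]?\d{4}\b"), "[PHONE]"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"), "[DATE]"),
]

def deidentify(text: str) -> str:
    """Replace direct identifiers with placeholder tokens."""
    for pattern, token in PATTERNS:
        text = pattern.sub(token, text)
    return text

record = "Claim denied for J. Doe, DOB 04/12/1987, contact jdoe@example.com or 555-867-5309."
print(deidentify(record))
# Claim denied for J. Doe, DOB [DATE], contact [EMAIL] or [PHONE].
```

The placeholder tokens preserve the shape of the text, which matters when the scrubbed records are later used to generate synthetic variants for fine-tuning.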

Building Knowledge Agents to Automate Document Workflows

2025-06-10 Watch
talk
Jerry Liu (LlamaIndex)

This session is repeated. One of the biggest promises for LLM agents is automating all knowledge work over unstructured data — we call these "knowledge agents". To date, while there are fragmented tools around data connectors, storage and agent orchestration, AI engineers have trouble building and shipping production-grade agents beyond basic chatbots. In this session, we first outline the highest-value knowledge agent use cases we see being built and deployed at various enterprises. These are: multi-step document research, automated document extraction, and report generation. We then define the core architectural components around knowledge management and agent orchestration required to build these use cases. By the end you'll not only have an understanding of the core technical concepts, but also an appreciation of the ROI you can generate for end-users by shipping these use cases to production.

Building Real-Time Sport Model Insights with Spark Structured Streaming

2025-06-10 Watch
talk
Aaron Hope (DraftKings), Ethan Summers (DraftKings)

In the dynamic world of sports betting, precision and adaptability are key. Sports traders must navigate risk management, limitations of data feeds, and much more to prevent small model miscalculations from causing significant losses. To ensure accurate real-time pricing of hundreds of interdependent markets, traders provide key inputs such as player skill-level adjustments while maintaining precise correlations. Black-box models aren’t enough: constant feedback loops drive informed, accurate decisions. Join DraftKings as we showcase how we expose real-time metrics from our simulation engine to empower traders with deeper insights into how their inputs shape the model. Using Spark Structured Streaming, Kafka, and Databricks dashboards, we transform raw simulation outputs into actionable data. This transparency into our engines enables fine-grained control over pricing, leading to more accurate odds, a more efficient sportsbook, and an elevated customer experience.

Composing High-Accuracy AI Systems With SLMs and Mini-Agents

2025-06-10 Watch
talk
Sharon Zhou (Lamini)

This session is repeated. For most companies, building compound AI systems remains aspirational. LLMs are powerful but imperfect, and their non-deterministic nature makes steering them to high accuracy a challenge. In this session, we’ll demonstrate how to build compound AI systems using SLMs and highly accurate mini-agents that can be integrated into agentic workflows. You'll learn about breakthrough techniques, including: memory RAG, an embedding algorithm that reduces hallucinations by using embed-time compute to generate contextual embeddings, improving indexing and retrieval; and memory tuning, a fine-tuning algorithm that reduces hallucinations by using a Mixture of Memory Experts (MoME) to specialize models with proprietary data. We’ll also share real-world examples (text-to-SQL, factual reasoning, function calling, code analysis and more) across various industries. With these building blocks, we’ll demonstrate how to create high-accuracy mini-agents that can be composed into larger AI systems.

Comprehensive Data Management and Governance With Azure Data Lake Storage

2025-06-10 Watch
talk
James Baker (Microsoft), Santhosh Pillai (Microsoft Corporation)

Given that data is the new oil, it must be treated as such. Organizations that pursue greater insight into their businesses and their customers must manage, govern, protect and observe the use of the data that drives these insights in an efficient, cost-effective, compliant and auditable manner, without degrading access to that data. Azure Data Lake Storage offers many features that allow customers to apply such controls and protections to their critical data assets. Understanding how these features behave (their granularity, cost and scale implications, and the degree of control or protection they apply) is essential to implementing a data lake that reflects the value contained within. In this session, we will discuss the various data protection, governance and management capabilities available now and upcoming in ADLS, including how deep integration with Azure Databricks can provide more comprehensive, end-to-end coverage for these concerns, yielding a highly efficient and effective data governance solution.

Data Modeling 101 for Data Lakehouse Demystified

2025-06-10 Watch
talk

This session is repeated. In today’s data-driven world, the Data Lakehouse has emerged as a powerful architectural paradigm that unifies the flexibility of data lakes with the reliability and structure of traditional data warehouses. However, organizations must adopt the right data modeling techniques to unlock its full potential and ensure scalability, maintainability and efficiency. This session is designed for beginners looking to demystify the complexities of data modeling for the lakehouse and make informed design decisions. We’ll break down Medallion Architecture, explore key data modeling techniques and walk through the maturity stages of a successful data platform — transitioning from raw, unstructured data to well-organized, query-efficient models.
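As a loose sketch of the Medallion Architecture progression the session covers, from raw bronze through validated silver to query-efficient gold, here is a minimal pure-Python analogy. On an actual lakehouse each layer would be a Delta table and the transforms would run in Spark or a declarative pipeline; the record fields and function names here are invented for illustration:

```python
# Bronze: raw, as-ingested records (duplicates, bad rows, strings everywhere).
bronze = [
    {"order_id": "1", "amount": "19.99", "region": "west"},
    {"order_id": "1", "amount": "19.99", "region": "west"},  # duplicate
    {"order_id": "2", "amount": "oops", "region": "east"},   # bad amount
    {"order_id": "3", "amount": "5.00", "region": "east"},
]

def to_silver(rows):
    """Silver: deduplicate, enforce types, drop rows that fail validation."""
    seen, silver = set(), []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # in practice you'd quarantine these, not drop them
        seen.add(row["order_id"])
        silver.append({"order_id": row["order_id"], "amount": amount,
                       "region": row["region"]})
    return silver

def to_gold(rows):
    """Gold: a query-efficient aggregate (revenue per region)."""
    totals = {}
    for row in rows:
        totals[row["region"]] = totals.get(row["region"], 0.0) + row["amount"]
    return totals

print(to_gold(to_silver(bronze)))  # {'west': 19.99, 'east': 5.0}
```

The point of the pattern is that each layer has one job: bronze preserves everything as landed, silver enforces the contract, and gold serves the consumers.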

Delta Lake and the Data Mesh

2025-06-10 Watch
talk
KyJah Keys (Nextdata)

Delta Lake has proven to be an excellent storage format. Coupled with the Databricks platform, the storage format has shined as a component of a distributed system on the lakehouse. The pairing of Delta and Spark provides an excellent platform, but users often struggle to perform comparable work outside of the Spark ecosystem. Tools such as delta-rs, Polars and DuckDB have brought access to users outside of Spark, but they are only building blocks of a larger system. In this 40-minute talk we will demonstrate how users can use data products on the Nextdata OS data mesh to interact with the Databricks platform to drive Delta Lake workflows. Additionally, we will show how users can build autonomous data products that interact with their Delta tables both inside and outside of the lakehouse platform. Attendees will learn how to integrate the Nextdata OS data mesh with the Databricks platform as both an external and integral component.

From Metadata to Agents: Building the future of content understanding with Coactive AI + Databricks

2025-06-10
talk
Augusto Moreno (NBC Universal), William Gaviria Rojas (Coactive AI)

Media enterprises generate vast amounts of visual content, but unlocking its full potential requires multimodal AI at scale. Coactive AI and NBCUniversal’s Corporate Decision Sciences team are transforming how enterprises discover and understand visual content. We explore how Coactive AI and Databricks — from Delta Sharing to Genie — can revolutionize media content search, tagging and enrichment, enabling new levels of collaboration. Attendees will see how this AI-powered approach fuels AI workflows, enhances BI insights and drives new applications — from automating cut sheet generation to improving content compliance and recommendations. By structuring and sharing enriched media metadata, Coactive AI and NBCU are unlocking deeper intelligence and laying the groundwork for agentic AI systems that retrieve, interpret and act on visual content. This session will showcase real-world examples of these AI agents and how they can reshape future content discovery and media workflows.

How Corning Harnesses Unity Catalog for Enhanced FinOps Maturity and Cost Optimization

2025-06-10 Watch
talk
Jibreal Hamenoo (Corning Incorporated), Matthew Kuehn (Databricks)

We will explore how leveraging Databricks' Unity Catalog has accelerated our FinOps maturity, enabling us to optimize platform utilization and achieve significant cost reductions. By implementing Unity Catalog, we've gained comprehensive visibility and governance over our data assets, leading to more informed decision-making and efficient resource allocation. Learn how Corning discovered actionable insights and applied best practices for utilizing Unity Catalog to streamline data management, enhance financial operations and drive substantial savings, and how you can do the same within your organization.

Migrating Legacy SAS Code to Databricks Lakehouse: What We Learned Along the Way

2025-06-10 Watch
talk
Dmitriy Alergant (Tier One Analytics Inc.), Matt Adams (PacificSource Health Plans)

At PacificSource Health Plans, a US health insurance company, we are on a successful multi-year journey to migrate our entire data and analytics ecosystem to a Databricks Enterprise Data Warehouse (lakehouse). A particular obstacle on this journey was a reporting data mart that relied on copious amounts of legacy SAS code applying sophisticated business logic transformations for membership, claims, premiums and reserves. This core data mart was driving many of our critical reports and analytics. In this session we will share the unique and somewhat unexpected challenges and complexities we encountered in migrating this legacy SAS code: how our partner (T1A) leveraged automation technology (Alchemist) and some unique approaches to reverse engineer (analyze), instrument, translate, migrate, validate and reconcile these jobs, and what lessons we learned and carried forward from this migration effort.

Orchestration With Lakeflow Jobs

2025-06-10 Watch
talk
Saad Ansari (Databricks), Anthony Podgorsak (Databricks)

This session is repeated. Curious about orchestrating data pipelines on Databricks? Join us for an introduction to Lakeflow Jobs (formerly Databricks Workflows) — an easy-to-use orchestration service built into the Databricks Data Intelligence Platform. Lakeflow Jobs simplifies automating your data and AI workflows, from ETL pipelines to machine learning model training. In this beginner-friendly session, you'll learn how to: build and manage pipelines using a visual approach; monitor workflows and rerun failures with repair runs; automate tasks like publishing dashboards or ingesting data using Lakeflow Connect; add smart triggers that respond to new files or table updates; and use built-in loops and conditions to reduce manual work and make workflows more dynamic. We’ll walk through common use cases, share demos and offer tips to help you get started quickly. If you're new to orchestration or just getting started with Databricks, this session is for you.
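To make the orchestration concepts in this abstract concrete (dependency ordering, skipping downstream tasks when a parent fails, and repair runs that rerun only what failed), here is a hypothetical toy scheduler in plain Python. It is not the Lakeflow Jobs API; all names and the shape of the interface are invented for the sketch:

```python
# Toy task orchestrator illustrating dependency ordering and "repair runs".

def run_dag(tasks, deps, results=None):
    """Run zero-arg callables in dependency order.

    tasks:   name -> callable
    deps:    name -> list of upstream task names
    results: outcomes of a previous run; a "repair run" passes these in
             so tasks that already succeeded are not executed again.
    """
    results = dict(results or {})
    pending = [name for name in tasks if results.get(name) != "success"]
    progressed = True
    while pending and progressed:
        progressed = False
        for name in list(pending):
            upstream = deps.get(name, [])
            if any(results.get(u) in ("failed", "skipped") for u in upstream):
                results[name] = "skipped"   # a parent failed: don't run
            elif all(results.get(u) == "success" for u in upstream):
                try:
                    tasks[name]()
                    results[name] = "success"
                except Exception:
                    results[name] = "failed"
            else:
                continue                    # upstream not finished yet
            pending.remove(name)
            progressed = True
    return results

calls = []
tasks = {
    "ingest": lambda: calls.append("ingest"),
    "transform": lambda: 1 / 0,             # fails on the first run
    "publish": lambda: calls.append("publish"),
}
deps = {"transform": ["ingest"], "publish": ["transform"]}

first = run_dag(tasks, deps)
# first: ingest succeeded, transform failed, publish was skipped

tasks["transform"] = lambda: calls.append("transform")  # "fix" the task
repair = run_dag(tasks, deps, results=first)            # repair run
# only transform and publish rerun; ingest is not executed again
```

The repair-run behavior (passing the previous run's results back in) is the interesting part: a real orchestrator persists those outcomes so a failed pipeline can resume without recomputing expensive upstream steps.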

Revolutionizing Data Insights and the Buyer Experience at GM Financial with Cloud Data Modernization

2025-06-10 Watch
talk
Latha Subramanian (GM Financial), Rick Whitford (Deloitte Consulting, LLP)

Deloitte and GM (General Motors) Financial have collaborated to design and implement a cutting-edge cloud analytics platform, leveraging Databricks. In this session, we will explore how we overcame challenges including dispersed and limited data capabilities, high-cost hardware and outdated software, with a strategic and comprehensive approach. With the help of Deloitte and Databricks, we were able to develop a unified Customer360 view, integrate advanced AI-driven analytics, and establish robust data governance and cyber security measures. Attendees will gain valuable insights into the benefits realized, such as cost savings, enhanced customer experiences, and broad employee upskilling opportunities. Unlock the impact of cloud data modernization and advanced analytics in the automotive finance industry and beyond with Deloitte and Databricks.

Securing Data Collaboration: A Deep Dive Into Security, Frameworks, and Use Cases

2025-06-10 Watch
talk
El Ghali Benchekroun (Databricks), Bilal Obeidat (Databricks), Bhavin Kukadia (Databricks)

This session will focus on the security aspects of Databricks Delta Sharing, Databricks Cleanrooms and Databricks Marketplace, providing an exploration of how these solutions enable secure and scalable data collaboration while prioritizing privacy. Highlights: use cases — understand how Delta Sharing facilitates governed, real-time data exchange across platforms and how Cleanrooms support multi-party analytics without exposing sensitive information; security internals — dive into Delta Sharing's security frameworks; dynamic views — learn about fine-grained security controls; privacy-first Cleanrooms — explore how Cleanrooms enable secure analytics while maintaining strict data privacy standards; private exchanges — explore the role of private exchanges using Databricks Marketplace in securely sharing custom datasets and AI models with specific partners or subsidiaries; network security & compliance — review best practices for network configurations and compliance measures.

Simplifying Training and GenAI Finetuning Using Serverless GPU Compute

2025-06-10 Watch
talk
Tejas Sundaresan (Databricks)

The last year has seen rapid progress in open source GenAI models and frameworks. This talk covers best practices for custom training and OSS GenAI fine-tuning on Databricks, powered by the newly announced Serverless GPU Compute. We’ll cover how to use Serverless GPU Compute to power AI training and GenAI fine-tuning workloads, with framework support for libraries like LLM Foundry, Composer, Hugging Face, and more. Lastly, we’ll cover how to leverage MLflow and the Databricks Lakehouse to streamline the end-to-end development of these models. Key takeaways include: how Serverless GPU Compute saves customers valuable developer time and overhead when dealing with GPU infrastructure; best practices for training custom deep learning models (forecasting, recommendation, personalization) and fine-tuning OSS GenAI models on GPUs across the Databricks stack; leveraging distributed GPU training frameworks (e.g., PyTorch, Hugging Face) on Databricks; and streamlining the path to production for these models. Join us to learn about the newly announced Serverless GPU Compute and the latest updates to GPU training and fine-tuning on Databricks!

Sponsored by: Amperity | Transforming Guest Experiences: GoTo Foods’ Data Journey with Amperity & Databricks

2025-06-10 Watch
talk
Brett Newcome (GoTo Foods), Manuel Valdes (GoTo Foods)

GoTo Foods, the platform company behind brands like Auntie Anne’s, Cinnabon, Jamba, and more, set out to turn a fragmented data landscape into a high-performance customer intelligence engine. In this session, CTO Manuel Valdes and Director of Marketing Technology Brett Newcome share how they unified data using Databricks Delta Sharing and Amperity’s Customer Data Cloud to speed up time to market. As part of GoTo’s broader strategy to support its brands with shared enterprise tools, the team: unified loyalty, catering, and retail data into one customer view; cut campaign lead times from weeks to hours; activated audiences in real time without straining engineering; and unlocked new revenue through smarter segmentation and personalization.

Sponsored by: Cognizant | Toyota Utilizes a Unified Lakehouse Approach with Databricks

2025-06-10
talk
Rajesh Emani (Toyota Motors North America), Satish Hegde (Cognizant)

Toyota, the world’s largest automaker, sought to accelerate time-to-data and empower business users with secure data collaboration for faster insights. Partnering with Cognizant, they established a Unified Data Lake, integrating SOX principles and Databricks Unity Catalog to ensure compliance and security. Additionally, they developed a Data Scanner solution to automatically detect non-sensitive data and accelerate data ingestion. Join this dynamic session to discover how they achieved it.

Sponsored by: Microsoft | Leverage the power of the Microsoft Ecosystem with Azure Databricks

2025-06-10 Watch
talk
Anavi Nahar (Microsoft)

Join us for this insightful session to learn how you can leverage the power of the Microsoft ecosystem along with Azure Databricks to take your business to the next level. Azure Databricks is a fully integrated, native, first-party solution on Microsoft Azure. Databricks and Microsoft continue to actively collaborate on product development, ensuring tight integration, optimized performance, and a streamlined support experience. Azure Databricks offers seamless integrations with Power BI, Azure OpenAI, Microsoft Purview, Azure Data Lake Storage (ADLS) and Foundry. In this session, you’ll learn how you can leverage the deep integration between Azure Databricks and Microsoft solutions to empower your organization to do more with your data estate. You’ll also get an exclusive sneak peek into the product roadmap.