talk-data.com

Event

Data + AI Summit 2025

2025-06-09 – 2025-06-13 · Databricks Summit

Activities tracked

63

Filtering by: LLM

Sessions & talks

Showing 1–25 of 63 · Newest first

AI Evaluation from First Principles: You Can't Manage What You Can't Measure

2025-06-12 Watch
talk
Pallavi Koppol (Databricks) , Jonathan Frankle (Databricks)

Is your AI evaluation process holding back your system's true potential? Many organizations struggle with improving GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.

Key takeaways:
- Practical approaches for establishing reliable metrics for subjective evaluations
- Techniques for calibrating LLM judges to enable cost-effective, scalable assessment
- Actionable frameworks for evaluation systems that evolve with your AI capabilities

Whether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.

Automating Taxonomy Generation With Compound AI on Databricks

2025-06-12 Watch
talk
Allistair Cota (Lovelytics) , Sudhir Gajre (Lovelytics)

Taxonomy generation is a challenge across industries such as retail, manufacturing and e-commerce. Incomplete or inconsistent taxonomies can lead to fragmented data insights, missed monetization opportunities and stalled revenue growth. In this session, we will explore a modern approach to solving this problem by leveraging Databricks platform to build a scalable compound AI architecture for automated taxonomy generation. The first half of the session will walk you through the business significance and implications of taxonomy, followed by a technical deep dive in building an architecture for taxonomy implementation on the Databricks platform using a compound AI architecture. We will walk attendees through the anatomy of taxonomy generation, showcasing an innovative solution that combines multimodal and text-based LLMs, internal data sources and external API calls. This ensemble approach ensures more accurate, comprehensive and adaptable taxonomies that align with business needs.

Evaluation-Driven Development Workflows: Best Practices and Real-World Scenarios

2025-06-12 Watch
talk
Wenwen Xie (Databricks) , Arthur Dooner (Databricks)

In enterprise AI, Evaluation-Driven Development (EDD) ensures reliable, efficient systems by embedding continuous assessment and improvement into the AI development lifecycle. High-quality evaluation datasets are created using techniques like document analysis, synthetic data generation via Mosaic AI's synthetic data generation API, SME validation, and relevance filtering, reducing manual effort and accelerating workflows. EDD focuses on metrics such as context relevance, groundedness, and response accuracy to identify and address issues like retrieval errors or model limitations. Custom LLM judges, tailored to domain-specific needs like PII detection or tone assessment, enhance evaluations. By leveraging tools like the Mosaic AI Agent Framework, Agent Evaluation, and MLflow, EDD automates data tracking, streamlines workflows, and quantifies improvements, transforming AI development to deliver scalable, high-performing systems that drive measurable organizational value.

Sponsored by: Galileo Technologies Inc. | Taming Rogue AI Agents with Observability-Driven Evaluation

2025-06-12
talk
Atindriyo Sanyal (Galileo)

LLM agents often drift into failure when prompts, retrieval, external data, and policies interact in unpredictable ways. This technical session introduces a repeatable, metric-driven framework for detecting, diagnosing, and correcting these undesirable behaviors in agentic systems at production scale. We demonstrate how to instrument the agent loop with fine-grained signals (tool-selection quality, error rates, action progression, latency, and domain-specific metrics) and feed them into an evaluation layer (e.g., Galileo). This telemetry enables a virtuous cycle of system improvement. We present a practical example of a stock-trading system and show how brittle retrieval and faulty business logic cause undesirable behavior; we then refactor prompts and adjust the retrieval pipeline, verifying recovery through improved metrics. Attendees will learn how to add observability with minimal code change, pinpoint root causes via tracing, and drive continuous, metric-validated improvement.
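The instrumentation pattern described above can be sketched in a few lines: wrap each tool call so every invocation emits latency and error signals. The `instrumented` decorator and in-memory `TELEMETRY` store below are illustrative stand-ins, not Galileo's actual SDK; a real system would forward these records to an observability backend.

```python
import time
from functools import wraps

# In-memory sink; a production system would ship these to an evaluation layer.
TELEMETRY: list = []

def instrumented(tool_name: str):
    """Wrap an agent tool so each call emits a telemetry record."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            record = {"tool": tool_name, "ok": True}
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                record.update(ok=False, error=type(exc).__name__)
                raise
            finally:
                record["latency_s"] = time.perf_counter() - start
                TELEMETRY.append(record)
        return wrapper
    return decorator

# Hypothetical tool from a stock-trading agent, like the example in the session.
@instrumented("quote_lookup")
def quote_lookup(symbol: str) -> float:
    prices = {"ACME": 42.0}
    return prices[symbol]  # raises KeyError for unknown symbols

quote_lookup("ACME")
try:
    quote_lookup("UNKNOWN")
except KeyError:
    pass

# Derive an aggregate signal (per-tool error rate) from the raw telemetry.
error_rate = sum(not r["ok"] for r in TELEMETRY) / len(TELEMETRY)
print(error_rate)  # → 0.5
```

From records like these, dashboards can track error rates and latency per tool, which is the "minimal code change" observability the session refers to.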

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

2025-06-12 Watch
lightning_talk
Gabriel Lyons (Datahub)

While Databricks powers your data lakehouse, DataHub delivers the critical context layer connecting your entire ecosystem. We'll demonstrate how DataHub extends Unity Catalog to provide comprehensive metadata intelligence across platforms. With DataHub's real-time platform you can:
- Cut AI model time-to-market with unified REST and GraphQL APIs that ensure models train on reliable and compliant data from across platforms, with complete lineage tracking
- Decrease data incidents by 60% using an event-driven architecture that instantly propagates changes across systems
- Transform data discovery from days to minutes with AI-powered search and natural language interfaces

Leaders use DataHub to transform Databricks data into integrated insights that drive business value. See our demo of syncback technology, which detects sensitive data and enforces Databricks access controls automatically, plus our AI assistant that enhances LLMs with cross-platform metadata.

From Days to Minutes - AI Transforms Audit at KPMG

2025-06-12 Watch
talk
David Tempelmann (Databricks) , Mark Wallington (KPMG UK)

Imagine performing complex regulatory checks in minutes instead of days. We made this a reality using GenAI on the Databricks Data Intelligence Platform. Join us for a deep dive into our journey from POC to a production-ready AI audit tool. Discover how we automated thousands of legal requirement checks in annual reports with remarkable speed and accuracy. Learn our blueprint for:
- High-performance AI: building a scalable, >90% accurate AI system with an optimized RAG pipeline that auditors praise
- Robust productionization: achieving secure, governed deployment using Unity Catalog, MLflow, LLM-based evaluation, and MLOps best practices

This session provides actionable insights for deploying impactful, compliant GenAI in the enterprise.

Sponsored by: Meta | Supercharge Your Apps with Llama 4: Essential Tools and Techniques for Developers

2025-06-12 Watch
talk

Dive into the latest Llama 4 models. See for yourself how to unleash the power of Llama models and achieve next level performance with our curated set of practical tools, techniques and recipes. Join us as we dive into the world of Llama models, exploring their capabilities, developer tools, and exciting use cases. Discover how these innovative models are transforming industries and improving performance in real-world applications.

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

2025-06-12 Watch
lightning_talk
Simon Whiteley (Advancing Analytics)

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity and endlessly harassed by an unrelenting need for speed. What if AI could take the heavy lifting off your hands? What if we make the move away from code-generation and into config-generation — how much more could we achieve? In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder. You’ll walk away with a good idea of where AI is going to disrupt the data engineering workload, some good tips around how to accelerate your own workflows and an impending sense of doom around the future of the industry!

Founder discussion: Matei on UC, Data Intelligence and AI Governance

2025-06-12 Watch
talk
Matei Zaharia (Databricks)

Matei is a legend of open source: he started the Apache Spark project in 2009, co-founded Databricks, and worked on other widely used data and AI software, including MLflow, Delta Lake, and Dolly. His most recent research is about combining large language models (LLMs) with external data sources, such as search systems, and improving their efficiency and result quality. This will be a conversation covering the latest and greatest of UC, Data Intelligence, AI Governance, and more.

AI/BI Genie: A Look Under the Hood of Everyone's Friendly, Neighborhood GenAI Product

2025-06-12 Watch
talk
Amir Hormati (Databricks) , Alnur Ali (Databricks)

Go beyond the user interface and explore the cutting-edge technology driving AI/BI Genie. This session breaks down the AI/BI Genie architecture, showcasing how LLMs, retrieval-augmented generation (RAG) and finely tuned knowledge bases work together to deliver fast, accurate responses. We’ll also explore how AI agents orchestrate workflows, optimize query performance and continuously refine their understanding. Ideal for those who want to geek out about the tech stack behind Genie, this session offers a rare look at the magic under the hood.

Beyond AI Accuracy: Building Trustworthy and Responsible AI Application Through Mosaic AI Framework

2025-06-12 Watch
talk
Ananya Roy (Databricks)

Generic LLM metrics are useless until they meet your business needs. In this session we will dive deep into creating bespoke, state-of-the-art AI metrics that matter to you, and discuss best practices for LLM evaluation strategies, when to use an LLM judge vs. statistical metrics, and more. Through a live demo using the Mosaic AI Framework, we will showcase how to:
- Build your own custom AI metric tailored to the needs of your GenAI application
- Implement an autonomous AI evaluation suite for complex, multi-agent systems
- Generate ground truth data at scale and apply production monitoring strategies

Drawing from extensive experience working with customers on real-world use cases, we will share actionable insights on building a robust AI evaluation framework. By the end of this session, you'll be equipped to create AI solutions that are not only powerful but also relevant to your organization's needs. Join us to transform your AI strategy and make a tangible impact on your business!

Building Responsible AI Agents on Databricks

2025-06-12 Watch
talk
Pavithra Rao (Databricks) , Yassine Essawabi (Databricks)

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AIBI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.

Sponsored by: Securiti | Safely Curating Data to Enable Enterprise AI with Databricks

2025-06-12 Watch
lightning_talk
Jocelyn Houle (Securiti.ai)

This session will explore how developers can easily select, extract, filter, and control data pre-ingestion to accelerate safe AI. Learn how the Securiti and Databricks partnership empowers Databricks users by providing the critical foundation for unlocking scalability and accelerating trustworthy AI development and adoption.

Key takeaways:
● Understand how to leverage data intelligence to establish a foundation for frameworks like the OWASP Top 10 for LLMs, NIST AI RMF, and Gartner's AI TRiSM
● Learn how automated data curation and syncing address specific risks while accelerating AI development in Databricks
● Discover how leading organizations apply robust access controls across vast swaths of mostly unstructured data
● Learn how to maintain data provenance and control as data is moved and transformed through complex pipelines in the Databricks platform

Sponsored by: Qubika | Agentic AI In Finance: How To Build Agents Using Databricks And LangGraph

2025-06-11 Watch
lightning_talk
Sebastian Diaz (Qubika)

Join us for this session on how to build AI finance agents with Databricks and LangChain. This session introduces a powerful approach to building AI agents: a modular framework that integrates LangChain, retrieval-augmented generation (RAG), and Databricks' unified data platform to build intelligent, adaptable finance agents. We'll walk through the architecture and key components involved in building a system tailored for complex financial tasks like portfolio analysis, reporting automation, and real-time risk insights, including Databricks Unity Catalog, MLflow, and Mosaic AI. We'll also showcase a demo of one such agent in action: a Financial Analyst Agent. This agent emulates the expertise of a seasoned data analyst, delivering in-depth analysis in seconds and eliminating the need to wait hours or days for manual reports. The solution provides organizations with 24/7 access to advanced data analysis, enabling faster, smarter decision-making.

Sponsored by: West Monroe | Disruptive Forces: LLMs and the New Age of Data Engineering

2025-06-11 Watch
lightning_talk
Doug MacWilliams (West Monroe)

Large Language Models are unleashing a seismic shift on data engineering, challenging traditional workflows. LLMs obliterate inefficiencies and redefine productivity, automating complex tasks like documentation, code translation, and data model development with unprecedented speed and precision. Integrating LLMs into tools promises to reduce offshore dependency, fostering agile onshore innovation. Harnessing LLMs' full potential involves challenges, requiring deep dives into domain-specific data and strategic business alignment. This session will address deploying LLMs effectively, overcoming data management hurdles, and fostering collaboration between engineers and stakeholders. Join us to explore a future where LLMs redefine possibilities, embrace AI-driven innovation, and position your organization as a leader in data engineering.

Driving Secure AI Innovation with Obsidian Security, Databricks, and PointGuard AI

2025-06-11 Watch
talk
Alfredo Hickman (Obsidian Security) , JD Braun (Databricks) , Mali Gorantla (PointGuard AI)

As enterprises adopt AI and Large Language Models (LLMs), securing and governing these models - and the data used to train them - is essential. In this session, learn how Databricks Partner PointGuard AI helps organizations implement the Databricks AI Security Framework to manage AI-specific risks, ensuring security, compliance, and governance across the entire AI lifecycle. Then, discover how Obsidian Security provides a robust approach to AI security, enabling organizations to confidently scale AI applications.

End-to-End Interoperable Data Platform: How Bosch Leverages Databricks Supply Chain Consolidation

2025-06-11 Watch
talk
Satish Karunakaran (Robert Bosch GmbH) , Marc-Alexander Frey (Robert Bosch GmbH)

This session will showcase Bosch’s journey in consolidating supply chain information using the Databricks platform. It will dive into how Databricks not only acts as the central data lakehouse but also integrates seamlessly with transformative components such as dbt and Large Language Models (LLMs). The talk will highlight best practices, architectural considerations, and the value of an interoperable platform in driving actionable insights and operational excellence across complex supply chain processes.

Key topics and sections:
- Introduction & business context: a brief overview of Bosch’s supply chain challenges and the need for a consolidated data platform; the strategic importance of data-driven decision-making in a global supply chain environment
- Databricks as the core data platform
- Integrating dbt for transformation
- Leveraging LLMs for enhanced insights

Generative AI Merchant Matching

2025-06-11 Watch
lightning_talk
Tomáš Drietomský (Mastercard)

Our project demonstrates building enterprise AI systems cost-effectively, focusing on matching merchant descriptors to known businesses. Using fine-tuned LLMs and advanced search, we created a solution rivaling alternatives at minimal cost. The system works in three steps: a fine-tuned Llama 3 8B model parses merchant descriptors into standardized components; a hybrid search system uses these components to find candidate matches in our database; a Llama 3 70B model then evaluates top candidates, with an AI judge reviewing results for hallucination. We achieved a 400% latency improvement while maintaining accuracy and keeping costs low; each fine-tuning round cost only hundreds of dollars. Through careful optimization and a simple architecture balancing cost, speed, and accuracy, we show that small teams with modest budgets can tackle complex problems effectively using this technology. We share key insights on prompt engineering, fine-tuning, and cost and latency management.
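The three-step flow the abstract describes (parse, retrieve, judge) can be sketched as a toy pipeline. The two LLM calls are replaced by trivial stand-ins here; all function names and sample data are hypothetical, not Mastercard's actual system.

```python
# Step 1: parse a raw merchant descriptor into standardized components
# (a fine-tuned Llama 3 8B model does this in the real system).
def parse_descriptor(descriptor: str) -> dict:
    tokens = descriptor.replace("*", " ").split()
    return {
        "name": " ".join(t for t in tokens if not t.isdigit()).title(),
        "city": tokens[-1].title() if tokens else "",
    }

# Step 2: retrieve candidate matches by token overlap
# (standing in for a hybrid lexical + vector search).
def hybrid_search(components: dict, known_merchants: list) -> list:
    query = set(components["name"].lower().split())
    scored = [(len(query & set(m["name"].lower().split())), m) for m in known_merchants]
    return [m for score, m in sorted(scored, key=lambda s: -s[0]) if score > 0]

# Step 3: pick the best candidate (a larger Llama 3 70B judge in production,
# with a second review pass screening for hallucinated matches).
def judge_candidates(components: dict, candidates: list):
    return candidates[0] if candidates else None

merchants = [{"id": 1, "name": "Blue Bottle Coffee"}, {"id": 2, "name": "Joe's Pizza"}]
parsed = parse_descriptor("BLUE BOTTLE COFFEE 0042 OAKLAND")
match = judge_candidates(parsed, hybrid_search(parsed, merchants))
print(match["id"])  # → 1
```

The split into a cheap parser, a non-LLM retrieval stage, and a larger model used only on a short candidate list is what keeps the cost and latency of such a design low.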

Sponsored by: Cognizant | How Cognizant Helped RJR Transform Market Intelligence with GenAI

2025-06-11 Watch
lightning_talk
Vijay Gandapodi (Reynolds American)

Cognizant developed a GenAI-driven market intelligence chatbot for RJR using Dash UI. This chatbot leverages Databricks Vector Search for vector embeddings and semantic search, along with the DBRX-Instruct LLM model to provide accurate and contextually relevant responses to user queries. The implementation involved loading prepared metadata into a Databricks vector database using the GTE model to create vector embeddings, indexing these embeddings for efficient semantic search, and integrating the DBRX-Instruct LLM into the chat system with prompts to guide the LLM in understanding and responding to user queries. The chatbot also generated responses containing URL links to dashboards with requested numerical values, enhancing user experience and productivity by reducing report navigation and discovery time by 30%. This project stands out due to its innovative AI application, advanced reasoning techniques, user-friendly interface, and seamless integration with MicroStrategy.
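The retrieval flow the abstract outlines (embed metadata, index it, search semantically, then prompt the LLM with the retrieved context) can be sketched as follows. The bag-of-words `embed` function is an illustrative stand-in for the GTE embedding model, and no real Databricks Vector Search or DBRX-Instruct endpoint is called; all names and data are hypothetical.

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (GTE vectors in production)."""
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, index: list, k: int = 1) -> list:
    """Return the k most similar indexed documents for a user query."""
    q = embed(query)
    ranked = sorted(((cosine(q, vec), doc) for doc, vec in index), reverse=True)
    return [doc for _, doc in ranked[:k]]

# Prepared metadata: dashboard descriptions with their URLs, as in the chatbot.
docs = [
    "Quarterly sales dashboard https://example.com/sales",
    "Churn model monitoring report https://example.com/churn",
]
index = [(d, embed(d)) for d in docs]

# Retrieved context is spliced into the prompt so answers stay grounded
# and can include the dashboard link the user is looking for.
context = retrieve("show me the sales dashboard", index)
prompt = f"Answer using only this context:\n{context[0]}\nQuestion: where is the sales dashboard?"
print(context[0])
```

Returning the dashboard URL inside the retrieved context is what lets the chatbot link users straight to the report, the behavior the abstract credits with cutting navigation time.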

LLMOps at Intermountain Health: A Case Study on AI Inventory Agents

2025-06-11 Watch
talk
Mark Nielsen (Intermountain Healthcare)

In this session, we will delve into the creation of an infrastructure, CI/CD processes and monitoring systems that facilitate the responsible and efficient deployment of Large Language Models (LLMs) at Intermountain Healthcare. Using the "AI Inventory Agents" project as a case study, we will showcase how an LLM Agent can assist in effort and impact estimates, as well as provide insights into various AI products, both custom-built and third-party hosted. This includes their responsible AI certification status, development status and monitoring status (lights on, performance, drift, etc.). Attendees will learn how to build and customize their own LLMOps infrastructure to ensure seamless deployment and monitoring of LLMs, adhering to responsible AI practices.

Sponsored by: Dataiku | Engineering Trustworthy AI Agents with LLM Mesh + Mosaic AI

2025-06-11 Watch
lightning_talk
Dmitri Ryssev (Dataiku)

AI agent systems hold immense promise for automating complex tasks and driving intelligent decision‑making, but only when they are engineered to be both resilient and transparent. In this session we will explore how Dataiku’s LLM Mesh pairs with Databricks Mosaic AI to streamline the entire lifecycle: ingesting and preparing data in the Lakehouse, prompt engineering LLMs hosted on Mosaic AI Model Serving Endpoints, visually orchestrating multi‑step chains, and monitoring them in real time. We’ll walk through a live demo of a Dataiku flow that connects to a Databricks hosted model, adds automated validation, lineage, and human‑in‑the‑loop review, then exposes the agent via Dataiku's Agent Connect interface. You’ll leave with actionable patterns for setting guardrails, logging decisions, and surfacing explanations—so your organization can deploy trustworthy domain‑specific agents faster & safer.

Streamlining AI Application Development With Databricks Apps

2025-06-11 Watch
lightning_talk
Domonkos Pal (Hiflylabs Zrt.)

Think Databricks is just for data and models? Think again. In this session, you’ll see how to build and scale a full-stack AI app capable of handling thousands of queries per second entirely on Databricks. No extra cloud platforms, no patchwork infrastructure. Just one unified platform with native hosting, LLM integration, secure access, and built-in CI/CD. Learn how Databricks Apps, along with services like Model Serving, Jobs, and Gateways, streamline your architecture, eliminate boilerplate, and accelerate development, from prototype to production.

Sponsored by: Snorkel AI | Evaluating and Improving Performance of Agentic Systems

2025-06-11 Watch
lightning_talk
Chris Borg (Snorkel AI)

GenAI systems are evolving beyond basic information retrieval and question answering, becoming sophisticated agents capable of managing multi-turn dialogues and executing complex, multi-step tasks autonomously. However, reliably evaluating and systematically improving their performance remains challenging. In this session, we'll explore methods for assessing the behavior of LLM-driven agentic systems, highlighting techniques and showcasing actionable insights to identify performance bottlenecks and create better-aligned, more reliable agentic AI systems.

Adobe’s Security Lakehouse: OCSF, Data Efficiency and Threat Detection at Scale

2025-06-11 Watch
talk
Bharat Gamini (Adobe) , Andrew Krioukov (Antimatter)

This session will explore how Adobe uses a sophisticated data security architecture built on the Databricks Data Intelligence Platform, along with the Open Cybersecurity Schema Framework (OCSF), to enable scalable, real-time threat detection across more than 10 PB of security data. We’ll compare different approaches to OCSF implementation and demonstrate how Adobe processes massive security datasets efficiently — reducing query times by 18%, maintaining 99.4% SLA compliance, and supporting 286 security users across 17 teams with over 4,500 daily queries. By using the Databricks platform's serverless compute, scalable architecture, and LLM-powered recommendations, Adobe has significantly improved processing speed and efficiency, resulting in substantial cost savings. We’ll also highlight how OCSF enables advanced cross-tool analytics and automation, streamlining investigations. Finally, we’ll introduce Databricks’ new open-source OCSF toolkit for scalable security data normalization and invite the community to contribute.

Generating Laughter: Testing and Evaluating the Success of LLMs for Comedy

2025-06-11 Watch
lightning_talk
Erin Staples (Galileo)

Nondeterministic AI models, like large language models (LLMs), offer immense creative potential but require new approaches to testing and scalability. Drawing from her experience running New York Times-featured Generative AI comedy shows, Erin uncovers how traditional benchmarks may fall short and how embracing unpredictability can lead to innovative, laugh-inducing results. This talk will explore methods like multi-tiered feedback loops, chaos testing and exploratory user testing, where AI outputs are evaluated not by rigid accuracy standards but by their adaptability and resonance across different contexts — from comedy generation to functional applications. Erin will emphasize the importance of establishing a root source of truth — a reliable dataset or core principle — to manage consistency while embracing creativity. Whether you’re looking to generate a few laughs of your own or explore creative uses of Generative AI, this talk will inspire and delight enthusiasts of all levels.