Thursday Keynote (Virtual Replay)
Be first to witness the latest breakthroughs from Databricks and share the success of innovative data and AI companies.
Activities tracked
715
Sessions & talks
Showing 1–25 of 715 · Newest first
Be first to witness the latest breakthroughs from Databricks and share the success of innovative data and AI companies.
Josue is well known for his practical perspectives on the data and AI landscape. We'll talk about what he is seeing in the market, his take on product feature updates, and some humor mixed in.
Each year at Summit, Women in Data and AI have a half day for in-person discussions on empowering Women in Data and AI Breakfast, and networking with like-minded professionals and trailblazers. For this virtual discussion, hear from Kate Ostbye (Pfizer), Lisa Cohen (Anthropic), Pallavi Koppol and Holly Smith (Databricks) about navigating challenges, celebrating successes, and inspire one another as we champion diversity and innovation in data together. And how to get involved year-round.
Addepar possesses an enormous private investment data set with 40% of the $7T assets on the platform allocated to alternatives. Leveraging the Addepar Data Lakehouse (ADL), built on Databricks, we have built a scalable data pipeline that assesses millions of private fund investment cash flows and translates it to a private fund benchmarks data offering. Investors on the Addepar platform can leverage this data seamlessly integrated against their portfolio investments and obtain actionable investment insights. At a high-level, this data offering consists of an extensive data aggregation, filtering, and construction logic that dynamically updates for clients through the Databricks job workflows. This derived dataset has gone through several iterations with investment strategists and academics that leveraged delta shared tables. Irrespective of the data source, the data pipeline coalesces all relevant cash flow activity against a unique identifier before constructing the benchmarks.
Join us for an insightful Ask Me Anything (AMA) session on Declarative Pipelines — a powerful approach to simplify and optimize data workflows. Learn how to define data transformations using high-level, SQL-like semantics, reducing boilerplate code while improving performance and maintainability. Whether you're building ETL processes, feature engineering pipelines, or analytical workflows, this session will cover best practices, real-world use cases and how Declarative Pipelines can streamline your data applications. Bring your questions and discover how to make your data processing more intuitive and efficient!
As companies increasingly adopt Generative AI, they're faced with a new challenge: managing multiple AI assistants. What if you could have a single, intuitive interface that automatically directs questions to the best assistant for the task? Join us to discover how to implement a flexible Routing Agent that streamlines working with multiple AI Assistants. We'll show you how to leverage Databricks and DSPy 3.0 to simplify adding this powerful pattern to your system. We'll dive into the essential aspects including: Using DSPy optimizers to maximize correct route selections Optimizing smaller models to reduce latency Creating stateful interactions Designing for growth and adaptability to support tens or hundreds of AI Assistants Ensuring authorized access to AI Assistants Tracking performance in production environments We'll share real-world examples that you can apply today. You'll leave with the knowledge to make your AI system run smoothly and efficiently.
While modern lakehouse architectures and open-table formats provide flexibility, they are often challenging to manage. Data layouts, clustering, and small files need to be managed for efficiency. Qbeast’s platform-independent patented muti-column indexing optimizes lakehouse data layout, accelerates queries, and sharply reduces compute cost — without disrupting existing architectures. Qbeast also handles high-cardinality clustering and supports incremental updates. Join us to explore how Qbeast enables efficient, scalable, AI-ready data infrastructure — reducing compute costs independent of data platform and compute engine.
At DXC, we helped our customer FastWeb with their "Welcome Lakehouse" project - a data warehouse transformation from on-premises to Databricks on AWS. But the implementation became something more. Thanks to features such as Lakehouse Federation and Delta Sharing, from the first day of the Fastweb+Vodafone merger, we have been able to connect two different platforms with ease and make the business focus on the value of data and not on the IT integration. This session will feature our customer Alessandro Gattolin of Fastweb to talk about the experience.
Government leaders overwhelmingly recognize the potential benefits of AI as critical to long-term strategic goals of efficiency, but implementation challenges and security concerns could be obstacles to success.
Explore advanced governance and authentication patterns for building secure, enterprise-grade apps with Databricks Apps. Learn how to configure complex permissions and manage access control using Unity Catalog. We’ll dive into “on-behalf-of-user” authentication — allowing agents to enforce user-specific access controls — and cover API-based authentication, including PATs and OAuth flows for external integrations. We’ll also highlight how Addepar uses these capabilities to securely build and scale applications that handle sensitive financial data. Whether you're building internal tools or customer-facing apps, this session will equip you with the patterns and tools to ensure robust, secure access in your Databricks apps.
Is your AI evaluation process holding back your system's true potential? Many organizations struggle with improving GenAI quality because they don't know how to measure it effectively. This research session covers the principles of GenAI evaluation, offers a framework for measuring what truly matters, and demonstrates implementation using Databricks.Key Takeaways:-Practical approaches for establishing reliable metrics for subjective evaluations-Techniques for calibrating LLM judges to enable cost-effective, scalable assessment-Actionable frameworks for evaluation systems that evolve with your AI capabilitiesWhether you're developing models, implementing AI solutions, or leading technical teams, this session will equip you to define meaningful quality metrics for your specific use cases and build evaluation systems that expose what's working and what isn't, transforming AI guesswork into measurable success.
Taxonomy generation is a challenge across industries such as retail, manufacturing and e-commerce. Incomplete or inconsistent taxonomies can lead to fragmented data insights, missed monetization opportunities and stalled revenue growth. In this session, we will explore a modern approach to solving this problem by leveraging Databricks platform to build a scalable compound AI architecture for automated taxonomy generation. The first half of the session will walk you through the business significance and implications of taxonomy, followed by a technical deep dive in building an architecture for taxonomy implementation on the Databricks platform using a compound AI architecture. We will walk attendees through the anatomy of taxonomy generation, showcasing an innovative solution that combines multimodal and text-based LLMs, internal data sources and external API calls. This ensemble approach ensures more accurate, comprehensive and adaptable taxonomies that align with business needs.
The insurance industry is at the crossroads of digital transformation, facing challenges from market competition and customer expectations. While conventional ML applications have historically provided capabilities in this domain, the emergence of Agentic AI frameworks presents a revolutionary opportunity to build truly autonomous insurance applications. We will address issues related to data governance and quality while discussing how to monitor/evaluate fine-tune models. We'll demonstrate the application of the agentic framework in the insurance context and how these autonomous agents can work collaboratively to handle complex insurance workflows — from submission intake and risk evaluation to expedited quote generation. This session demonstrates how to architect intelligent insurance solutions using Databricks Mosaic AI agentic core components including Unity Catalog, Playground, model evaluation/guardrails, privacy filters, AI functions and AI/BI Genie.
This session explains how we've made our Apache Spark™ versionless for end users by introducing a stable client API, environment versioning and automatic remediation. These capabilities have enabled auto-upgrade of hundreds of millions of workloads with minimal disruption for Serverless Notebooks and Jobs. We'll also introduce a new approach to dependency management using environments. Admins will learn how to speed up package installation with Default Base Environments, and users will see how to manage custom environments for their own workloads.
Modern data organizations have moved beyond big data analytics to also incorporate advanced AI/ML data workloads. These workflows often involve multimodal datasets containing documents, images, long-form text, embeddings, URLs and more. Unity Catalog is an ideal solution for organizing and governing this data at scale. When paired with the Daft open source data engine, you can build a truly multimodal, AI-ready data lakehouse. In this session, we’ll explore how Daft integrates with Unity Catalog’s core features (such as volumes and functions) to enable efficient, AI-driven data lakehouses. You will learn how to ingest and process multimodal data (images, text and videos), run AI/ML transformations and feature extractions at scale, and maintain full control and visibility over your data with Unity Catalog’s fine-grained governance.
Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.
In enterprise AI, Evaluation-Driven Development (EDD) ensures reliable, efficient systems by embedding continuous assessment and improvement into the AI development lifecycle. High-quality evaluation datasets are created using techniques like document analysis, synthetic data generation via Mosaic AI’s synthetic data generation API, SME validation, and relevance filtering, reducing manual effort and accelerating workflows. EDD focuses on metrics such as context relevance, groundedness, and response accuracy to identify and address issues like retrieval errors or model limitations. Custom LLM judges, tailored to domain-specific needs like PII detection or tone assessment, enhance evaluations. By leveraging tools like Mosaic AI Agent Framework and Agent Evaluation, MLflow, EDD automates data tracking, streamlines workflows, and quantifies improvements, transforming AI development for delivering scalable, high-performing systems that drive measurable organizational value.
This is an overview of migrating from Apache Airflow to Lakeflow Jobs for modern data orchestration. It covers key differences, best practices and practical examples of transitioning from traditional Airflow DAGs orchestrating legacy systems to declarative, incremental ETL pipelines with Lakeflow. Attendees will gain actionable tips on how to improve efficiency, scalability and maintainability in their workflows.
I have metrics, you have metrics — we all have metrics. But the real problem isn’t having metrics, it’s that the numbers never line up, leading to endless cycles of reconciliation and confusion. Join us as we share how our Data Team at Databricks tackled this fundamental challenge in Business Intelligence by building an internal Metric Store — creating a single source of truth for all business metrics using the newly-launched UC Metric Views. Imagine a world where numbers always align, metric definitions are consistently applied across the organization and every metric comes with built-in ML-based forecasting, AI-powered anomaly detection and automatic explainability. That’s the future we’ve built — and we’ll show you how you can get started today.
The Apache Iceberg™ community is introducing native geospatial type support, addressing key challenges in managing geospatial data at scale, including fragmented formats and inefficiencies in storing large spatial datasets. This talk will delve into the origins of the Iceberg geo type, its specification design and future goals. We will examine the impact on both the geospatial and Iceberg communities, in introducing a standard data warehouse storage layer to the geospatial community, and enabling optimized geospatial analytics for Iceberg users. We will also present a live demonstration of the Iceberg geo data type with Apache Sedona™ and Apache Spark™, showcasing how it simplifies and accelerates geospatial analytics workflows and queries. Finally, we will also provide an in-depth look at its current capabilities and outline the roadmap for future developments, and offer a perspective on its role in advancing geospatial data management in the industry.
Monitoring data pipelines is key to reliability at scale. In this session, we’ll dive into the observability experience in Lakeflow, Databricks’ unified DE solution — from intuitive UI monitoring to advanced event analysis, cost observability and custom dashboards. We’ll walk through the revamped UX for Lakeflow observability, showing how to: Monitor runs and task states, dependencies and retry behavior in the UI Set up alerts for job and pipeline outcomes + failures Use pipeline and job system tables for historical insights Explore run events and event logs for root cause analysis Analyze metadata to understand and optimize pipeline spend How to build custom dashboards using system tables to track performance data quality, freshness, SLAs and failure trends, and drive automated alerting based on real-time signals This session will help you unlock full visibility into your data workflows.
Discover how the latest innovations in Databricks AI/BI Dashboards and Genie are transforming self-service analytics. This session offers a high-level tour of new capabilities that empower business users to ask questions in natural language, generate insights faster and make smarter decisions. Whether you're a long-time Databricks user or just exploring what's possible with AI/BI, you'll walk away with a clear understanding of how these tools are evolving — and how to leverage them for greater business impact.
Join two energy industry leaders as they showcase groundbreaking applications of AI and data solutions in modern oil and gas operations. NOV demonstrates how their Generative AI pipeline revolutionized drilling mud report processing, automating the analysis of 300 reports daily with near-perfect accuracy and real-time analytics capabilities. BP shares how Unity Catalog has transformed their enterprise-wide data strategy, breaking down silos while maintaining robust governance and security. Together, these case studies illustrate how AI and advanced analytics are enabling cleaner, more efficient energy operations while maintaining the reliability demanded by today's market.
Data warehousing in enterprise and mission-critical environments needs special consideration for price/performance. This session will explain how Databricks SQL addresses the most challenging requirements for high-concurrency, low-latency performance at scale. We will also cover the latest advancements in resource-based scheduling, autoscaling and caching enhancements that allow for seamless performance and workload management.
The insurance industry is rapidly evolving as advances in data and artificial intelligence (AI) drive innovation, enabling more personalized customer experiences, streamlined operations, and improved efficiencies. With powerful data analytics and AI-driven solutions, insurers can automate claims processing, enhance risk management, and make real-time decisions. Leveraging insights from large and complex datasets, organizations are delivering more customer-centric products and services than ever before. Key takeaways: Real-world applications of data and AI in claims automation, underwriting, and customer engagementHow predictive analytics and advanced data modeling help anticipate risks and meet customer needs. Personalization of policies, optimized pricing, and more efficient workflows for greater ROI. Discover how data and AI are fueling growth, improving protection, and shaping the future of the insurance industry!