Data Quality

Lakeflow Observability: From UI Monitoring to Deep Analytics

2025-06-12 · Data + AI Summit 2025

talk

by Saad Ansari (Databricks), Theresa Hammer (Databricks)

Analytics Databricks

Monitoring data pipelines is key to reliability at scale. In this session, we’ll dive into the observability experience in Lakeflow, Databricks’ unified data engineering solution, from intuitive UI monitoring to advanced event analysis, cost observability and custom dashboards. We’ll walk through the revamped UX for Lakeflow observability, showing how to: monitor runs and task states, dependencies and retry behavior in the UI; set up alerts for job and pipeline outcomes and failures; use pipeline and job system tables for historical insights; explore run events and event logs for root cause analysis; analyze metadata to understand and optimize pipeline spend; and build custom dashboards using system tables to track performance, data quality, freshness, SLAs and failure trends, and drive automated alerting based on real-time signals. This session will help you unlock full visibility into your data workflows.
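As a rough illustration of the failure-trend and freshness tracking mentioned above, the sketch below works over plain Python records. The record fields (`job`, `state`, `end_time`), the sample rows and the two-hour freshness threshold are all hypothetical; they do not reflect the actual schema of the Databricks system tables.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Hypothetical run records; field names and values are illustrative only.
RUNS = [
    {"job": "ingest", "state": "SUCCEEDED", "end_time": datetime(2025, 6, 12, 8, 0)},
    {"job": "ingest", "state": "FAILED", "end_time": datetime(2025, 6, 12, 9, 0)},
    {"job": "ingest", "state": "SUCCEEDED", "end_time": datetime(2025, 6, 12, 10, 0)},
    {"job": "report", "state": "FAILED", "end_time": datetime(2025, 6, 12, 7, 0)},
]


def failure_rate_by_job(runs):
    """Return the fraction of failed runs for each job."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for run in runs:
        totals[run["job"]] += 1
        if run["state"] == "FAILED":
            failures[run["job"]] += 1
    return {job: failures[job] / totals[job] for job in totals}


def stale_jobs(runs, now, max_age):
    """Return jobs whose latest successful run is missing or older than max_age."""
    last_success = {}
    for run in runs:
        if run["state"] == "SUCCEEDED":
            previous = last_success.get(run["job"])
            if previous is None or run["end_time"] > previous:
                last_success[run["job"]] = run["end_time"]
    jobs = {run["job"] for run in runs}
    return sorted(
        job for job in jobs
        if job not in last_success or now - last_success[job] > max_age
    )
```

The same two aggregations could feed a dashboard tile or an alert rule; the point of the sketch is only that failure trends and freshness reduce to simple group-by logic over run history.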

Health Data, Delivered: How Lakeflow Declarative Pipelines Powers the HealthVerity Marketplace

2025-06-12 · Data + AI Summit 2025

talk

by Ron DeFreitas (HealthVerity)

Databricks ETL/ELT

Building scalable, reliable ETL pipelines is a challenge for organizations managing large, diverse data sources. Theseus, our custom ETL framework, streamlines data ingestion and transformation by fully leveraging Databricks-native capabilities, including Lakeflow Declarative Pipelines, Auto Loader and event-driven orchestration. By decoupling supplier logic and implementing structured bronze, silver, and gold layers, Theseus ensures high-performance, fault-tolerant data processing with minimal operational overhead. The result? Faster time-to-value, simplified governance and improved data quality, all within a declarative framework that reduces engineering effort. In this session, we’ll explore how Theseus automates complex data workflows, optimizes cost efficiency and enhances scalability, showcasing how Databricks-native tools drive real business outcomes.

Monitor Quality and Compliance at Scale with Data Intelligence Powered by Unity Catalog

2025-06-12 · Data + AI Summit 2025

talk

by Jacqueline Li (Databricks), Danny Chiao (Databricks)

AI/ML

Learn how Data Profiling, Data Quality Monitoring, and Data Classification come together to provide end-to-end visibility into the health of your data and AI pipelines.

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

2025-06-12 · Data + AI Summit 2025

lightning_talk

by Simon Whiteley (Advancing Analytics)

AI/ML Data Engineering LLM

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity and endlessly harassed by an unrelenting need for speed. What if AI could take the heavy lifting off your hands? What if we make the move away from code-generation and into config-generation — how much more could we achieve? In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder. You’ll walk away with a good idea of where AI is going to disrupt the data engineering workload, some good tips around how to accelerate your own workflows and an impending sense of doom around the future of the industry!

Sponsored by: Anomalo | Reconciling IoT, Policy, and Insurer Data to Deliver Better Customer Discounts

From 10 Hours to 10 Minutes: Unleashing the Power of Lakeflow Declarative Pipelines

2025-06-12 · Data + AI Summit 2025

talk

by Sidney Cardoso (Michelin), Yash Joshi (Accenture)

Analytics Azure ADF BI Databricks Power BI SQL

How do you transform a data pipeline from sluggish 10-hour batch processing into a real-time powerhouse that delivers insights in just 10 minutes? This was the challenge we tackled at one of France's largest manufacturing companies, where data integration and analytics were mission-critical for supply chain optimization. Power BI dashboards needed to refresh every 15 minutes, but our team struggled with legacy Azure Data Factory batch pipelines. These outdated processes couldn’t keep up, delaying insights and generating up to three incident tickets a day. We identified Lakeflow Declarative Pipelines and Databricks SQL as the game-changing solution to modernize our workflow, implement quality checks, and reduce processing times. In this session, we’ll dive into the key factors behind our success: pipeline modernization with Lakeflow Declarative Pipelines to improve scalability; data quality enforcement for clean, reliable datasets; and seamless BI integration, using Databricks SQL to power fast, efficient queries in Power BI.

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale

2025-06-12 · Data + AI Summit 2025

talk

by Adriana Ispas (Databricks), Lennart Kats (Databricks)

CI/CD Data Engineering Databricks Git

Building robust, production-grade data pipelines goes beyond writing transformation logic — it requires rigorous testing, version control, automated CI/CD workflows and a clear separation between development and production. In this talk, we’ll demonstrate how Lakeflow, paired with Databricks Asset Bundles (DABs), enables Git-based workflows, automated deployments and comprehensive testing for data engineering projects. We’ll share best practices for unit testing, CI/CD automation, data quality monitoring and environment-specific configurations. Additionally, we’ll explore observability techniques and performance tuning to ensure your pipelines are scalable, maintainable and production-ready.
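The unit-testing practice mentioned above usually starts with keeping transformation logic in plain functions that can be exercised without a cluster. The following is a minimal sketch under that assumption; the `normalize_order` transformation and its field names are hypothetical and not taken from the talk, and the tests use only the standard-library `unittest` module.

```python
import unittest


def normalize_order(record):
    """Hypothetical transformation: clean the customer name and convert cents to a decimal amount."""
    if record["amount_cents"] < 0:
        raise ValueError("amount_cents must be non-negative")
    return {
        "order_id": record["order_id"],
        "customer": record["customer"].strip().lower(),
        "amount": record["amount_cents"] / 100,
    }


class NormalizeOrderTest(unittest.TestCase):
    def test_cleans_fields(self):
        raw = {"order_id": 1, "customer": "  Alice ", "amount_cents": 1250}
        self.assertEqual(
            normalize_order(raw),
            {"order_id": 1, "customer": "alice", "amount": 12.5},
        )

    def test_rejects_negative_amount(self):
        with self.assertRaises(ValueError):
            normalize_order({"order_id": 2, "customer": "Bob", "amount_cents": -5})
```

Saved as a module, the tests can be run with `python -m unittest`, which makes them easy to wire into a CI job that runs before any deployment step.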

Sponsored by: Oxylabs | Web Scraping and AI: A Quiet but Critical Partnership

How the Texas Rangers Use a Unified Data Platform to Drive World Class Baseball Analytics

2025-06-11 · Data + AI Summit 2025

talk

by Michael Topol (Texas Rangers), Oliver Dykstra (Texas Rangers)

AI/ML Analytics Databricks MLOps

Don't miss this session where we demonstrate how the Texas Rangers baseball team is staying one step ahead of the competition by going back to the basics. After the Rangers implemented a modern data strategy with Databricks and won the 2023 World Series, the rest of the league quickly followed suit. Now more than ever, data and AI are a central pillar of every baseball team's strategy, driving profound insights into player performance and game dynamics. With a "fundamentals win games", back-to-basics focus, join us as we explain our commitment to world-class data quality, engineering, and MLOps by taking full advantage of the Databricks Data Intelligence Platform. From system tables to federated querying, find out how the Rangers use every tool at their disposal to stay one step ahead in the hypercompetitive world of baseball.

Sponsored by: KPMG | Enhancing Regulatory Compliance through Data Quality and Traceability

Optimizing Analytics Infrastructure: Lessons from Migrating Snowflake to Databricks

2025-06-11 · Data + AI Summit 2025

talk

by Amit Rustagi (DeeplearningAPI)

Analytics Data Lake Databricks Snowflake

This session explores the strategic migration from Snowflake to Databricks, focusing on the journey of transforming a data lake to leverage Databricks’ advanced capabilities. It outlines the assessment of key architectural differences, performance benchmarks, and cost implications driving the decision. Attendees will gain insights into planning and execution, including data ingestion pipelines, schema conversion and metadata migration. Challenges such as maintaining data quality, optimizing compute resources and minimizing downtime are discussed, alongside solutions implemented to ensure a seamless transition. The session highlights the benefits of unified analytics and enhanced scalability achieved through Databricks, delivering actionable takeaways for similar migrations.

Fueling Efficiency: How Pilot Uses Vector Stores, Data Quality, and GenAI to Deliver Business Value

2025-06-11 · Data + AI Summit 2025

lightning_talk

by Travis Lawrence (Pilot Travel Centers)

AI/ML GenAI

In the complex world of logistics, efficiency and accuracy are paramount. At Pilot, the largest travel center network in North America, managing fuel delivery operations was a time-intensive and error-prone process. Tasks like processing delivery records and validating fuel transaction data posed significant challenges due to the diverse formats and handwritten elements involved. After several attempts to use robotic process automation failed, the team turned to Generative AI to automate and streamline this critical business process. In this session, discover how Pilot leverages GenAI, powered by advanced text and vision models, to revolutionize bill of lading (BOL) processing. By implementing few-shot learning and vectorized examples, the data team at Pilot was able to increase document parsing accuracy from 70% to 95%, enabling real-time validation against truck driver inputs. The result has been millions in savings from faster credit reconciliation and improved financial operations.
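The combination of few-shot learning and vectorized examples described above can be pictured as a nearest-neighbor lookup: embed the incoming document, retrieve the most similar labeled examples, and place them in the prompt. The sketch below is an assumption about the general technique, not Pilot's implementation. The toy three-dimensional vectors, the sample texts and the `build_prompt` helper are hypothetical; a real system would obtain embeddings from a model and keep them in a vector store.

```python
import math

# Hypothetical labeled examples with toy 3-dimensional embeddings.
EXAMPLES = [
    {"text": "sample document A", "label": '{"product": "diesel"}', "vec": [0.9, 0.1, 0.0]},
    {"text": "sample document B", "label": '{"product": "unleaded"}', "vec": [0.1, 0.9, 0.0]},
    {"text": "sample document C", "label": '{"product": "diesel"}', "vec": [0.8, 0.2, 0.1]},
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, examples, k=2):
    """Return the k examples most similar to the query vector."""
    ranked = sorted(examples, key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(query_text, query_vec, examples, k=2):
    """Assemble a few-shot prompt from the nearest examples plus the new document."""
    shots = top_k(query_vec, examples, k)
    parts = [f"Document: {ex['text']}\nParsed: {ex['label']}" for ex in shots]
    parts.append(f"Document: {query_text}\nParsed:")
    return "\n\n".join(parts)
```

Selecting examples per document, rather than using one fixed set of shots, lets the prompt show the model formats that resemble the document at hand.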

Build AI-Powered Applications Natively on Databricks

2025-06-11 · Data + AI Summit 2025

talk

by Andre Furlan Bueno (Databricks)

AI/ML Analytics DataViz Databricks

Discover how to build and deploy AI-powered applications natively on the Databricks Data Intelligence Platform. This session introduces best practices and a standard reference architecture for developing production-ready apps using popular frameworks like Dash, Shiny, Gradio, Streamlit and Flask. Learn how to leverage agents for orchestration and explore primary use cases supported by Databricks Apps, including data visualization, AI applications, self-service analytics and data quality monitoring. With serverless deployment and built-in governance through Unity Catalog, Databricks Apps enables seamless integration with your data and AI models, allowing you to focus on delivering impactful solutions without the complexities of infrastructure management. Whether you're a data engineer or an app developer, this session will equip you with the knowledge to create secure, scalable and efficient applications within a Databricks environment.

Busting Data Modeling Myths: Truths and Best Practices for Data Modeling in the Lakehouse

2025-06-11 · Data + AI Summit 2025

talk

by Kyle Hale (Databricks), Shannon Barrow (Databricks)

Analytics Data Lakehouse Data Modelling Databricks

Unlock the truth behind data modeling in Databricks. This session will tackle the top 10 myths surrounding relational and dimensional data modeling. Attendees will gain a clear understanding of what the Databricks Lakehouse truly supports today, including how to leverage primary and foreign keys, identity columns for surrogate keys, column-level data quality constraints and much more. The session is framed through the lens of the medallion architecture, explaining how to implement data models across bronze, silver, and gold tables. Whether you’re migrating from a legacy warehouse or building new analytics solutions, you’ll leave equipped to fully leverage Databricks’ capabilities and design scalable, high-performance data models for enterprise analytics.
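To make the modeling terms above concrete, here is a small generic sketch of a dimension table with a generated surrogate key, a fact table with a foreign key, and a column-level check constraint. It uses Python's built-in `sqlite3` module purely as a stand-in so that it runs anywhere; the table and column names are hypothetical, and the DDL syntax and constraint enforcement behavior in Databricks differ and should be checked against its documentation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is enabled.
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE dim_customer (
    customer_sk INTEGER PRIMARY KEY AUTOINCREMENT,  -- generated surrogate key
    customer_id TEXT NOT NULL UNIQUE                -- business key from the source system
);
CREATE TABLE fact_order (
    order_id    INTEGER PRIMARY KEY,
    customer_sk INTEGER NOT NULL REFERENCES dim_customer (customer_sk),
    amount      REAL NOT NULL CHECK (amount >= 0)   -- column-level quality constraint
);
""")


def insert_order(connection, order_id, customer_sk, amount):
    """Insert a fact row; return True on success, False if a constraint rejects it."""
    try:
        with connection:
            connection.execute(
                "INSERT INTO fact_order VALUES (?, ?, ?)",
                (order_id, customer_sk, amount),
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Load one dimension row and capture the surrogate key that was generated for it.
with conn:
    cursor = conn.execute("INSERT INTO dim_customer (customer_id) VALUES (?)", ("C-001",))
customer_sk = cursor.lastrowid
```

In this stand-in, a negative amount or an unknown `customer_sk` is rejected at write time, which is the behavior the constraints are meant to illustrate.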

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

The Upcoming Apache Spark 4.1: The Next Chapter in Unified Analytics

2025-06-11 · Data + AI Summit 2025

talk

by DB Tsai (Databricks), Xiao Li (Databricks)

Analytics API ETL/ELT PySpark Python Spark

Apache Spark has long been recognized as the leading open-source unified analytics engine, combining a simple yet powerful API with a rich ecosystem and top-notch performance. In the upcoming Spark 4.1 release, the community reimagines Spark to excel at both massive cluster deployments and local laptop development. We’ll start with new single-node optimizations that make PySpark even more efficient for smaller datasets. Next, we’ll delve into a major “Pythonizing” overhaul — simpler installation, clearer error messages and Pythonic APIs. On the ETL side, we’ll explore greater data source flexibility (including the simplified Python Data Source API) and a thriving UDF ecosystem. We’ll also highlight enhanced support for real-time use cases, built-in data quality checks and the expanding Spark Connect ecosystem — bridging local workflows with fully distributed execution. Don’t miss this chance to see Spark’s next chapter!

Accelerating Data Transformation: Best Practices for Governance, Agility and Innovation

2025-06-11 · Data + AI Summit 2025

lightning_talk

by Kevin Wilson (NCS Australia)

Analytics Data Governance Data Lakehouse Databricks dbt ETL/ELT SQL

In this session, we will share NCS’s approach to implementing a Databricks Lakehouse architecture, focusing on key lessons learned and best practices from our recent implementations. By integrating Databricks SQL Warehouse, the dbt transformation framework and our innovative test automation framework, we’ve optimized performance and scalability while ensuring data quality. We’ll dive into how Unity Catalog enabled robust data governance, empowering business units with self-serve analytical workspaces to create insights while maintaining control. Through the use of solution accelerators, rapid environment deployment and pattern-driven ELT frameworks, we’ve fast-tracked time-to-value and fostered a culture of innovation. Attendees will gain valuable insights into accelerating data transformation, governance and scaling analytics with Databricks.

Achieving AI Success with a Solid Data Foundation

2025-06-11 · Data + AI Summit 2025

talk

by Santosh Kudva (GE Vernova), Kevin Tollison (EY)

AI/ML Analytics Databricks GenAI

Join us for an insightful presentation on creating a robust data architecture to drive business outcomes in the age of Generative AI. Santosh Kudva, GE Vernova Chief Data Officer, and Kevin Tollison, EY AI Consulting Partner, will share their expertise on transforming data strategies to unleash the full potential of AI. Learn how GE Vernova, a dynamic enterprise born from the 2024 spin-off of GE, revamped its diverse landscape. They will provide a look into how they integrated the pre-spin-off Finance Data Platform into the GE Vernova Enterprise Data & Analytics ecosystem, utilizing Databricks to enable high-performance AI-led analytics. Key insights include: incorporating Generative AI into your overarching strategy; leveraging comprehensive analytics to enhance data quality; and building a resilient data framework adaptable to continuous evolution. Don't miss this opportunity to hear from industry leaders and gain valuable insights to elevate your data strategy and AI success.

Enterprise Financial Crime Detection: A Lakehouse Framework for FATF, Basel III, and BSA Compliance

2025-06-11 · Data + AI Summit 2025

talk

by Deepak Khetpal (Tiger Analytics), Surya Sai Turaga (Databricks)

AI/ML Analytics Data Lakehouse Databricks

We will present a framework for FinCrime detection built on the Databricks lakehouse architecture, focusing on how institutions can achieve both the data flexibility and the ACID transaction guarantees essential for FinCrime monitoring. The framework incorporates advanced ML models for anomaly detection, pattern recognition, and predictive analytics, while maintaining the clear data lineage and audit trails required by regulatory bodies. We will also discuss specific improvements: fewer false positives, faster detection and faster regulatory reporting. We will then delve into how the architecture addresses specific FATF recommendations, Basel III risk management requirements, and BSA compliance obligations, particularly in transaction monitoring and suspicious activity reporting (SAR). The ability to handle structured and unstructured data while maintaining data quality and governance makes the framework particularly valuable for large financial institutions dealing with complex, multi-jurisdictional compliance requirements.

Sponsored by: Soda Data Inc. | Clean Energy, Clean Data: How Data Quality Powers Decarbonization

From 10 Hours to 10 Minutes: Unleashing the Power of Lakeflow Declarative Pipelines

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale

Sponsored by: Oxylabs | Web Scraping and AI: A Quiet but Critical Partnership

How the Texas Rangers Use a Unified Data Platform to Drive World Class Baseball Analytics

Sponsored by: KPMG | Enhancing Regulatory Compliance through Data Quality and Traceability

Optimizing Analytics Infrastructure: Lessons from Migrating Snowflake to Databricks

Fueling Efficiency: How Pilot Uses Vector Stores, Data Quality, and GenAI to Deliver Business Value

Build AI-Powered Applications Natively on Databricks

Busting Data Modeling Myths: Truths and Best Practices for Data Modeling in the Lakehouse

Sponsored by: Informatica | Modernize analytics and empower AI in Databricks with trusted data using Informatica

The Upcoming Apache Spark 4.1: The Next Chapter in Unified Analytics

Accelerating Data Transformation: Best Practices for Governance, Agility and Innovation

Achieving AI Success with a Solid Data Foundation

Enterprise Financial Crime Detection: A Lakehouse Framework for FATF, Basel III, and BSA Compliance