Data Engineering

Kill Bill-ing? Revenge is a Dish Best Served Optimized with GenAI

2025-06-12 · Data + AI Summit 2025

lightning_talk

by Abdul Furkhan (Sportsbet)

AI/ML Cloud Computing Databricks GenAI Spark

In an era where cloud costs can spiral out of control, Sportsbet achieved a 49% reduction in Total Cost of Ownership (TCO) through an AI-powered solution called 'Kill Bill.' This presentation reveals how we transformed Databricks' consumption-based pricing model from a challenge into a strategic advantage through intelligent automation and optimization. In this session you will: understand how to use GenAI to reduce Databricks TCO; see how generative AI within Databricks solutions enables automated analysis of cluster logs, resource consumption, configurations and codebases to provide Spark optimization suggestions; create agentic AI workflows by integrating Databricks' AI tools and Databricks Data Engineering tools; and review a case study demonstrating how TCO was reduced in practice. Attendees will leave with a clear understanding of how to implement AI within Databricks solutions to address similar cost challenges in their environments.

Sponsored by: Dagster Labs | The Age of AI is Changing Data Engineering for Good

Databricks Lakeflow: the Foundation of Data + AI Innovation for Your Industry

2025-06-12 · Data + AI Summit 2025

talk

by Sam Sawyer (Databricks), Ori Zohar (Databricks)

AI/ML Analytics BI Databricks

Every analytics, BI and AI project relies on high-quality data. This is why data engineering, the practice of building reliable data pipelines that ingest and transform data, is consequential to the success of these projects. In this session, we'll show how you can use Lakeflow to accelerate innovation in multiple parts of the organization. We'll review real-world examples of Databricks customers using Lakeflow in different industries such as automotive, healthcare and retail. We'll touch on how the foundational data engineering capabilities Lakeflow provides help power initiatives that improve customer experiences, make real-time decisions and drive business results.

Getting the Most Out of Lakeflow Declarative Pipelines: A Deep Dive on What’s New and Best Practices

2025-06-12 · Data + AI Summit 2025

talk

by Michael Armbrust (Databricks)

This deep dive covers advanced usage patterns, tips and best practices for maximizing the potential of Lakeflow Declarative Pipelines. Attendees will explore new features, enhanced workflows and cost-optimization strategies through a demo-heavy presentation. The session will also address complex use cases, showcasing how Lakeflow Declarative Pipelines simplifies the management of robust data pipelines while maintaining scalability and efficiency across diverse data engineering challenges.

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals (repeat)

2025-06-12 · Data + AI Summit 2025

talk

by Frank Munz (Databricks)

AI/ML Data Governance Databricks GenAI GitHub SQL Data Streaming

This session is repeated. This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts: an introduction to Lakeflow and the Data Intelligence Platform; getting started with Lakeflow Declarative Pipelines for declarative data pipelines in SQL using Streaming Tables and Materialized Views; mastering Databricks Workflows with advanced control flow and triggers; understanding serverless compute; data governance and lineage with Unity Catalog; and generative AI for data engineers with Genie and Databricks Assistant. We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.

IQVIA’s Serverless Journey: Enabling Data and AI in a Regulated World

2025-06-12 · Data + AI Summit 2025

talk

by Alex Esibov (Databricks), Matthew Schwartz (IQVIA)

AI/ML Analytics Data Analytics Databricks Cyber Security

Your data and AI use cases are multiplying. At the same time, there is increased focus and scrutiny to meet sophisticated security and regulatory requirements. IQVIA utilizes serverless use cases across data engineering, data analytics, and ML and AI to empower their customers to make informed decisions, support their R&D processes and improve patient outcomes. By leveraging native controls on the platform, serverless enables them to streamline their use cases while maintaining a strong security posture, top performance and optimized costs. This session will go over IQVIA’s journey to serverless, how they met their security and regulatory requirements, and the latest and upcoming enhancements to the Databricks Platform.

A Practical Roadmap to Becoming an Expert Databricks Data Engineer

2025-06-12 · Data + AI Summit 2025

lightning_talk

by Derar Alhussein (Acadford)

Databricks

The demand for skilled Databricks data engineers continues to rise as enterprises accelerate their adoption of the Databricks platform. However, navigating the complex ecosystem of data engineering tools, frameworks and best practices can be overwhelming. This session provides a structured roadmap to becoming an expert Databricks data engineer, offering a clear progression from foundational skills to advanced capabilities. Acadford, a leading training provider, has successfully trained thousands of data engineers on Databricks, equipping them with the skills needed to excel in their careers and obtain professional certifications. Drawing on this experience, we will guide attendees through the most in-demand skills and knowledge areas through a combination of structured learning and practical insights. Key takeaways: understand the core tech stack in Databricks; explore real-world code examples and live demonstrations; and receive an actionable learning path with recommended resources.

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

2025-06-12 · Data + AI Summit 2025

lightning_talk

by Simon Whiteley (Advancing Analytics)

AI/ML Data Quality LLM

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity and endlessly harassed by an unrelenting need for speed. What if AI could take the heavy lifting off your hands? What if we make the move away from code-generation and into config-generation — how much more could we achieve? In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder. You’ll walk away with a good idea of where AI is going to disrupt the data engineering workload, some good tips around how to accelerate your own workflows and an impending sense of doom around the future of the industry!

Democratizing Data Engineering with Databricks and dbt at Ludia

2025-06-12 · Data + AI Summit 2025

talk

by Jean-Christophe Rodrigue (Ludia), Huntting Buckley (Databricks)

Databricks dbt

Ludia, a leading mobile gaming company, is empowering its analysts and domain experts by democratizing data engineering with Databricks and dbt. This talk explores how Ludia enabled cross-functional teams to build and maintain production-grade data pipelines without relying solely on centralized data engineering resources—accelerating time to insight, improving data reliability, and fostering a culture of data ownership across the organization.

How Navy Federal's Enterprise Data Ecosystem Leverages Unity Catalog for Data + AI Governance

2025-06-12 · Data + AI Summit 2025

talk

by Krishnakumar Sivasubramanian (NFCU), Ricardo Portilla (Databricks)

AI/ML Cloud Computing Data Lake DWH

Navy Federal Credit Union has 200+ enterprise data sources in its enterprise data lake. These data assets are used for training 100+ machine learning models and hydrating a semantic layer that serves an average of 4,000 business users daily across the credit union. The only option for extracting data from the analytic semantic layer was to allow consuming applications to access it via an already-overloaded cloud data warehouse. Visualizing data lineage for 1,000+ data pipelines and their associated metadata was impossible, and understanding the granular cost of running data pipelines was a challenge. Implementing Unity Catalog opened an alternate path for accessing analytic semantic data from the lake. It also opened the door to removing duplicate data assets stored across multiple lakes, which will save hundreds of thousands of dollars in data engineering effort, compute and storage costs.

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale

2025-06-12 · Data + AI Summit 2025

talk

by Adriana Ispas (Databricks), Lennart Kats (Databricks)

CI/CD Data Quality Databricks Git

Building robust, production-grade data pipelines goes beyond writing transformation logic — it requires rigorous testing, version control, automated CI/CD workflows and a clear separation between development and production. In this talk, we’ll demonstrate how Lakeflow, paired with Databricks Asset Bundles (DABs), enables Git-based workflows, automated deployments and comprehensive testing for data engineering projects. We’ll share best practices for unit testing, CI/CD automation, data quality monitoring and environment-specific configurations. Additionally, we’ll explore observability techniques and performance tuning to ensure your pipelines are scalable, maintainable and production-ready.

Sponsored by: Acceldata | Agentic Data Management: Trusted Data for Enterprise AI on Databricks

Harnessing Databricks Asset Bundles: Transforming Pipeline Management at Scale at Stack Overflow

2025-06-11 · Data + AI Summit 2025

talk

by Chelsea Zhang (Stack Overflow)

Databricks Cyber Security

Discover how Stack Overflow optimized its data engineering workflows using Databricks Asset Bundles (DABs) for scalable and efficient pipeline deployments. This session explores the structured pipeline architecture, emphasizing code reusability, modular design and bundle variables to ensure clarity and data isolation across projects. Learn how the data team leverages enterprise infrastructure to streamline deployment across multiple environments. Key topics include DRY-principled modular design, essential DAB features for automation and data security strategies using Unity Catalog. Designed for data engineers and teams managing multi-project workflows, this talk offers actionable insights on optimizing pipelines with Databricks' evolving toolset.

Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning

2025-06-11 · Data + AI Summit 2025

talk

by Luke Garzia (Mastercard), Brandon DeShon (Mastercard)

AI/ML Big Data Delta

We discuss two real-world use cases in big data engineering, focusing on constructing stable pipelines and managing storage at petabyte scale. The first use case highlights the implementation of Delta Lake to optimize data pipelines, resulting in an 80% reduction in query time and a 70% reduction in storage space. The second use case demonstrates the effectiveness of the Workflows ‘ForEach’ operator in executing compute-intensive pipelines across multiple clusters, reducing processing time from months to days. This approach involves a reusable design pattern that isolates notebooks into units of work, enabling data scientists to test and develop independently.

Sponsored by: Monte Carlo | Cleared for Takeoff: How American Airlines Builds Data Trust

Cost-Effective Data Architecture and AI Practice With Databricks at FunPlus

2025-06-11 · Data + AI Summit 2025

lightning_talk

by Chao Chen (FunPlus)

AI/ML Databricks

This talk covers FunPlus's journey to building a cost-effective and efficient data platform with Databricks. It explores how FunPlus leveraged Databricks to tackle key challenges and enhance data engineering and ML efficiency, and showcases best practices and their impact on game development and operations.

Metadata-Driven Streaming Ingestion Using Lakeflow Declarative Pipelines, Azure Event Hubs and a Schema Registry

2025-06-11 · Data + AI Summit 2025

talk

by Vicky Avison (Plexure)

Azure Marketing React Data Streaming

At Plexure, we ingest hundreds of millions of customer activities and transactions into our data platform every day, fuelling our personalisation engine and providing insights into the effectiveness of marketing campaigns. We're on a journey to transition from infrequent batch ingestion to near real-time streaming using Azure Event Hubs and Lakeflow Declarative Pipelines. This transformation will allow us to react to customer behaviour as it happens, rather than hours or even days later. It also enables us to move faster in other ways. By leveraging a Schema Registry, we've created a metadata-driven framework that allows data producers to evolve schemas with confidence, ensuring downstream processes continue running smoothly, and to seamlessly publish new datasets into the data platform without requiring Data Engineering assistance. Join us to learn more about our journey and see how we're implementing this with Lakeflow Declarative Pipelines meta-programming, including a live demo of the end-to-end process!

Transforming Data Pipeline Management With a Targeted Proof of Concept

2025-06-11 · Data + AI Summit 2025

talk

by Yi-Chen Tu (Capital One Financial), Raghu Valluri (Capital One Financial)

AI/ML

At Capital One, data-driven decision making is paramount to our success. This session explores how a focused proof of concept (POC) accelerated a shift in our data pipeline management strategy, resulting in operational improvements and expanded analytical capabilities. We'll cover the business challenges that motivated POC initiation, including data latency, cost savings and scalability limitations, and real-world results. We'll also dive into an examination of the before-and-after architecture with highlights for key technological levers. This session offers insights for data engineering and machine learning practitioners seeking to optimize their data pipelines for improved performance, scalability and business value.

Sponsored by: dbt Labs | Leveling Up Data Engineering at Riot: How We Rolled Out dbt and Transformed the Developer Experience

Sponsored by: West Monroe | Disruptive Forces: LLMs and the New Age of Data Engineering
