Data Engineering

Creating Data Pipelines with Zero Touch

2025-06-17 · Summit 2025 - On Demand

session

Python S3 Snowflake

In response to the growing demand for integrating new data into our data platform, the Data Engineering Team at Okta has developed a solution utilizing Snowpark for Python to automate construction of data pipelines. Discover how Okta's Zero Touch Platform creates end-to-end pipelines that ingest events arriving on S3 and transforms data in Snowflake using Streams and Tasks. The platform features integrated capabilities to detect schema changes in data streams, facilitating automatic evolution of Snowflake table schemas. Crafted with privacy in mind, it also allows for data classification through tags and systemically masking data using tag-based masking policies.

What's New: Scaling Data Pipelines with SQL, dbt Projects, and Python

2025-06-17 · Summit 2025 - On Demand Watch

session

dbt Python Snowflake SQL

Learn how to efficiently scale and manage data engineering pipelines with Snowflake's latest capabilities for SQL- and Python-based transformations. Join us for new product and feature overviews, best practices and live demos.

Kill Bill-ing? Revenge is a Dish Best Served Optimized with GenAI

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Abdul Furkhan (Sportsbet)

AI/ML Cloud Computing Databricks GenAI Spark

In an era where cloud costs can spiral out of control, Sportsbet achieved a remarkable 49% reduction in Total Cost of Ownership (TCO) through an innovative AI-powered solution called 'Kill Bill.' This presentation reveals how we transformed Databricks' consumption-based pricing model from a challenge into a strategic advantage through an intelligent automation and optimization. Understand how to use GenAI to reduce Databricks TCO Leverage generative AI within Databricks solutions enables automated analysis of cluster logs, resource consumption, configurations, and codebases to provide Spark optimization suggestions Create AI agentic workflows by integrating Databricks' AI tools and Databricks Data Engineering tools Review a case study demonstrating how Total Cost of Ownership was reduced in practice. Attendees will leave with a clear understanding of how to implement AI within Databricks solutions to address similar cost challenges in their environments.

Sponsored by: Dagster Labs | The Age of AI is Changing Data Engineering for Good

Databricks Lakeflow: the Foundation of Data + AI Innovation for Your Industry

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Sam Sawyer (Databricks) , Ori Zohar (Databricks)

AI/ML Analytics BI Databricks

Every analytics, BI and AI project relies on high-quality data. This is why data engineering, the practice of building reliable data pipelines that ingest and transform data, is consequential to the success of these projects. In this session, we'll show how you can use Lakeflow to accelerate innovation in multiple parts of the organization. We'll review real-world examples of Databricks customers using Lakeflow in different industries such as automotive, healthcare and retail. We'll touch on how the foundational data engineering capabilities Lakeflow provides help power initiatives that improve customer experiences, make real-time decisions and drive business results.

Getting the Most Out of Lakeflow Declarative Pipelines: A Deep Dive on What’s New and Best Practices

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Michael Armbrust (Databricks)

This deep dive covers advanced usage patterns, tips and best practices for maximizing the potential of Lakeflow Declarative Pipelines. Attendees will explore new features, enhanced workflows and cost-optimization strategies through a demo-heavy presentation. The session will also address complex use cases, showcasing how Lakeflow Declarative Pipelines simplifies the management of robust data pipelines while maintaining scalability and efficiency across diverse data engineering challenges.

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals (repeat)

2025-06-12 · Data + AI Summit 2025

talk

by Frank Munz (Databricks)

AI/ML Data Governance Databricks GenAI GitHub SQL Data Streaming

This session is repeated. This introductory workshop caters to data engineers seeking hands-on experience and data architects looking to deepen their knowledge. The workshop is structured to provide a solid understanding of the following data engineering and streaming concepts: Introduction to Lakeflow and the Data Intelligence Platform Getting started with Lakeflow Declarative Pipelines for declarative data pipelines in SQL using Streaming Tables and Materialized Views Mastering Databricks Workflows with advanced control flow and triggers Understanding serverless compute Data governance and lineage with Unity Catalog Generative AI for Data Engineers: Genie and Databricks Assistant We believe you can only become an expert if you work on real problems and gain hands-on experience. Therefore, we will equip you with your own lab environment in this workshop and guide you through practical exercises like using GitHub, ingesting data from various sources, creating batch and streaming data pipelines, and more.

IQVIA’s Serverless Journey: Enabling Data and AI in a Regulated World

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Alex Esibov (Databricks) , Matthew Schwartz (IQVIA)

AI/ML Analytics Data Analytics Databricks Cyber Security

Your data and AI use-cases are multiplying. At the same time, there is increased focus and scrutiny to meet sophisticated security and regulatory requirements. IQVIA utilizes serverless use-cases across data engineering, data analytics, and ML and AI, to empower their customers to make informed decisions, support their R&D processes and improve patient outcomes. By leveraging native controls on the platform, serverless enables them to streamline their use cases while maintaining a strong security posture, top performance and optimized costs. This session will go over IQVIA’s journey to serverless, how they met their security and regulatory requirements, and the latest and upcoming enhancements to the Databricks Platform.

A Practical Roadmap to Becoming an Expert Databricks Data Engineer

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Derar Alhussein (Acadford)

Databricks

The demand for skilled Databricks data engineers continues to rise as enterprises accelerate their adoption of the Databricks platform. However, navigating the complex ecosystem of data engineering tools, frameworks and best practices can be overwhelming. This session provides a structured roadmap to becoming an expert Databricks data engineer, offering a clear progression from foundational skills to advanced capabilities. Acadford, a leading training provider, has successfully trained thousands of data engineers on Databricks, equipping them with the skills needed to excel in their careers and obtain professional certifications. Drawing on this experience, we will guide attendees through the most in-demand skills and knowledge areas through a combination of structured learning and practical insights. Key takeaways: Understand the core tech stack in Databricks Explore real-world code examples and live demonstrations Receive an actionable learning path with recommended resources

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Simon Whiteley (Advancing Analytics)

AI/ML Data Quality LLM

The demand for data engineering keeps growing, but data teams are bored by repetitive tasks, stumped by growing complexity and endlessly harassed by an unrelenting need for speed. What if AI could take the heavy lifting off your hands? What if we make the move away from code-generation and into config-generation — how much more could we achieve? In this session, we’ll explore how AI is revolutionizing data engineering, turning pain points into innovation. Whether you’re grappling with manual schema generation or struggling to ensure data quality, this session offers practical solutions to help you work smarter, not harder. You’ll walk away with a good idea of where AI is going to disrupt the data engineering workload, some good tips around how to accelerate your own workflows and an impending sense of doom around the future of the industry!

164: I built an entire data pipeline in 30 minutes using only AI (no code required)

2025-06-12 · Data Career Podcast: Helping You Land a Data Analyst Job FAST Listen

podcast_episode

by Avery Smith

AI/ML Analytics BigQuery Data Analytics GCP

Try Keboola 👉 https://www.keboola.com/mcp?utm_campaign=FY25_Q2_RoW_Marketing_Events_Webinar_Keboola_MCP_Server_Launch_June&utm_source=Youtube&utm_medium=Avery Today, we'll create an entire data pipeline from scratch without writing a single line of code! Using Keboola MCP server and ClaudeAI, we’ll extract data from my FindADataJob.com RSS feed, transform it, load it into Google BigQuery, and visualize it with Streamlit. This is the future of data engineering! Keboola MCP Integration: https://mcp.connection.us-east4.gcp.keboola.com/sse I Analyzed Data Analyst Jobs to Find Out What Skills You ACTUALLY Need https://www.youtube.com/watch?v=lo3VU1srV1E&t=212s 💌 Join 10k+ aspiring data analysts & get my tips in your inbox weekly 👉 https://www.datacareerjumpstart.com/newsletter 🆘 Feeling stuck in your data journey? Come to my next free "How to Land Your First Data Job" training 👉 https://www.datacareerjumpstart.com/training 👩‍💻 Want to land a data job in less than 90 days? 👉 https://www.datacareerjumpstart.com/daa 👔 Ace The Interview with Confidence 👉 https://www.datacareerjumpstart.com/interviewsimulator ⌚ TIMESTAMPS 00:00 - Introduction 00:54 - Definition of Basic Data Engineering Terms 02:26 - Keboola MCP and Its Capabilities 07:48 - Extracting Data from RSS Feed 12:43 - Transforming and Cleaning the Data 19:19 - Aggregating and Analyzing Data 23:19 - Scheduling and Automating the Pipeline 25:04 - Visualizing Data with Streamlit

🔗 CONNECT WITH AVERY 🎥 YouTube Channel: https://www.youtube.com/@averysmith 🤝 LinkedIn: https://www.linkedin.com/in/averyjsmith/ 📸 Instagram: https://instagram.com/datacareerjumpstart 🎵 TikTok: https://www.tiktok.com/@verydata 💻 Website: https://www.datacareerjumpstart.com/ Mentioned in this episode: Join the last cohort of 2025! The LAST cohort of The Data Analytics Accelerator for 2025 kicks off on Monday, December 8th and enrollment is officially open!

To celebrate the end of the year, we’re running a special End-of-Year Sale, where you’ll get: ✅ A discount on your enrollment 🎁 6 bonus gifts, including job listings, interview prep, AI tools + more

If your goal is to land a data job in 2026, this is your chance to get ahead of the competition and start strong.

👉 Join the December Cohort & Claim Your Bonuses: https://DataCareerJumpstart.com/daa https://www.datacareerjumpstart.com/daa

Democratizing Data Engineering with Databricks and dbt at Ludia

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Jean-Christophe Rodrigue (Ludia) , Huntting Buckley (Databricks)

Databricks dbt

Ludia, a leading mobile gaming company, is empowering its analysts and domain experts by democratizing data engineering with Databricks and dbt. This talk explores how Ludia enabled cross-functional teams to build and maintain production-grade data pipelines without relying solely on centralized data engineering resources—accelerating time to insight, improving data reliability, and fostering a culture of data ownership across the organization.

How Navy Federal's Enterprise Data Ecosystem Leverages Unity Catalog for Data + AI Governance

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Krishnakumar Sivasubramanian (NFCU) , Ricardo Portilla (Databricks)

AI/ML Cloud Computing Data Lake DWH

Navy Federal Credit Union has 200+ enterprise data sources in the enterprise data lake. These data assets are used for training 100+ machine learning models and hydrating a semantic layer for serving, at an average 4,000 business users daily across the credit union. The only option for extracting data from analytic semantic layer was to allow consuming application to access it via an already-overloaded cloud data warehouse. Visualizing data lineage for 1,000 + data pipelines and associated metadata is impossible and understanding the granular cost for running data pipelines is a challenge. Implementing Unity Catalog opened alternate path for accessing analytic semantic data from lake. It also opened the doors to remove duplicate data assets stored across multiple lakes which will save hundred thousands of dollars in data engineering efforts, compute and storage costs.

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Adriana Ispas (Databricks) , Lennart Kats (Databricks)

CI/CD Data Quality Databricks Git

Building robust, production-grade data pipelines goes beyond writing transformation logic — it requires rigorous testing, version control, automated CI/CD workflows and a clear separation between development and production. In this talk, we’ll demonstrate how Lakeflow, paired with Databricks Asset Bundles (DABs), enables Git-based workflows, automated deployments and comprehensive testing for data engineering projects. We’ll share best practices for unit testing, CI/CD automation, data quality monitoring and environment-specific configurations. Additionally, we’ll explore observability techniques and performance tuning to ensure your pipelines are scalable, maintainable and production-ready.

Sponsored by: Acceldata | Agentic Data Management: Trusted Data for Enterprise AI on Databricks

Harnessing Databricks Asset Bundles: Transforming Pipeline Management at Scale at Stack Overflow

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Chelsea Zhang (Stack Overflow)

Databricks Cyber Security

Discover how Stack Overflow optimized its data engineering workflows using Databricks Asset Bundles (DABs) for scalable and efficient pipeline deployments. This session explores the structured pipeline architecture, emphasizing code reusability, modular design and bundle variables to ensure clarity and data isolation across projects. Learn how the data team leverages enterprise infrastructure to streamline deployment across multiple environments. Key topics include DRY-principled modular design, essential DAB features for automation and data security strategies using Unity Catalog. Designed for data engineers and teams managing multi-project workflows, this talk offers actionable insights on optimizing pipelines with Databricks evolving toolset.

Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Luke Garzia (Mastercard) , Brandon DeShon (Mastercard)

AI/ML Big Data Delta

We discuss two real-world use cases in big data engineering, focusing on constructing stable pipelines and managing storage at a petabyte scale. The first use case highlights the implementation of Delta Lake to optimize data pipelines, resulting in an 80% reduction in query time and a 70% reduction in storage space. The second use case demonstrates the effectiveness of the Workflows ‘ForEach’ operator in executing compute-intensive pipelines across multiple clusters, significantly reducing processing time from months to days. This approach involves a reusable design pattern that isolates notebooks into units of work, enabling data scientists to independently test and develop.

Sponsored by: Monte Carlo | Cleared for Takeoff: How American Airlines Builds Data Trust

talk-data.com

Activity Trend

Top Events

Top Speakers

Creating Data Pipelines with Zero Touch

What's New: Scaling Data Pipelines with SQL, dbt Projects, and Python

Kill Bill-ing? Revenge is a Dish Best Served Optimized with GenAI

Sponsored by: Dagster Labs | The Age of AI is Changing Data Engineering for Good

Databricks Lakeflow: the Foundation of Data + AI Innovation for Your Industry

Getting the Most Out of Lakeflow Declarative Pipelines: A Deep Dive on What’s New and Best Practices

Hands-on Learning: AI-Powered Data Engineering with Lakeflow: Techniques for Modern Data Professionals (repeat)

IQVIA’s Serverless Journey: Enabling Data and AI in a Regulated World

A Practical Roadmap to Becoming an Expert Databricks Data Engineer

Automating Engineering with AI - LLMs in Metadata Driven Frameworks

164: I built an entire data pipeline in 30 minutes using only AI (no code required)

Democratizing Data Engineering with Databricks and dbt at Ludia

How Navy Federal's Enterprise Data Ecosystem Leverages Unity Catalog for Data + AI Governance

Lakeflow in Production: CI/CD, Testing and Monitoring at Scale

Sponsored by: Acceldata | Agentic Data Management: Trusted Data for Enterprise AI on Databricks

Sponsored by: dbt Labs | Leveling Up Data Engineering at Riot: How We Rolled Out dbt and Transformed the Developer Experience

Sponsored by: West Monroe | Disruptive Forces: LLMs and the New Age of Data Engineering

Harnessing Databricks Asset Bundles: Transforming Pipeline Management at Scale at Stack Overflow

Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning

Sponsored by: Monte Carlo | Cleared for Takeoff: How American Airlines Builds Data Trust