talk-data.com

Speaker

Luke Garzia

Lead Data Engineer Mastercard

Luke Garzia is a lead data engineer at Mastercard, supporting the fraud modeling team. He has an extensive history of working across the financial sector, building cloud-centric data pipelines and applications. He is a continuous learner with three GCP certifications and earned the nickname 'Mr ForEach' after extensive use of Databricks Workflows capabilities.

Bio from: Data + AI Summit 2025

Talks & appearances

Scaling Data Engineering Pipelines: Preparing Credit Card Transactions Data for Machine Learning

We discuss two real-world use cases in big data engineering, focusing on building stable pipelines and managing storage at petabyte scale. The first use case covers the adoption of Delta Lake to optimize data pipelines, which reduced query time by 80% and storage space by 70%. The second use case shows how the Databricks Workflows ‘ForEach’ operator runs compute-intensive pipelines across multiple clusters, cutting processing time from months to days. The approach relies on a reusable design pattern that isolates notebooks into units of work, so data scientists can test and develop independently.
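
The "units of work" pattern described in the abstract can be illustrated with a minimal Python sketch. This is not the Databricks Workflows API and not the speaker's implementation: it uses a local thread pool to stand in for the fan-out across clusters, and the names `process_partition`, `for_each`, and the month-based inputs are hypothetical, chosen only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def process_partition(month: str) -> dict:
    """Hypothetical unit of work: stands in for one notebook run on one input."""
    # A real unit would read one slice of transactions and write its output;
    # here it only reports which input it handled.
    return {"month": month, "status": "ok"}


def for_each(
    inputs: Iterable[str],
    task: Callable[[str], dict],
    concurrency: int = 4,
) -> list[dict]:
    """Run `task` once per input, with at most `concurrency` running at a time."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        # Executor.map returns results in the same order as the inputs.
        return list(pool.map(task, inputs))


# Assumed inputs: one unit of work per month of transaction data.
months = ["2024-01", "2024-02", "2024-03"]
results = for_each(months, process_partition, concurrency=2)
```

Because each unit takes a single input and has no shared state, it can also be called directly, for example `process_partition("2024-01")`, which is what allows one piece to be tested without running the whole pipeline.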