Data lineage tracks how data records move and are transformed as they are processed across systems. It is essential to Data Operations and Governance, which support incident response, legal investigations, and privacy and compliance standards. However, proprietary hand-coded business logic that alters the data can cause things to go wrong. When that happens, current data lineage systems, which operate at the dataset or table level, offer limited help: tracing the problem requires additional analysis that can be expensive and time-consuming. Today's data landscape calls for record-level lineage that can pinpoint the exact source and cause of a data issue with minimal manual intervention. This problem has long been neglected because of its complexity, but we have a solution to propose. In this presentation, we will introduce a novel concept and its reference implementation using Starburst Enterprise Platform.
Speaker
Dr. Ethan D. Peck
Dr. Ethan D. Peck is a Director of Data Engineering in charge of Data Operations for the Product Data organization at Zoominfo. Ethan has over 15 years of experience in data, ranging from Data Science to Governance and everything in between. His background spans academia, where he received his Ph.D. in Atmospheric and Oceanic Sciences doing global climate modeling; the startup world, where he focused on ML-based entity resolution; and data operations at the publicly listed company Zoominfo. Besides his day job, Ethan is a husband, father of two, and a Shotokan Karate instructor.
Bio from: Data Universe 2024