Zillow has well-established, comprehensive systems for defining and enforcing data quality contracts and detecting anomalies. In this session, we will share how we evaluated Databricks’ native data quality features and why we chose expectations for Lakeflow Declarative Pipelines, along with a combination of enforced constraints and self-defined queries for other job types. Our evaluation considered factors such as performance overhead, cost and scalability. We’ll highlight key improvements over our previous system and demonstrate how these choices have enabled Zillow to enforce scalable, production-grade data quality.

Additionally, we are actively testing Databricks’ latest data quality innovations, including enhancements to Lakehouse Monitoring and the newly released DQX project from Databricks Labs.

In summary, we will cover Zillow’s approach to data quality in the lakehouse, key lessons from our migration and actionable takeaways.
Speaker
Laura Zhou
Software Dev Engineer, Big Data
Zillow
Laura is a Big Data Engineer at Zillow, where she builds scalable data pipelines and transforms raw data into insights that drive impact and feed directly into Zillow’s products. With 7 years of experience across diverse data domains—ranging from geo data and sports analytics to user behavior—she enjoys tackling complex data challenges and designing systems that make data processing more efficient and reliable. When not immersed in data, she likes exercising, hiking, and spending time with her family.
Bio from: Data + AI Summit 2025