Tekmetric is the largest cloud based auto shop management system in the United States. We process vast amounts of data from various integrations with internal and external systems. Data quality and governance are crucial for both our internal operations and the success of our customers. We leverage multi-step data processing pipelines using AWS services and Airflow. While we utilize traditional data pipeline workflows to manage and move data, we go beyond standard orchestration. After data is processed, we apply tailored quality checks for schema validation, record completeness, freshness, duplication and more. In this talk, we’ll explore how Airflow allows us to enhance data observability. We’ll discuss how Airflow’s flexibility enables seamless integration and monitoring across different teams and datasets, ensuring reliable and accurate data at every stage. This session will highlight how Tekmetric uses data quality governance and observability practices to drive business success through trusted data.
talk-data.com
Speaker
Ipsa Trivedi
2
talks
Filter by Event / Source
Talks & appearances
2 activities · Newest first
This usecase shows how we deal with data of different varieties from different sources. Each source sends data in different layout, timings, structures, location patterns sizes. The goal is to process the files within SLA and send them out. This a complex multi step processing pipeline that involves multiple spark jobs, api based integrations with microservices, resolving unique ids, deduplication and filtering. Note that this is an event driven system, but not a streaming data system. The files are of gigabyte scale, and each day the data being processed is of terabyte scale. We will be talking about how to make DAG creation and business logic building a “low-code no-code process” so that non technical analysts can write business logic and light developers can deploy DAGs without much manual effort. Every aspect is either source specific or source-agnostic configuration driven. Airflow was chosen to enable easy DAG building, scaling, monitoring, troubleshooting and rerunning.