talk-data.com

Speaker

Shivnath Babu

2 talks

Cofounder/CTO, Unravel

Talks & appearances

We are witnessing rapid growth in the number of mission-critical data pipelines that leaders of data products are responsible for. "Are your data pipelines healthy?" This question was posed to more than 200 leaders of data products across various industries. The answers ranged from "unfortunately, no" to "they are mostly fine, but I am always afraid that something or other will cause a pipeline to break." This talk presents the concept of Pipeline HealthCheck (PHC), which enables leaders of data products to have high confidence in the correctness, performance, and cost efficiency of their data pipelines. More importantly, PHC gives these leaders, along with their development and operations teams, high confidence in their ability to quickly detect, troubleshoot, and fix the problems that make data pipelines unhealthy. The talk also includes a demo of how PHC helps handle common data pipeline problems such as incorrect results, missed SLAs, and cost-budget overruns.

Digital transformation, application modernization, and data platform migration to the cloud are key initiatives in most enterprises today. These initiatives are stressing enterprise scheduling and automation tools to the point that many users are looking for better solutions. A survey revealed that 88% of users believe their business will benefit from an improved automation strategy across technology and business. Airflow has an excellent opportunity to capture mindshare and emerge as the leading solution here. At Unravel, we are seeing a trend in which many of our enterprise customers are at various stages of migrating to Airflow from enterprise schedulers or ETL/ELT orchestration tools such as Autosys, Informatica, Oozie, Pentaho, and Tidal. In this talk, we will share lessons learned and best practices from across the entire pipeline migration life cycle, which includes:

(i) The evaluation process that led to choosing Airflow, including certain areas where Airflow could do better.
(ii) The challenges in discovering and understanding all the components and dependencies that must be considered in the migration.
(iii) The challenges that arise during pipeline code and data migration, especially in getting single-pane-of-glass and apples-to-apples views to track migration progress.
(iv) The challenges in ensuring that pipelines migrated to Airflow perform and scale on par with, or better than, the systems they replaced.