talk-data.com

Speaker

Saurabh Gupta

4 talks

Chief Strategy & Revenue Officer, The Modern Data Company

Talks & appearances

4 activities · Newest first

The workflow orchestration team at Zoox aims to build a solution for orchestrating heterogeneous workflows encompassing data, ML, and QA pipelines. We have encountered two primary challenges: first, the steep learning curve for new Airflow users and the need for a user-friendly yet scalable development process; second, integrating and migrating existing pipelines built on established solutions. This presentation details how our small team at Zoox addressed these challenges. We will begin with an introduction to Zoox and what we do, then walk through the past and present of Airflow use at Zoox. We will share our strategies for simplifying the Airflow DAG creation process and enhancing the user experience. Lastly, we will offer our thoughts on how to grow the team and expand Airflow's presence at Zoox in the future.
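One common way teams simplify DAG creation is to let users declare pipelines as plain configuration and generate the DAG objects from it. The sketch below illustrates that pattern only; the config format, the `make_pipeline` factory, and the `Pipeline` dataclass (standing in for an Airflow `DAG` so the example stays self-contained) are assumptions for illustration, not Zoox's actual approach.

```python
# Sketch of a config-driven pipeline factory: users declare pipelines as
# plain dicts instead of writing DAG code directly, and a thin factory
# generates one pipeline object per entry.
from dataclasses import dataclass, field

@dataclass
class Pipeline:
    """Lightweight stand-in for an Airflow DAG in this sketch."""
    name: str
    schedule: str
    tasks: list = field(default_factory=list)

# Declarative config a new user would edit -- no orchestration code needed.
PIPELINE_CONFIGS = [
    {"name": "sensor_etl", "schedule": "@hourly", "tasks": ["extract", "load"]},
    {"name": "ml_training", "schedule": "@daily", "tasks": ["features", "train"]},
]

def make_pipeline(cfg: dict) -> Pipeline:
    """Turn one declarative config entry into a pipeline object."""
    return Pipeline(name=cfg["name"], schedule=cfg["schedule"], tasks=list(cfg["tasks"]))

# Generate every pipeline from config, keyed by name.
pipelines = {cfg["name"]: make_pipeline(cfg) for cfg in PIPELINE_CONFIGS}
```

With a real Airflow deployment, the factory would return `airflow.DAG` instances at module import time; the appeal of the pattern is that new users only touch the config list.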

There is a concept in software engineering called 'shifting left': testing software much earlier in the development lifecycle than you would normally expect. This helps teams build better rituals and processes while ensuring quality and usability are evaluated as the software is being built. We know this works in software development, but what happens when these practices are applied to building AI tools?

Saurabh Gupta is a seasoned technology executive and currently Chief Strategy & Revenue Officer at The Modern Data Company. With over 25 years of experience in tech, data, and strategy, he has led many strategy and modernization initiatives across industries and disciplines. Throughout his career, he has worked with international organizations, NGOs, and public- and private-sector organizations. Before joining TMDC, he was Head of Data Strategy & Governance at ThoughtWorks and CDO/Director for the Washington, DC government, where he developed the digital/data modernization strategy for education data. Prior to DC Gov, he held leadership and strategic roles at organizations including the IMF and the World Bank, where he was responsible for their data strategy and led OpenData initiatives. He has also worked closely with the African Development Bank, OECD, Eurostat, ECB, UN, and FAO as part of inter-organizational working groups on data and development goals. As part of the task force for international data cooperation under the G20 Data Gaps Initiative, he chaired the technical working group on data standards and exchange. He also advised the African Development Bank on its data democratization efforts under the Africa Information Highway.

In the episode, Adel and Saurabh explore the importance of data quality and how 'shifting left' can improve data quality practices, the role of data governance, the emergence of data product managers, operationalizing 'shift left' strategies through collaboration and data governance, the challenges faced when implementing data governance, future trends in data quality and governance, and much more.

Links mentioned in the show:
- The Modern Data Company
- Monte Carlo: The Annual State of Data Quality Survey
- [Course] Data Governance Concepts
- [Webinar] Crafting a Lean and Effective Data Governance Strategy
- Related episode: Building Trust in Data with Data Governance
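Applied to data, "shifting left" means checking quality at ingestion rather than auditing the warehouse after the fact. The sketch below is a minimal illustration of that idea; the schema fields (`user_id`, `amount`) and the check names are assumptions, not anything specific to the episode.

```python
# Sketch of "shifting left" on data quality: validate records at ingestion,
# before they land in the warehouse, instead of auditing them afterwards.

def validate_record(record: dict) -> list:
    """Return a list of quality violations for one incoming record."""
    errors = []
    if not record.get("user_id"):
        errors.append("missing user_id")
    if not isinstance(record.get("amount"), (int, float)):
        errors.append("amount is not numeric")
    elif record["amount"] < 0:
        errors.append("amount is negative")
    return errors

def ingest(records: list) -> tuple:
    """Split a batch into clean rows and rejected rows with their reasons."""
    clean, rejected = [], []
    for record in records:
        errs = validate_record(record)
        if errs:
            rejected.append((record, errs))  # quarantined with reasons
        else:
            clean.append(record)
    return clean, rejected

good, bad = ingest([
    {"user_id": "u1", "amount": 9.99},
    {"user_id": "", "amount": -5},
])
```

Rejected rows are quarantined with their violation reasons at the point of entry, so downstream consumers only ever see rows that passed the checks.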

Practical Enterprise Data Lake Insights: Handle Data-Driven Challenges in an Enterprise Big Data Lake

Use this practical guide to successfully handle the challenges encountered when designing an enterprise data lake and learn industry best practices to resolve issues. When designing an enterprise data lake you often hit a roadblock when you must leave the comfort of the relational world and learn the nuances of handling non-relational data. Starting from sourcing data into the Hadoop ecosystem, you will go through stages that can bring up tough questions such as data processing, data querying, and security. Concepts such as change data capture and data streaming are covered. The book takes an end-to-end solution approach in a data lake environment that includes data security, high availability, data processing, data streaming, and more. Each chapter includes application of a concept, code snippets, and use case demonstrations to provide you with a practical approach. You will learn the concept, scope, application, and starting point.

What You'll Learn
- Get to know data lake architecture and design principles
- Implement data capture and streaming strategies
- Implement data processing strategies in Hadoop
- Understand the data lake security framework and availability model

Who This Book Is For
Big data architects and solution architects

Practical Real-time Data Processing and Analytics

This book provides a comprehensive guide to real-time data processing and analytics using modern frameworks like Apache Spark, Flink, Storm, and Kafka. Through practical examples and in-depth explanations, you will learn how to implement efficient, scalable, real-time processing pipelines.

What this book will help me do
- Understand real-time data processing essentials and the technology stack
- Learn integration of components like Apache Spark and Kafka
- Master the concepts of stream processing with detailed case studies
- Gain expertise in developing monitoring and alerting solutions for real-time systems
- Prepare to implement production-grade real-time data solutions

Author(s)
Shilpi Saxena and Saurabh Gupta, the authors, are experienced professionals in distributed systems and data engineering, focusing on practical applications of real-time computing. They bring their extensive industry experience to this book, helping readers understand the complexities of real-time data solutions in an approachable and hands-on manner.

Who is it for?
This book is ideal for software engineers and data engineers with a background in Java who seek to develop real-time data solutions. It suits readers familiar with the concepts of real-time data processing and deepens knowledge of frameworks like Spark, Flink, Storm, and Kafka. The target audience includes learners building production data solutions and those designing distributed analytics engines.