Operating data lakes over object storage poses several challenges: testing ETL changes, staging pipelines, enforcing best practices, debugging, and tracking data usage for ML reproducibility. lakeFS is an open-source data version control tool that turns object storage into Git-like repositories. Learn how lakeFS enables unified workflows for code and data, with benefits such as faster development and easier error recovery. Join us to explore lakeFS and put data as code to work for your team.
talk-data.com
Speaker
Iddo Avneri
3 talks
Iddo has a strong software development background and has built technical teams for several startups in the observability, cloud, and data spaces. Before joining the lakeFS team, he built the technical enterprise field team at Turbonomic from the ground up, served as its Field CTO, and was the account executive for some of the company’s largest customers, up to the $1.9B IBM acquisition in 2021. At Treeverse, the company behind lakeFS, Iddo runs all customer engagements, from sales to customer success.
Bio from: Data Universe 2024
Talks & appearances
Are you tired of spending countless hours testing your data pipelines, only to find that they don’t work as expected? Do you wish there was a better way to manage your data versions and streamline your testing processes? If so, this presentation is for you! Join us as we explore the problem domain of testing environments for data pipelines and take a deep dive into the tools currently in use. We’ll introduce the concepts of data versioning and lakeFS and show you how to integrate these tools with Airflow to improve your testing workflows. But don’t just take our word for it: see it firsthand in a live demo of testing an Airflow DAG on a snapshot of production data, a practical walkthrough of the tools and techniques covered in the presentation. Don’t miss this opportunity to strengthen your testing processes and take your data pipelines to the next level.
Learn how to simplify the management of a data lake by enabling Git-like operations over files in object storage. See how experimenting, reproducing data, and guaranteeing data quality become simpler, all with open-source tooling.