Have you ever wondered how a data company does data? In this session, Isaac Obezo, Staff Data Engineer at Starburst, will take you for a peek behind the curtain into Starburst’s own data architecture, built to support batch processing of telemetry data within Galaxy data pipelines. Isaac will walk you through how we use tools like git, dbt, and Starburst Galaxy to build a CI/CD process that lets our data engineering team iterate quickly: developing and landing data, deploying new models, and improving existing models in the data lake. Isaac will also discuss Starburst’s approach to data quality, the use of data products, and the process for delivering quality analytics.
talk-data.com topic: Git (3 tagged)
Operating data lakes over object storage poses challenges: testing ETL changes, staging pipelines, ensuring best practices, debugging, and tracking data usage for ML reproducibility. Enter lakeFS—an open-source data version control tool transforming object storage into Git-like repositories. Learn how lakeFS enables unified workflows for code and data, providing benefits like faster development and error recovery. Join us to explore lakeFS and harness the power of data as code for your team's success.
Project Nessie is an open-source project that provides a Git-like approach to version control for data lakehouse tables. This makes it possible to track data changes over time and revert to previous versions if necessary.
In a lakehouse environment, catalog versioning is essential for ensuring the accuracy and reliability of data. By tracking changes to the catalog, you can ensure that everyone is working with the same data version. This can help to prevent errors and inconsistencies.
Project Nessie can be used to implement catalog versioning in a lakehouse environment. This can be done by creating a Nessie repository for the catalog and then tracking changes to the repository with Git-like commits, branches, and tags.
This presentation will discuss the benefits of using Project Nessie for catalog versioning in a lakehouse environment. We will also discuss how to implement catalog versioning using Project Nessie.
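To make the Git-like idea concrete, here is a minimal in-memory sketch of catalog versioning in Python. It is a toy model under stated assumptions, not the Project Nessie API: the names `ToyCatalog`, `Commit`, `create_branch`, `commit`, and `rollback`, as well as the table name and metadata paths, are all hypothetical. It only illustrates how branches can act as named pointers to immutable catalog snapshots, so that a change can be isolated on a branch and undone.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """An immutable catalog snapshot: table name -> metadata pointer."""
    tables: Dict[str, str]
    message: str
    parent: Optional["Commit"] = None


class ToyCatalog:
    """Hypothetical in-memory catalog with Git-like branches (not the Nessie API)."""

    def __init__(self) -> None:
        self.branches: Dict[str, Commit] = {
            "main": Commit({}, "initial empty catalog")
        }

    def create_branch(self, name: str, source: str = "main") -> None:
        # A branch is only a named pointer to a commit, so nothing is copied.
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, table: str, pointer: str, message: str) -> None:
        # Record a new snapshot on top of the branch head.
        head = self.branches[branch]
        tables = {**head.tables, table: pointer}
        self.branches[branch] = Commit(tables, message, parent=head)

    def rollback(self, branch: str) -> None:
        # Simplified undo: move the branch pointer back to the previous commit.
        head = self.branches[branch]
        if head.parent is not None:
            self.branches[branch] = head.parent

    def tables(self, branch: str) -> Dict[str, str]:
        return dict(self.branches[branch].tables)


catalog = ToyCatalog()
catalog.commit("main", "orders", "metadata/orders-v1.json", "add orders")

# Try a change on an isolated branch; readers of main are unaffected.
catalog.create_branch("etl-test")
catalog.commit("etl-test", "orders", "metadata/orders-v2.json", "rewrite orders")
```

After these calls, `catalog.tables("main")` still points at the v1 metadata while `catalog.tables("etl-test")` points at v2, and `catalog.rollback("etl-test")` returns that branch to v1. A real catalog would also need merges, conflict detection, and durable storage, which this sketch leaves out.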