talk-data.com

Big Data LDN/Paris Face To Face 2024-04-10 at 12:00

Open Source Nessie: Enabling DataOps, Catalog Versioning and Git for Data

Description

Project Nessie is an open-source project that provides a Git-like approach to version control for data lakehouse tables. This makes it possible to track data changes over time and revert to previous versions if necessary.

In a lakehouse environment, catalog versioning is essential for ensuring the accuracy and reliability of data. By tracking changes to the catalog, you can ensure that everyone is working with the same version of the data, which helps prevent errors and inconsistencies.

Project Nessie can be used to implement catalog versioning in a lakehouse environment. This can be done by creating a Nessie repository for the catalog and then tracking changes to it with Nessie's own Git-like operations (commits, branches, tags, and merges) rather than with Git itself.
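To make the Git-like model concrete, here is a minimal conceptual sketch in Python. It is not Nessie's API: the `ToyCatalog` class, its method names, and the metadata file names are all hypothetical, and the merge is a naive "source wins" overlay with no conflict detection. It only illustrates the idea that a catalog commit maps table names to metadata pointers, that branches are movable references to commits, and that reverting means pointing a branch back at an earlier commit.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union


@dataclass(frozen=True)
class Commit:
    """One catalog version: table name -> pointer to that table's metadata."""
    id: int
    parent: Optional[int]
    message: str
    tables: Dict[str, str]


class ToyCatalog:
    """Hypothetical in-memory stand-in for a versioned catalog (not Nessie)."""

    def __init__(self) -> None:
        self.commits: Dict[int, Commit] = {0: Commit(0, None, "init", {})}
        self.branches: Dict[str, int] = {"main": 0}

    def create_branch(self, name: str, source: str = "main") -> None:
        # A branch is just a named reference to an existing commit.
        self.branches[name] = self.branches[source]

    def commit(self, branch: str, message: str, changes: Dict[str, str]) -> int:
        # Record a new catalog version on top of the branch head.
        head = self.commits[self.branches[branch]]
        new = Commit(len(self.commits), head.id, message, {**head.tables, **changes})
        self.commits[new.id] = new
        self.branches[branch] = new.id
        return new.id

    def tables(self, ref: Union[str, int]) -> Dict[str, str]:
        # Resolve a branch name or a commit id to its table pointers.
        commit_id = self.branches[ref] if isinstance(ref, str) else ref
        return dict(self.commits[commit_id].tables)

    def merge(self, source: str, target: str) -> int:
        # Naive merge: overlay the source branch's pointers onto the target.
        return self.commit(target, f"merge {source} into {target}", self.tables(source))

    def reset(self, branch: str, commit_id: int) -> None:
        # Revert by moving the branch reference back to an earlier commit.
        self.branches[branch] = commit_id


catalog = ToyCatalog()
v1 = catalog.commit("main", "add orders", {"orders": "orders-metadata-v1.json"})

# Work in isolation on a branch; readers of main are unaffected.
catalog.create_branch("etl")
catalog.commit("etl", "reload orders", {"orders": "orders-metadata-v2.json"})
main_before_merge = catalog.tables("main")

# Publish the change, then revert main to the earlier version.
catalog.merge("etl", "main")
main_after_merge = catalog.tables("main")
catalog.reset("main", v1)
main_after_revert = catalog.tables("main")
```

In this sketch, `main` keeps serving the first version of `orders` until the `etl` branch is merged, and resetting the branch restores the earlier pointer without touching any data files. A real catalog would also need atomic commits and conflict handling, which are omitted here.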

This presentation will discuss the benefits of using Project Nessie for catalog versioning in a lakehouse environment. We will also discuss how to implement catalog versioning using Project Nessie.