What’s the big deal about Apache Iceberg anyway? "Might Iceberg solve problems for my team?" "I’m using Iceberg already, but I find it lacking in key areas!" If you have any of the above thoughts, this peer exchange is for you! Last year’s peer exchange on Apache Iceberg was standing room only given all the hype surrounding the open table format. However, when participants were asked when they might start testing Iceberg capabilities, most said: “wait at least a few months for the dust to settle.” A year later, the dust has settled and adoption of Iceberg among analytics engineers continues to grow. But there are still open questions and product integrations to be built. Join your peers in socially constructing knowledge that’ll inform you for the year to come and beyond!
Topic: Open Table Format (OTF)
Data is the backbone of modern decision-making, but centralizing it is only the tip of the iceberg. Entitlements, secure sharing, and just-in-time availability are critical challenges for any large-scale platform. Join Goldman Sachs as we reveal how our Legend Lakehouse, coupled with Databricks, overcomes these hurdles to deliver high-quality, governed data at scale. By leveraging an open table format (Apache Iceberg) and an open catalog format (Unity Catalog), we ensure platform interoperability and vendor neutrality. Databricks Unity Catalog then provides a robust entitlement system that aligns with our data contracts, ensuring consistent access control across producer and consumer workspaces. Finally, Legend functions, integrating with Databricks user-defined functions (UDFs), offer real-time data enrichment and secure transformations without exposing raw datasets. Discover how these components unite to streamline analytics, bolster governance, and power innovation.
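To make the UDF-based enrichment pattern concrete, here is a minimal PySpark sketch of masking a column through a user-defined function before consumers query it. This is not the Legend/Databricks integration described in the abstract; the table name (`main.sales.orders`), the `customer_email` column, and the masking rule are assumptions chosen only for illustration.

```python
# Minimal PySpark sketch of UDF-based enrichment/masking; the table name
# (main.sales.orders), column name, and masking rule are hypothetical.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, udf
from pyspark.sql.types import StringType

spark = SparkSession.builder.appName("udf-enrichment-sketch").getOrCreate()


def mask_email(email):
    """Return a masked form of an email address so raw values are never exposed."""
    if email is None or "@" not in email:
        return None
    local, domain = email.split("@", 1)
    return local[0] + "***@" + domain


mask_email_udf = udf(mask_email, StringType())

# Consumers query the enriched view instead of the raw table.
orders = spark.table("main.sales.orders")
enriched = orders.withColumn("customer_email", mask_email_udf(col("customer_email")))
enriched.createOrReplaceTempView("orders_enriched")
```

The point of the pattern is that access control and transformation travel together: the governed view exposes enriched values while the raw dataset stays behind the catalog's entitlements.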
Open table formats such as Apache Iceberg, Delta Lake, and Apache Hudi have dramatically transformed the data management landscape by enabling high-speed operations on massive datasets stored in object stores while maintaining ACID guarantees.
In this talk, we will explore the evolution and future of dataset versioning in the context of open table formats. Open table formats introduced the concept of table-level versioning and have become widely adopted standards. More recently, data versioning systems have emerged that bring best practices from software engineering into the data ecosystem, enabling multiple datasets within a large-scale data repository to be managed with Git-like semantics. These systems operate at the file level and are compatible with any open table format. On top of this, new catalogs that support these table formats and add a layer of access control are becoming the standard way to manage tabular datasets.
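To make "table-level versioning" concrete, here is a small PySpark sketch of Iceberg-style snapshots, time travel, and single-table rollback. It assumes a Spark session already configured with the Iceberg runtime and a catalog named `demo`; the table name, snapshot id, and timestamp are placeholders, and the exact SQL available depends on your Spark and Iceberg versions.

```python
# PySpark sketch of Iceberg table-level versioning; the catalog/table names,
# snapshot id, and timestamp below are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-versioning-sketch").getOrCreate()

# Every commit to an Iceberg table produces a new snapshot.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, action STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'click')")

# Inspect the table's snapshot history through its metadata table.
spark.sql("SELECT snapshot_id, committed_at FROM demo.db.events.snapshots").show()

# Time travel: read the table as of an earlier snapshot or timestamp.
spark.sql("SELECT * FROM demo.db.events VERSION AS OF 1234567890123456789").show()
spark.sql("SELECT * FROM demo.db.events TIMESTAMP AS OF '2024-01-01 00:00:00'").show()

# Roll a single table back to a previous snapshot.
spark.sql("CALL demo.system.rollback_to_snapshot('db.events', 1234567890123456789)")
```

Note that each of these operations is scoped to one table at a time, which is exactly where the gap described next appears.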
Despite these advancements, there remains a significant gap between current data versioning practices and the requirements for effective tabular dataset versioning.
The session will introduce the concept of a versioned catalog as a solution, demonstrating how it provides comprehensive data and metadata versioning for tables.
We’ll cover key requirements of tabular dataset management (see the sketch after this list), including:
- Capturing multi-table changes as single logical operations
- Enabling seamless rollbacks without identifying each affected table
- Implementing table format-aware versioning operations such as diff and merge
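As a purely hypothetical sketch of how these requirements might surface in a versioned-catalog API (the `VersionedCatalog` class and all of its methods are invented for illustration and do not correspond to any existing library), consider:

```python
# Hypothetical versioned-catalog API: every class and method here is
# invented for illustration and does not correspond to an existing library.
from dataclasses import dataclass, field


@dataclass
class Commit:
    """One logical change that may span multiple tables."""
    message: str
    tables: dict = field(default_factory=dict)  # table name -> snapshot id


class VersionedCatalog:
    """Toy in-memory model of multi-table, Git-like catalog versioning."""

    def __init__(self):
        self.branches = {"main": []}  # branch name -> list of commits

    def commit(self, branch, message, table_snapshots):
        # Requirement 1: capture a multi-table change as one logical operation.
        c = Commit(message, dict(table_snapshots))
        self.branches[branch].append(c)
        return c

    def rollback(self, branch):
        # Requirement 2: undo the whole commit, without listing affected tables.
        return self.branches[branch].pop()

    def branch(self, name, source="main"):
        # Branching copies the commit history, like a lightweight Git branch.
        self.branches[name] = list(self.branches[source])

    def diff(self, branch_a, branch_b):
        # Requirement 3: format-aware diff, reduced here to the set of tables
        # whose latest snapshot differs between the two branches.
        def latest(branch):
            state = {}
            for c in self.branches[branch]:
                state.update(c.tables)
            return state
        a, b = latest(branch_a), latest(branch_b)
        return {t for t in a.keys() | b.keys() if a.get(t) != b.get(t)}


# Example: one commit spanning two tables, then a single rollback.
catalog = VersionedCatalog()
catalog.commit("main", "load June orders", {"orders": "snap-101", "order_items": "snap-202"})
catalog.branch("dev")
catalog.commit("dev", "backfill order_items", {"order_items": "snap-203"})
print(catalog.diff("main", "dev"))  # {'order_items'}
catalog.rollback("dev")             # dev returns to the state it was branched from
```

A real implementation would track table-format metadata (snapshots, schemas, partition specs) rather than opaque snapshot ids, which is what makes format-aware diff and merge possible.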
Join us to explore the future of dataset versioning in the era of open table formats and evolving data management practices!