Distributed version control systems such as Git unlock software development in multi-player mode: developers can safely work on the same codebase, with standard (albeit perhaps not user-friendly!) abstractions for snapshotting, time travel, and branching. Data folks have rarely been so lucky: their projects crucially depend on data, whose life-cycle management is often cumbersome and custom-built. In this talk, we present open table formats, such as Apache Iceberg, to practitioners with limited exposure to modern cloud infrastructure. In particular, we show how moving from datasets to tables unlocks a similar multi-player mode for building data pipelines, with equivalent abstractions for snapshotting, time travel, and branching, plus a unified backbone for pipelines, data science, and AI use cases.
Git for Data: How Table Formats Unify Software and Data Development