This session looks at the ever-increasing demand for data and AI, the current challenges slowing development, and how companies can overcome these challenges and shorten time to value using generative AI and open table formats like Apache Iceberg. It also looks at how this approach makes it possible to transition away from siloed analytical systems to a modern data architecture in which multiple teams can create reusable data products across multiple clouds and on-premises environments using generative AI in Data Fabric, and share that data across multiple analytical workloads.
Join us for an in-depth exploration of Apache Iceberg and Apache Polaris (incubating), where we delve into the past, present, and future of these transformative technologies. This session will provide a comprehensive overview of Iceberg's journey, its current role within the data ecosystem, and the promising future it holds with the integration of Polaris (incubating). We will discuss how these technologies redefine table formats and catalog management, empowering organisations to efficiently manage and analyse large-scale data. Attendees will gain valuable insights into the evolving landscape, ensuring they remain at the forefront of innovation and continue to shape thought leadership in the data ecosystem.
Open table formats such as Apache Iceberg, Delta Lake, and Apache Hudi have dramatically transformed the data management landscape by enabling high-speed operations on massive datasets stored in object stores while maintaining ACID guarantees.
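For concreteness, here is a minimal sketch (not taken from the session) of the table-level versioning that Iceberg provides, using Spark SQL from Python. It assumes a Spark session already configured with an Iceberg catalog named `demo`; the table name and snapshot id are illustrative only.

```python
# Minimal sketch: table-level versioning with Apache Iceberg via Spark SQL.
# Assumes a Spark session configured with an Iceberg catalog named "demo".
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Every commit to an Iceberg table produces an immutable snapshot.
spark.sql("CREATE TABLE IF NOT EXISTS demo.db.events (id BIGINT, payload STRING) USING iceberg")
spark.sql("INSERT INTO demo.db.events VALUES (1, 'a'), (2, 'b')")

# Inspect the table's snapshot history via the built-in metadata table.
spark.sql("SELECT snapshot_id, committed_at, operation FROM demo.db.events.snapshots").show()

# Time-travel query against an earlier snapshot (snapshot id is illustrative).
spark.sql("SELECT * FROM demo.db.events VERSION AS OF 8781234567890123456").show()

# Roll a single table back to a previous snapshot with a stored procedure.
spark.sql("CALL demo.system.rollback_to_snapshot('db.events', 8781234567890123456)")
```

Note that each of these operations is scoped to one table at a time, which is exactly the gap the rest of this abstract explores.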
In this talk, we will explore the evolution and future of dataset versioning in the context of open table formats. Open table formats introduced the concept of table-level versioning and have become widely adopted standards. Data versioning systems that have emerged more recently, bringing best practices from software engineering into the data ecosystem, enable the management of multiple datasets within a large-scale data repository using Git-like semantics. Data versioning systems operate at the file level and are compatible with any open table format. On top of this, new catalogs that support these table formats and add a layer of access control are becoming the standard way to manage tabular datasets.
Despite these advancements, there remains a significant gap between current data versioning practices and the requirements for effective tabular dataset versioning.
The session will introduce the concept of a versioned catalog as a solution, demonstrating how it provides comprehensive data and metadata versioning for tables.
We’ll cover key requirements of tabular dataset management, including (see the sketch after this list):
- Capturing multi-table changes as single logical operations
- Enabling seamless rollbacks without identifying each affected table
- Implementing table format-aware versioning operations such as diff and merge
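To make these requirements concrete, the following is a purely hypothetical sketch: the `versioned_catalog` package and its API below do not refer to any existing library, and every name is invented here for illustration under the assumption of a Git-like, format-aware catalog.

```python
# Hypothetical sketch only: "versioned_catalog" is an invented package used to
# illustrate the requirements above, not an existing library or API.
from versioned_catalog import Catalog  # hypothetical import

catalog = Catalog("https://catalog.example.com", repo="analytics")

# Capture changes to several tables as a single logical operation: write on a
# branch, then commit everything atomically.
with catalog.branch("fix-late-events") as branch:
    branch.table("sales.orders").upsert({"order_id": 42, "status": "shipped"})
    branch.table("sales.order_items").upsert({"order_id": 42, "sku": "A-1", "qty": 3})
    branch.commit(message="Reprocess late-arriving events across two tables")

# Table-format-aware diff: compare data and metadata between two refs.
for change in catalog.diff("main", "fix-late-events"):
    print(change.table, change.kind)  # e.g. schema change vs. data change

# Merging and rolling back operate on the whole commit, so a rollback reverts
# every affected table without the caller having to enumerate them.
catalog.merge("fix-late-events", into="main")
catalog.rollback("main", to_commit="c0ffee")
```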
Join us to explore the future of dataset versioning in the era of open table formats and evolving data management practices!
In the next five years, we are poised to witness a significant transformation towards modern data lake architecture across industries. This shift is driven by an urgent need for a unified, flexible, and scalable data management solution. Such a solution must address the challenges of siloed data environments and the increasing complexity of data sources while balancing the benefits of data mesh principles with centralized governance and semantic consistency.
In this talk, we will cover the latest trends and benefits in this field, as well as the use of open formats like Iceberg, lower data movement costs, and multiple engines supporting different workloads, all of which ultimately help establish a single source of truth.