talk-data.com

Topic

Iceberg

Apache Iceberg

table_format data_lake schema_evolution file_format storage open_table_format

Activity Trend: peak of 39 activities per quarter (2020-Q1 to 2026-Q2)

Top Events

- Data Engineering Podcast: 65
- Data + AI Summit 2025: 23
- Big Data LDN 2025: 13
- dbt Coalesce 2025: 9
- O'Reilly Data Engineering Books: 9
- Databricks DATA + AI Summit 2023: 6
- Big Data & AI Paris 2025: 5
- AWS re:Invent 2024: 5
- Snowflake World Tour Berlin: 5
- Google Cloud Next '25: 4
- The Analytics Engineering Podcast: 4
- Big Data LDN 2024: 4

Top Speakers

- Tobias Macey: 65
- Yingjun Wu (RisingWave Labs): 5
- Tom Scott (Streambased): 5
- Tristan Handy (dbt Labs): 4
- Ryan Blue (Tabular): 4
- Adi Polak (Treeverse): 3
- Dipti Borkar (Microsoft): 3
- Alex Merced (Dremio): 3
- Holly Smith (Databricks): 3
- Julien Le Dem (Astronomer): 3
- Jean-Baptiste Onofre (Apache Software Foundation): 2
- Melvyn Peignon (ClickHouse): 2

Activities

9 activities · Newest first


Snowflake: The Definitive Guide, 2nd Edition

2027-05-25 · O'Reilly Data Engineering Books

book

by Joyce Kaye Avila

AI/ML Analytics Cloud Computing Data Governance Data Management GenAI Cyber Security Snowflake SQL data data-engineering

Snowflake is reshaping data management by integrating AI, analytics, and enterprise workloads into a single cloud platform. Snowflake: The Definitive Guide is a comprehensive resource for data architects, engineers, and business professionals looking to harness Snowflake's evolving capabilities, including Cortex AI, Snowpark, and Polaris Catalog for Apache Iceberg. This updated edition provides real-world strategies and hands-on activities for optimizing performance, securing data, and building AI-driven applications. With hands-on SQL examples and best practices, this book helps readers process structured and unstructured data, implement scalable architectures, and integrate Snowflake's AI tools seamlessly. Whether you're setting up accounts, managing access controls, or leveraging generative AI, this guide equips you with the expertise to maximize Snowflake's potential.

- Implement AI-powered workloads with Snowflake Cortex
- Explore Snowsight and Streamlit for no-code development
- Ensure security with access control and data governance
- Optimize storage, queries, and computing costs
- Design scalable data architectures for analytics and machine learning

Practical Data Engineering with Apache Projects: Solving Everyday Data Challenges with Spark, Iceberg, Kafka, Flink, and More

2026-01-01 · O'Reilly Data Engineering Books

book

by Dunith Danushka

Airflow Flink Data Engineering Kafka Spark Trino data data-engineering streaming-messaging

This book is a comprehensive guide designed to equip you with the practical skills and knowledge necessary to tackle real-world data challenges using open source solutions. Focusing on 10 real-world data engineering projects, it caters specifically to data engineers at the early stages of their careers, providing a strong foundation in essential open source tools and techniques such as Apache Spark, Flink, Airflow, Kafka, and many more.

Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely used open source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios.

At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open source projects and tools to solve problems encountered in data engineering.

In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering.

You Will Learn:

- The foundational concepts of data engineering and practical experience in solving real-world data engineering problems
- How to proficiently use open source data tools like Apache Kafka, Flink, Spark, Airflow, and Trino
- 10 hands-on data engineering projects
- How to troubleshoot common challenges in data engineering projects

Who is this book for: Early-career data engineers and aspiring data engineers who are looking to build a strong foundation in the field; mid-career professionals looking to transition into data engineering roles; and technology enthusiasts interested in gaining insights into data engineering practices and tools.

Engineering Lakehouses with Open Table Formats

2025-12-26 · O'Reilly Data Engineering Books

book

by Dipankar Mazumdar, Vinoth Govindarajan (Apple)

Airflow Flink Big Data Data Lakehouse Data Management dbt Delta Hudi Python Spark data data-engineering +2 more

Engineering Lakehouses with Open Table Formats introduces the architecture and capabilities of open table formats like Apache Iceberg, Apache Hudi, and Delta Lake. The book guides you through the design, implementation, and optimization of lakehouses that can handle modern data processing requirements effectively with real-world practical insights.

What this book will help me do:

- Understand the fundamentals of open table formats and their benefits in lakehouse architecture.
- Learn how to implement performant data processing using tools like Apache Spark and Flink.
- Master advanced topics like indexing, partitioning, and interoperability between data formats.
- Explore data lifecycle management and integration with frameworks like Apache Airflow and dbt.
- Build secure lakehouses with regulatory compliance using best practices detailed in the book.

Author(s): Dipankar Mazumdar and Vinoth Govindarajan are seasoned professionals with extensive experience in big data processing and software architecture. They bring their expertise from working with data lakehouses and are known for their ability to explain complex technical concepts clearly. Their collaborative approach brings valuable insights into the latest trends in data management.

Who is it for? This book is ideal for data engineers, architects, and software professionals aiming to master modern lakehouse architectures. If you are familiar with data lakes or warehouses and wish to transition to an open data architectural design, this book is suited for you. Readers should have basic knowledge of databases, Python, and Apache Spark for the best experience.

Advanced Snowflake

2025-10-02 · O'Reilly Data Engineering Books

book

by Muhammad Fasih Ullah

AI/ML Analytics Snowflake data data-engineering

As Snowflake's capabilities expand, staying updated with its latest features and functionalities can be overwhelming. The platform's rapid development gave rise to advanced tools like Snowpark and the Native App Framework, which are crucial for optimizing data operations but may seem complex to navigate. In this essential book, author Muhammad Fasih Ullah offers a detailed guide to understanding these sophisticated tools, ensuring you can leverage the full potential of Snowflake for data processing, application development, and deploying machine learning models at scale. You'll gain actionable insights and structured examples to transform your understanding and skills in handling advanced data scenarios within Snowflake.

By the end of this book, you will:

- Grasp advanced features such as Snowpark, Snowflake Native App Framework, and Iceberg tables
- Enhance your projects with geospatial functions for comprehensive geospatial analytics
- Interact with Snowflake using a variety of programming languages through Snowpark
- Implement and manage machine learning models effectively using Snowpark ML
- Develop and deploy applications within the Snowflake environment

Apache Polaris: The Definitive Guide

2025-09-17 · O'Reilly Data Engineering Books

book

by Alex Merced (Dremio), Andrew Madson, Tomer Shiran (Dremio)

Data Lakehouse Data Management Dremio Snowflake Spark apache-iceberg data data-engineering data-lake storage-repositories

Revolutionize your understanding of modern data management with Apache Polaris (incubating), the open source catalog designed for Apache Iceberg, the data lakehouse industry standard. This comprehensive guide takes you on a journey through the intricacies of Apache Iceberg data lakehouses, highlighting the pivotal role of Iceberg catalogs. Authors Alex Merced, Andrew Madson, and Tomer Shiran explore Apache Polaris's architecture and features in detail, equipping you with the knowledge needed to leverage its full potential. Data engineers, data architects, data scientists, and data analysts will learn how to seamlessly integrate Apache Polaris with popular data tools like Apache Spark, Snowflake, and Dremio to enhance data management capabilities, optimize workflows, and secure datasets.

- Get a comprehensive introduction to Iceberg data lakehouses
- Understand how catalogs facilitate efficient data management and querying in Iceberg
- Explore Apache Polaris's unique architecture and its powerful features
- Deploy Apache Polaris locally, and deploy managed Apache Polaris from Snowflake and Dremio
- Perform basic table operations on Apache Spark, Snowflake, and Dremio

Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

2025-08-01 · O'Reilly Data Engineering Books

book

by Donna Strok, Dmitry Foshin, Dmitry Anoshin

Analytics BI Cloud Computing Data Analytics Databricks DWH ETL/ELT Matillion Cyber Security Snowflake Tableau data +1 more

This book is your guide to the modern market of data analytics platforms and the benefits of using Snowflake, the data warehouse built for the cloud. As organizations increasingly rely on modern cloud data platforms, the core of any analytics framework, the data warehouse, is more important than ever. This updated 2nd edition ensures you are ready to make the most of the industry’s leading data warehouse.

This book will onboard you to Snowflake and present best practices for deploying and using the Snowflake data warehouse. The book also covers modern analytics architecture, integration with leading analytics software such as Matillion ETL, Tableau, and Databricks, and migration scenarios for on-premises legacy data warehouses. This new edition includes expanded coverage of Snowpark for developing complex data applications, an introduction to managing large datasets with Apache Iceberg tables, and instructions for creating interactive data applications using Streamlit, ensuring readers are equipped with the latest advancements in Snowflake's capabilities.

What You Will Learn:

- Master key functionalities of Snowflake
- Set up security and access with cluster
- Bulk load data into Snowflake using the COPY command
- Migrate from a legacy data warehouse to Snowflake
- Integrate the Snowflake data platform with modern business intelligence (BI) and data integration tools
- Manage large datasets with Apache Iceberg Tables
- Implement continuous data loading with Snowpipe and Dynamic Tables

Who This Book Is For: Data professionals, business analysts, IT administrators, and existing or potential Snowflake users

Apache Iceberg: The Definitive Guide

2024-05-02 · O'Reilly Data Engineering Books

book

by Alex Merced (Dremio), Tomer Shiran (Dremio), Jason Hughes (Dremio)

AI/ML Analytics Flink Data Lakehouse Dremio ETL/ELT Spark Data Streaming apache-iceberg data data-engineering data-lake +1 more

Traditional data architecture patterns are severely limited. To use these patterns, you have to ETL data into each tool, a cost-prohibitive process for making warehouse features available to all of your data. The lack of flexibility with these patterns requires you to lock into a set of priority tools and formats, which creates data silos and data drift. This practical book shows you a better way. Apache Iceberg provides the capabilities, performance, scalability, and savings that fulfill the promise of an open data lakehouse. By following the lessons in this book, you'll be able to achieve interactive, batch, machine learning, and streaming analytics with this high-performance open source format. Authors Tomer Shiran, Jason Hughes, and Alex Merced from Dremio show you how to get started with Iceberg.

With this book, you'll learn:

- The architecture of Apache Iceberg tables
- What happens under the hood when you perform operations on Iceberg tables
- How to further optimize Iceberg tables for maximum performance
- How to use Iceberg with popular data engines such as Apache Spark, Apache Flink, and Dremio

Discover why Apache Iceberg is a foundational technology for implementing an open data lakehouse.

Trino: The Definitive Guide, 2nd Edition

2022-10-03 · O'Reilly Data Engineering Books

book

by Manfred Moser, Matt Fuller, Martin Traverso (Facebook)

Analytics Cassandra Data Lake Data Lakehouse Delta Hive Kafka Oracle SQL Trino data data-engineering +2 more

Perform fast interactive analytics against different data sources using the Trino high-performance distributed SQL query engine. In the second edition of this practical guide, you'll learn how to conduct analytics on data where it lives, whether it's a data lake using Hive, a modern lakehouse with Iceberg or Delta Lake, a different system like Cassandra, Kafka, or SingleStore, or a relational database like PostgreSQL or Oracle. Analysts, software engineers, and production engineers learn how to manage, use, and even develop with Trino and make it a critical part of their data platform. Authors Matt Fuller, Manfred Moser, and Martin Traverso show you how a single Trino query can combine data from multiple sources to allow for analytics across your entire organization.

- Explore Trino's use cases, and learn about tools that help you connect to Trino for querying and processing huge amounts of data
- Learn Trino's internal workings, including how to connect to and query data sources with support for SQL statements, operators, functions, and more
- Deploy and secure Trino at scale, monitor workloads, tune queries, and connect more applications
- Learn how other organizations apply Trino successfully

IBM e-business Technology, Solution, and Design Overview

2003-08-12 · O'Reilly Data Engineering Books

book

by Jennifer Maynard, Brian R. Smith, Prabhakar Gopalan, Thomas G. Bradford, Charles Ackeifi, Abdulamir Mryhij

IBM data data-engineering

In a few short years, e-business has gone from a simple concept to an undeniable reality, and for good reason. It works for everyone: consumers, businesses, and governments. The primary values of e-business, such as cost savings, revenue growth, and customer satisfaction, are proving to be only the tip of the iceberg. Having realized the benefit of Web-enabling individual business processes, many companies now seek further return on investment (ROI) by integrating new and existing e-business applications and technologies. The key to their success is to find a way to give customers what they want without the expense of traditional business operations.

This IBM Redbook explains the IBM approach to creating e-business solutions. This publication targets IT specialists and architects who want to learn about proven technologies, products, and solutions to build advanced e-business applications. This publication is also written for the technical professional who is planning to take IBM Certification Test 815, IBM e-business Solution Design. This is a revision of Test 811, Designing IBM e-business Solutions. This publication, written by the same people who created Test 815, IBM e-business Solution Design, is a guide to the style and thinking that went into each and every test question. The information in this book is designed to help you prepare for IBM Test 815 and includes helpful tips for taking the test and sample questions.