AWS re:Invent 2025 - Best practices for building Apache Iceberg based lakehouse architectures on AWS

2025-12-06 · AWS re:Invent 2024

video

Agile/Scrum Athena AWS Amazon EMR AWS Glue Cloud Computing Data Lakehouse ETL/ELT Iceberg Redshift S3 Amazon SageMaker

Discover advanced strategies for implementing Apache Iceberg on AWS, focusing on Amazon S3 Tables and integration of the Iceberg REST Catalog with the lakehouse in Amazon SageMaker. We'll cover performance optimization techniques for Amazon Athena and Amazon Redshift queries, real-time processing using Apache Spark, and integration with Amazon EMR, AWS Glue, and Trino. Explore practical implementations of zero-ETL, change data capture (CDC) patterns, and medallion architecture. Gain hands-on expertise in implementing enterprise-grade lakehouse solutions with Iceberg on AWS.

Extending the Lakehouse: Power Interoperable Compute With Unity Catalog Open APIs

2025-06-11 · Data + AI Summit 2025

talk

by Tathagata Das (Databricks), Michelle Leon (Databricks)

Flink API Data Lakehouse DuckDB Iceberg Cyber Security Spark

The lakehouse is built for storage flexibility, but what about compute? In this session, we’ll explore how Unity Catalog enables you to connect and govern multiple compute engines across your data ecosystem. With open APIs and support for the Iceberg REST Catalog, UC lets you extend access to engines like Trino, DuckDB, and Flink while maintaining centralized security, lineage, and interoperability. We will show how you can get started today working with engines like Apache Spark and Starburst to read and write to UC managed tables with some exciting demos. Learn how to bring flexibility to your compute layer—without compromising control.

Iceberg Table Format Adoption and Unified Metadata Catalog Implementation in Lakehouse Platform

2025-06-11 · Data + AI Summit 2025

talk

by Ruotian Wang (DoorDash), Sergey Zavgorodni (DoorDash)

Amazon EMR Data Lake Data Lakehouse Databricks DWH Iceberg Snowflake

The DoorDash Data organization is actively adopting the lakehouse paradigm. This presentation describes a methodology for migrating classic data warehouse and data lake platforms to a unified lakehouse solution. The objectives of this effort include:

Elimination of excessive data movement.

Seamless integration and consolidation of the query engine layers, including Snowflake, Databricks, EMR, and Trino.

Query performance optimization.

Abstracting away the complexity of underlying storage layers and table formats.

A strategic and justified decision on the unified metadata catalog used across various compute platforms.

Coalesce 2024: How dbt transformed FinOps cost analysis at Workday

2024-10-16 · dbt Coalesce 2024

video

by Pattabhi Nanduri (Workday), Eric Pu (Workday)

Analytics Cloud Computing dbt FinOps Lightdash

Eric will share the team's experience with dbt and tell the development story of bringing Workday FinOps cost analysis to cloud engineers and stakeholders. He will describe how the team is using dbt, Trino and Lightdash to build a new data platform that is now a key part of their data-driven business decision process in multiple organizations within Workday. Plus, he'll show how they created a secure, efficient, and scalable platform — through dbt governance features — to drive those successful data projects.

Speakers: Eric Pu, Senior Software Engineer, Workday

Pattabhi Nanduri, FinOps Data Engineer, Workday

Workshop – Not Your Father's Data Lakehouse: Building with Trino and Iceberg

2024-03-26 · Data Council Austin 2024 - Day 1

workshop

by Jack Klamer, Monica Miller

Data Lakehouse Iceberg

Igor Khrol: Big Data With Open Source Solutions

2023-12-05 · DATA MINER Big Data Europe Conference 2020

video

by Igor Khrol (Automattic)

Airflow Big Data Cloud Computing Hadoop Spark Superset

Join Igor Khrol as he delves into the world of Big Data with Open Source Solutions at Automattic, a company rooted in the power of open source. 📊🌐 Discover their unique approach to maintaining a data ecosystem based on Hadoop, Spark, Trino, Airflow, Superset, and JupyterHub, all hosted on bare metal infrastructure, and gain insights on how it compares to cloud-based alternatives in 2023. 💡🚀 #BigData #opensource

This session is from Big Data Conference Europe 2023, held in Vilnius, Lithuania, November 21-24.

Delta Kernel: Simplifying Building Connectors for Delta

2023-07-26 · Databricks DATA + AI Summit 2023

video

by Denny Lee (Databricks), Tathagata Das (Databricks)

Flink API Data Lakehouse Databricks Delta PySpark Rust

Since the release of Delta 2.0, the project has been growing at breakneck speed. In this session, we will cover all the latest capabilities that make Delta Lake the best format for the lakehouse. Based on lessons learned from this past year, we will introduce Project Aqueduct and how we will simplify building Delta Lake APIs from Rust and Go to Trino, Flink, and PySpark.

Talk by: Tathagata Das and Denny Lee

Connect with us: Website: https://databricks.com Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/databricks Instagram: https://www.instagram.com/databricksinc Facebook: https://www.facebook.com/databricksinc

Why rent when you can own? Build your modern data lakehouse with true optionality

2022-10-25 · dbt Coalesce 2022

video

by Tom Nats (dbt Labs), Brian Zhan (dbt Labs)

Analytics Data Lakehouse dbt

With Trino (formerly PrestoSQL) and dbt combined, you can get faster access to your data and the ability to analyze data across multiple data sources with ease. Extract, load and transform data in your data lakehouse easier than ever before using dbt’s Trino adapter. Join Brian Zhan and Tom Nats as they talk about the new dbt connector for Trino and how it works, along with a demo showing how easy it is to deploy, build and serve up analytics using dbt and Starburst Galaxy.

Check the slides here: https://docs.google.com/presentation/d/1-A-mfc1RIj87ypz6KeZvxK62QLaGthmMqBPy10vNnDk/edit?usp=sharing

Build an Enterprise Lakehouse for Free with Trino and Delta Lake

2022-07-19 · Databricks DATA + AI Summit 2023

video

by Claudius, Tom

Data Lake Data Lakehouse Databricks Delta SQL

Delta Lake has quickly grown in usage across data lakes everywhere due to the growing use cases that require DML capabilities that Delta Lake brings. Outside of support for ACID transactions, users want the ability to interactively query the data in their data lake. This is where a query engine like Trino (formerly PrestoSQL) comes in. Starburst provides an enterprise version of the popular Trino MPP SQL query engine and has recently open sourced their Delta Lake connector.

In this talk, Tom and Claudius will talk about the connector, its features, and how their users are taking advantage of expanding the functionality of their data lakes with improved performance and the ability to handle colliding modifications. Get started with this feature-rich and open stack without the need of a multi-million dollar budget.

Coral and Transport Portable SQL and UDFs for the Interoperability of Spark and Other Engines

2022-07-19 · Databricks DATA + AI Summit 2023

video

API Data Governance Databricks Hive Spark SQL

In this talk, we present two open source projects, Coral and Transport, that enable deep SQL and UDF interoperability between Spark and other engines, such as Trino and Hive. Coral is a SQL analysis, rewrite, and translation engine that enables compute engines to interoperate and analyze different SQL dialects and plans, through the conversion to a common relational algebraic intermediate representation. Transport is a UDF framework that enables users to write UDFs against a single API but execute them as native UDFs of multiple engines, such as Spark, Trino, and Hive. Further, we discuss how LinkedIn leverages Coral and Transport, and present a production use case for accessing views of other engines in Spark as well as enhancing Spark DataFrame and Dataset view schema. We discuss other potential applications such as automatic data governance and data obfuscation, query optimization, materialized view selection, incremental compute, and data source SQL and UDF communication.
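To make the idea of a common intermediate representation concrete, here is a minimal sketch in Python. It is not Coral's API or implementation; the `Select` class, the `QUOTES` table, and the `render` function are hypothetical names invented for illustration. It only shows how one engine-neutral query object can be rendered into two dialects that quote identifiers differently (backticks in Spark SQL and Hive, double quotes in Trino).

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Select:
    """Toy intermediate representation (hypothetical): SELECT columns FROM table."""
    columns: Tuple[str, ...]
    table: str


# Identifier quote character per dialect.
# Spark SQL and Hive quote identifiers with backticks; Trino uses double quotes.
QUOTES = {"spark": "`", "hive": "`", "trino": '"'}


def render(query: Select, dialect: str) -> str:
    """Render the toy IR as SQL text for the given dialect.

    Assumption: identifiers do not themselves contain the quote character,
    so no escaping is performed in this sketch.
    """
    q = QUOTES[dialect]
    cols = ", ".join(f"{q}{c}{q}" for c in query.columns)
    return f"SELECT {cols} FROM {q}{query.table}{q}"


# One representation, two dialect-specific renderings.
ir = Select(columns=("member id", "country"), table="profiles")
spark_sql = render(ir, "spark")
trino_sql = render(ir, "trino")
print(spark_sql)
print(trino_sql)
```

A real translation engine works on full relational plans rather than a single quoting rule, but the shape is the same: parse once into a shared representation, then emit per engine.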

Diving into Delta Lake 2.0

2022-07-19 · Databricks DATA + AI Summit 2023

video

Flink API Databricks Delta Presto S3 Spark

The Delta ecosystem rapidly expanded with the release of Delta Lake 1.2, which included integrations with Apache Spark™, Apache Flink, Presto, and Trino, as well as features such as OPTIMIZE, data skipping using column statistics, restore APIs, S3 multi-cluster writes, and more.

Join this session to learn how the wider Delta community collaborated to bring these features and integrations together, as well as the current roadmap. This will be an interactive session, so come prepared with your questions. We should have answers!

Delta Lake 2.0 Overview

2022-07-19 · Databricks DATA + AI Summit 2023

video

Flink API Databricks Delta Go Java Presto Python Rust S3 Scala Spark

After three years of hard work by the Delta community, we are proud to announce the release of Delta Lake 2.0. Completing the work to open-source all of Delta Lake while tens of thousands of organizations were running in production was no small feat, and we have the ever-expanding Delta community to thank!

Join this session to learn how the wider Delta community collaborated to bring these features and integrations together. This includes:

Integrations with Apache Spark™, Apache Flink, Apache Pulsar, Presto, Trino, and more.

Features such as OPTIMIZE ZORDER, data skipping using column stats, S3 multi-cluster writes, Change Data Feed, and more.

Language APIs including Rust, Python, Ruby, GoLang, Scala, and Java.
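As a rough illustration of what data skipping using column stats means, the sketch below prunes files whose recorded minimum and maximum values cannot match a range filter. It is a simplified model under stated assumptions, not Delta Lake's implementation: the `FileStats` class, the `files_to_scan` function, and the file names are hypothetical, and a single integer column stands in for real per-column statistics.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileStats:
    """Min/max of one integer column for one data file (hypothetical structure)."""
    path: str
    min_value: int
    max_value: int


def files_to_scan(files: List[FileStats], lower: int, upper: int) -> List[str]:
    """Return paths of files whose [min, max] range overlaps [lower, upper].

    A file is skipped when its statistics prove that no row can satisfy
    lower <= value <= upper, so it never has to be opened.
    """
    return [
        f.path
        for f in files
        if f.max_value >= lower and f.min_value <= upper
    ]


# Hypothetical statistics for three files of the same table.
files = [
    FileStats("part-0.parquet", 1, 100),
    FileStats("part-1.parquet", 101, 200),
    FileStats("part-2.parquet", 201, 300),
]

# Filter: value BETWEEN 150 AND 220. Only files overlapping that range remain.
selected = files_to_scan(files, 150, 220)
print(selected)
```

Skipping works best when similar values are stored together, since narrow per-file ranges overlap fewer filters.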