AWS re:Invent 2025 - Best practices for building Apache Iceberg based lakehouse architectures on AWS

2025-12-06 · AWS re:Invent 2024 Watch

video

Agile/Scrum Athena AWS Amazon EMR AWS Glue Cloud Computing ETL/ELT Iceberg Redshift S3 Amazon SageMaker Spark +1 more

Discover advanced strategies for implementing Apache Iceberg on AWS, focusing on Amazon S3 Tables and integration of Iceberg Rest Catalog with the lakehouse in Amazon SageMaker. We'll cover performance optimization techniques for Amazon Athena and Amazon Redshift queries, real-time processing using Apache Spark, and integration with Amazon EMR, AWS Glue, and Trino. Explore practical implementations of zero-ETL, change data capture (CDC) patterns, and medallion architecture. Gain hands-on expertise in implementing enterprise-grade lakehouse solutions with Iceberg on AWS.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Accelerate analytics and AI w/ an open and secure lakehouse architecture-ANT309

2025-12-05 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML Analytics AWS Cloud Computing Iceberg LLM Amazon SageMaker Cyber Security

Data lakes, data warehouses, or both? Join this session to explore how to build a unified, open, and secure data lakehouse architecture, fully compatible with Apache Iceberg, in Amazon SageMaker. Learn how the lakehouse breaks down data silos and opens your data estate offering flexibility to use your preferred query engines and tools that accelerate time to insights. Learn about recent launches that improve data interoperability and performance, and enable large language models (LLMs) and AI agents to interact with your data. Discover robust security features, including consistent fine-grained access controls, attribute-based access control, and tag-based access control that help democratize data without compromises.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - What's new with Amazon SageMaker in the era of unified data and AI (ANT216)

2025-12-04 · AWS re:Invent 2024 Watch

video

Agile/Scrum AI/ML Analytics AWS Cloud Computing Amazon SageMaker

Learn the latest in data and AI development with the next generation of Amazon SageMaker. In this session, we'll cover new innovations that are transforming how enterprises build, deploy, and scale analytics and AI. Dive deep into the features of SageMaker Unified Studio, discover the latest catalog capabilities, and see how our lakehouse architecture is breaking down silos between data, analytics, and AI. From streamlined development experiences to enterprise-grade governance, you'll discover why Amazon SageMaker is the best place to to work with your data at AWS.

Learn more: More AWS events: https://go.aws/3kss9CP

Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4

ABOUT AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.

AWSreInvent #AWSreInvent2025 #AWS

Best practice for leveraging Amazon Analytic Services + dbt

2025-10-15 · dbt Coalesce 2025 Watch

talk

by Noritaka Sekiyama (Amazon Web Services (AWS)) , Venkatesh Aravamudan (Amazon Web Services (AWS)) , Neela Kulkarni (AWS)

Analytics Analytics Engineering Athena AWS AWS Glue dbt Iceberg Redshift S3

As organizations increasingly adopt modern data stacks, the combination of dbt and AWS Analytics services emerged as a powerful pairing for analytics engineering at scale. This session will explore proven strategies and hard-learned lessons for optimizing this technology stack to use dbt-athena, dbt-redshift, and dbt-glue to deliver reliable, performant data transformations. We will also cover case studies, best practices, and modern lakehouse scenarios with Apache Iceberg and Amazon S3 Tables.

DuckLake: Making BIG DATA feel small

2025-10-14 · dbt Coalesce 2025 Watch

talk

by Jacob Matson (MotherDuck) , Alex Monahan (MotherDuck)

Big Data dbt Motherduck

Easy, fast, and scalable: pick 3. MotherDuck’s managed DuckLake data lakehouse blends the cost efficiency, scale, and openness of a lakehouse with the speed of a warehouse for truly joyful dbt pipelines. We will show you how!

You Don’t Need Spark for That: Pythonic Data Lakehouse Workflows

2025-09-30 · PyData Paris 2025 Watch

talk

by Romain Clement

Delta Python Spark

Have you ever spun up a Spark cluster just to update three rows in a Delta table? In this talk, we’ll explore how modern Python libraries can power lightweight, production-grade Data Lakehouse workflows—helping you avoid over-engineering your data stack.

Minus Three Tier: Data Architecture Turned Upside Down

2025-09-26 · PyData Amsterdam 2025 Watch

talk

by Hannes Mühleisen (DuckDB Labs)

API Data Engineering DWH

Every data architecture diagram out there makes it abundantly clear who's in charge: At the bottom sits the analyst, above that is an API server, and on the very top sits the mighty data warehouse. This pattern is so ingrained we never ever question its necessity, despite its various issues like slow data response time, multi-level scaling issues, and massive cost.

But there is another way: Disconnect of storage and compute enables localization of query processing closer to people, leading to much snappier responses, natural scaling with client-side query processing, and much reduced cost.

In this talk, it will be discussed how modern data engineering paradigms like decomposition of storage, single-node query processing, and lakehouse formats enable a radical departure from the tired three-tier architecture. By inverting the architecture we can put user's needs first. We can rely on commoditised components like object store to enable fast, scalable, and cost-effective solutions.

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Ricky D'Sa (Addepar)

Databricks Delta

Addepar possesses an enormous private investment data set with 40% of the $7T assets on the platform allocated to alternatives. Leveraging the Addepar Data Lakehouse (ADL), built on Databricks, we have built a scalable data pipeline that assesses millions of private fund investment cash flows and translates it to a private fund benchmarks data offering. Investors on the Addepar platform can leverage this data seamlessly integrated against their portfolio investments and obtain actionable investment insights. At a high-level, this data offering consists of an extensive data aggregation, filtering, and construction logic that dynamically updates for clients through the Databricks job workflows. This derived dataset has gone through several iterations with investment strategists and academics that leveraged delta shared tables. Irrespective of the data source, the data pipeline coalesces all relevant cash flow activity against a unique identifier before constructing the benchmarks.

Sponsored by: C2S Technologies Inc. | Qbeast: Lakehouse Acceleration as a Service

Welcome Lakehouse, from a DWH transformation to a M&A data sharing

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Gianfranco Arena (Dxc Technology)

AWS Databricks Delta DWH

At DXC, we helped our customer FastWeb with their "Welcome Lakehouse" project - a data warehouse transformation from on-premises to Databricks on AWS. But the implementation became something more. Thanks to features such as Lakehouse Federation and Delta Sharing, from the first day of the Fastweb+Vodafone merger, we have been able to connect two different platforms with ease and make the business focus on the value of data and not on the IT integration. This session will feature our customer Alessandro Gattolin of Fastweb to talk about the experience.

Daft and Unity Catalog: A Multimodal/AI-Native Lakehouse

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Jay Chia (Eventual)

AI/ML Analytics Big Data Data Analytics

Modern data organizations have moved beyond big data analytics to also incorporate advanced AI/ML data workloads. These workflows often involve multimodal datasets containing documents, images, long-form text, embeddings, URLs and more. Unity Catalog is an ideal solution for organizing and governing this data at scale. When paired with the Daft open source data engine, you can build a truly multimodal, AI-ready data lakehouse. In this session, we’ll explore how Daft integrates with Unity Catalog’s core features (such as volumes and functions) to enable efficient, AI-driven data lakehouses. You will learn how to ingest and process multimodal data (images, text and videos), run AI/ML transformations and feature extractions at scale, and maintain full control and visibility over your data with Unity Catalog’s fine-grained governance.

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Jonathan Brito (Databricks)

AWS AWS Glue Databricks Delta Hive Iceberg Snowflake

Unity Catalog support for Apache Iceberg™ brings open, interoperable table formats to the heart of the Databricks Lakehouse. In this session, we’ll introduce new capabilities that allow you to write Iceberg tables from any REST-compatible engine, apply fine-grained governance across all data, and unify access to external Iceberg catalogs like AWS Glue, Hive Metastore, and Snowflake Horizon. Learn how Databricks is eliminating data silos, simplifying performance with Predictive Optimization, and advancing a truly open lakehouse architecture with Delta and Iceberg side by side.

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

Building Responsible AI Agents on Databricks

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Pavithra Rao (Databricks) , Yassine Essawabi (Databricks)

AI/ML BI Databricks LLM Cyber Security

This presentation explores how Databricks' Data Intelligence Platform supports the development and deployment of responsible AI in credit decisioning, ensuring fairness, transparency and regulatory compliance. Key areas include bias and fairness monitoring using Lakehouse Monitoring to track demographic metrics and automated alerts for fairness thresholds. Transparency and explainability are enhanced through the Mosaic AI Agent Framework, SHAP values and LIME for feature importance auditing. Regulatory alignment is achieved via Unity Catalog for data lineage and AIBI dashboards for compliance monitoring. Additionally, LLM reliability and security are ensured through AI guardrails and synthetic datasets to validate model outputs and prevent discriminatory patterns. The platform integrates real-time SME and user feedback via Databricks Apps and AI/BI Genie Space.

Get the Most of Your Delta Lake

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Youssef Mrini (Databricks)

Analytics Data Management Delta Spark

Unlock the full potential of Delta Lake, the open-source storage framework for Apache Spark, with this session focused on its latest and most impactful features. Discover how capabilities like Time Travel, Column Mapping, Deletion Vectors, Liquid Clustering, UniForm interoperability, and Change Data Feed (CDF) can transform your data architecture. Learn not just what these features do, but when and how to use them to maximize performance, simplify data management, and enable advanced analytics across your lakehouse environment.

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

2025-06-12 · Data + AI Summit 2025 Watch

talk

by Tim Kessler (Redox, Inc.) , Matthew Giglia (Databricks)

AI/ML API Amazon EMR BI Databricks Delta ETL/ELT SQL Data Streaming

Redox & Databricks direct integration can streamline your interoperability workflows from responding in record time to preauthorization requests to letting attending physicians know about a change in risk for sepsis and readmission in near real time from ADTs. Data engineers will learn how to create fully-streaming ETL pipelines for ingesting, parsing and acting on insights from Redox FHIR bundles delivered directly to Unity Catalog volumes. Once available in the Lakehouse, AI/BI Dashboards and Agentic Frameworks help write FHIR messages back to Redox for direct push down to EMR systems. Parsing FHIR bundle resources has never been easier with SQL combined with the new VARIANT data type in Delta and streaming table creation against Serverless DBSQL Warehouses. We'll also use Databricks accelerators dbignite and redoxwrite for writing and posting FHIR bundles back to Redox integrated EMRs and we'll extend AI/BI with Unity Catalog SQL UDFs and the Redox API for use in Genie.

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

2025-06-12 · Data + AI Summit 2025 Watch

lightning_talk

by Alex Dean (Snowplow)

AI/ML Analytics Data Collection Databricks Delta Snowplow Data Streaming

In the competitive gaming industry, understanding player behavior is key to delivering engaging experiences. Supercell, creators of Clash of Clans and Brawl Stars, faced challenges with fragmented data and limited visibility into user journeys. To address this, they partnered with Snowplow and Databricks to build a scalable, privacy-compliant data platform for real-time insights. By leveraging Snowplow’s behavioral data collection and Databricks’ Lakehouse architecture, Supercell achieved: Cross-platform data unification: A unified view of player actions across web, mobile and in-game Real-time analytics: Streaming event data into Delta Lake for dynamic game balancing and engagement Scalable infrastructure: Supporting terabytes of data during launches and live events AI & ML use cases: Churn prediction and personalized in-game recommendations This session explores Supercell’s data journey and AI-driven player engagement strategies.

ClickHouse and Databricks for Real-Time Analytics

2025-06-11 · Data + AI Summit 2025 Watch

talk

by Melvyn Peignon (ClickHouse)

Analytics ClickHouse Databricks Delta

ClickHouse is a C++ based, column-oriented database built for real-time analytics. While it has its own internal storage format, the rise of open lakehouse architectures has created a growing need for seamless interoperability. In response, we have developed integrations with your favorite lakehouse ecosystem to enhance compatibility, performance and governance. From integrating with Unity Catalog to embedding the Delta Kernel into ClickHouse, this session will explore the key design considerations behind these integrations, their benefits to the community, the lessons learned and future opportunities for improved compatibility and seamless integration.

talk-data.com

Data Lakehouse

Activity Trend

Top Events

Top Speakers

AWS re:Invent 2025 - Best practices for building Apache Iceberg based lakehouse architectures on AWS

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - Accelerate analytics and AI w/ an open and secure lakehouse architecture-ANT309

AWSreInvent #AWSreInvent2025 #AWS

AWS re:Invent 2025 - What's new with Amazon SageMaker in the era of unified data and AI (ANT216)

AWSreInvent #AWSreInvent2025 #AWS

Best practice for leveraging Amazon Analytic Services + dbt

DuckLake: Making BIG DATA feel small

You Don’t Need Spark for That: Pythonic Data Lakehouse Workflows

Minus Three Tier: Data Architecture Turned Upside Down

Capitalizing Alternatives Data on the Addepar Platform: Private Markets Benchmarking

Sponsored by: C2S Technologies Inc. | Qbeast: Lakehouse Acceleration as a Service

Welcome Lakehouse, from a DWH transformation to a M&A data sharing

Daft and Unity Catalog: A Multimodal/AI-Native Lakehouse

Databricks + Apache Iceberg™: Managed and Foreign Tables in Unity Catalog

Sponsored by: DataHub | Beyond the Lakehouse: Supercharging Databricks with Contextual Intelligence

Sponsored by: definity | How You Could Be Saving 50% of Your Spark Costs

Sponsored by: Soda Data Inc. | Clean Energy, Clean Data: How Data Quality Powers Decarbonization

Building Responsible AI Agents on Databricks

Get the Most of Your Delta Lake

Healthcare Interoperability: End-to-End Streaming FHIR Pipelines With Databricks & Redox

Leveling Up Gaming Analytics: How Supercell Evolved Player Experiences With Snowplow and Databricks

ClickHouse and Databricks for Real-Time Analytics