talk-data.com

Topic

Data Lakehouse

data_architecture data_warehouse data_lake

Activities

Filtering by: Data Universe 2024

The industry has witnessed some tectonic changes over the last few years: on-prem to cloud to multi-cloud, BI to AI to GenAI, and data warehouses to data lakes to data lakehouses, to name a few. This constant evolution, coupled with the ever-increasing demands of the business, makes platform thinking crucial to ensuring a future-proof infrastructure. As companies race to advance their AI strategies, Dell has seen a gravitational pull towards a modern data architecture that can create high-quality data to feed AI and generate high-quality outcomes. Join this session to learn how the Dell Data Lakehouse, powered by Starburst, is the modern paradigm for this new era. You’ll learn about the investments Dell is making in data, analytics, and AI, why Dell and Starburst partnered on this solution, and how it enables a tremendously powerful yet open and flexible data architecture.

Yello is currently embarking on a journey to modernize because its existing platform inhibits its ability to provide speed to insights for internal and external clients. Yello needed a solution that not only improves the team's ability to extract insights, but also enables the team to establish a single source of truth and enhance its level of data stewardship. Yello's new data architecture needed to be nimble, flexible, and agile: a solution that works not only for clients, but also internally for downstream consumers. Hear from Shawn Crenshaw and Peter Lim as they share insights from this modernization journey and discuss how to develop and implement a data lakehouse as part of it. The final data lakehouse architecture will satisfy client needs and accomplish the mission of the Yello Data Services team, which is to improve the health and accessibility of data at Yello.

Slow query engines are forcing users to copy data from open data lakehouses into proprietary data warehouses to achieve their desired performance, but this results in a complex, costly ingestion pipeline that undermines data governance. In this talk, we will dive into the latest developments in data lakehouse querying, why you should avoid using proprietary data warehouses for accelerating queries, and how enterprises like Trip.com are unifying their SQL workloads directly on open data lakehouses.

Project Nessie is an open-source project that provides a Git-like approach to version control for data lakehouse tables. This makes it possible to track data changes over time and revert to previous versions if necessary.

In a lakehouse environment, catalog versioning is essential for ensuring the accuracy and reliability of data. By tracking changes to the catalog, you can ensure that everyone is working with the same data version. This can help to prevent errors and inconsistencies.

Project Nessie can be used to implement catalog versioning in a lakehouse environment. This can be done by creating a Nessie repository for the catalog and then tracking changes to the repository through Git-like commits, branches, and tags.
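To make the Git-like idea concrete, here is a minimal, self-contained sketch of catalog versioning in plain Python. It is a toy in-memory model only, not Nessie's actual API or storage format: the class ToyCatalog, its method names, and the example table and metadata paths are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Commit:
    """Immutable snapshot of the whole catalog: table name -> metadata pointer."""
    id: int
    parent: Optional[int]
    message: str
    tables: Dict[str, str]


class ToyCatalog:
    """Hypothetical in-memory catalog with Git-like branches (not Nessie's API)."""

    def __init__(self) -> None:
        root = Commit(0, None, "init", {})
        self.commits: Dict[int, Commit] = {0: root}
        self.branches: Dict[str, int] = {"main": 0}

    def create_branch(self, name: str, from_branch: str = "main") -> None:
        # A branch is just a named pointer to an existing commit.
        self.branches[name] = self.branches[from_branch]

    def commit(self, branch: str, message: str, changes: Dict[str, str]) -> int:
        # Record a new snapshot on top of the branch head and advance the branch.
        head = self.commits[self.branches[branch]]
        new = Commit(len(self.commits), head.id, message, {**head.tables, **changes})
        self.commits[new.id] = new
        self.branches[branch] = new.id
        return new.id

    def tables(self, branch: str) -> Dict[str, str]:
        # Every reader of a branch sees the same consistent catalog version.
        return dict(self.commits[self.branches[branch]].tables)

    def _is_ancestor(self, ancestor: int, descendant: int) -> bool:
        current: Optional[int] = descendant
        while current is not None:
            if current == ancestor:
                return True
            current = self.commits[current].parent
        return False

    def merge(self, source: str, target: str) -> None:
        # Fast-forward only: the target head must be an ancestor of the source head.
        if not self._is_ancestor(self.branches[target], self.branches[source]):
            raise ValueError("branches have diverged; toy model only fast-forwards")
        self.branches[target] = self.branches[source]

    def reset(self, branch: str, commit_id: int) -> None:
        # Revert a branch to a previous catalog version.
        if commit_id not in self.commits:
            raise KeyError(commit_id)
        self.branches[branch] = commit_id


if __name__ == "__main__":
    catalog = ToyCatalog()
    first = catalog.commit("main", "add orders", {"orders": "orders/metadata-v1.json"})
    catalog.create_branch("etl")
    catalog.commit("etl", "rewrite orders", {"orders": "orders/metadata-v2.json"})
    print(catalog.tables("main"), catalog.tables("etl"))
    catalog.merge("etl", "main")
    catalog.reset("main", first)
    print(catalog.tables("main"))
```

In this sketch, changes made on the etl branch stay invisible to readers of main until the merge, and resetting main to an earlier commit restores the previous catalog state. A real deployment would use Nessie's own server and clients rather than anything resembling this class.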

This presentation will discuss the benefits of using Project Nessie for catalog versioning in a lakehouse environment. We will also discuss how to implement catalog versioning using Project Nessie.

This session will explore the transformative impact of Generative AI on data strategy. It will highlight how GenAI, built on a lakehouse platform, empowers organizations through people, process, and platform. The talk will also delve into how grounding your strategy in governance can increase innovation, competitiveness, and productivity by enabling data-driven decision-making.