talk-data.com talk-data.com

Databricks Summit talk 2025-06-10 at 19:40

Let's Save Tons of Money With Cloud-Native Data Ingestion!

Description

Delta Lake is a fantastic technology for quickly querying massive data sets, but first you need those massive data sets! In this session we will dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose and more. By using off-the-shelf open source tools like kafka-delta-ingest, oxbow and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, reliable, and most importantly: cheaper. No jobs needed! Attendees will learn how to use third-party tools in concert with a Databricks and Unity Catalog environment to provide a highly efficient and available data platform. This architecture will be presented in the context of AWS but can be adapted for Azure, Google Cloud Platform or even on-premise environments.