talk-data.com

Topic

Airbyte

etl data_integration open_source

9 tagged

Activity Trend: peak of 4 per quarter, 2020-Q1 to 2026-Q1

Activities

9 activities · Newest first

Sponsored by: Airbyte | How Data Movement Powers GenAI

In this session, discover how effective data movement is foundational to successful GenAI implementations. As organizations rush to adopt AI technologies, many struggle with the infrastructure needed to manage the massive influx of unstructured data these systems require. Jim Kutz, Head of Data at Airbyte, draws from 20+ years of experience leading data teams at companies like Grafana, CircleCI, and BlackRock to demonstrate how modern data movement architectures can enable secure, compliant GenAI applications. Learn practical approaches to data sovereignty, metadata management, and privacy controls that transform data governance into an enabler for AI innovation. This session will explore how you can securely leverage your most valuable asset—first-party data—for GenAI applications while maintaining complete control over sensitive information. Walk away with actionable strategies for building an AI-ready data infrastructure that balances innovation with governance requirements.

Let's Save Tons of Money With Cloud-Native Data Ingestion!

Delta Lake is a fantastic technology for quickly querying massive data sets, but first you need those massive data sets! In this session we will dive into the cloud-native architecture Scribd has adopted to ingest data from AWS Aurora, SQS, Kinesis Data Firehose and more. By using off-the-shelf open source tools like kafka-delta-ingest, oxbow and Airbyte, Scribd has redefined its ingestion architecture to be more event-driven, more reliable, and most importantly: cheaper. No jobs needed! Attendees will learn how to use third-party tools in concert with a Databricks and Unity Catalog environment to provide a highly efficient and available data platform. This architecture will be presented in the context of AWS but can be adapted for Azure, Google Cloud Platform or even on-premises environments.

Coalesce 2024: Fueling product development and customer insights with dbt

At Airbyte, we leverage dbt to power our roadmap - from user discovery to customer retention efforts.

We need to parse many sources of data from our open-source and Cloud communities, including Gong transcripts, NPS surveys, and GitHub issues. I'll share examples of how dbt powers the way we work, from discovering product gaps and their importance to deals, to building retention tools like custom notifications around customer pipelines.

Speaker: Natalie Kwong, Product, Airbyte

Read the blog to learn about the latest dbt Cloud features announced at Coalesce, designed to help organizations embrace analytics best practices at scale https://www.getdbt.com/blog/coalesce-2024-product-announcements

New Girl, but Jess is a chatbot: AI joins the data team's loft - Coalesce 2023

A data team looks to grow by 33% by making their biggest hire to date: an AI-powered chatbot. The biggest problem for this two-person team: they don’t know where to start.

Join Airbyte on the journey from first Google search through procurement to implementation as they try to figure out whether they can make an AI chatbot answer all of their company's questions.

They're (most likely) just like you: they knew that they needed to do something with AI but weren't sure where or how to start. In this presentation, the Airbyte team talks through the process of figuring out how to use AI to make their team more efficient and extend its reach in a rapidly growing, fully remote organization.

Speaker: Alex Gronemeyer, Lead Analytics Engineer, Airbyte

Register for Coalesce at https://coalesce.getdbt.com

Domesticating a feral cat data stack - Coalesce 2023

Lauren Benezra has been volunteering with a local cat rescue since 2018. She recently took on the challenge of rebuilding their data stack from scratch, replacing a Jenga tower of incomprehensible Google Sheets with a more reliable system backed by the Modern Data Stack. By using Airtable, Airbyte, BigQuery, dbt Cloud and Census, her role as Foster Coordinator has transformed: instead of digging for buried information while wrangling cats, she now serves up accurate data with ease while... well... wrangling cats.

Viewers will learn that it's possible to run an extremely scalable and reliable stack on a shoestring budget, and will come away with actionable steps to put Lauren's hard-won lessons into practice in their own volunteering projects or as the first data hire in a tiny startup.

Speaker: Lauren Benezra, Senior Analytics Engineer, dbt Labs

Building Leverage with dbt Labs at Airbyte

Airbyte supports loading data into a wide array of databases and data warehouses. We must enforce the same structure and transformations in each of these tools, and writing different transformations for each would be prohibitive. Instead, we use dbt to write this code once and reuse it for every database and data warehouse that we support. In an effort to improve our support across all these tools, we are also introducing a dbt Cloud integration within Airbyte Cloud. This will allow Airbyte Cloud users to leverage the lessons we’ve learned and build their own custom transformations using dbt Cloud.
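The "write once, reuse for every warehouse" idea can be illustrated with a minimal sketch. This is not dbt and not Airbyte's actual code: the function names, the `QUOTE_CHARS` table, and the deduplication query are hypothetical, and the sketch only shows how one logical transformation might be rendered for several SQL dialects that differ in identifier quoting.

```python
# Hypothetical sketch: one logical transformation rendered per warehouse.
# The quoting table is illustrative, not exhaustive.
QUOTE_CHARS = {"postgres": '"', "snowflake": '"', "bigquery": "`"}


def quote_identifier(name: str, warehouse: str) -> str:
    """Quote a plain identifier using the given warehouse's quote character."""
    quote = QUOTE_CHARS[warehouse]
    return f"{quote}{name}{quote}"


def render_latest_rows(table: str, key: str, updated_at: str, warehouse: str) -> str:
    """Render a 'keep the newest row per key' query for one warehouse."""
    t = quote_identifier(table, warehouse)
    k = quote_identifier(key, warehouse)
    u = quote_identifier(updated_at, warehouse)
    return (
        "SELECT * FROM ("
        f"SELECT *, ROW_NUMBER() OVER (PARTITION BY {k} ORDER BY {u} DESC) AS row_num "
        f"FROM {t}"
        ") AS ranked WHERE row_num = 1"
    )


if __name__ == "__main__":
    # The same transformation, written once, rendered for each warehouse.
    for warehouse in QUOTE_CHARS:
        print(warehouse, "->", render_latest_rows("users", "id", "updated_at", warehouse))
```

In dbt itself, this kind of per-warehouse variation is typically handled by adapters and macros rather than hand-written string formatting; the sketch only conveys the shape of the idea.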

Check the slides here: https://docs.google.com/presentation/d/19asIBrCgs04dJ07zhb1cosYEQHQC0yqEVSMlLcUymZY/edit?usp=sharing

Open Source Powers the Modern Data Stack

Lakehouses like Databricks’ Delta Lake are becoming the central brain for all data systems. But Lakehouses are only one component of the data stack. There are many building blocks required for tackling data needs, including data integration, data transformation, data quality, observability, and orchestration.

In this session, we will present how open source powers companies' approach to building a modern data stack. We will talk about technologies like Airbyte, Airflow, dbt, Preset, and how to connect them in order to build a customized and extensible data platform centered around Databricks.

Connect with us: Website: https://databricks.com Facebook: https://www.facebook.com/databricksinc Twitter: https://twitter.com/databricks LinkedIn: https://www.linkedin.com/company/data... Instagram: https://www.instagram.com/databricksinc/

How socat and UNIX Pipes Can Help Data Integration

Nearly every developer is familiar with creating a CLI. Containerized CLIs provide a flexible, cross-language standard with a low barrier to entry for open-source contributors. The ETL process can be reduced to two CLIs: one that reads data and one that writes data. While this interface is simple enough to implement from the contributor’s side, Kubernetes’ distributed nature means orchestrating data transfer between the CLIs on Kubernetes presents an unsolved problem.

This talk describes a novel approach to reliably orchestrating CLIs on Kubernetes for data integration. Through this lens, we go through the evaluation of strategies and describe the pros and cons of each architecture for horizontally scaling containerized data integration workflows on Kubernetes. We also cover the journey of implementing a TCP-based “process” abstraction over CLIs using socat and UNIX pipes. This same approach powers all of Airbyte’s Kubernetes deployments and helps sync TBs of data daily.
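The "two CLIs" framing can be sketched on a single host, assuming both processes run on the same machine: a source process writes records to stdout and a destination process reads them from stdin, connected by a UNIX pipe. The two inline scripts and the `run_sync` helper are hypothetical stand-ins, not Airbyte connectors. The sketch does not reproduce the socat-based TCP approach from the talk, which addresses the case where the two CLIs run in separate containers and cannot share a local pipe.

```python
import json
import subprocess
import sys

# Hypothetical "source" CLI: emits one JSON record per line on stdout.
SOURCE_SCRIPT = r"""
import json
for i in range(3):
    print(json.dumps({"id": i, "value": i * 10}))
"""

# Hypothetical "destination" CLI: reads records from stdin and reports a summary.
DESTINATION_SCRIPT = r"""
import json, sys
count = 0
total = 0
for line in sys.stdin:
    record = json.loads(line)
    count += 1
    total += record["value"]
print(json.dumps({"records_written": count, "value_sum": total}))
"""


def run_sync() -> dict:
    """Pipe the source's stdout into the destination's stdin and return the summary."""
    source = subprocess.Popen(
        [sys.executable, "-c", SOURCE_SCRIPT],
        stdout=subprocess.PIPE,
    )
    destination = subprocess.Popen(
        [sys.executable, "-c", DESTINATION_SCRIPT],
        stdin=source.stdout,
        stdout=subprocess.PIPE,
        text=True,
    )
    # Close the parent's copy of the pipe so the destination sees EOF
    # once the source exits.
    source.stdout.close()
    output, _ = destination.communicate()
    source.wait()
    if source.returncode != 0 or destination.returncode != 0:
        raise RuntimeError("sync failed")
    return json.loads(output)


if __name__ == "__main__":
    print(run_sync())
```

Keeping the interface to "read from stdin, write to stdout" is what lets either side be written in any language; the orchestration layer only has to move bytes between the two processes.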
