talk-data.com


Activities & events


Data infrastructure as code with Terraform and ClickHouse - Andrei Tserakhau

About the event

Crafting a better pipeline developer journey: harnessing Terraform for enhanced data engineering experiences. We'll take a closer look at how Terraform can link your existing infrastructure with your data infrastructure, and learn how Terraform helps you set up ClickHouse efficiently and connect it with the rest of your stack.

We’ll cover the following:

  • Data infrastructure as code. What is Terraform and how is it commonly used?
  • What are the typical patterns in using Terraform?
  • Analytics with ClickHouse
  • Setting up ClickHouse with Terraform and linking it with the rest of the infrastructure

By the end of this workshop, you'll know when to use Terraform and how to use it to set up ClickHouse.
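To make the workshop topics concrete, here is a minimal, hypothetical sketch of managing a ClickHouse service as code. It assumes the ClickHouse Cloud Terraform provider (ClickHouse/clickhouse); the variable, resource, and attribute names below are illustrative and should be checked against the provider's documentation rather than taken as-is.

  terraform {
    required_providers {
      clickhouse = {
        source = "ClickHouse/clickhouse"
      }
    }
  }

  # Credentials for the ClickHouse Cloud API (assumed variable names).
  variable "organization_id" {
    type = string
  }

  variable "token_key" {
    type      = string
    sensitive = true
  }

  variable "token_secret" {
    type      = string
    sensitive = true
  }

  provider "clickhouse" {
    organization_id = var.organization_id
    token_key       = var.token_key
    token_secret    = var.token_secret
  }

  # A managed ClickHouse service; attribute names are illustrative,
  # so verify them against the provider schema before applying.
  resource "clickhouse_service" "analytics" {
    name           = "analytics"
    cloud_provider = "aws"
    region         = "eu-central-1"
    tier           = "production"

    # Only allow access from the rest of your infrastructure.
    ip_access = [
      {
        source      = "10.0.0.0/16"
        description = "internal VPC"
      }
    ]
  }

  # Expose the service endpoints so other Terraform-managed components
  # (pipelines, dashboards) can reference and connect to ClickHouse.
  output "clickhouse_endpoints" {
    value = clickhouse_service.analytics.endpoints
  }

Because the service is then just another Terraform resource, its endpoints can be referenced from the rest of your configuration, which is how Terraform links ClickHouse with existing infrastructure.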

About the speaker

Andrei Tserakhau is an Engineer and Technical Leader at DoubleCloud with over 10 years of experience in IT. For the last 4 years, he has been working on distributed systems, with a particular focus on data delivery systems. He gradually merged Yandex's disparate data delivery systems into a single cross-system data delivery service, Yandex Data Transfer. He is also a big fan of moving data from A to B.

DataTalks.Club is the place to talk about data. Join our Slack community!

This event is sponsored by DoubleCloud. Thanks for supporting our community!

Terraform: Reshaping the Data Engineering Experience
Andrei Tserakhau – guest @ DoubleCloud

Join Andrei Tserakhau in his session 'CDC: From Zero to Hero' as he unravels the power of change data capture in modern microservice architectures. Discover how this single mechanism can tackle caches, full-text search indexes, replicas, and more, all in one heroic swoop! 🦸‍♂️💡 #CDC #microservicesarchitecture

✨ H I G H L I G H T S ✨

🙌 A huge shoutout to all the incredible participants who made Big Data Conference Europe 2023 in Vilnius, Lithuania, from November 21-24, an absolute triumph! 🎉 Your attendance and active participation were instrumental in making this event so special. 🌍

Don't forget to check out the session recordings from the conference to relive the valuable insights and knowledge shared! 📽️

Once again, THANK YOU for playing a pivotal role in the success of Big Data Conference Europe 2023. 🚀 See you next year for another unforgettable conference! 📅 #BigDataConference #SeeYouNextYear

Big Data
DATA MINER Big Data Europe Conference 2020
Andrei Tserakhau – guest @ DoubleCloud, Tobias Macey – host

Summary

The first step of data pipelines is to move the data to a place where you can process and prepare it for its eventual purpose. Data transfer systems are a critical component of data enablement, and building them to support large volumes of information is a complex endeavor. Andrei Tserakhau has dedicated his career to this problem, and in this episode he shares the lessons that he has learned and the work he is doing on his most recent data transfer system at DoubleCloud.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues for every part of your data workflow, from migration to deployment. Datafold has recently launched a 3-in-1 product experience to support accelerated data migrations. With Datafold, you can seamlessly plan, translate, and validate data across systems, massively accelerating your migration project. Datafold leverages cross-database diffing to compare tables across environments in seconds, column-level lineage for smarter migration planning, and a SQL translator to make moving your SQL scripts easier. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold today!

Data lakes are notoriously complex. For data engineers who battle to build and scale high-quality data workflows on the data lake, Starburst powers petabyte-scale SQL analytics fast, at a fraction of the cost of traditional methods, so that you can meet all your data needs ranging from AI to data applications to complete analytics. Trusted by teams of all sizes, including Comcast and Doordash, Starburst is a data lake analytics platform that delivers the adaptability and flexibility a lakehouse ecosystem promises. And Starburst does all of this on an open architecture with first-class support for Apache Iceberg, Delta Lake, and Hudi, so you always maintain ownership of your data. Want to see Starburst in action? Go to dataengineeringpodcast.com/starburst and get $500 in credits to try Starburst Galaxy today, the easiest and fastest way to get started using Trino.

Your host is Tobias Macey, and today I'm interviewing Andrei Tserakhau about operationalizing high-bandwidth, low-latency change data capture.

Interview

Introduction

How did you get involved in the area of data management?

Your most recent project involves operationalizing a generalized data transfer service. What was the original problem that you were trying to solve?

What were the shortcomings of other options in the ecosystem that led you to building a new system?

What was the design of your initial solution to the problem?

What are the sharp edges that you had to deal with to operate and use that initial implementation?

AI/ML, Analytics, Cloud Computing, Data Engineering, Data Lake, Data Lakehouse, Data Management, Data Quality, Datafold, Delta, Hudi, Iceberg, SaaS, SQL, Data Streaming, Trino
Data Engineering Podcast