talk-data.com

People (2 results)

Activities & events

Title & Speakers Event
Yingjun Wu – Speaker @ RisingWave Labs

At first, streaming Postgres changes into Iceberg feels like a no-brainer. You spin up Debezium or Kafka Connect, point it at Iceberg, and it all looks boringly straightforward. The surprise comes once the pipeline hits production. Replication slots vanish and start filling up WAL space, LSNs don't line up and cause duplicates or gaps, and Iceberg sinks fail in ways that push back all the way to your primary database. Then you throw in schema changes, backfills, and compaction, and suddenly the "boring" pipeline becomes a source of late-night firefights. In this talk, I'll share real stories from running Postgres to Iceberg CDC pipelines in production. We'll look at the unexpected problems that show up, why they happen, and the strategies that actually helped keep things stable. If you've ever thought of Postgres -> Iceberg as just plumbing, this session will show you why it's not so boring after all.

postgresql Iceberg
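
One of the failure modes named in the abstract above, duplicates caused by replayed LSNs, can be illustrated with a minimal sketch. It is not taken from the talk and does not reflect how Debezium, Kafka Connect, or any Iceberg sink is implemented. `ChangeEvent` and `IdempotentSink` are hypothetical names, LSNs are modeled as plain integers, and each event is assumed to carry a unique LSN. The sink remembers the highest LSN it has applied and skips anything at or below it, so a replayed batch has no effect. Because LSNs are not contiguous, this guards against duplicates only and cannot detect gaps.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """Hypothetical change record; lsn is a simplified integer WAL position."""
    lsn: int
    key: str
    value: Optional[str]  # None stands for a delete


@dataclass
class IdempotentSink:
    """Toy sink that applies each LSN at most once (assumes unique LSNs)."""
    committed_lsn: int = 0
    rows: Dict[str, str] = field(default_factory=dict)

    def apply_batch(self, events: Iterable[ChangeEvent]) -> int:
        """Apply events in LSN order and return how many were new."""
        applied = 0
        for ev in sorted(events, key=lambda e: e.lsn):
            if ev.lsn <= self.committed_lsn:
                continue  # already applied: a replay after a restart
            if ev.value is None:
                self.rows.pop(ev.key, None)
            else:
                self.rows[ev.key] = ev.value
            self.committed_lsn = ev.lsn
            applied += 1
        return applied


sink = IdempotentSink()
batch = [ChangeEvent(10, "a", "1"), ChangeEvent(20, "b", "2")]
first = sink.apply_batch(batch)
replayed = sink.apply_batch(batch)  # same batch delivered again
print(first, replayed, sink.rows)
```

In a real pipeline the committed position would have to be persisted atomically with the written data; this sketch keeps it in memory only.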
Melvyn Peignon – Principal Product Manager @ ClickHouse

If you look at some of the earliest pull requests in the ClickHouse repository, you will see a strong emphasis on integration with external systems. Over time, ClickHouse has become a powerful bridge between data lakes and data warehouses, supporting queues, databases, and object stores, with compatibility for more than 60 input and output formats. This versatility lets users benefit from the flexibility of a data lake while keeping real-time query performance.

In this session, we will discuss how our users work with ClickHouse and Iceberg, as well as some of the features under development to support this trend.

ClickHouse Iceberg
Victor Coustenoble – Staff Solution Architect and SEMEA Team Lead @ Starburst

Dive into the heart of the Trino connector for Apache Iceberg! Going beyond the basics, we invite you to discover the latest additions and the most advanced features. Through live demos, we will explore key topics: managing branches and tags tied to snapshots, maintenance options for your Iceberg tables, and extended support for metastores (catalogs). This talk is a chance to master often-overlooked aspects of optimizing your Iceberg tables with Trino.

Trino apache iceberg
Behnaz Derakhshani – Specialist Data Engineer @ Diconium

Expect a hands-on journey showing how modern data lake tools and governance platforms connect the dots - making your data discoverable, governed, and productized for real-world use.

AWS Collibra data lake tools governance platforms
Yingjun Wu – Speaker @ RisingWave Labs

Stream processing systems have traditionally relied on local storage engines such as RocksDB to achieve low latency. While effective in single-node setups, this model doesn't scale well in the cloud, where elasticity and separation of compute and storage are essential. In this talk, we'll explore how RisingWave rethinks the architecture by building directly on top of S3 while still delivering sub-100 ms latency. At the core is Hummock, a log-structured state engine designed for object storage. Hummock organizes state into a three-tier hierarchy: in-memory cache for the hottest keys, disk cache managed by Foyer for warm data, and S3 as the persistent cold tier. This approach ensures queries never directly hit S3, avoiding its variable performance. We'll also examine how remote compaction offloads heavy maintenance tasks from query nodes, eliminating interference between user queries and background operations. Combined with fine-grained caching policies and eviction strategies, this architecture enables both consistent query performance and cloud-native elasticity. Attendees will walk away with a deeper understanding of how to design streaming systems that balance durability, scalability, and low latency in an S3-based environment.

S3 Data Streaming hummock object storage log-structured state engine
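
The three-tier read path described in the abstract above (memory, disk cache, object storage) can be sketched as follows. This is a toy model under stated assumptions, not Hummock's or Foyer's actual design or API: `LruTier` and `TieredStore` are hypothetical names, both cache tiers use plain LRU eviction, a dict stands in for S3, and a stored value of `None` is treated as a miss. The cold tier is read only to fill the caches on a miss, and entries evicted from memory are demoted to the disk tier.

```python
from collections import OrderedDict


class LruTier:
    """Fixed-capacity LRU cache standing in for a memory or disk tier."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        """Insert and return the evicted (key, value) pair, if any."""
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            return self.items.popitem(last=False)
        return None


class TieredStore:
    """Memory -> disk -> cold lookup with promotion on hit."""

    def __init__(self, memory_capacity, disk_capacity, cold):
        self.memory = LruTier(memory_capacity)
        self.disk = LruTier(disk_capacity)
        self.cold = cold  # dict standing in for object storage
        self.cold_reads = 0

    def get(self, key):
        value = self.memory.get(key)
        if value is not None:
            return value
        value = self.disk.get(key)
        if value is None:
            self.cold_reads += 1  # slow path: fetch from the cold tier
            value = self.cold.get(key)
            if value is None:
                return None
            self.disk.put(key, value)
        evicted = self.memory.put(key, value)
        if evicted is not None:
            self.disk.put(*evicted)  # demote the displaced hot entry
        return value


store = TieredStore(memory_capacity=1, disk_capacity=2, cold={"a": 1, "b": 2})
store.get("a")  # cold read, then cached in both tiers
store.get("a")  # memory hit
store.get("b")  # cold read; "a" is demoted to the disk tier
store.get("a")  # disk hit, no cold read
print(store.cold_reads)
```

The `cold_reads` counter makes the effect of the caches visible: repeated reads of warm keys leave it unchanged.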
Erik Schmiegelow – CEO @ Hivemind Technologies

Successful genAI projects strike a balance between impact, accuracy, and cost. In this talk, we cover how to create agentic data applications effectively: choosing when and how to integrate them into data streams, and keeping response quality issues and costs in check.

genai data streaming ai in data pipelines

Dear data-loving community, we can't wait to present to you our new Meetup event: This time, it will be a collaboration with RisingWave, a platform for real-time streaming data management and analysis. Yingjun Wu, Founder and CEO at RisingWave Labs, will share his experience in a techy talk, as well as Behnaz Derakhshani, who works as a Specialist Data Engineer at Diconium's data department. Additionally, we're going to welcome external guest speaker Erik Schmiegelow, CEO at Hivemind Technologies. Exciting line-up, right? :D

Join us on September 16th in Berlin and bring all your questions! Here are the topics you can expect:

Yingjun Wu: Achieving Sub‑100 ms Real‑Time Stream Processing with an S3‑Native Architecture

Stream processing systems have traditionally relied on local storage engines such as RocksDB to achieve low latency. While effective in single-node setups, this model doesn't scale well in the cloud, where elasticity and separation of compute and storage are essential. In this talk, we'll explore how RisingWave rethinks the architecture by building directly on top of S3 while still delivering sub-100 ms latency. At the core is Hummock, a log-structured state engine designed for object storage. Hummock organizes state into a three-tier hierarchy: in-memory cache for the hottest keys, disk cache managed by Foyer for warm data, and S3 as the persistent cold tier. This approach ensures queries never directly hit S3, avoiding its variable performance. We'll also examine how remote compaction offloads heavy maintenance tasks from query nodes, eliminating interference between user queries and background operations. Combined with fine-grained caching policies and eviction strategies, this architecture enables both consistent query performance and cloud-native elasticity. Attendees will walk away with a deeper understanding of how to design streaming systems that balance durability, scalability, and low latency in an S3-based environment.

Behnaz Derakhshani: From Raw Data to Trusted Assets: A Practical Walkthrough with AWS services and Collibra

Expect a hands-on journey with Behnaz, showing how modern data lake tools and governance platforms connect the dots, making your data discoverable, governed, and productized for real-world use.

Erik Schmiegelow: Effective Agentic GenAI in Data Streaming

Successful genAI projects strike a balance between impact, accuracy, and cost. In this talk, Erik will cover how to create agentic data applications effectively: choosing when and how to integrate them into data streams, and keeping response quality issues and costs in check.

What you can expect:

  • 3 expert talks
  • Interactive Q&A
  • Networking opportunities
  • Pizza & drinks (indoor or at our terrace)

Timetable:

  • 18:00 - Event admission
  • 18:30 - Welcome & introduction
  • 18:35 - Keynote by Yingjun Wu & Q&A
  • 19:05 - Short break
  • 19:15 - Keynote by Behnaz Derakhshani & Q&A
  • 19:45 - Keynote by Erik Schmiegelow & Q&A
  • 20:15 - Snacks, drinks & networking
  • 21:30 - End *

Our goal is to form a local data-loving community, so join us and let's talk data together!

-> Our event page, where you can also contact us if you want to present in the future at our Meetup: Data Engineering MeetUp Berlin - applydata

--- At the event, sound, image and video recordings are created and published for documentation purposes as well as for the presentation of the event in publicly accessible media, on websites and blogs and for presentation on social media. By participating in the event, the participant implicitly consents to the aforementioned photo and/or video recordings. Find more information on data protection here.

Data Builders’ Evening: Architecture, Engineering & Beyond | Berlin, Sep. 16th
Let’s Take Action 2025-06-19 · 19:30

Looking to launch an R&D or POC project inspired by what you’ve seen? Franck Parienti will explain how to best leverage training budgets and available grants to upskill your teams on these new technologies, or how to kick off projects by reinforcing your teams smartly.

Data Streaming Lakehouse Tour (Paris)

Allium is a methodology and platform designed to rapidly build translytic, data-centric, real-time, multi-application information systems. It defines the true value of a system around three main pillars worth investing in: UI/UX, data, and business rules (even more so in the era of AI). The result of years of projects and modern technologies, Allium puts the focus on managing "quality", the source of endless discussions and failures, even in the age of "agility". If questions like “Is the business requirement fully specified before development?”, “Will the solution scale?”, “Will the code be maintainable, and at what level of quality?”, “Do we share a common, aligned vision of the IS?”, “Are there undetected gaps before implementation?” resonate with you, Allium will be of interest. Whether you’re a project owner, investor, inheritor of the notorious “monolith,” application designer, architect, data professional, developer, agile coach, designer, or project or program manager… you might just be surprised by this solution.

allium
Data Streaming Lakehouse Tour (Paris)

How 1point6, a fintech founded by BNP Paribas and the 321 startup studio, uses RisingWave to build a declarative, fully auditable information system operating payment services.

risingwave

Based on a large retail project, discover how to evolve an IT system built through incremental layers (monoliths, [micro]services, streaming, governance, applications…) into a data-centric, real-time, high-performance system, where governance, rule and data catalogs, data mesh, and scalability are integrated by design—not as add-on layers.

governance data catalogs data mesh

Data Streaming Lakehouse Tour (Paris)
Yingjun Wu – Speaker @ RisingWave Labs

Everyone makes streaming sound simple – until you try bolting it onto your batch pipeline and it blows up. This talk skips the marketing gloss and gets into the real work: how to make batch and streaming actually play nice. I’ll walk through the essentials, then get into the messy parts – compaction, primary key updates, exactly-once delivery, and keeping your compute bill from spiraling. You’ll learn how to plug RisingWave into your existing stack and get real-time results without rewriting everything. It’s based on what we’ve seen in production – real problems, real fixes, no buzzwords.

Iceberg risingwave
Data Streaming Lakehouse Tour (Paris)
Event The June Meetup 2025-06-18
Jake Robert Mongaya – Data Engineering Manager @ SumUp, Tadej Štajner – Data Engineering Manager @ SumUp

SumUp's Data Lake journey.

Data Lake
Yingjun Wu – Speaker @ RisingWave Labs

Talk on optimizing Apache Iceberg for time-series and IoT.

apache iceberg
Iceberg at Netflix 2024-11-05 · 01:00
Bryan Keller – Software Engineer @ Netflix, Snehal Chennuru – Engineering Manager @ Netflix, Tim Jiang – Software Engineer @ Netflix

Netflix's Iceberg past, present, and future, with a call-out to the community on where they see the technology's challenges. Netflix will briefly cover its journey from Hive to Iceberg, its current systems for catalog, compaction, and replication, and the improvements it is making.

Iceberg
Apache Iceberg Bay Area Community Meetup