talk-data.com


Activities & events

Olena Kutsenko – Staff Developer Advocate @ Confluent

In this 2-hour hands-on workshop, you'll build an end-to-end streaming analytics pipeline that captures live cryptocurrency prices, processes them in real-time, and uses AI to forecast the future. Ingest live crypto data into Apache Kafka using Kafka Connect; tame that chaos with Apache Flink's stream processing; freeze streams into queryable Apache Iceberg tables using Tableflow; and forecast price trends with Flink AI.

confluent cloud Kafka flink apache iceberg tableflow DuckDB
Crypto Streams to AI Predictions: Apache Kafka®, Apache Flink® & Apache Iceberg®

PLEASE RSVP @

*** Join us for a hands-on workshop by Olena Kutsenko on Monday, December 1st from 6:00pm!

In this workshop, you’ll harness the power of Confluent Cloud - the fully managed data streaming platform built on Apache Kafka®, Apache Flink®, and Apache Iceberg® - to build a live crypto-streaming pipeline that ingests, processes, stores, and predicts real-time data.

Crypto Streams to AI Predictions: Apache Kafka®, Apache Flink® & Apache Iceberg®

Sorry to say we're now full! Videos of the talks will be available.

Roll into autumn with another Search & AI Meetup! We're hosted this time by Depop. The event will start at 6.30pm and finish by 9pm, with food and drinks available. This will be an in-person event (but we will try to record the talks and share the videos later).

Our first talk will come from our hosts Depop: "Putting taste at the heart of Depop's search; Personalising at scale" given by Hamza Wahid (Search engineer) and Adam Stapleton (Senior Staff Product Manager): "Search at Depop isn’t just about matching words — it’s about understanding taste. In this talk, we’ll share how we reframed broad searches as a recommendation problem, using embeddings trained on user–item interactions to deliver highly personalised results. We’ll cover how Depop’s marketplace data powers these embeddings, how we integrated them into our search ranking pipeline, and the engineering challenges of deploying such a system at scale."

Next up is Senior Search Product Manager Olena Gorbatiuk who'll tell us about "Three things I’d do differently as a first-time Search Product Manager": "Search PM isn’t like any other PM role. In this talk, I’ll share three key things I’d approach differently as a first-time Search PM, and how they changed the way I build products today."

We'll also make time for Q&A on all things search and AI, plus some general networking. As the nights draw in, come to our cosy Search Meetup, organised by The Search Juggler, OpenSearch and Eliatra.

Please provide your full name and email address when registering for the event as we will need a list of attendees for security.

Personalising at scale with Depop & Tips for a First-time Search PM
Archana Vaidheeswaran – Developer Advocate @ Aleph Alpha

Abstract: Ever notice how your AI interactions start strong but quickly deteriorate with complexity? We've all been there – carefully crafting detailed prompts for AI models, only to receive increasingly mediocre responses as our inputs grow longer. The conventional wisdom says more context equals better results, but real-world evidence suggests otherwise. In this session, I'll share discoveries from analyzing thousands of AI interactions across various domains that reveal a surprising truth: the relationship between prompt length and response quality isn't linear – it's parabolic. There's a sweet spot, and most of us are operating well beyond it.

AI/ML
Olena Kutsenko – Staff Developer Advocate @ Confluent

Abstract: Detecting problems as they happen is essential in today’s fast-moving, data-driven world. In this talk, you’ll learn how to build a flexible, real-time anomaly detection pipeline using Apache Kafka and Apache Flink, backed by statistical and machine learning models. We’ll start by demystifying what anomaly really means - exploring the different types (point, contextual, and collective anomalies) and the difference between unintentional issues and intentional outliers like fraud or abuse. Then, we’ll look at how anomaly detection is solved in practice: from classical statistical models like ARIMA to deep learning models like LSTM. You’ll learn how ARIMA breaks time series into AutoRegressive, Integrated, and Moving Average components, no math degree required (just a Python library). We’ll also uncover why forgetting is a feature, not a bug, when it comes to LSTMs, and how these models learn to detect complex patterns over time. Throughout, we’ll show how Kafka handles high-throughput streaming data and how Flink enables low-latency, stateful processing to catch issues as they emerge. You’ll leave knowing not just how these systems work, but when to use each type of model depending on your data and goals. Whether you're monitoring system health, tracking IoT devices, or looking for fraud in transactions, this talk will give you the foundations and tools to detect the unexpected - before it becomes a problem.
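As a rough illustration of the point-anomaly idea in the abstract above, here is a minimal sketch in plain Python. It is not ARIMA or LSTM and is not taken from the talk: it only borrows the differencing step (the "Integrated" part of ARIMA) and then flags changes that sit far outside the recent window. The function names, the `window` and `threshold` parameters, and the sample readings are all assumptions made for the example.

```python
from statistics import mean, stdev


def difference(series):
    """First-order differencing: replaces each value with its change from the previous one."""
    return [b - a for a, b in zip(series, series[1:])]


def flag_anomalies(series, window=5, threshold=3.0):
    """Return indices of values whose change deviates strongly from recent changes.

    Simplified stand-in for a real model: each change is compared with the
    mean and standard deviation of the previous `window` changes.
    """
    diffs = difference(series)
    anomalies = []
    for i in range(window, len(diffs)):
        recent = diffs[i - window:i]
        mu, sigma = mean(recent), stdev(recent)
        if sigma == 0:
            # A perfectly flat window would divide by zero; use a tiny floor instead.
            sigma = 1e-9
        if abs(diffs[i] - mu) / sigma > threshold:
            anomalies.append(i + 1)  # diffs[i] describes the jump into series[i + 1]
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 7.
readings = [10, 11, 13, 14, 16, 17, 18, 40, 21, 22, 24]
print(flag_anomalies(readings))
```

In a streaming setup the same check would run per key inside a stateful operator, with the recent window kept as state; the list-based version here is only meant to show the arithmetic.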

AI/ML Flink IoT Kafka Python Data Streaming

***IMPORTANT: IF YOU RSVP here you don't need to also RSVP to London Kafka Group.***

Date and Time: 🗓️ Wednesday 7th May, ⏰ 18:00 - 21:00 🕘

Venue: Snowflake, One Crown Place, London EC2A 4EF, U.K. 5th & 6th floors · London

Schedule:

  • 6:00pm: Doors open
  • 6:00pm – 6:30pm: Food/Drinks and networking
  • 6:30pm - 7:00pm: Mastering real-time anomaly detection
  • 7:00pm - 7:30pm: Iced Kaf-fee: Chilling Kafka Data into Iceberg Tables
  • 7:30pm - 8:00pm: Observing all the things: Apache Kafka® and Apache Flink® with OpenTelemetry
  • 8:30pm - 9:00pm: Additional Q&A and Networking

🎙️ Talk 1: Mastering real-time anomaly detection, Olena Kutsenko, Staff Developer Advocate, Confluent

Abstract: Detecting problems as they happen is essential in today's fast-moving world. This talk shows how to build a simple, powerful system for real-time anomaly detection in live data. We'll use Apache Kafka for streaming data, Apache Flink for processing it in real time, and various models to detect unusual patterns. Whether it's monitoring systems, or tracking IoT devices, this solution is flexible and reliable.

We'll start by exploring how Kafka helps collect and manage fast-moving data streams. Then, we'll demonstrate how Flink processes this data in real time and integrates anomaly detection models to uncover events as they occur. We'll dive into the details of how ARIMA and LSTM work, so even if you’re not into mathematics, you can still understand what happens behind the scenes!

This talk is ideal for anyone looking to monitor anomalies in real-time data streams.

🗣️ Speaker 1: Olena is a Staff Developer Advocate at Confluent and a recognized expert in data streaming and analytics. With two decades of experience in software engineering, she has built mission-critical applications, led high-performing teams, and driven large-scale technology adoption at industry leaders like Nokia, HERE Technologies, AWS, and Aiven.

🎙️ Talk 2: Iced Kaf-fee: Chilling Kafka Data into Iceberg Tables, Danica Fine, Lead Developer Advocate, Open Source at Snowflake

Abstract: Have piping-hot, real-time data in Apache Kafka® but want to chill it down into Apache Iceberg™ tables? Let’s see how we can craft the perfect cup of “Iced Kaf-fee” for you and your needs!

We’ll start by grinding through the motivation for moving data from Kafka topics into Iceberg tables, exploring the benefits that doing so has to offer your analytics workflows. From there, we’ll open up the menu of options available to cool down your streams, including Apache Flink®, Apache Spark™, and Kafka Connect. Each brewing method has its own recipe, so we’ll compare their pros and cons, walk through use cases for each, and highlight when you might prefer a strong Spark roast over a smooth Flink blend—or maybe a Connect cold brew. Plus, we’ll share a sneak peek at future innovations that are percolating in the community to make sinking your Kafka data into Iceberg even easier.

By the end of the session, you’ll have everything you need to whip up the perfect pipeline and serve up your “Iced Kaf-fee” with confidence.

🗣️ Speaker 2: Danica began her career as a software engineer in financial services and pivoted to developer relations, where she focussed primarily on open source technologies under the Apache Software Foundation umbrella such as Apache Kafka and Apache Flink. She now leads the open source advocacy efforts at Snowflake, supporting Apache Iceberg and Apache Polaris (incubating).

🎙️ Talk 3: Observing all the things: Apache Kafka® and Apache Flink® with OpenTelemetry, Mehreen Tahir, Software Engineer, New Relic

🗣️ Speaker 3: Mehreen specializes in machine learning, data science, and artificial intelligence. Mehreen is passionate about observability and the use of telemetry data to improve application performance. She actively contributes to developer communities and has a keen interest in edge analytics and serverless architecture.

*** DISCLAIMER NOTE: We are unable to cater for any attendees under the age of 18. If you would like to speak or host our next event please let us know! [email protected]

IN PERSON: Apache Kafka® x Apache Iceberg x Apache Flink®

Join us for a range of talks, including Kafka to Apache Iceberg, in London hosted by Snowflake!

IN PERSON: Apache Kafka to Apache Iceberg examples by Snowflake
Alexander Kropp – author , Anatoly Zelenin – author

Apache Kafka, start to finish. Apache Kafka in Action: From basics to production guides you through the concepts and skills you’ll need to deploy and administer Kafka for data pipelines, event-driven applications, and other systems that process data streams from multiple sources. Authors Anatoly Zelenin and Alexander Kropp have spent years using Kafka in real-world production environments. In this guide, they reveal their hard-won expert insights to help you avoid common Kafka pitfalls and challenges.

Inside Apache Kafka in Action you’ll discover:

  • Apache Kafka from the ground up
  • Achieving reliability and performance
  • Troubleshooting Kafka systems
  • Operations, governance, and monitoring
  • Kafka use cases, patterns, and anti-patterns

Clear, concise, and practical, Apache Kafka in Action is written for IT operators, software engineers, and IT architects working with Kafka every day. Chapter by chapter, it guides you through the skills you need to deliver and maintain reliable and fault-tolerant data-driven applications.

About the Technology: Apache Kafka is the gold standard streaming data platform for real-time analytics, event sourcing, and stream processing. Acting as a central hub for distributed data, it enables seamless flow between producers and consumers via a publish-subscribe model. Kafka easily handles millions of events per second, and its rock-solid design ensures high fault tolerance and smooth scalability.

About the Book: Apache Kafka in Action is a practical guide for IT professionals who are integrating Kafka into data-intensive applications and infrastructures. The book covers everything from Kafka fundamentals to advanced operations, with interesting visuals and real-world examples. Readers will learn to set up Kafka clusters, produce and consume messages, handle real-time streaming, and integrate Kafka into enterprise systems. This easy-to-follow book emphasizes building reliable Kafka applications and taking advantage of its distributed architecture for scalability and resilience.

What's Inside:

  • Master Kafka’s distributed streaming capabilities
  • Implement real-time data solutions
  • Integrate Kafka into enterprise environments
  • Build and manage Kafka applications
  • Achieve fault tolerance and scalability

About the Reader: For IT operators, software architects and developers. No experience with Kafka required.

About the Authors: Anatoly Zelenin is a Kafka expert known for workshops across Europe, especially in banking and manufacturing. Alexander Kropp specializes in Kafka and Kubernetes, contributing to cloud platform design and monitoring.

Quotes:

  • A great introduction. Even experienced users will go back to it again and again. - Jakub Scholz, Red Hat
  • Approachable, practical, well-illustrated, and easy to follow. A must-read. - Olena Kutsenko, Confluent
  • A zero to hero journey to understanding and using Kafka! - Anthony Nandaa, Microsoft
  • Thoughtfully explores a wide range of topics. A wealth of valuable information seamlessly presented and easily accessible. - Olena Babenko, Aiven Oy

data data-engineering streaming-messaging Kafka Analytics Cloud Computing Kubernetes Microsoft Data Streaming
O'Reilly Data Engineering Books
Panel Discussion 2025-04-24 · 19:40
Katarzyna (Kasia) Stoltmann – Head of Data Science & AI @ AstraZeneca , Anita Fechner – Data Science Batch Summer 2021, now Product Analyst @ Delivery Hero , Jennifer Lapp – Head of Growth, DACH & LATAM @ HubSpot , Olena Nahorna – Tech Lead Manager @ Grammarly

Panel featuring Olena Nahorna, Katarzyna Stoltmann, Jennifer Lapp, Aliya Boranbayeva, moderated by Anita Fechner, discussing AI in communication and data.

ai NLP machine learning data storytelling
Jennifer Lapp – Head of Growth, DACH & LATAM @ HubSpot

AI-powered tools translate complex information into accessible language, supporting data storytelling and collaboration between human insight and machine intelligence. Real-world examples show how organizations use AI to enhance research, reporting, and technical documentation.

NLP machine learning ai data storytelling
Katarzyna (Kasia) Stoltmann – Head of Data Science & AI @ AstraZeneca

The talk will explore how a background in linguistics can enhance the development of end-to-end AI-driven solutions.

linguistics ai
Olena Nahorna – Tech Lead Manager @ Grammarly

The talk explores how large language models (LLMs) have accelerated the development of linguistic features. It focuses on how to adapt feature development processes to match this rapid pace and highlights key considerations for maintaining high-quality output in a fast-evolving AI landscape.

llms NLP machine learning
Paul Andrew , Olena Kutsenko , Martin Zuern , Gunnar Morling – Software Engineer and open-source enthusiast @ Decodable

🌟 Session Overview 🌟

Session Name: Building Effective Data Teams: Strategies for Success Speaker: Gunnar Morling, Martin Zuern, Olena Kutsenko, Paul Andrew Session Description: Panel Discussion will explore the key strategies for assembling and nurturing high-performing data teams. Expert panelists will discuss best practices for recruiting top talent, fostering collaboration, and creating a culture of innovation within data teams. The session will also address common challenges such as skill gaps, team dynamics, and aligning data initiatives with business goals.

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀 🔗 Find Other Years' Videos: 2023 Big Data Conference Europe https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g 2022 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT 2021 Big Data Conference Europe Online https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/ 👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/ 🐦 Twitter: @BigDataConfEU, @europe_rpa 🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/ 🎥 YouTube: http://www.youtube.com/@DATAMINERLT

AI/ML Analytics Big Data Dashboard

🌟 Session Overview 🌟

Session Name: Sentiment Analysis in Action: Building Your Real-time Pipeline Speaker: Olena Kutsenko Session Description: Monitoring and interpreting the sentiment of data records is important for a variety of use cases. However, traditional human-based methods fall short in handling huge volumes of information with the required speed and efficiency. AI, however, can address this challenge.

AI is only part of the solution. We need to build a data pipeline that ingests data from various channels, processes it using AI-driven sentiment analysis models to classify the sentiment of each individual record, and prepares it to be consumed by applications for aggregation and analysis.

In this session, we'll build a system using open-source technologies Apache Kafka and Apache Flink with AI models to obtain real-time sentiment from social media data. Apache Kafka's scalability ensures that no record is left behind, making it a reliable foundation for sentiment analysis. Apache Flink, with its adaptability to fluctuations in data volume and velocity, will enable the analysis of a continuous data stream using an AI model.

AI/ML Analytics Flink Big Data Dashboard Kafka
Flutter Berlin Oktoberfest 2024-10-24 · 16:00

🍻 Come hang out with us at Flutter Berlin Oktoberfest! 🍻

We'll be chatting about the hottest trends in Flutter, catching some awesome talks, and, of course, there'll be plenty of time to network and enjoy a beer or two. Whether you're here to learn something new, meet cool people, or just have a good time, this is the place to be.

Don’t miss the fun—see you there!

AGENDA

💥 Intro

💥 Ensuring user presence during authentication by Olena Iskryzhytska and Ivan Branets (Affindi): How to encrypt and protect user data with a passphrase or biometrics so it can’t be cracked even when an attacker has admin permissions.

💥 Building Apps with GenAI Capabilities without Internet Connection and Privacy Concerns by Sasha Denisov (EPAM, Flutter GDE): In this talk, we'll dive into the exciting world of Gemma, a groundbreaking family of open AI models by Google. We'll explore how you can leverage Gemma's capabilities to build innovative Mobile and Web projects. One of these capabilities is running on your mobile device or browser without an internet connection.

💥 Oktoberfest Challenge: Details of the challenge will be announced soon.

💥 Open Microphone

Flutter Berlin Oktoberfest
OSDI Berlin - October 2024 2024-10-22 · 16:30

Are you interested in learning more about open-source data technologies? ✅ Do you want to network with local and international tech professionals in a fun, relaxed environment? ✅

Then join us on October 22, 2024, at the Aiven office in Berlin for an evening full of inspiring conversations and exciting talks!

17:30 - 18:00: Welcome: Networking & snacks, kickoff

18:00 - 18:30: Talk TBD Speaker: Olena Kutsenko, Senior Developer Advocate @ Aiven

18:30 - 19:00: Fast aggregation & Real-time Insights on event data using Kafka and Druid Speaker: Ajith Ramanath Have you ever wondered how a modern event analytics pipeline is different from a classical ETL setup? I am going to explain event analytics using Apache Druid, the real time OLAP engine. In a live demo, I show how to design and set up an analytics pipeline in just a few minutes, using Kafka for delivery, and Apache Druid for real time OLAP.

19:30 - 21:00: Food & Networking

* Please note that this is an alcohol-free event. Light bites will be provided.
* By attending this event, you agree to abide by our community code of conduct.

OSDI Berlin - October 2024

The Zoom link for online participants will be published on the day of the meetup here: https://bolt.zoom.us/j/93106867043?pwd=NTMvM29DODgvOG5uQUJuQnJqSVN4UT09

During the meetup, we will delve into the cutting-edge technologies powering our TiDB Cloud Data Service and explore TiDB Upgrades. We will also dive into how Bolt manages their TiDB Upgrades with the paranoid TiDB upgrade guide.

We will also discuss a beginner's guide to balancing your data across Apache Kafka partitions.

This is a fantastic opportunity to expand your knowledge, gain insights from industry experts, and connect with fellow tech enthusiasts. Whether you’re a seasoned professional or just starting out, these sessions have something valuable for everyone.

Agenda:

17:30 – 18:00 Networking

18:00 – 18:10 Opening words

18:15 – 18:55 TiDB Cloud Data Service

Presenter: Daniel James – Principal Solutions Engineer at PingCAP

19:05 – 19:45 The Paranoid TiDB Version Upgrader’s Guide.

Presenter: Leandro Morgado – Senior MySQL DBA at Bolt

19:45 – 20:00 BREAK

20:05 – 20:45 Beginners guide to balance your data across Apache Kafka partitions

Presenter: Olena Kutsenko – Sr. Developer Advocate at Aiven

20:45 – 21:30 Networking

Managing TiDB Upgrades & useful practices with Apache Kafka
Olena – Sr. Developer Advocate @ Aiven

Apache Kafka is a distributed system. At the heart of Apache Kafka is a set of brokers that contain topics. Topics are split into partitions. Dividing topics into smaller pieces allows us to work with data in parallel and achieve higher data throughput.

Such parallelization is the key to a performant cluster, however it comes with a price. First, reading from multiple partitions will eventually mess up the order of records, meaning that the resulting order will be different from when the data was pushed into the cluster. Another big challenge is uneven distribution of data across partitions.

Overloaded partitions present a dangerous issue for performance of all involved parties, but especially for brokers and consumers. Therefore, when building our product architecture we should carefully weigh up how many partitions we need, how to ensure proper message ordering, how to balance records across partitions, not forgetting about data load distribution over time. And do all of this while still maintaining good performance of the cluster.

If you're fresh to Apache Kafka, or looking for good practices to design your partitions and avoid common pitfalls, you'll find this session useful!
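To make the uneven-distribution problem from the abstract above concrete, here is a small sketch in plain Python that needs no Kafka cluster. It assumes records are assigned to partitions by hashing the record key modulo the partition count; CRC32 is used purely as a stand-in hash and is not the hash Kafka's own partitioner uses. The function names and the traffic mix are hypothetical.

```python
import zlib
from collections import Counter


def assign_partition(key: str, num_partitions: int) -> int:
    """Map a record key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partition_load(keys, num_partitions):
    """Count how many records land on each partition, including empty ones."""
    load = Counter({p: 0 for p in range(num_partitions)})
    for key in keys:
        load[assign_partition(key, num_partitions)] += 1
    return dict(load)


# Hypothetical traffic: one "hot" customer produces 900 of 1000 records.
skewed_keys = ["customer-1"] * 900 + [f"customer-{i}" for i in range(2, 102)]
load = partition_load(skewed_keys, num_partitions=6)
hot = assign_partition("customer-1", 6)

print("records per partition:", load)
print("partition holding the hot key:", hot)
```

Because the same key always maps to the same partition, records for one customer keep their relative order, but the partition holding the hot key also receives at least 900 of the 1000 records. That is the ordering versus balance trade-off the abstract describes.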

Kafka
Mattias Jonsson – Senior Database Engineer @ PingCAP

Introduction to TiDB - a distributed SQL database. We will go through its architecture and how it distributes data and provides high availability. Besides its row-based storage for efficient transaction handling, it also provides optional column storage for speeding up analytics queries.

tidb

Are you interested in learning more about open-source data technologies? Do you want to network with other like-minded people in a fun, relaxed environment?

Then come join us on Thursday, July, at the Aiven office in Berlin for an evening filled with great opportunities for networking and special technical talks. Read below for more details!

Before and after the talks, we’ll have food* and some time for socializing.

Note: The event starts at 6:00 PM CEST and runs until 9 PM CEST.

Program:

6.15 PM - Open Doors

6.30 PM - Welcome

6.40 PM - 7.00 PM - Food & refreshments

7.00 PM - Introduction to TiDB - a distributed SQL database

Introduction to TiDB - a distributed SQL database. We will go through its architecture and how it distributes data and provides high availability. Besides its row-based storage for efficient transaction handling, it also provides optional column storage for speeding up analytics queries.

About the speaker: Mattias Jonsson - Senior Database Engineer at PingCAP. He has worked with MySQL for more than 15 years. Prior to joining PingCAP, he worked for Booking.com, where he spent a significant amount of time on MySQL.

7.30 PM - Beginners guide to balance your data across Apache Kafka partitions

Apache Kafka is a distributed system. At the heart of Apache Kafka is a set of brokers that contain topics. Topics are split into partitions. Dividing topics into smaller pieces allows us to work with data in parallel and achieve higher data throughput.

Such parallelization is the key to a performant cluster, however it comes with a price. First, reading from multiple partitions will eventually mess up the order of records, meaning that the resulting order will be different from when the data was pushed into the cluster. Another big challenge is uneven distribution of data across partitions.

Overloaded partitions present a dangerous issue for performance of all involved parties, but especially for brokers and consumers. Therefore, when building our product architecture we should carefully weigh up how many partitions we need, how to ensure proper message ordering, how to balance records across partitions, not forgetting about data load distribution over time. And do all of this while still maintaining good performance of the cluster.

If you're fresh to Apache Kafka, or looking for good practices to design your partitions and avoid common pitfalls, you'll find this session useful!

About the speaker: Olena is a Sr. Developer Advocate at Aiven. With a background in software engineering, she has led teams and developed mission-critical applications at Nokia, HERE Technologies, and AWS. Currently, she works at Aiven, where she supports developers and customers in using open-source data technologies such as Apache Kafka, ClickHouse, and OpenSearch. She is also an international public speaker and regularly presents at conferences around the world. She holds AWS Developer and Solutions Architect certifications, and is also a Confluent Catalyst.

8 PM - 9 PM - More food & Socialising

*Please note that this is an alcohol-free event.

Catering: Pizza (vegan, vegetarian, and gluten-free options will be available), soft drinks & water.

Berlin Open Source Data Infrastructure Meetup - July 2023