
IMPORTANT: PLEASE RSVP @ https://luma.com/7lonmd1t?tk=4gVuhX

*** “One Does Not Simply Query a Stream” with Viktor Gamov. Viktor Gamov is a Principal Developer Advocate at Confluent, founded by the original creators of Apache Kafka®. With a rich background in implementing and advocating for distributed systems and cloud-native architectures, Viktor excels in open-source technologies. He is passionate about helping architects, developers, and operators craft systems that are not only low-latency and scalable but also highly available.

What to expect: Streaming data with Apache Kafka® has become the backbone of modern-day applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. Teams must therefore build additional infrastructure to query streaming data. Traditional approaches replicate the data into external stores: relational databases such as PostgreSQL for operational workloads, and object storage such as S3 queried with Flink, Spark, or Trino for analytical use cases. While sometimes useful, these methods deepen the divide between the operational and analytical estates, creating silos, complex ETL pipelines, and problems with schema mismatches, freshness, and failures.

In this session, we'll explore, with live demos, solutions that unify the operational and analytical estates and eliminate data silos. We'll start with stream processing using Kafka Streams, Apache Flink®, and their SQL implementations, then cover integrating relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we'll dive into modern approaches such as Apache Iceberg® with Tableflow, which simplifies data preparation by representing Kafka topics and their schemas as Iceberg or Delta tables in a few clicks. There is no single right answer to this problem; as responsible system builders, we must understand the options and trade-offs to build robust architectures.
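For a flavour of what querying a stream can look like in practice, here is a minimal Kafka Streams sketch (illustrative only, not material from the session): it materializes a topic into a state store and serves point lookups via Interactive Queries. The broker address, topic name, and key are placeholder assumptions.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

import java.util.Properties;

public class QueryableStream {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "orders-query-demo");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        StreamsBuilder builder = new StreamsBuilder();
        // Continuously fold the "orders" topic (a placeholder name) into a
        // queryable state store, counting records per key.
        builder.stream("orders", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey()
               .count(Materialized.as("orders-by-key"));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        // In production, wait until streams.state() == RUNNING before querying.

        // Interactive Queries: point lookups against the materialized store,
        // without copying the data into an external database.
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
                StoreQueryParameters.fromNameAndType(
                        "orders-by-key", QueryableStoreTypes.keyValueStore()));
        System.out.println("count for key-1 = " + store.get("key-1"));
    }
}
```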

VIRTUAL: One Does Not Simply Query a Stream

Join us for an Apache Kafka® Meetup on Monday, April 14th, starting at 6:00pm, hosted by Elaia!

Elaia is a full-stack tech and deep-tech investor. We partner with ambitious entrepreneurs from inception to leadership, helping them navigate the future and the unknown. For over twenty years, we have combined deep scientific and technological expertise with decades of operational experience to back those building tomorrow. From our offices in Paris, Barcelona, and Tel Aviv, we have been active partners with over 100 startups, including Criteo, Mirakl, Shift Technology, Aqemia, and Alice & Bob.

📍Venue: Elaia 21 Rue d'Uzès, 75002 Paris, France

IF YOU RSVP HERE, YOU DO NOT NEED TO RSVP @ Paris Apache Kafka® Meetup group.

🗓 Agenda:

  • 6:00pm: Doors Open/Welcome, Drinks
  • 6:15pm - 7:00pm: Roman Kolesnev, Principal Software Engineer, Streambased
  • 7:00pm - 7:45pm: Viktor Gamov, Principal Developer Advocate, Confluent
  • 7:45pm - 8:30pm: Food, Additional Q&A, Networking

💡 Speaker One: Roman Kolesnev, Principal Software Engineer, Streambased

Talk: Melting Icebergs: Enabling Analytical Access to Kafka Data through Iceberg Projections

Abstract: An organisation's data has traditionally been split between the operational estate, for daily business operations, and the analytical estate, for after-the-fact analysis and reporting. The journey from one side to the other is today a long and tortuous one. But does it have to be? In the modern data stack, Apache Kafka is the de facto standard operational platform, and Apache Iceberg has emerged as the champion of table formats to power analytical applications. Can we leverage the best of Iceberg and Kafka to create a powerful solution greater than the sum of its parts?

Yes, you can, and we did!

This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:

  • How we presented Kafka data for Iceberg processors without moving or transforming data upfront: no hidden ETL!
  • Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
  • Meeting Iceberg's performance and cost reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:

  • Kafka as the single source of truth: no separate stores.
  • Analytical processors shouldn't need Kafka-specific adjustments.
  • Operational performance must remain uncompromised.
  • Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants, and stay tuned for the surprise twist at the end!
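As a hedged sketch of the end state (not Streambased's actual implementation), once Kafka data is exposed as an Iceberg table, any Iceberg-aware engine can query it like an ordinary table. The example below uses Flink's Table API and assumes the iceberg-flink runtime on the classpath; the catalog type, URI, and table names are invented for illustration.

```java
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.TableEnvironment;

public class IcebergOverKafkaQuery {
    public static void main(String[] args) {
        // Batch-style analytical access from a standard engine (Flink SQL here).
        TableEnvironment tEnv = TableEnvironment.create(EnvironmentSettings.inBatchMode());

        // Register an Iceberg catalog that fronts the Kafka-backed projection;
        // catalog URI is a placeholder.
        tEnv.executeSql(
            "CREATE CATALOG lake WITH (" +
            "  'type' = 'iceberg'," +
            "  'catalog-type' = 'rest'," +
            "  'uri' = 'http://iceberg-catalog:8181'" +
            ")");

        // The analytical tool just sees an ordinary Iceberg table.
        tEnv.executeSql(
            "SELECT user_id, COUNT(*) AS events " +
            "FROM lake.db.clickstream " +
            "GROUP BY user_id")
            .print();
    }
}
```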

Bio: Roman is a Principal Software Engineer at Streambased. His experience includes building business critical event streaming applications and distributed systems in the financial and technology sectors.

💡 Speaker Two: Viktor Gamov, Principal Developer Advocate, Confluent

One Does Not Simply Query a Stream

Abstract: Streaming data with Apache Kafka® has become the backbone of modern-day applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. Teams must therefore build additional infrastructure to query streaming data. Traditional approaches replicate the data into external stores: relational databases such as PostgreSQL for operational workloads, and object storage such as S3 queried with Flink, Spark, or Trino for analytical use cases. While sometimes useful, these methods deepen the divide between the operational and analytical estates, creating silos, complex ETL pipelines, and problems with schema mismatches, freshness, and failures.

In this session, we'll explore, with live demos, solutions that unify the operational and analytical estates and eliminate data silos. We'll start with stream processing using Kafka Streams, Apache Flink®, and their SQL implementations, then cover integrating relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we'll dive into modern approaches such as Apache Iceberg® with Tableflow, which simplifies data preparation by representing Kafka topics and their schemas as Iceberg or Delta tables in a few clicks. There is no single right answer to this problem; as responsible system builders, we must understand the options and trade-offs to build robust architectures.

Bio: Viktor Gamov is a Principal Developer Advocate at Confluent, founded by the original creators of Apache Kafka®. With a rich background in implementing and advocating for distributed systems and cloud-native architectures, Viktor excels in open-source technologies. He is passionate about helping architects, developers, and operators craft systems that are not only low-latency and scalable but also highly available. As a Java Champion and an esteemed speaker, Viktor is known for his insightful presentations at top industry events like JavaOne, Devoxx, Kafka Summit, and QCon. His expertise spans distributed systems, real-time data streaming, the JVM, and DevOps.

Viktor has co-authored "Enterprise Web Development" from O'Reilly and "Apache Kafka® in Action" from Manning.

Follow Viktor on X (@gamussa) to stay updated with his latest thoughts on technology, his gym and food adventures, and insights into open source and developer advocacy.

*** DISCLAIMER: We cannot cater to those under the age of 18. If you would like to speak at or host a future meetup, please reach out to [email protected]

IN PERSON: Apache Kafka® x Apache Iceberg™ Meetup

🌟 Session Overview 🌟

Session Name: Sentiment Analysis in Action: Building Your Real-time Pipeline
Speaker: Olena Kutsenko
Session Description: Monitoring and interpreting the sentiment of data records is important for a variety of use cases. However, traditional human-based methods fall short in handling huge volumes of information with the required speed and efficiency. AI can address this challenge.

AI is only part of the solution. We need to build a data pipeline that ingests data from various channels, processes it using AI-driven sentiment analysis models to classify the sentiment of each individual record, and prepares it to be consumed by applications for aggregation and analysis.

In this session, we'll build a system using open-source technologies Apache Kafka and Apache Flink with AI models to obtain real-time sentiment from social media data. Apache Kafka's scalability ensures that no record is left behind, making it a reliable foundation for sentiment analysis. Apache Flink, with its adaptability to fluctuations in data volume and velocity, will enable the analysis of a continuous data stream using an AI model.
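As a rough sketch of the shape such a pipeline can take (not the speaker's actual code), the following Flink job reads posts from a Kafka topic and tags each record with a sentiment label. The broker address and topic name are placeholders, and the trivial keyword scorer stands in for a real AI model.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SentimentPipeline {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Social-media posts arrive on a Kafka topic (names are placeholders).
        KafkaSource<String> posts = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("social-posts")
            .setGroupId("sentiment-demo")
            .setStartingOffsets(OffsetsInitializer.latest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        env.fromSource(posts, WatermarkStrategy.noWatermarks(), "posts")
           // scoreSentiment() stands in for a real model call (e.g. an ONNX
           // model loaded once per task, or a remote inference endpoint).
           .map(text -> text + "\t" + scoreSentiment(text))
           .print();

        env.execute("sentiment-analysis");
    }

    // Placeholder: a trivial keyword heuristic instead of a real AI model.
    static String scoreSentiment(String text) {
        return text.toLowerCase().contains("love") ? "POSITIVE" : "NEUTRAL";
    }
}
```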

🚀 About Big Data and RPA 2024 🚀

Unlock the future of innovation and automation at Big Data & RPA Conference Europe 2024! 🌟 This unique event brings together the brightest minds in big data, machine learning, AI, and robotic process automation to explore cutting-edge solutions and trends shaping the tech landscape. Perfect for data engineers, analysts, RPA developers, and business leaders, the conference offers dual insights into the power of data-driven strategies and intelligent automation. 🚀 Gain practical knowledge on topics like hyperautomation, AI integration, advanced analytics, and workflow optimization while networking with global experts. Don’t miss this exclusive opportunity to expand your expertise and revolutionize your processes—all from the comfort of your home! 📊🤖✨

📅 Yearly Conferences: Curious about the evolution of QA? Check out our archive of past Big Data & RPA sessions. Watch the strategies and technologies evolve in our videos! 🚀

🔗 Find Other Years' Videos:
  • 2023 Big Data Conference Europe: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEpb_oyAsg67PhpbrkCC59_g
  • 2022 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEryAOjmvdiaXTfjCg5j3HhT
  • 2021 Big Data Conference Europe Online: https://www.youtube.com/playlist?list=PLqYhGsQ9iSEqHwbQoWEXEJALFLKVDRXiP

💡 Stay Connected & Updated 💡

Don’t miss out on any updates or upcoming event information from Big Data & RPA Conference Europe. Follow us on our social media channels and visit our website to stay in the loop!

🌐 Website: https://bigdataconference.eu/, https://rpaconference.eu/
👤 Facebook: https://www.facebook.com/bigdataconf, https://www.facebook.com/rpaeurope/
🐦 Twitter: @BigDataConfEU, @europe_rpa
🔗 LinkedIn: https://www.linkedin.com/company/73234449/admin/dashboard/, https://www.linkedin.com/company/75464753/admin/dashboard/
🎥 YouTube: http://www.youtube.com/@DATAMINERLT

AI/ML Analytics Flink Big Data Dashboard Kafka
DATA MINER Big Data Europe Conference 2020
Intro to Apache Kafka 2024-10-22 · 19:00
David Anderson – Software Practice Lead @ Confluent

Dive into the world of real-time data streaming with this introduction to Apache Kafka. This talk is tailored for developers, data engineers, and IT professionals who want to gain a foundational understanding of Kafka, a powerful open-source platform used for building scalable, event-driven applications. You will learn about:

  • Kafka fundamentals: the core concepts of Kafka, including topics, partitions, producers, and consumers (see the producer sketch after this list)
  • The Kafka ecosystem: brokers, clients, Schema Registry, and Kafka Connect
  • Stream processing: Kafka Streams and Apache Flink
  • Use cases: discover how data streaming with Kafka has transformed various industries
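To make the producer concept concrete before the talk, here is a minimal sketch (not part of the session materials); the broker address and topic name are placeholders.

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class HelloKafka {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key land in the same partition,
            // preserving per-key ordering for consumers.
            producer.send(new ProducerRecord<>("greetings", "user-42", "hello, kafka"));
        }
    }
}
```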

Kafka kafka streams flink schema registry kafka connect
IN-PERSON: Apache Kafka® Meetup Berlin - October 2024

Price - Free

Thank you to Accenture for sponsoring and hosting this event.

Join us for an in-person User Group Meeting (LDPaC), where you can network, learn, ask questions, and meet other like-minded folks. These events are a great opportunity to socialise in an informal learning setting.

Remember to tell your friends and the people you work with, and make sure you register as soon as you can. We will need to provide a list of names to Accenture before the event, so to ensure there are no issues with access on the day, please make sure you have registered.

17:45 - 18:00 Networking 🤝

18:00 - 18:30 Drinks & Pizza 🍕

18:30 - 18:40 Intro 🎙️

18:40 - 19:30 Transforming SQL Authentication: Real-World Scenarios with Azure Managed Identity - Dieter Gobeyn 🔊

Discover how to enhance the security of your SQL databases by transitioning from traditional password-based authentication to a modern, passwordless approach using Azure Managed Identity. In this session, I will explore real-world scenarios that demonstrate how to move from connection strings with embedded credentials to Role-Based Access Control (RBAC) with Managed Identity for Azure applications, highlighting the benefits of eliminating passwords and simplifying access management.
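As a minimal sketch of the destination state (not the speaker's code), here is what a passwordless connection can look like with the Microsoft JDBC driver, assuming mssql-jdbc plus the azure-identity dependency on the classpath; the server and database names are placeholders.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class PasswordlessSqlDemo {
    public static void main(String[] args) throws Exception {
        // No password in the URL: the driver obtains a token from the
        // Azure Managed Identity of the host (VM, App Service, etc.).
        String url = "jdbc:sqlserver://myserver.database.windows.net:1433;"
                   + "database=mydb;"
                   + "authentication=ActiveDirectoryManagedIdentity;"
                   + "encrypt=true;";

        try (Connection conn = DriverManager.getConnection(url);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT SYSTEM_USER")) {
            while (rs.next()) {
                System.out.println("connected as: " + rs.getString(1));
            }
        }
    }
}
```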

19:30 - 19:40 10-min Break 🥤

19:40 - 20:30 Flink-Kafka Fusion: Patterns for Building Scalable Stream Processing Applications - Dunith Dhanushka 🔊

When properly configured, Apache Kafka and Apache Flink provide a solid foundation for building scalable and reliable stream processing applications. However, most data professionals find it difficult to get this combination right the first time, leading to project delays and costly mistakes. Consequently, having a set of patterns for stream processing applications is essential, just as it is for other software architectures.

This talk will walk you through more than ten stream processing patterns using Kafka and Flink. First, we discuss computational patterns covering stateless and stateful operations. Then, we look at several state management patterns for state recovery and maintaining accuracy. Finally, we discuss several non-functional patterns, especially those related to deployment architecture, which help us achieve high availability, security, and cost reduction for streaming applications. For each pattern, the business motivation, sample code in Flink, and related patterns will be provided, making the essence easy to grasp (one such pattern is sketched below).
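To give a flavour of one such stateful pattern (an illustrative sketch assuming Flink 1.x, not the speaker's sample code), here is per-key deduplication using keyed ValueState, which Flink checkpoints and restores after failures:

```java
import org.apache.flink.api.common.state.ValueState;
import org.apache.flink.api.common.state.ValueStateDescriptor;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.KeyedProcessFunction;
import org.apache.flink.util.Collector;

public class DedupPattern {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Inline events stand in for a Kafka source; duplicates are common
        // with at-least-once delivery.
        env.fromElements("order-1", "order-2", "order-1")
           .keyBy(id -> id)
           .process(new Dedup())
           .print(); // emits order-1 and order-2 exactly once each

        env.execute("dedup-pattern");
    }

    // Stateful pattern: remember, per key, whether the key was seen before.
    static class Dedup extends KeyedProcessFunction<String, String, String> {
        private transient ValueState<Boolean> seen;

        @Override
        public void open(Configuration parameters) {
            seen = getRuntimeContext().getState(
                    new ValueStateDescriptor<>("seen", Boolean.class));
        }

        @Override
        public void processElement(String value, Context ctx, Collector<String> out)
                throws Exception {
            if (seen.value() == null) { // first occurrence of this key
                seen.update(true);
                out.collect(value);
            }
        }
    }
}
```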

This talk can benefit data practitioners of all levels, including application developers, data engineers, and architects, by providing them with blueprints for stream processing. Knowing these patterns will save time and avoid making the same mistakes that others have made in the past.

Come and join the Leeds Data Community and start learning and networking! All are welcome!

LDPaC In-Person Meetup 12th Sep: Transforming SQL Auth & Flink-Kafka Fusion

Abstract: Join us for a comprehensive discussion on stream processing with Redpanda and Apache Flink. This talk is designed for those who have some working knowledge of Kafka/Redpanda and are looking to learn more about stream processing.

The first part of our talk will focus on introducing stream processing with Flink. We begin by discussing stateless operations in Flink. Then we dive into how Flink handles state and time, covering a range of topics including stateful aggregations, window processing, and joins. Furthermore, we will explore advanced topics such as fault tolerance, state checkpointing, and event-time processing, shedding light on their importance in stream processing.

The second half of the talk transitions from theory to practice and will be more hands-on. We will guide you through the process of building a Flink application in Java. This segment is designed to provide a practical understanding of how to write, build, and deploy an application to a Flink cluster. We will demonstrate how the application is set up to ingest data from a Kafka topic, deploy it to a Flink cluster, and monitor how it operates under various conditions.

This talk is an excellent opportunity to enhance your understanding of these powerful tools. It will equip you with the knowledge and skills needed to build robust stream processing applications with Redpanda and Apache Flink.
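For readers who want a head start on the hands-on half, here is a hedged skeleton of such a job (assuming Flink 1.19+ for the Duration-based window API; the inline test data stands in for the Kafka/Redpanda source covered in the talk), showing checkpointing and event-time processing:

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.assigners.TumblingEventTimeWindows;

import java.time.Duration;

public class EventTimeSkeleton {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Fault tolerance: checkpoint state every 30s so the job can
        // recover its state after a failure.
        env.enableCheckpointing(30_000);

        // Inline (key, epoch-millis) events stand in for a Redpanda/Kafka source.
        env.fromElements(Tuple2.of("sensor-1", 1_000L), Tuple2.of("sensor-1", 2_500L))
           // Event time: use the embedded timestamp, tolerating 5s of disorder.
           .assignTimestampsAndWatermarks(
               WatermarkStrategy.<Tuple2<String, Long>>forBoundedOutOfOrderness(
                       Duration.ofSeconds(5))
                   .withTimestampAssigner((event, ts) -> event.f1))
           .keyBy(event -> event.f0)
           .window(TumblingEventTimeWindows.of(Duration.ofSeconds(1)))
           .sum(1)
           .print();

        env.execute("event-time-skeleton");
    }
}
```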

My short biography: Dunith avidly enjoys designing, building, and operating large-scale real-time, event-driven architectures. He's got 10+ years of doing so and loves to share his learnings through blogging, videos, and public speaking. Dunith works at Redpanda as a Senior Developer Advocate, where he spends much time educating developers about building event-driven applications with Redpanda.

Stream processing with Redpanda and Apache Flink

RSVP: https://www.aicamp.ai/event/eventdetails/W2024032110

Description: Join us for the Data Night in Paris. This time we are joining forces with our friends from RisingWave Labs, PingCAP, and Aiven to bring you an exciting event focused on generative AI, data streaming, and vector databases.

Agenda:

  • 6:00pm - 6:30pm: Check-in, food, and networking
  • 6:30pm - 8:30pm: Tech talks and Q&A
  • 8:30pm - 9:00pm: Open discussion, mixer

Tech Talk: Unleashing the Power of SQL for Stream Processing
Speaker: Rayees Pasha (RisingWave Labs)
Abstract: In this talk, I will present: 1) an overview of the differences between stream processing engines and streaming databases; 2) an introduction to RisingWave, a distributed SQL streaming database designed for the cloud; 3) common use cases where SQL is an excellent complement to stream processing.

Tech Talk: Powering Stream Processing with Generative AI: A Fusion of Flink, Kafka, and Langchain4j
Speaker: Sebastien Blanc (Aiven)
Abstract: This talk dives into a dynamic approach to stream processing, blending the power of Apache Flink, Kafka, and Langchain4j with generative AI. By integrating these tools, users can not only process real-time data efficiently but also generate synthetic data for various purposes. The fusion of these technologies opens doors to innovative stream processing solutions with broad applications across different fields.

Tech Talk: Tackle Stream Processing and GenAI
Speaker: Daniel Valdivia (MinIO)
Abstract: Traditional AI/ML pipelines that rely on data being available locally before crunching begins on the accelerator are simply not fit for the scale of data modern AI requires. In this talk, we'll cover the key advantages of streaming data from object storage straight into memory and the hardware accelerator, skipping the filesystem altogether, and how that enables modern AI models like LLMs and diffusion models.

Topics/Speakers: Stay tuned as we update speakers and schedules. If you have a keen interest in speaking to our community, we invite you to submit topics for consideration: Submit Topics

Sponsors: We are actively seeking sponsors to support the AI developer community, whether by offering venue space, providing food, or contributing cash sponsorship. Sponsors will not only speak at the meetups and receive prominent recognition, but also gain exposure to our extensive membership base of 10,000+ AI developers in Paris and 300K+ worldwide.

Community on Slack/Discord

  • Event chat: chat and connect with speakers and attendees
  • Sharing blogs, events, job openings, projects collaborations
  • Join Slack/Discord (link is at the bottom of the page)
Data Night with GenAI, Data Stream and Vector DB

Eric Sammer – Founder @ Decodable, Tobias Macey – host

Summary

Building streaming applications has gotten substantially easier over the past several years. Despite this, it is still operationally challenging to deploy and maintain your own stream processing infrastructure. Decodable was built with a mission of eliminating all of the painful aspects of developing and deploying stream processing systems for engineering teams. In this episode Eric Sammer discusses why more companies are including real-time capabilities in their products and the ways that Decodable makes it faster and easier.

Announcements

Hello and welcome to the Data Engineering Podcast, the show about modern data management.

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare production and development environments and column-level lineage to show you the exact impact of every code change on data, metrics, and BI tools, keeping your team productive and stakeholders happy. Datafold integrates with dbt, the modern data stack, and seamlessly plugs into your data CI for team-wide and automated testing. If you are migrating to a modern data stack, Datafold can also help you automate data and code validation to speed up the migration. Learn more about Datafold by visiting dataengineeringpodcast.com/datafold

You shouldn't have to throw away the database to build with fast-changing data. You should be able to keep the familiarity of SQL and the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date. With Materialize, you can! It's the only true SQL streaming database built from the ground up to meet the needs of modern data products. Whether it's real-time dashboarding and analytics, personalization and segmentation, or automation and alerting, Materialize gives you the ability to work with fresh, correct, and scalable results — all in a familiar SQL interface. Go to dataengineeringpodcast.com/materialize today to get 2 weeks free!

As more people start using AI for projects, two things are clear: it's a rapidly advancing field, but it's tough to navigate. How can you get the best results for your use case? Instead of being subjected to a bunch of buzzword bingo, hear directly from pioneers in the developer and data science space on how they use graph tech to build AI-powered apps. Attend the dev and ML talks at NODES 2023, a free online conference on October 26 featuring some of the brightest minds in tech. Check out the agenda and register today at Neo4j.com/NODES.

Your host is Tobias Macey and today I'm interviewing Eric Sammer about starting your stream processing journey with Decodable.

Interview

  • Introduction
  • How did you get involved in the area of data management?
  • Can you describe what Decodable is and the story behind it?
  • What are the notable changes to the Decodable platform since we last spoke? (October 2021)
  • What are the industry shifts that have influenced the product direction?
  • What are the problems that customers are trying to solve when they come to Decodable?
  • When you launched, your focus was on SQL transformations of streaming data. What was the process for adding full Java support in addition to SQL?
  • What are the developer experience challenges that are particular to working with streaming data?
  • How have you worked to address that in the Decodable platform and interfaces?
  • As you evolve the technical and product direction, what is your heuristic for balancing the unification of interfaces and system integration against the ability to swap different components or interfaces as new technologies are introduced?
  • What are the most interesting, innovative, or unexpected ways that you have seen Decodable used?
  • What are the most interesting, unexpected, or challenging lessons that you have learned while working on Decodable?
  • When is Decodable the wrong choice?
  • What do you have planned for the future of Decodable?

Contact Info

  • esammer on GitHub
  • LinkedIn

Parting Question

From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you've learned something or tried out a project from the show, then tell us about it! Email [email protected] with your story. To help other people find the show, please leave a review on Apple Podcasts and tell your friends and co-workers.

Links

  • Decodable (Podcast Episode)
  • Understanding the Apache Flink Journey
  • Flink (Podcast Episode)
  • Debezium (Podcast Episode)
  • Kafka
  • Redpanda (Podcast Episode)
  • Kinesis
  • PostgreSQL (Podcast Episode)
  • Snowflake (Podcast Episode)
  • Databricks
  • Startree
  • Pinot (Podcast Episode)
  • Rockset (Podcast Episode)
  • Druid
  • InfluxDB
  • Samza
  • Storm
  • Pulsar (Podcast Episode)
  • ksqlDB (Podcast Episode)
  • dbt
  • GitHub Actions
  • Airbyte
  • Singer
  • Splunk
  • Outbox Pattern

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA

Sponsored By:

Neo4j: NODES Conference

NODES 2023 is a free online conference focused on graph-driven innovations with content for all skill levels. Its 24 hours are packed with 90 interactive technical sessions from top developers and data scientists across the world covering a broad range of topics and use cases. The event tracks:

  • Intelligent Applications: APIs, Libraries, and Frameworks – tools and best practices for creating graph-powered applications and APIs with any software stack and programming language, including Java, Python, and JavaScript
  • Machine Learning and AI – how graph technology provides context for your data and enhances the accuracy of your AI and ML projects (e.g., graph neural networks, responsible AI)
  • Visualization: Tools, Techniques, and Best Practices – techniques and tools for exploring hidden and unknown patterns in your data and presenting complex relationships (knowledge graphs, ethical data practices, and data representation)

Don’t miss your chance to hear about the latest graph-powered implementations and best practices for free on October 26 at NODES 2023. Go to Neo4j.com/NODES today to see the full agenda and register!

RudderStack:

Introducing RudderStack Profiles. RudderStack Profiles takes the SaaS guesswork and SQL grunt work out of building complete customer profiles so you can quickly ship actionable, enriched data to every downstream team. You specify the customer traits, then Profiles runs the joins and computations for you to create complete customer profiles. Get all of the details and try the new product today at dataengineeringpodcast.com/rudderstack

Materialize:

You shouldn't have to throw away the database to build with fast-changing data. Keep the familiar SQL, keep the proven architecture of cloud warehouses, but swap the decades-old batch computation model for an efficient incremental engine to get complex queries that are always up-to-date.

That is Materialize, the only true SQL streaming database built from the ground up to meet the needs of modern data products: Fresh, Correct, Scalable — all in a familiar SQL UI. Built on Timely Dataflow and Differential Dataflow, open source frameworks created by cofounder Frank McSherry at Microsoft Research, Materialize is trusted by data and engineering teams at Ramp, Pluralsight, Onward and more to build real-time data products without the cost, complexity, and development time of stream processing.

Go to materialize.com today and get 2 weeks free!

Datafold:

This episode is brought to you by Datafold – a testing automation platform for data engineers that finds data quality issues before the code and data are deployed to production. Datafold leverages data-diffing to compare…

AI/ML Airbyte Analytics Flink API Kinesis BI CI/CD Cloud Computing Data Engineering Data Management Data Quality Data Science Databricks Dataflow Datafold dbt Druid GitHub Java JavaScript Kafka Modern Data Stack Microsoft Neo4j postgresql Python Redpanda SaaS Singer Snowflake Splunk SQL Data Streaming
Data Engineering Podcast
Jan Svoboda – Senior Solutions Engineer @ Confluent

In this session, we will explore the stream processing capabilities for Kafka and compare the three popular options: Kafka Streams, ksqlDB, and Apache Flink®. We will dive into the strengths and limitations of each technology, and compare them based on their ease of use, performance, scalability, and flexibility. By the end of the session, attendees will have a better understanding of the different options available for stream processing with Kafka, and which technology might be the best fit for their specific use case. This session is ideal for developers, data engineers, and architects who want to leverage the power of Kafka for real-time data processing.
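As an illustration of the trade-off being compared (invented topic and column names, not session material): the same aggregation can live inside a Java service with Kafka Streams, or run server-side as a continuous query in ksqlDB or Flink SQL.

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Grouped;

public class PageviewCounts {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        // Kafka Streams flavour: the aggregation runs inside your Java service.
        builder.stream("pageviews", Consumed.with(Serdes.String(), Serdes.String()))
               .groupByKey(Grouped.with(Serdes.String(), Serdes.String()))
               .count();

        // Print the topology instead of running it, to keep the sketch self-contained.
        System.out.println(builder.build().describe());

        // A rough ksqlDB equivalent runs server-side as a continuous query:
        //   CREATE TABLE pageview_counts AS
        //     SELECT page, COUNT(*) FROM pageviews_stream GROUP BY page EMIT CHANGES;
        // Flink SQL expresses the same aggregation, executed on a Flink cluster.
    }
}
```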

Bio:

Before Jan Svoboda started his Apache Kafka journey at Confluent, he worked as an Advisory Platform Architect at Pivotal and DevOps Solutions Architect at IBM, among others. Jan joined Confluent in April 2020 as a Solutions Engineer, establishing microservices development as his favourite topic. Jan holds degrees in Management of Information Systems from UNYP and Computer Science from UCF.

kafka streams ksqldb flink Kafka
Apache Flink® for Apache Kafka® Developers
David Moravek – Staff Software Engineer II @ Confluent

In this session, David will demystify the misconceptions around the complexity of Apache Flink, touch on its use cases, and get you up to speed for your stream processing endeavor. All of that, in real time.

flink
Apache Flink® for Apache Kafka® Developers