talk-data.com
Activities & events
IN PERSON: Apache Kafka® x Apache Iceberg™ Meetup
2025-04-14 · 16:00
Join us for an Apache Kafka® Meetup on Monday, April 14th from 6:00pm, hosted by Elaia!

Elaia is a full stack tech and deep tech investor. We partner with ambitious entrepreneurs from inception to leadership, helping them navigate the future and the unknown. For over twenty years, we have combined deep scientific and technological expertise with decades of operational experience to back those building tomorrow. From our offices in Paris, Barcelona and Tel Aviv, we have been active partners with over 100 startups including Criteo, Mirakl, Shift Technology, Aqemia and Alice & Bob.

📍 Venue: Elaia, 21 Rue d'Uzès, 75002 Paris, France

IF YOU RSVP HERE, YOU DO NOT NEED TO RSVP @ Paris Apache Kafka® Meetup group.

🗓 Agenda:
💡 Speaker One: Roman Kolesnev, Principal Software Engineer, Streambased

Talk: Melting Icebergs: Enabling Analytical Access to Kafka Data through Iceberg Projections

Abstract: An organisation's data has traditionally been split between the operational estate, for daily business operations, and the analytical estate, for after-the-fact analysis and reporting. The journey from one side to the other is today a long and tortuous one. But does it have to be? In the modern data stack, Apache Kafka is your de facto standard operational platform and Apache Iceberg has emerged as the champion of table formats to power analytical applications. Can we leverage the best of Iceberg and Kafka to create a powerful solution greater than the sum of its parts? Yes you can, and we did! This isn't a typical story of connectors, ELT, and separate data stores. We've developed an advanced projection of Kafka data in an Iceberg-compatible format, allowing direct access from warehouses and analytical tools.

In this talk, we'll cover:
* How we presented Kafka data to Iceberg processors without moving or transforming data upfront—no hidden ETL!
* Integrating Kafka's ecosystem into Iceberg, leveraging Schema Registry, consumer groups, and more.
* Meeting Iceberg's performance and cost reduction expectations while sourcing data directly from Kafka.

Expect a technical deep dive into the protocols, formats, and services we used, all while staying true to our core principles:
* Kafka as the single source of truth—no separate stores.
* Analytical processors shouldn't need Kafka-specific adjustments.
* Operational performance must remain uncompromised.
* Kafka's mature ecosystem features, like ACLs and quotas, should be reused, not reinvented.

Join us for a thrilling account of the highs and lows of merging two data giants, and stay tuned for the surprise twist at the end!

Bio: Roman is a Principal Software Engineer at Streambased. His experience includes building business-critical event streaming applications and distributed systems in the financial and technology sectors.

💡 Speaker Two: Viktor Gamov, Principal Developer Advocate, Confluent

Talk: One Does Not Simply Query a Stream

Abstract: Streaming data with Apache Kafka® has become the backbone of modern-day applications. While streams are ideal for continuous data flow, they lack built-in querying capability. Unlike databases with indexed lookups, Kafka's append-only logs are designed for high-throughput processing, not for on-demand querying. This forces teams to build additional infrastructure to enable query capabilities for streaming data. Traditional methods replicate this data into external stores: relational databases like PostgreSQL for operational workloads, and object storage like S3 with Flink, Spark, or Trino for analytical use cases. While sometimes useful, these methods deepen the divide between the operational and analytical estates, creating silos, complex ETL pipelines, and issues with schema mismatches, freshness, and failures.

In this session, we'll explore and see live demos of some solutions to unify the operational and analytical estates, eliminating data silos. We'll start with stream processing using Kafka Streams, Apache Flink®, and SQL implementations, then cover integration of relational databases with real-time analytics databases such as Apache Pinot® and ClickHouse. Finally, we'll dive into modern approaches like Apache Iceberg™ with Tableflow, which simplifies data preparation by seamlessly representing Kafka topics and associated schemas as Iceberg or Delta tables in a few clicks. While there's no single right answer to this problem, as responsible system builders, we must understand our options and trade-offs to build robust architectures.

Bio: Viktor Gamov is a Principal Developer Advocate at Confluent, founded by the original creators of Apache Kafka®. With a rich background in implementing and advocating for distributed systems and cloud-native architectures, Viktor excels in open-source technologies. He is passionate about assisting architects, developers, and operators in crafting systems that are not only low in latency and scalable but also highly available. As a Java Champion and an esteemed speaker, Viktor is known for his insightful presentations at top industry events like JavaOne, Devoxx, Kafka Summit, and QCon. His expertise spans distributed systems, real-time data streaming, the JVM, and DevOps. Viktor has co-authored "Enterprise Web Development" from O'Reilly and "Apache Kafka® in Action" from Manning. Follow Viktor on X - @gamussa to stay updated with his latest thoughts on technology, his gym and food adventures, and insights into open source and developer advocacy.

*** DISCLAIMER: We cannot cater to those under the age of 18. If you would like to speak at / host a future meetup, please reach out to [email protected]
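Both talks circle the same problem: an append-only log supports high-throughput writes but no on-demand lookups, so queries are answered by continuous aggregation instead. As a minimal, library-free sketch of that idea (illustrative only, not taken from either talk), a tumbling-window count over a stream of timestamped events looks like this:

```python
from collections import defaultdict

def tumbling_counts(events, window_seconds):
    """Count events per fixed-size (tumbling) time window.

    events: iterable of (epoch_seconds, key) pairs.
    Returns {window_start: {key: count}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for ts, key in events:
        # Align each event's timestamp down to its window boundary.
        window_start = ts - (ts % window_seconds)
        counts[window_start][key] += 1
    return {w: dict(k) for w, k in counts.items()}

events = [(0, "click"), (5, "click"), (61, "view"), (65, "click")]
print(tumbling_counts(events, 60))
# {0: {'click': 2}, 60: {'view': 1, 'click': 1}}
```

Engines like Kafka Streams and Flink maintain exactly this kind of windowed state incrementally, as new records arrive, rather than over a finished collection.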
ClickHouse Delhi/Gurgaon Meetup - March 2025
2025-03-22 · 05:00
We are excited to finally have the first ClickHouse Meetup in the vibrant city of Delhi! Join the ClickHouse crew, from Singapore and from different cities in India, for an engaging day of talks, food, and discussion with your fellow database enthusiasts. But here's the deal: to secure your spot, make sure you register ASAP! 🗓️ Agenda:
If anyone from the community is interested in sharing a talk at future meetups, complete this CFP form and we'll be in touch.

_______

🎤 Session Details: Introduction to ClickHouse

Discover the secrets behind ClickHouse's unparalleled efficiency and performance. Rakesh will give an overview of the different use cases for which global companies are adopting this groundbreaking database to transform data storage and analytics.

Speaker: Rakesh Puttaswamy, Solution Architect @ ClickHouse

Rakesh Puttaswamy is a Solution Architect with ClickHouse, working with users across India, with over 12 years of experience in data architecture, big data, data science, and software engineering. Rakesh helps organizations design and implement cutting-edge data-driven solutions. With deep expertise in a broad range of databases and data warehousing technologies, he specializes in building scalable, innovative solutions to enable data transformation and drive business success.

🎤 Session Details: ClickPipes Overview and Demo

ClickPipes is a powerful integration engine that simplifies data ingestion at scale, making it as easy as a few clicks. With an intuitive onboarding process, setting up new ingestion pipelines takes just a few steps—select your data source, define the schema, and let ClickPipes handle the rest. Designed for continuous ingest, it automates pipeline management, ensuring seamless data flow without manual intervention. In this talk, Kunal will demo the Postgres CDC connector for ClickPipes, enabling seamless, native replication of Postgres data to ClickHouse Cloud in just a few clicks—no external tools needed for fast, cost-effective analytics.

Speaker: Kunal Gupta, Sr. Software Engineer @ ClickHouse

Kunal Gupta is a Senior Software Engineer at ClickHouse, joining through the acquisition of PeerDB in 2024, where he played a pivotal role as a founding engineer.
With several years of experience in architecting scalable systems and real-time applications, Kunal has consistently driven innovation and technical excellence. Previously, he was a founding engineer for new solutions at ICICIdirect and at AsknBid Tech, leading high-impact teams and advancing code analysis, storage solutions, and enterprise software development.

🎤 Session Details: Optimizing Log Management with ClickHouse: Cost-Effective & Scalable Solutions

Efficient log management is essential in today's cloud-native environments, yet traditional solutions like Elasticsearch often face scalability issues, high costs, and performance limitations. This talk will begin with an overview of common logging tools and their challenges, followed by an in-depth look at ClickHouse's architecture. We will compare ClickHouse with Elasticsearch, focusing on improvements in query performance, storage efficiency, and overall cost-effectiveness. A key highlight will be OLX India's migration to ClickHouse, detailing the motivations behind the shift, the migration strategy, key optimizations, and the resulting 50% reduction in log storage costs. By the end of this talk, attendees will gain a clear understanding of when and how to leverage ClickHouse for log management, along with best practices for optimizing performance and reducing operational costs.

Speaker: Pushpender Kumar, DevOps Architect @ OLX India

Born and raised in Bijnor, Pushpender moved to Delhi to stay ahead in the race of life. He currently works as a DevOps Architect at OLX India, specializing in cloud infrastructure, Kubernetes, and automation, with over 10 years of experience. He successfully optimized log storage costs by 50% using ClickHouse, bringing scalability and efficiency to large-scale logging systems. He is passionate about cloud optimization, DevOps hiring, and performance engineering.
🎤 Session Details: ClickHouse at Physics Wallah: Empowering Real-Time Analytics at Scale

This session explores how Physics Wallah revolutionized its real-time analytics capabilities by leveraging ClickHouse. We'll delve into the journey of implementing ClickHouse to efficiently handle large-scale data processing, optimize query performance, and power diverse use cases such as user activity tracking and engagement analysis. By enabling actionable insights and seamless decision-making, this transformation has significantly enhanced the learning experience for millions of users. Today, more than five customer-facing products at Physics Wallah are powered by ClickHouse, serving over 10 million students and parents, including 1.5 million Daily Active Users. Our in-house ClickHouse cluster, hosted and managed within our EKS infrastructure on AWS Cloud, ingests more than 10 million rows of data daily from various sources. Join us to learn about the architecture, challenges, and key strategies behind this scalable, high-performance analytics solution.

Speaker: Utkarsh G. Srivastava, Software Development Engineer III @ Physics Wallah

As a versatile Software Engineer with over 7 years of experience in the IT industry, I have had the privilege of taking on diverse roles, with a primary focus on backend development, data engineering, infrastructure, DevOps, and security. Throughout my career, I have played a pivotal role in transformative projects, consistently striving to craft innovative and effective solutions for customers in the SaaS space.

🎤 Session Details: FabFunnel & ClickHouse: Delivering Real-Time Marketing Analytics

We are a performance marketing company that relies on real-time reporting to drive data-driven decisions and maximize campaign effectiveness.
As our client base expanded, we encountered significant challenges with our reporting system—frequent data updates meant handling large datasets inefficiently, leading to slow query execution and delays in delivering insights. This bottleneck hindered our ability to provide timely optimizations for ad campaigns. To address these issues, we needed a solution that could handle rapid data ingestion and querying at scale without the overhead of traditional refresh processes. In this talk, we'll share how we transformed our reporting infrastructure to achieve real-time insights, enhancing speed, scalability, and efficiency in managing large-scale ad performance data.

Speakers: Anmol Jain, SDE-2 (Full Stack Developer), & Siddhant Gaba, SDE-2 (Python) @ Idea Clan

From competing as a national table tennis player to building high-performance software, Anmol Jain brings a unique mix of strategy and problem-solving to tech. With 3+ years of experience at Idea Clan, they play a key role in scaling Lookfinity and FabFunnel, managing multi-million-dollar ad spends every month. Specializing in ClickHouse, React.js, and Node.js, Anmol focuses on real-time data processing and scalable backend solutions. At this meetup, they'll share insights on solving reporting challenges and driving real-time decision-making in performance marketing.

Siddhant Gaba is an SDE II at Idea Clan, with expertise in Python, Java, and C#, specializing in scalable backend systems. With four years of experience working with FastAPI, PostgreSQL, MongoDB, and ClickHouse, he focuses on real-time analytics, database optimization, and distributed systems. Passionate about high-performance computing, asynchronous APIs, and system design, he aims to advance real-time data processing. Outside of work, he enjoys playing volleyball. At this meetup, he will share insights on how ClickHouse transformed real-time reporting and scalability.
🎤 Session Details: From SQL to AI: Building Intelligent Applications with ClickHouse and LangDB

As AI becomes a driving force behind innovation, building applications that seamlessly integrate AI capabilities with existing data infrastructures is critical. In this session, we explore the creation of agentic applications using ClickHouse and LangDB. We will introduce the concept of an AI gateway, explaining its role in connecting powerful AI models with the high-performance analytics engine of ClickHouse. By leveraging LangDB, we demonstrate how to interact directly with AI functions as User-Defined Functions (UDFs) in ClickHouse, enabling developers to design and execute complex AI workflows within SQL. Additionally, we will showcase how LangDB facilitates deep visibility into AI function behaviors and agent interactions, providing tools to analyze and optimize the performance of AI-driven logic. Finally, we will highlight how ClickHouse, powered by LangDB APIs, can be used to evaluate and refine the quality of LLM responses, ensuring reliable and efficient AI integrations.

Speaker: Matteo Pelati, Co-founder, LangDB.ai

Matteo Pelati is a seasoned software engineer with over two decades of experience, specializing in data engineering for the past ten years. He is the co-founder of LangDB, a company based in Singapore building the fastest open-source AI gateway. Before founding LangDB, he was part of the early team at DataRobot, where he contributed to scaling their product for enterprise clients. Subsequently, he joined DBS Bank, where he built their data platform and team from the ground up. Prior to starting LangDB, Matteo led the data group for Asia Pacific and data engineering at Goldman Sachs.
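A recurring pattern across these sessions is bucketing high-volume event data by time and aggregating it on the fly. As an illustrative sketch (not from any of the talks; the table and column names below are hypothetical), a small Python helper can assemble such a rollup query using `toStartOfMinute`, a real ClickHouse date function:

```python
def per_minute_rollup(table, value_col, where=None):
    """Build a ClickHouse query that buckets rows into one-minute windows.

    table, value_col and where are caller-supplied; the example
    values used below are made up for illustration.
    """
    sql = (
        f"SELECT toStartOfMinute(timestamp) AS minute, "
        f"count() AS events, sum({value_col}) AS total "
        f"FROM {table}"
    )
    if where:
        sql += f" WHERE {where}"
    sql += " GROUP BY minute ORDER BY minute"
    return sql

# e.g. per-minute spend for one ad campaign (hypothetical schema):
print(per_minute_rollup("ad_events", "spend", "campaign_id = 42"))
```

With the official `clickhouse-connect` driver, the resulting string could be passed to `client.query(...)` against a running server; ClickHouse evaluates such aggregations over millions of rows at interactive latency, which is what makes the real-time dashboards described above feasible.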
Building Scalable Cloud-Native Applications With Distributed SQL
2024-05-16 · 15:30
PLEASE register via the following link if you are participating in this event: https://www.pingcap.com/event/16347/

In this meetup, we'll uncover how to build cloud-native applications designed to scale effortlessly in dynamic environments. Whether you're a seasoned developer or just getting started with cloud-native architecture, we'll offer valuable insights and practical tips for designing robust, scalable applications. We'll also explore distributed SQL, an evolved database architecture built from the ground up to deliver the transactional guarantees of relational databases and the horizontal scale of NoSQL databases. Our speakers will dive into the principles, challenges, and best practices of utilizing this database architecture effectively in your applications. We hope to see you there!

Agenda:
16:30 Doors open
17:00 – 17:15 Welcome Speech
17:15 – 18:00 Introduction to Distributed SQL – TiDB by Daniël van Eeden, Technical Solution Engineer at PingCAP
18:00 – 18:45 Planning Your TiDB Deployment: A Step-by-Step Guide by Susana Salazar Oroz, Senior DBRE at Bolt
18:45 – 19:00 Break
19:00 – 19:30 Real-time Streaming with RisingWave by Jan Mensch, Developer at RisingWave
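One practical consequence of TiDB's design, relevant to the introductory talk above: it speaks the MySQL wire protocol, so existing MySQL clients work unchanged. A hedged sketch (not meetup material; the host and credentials are placeholders) using the PyMySQL driver might look like this, with 4000 being TiDB's default SQL port:

```python
def tidb_connection_args(host, user, password, database, port=4000):
    """Build connection kwargs for a TiDB cluster.

    TiDB is MySQL-compatible; 4000 is its default SQL port
    (vs. MySQL's 3306).
    """
    return {"host": host, "port": port, "user": user,
            "password": password, "database": database}

def fetch_version(args):
    """Query the server version string.

    Requires a live cluster, so the driver import is deferred and
    this function is defined but not called in this sketch.
    """
    import pymysql  # pip install pymysql
    conn = pymysql.connect(**args)
    try:
        with conn.cursor() as cur:
            cur.execute("SELECT VERSION()")
            return cur.fetchone()[0]
    finally:
        conn.close()

# Placeholder endpoint for illustration only:
args = tidb_connection_args("127.0.0.1", "root", "", "test")
print(args["port"])  # 4000
```

Because the protocol is shared, the same two functions would work against a vanilla MySQL server by passing `port=3306`.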
Getting Started with CockroachDB
2022-03-11
Kishen Das Kondabagilu Rajanna – author
"Getting Started with CockroachDB" provides an in-depth introduction to CockroachDB, a modern, distributed SQL database designed for cloud-native applications. Through this guide, you'll learn how to deploy, manage, and optimize CockroachDB to build highly reliable, scalable database solutions tailored for demanding and distributed workloads.

What this book will help me do:
* Understand the architecture and design principles of CockroachDB and its fault-tolerant model.
* Learn how to set up and manage CockroachDB clusters for high availability and automatic scaling.
* Discover the concepts of data distribution and geo-partitioning to achieve low-latency global interactions.
* Explore indexing mechanisms in CockroachDB to optimize query performance for fast data retrieval.
* Master operational strategies, security configuration, and troubleshooting techniques for database management.

Author(s): Kishen Das Kondabagilu Rajanna is an experienced software developer and database expert with a deep interest in distributed architectures. With hands-on experience working with CockroachDB and other database technologies, Kishen is passionate about sharing actionable insights with readers. His approach focuses on equipping developers with practical skills to excel in building and managing scalable, efficient database services.

Who is it for? This book is ideal for software developers, database administrators, and database engineers seeking to learn CockroachDB for building robust, scalable database systems. If you're new to CockroachDB but possess basic database knowledge, this guide will equip you with the practical skills to leverage CockroachDB's capabilities effectively.
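One operational habit the book's subject demands of application code: CockroachDB asks clients to retry transactions that fail with a serialization conflict (SQLSTATE 40001). Below is a minimal, driver-agnostic sketch of that retry loop (illustrative, not from the book; with a real driver such as psycopg2 you would catch `psycopg2.errors.SerializationFailure` instead of the stand-in exception defined here):

```python
import random
import time

class TransactionRetryError(Exception):
    """Stand-in for a serialization failure (SQLSTATE 40001)."""

def run_transaction(txn_fn, max_retries=5):
    """Run txn_fn, retrying on conflict with jittered exponential backoff.

    Returns txn_fn's result; re-raises after max_retries failures.
    """
    for attempt in range(max_retries):
        try:
            return txn_fn()
        except TransactionRetryError:
            if attempt == max_retries - 1:
                raise
            # Back off briefly before retrying, with jitter to
            # avoid contending transactions retrying in lockstep.
            time.sleep((2 ** attempt) * 0.1 * random.random())

# A flaky "transaction" that conflicts twice, then commits.
attempts = {"n": 0}
def flaky():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise TransactionRetryError
    return "committed"

print(run_transaction(flaky))  # committed
```

The key design point is that `txn_fn` must be safe to re-execute from the top, since each retry replays the whole transaction body.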
O'Reilly Data Engineering Books