talk-data.com

Topic: Apache Kafka (Kafka)
Tags: distributed_streaming, message_queue, event_streaming
240 activities tagged

Activity Trend: peak of 20 activities per quarter, 2020-Q1 to 2026-Q1

Activities

240 activities · Newest first

Advanced SQL

SQL is no longer just a querying language for relational databases—it's a foundational tool for building scalable, modern data solutions across real-time analytics, machine learning workflows, and even generative AI applications. Advanced SQL shows data professionals how to move beyond conventional SELECT statements and tap into the full power of SQL as a programming interface for today's most advanced data platforms. Written by seasoned data experts Rui Pedro Machado, Hélder Russa, and Pedro Esmeriz, this practical guide explores the role of SQL in streaming architectures (like Apache Kafka and Flink), data lake ecosystems, cloud data warehouses, and ML pipelines. Geared toward data engineers, analysts, scientists, and analytics engineers, the book combines hands-on guidance with architectural best practices to help you extend your SQL skills into emerging workloads and real-world production systems.

- Use SQL to design and deploy modern, end-to-end data architectures
- Integrate SQL with data lakes, stream processing, and cloud platforms
- Apply SQL in feature engineering and ML model deployment
- Master pipe syntax and other advanced features for scalable, efficient queries
- Leverage SQL to build GenAI-ready data applications and pipelines

Practical Data Engineering with Apache Projects: Solving Everyday Data Challenges with Spark, Iceberg, Kafka, Flink, and More

This book is a comprehensive guide designed to equip you with the practical skills and knowledge necessary to tackle real-world data challenges using open source solutions. Focusing on 10 real-world data engineering projects, it caters specifically to data engineers at the early stages of their careers, providing a strong foundation in essential open source tools and techniques such as Apache Spark, Flink, Airflow, Kafka, and many more. Each chapter is dedicated to a single project, starting with a clear presentation of the problem it addresses. You will then be guided through a step-by-step process to solve the problem, leveraging widely-used open-source data tools. This hands-on approach ensures that you not only understand the theoretical aspects of data engineering but also gain valuable experience in applying these concepts to real-world scenarios.

At the end of each chapter, the book delves into common challenges that may arise during the implementation of the solution, offering practical advice on troubleshooting these issues effectively. Additionally, the book highlights best practices that data engineers should follow to ensure the robustness and efficiency of their solutions. A major focus of the book is using open-source projects and tools to solve problems encountered in data engineering.

In summary, this book is an indispensable resource for data engineers looking to build a strong foundation in the field. By offering practical, real-world projects and emphasizing problem-solving and best practices, it will prepare you to tackle the complex data challenges encountered throughout your career. Whether you are an aspiring data engineer or looking to enhance your existing skills, this book provides the knowledge and tools you need to succeed in the ever-evolving world of data engineering.

You Will Learn:
- The foundational concepts of data engineering and practical experience in solving real-world data engineering problems
- How to proficiently use open-source data tools like Apache Kafka, Flink, Spark, Airflow, and Trino
- 10 hands-on data engineering projects
- How to troubleshoot common challenges in data engineering projects

Who this book is for: Early-career data engineers and aspiring data engineers who are looking to build a strong foundation in the field; mid-career professionals looking to transition into data engineering roles; and technology enthusiasts interested in gaining insights into data engineering practices and tools.

AWS re:Invent 2025 - Powering your Agentic AI experience with AWS Streaming and Messaging (ANT310)

Organizations are accelerating innovation with generative AI and agentic AI use cases. This session explores how AWS streaming and messaging services such as Amazon Managed Streaming for Apache Kafka, Kinesis Data Streams, Amazon Managed Service for Apache Flink, and Amazon SQS help you build intelligent, responsive applications. Discover how streaming supports real-time data ingestion and processing, while messaging ensures reliable coordination between AI agents, orchestrates workflows, and delivers critical information at scale. Learn architectural patterns that show how a unified approach acts on data as quickly as needed, with the reliability and scale to support your next generation of AI applications.


AWS re:Invent 2025 - Operating Apache Kafka and Apache Flink at scale (ANT307)

Enterprises use Apache Kafka and Apache Flink for a growing number of mission-critical use cases: real-time analytics, application messaging, and machine learning. As this usage grows in size and scale, so do the criticality, scale, and cost of managing the Kafka and Flink clusters. Learn how customers achieve the same or higher availability and durability for their growing clusters, at lower unit cost and with greater operational simplicity, using Amazon MSK (Managed Streaming for Apache Kafka) and Amazon Managed Service for Apache Flink (MSF).


In this 2-hour hands-on workshop, you'll build an end-to-end streaming analytics pipeline that captures live cryptocurrency prices, processes them in real-time, and uses AI to forecast the future. Ingest live crypto data into Apache Kafka using Kafka Connect; tame that chaos with Apache Flink's stream processing; freeze streams into queryable Apache Iceberg tables using Tableflow; and forecast price trends with Flink AI.
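
To make the stream-processing step concrete, here is a minimal sketch of a Flink job reading a raw crypto-price topic from Kafka with the DataStream API. This is not the workshop's code: the broker address, topic, and group id are hypothetical, and the real pipeline continues on to Iceberg tables and Flink AI forecasting.

```java
import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class CryptoPriceJob {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Hypothetical topic fed by a Kafka Connect source connector with live price ticks.
        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092")
            .setTopics("crypto-prices-raw")
            .setGroupId("crypto-flink-demo")
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        // In a fuller pipeline this stream would be parsed, windowed, and written out
        // to an Apache Iceberg table; here it is simply printed to show the wiring.
        env.fromSource(source, WatermarkStrategy.noWatermarks(), "crypto-prices")
           .print();

        env.execute("crypto-price-pipeline");
    }
}
```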

Grafana's magic comes from seeing what's happening right now and instantly comparing it to everything that has happened before. Real-time data lets teams spot anomalies the moment they emerge. Long-term data reveals whether those anomalies are new, seasonal, or the same gremlins haunting your system every quarter. But actually building this capability? That's where everything gets messy. Today's dashboards are cobbled together from two very different worlds: long-term data living in lakes and warehouses, and real-time streams blasting through Kafka or similar systems. These systems rarely fit together cleanly, which forces dashboard developers to wrestle with:
- Differing processing concepts: What does SQL even mean on a stream?
- Inconsistent governance: Tables vs. message schemas, different owners, different rules
- Incomplete history: Not everything is kept forever, and you never know what will vanish next
- Maintenance drift: As pipelines evolve, your ETL always falls behind
But what if there were no separation at all? Join us for a deep dive into a new, unified approach where real-time and historical data live together in a single, seamless dataset. Imagine dashboards powered by one source of truth that stretches from less than one second ago to five, ten, or even fifteen years into the past, without stitching, syncing, or duct-taping systems together. Using Apache Kafka, Apache Iceberg, and a modern architectural pattern that eliminates the old 'batch vs. stream' divide, we'll show how to:
- Build Grafana dashboards that just work, with consistent semantics end-to-end
- Keep every message forever without drowning in storage costs
- Query real-time and historical data with the same language, same governance, same everything
- Escape the ETL death spiral once and for all
If you've ever wished your dashboards were both lightning-fast and infinitely deep, this talk will show you how close that future really is.

Kafka and Flink tend to get lumped together as "data services" in the sense that they process data, but they differ dramatically from traditional databases in functionality and utility. In this talk, we'll run through the lifetime of a write in Postgres to establish a baseline, understanding all the different services that data hits on its way down to the disk. Then we'll walk through writing data to a Kafka topic, and what 'writing' (or really, streaming) data to a Flink workflow looks like from a similar systems perspective. Along the way, we'll understand the key differences between the services and why some are more suited to long-term data storage than others.
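
For reference, a client-side write to a Kafka topic looks roughly like the sketch below (broker address, topic, and payload are hypothetical). With acks=all, the send is only acknowledged once the record has been appended to the partition leader's log and replicated to the in-sync followers, which is exactly the path the talk traces.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class TopicWrite {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical broker address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all"); // wait for the full in-sync replica set before acknowledging

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record =
                new ProducerRecord<>("orders", "order-42", "{\"status\":\"created\"}");
            // The record is batched, appended to the leader's log segment, and replicated.
            RecordMetadata meta = producer.send(record).get(); // block until acknowledged
            System.out.printf("wrote to %s-%d at offset %d%n",
                meta.topic(), meta.partition(), meta.offset());
        }
    }
}
```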

While Kafka excels at streaming data, the real challenge lies in making that data analytically useful without sacrificing consistency or performance. This talk explores why Apache Iceberg has emerged as the ideal streaming destination, offering ACID transactions, schema evolution, and time travel capabilities that traditional data lakes can't match. Learn about some foundational tools that enable streaming pipelines and why they all converge on this next-generation table format built for flexibility and scalability.
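
As a rough illustration of those capabilities, the sketch below uses the Iceberg Java API against a hypothetical local catalog and an existing table: schema evolution is an atomically committed metadata change, and earlier snapshots stay queryable via time travel.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.iceberg.Table;
import org.apache.iceberg.catalog.TableIdentifier;
import org.apache.iceberg.data.IcebergGenerics;
import org.apache.iceberg.data.Record;
import org.apache.iceberg.hadoop.HadoopCatalog;
import org.apache.iceberg.io.CloseableIterable;
import org.apache.iceberg.types.Types;

public class IcebergFeatures {
    public static void main(String[] args) throws Exception {
        // Hypothetical local warehouse; production setups usually sit on object storage
        // behind a REST, Glue, or Polaris catalog.
        HadoopCatalog catalog = new HadoopCatalog(new Configuration(), "file:///tmp/warehouse");
        Table table = catalog.loadTable(TableIdentifier.of("db", "events"));

        // Schema evolution: adding a column is a metadata-only change committed atomically.
        table.updateSchema().addColumn("source", Types.StringType.get()).commit();

        // Time travel: read the table as of its first recorded snapshot.
        long firstSnapshot = table.history().get(0).snapshotId();
        try (CloseableIterable<Record> rows =
                 IcebergGenerics.read(table).useSnapshot(firstSnapshot).build()) {
            rows.forEach(System.out::println);
        }
    }
}
```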

In this session, see how meshIQ, a comprehensive management and observability platform for messaging and event streaming technologies like Kafka, RabbitMQ and IBM MQ, can be used to track application message flows, help identify bottlenecks and locate "missing" messages.

In this session, we’ll explore the real-world journey of implementing a scalable, secure, and resilient data streaming platform—from the ground up. Bridging DevOps and DataOps practices, we’ll cover how our team designed the architecture, selected the right tools (like Kafka and Kubernetes), automated deployments, and enforced data governance across environments. You'll learn how we tackled challenges like schema evolution, CI/CD for data pipelines, monitoring at scale, and team collaboration. Whether you're just starting or scaling your data platform, this talk offers practical takeaways and battle-tested lessons from the trenches of building streaming infrastructure in production.

The steel industry is characterized by complex production chains and demanding physico-chemical processes, which creates a major need for modeling to optimize productivity.

Our strategy of consolidating industrial information and making real-time data streams available lets us multiply and accelerate the deployment of both physical and AI-based models.

The main topics covered are:

• Freeing data and breaking down silos through a hybrid architecture based on Kafka
• Simplifying the integration of components into our industrial systems
• Giving teams more autonomy through self-service tools (data catalog, querying, modeling platforms)

The session closes with a concrete modeling case study illustrating the operational benefits delivered by this architecture, with a focus on real-time process control to keep product quality in check using hybrid solutions that couple physical modeling with data-driven models.

Kafka 4.1 promotes KIP-932, "Queues for Kafka," to preview status. Beyond the KIP's provocative title, it introduces "share groups" into Kafka, adopting some of the mechanisms found in queuing systems. There are, however, fundamental differences from JMS or MQ. We will therefore explore the KIP in detail: its APIs and the problems it solves, but also its limitations and potential future developments.
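
As a very rough sketch of what consuming through a share group might look like, the snippet below follows the client API described in KIP-932; since the feature is only in preview, the class and method names come from the KIP and may still change, and the broker address, topic, and group name are hypothetical.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaShareConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ShareGroupWorker {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // hypothetical broker
        props.put("group.id", "billing-workers");          // interpreted as a share group, not a classic consumer group
        props.put("key.deserializer", StringDeserializer.class.getName());
        props.put("value.deserializer", StringDeserializer.class.getName());

        // Per KIP-932, records are handed out to group members individually and
        // acknowledged per record, rather than partitions being owned by one consumer.
        try (KafkaShareConsumer<String, String> consumer = new KafkaShareConsumer<>(props)) {
            consumer.subscribe(List.of("billing-events")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.value());
                }
                consumer.commitSync(); // implicit mode: acknowledges everything delivered by the last poll
            }
        }
    }
}
```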

Kafka is a powerful tool, but integrating it can quickly feel like black magic: complex configuration, fine-grained handling of partitions, retries, schemas… This talk sets out to demystify Kafka by exploring concrete use cases (broadcast, cooperation between instances), its configuration, and above all how our in-house C# SDK simplifies its use.

Discussion: Apache Iceberg, Apache Kafka, and Apache Polaris (incubating) are in the same boat: populating Iceberg tables. After a quick introduction to Apache Iceberg, Apache Kafka, and Apache Polaris (incubating), we will look at the possible approaches for ingesting data into Iceberg tables with Kafka and the issues to consider: snapshot management, data file compaction, etc. We will then look at possible solutions and at how Apache Polaris (incubating) can help with table maintenance.

Kafka Internals I Wish I Knew Sooner: The Non-Boring Truths

Most of us start with Kafka by building a simple producer/consumer demo. It just works — until it doesn’t. Suddenly, disk space isn’t freed up after data “expires,” rebalances loop endlessly during deploys, and strange errors about missing leaders clog your logs. In the panic, we dive into Kafka’s ocean of config options — hoping something will stick. Sound familiar?

This talk is a collection of hard-won lessons — not flashy tricks, but the kind of insights you only gain after operating Kafka in production for years. You’ll walk away with mental models that make Kafka’s internal behavior more predictable and less surprising.

We’ll cover:
- Storage internals: Why expired data doesn’t always free space, and how Kafka actually reclaims disk
- Transactions & delivery semantics: What “exactly-once” really means, and when it silently downgrades
- Consumer group rebalancing: Why rebalances loop, and how the controller’s hidden behavior affects them

If you’ve used Kafka — or plan to — these insights will save you hours of frustration and debugging. A basic understanding of partitions, replication, and Kafka’s general architecture will help you get the most out of this session.
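
On the delivery-semantics point above, "exactly-once" on the producing side boils down to an idempotent, transactional producer, roughly as in the sketch below (broker address, topic, and transactional.id are hypothetical); note that the guarantee silently downgrades unless consumers also read with isolation.level=read_committed.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalWrite {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // hypothetical broker
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("enable.idempotence", "true");             // deduplicate broker-side retries
        props.put("transactional.id", "payments-writer-1");  // stable id so zombie producers get fenced

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.initTransactions();
            try {
                producer.beginTransaction();
                producer.send(new ProducerRecord<>("payments", "p-1", "debit"));
                producer.send(new ProducerRecord<>("payments", "p-1", "credit"));
                producer.commitTransaction(); // both records become visible atomically
            } catch (Exception e) {
                producer.abortTransaction();  // read_committed consumers never see aborted records
                throw e;
            }
        }
    }
}
```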

Data leaders today face a familiar challenge: complex pipelines, duplicated systems, and spiraling infrastructure costs. Standardizing around Kafka for real-time workloads and Iceberg for large-scale analytics has gone some way toward addressing this, but it still requires separate stacks, leaving teams to stitch them together at high expense and risk.

This talk will explore how Kafka and Iceberg together form a new foundation for data infrastructure. One that unifies streaming and analytics into a single, cost-efficient layer. By standardizing on these open technologies, organizations can reduce data duplication, simplify governance, and unlock both instant insights and long-term value from the same platform.

You will come away with a clear understanding of why this convergence is reshaping the industry, how it lowers operational risk, and the advantages it offers for building durable, future-proof data capabilities.