talk-data.com

Topic: postgresql (332 tagged activities)

Activity Trend: peak of 6 activities per quarter, 2020-Q1 to 2026-Q1

Activities

332 activities · Newest first

When Postgres is enough: solving document storage, pub/sub and distributed queues without more tools

When a new requirement appears, whether it's document storage, pub/sub messaging, distributed queues, or even full-text search, Postgres can often handle it without introducing more infrastructure.

This talk explores how to leverage Postgres' native features like JSONB, LISTEN/NOTIFY, queueing patterns and vector extensions to build robust, scalable systems without increasing infrastructure complexity.

You'll learn practical patterns that extend Postgres just far enough, keeping systems simpler, more maintainable, and easier to operate, especially in small to medium projects or freelancing setups, where Postgres often already forms a critical part of the stack.

Postgres might not replace everything forever, but it can often get you much further than you think.
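
To make the queueing and pub/sub patterns the abstract mentions concrete, here is a minimal sketch (not taken from the talk) of a JSONB-backed job queue using FOR UPDATE SKIP LOCKED plus a NOTIFY wake-up, driven from Python with psycopg2; the table, channel, and connection string are illustrative assumptions.

```python
# Minimal sketch (assumption, not the talk's code) of a Postgres job queue
# using JSONB payloads and FOR UPDATE SKIP LOCKED, driven via psycopg2.
import json
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string

def setup():
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS jobs (
                id      bigserial PRIMARY KEY,
                payload jsonb NOT NULL,
                status  text  NOT NULL DEFAULT 'pending'
            )
        """)

def enqueue(payload: dict):
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO jobs (payload) VALUES (%s)", (json.dumps(payload),))
        cur.execute("NOTIFY jobs_channel")  # wake up any LISTENing workers

def claim_one():
    # SKIP LOCKED lets many workers poll concurrently without blocking each other.
    with conn, conn.cursor() as cur:
        cur.execute("""
            UPDATE jobs
               SET status = 'running'
             WHERE id = (SELECT id FROM jobs
                          WHERE status = 'pending'
                          ORDER BY id
                          FOR UPDATE SKIP LOCKED
                          LIMIT 1)
         RETURNING id, payload
        """)
        return cur.fetchone()  # None if the queue is empty
```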

Flying Beyond Keywords: Our Aviation Semantic Search Journey

In aviation, search isn’t simple—people use abbreviations, slang, and technical terms that make exact matching tricky. We started with just Postgres, aiming for something that worked. Over time, we upgraded: semantic embeddings, reranking. We tackled filter complexity, slow index builds, embedding updates, and much more. Along the way, we learned a lot about making AI search fast, accurate, and actually usable for our users. It’s been a journey—full of turbulence, but worth the landing.

PGlite, a WASM build of PostgreSQL, offers a new way to run and use my favorite database. In this talk, we’ll explore the technology behind PGlite and look at various use cases. I’ll also share a real-world story about how I used it at my company, traide AI, and the challenges I faced—some of which I overcame, while others are still awaiting solutions.

Vectors are a centuries-old, well-studied mathematical concept, yet they pose many challenges around efficient storage and retrieval in database systems. Applications requiring effective search techniques for vectors have advanced, with "retrieval-augmented generation" (RAG) becoming a key technique. An extensible database like PostgreSQL can add vector search through an extension like pgvector.

In this talk, we'll review what vectors are, how they are used in applications, and what users are looking for in vector storage and search systems. We'll then see how you can search for vector data in PostgreSQL, including best practices for using pgvector, by taking a deeper look at how pgvector implements different vector search techniques. We'll also see where traditional database methods are most effective for building RAG-driven apps.

At the end of this talk, you'll have a set of best practices you can use when designing applications that require vector search.
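
As a taste of what pgvector usage looks like in practice, here is a minimal, illustrative sketch (not the speaker's code) using psycopg2; the table, the three-dimensional embeddings, and the HNSW index choice are assumptions made purely for brevity.

```python
# Minimal pgvector sketch (illustrative): store tiny embeddings and run a
# nearest-neighbour search with the <-> (L2 distance) operator.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS items (
            id        bigserial PRIMARY KEY,
            embedding vector(3)
        )
    """)
    cur.execute("INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]')")
    # An HNSW index trades build time and memory for fast approximate search.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS items_embedding_idx
            ON items USING hnsw (embedding vector_l2_ops)
    """)
    cur.execute("SELECT id FROM items ORDER BY embedding <-> '[2,3,4]' LIMIT 5")
    print(cur.fetchall())
```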

In this session, we’ll walk through how Apache Flink was used to enable near real-time operational insights from manufacturing IIoT datasets. The goal: deliver actionable KPIs to production teams with sub-30-second latency, using streaming data pipelines built with Kafka, Flink, and Grafana. We’ll cover the key architectural patterns that made this possible, including handling structured data joins, managing out-of-order events, and integrating with downstream systems like PostgreSQL and Grafana. We’ll also share real-world performance benchmarks, lessons learned from scaling tests, and practical considerations for deploying Flink in a production-grade, low-latency analytics pipeline. The session will also include a live demo.

If you're building Flink-based solutions for time-sensitive operations—whether in manufacturing, IoT, or other domains—this talk will provide proven insights from the field.
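
For readers new to Flink, the following rough PyFlink SQL sketch shows the general shape of such a pipeline: a Kafka source with an event-time watermark to tolerate out-of-order events, a one-minute KPI aggregation, and a JDBC sink into PostgreSQL. The topic, schema, and connection details are invented for illustration and are not the presenters' actual pipeline.

```python
# Rough PyFlink SQL sketch (assumptions throughout): Kafka source with a
# watermark for out-of-order events, windowed KPI aggregation, Postgres sink.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE machine_events (
        machine_id STRING,
        cycle_ms   DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine-events',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

t_env.execute_sql("""
    CREATE TABLE machine_kpis (
        machine_id   STRING,
        window_end   TIMESTAMP(3),
        avg_cycle_ms DOUBLE,
        PRIMARY KEY (machine_id, window_end) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:postgresql://db:5432/iiot',
        'table-name' = 'machine_kpis'
    )
""")

# One-minute tumbling windows keep late (out-of-order) events within the
# watermark bound; results are upserted into Postgres for Grafana to read.
t_env.execute_sql("""
    INSERT INTO machine_kpis
    SELECT machine_id,
           TUMBLE_END(event_time, INTERVAL '1' MINUTE),
           AVG(cycle_ms)
    FROM machine_events
    GROUP BY machine_id, TUMBLE(event_time, INTERVAL '1' MINUTE)
""")
```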

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, avoid vendor lock-in, or meet data sovereignty requirements. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimal boilerplate. In Apache Airflow, we’ve already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider: an abstraction initially supporting Amazon SQS and expanding to Google Pub/Sub and Apache Kafka soon after, with additional implementations such as Amazon Kinesis and Managed Kafka expected over time. This talk will dive into why these abstractions matter, how they reduce friction for developers while giving enterprises true multi-cloud optionality, and what’s next for Airflow’s evolving provider ecosystem.
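
To illustrate the Common-SQL abstraction mentioned above, here is a hedged sketch using the provider's SQLExecuteQueryOperator: the same SQL runs against Postgres or Snowflake, and only the connection id changes. The connection ids, DAG id, and query are illustrative assumptions.

```python
# Sketch of Airflow's common-sql abstraction: the same operator runs the same
# query against different databases, selected only by conn_id.
# Connection ids and the query itself are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG("portable_sql_example", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    on_postgres = SQLExecuteQueryOperator(
        task_id="count_orders_postgres",
        conn_id="postgres_default",        # Postgres connection
        sql="SELECT count(*) FROM orders",
    )
    on_snowflake = SQLExecuteQueryOperator(
        task_id="count_orders_snowflake",
        conn_id="snowflake_default",       # Snowflake connection, same SQL
        sql="SELECT count(*) FROM orders",
    )
```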

In this talk, I’ll walk through how we built an end-to-end analytics pipeline using open-source tools (Airbyte, dbt, Airflow, and Metabase). At WirePick, we extract data from multiple sources using Airbyte OSS into PostgreSQL, transform it into business-specific data marts with dbt, and automate the entire workflow using Airflow. Our Metabase dashboards provide real-time insights, and we integrate Slack notifications to alert stakeholders when key business metrics change. This session will cover: data extraction, using Airbyte OSS to pull data from multiple sources; transformation and modeling, showing how dbt helps create reusable data marts; automation and orchestration, managing the workflow with Airflow; and data-driven decision-making, delivering insights through Metabase and Slack alerts.
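
The following is a hedged sketch of that orchestration pattern (not WirePick's actual DAG): an Airbyte sync into Postgres, a dbt build of the marts, and a Slack notification, chained in Airflow. Connection ids, the Airbyte connection UUID, and the dbt project path are placeholder assumptions.

```python
# Hedged sketch of the ELT DAG described above: Airbyte sync -> dbt build ->
# Slack notification. All ids, paths, and messages are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

with DAG("elt_daily", start_date=datetime(2025, 1, 1), schedule="@daily", catchup=False) as dag:
    extract = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync_to_postgres",
        airbyte_conn_id="airbyte_default",
        connection_id="REPLACE-WITH-AIRBYTE-CONNECTION-UUID",
    )
    transform = BashOperator(
        task_id="dbt_build_marts",
        bash_command="dbt build --project-dir /opt/dbt/project --profiles-dir /opt/dbt",
    )
    notify = SlackWebhookOperator(
        task_id="notify_stakeholders",
        slack_webhook_conn_id="slack_default",  # parameter name varies across provider versions
        message="Daily data marts refreshed; Metabase dashboards are up to date.",
    )
    extract >> transform >> notify
```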

Apache Airflow 3 is the new state-of-the-art version of Airflow. For many users planning to adopt Airflow 3, it is important to understand how it behaves from a performance perspective compared to Airflow 2. This presentation presents performance results for various Airflow 3 configurations and gives potential adopters a good understanding of its performance. The reference Airflow 3 configuration uses a Kubernetes cluster as the compute layer and PostgreSQL as the Airflow database, running on Google Cloud Platform. Performance tests are run using the community version of the performance-test framework, with some references to Cloud Composer (the managed service for Apache Airflow). The tests are done in production-grade configurations that can serve as good references for Airflow community users. You will get a comparison of Airflow 3 and Airflow 2 from a performance standpoint, and learn how to optimize Airflow scheduler performance by understanding DAG file processing and task scheduling, and by configuring the scheduler to run tens of thousands of DAGs and tasks in Airflow 3.

A real-world journey of how my small team at Xena Intelligence built robust data pipelines for our enterprise customers using Airflow. If you’re a data engineer, or part of a small team, this talk is for you. Learn how we orchestrated a complex workflow to process millions of public reviews. What you’ll learn: Cost-efficient DAG design: decomposing complex processes into atomic tasks using the TaskFlow API, XComs, mapped tasks, and task groups, and diving into one of our DAGs as a concrete example of how our approach optimizes parallelism, error handling, delivery speed, and reliability. Integrating LLM analysis: how we integrated LLM-based analysis into our pipeline, and how we designed the database, queries, and ingestion into Postgres. Extending the Airflow UI: a custom Airflow UI plugin that filters and visualizes DAG runs by customer, product, and marketplace, delivering clear insights for faster troubleshooting. Leveraging the Airflow REST API: how we trigger DAGs on demand, elevating the UX by tracking mapped DAG progress and computing ETAs. CI/CD and cost management: practical tips for deploying DAGs with CI/CD.
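
As an illustration of the TaskFlow and dynamic task mapping pattern described above (not Xena Intelligence's actual code), here is a minimal DAG that fans out one mapped task per batch of reviews; the batch contents and the "analysis" step are placeholders.

```python
# Hedged sketch of TaskFlow + dynamic task mapping; batches and the analysis
# step are illustrative placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2025, 1, 1), schedule=None, catchup=False)
def review_pipeline():
    @task
    def list_review_batches() -> list[list[int]]:
        # A real DAG would page through millions of reviews here.
        return [[1, 2, 3], [4, 5, 6]]

    @task
    def analyze_batch(batch: list[int]) -> int:
        # Placeholder for per-batch LLM analysis and Postgres ingestion.
        return len(batch)

    @task
    def load_summary(counts: list[int]) -> None:
        print(f"analyzed {sum(counts)} reviews")

    # .expand() creates one mapped, independently retryable task per batch.
    counts = analyze_batch.expand(batch=list_review_batches())
    load_summary(counts)


review_pipeline()
```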

Every relation in PostgreSQL can be damaged, and sometimes the errors reported by the database are rather strange. In some cases, a session reading corrupted data can even crash the whole database. To better understand these issues and test different strategies for repairs, I created a Python application that simulates various types of damage. This talk demonstrates, through practical examples and outputs from the pageinspect extension, different types of data corruption — and proposes some future improvements that would help handle them more effectively.
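
For readers unfamiliar with pageinspect, a query along these lines (the table name and connection string are illustrative) is the kind the talk uses to look at raw heap tuples:

```python
# Illustrative pageinspect usage via psycopg2: dump heap tuple headers from
# the first page of a hypothetical table named damaged_table.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN; pageinspect needs superuser
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pageinspect")
    cur.execute("""
        SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid
          FROM heap_page_items(get_raw_page('damaged_table', 0))
    """)
    for row in cur.fetchall():
        print(row)
```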

PostgreSQL practitioners often give developers recommendations like "Always use EXPLAIN ANALYZE with BUFFERS" or "Run ANALYZE first". However, these suggestions are rarely accompanied by clear explanations of why they matter. Inspired by the motto "Knowledge of certain principles easily compensates for the lack of knowledge of certain facts," this talk sheds light on key PostgreSQL architectural concepts and their connection to common design and performance best practices. Through a series of increasingly complex SELECT queries, we will explore how PostgreSQL’s internal mechanisms enable safe, fast, and efficient data processing. This session is designed for application developers who want to deepen their understanding of how PostgreSQL executes queries, and how to harness its full potential without accidentally bringing it to its knees.
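
The command behind that advice looks like this; the query, table, and connection string are illustrative, and running it from psycopg2 simply keeps the sketch consistent with the other examples here.

```python
# EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual timing plus
# shared-buffer hits and reads; ANALYZE refreshes planner statistics first.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("ANALYZE")  # "Run ANALYZE first", as the advice goes
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42")
    print("\n".join(row[0] for row in cur.fetchall()))
```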

Abstract: Instead of using ETL tools, which consume a lot of memory on their own systems, you will learn how to do ETL jobs directly in and with the database. The PostgreSQL implementation of the standard ISO/IEC 9075-9:2016, Management of External Data (SQL/MED), is known as Foreign Data Wrapper (FDW). With Foreign Data Wrappers, there is almost no limit to the external data that you can use directly inside a PostgreSQL database. The talk walks you through the definition of Foreign Data Wrappers as implemented in PostgreSQL. The second part of the talk shows how this technology works, using examples with several data sources.
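
A minimal, illustrative postgres_fdw setup (server name, remote host, credentials, and schemas are assumptions, not examples from the talk) looks roughly like this:

```python
# Illustrative postgres_fdw setup, run from Python for consistency with the
# other sketches; all names, hosts, and credentials are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
    cur.execute("""
        CREATE SERVER IF NOT EXISTS legacy_erp
            FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (host 'erp.internal', dbname 'erp', port '5432')
    """)
    cur.execute("""
        CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
            SERVER legacy_erp OPTIONS (user 'readonly', password 'secret')
    """)
    cur.execute("CREATE SCHEMA IF NOT EXISTS staging")
    cur.execute("IMPORT FOREIGN SCHEMA public FROM SERVER legacy_erp INTO staging")
    # Foreign tables can now be queried (and joined with local tables) like any
    # other table, which is what makes in-database ETL possible.
    cur.execute("SELECT count(*) FROM staging.customers")
    print(cur.fetchone())
```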

PostgreSQL implements transactions and MVCC using tuple versioning and a background vacuum process. This design offers simplicity of concurrency control but has trade-offs, like table and index bloat and the increased maintenance complexity of the vacuum process. OrioleDB is an alternative storage engine for PostgreSQL that introduces undo logs to implement transactions and MVCC. Undo logs allow immediate cleanup of tuples without a separate vacuum process. This talk will examine the trade-offs of PostgreSQL’s current MVCC design, introduce the concept of undo logging, and explain how OrioleDB implements it, providing a technical overview of how undo logs work in OrioleDB.
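
Based on OrioleDB's documented usage, adopting the engine amounts to installing the extension and selecting it as a table access method; the table below is an illustrative assumption, not an example from the talk.

```python
# Hedged sketch of OrioleDB adoption (illustrative): the engine is exposed as
# a table access method selected with USING. Assumes a Postgres build with
# OrioleDB installed and preloaded.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS orioledb")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS accounts (
            id      bigint  PRIMARY KEY,
            balance numeric NOT NULL
        ) USING orioledb
    """)
```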

PostgreSQL Mistakes and How to Avoid Them

Recognize and avoid these common PostgreSQL mistakes! The best mistakes to learn from are ones made by other people! In PostgreSQL Mistakes and How to Avoid Them you’ll explore dozens of common PostgreSQL errors so you can easily avoid them in your own projects, learning proactively why certain approaches fail and others succeed. You’ll learn how to: avoid configuration and operation issues; maximize PostgreSQL utility and performance; fix bad SQL practices; solve common security and administration issues; ensure smooth migrations and upgrades; and diagnose and fix a bad database. As PostgreSQL continues its rise as a leading open source database, mastering its intricacies is crucial. The book is full of tested best practices to ensure top performance and future-proof your database systems for seamless change and growth. Each mistake is carefully described and accompanied by a demo, along with an explanation that expands your knowledge of PostgreSQL internals and helps you build a stronger mental model of how the database engine works.

About the technology: Fixing mistakes in PostgreSQL databases can be time-consuming and risky, especially when you’re making live changes to an in-use system. Fortunately, you can learn from the mistakes other Postgres pros have already made! This incredibly practical book lays out how to find and avoid the most common, dangerous, and sneaky errors you’ll encounter using PostgreSQL.

About the book: PostgreSQL Mistakes and How to Avoid Them identifies Postgres problems in key areas like data types, features, security, and high availability. For each mistake you’ll find a real-world narrative that illustrates the pattern and provides concrete recommendations for improvement. You’ll especially appreciate the illustrative code snippets, schema samples, mind maps, and tables that show the pros and cons of different approaches.

What's inside: diagnose configuration and operation issues; fix bad SQL code; address security and administration issues; ensure smooth migrations and upgrades.

About the reader: For PostgreSQL database administrators and application developers.

About the author: Jimmy Angelakos is a systems and database architect and PostgreSQL Contributor. He works as a Senior Principal Engineer at Deriv.

Quotes:
"I’ve run into many of these mistakes. Read up to get prepared!" - Milorad Imbra, FEVO
"Navigates PostgreSQL pitfalls with clarity. I highly recommend it." - Manohar Sai Jasti, Workday
"A straightforward style and real-world examples make it an essential read." - Potito Coluccelli, Econocom Italia
"Provides valuable tips to avoid common PostgreSQL pitfalls." - Fernando Bugni, Grupo QuintoAndar

Sponsored by: Anomalo | Reconciling IoT, Policy, and Insurer Data to Deliver Better Customer Discounts

As insurers increasingly leverage IoT data to personalize policy pricing, reconciling disparate datasets across devices, policies, and insurers becomes mission-critical. In this session, learn how Nationwide transitioned from prototype workflows in Dataiku to a hardened data stack on Databricks, enabling scalable data governance and high-impact analytics. Discover how the team orchestrates data reconciliation across Postgres, Oracle, and Databricks to align customer driving behavior with insurer and policy data—ensuring more accurate, fair discounts for policyholders. With Anomalo’s automated monitoring layered on top, Nationwide ensures data quality at scale while empowering business units to define custom logic for proactive stewardship. We’ll also look ahead to how these foundations are preparing the enterprise for unstructured data and GenAI initiatives.

Race to Real-Time: Low-Latency Streaming ETL Meets Next-Gen Databricks OLTP-DB

In today’s digital economy, real-time insights and rapid responsiveness are paramount to delivering exceptional user experiences and lowering TCO. In this session, discover a pioneering approach that leverages a low-latency streaming ETL pipeline built with Spark Structured Streaming and Databricks’ new OLTP-DB—a serverless, managed Postgres offering designed for transactional workloads. Validated in a live customer scenario, this architecture achieves sub-2-second end-to-end latency by seamlessly ingesting streaming data from Kinesis and merging it into OLTP-DB. This breakthrough not only enhances performance and scalability but also provides a replicable blueprint for transforming data pipelines across various verticals. Join us as we delve into the advanced optimization techniques and best practices that underpin this innovation, demonstrating how Databricks’ next-generation solutions can revolutionize real-time data processing and unlock a myriad of new use cases across the data landscape.
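
A compressed sketch of that architecture (not the presenters' code) might look like the following PySpark job: read from Kinesis, parse JSON, and write each micro-batch into a Postgres table via JDBC. The stream name, schema, JDBC URL, and credentials are placeholder assumptions, and the "kinesis" source format refers to the Databricks-provided connector.

```python
# Compressed sketch of the described pipeline: Kinesis -> parse JSON ->
# per-micro-batch write into Postgres via JDBC. Names and credentials are
# placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kinesis_to_oltp").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

events = (
    spark.readStream.format("kinesis")            # Databricks Kinesis connector
    .option("streamName", "orders-stream")
    .option("region", "us-east-1")
    .load()
    .select(from_json(col("data").cast("string"), schema).alias("e"))
    .select("e.*")
)

def upsert_to_postgres(batch_df, batch_id):
    # A real pipeline would MERGE on order_id; append keeps the sketch short.
    (batch_df.write.format("jdbc")
        .option("url", "jdbc:postgresql://oltp-db-host:5432/app")
        .option("dbtable", "orders_latest")
        .option("user", "app")
        .option("password", "***")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(upsert_to_postgres).start()
```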

Master Schema Translations in the Era of Open Data Lake

Unity Catalog puts a variety of schemas into a centralized repository; now the developer community wants more productivity and automation for schema inference, translation, evolution, and optimization, especially for ingestion and reverse-ETL scenarios with more code generation. The Coinbase Data Platform attempts to pave a path with "Schemaster", which interacts with the data catalog through a (proposed) metadata model to make schema translation and evolution more manageable across some of the popular systems, such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, and Postgres. This lightning talk covers four areas: the complexity and caveats of schema differences among these systems; the proposed field-level metadata model and two translation patterns (point-to-point vs. hub-and-spoke); why data profiling should be augmented to enhance schema understanding and translation; and how to integrate it with ingestion and reverse-ETL in a Databricks-oriented ecosystem. Takeaway: standardize schema lineage and translation.

Lakebase: Fully Managed Postgres for the Lakehouse

Lakebase is a new Postgres-compatible OLTP database designed to support intelligent applications. Lakebase eliminates custom ETL pipelines with built-in lakehouse table synchronization, supports sub-10ms latency for high-throughput workloads, and offers full Postgres compatibility, so you can build applications more quickly. In this session, you’ll learn how Lakebase enables faster development, production-level concurrency, and simpler operations for data engineers and application developers building modern, data-driven applications. We'll walk through key capabilities, example use cases, and how Lakebase simplifies infrastructure while unlocking new possibilities for AI and analytics.