talk-data.com

Topic: postgresql (332 tagged activities)

Activity Trend: peak of 6 activities per quarter, 2020-Q1 to 2026-Q1

Activities

332 activities · Newest first

When Postgres is enough: solving document storage, pub/sub and distributed queues without more tools

When a new requirement appears, whether it's document storage, pub/sub messaging, distributed queues, or even full-text search, Postgres can often handle it without introducing more infrastructure.

This talk explores how to leverage Postgres' native features like JSONB, LISTEN/NOTIFY, queueing patterns and vector extensions to build robust, scalable systems without increasing infrastructure complexity.

You'll learn practical patterns that extend Postgres just far enough, keeping systems simpler, more maintainable, and easier to operate, especially in small to medium projects or freelancing setups, where Postgres often already forms a critical part of the stack.

Postgres might not replace everything forever, but it can often get you much further than you think.
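
To make the queueing and pub/sub patterns the abstract mentions concrete, here is a minimal sketch (not taken from the talk) of a JSONB-backed job queue using FOR UPDATE SKIP LOCKED plus a NOTIFY wake-up, driven from Python with psycopg2; the table, channel, and connection string are illustrative assumptions.

```python
# Minimal sketch (assumption, not the talk's code) of a Postgres job queue
# using JSONB payloads and FOR UPDATE SKIP LOCKED, driven via psycopg2.
import json
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical connection string

def setup():
    with conn, conn.cursor() as cur:
        cur.execute("""
            CREATE TABLE IF NOT EXISTS jobs (
                id      bigserial PRIMARY KEY,
                payload jsonb NOT NULL,
                status  text  NOT NULL DEFAULT 'pending'
            )
        """)

def enqueue(payload: dict):
    with conn, conn.cursor() as cur:
        cur.execute("INSERT INTO jobs (payload) VALUES (%s)", (json.dumps(payload),))
        cur.execute("NOTIFY jobs_channel")  # wake up any LISTENing workers

def claim_one():
    # SKIP LOCKED lets many workers poll concurrently without blocking each other.
    with conn, conn.cursor() as cur:
        cur.execute("""
            UPDATE jobs
               SET status = 'running'
             WHERE id = (SELECT id FROM jobs
                          WHERE status = 'pending'
                          ORDER BY id
                          FOR UPDATE SKIP LOCKED
                          LIMIT 1)
         RETURNING id, payload
        """)
        return cur.fetchone()  # None if the queue is empty
```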

Flying Beyond Keywords: Our Aviation Semantic Search Journey

In aviation, search isn’t simple—people use abbreviations, slang, and technical terms that make exact matching tricky. We started with just Postgres, aiming for something that worked. Over time, we upgraded: semantic embeddings, reranking. We tackled filter complexity, slow index builds, embedding updates, and much more. Along the way, we learned a lot about making AI search fast, accurate, and actually usable for our users. It’s been a journey—full of turbulence, but worth the landing.

PGlite, a WASM build of PostgreSQL, offers a new way to run and use my favorite database. In this talk, we’ll explore the technology behind PGlite and look at various use cases. I’ll also share a real-world story about how I used it at my company, traide AI, and the challenges I faced—some of which I overcame, while others are still awaiting solutions.

Vectors are a centuries-old, well-studied mathematical concept, yet they pose many challenges around efficient storage and retrieval in database systems. Applications requiring effective search techniques for vectors have advanced, with "retrieval-augmented generation" (RAG) becoming a key technique. An extensible database like PostgreSQL can add vector search through an extension like pgvector.

In this talk, we'll review what vectors are, how they are used in applications, and what users are looking for in vector storage and search systems. We'll then see how you can search for vector data in PostgreSQL, including best practices for using pgvector, by taking a deeper look at how pgvector implements different vector search techniques. We'll also see where traditional database methods are most effective for building RAG-driven apps.

At the end of this talk, you'll have a set of best practices you can use when designing applications that require vector search.
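
As a taste of what pgvector usage looks like in practice, here is a minimal, illustrative sketch (not the speaker's code) using psycopg2; the table, the three-dimensional embeddings, and the HNSW index choice are assumptions made purely for brevity.

```python
# Minimal pgvector sketch (illustrative): store tiny embeddings and run a
# nearest-neighbour search with the <-> (L2 distance) operator.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS items (
            id        bigserial PRIMARY KEY,
            embedding vector(3)
        )
    """)
    cur.execute("INSERT INTO items (embedding) VALUES ('[1,2,3]'), ('[4,5,6]')")
    # An HNSW index trades build time and memory for fast approximate search.
    cur.execute("""
        CREATE INDEX IF NOT EXISTS items_embedding_idx
            ON items USING hnsw (embedding vector_l2_ops)
    """)
    cur.execute("SELECT id FROM items ORDER BY embedding <-> '[2,3,4]' LIMIT 5")
    print(cur.fetchall())
```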

In this session, we’ll walk through how Apache Flink was used to enable near real-time operational insights from manufacturing IIoT datasets. The goal: deliver actionable KPIs to production teams with sub-30-second latency, using streaming data pipelines built with Kafka, Flink, and Grafana. We’ll cover the key architectural patterns that made this possible, including handling structured data joins, managing out-of-order events, and integrating with downstream systems like PostgreSQL and Grafana. We’ll also share real-world performance benchmarks, lessons learned from scaling tests, and practical considerations for deploying Flink in a production-grade, low-latency analytics pipeline. The session will also include a live demo.

If you're building Flink-based solutions for time-sensitive operations—whether in manufacturing, IoT, or other domains—this talk will provide proven insights from the field.
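
For readers new to Flink, the following rough PyFlink SQL sketch shows the general shape of such a pipeline: a Kafka source with an event-time watermark to tolerate out-of-order events, a one-minute KPI aggregation, and a JDBC sink into PostgreSQL. The topic, schema, and connection details are invented for illustration and are not the presenters' actual pipeline.

```python
# Rough PyFlink SQL sketch (assumptions throughout): Kafka source with a
# watermark for out-of-order events, windowed KPI aggregation, Postgres sink.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

t_env.execute_sql("""
    CREATE TABLE machine_events (
        machine_id STRING,
        cycle_ms   DOUBLE,
        event_time TIMESTAMP(3),
        WATERMARK FOR event_time AS event_time - INTERVAL '10' SECOND
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'machine-events',
        'properties.bootstrap.servers' = 'kafka:9092',
        'format' = 'json',
        'scan.startup.mode' = 'latest-offset'
    )
""")

t_env.execute_sql("""
    CREATE TABLE machine_kpis (
        machine_id   STRING,
        window_end   TIMESTAMP(3),
        avg_cycle_ms DOUBLE,
        PRIMARY KEY (machine_id, window_end) NOT ENFORCED
    ) WITH (
        'connector' = 'jdbc',
        'url' = 'jdbc:postgresql://db:5432/iiot',
        'table-name' = 'machine_kpis'
    )
""")

# One-minute tumbling windows keep late (out-of-order) events within the
# watermark bound; results are upserted into Postgres for Grafana to read.
t_env.execute_sql("""
    INSERT INTO machine_kpis
    SELECT machine_id,
           TUMBLE_END(event_time, INTERVAL '1' MINUTE),
           AVG(cycle_ms)
    FROM machine_events
    GROUP BY machine_id, TUMBLE(event_time, INTERVAL '1' MINUTE)
""")
```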

Enterprises want the flexibility to operate across multiple clouds, whether to optimize costs, improve resiliency, avoid vendor lock-in, or meet data sovereignty requirements. But for developers, that flexibility usually comes at the cost of extra complexity and redundant code. The goal here is simple: write once, run anywhere, with minimal boilerplate. In Apache Airflow, we’ve already begun tackling this problem with abstractions like Common-SQL, which lets you write database queries once and run them on 20+ databases, from Snowflake to Postgres to SQLite to SAP HANA. Similarly, Common-IO standardizes cloud blob storage interactions across all public clouds. With Airflow 3.0, we are pushing this further by introducing a Common Message Bus provider: an abstraction initially supporting Amazon SQS and expanding to Google Pub/Sub and Apache Kafka soon after, with additional implementations such as Amazon Kinesis and Managed Kafka expected over time. This talk will dive into why these abstractions matter, how they reduce friction for developers while giving enterprises true multi-cloud optionality, and what’s next for Airflow’s evolving provider ecosystem.
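
To illustrate the Common-SQL abstraction mentioned above, here is a hedged sketch using the provider's SQLExecuteQueryOperator: the same SQL runs against Postgres or Snowflake, and only the connection id changes. The connection ids, DAG id, and query are illustrative assumptions.

```python
# Sketch of Airflow's common-sql abstraction: the same operator runs the same
# query against different databases, selected only by conn_id.
# Connection ids and the query itself are illustrative assumptions.
from datetime import datetime

from airflow import DAG
from airflow.providers.common.sql.operators.sql import SQLExecuteQueryOperator

with DAG("portable_sql_example", start_date=datetime(2025, 1, 1), schedule=None) as dag:
    on_postgres = SQLExecuteQueryOperator(
        task_id="count_orders_postgres",
        conn_id="postgres_default",        # Postgres connection
        sql="SELECT count(*) FROM orders",
    )
    on_snowflake = SQLExecuteQueryOperator(
        task_id="count_orders_snowflake",
        conn_id="snowflake_default",       # Snowflake connection, same SQL
        sql="SELECT count(*) FROM orders",
    )
```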

In this talk, I’ll walk through how we built an end-to-end analytics pipeline using open-source tools (Airbyte, dbt, Airflow, and Metabase). At WirePick, we extract data from multiple sources using Airbyte OSS into PostgreSQL, transform it into business-specific data marts with dbt, and automate the entire workflow using Airflow. Our Metabase dashboards provide real-time insights, and we integrate Slack notifications to alert stakeholders when key business metrics change. This session will cover: data extraction, using Airbyte OSS to pull data from multiple sources; transformation and modeling, showing how dbt helps create reusable data marts; automation and orchestration, managing the workflow with Airflow; and data-driven decision-making, delivering insights through Metabase and Slack alerts.
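
The following is a hedged sketch of that orchestration pattern (not WirePick's actual DAG): an Airbyte sync into Postgres, a dbt build of the marts, and a Slack notification, chained in Airflow. Connection ids, the Airbyte connection UUID, and the dbt project path are placeholder assumptions.

```python
# Hedged sketch of the ELT DAG described above: Airbyte sync -> dbt build ->
# Slack notification. All ids, paths, and messages are illustrative.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.airbyte.operators.airbyte import AirbyteTriggerSyncOperator
from airflow.providers.slack.operators.slack_webhook import SlackWebhookOperator

with DAG("elt_daily", start_date=datetime(2025, 1, 1), schedule="@daily", catchup=False) as dag:
    extract = AirbyteTriggerSyncOperator(
        task_id="airbyte_sync_to_postgres",
        airbyte_conn_id="airbyte_default",
        connection_id="REPLACE-WITH-AIRBYTE-CONNECTION-UUID",
    )
    transform = BashOperator(
        task_id="dbt_build_marts",
        bash_command="dbt build --project-dir /opt/dbt/project --profiles-dir /opt/dbt",
    )
    notify = SlackWebhookOperator(
        task_id="notify_stakeholders",
        slack_webhook_conn_id="slack_default",  # parameter name varies across provider versions
        message="Daily data marts refreshed; Metabase dashboards are up to date.",
    )
    extract >> transform >> notify
```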

Apache Airflow 3 is the new state-of-the-art version of Airflow. For many users planning to adopt Airflow 3, it is important to understand how it behaves from a performance perspective compared to Airflow 2. This presentation presents performance results for various Airflow 3 configurations and gives potential adopters a good understanding of its performance. The reference Airflow 3 configuration uses a Kubernetes cluster as the compute layer and PostgreSQL as the Airflow database, running on Google Cloud Platform. Performance tests are run using the community version of the performance-test framework, with some references to Cloud Composer (the managed service for Apache Airflow). The tests are done in production-grade configurations that can serve as good references for Airflow community users. You will get a comparison of Airflow 3 and Airflow 2 from a performance standpoint, and learn how to optimize Airflow scheduler performance by understanding DAG file processing and task scheduling, and by configuring the scheduler to run tens of thousands of DAGs and tasks in Airflow 3.

A real-world journey of how my small team at Xena Intelligence built robust data pipelines for our enterprise customers using Airflow. If you’re a data engineer, or part of a small team, this talk is for you. Learn how we orchestrated a complex workflow to process millions of public reviews. What you’ll learn: Cost-efficient DAG design: decomposing complex processes into atomic tasks using the TaskFlow API, XComs, mapped tasks, and task groups, and diving into one of our DAGs as a concrete example of how our approach optimizes parallelism, error handling, delivery speed, and reliability. Integrating LLM analysis: how we integrated LLM-based analysis into our pipeline, and how we designed the database, queries, and ingestion into Postgres. Extending the Airflow UI: a custom Airflow UI plugin that filters and visualizes DAG runs by customer, product, and marketplace, delivering clear insights for faster troubleshooting. Leveraging the Airflow REST API: how we trigger DAGs on demand, elevating the UX by tracking mapped DAG progress and computing ETAs. CI/CD and cost management: practical tips for deploying DAGs with CI/CD.
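
As an illustration of the TaskFlow and dynamic task mapping pattern described above (not Xena Intelligence's actual code), here is a minimal DAG that fans out one mapped task per batch of reviews; the batch contents and the "analysis" step are placeholders.

```python
# Hedged sketch of TaskFlow + dynamic task mapping; batches and the analysis
# step are illustrative placeholders.
from datetime import datetime

from airflow.decorators import dag, task


@dag(start_date=datetime(2025, 1, 1), schedule=None, catchup=False)
def review_pipeline():
    @task
    def list_review_batches() -> list[list[int]]:
        # A real DAG would page through millions of reviews here.
        return [[1, 2, 3], [4, 5, 6]]

    @task
    def analyze_batch(batch: list[int]) -> int:
        # Placeholder for per-batch LLM analysis and Postgres ingestion.
        return len(batch)

    @task
    def load_summary(counts: list[int]) -> None:
        print(f"analyzed {sum(counts)} reviews")

    # .expand() creates one mapped, independently retryable task per batch.
    counts = analyze_batch.expand(batch=list_review_batches())
    load_summary(counts)


review_pipeline()
```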

Every relation in PostgreSQL can be damaged, and sometimes the errors reported by the database are rather strange. In some cases, a session reading corrupted data can even crash the whole database. To better understand these issues and test different strategies for repairs, I created a Python application that simulates various types of damage. This talk demonstrates, through practical examples and outputs from the pageinspect extension, different types of data corruption — and proposes some future improvements that would help handle them more effectively.
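
For readers unfamiliar with pageinspect, a query along these lines (the table name and connection string are illustrative) is the kind the talk uses to look at raw heap tuples:

```python
# Illustrative pageinspect usage via psycopg2: dump heap tuple headers from
# the first page of a hypothetical table named damaged_table.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN; pageinspect needs superuser
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS pageinspect")
    cur.execute("""
        SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid
          FROM heap_page_items(get_raw_page('damaged_table', 0))
    """)
    for row in cur.fetchall():
        print(row)
```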

PostgreSQL practitioners often give developers recommendations like "Always use EXPLAIN ANALYZE with BUFFERS" or "Run ANALYZE first". However, these suggestions are rarely accompanied by clear explanations of why they matter. Inspired by the motto "Knowledge of certain principles easily compensates for the lack of knowledge of certain facts," this talk sheds light on key PostgreSQL architectural concepts and their connection to common design and performance best practices. Through a series of increasingly complex SELECT queries, we will explore how PostgreSQL’s internal mechanisms enable safe, fast, and efficient data processing. This session is designed for application developers who want to deepen their understanding of how PostgreSQL executes queries, and how to harness its full potential without accidentally bringing it to its knees.
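
The command behind that advice looks like this; the query, table, and connection string are illustrative, and running it from psycopg2 simply keeps the sketch consistent with the other examples here.

```python
# EXPLAIN (ANALYZE, BUFFERS) executes the query and reports actual timing plus
# shared-buffer hits and reads; ANALYZE refreshes planner statistics first.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("ANALYZE")  # "Run ANALYZE first", as the advice goes
    cur.execute("EXPLAIN (ANALYZE, BUFFERS) SELECT * FROM orders WHERE customer_id = 42")
    print("\n".join(row[0] for row in cur.fetchall()))
```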

Abstract: Instead of using ETL tools, which consume a lot of memory on their own systems, you will learn how to do ETL jobs directly in and with the database. The PostgreSQL implementation of the standard ISO/IEC 9075-9:2016, Management of External Data (SQL/MED), is known as Foreign Data Wrapper (FDW). With Foreign Data Wrappers, there is almost no limit to the external data that you can use directly inside a PostgreSQL database. The talk walks you through the definition of Foreign Data Wrappers as implemented in PostgreSQL. The second part of the talk shows how this technology works, using examples with several data sources.
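
A minimal, illustrative postgres_fdw setup (server name, remote host, credentials, and schemas are assumptions, not examples from the talk) looks roughly like this:

```python
# Illustrative postgres_fdw setup, run from Python for consistency with the
# other sketches; all names, hosts, and credentials are assumptions.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS postgres_fdw")
    cur.execute("""
        CREATE SERVER IF NOT EXISTS legacy_erp
            FOREIGN DATA WRAPPER postgres_fdw
            OPTIONS (host 'erp.internal', dbname 'erp', port '5432')
    """)
    cur.execute("""
        CREATE USER MAPPING IF NOT EXISTS FOR CURRENT_USER
            SERVER legacy_erp OPTIONS (user 'readonly', password 'secret')
    """)
    cur.execute("CREATE SCHEMA IF NOT EXISTS staging")
    cur.execute("IMPORT FOREIGN SCHEMA public FROM SERVER legacy_erp INTO staging")
    # Foreign tables can now be queried (and joined with local tables) like any
    # other table, which is what makes in-database ETL possible.
    cur.execute("SELECT count(*) FROM staging.customers")
    print(cur.fetchone())
```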

PostgreSQL implements transactions and MVCC using tuple versioning and a background vacuum process. This design offers simplicity of concurrency control but has trade-offs, like table and index bloat and the increased maintenance complexity of the vacuum process. OrioleDB is an alternative storage engine for PostgreSQL that introduces undo logs to implement transactions and MVCC. Undo logs allow immediate cleanup of tuples without a separate vacuum process. This talk will examine the trade-offs of PostgreSQL’s current MVCC design, introduce the concept of undo logging, and explain how OrioleDB implements it, providing a technical overview of how undo logs work in OrioleDB.
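
Based on OrioleDB's documented usage, adopting the engine amounts to installing the extension and selecting it as a table access method; the table below is an illustrative assumption, not an example from the talk.

```python
# Hedged sketch of OrioleDB adoption (illustrative): the engine is exposed as
# a table access method selected with USING. Assumes a Postgres build with
# OrioleDB installed and preloaded.
import psycopg2

conn = psycopg2.connect("dbname=app")  # hypothetical DSN
with conn, conn.cursor() as cur:
    cur.execute("CREATE EXTENSION IF NOT EXISTS orioledb")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS accounts (
            id      bigint  PRIMARY KEY,
            balance numeric NOT NULL
        ) USING orioledb
    """)
```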

PostgreSQL Mistakes and How to Avoid Them

Recognize and avoid these common PostgreSQL mistakes! The best mistakes to learn from are ones made by other people! In PostgreSQL Mistakes and How to Avoid Them you’ll explore dozens of common PostgreSQL errors so you can easily avoid them in your own projects, learning proactively why certain approaches fail and others succeed. You’ll learn how to: avoid configuration and operation issues; maximize PostgreSQL utility and performance; fix bad SQL practices; solve common security and administration issues; ensure smooth migrations and upgrades; and diagnose and fix a bad database. As PostgreSQL continues its rise as a leading open source database, mastering its intricacies is crucial. The book is full of tested best practices to ensure top performance and future-proof your database systems for seamless change and growth. Each mistake is carefully described and accompanied by a demo, along with an explanation that expands your knowledge of PostgreSQL internals and helps you build a stronger mental model of how the database engine works.

About the technology: Fixing mistakes in PostgreSQL databases can be time-consuming and risky, especially when you’re making live changes to an in-use system. Fortunately, you can learn from the mistakes other Postgres pros have already made! This incredibly practical book lays out how to find and avoid the most common, dangerous, and sneaky errors you’ll encounter using PostgreSQL.

About the book: PostgreSQL Mistakes and How to Avoid Them identifies Postgres problems in key areas like data types, features, security, and high availability. For each mistake you’ll find a real-world narrative that illustrates the pattern and provides concrete recommendations for improvement. You’ll especially appreciate the illustrative code snippets, schema samples, mind maps, and tables that show the pros and cons of different approaches.

What's inside: diagnose configuration and operation issues; fix bad SQL code; address security and administration issues; ensure smooth migrations and upgrades.

About the reader: For PostgreSQL database administrators and application developers.

About the author: Jimmy Angelakos is a systems and database architect and PostgreSQL Contributor. He works as a Senior Principal Engineer at Deriv.

Quotes:
"I’ve run into many of these mistakes. Read up to get prepared!" - Milorad Imbra, FEVO
"Navigates PostgreSQL pitfalls with clarity. I highly recommend it." - Manohar Sai Jasti, Workday
"A straightforward style and real-world examples make it an essential read." - Potito Coluccelli, Econocom Italia
"Provides valuable tips to avoid common PostgreSQL pitfalls." - Fernando Bugni, Grupo QuintoAndar

Sponsored by: Anomalo | Reconciling IoT, Policy, and Insurer Data to Deliver Better Customer Discounts

As insurers increasingly leverage IoT data to personalize policy pricing, reconciling disparate datasets across devices, policies, and insurers becomes mission-critical. In this session, learn how Nationwide transitioned from prototype workflows in Dataiku to a hardened data stack on Databricks, enabling scalable data governance and high-impact analytics. Discover how the team orchestrates data reconciliation across Postgres, Oracle, and Databricks to align customer driving behavior with insurer and policy data—ensuring more accurate, fair discounts for policyholders. With Anomalo’s automated monitoring layered on top, Nationwide ensures data quality at scale while empowering business units to define custom logic for proactive stewardship. We’ll also look ahead to how these foundations are preparing the enterprise for unstructured data and GenAI initiatives.

Race to Real-Time: Low-Latency Streaming ETL Meets Next-Gen Databricks OLTP-DB

In today’s digital economy, real-time insights and rapid responsiveness are paramount to delivering exceptional user experiences and lowering TCO. In this session, discover a pioneering approach that leverages a low-latency streaming ETL pipeline built with Spark Structured Streaming and Databricks’ new OLTP-DB—a serverless, managed Postgres offering designed for transactional workloads. Validated in a live customer scenario, this architecture achieves sub-2-second end-to-end latency by seamlessly ingesting streaming data from Kinesis and merging it into OLTP-DB. This breakthrough not only enhances performance and scalability but also provides a replicable blueprint for transforming data pipelines across various verticals. Join us as we delve into the advanced optimization techniques and best practices that underpin this innovation, demonstrating how Databricks’ next-generation solutions can revolutionize real-time data processing and unlock a myriad of new use cases across the data landscape.
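
A compressed sketch of that architecture (not the presenters' code) might look like the following PySpark job: read from Kinesis, parse JSON, and write each micro-batch into a Postgres table via JDBC. The stream name, schema, JDBC URL, and credentials are placeholder assumptions, and the "kinesis" source format refers to the Databricks-provided connector.

```python
# Compressed sketch of the described pipeline: Kinesis -> parse JSON ->
# per-micro-batch write into Postgres via JDBC. Names and credentials are
# placeholder assumptions.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col, from_json
from pyspark.sql.types import StringType, StructField, StructType, TimestampType

spark = SparkSession.builder.appName("kinesis_to_oltp").getOrCreate()

schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

events = (
    spark.readStream.format("kinesis")            # Databricks Kinesis connector
    .option("streamName", "orders-stream")
    .option("region", "us-east-1")
    .load()
    .select(from_json(col("data").cast("string"), schema).alias("e"))
    .select("e.*")
)

def upsert_to_postgres(batch_df, batch_id):
    # A real pipeline would MERGE on order_id; append keeps the sketch short.
    (batch_df.write.format("jdbc")
        .option("url", "jdbc:postgresql://oltp-db-host:5432/app")
        .option("dbtable", "orders_latest")
        .option("user", "app")
        .option("password", "***")
        .mode("append")
        .save())

query = events.writeStream.foreachBatch(upsert_to_postgres).start()
```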

Master Schema Translations in the Era of Open Data Lake

Unity Catalog puts a variety of schemas into a centralized repository; now the developer community wants more productivity and automation for schema inference, translation, evolution, and optimization, especially for ingestion and reverse-ETL scenarios with more code generation. The Coinbase Data Platform attempts to pave a path with "Schemaster", which interacts with the data catalog through a (proposed) metadata model to make schema translation and evolution more manageable across some of the popular systems, such as Delta, Iceberg, Snowflake, Kafka, MongoDB, DynamoDB, and Postgres. This lightning talk covers four areas: the complexity and caveats of schema differences among these systems; the proposed field-level metadata model and two translation patterns (point-to-point vs. hub-and-spoke); why data profiling should be augmented to enhance schema understanding and translation; and how to integrate it with ingestion and reverse-ETL in a Databricks-oriented ecosystem. Takeaway: standardize schema lineage and translation.

Lakebase: Fully Managed Postgres for the Lakehouse

Lakebase is a new Postgres-compatible OLTP database designed to support intelligent applications. Lakebase eliminates custom ETL pipelines with built-in lakehouse table synchronization, supports sub-10ms latency for high-throughput workloads, and offers full Postgres compatibility, so you can build applications more quickly. In this session, you’ll learn how Lakebase enables faster development, production-level concurrency, and simpler operations for data engineers and application developers building modern, data-driven applications. We'll walk through key capabilities, example use cases, and how Lakebase simplifies infrastructure while unlocking new possibilities for AI and analytics.