PostgreSQL 18 is planned for release at the end of September 2025. What are some of its new features, and how do they impact you? In this talk, we'll take a whirlwind tour of what to expect in PostgreSQL 18, including new performance features, developer experience improvements, and more. We'll also hear from Peter Geoghegan, the implementor of the new "skip scan" feature that accelerates index lookups!
talk-data.com
Speaker
Jonathan Katz
5 talks
Jonathan Katz is a Principal Product Manager - Technical on the Amazon Redshift team and is based in New York. He's on the Core Team of PostgreSQL and is an active open source contributor, including to pgvector.
Bio from: A PostgreSQL 18 Whirlwind Tour
Talks & appearances
5 activities · Newest first
Vectors are a centuries-old, well-studied mathematical concept, yet they pose many challenges around efficient storage and retrieval in database systems. Applications that require effective vector search have advanced, with "retrieval-augmented generation" (RAG) becoming a key technique for building them. An extensible database like PostgreSQL can add vector search through an extension like pgvector.
In this talk, we'll review what vectors are, how they are used in applications, and what users are looking for in vector storage and search systems. We'll then see how you can search for vector data in PostgreSQL, including best practices for using pgvector, by taking a deeper look at how pgvector implements different vector search techniques. We'll also see where traditional database methods are most effective for building RAG-driven apps.
At the end of this talk, you'll have a set of best practices you can use when designing applications that require vector search.
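As a rough illustration of what "searching for vector data" means, here is a minimal sketch of exact nearest-neighbor search by cosine distance in plain Python. It is not taken from the talk and is not how pgvector is implemented; the function names, document IDs, and sample vectors are hypothetical. pgvector exposes cosine distance in SQL through its `<=>` operator, and an exact search conceptually does what this sketch does: it scores every stored vector against the query and keeps the closest ones.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity for two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def nearest_neighbors(query, items, k=2):
    """Exact k-NN: score every stored vector, then keep the k closest IDs."""
    ranked = sorted(items, key=lambda item: cosine_distance(query, item[1]))
    return [doc_id for doc_id, _ in ranked[:k]]


# Hypothetical (id, embedding) pairs standing in for rows in a table.
documents = [
    ("doc-a", [1.0, 0.0, 0.0]),
    ("doc-b", [0.9, 0.1, 0.0]),
    ("doc-c", [0.0, 1.0, 0.0]),
    ("doc-d", [0.0, 0.0, 1.0]),
]

result = nearest_neighbors([1.0, 0.05, 0.0], documents, k=2)
print(result)
```

Because every row is scored, the cost grows linearly with the number of stored vectors. Approximate indexes trade some recall for speed by avoiding that full scan.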
PostgreSQL makes it easier to store and query vector data for artificial intelligence and machine learning (AI/ML) use cases with the pgvector extension. Learning best practices for vector search will help you deliver a high-performance experience to your customers. In this session, learn how to store data from Amazon Bedrock in an Amazon Aurora PostgreSQL-Compatible Edition database and learn SQL queries and tuning parameters to optimize the performance of your application when working with AI/ML data, vector data types, exact and approximate nearest neighbor search algorithms, and vector-optimized indexing.
Learn more: AWS re:Invent: https://go.aws/reinvent. More AWS events: https://go.aws/3kss9CP
Subscribe: More AWS videos: http://bit.ly/2O3zS75 More AWS events videos: http://bit.ly/316g9t4
About AWS: Amazon Web Services (AWS) hosts events, both online and in-person, bringing the cloud computing community together to connect, collaborate, and learn from AWS experts. AWS is the world's most comprehensive and broadly adopted cloud platform, offering over 200 fully featured services from data centers globally. Millions of customers—including the fastest-growing startups, largest enterprises, and leading government agencies—are using AWS to lower costs, become more agile, and innovate faster.
#AWSreInvent #AWSreInvent2024
In this session, gain the skills needed to deploy end-to-end generative AI applications using your most valuable data. While this session focuses on the Retrieval Augmented Generation (RAG) process, the concepts also apply to other methods of customizing generative AI applications. Discover best practice architectures using AWS database services like Amazon Aurora, Amazon OpenSearch Service, or Amazon MemoryDB along with data processing services like AWS Glue and streaming data services like Amazon Kinesis. Learn data lake, governance, and data quality concepts and how Amazon Bedrock Knowledge Bases, Amazon Bedrock Agents, and other features tie solution components together.
Summary
One of the longest-running and most popular open source database projects is PostgreSQL. Because of its extensibility and a community focus on stability, it has stayed relevant as the ecosystem of development environments and data requirements has changed and evolved over its lifetime. It is difficult to capture any single facet of this database in a single conversation, let alone the entire surface area, but in this episode Jonathan Katz does an admirable job of it. He explains how Postgres started and how it has grown over the years, highlights the fundamental features that make it such a popular choice for application developers, and describes the ongoing efforts to add the complex features needed by the demanding workloads of today’s data layer. To cap it off, he reviews some of the exciting features that the community is working on building into future releases.
Preamble
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline you’ll need somewhere to deploy it, so check out Linode. With private networking, shared block storage, node balancers, and a 40Gbit network, all controlled by a brand new API, you’ve got everything you need to run a bullet-proof data platform. Go to dataengineeringpodcast.com/linode to get a $20 credit and launch a new server in under a minute.
Are you struggling to keep up with customer requests and letting errors slip into production? Want to try some of the innovative ideas in this podcast but don’t have time? DataKitchen’s DataOps software allows your team to quickly iterate and deploy pipelines of code, models, and data sets while improving quality. Unlike a patchwork of manual operations, DataKitchen makes your team shine by providing an end-to-end DataOps solution with minimal programming that uses the tools you love. Join the DataOps movement and sign up for the newsletter at datakitchen.io/de today. After that, learn more about why you should be doing DataOps by listening to the Head Chef in the Data Kitchen at dataengineeringpodcast.com/datakitchen.
Go to dataengineeringpodcast.com to subscribe to the show, sign up for the mailing list, read the show notes, and get in touch.
Join the community in the new Zulip chat workspace at dataengineeringpodcast.com/chat.
Your host is Tobias Macey and today I’m interviewing Jonathan Katz about a high-level view of PostgreSQL and the unique capabilities that it offers.
Interview
Introduction
How did you get involved in the area of data management?
How did you get involved in the Postgres project?
For anyone who hasn’t used it, can you describe what PostgreSQL is?
Where did Postgres get started and how has it evolved over the intervening years?
What are some of the primary characteristics of Postgres that would lead someone to choose it for a given project?
What are some cases where Postgres is the wrong choice?
What are some of the common points of confusion for new users of PostgreSQL (particularly if they have prior database experience)?
The recent releases of Postgres have had some fairly substantial improvements and new features. How does the community manage to balance stability and reliability against the need to add new capabilities?
What are the aspects of Postgres that allow it to remain relevant in the current landscape of rapid evolution at the data layer?
Are there any plans to incorporate a distributed transaction layer into the core of the project along the lines of what has been done with Citus or CockroachDB?
What is in store for the future of Postgres?
Contact Info
@jkatz05 on Twitter
jkatz on GitHub
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Links
PostgreSQL
Crunchy Data
Venuebook
Paperless Post
LAMP Stack
MySQL
PHP
SQL
ORDBMS
Edgar Codd
A Relational Model of Data for Large Shared Data Banks
Relational Algebra
Oracle DB
UC Berkeley
Dr. Michae