talk-data.com
Topic: postgresql (332 tagged)
For programmers, analysts, and database administrators, this Nutshell guide is the essential reference for the SQL language used in today's most popular database products. This new fourth edition clearly documents SQL commands according to the latest ANSI/ISO standard and details how those commands are implemented in Microsoft SQL Server 2019 and Oracle 19c, as well as in the MySQL 8, MariaDB 10.5, and PostgreSQL 14 open source database products. You'll also get a concise overview of the relational database management system (RDBMS) model and a clear-cut explanation of foundational RDBMS concepts, all packed into a succinct, comprehensive, and easy-to-use format. Sections include:
Background on the relational database model, including current and previous SQL standards
Fundamental concepts necessary for understanding relational databases and SQL commands
An alphabetical command reference to SQL statements, according to the SQL:2016 ANSI standard
The implementation of each command by MySQL, Oracle, PostgreSQL, and SQL Server
An alphabetical reference of the ANSI SQL:2016 functions and constructs as well as the vendor implementations
Platform-specific functions unique to each implementation
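A quick illustration of why such a cross-platform reference is useful (my own sketch, using a hypothetical products table, not an excerpt from the book): even something as basic as limiting a result set has a standard spelling and several vendor spellings.

    -- Standard SQL:2016 row limiting (PostgreSQL, Oracle 12c+, SQL Server 2012+, DB2):
    SELECT product_name, list_price
    FROM products
    ORDER BY list_price DESC
    FETCH FIRST 10 ROWS ONLY;

    -- MySQL, MariaDB, and PostgreSQL also accept the non-standard LIMIT clause:
    --   ... ORDER BY list_price DESC LIMIT 10;

    -- SQL Server's proprietary form:
    --   SELECT TOP 10 product_name, list_price FROM products ORDER BY list_price DESC;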
Summary
Unstructured data takes many forms in an organization. From a data engineering perspective, that often means things like JSON files, audio or video recordings, images, etc. Another category of unstructured data that every business deals with comprises PDFs, Word documents, workstation backups, and countless other types of information. Aparavi was created to tame the sprawl of information across machines, datacenters, and clouds so that you can reduce the amount of duplicate data and save time and money on managing your data assets. In this episode Rod Christensen shares the story behind Aparavi and how you can use it to cut costs and gain value for the long tail of your unstructured data.
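Duplicate detection of the kind described here is commonly driven by content hashing. As a rough sketch of the idea (the file_index table and its columns are hypothetical, not Aparavi's actual schema): once every file's checksum is cataloged, finding redundant copies is a simple aggregation.

    -- Rank checksums by how much space their duplicate copies waste
    SELECT checksum,
           COUNT(*) - 1                     AS redundant_copies,
           (COUNT(*) - 1) * MAX(size_bytes) AS reclaimable_bytes
    FROM file_index
    GROUP BY checksum
    HAVING COUNT(*) > 1
    ORDER BY reclaimable_bytes DESC;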
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy-to-consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
Your host is Tobias Macey and today I’m interviewing Rod Christensen about Aparavi, a platform designed to find and unlock the value of data, no matter where it lives.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Aparavi is and the story behind it?
Who are the target customers for Aparavi and how does that inform your product roadmap and messaging?
What are some of the …
Summary
Building a well-rounded and effective data team is an iterative process, and the first hire can set the stage for future success or failure. Trupti Natu has been the first data hire multiple times and has gone through the process of building teams across the different stages of growth. In this episode she shares her thoughts and insights on how to be intentional about establishing your own data team.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking your metadata into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how Atlan’s active metadata platform is helping pioneering data teams like Postman, Plaid, WeWork & Unilever achieve extraordinary things with metadata and escape the chaos.
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
Unstruk is the DataOps platform for your unstructured data. The options for ingesting, organizing, and curating unstructured files are complex, expensive, and bespoke. Unstruk Data is changing that equation with their platform approach to managing your unstructured assets. Built to handle all of your real-world data, from videos and images, to 3D point clouds and geospatial records, to industry-specific file formats, Unstruk streamlines your workflow by converting human hours into machine minutes, and automatically alerting you to insights found in your dark data. Unstruk handles data versioning, lineage tracking, duplicate detection, consistency validation, …
Summary
The best way to make sure that you don’t leak sensitive data is to never have it in the first place. The team at Skyflow decided that the second best way is to build a storage system dedicated to securely managing your sensitive information and making it easy to integrate with your applications and data systems. In this episode Sean Falconer explains the idea of a data privacy vault and how this new architectural element can drastically reduce the potential for making a mistake with how you manage regulated or personally identifiable information.
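To make the architectural idea concrete, here is a minimal sketch of the vault pattern in plain SQL. The schema and names are purely illustrative and not Skyflow's actual design: the application keeps only opaque tokens, while the real values live in an isolated, tightly controlled store.

    -- Vault side: the only place real PII exists
    CREATE TABLE vault_persons (
        token     uuid PRIMARY KEY DEFAULT gen_random_uuid(),  -- built in since PostgreSQL 13; pgcrypto on older versions
        full_name text NOT NULL,
        ssn       text NOT NULL
    );

    -- Application side: safe to copy into warehouses, logs, and analytics tools
    CREATE TABLE customers (
        customer_id  bigint PRIMARY KEY,
        person_token uuid NOT NULL  -- resolved back to PII only through the vault's access-controlled service
    );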
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
Atlan is the metadata hub for your data ecosystem. Instead of locking all of that information into a new silo, unleash its transformative potential with Atlan’s active metadata capabilities. Push information about data freshness and quality to your business intelligence, automatically scale up and down your warehouse based on usage patterns, and let the bots answer those questions in Slack so that the humans can focus on delivering real value. Go to dataengineeringpodcast.com/atlan today to learn more about how you can take advantage of active metadata and escape the chaos.
Modern data teams are dealing with a lot of complexity in their data pipelines and analytical code. Monitoring data quality, tracing incidents, and testing changes can be daunting and often takes hours to days or even weeks. By the time errors have made their way into production, it’s often too late and damage is done. Datafold built automated regression testing to help data and analytics engineers deal with data quality in their pull requests. Datafold shows how a change in SQL code affects your data, both on a statistical level and down to individual rows and values, before it gets merged to production. No more shipping and praying, you can now know exactly what will change in your database! Datafold integrates with all major data warehouses as well as frameworks such as Airflow & dbt and seamlessly plugs into CI workflows. Visit dataengineeringpodcast.com/datafold today to book a demo with Datafold.
Data teams are increasingly under pressure to deliver. According to a recent survey by Ascend.io, 95% in fact reported being at or over capacity. With 72% of data experts reporting demands on their team going up faster than they can hire, it’s no surprise they are increasingly turning to automation. In fact, while only 3.5% report having current investments in automation, 85% of data teams plan on investing in automation in the next 12 months. 85%!!! That’s where our friends at Ascend.io come in. The Ascend Data Automation Cloud provides a unified platform for data ingestion, transformation, orchestration, and observability. Ascend users love its declarative pipelines, powerful SDK, elegant UI, and extensible plug-in architecture, as well as its support for Python, SQL, Scala, and Java. Ascend automates workloads on Snowflake, Databricks, BigQuery, and open source Spark, and can be deployed in AWS, Azure, or GCP. Go to dataengineeringpodcast.com/ascend and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $5,000 when you become a customer.
Your host is Tobias Macey and today I’m interviewing Sean Falconer about the idea of a data privacy vault and how the Skyflow team are working to make it turn-key.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Skyflow is and the story behind it?
What is a "data privacy vault" and how does it differ from strategies such as privacy engineering or existing data governance patterns?
What are the primary use cases and capabilities that you are focused on solving for with Skyflow?
Who is the target customer for Skyflow (e.g. how does it enter an organization)?
How is the Skyflow platform architected?
How have the design and goals of the system changed or evolved over time?
Can you describe the process of integrating with Skyflow at the application level?
For organizations that are building analytical capabilities on top of the data managed in their applications, what are the interactions with Skyflow at each of the stages in the data lifecycle?
One of the perennial problems with distributed systems is the challenge of joining data across machine boundaries. How do you mitigate that problem?
On your website there are different "vaults" advertised in the form of healthcare, fintech, and PII. What are the different requirements across each of those problem domains?
What are the commonalities?
As a relatively new company in an emerging product category, what are some of the customer education challenges that you are facing?
What are the most interesting, innovative, or unexpected ways that you have seen Skyflow used?
What are the most interesting, unexpected, or challenging lessons that you have learned while working on Skyflow?
When is Skyflow the wrong choice?
What do you have planned for the future of Skyflow?
Contact Info
LinkedIn
@seanfalconer on Twitter
Website
Parting Question
From your perspective, what is the biggest gap in the tooling or technology for data management today?
Closing Announcements
Thank you for listening! Don’t forget to check out our other show, Podcast.__init__, to learn about the Python language, its community, and the innovative ways it is being used. Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes. If you’ve learned something or tried out a project from the show then tell us about it! Email [email protected] with your story. To help other people find the show please leave a review on iTunes and tell your friends and co-workers.
Links
Skyflow
Privacy Engineering
Data Governance
Homomorphic Encryption
Polymorphic Encryption
The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA
Support Data Engineering Podcast
Summary
Cloud services have made highly scalable and performant data platforms economical and manageable for data teams. However, they are still challenging to work with and manage for anyone who isn’t in a technical role. Hung Dang understood the need to make data more accessible to the entire organization and created Y42 as a better user experience on top of the "modern data stack". In this episode he shares how he designed the platform to support the full spectrum of technical expertise in an organization and the interesting engineering challenges involved.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their new managed database service you can launch a production-ready MySQL, Postgres, or MongoDB cluster in minutes, with automated backups, 40 Gbps connections from your application hosts, and high-throughput SSDs. Go to dataengineeringpodcast.com/linode today and get a $100 credit to launch a database, create a Kubernetes cluster, or take advantage of all of their other services. And don’t forget to thank them for their continued support of this show!
This episode is brought to you by Acryl Data, the company behind DataHub, the leading developer-friendly data catalog for the modern data stack. Open Source DataHub is running in production at several companies like Peloton, Optum, Udemy, Zynga and others. Acryl Data provides DataHub as an easy-to-consume SaaS product which has been adopted by several companies. Sign up for the SaaS product at dataengineeringpodcast.com/acryl
RudderStack helps you build a customer data platform on your warehouse or data lake. Instead of trapping data in a black box, they enable you to easily collect customer data from the entire stack and build an identity graph on your warehouse, giving you full visibility and control. Their SDKs make event streaming from any app or website easy, and their state-of-the-art reverse ETL pipelines enable you to send enriched data to any cloud tool. Sign up free… or just get the free t-shirt for being a listener of the Data Engineering Podcast at dataengineeringpodcast.com/rudder.
The most important piece of any data project is the data itself, which is why it is critical that your data source is high quality. PostHog is your all-in-one product analytics suite including product analysis, user funnels, feature flags, experimentation, and it’s open source so you can host it yourself or let them do it for you! You have full control over your data and their plugin system lets you integrate with all of your other data tools, including data warehouses and SaaS platforms. Give it a try today with their generous free tier at dataengineeringpodcast.com/posthog
Your host is Tobias Macey and today I’m interviewing Hung Dang about Y42, the full-stack data platform that anyone can run.
Interview
Introduction
How did you get involved in the area of data management?
Can you describe what Y42 is and the story behind it?
How would you characterize your positioning in the data ecosystem?
What are the problems that you are trying to solve?
Who are the personas that you optimize for and how does that manifest in your product design and feature priorities?
How is the Y42 platform implemented?
What are the core engineering problems that you have had to address in order to tie together the various underlying services that you integrate?
How have the design and goals of the product changed or evolved since you started working on it?
What are the sharp edges and failure conditions that you have had to automate around in order to support non-technical users?
What is the process for integrating Y42 with an organization’s data systems?
What is the story for onboarding from existing systems and importing workflows (e.g. Airflow DAGs)?
PostgreSQL 14 Administration Cookbook provides a hands-on guide to mastering the administration of PostgreSQL 14. With over 175 recipes, this book equips you with practical techniques to manage, secure, and optimize your PostgreSQL databases, ensuring they are robust and high-performing.
What this Book will help me do:
Master managing PostgreSQL databases both on-premises and in the cloud efficiently.
Implement effective backup and recovery strategies to secure your data.
Leverage the latest features of PostgreSQL 14 to enhance your database workflows.
Understand and apply best practices for maintaining high availability and performance.
Troubleshoot real-world challenges with guided solutions and expert insights.
Author(s): Simon Riggs and Gianni Ciolli are seasoned database experts with years of experience working with PostgreSQL. Simon is a PostgreSQL core team member, contributing his technical knowledge towards building robust database solutions, while Gianni brings a wealth of expertise in database administration and support. Together, they share a passion for making complex database concepts accessible and actionable.
Who is it for? This book is for database administrators, data architects, and developers who manage PostgreSQL databases and are looking to deepen their knowledge. It is suitable for professionals with some experience in PostgreSQL who aim to maximize their database's performance and security, as well as for those new to the system seeking a comprehensive start. Readers with an interest in practical, problem-solving approaches to database management will greatly benefit from this cookbook.
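As a taste of the kind of administrative recipe such a cookbook collects (this particular query is my own sketch, not taken from the book), here is a standard way to find the tables consuming the most disk, a common first step in capacity planning:

    SELECT n.nspname AS schema_name,
           c.relname AS table_name,
           pg_size_pretty(pg_total_relation_size(c.oid)) AS total_size
    FROM pg_class c
    JOIN pg_namespace n ON n.oid = c.relnamespace
    WHERE c.relkind = 'r'                                      -- ordinary tables only
      AND n.nspname NOT IN ('pg_catalog', 'information_schema')
    ORDER BY pg_total_relation_size(c.oid) DESC
    LIMIT 10;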
Practical SQL is an approachable and fast-paced guide to SQL (Structured Query Language), the standard programming language for defining, organizing, and exploring data in relational databases. Anthony DeBarros, a journalist and data analyst, focuses on using SQL to find the story within your data. The examples and code use the open-source database PostgreSQL and its companion pgAdmin interface, and the concepts you learn will apply to most database management systems, including MySQL, Oracle, SQLite, and others.* You’ll first cover the fundamentals of databases and the SQL language, then build skills by analyzing data from real-world datasets such as US Census demographics, New York City taxi rides, and earthquakes from the US Geological Survey. Each chapter includes exercises and examples that teach even those who have never programmed before all the tools necessary to build powerful databases and access information quickly and efficiently. You’ll learn how to:
• Create databases and related tables using your own data
• Aggregate, sort, and filter data to find patterns
• Use functions for basic math and advanced statistical operations
• Identify errors in data and clean them up
• Analyze spatial data with a geographic information system (PostGIS)
• Create advanced queries and automate tasks
This updated second edition has been thoroughly revised to reflect the latest in SQL features, including additional advanced query techniques for wrangling data. This edition also has two new chapters: an expanded set of instructions for setting up your system plus a chapter on using PostgreSQL with the popular JSON data interchange format. Learning SQL doesn’t have to be dry and complicated. Practical SQL delivers clear examples with an easy-to-follow approach to teach you the tools you need to build and manage your own databases.
* Microsoft SQL Server employs a variant of the language called T-SQL, which is not covered by Practical SQL.
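The aggregate-sort-filter pattern listed above looks like this in PostgreSQL (a sketch with hypothetical table and column names, not the book's exact dataset):

    -- Which states have the largest population across their big counties?
    SELECT state_abbr,
           SUM(population) AS total_population
    FROM county_census
    WHERE population > 100000            -- filter
    GROUP BY state_abbr                  -- aggregate
    ORDER BY total_population DESC;      -- sort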
In PostGIS in Action, Third Edition you will learn:
An introduction to spatial databases
Geometry, geography, raster, and topology spatial types, functions, and queries
Applying PostGIS to real-world problems
Extending PostGIS to web and desktop applications
Querying data from external sources using PostgreSQL Foreign Data Wrappers
Optimizing queries for maximum speed
Simplifying geometries for greater efficiency
PostGIS in Action, Third Edition teaches readers of all levels to write spatial queries for PostgreSQL. You’ll start by exploring vector-, raster-, and topology-based GIS before quickly progressing to analyzing, viewing, and mapping data. This fully updated third edition covers key changes in PostGIS 3.1 and PostgreSQL 13, including parallelization support, partitioned tables, and new JSON functions that help in creating web mapping applications.
About the Technology: PostGIS is a spatial database extender for PostgreSQL. It offers the features and firepower you need to take on nearly any geodata task. PostGIS lets you create location-aware queries with a few lines of SQL code, then build the backend for a mapping, raster analysis, or routing application with minimal effort.
About the Book: PostGIS in Action, Third Edition shows you how to solve real-world geodata problems. You’ll go beyond basic mapping and explore custom functions for your applications. Inside this fully updated edition, you’ll find coverage of new PostGIS features such as PostGIS window functions, parallelization of queries, and outputting data for applications using JSON and Vector Tile functions.
What's Inside:
Fully revised for PostGIS version 3.1 and PostgreSQL 13
Optimize queries for maximum speed
Simplify geometries for greater efficiency
Extend PostGIS to web and desktop applications
About the Reader: For readers familiar with relational databases and basic SQL. No prior geodata or GIS experience required.
About the Authors: Regina Obe and Leo Hsu are database consultants and authors. Regina is a member of the PostGIS core development team and the Project Steering Committee.
Quotes:
"The best introduction I’ve seen for engineers who want to get ramped up quickly and build advanced GIS applications." - Ikechukwu Okonkwo, Orum.io
"A wealth of information that showcases how powerful PostGIS is." - Luis Moux-Dominguez, EMO
"An extraordinary book for the world of GIS. Truly learned a lot!" - DeUndre’ Rushon, DigiDiscover LLC
"Gives you insight into how best to provide map services for a wide audience." - Marcus Brown, Enel Green Power
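To give a flavor of the spatial SQL that PostGIS enables, here is a brief sketch (table, column, and coordinates are hypothetical): find every point of interest within one kilometre of a location.

    SELECT name
    FROM points_of_interest
    WHERE ST_DWithin(
        geog,                                        -- a geography-typed column
        ST_GeogFromText('POINT(-73.9857 40.7484)'),  -- longitude latitude
        1000                                         -- distance in metres for geography types
    );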
If you use SQL in your day-to-day work as a data analyst, data scientist, or data engineer, this popular pocket guide is your ideal on-the-job reference. You'll find many examples that address the language's complexities, along with key aspects of SQL used in Microsoft SQL Server, MySQL, Oracle Database, PostgreSQL, and SQLite. In this updated edition, author Alice Zhao describes how these database management systems implement SQL syntax for both querying and making changes to a database. You'll find details on data types and conversions, regular expression syntax, window functions, pivoting and unpivoting, and more.
Quickly look up how to perform specific tasks using SQL
Apply the book's syntax examples to your own queries
Update SQL queries to work in five different database management systems
NEW: Connect Python and R to a relational database
NEW: Look up frequently asked SQL questions in the "How Do I?" chapter
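Window functions, one of the topics mentioned above, follow the same basic shape in all five systems; a small sketch with a hypothetical employees table:

    -- Rank employees by salary within each department without collapsing rows
    SELECT employee_name,
           department,
           salary,
           RANK() OVER (PARTITION BY department ORDER BY salary DESC) AS dept_rank
    FROM employees;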
In "Developing Modern Database Applications with PostgreSQL", you will master the art of building database applications with the highly available and scalable PostgreSQL. Walk through a series of real-world projects that fully explore both the developmental and administrative aspects of PostgreSQL, all tied together through the example of a banking application. What this Book will help me do Set up high-availability PostgreSQL clusters using modern best practices. Monitor and tune database performance to handle enterprise-level workloads seamlessly. Automate testing and implement test-driven development strategies for robust applications. Leverage PostgreSQL along with DevOps pipelines to deploy applications on cloud platforms. Develop APIs and geospatial databases using popular tools like PostgREST and PostGIS. Author(s) The authors of this book, None Le and None Diaz, are experienced professionals in database technologies and software development. With a passion for PostgreSQL and its applications in modern computing, they bring a wealth of expertise and a practical approach to this book. Their methods focus on real-world applicability, ensuring that readers gain hands-on skills and practical knowledge. Who is it for? This book is perfect for database developers, administrators, and architects who want to advance their expertise in PostgreSQL. It is also suitable for software engineers and IT professionals aiming to tackle end-to-end database development projects. A basic knowledge of PostgreSQL and Linux will help you dive into the hands-on projects easily. If you're looking to take your PostgreSQL skills to the next level, this book is for you.
Summary
There is a lot of attention on the database market and cloud data warehouses. While they provide a measure of convenience, they also require you to sacrifice a certain amount of control over your data. If you want to build a warehouse that gives you both control and flexibility then you might consider building on top of the venerable PostgreSQL project. In this episode Thomas Richter and Joshua Drake share their advice on how to build a production-ready data warehouse with Postgres.
Announcements
Hello and welcome to the Data Engineering Podcast, the show about modern data management.
When you’re ready to build your next pipeline, or want to test out the projects you hear about on the show, you’ll need somewhere to deploy it, so check out our friends at Linode. With their managed Kubernetes platform it’s now even easier to deploy and scale your workflows, or try out the latest Helm charts from tools like Pulsar and Pachyderm. With simple pricing, fast networking, object storage, and worldwide data centers, you’ve got everything you need to run a bulletproof data platform. Go to dataengineeringpodcast.com/linode today and get a $100 credit to try out a Kubernetes cluster of your own. And don’t forget to thank them for their continued support of this show!
Firebolt is the fastest cloud data warehouse. Visit dataengineeringpodcast.com/firebolt to get started. The first 25 visitors will receive a Firebolt t-shirt.
Atlan is a collaborative workspace for data-driven teams, like GitHub for engineering or Figma for design teams. By acting as a virtual hub for data assets ranging from tables and dashboards to SQL snippets & code, Atlan enables teams to create a single source of truth for all their data assets, and collaborate across the modern data stack through deep integrations with tools like Snowflake, Slack, Looker and more. Go to dataengineeringpodcast.com/atlan today and sign up for a free trial. If you’re a data engineering podcast listener, you get credits worth $3,000 on an annual subscription.
Your host is Tobias Macey and today I’m interviewing Thomas Richter and Joshua Drake about using Postgres as your data warehouse.
Interview
Introduction
How did you get involved in the area of data management?
Can you start by establishing a working definition of what constitutes a data warehouse for the purpose of this discussion?
What are the limitations for out-of-the-box Postgres when trying to use it for these workloads?
There are a large and growing number of options for data warehouse style workloads. How would you categorize the different systems and what is PostgreSQL’s position in that ecosystem?
What do you see as the motivating factors for a team or organization to select from among those categories?
Why would someone want to use Postgres as their data warehouse platform rather than using a purpose-built engine?
What is the cost/performance equation for Postgres as compared to other data warehouse solutions?
For someone who wants to turn Postgres into a data warehouse engine, what are their options?
What are the relative tradeoffs of the different open source and commercial offerings? (e.g. Citus, cstore_fdw, zedstore, Swarm64, Greenplum, etc.)
One of the biggest areas of growth right now is in the "cloud data warehouse" market where storage and compute are decoupled. What are the options for making that possible with Postgres? (e.g. using foreign data wrappers for interacting with data lake storage (S3, HDFS, Alluxio, etc.); a sketch of the FDW mechanics follows after this list)
What areas of work are happening in the Postgres community for upcoming releases to make it more easily suited to data warehouse/analytical workloads?
What are some of the most interesting, innovative, or unexpected ways that you have seen Postgres used in analytical contexts?
What are the most interesting, unexpected, or challenging lessons that you have learned from your own experiences of building analytical systems with Postgres?
When is Postgres the wrong choice for a data warehouse?
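For reference, the foreign data wrapper mechanics mentioned above look roughly like this. The sketch uses the core postgres_fdw extension with made-up connection details; the S3/HDFS wrappers referenced in the question are third-party extensions but follow the same CREATE SERVER / foreign table pattern:

    CREATE EXTENSION IF NOT EXISTS postgres_fdw;
    CREATE SCHEMA IF NOT EXISTS staging;

    -- Describe the remote source (hypothetical host and database)
    CREATE SERVER analytics_src
        FOREIGN DATA WRAPPER postgres_fdw
        OPTIONS (host 'reporting-replica.internal', dbname 'sales', port '5432');

    CREATE USER MAPPING FOR CURRENT_USER
        SERVER analytics_src
        OPTIONS (user 'readonly', password 'secret');

    -- Expose a remote table locally; predicates are pushed down where possible
    IMPORT FOREIGN SCHEMA public LIMIT TO (orders)
        FROM SERVER analytics_src INTO staging;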
Write optimized queries. This book helps you write queries that perform fast and deliver results on time. You will learn that query optimization is not a dark art practiced by a small, secretive cabal of sorcerers. Any motivated professional can learn to write efficient queries from the get-go and capably optimize existing queries. You will learn to look at the process of writing a query from the database engine’s point of view, and know how to think like the database optimizer.
The book begins with a discussion of what a performant system is and progresses to measuring performance and setting performance goals. It introduces different classes of queries and optimization techniques suitable to each, such as the use of indexes and specific join algorithms. You will learn to read and understand query execution plans along with techniques for influencing those plans for better performance. The book also covers advanced topics such as the use of functions and procedures, dynamic SQL, and generated queries. All of these techniques are then used together to produce performant applications, avoiding the pitfalls of object-relational mappers.
What You Will Learn:
Identify optimization goals in OLTP and OLAP systems
Read and understand PostgreSQL execution plans
Distinguish between short queries and long queries
Choose the right optimization technique for each query type
Identify indexes that will improve query performance
Optimize full table scans
Avoid the pitfalls of object-relational mapping systems
Optimize the entire application rather than just database queries
Who This Book Is For: IT professionals working in PostgreSQL who want to develop performant and scalable applications, anyone whose job title contains the words “database developer” or “database administrator” or who is a backend developer charged with programming database calls, and system architects involved in the overall design of application systems running against a PostgreSQL database.
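Reading execution plans is central to the approach described above; in PostgreSQL the starting point is EXPLAIN (a generic sketch with hypothetical tables, not an excerpt from the book):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT o.order_id, c.customer_name
    FROM orders o
    JOIN customers c ON c.customer_id = o.customer_id
    WHERE o.created_at >= DATE '2024-01-01';
    -- ANALYZE actually executes the query and reports real row counts and timings;
    -- BUFFERS adds shared-buffer hit/read statistics to each plan node.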
The "PostgreSQL 13 Cookbook" is your step-by-step resource for mastering PostgreSQL 13. Explore over 120 recipes, solving both common and advanced database management challenges, with a focus on high performance, fault tolerance, and cutting-edge features. What this Book will help me do Master the implementation of backup and recovery strategies tailored for PostgreSQL 13. Set up robust high availability clusters ensuring seamless failover with PostgreSQL replication features. Improve performance using optimization techniques specific to PostgreSQL 13 databases. Secure your databases with advanced authentication, encryption, and auditing measures. Analyze and monitor PostgreSQL servers to identify performance bottlenecks and maintain uptime efficiently. Author(s) Vallarapu Naga Avinash Kumar is an experienced PostgreSQL architect and developer who brings years of expertise in designing and managing enterprise-level databases. He has authored resources that simplify complex technical concepts for readers. His meticulous and straightforward writing approach empowers readers to skillfully apply PostgreSQL concepts in real-world scenarios. Who is it for? This book is perfect for database administrators, architects, and developers aiming to master PostgreSQL 13 capabilities. If you have prior experience with PostgreSQL and SQL, this cookbook will be a reliable reference to solve challenges and optimize your database solutions. If you're designing or managing databases, you'll find practical insights and actionable recipes tailored to your needs.
Dive into PostgreSQL 13 with this comprehensive guide that equips you to build, manage, and optimize database applications using state-of-the-art features. With a strong focus on hands-on insights, this book covers everything from SQL functions to advanced replication, helping you to enhance your database management expertise.
What this Book will help me do:
Understand and utilize advanced SQL features to increase database efficiency.
Optimize your PostgreSQL queries for improved performance in applications.
Implement robust backup, recovery, and replication strategies for data integrity.
Migrate seamlessly from Oracle to PostgreSQL using proven strategies.
Strengthen server security to safeguard sensitive data in your PostgreSQL system.
Author(s): Hans-Jürgen Schönig is a renowned PostgreSQL expert with decades of experience in database administration and consulting. He has guided companies across the globe to leverage the power of PostgreSQL, achieving high performance and reliability in their applications. His clear, methodical, and practical approach makes complex topics accessible to database professionals.
Who is it for? This book is crafted for PostgreSQL database administrators and developers with some prior experience. If you are looking to refine your skills and adopt advanced features in PostgreSQL 13 to enhance performance and manageability, this book is ideal for you. It is best suited for individuals who aim to make their database applications more secure and robust.
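One example of the advanced SQL territory such a guide covers (a generic sketch with a hypothetical sales table, not an excerpt): GROUPING SETS compute several aggregation levels in a single pass.

    -- Revenue by region, by region and product, and the grand total, in one query
    SELECT region, product, SUM(amount) AS revenue
    FROM sales
    GROUP BY GROUPING SETS ((region), (region, product), ());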
You may know SQL basics, but are you taking advantage of its expressive power? This second edition applies a highly practical approach to Structured Query Language (SQL) so you can create and manipulate large stores of data. Based on real-world examples, this updated cookbook provides a framework to help you construct solutions and executable examples in several flavors of SQL, including Oracle, DB2, SQL Server, MySQL, and PostgreSQL.
SQL programmers, analysts, data scientists, database administrators, and even relatively casual SQL users will find SQL Cookbook to be a valuable problem-solving guide for everyday issues. No other resource offers recipes in this unique format to help you tackle nagging day-to-day conundrums with SQL. The second edition includes:
Fully revised recipes that recognize the greater adoption of window functions in SQL implementations
Additional recipes that reflect the widespread adoption of common table expressions (CTEs) for more readable, easier-to-implement solutions
New recipes to make SQL more useful for people who aren't database experts, including data scientists
Expanded solutions for working with numbers and strings
Up-to-date SQL recipes throughout the book to guide you through the basics
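A typical recipe shape combining the two features highlighted above (my sketch in PostgreSQL syntax, not an excerpt from the book): a CTE to aggregate, then a window function over the result.

    -- Month-over-month revenue change from a hypothetical orders table
    WITH monthly AS (
        SELECT date_trunc('month', order_date) AS month,
               SUM(order_total) AS revenue
        FROM orders
        GROUP BY 1
    )
    SELECT month,
           revenue,
           revenue - LAG(revenue) OVER (ORDER BY month) AS change_vs_prior_month
    FROM monthly;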
Dive into the world of PostgreSQL, one of the most powerful and versatile open-source relational databases! This book guides you through all the essentials of PostgreSQL versions 12 and 13, from installation to high-performance database deployments. You'll learn how to design schemas, perform database operations efficiently, and implement advanced functionalities.
What this Book will help me do:
Install, configure, and monitor a PostgreSQL server for optimal performance.
Implement SQL and PL/pgSQL scripts to build complex database solutions.
Analyze and optimize database schemas and indexes for efficiency.
Secure a PostgreSQL database and manage roles and permissions effectively.
Set up high-availability configurations through replication techniques.
Author(s): Luca Ferrari and Enrico Pirozzi are seasoned database professionals with extensive experience in PostgreSQL. They bring practical expertise and a real-world perspective to the subject, ensuring you get hands-on knowledge and apply it effectively. Their approachable writing style simplifies even the most complex database concepts.
Who is it for? This book is perfect for database professionals, developers, or tech enthusiasts looking to gain mastery over PostgreSQL. Whether you are new to PostgreSQL or have a fundamental understanding of databases, you'll find this book highly insightful in achieving your database management goals.
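Role and permission management, one of the skills listed above, follows a small set of idioms in PostgreSQL; a sketch with hypothetical role, database, and schema names:

    CREATE ROLE analyst LOGIN PASSWORD 'change-me';
    GRANT CONNECT ON DATABASE appdb TO analyst;
    GRANT USAGE ON SCHEMA reporting TO analyst;
    GRANT SELECT ON ALL TABLES IN SCHEMA reporting TO analyst;
    -- Cover tables created later, too
    ALTER DEFAULT PRIVILEGES IN SCHEMA reporting GRANT SELECT ON TABLES TO analyst;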
Program for data analysis using R and learn practical skills to make your work more efficient. This revised book explores how to automate running code and the creation of reports to share your results, as well as writing functions and packages. It includes key R 4 features such as a new color palette for charts, an enhanced reference counting system, and normalization of matrix and array types where matrix objects now formally inherit from the array class, eliminating inconsistencies. Advanced R 4 Data Programming and the Cloud is not designed to teach advanced R programming nor to teach the theory behind statistical procedures. Rather, it is designed to be a practical guide moving beyond merely using R; it shows you how to program in R to automate tasks. This book will teach you how to manipulate data in modern R structures and includes connecting R to databases such as PostgreSQL, cloud services such as Amazon Web Services (AWS), and digital dashboards such as Shiny. Each chapter also includes a detailed bibliography with references to research articles and other resources that cover relevant conceptual and theoretical topics.
What You Will Learn:
Write and document R functions using R 4
Make an R package and share it via GitHub or privately
Add tests to R code to ensure it works as intended
Use R to talk directly to databases and do complex data management
Run R in the Amazon cloud
Deploy a Shiny digital dashboard
Generate presentation-ready tables and reports using R
Who This Book Is For: Working professionals, researchers, and students who are familiar with R and basic statistical techniques such as linear regression and who want to learn how to take their R coding and programming to the next level.
This talk describes how Airflow is utilized in an autonomous driving project originating from Munich, Germany. We describe the Airflow setup, the challenges we encountered, and how we maneuvered to achieve a distributed and highly scalable Airflow deployment. One of the biggest automotive manufacturers elected to go with Airflow as an orchestration tool in the pursuit of producing their first Level-3 autonomous driving vehicle in Germany. In this talk, we will describe the journey of deploying Airflow on top of OpenShift using a PostgreSQL database + RabbitMQ. We will describe how we achieve high availability for the different Airflow components. We will tackle issues related to database performance and failover recovery for the different Airflow components in our setup. In addition, we will present the bottlenecks we encountered with (1) the Airflow scheduler (especially with complex DAGs) and (2) the SparkSubmitOperator, and describe how we mitigated both. We will also describe how we leverage OpenShift to dynamically scale our Airflow deployment based on the running workloads. The talk will conclude with a brief overview of future requirements and features we believe will be helpful for the community.
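Database-performance investigations of the kind described are usually grounded in PostgreSQL's own statistics views; a generic diagnostic like the following (my sketch, not the team's actual tooling) surfaces the sessions keeping the metadata database busy:

    -- Longest-running non-idle sessions, e.g. scheduler queries during DAG-parsing spikes
    SELECT pid,
           state,
           now() - query_start AS runtime,
           left(query, 80)     AS query_preview
    FROM pg_stat_activity
    WHERE state <> 'idle'
    ORDER BY runtime DESC NULLS LAST;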