Learn PostgreSQL - Luca Ferrari - E-Book

Learn PostgreSQL E-Book

Luca Ferrari

0,0
29,99 €

-100%
Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.
Beschreibung

The latest edition of this PostgreSQL book will help you to start using PostgreSQL from absolute scratch, helping you to quickly understand the internal workings of the database. With a structured approach and practical examples, go on a journey that covers the basics, from SQL statements and how to run server-side programs, to configuring, managing, securing, and optimizing database performance.
This new edition will not only help you get to grips with all the recent changes within the PostgreSQL ecosystem but will also dig deeper into concepts like partitioning and replication with a fresh set of examples. The book is also equipped with Docker images for each chapter which makes the learning experience faster and easier. Starting with the absolute basics of databases, the book sails through to advanced concepts like window functions, logging, auditing, extending the database, configuration, partitioning, and replication. It will also help you seamlessly migrate your existing database system to PostgreSQL and contains a dedicated chapter on disaster recovery. Each chapter ends with practice questions to test your learning at regular intervals.
By the end of this book, you will be able to install, configure, manage, and develop applications against a PostgreSQL database.

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

Seitenzahl: 862

Veröffentlichungsjahr: 2023

Bewertungen
0,0
0
0
0
0
0
Mehr Informationen
Mehr Informationen
Legimi prüft nicht, ob Rezensionen von Nutzern stammen, die den betreffenden Titel tatsächlich gekauft oder gelesen/gehört haben. Wir entfernen aber gefälschte Rezensionen.



Learn PostgreSQL

Second Edition

Use, manage, and build secure and scalable databases with PostgreSQL 16

Luca Ferrari

Enrico Pirozzi

BIRMINGHAM—MUMBAI

Learn PostgreSQL

Second Edition

Copyright © 2023 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Senior Publishing Product Manager: Gebin George

Acquisition Editor – Peer Reviews: Gaurav Gavas

Project Editor: Meenakshi Vijay

Content Development Editor: Elliot Dallow

Copy Editor: Safis Editing

Technical Editor: Kushal Sharma

Proofreader: Safis Editing

Indexer: Pratik Shirodkar

Presentation Designer: Rajesh Shirsath

Developer Relations Marketing Executive: Vignesh Raju

First published: October 2020

Second edition: October 2023

Production reference: 1251023

Published by Packt Publishing Ltd.

Grosvenor House

11 St Paul’s Square

Birmingham

B3 1RB, UK.

ISBN 978-1-83763-564-1

www.packt.com

To my beautiful wife, Emanuela; I love her like Santa loves his reindeer.

To my great son, Diego, who changed our lives on 1283788200.

To my parents, Miriam and Anselmo: my greatest fans since day one.

– Luca Ferrari

In loving memory of my father, Ilario.

– Enrico Pirozzi

Contributors

About the authors

Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master’s degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an adjunct professor by Nipissing University. An avid Unix user, he is a strong advocate of open-source, and in his free time, he collaborates on a few projects. He first encountered PostgreSQL back in the days of release 7.3; he was a founder and former president of the Italian PostgreSQL Users’ Group (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.

Enrico Pirozzi has been passionate about computer science since he was a 13-year-old. His first computer was a Commodore 64, and today he holds a master’s degree from the University of Bologna. He has participated as a speaker at national and international conferences on PostgreSQL. He first encountered PostgreSQL back in release 7.2, he was a co-founder of the first PostgreSQL Italian mailing list and the first Italian PostgreSQL website, and he talks regularly at technical conferences and events and delivers professional training. Right now, he is employed as a PostgreSQL database administrator at Zucchetti Hospitality (Zucchetti Group S.p.a).

About the reviewers

Chris Mair holds a master’s degree from the University of Trento, Italy, and has been freelance since 2003. His portfolio consists of contributions to over 25 companies, including consultancy work on database programming, performance optimization, and seamless migrations. Chris has expertise in system and network programming, data processing, ML, and more. He has a particular affinity for PostgreSQL. He has taught over 200 courses on various IT topics and is passionate about open-source software.

Silvio Trancanella is a software engineer with around 12 years of experience in backend development, mainly using Java Enterprise and PostgreSQL. He has always been fascinated by database management and was immediately drawn to PostgreSQL from the very beginning of his career.He worked for about 10 years on tourism industry software, developing and maintaining critical services that relied on the PostgreSQL DBMS.

Learn more on Discord

To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:

https://discord.gg/jYWCjF6Tku

Contents

Preface

Who this book is for

What this book covers

To get the most out of this book

Get in touch

Introduction to PostgreSQL

Technical requirements

PostgreSQL at a glance

A brief history of PostgreSQL

What’s new in PostgreSQL 16?

PostgreSQL release policy, version numbers, and life cycle

Exploring PostgreSQL terminology

Installing PostgreSQL

What to install

Installing PostgreSQL from binary packages

Using the book’s Docker images

Installing PostgreSQL on GNU/Linux Debian, Ubuntu, and derivatives

Installing PostgreSQL on Fedora Linux

Installing PostgreSQL on FreeBSD

Installing PostgreSQL from sources

Installing PostgreSQL via pgenv

Summary

References

Learn more on Discord

Getting to Know Your Cluster

Technical requirements

Managing your cluster

pg_ctl

PostgreSQL processes

Connecting to the cluster

The template databases

The psql command-line client

Entering SQL statements via psql

A glance at the psql commands

Introducing the connection string

Solving common connection problems

Database “foo” does not exist

Connection refused

No pg_hba.conf entry

Exploring the disk layout of PGDATA

Objects in the PGDATA directory

Tablespaces

Exploring configuration files and parameters

Summary

Verify your knowledge

References

Learn more on Discord

Managing Users and Connections

Technical requirements

Introduction to users and groups

Managing roles

Creating new roles

Role passwords, connections, and availability

Using a role as a group

Removing an existing role

Inspecting existing roles

Managing incoming connections at the role level

The syntax of pg_hba.conf

Order of rules in pg_hba.conf

Merging multiple rules into a single one

Using groups instead of single roles

Using files instead of single roles

Inspecting pg_hba.conf rules

Including other files in pg_hba.conf

Summary

Verify your knowledge

References

Learn more on Discord

Basic Statements

Technical requirements

Using the Docker image

Connecting the database

Creating and managing databases

Creating a database

Managing databases

Introducing schemas

PostgreSQL and the public schema

The search_path variable

The correct way to start working

Listing all tables

Making a new database from a modified template

Dropping tables and databases

Dropping tables

Dropping databases

Making a database copy

Confirming the database size

The psql method

The SQL method

Behind the scenes of database creation

Managing tables

The EXISTS option

Managing temporary tables

Managing unlogged tables

Creating a table

Understanding basic table manipulation statements

Inserting and selecting data

NULL values

Sorting with NULL values

Creating a table starting from another table

Updating data

Deleting data

Summary

Verify your knowledge

References

Learn more on Discord

Advanced Statements

Technical requirements

Exploring the SELECT statement

Using the like clause

Using ilike

Using distinct

Using limit and offset

Using subqueries

Subqueries and the IN/NOT IN condition

Subqueries and the EXISTS/NOT EXISTS condition

Learning about joins

Using INNER JOIN

INNER JOIN versus EXISTS/IN

Using LEFT JOINS

Using RIGHT JOIN

Using FULL OUTER JOIN

Using LATERAL JOIN

Aggregate functions

UNION/UNION ALL

EXCEPT/INTERSECT

Using UPSERT

UPSERT – the PostgreSQL way

Learning the RETURNING clause for INSERT

Returning tuples out of queries

UPDATE related to multiple records

MERGE

Exploring UPDATE ... RETURNING

Exploring DELETE ... RETURNING

Exploring CTEs

CTE concept

CTE in PostgreSQL since version 12

CTE – use cases

Query recursion

Recursive CTEs

Summary

Verify your knowledge

References

Learn more on Discord

Window Functions

Technical requirements

Using basic statement window functions

Using the PARTITION BY function and WINDOW clause

Introducing some useful functions

The ROW_NUMBER function

The ORDER BY clause

FIRST_VALUE

LAST_VALUE

RANK

DENSE_RANK

The LAG and LEAD functions

The CUME_DIST function

The NTILE function

Using advanced statement window functions

The frame clause

ROWS BETWEEN start_point and end_point

RANGE BETWEEN start_point and end_point

Summary

Verify your knowledge

References

Learn more on Discord

Server-Side Programming

Technical requirements

Exploring data types

The concept of extensibility

Standard data types

Boolean data type

Numeric data type

Integer types

Numbers with a fixed precision data type

Numbers with an arbitrary precision data type

Character data type

Chars with fixed-length data types

Chars with variable length with a limit data types

Chars with a variable length without a limit data types

Date/timestamp data types

Date data types

Timestamp data types

The NoSQL data type

The hstore data type

The JSON data type

Exploring functions and languages

Functions

SQL functions

Basic functions

SQL functions returning a set of elements

SQL functions returning a table

Polymorphic SQL functions

PL/pgSQL functions

First overview

Dropping functions

Declaring function parameters

IN/OUT parameters

Function volatility categories

Control structure

Conditional statements

IF statements

CASE statements

Loop statements

The record type

Exception handling statements

Security definer

Summary

Verify your knowledge

References

Learn more on Discord

Triggers and Rules

Technical requirements

Exploring rules in PostgreSQL

Understanding the OLD and NEW variables

Rules on INSERT

The ALSO option

The INSTEAD OF option

Rules on DELETE/UPDATE

Creating the new_tags table

Creating two tables

Managing rules on INSERT, DELETE, and UPDATE events

INSERT rules

DELETE rules

UPDATE rules

Managing triggers in PostgreSQL

Trigger syntax

Triggers on INSERT

The TG_OP variable

Triggers on UPDATE / DELETE

Event triggers

An example of an event trigger

Summary

Verify your knowledge

References

Learn more on Discord

Partitioning

Technical requirements

Basic concepts

Range partitioning

List partitioning

Hash partitioning

Table inheritance

Dropping tables

Exploring declarative partitioning

List partitioning

Range partitioning

Partition maintenance

Attaching a new partition

Detaching an existing partition

Attaching an existing table to the parent table

The default partition

Partitioning and tablespaces

A simple case study

Summary

Verify your knowledge

References

Learn more on Discord

Users, Roles, and Database Security

Technical requirements

Understanding roles

Properties related to new objects

Properties related to superusers

Properties related to replication

Properties related to RLS

Changing properties of existing roles: the ALTER ROLE statement

Renaming an existing role

SESSION_USER versus CURRENT_USER

Per-role configuration parameters

Inspecting roles

Roles that inherit from other roles

Understanding how privileges are resolved

Role inheritance overview

ACLs

Default ACLs

Knowing the default ACLs

Granting and revoking permissions

Permissions related to tables

Column-based permissions

Permissions related to sequences

Permissions related to schemas

ALL objects in the schema

Permissions related to programming languages

Permissions related to routines

Permissions related to databases

Other GRANT and REVOKE statements

Assigning the object owner

Inspecting ACLs

RLS

Role password encryption

SSL connections

Configuring the cluster for SSL

Connecting to the cluster via SSL

Summary

Verify your knowledge

References

Learn more on Discord

Transactions, MVCC, WALs, and Checkpoints

Technical requirements

Introducing transactions

Comparing implicit and explicit transactions

Time within transactions

More about transaction identifiers – the XID wraparound problem

Virtual and real transaction identifiers

Multi-version concurrency control

Transaction isolation levels

READ UNCOMMITTED

READ COMMITTED

REPEATABLE READ

SERIALIZABLE

Explaining MVCC

Savepoints

Deadlocks

How PostgreSQL handles persistency and consistency: WALs

WALs

WALs as a rescue method in the event of a crash

Checkpoints

Checkpoint configuration parameters

checkpoint_timeout and max_wal_size

Checkpoint throttling

Manually issuing a checkpoint

VACUUM

Manual VACUUM

Automatic VACUUM

Summary

Verify your knowledge

References

Learn more on Discord

Extending the Database – the Extension Ecosystem

Technical requirements

Introducing extensions

The extension ecosystem

Extension components

The control file

The script file

Managing extensions

Creating an extension

Viewing installed extensions

Finding out available extension versions

Altering an existing extension

Removing an existing extension

Exploring the PGXN client

Installing pgxnclient on Debian GNU/Linux and derivatives

Installing pgxnclient on Fedora Linux and Red Hat-based distributions

Installing pgxnclient on FreeBSD

Installing pgxnclient from sources

The pgxnclient command-line interface

Installing extensions

Installing the extension via pgxnclient

Installing the extension manually

Using the installed extension

Removing an installed extension

Removing an extension via pgxnclient

Removing a manually compiled extension

Creating your own extension

Defining an example extension

Creating extension files

Installing the extension

Creating an extension upgrade

Performing an extension upgrade

Summary

Verify your knowledge

References

Learn more on Discord

Query Tuning, Indexes, and Performance Optimization

Technical requirements

Execution of a statement

Execution stages

The optimizer

Nodes that the optimizer uses

Sequential nodes

Parallel nodes

When does the optimizer choose a parallel plan?

Utility nodes

Node costs

Indexes

Index types

Creating an index

Inspecting indexes

Dropping an index

Invalidating an index

Rebuilding an index

The EXPLAIN statement

EXPLAIN output formats

EXPLAIN ANALYZE

EXPLAIN options

Examples of query tuning

ANALYZE and how to update statistics

Auto-explain

Summary

Verify your knowledge

References

Learn more on Discord

Logging and Auditing

Technical requirements

Introduction to logging

Where to log

When to log

What to log

Extracting information from logs – pgBadger

Installing pgBadger

Configuring PostgreSQL logging for pgBadger usage

Using pgBadger

Scheduling pgBadger

Implementing auditing

Installing PgAudit

Configuring PostgreSQL to exploit PgAudit

Configuring PgAudit

Auditing by session

Auditing by role

Summary

Verify your knowledge

References

Learn more on Discord

Backup and Restore

Technical requirements

Introducing types of backups and restores

Exploring logical backups

Dumping a single database

Restoring a single database

Limiting the amount of data to backup

Compression

Dump formats and pg_restore

Performing a selective restore

Dumping a whole cluster

Parallel backups

Backup automation

The COPY command

Exploring physical backups

Performing a manual physical backup

pg_verifybackup

Starting the cloned cluster

Restoring from a physical backup

Basic concepts behind PITR

Summary

Verify your knowledge

References

Learn more on Discord

Configuration and Monitoring

Technical requirements

Cluster configuration

Inspecting all the configuration parameters

Finding configuration errors

Nesting configuration files

Configuration contexts

Main configuration settings

WAL settings

Memory-related settings

Process information settings

Networking-related settings

Archive and replication settings

Vacuum and autovacuum-related settings

Optimizer settings

Statistics collector

Modifying the configuration from a live system

Configuration generators

Monitoring the cluster

Information about running queries and sessions

Inspecting locks

Inspecting databases

Inspecting tables and indexes

More statistics

Advanced statistics with pg_stat_statements

Installing the pg_stat_statements extension

Using pg_stat_statements

Resetting data collected from pg_stat_statements

Tuning pg_stat_statements

Summary

Verify your knowledge

References

Learn more on Discord

Physical Replication

Technical requirements

Exploring basic replication concepts

Physical replication and WALs

The wal_level directive

Preparing the environment setup for streaming replication

Managing streaming replication

Basic concepts of streaming replication

Asynchronous replication environment

The wal_keep_segments option

The slot way

The pg_basebackup command

Asynchronous replication

Replica monitoring

Synchronous replication

PostgreSQL settings

Cascading replication

Delayed replication

Promoting a replica server to a primary

Summary

Verify your knowledge

References

Learn more on Discord

Logical Replication

Technical requirements

Understanding the basic concepts of logical replication

Comparing logical replication and physical replication

Exploring a logical replication setup and new logical replication features on PostgreSQL 16

Logical replication environment settings

The replica role

Primary server – postgresql.conf

Replica server – postgresql.conf

The pg_hba.conf file

Logical replication setup

Monitoring logical replication

Read-only versus write-allowed

DDL commands

Disabling logical replication

Making a logical replication using a physical replication instance

Summary

Verify your knowledge

References

Learn more on Discord

Useful Tools and Extensions

Technical requirements

Exploring the pg_trgm extension

Using foreign data wrappers and the postgres_fdw extension

Disaster recovery with pgbackrest

Basic concepts

Environment set up

The exchange of public keys

Installing pgbackrest

Configuring pgbackrest

The repository configuration

Using pgbackrest with object store support

The PostgreSQL server configuration

The postgresql.conf file

The pgbackrest.conf file

Creating and managing continuous backups

Creating the stanza

Checking the stanza

Managing base backups

Managing PITR

Migrating from MySQL/MariaDB to PostgreSQL using pgloader

Summary

Verify your knowledge

References

Other Books You May Enjoy

Index

Landmarks

Cover

Index

Share your thoughts

Once you’ve read Learn PostgreSQL, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.

Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.

Download a free PDF copy of this book

Thanks for purchasing this book!

Do you like to read on the go but are unable to carry your print books everywhere?Is your eBook purchase not compatible with the device of your choice?

Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.

Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application. 

The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily

Follow these simple steps to get the benefits:

Scan the QR code or visit the link below

https://packt.link/free-ebook/9781837635641

Submit your proof of purchaseThat’s it! We’ll send your free PDF and other benefits to your email directly

1

Introduction to PostgreSQL

PostgreSQL is a well-known open-source relational database, and its motto states what the project intends to be the most advanced open-source database in the world.

The main qualities that attract masses of new users every year and keep current users enthusiastic about PostgreSQL are its rock-solid stability, scalability, and safeness, as well as all the features that an enterprise-level database management system must provide.

While PostgreSQL is a relational database, its ecosystem has grown over time, providing a rich platform with extensions, tools, and languages tied together by communities spread around the world.

PostgreSQL is an open-source project and is fully developed in the open-source world. That means that there is no single entity in charge of the project and the result is that PostgreSQL is not a commercial product. In other words, PostgreSQL belongs to everyone, and anyone can contribute to it. Thanks to a very permissive BSD-style license, PostgreSQL can be used in any project or scenario, either open or closed source.

Of course, contributing to a project of that size and complexity requires experience in software development, database concepts, and, of course, a positive attitude to open-source and collaborative efforts. Being open-source in nature means that PostgreSQL will continue to live pretty much forever without the risk of a single company going out of business and sinking with the database.

The official PostgreSQL developers are generally known as the PostgreSQL Global Development Group (PGDG), and they are the developers that, after discussion and coordination, implement the main features and produce new releases. The PGDG delivers a new production release once per year, usually in the last quarter of the year.

At the time of writing, PostgreSQL 16 is the latest production release of this great database engine, and as usual, efforts for the next release (PostgreSQL 17) are ongoing.

This book will focus on how you can get the best out of PostgreSQL, starting from the basics (managing users, data tables, indexes, and so on) and moving toward the most exciting and complex features (such as replicating your data to prevent disasters). We’ll take a practical approach, with several examples, in order to let readers better understand every concept and acquire knowledge in a more fun and quick way. At the end, you will be able to fully administer a PostgreSQL cluster and, thanks to the resources pointed out in every chapter, you will be able to research even more features.

This book covers PostgreSQL 16, but the concepts explained in this book can also be applied to later versions (as well as to previous ones where the same features are present). In fact, while some tools could change in future releases (e.g., adding or removing some options), the basic concepts expressed in the book will remain pretty much the same without any regard to the PostgreSQL version.

This chapter will introduce you to this great open-source database starting from the project history and goals; you will learn basic PostgreSQL terminology, which is very important to help you search the documentation and understand the main error messages, in case you need to. Finally, you will see how to install PostgreSQL in different ways so that you will get a basic knowledge of how to install it on different platforms and in different contexts.

The following topics are covered in this chapter:

PostgreSQL at a glanceExploring PostgreSQL terminologyInstalling PostgreSQL 16 or higher

Technical requirements

You can find the code for this chapter at the following GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition.

PostgreSQL at a glance

As a relational database, PostgreSQL provides a lot of features, and it is quite difficult to “scare” a PostgreSQL instance.

In fact, a single instance can contain more than 4 billion individual databases, each with unlimited total size and capacity for more than 1 billion tables, each containing 32 TB of data. Moreover, if there’s any concern that those upper limits won’t suffice, please consider that a single table can have 1,600 columns, each 1 GB in size, with an unlimited number of multi-column indexes (up to 32 columns). In short, PostgreSQL can store much more data than you can possibly think of!

While PostgreSQL can handle such huge amounts of data, that does not mean that you should use it as a dumping ground or catch-all storage: in order to perform well with certain big databases, you need to understand PostgreSQL and its features, being therefore able to organize and manage your datasets.

PostgreSQL is fully ACID-compliant (see the box below) and has a very strong foundation in data integrity and concurrency. It ships with a procedural language, named PL/PgSQL, which can be used to write reusable pieces of code, such as functions and procedures, and it supports before and after triggers, views, materialized views, partitioned tables, foreign data wrappers, multiple schemas, generated columns, and so on. All of these concepts will be explained in the forthcoming chapters.

ACID is an acronym of properties, used to indicate that the database engine provides atomicity, consistency, isolation, and durability. Atomicity means that a complex database operation is processed as a single instruction even when it is made up of different operations. Consistency means that the data within the database is always kept consistent and that it is not corrupted due to partially performed operations. Isolation allows the database to handle concurrency in the “right way”—that is, without having corrupted data from interleaved changes. Lastly, durability means that the database engine is supposed to protect the data it contains, even in the case of either software or hardware failures, as much as it can.

PostgreSQL can be extended with other embedded languages, such as Perl, Python, Java, and even Bash! And if you think the database does not provide you with enough features, you can plug in extensions to obtain different behaviors and enhancements—for instance, Geospatial Information System (GIS), scheduled jobs, esoteric data types, and utilities in general. Such utilities and enhancement will not be covered in this book, but thanks to the knowledge this book provides, it will be possible to exploit the online documentation of such utilities to get the best out of them.

PostgreSQL runs on pretty much every operating system out there, including Linux, Unix, macOS X, and Microsoft Windows, and can even run on commodity hardware such as Raspberry Pi boards. There are also several cloud computing providers that list PostgreSQL in their software catalog.

Thanks to its extensive tuning mechanism, it can be adapted very well to the hosting platform. The community is responsible for keeping the database and documentation at a very high-quality level, and also, the mailing lists and IRC channels are very responsive and a valuable source for problem solutions and ideas.

In the experience of the authors, there has never been a case where PostgreSQL has not been able to adapt to an application scenario.

The PostgreSQL project has a very rich and extensive set of mailing lists that range from general topics to very specific details. It is a good habit to search for problems and solutions on the mailing list archives; see the web page at https://www.postgresql.org/list/ to get a better idea.

A brief history of PostgreSQL

PostgreSQL takes its name from its ancestor: Ingres.

Ingres was a relational database developed by Professor Michael Stonebraker. In 1986, Professor Stonebraker started a post-Ingres project to develop new, cool features in the database landscape and named this project POSTGRES (POST-Ingres). The project aimed to develop an object-relational database, where “object” means the user would have the capability to extend the database with their own objects, such as data types, functions, and so on.

In 1994, POSTGRES was released with version 4.2 and an MIT license, which opened up collaboration with other developers around the world. At that time, POSTGRES was using an internal query language named QUEL. Two Berkeley students, Andrew Yu and Jolly Chen, replaced the QUEL query language with the hot and cool SQL language, and the feature was so innovative that the project changed its name to Postgre95 to emphasize the difference compared to other, preceding versions.

Eventually, in 1996, the project gained a public server to host the code, and five developers, including Marc G. Fournier, Tom Lane, and Bruce Momjan, started the development of the newly branded project named PostgreSQL. Since then, the project has been kept in good shape and up to date.

This also means that PostgreSQL has been developed for nearly 30 years, again emphasizing the solidity and openness of the project itself. If you are curious, it is also possible to dig into the source code down to the initial commit in the open-source world:

$ git log 'git rev-list --max-parents=0 HEAD' commit d31084e9d1118b25fd16580d9d8c2924b5740dff Author: Marc G. Fournier <[email protected]> Date: Tue Jul 9 06:22:35 1996 +0000 Postgres95 1.01 Distribution - Virgin Sources

What’s new in PostgreSQL 16?

PostgreSQL 16 was released on 14th September 2023. It includes a rich set of improvements, including the following:

Several performance optimizations, ranging from internal memory allocation and management to a more parallelized-by-default behavior.A revised set of permissions for users and groups, including new system groups to provide specific capabilities.An improved configuration mechanism, to ease the inclusion of files and match users and hosts by means of regular expression.A more complete set of JSON functions.An improved logical replication engine that allows decoding even on the stand-by servers.A set of utility columns gained new options to fine-tune what the administrator needs to do.

As with other releases, PostgreSQL 16 also contains a set of changes aimed at making the Database Administrator (DBA)’s life easier—for instance, removing conflicting options and obsolete SQL terms and types. This emphasizes the fact that PostgreSQL developers do always take care of the database and its adherence to the current SQL standards.

PostgreSQL release policy, version numbers, and life cycle

PostgreSQL developers release a new major release once per year, usually around October. A major release is a stable version that introduces new features and possible incompatibilities with previous versions. During its life cycle, a major release is constantly improved by means of minor releases, which are usually bug-fixing and maintenance releases.

The PostgreSQL version number identifies the major and minor release. The version number is specified as major.minor; so, for instance, 16.0 indicates the first major release, 16, while 16.1 indicates the minor release, 1, of major release 16. In short, the greater the number, the more recent the version you are managing.

PostgreSQL’s different major versions are incompatible, while different minor versions are compatible. What does such incompatibility mean? PostgreSQL stores data in binary format, and this format could possibly change between major versions. This means that, while you are able to upgrade PostgreSQL between minor versions on the fly, you probably will have to dump and restore your database content between major version upgrades.

The recommendation, as for much other software, is to run the most recent version of PostgreSQL available to you: PostgreSQL developers put in a lot of effort in order to provide bug-free products, but new features could introduce new bugs, and regardless of the very extensive testing platform PostgreSQL has, it is software after all, and software has bugs. Despite internal bugs, new releases also include fixes for security exploits and performance improvements, so it is a very good habit to keep up to date with your running PostgreSQL server.

Last but not least, not all PostgreSQL versions will live forever. PostgreSQL provides support and upgrades for five years after a new release is issued; after this length of time, a major release will reach its End Of Life (EOL) and PostgreSQL developers will no longer maintain it. This does not mean you cannot run an ancient version of PostgreSQL; it simply means this version will not get any upgrades from the official project and, therefore, will be out of date. As an example, since PostgreSQL 16 was released in 2023, it will reach its EOL in 2028. Keep in mind that running an EOL release is not only a matter of not getting new upgrades, security patches, and bug fixes; you will be on your own and you will not find help when you run into trouble.

With that in mind, we’ll now introduce the main PostgreSQL terminology, as well as further useful-to-understand concepts.

Exploring PostgreSQL terminology

In order for you to understand how PostgreSQL works and follow the examples in the chapters of this book, we need to introduce the terminology used within PostgreSQL and its community of users.

PostgreSQL is a service, which means it runs as a daemon on the operating system; a running PostgreSQL daemon is called an instance. A PostgreSQL instance is often called a cluster because a single instance can serve and handle multiple databases. Every database is an isolated space where users and applications can store data.

A database is accessed by allowed users, but users connected to a database cannot cross the database boundaries and interact with data contained in another database unless they explicitly connect to the latter database too.

A database can be organized into namespaces, called schemas. A schema is a mnemonic name that the user can assign to organize database objects, such as tables, into a more structured collection. Schemas cannot be nested, so they represent a flat namespace.

Database objects are represented by everything the user can create and manage within the database—for instance, tables, functions, triggers, and data types. Every object belongs to one and only one schema that, if not specified, is named as the user that creates the object.

In PostgreSQL versions prior to 15, every new object belongs to the default public schema if not specified otherwise. Since PostgreSQL 15, every user is assigned a personal schema and objects belong to such a schema unless a different schema name is explicitly specified.

Users are defined at a cluster-wide level, which means they are not tied to a particular database in the cluster. A user can connect with and manage any database in the cluster they are allowed to.

PostgreSQL splits users into two main categories:

Normal users: These users are the ones who can connect to and handle databases and objects depending on their privilege set.Superusers: These users can do anything with any database object.

PostgreSQL allows the configuration of as many superusers as you need, and every superuser has the very same permissions: they can do everything with every database and object and, most notably, can also control the life cycle of the cluster (for instance, they can terminate normal user connections, reload the configuration, stop the whole cluster, and so on).

PostgreSQL internal data, such as users, databases, namespaces, configuration, and database runtime status, is provided by means of catalogs: special tables and views that present information in a SQL-interactive way. Many catalogs are trimmed depending on the user who is inspecting them, with the exception that superusers usually see the whole set of available information.

PostgreSQL stores the user data (for example, tables) and its internal status on the local filesystem.

This is an important point to keep in mind: PostgreSQL relies on the underlying filesystem to implement persistence, and therefore tuning the filesystem is an important task in order to make PostgreSQL perform well. In particular, PostgreSQL stores all of its content (user data and internal status) in a single filesystem directory known as PGDATA. The PGDATA directory represents what the cluster is serving as databases, so it is possible for you to have a single installation of PostgreSQL and make it switch to different PGDATA directories to deliver different content. As you will see in the next sections, the PGDATA directory needs to be initialized before it can be used by PostgreSQL; the initialization is the creation of the directory structure within PGDATA itself and is, of course, a one-time operation.

The detailed contents of PGDATA will be explained in the next chapter, but for now, it will suffice for you to remember that the PGDATA directory is where PostgreSQL expects to find data and configuration files. In particular, the PGDATA directory is made up of at least the Write-Ahead Logs (WALs) and the data storage. Without either of those two parts, the cluster is unable to guarantee data consistency and, in some critical circumstances, even start.

WALs are a technology that many database systems use, and the basic idea of how they work is shared with other technologies like transactional filesystems (such as ZFS, UFS with Soft Updates, and so on). The idea is that, before applying any change to a chunk of data, an intent log will be made persistent. In this case, if the cluster crashes, it can always rely on the already-written intent log to understand what operations have been completed and what must be recovered (more details on this in later chapters). Please note that with the term “crash,” we refer to any possible disaster that can hit your cluster, including a software bug, but more likely a lack of electrical power, hard disk failures, and so on. PostgreSQL does commit to providing you with the best data consistency it can, and therefore, it makes a great effort to ensure that the intent log (WAL) is as safe as possible.

Internally, PostgreSQL keeps track of the tables’ structures, indexes, functions, and all the stuff needed to manage the cluster in its dedicated storage, the catalog.

The SQL standard defines a so-called information schema, a collection of tables common to all standard database implementations, including PostgreSQL, that the DBA can use to inspect the internal status of the database itself. For instance, the information schema defines a table that collects information about all the user-defined tables so that it is possible to query the information schema to see whether a specific table exists or not. The PostgreSQL catalog is what could be called an “information schema on steroids”: the catalog is much more accurate and PostgreSQL-specific than the general information schema, and the DBA can extract a lot more information about the PostgreSQL status from the catalog. Of course, PostgreSQL does support the information schema, but throughout the whole book, you will see references to the catalogs because they provide much more detailed information.

When the cluster is started, PostgreSQL launches a single process called the postmaster. The aim of the postmaster is to bootstrap the instance, spawning needed processes to manage the database activity, and then to wait for incoming connections. A user connection, often made over a TCP/IP connection, requires the postmaster to fork another process named the backend process, which in turn is in charge of serving one and only one connection.

This means that every time a new connection against the cluster is opened, the cluster reacts by launching a new backend process to serve it until the connection ends and the process is, consequently, destroyed. The postmaster usually also starts some utility processes that are responsible for keeping PostgreSQL in good shape while it is running; these processes will be discussed later, in this and the next chapters.

To summarize, PostgreSQL provides you with executables that can be installed wherever you want on your system and can serve a single cluster. The cluster, in turn, serves data out of a single PGDATA directory that contains, among other stuff, the user data, the cluster’s internal status, the catalog, and the WALs. Every time a client connects to the server, the postmaster process forks a new backend process that is the minion in charge of serving the connection.

From the concepts explained above, the following is a quick recap of the most complex terms used in PostgreSQL:

Cluster: the whole PostgreSQL service.Postmaster: the first process the cluster executes, and this process is responsible for keeping track of the activities of the whole cluster. The postmaster spawns a backend process every time a new connection is established.Database: an isolated data container to which users (or applications) can connect. A cluster can handle multiple databases. A database can be made up of different objects, including schemas (namespaces), tables, triggers, and other objects you will see as the book progresses.PGDATA: the directory that, on persistent storage, is fully dedicated to PostgreSQL and its data. PostgreSQL stores the data within such a directory.WALs: the intent log of database changes, used to recover data from a critical crash.

Now that we’ve discussed the basic terminology related to PostgreSQL, it is time to get it installed on your machine.

Installing PostgreSQL

PostgreSQL can run on several operating systems, most notably Unix and Unix-like systems, including Linux, as well as on Microsoft Windows 11 or higher. So far, the most supported platform remains Linux because most PostgreSQL developers work on this platform, and so it is the one with the most tested use cases. However, deploying on other supported platforms should not present any problems and is not going to put your data at any risk.

This section will focus on installing PostgreSQL 16, since it is the latest stable version available worldwide. You will learn, however, how to build your own version of PostgreSQL, and this may also be the way for you to install other versions of PostgreSQL in the future.

Before installing PostgreSQL, you need to choose, or at least evaluate, how to install it. There are two main ways to get PostgreSQL up and running:

Compiling from sourcesUsing binary packages

Binary packages are provided by the PostgreSQL community or the operating system, and using them has the advantage that it can provide you with a smooth PostgreSQL installation.

Moreover, binary packages do not require a compilation toolchain and therefore are much easier to adopt. Lastly, a binary package adheres to the operating system conventions it has been built for (for instance, on where to place configuration files) and upgrades can be managed by the operating system as well. Since binary packages need to be pre-built from vendors, they may not reflect the latest released version. For example, when the PGDG delivers a new minor update, operating systems require some days to push out binary packages with such upgrades for all the supported platforms.

On the other hand, installing from sources requires a compilation toolchain, as well as much more time and CPU consumption to build PostgreSQL executables. You have full control over which components will be available in the final product and can trim and optimize your instance for very high performances and shrink resource consumption to a minimum. In the long term, however, you will be responsible for maintaining the installation and upgrading it in a similar manner.

What to install

PostgreSQL is split across several components to install:

The PostgreSQL server is the part that can serve your databases to applications and users and is required to store your data.The PostgreSQL client is the library and client tool to connect to the database server. It is not required if you don’t need to connect to the database on the very same machine, while it is required on client machines.The PostgreSQL contrib package is a set of well-known extensions and utilities that can enhance your PostgreSQL experience. This additional package is developed by the PGDG and is therefore well integrated and stable.The PostgreSQL docs is the documentation (e.g., man pages) related to the server and the client.PostgreSQL PL/Perl, PL/Python, and PL/Tcl are three components to allow the usage of programming languages— Perl, Python, and Tcl, respectively—directly within the PostgreSQL server.

The recommended set of components is the server, the client, and the contrib modules; these modules will be used across the book. You are free to decide whether to install the other components as you wish, but this book will not detail each of them.

Installing PostgreSQL from binary packages

In order to better understand the concepts explained in this book, we recommend readers try the code examples on their own; therefore, you will need? a PostgreSQL instance available. While the best choice to get a full PostgreSQL instance at your fingertips is to install it on a virtual machine or a physical computer, we have also provided a set of Docker images as containerized PostgreSQL instances to run and experiment on. Therefore, you can choose between performing a full installation or a quick Docker setup to get a PostgreSQL machine ready. However, it is important for every DBA to be able to install PostgreSQL on several systems, and therefore, this section aims to show you how to perform a complete installation from scratch on a few Unix-like operating systems.

In the following sections, you will see how to install PostgreSQL on a few popular Linux and Unix operating systems, namely the following:

Linux Docker containersGNU/Linux Debian, Ubuntu, and derivativesFedora Linux (this also applies to Red Hat Enterprise Linux and compatible distributions, like Rocky Linux)FreeBSD

It is not possible to provide detailed instructions for every operating system out there, but the concepts presented in the following sections should prove insightful regardless.

Before getting to the practical installation, it is worth noting that binary packages could come in two flavors: those provided by the operating system vendor, and those provided by the PGDG. Usually, on Linux-based systems, you should use binary packages provided by the PGDG, because they are the most authoritative source for PostgreSQL. In fact, packages provided by the operating system vendor tend to become out of date very soon, which means they are usually a few versions behind the latest version globally available. On the other hand, on BSD platforms like FreeBSD, OpenBSD, and NetBSD, the operating system porters do an excellent job of keeping the packages provided by the operating system itself very up to date, so you can safely and easily use the operating system packages.

An important thing to note is that different operating systems store files in different places: usually, all the configuration files are placed within the PGDATA itself, but packages from some operating systems scatter the configuration files under the /etc directory. A few operating systems also place executables in specific paths, separated by the version of PostgreSQL, while others place all executables in the same path.

You need to investigate with the operating system package provider where each file or directory is placed in order to be able to configure and use PostgreSQL.

Using the book’s Docker images

Docker is a container that allows you to run an isolated set of processes as if they are part of a micro virtual machine. The PGDG provides a Docker image that you can use to run a containerized cluster.

Explaining the Docker technology is out of the scope of this book, and in order to let you experiment in a quick and easy way with PostgreSQL, we have provided a set of Docker images, based on the PostgreSQL image, customized to let you experiment with the concepts explained in this book. You can use the above images as a starting point for your own projects, even if the above images are not meant to be used in a production environment. The images are contained in the docker_images directory of the book’s code repository (https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition/).

We separated every Docker image by means of the chapter the image refers to. There is a common catch-all image named standalone that can be used as a common base and will be used in the very first chapters. Other chapters, for instance, those on replication, require their own image to be executed.

In order to start the base standalone image, you can simply execute the shell script run-pg-docker.sh, as follows:

$ sh run-pg-docker.sh … postgres@learn_postgresql:~$

The script will ask you for a password; it is required that your user has sudo capabilities to connect the Docker network and ports. All the containers will launch a GNU Bash session with the postgres operating system user.