29,99 €
The latest edition of this PostgreSQL book will help you to start using PostgreSQL from absolute scratch, helping you to quickly understand the internal workings of the database. With a structured approach and practical examples, go on a journey that covers the basics, from SQL statements and how to run server-side programs, to configuring, managing, securing, and optimizing database performance.
This new edition will not only help you get to grips with all the recent changes within the PostgreSQL ecosystem but will also dig deeper into concepts like partitioning and replication with a fresh set of examples. The book is also equipped with Docker images for each chapter which makes the learning experience faster and easier. Starting with the absolute basics of databases, the book sails through to advanced concepts like window functions, logging, auditing, extending the database, configuration, partitioning, and replication. It will also help you seamlessly migrate your existing database system to PostgreSQL and contains a dedicated chapter on disaster recovery. Each chapter ends with practice questions to test your learning at regular intervals.
By the end of this book, you will be able to install, configure, manage, and develop applications against a PostgreSQL database.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 862
Veröffentlichungsjahr: 2023
Learn PostgreSQL
Second Edition
Use, manage, and build secure and scalable databases with PostgreSQL 16
Luca Ferrari
Enrico Pirozzi
BIRMINGHAM—MUMBAI
Learn PostgreSQL
Second Edition
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Senior Publishing Product Manager: Gebin George
Acquisition Editor – Peer Reviews: Gaurav Gavas
Project Editor: Meenakshi Vijay
Content Development Editor: Elliot Dallow
Copy Editor: Safis Editing
Technical Editor: Kushal Sharma
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Presentation Designer: Rajesh Shirsath
Developer Relations Marketing Executive: Vignesh Raju
First published: October 2020
Second edition: October 2023
Production reference: 1251023
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul’s Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83763-564-1
www.packt.com
To my beautiful wife, Emanuela; I love her like Santa loves his reindeer.
To my great son, Diego, who changed our lives on 1283788200.
To my parents, Miriam and Anselmo: my greatest fans since day one.
– Luca Ferrari
In loving memory of my father, Ilario.
– Enrico Pirozzi
Luca Ferrari has been passionate about computer science since the Commodore 64 era, and today holds a master’s degree (with honors) and a Ph.D. from the University of Modena and Reggio Emilia. He has written several research papers, technical articles, and book chapters. In 2011, he was named an adjunct professor by Nipissing University. An avid Unix user, he is a strong advocate of open-source, and in his free time, he collaborates on a few projects. He first encountered PostgreSQL back in the days of release 7.3; he was a founder and former president of the Italian PostgreSQL Users’ Group (ITPUG). He also talks regularly at technical conferences and events and delivers professional training.
Enrico Pirozzi has been passionate about computer science since he was a 13-year-old. His first computer was a Commodore 64, and today he holds a master’s degree from the University of Bologna. He has participated as a speaker at national and international conferences on PostgreSQL. He first encountered PostgreSQL back in release 7.2, he was a co-founder of the first PostgreSQL Italian mailing list and the first Italian PostgreSQL website, and he talks regularly at technical conferences and events and delivers professional training. Right now, he is employed as a PostgreSQL database administrator at Zucchetti Hospitality (Zucchetti Group S.p.a).
Chris Mair holds a master’s degree from the University of Trento, Italy, and has been freelance since 2003. His portfolio consists of contributions to over 25 companies, including consultancy work on database programming, performance optimization, and seamless migrations. Chris has expertise in system and network programming, data processing, ML, and more. He has a particular affinity for PostgreSQL. He has taught over 200 courses on various IT topics and is passionate about open-source software.
Silvio Trancanella is a software engineer with around 12 years of experience in backend development, mainly using Java Enterprise and PostgreSQL. He has always been fascinated by database management and was immediately drawn to PostgreSQL from the very beginning of his career.He worked for about 10 years on tourism industry software, developing and maintaining critical services that relied on the PostgreSQL DBMS.
To join the Discord community for this book – where you can share feedback, ask questions to the author, and learn about new releases – follow the QR code below:
https://discord.gg/jYWCjF6Tku
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Introduction to PostgreSQL
Technical requirements
PostgreSQL at a glance
A brief history of PostgreSQL
What’s new in PostgreSQL 16?
PostgreSQL release policy, version numbers, and life cycle
Exploring PostgreSQL terminology
Installing PostgreSQL
What to install
Installing PostgreSQL from binary packages
Using the book’s Docker images
Installing PostgreSQL on GNU/Linux Debian, Ubuntu, and derivatives
Installing PostgreSQL on Fedora Linux
Installing PostgreSQL on FreeBSD
Installing PostgreSQL from sources
Installing PostgreSQL via pgenv
Summary
References
Learn more on Discord
Getting to Know Your Cluster
Technical requirements
Managing your cluster
pg_ctl
PostgreSQL processes
Connecting to the cluster
The template databases
The psql command-line client
Entering SQL statements via psql
A glance at the psql commands
Introducing the connection string
Solving common connection problems
Database “foo” does not exist
Connection refused
No pg_hba.conf entry
Exploring the disk layout of PGDATA
Objects in the PGDATA directory
Tablespaces
Exploring configuration files and parameters
Summary
Verify your knowledge
References
Learn more on Discord
Managing Users and Connections
Technical requirements
Introduction to users and groups
Managing roles
Creating new roles
Role passwords, connections, and availability
Using a role as a group
Removing an existing role
Inspecting existing roles
Managing incoming connections at the role level
The syntax of pg_hba.conf
Order of rules in pg_hba.conf
Merging multiple rules into a single one
Using groups instead of single roles
Using files instead of single roles
Inspecting pg_hba.conf rules
Including other files in pg_hba.conf
Summary
Verify your knowledge
References
Learn more on Discord
Basic Statements
Technical requirements
Using the Docker image
Connecting the database
Creating and managing databases
Creating a database
Managing databases
Introducing schemas
PostgreSQL and the public schema
The search_path variable
The correct way to start working
Listing all tables
Making a new database from a modified template
Dropping tables and databases
Dropping tables
Dropping databases
Making a database copy
Confirming the database size
The psql method
The SQL method
Behind the scenes of database creation
Managing tables
The EXISTS option
Managing temporary tables
Managing unlogged tables
Creating a table
Understanding basic table manipulation statements
Inserting and selecting data
NULL values
Sorting with NULL values
Creating a table starting from another table
Updating data
Deleting data
Summary
Verify your knowledge
References
Learn more on Discord
Advanced Statements
Technical requirements
Exploring the SELECT statement
Using the like clause
Using ilike
Using distinct
Using limit and offset
Using subqueries
Subqueries and the IN/NOT IN condition
Subqueries and the EXISTS/NOT EXISTS condition
Learning about joins
Using INNER JOIN
INNER JOIN versus EXISTS/IN
Using LEFT JOINS
Using RIGHT JOIN
Using FULL OUTER JOIN
Using LATERAL JOIN
Aggregate functions
UNION/UNION ALL
EXCEPT/INTERSECT
Using UPSERT
UPSERT – the PostgreSQL way
Learning the RETURNING clause for INSERT
Returning tuples out of queries
UPDATE related to multiple records
MERGE
Exploring UPDATE ... RETURNING
Exploring DELETE ... RETURNING
Exploring CTEs
CTE concept
CTE in PostgreSQL since version 12
CTE – use cases
Query recursion
Recursive CTEs
Summary
Verify your knowledge
References
Learn more on Discord
Window Functions
Technical requirements
Using basic statement window functions
Using the PARTITION BY function and WINDOW clause
Introducing some useful functions
The ROW_NUMBER function
The ORDER BY clause
FIRST_VALUE
LAST_VALUE
RANK
DENSE_RANK
The LAG and LEAD functions
The CUME_DIST function
The NTILE function
Using advanced statement window functions
The frame clause
ROWS BETWEEN start_point and end_point
RANGE BETWEEN start_point and end_point
Summary
Verify your knowledge
References
Learn more on Discord
Server-Side Programming
Technical requirements
Exploring data types
The concept of extensibility
Standard data types
Boolean data type
Numeric data type
Integer types
Numbers with a fixed precision data type
Numbers with an arbitrary precision data type
Character data type
Chars with fixed-length data types
Chars with variable length with a limit data types
Chars with a variable length without a limit data types
Date/timestamp data types
Date data types
Timestamp data types
The NoSQL data type
The hstore data type
The JSON data type
Exploring functions and languages
Functions
SQL functions
Basic functions
SQL functions returning a set of elements
SQL functions returning a table
Polymorphic SQL functions
PL/pgSQL functions
First overview
Dropping functions
Declaring function parameters
IN/OUT parameters
Function volatility categories
Control structure
Conditional statements
IF statements
CASE statements
Loop statements
The record type
Exception handling statements
Security definer
Summary
Verify your knowledge
References
Learn more on Discord
Triggers and Rules
Technical requirements
Exploring rules in PostgreSQL
Understanding the OLD and NEW variables
Rules on INSERT
The ALSO option
The INSTEAD OF option
Rules on DELETE/UPDATE
Creating the new_tags table
Creating two tables
Managing rules on INSERT, DELETE, and UPDATE events
INSERT rules
DELETE rules
UPDATE rules
Managing triggers in PostgreSQL
Trigger syntax
Triggers on INSERT
The TG_OP variable
Triggers on UPDATE / DELETE
Event triggers
An example of an event trigger
Summary
Verify your knowledge
References
Learn more on Discord
Partitioning
Technical requirements
Basic concepts
Range partitioning
List partitioning
Hash partitioning
Table inheritance
Dropping tables
Exploring declarative partitioning
List partitioning
Range partitioning
Partition maintenance
Attaching a new partition
Detaching an existing partition
Attaching an existing table to the parent table
The default partition
Partitioning and tablespaces
A simple case study
Summary
Verify your knowledge
References
Learn more on Discord
Users, Roles, and Database Security
Technical requirements
Understanding roles
Properties related to new objects
Properties related to superusers
Properties related to replication
Properties related to RLS
Changing properties of existing roles: the ALTER ROLE statement
Renaming an existing role
SESSION_USER versus CURRENT_USER
Per-role configuration parameters
Inspecting roles
Roles that inherit from other roles
Understanding how privileges are resolved
Role inheritance overview
ACLs
Default ACLs
Knowing the default ACLs
Granting and revoking permissions
Permissions related to tables
Column-based permissions
Permissions related to sequences
Permissions related to schemas
ALL objects in the schema
Permissions related to programming languages
Permissions related to routines
Permissions related to databases
Other GRANT and REVOKE statements
Assigning the object owner
Inspecting ACLs
RLS
Role password encryption
SSL connections
Configuring the cluster for SSL
Connecting to the cluster via SSL
Summary
Verify your knowledge
References
Learn more on Discord
Transactions, MVCC, WALs, and Checkpoints
Technical requirements
Introducing transactions
Comparing implicit and explicit transactions
Time within transactions
More about transaction identifiers – the XID wraparound problem
Virtual and real transaction identifiers
Multi-version concurrency control
Transaction isolation levels
READ UNCOMMITTED
READ COMMITTED
REPEATABLE READ
SERIALIZABLE
Explaining MVCC
Savepoints
Deadlocks
How PostgreSQL handles persistency and consistency: WALs
WALs
WALs as a rescue method in the event of a crash
Checkpoints
Checkpoint configuration parameters
checkpoint_timeout and max_wal_size
Checkpoint throttling
Manually issuing a checkpoint
VACUUM
Manual VACUUM
Automatic VACUUM
Summary
Verify your knowledge
References
Learn more on Discord
Extending the Database – the Extension Ecosystem
Technical requirements
Introducing extensions
The extension ecosystem
Extension components
The control file
The script file
Managing extensions
Creating an extension
Viewing installed extensions
Finding out available extension versions
Altering an existing extension
Removing an existing extension
Exploring the PGXN client
Installing pgxnclient on Debian GNU/Linux and derivatives
Installing pgxnclient on Fedora Linux and Red Hat-based distributions
Installing pgxnclient on FreeBSD
Installing pgxnclient from sources
The pgxnclient command-line interface
Installing extensions
Installing the extension via pgxnclient
Installing the extension manually
Using the installed extension
Removing an installed extension
Removing an extension via pgxnclient
Removing a manually compiled extension
Creating your own extension
Defining an example extension
Creating extension files
Installing the extension
Creating an extension upgrade
Performing an extension upgrade
Summary
Verify your knowledge
References
Learn more on Discord
Query Tuning, Indexes, and Performance Optimization
Technical requirements
Execution of a statement
Execution stages
The optimizer
Nodes that the optimizer uses
Sequential nodes
Parallel nodes
When does the optimizer choose a parallel plan?
Utility nodes
Node costs
Indexes
Index types
Creating an index
Inspecting indexes
Dropping an index
Invalidating an index
Rebuilding an index
The EXPLAIN statement
EXPLAIN output formats
EXPLAIN ANALYZE
EXPLAIN options
Examples of query tuning
ANALYZE and how to update statistics
Auto-explain
Summary
Verify your knowledge
References
Learn more on Discord
Logging and Auditing
Technical requirements
Introduction to logging
Where to log
When to log
What to log
Extracting information from logs – pgBadger
Installing pgBadger
Configuring PostgreSQL logging for pgBadger usage
Using pgBadger
Scheduling pgBadger
Implementing auditing
Installing PgAudit
Configuring PostgreSQL to exploit PgAudit
Configuring PgAudit
Auditing by session
Auditing by role
Summary
Verify your knowledge
References
Learn more on Discord
Backup and Restore
Technical requirements
Introducing types of backups and restores
Exploring logical backups
Dumping a single database
Restoring a single database
Limiting the amount of data to backup
Compression
Dump formats and pg_restore
Performing a selective restore
Dumping a whole cluster
Parallel backups
Backup automation
The COPY command
Exploring physical backups
Performing a manual physical backup
pg_verifybackup
Starting the cloned cluster
Restoring from a physical backup
Basic concepts behind PITR
Summary
Verify your knowledge
References
Learn more on Discord
Configuration and Monitoring
Technical requirements
Cluster configuration
Inspecting all the configuration parameters
Finding configuration errors
Nesting configuration files
Configuration contexts
Main configuration settings
WAL settings
Memory-related settings
Process information settings
Networking-related settings
Archive and replication settings
Vacuum and autovacuum-related settings
Optimizer settings
Statistics collector
Modifying the configuration from a live system
Configuration generators
Monitoring the cluster
Information about running queries and sessions
Inspecting locks
Inspecting databases
Inspecting tables and indexes
More statistics
Advanced statistics with pg_stat_statements
Installing the pg_stat_statements extension
Using pg_stat_statements
Resetting data collected from pg_stat_statements
Tuning pg_stat_statements
Summary
Verify your knowledge
References
Learn more on Discord
Physical Replication
Technical requirements
Exploring basic replication concepts
Physical replication and WALs
The wal_level directive
Preparing the environment setup for streaming replication
Managing streaming replication
Basic concepts of streaming replication
Asynchronous replication environment
The wal_keep_segments option
The slot way
The pg_basebackup command
Asynchronous replication
Replica monitoring
Synchronous replication
PostgreSQL settings
Cascading replication
Delayed replication
Promoting a replica server to a primary
Summary
Verify your knowledge
References
Learn more on Discord
Logical Replication
Technical requirements
Understanding the basic concepts of logical replication
Comparing logical replication and physical replication
Exploring a logical replication setup and new logical replication features on PostgreSQL 16
Logical replication environment settings
The replica role
Primary server – postgresql.conf
Replica server – postgresql.conf
The pg_hba.conf file
Logical replication setup
Monitoring logical replication
Read-only versus write-allowed
DDL commands
Disabling logical replication
Making a logical replication using a physical replication instance
Summary
Verify your knowledge
References
Learn more on Discord
Useful Tools and Extensions
Technical requirements
Exploring the pg_trgm extension
Using foreign data wrappers and the postgres_fdw extension
Disaster recovery with pgbackrest
Basic concepts
Environment set up
The exchange of public keys
Installing pgbackrest
Configuring pgbackrest
The repository configuration
Using pgbackrest with object store support
The PostgreSQL server configuration
The postgresql.conf file
The pgbackrest.conf file
Creating and managing continuous backups
Creating the stanza
Checking the stanza
Managing base backups
Managing PITR
Migrating from MySQL/MariaDB to PostgreSQL using pgloader
Summary
Verify your knowledge
References
Other Books You May Enjoy
Index
Cover
Index
Once you’ve read Learn PostgreSQL, we’d love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we’re delivering excellent quality content.
Download a free PDF copy of this book
Thanks for purchasing this book!
Do you like to read on the go but are unable to carry your print books everywhere?Is your eBook purchase not compatible with the device of your choice?
Don’t worry, now with every Packt book you get a DRM-free PDF version of that book at no cost.
Read anywhere, any place, on any device. Search, copy, and paste code from your favorite technical books directly into your application.
The perks don’t stop there, you can get exclusive access to discounts, newsletters, and great free content in your inbox daily
Follow these simple steps to get the benefits:
Scan the QR code or visit the link belowhttps://packt.link/free-ebook/9781837635641
Submit your proof of purchaseThat’s it! We’ll send your free PDF and other benefits to your email directlyPostgreSQL is a well-known open-source relational database, and its motto states what the project intends to be the most advanced open-source database in the world.
The main qualities that attract masses of new users every year and keep current users enthusiastic about PostgreSQL are its rock-solid stability, scalability, and safeness, as well as all the features that an enterprise-level database management system must provide.
While PostgreSQL is a relational database, its ecosystem has grown over time, providing a rich platform with extensions, tools, and languages tied together by communities spread around the world.
PostgreSQL is an open-source project and is fully developed in the open-source world. That means that there is no single entity in charge of the project and the result is that PostgreSQL is not a commercial product. In other words, PostgreSQL belongs to everyone, and anyone can contribute to it. Thanks to a very permissive BSD-style license, PostgreSQL can be used in any project or scenario, either open or closed source.
Of course, contributing to a project of that size and complexity requires experience in software development, database concepts, and, of course, a positive attitude to open-source and collaborative efforts. Being open-source in nature means that PostgreSQL will continue to live pretty much forever without the risk of a single company going out of business and sinking with the database.
The official PostgreSQL developers are generally known as the PostgreSQL Global Development Group (PGDG), and they are the developers that, after discussion and coordination, implement the main features and produce new releases. The PGDG delivers a new production release once per year, usually in the last quarter of the year.
At the time of writing, PostgreSQL 16 is the latest production release of this great database engine, and as usual, efforts for the next release (PostgreSQL 17) are ongoing.
This book will focus on how you can get the best out of PostgreSQL, starting from the basics (managing users, data tables, indexes, and so on) and moving toward the most exciting and complex features (such as replicating your data to prevent disasters). We’ll take a practical approach, with several examples, in order to let readers better understand every concept and acquire knowledge in a more fun and quick way. At the end, you will be able to fully administer a PostgreSQL cluster and, thanks to the resources pointed out in every chapter, you will be able to research even more features.
This book covers PostgreSQL 16, but the concepts explained in this book can also be applied to later versions (as well as to previous ones where the same features are present). In fact, while some tools could change in future releases (e.g., adding or removing some options), the basic concepts expressed in the book will remain pretty much the same without any regard to the PostgreSQL version.
This chapter will introduce you to this great open-source database starting from the project history and goals; you will learn basic PostgreSQL terminology, which is very important to help you search the documentation and understand the main error messages, in case you need to. Finally, you will see how to install PostgreSQL in different ways so that you will get a basic knowledge of how to install it on different platforms and in different contexts.
The following topics are covered in this chapter:
PostgreSQL at a glanceExploring PostgreSQL terminologyInstalling PostgreSQL 16 or higherYou can find the code for this chapter at the following GitHub repository: https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition.
As a relational database, PostgreSQL provides a lot of features, and it is quite difficult to “scare” a PostgreSQL instance.
In fact, a single instance can contain more than 4 billion individual databases, each with unlimited total size and capacity for more than 1 billion tables, each containing 32 TB of data. Moreover, if there’s any concern that those upper limits won’t suffice, please consider that a single table can have 1,600 columns, each 1 GB in size, with an unlimited number of multi-column indexes (up to 32 columns). In short, PostgreSQL can store much more data than you can possibly think of!
While PostgreSQL can handle such huge amounts of data, that does not mean that you should use it as a dumping ground or catch-all storage: in order to perform well with certain big databases, you need to understand PostgreSQL and its features, being therefore able to organize and manage your datasets.
PostgreSQL is fully ACID-compliant (see the box below) and has a very strong foundation in data integrity and concurrency. It ships with a procedural language, named PL/PgSQL, which can be used to write reusable pieces of code, such as functions and procedures, and it supports before and after triggers, views, materialized views, partitioned tables, foreign data wrappers, multiple schemas, generated columns, and so on. All of these concepts will be explained in the forthcoming chapters.
ACID is an acronym of properties, used to indicate that the database engine provides atomicity, consistency, isolation, and durability. Atomicity means that a complex database operation is processed as a single instruction even when it is made up of different operations. Consistency means that the data within the database is always kept consistent and that it is not corrupted due to partially performed operations. Isolation allows the database to handle concurrency in the “right way”—that is, without having corrupted data from interleaved changes. Lastly, durability means that the database engine is supposed to protect the data it contains, even in the case of either software or hardware failures, as much as it can.
PostgreSQL can be extended with other embedded languages, such as Perl, Python, Java, and even Bash! And if you think the database does not provide you with enough features, you can plug in extensions to obtain different behaviors and enhancements—for instance, Geospatial Information System (GIS), scheduled jobs, esoteric data types, and utilities in general. Such utilities and enhancement will not be covered in this book, but thanks to the knowledge this book provides, it will be possible to exploit the online documentation of such utilities to get the best out of them.
PostgreSQL runs on pretty much every operating system out there, including Linux, Unix, macOS X, and Microsoft Windows, and can even run on commodity hardware such as Raspberry Pi boards. There are also several cloud computing providers that list PostgreSQL in their software catalog.
Thanks to its extensive tuning mechanism, it can be adapted very well to the hosting platform. The community is responsible for keeping the database and documentation at a very high-quality level, and also, the mailing lists and IRC channels are very responsive and a valuable source for problem solutions and ideas.
In the experience of the authors, there has never been a case where PostgreSQL has not been able to adapt to an application scenario.
The PostgreSQL project has a very rich and extensive set of mailing lists that range from general topics to very specific details. It is a good habit to search for problems and solutions on the mailing list archives; see the web page at https://www.postgresql.org/list/ to get a better idea.
PostgreSQL takes its name from its ancestor: Ingres.
Ingres was a relational database developed by Professor Michael Stonebraker. In 1986, Professor Stonebraker started a post-Ingres project to develop new, cool features in the database landscape and named this project POSTGRES (POST-Ingres). The project aimed to develop an object-relational database, where “object” means the user would have the capability to extend the database with their own objects, such as data types, functions, and so on.
In 1994, POSTGRES was released with version 4.2 and an MIT license, which opened up collaboration with other developers around the world. At that time, POSTGRES was using an internal query language named QUEL. Two Berkeley students, Andrew Yu and Jolly Chen, replaced the QUEL query language with the hot and cool SQL language, and the feature was so innovative that the project changed its name to Postgre95 to emphasize the difference compared to other, preceding versions.
Eventually, in 1996, the project gained a public server to host the code, and five developers, including Marc G. Fournier, Tom Lane, and Bruce Momjan, started the development of the newly branded project named PostgreSQL. Since then, the project has been kept in good shape and up to date.
This also means that PostgreSQL has been developed for nearly 30 years, again emphasizing the solidity and openness of the project itself. If you are curious, it is also possible to dig into the source code down to the initial commit in the open-source world:
$ git log 'git rev-list --max-parents=0 HEAD' commit d31084e9d1118b25fd16580d9d8c2924b5740dff Author: Marc G. Fournier <[email protected]> Date: Tue Jul 9 06:22:35 1996 +0000 Postgres95 1.01 Distribution - Virgin SourcesPostgreSQL 16 was released on 14th September 2023. It includes a rich set of improvements, including the following:
Several performance optimizations, ranging from internal memory allocation and management to a more parallelized-by-default behavior.A revised set of permissions for users and groups, including new system groups to provide specific capabilities.An improved configuration mechanism, to ease the inclusion of files and match users and hosts by means of regular expression.A more complete set of JSON functions.An improved logical replication engine that allows decoding even on the stand-by servers.A set of utility columns gained new options to fine-tune what the administrator needs to do.As with other releases, PostgreSQL 16 also contains a set of changes aimed at making the Database Administrator (DBA)’s life easier—for instance, removing conflicting options and obsolete SQL terms and types. This emphasizes the fact that PostgreSQL developers do always take care of the database and its adherence to the current SQL standards.
PostgreSQL developers release a new major release once per year, usually around October. A major release is a stable version that introduces new features and possible incompatibilities with previous versions. During its life cycle, a major release is constantly improved by means of minor releases, which are usually bug-fixing and maintenance releases.
The PostgreSQL version number identifies the major and minor release. The version number is specified as major.minor; so, for instance, 16.0 indicates the first major release, 16, while 16.1 indicates the minor release, 1, of major release 16. In short, the greater the number, the more recent the version you are managing.
PostgreSQL’s different major versions are incompatible, while different minor versions are compatible. What does such incompatibility mean? PostgreSQL stores data in binary format, and this format could possibly change between major versions. This means that, while you are able to upgrade PostgreSQL between minor versions on the fly, you probably will have to dump and restore your database content between major version upgrades.
The recommendation, as for much other software, is to run the most recent version of PostgreSQL available to you: PostgreSQL developers put in a lot of effort in order to provide bug-free products, but new features could introduce new bugs, and regardless of the very extensive testing platform PostgreSQL has, it is software after all, and software has bugs. Despite internal bugs, new releases also include fixes for security exploits and performance improvements, so it is a very good habit to keep up to date with your running PostgreSQL server.
Last but not least, not all PostgreSQL versions will live forever. PostgreSQL provides support and upgrades for five years after a new release is issued; after this length of time, a major release will reach its End Of Life (EOL) and PostgreSQL developers will no longer maintain it. This does not mean you cannot run an ancient version of PostgreSQL; it simply means this version will not get any upgrades from the official project and, therefore, will be out of date. As an example, since PostgreSQL 16 was released in 2023, it will reach its EOL in 2028. Keep in mind that running an EOL release is not only a matter of not getting new upgrades, security patches, and bug fixes; you will be on your own and you will not find help when you run into trouble.
With that in mind, we’ll now introduce the main PostgreSQL terminology, as well as further useful-to-understand concepts.
In order for you to understand how PostgreSQL works and follow the examples in the chapters of this book, we need to introduce the terminology used within PostgreSQL and its community of users.
PostgreSQL is a service, which means it runs as a daemon on the operating system; a running PostgreSQL daemon is called an instance. A PostgreSQL instance is often called a cluster because a single instance can serve and handle multiple databases. Every database is an isolated space where users and applications can store data.
A database is accessed by allowed users, but users connected to a database cannot cross the database boundaries and interact with data contained in another database unless they explicitly connect to the latter database too.
A database can be organized into namespaces, called schemas. A schema is a mnemonic name that the user can assign to organize database objects, such as tables, into a more structured collection. Schemas cannot be nested, so they represent a flat namespace.
Database objects are represented by everything the user can create and manage within the database—for instance, tables, functions, triggers, and data types. Every object belongs to one and only one schema that, if not specified, is named as the user that creates the object.
In PostgreSQL versions prior to 15, every new object belongs to the default public schema if not specified otherwise. Since PostgreSQL 15, every user is assigned a personal schema and objects belong to such a schema unless a different schema name is explicitly specified.
Users are defined at a cluster-wide level, which means they are not tied to a particular database in the cluster. A user can connect with and manage any database in the cluster they are allowed to.
PostgreSQL splits users into two main categories:
Normal users: These users are the ones who can connect to and handle databases and objects depending on their privilege set.Superusers: These users can do anything with any database object.PostgreSQL allows the configuration of as many superusers as you need, and every superuser has the very same permissions: they can do everything with every database and object and, most notably, can also control the life cycle of the cluster (for instance, they can terminate normal user connections, reload the configuration, stop the whole cluster, and so on).
PostgreSQL internal data, such as users, databases, namespaces, configuration, and database runtime status, is provided by means of catalogs: special tables and views that present information in a SQL-interactive way. Many catalogs are trimmed depending on the user who is inspecting them, with the exception that superusers usually see the whole set of available information.
PostgreSQL stores the user data (for example, tables) and its internal status on the local filesystem.
This is an important point to keep in mind: PostgreSQL relies on the underlying filesystem to implement persistence, and therefore tuning the filesystem is an important task in order to make PostgreSQL perform well. In particular, PostgreSQL stores all of its content (user data and internal status) in a single filesystem directory known as PGDATA. The PGDATA directory represents what the cluster is serving as databases, so it is possible for you to have a single installation of PostgreSQL and make it switch to different PGDATA directories to deliver different content. As you will see in the next sections, the PGDATA directory needs to be initialized before it can be used by PostgreSQL; the initialization is the creation of the directory structure within PGDATA itself and is, of course, a one-time operation.
The detailed contents of PGDATA will be explained in the next chapter, but for now, it will suffice for you to remember that the PGDATA directory is where PostgreSQL expects to find data and configuration files. In particular, the PGDATA directory is made up of at least the Write-Ahead Logs (WALs) and the data storage. Without either of those two parts, the cluster is unable to guarantee data consistency and, in some critical circumstances, even start.
WALs are a technology that many database systems use, and the basic idea of how they work is shared with other technologies like transactional filesystems (such as ZFS, UFS with Soft Updates, and so on). The idea is that, before applying any change to a chunk of data, an intent log will be made persistent. In this case, if the cluster crashes, it can always rely on the already-written intent log to understand what operations have been completed and what must be recovered (more details on this in later chapters). Please note that with the term “crash,” we refer to any possible disaster that can hit your cluster, including a software bug, but more likely a lack of electrical power, hard disk failures, and so on. PostgreSQL does commit to providing you with the best data consistency it can, and therefore, it makes a great effort to ensure that the intent log (WAL) is as safe as possible.
Internally, PostgreSQL keeps track of the tables’ structures, indexes, functions, and all the stuff needed to manage the cluster in its dedicated storage, the catalog.
The SQL standard defines a so-called information schema, a collection of tables common to all standard database implementations, including PostgreSQL, that the DBA can use to inspect the internal status of the database itself. For instance, the information schema defines a table that collects information about all the user-defined tables so that it is possible to query the information schema to see whether a specific table exists or not. The PostgreSQL catalog is what could be called an “information schema on steroids”: the catalog is much more accurate and PostgreSQL-specific than the general information schema, and the DBA can extract a lot more information about the PostgreSQL status from the catalog. Of course, PostgreSQL does support the information schema, but throughout the whole book, you will see references to the catalogs because they provide much more detailed information.
When the cluster is started, PostgreSQL launches a single process called the postmaster. The aim of the postmaster is to bootstrap the instance, spawning needed processes to manage the database activity, and then to wait for incoming connections. A user connection, often made over a TCP/IP connection, requires the postmaster to fork another process named the backend process, which in turn is in charge of serving one and only one connection.
This means that every time a new connection against the cluster is opened, the cluster reacts by launching a new backend process to serve it until the connection ends and the process is, consequently, destroyed. The postmaster usually also starts some utility processes that are responsible for keeping PostgreSQL in good shape while it is running; these processes will be discussed later, in this and the next chapters.
To summarize, PostgreSQL provides you with executables that can be installed wherever you want on your system and can serve a single cluster. The cluster, in turn, serves data out of a single PGDATA directory that contains, among other stuff, the user data, the cluster’s internal status, the catalog, and the WALs. Every time a client connects to the server, the postmaster process forks a new backend process that is the minion in charge of serving the connection.
From the concepts explained above, the following is a quick recap of the most complex terms used in PostgreSQL:
Cluster: the whole PostgreSQL service.Postmaster: the first process the cluster executes, and this process is responsible for keeping track of the activities of the whole cluster. The postmaster spawns a backend process every time a new connection is established.Database: an isolated data container to which users (or applications) can connect. A cluster can handle multiple databases. A database can be made up of different objects, including schemas (namespaces), tables, triggers, and other objects you will see as the book progresses.PGDATA: the directory that, on persistent storage, is fully dedicated to PostgreSQL and its data. PostgreSQL stores the data within such a directory.WALs: the intent log of database changes, used to recover data from a critical crash.Now that we’ve discussed the basic terminology related to PostgreSQL, it is time to get it installed on your machine.
PostgreSQL can run on several operating systems, most notably Unix and Unix-like systems, including Linux, as well as on Microsoft Windows 11 or higher. So far, the most supported platform remains Linux because most PostgreSQL developers work on this platform, and so it is the one with the most tested use cases. However, deploying on other supported platforms should not present any problems and is not going to put your data at any risk.
This section will focus on installing PostgreSQL 16, since it is the latest stable version available worldwide. You will learn, however, how to build your own version of PostgreSQL, and this may also be the way for you to install other versions of PostgreSQL in the future.
Before installing PostgreSQL, you need to choose, or at least evaluate, how to install it. There are two main ways to get PostgreSQL up and running:
Compiling from sourcesUsing binary packagesBinary packages are provided by the PostgreSQL community or the operating system, and using them has the advantage that it can provide you with a smooth PostgreSQL installation.
Moreover, binary packages do not require a compilation toolchain and therefore are much easier to adopt. Lastly, a binary package adheres to the operating system conventions it has been built for (for instance, on where to place configuration files) and upgrades can be managed by the operating system as well. Since binary packages need to be pre-built from vendors, they may not reflect the latest released version. For example, when the PGDG delivers a new minor update, operating systems require some days to push out binary packages with such upgrades for all the supported platforms.
On the other hand, installing from sources requires a compilation toolchain, as well as much more time and CPU consumption to build PostgreSQL executables. You have full control over which components will be available in the final product and can trim and optimize your instance for very high performances and shrink resource consumption to a minimum. In the long term, however, you will be responsible for maintaining the installation and upgrading it in a similar manner.
PostgreSQL is split across several components to install:
The PostgreSQL server is the part that can serve your databases to applications and users and is required to store your data.The PostgreSQL client is the library and client tool to connect to the database server. It is not required if you don’t need to connect to the database on the very same machine, while it is required on client machines.The PostgreSQL contrib package is a set of well-known extensions and utilities that can enhance your PostgreSQL experience. This additional package is developed by the PGDG and is therefore well integrated and stable.The PostgreSQL docs is the documentation (e.g., man pages) related to the server and the client.PostgreSQL PL/Perl, PL/Python, and PL/Tcl are three components to allow the usage of programming languages— Perl, Python, and Tcl, respectively—directly within the PostgreSQL server.The recommended set of components is the server, the client, and the contrib modules; these modules will be used across the book. You are free to decide whether to install the other components as you wish, but this book will not detail each of them.
In order to better understand the concepts explained in this book, we recommend readers try the code examples on their own; therefore, you will need? a PostgreSQL instance available. While the best choice to get a full PostgreSQL instance at your fingertips is to install it on a virtual machine or a physical computer, we have also provided a set of Docker images as containerized PostgreSQL instances to run and experiment on. Therefore, you can choose between performing a full installation or a quick Docker setup to get a PostgreSQL machine ready. However, it is important for every DBA to be able to install PostgreSQL on several systems, and therefore, this section aims to show you how to perform a complete installation from scratch on a few Unix-like operating systems.
In the following sections, you will see how to install PostgreSQL on a few popular Linux and Unix operating systems, namely the following:
Linux Docker containersGNU/Linux Debian, Ubuntu, and derivativesFedora Linux (this also applies to Red Hat Enterprise Linux and compatible distributions, like Rocky Linux)FreeBSDIt is not possible to provide detailed instructions for every operating system out there, but the concepts presented in the following sections should prove insightful regardless.
Before getting to the practical installation, it is worth noting that binary packages could come in two flavors: those provided by the operating system vendor, and those provided by the PGDG. Usually, on Linux-based systems, you should use binary packages provided by the PGDG, because they are the most authoritative source for PostgreSQL. In fact, packages provided by the operating system vendor tend to become out of date very soon, which means they are usually a few versions behind the latest version globally available. On the other hand, on BSD platforms like FreeBSD, OpenBSD, and NetBSD, the operating system porters do an excellent job of keeping the packages provided by the operating system itself very up to date, so you can safely and easily use the operating system packages.
An important thing to note is that different operating systems store files in different places: usually, all the configuration files are placed within the PGDATA itself, but packages from some operating systems scatter the configuration files under the /etc directory. A few operating systems also place executables in specific paths, separated by the version of PostgreSQL, while others place all executables in the same path.
You need to investigate with the operating system package provider where each file or directory is placed in order to be able to configure and use PostgreSQL.
Docker is a container that allows you to run an isolated set of processes as if they are part of a micro virtual machine. The PGDG provides a Docker image that you can use to run a containerized cluster.
Explaining the Docker technology is out of the scope of this book, and in order to let you experiment in a quick and easy way with PostgreSQL, we have provided a set of Docker images, based on the PostgreSQL image, customized to let you experiment with the concepts explained in this book. You can use the above images as a starting point for your own projects, even if the above images are not meant to be used in a production environment. The images are contained in the docker_images directory of the book’s code repository (https://github.com/PacktPublishing/Learn-PostgreSQL-Second-Edition/).
We separated every Docker image by means of the chapter the image refers to. There is a common catch-all image named standalone that can be used as a common base and will be used in the very first chapters. Other chapters, for instance, those on replication, require their own image to be executed.
In order to start the base standalone image, you can simply execute the shell script run-pg-docker.sh, as follows:
$ sh run-pg-docker.sh … postgres@learn_postgresql:~$The script will ask you for a password; it is required that your user has sudo capabilities to connect the Docker network and ports. All the containers will launch a GNU Bash session with the postgres operating system user.
