Master the capabilities of PostgreSQL 10 to efficiently manage and maintain your database
PostgreSQL is an open source database used for handling large datasets (big data) and as a JSON document database. This book highlights the newly introduced features of PostgreSQL 10 and shows you how to build better PostgreSQL applications and administer your PostgreSQL database more efficiently. We begin by explaining advanced database design concepts in PostgreSQL 10, along with indexing and query optimization. You will also see how to work with event triggers, perform concurrent transactions, and use table partitioning, along with exploring SQL and server tuning. We will walk you through advanced administrative tasks such as server maintenance and monitoring, replication, recovery, high availability, and much more. You will come to understand common and not-so-common troubleshooting problems and how to overcome them. By the end of this book, you will have an expert-level command of advanced database functionalities and will be able to implement advanced administrative tasks with PostgreSQL 10.
If you are a PostgreSQL data architect or an administrator and want to understand how to implement advanced functionalities and master complex administrative tasks with PostgreSQL 10, then this book is perfect for you. Prior experience of administrating a PostgreSQL database and a working knowledge of SQL are required to make the best use of this book.
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Namrata Patil
Content Development Editor: Eisha Dsouza
Technical Editor: Varsha Shivhare
Copy Editors: Ulka Manjrekar, Safis Editing
Project Coordinator: Shweta H Birwatkar
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Kirk D'Penha
Production Coordinator: Shantanu Zagade
First published: January 2018
Production reference: 1300118
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78847-229-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Hans-Jürgen Schönig has 18 years of experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called Cybertec Schönig & Schönig GmbH. It has successfully served countless customers around the globe.
Before founding Cybertec Schönig & Schönig GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.
Sheldon Strauch is a 23-year veteran of software consulting at companies such as IBM, Sears, Ernst & Young, and Kraft Foods. He has a bachelor's degree in business administration and leverages his technical skills to improve businesses' self-awareness. His interests include data gathering, management, and mining; maps and mapping; business intelligence; and the application of data analysis for continuous improvement. He is currently focusing on the development of end-to-end data management and mining at Enova International, a financial services company located in Chicago. In his spare time, he enjoys performing arts, particularly music, and traveling with his wife, Marilyn.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the color images
Conventions used
Get in touch
Reviews
PostgreSQL Overview
What is new in PostgreSQL 10.0?
Understanding new database administration functions
Using additional information in pg_stat_activity
Introducing SCRAM-SHA-256
Improving support for replication
Understanding logical replication
Introducing quorum COMMIT
Partitioning data
Making use of CREATE STATISTICS
Improving parallelism
Introducing ICU encodings
Summary
Understanding Transactions and Locking
Working with PostgreSQL transactions
Handling errors inside a transaction
Making use of SAVEPOINT
Transactional DDLs
Understanding basic locking
Avoiding typical mistakes and explicit locking
Considering alternative solutions
Making use of FOR SHARE and FOR UPDATE
Understanding transaction isolation levels
Considering SSI transactions
Observing deadlocks and similar issues
Utilizing advisory locks
Optimizing storage and managing cleanup
Configuring VACUUM and autovacuum
Digging into transaction wraparound-related issues
A word on VACUUM FULL
Watching VACUUM at work
Making use of snapshot too old
Summary
Making Use of Indexes
Understanding simple queries and the cost model
Making use of EXPLAIN
Digging into the PostgreSQL cost model
Deploying simple indexes
Making use of sorted output
Using more than one index at a time
Using bitmap scans effectively
Using indexes in an intelligent way
Improving speed using clustered tables
Clustering tables
Making use of index only scans
Understanding additional b-tree features
Combined indexes
Adding functional indexes
Reducing space consumption
Adding data while indexing
Introducing operator classes
Hacking up an operator class for a b-tree
Creating new operators
Creating operator classes
Testing custom operator classes
Understanding PostgreSQL index types
Hash indexes
GiST indexes
Understanding how GiST works
Extending GiST
GIN indexes
Extending GIN
SP-GiST indexes
BRIN indexes
Extending BRIN indexes
Adding additional indexes
Achieving better answers with fuzzy searching
Taking advantage of pg_trgm
Speeding up LIKE queries
Handling regular expressions
Understanding full-text search - FTS
Comparing strings
Defining GIN indexes
Debugging your search
Gathering word statistics
Taking advantage of exclusion operators
Summary
Handling Advanced SQL
Introducing grouping sets
Loading some sample data
Applying grouping sets
Investigating performance
Combining grouping sets with the FILTER clause
Making use of ordered sets
Understanding hypothetical aggregates
Utilizing windowing functions and analytics
Partitioning data
Ordering data inside a window
Using sliding windows
Abstracting window clauses
Making use of onboard windowing functions
The rank and dense_rank functions
The ntile() function
The lead() and lag() functions
The first_value(), nth_value(), and last_value() functions
The row_number() function
Writing your own aggregates
Creating simple aggregates
Adding support for parallel queries
Improving efficiency
Writing hypothetical aggregates
Summary
Log Files and System Statistics
Gathering runtime statistics
Working with PostgreSQL system views
Checking live traffic
Inspecting databases
Inspecting tables
Making sense of pg_stat_user_tables
Digging into indexes
Tracking the background worker
Tracking, archiving, and streaming
Checking SSL connections
Inspecting transactions in real time
Tracking vacuum progress
Using pg_stat_statements
Creating log files
Configuring the postgresql.conf file
Defining log destination and rotation
Configuring syslog
Logging slow queries
Defining what and how to log
Summary
Optimizing Queries for Good Performance
Learning what the optimizer does
Optimizations by example
Evaluating join options
Nested loops
Hash joins
Merge joins
Applying transformations
Inlining the view
Flattening subselects
Applying equality constraints
Exhaustive searching
Trying it all out
Making the process fail
Constant folding
Understanding function inlining
Join pruning
Speeding up set operations
Understanding execution plans
Approaching plans systematically
Making EXPLAIN more verbose
Spotting problems
Spotting changes in runtime
Inspecting estimates
Inspecting buffer usage
Fixing high buffer usage
Understanding and fixing joins
Getting joins right
Processing outer joins
Understanding the join_collapse_limit variable
Enabling and disabling optimizer settings
Understanding genetic query optimization
Partitioning data
Creating partitions
Applying table constraints
Modifying inherited structures
Moving tables in and out of partitioned structures
Cleaning up data
Understanding PostgreSQL 10.0 partitioning
Adjusting parameters for good query performance
Speeding up sorting
Speeding up administrative tasks
Summary
Writing Stored Procedures
Understanding stored procedure languages
The anatomy of a stored procedure
Introducing dollar quoting
Making use of anonymous code blocks
Using functions and transactions
Understanding various stored procedure languages
Introducing PL/pgSQL
Handling quoting
Managing scopes
Understanding advanced error handling
Making use of GET DIAGNOSTICS
Using cursors to fetch data in chunks
Utilizing composite types
Writing triggers in PL/pgSQL
Introducing PL/Perl
Using PL/Perl for datatype abstraction
Deciding between PL/Perl and PL/PerlU
Making use of the SPI interface
Using SPI for set returning functions
Escaping in PL/Perl and support functions
Sharing data across function calls
Writing triggers in Perl
Introducing PL/Python
Writing simple PL/Python code
Using the SPI interface
Handling errors
Improving stored procedure performance
Reducing the number of function calls
Using cached plans
Assigning costs to functions
Using stored procedures
Summary
Managing PostgreSQL Security
Managing network security
Understanding bind addresses and connections
Inspecting connections and performance
Living in a world without TCP
Managing pg_hba.conf
Handling SSL
Handling instance-level security
Creating and modifying users
Defining database-level security
Adjusting schema-level permissions
Working with tables
Handling column-level security
Configuring default privileges
Digging into row-level security - RLS
Inspecting permissions
Reassigning objects and dropping users
Summary
Handling Backup and Recovery
Performing simple dumps
Running pg_dump
Passing passwords and connection information
Using environment variables
Making use of .pgpass
Using service files
Extracting subsets of data
Handling various formats
Replaying backups
Handling global data
Summary
Making Sense of Backups and Replication
Understanding the transaction log
Looking at the transaction log
Understanding checkpoints
Optimizing the transaction log
Transaction log archiving and recovery
Configuring for archiving
Configuring the pg_hba.conf file
Creating base backups
Reducing the bandwidth of a backup
Mapping tablespaces
Using different formats
Testing transaction log archiving
Replaying the transaction log
Finding the right timestamp
Cleaning up the transaction log archive
Setting up asynchronous replication
Performing a basic setup
Improving security
Halting and resuming replication
Checking replication to ensure availability
Performing failovers and understanding timelines
Managing conflicts
Making replication more reliable
Upgrading to synchronous replication
Adjusting durability
Making use of replication slots
Handling physical replication slots
Handling logical replication slots
Use cases of logical slots
Making use of CREATE PUBLICATION and CREATE SUBSCRIPTION
Summary
Deciding on Useful Extensions
Understanding how extensions work
Checking for available extensions
Making use of contrib modules
Using the adminpack
Applying bloom filters
Deploying btree_gist and btree_gin
Dblink - considering phasing out
Fetching files with file_fdw
Inspecting storage using pageinspect
Investigating caching with pg_buffercache
Encrypting data with pgcrypto
Prewarming caches with pg_prewarm
Inspecting performance with pg_stat_statements
Inspecting storage with pgstattuple
Fuzzy searches with pg_trgm
Connecting to remote servers using postgres_fdw
Handling mistakes and typos
Other useful extensions
Summary
Troubleshooting PostgreSQL
Approaching an unknown database
Inspecting pg_stat_activity
Querying pg_stat_activity
Treating Hibernate statements
Figuring out where queries come from
Checking for slow queries
Inspecting individual queries
Digging deeper with perf
Inspecting the log
Checking for missing indexes
Checking for memory and I/O
Understanding noteworthy error scenarios
Facing clog corruption
Understanding checkpoint messages
Managing corrupted data pages
Careless connection management
Fighting table bloat
Summary
Migrating to PostgreSQL
Migrating SQL statements to PostgreSQL
Using lateral joins
Supporting lateral
Using grouping sets
Supporting grouping sets
Using the WITH clause – common table expressions
Supporting the WITH clause
Using the WITH RECURSIVE clause
Supporting the WITH RECURSIVE clause
Using the FILTER clause
Supporting the FILTER clause
Using windowing functions
Supporting windowing and analytics
Using ordered sets – WITHIN GROUP clause
Supporting the WITHIN GROUP clause
Using the TABLESAMPLE clause
Supporting the TABLESAMPLE clause
Using limit/offset
Supporting the FETCH FIRST clause
Using OFFSET
Supporting the OFFSET clause
Using temporal tables
Supporting temporal tables
Matching patterns in time series
Moving from Oracle to PostgreSQL
Using the oracle_fdw extension to move data
Using ora2pg to migrate from Oracle
Common pitfalls
Moving from MySQL or MariaDB to PostgreSQL
Handling data in MySQL and MariaDB
Changing column definitions
Handling null values
Expecting problems
Migrating data and schema
Using pg_chameleon
Using FDWs
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
PostgreSQL is an advanced relational open source database, which is also an excellent foundation for NoSQL workloads. It is capable of handling large datasets and running complex SQL for your business needs. This book will enable you to build better PostgreSQL applications and administer databases more efficiently. It will give you valuable insights into how best to use PostgreSQL to your advantage.
This book has explicitly been written for people who want to know more about PostgreSQL and who are not satisfied with basic information. The aim is to go deeper than an introductory text and to explain the most important topics in a clear and easy-to-understand way.
Chapter 1, PostgreSQL Overview, provides an overview of PostgreSQL and its features. You will learn about the new functionalities available in PostgreSQL 10.
Chapter 2, Understanding Transactions and Locking, covers one of the most important aspects of any database system: proper database work is usually not possible without the existence of transactions. Understanding transactions and locking is vital to performance, as well as professional work.
Chapter 3, Making Use of Indexes, covers everything you need to know about indexes. Indexes are key to performance and are therefore an important cornerstone if you want a good user experience and high throughput. All the important aspects of indexing will be covered.
Chapter 4, Handling Advanced SQL, introduces you to some of the most important concepts of modern SQL. You will learn about windowing functions as well as other important current elements of SQL.
Chapter 5, Log Files and System Statistics, guides you through administrative tasks such as log file management and monitoring. You will learn how to inspect your servers and extract runtime information from PostgreSQL.
Chapter 6, Optimizing Queries for Good Performance, tells you everything you need to know about good PostgreSQL performance. The chapter covers SQL tuning as well as information about memory management.
Chapter 7, Writing Stored Procedures, teaches you some of the advanced topics related to server-side code. The most important server-side programming languages are covered and important aspects are pointed out.
Chapter 8, Managing PostgreSQL Security, is designed to help you improve the security of your server. The chapter features everything from user management to Row-Level Security (RLS). Information about encryption is also included.
Chapter 9, Handling Backup and Recovery, is all about backups and data recovery. You will learn to back up your data, which will enable you to restore things in the event of a disaster.
Chapter 10, Making Sense of Backups and Replication, is all about redundancy. You will learn to asynchronously and synchronously replicate PostgreSQL database systems. All modern features are covered as extensively as possible.
Chapter 11, Deciding on Useful Extensions, describes widely used modules that add more functionality to PostgreSQL. You will learn about the most common extensions.
Chapter 12, Troubleshooting PostgreSQL, offers a systematic approach to fixing problems in PostgreSQL. It will enable you to spot common problems and approach them in an organized way.
Chapter 13, Migrating to PostgreSQL, is the final chapter of this book and shows you how to migrate from commercial databases to PostgreSQL. The chapter covers the databases that are most commonly migrated to PostgreSQL these days.
This book has been written for a broad audience. In order to follow the examples presented in this book, it makes sense to have at least some experience with SQL and maybe even with PostgreSQL (although this is not a strict requirement). It is also a good idea to be familiar with the UNIX command line.
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/MasteringPostgreSQL10_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "To figure out what is going wrong, PostgreSQL offers the EXPLAIN command."
Any command-line input or output is written as follows:
test=# EXPLAIN SELECT *
FROM t_test
ORDER BY id DESC
LIMIT 10;
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
PostgreSQL is one of the world's most advanced open source database systems, and it has many features that are widely used by developers and system administrators alike. Starting with PostgreSQL 10, many new features have been added to PostgreSQL, which contribute greatly to the success of this exceptional open source product. In this book, many of these cool features will be covered and discussed in great detail.
In this chapter, you will be introduced to PostgreSQL and the cool new features available in PostgreSQL 10.0 and beyond. All relevant new functionalities will be covered in detail. Given the sheer number of changes made to the code and the size of the PostgreSQL project, this list of features is of course far from complete, so I have tried to focus on the most important aspects relevant to most people.
The features outlined in this chapter will be split into the following categories:
Database administration
SQL and developer related
Backup, recovery, and replication
Performance-related topics
PostgreSQL 10.0 was released in late 2017 and is the first version that follows the new numbering scheme introduced by the PostgreSQL community. From now on, the way major releases are numbered will change; the next major version after PostgreSQL 10.0 will therefore not be 10.1 but PostgreSQL 11. Versions 10.1 and 10.2 are merely service releases and will only contain bug fixes.
PostgreSQL 10.0 has many new features that can help the administrator reduce work and make systems more robust.
One of these features that makes life easier for administrators is related to additional information in pg_stat_activity.
Before PostgreSQL 10.0, pg_stat_activity only contained information about normal backend processes serving end users (connections). However, this has changed; since PostgreSQL 10.0, a lot more information is exposed, making it possible to figure out what the various system processes are doing as well.
The following listing shows the content of pg_stat_activity on an idle database instance:
test=# \x
Expanded display is on.
test=# SELECT pid, wait_event_type, wait_event, backend_type
       FROM pg_stat_activity;
-[ RECORD 1 ]---+--------------------
pid             | 12159
wait_event_type | Activity
wait_event      | AutoVacuumMain
backend_type    | autovacuum launcher
-[ RECORD 2 ]---+--------------------
pid             | 12161
wait_event_type | Activity
wait_event      | LogicalLauncherMain
backend_type    | background worker
-[ RECORD 3 ]---+--------------------
pid             | 12628
wait_event_type |
wait_event      |
backend_type    | client backend
-[ RECORD 4 ]---+--------------------
pid             | 12156
wait_event_type | Activity
wait_event      | BgWriterMain
backend_type    | background writer
-[ RECORD 5 ]---+--------------------
pid             | 12155
wait_event_type | Activity
wait_event      | CheckpointerMain
backend_type    | checkpointer
-[ RECORD 6 ]---+--------------------
pid             | 12157
wait_event_type | Activity
wait_event      | WalWriterMain
backend_type    | walwriter
What you see here is that every server process is listed, which will allow you to gain some insight into what is happening on the server.
Most people use passwords to connect to the database and manage security. Traditionally, people utilized md5 for this purpose. However, md5 is no longer considered safe, and therefore new authentication methods are needed. Starting with version 10.0, PostgreSQL supports SCRAM-SHA-256, which is far safer than the previous authentication method.
The old way of doing things is still supported. However, it is strongly recommended to abandon md5 in favor of SCRAM-SHA-256.
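Here is a minimal sketch of how the switch might look in practice (the role name and network range are made up). The password_encryption setting controls how newly assigned passwords are hashed, while the authentication method in pg_hba.conf asks clients to use SCRAM:
-- hash newly set passwords with SCRAM-SHA-256
SET password_encryption = 'scram-sha-256';
ALTER ROLE some_user PASSWORD 'topsecret';

-- in pg_hba.conf, request SCRAM-based authentication:
-- host  all  all  192.168.0.0/24  scram-sha-256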
PostgreSQL 10.0 also saw the introduction of logical replication, which had not been part of the core before.
Since version 8.0, PostgreSQL has supported binary replication (also often referred to as WAL shipping). The ability to distribute the transaction log (WAL) has been improved steadily over the years.
With PostgreSQL 10.0, a new feature has been added: logical replication. How does it work? Logical replication allows you to publish a set of tables on one server and have other servers subscribe to the changes.
To publish data, the new CREATE PUBLICATION command has been introduced:
test=# \h CREATE PUBLICATION
Command:     CREATE PUBLICATION
Description: define a new publication
Syntax:
CREATE PUBLICATION name
    [ FOR TABLE [ ONLY ] table_name [ * ] [, ...]
      | FOR ALL TABLES ]
    [ WITH ( publication_parameter [= value] [, ... ] ) ]
Once the data has been published, remote servers can subscribe to these changes and receive information about what has happened to those published data sets:
test=# \h CREATE SUBSCRIPTION
Command:     CREATE SUBSCRIPTION
Description: define a new subscription
Syntax:
CREATE SUBSCRIPTION subscription_name
    CONNECTION 'conninfo'
    PUBLICATION publication_name [, ...]
    [ WITH ( subscription_parameter [= value] [, ... ] ) ]
CREATE SUBSCRIPTION is used on the slave side to attach to these changes. The beauty of the concept is that a server can publish one set of tables while subscribing to some other tables at the same time—there is no such thing as always master or always slave anymore. Logical replication allows you to flexibly distribute data.
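To illustrate the whole workflow, here is a minimal sketch (the table, publication, and connection parameters are made up). Note that the table definition has to exist on both sides, as logical replication transfers data changes, not DDL:
-- on the publishing server
CREATE TABLE t_data (id int PRIMARY KEY, payload text);
CREATE PUBLICATION my_pub FOR TABLE t_data;

-- on the subscribing server
CREATE TABLE t_data (id int PRIMARY KEY, payload text);
CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=master.example.com dbname=test user=repl'
    PUBLICATION my_pub;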
There have been talks about introducing partitioning to PostgreSQL for years. However, big, important features take time to implement, and this is especially true if you are aiming for a good, extensible, and future-proof implementation. In PostgreSQL 10.0, table partitioning has finally made it into the PostgreSQL core. Of course, the implementation is far from complete, and a lot of work will be done in the future to add even more features. However, support for partitioning is important and will definitely be one of the most desirable features of PostgreSQL 10.0.
As of now, partitioning is able to:
Automatically create proper child constraints
Route changes made to the parent table to the child table
However, as stated earlier, a couple of features are still missing. Here are some of the more important gaps:
Automatic creation of child tables when incoming data is not covered by an existing partition
Hash partitioning
Moving updated rows that no longer match their partition
Handling partitions in parallel
The roadmap for PostgreSQL 11.0 already suggests that many of these things might be supported in the next release.
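To give you a first impression of the syntax, here is a small sketch of range partitioning as introduced in PostgreSQL 10.0 (the table and the boundary values are made up):
CREATE TABLE t_log (tstamp date, message text)
    PARTITION BY RANGE (tstamp);

CREATE TABLE t_log_2017 PARTITION OF t_log
    FOR VALUES FROM ('2017-01-01') TO ('2018-01-01');

CREATE TABLE t_log_2018 PARTITION OF t_log
    FOR VALUES FROM ('2018-01-01') TO ('2019-01-01');

-- rows inserted into the parent are routed to the matching partition
INSERT INTO t_log VALUES ('2017-05-04', 'some message');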
PostgreSQL 9.6 was the first version supporting parallel queries in their most basic form. Of course, not all parts of the server are fully parallel yet. Therefore, it is an ongoing effort to speed up even more operations than before. PostgreSQL 10.0 is a major step towards even more parallelism as a lot more operations can now benefit from multi-core systems.
Indexes are a key area of improvement and benefit greatly from the additional features introduced in PostgreSQL 10.0. There is now full support for parallel b-tree scans as well as for bitmap scans. For now, only b-tree indexes can benefit from parallelism, but this will most likely change in future releases to ensure that all index types can enjoy even better performance.
In addition to indexing, the PostgreSQL community has also worked hard to introduce support for parallel merge joins and to allow for more procedures to run in parallel. Some of the latest blog posts from the PostgreSQL community already suggest that many new features related to parallelism are in the pipeline for PostgreSQL 11.0.
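If you want to see parallelism in action yourself, a simple experiment is enough (the table name is made up, and the table has to be reasonably large, as the optimizer will not bother with workers for tiny tables). On a suitable table, the execution plan should contain a Gather node with a number of parallel workers:
SET max_parallel_workers_per_gather = 4;
EXPLAIN SELECT count(*) FROM t_big;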
When a PostgreSQL database is created, the administrator can choose the encoding and locale that should be used to store the data. Basically, this configuration decides which characters exist and in which order they are sorted. Here is an example: de_AT.UTF-8. In this case, we will use Unicode characters, which will be sorted in an Austrian sort order (Austrians speak some sort of German). So, de_AT will define the order in which the data is sorted.
To achieve this kind of sorting, PostgreSQL relies heavily on the operating system. The trouble is that if the sort order of characters in the operating system changes for some reason (perhaps because of a bug), PostgreSQL will have trouble with its indexes. A normal b-tree index is basically a sorted list, and if the sort order changes, there is naturally a problem.
The introduction of the ICU library is supposed to fix this problem. ICU offers stronger promises than the operating system and is, therefore, more suitable for long-term storage of data. With the introduction of PostgreSQL 10.0, ICU encodings can be enabled.
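As a sketch, an ICU collation can be created and attached to a column as follows (the names are made up, and the exact set of available ICU locales depends on how your PostgreSQL binaries were built):
CREATE COLLATION german_icu (provider = icu, locale = 'de-AT');
CREATE TABLE t_name (name text COLLATE german_icu);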
In PostgreSQL 10.0, a lot of functionality has been added that allows people to run professional applications faster and more efficiently. All areas of the database server have been improved, and many new professional features have been added. More improvements will follow in the future. Of course, the list of changes in this chapter is far from complete, because many small changes were made as well.
Locking is an important topic in any kind of database. Understanding locking is not only important for writing proper applications; it is also essential from a performance point of view. Without handling locks properly, your applications might not only be slow, they might also behave in very unexpected ways and produce wrong results. In my opinion, locking is the key to performance, and having a good overview will certainly help. Therefore, understanding locking and transactions is important for administrators and developers alike. In this chapter, you will learn about the following topics:
Working with PostgreSQL transactions
Understanding basic locking
Making use of FOR SHARE and FOR UPDATE
Understanding transaction isolation levels
Considering SSI transactions
Observing deadlocks and similar issues
Optimizing storage and managing cleanups
At the end of the chapter, you will be able to understand and utilize PostgreSQL transactions in the most efficient way possible.
PostgreSQL provides you with a highly advanced transaction machinery that offers countless features to developers and administrators alike. In this section, it is time to look at the basic concept of transactions.
The first important thing to know is that in PostgreSQL, everything is a transaction. If you send a simple query to the server, it is already a transaction. Here is an example:
test=# \x
Expanded display is on.
test=# SELECT now(), now();
              now              |              now
-------------------------------+-------------------------------
 2017-08-24 16:03:27.174253+02 | 2017-08-24 16:03:27.174253+02
(1 row)
In this case, the SELECT statement will be a separate transaction. If the same command is executed again, different timestamps will be returned.
If more than one statement has to be a part of the same transaction, the BEGIN statement must be used:
test=# \h BEGIN
Command:     BEGIN
Description: start a transaction block
Syntax:
BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]

where transaction_mode is one of:

    ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ
                      | READ COMMITTED | READ UNCOMMITTED }
    READ WRITE | READ ONLY
    [ NOT ] DEFERRABLE
The BEGIN statement will ensure that more than one command will be packed into a transaction. Here is how it works:
test=# BEGIN;
BEGIN
test=# SELECT now();
              now
-------------------------------
 2017-08-24 16:04:08.105131+02
(1 row)

test=# SELECT now();
              now
-------------------------------
 2017-08-24 16:04:08.105131+02
(1 row)

test=# COMMIT;
COMMIT
The important point here is that both timestamps will be identical. As mentioned earlier, we are talking about transaction time here.
To end the transaction, COMMIT can be used:
test=# \h COMMIT
Command:     COMMIT
Description: commit the current transaction
Syntax:
COMMIT [ WORK | TRANSACTION ]
There are a couple of syntax elements here. You can just use COMMIT, COMMIT WORK, or COMMIT TRANSACTION. All three options have the same meaning. If this is not enough, there is more:
test=# \h END
Command:     END
Description: commit the current transaction
Syntax:
END [ WORK | TRANSACTION ]
The END clause is the same as the COMMIT clause.
ROLLBACK is the counterpart of COMMIT. Instead of successfully ending a transaction, it will simply stop the transaction without ever making things visible to other transactions:
test=# \h ROLLBACK
Command:     ROLLBACK
Description: abort the current transaction
Syntax:
ROLLBACK [ WORK | TRANSACTION ]
Some applications use ABORT instead of ROLLBACK. The meaning is the same.
It is not always the case that transactions are correct from beginning to end. However, in PostgreSQL, only error-free transactions can be committed. Here is what happens:
test=# BEGIN;
BEGIN
test=# SELECT 1;
 ?column?
----------
        1
(1 row)

test=# SELECT 1 / 0;
ERROR:  division by zero
test=# SELECT 1;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
test=# COMMIT;
ROLLBACK
Note that division by zero did not work out.
It is important to point out that PostgreSQL will error-out, unlike MySQL, which is far less strict. After an error has occurred, no more instructions will be accepted even if those instructions are semantically and syntactically correct. It is still possible to issue a COMMIT. However, PostgreSQL will roll back the transaction because it is the only correct thing to be done at this point.
In professional applications, it can be pretty hard to write reasonably long transactions without ever encountering a single error. To solve the problem, users can utilize something called SAVEPOINT. As the name indicates, it is a safe place inside a transaction that the application can return to in the event things go terribly wrong. Here is an example:
test=# BEGIN;
BEGIN
test=# SELECT 1;
 ?column?
----------
        1
(1 row)

test=# SAVEPOINT a;
SAVEPOINT
test=# SELECT 2 / 0;
ERROR:  division by zero
test=# SELECT 2;
ERROR:  current transaction is aborted, commands ignored until end of transaction block
test=# ROLLBACK TO SAVEPOINT a;
ROLLBACK
test=# SELECT 3;
 ?column?
----------
        3
(1 row)

test=# COMMIT;
COMMIT
After the first SELECT clause, I decided to create a savepoint to make sure that the application can always return to this point inside the transaction. As you can see, the savepoint has a name, which is referred to later.
After returning to the savepoint called a, the transaction can proceed normally. The code has jumped back to before the error, so everything is fine.
The number of savepoints inside a transaction is practically unlimited. We have seen customers with over 250,000 savepoints in a single operation. PostgreSQL can easily handle this.
If you want to remove a savepoint from inside a transaction, the RELEASE SAVEPOINT command is available:
test=# \h RELEASE SAVEPOINT
Command:     RELEASE SAVEPOINT
Description: destroy a previously defined savepoint
Syntax:
RELEASE [ SAVEPOINT ] savepoint_name
Many people ask what will happen if you try to reach a savepoint after a transaction has ended. The answer is that the life of a savepoint ends as soon as the transaction ends. In other words, there is no way to return to a certain point in time after the transaction has been completed.
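A quick sketch illustrates the point; once the transaction is over, its savepoints are gone:
test=# BEGIN;
BEGIN
test=# SAVEPOINT a;
SAVEPOINT
test=# COMMIT;
COMMIT
test=# ROLLBACK TO SAVEPOINT a;
ERROR:  ROLLBACK TO SAVEPOINT can only be used in transaction blocks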
PostgreSQL has a very nice feature that is unfortunately not present in many commercial database systems. In PostgreSQL, it is possible to run DDLs (commands that change the data structure) inside a transaction block. In a typical commercial system, a DDL will implicitly commit the current transaction. Not so in PostgreSQL.
Apart from some minor exceptions (DROP DATABASE, CREATE TABLESPACE/DROP TABLESPACE, and so on), all DDLs in PostgreSQL are transactional, which is a huge plus and a real benefit to end users.
Here is an example:
test=# \d
No relations found.
test=# BEGIN;
BEGIN
test=# CREATE TABLE t_test (id int);
CREATE TABLE
test=# ALTER TABLE t_test ALTER COLUMN id TYPE int8;
ALTER TABLE
test=# \d t_test
     Table "public.t_test"
 Column |  Type  | Modifiers
--------+--------+-----------
 id     | bigint |
test=# ROLLBACK;
ROLLBACK
test=# \d
No relations found.
In this example, a table has been created and modified, and the entire transaction is aborted instantly. As you can see, there is no implicit COMMIT or any other strange behavior. PostgreSQL simply acts as expected.
Transactional DDLs are especially important if you want to deploy software. Just imagine running a Content Management System (CMS). If a new version is released, you'll want to upgrade. Running the old version would still be OK; running the new version is also OK. But you really don't want a mixture of old and new. Therefore, deploying an upgrade in a single transaction is highly beneficial, as it makes the upgrade an atomic operation.
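As a sketch, a deployment script (all object names are made up) can bundle the entire schema change so that it either applies completely or not at all:
BEGIN;
ALTER TABLE t_article ADD COLUMN slug text;
CREATE INDEX idx_article_slug ON t_article (slug);
UPDATE t_version SET schema_version = 42;
COMMIT;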
In my life as a professional PostgreSQL consultant (https://www.cybertec-postgresql.com), I have seen a couple of mistakes that are made again and again. If there are constants in life, these typical mistakes are definitely some of the things that never change.
Here is my favorite:
Transaction 1                                  Transaction 2
BEGIN;                                         BEGIN;
SELECT max(id) FROM product;                   SELECT max(id) FROM product;
User will see 17                               User will see 17
User will decide to use 18                     User will decide to use 18
INSERT INTO product ... VALUES (18, ...)       INSERT INTO product ... VALUES (18, ...)
COMMIT;                                        COMMIT;
In this case, there will be either a duplicate key violation or two identical entries. Neither variation of the problem is all that appealing.
One way to fix the problem is to use explicit table locking:
test=# \h LOCK
Command:     LOCK
Description: lock a table
Syntax:
LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

    ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE | SHARE UPDATE EXCLUSIVE
    | SHARE | SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE
As you can see, PostgreSQL offers eight types of locks to lock an entire table. In PostgreSQL, a lock can be as light as an ACCESS SHARE lock or as heavy as an ACCESS EXCLUSIVE lock. The following list shows what these locks do:
ACCESS SHARE: This type of lock is taken by reads and conflicts only with ACCESS EXCLUSIVE, which is set by DROP TABLE and the like. Practically, this means that SELECT cannot start if a table is about to be dropped. This also implies that DROP TABLE has to wait until a reading transaction is completed.
ROW SHARE: PostgreSQL takes this kind of lock in the case of SELECT FOR UPDATE/SELECT FOR SHARE. It conflicts with EXCLUSIVE and ACCESS EXCLUSIVE.
ROW EXCLUSIVE: This lock is taken by INSERT, UPDATE, and DELETE. It conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
SHARE UPDATE EXCLUSIVE: This kind of lock is taken by CREATE INDEX CONCURRENTLY, ANALYZE, ALTER TABLE VALIDATE, and some other flavors of ALTER TABLE, as well as by VACUUM (not VACUUM FULL). It conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.
SHARE: When an index is created, SHARE locks will be set. It conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.
SHARE ROW EXCLUSIVE: This one is set by CREATE TRIGGER and some forms of ALTER TABLE, and conflicts with everything but ACCESS SHARE.
EXCLUSIVE: This type of lock is by far the most restrictive one. It protects against reads and writes alike. If this lock is taken by a transaction, nobody else can read or write to the table affected.
ACCESS EXCLUSIVE: This lock prevents concurrent transactions from reading and writing.
Given the PostgreSQL locking infrastructure, one solution to the max-problem outlined previously would be as follows:
BEGIN;
LOCK TABLE product IN ACCESS EXCLUSIVE MODE;
INSERT INTO product
    SELECT max(id) + 1, ... FROM product;
COMMIT;
Keep in mind that this is a pretty nasty way of doing this kind of operation because nobody else can read or write to the table during your operation. Therefore, ACCESS EXCLUSIVE should be avoided at all costs.
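In many cases, a friendlier alternative is to avoid the max(id) + 1 pattern altogether and let a sequence hand out unique numbers; here is a minimal sketch:
CREATE TABLE product (id serial PRIMARY KEY, name text);
INSERT INTO product (name) VALUES ('shoes') RETURNING id;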
Up until now, you have seen how to handle locking as well as some basic concurrency. In this section, you will learn about transaction isolation. To me, this is one of the most neglected topics in modern software development. Only a small fraction of software developers are actually aware of this issue, which in turn leads to mind-boggling bugs.
Here is an example of what can happen:
Transaction 1                                  Transaction 2
BEGIN;
SELECT sum(balance) FROM t_account;
User will see 300
                                               BEGIN;
                                               INSERT INTO t_account (balance) VALUES (100);
                                               COMMIT;
SELECT sum(balance) FROM t_account;
User will see 400
COMMIT;
Most users would actually expect the left transaction to always return 300 regardless of the second transaction. However, this is not true. By default, PostgreSQL runs in the READ COMMITTED transaction isolation mode. This means that every statement inside a transaction will get a new snapshot of the data, which will be constant throughout the query.
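If you are unsure which isolation level a session is currently running at, you can simply ask PostgreSQL:
test=# SHOW transaction_isolation;
 transaction_isolation
-----------------------
 read committed
(1 row)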
If you want to avoid this, you can use TRANSACTION ISOLATION LEVEL REPEATABLE READ. In this transaction isolation level, a transaction will use the same snapshot through the entire transaction. Here is what will happen:
Transaction 1                                  Transaction 2
BEGIN TRANSACTION ISOLATION LEVEL
    REPEATABLE READ;
SELECT sum(balance) FROM t_account;
User will see 300
                                               BEGIN;
                                               INSERT INTO t_account (balance) VALUES (100);
                                               COMMIT;
SELECT sum(balance) FROM t_account;            SELECT sum(balance) FROM t_account;
User will see 300                              User will see 400
COMMIT;
As just outlined, the first transaction will freeze its snapshot of the data and provide us with constant results throughout the entire transaction. This feature is especially important if you want to run reports. The first and the last page of a report should always be consistent and operate on the same data. Therefore, the repeatable read is key to consistent reports.
Note that isolation-related errors won't always pop up instantly. It can happen that trouble is noticed years after an application has been moved to production.
On top of read committed and repeatable read, PostgreSQL offers Serializable Snapshot Isolation (SSI) transactions. So, in all, PostgreSQL supports three isolation levels. Note that read uncommitted (which still happens to be the default in some commercial databases) is not supported: if you try to start a read uncommitted transaction, PostgreSQL will silently map to read committed. Let us get back to the serializable isolation level.
The idea behind serializable is simple: if a transaction is known to work correctly when there is only a single user, it will also work correctly in the case of concurrency when this isolation level is chosen. However, users have to be prepared: transactions may fail (by design) and error out. In addition, a performance penalty has to be paid.
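In practice, a serializable transaction looks like any other transaction; the difference is that COMMIT may fail with a serialization error (SQLSTATE 40001), in which case the application is expected to simply retry the entire transaction. Here is a sketch using the table from the previous examples:
BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
SELECT sum(balance) FROM t_account;
INSERT INTO t_account (balance) VALUES (100);
COMMIT;  -- may fail with a serialization_failure error; retry if so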
If you want to know more about this isolation level, consider checking out https://wiki.postgresql.org/wiki/Serializable.