Title Page

Mastering PostgreSQL 9.6
A comprehensive guide for PostgreSQL 9.6 developers and administrators
Hans-Jürgen Schönig

BIRMINGHAM - MUMBAI

Copyright

Mastering PostgreSQL 9.6

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: May 2017

Production reference: 1250517

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78355-535-2

www.packtpub.com

Credits

Author

Hans-Jürgen Schönig

Copy Editor

Vikrant Phadkay

Reviewer

Shaun Thomas

Project Coordinator

Nidhi Joshi

Commissioning Editor

Amey Varangaonkar

Proofreader

Safis Editing

Acquisition Editor

Varsha Shetty

Indexer

Aishwarya Gangawane

Content Development Editor

Aishwarya Pandere

Production Coordinator

Arvindkumar Gupta

Technical Editor

Dinesh Pawar

About the Author

Hans-Jürgen Schönig has 18 years of experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called Cybertec Schönig & Schönig GmbH (www.postgresql-support.de). It has successfully served countless customers around the globe.

Before founding Cybertec Schönig & Schönig GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.

About the Reviewer

Shaun Thomas has been working with PostgreSQL since late 2000. Since 2011, he has been a frequent presenter at the PostgresOpen conference on topics such as handling extreme throughput, high availability, monitoring, architecture, and automation. He has contributed a few PostgreSQL extensions, as well as a tool for administering massive database clusters. On occasion, he has even been known to guest lecture at the local university. His goal is to help the community make PostgreSQL a bigger, better database for everyone to enjoy.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1783555351.

If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Errata

Piracy

Questions

PostgreSQL Overview

What is new in PostgreSQL 9.6?

Understanding new database administration functions

Killing idle sessions

Finding more detailed information in pg_stat_activity

Tracking vacuum progress

Improving vacuum speed

Digging into new SQL and developer-related functions

Using new backup and replication functionality

Streamlining wal_level and monitoring

Using multiple synchronous standby servers

Understanding performance-related features

Improving relation extensions

Checkpoint sorting and kernel interaction

Using more advanced foreign data wrappers

Introducing parallel queries

Adding snapshot too old

Summary

Understanding Transactions and Locking

Working with PostgreSQL transactions

Handling errors inside a transaction

Making use of savepoints

Transactional DDLs

Understanding basic locking

Avoiding typical mistakes and explicit locking

Considering alternative solutions

Making use of FOR SHARE and FOR UPDATE

Understanding transaction isolation levels

Considering SSI transactions

Observing deadlocks and similar issues

Utilizing advisory locks

Optimizing storage and managing cleanup

Configuring VACUUM and autovacuum

Digging into transaction wraparound-related issues

A word on VACUUM FULL

Watching VACUUM at work

Making use of snapshot too old

Summary

Making Use of Indexes

Understanding simple queries and the cost model

Making use of EXPLAIN

Digging into the PostgreSQL cost model

Deploying simple indexes

Making use of sorted output

Using more than one index at a time

Using bitmap scans effectively

Using indexes in an intelligent way

Improving speed using clustered tables

Clustering tables

Making use of index only scans

Understanding additional B-tree features

Combined indexes

Adding functional indexes

Reducing space consumption

Adding data while indexing

Introducing operator classes

Hacking up an operator class for a B-tree

Creating new operators

Creating operator classes

Testing custom operator classes

Understanding PostgreSQL index types

Hash indexes

GiST indexes

Understanding how GiST works

Extending GiST

GIN indexes

Extending GIN

SP-GiST indexes

BRIN indexes

Extending BRIN indexes

Adding additional indexes

Achieving better answers with fuzzy searching

Taking advantage of pg_trgm

Speeding up LIKE queries

Handling regular expressions

Understanding full-text search - FTS

Comparing strings

Defining GIN indexes

Debugging your search

Gathering word statistics

Taking advantage of exclusion operators

Summary

Handling Advanced SQL

Introducing grouping sets

Loading some sample data

Applying grouping sets

Investigating performance

Combining grouping sets with the FILTER clause

Making use of ordered sets

Understanding hypothetical aggregates

Utilizing windowing functions and analytics

Partitioning data

Ordering data inside a window

Using sliding windows

Abstracting window clauses

Making use of onboard windowing functions

rank and dense_rank functions

ntile() function

lead() and lag() functions

first_value(), nth_value(), and last_value() functions

row_number() function

Writing your own aggregates

Creating simple aggregates

Adding support for parallel queries

Improving efficiency

Writing hypothetical aggregates

Summary

Log Files and System Statistics

Gathering runtime statistics

Working with PostgreSQL system views

Checking live traffic

Inspecting databases

Inspecting tables

Making sense of pg_stat_user_tables

Digging into indexes

Tracking the background worker

Tracking, archiving, and streaming

Checking SSL connections

Inspecting transactions in real time

Tracking vacuum progress

Using pg_stat_statements

Creating log files

Configuring postgresql.conf file

Defining log destination and rotation

Configuring syslog

Logging slow queries

Defining what and how to log

Summary

Optimizing Queries for Good Performance

Learning what the optimizer does

Optimizations by example

Evaluating join options

Nested loops

Hash joins

Merge joins

Applying transformations

Inlining the view

Flattening subselects

Applying equality constraints

Exhaustive searching

Trying it all out

Making the process fail

Constant folding

Understanding function inlining

Join pruning

Speeding up set operations

Understanding execution plans

Approaching plans systematically

Making EXPLAIN more verbose

Spotting problems

Spotting changes in runtime

Inspecting estimates

Inspecting buffer usage

Fixing high buffer usage

Understanding and fixing joins

Getting joins right

Processing outer joins

Understanding the join_collapse_limit variable

Enabling and disabling optimizer settings

Understanding genetic query optimization

Partitioning data

Creating partitions

Applying table constraints

Modifying inherited structures

Moving tables in and out of partitioned structures

Cleaning up data

Adjusting parameters for good query performance

Speeding up sorting

Speeding up administrative tasks

Summary

Writing Stored Procedures

Understanding stored procedure languages

The anatomy of a stored procedure

Introducing dollar quoting

Making use of anonymous code blocks

Using functions and transactions

Understanding various stored procedure languages

Introducing PL/pgSQL

Handling quoting

Managing scopes

Understanding advanced error handling

Making use of GET DIAGNOSTICS

Using cursors to fetch data in chunks

Utilizing composite types

Writing triggers in PL/pgSQL

Introducing PL/Perl

Using PL/Perl for datatype abstraction

Deciding between PL/Perl and PL/PerlU

Making use of the SPI interface

Using SPI for set returning functions

Escaping in PL/Perl and support functions

Sharing data across function calls

Writing triggers in Perl

Introducing PL/Python

Writing simple PL/Python code

Using the SPI interface

Handling errors

Improving stored procedure performance

Reducing the number of function calls

Using cached plans

Assigning costs to functions

Using stored procedures

Summary

Managing PostgreSQL Security

Managing network security

Understanding bind addresses and connections

Inspecting connections and performance

Living in a world without TCP

Managing pg_hba.conf

Handling SSL

Handling instance-level security

Creating and modifying users

Defining database-level security

Adjusting schema-level permissions

Working with tables

Handling column-level security

Configuring default privileges

Digging into row-level security - RLS

Inspecting permissions

Reassigning objects and dropping users

Summary

Handling Backup and Recovery

Performing simple dumps

Running pg_dump

Passing passwords and connection information

Using environment variables

Making use of .pgpass

Using service files

Extracting subsets of data

Handling various data formats

Replaying backups

Handling global data

Summary

Making Sense of Backups and Replication

Understanding the transaction log

Looking at the transaction log

Understanding checkpoints

Optimizing the transaction log

Transaction log archiving and recovery

Configuring for archiving

Configuring the pg_hba.conf file

Creating base backups

Reducing the bandwidth of a backup

Mapping tablespaces

Using different formats

Testing transaction log archiving

Replaying the transaction log

Finding the right timestamp

Cleaning up the transaction log archive

Setting up asynchronous replication

Performing a basic setup

Improving security

Halting and resuming replication

Checking replication to ensure availability

Performing failovers and understanding timelines

Managing conflicts

Making replication more reliable

Upgrading to synchronous replication

Adjusting durability

Making use of replication slots

Handling physical replication slots

Handling logical replication slots

Use cases of logical slots

Summary

Deciding on Useful Extensions

Understanding how extensions work

Checking for available extensions

Making use of contrib modules

Using the adminpack

Applying bloom filters

Deploying btree_gist and btree_gin

dblink - consider phasing out

Fetching files with file_fdw

Inspecting storage using pageinspect

Investigating caching with pg_buffercache

Encrypting data with pgcrypto

Prewarming caches with pg_prewarm

Inspecting performance with pg_stat_statements

Inspecting storage with pgstattuple

Fuzzy searches with pg_trgm

Connecting to remote servers using postgres_fdw

Handling mistakes and typos

Other useful extensions

Summary

Troubleshooting PostgreSQL

Approaching an unknown database

Inspecting pg_stat_activity

Querying pg_stat_activity

Treating Hibernate statements

Figuring out where queries come from

Checking for slow queries

Inspecting individual queries

Digging deeper with perf

Inspecting the log

Checking for missing indexes

Checking for memory and I/O

Understanding noteworthy error scenarios

Facing clog corruption

Understanding checkpoint messages

Managing corrupted data pages

Careless connection management

Fighting table bloat

Summary

Migrating to PostgreSQL

Migrating SQL statements to PostgreSQL

Using lateral joins

Supporting lateral

Using grouping sets

Supporting grouping sets

Using WITH clause - common table expressions

Supporting WITH clause

Using WITH RECURSIVE clause

Supporting WITH RECURSIVE clause

Using FILTER clause

Supporting FILTER clause

Using windowing functions

Supporting windowing and analytics

Using ordered sets - WITHIN GROUP clause

Supporting WITHIN GROUP clause

Using TABLESAMPLE clause

Supporting TABLESAMPLE clause

Using limit/offset

Supporting FETCH FIRST clause

Using OFFSET

Supporting OFFSET clause

Using temporal tables

Supporting temporal tables

Matching patterns in time series

Moving from Oracle to PostgreSQL

Using the oracle_fdw extension to move data

Using ora2pg to migrate from Oracle

Common pitfalls

Moving from MySQL or MariaDB to PostgreSQL

Handling data in MySQL and MariaDB

Changing column definitions

Handling null values

Expecting problems

Migrating data and schema

Using pg_chameleon

Using foreign data wrappers

Summary

Preface

PostgreSQL is an open source database management system used for handling large datasets (big data) and as a JSON document database. It also has applications in the software and web domains. This book will enable you to build better PostgreSQL applications and administer databases more efficiently.

What this book covers

Chapter 1, PostgreSQL Overview, will give you an overview of PostgreSQL and its features. You will learn about the new features and functionality available in PostgreSQL 9.6.

Chapter 2, Understanding Transactions and Locking, will cover one of the most important aspects of any database system. Proper database work is usually not possible without the existence of transactions, and understanding transactions and locking is vital to performance as well as professional work.

Chapter 3, Making Use of Indexes, covers everything you need to know about indexes. Indexes are key to performance and are therefore an important cornerstone if you want good user experience and high throughput. All important aspects of indexing will be covered.

Chapter 4, Handling Advanced SQL, will introduce some of the most important concepts of modern SQL. You will learn about windowing functions as well as other important, more modern, elements of SQL.

Chapter 5, Log Files and System Statistics, will guide you through more administrative tasks, such as log file management and monitoring. You will learn how to inspect your servers and extract runtime information from PostgreSQL.

Chapter 6, Optimizing Queries for Good Performance, will tell you everything you need to know about good PostgreSQL performance. The chapter covers SQL tuning as well as memory management.

Chapter 7, Writing Stored Procedures, teaches you some more advanced topics related to server-side code. The most important server-side programming languages are covered and important aspects are pointed out.

Chapter 8, Managing PostgreSQL Security, has been designed to help you improve the security of your server. The chapter features everything from user management to row-level security. Information about encryption is also included.

Chapter 9, Handling Backup and Recovery, is all about backups and data recovery. You will learn how to back up your data, which will enable you to restore things in case of disaster.

Chapter 10, Making Sense of Backups and Replication, is all about redundancy. You will learn to asynchronously and synchronously replicate PostgreSQL database systems. All modern features are covered as extensively as possible.

Chapter 11, Deciding on Useful Extensions, describes widely used modules that add additional functionality to PostgreSQL. You will learn about the most common extensions.

Chapter 12, Troubleshooting PostgreSQL, offers a systematic approach to fixing problems in PostgreSQL. It will enable you to spot common problems and approach them in an organized way.

Chapter 13, Migrating to PostgreSQL, is the final chapter of this book and shows you the way from commercial databases to PostgreSQL. It covers the databases most commonly migrated these days.

What you need for this book

This book has been written for a broad audience. In order to follow the examples presented in this book, it makes sense to have at least some experience with SQL and maybe even PostgreSQL in general (although this is not a hard requirement). In general, it is a good idea to be familiar with the Unix command line.

Who this book is for

This book has explicitly been written for people who want to know more about PostgreSQL and who are not satisfied with basic information. The aim is to go a bit deeper and explain the most important topics in a clear and easy-to-understand way.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "In this case, the \timing command will tell psql to show the runtime of a query."

Any command-line input or output is written as follows:

test=# CREATE TABLE t_test (id serial, name text);
CREATE TABLE
test=# INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);

New terms and important words are shown in bold.

Warnings or important notes appear in a box like this.
Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to the list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

PostgreSQL Overview

PostgreSQL is one of the world's most advanced open source database systems and it has many features widely used by developers and system administrators alike. In this book, many of those cool features will be covered and discussed in great detail.

In this chapter, you will be introduced to PostgreSQL and the cool new features available in PostgreSQL 9.6 and beyond. All relevant new functionality will be covered in detail. Given the sheer number of changes made to the code and given the size of the PostgreSQL project, this list of features is of course not complete, so I tried to focus on the most important aspects relevant to most people.

The features outlined in this chapter will be split into the following categories:

Database administration

SQL and developer-related

Backup, recovery, and replication

Performance-related topics

What is new in PostgreSQL 9.6?

PostgreSQL 9.6 was released in late 2016 and is the last version to follow the old numbering scheme PostgreSQL has been using for more than a decade. From PostgreSQL 10.0 onward, a new version numbering system will be in place, under which major releases will happen more frequently.

Understanding new database administration functions

PostgreSQL 9.6 has many new features that can help the administrator to reduce work and make systems more robust.

One of those features is the idle_in_transaction_session_timeout parameter.

Killing idle sessions

In PostgreSQL, a session or a transaction can basically live almost forever. In some cases, this has been a problem because transactions were kept open for too long, usually due to a bug. The trouble is this: insanely long transactions can cause cleanup problems, and table bloat can occur. The uncontrolled growth of a table (table bloat) naturally leads to performance problems and unhappy end users.

Starting with PostgreSQL 9.6, it is possible to limit the duration a database connection is allowed to spend inside a transaction without performing real work. Here is how it works:

test=# SET idle_in_transaction_session_timeout TO 2500;
SET
test=# BEGIN;
BEGIN
test=# SELECT 1;
 ?column?
----------
        1
(1 row)

test=# SELECT 1;
FATAL: terminating connection due to idle-in-transaction timeout

Administrators and developers can set a timeout, which is 2.5 seconds in my example. As soon as a transaction is idle for too long, the connection will be terminated automatically by the server. Nasty side effects of long idle transactions can be prevented easily by adjusting this parameter.
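
The parameter can also be applied instance-wide and persisted across restarts. Here is a minimal sketch using ALTER SYSTEM; the 2500 millisecond value is just the example from above, so pick whatever suits your workload:

test=# ALTER SYSTEM SET idle_in_transaction_session_timeout TO 2500;
ALTER SYSTEM
test=# SELECT pg_reload_conf();
 pg_reload_conf
----------------
 t
(1 row)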

Finding more detailed information in pg_stat_activity

pg_stat_activity is a system view that has been around for many years. It basically contains a list of active connections. In older versions of PostgreSQL, administrators could see that a query was waiting for somebody else; however, it was not possible to figure out why and for whom. This has changed in 9.6. Two columns have been added:

test=# \d pg_stat_activity
View "pg_catalog.pg_stat_activity"
      Column      |           Type           | Modifiers
------------------+--------------------------+-----------
 ...
 wait_event_type  | text                     |
 wait_event       | text                     |
 ...

In addition to this extension, a new function has been added, which shows who is causing whom to wait:

test=# SELECT * FROM pg_blocking_pids(4711);

pg_blocking_pids

------------------

{3435}

(1 row)

When the function is called, it will return a list of blocking PIDs.
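
To make use of the two new columns, a simple query against pg_stat_activity is usually enough to see who is waiting and why. A quick sketch; the PID and the wait event shown are, of course, just illustrative:

test=# SELECT pid, wait_event_type, wait_event
       FROM pg_stat_activity
       WHERE wait_event IS NOT NULL;
 pid  | wait_event_type |  wait_event
------+-----------------+---------------
 4711 | Lock            | transactionid
(1 row)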

Tracking vacuum progress

For many years, people have asked for a progress tracker for vacuum. Finally, PostgreSQL 9.6 makes this wish come true by introducing a new system view. Here is how it works:

postgres=# SELECT * FROM pg_stat_progress_vacuum ;

-[ RECORD 1 ]------+---------------

pid | 29546

datid | 67535

datname | test

relid | 16402

phase | scanning heap

heap_blks_total | 6827

heap_blks_scanned | 77

heap_blks_vacuumed | 0

index_vacuum_count | 0

max_dead_tuples | 154

num_dead_tuples | 0

PostgreSQL will provide detailed information about ongoing vacuum processes so that people can track the progress of this vital operation.

Improving vacuum speed

PostgreSQL 9.6 not only provides you with deeper insights into what vacuum is doing at the moment, it also speeds up the process in general. From PostgreSQL 9.6 onward, PostgreSQL keeps track of pages that are known to be fully frozen and avoids vacuuming those pages.

Tables that are mostly read-only will massively benefit from this change, as vacuum load is drastically reduced.

Digging into new SQL and developer-related functions

One of the most promising new features of PostgreSQL is the ability to perform phrase searching. Up to 9.5, it was only possible to search for words; phrase searching was very hard to do. 9.6 nicely removes this limitation. Here is an example of how it works:

test=# SELECT phraseto_tsquery('Under pressure') @@ to_tsvector('Something was under some sort of pressure');

?column?

----------

f

(1 row)

test=# SELECT phraseto_tsquery('Under pressure') @@ to_tsvector('Under pressure by David Bowie hit number 1 again');

?column?

----------

t

(1 row)

The first query returns false because the words we are looking for are not adjacent to each other in the string. In the second example, true is returned because there really is a proper match.

However, there is more: in 9.6 it is possible to check whether words show up in a certain order. In the following example, we want a word to be between united and nations:

test=# SELECT tsquery('united <2> nations') @@ to_tsvector('are we really united, happy nations?');

?column?

----------

t

(1 row)

test=# SELECT tsquery('united <2> nations') @@ to_tsvector('are we really at united nations?');

?column?

----------

f

(1 row)

The second example returns false as there is no word between united and nations.

Using new backup and replication functionality

PostgreSQL 9.6 has also seen improvements in the area of backup and recovery.

Streamlining wal_level and monitoring

The wal_level setting has always been a bit hard to understand for many people. Many were struggling with the difference between the archive and hot_standby settings. To remove this confusion altogether, both settings have been replaced with the easier-to-understand replica setting, which does the same as hot_standby.

In addition to that, the monitoring of replicated setups has been simplified. Prior to 9.6, there was only the pg_stat_replication view, which could be queried on the master to supervise the flow of data to the slave. Now it is also possible to monitor the flow of data on the slaves, by consulting the pg_stat_wal_receiver view. It is basically the slave-side mirror of pg_stat_replication and helps to determine the state of replication.
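
A quick look at the view on a slave shows the WAL position received so far. The following sketch queries only a few of the available columns, and the values are purely illustrative:

test=# SELECT pid, status, received_lsn FROM pg_stat_wal_receiver;
 pid  |  status   | received_lsn
------+-----------+--------------
 9876 | streaming | 0/3000060
(1 row)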

Understanding performance-related features

Just like every release of PostgreSQL, this one contains numerous performance improvements that can help to speed up applications. In this section, I want to focus on the most important and most powerful ones. Of course, there are many more small improvements than can be listed here.

Improving relation extensions

For many years, PostgreSQL extended a table (or an index) block by block. In the case of a single writer process, this was usually fine. However, in cases of highly concurrent writing, extending the relation one block at a time was a source of contention and suboptimal performance. From 9.6 onward, PostgreSQL extends tables by multiple blocks at a time. The number of blocks added at a time is 20 times the number of waiting processes.

Checkpoint sorting and kernel interaction

When PostgreSQL writes changes to disk during a checkpoint, it now does so in a more orderly way to ensure that writes are more sequential than before. This is done by sorting blocks before writing them out. Random writes will be dramatically reduced this way, which in turn leads to higher throughput on most hardware.

Sorted checkpoints are not the only scalability improvement to make it into 9.6. There are also new kernel write-back configuration options. What does this mean? In the case of large caches, it can take quite a long time to write all changes out. This used to be especially nasty on systems with hundreds of gigabytes of memory because fairly intense I/O storms could happen. Of course, the operating system-level behavior of Linux could be changed using the /proc/sys/vm/dirty_background_ratio kernel setting. However, only a handful of consultants and system administrators actually knew how to do that and why. The checkpoint_flush_after, bgwriter_flush_after, and backend_flush_after settings can now be used to control the flush behavior. In general, the rule is to flush earlier. Still, as the feature is new, people are still gathering experience on how to use those settings in the most efficient way possible.
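
For reference, this is roughly what the relevant postgresql.conf entries might look like. The values below are the usual 9.6 defaults on Linux, so take them as a starting point rather than as tuning advice:

# postgresql.conf (excerpt)
checkpoint_flush_after = 256kB   # ask the kernel to write checkpoint data after this much I/O
bgwriter_flush_after = 512kB     # the same idea for the background writer
backend_flush_after = 0          # 0 disables flushing by ordinary backends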

Using more advanced foreign data wrappers

Foreign data wrappers have been around for many years. Starting with PostgreSQL 9.6, the optimizer can use foreign tables far more efficiently. This includes join push-down (joins can now be performed remotely) and sort push-down (sorting can now happen remotely). Distributing data inside a cluster is now much more efficient due to faster remote operations.

Introducing parallel queries

Traditionally, a query had to run on a single CPU. While this was just fine in the OLTP world, it started to be a problem for analytical applications, which were bound to the speed provided by a single core. With PostgreSQL 9.6, parallel queries were introduced. Implementing parallel queries was hard, and a lot of infrastructure had already been built over the years. All this infrastructure is now available to provide the end user with parallel sequential scans. The idea is to make many CPUs work on complicated WHERE conditions during a sequential scan. Version 9.6 also allows for parallel aggregates and parallel joins. Of course, there is a lot of work left, but we are already looking at a major leap forward.

To control parallelism, there are two essential settings:

test=# SHOW max_worker_processes;
 max_worker_processes
----------------------
 8
(1 row)

test=# SHOW max_parallel_workers_per_gather;
 max_parallel_workers_per_gather
---------------------------------
 2
(1 row)

The first one limits the overall number of worker processes available. The second one controls the number of workers allowed per gather node.

A gather node is a new thing you will see in an execution plan. It is in charge of unifying results coming from parallel subprocesses.
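
To see a gather node in action, simply ask for the execution plan of a query that qualifies for parallelism. The following sketch reuses the t_test table created earlier; costs are omitted for readability, and the exact plan shape naturally depends on your data and settings:

test=# EXPLAIN SELECT count(*) FROM t_test WHERE name = 'hans';
                   QUERY PLAN
-------------------------------------------------
 Finalize Aggregate
   ->  Gather
         Workers Planned: 2
         ->  Partial Aggregate
               ->  Parallel Seq Scan on t_test
                     Filter: (name = 'hans'::text)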

In addition to those fundamental settings, there are a couple of new optimizer parameters to adjust the cost of parallel queries.

Adding snapshot too old

Those of you using Oracle will be aware of the following error message: snapshot too old. In Oracle, this message indicates that a transaction has been running for too long, so it has to be aborted. In PostgreSQL, transactions can run almost infinitely. However, long transactions can still be a problem, so a snapshot too old error has been added as a feature in 9.6, which allows queries relying on an overly old snapshot to be aborted after a certain amount of time.

The idea behind that is to prevent table bloat and to make sure that end users are aware of the fact that they might be about to do something stupid.
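
The feature is controlled by the old_snapshot_threshold parameter, which has to be set in postgresql.conf and requires a restart to take effect. A minimal sketch; the threshold of 60 minutes is merely an example:

# postgresql.conf (excerpt)
# -1 (the default) disables the feature entirely
old_snapshot_threshold = 60   # value is in minutes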

Summary

In PostgreSQL 9.6 and 10.0, a lot of functionality has been added that allows people to run professional applications even faster and more efficiently. As far as PostgreSQL 10.0 is concerned, the exact new features are not fully defined yet; some things are already known and are outlined in this chapter.

Understanding Transactions and Locking

Locking is an important topic in any kind of database. It is not enough to understand just how it works to write proper or better applications; it is also essential from a performance point of view. Without properly handled locks, your applications might not only be slow, they might also behave wrongly and in insane ways. In my judgment, locking is key to performance, and having a good overview will certainly help. Therefore, understanding locking and transactions is important for administrators and developers alike.

In this chapter, you will learn:

Basic locking

Transactions and transaction isolation

Deadlocks

Locking and foreign keys

Explicit and implicit locking

Advisory locks

At the end of the chapter, you will be able to understand and utilize PostgreSQL transactions in the most efficient way possible.

Working with PostgreSQL transactions

PostgreSQL provides you with a highly advanced transaction machinery that offers countless features to developers and administrators alike. In this section, it is time to look at the basic concept.

The first important thing to know is this: in PostgreSQL, everything is a transaction. If you send a simple query to the server, it is already a transaction. Here is an example:

test=# SELECT now(), now();

now | now

------------------------------+------------------------------

2016-08-30 12:03:27.84596+02 | 2016-08-30 12:03:27.84596+02

(1 row)

In this case, the SELECT statement will be a separate transaction. If the same command is executed again, different timestamps will be returned.

Keep in mind that the now() function will return the transaction time. The SELECT statement will, therefore, always return two identical timestamps.
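
If you need the wall-clock time rather than the transaction time, clock_timestamp() can be used instead. A quick sketch; the timestamps shown are purely illustrative:

test=# BEGIN;
BEGIN
test=# SELECT now(), clock_timestamp();
              now              |        clock_timestamp
-------------------------------+-------------------------------
 2016-08-30 12:13:54.839277+02 | 2016-08-30 12:14:02.117324+02
(1 row)

test=# COMMIT;
COMMIT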

If more than one statement has to be part of the same transaction, the BEGIN clause must be used:

test=# \h BEGIN

Command: BEGIN

Description: start a transaction block

Syntax:

BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]

where transaction_mode is one of:

ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ

| READ COMMITTED | READ UNCOMMITTED }

READ WRITE | READ ONLY

[ NOT ] DEFERRABLE

The BEGIN clause will ensure that more than one command will be packed into a transaction. Here is how it works:

test=# BEGIN;

BEGIN

test=# SELECT now();

now

-------------------------------

2016-08-30 12:13:54.839277+02

(1 row)

test=# SELECT now();

now

-------------------------------

2016-08-30 12:13:54.839277+02

(1 row)

test=# COMMIT;

COMMIT

The important point here is that both timestamps will be identical. As mentioned earlier, we are talking about transaction time here.

To end the transaction, COMMIT can be used:

test=# \h COMMIT

Command: COMMIT

Description: commit the current transaction

Syntax:

COMMIT [ WORK | TRANSACTION ]

There are a couple of syntax elements here. You can just use COMMIT, COMMIT WORK, or COMMIT TRANSACTION. All three options have the same meaning. If this is not enough, there is more:

test=# \h END

Command: END

Description: commit the current transaction

Syntax:

END [ WORK | TRANSACTION ]

The END clause is the same as the COMMIT clause.

ROLLBACK is the counterpart of COMMIT. Instead of successfully ending a transaction, it will simply stop the transaction without ever making things visible to other transactions:

test=# \h ROLLBACK

Command: ROLLBACK

Description: abort the current transaction

Syntax:

ROLLBACK [ WORK | TRANSACTION ]

Some applications use ABORT instead of ROLLBACK. The meaning is the same.

Handling errors inside a transaction

It is not always the case that transactions are correct from beginning to end. However, in PostgreSQL, only error-free transactions can be committed. Here is what happens:

test=# BEGIN;

BEGIN

test=# SELECT 1;

?column?

----------

1

(1 row)

test=# SELECT 1 / 0;

ERROR: division by zero

test=# SELECT 1;

ERROR: current transaction is aborted, commands ignored until end of transaction block

test=# COMMIT;

ROLLBACK

Note that the division by zero did not work out.

In any proper database, an instruction like this will instantly error-out and make the statement fail.

It is important to point out that PostgreSQL will error-out, unlike MySQL, which does not seem to have a problem with a mathematically wrong result.

After an error has occurred, no more instructions will be accepted even if those instructions are semantically and syntactically correct. It is still possible to issue a COMMIT. However, PostgreSQL will roll back the transaction because it is the only thing at this point that can still be done.

Making use of savepoints

In professional applications, it can be pretty hard to write reasonably long transactions without ever encountering a single error. To solve the problem, users can utilize something called SAVEPOINT. As the name indicates, it is a safe place inside a transaction that the application can return to in the event things go terribly wrong. Here is an example:

test=# BEGIN;

BEGIN

test=# SELECT 1;

?column?

----------

1

(1 row)

test=# SAVEPOINT a;

SAVEPOINT

test=# SELECT 2 / 0;

ERROR: division by zero

test=# ROLLBACK TO SAVEPOINT a;

ROLLBACK

test=# SELECT 3;

?column?

----------

3

(1 row)

test=# COMMIT;

COMMIT

After the first SELECT clause, I decided to create a SAVEPOINT to make sure that the application can always return to this point inside the transaction. As you can see, a SAVEPOINT has a name, which is referred to later.

After returning to a, the transaction can proceed normally. The code has jumped back before the error, so everything is fine.

The number of savepoints inside a transaction is practically unlimited. We have seen customers with over 250,000 savepoints in a single operation. PostgreSQL can easily handle that.

If you want to remove a savepoint from inside a transaction, there is RELEASE SAVEPOINT:

test=# \h RELEASE SAVEPOINT

Command: RELEASE SAVEPOINT

Description: destroy a previously defined savepoint

Syntax:

RELEASE [ SAVEPOINT ] savepoint_name
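
In practice, releasing a savepoint simply removes it as a rollback target while keeping all the work done since then. A minimal sketch:

test=# BEGIN;
BEGIN
test=# SAVEPOINT a;
SAVEPOINT
test=# RELEASE SAVEPOINT a;
RELEASE
test=# COMMIT;
COMMIT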

Many people ask what will happen if you try to return to a savepoint after a transaction has ended. The answer is that the life of a savepoint ends as soon as the transaction ends. In other words, there is no way to return to a certain point in time after the transaction has been completed.

Transactional DDLs

PostgreSQL has a very nice feature that is unfortunately not present in many commercial database systems. In PostgreSQL, it is possible to run DDLs (commands that change the data structure) inside a transaction block. In a typical commercial system, a DDL will implicitly commit the current transaction. Not so in PostgreSQL.

Apart from some minor exceptions (DROP DATABASE, CREATE TABLESPACE/DROP TABLESPACE, and so on), all DDLs in PostgreSQL are transactional, which is a huge plus and a real benefit to end users.

Here is an example:

test=# \d

No relations found.

test=# BEGIN;

BEGIN

test=# CREATE TABLE t_test (id int);

CREATE TABLE

test=# ALTER TABLE t_test ALTER COLUMN id TYPE int8;

ALTER TABLE

test=# \d t_test

Table "public.t_test"

Column | Type | Modifiers

--------+--------+-----------

id | bigint |

test=# ROLLBACK;

ROLLBACK

test=# \d t_test

Did not find any relation named "t_test".

In this example, a table has been created and modified, and the entire transaction is aborted instantly. As you can see, there is no implicit COMMIT or any other strange behavior. PostgreSQL simply acts as expected.

Transactional DDLs are especially important if you want to deploy software. Just imagine running a CMS. If a new version is released, you'll want to upgrade. Running the old version would still be OK; running the new version is also OK but you really don't want a mixture of old and new. Therefore, deploying an upgrade in a single transaction is definitely highly beneficial as it makes upgrades an atomic operation.

psql allows you to include files using the \i directive. It allows you to start a transaction, load various files, and execute them in a single transaction.
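
A sketch of such a deployment; the file names are, of course, hypothetical:

test=# BEGIN;
BEGIN
test=# \i /tmp/upgrade_step_1.sql
test=# \i /tmp/upgrade_step_2.sql
test=# COMMIT;
COMMIT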

Avoiding typical mistakes and explicit locking

In my life as a professional PostgreSQL consultant (http://postgresql-support.de/), I have seen a couple of mistakes that are made again and again. If there are constants in life, these typical mistakes are definitely some of the things that never change.

Here is my favorite:

Transaction 1                              | Transaction 2
-------------------------------------------+-------------------------------------------
BEGIN;                                     | BEGIN;
SELECT max(id) FROM product;               | SELECT max(id) FROM product;
User will see 17                           | User will see 17
User will decide to use 18                 | User will decide to use 18
INSERT INTO product ... VALUES (18, ...)   | INSERT INTO product ... VALUES (18, ...)
COMMIT;                                    | COMMIT;

In this case, there will be either a duplicate key violation or two identical entries. Neither variation of the problem is all that appealing.

One way to fix the problem is to use explicit table locking:

test=# \h LOCK

Command: LOCK

Description: lock a table

Syntax:

LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ]

where lockmode is one of:

ACCESS SHARE | ROW SHARE | ROW EXCLUSIVE |

SHARE UPDATE EXCLUSIVE| SHARE |

SHARE ROW EXCLUSIVE | EXCLUSIVE | ACCESS EXCLUSIVE

As you can see, PostgreSQL offers eight types of locks to lock an entire table. In PostgreSQL, a lock can be as light as an ACCESS SHARE lock or as heavy as an ACCESS EXCLUSIVE lock. The following list shows what these locks do:

ACCESS SHARE: This type of lock is taken by reads and conflicts only with ACCESS EXCLUSIVE, which is set by DROP TABLE and the like. Practically, this means that a SELECT cannot start if a table is about to be dropped. This also implies that DROP TABLE has to wait until a reading transaction is completed.

ROW SHARE: PostgreSQL takes this kind of lock in the case of SELECT FOR UPDATE/SELECT FOR SHARE. It conflicts with EXCLUSIVE and ACCESS EXCLUSIVE.

ROW EXCLUSIVE: This lock is taken by INSERT, UPDATE, and DELETE. It conflicts with SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.

SHARE UPDATE EXCLUSIVE: This kind of lock is taken by CREATE INDEX CONCURRENTLY, ANALYZE, some flavors of ALTER TABLE, and VACUUM (not VACUUM FULL). It conflicts with the SHARE UPDATE EXCLUSIVE, SHARE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE lock modes.

SHARE: When an index is created, SHARE locks will be set. It conflicts with ROW EXCLUSIVE, SHARE UPDATE EXCLUSIVE, SHARE ROW EXCLUSIVE, EXCLUSIVE, and ACCESS EXCLUSIVE.

SHARE ROW EXCLUSIVE: This one is set by CREATE TRIGGER and some forms of ALTER TABLE, and conflicts with everything except ACCESS SHARE and ROW SHARE.

EXCLUSIVE: This mode conflicts with every lock mode except ACCESS SHARE, so concurrent transactions can still read from the table but not write to it.

ACCESS EXCLUSIVE: This type of lock is by far the most restrictive one. It is taken by commands such as DROP TABLE, TRUNCATE, and VACUUM FULL, and it protects against reads and writes alike. If this lock is taken by a transaction, nobody else can read or write to the table affected.

Given the PostgreSQL locking infrastructure, one solution to the max(id) problem outlined previously would be:

BEGIN;

LOCK TABLE product IN ACCESS EXCLUSIVE MODE;

INSERT INTO product SELECT max(id) + 1, ... FROM product;

COMMIT;

Keep in mind that this is a pretty nasty way of doing this kind of operation because nobody else can read or write to the table during your operation. Therefore, ACCESS EXCLUSIVE should be avoided at all costs.
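
In many cases, the cleanest way out is to not compute max(id) + 1 manually at all and instead let the database hand out unique numbers, for instance, through a serial column. A minimal sketch of this alternative, which needs no table lock:

test=# CREATE TABLE product (id serial PRIMARY KEY, name text);
CREATE TABLE
test=# INSERT INTO product (name) VALUES ('shoes') RETURNING id;
 id
----
  1
(1 row)

INSERT 0 1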

Understanding transaction isolation levels

Up to now, you have seen how to handle locking as well as some basic concurrency. In this section, you will learn about transaction isolation. To me, this is one of the most neglected topics in modern software development. Only a small fraction of software developers are actually aware of this issue, which in turn leads to disgusting and mind-boggling bugs.

Here is an example of what can happen:

Transaction 1                         | Transaction 2
--------------------------------------+-----------------------------------------------
BEGIN;                                |
SELECT sum(balance) FROM t_account;   |
User will see 300                     |
                                      | BEGIN;
                                      | INSERT INTO t_account (balance) VALUES (100);
                                      | COMMIT;
SELECT sum(balance) FROM t_account;   |
User will see 400                     |
COMMIT;                               |

Most users would actually expect the left transaction to always return 300 regardless of the second transaction. However, this is not true. By default, PostgreSQL runs in READ COMMITTED transaction isolation mode. This means that every statement inside a transaction will get a new snapshot of the data, which will be constant throughout the query.

An SQL statement will operate on the same snapshot and will ignore changes by concurrent transactions while it is running.

If you want to avoid that, you can use TRANSACTION ISOLATION LEVEL REPEATABLE READ. In this transaction isolation level, a transaction will use the same snapshot throughout the entire transaction. Here is what will happen:

Transaction 1                                      | Transaction 2
---------------------------------------------------+-----------------------------------------------
BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ; |
SELECT sum(balance) FROM t_account;                |
User will see 300                                  |
                                                   | BEGIN;
                                                   | INSERT INTO t_account (balance) VALUES (100);
                                                   | COMMIT;
SELECT sum(balance) FROM t_account;                | SELECT sum(balance) FROM t_account;
User will see 300                                  | User will see 400
COMMIT;                                            |

As just outlined, the first transaction will freeze its snapshot of the data and provide us with constant results throughout the entire transaction. This feature is especially important if you want to run reports. The first and the last page of a report should always be consistent and operate on the same data. Therefore, repeatable read is key to consistent reports.

Note that isolation-related errors won't always pop up instantly. It can happen that trouble is noticed years after an application has been moved to production.

Repeatable read is not more expensive than read committed. There is no need to worry about performance penalties.

Considering SSI transactions

On top of read committed and repeatable read, PostgreSQL offers serializable (SSI) transactions. So, in all, PostgreSQL supports three isolation levels. Note that read uncommitted (which still happens to be the default in some commercial databases) is not supported: if you try to start a read uncommitted transaction, PostgreSQL will silently map it to read committed. However, back to serializable.

The idea behind serializable is simple; if a transaction is known to work correctly if there is only a single user, it will also work in the case of concurrency if this isolation level is chosen. However, users have to be prepared; transactions may fail (by design) and error-out. In addition to that, a performance penalty has to be paid.
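
Starting such a transaction works like with any other isolation level; the difference is that PostgreSQL may abort one of two conflicting transactions, and the application must be ready to retry. A sketch of the kind of error to expect; which statement actually triggers it depends entirely on the workload:

test=# BEGIN TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN
-- conflicting reads and writes happen in a second, concurrent session
test=# COMMIT;
ERROR: could not serialize access due to read/write dependencies among transactions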

If you want to know more about this isolation level, consider checking out https://wiki.postgresql.org/wiki/Serializable.

Consider using serializable only when you have a decent understanding of what is going on inside the database engine.