E-Book
36,59 €

Mastering PostgreSQL 11 - Hans-Jurgen Schonig - E-Book

Mastering PostgreSQL 11 E-Book

Hans-Jürgen Schönig

0,0

36,59 €

Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.

Herausgeber: Packt Publishing
Kategorie: Wissenschaft und neue Technologien
Sprache: Englisch

Beschreibung

Master the capabilities of PostgreSQL 11 to efficiently manage and maintain your database

Key Features

Master advanced concepts of PostgreSQL 11 with real-world datasets and examples

Explore query parallelism, data replication, and database performance while working with larger datasets

Extend the functionalities of your PostgreSQL instance to suit your organization's needs with minimal effort

Book Description

This second edition of Mastering PostgreSQL 11 helps you build dynamic database solutions for enterprise applications using the latest release of PostgreSQL, which enables database analysts to design both the physical and technical aspects of the system architecture with ease.

This book begins with an introduction to the newly released features in PostgreSQL 11 to help you build efficient and fault-tolerant PostgreSQL applications. You'll examine all of the advanced aspects of PostgreSQL in detail, including logical replication, database clusters, performance tuning, monitoring, and user management. You will also work with the PostgreSQL optimizer, configuring PostgreSQL for high speed, and see how to move from Oracle to PostgreSQL. As you progress through the chapters, you will cover transactions, locking, indexes, and optimizing queries to improve performance. Additionally, you'll learn to manage network security and explore backups and replications, while understanding the useful extensions of PostgreSQL so that you can optimize the speed and performance of large databases.

By the end of this book, you will be able to use your database to its utmost capacity by implementing advanced administrative tasks with ease.

What you will learn

Get to grips with advanced PostgreSQL 11 features and SQL functions

Make use of the indexing features in PostgreSQL and fine-tune the performance of your queries

Work with stored procedures and manage backup and recovery

Master replication and failover techniques

Troubleshoot your PostgreSQL instance for solutions to common and not-so-common problems

Perform database migration from MySQL and Oracle to PostgreSQL with ease

Who this book is for

This book is for data and database professionals wanting to implement advanced functionalities and master complex administrative tasks with PostgreSQL 11. Prior experience of database administration with PostgreSQL database will aid in understanding the concepts covered in this book.

Details

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

Seitenzahl: 585

Veröffentlichungsjahr: 2018

Bewertungen

0,0

Rezensionen(0 Rezensionen)

Leseprobe

Mastering PostgreSQL 11Second Edition

Expert techniques to build scalable, reliable, and fault-tolerant database applications

Hans-Jürgen Schönig

BIRMINGHAM - MUMBAI

Mastering PostgreSQL 11 Second Edition

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Pravin DhandreAcquisition Editor:Namrata PatilContent Development Editor:Rhea HenriquesTechnical Editor: Dharmendra YadavCopy Editor: Safis EditingProject Coordinator: Manthan PatelProofreader: Safis EditingIndexer:Mariammal ChettiyarGraphics:Jisha ChirayilProduction Coordinator: Shantanu Zagade

First published: January 2018 Second edition: October 2018

Production reference: 1291018

Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.

ISBN 978-1-78953-781-9

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.Packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.Packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Hans-Jürgen Schönig has 18 years' experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called Cybertec Schönig and Schönig GmbH. It has successfully served countless customers around the globe. Before founding Cybertec Schönig and Schönig GmbH in 2000, he worked as a database developer at a private research company that focused on the Austrian labor market, where he primarily worked on data mining and forecast models. He has also written several books about PostgreSQL.

About the reviewer

Sheldon Strauch is a twenty-year software consulting veteran, having consulted for companies such as IBM, Sears, Ernst and Young, and Kraft Foods. He has a Bachelor's degree in Business Administration and leverages his technical skills to improve businesses' self-awareness. His interests include data gathering, management, and mining; maps and mapping; business intelligence; and application of data analysis for continuous improvement. He is currently focused on development of end-to-end data management and mining at Enova International, a financial services company located in Chicago. In his spare time he enjoys the performing arts, particularly music, and traveling with his wife Marilyn.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Title Page

Mastering PostgreSQL 11 Second Edition

Packt Upsell

Why subscribe?

PacktPub.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Conventions used

Get in touch

Reviews

PostgreSQL Overview

What is new in PostgreSQL 11.0?

Understanding the new database administration functions

Using configurable WAL-segment sizes

Larger queryid in pg_stat_statements

Improved indexing and better optimization

Expression index statistics

INCLUDE indexes or covering indexes

Parallel index creation

Better cache management

Improving pg_prewarm

Enhancing windowing functions

Introducing just-in-time compilation

Enhanced partitioning

Adding support for stored procedures

Improving ALTER TABLE

Summary

Questions

Understanding Transactions and Locking

Working with PostgreSQL transactions

Handling errors inside a transaction

Making use of SAVEPOINT

Transactional DDLs

Understanding basic locking

Avoiding typical mistakes and explicit locking

Considering alternative solutions

Making use of FOR SHARE and FOR UPDATE

Understanding transaction isolation levels

Considering SSI transactions

Observing deadlocks and similar issues

Utilizing advisory locks

Optimizing storage and managing cleanup

Configuring VACUUM and autovacuum

Digging into transaction wraparound-related issues

A word on VACUUM FULL

Watching VACUUM at work

Limiting transactions by making use of snapshot too old

Summary

Questions

Making Use of Indexes

Understanding simple queries and the cost model

Making use of EXPLAIN

Digging into the PostgreSQL cost model

Deploying simple indexes

Making use of sorted output

Using more than one index at a time

Using bitmap scans effectively

Using indexes in an intelligent way

Improving speed using clustered tables

Clustering tables

Making use of index-only scans

Understanding additional btree features

Combined indexes

Adding functional indexes

Reducing space consumption

Adding data while indexing

Introducing operator classes

Hacking up an operator class for a btree

Creating new operators

Creating operator classes

Testing custom operator classes

Understanding PostgreSQL index types

Hash indexes

GiST indexes

Understanding how GiST works

Extending GiST

GIN indexes

Extending GIN

SP-GiST indexes

BRIN indexes

Extending BRIN indexes

Adding additional indexes

Achieving better answers with fuzzy searching

Taking advantage of pg_trgm

Speeding up LIKE queries

Handling regular expressions

Understanding full-text search

Comparing strings

Defining GIN indexes

Debugging your search

Gathering word statistics

Taking advantage of exclusion operators

Summary

Questions

Handling Advanced SQL

Introducing grouping sets

Loading some sample data

Applying grouping sets

Investigating performance

Combining grouping sets with the FILTER clause

Making use of ordered sets

Understanding hypothetical aggregates

Utilizing windowing functions and analytics

Partitioning data

Ordering data inside a window

Using sliding windows

Abstracting window clauses

Making use of on-board windowing functions

The rank and dense_rank functions

The ntile() function

The lead() and lag() functions

The first_value(), nth_value(), and last_value() functions

The row_number() function

Writing your own aggregates

Creating simple aggregates

Adding support for parallel queries

Improving efficiency

Writing hypothetical aggregates

Summary

Log Files and System Statistics

Gathering runtime statistics

Working with PostgreSQL system views

Checking live traffic

Inspecting databases

Inspecting tables

Making sense of pg_stat_user_tables

Digging into indexes

Tracking the background worker

Tracking, archiving, and streaming

Checking SSL connections

Inspecting transactions in real time

Tracking vacuum progress

Using pg_stat_statements

Creating log files

Configuring the postgresql.conf file

Defining log destination and rotation

Configuring syslog

Logging slow queries

Defining what and how to log

Summary

Questions

Optimizing Queries for Good Performance

Learning what the optimizer does

Optimizations by example

Evaluating join options

Nested loops

Hash joins

Merge joins

Applying transformations

Inlining the view

Flattening subselects

Applying equality constraints

Exhaustive searching

Trying it all out

Making the process fail

Constant folding

Understanding function inlining

Join pruning

Speedup set operations

Understanding execution plans

Approaching plans systematically

Making EXPLAIN more verbose

Spotting problems

Spotting changes in runtime

Inspecting estimates

Inspecting buffer usage

Fixing high buffer usage

Understanding and fixing joins

Getting joins right

Processing outer joins

Understanding the join_collapse_limit variable

Enabling and disabling optimizer settings

Understanding genetic query optimization

Partitioning data

Creating partitions

Applying table constraints

Modifying inherited structures

Moving tables in and out of partitioned structures

Cleaning up data

Understanding PostgreSQL 11.0 partitioning

Adjusting parameters for good query performance

Speeding up sorting

Speeding up administrative tasks

Making use of parallel queries

What PostgreSQL is able to do in parallel?

Parallelism in practice

Introducing JIT compilation

Configuring JIT

Running queries

Summary

Writing Stored Procedures

Understanding stored procedure languages

Understanding fundamentals – stored procedures versus functions

The anatomy of a function

Introducing dollar quoting

Making use of anonymous code blocks

Using functions and transactions

Understanding various stored procedure languages

Introducing PL/pgSQL

Handling quoting

Managing scopes

Understanding advanced error handling

Making use of GET DIAGNOSTICS

Using cursors to fetch data in chunks

Utilizing composite types

Writing triggers in PL/pgSQL

Writing stored procedures in PL/pgSQL

Introducing PL/Perl

Using PL/Perl for datatype abstraction

Deciding between PL/Perl and PL/PerlU

Making use of the SPI interface

Using SPI for set returning functions

Escaping in PL/Perl and support functions

Sharing data across function calls

Writing triggers in Perl

Introducing PL/Python

Writing simple PL/Python code

Using the SPI interface

Handling errors

Improving functions

Reducing the number of function calls

Using cached plans

Assigning costs to functions

Using functions for various purposes

Summary

Questions

Managing PostgreSQL Security

Managing network security

Understanding bind addresses and connections

Inspecting connections and performance

Living in a world without TCP

Managing pg_hba.conf

Handling SSL

Handling instance-level security

Creating and modifying users

Defining database-level security

Adjusting schema-level permissions

Working with tables

Handling column-level security

Configuring default privileges

Digging into row-level security – RLS

Inspecting permissions

Reassigning objects and dropping users

Summary

Questions

Handling Backup and Recovery

Performing simple dumps

Running pg_dump

Passing passwords and connection information

Using environment variables

Making use of .pgpass

Using service files

Extracting subsets of data

Handling various formats

Replaying backups

Handling global data

Summary

Questions

Making Sense of Backups and Replication

Understanding the transaction log

Looking at the transaction log

Understanding checkpoints

Optimizing the transaction log

Transaction log archiving and recovery

Configuring for archiving

Configuring the pg_hba.conf file

Creating base backups

Reducing the bandwidth of a backup

Mapping tablespaces

Using different formats

Testing transaction log archiving

Replaying the transaction log

Finding the right timestamp

Cleaning up the transaction log archive

Setting up asynchronous replication

Performing a basic setup

Improving security

Halting and resuming replication

Checking replication to ensure availability

Performing failovers and understanding timelines

Managing conflicts

Making replication more reliable

Upgrading to synchronous replication

Adjusting durability

Making use of replication slots

Handling physical replication slots

Handling logical replication slots

Use cases of logical slots

Making use of CREATE PUBLICATION and CREATE SUBSCRIPTION

Summary

Questions

Deciding on Useful Extensions

Understanding how extensions work

Checking for available extensions

Making use of contrib modules

Using the adminpack module

Applying bloom filters

Deploying btree_gist and btree_gin

Dblink – considering phasing out

Fetching files with file_fdw

Inspecting storage using pageinspect

Investigating caching with pg_buffercache

Encrypting data with pgcrypto

Prewarming caches with pg_prewarm

Inspecting performance with pg_stat_statements

Inspecting storage with pgstattuple

Fuzzy searches with pg_trgm

Connecting to remote servers using postgres_fdw

Handling mistakes and typos

Other useful extensions

Summary

Troubleshooting PostgreSQL

Approaching an unknown database

Inspecting pg_stat_activity

Querying pg_stat_activity

Treating Hibernate statements

Figuring out where queries come from

Checking for slow queries

Inspecting individual queries

Digging deeper with perf

Inspecting the log

Checking for missing indexes

Checking for memory and I/O

Understanding noteworthy error scenarios

Facing clog corruption

Understanding checkpoint messages

Managing corrupted data pages

Careless connection management

Fighting table bloat

Summary

Questions

Migrating to PostgreSQL

Migrating SQL statements to PostgreSQL

Using lateral joins

Supporting lateral

Using grouping sets

Supporting grouping sets

Using the WITH clause – common table expressions

Supporting the WITH clause

Using the WITH RECURSIVE clause

Supporting the WITH RECURSIVE clause

Using the FILTER clause

Supporting the FILTER clause

Using windowing functions

Supporting windowing and analytics

Using ordered sets – the WITHIN GROUP clause

Supporting the WITHIN GROUP clause

Using the TABLESAMPLE clause

Supporting TABLESAMPLE clause

Using limit/offset

Supporting the FETCH FIRST clause

Using OFFSET

Supporting the OFFSET clause

Using temporal tables

Supporting temporal tables

Matching patterns in time series

Moving from Oracle to PostgreSQL

Using the oracle_fdw extension to move data

Using ora2pg to migrate from Oracle

Common pitfalls

ora_migrator – migrating from Oracle to PostgreSQL quickly

How does ora_migrator work?

Moving from MySQL or MariaDB to PostgreSQL

Handling data in MySQL and MariaDB

Changing column definitions

Handling null values

Expecting problems

Migrating data and schema

Using pg_chameleon

Using FDWs

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

This second edition of Mastering PostgreSQL 11 helps you build dynamic database solutions for enterprise applications using the latest release of PostgreSQL, which enables database analysts to design both physical and technical aspects of the system architecture with ease.

The book begins with an introduction to the newly released features in PostgreSQL 11 to help you build efficient and fault-tolerant PostgreSQL applications. You'll examine all the advanced aspects of PostgreSQL in detail, including logical replication, database clusters, performance tuning, monitoring, and user management. You will also work with the PostgreSQL optimizer, configure PostgreSQL for high speed, and see how to move from Oracle to PostgreSQL. As you progress through the chapters, you will cover transactions, locking, indexes, and optimizing queries to improve performance.

Additionally, you'll learn how to manage network security and explore backups and replications while understanding the useful extensions of PostgreSQL so that you can optimize the speed and performance of using large databases.

By the end of this book, you will be able to use your database to its utmost capacity by implementing advanced administrative tasks with ease.

Who this book is for

This book includes the newly introduced features in PostgreSQL 11 and shows you how to build better PostgreSQL applications, and administer your PostgreSQL database efficiently. You will master the advanced features of PostgreSQL and acquire the necessary skills to build efficient database solutions.

What this book covers

Chapter 1, PostgreSQL Overview, introduces PostgreSQL and the new features available in PostgreSQL 11 and beyond.

Chapter 2, Understanding Transactions and Locking, explores locking and transactions, and utilizing PostgreSQL transactions in the most efficient way possible.

Chapter 3, Making Use of Indexes, discusses indexes, their types, use cases, and how to implement our own indexing strategies.

Chapter 4, Handling Advanced SQL, is about modern SQL and its features. We will explore sets and their various types and write our own aggregates.

Chapter 5, Log Files and System Statistics, explains how to make sense of database statistics.

Chapter 6, Optimizing Queries for Good Performance, explains how to write better and faster queries. We shall also focus on understanding what makes a query poor or bad in nature.

Chapter 7, Writing Stored Procedures, takes a closer look at the basic differences between procedures and functions. Stored procedures, using extensions, and some more advanced features of PL/pgSQL shall also be discussed.

Chapter 8, Managing PostgreSQL Security, covers the most common security problems that you will face as a PostgreSQL developer and DBA.

Chapter 9, Handling Backup and Recovery, covers how to restore backups and deal with partially dumping data.

Chapter 10, Making Sense of Backups and Replication, looks at PostgreSQL's transaction log and explains what we can do with it to improve our setup and to make things more secure.

Chapter 11, Deciding on Useful Extensions, discusses some of the most widespread extensions for PostgreSQL.

Chapter 12, Troubleshooting PostgreSQL, focuses on approaching an unknown database, identifying the key bottlenecks, handling storage corruption, and inspecting broken replicas.

Chapter 13, Migrating to PostgreSQL, is about moving from other databases to PostgreSQL.

To get the most out of this book

This book has been written for a broad audience. In order to follow the examples presented in this book, it makes sense to have at least some experience with SQL and maybe even PostgreSQL in general (although this is not a strict requirement). In general, it is a good idea to be familiar with the UNIX command line.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "I will add one row to the table using the a simple INSERT command."

Any command-line input or output is written as follows:

test=# CREATE TABLE t_test (id int); CREATE TABLE test=# INSERT INTO t_test VALUES (0); INSERT 0 1

Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Warnings or important notes appear like this.

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

PostgreSQL Overview

It has been a while since I embarked on writing another book on PostgreSQL. I have come a long way and I am proud to have made it to the third release of Mastering PostgreSQL, which is now covering all the cool features present PostgreSQL 11.

PostgreSQL is one of the world's most advanced open source database systems, and it has many features that are widely used by developers and system administrators alike. To begin with PostgreSQL 11, many new features have been added to PostgreSQL which contribute greatly to the success of this exceptional open source product.

This book will cover and discuss many of these cool features in great detail.

In this chapter, you will be introduced to PostgreSQL and the cool new features available in PostgreSQL 11 and beyond. All of the relevant new functionalities will be covered in detail. Given the sheer number of changes made to the code and given the size of the PostgreSQL project, this list of features is, of course, far from complete, so I have tried to focus on the most important aspects, relevant to most people.

The features outlined in this chapter will be split into the following categories:

What is new in PostgreSQL 11?

SQL and developer related

Backup, recovery, and replication

Performance-related topics

What is new in PostgreSQL 11.0?

PostgreSQL 11 was released in the fall of 2018 and provides users with a couple of modern features. These are useful to professionals and beginners alike. PostgreSQL 11 is the second major release following the new numbering scheme introduced by the PostgreSQL community. The next major release of PostgreSQL after version 11 will be 12. The service releases will be called PostgreSQL 11.1, 11.2, 11.3, and so on. Compared to the pre-10 world, this is a major change, which should be pointed out.

Which version should you use? The recommendation is to always use the most recent release. There is no point in getting started with, say, PostgreSQL 9.6 or so anymore. If you are new to PostgreSQL, begin with version 11. There is no such thing known as bugs in PostgreSQL—the community will always provide you with working code so there is no need to be afraid of PostgreSQL 10 or PostgreSQL 11. It just works.

Understanding the new database administration functions

PostgreSQL 11 has many new features that can help the administrator reduce work and run the system more reliably and in a more robust way.

One of the features that is supposed to help people to run even more efficient databases is the ability to configure the size of database instances, commonly known as WAL-segments.

Using configurable WAL-segment sizes

Since PostgreSQL was introduced 20 years ago, the size of a single WAL file has always been 16 MB. In the beginning, it was even a compiled in limit, which was later changed to a compile-time option. Starting with PostgreSQL 11 the size of those WAL segments can be changed at instance creation, which gives administrators an additional knob to configure and optimize PostgreSQL. Here is how it works. The following example shows how to configure the WAL-segment size when running initdb:

initdb -D /pgdata --wal-segsize=32

The initdb command is the tool that is called to create a database instance. It is usually the call you see, if hidden by some operating system scripts provided by your favorite Linux distribution, Windows, or whatever you like to use. However, initdb now has an option to pass the desired WAL-segment size directly to the program.

As I have already mentioned, the default size is 16 MB; hence, in most cases, it makes sense to use larger segments to improve performance. There is no point in using smaller ones unless you are running a really, really small database instance on an embedded system.

What is the real performance impact going to be? As always, this really depends on what you are doing. If you are a running a database system facing 99% reads, the impact of larger WAL-segments will be zero. Yes, you heard it right—ZERO. If you are facing writes while your system is 95% idle and not under severe load, the impact will still be zero or close to zero. You will only be able to witness gains if you are running a heavy, write-intense workload. Then, and only then, might a change be worth it. If you are only running a couple of online forms visited by an occasional customer—why bother? The new feature will only show its strength when there are many changes leading to a lot of WAL.

Larger queryid in pg_stat_statements

If you really want to dig into PostgreSQL performance, pg_stat_statements is the tool to look at. Personally, I consider it to be the gold standard, that is, if you really want to figure out what is going on in the system. The pg_stat_statements command is loaded via shared_preload_libraries as soon as PostgreSQL starts and aggregates statistics about queries running in your server. It will instantly show you if something goes wrong.

The pg_stat_statements command provides a field called queryid, which used to be a 32-bit identifier up to now. In some cases, this has led to problems because it is possible that keys collide in certain cases. Magnus Hagander calculated in one of his papers that after running 3 billion different queries, around 50,000 collisions could be expected. By introducing a 64-bitqueryid, this number is expected to drop to around 0.25 conflicts after 3 billion different types of queries, which is a substantial improvement.

Keep in mind that you might have to update your scripts if you are moving to PostgreSQL 11 and also if you are using pg_stat_statements to track down performance problems.

Improved indexing and better optimization

PostgreSQL 11 offers more than just a couple of improved functions to handle administration. There is also improved functionality around indexes. One of the most important features is related to indexes and statistics.

Parallel index creation

When an index is built in PostgreSQL, the database traditionally used one core to do the job. In many cases, this was not an issue. However, PostgreSQL is used for ever-growing systems and therefore index creation starts to be an issue in many cases. At the moment the community is trying to improve sorting as well. The first step is therefore to allow for the parallel creation of btrees, which has made it into PostgreSQL 11. Future versions of PostgreSQL will also allow provide parallel sorts for normal operations, which is unfortunately not yet supported by PostgreSQL 11.

Parallel index creation can speed up indexing dramatically and we are eager to see future improvements in this area (maybe support for other index types, and so on).

Better cache management

PostgreSQL 11 will also provide you with better ways to manage the I/O cache (the shared buffers). The pg_prewarm command is especially noteworthy.

Improving pg_prewarm

The pg_prewarm command allows you to restore the content of the PostgreSQL I/O cache after a restart. It has already been around for quite some time and is widely used by the PostgreSQL user base. In PostgreSQL 11, pg_prewarm has been extended and allows for automatic dumping of the buffer list in regular intervals.

It is also possible to automatically preload the old cache contents so that users will have better database performance after a restart. In particular, systems with a lot of RAM can benefit from these new improvements.

Enhancing windowing functions

Windowing functions and analytics are a cornerstone of any modern SQL implementation and are therefore widely used by professionals. PostgreSQL has provided support for windowing functions for quite some time now. However, there were still some small features proposed by the SQL standard missing. PostgreSQL 11 now fully supports what SQL: 2011 proposes.

The following features have been added:

Range between:

Previously just

ROWS

Now handles values

Exclusion clauses:

Exclude current row

Exclude ties

To demonstrate how the new features work, I have decided to include an example. The code contains two windowing functions. They are explained as follows:

The first one uses what is already available in PostgreSQL 10 and previously.

The second

array_agg

excludes the current row, which is a new feature provided by PostgreSQL 11.

The following code generates five rows and contains two windowing functions:

test=# SELECT *,

array_agg(x) OVER (ORDER BY x ROWS BETWEEN

1 PRECEDING AND 1 FOLLOWING),

array_agg(x) OVER (ORDER BY x ROWS BETWEEN

1 PRECEDING AND 1 FOLLOWING EXCLUDE CURRENT ROW)

FROM generate_series(1, 5) AS x;

x | array_agg | array_agg

---+-----------+-----------

1 | {1,2} | {2}

2 | {1,2,3} | {1,3}

3 | {2,3,4} | {2,4}

4 | {3,4,5} | {3,5}

5 | {4,5} | {4}

(5 rows)

Excluding the current row is a pretty common requirement and should therefore not be underestimated.

Introducing just-in-time compilation

Just-in-time (JIT) compilation is really one of the highlights of PostgreSQL 11. A lot of infrastructure has been added to support even more JIT compilation in the future, and PostgreSQL 11 is the first release that makes full use of this modern technique. Before we dig into details, what is JIT compilation all about? When running a query, a lot of stuff is actually only known at runtime and not at compile time (when PostgreSQL is compiled). Therefore, a traditional compiler is always at a disadvantage because it does not know what will happen at runtime. A JIT compiler already knows a lot more and can react accordingly.

Starting with PostgreSQL 11, you can make use of JIT compilation, which is especially useful for big queries. We will dig into the finer details in the later chapters of this book.

Enhanced partitioning

PostgreSQL 10 introduced the first version of partitioning in PostgreSQL. Of course, we used to have inheritance and all that before. However, PostgreSQL 10 was really the first version that provided a modern version of doing things. PostgreSQL 11 will add some new functionality to this already powerful feature by introducing a couple of new highlights, such as the ability to create a default partition if none of the existing partitions match.

Here is how it works:

postgres=# CREATE TABLE default_part PARTITION OF some_table DEFAULT;

CREATE TABLE

In this case, all the rows that simply don't match anywhere will end up in the default partition.

But there is more. In PostgreSQL, a row could not (easily) be moved from one partition to the other. Suppose you had one partition per country. If a person moved from, say, France to Estonia, you would not do that with a single UPDATE statement. You had to delete the old row and insert a new one. In PostgreSQL 11, this problem has been solved. Rows can now be moved from one partition to some other place in a totally transparent way.

PostgreSQL had many more shortcomings. In the old version, all partitions had to be indexed separately. There was no way to create a single index for all partitions. In PostgreSQL 11, an index added to the parent table will automatically make sure that all child tables are indexed too. This is really beneficial as it becomes less likely that indexes will simply be forgotten. Also, in PostgreSQL 11, you can actually add a global unique index. A partitioned table can therefore enforce a unique constraint.

Up to PostgreSQL 10, we had range partitioning and list partitioning. PostgreSQL 11 adds the ability to do hash partitioning. Here is an example showing how hash partitioning works:

test=# CREATE TABLE tab(i int, t text) PARTITION BY HASH (i);

CREATE TABLE

test=# CREATE table tab_1 PARTITION OF tab

FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE

But there is not just more functionality. There is also a lot of new stuff to improve performance. Partition pruning is now a lot faster and PostgreSQL has the ability to consider partition-wise joins, as well as partition-wise aggregates, which is exactly what is needed for analytics and data warehousing.

Improving ALTER TABLE

The ALTER TABLE command can be used to change the definition of a table. In PostgreSQL 11, the behavior of ALTER TABLE ... ADD COLUMN has been improved substantially. Let us take a look at the details. The following examples shows how columns can be added to a table and how PostgreSQL will handle those new columns:

ALTER TABLE x ADD COLUMN y int;

ALTER TABLE x ADD COLUMN z int DEFAULT 57;

The first command in the listing has always been fast, the reason being that in PostgreSQL, the default value of a column is NULL. So what PostgreSQL does is it adds a column to the system catalog without actually touching storage. The column will be added to the end of the table so if the row is too short on disk, we know that it will be NULL anyway. In other words, even if you add a column to a 10 TB table, the operation will be really fast because the system does not have to change rows on disk.

The situation used to be quite different in the second case. DEFAULT 57 actually does add real data to the row and in PostgreSQL 10 and older, this meant that the database had to rewrite the entire table to add this new default value. If you have a small table, it is not a big deal. However, if your table contains billions of rows, you cannot just lock it up and rewrite it—in a professional online transaction processing (OLTP) system, downtime is out of the question.

Starting with PostgreSQL 11, it is possible to add immutable default values to a table without rewriting the entire table, which greatly reduces the burden of a changing data structure.

Summary

In PostgreSQL 11, a lot of functionalities have been added that allow people to run even more professional applications even faster and more efficiently. Many areas of the database server have been improved and many new professional features have been added. In the future, even more improvements will be made. Of course, the changes listed in this chapter are by far not complete because many small changes were made. In the next chapter, you will learn about indexing and the PostgreSQL cost model, which is highly important if you want to maintain good performance.

Questions

What is the most important feature of PostgreSQL 11?

Actually, this is hard to tell. It really depends on how you are using the database and which features are most important to your application. Everybody has a personal favorite, however. In my case, it is the parallel index creation, which will be really important to customers running giant databases. Still, it is up to you to decide what you like most and what you don't like.

Does PostgreSQL 11 work on my platform?

PostgreSQL 11 works on all common platforms, including but not limited to Linux, Windows, Solaris, AIX, and macOS X. The community tries to cover as many platforms as possible so as not to exclude people from using PostgreSQL. For most common systems, PostgreSQL will even be prepackaged.

Did the license model change?

No, nothing has changed and most likely nothing ever will.

When can we expect PostgreSQL 12?

Usually, a major release can be expected once a year. So the next major release of PostgreSQL 12 will be in the fall of 2019.

Understanding Transactions and Locking

After the first introduction to PostgreSQL 11, we want to focus our attention on the next important topic. Locking is a vital concept for any kind of database. It is not enough to understand just how it works to write proper or better applications, it is also essential from a performance point of view. Without handling locks properly, your applications might not only be slow, they might also be wrong and behave in very unexpected ways. In my opinion, locking is the key to performance and having a good overview will certainly help. Therefore, understanding locking and transactions is important for administrators and developers alike. In this chapter, you will learn about the following topics:

Working with PostgreSQL transactions

Understanding basic locking

Making use of

FOR SHARE

and

FOR UPDATE

Understanding transaction isolation levels

Considering

serializable snapshot isolation

(

SSI

)

transactions

Observing deadlocks and similar issues

Optimizing storage and managing cleanups

At the end of the chapter, you will be able to understand and utilize PostgreSQL transactions in the most efficient way possible.

Working with PostgreSQL transactions

PostgreSQL provides you with a highly advanced transaction machinery that offers countless features to developers and administrators alike. In this section, it is time to look at the basic concept of transactions.

The first important thing to know is that in PostgreSQL, everything is a transaction. If you send a simple query to the server, it is already a transaction. Here is an example:

test=# SELECT now(), now();

now | now -------------------------------+------------------------------ 2018-08-24 16:03:27.174253+02 | 2018-08-24 16:03:27.174253+02(1 row)

In this case, the SELECT statement will be a separate transaction. If the same command is executed again, different timestamps will be returned.

Keep in mind that the now() function will return the transaction time. The SELECT statement will, therefore, always return two identical timestamps. If you want "real time" consider using clock_timestamp() instead of now().

If more than one statement has to be a part of the same transaction, the BEGIN statement must be used, shown as follows:

test=# \h BEGIN

Command: BEGIN

Description: Start a transaction block

Syntax:

BEGIN [ WORK | TRANSACTION ] [ transaction_mode [, ...] ]

where transaction_mode is one of:

ISOLATION LEVEL { SERIALIZABLE | REPEATABLE READ

| READ COMMITTED | READ UNCOMMITTED }

READ WRITE | READ ONLY

[ NOT ] DEFERRABLE

The BEGIN statement will ensure that more than one command will be packed into a transaction. Here is how it works:

test=# BEGIN; BEGIN test=# SELECT now(); now ------------------------------- 2018-08-24 16:04:08.105131+02 (1 row) test=# SELECT now(); now -------------------------------

2018-08-24 16:04:08.105131+02 (1 row) test=# COMMIT; COMMIT

The important point here is that both timestamps will be identical. As mentioned earlier, we are talking about transaction time.

To end the transaction, COMMIT can be used:

test=

# \h COMMIT

Command: COMMIT

Description: Commit the current transaction

Syntax:

COMMIT [ WORK | TRANSACTION ]

There are a couple of syntax elements here. You can just use COMMIT, COMMIT WORK, or COMMIT TRANSACTION. All three options have the same meaning. If this is not enough, there is more:

test=# \h END Command: END Description: commit the current transaction Syntax: END [ WORK | TRANSACTION ]

The END clause is the same as the COMMIT clause.

ROLLBACK is the counterpart of COMMIT. Instead of successfully ending a transaction, it will simply stop the transaction without ever making things visible to other transactions:

test=# \h ROLLBACK

Command: ROLLBACK

Description: Abort the current transaction

Syntax:

ROLLBACK [ WORK | TRANSACTION ]

Some applications use ABORT instead of ROLLBACK. The meaning is the same.

Handling errors inside a transaction

It is not always the case that transactions are correct from beginning to end. Things might just go wrong for whatever reason. However, in PostgreSQL, only error-free transactions can be committed. The following listing shows a failing transaction, which errors out due to a division by zero error:

test=# BEGIN; BEGINtest=# SELECT 1; ?column?

----------

(1 row)

test=# SELECT 1 / 0; ERROR: division by zero test=# SELECT 1; ERROR: current transaction is aborted, commands ignored until end of transaction block test=# COMMIT; ROLLBACK

Note that division by zero did not work out.

In any proper database, an instruction similar to this will instantly error out and make the statement fail.

It is important to point out that PostgreSQL will error out, unlike MySQL, which is far less strict. After an error has occurred, no more instructions will be accepted, even if those instructions are semantically and syntactically correct. It is still possible to issue a COMMIT. However, PostgreSQL will roll back the transaction because it is the only correct thing to be done at this point.

Making use of SAVEPOINT

In professional applications, it can be pretty hard to write reasonably long transactions without ever encountering a single error. To solve the problem, users can utilize something called SAVEPOINT. As the name indicates, a savepoint is a safe place inside a transaction that the application can return to in the event things go terribly wrong. Here is an example:

test=# BEGIN; BEGIN test=# SELECT 1; ?column? ---------- 1 (1 row) test=# SAVEPOINT a; SAVEPOINT test=# SELECT 2 / 0; ERROR: division by zerotest=# SELECT 2; ERROR: current transaction is aborted, commands ignored until end of transaction block test=# ROLLBACK TO SAVEPOINT a; ROLLBACK test=# SELECT 3; ?column? ---------- 3

(1 row) test=# COMMIT; COMMIT

After the first SELECT clause, I decided to create a savepoint to make sure that the application can always return to this point inside the transaction. As you can see, the savepoint has a name, which is referred to later.

After returning to the savepoint called a, the transaction can proceed normally. The code has jumped back before the error, so everything is fine.

The number of savepoints inside a transaction is practically unlimited. We have seen customers with over 250,000 savepoints in a single operation. PostgreSQL can easily handle this.

If you want to remove a savepoint from inside a transaction, there is a RELEASE SAVEPOINT command:

test=# \h RELEASE SAVEPOINTCommand: RELEASE SAVEPOINT

Description: Destroy a previously defined

SAVEPOINT

Syntax:

RELEASE [ SAVEPOINT ] savepoint_name

Many people ask what will happen if you try to reach a savepoint after a transaction has ended? The answer is, the life of a savepoint ends as soon as the transaction ends. In other words, there is no way to return to a certain point in time after the transactions have been completed.

Transactional DDLs

PostgreSQL has a very nice feature that is unfortunately not present in many commercial database systems. In PostgreSQL, it is possible to run DDLs (commands that change the data structure) inside a transaction block. In a typical commercial system, a DDL will implicitly commit the current transaction. This does not occur in PostgreSQL.

Apart from some minor exceptions (DROP DATABASE, CREATE TABLESPACE, DROP TABLESPACE, and so on), all DDLs in PostgreSQL are transactional, which is a huge plus and a real benefit to end users.

Here is an example:

test=# \d

No relations found. test=# BEGIN; BEGIN test=# CREATE TABLE t_test (id int); CREATE TABLE test=# ALTER TABLE t_test ALTER COLUMN id TYPE int8; ALTER TABLE test=# \d t_test Table "public.t_test" Column | Type | Modifiers --------+--------+----------- id | bigint | test=# ROLLBACK; ROLLBACK test=# \d

No relations found.

In this example, a table has been created and modified, and the entire transaction is aborted. As you can see, there is no implicit COMMIT or any other strange behavior. PostgreSQL simply acts as expected.

Transactional DDLs are especially important if you want to deploy software. Just imagine running a content management system (CMS). If a new version is released, you'll want to upgrade. Running the old version would still be OK; running the new version is also OK but you really don't want a mixture of old and new. Therefore, deploying an upgrade in a single transaction is definitely highly beneficial as it upgrades an atomic operation.

In order to facilitate good software practices, we can include several separately coded modules from our source control system into a single deployment transaction.

Avoiding typical mistakes and explicit locking

In my life as a professional PostgreSQL consultant (https://www.cybertec-postgresql.com), I have seen a couple of mistakes that are repeated frequently. If there are constants in life, these typical mistakes are definitely some of the things that never change.

Here is my favorite:

Transaction 1

Transaction 2

BEGIN;

SELECT max(id) FROM product;

User will see 17

User will decide to use 18

INSERT INTO product ... VALUES (18, ...)

COMMIT;

In this case, there will be either a duplicate key violation or two identical entries. Neither variation of the problem is all that appealing.

One way to fix the problem is to use explicit table locking. The following listing shows us the syntax definition of LOCK:

test=# \h LOCK Command: LOCK

Description: lock a table

Syntax:

LOCK [ TABLE ] [ ONLY ] name [ * ] [, ...] [ IN lockmode MODE ] [ NOWAIT ] where

lockmode

is one of the following:

As you can see, PostgreSQL offers eight types of locks to lock an entire table. In PostgreSQL, a lock can be as light as an ACCESS SHARE lock or as heavy as an ACCESS EXCLUSIVE lock. The following list shows what these locks do:

ACCESS SHARE

: This type of lock is taken by reads and conflicts only with

ACCESS EXCLUSIVE

, which is set by

DROP TABLE

and the like. Practically, this means that

SELECT

cannot start if a table is about to be dropped. This also implies that

DROP TABLE

has to wait until a reading transaction is complete.

ROW SHARE

: PostgreSQL takes this kind of lock in the case of

SELECT FOR UPDATE

SELECT FOR SHARE

. It conflicts with

EXCLUSIVE

and

ACCESS EXCLUSIVE

ROW EXCLUSIVE

: This lock is taken by

INSERT

UPDATE

, and

DELETE

. It conflicts with

SHARE ROW EXCLUSIVE

EXCLUSIVE

, and

ACCESS EXCLUSIVE

SHARE UPDATE EXCLUSIVE

: This kind of lock is taken by

CREATE INDEX CONCURRENTLY

ANALYZE

ALTER TABLE

VALIDATE

, and some other flavors of

ALTER TABLE

as well as by

VACUUM

(not

VACUUM FULL

). It conflicts with the

SHARE UPDATE EXCLUSIVE

SHARE ROW EXCLUSIVE

EXCLUSIVE

, and

ACCESS EXCLUSIVE

lock modes.

: When an index is created,

locks will be set. It conflicts with

ROW EXCLUSIVE

SHARE UPDATE EXCLUSIVE

SHARE ROW EXCLUSIVE

EXCLUSIVE

, and

ACCESS EXCLUSIVE

SHARE ROW EXCLUSIVE

: This one is set by

CREATE TRIGGER

and some forms of

ALTER TABLE

and conflicts with everything but

ACCESS SHARE

EXCLUSIVE

: This type of lock is by far the most restrictive one. It protects against reads and writes alike. If this lock is taken by a transaction, nobody else can read or write to the table affected.

ACCESS EXCLUSIVE

: This lock prevents concurrent transactions from reading and writing.

Given the PostgreSQL locking infrastructure, one solution to the max problem outlined previously would be as follows. The example in the listing shows how to lock a table:

BEGIN; LOCK TABLE product IN ACCESS EXCLUSIVE MODE; INSERT INTO product SELECT max(id) + 1, ... FROM product; COMMIT;

Keep in mind that this is a pretty nasty way of doing this kind of operation because nobody else can read or write to the table during your operation. Therefore, ACCESS EXCLUSIVE should be avoided at all costs.