This book is ideal for PostgreSQL administrators who want to set up and understand replication. By the end of the book, you will be able to make your databases more robust and secure by getting to grips with PostgreSQL replication.
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2013
Second edition: July 2015
Production reference: 1240715
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78355-060-9
www.packtpub.com
Author
Hans-Jürgen Schönig
Reviewers
Swathi Kurunji
Jeff Lawson
Maurício Linhares
Shaun M. Thomas
Tomas Vondra
Commissioning Editor
Kartikey Pandey
Acquisition Editor
Larissa Pinto
Content Development Editor
Nikhil Potdukhe
Technical Editor
Manali Gonsalves
Copy Editors
Dipti Mankame
Vikrant Phadke
Project Coordinator
Vijay Kushlani
Proofreader
Safis Editing
Indexer
Priya Sane
Graphics
Sheetal Aute
Production Coordinator
Komal Ramchandani
Cover Work
Komal Ramchandani
Hans-Jürgen Schönig has 15 years of experience with PostgreSQL. He is the CEO of a PostgreSQL consulting and support company called Cybertec Schönig & Schönig GmbH (www.postgresql-support.de). It has successfully served countless customers around the globe.
Before founding Cybertec Schönig & Schönig GmbH in 2000, he worked as a database developer at a private research company focusing on the Austrian labor market, where he primarily focused on data mining and forecast models. He has also written several books about PostgreSQL.
This book is dedicated to all the members of the Cybertec family, who have supported me over the years and have proven to be true professionals. Without my fellow technicians here at Cybertec, this book would not have existed. I especially want to thank Ants Aasma for his technical input and Florian Ziegler for helping out with the proofreading and graphical stuff.
Special thanks also go to my girl, Sonja Städtner, who has given me all the personal support. Somehow, she managed to make me go to sleep when I was up late at night working on the initial drafts.
Swathi Kurunji is a software engineer at Actian Corporation. She recently completed her PhD in computer science from the University of Massachusetts Lowell (UMass Lowell), USA. She has a keen interest in database systems. Her PhD research involved query optimization, big data analysis, data warehousing, and cloud computing. Swathi has shown excellence in her field of study through research publications at international conferences and in journals. She has received awards and scholarships from UMass Lowell for research and academics.
Swathi also has a Master of Science degree in computer science from UMass Lowell and a Bachelor of Engineering degree in information science from KVGCE in India. During her studies at UMass Lowell, she worked as a teaching assistant, helping professors in teaching classes and labs, designing projects, and grading exams.
She has worked as a software development intern with IT companies such as EMC and SAP. At EMC, she gained experience on Apache Cassandra data modeling and performance analysis. At SAP, she gained experience on the infrastructure/cluster management components of the Sybase IQ product. She has also worked with Wipro Technologies in India as a project engineer, managing application servers.
She has extensive experience with database systems such as Apache Cassandra, Sybase IQ, Oracle, MySQL, and MS Access. Her interests include software design and development, big data analysis, optimization of databases, and cloud computing. Her LinkedIn profile is http://www.linkedin.com/pub/swathi-kurunji/49/578/30a/.
Swathi has previously reviewed two books, Cassandra Data Modeling and Analysis and Mastering Apache Cassandra, both by Packt Publishing.
I would like to thank my husband and my family for all their support.
Jeff Lawson has been a user and fan of PostgreSQL since he noticed it in 2001. Over the years, he has also developed and deployed applications for IBM DB2, Oracle, MySQL, Microsoft SQL Server, Sybase, and others, but he has always preferred PostgreSQL because of its balance of features and openness. Much of his experience has spanned development for Internet-facing websites and projects that required highly scalable databases with high availability or provisions for disaster recovery.
Jeff currently works as the director of software development at FlightAware, which is an airplane tracking website that uses PostgreSQL and other pieces of open source software to store and analyze the positions of thousands of flights that fly worldwide every day. He has extensive experience in software architecture, data security, and networking protocol design because of his roles as a software engineer at Univa/United Devices, Microsoft, NASA's Jet Propulsion Laboratory, and WolfeTech. He was a founder of distributed.net, which pioneered distributed computing in the 1990s, and continues to serve as the chief of operations and a member of the board. He earned a BSc in computer science from Harvey Mudd College.
Jeff is fond of cattle, holds an FAA private pilot certificate, and owns an airplane in Houston, Texas.
Maurício Linhares is a technical leader of the parsing and machine learning team at The Neat Company. At Neat, he helps his team scale their solutions on the cloud and deliver fast results to customers. He is the creator and maintainer of postgresql-async, a Scala-based asynchronous PostgreSQL database driver (https://github.com/mauricio/postgresql-async), and has been a PostgreSQL user and proponent for many years.
Shaun M. Thomas has been working with PostgreSQL since late 2000. He has presented at the Postgres Open conferences in 2011, 2012, and 2014 on topics such as handling extreme throughput, high availability, server redundancy, failover techniques, and system monitoring. With the recent publication of Packt Publishing's PostgreSQL 9 High Availability Cookbook, he hopes to make life easier for DBAs using PostgreSQL in enterprise environments.
Currently, Shaun serves as the database architect at Peak6, an options trading firm with a PostgreSQL constellation of over 100 instances, one of which is over 15 TB in size.
He wants to prove that PostgreSQL is more than ready for major installations.
Tomas Vondra has been working with PostgreSQL since 2003, and although he had worked with various other databases—both open-source and proprietary—he instantly fell in love with PostgreSQL and the community around it.
He is currently working as an engineer at 2ndQuadrant, one of the companies that provide support, training, and other services related to PostgreSQL. Previously, he worked as a PostgreSQL specialist for GoodData, a company that operates a BI cloud platform built on PostgreSQL. He has extensive experience with performance troubleshooting, tuning, and benchmarking.
In his free time, he usually writes PostgreSQL extensions or patches, or he hacks something related to PostgreSQL.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
Since the first edition of PostgreSQL Replication, many new technologies have emerged or improved. In the PostgreSQL community, countless people around the globe have been working on important techniques and technologies to make PostgreSQL even more useful and more powerful.
To make sure that readers can enjoy all those new features and powerful tools, I have decided to write a second, improved edition of PostgreSQL Replication. Given the success of the first edition, the hope is to make this one even more useful to administrators and developers around the globe.
All the important new developments have been covered, and most chapters have been reworked to make them easier to understand, more complete, and absolutely up to date.
I hope that all of you can enjoy this book and benefit from it.
This book will guide you through a variety of topics related to PostgreSQL replication. We will present all the important facts in 15 practical and easy-to-read chapters:
Chapter 1, Understanding the Concepts of Replication, guides you through fundamental replication concepts such as synchronous, as well as asynchronous, replication. You will learn about the physical limitations of replication, which options you have, and what kinds of distinctions there are.
Chapter 2, Understanding the PostgreSQL Transaction Log, introduces you to the PostgreSQL internal transaction log machinery and presents concepts essential to many replication techniques.
Chapter 3, Understanding Point-in-time Recovery, is the next logical step and outlines how the PostgreSQL transaction log will help you to utilize Point-in-time Recovery to move your database instance back to a desired point in time.
Chapter 4, Setting Up Asynchronous Replication, describes how to configure asynchronous master-slave replication.
Chapter 5, Setting Up Synchronous Replication, is one step beyond asynchronous replication and offers a way to guarantee zero data loss if a node fails. You will learn about all the aspects of synchronous replication.
Chapter 6, Monitoring Your Setup, covers PostgreSQL monitoring.
Chapter 7, Understanding Linux High Availability, presents a basic introduction to Linux-HA and presents a set of ideas for making your systems more available and more secure. Since the first edition, this chapter has been completely rewritten and made a lot more practical.
Chapter 8, Working with PgBouncer, deals with PgBouncer, which is very often used along with PostgreSQL replication. You will learn how to configure PgBouncer and boost the performance of your PostgreSQL infrastructure.
Chapter 9, Working with pgpool, covers one more tool capable of handling replication and PostgreSQL connection pooling.
Chapter 10, Configuring Slony, contains a practical guide to using Slony and shows how you can use this tool quickly and efficiently to replicate sets of tables.
Chapter 11, Using SkyTools, offers you an alternative to Slony and outlines how you can introduce generic queues to PostgreSQL and utilize Londiste replication to dispatch data in a large infrastructure.
Chapter 12, Working with Postgres-XC, offers an introduction to a synchronous multimaster replication solution capable of partitioning a query across many nodes inside your cluster while still providing you with a consistent view of the data.
Chapter 13, Scaling with PL/Proxy, describes how you can break the chains and scale out infinitely across a large server farm.
Chapter 14, Scaling with BDR, describes the basic concepts and workings of the BDR replication system. It shows how BDR can be configured and how it operates as the basis for a modern PostgreSQL cluster.
Chapter 15, Working with Walbouncer, shows how the transaction log can be partially replicated using the walbouncer tool, which dissects the PostgreSQL XLOG and makes sure that the transaction log stream can be distributed to many nodes in the cluster.
This guide is a must for everybody interested in PostgreSQL replication, explaining the topic in a comprehensive and detailed way. We offer a theoretical background as well as a practical introduction to replication designed to make your daily life a lot easier and definitely more productive.
This book has been written primarily for system administrators and system architects. However, we have also included aspects that can be highly interesting for software developers, especially when it comes to highly critical system designs.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Replication is an important topic, and in order to get started, it is essential to understand some basic concepts and theoretical ideas related to it. In this chapter, you will be introduced to various concepts of replication, and you will learn which kind of replication is most suitable for which kind of practical scenario. By the end of the chapter, you will be able to judge whether a certain concept is feasible under various circumstances or not.
We will cover the following topics in this chapter:

- The CAP theorem and the physical limitations of replication
- Why latency matters
- Synchronous and asynchronous replication
- Single-master and multi-master replication
- Logical and physical replication
The goal of this chapter is to give you a lot of insights into the theoretical concepts. This is truly important in order to judge whether a certain customer requirement is actually technically feasible or not. You will be guided through the fundamental ideas and important concepts related to replication.
You might wonder why theory is being covered so prominently in a book that is supposed to be highly practical. Well, there is a very simple reason for that: the nice-looking marketing papers of some commercial database vendors might leave you with the impression that everything is possible and easy to do, without any serious limitation. This is not the case; there are physical limitations that every software vendor has to cope with. There is simply no way around the laws of nature, and shiny marketing cannot help overcome them. The laws of physics and logic are the same for everybody, regardless of the power of someone's marketing department.
In this section, you will learn about the so-called CAP theorem. Understanding the basic ideas behind this theorem is essential in order to avoid requirements that cannot be turned into reality.
The CAP theorem was first described by Eric Brewer back in the year 2000. It has quickly developed into one of the most fundamental concepts in the database world. Especially with the rise of NoSQL database systems, Brewer's theorem (as the CAP theorem is often called) has become an important cornerstone of every distributed system.
Before we dig into the details, we need to discuss what CAP actually means. CAP is an abbreviation for the following three concepts:

- Consistency: All the nodes in the cluster see the same data at the same time; a query will return the same result regardless of the node answering it
- Availability: Every request receives a response; the system stays usable even while individual nodes fail
- Partition tolerance: The system continues to operate even if the network breaks down and the cluster falls apart into isolated groups of nodes
Why are these three concepts relevant to normal users? Well, the bad news is that a replicated (or distributed) system can provide only two out of these three features at the same time.
Keep in mind that only two out of the three promises can be fulfilled.
It is theoretically impossible to offer consistency, availability, and partition tolerance at the same time. As you will see later in this book, this can have a significant impact on system layouts that are safe and feasible to use. There is simply no such thing as the solution to all replication-related problems. When you are planning a large-scale system, you might have to come up with different concepts, depending on needs that are specific to your requirements.
PostgreSQL, Oracle, DB2, and so on will provide you with CAp ("consistent" and "available"), while NoSQL systems, such as MongoDB and Cassandra, will provide you with cAP ("available" and "partition tolerant"). This is why NoSQL is often referred to as eventually consistent.
Consider a financial application. You really want to be consistent and partition tolerant. Keeping balances in sync is the highest priority, so it may be acceptable to sacrifice availability in the case of trouble.
Or consider an application collecting a log of weather data from some remote locations. If the data is a couple of minutes late, it is really no problem. In this case, you might want to go for cAP. Availability and partition tolerance might really be the most important things in this case.
Depending on the use case, people have to decide what is really important and which attributes (consistency, availability, or partition tolerance) are crucial and which can be neglected.
Keep in mind that there is no system that can fulfill all of those wishes at the same time (neither open source nor commercial software).
The speed of light is not just a theoretical issue; it really does have an impact on your daily life. And more importantly, it has a serious implication when it comes to finding the right solution for your cluster.
We all know that there is some sort of cosmic speed limit called the speed of light. So why care? Well, let's do a simple mental experiment. Let's assume for a second that our database server is running at 3 GHz clock speed.
How far can light travel within one clock cycle of your CPU? If you do the math, you will figure out that light travels around 10 cm per clock cycle (in pure vacuum). We can safely assume that an electric signal inside a CPU will be very slow compared to pure light in vacuum. The core idea is, "10 cm in one clock cycle? Well, this is not much at all."
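If you want to verify this arithmetic yourself, a database console works just as well as a pocket calculator; a quick sanity check (values rounded):

    SELECT 299792458.0 / 3000000000.0 AS meters_per_cycle;
    -- roughly 0.0999 meters, that is, around 10 cm of travel per clock cycle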
For the sake of our mental experiment, let's now consider various distances:
Considering the size of a CPU core on a die, you can assume that you can send a signal (even if it is not traveling anywhere close to the speed of light) from one part of the CPU to some other part quite fast. It simply won't take 1 million clock cycles to add up two numbers that are already in your first level cache on your CPU.
But what happens if you have to send a signal from one server to some other server and back? You can safely assume that sending a signal from server A to server B next door takes a lot longer because the cable is simply a lot longer. In addition to that, network switches and other network components will add some latency as well.
Let's talk about the length of the cable here, and not about its bandwidth.
Sending a message (or a transaction) from Europe to China is, of course, many times more time-consuming than sending some data to a server next door. Again, the important thing here is that the amount of data is not as relevant as the so-called latency: a ping to a box in your local network usually takes a fraction of a millisecond, while a round trip to a server on the other side of the planet can easily take a couple of hundred milliseconds, no matter how little data is sent.
The basic problems of latency described in this section are not PostgreSQL-specific. The very same concepts and physical limitations apply to all types of databases and systems. As mentioned before, this fact is sometimes silently hidden and neglected in shiny commercial marketing papers. Nevertheless, the laws of physics will stand firm. This applies to both commercial and open source software.
The most important point you have to keep in mind here is that bandwidth is not always the magical fix to a performance problem in a replicated environment. In many setups, latency is at least as important as bandwidth.
Now that you are fully armed with the basic understanding of physical and theoretical limitations, it is time to learn about different types of replication. It is important to have a clear image of these types to make sure that the right choice can be made and the right tool can be chosen. In this section, synchronous as well as asynchronous replication will be covered.
Let's dig into some important concepts now. The first distinction we can make is whether to replicate synchronously or asynchronously.
What does this mean? Let's assume we have two servers and we want to replicate data from one server (the master) to the second server (the slave). The following diagram illustrates the concept of synchronous and asynchronous replication:
We can use a simple transaction like the one shown in the following:
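    -- a minimal placeholder transaction; the table and value are purely illustrative
    BEGIN;
    INSERT INTO foo VALUES ('bar');
    COMMIT;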
In the case of asynchronous replication, the data can be replicated after the transaction has been committed on the master. In other words, the slave is never ahead of the master, and in the case of writing, it is usually a little behind the master. This delay is called lag.
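To get a feeling for what lag means on a real system, the master can be asked how far each slave has progressed. Here is a sketch using the pg_stat_replication system view as it looks in PostgreSQL 9.4 (the column names changed in later releases):

    -- run on the master; returns one row per connected slave
    SELECT client_addr, state, sent_location, replay_location
    FROM pg_stat_replication;
    -- the gap between sent_location and replay_location indicates the lag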
Synchronous replication enforces higher rules of consistency. If you decide to replicate synchronously (how this is done practically will be discussed in Chapter 5, Setting Up Synchronous Replication), the system has to ensure that the data written by the transaction will be at least on two servers at the time the transaction commits. This implies that the slave does not lag behind the master and that the data seen by the end users will be identical on both the servers.
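As a minimal sketch of what this looks like in practice, an existing streaming replica can be made synchronous through two settings on the master (the standby name slave1 is just an assumption here):

    -- commits must be confirmed by the standby listed here
    ALTER SYSTEM SET synchronous_standby_names = 'slave1';
    -- COMMIT waits until the synchronous standby has confirmed the transaction
    ALTER SYSTEM SET synchronous_commit = 'on';
    -- reload the configuration so that the changes take effect
    SELECT pg_reload_conf();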
Some systems will also use a quorum server to decide. So, it is not always about just two or more servers. If a quorum is used, more than half of the servers must agree on an action inside the cluster.
As you have learned earlier in the section about the speed of light and latency, sending unnecessary messages over the network can be expensive and time-consuming. If a transaction is replicated in a synchronous way, PostgreSQL has to make sure that the data reaches the second node, and this will lead to latency issues.
Synchronous replication can be more expensive than asynchronous replication in many ways, and therefore, people should think twice about whether this overhead is really needed and justified. In the case of synchronous replication, confirmations from a remote server are needed. This, of course, causes some additional overhead. A lot has been done in PostgreSQL to reduce this overhead as much as possible. However, it is still there.
Use synchronous replication only when it is really needed.
When a transaction is replicated from a master to a slave, many things have to be taken into consideration, especially when it comes to things such as data loss.
Let's assume that we are replicating data asynchronously in the following manner:

1. A transaction is sent to the master.
2. The master commits the transaction and informs the client of its success.
3. The master crashes before the change has reached the slave.
4. The slave will never receive this transaction.
In the case of asynchronous replication, there is a window (lag) during which data can essentially be lost. The size of this window might vary, depending on the type of setup. Its size can be very short (maybe as short as a couple of milliseconds) or long (minutes, hours, or days). The important fact is that data can be lost. A small lag will only make data loss less likely, but any lag larger than zero is susceptible to data loss. If data can be lost, we are about to sacrifice the consistency part of CAP (if two servers don't have the same data, they are out of sync).
If you want to make sure that data can never be lost, you have to switch to synchronous replication. As you have already seen in this chapter, a synchronous transaction is synchronous because it will be valid only if it commits to at least two servers.
A second way to classify various replication setups is to distinguish between single-master and multi-master replication.
"Single-master" means that writes can go to exactly one server, which distributes the data to the slaves inside the setup. Slaves may receive only reads, and no writes.
In contrast to single-master replication, multi-master replication allows writes to all the servers inside a cluster. The following diagram shows how things work at a conceptual level:
The ability to write to any node inside the cluster sounds like an advantage, but it is not necessarily one. The reason for this is that multi-master replication adds a lot of complexity to the system. In the case of only one master, it is totally clear which data is correct and in which direction data will flow, and there are rarely conflicts during replication. Multi-master replication is quite different, as writes can go to many nodes at the same time, and the cluster has to be perfectly aware of conflicts and handle them gracefully. An alternative would be to use locks to solve the problem, but this approach also has its own challenges.
Keep in mind that the need to resolve conflicts will cause network traffic, and this can instantly turn into scalability issues caused by latency.
One more way of classifying replication is to distinguish between logical and physical replication.
The difference is subtle but highly important:

- Logical replication ships changes at a logical level, for example, as rows or SQL-like change records. The replica ends up with the same content, but not necessarily with the same physical representation of that content on disk.
- Physical replication ships the binary changes made to the data files, block by block. The replica is, on the physical level, a (largely) identical copy of the original system.
Let's look at an example to fully understand the difference:
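    -- a reconstruction of the example; the table and column names are illustrative
    CREATE TABLE t_test (t date);

    BEGIN;
    INSERT INTO t_test VALUES (now());
    COMMIT;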
We see two transactions going on here. The first transaction creates a table. Once this is done, the second transaction adds a simple date to the table and commits.
In the case of logical replication, the change will be sent to some sort of queue in logical form, so the system does not send plain SQL, but maybe something such as this:
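    -- a sketch of the logical form; the literal date is made up for illustration
    INSERT INTO t_test VALUES ('2015-03-02');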
Note that the function call has been replaced with the real value. It would be a total disaster if the slave were to calculate now() once again, because the date on the remote box might be a totally different one.
Some systems do use statement-based replication as their core technology. MySQL, for instance, can replicate via its so-called binlog in statement-based mode, which is actually not very binary at all but rather a form of logical replication. Of course, there are also counterparts in the PostgreSQL world, such as pgpool, Londiste, and Bucardo.
Physical replication will work in a totally different way; instead of sending some SQL (or something else) over, which is logically equivalent to the changes made, the system will send binary changes made by PostgreSQL internally.
Here are some of the binary changes our two transactions might have triggered (this is by far not a complete list):

- Entries added to system catalogs such as pg_class and pg_attribute to register the new table
- New 8K blocks allocated in the data file belonging to the table, with the inserted row placed at some position inside one of those blocks
- Commit records written to the transaction log
The goal of physical replication is to create a copy of your system that is (largely) identical on the physical level. This means that the same data will be in the same place inside your tables on all boxes. In the case of logical replication, the content should be identical, but it makes no difference whether it is in the same place or not.
Physical replication is very convenient to use and especially easy to set up. It is widely used when the goal is to have identical replicas of your system (to have a backup or to simply scale up).
In many setups, physical replication is the standard method that exposes the end user to the lowest complexity possible. It is ideal for scaling out the data.
Logical replication is usually a little harder to set up, but it offers greater flexibility. It is also especially important when it comes to upgrading an existing database. Physical replication is totally unsuitable for version jumps because you cannot simply rely on the fact that every version of PostgreSQL has the same on-disk layout. The storage format might change over time, and therefore, a binary copy is clearly not feasible for a jump from one version to the next.
Logical replication allows decoupling of the way data is stored from the way it is transported and replicated. Using a neutral protocol, which is not bound to any specific version of PostgreSQL, it is easy to jump from one version to the next.
Since PostgreSQL 9.4, there is something called Logical Decoding. It allows users to extract the internal changes written to the XLOG in a logical, human-readable format again. Logical decoding will be needed for a couple of replication techniques outlined in this book.
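To give you a first impression, here is a minimal sketch of logical decoding in action, assuming wal_level has been set to logical and max_replication_slots to at least 1 (the slot name demo_slot is made up; test_decoding is an example plugin shipped with PostgreSQL):

    -- create a logical replication slot using the test_decoding example plugin
    SELECT * FROM pg_create_logical_replication_slot('demo_slot', 'test_decoding');

    INSERT INTO t_test VALUES (now());

    -- fetch the decoded changes: logical rows rather than binary blocks
    SELECT data FROM pg_logical_slot_get_changes('demo_slot', NULL, NULL);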