E-Book
44,39 €

PostgreSQL High Performance Cookbook E-Book

Chitij Chauhan

0,0

44,39 €

Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.

Herausgeber: Packt Publishing
Kategorie: Fachliteratur
Sprache: Englisch

Beschreibung

PostgreSQL is one of the most powerful and easy to use database management systems. It has strong support from the community and is being actively developed with a new release every year. PostgreSQL supports the most advanced features included in SQL standards. It also provides NoSQL capabilities and very rich data types and extensions. All of this makes PostgreSQL a very attractive solution in software systems.
If you run a database, you want it to perform well and you want to be able to secure it. As the world’s most advanced open source database, PostgreSQL has unique built-in ways to achieve these goals. This book will show you a multitude of ways to enhance your database’s performance and give you insights into measuring and optimizing a PostgreSQL database to achieve better performance.
This book is your one-stop guide to elevate your PostgreSQL knowledge to the next level. First, you’ll get familiarized with essential developer/administrator concepts such as load balancing, connection pooling, and distributing connections to multiple nodes. Next, you will explore memory optimization techniques before exploring the security controls offered by PostgreSQL. Then, you will move on to the essential database/server monitoring and replication strategies with PostgreSQL. Finally, you will learn about query processing algorithms.

Details

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

MOBI

Seitenzahl: 418

Veröffentlichungsjahr: 2017

Bewertungen

0,0

Rezensionen(0 Rezensionen)

Ähnliche

Der Weg zum erfolgreichen Unternehmer

Stefan Merath

Der Weg zum erfolgreichen Unternehmer

Stefan Merath

Denke (nach) und werde reich

Napoleon Hill

30 Minuten Resilienz

Ulrich Siegrist

Krebszellen mögen keine Himbeeren - Der große Bestseller - Vollständig überarbeitet und aktualisiert

Richard Béliveau

Die Hormonrevolution

Michael E Platt

Der Crash ist die Lösung

Matthias Weik

Günter, der innere Schweinehund, lernt verkaufen

Stefan Frädrich

Mission erfüllt

Owen Mark

Die Leber wächst mit ihren Aufgaben

Dr. med. Eckart von Hirschhausen

Macht, was ihr liebt!

Anja Förster

Der größte Raubzug der Geschichte

Matthias Weik

Unsere Hunde - gesund durch Homöopathie

Hans Günter Wolff

Die Jahrhundertlüge, die nur Insider kennen

Heiko Schrang

Organisation für Komplexität

Niels Pfläging

Radikal führen

Reinhard K. Sprenger

30 Minuten Sympathisch und souverän: So geht Vortragen!

Thomas Lorenz

BLACKOUT - Morgen ist es zu spät

Marc Elsberg

The Truth About Employee Engagement

PostgreSQL High Performance Cookbook

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it...

How it works...

There's more...

PostgreSQL High Performance Cookbook

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: March 2017

Production reference: 1240317

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham

B3 2PB, UK.

ISBN 978-1-78528-433-5

www.packtpub.com

Credits

Authors

Chitij Chauhan

Dinesh Kumar

Copy Editor

Safis Editing

Reviewers

Baji Shaik

Feng Tan

Project Coordinator

Nidhi Joshi

Commissioning Editor

Dipika Gaonkar

Proofreader

Safis Editing

Acquisition Editor

Nitin Dasan

Indexer

Aishwarya Gangawane

Content Development Editor

Mayur Pawanikar

Production Coordinator

Shraddha Falebhai

Technical Editor

Dinesh Pawar

About the Authors

Chitij Chauhan currently works as a senior database administrator at an IT-based MNC in Chandigarh. He has over 10 years of work experience in the field of database and system administration, with specialization in MySQL clustering, PostgreSQL, Greenplum, Informix DB2, SQL Server 2008, Sybase, and Oracle. He is a leading expert in the area of database security, with expertise in database security products such as IBM InfoSphere Guardium, Oracle Database Vault, and Imperva.

Dinesh Kumar is an enthusiastic open source developer and has written several open source tools for PostgreSQL. He recently announced pgBucket, a brand new job scheduler for PostgreSQL. He is also a frequent blogger at manojadinesh.blogpsot.com, where he talks more about PostgreSQL. He is currently working as a senior database engineer in OpenSCG and building the PostgreSQL cloud operations. He has more than 6 years of experience as an Oracle and PostgreSQL database administrator and developer, and is currently focusing on PostgreSQL.

Thanks to my loving parents, Sreenivasulu and Vanamma, who raised me in a small village called Viruvuru, which is in the Nellore district of India. Thanks to my loving wife, Manoja, who enlightens my life with her wonderful support. Also, thanks to my friend Baji Shaik and coordinators, Mayur Pawanikar and Nitin Dasan, for their excellent support. Finally, thanks to every PostgreSQL contributor, author, and blogger.

About the Reviewers

Baji Shaik is a database administrator and developer. He is a co-author of PostgreSQL Development Essentials and has tech-reviewed Troubleshooting PostgreSQL by Packt Publishing. He is currently working as a database consultant at OpenSCG. He has an engineering degree in telecommunications, and had started his career as a C# and Java developer. Baji started working with databases in 2011, and over the years he has worked with Oracle, PostgreSQL, and Greenplum. His background spans a length and breadth of expertise and experience in SQL/NoSQL database technologies. He has good knowledge of automation, orchestration, and DevOps in a cloud environment. He likes to watch movies, read books, and write technical blogs. He also loves to spend time with family. Baji is a certified PostgreSQL professional.

Feng Tan is from China. His nickname is Francs. He was a PostgreSQL DBA at SkyMobi (NASDAQ: MOBI) for more than 5 years, where he was maintaining more than 100 PostgreSQL instances. He gave presentations at the China PostgreSQL conference on topics such as Oracle VS PostgreSQL and PostgreSQL 9.4 new features.

Feng Tan likes to share PostgreSQL technology in his blog at http://francs3.blog.163.com/. He is also one of the translators of PostgreSQL 9 Administration Cookbook Chinese Edition.

Currently, he serves as the open source database administrator at China Mobile Group Zhejiang Co. Ltd.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by PacktCopy and paste, print, and bookmark contentOn demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1785284339.

If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Preface

PostgreSQL is one of the most powerful and easy-to-use database management systems. It has strong support from the community and is being actively developed with a new release every year. PostgreSQL supports the most advanced features included in SQL standards. It also provides NoSQL capabilities and very rich data types and extensions. All of this makes PostgreSQL a very attractive solution in software systems. If you run a database, you want it to perform well and you want to be able to secure it. As the world's most advanced open source database, PostgreSQL has unique built-in ways to achieve these goals. This book will show you a multitude of ways to enhance your database's performance and give you insights into measuring and optimizing a PostgreSQL database to achieve better performance.

What this book covers

Chapter 1, Database Benchmarking, deals with a few major system component benchmarkings such as CPU, disk, and IOPS, besides database benchmarking. In this chapter, we will discuss a benchmarking frame called phoronix, along with disk RAID levels.

Chapter 2, Server Configuration and Control, deals with how to control the PostgreSQL instance behavior with the help of a few major configuration parameter settings.

Chapter 3, Device Optimization, discusses CPU, disk, and memory-related parameters. Besides this, we will be discussing memory components of PostgreSQL along with how to analyze the buffer cache contents.

Chapter 4, Monitoring Server Performance, discuss, various Unix/Linux related operating system utilities that can help the DBA in performance analysis and troubleshooting issues.

Chapter 5, Connection Pooling and Database Partitioning, covers connection pooling methods such as pgpool and pgbouncer. Also, we will be discussing a few partitioning techniques that PostgreSQL offers.

Chapter 6, High Availability and Replication, is about various high availability and replication solutions, including some popular third-party replication tools such as Slony, Londiste, and Bucardo. Also, we will be discussing how to set up the PostgreSQL XL cluster.

Chapter 7, Working with Third-Party Replication Management Utilities, discusses different third-party replication management tools, such as repmgr. Also, this chapter covers a backup management tool called Barman, along with WAL management tools, such as walctl.

Chapter 8, Database Monitoring and Performance, shows different aspects of how and what to monitor in the PostgreSQL instance to achieve the better performance. Also, in this chapter, we will be discussing a few troubleshooting techniques that will help the DBA team.

Chapter 9, Vacuum Internals, is about MVCC and how to handle PostgreSQL's transaction wraparound issues. Also, we will see how to control the bloat using snapshot threshold settings.

Chapter 10, Data Migration from Other Databases to PostgreSQL and Upgrading PostgreSQL Cluster, covers heterogeneous replication between Oracle and PostgreSQL using Goldengate.

Chapter 11, Query Optimization, discusses the functionality of PostgreSQL query planner. Besides this, we will be discussing several query processing algorithms with examples.

Chapter 12, Database Indexing, covers various index loop techniques PostgreSQL follows. Besides, we will be discussing various index management methods, such as how to find unused or missing indexes from the database.

What you need for this book

In general, a modern Unix-compatible platform should be able to run PostgreSQL. To make the most out of this book, you also require CentOS 7. The minimum hardware required to install and run PostgreSQL is as follows:

1 GHz processor1 GB of RAM512 MB of HDD

Who this book is for

If you are a developer or administrator with limited PostgreSQL knowledge and want to develop your skills with this great open source database, then this book is ideal for you. Learning how to enhance the database performance is always an exciting topic to everyone, and this book will show you enough ways to enhance the database performance.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows.

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it...

This section contains the steps required to follow the recipe.

How it works...

This section usually consists of a detailed explanation of what happened in the previous section.

There's more...

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "In Linux, tmpfs is a temporary filesystem, which uses the RAM rather than the disk storage." Any command-line input or output is written as follows:

$ phoronix-test-suite benchmark pts/memoryPhoronix Test Suite v6.8.0Installed: pts/ramspeed-1.4.0To Install: pts/stream-1.3.1To Install: pts/cachebench-1.0.0

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Select the product pack Fusion Middleware and the Linux x86-64 platform. Click on the Go button."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.Hover the mouse pointer on the SUPPORT tab at the top.Click on Code Downloads & Errata.Enter the name of the book in the Search box.Select the book for which you're looking to download the code files.Choose from the drop-down menu where you purchased this book from.Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for WindowsZipeg / iZip / UnRarX for Mac7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/PostgreSQL-High-Performance-Cookbook. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books-maybe a mistake in the text or the code-we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Chapter 1. Database Benchmarking

In this chapter, we will cover the following recipes:

CPU benchmarkingMemory benchmarkingDisk benchmarkingPerforming a seek rate testWorking with the fysnc commit rateChecking IOPSStorage sizingDiscussing RAID levelsConfiguring pgbenchRunning read/write pgbench tests

Introduction

PostgreSQL is renowned in the database management system world. With every PostgreSQL release, it's gaining in popularity due to its advanced features and performance. This cookbook is especially designed to give more information about most of the major features in PostgreSQL, and also how to achieve good performance with the help of proper hardware/software benchmarking tools. This cookbook is also designed to discuss, all the high availability options we can achieve with PostgreSQL, and also give some details about how to migrate your database from other commercial databases.

To benchmark the database server, we need to benchmark several hardware/software components. In this chapter, we will discuss major tools that are especially designed to benchmark a certain component.

I would like to say thanks to the Phoronix Test Suite team, for allowing me to discuss their benchmarking tool. Phoronix is an open source benchmarking framework, which, by default, provides test cases for several hardware/software components, thanks to its extensible architecture, where we can write our own test suite with the set of benchmarking test cases. Phoronix also supports to upload your benchmarking results to http://openbenchmarking.org/, which is a public/private benchmark results repository, where we can compare our your benchmarking results with others.

Note

Go to the following URL for installation instructions for Phoronix Test Suite: http://www.phoronix-test-suite.com/?k=downloads.

CPU benchmarking

In this recipe, let's discuss how to benchmark the CPU speed using various open source benchmarking tools.

Getting ready

One of the ways to benchmark CPU power is by measuring the wall clock time for the submitted task. The task can be like calculating the factorial of the given number, or calculating the nth Fibonacci number, or some other CPU-intensive task.

How to do it...

Let us discuss about how to configure phoronix and sysbench tools to benchmark the CPU:

Phoronix

Phoronix supports a set of CPU tests in a test suite called CPU. This test suite covers multiple CPU-intensive tasks, which are mentioned at the following URL: https://openbenchmarking.org/suite/pts/CPU.

If you want to run this CPU test suite, then you need to execute the Phoronix Test Suite benchmark CPU command as a root user. We can also run a specific test by mentioning its test name. For example, let's run a sample CPU benchmarking test as follows:

$ phoronix-test-suite benchmark pts/himeno Phoronix Test Suite v6.8.0 To Install: pts/himeno-1.2.0 ... 1 Test To Install pts/himeno-1.2.0: Test Installation 1 of 1 1 File Needed Downloading: himenobmtxpa.tar.bz2 Started Run 2 @ 05:53:40 Started Run 3 @ 05:54:35 [Std. Dev: 1.66%] Test Results: 1503.636072 1512.166077 1550.985494 Average: 1522.26 MFLOPS

Phoronix also provides a way to observe the detailed test results via HTML file. Also, it supports the offline generation of PDF, JSON, CSV, and text format outputs. To open these test results in the browser, we need to execute the following command:

$ phoronix-test-suite show-result <Test Name>

The following is a sample screenshot of the results of the preceding command:

sysbench

The sysbench tool provides a CPU task, which calculates the number of prime numbers within a given range and provides the CPU-elapsed time. Let's execute the sysbench command as shown in the following screenshot, to retrieve the CPU measurements:

[root@localhost ~]# sysbench --test=CPU --CPU-max-prime=10000 --num-threads=4 runDoing CPU performance benchmarkThreads started!Done.Maximum prime number checked in CPU test: 10000Test execution summary: total time: 3.2531s total number of events: 10000 total time taken by event execution: 13.0040 per-request statistics: min: 1.10ms avg: 1.30ms max: 8.60ms approx. 95 percentile: 1.43msThreads fairness: events (avg/stddev): 2500.0000/8.46 execution time (avg/stddev): 3.2510/0.00

How it works...

The preceding results are collected from CentOS 7, which was running virtually on a Windows 10 machine. The virtual machine has four processing units (CPU cores) of Intel Core i7-4510U of CPU family six.

Phoronix

The URL http://openbenchmarking.org/ provides a detailed description of each test detail along with its implementation, and would encourage you to read more information about the himeno test case.

sysbench

From the previous results, the system takes 3.2531 seconds to compute the 10,000 prime numbers, with the help of four background threads.

Memory benchmarking

In this recipe, we will be discussing how to benchmark the memory speed using open source tools.

Getting ready

As with the CPU test suite, phoronix supports one another memory test suite, which covers RAM benchmarking. Otherwise, we can also use a dedicated memtest86 benchmarking tool, which performs memory benchmarking during a server bootup phase. Another neat trick would be to create a tmpfs mount point in the RAM and then create a tablespace on it in PostgreSQL. Once we create the tablespace, we can then create in-memory tables, where we can benchmark the table read/write operations. We can also use the dd command to measure the memory read/write operations.

How to do it...

Let us discuss how to install phoronix and how to configure the tmpfs mount point in Linux:

Phoronix

Let's execute the following phoronix command, which will install the memory test suit and perform memory benchmarking. Once the benchmarking is completed, as aforementioned, observe the HTML report:

$ phoronix-test-suite benchmark pts/memoryPhoronix Test Suite v6.8.0 Installed: pts/ramspeed-1.4.0To Install: pts/stream-1.3.1To Install: pts/cachebench-1.0.0

tmpfs

In Linux, tmpfs is a temporary filesystem, which uses the RAM rather than the disk storage. Anything we store in tmpfs will be cleared once we restart the system:

Note

Refer to the URL for more information about tmpfs: https://en.wikipedia.org/wiki/Tmpfs and https://www.jamescoyle.net/knowledge/1659-what-is-tmpfs.

Let's create a new mount point based on tmpfs using the following command:

# mkdir -p /memmount# mount -t tmpfs -o size=1g tmpfs /memmount# df -kh -t tmpfsFilesystem Size Used Avail Use% Mounted ontmpfs 1.9G 96K 1.9G 1% /dev/shmtmpfs 1.9G 8.9M 1.9G 1% /runtmpfs 1.9G 0 1.9G 0% /sys/fs/cgrouptmpfs 1.0G 0 1.0G 0% /memmount

Let's create a new folder in memmount and assign it to the tablespace.

# mkdir -p /memmount/memtabspace# chown -R postgres:postgres /memmount/memtabspace/postgres=# CREATE TABLESPACE memtbs LOCATION '/memmount/memtabspace';CREATE TABLESPACEpostgres=# CREATE TABLE memtable(t INT) TABLESPACE memtbs;CREATE TABLE

Write test

postgres=# INSERT INTO memtable VALUES(generate_series(1, 1000000)); INSERT 0 1000000Time: 1372.763 ms postgres=# SELECT pg_size_pretty(pg_relation_size('memtable'::regclass)); pg_size_pretty---------------- 35 MB(1 row)

From the preceding results, to insert 1 million records it took approximately 1 second with a writing speed of 35 MB per second.

Read test

postgres=# SELECT COUNT(*) FROM memtable;count---------1000000(1 row)Time: 87.333 ms

From the preceding results, to read the 1 million records it took approximately 90 milliseconds with a reading speed of 385 MB per second, which is pretty fast for the local system configuration. The preceding read test was performed after clearing the system cache and by restarting the PostgreQSL instance, which avoids the system buffers.

How it works...

In the preceding tmpfs example, we created an in-memory table, and all the system calls PostgreQSL tries to perform to read/write the data will be directly affecting the memory rather than the disk, which gives a major performance boost. Also, we need to consider to drop these in-memory tablespace, tables after testing, since these objects will physically vanish after system reboot.

Disk benchmarking

In this recipe, we will be discussing how to benchmark the disk speed using open source tools.

Getting ready

The well-known command to perform disk I/O benchmarking is dd. We all use the dd command to measure read/write operations by specifying the required block size, and we also measure the direct I/O by skipping the system write buffers. Similarly, phoronix supports a complete test suite for the disk as CPU and memory that perform different storage-related tests. Another famous disk benchmarking tool is bonnie++, which provides more flexibility in measuring the disk I/O.

How to do it...

Let us discuss how to run the disk benchmarking using phoronix and using bonnie++ testing tools:

Phoronix

To run the complete disk test suite on the system, run the following command:

$ phoronix-test-suite benchmark pts/disk

Phoronix also supports a quick I/O test case, where you can perform an instant disk performance test using the following command test, which is interactive and collects the input, and then runs the test cases:

$ phoronix-test-suite benchmark pts/iozonePhoronix Test Suite v6.8.0 Installed: pts/iozone-1.8.0Disk Test Configuration 1: 4Kb 2: 64Kb 3: 1MB 4: Test All Options Record Size: 1 1: 512MB 2: 2GB 3: 4GB 4: 8GB 5: Test All Options File Size: 1 1: Write Performance 2: Read Performance 3: Test All Options Disk Test: 3

bonnie++

bonnie++ is a filesystem and disk-level benchmarking tool and can perform the same test multiple times. You can install this tool using either yum or apt-get install or installing it via the source code. Let's run the bulk I/O test case using the following arguments, where it tries to create 8 GB files:

$ /usr/local/sbin/bonnie++ -D -d /tmp/ -s 8G -bWriting with putc()...doneWriting intelligently...done...localhost.localdomain,8G,68996,106,14151,53,46772,15,95343,93,123633,16,201.0,7,16,795,58,+++++,+++,733,46,757,57,+++++,+++,592,38

How it works...

Let us discuss how the bonnie++ performs the benchmarking, and what are all the tools bonnie++ offers to understand the benchmarking results:

bonnie++

From the preceding test case, we provided the results the bonnie++ as to use only direct I/O using the -D option. Also, we asked to create 8 GB random files in the /tmp/ location to measure the disk speed. As the final output from bonnie++, we will get CSV values, which we need to feed to the bon_csv2html command, which provides some detailed information about the test results, as shown in the following screenshot:

$ echo "localhost.localdomain,8G,68996,106,14151,53,46772,15,95343,93,123633,16,201.0,7,16,795,58,+++++,+++,733,46,757,57,+++++,+++,592,38"|bon_csv2html > ~/Desktop/bonresults.html

bonnie++ performs three different tests for disk benchmarking. They are read, write and then seek speed. We will be discussing the seek rate in the further topics. The bonnie++ do always recommend to have high number in /sec section in the preceding table, and lower % CPU values for better disk performance. Also, ++++ shows that the test was not performed accurately by bonnie++, as the test was incomplete with the provided arguments. To get the complete results, we need to rerun the same test multiple times using the -n option, where bonnie will get enough time/resources to complete the job.

Working with the fsync commit rate

In this recipe, we will be discussing how to benchmark the fsync speed using open source tools.

Getting ready

Fsync is a system call that flushes the data from system buffers into physical files. In PostgreSQL, whenever a CHECKPOINT operation occurs, it internally initiates the fsync, to flush all the modified system buffers into the respective files. The fsync benchmarking defines the transfer ratio of data from memory to the disk.

How to do it...

To perform fsync benchmarking, we can use a dedicated benchmark test called fs-mark from Phoronix. This fs-mark test was built based on a filesystem benchmarking tool called fs_mark, or fio, which supports several fsync test cases. We can run this fs-mark test case using the following command:

$ phoronix-test-suite benchmark fs-mark FS-Mark 3.3: pts/fs-mark-1.0.1Disk Test Configuration1: 1000 Files, 1MB Size 2: 1000 Files, 1MB Size, No Sync/FSync 3: 5000 Files, 1MB Size, 4 Threads 4: 4000 Files, 32 Sub Dirs, 1MB Size 5: Test All Options Test:

Note

The preceding command failed to install while testing on the local machine. Once I installed glibc-static via yum install, then the test went smooth.

How it works...

Phoronix installs all the binaries on the local machine when we start benchmarking the corresponding test. In the preceding command, we are benchmarking the test fs-mark, where it installs the tool at ~/.phoronix-test-suite/installed-tests/pts/fs-mark-1.0.1/fs_mark-3.3. Let's go to the location, and let's see what fsync tests it supports:

./fs_mark -helpUsage: fs_mark -S Sync Method ( 0:No Sync, 1:fsyncBeforeClose, 2:sync/1_fsync, 3:PostReverseFsync, 4:syncPostReverseFsync, 5:PostFsync, 6:syncPostFsync)

I would encourage you to read the readme file, which exists in the same location, for detailed information about the sync methods. Let's run a simple fs_mark benchmarking by choosing one sync method as shown in the following here:

./fs_mark -w 8096 -S 1 -s 102400 -d /tmp/ -L 3 -n 500# ./fs_mark -w 8096 -S 1 -s 102400 -d /tmp/ -L 3 -n 500# Version 3.3, 1 thread(s) starting at Fri Dec 30 04:26:28 2016# Sync method: INBAND FSYNC: fsync() per file in write loop.# Directories: no subdirectories used# File names: 40 bytes long, (16 initial bytes of time stamp with 24 random bytes at end of name)# Files info: size 102400 bytes, written with an IO size of 8096 bytes per write# App overhead is time in microseconds spent in the test not doing file writing related system calls.FSUse% Count Size Files/sec App Overhead 39 500 102400 156.4 1790339 1000 102400 78.9 2290639 1500 102400 116.2 24269

We ran the preceding test with write files of size 102,400 and block size of 8,096. The number of files it needs to create is 500 and it needs to repeat the test three times by choosing sync method 1, which closes the file after writing the content to disk.

Storage sizing

In this recipe, we will be discussing how to estimate disk growth using the pgbench tool.

Getting ready

One of the best practices to predict the database disk storage capacity is by loading a set of sample data into the application's database, and simulating production kind of actions using pgbench over a long period. For a period of time (every 1 hour), let's collect the database size using pg_database_size() or any native command, which returns the disk usage information. Once we get the periodic intervals for at least 24 hours, then we can find an average disk growth ratio by calculating the average of delta among each interval value.

How to do it...

Prepare the SQL script as follows, which simulates the live application behavior in the database:

Create connection; --- Create/Use pool connection.INSERT operation --- Initial write operation.SELECT pg_sleep(0.01); --- Some application code runs here, and waiting for the next query.UPDATE operation --- Update other tables for the newly inserted records.SELECT pg_sleep(0.1); --- Updating other services which shows the live graphs on the updated records.DELETE operation --- Delete or purge any unnecessary data.SELECT pg_sleep(0.01); --- Some application code overhead.

Let's run the following pgbench test case, with the preceding test file for 24 hours:

$ pgbench -T 86400 -f <script location> -c <number of concurrent connections>

In parallel, let's schedule a job that collects the database size every hour using the pg_database_size() function, also schedule another job to run for every 10 minutes, which run the VACUUM on the database. This VACUUM job takes care of reclaiming the dead tuples logically at database level. However, in production servers, we will not deploy the VACUUM job to run for every 10 minutes, as the autovacuum process takes care of the dead tuples. As this test is not for database performance benchmarking, we can also make autovacuum more aggressive on the database side as well.

How it works...

Once we find the average disk growth per day, we can predict the database growth for the next 1 or 2 years. However, the database write rate also increases with the business growth. So, we need to deploy the database growth script or we need to analyze any disk storage trends from the monitoring tool to make a better prediction of the storage size.

Discussing RAID levels

In this recipe, we will be discussing about various RAID levels and their unique usage.

Getting ready

In this recipe, we will be discussing several RAID levels, which we configure for database requirements. RAID (Redundant Array of Interdependent Disks) has a dedicated hardware controller to deal with multiple disks, including a separate processor along with a battery backup cache, where data can be flushed to disk properly when a power failure occurs.

How to do it...

RAID levels can be differentiated as per their configurations. RAID supports configuration techniques such as striping, mirroring, and parity to improve the disk storage performance, or high availability. The most popular RAID levels are zero to six, and each level provides its own kind of disk storage capacity, read/write performance and high availability. The common RAID levels we configure for DBMS are 0, 1, 5, 6, or 10 (1 and 0).

How it works...

Let us discuss about how the mostly used RAID level works:

RAID 0

This configuration only focuses on read/write performance by striping the data across multiple devices. With this configuration, we can allocate the complete disk storage for the applications data. The major drawback in this configuration is no high availability. In the case of any single disk failure, it will cause the remaining disks to be useless as they are missing the chunks from the failed disk. This is a not recommended RAID configuration for real-time database systems, but it is a recommended configuration for storing non-critical business data such as historical application logs, database logs, and so on.

RAID 1

This configuration is only to focus on high availability rather than on performance, by broadcasting the data among two disk drives. That is, a single copy of the data will be kept on two disks. If one disk is corrupted, then we can still use the other one for read/write operations. This is also not a recommended configuration for real-time database systems, as it is lacking the write performance. Also, in this configuration, we will be utilizing 50% of the disk to store the actual data, and the rest to keep its duplicated information for high availability. This is a recommended configuration where the durability of data matters when compared with write performance.

RAID 5

This configuration provides more storage and high availability on the disk, by storing the parity blocks across the disks. Unlike RAID 1, it offers more disk space to keep the actual data, as parity blocks are spread among the disks. In any case, if one disk is corrupted, then we can use the parity blocks from the other disk, to fetch the missing data. However, this is also not a recommended configuration, since every read/write operation on the disk needs to process the parity blocks, to get the actual data out of it.

RAID 6

This configuration provides more redundancy than RAID 5 by storing the two parity blocks information for each write operation. That is, if both disks become corrupted, RAID 6 can still get the data from the parity blocks, unlike RAID 5. This configuration is also not recommended for the database systems, as write performance is less as compared than previous RAID levels.

RAID 10

This configuration is the combination of RAID levels 0 and 1. That is, the data will be striped to multiple disks and will be replicated to another disk storage. It is the most recommended RAID level for real-time business applications, where we achieve a better performance than with RAID 1, and higher availability than RAID 0.

Note

For more information about RAID levels, refer to the following URLs:

http://www.slashroot.in/raid-levels-raid0-raid1-raid10-raid5-raid6-complete-tutorialhttps://en.wikipedia.org/wiki/Standard_RAID_levels

Configuring pgbench

In this recipe, we will be discussing how to configure the pgbench to perform various test cases.

Getting ready

Tausende von E-Books und Hörbücher

Ihre Zahl wächst ständig und Sie haben eine Fixpreisgarantie.

Sie haben über uns geschrieben:

PostgreSQL High Performance Cookbook E-Book

Chitij Chauhan

Table of Contents

PostgreSQL High Performance Cookbook

PostgreSQL High Performance Cookbook

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Why subscribe?

Customer Feedback

Preface

What this book covers

What you need for this book

Who this book is for

Sections

Getting ready

How to do it...

How it works...

There's more...

See also

Conventions

Note

Tip

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

Chapter 1. Database Benchmarking

Introduction

Note

CPU benchmarking

Getting ready

How to do it...

Phoronix

sysbench

How it works...

Phoronix

sysbench

Memory benchmarking

Getting ready

How to do it...

Phoronix

tmpfs

Note

Write test

Read test

How it works...

Disk benchmarking

Getting ready

How to do it...

Phoronix

bonnie++

How it works...

bonnie++

Working with the fsync commit rate

Getting ready

How to do it...

Note

How it works...

Storage sizing

Getting ready

How to do it...

How it works...

Discussing RAID levels

Getting ready

How to do it...

How it works...

RAID 0

RAID 1

RAID 5

RAID 6

RAID 10

Note

Configuring pgbench

Getting ready