Mastering Ceph covers all that you need to know to use Ceph effectively. Starting with the design goals and planning steps that should be undertaken to ensure successful deployments, you will be guided through setting up and deploying the Ceph cluster with the help of orchestration tools. Key areas of Ceph, including BlueStore, erasure coding, and cache tiering, are covered with the help of examples. The development of applications that use librados, as well as distributed computation with shared object classes, is also covered. A section on tuning will take you through the process of optimizing both Ceph and its supporting infrastructure. Finally, you will learn to troubleshoot issues and handle the various scenarios in which Ceph is unlikely to recover on its own.
By the end of the book, you will be able to successfully deploy and operate a resilient, high-performance Ceph cluster.
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing and its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: May 2017
Production reference: 1260517
ISBN 978-1-78588-878-6
www.packtpub.com
Author
Nick Fisk
Copy Editor
Dipti Mankame
Reviewer
Vladimir Franciz S. Blando
Project Coordinator
Judie Jose
Commissioning Editor
Pratik Shah
Proofreader
Safis Editing
Acquisition Editor
Meeta Rajani
Indexer
Rekha Nair
Content Development Editor
Juliana Nair
Graphics
Kirk D'Penha
Technical Editor
Gaurav Suri
Production Coordinator
Aparna Bhagat
Nick Fisk is an IT specialist with a strong history in enterprise storage. Having worked in a variety of roles throughout his career, he has encountered a wide variety of technologies. In 2012, Nick was given the opportunity to focus more toward open source technologies, and this is when his first exposure to Ceph happened. Having seen the potential of Ceph as a storage platform and the benefits of moving away from the traditional closed-stack storage platforms, Nick pursued Ceph with a keen interest.
Throughout the following years, his experience with Ceph increased with the deployment of several clusters and enabled him to spend time in the Ceph community, helping others and improving certain areas of Ceph.
Vladimir Franciz S. Blando is a seasoned IT professional with 18 years of experience in Linux systems administration, including architecture, design, installation, configuration, and maintenance, working in both bare-metal and virtualized environments. He is well versed in Amazon Web Services, OpenStack cloud, Ceph storage, and other cloud technologies.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1785888781.
If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Planning for Ceph
What is Ceph?
How Ceph works
Ceph use cases
Replacing your storage array with Ceph
Performance
Reliability
The use of commodity hardware
Specific use cases
OpenStack- or KVM-based virtualization
Large bulk block storage
Object storage
Object storage with custom application
Distributed filesystem - web farm
Distributed filesystem - SMB file server replacement
Infrastructure design
SSDs
Consumer
Prosumer
Enterprise SSDs
Enterprise - read intensive
Enterprise - general usage
Enterprise - write intensive
Memory
CPU
Disks
Networking
10G networking requirement
Network design
OSD node sizes
Failure domains
Price
Power supplies
How to plan a successful Ceph implementation
Understanding your requirements and how they relate to Ceph
Defining goals so that you can gauge if the project is a success
Choosing your hardware
Training yourself and your team to use Ceph
Running a PoC to determine whether Ceph meets the requirements
Following best practices to deploy your cluster
Defining a change management process
Creating a backup and recovery plan
Summary
Deploying Ceph
Preparing your environment with Vagrant and VirtualBox
System requirements
Obtaining and installing VirtualBox
Setting up Vagrant
The ceph-deploy tool
Orchestration
Ansible
Installing Ansible
Creating your inventory file
Variables
Testing
A very simple playbook
Adding the Ceph Ansible modules
Deploying a test cluster with Ansible
Change and configuration management
Summary
BlueStore
What is BlueStore?
Why was it needed?
Ceph's requirements
Filestore limitations
Why is BlueStore the solution?
How BlueStore works
RocksDB
Deferred writes
BlueFS
How to use BlueStore
Upgrading an OSD in your test cluster
Summary
Erasure Coding for Better Storage Efficiency
What is erasure coding?
K+M
How does erasure coding work in Ceph?
Algorithms and profiles
Jerasure
ISA
LRC
SHEC
Where can I use erasure coding?
Creating an erasure-coded pool
Overwrites on erasure code pools with Kraken
Demonstration
Troubleshooting the 2147483647 error
Reproducing the problem
Summary
Developing with Librados
What is librados?
How to use librados
Example librados application
Example of the librados application with atomic operations
Example of the librados application that uses watchers and notifiers
Summary
Distributed Computation with Ceph RADOS Classes
Example applications and the benefits of using RADOS classes
Writing a simple RADOS class in Lua
Writing a RADOS class that simulates distributed computing
Preparing the build environment
RADOS class
Client librados applications
Calculating MD5 on the client
Calculating MD5 on the OSD via RADOS class
Testing
RADOS class caveats
Summary
Monitoring Ceph
Why it is important to monitor Ceph
What should be monitored
Ceph health
Operating system and hardware
SMART stats
Network
Performance counters
PG states - the good, the bad, and the ugly
The good
The active state
The clean state
Scrubbing and deep scrubbing
The bad
The inconsistent state
The backfilling, backfill_wait, recovering, recovery_wait states
The degraded state
Remapped
The ugly
The incomplete state
The down state
The backfill_toofull state
Monitoring Ceph with collectd
Graphite
Grafana
collectd
Deploying collectd with Ansible
Sample Graphite queries for Ceph
Number of Up and In OSDs
Showing most deviant OSD usage
Total number of IOPS across all OSDs
Total MBps across all OSDs
Cluster capacity and usage
Average latency
Custom Ceph collectd plugins
Summary
Tiering with Ceph
Tiering versus caching
How Ceph's tiering functionality works
What is a bloom filter?
Tiering modes
Writeback
Forward
Read-forward
Proxy
Read-proxy
Use cases
Creating tiers in Ceph
Tuning tiering
Flushing and eviction
Promotions
Promotion throttling
Monitoring parameters
Tiering with erasure-coded pools
Alternative caching mechanisms
Summary
Tuning Ceph
Latency
Benchmarking
Benchmarking tools
Fio
Sysbench
Ping
iPerf
Network benchmarking
Disk benchmarking
RADOS benchmarking
RBD benchmarking
Recommended tunings
CPU
Filestore
VFS cache pressure
WBThrottle and/or nr_requests
Filestore queue throttling
filestore_queue_low_threshhold
filestore_queue_high_threshhold
filestore_expected_throughput_ops
filestore_queue_high_delay_multiple
filestore_queue_max_delay_multiple
PG Splitting
Scrubbing
OP priorities
The Network
General system tuning
Kernel RBD
Queue Depth
ReadAhead
PG distributions
Summary
Troubleshooting
Repairing inconsistent objects
Full OSDs
Ceph logging
Slow performance
Causes
Increased client workload
Down OSDs
Recovery and backfilling
Scrubbing
Snaptrimming
Hardware or driver issues
Monitoring
iostat
htop
atop
Diagnostics
Extremely slow performance or no IO
Flapping OSDs
Jumbo frames
Failing disks
Slow OSDs
Investigating PGs in a down state
Large monitor databases
Summary
Disaster Recovery
What is a disaster?
Avoiding data loss
What can cause an outage or data loss?
RBD mirroring
The journal
The rbd-mirror daemon
Configuring RBD mirroring
Performing RBD failover
RBD recovery
Lost objects and inactive PGs
Recovering from a complete monitor failure
Using the Ceph objectstore tool
Investigating asserts
Example assert
Summary
Ceph, a unified, highly resilient, distributed storage system that provides block, object, and file access, has enjoyed a surge in popularity over the last few years. Being open source, Ceph has enjoyed rapid adoption from developers and end users alike, with several well-known corporations involved in the project. With every new release, its performance and feature set continue to grow, further enhancing Ceph's status.
With the current ever-increasing data storage requirements and challenges faced by legacy RAID-based systems, Ceph is well placed to offer an answer to these problems. As the world moves forward adopting new cloud technologies and object-based storage, Ceph is ready and waiting to be the driving force as part of the new era of storage technologies.
In this book, we will cover a wide variety of topics, from installing and managing a Ceph cluster to how to recover from disasters, should you ever find yourself in that situation. For those interested in having their applications talk directly to Ceph, this book will also show you how to develop applications that make use of Ceph's libraries and even how to perform distributed computing by inserting your own code into Ceph. By the end of this book, you will be well on your way to mastering Ceph.
Chapter 1, Planning for Ceph, covers the basics of how Ceph works, its basic architecture, and what some good use cases are. It also discusses the steps that one should take to plan before implementing Ceph, including design goals, proof of concept, and infrastructure design.
Chapter 2, Deploying Ceph, is a no-nonsense, step-by-step instructional chapter on how to set up a Ceph cluster. This chapter covers the ceph-deploy tool for testing and then goes on to cover Ansible. A section on change management is also included, and it explains how this is essential for the stability of large Ceph clusters. This chapter also provides you with a common platform that you can use for the examples later in the book.
Chapter 3, BlueStore, explains that Ceph has to be able to provide atomic operations around data and metadata and how filestore was built to provide these guarantees on top of standard filesystems. We will also cover the problems around this approach. The chapter then introduces BlueStore and explains how it works and the problems that it solves. This will include the components and how they interact with different types of storage devices. We will also have an overview of key-value stores, including RocksDB, which is used by BlueStore. Some of the BlueStore settings and how they interact with different hardware configurations will be discussed.
Chapter 4, Erasure Coding for Better Storage Efficiency, covers how erasure coding works and how it's implemented in Ceph, including explanations of RADOS pool parameters and erasure coding profiles. A reference to the changes in the Kraken release will highlight the introduction of overwrite support on erasure-coded pools, which allows RBDs to function directly on them. Performance considerations will also be explained, including references to BlueStore, as it is required for sufficient performance. Finally, there are step-by-step instructions on actually setting up erasure coding on a pool, which can be used as a mechanical reference by sysadmins.
Chapter 5, Developing with Librados, explains how librados is used to build applications that can interact directly with a Ceph cluster. It then moves on to several examples of using librados in different languages to give you an idea of how it can be used, including for atomic transactions.
Chapter 6, Distributed Computation with Ceph RADOS Classes, discusses the benefits of moving processing directly into the OSD to effectively perform distributed computing. It covers how to get started with RADOS classes by building simple ones with Lua, and then how to build your own C++ RADOS class into the Ceph source tree and benchmark processing performed on the client against processing performed on the OSD.
Chapter 7, Monitoring Ceph, starts with a description of why monitoring is important and discusses the difference between alerting and monitoring. The chapter will then cover how to obtain performance counters from all the Ceph components and explain what some of the key counters mean and how to convert them into usable values.
Chapter 8, Tiering with Ceph, explains how RADOS tiering works in Ceph, where it should be used, and its pitfalls. It takes you step-by-step through configuring tiering on a Ceph cluster and finally covers the tuning options to extract the best performance for tiering. An example using Graphite will demonstrate the value of being able to manipulate captured data to provide more meaningful output in graph form.
Chapter 9, Tuning Ceph, starts with a brief overview of how to tune Ceph and the operating system. It also covers the basic concept of avoiding tuning something that is not a bottleneck, looks at the areas you may wish to tune, and establishes how to gauge the success of tuning attempts. Finally, it shows you how to benchmark Ceph and take baseline measurements so that any results achieved are meaningful. We'll discuss different tools and how benchmarks might relate to real-life performance.
Chapter 10, Troubleshooting, explains how although Ceph is largely autonomous in taking care of itself and recovering from failure scenarios, in some cases, human intervention is required. We'll look at common errors and failure scenarios and how to bring Ceph back to full health by troubleshooting them.
Chapter 11, Disaster Recovery, covers situations when Ceph is in such a state that there is a complete loss of service or data loss has occurred. Less familiar recovery techniques are required to restore access to the cluster and, hopefully, recover data. This chapter arms you with the knowledge to attempt recovery in these scenarios.
This book assumes medium-level knowledge of Linux operating systems and basic knowledge of storage technologies and networking. Although the book will go through a simple multi-node setup of a Ceph cluster, it is advisable that you have some prior experience of using Ceph. Although the book uses VirtualBox, feel free to use any other lab environment, such as VMware Workstation.
This book requires that you have enough resources to run the whole Ceph lab environment. The minimum hardware or virtual requirements are as follows:
CPU: 2 cores
Memory: 8 GB RAM
Disk space: 40 GB
You will need the following software:
VirtualBox
Vagrant
Internet connectivity is required to install the packages that are part of the examples in each chapter.
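As one possible way to get these two packages in place, the following is a minimal sketch for a recent Ubuntu desktop; the package names are assumptions about your distribution's repositories, and you can equally download the installers directly from the VirtualBox and HashiCorp websites.
    # Install VirtualBox and Vagrant from the distribution repositories
    sudo apt-get update
    sudo apt-get install -y virtualbox vagrant
    # Verify that both tools are on the path
    VBoxManage --version
    vagrant --version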
To make use of the content of this book, basic prior knowledge of Ceph is expected. If you feel you don't have that knowledge, it is always possible to catch up with the basics by having a quick read of the major components from the official Ceph documentation, http://docs.ceph.com/docs/master/. This book is essentially intended for Ceph cluster administrators. If you are already running a Ceph cluster, this book can help you to gain a better understanding.
Feedback from our readers is always welcome. Let us know what you think about this book - what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book.
7. Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Ceph. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/MasteringCeph_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books - maybe a mistake in the text or the code - we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at [email protected] with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
The first chapter of this book covers all the areas you need to consider when looking to deploy a Ceph cluster from initial planning stages to hardware choices. Topics covered in this chapter are as follows:
What Ceph is and how it works
Good use cases for Ceph and important considerations
Advice and best practices on infrastructure design
Ideas around planning a Ceph project
Ceph is an open source, distributed, scale-out, software-defined storage (SDS) system that can provide block, object, and file storage. Through the use of the Controlled Replication Under Scalable Hashing (CRUSH) algorithm, Ceph eliminates the need for centralized metadata and can distribute the load across all the nodes in the cluster. Since CRUSH is an algorithm, data placement is calculated rather than based on table lookups, and it can scale to hundreds of petabytes without the risk of bottlenecks and the associated single points of failure. Clients also form direct connections with the servers storing the requested data, so there is no centralized bottleneck in the data path.
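Because placement is calculated, you can ask the cluster where any given object will be stored without consulting any lookup table. The following is a minimal sketch, assuming a working cluster, admin credentials, and a pool named mypool (both the pool and object names here are only illustrative):
    # Show which placement group and set of OSDs the object "testobject" maps to
    ceph osd map mypool testobject
    # Inspect the CRUSH hierarchy that drives these placement decisions
    ceph osd tree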
Ceph provides three main types of storage: block via the RADOS Block Device (RBD), file via the Ceph Filesystem (CephFS), and object via the Reliable Autonomic Distributed Object Store (RADOS) gateway, which provides Simple Storage Service (S3)- and Swift-compatible storage.
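As a rough, hedged illustration of the three access methods, the commands below assume the relevant pools, daemons, and keyrings are already in place; the pool, image, and user names and the monitor address are made up for the example:
    # Block: create a 10 GB RBD image in the pool "rbd"
    rbd create --size 10240 rbd/myimage
    # File: mount CephFS from a monitor at 192.168.0.101
    sudo mount -t ceph 192.168.0.101:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret
    # Object: create a RADOS Gateway user for S3/Swift access
    radosgw-admin user create --uid=myuser --display-name="My User"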
Ceph is a pure SDS solution, which means you are free to run it on commodity hardware as long as it provides the correct guarantees around data consistency. More information on the recommended types of hardware can be found later in this chapter. This is a major development in the storage industry, which has typically suffered from strict vendor lock-in. Although there have been numerous open source projects providing storage services, very few of them have been able to offer the scale and high resilience of Ceph without requiring proprietary hardware.
It should be noted that, in terms of the CAP theorem, Ceph prefers consistency and, in the event of a partition, will prioritize protecting your data over availability.
The core storage layer in Ceph is RADOS, which provides, as the name suggests, an object store on which the higher-level storage protocols are built. The RADOS layer in Ceph comprises a number of object storage daemons (OSDs). Each OSD is completely independent and forms peer-to-peer relationships with the others to form a cluster. Each OSD is typically mapped to a single physical disk via a basic host bus adapter (HBA), in contrast to the traditional approach of presenting a number of disks to the OS via a Redundant Array of Independent Disks (RAID) controller.
The other key components in a Ceph cluster are the monitors; these are responsible for forming cluster quorum via the use of Paxos. By forming quorum, the monitors can be sure that they are in a state where they are allowed to make authoritative decisions for the cluster, avoiding split-brain scenarios. The monitors are not directly involved in the data path and do not have the same performance requirements as OSDs. They are mainly used to provide a known cluster state, including membership, configuration, and statistics, via the use of various cluster maps. These cluster maps are used by both Ceph cluster components and clients to describe the cluster topology and enable data to be safely stored in the right location.
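The quorum state and the cluster maps themselves can be inspected directly from any node holding an admin keyring; a few examples are shown below:
    # Show which monitors are in quorum and which one is the current leader
    ceph quorum_status --format json-pretty
    # Dump two of the cluster maps described above: the monitor map and the OSD map
    ceph mon dump
    ceph osd dump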
Due to the scale that Ceph is intended to be operated at, one can appreciate that tracking the state and placement of every single object in the cluster would become computationally very expensive. Ceph solves this problem using CRUSH to place the objects into groups of objects named placement groups (PGs). This reduces the need to track millions of objects to a much more manageable number in the thousands range.
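The number of PGs is chosen per pool at creation time and can be inspected afterwards. A brief sketch follows; the pool name and PG count are illustrative only:
    # Create a replicated pool with 128 placement groups
    ceph osd pool create mypool 128 128 replicated
    # Confirm the PG count for the pool
    ceph osd pool get mypool pg_num
    # List the PGs belonging to the pool and the OSDs they map to
    ceph pg ls-by-pool mypool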
Librados is a Ceph library that can be used to build applications that interact directly with the RADOS cluster to store and retrieve objects.
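The rados command-line tool, which is itself built on librados, is a quick way to see this object interface in action before writing any code. A minimal sketch, assuming a pool named mypool already exists:
    # Store a local file as an object, list the pool's objects, then read the object back
    echo "hello ceph" > /tmp/hello.txt
    rados -p mypool put hello-object /tmp/hello.txt
    rados -p mypool ls
    rados -p mypool get hello-object /tmp/hello-copy.txt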
For more information on how the internals of Ceph work, it is strongly recommended that you read the official Ceph documentation and also the thesis written by Sage Weil, the creator and primary architect of Ceph.
Before jumping into a specific use case, let's cover some key points that should be understood and considered before thinking about deploying a Ceph cluster.
Ceph should not be compared with a traditional scale-up storage array. It is fundamentally different, and trying to shoehorn Ceph into that role using existing knowledge, infrastructure, and expectations will lead to disappointment. Ceph is an SDS solution whose internal data movements operate over TCP/IP networking. This introduces several extra layers of technology and complexity compared with a SAS cable at the rear of a traditional storage array.
Due to Ceph's distributed approach, it can offer unrestrained performance compared with scale-up storage arrays, which typically have to funnel all I/O through a pair of controller heads. Although technology keeps delivering faster CPUs and higher network speeds, there is still a limit to the performance you can expect to achieve with just a pair of controllers. With recent advances in flash technology, combined with new interfaces such as Non-Volatile Memory Express (NVMe), the scale-out nature of Ceph provides a linear increase in CPU and network resources with every added OSD node.
Let us also consider where Ceph is not a good fit for performance; this is mainly use cases where extremely low latency is desired. The very design that enables Ceph to be a scale-out solution also means that low-latency performance will suffer. The overhead of software and additional network hops means that latency will tend to be about double that of a traditional storage array and ten times that of local storage. Thought should be given to selecting the best technology for your given performance requirements. That said, a well-designed and tuned Ceph cluster should be able to meet performance requirements in all but the most extreme cases.
Ceph is designed to provide a highly fault-tolerant storage system by the scale-out nature of its components. Although no individual component is highly available, when clustered together any component should be able to fail without causing an inability to service client requests. In fact, as your Ceph cluster grows, the failure of individual components should be expected and become part of normal operating conditions. However, Ceph's ability to provide a resilient cluster should not be an invitation to compromise on hardware or design choice. Doing so will likely lead to failure. There are several factors that Ceph assumes your hardware will meet, which are covered later in this chapter.
Unlike RAID, where rebuilds of larger disks can now stretch into time periods measured in weeks, Ceph will often recover from a single disk failure in a matter of hours. With the increasing trend towards larger-capacity disks, Ceph offers numerous advantages in both reliability and degraded-state performance when compared with a traditional storage array.
Ceph is designed to be run on commodity hardware, which gives you the ability to design and build a cluster without paying the premium demanded by traditional tier 1 storage and server vendors. This can be both a blessing and a curse. Being able to choose your own hardware allows you to build your Ceph infrastructure to exactly match your requirements. However, one thing branded hardware does offer is compatibility testing; it's not unknown for strange, exotic firmware bugs to be discovered, which can cause very confusing symptoms. Thought should be given to whether your IT teams have the time and skills to cope with any obscure issues that may crop up with untested hardware solutions.
The use of commodity hardware also protects against the traditional forklift upgrade model, where the upgrade of a single component often requires the complete replacement of the whole storage array. With Ceph, you can replace individual components in a very granular manner, and with automatic data balancing, lengthy data migration periods are avoided. The distributed nature of Ceph means that hardware replacements or upgrades can be done during working hours without affecting service availability.
We will now cover some of the more common use cases for Ceph and discuss some of the reasons behind them.
Ceph is the perfect match to provide storage to an OpenStack environment. In fact, Ceph currently is the most popular choice. OpenStack Cinder
