Ceph Cookbook

Karan Singh
Description

Over 100 effective recipes to help you design, implement, and manage the software-defined and massively scalable Ceph storage system

About This Book

  • Implement a Ceph cluster successfully and gain deep insights into its best practices
  • Harness the abilities of experienced storage administrators and architects, and run your own software-defined storage system
  • This comprehensive, step-by-step guide will show you how to build and manage Ceph storage in a production environment

Who This Book Is For

This book is aimed at storage and cloud system engineers, system administrators, and technical architects who are interested in building software-defined storage solutions to power their cloud and virtual infrastructure. If you have a basic knowledge of GNU/Linux and storage systems, with no experience of software-defined storage solutions and Ceph, but are eager to learn, this book is for you.

What You Will Learn

  • Understand, install, configure, and manage the Ceph storage system
  • Get to grips with performance tuning and benchmarking, and gain practical tips to run Ceph in production
  • Integrate Ceph with OpenStack Cinder, Glance, and Nova components
  • Deep dive into Ceph object storage, including S3, Swift, and Keystone integration
  • Build a Dropbox-like file sync and share service and Ceph federated gateway setup
  • Gain hands-on experience with Calamari and VSM for cluster monitoring
  • Familiarize yourself with Ceph operations such as maintenance, monitoring, and troubleshooting
  • Understand advanced topics including erasure coding, CRUSH map, cache pool, and system maintenance

In Detail

Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability. This cutting-edge technology has been transforming the storage industry, and is rapidly evolving as a leader in the software-defined storage space, extending full support to cloud platforms such as OpenStack and CloudStack, as well as to virtualization platforms. It is the most popular storage backend for OpenStack and for public and private clouds, making it a first choice as a storage solution. Ceph is backed by Red Hat and is developed by a thriving open source community of individual developers as well as several companies across the globe.

This book takes you from a basic knowledge of Ceph to an expert understanding of the most advanced features, walking you through building up a production-grade Ceph storage cluster and helping you develop all the skills you need to plan, deploy, and effectively manage your Ceph cluster. Beginning with the basics, you'll create a Ceph cluster, followed by block, object, and file storage provisioning. Next, you'll get a step-by-step tutorial on integrating it with OpenStack and building a Dropbox-like object storage solution. We'll also take a look at federated architecture and CephFS, and you'll dive into Calamari and VSM for monitoring the Ceph environment. You'll develop expert knowledge on troubleshooting and benchmarking your Ceph storage cluster. Finally, you'll get to grips with the best practices to operate Ceph in a production environment.

Style and approach

This step-by-step guide is filled with practical tutorials, making complex scenarios easy to understand.

Table of Contents

Ceph Cookbook
Credits
Foreword
About the Author
About the Reviewers
www.PacktPub.com
eBooks, discount offers, and more
Why Subscribe?
Preface
What this book covers
What you need for this book
Who this book is for
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Ceph – Introduction and Beyond
Introduction
Ceph Releases
Ceph – the beginning of a new era
Software Defined Storage (SDS)
Cloud storage
Unified next generation storage architecture
RAID – the end of an era
RAID rebuilds are painful
RAID spare disks increases TCO
RAID can be expensive and hardware dependent
The growing RAID group is a challenge
The RAID reliability model is no longer promising
Ceph – the architectural overview
Planning the Ceph deployment
Setting up a virtual infrastructure
Getting ready
How to do it…
Installing and configuring Ceph
Creating Ceph cluster on ceph-node1
How to do it…
Scaling up your Ceph cluster
How to do it…
Using Ceph cluster with a hands-on approach
How to do it…
2. Working with Ceph Block Device
Introduction
Working with Ceph Block Device
Configuring Ceph client
How to do it…
Creating Ceph Block Device
How to do it…
Mapping Ceph Block Device
How to do it…
Ceph RBD resizing
How to do it…
Working with RBD snapshots
How to do it…
Working with RBD Clones
How to do it…
A quick look at OpenStack
Ceph – the best match for OpenStack
Setting up OpenStack
How to do it…
Configuring OpenStack as Ceph clients
How to do it…
Configuring Glance for Ceph backend
How to do it…
Configuring Cinder for Ceph backend
How to do it…
Configuring Nova to attach Ceph RBD
How to do it…
Configuring Nova to boot instances from Ceph RBD
How to do it…
3. Working with Ceph Object Storage
Introduction
Understanding Ceph object storage
RADOS Gateway standard setup, installation, and configuration
Setting up the RADOS Gateway node
How to do it…
Installing the RADOS Gateway
How to do it…
Configuring RADOS Gateway
How to do it…
Creating the radosgw user
How to do it…
See also…
Accessing Ceph object storage using S3 API
How to do it…
Configuring DNS
Configuring the s3cmd client
Accessing Ceph object storage using the Swift API
How to do it
See also…
Integrating RADOS Gateway with OpenStack Keystone
How to do it…
Configuring Ceph federated gateways
How to do it…
Testing the radosgw federated configuration
How to do it…
Building file sync and share service using RGW
Getting ready…
How to do it…
See also…
4. Working with the Ceph Filesystem
Introduction
Understanding Ceph Filesystem and MDS
Deploying Ceph MDS
How to do it…
Accessing CephFS via kernel driver
How to do it…
Accessing CephFS via FUSE client
How to do it…
Exporting Ceph Filesystem as NFS
How to do it…
ceph-dokan – CephFS for Windows clients
How to do it…
CephFS a drop-in replacement for HDFS
5. Monitoring Ceph Clusters using Calamari
Introduction
Ceph cluster monitoring – the classic way
Monitoring Ceph clusters
How to do it…
Checking the cluster's health
Monitoring cluster events
The cluster utilization statistics
Checking the cluster's status
The cluster authentication entries
Monitoring Ceph MON
How to do it…
Checking the MON status
Checking the MON quorum status
Monitoring Ceph OSDs
How to do it…
OSD tree view
OSD statistics
Checking the crush map
Monitoring PGs
Monitoring Ceph MDS
How to do it…
Introducing Ceph Calamari
Building Calamari server packages
How to do it…
Building Calamari client packages
How to do it…
Setting up Calamari master server
How to do it…
Adding Ceph nodes to Calamari
How to do it…
Monitoring Ceph clusters from the Calamari dashboard
Troubleshooting Calamari
How to do it…
6. Operating and Managing a Ceph Cluster
Introduction
Understanding Ceph service management
Managing the cluster configuration file
How to do it…
Adding monitor nodes to the Ceph configuration file
Adding an MDS node to the Ceph configuration file
Adding OSD nodes to the Ceph configuration file
Running Ceph with SYSVINIT
Starting and stopping all daemons
How to do it…
Starting and stopping all daemons by type
How to do it…
Starting daemons by type
Stopping daemons by type
Starting and stopping a specific daemon
How to do it…
Starting a specific daemon
Stopping a specific daemon
Running Ceph as a service
Starting and stopping all daemons
How to do it…
Starting and stopping all daemons by type
How to do it…
Starting daemons by type
Stopping daemons by type
Starting and stopping a specific daemon
How to do it…
Starting a specific daemon
Stopping a specific daemon
Scale-up versus scale-out
Scaling out your Ceph cluster
Adding the Ceph OSD
How to do it…
Adding the Ceph MON
How to do it...
Adding the Ceph RGW
Scaling down your Ceph cluster
Removing the Ceph OSD
How to do it…
Removing Ceph MON
How to do it…
Replacing a failed disk in the Ceph cluster
How to do it…
Upgrading your Ceph cluster
How to do it…
Maintaining a Ceph cluster
How to do it…
How it works…
7. Ceph under the Hood
Introduction
Ceph scalability and high availability
Understanding the CRUSH mechanism
CRUSH map internals
How to do it…
How it works…
Ceph cluster map
High availability monitors
Ceph authentication and authorization
Ceph authentication
Ceph authorization
How to do it…
Ceph dynamic cluster management
Ceph placement group
How to do it…
Placement group states
Creating Ceph pools on specific OSDs
How to do it…
8. Production Planning and Performance Tuning for Ceph
Introduction
The dynamics of capacity, performance, and cost
Choosing the hardware and software components for Ceph
Processor
Memory
Network
Disk
Ceph OSD Journal partition
Ceph OSD Data partition
Operating System
OSD Filesystem
Ceph recommendation and performance tuning
Global cluster tuning
Monitor tuning
OSD tuning
OSD General Settings
OSD Journal settings
OSD Filestore settings
OSD Recovery settings
OSD Backfilling settings
OSD scrubbing settings
Client tuning
Operating System tuning
Ceph erasure coding
Erasure code plugin
Creating an erasure coded pool
How to do it…
Ceph cache tiering
Writeback mode
Read-only mode
Creating a pool for cache tiering
How to do it…
See also…
Creating a cache tier
How to do it…
Configuring a cache tier
How to do it…
Testing a cache tier
How to do it…
9. The Virtual Storage Manager for Ceph
Introduction
Understanding the VSM architecture
The VSM Controller
The VSM Agent
Setting up the VSM environment
How to do it…
Getting ready for VSM
How to do it…
Installing VSM
How to do it…
Creating a Ceph cluster using VSM
How to do it…
Exploring the VSM dashboard
Upgrading the Ceph cluster using VSM
How to do it…
VSM roadmap
VSM resources
10. More on Ceph
Introduction
Benchmarking the Ceph cluster
Disk performance baseline
Single disk write performance
How to do it…
Multiple disk write performance
How to do it…
Single disk read performance
How to do it…
Multiple disk read performance
How to do it…
Results
Baseline network performance
How to do it…
See also…
Ceph RADOS bench
How to do it…
How it works…
RADOS load-gen
How to do it…
How it works…
There's more…
Benchmarking the Ceph block device
Ceph rbd bench-write
How to do it…
How it works…
There's more…
See also…
Benchmarking Ceph RBD using FIO
How to do it…
See also…
Ceph admin socket
How to do it…
Using the ceph tell command
How to do it…
How it works…
Ceph REST API
How to do it…
Profiling Ceph memory
How to do it…
Deploying Ceph using Ansible
Getting ready
How to do it…
There's more…
The ceph-objectstore tool
How to do it…
How it works…
Index

Ceph Cookbook

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: February 2016

Production reference: 1250216

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78439-350-2

www.packtpub.com

Credits

Author

Karan Singh

Reviewers

Christian Eichelmann

Haruka Iwao

Commissioning Editor

Amarabha Banerjee

Acquisition Editor

Meeta Rajani

Content Development Editor

Kajal Thapar

Technical Editor

Menza Mathew

Copy Editor

Angad Singh

Project Coordinator

Shweta H Birwatkar

Proofreader

Safis Editing

Indexer

Rekha Nair

Production Coordinator

Melwyn Dsa

Cover Work

Melwyn Dsa

Foreword

One year ago, Karan published his first book, Learning Ceph, Packt Publishing, which has been a great success. It addressed a need that a lot of users had: an easy-to-understand introduction to Ceph and an overview of its architecture.

When an open source project has an enthusiastic community like Ceph does, the innovation and evolution of features happen at a rapid pace. Besides the core development team around Sage Weil at Red Hat, industry heavyweights such as Intel, SanDisk, Fujitsu, and SUSE, as well as countless other individuals, have made substantial contributions. As a result, the project continues to mature both in capability and stability; the latter playing a key role in enterprise deployments. Many features and components that are now a part of Ceph were only in their infancy when Learning Ceph, Packt Publishing, came out; erasure coding, optimized performance for SSDs, and the Virtual Storage Manager (VSM) are just a few examples. All of these are covered in great detail in this new book that you are holding in your hands right now.

The other day, I read a blog where the author likened the significance of Ceph to the storage industry to the impact that Linux had on operating systems. While it is still too early to make that call, its adoption in the industry speaks for itself, with multi-petabyte-sized deployments becoming more and more common. Large-scale users such as CERN and Yahoo are regularly sharing their experiences with the community.

The wealth of capabilities and the enormous flexibility to adapt to a wide range of use cases can sometimes make it difficult to approach this new technology, and it can leave new users wondering where to start their learning journeys. Not everybody has access to massive data centers with thousands of servers and disks to experiment and build their own experiences. Karan's new book, Ceph Cookbook, Packt Publishing, is meant to help by providing practical, hands-on advice for the many challenges you will encounter.

As a long-time Ceph enthusiast, I have worked with Karan for several years and congratulate him on his passion and initiative to compile a comprehensive guide for first-time users of Ceph. It will be a useful guide to those embarking on deploying the open source community version of Ceph.

This book complements the more technical documentation and collateral developed by members of the Ceph community, filling in the gaps with useful commentary and advice for new users.

If you are downloading the Ceph community version, kicking its tires, and trying it out at home or on your non-mission-critical workloads in the enterprise, this book is for you. Expect to learn to deploy and manage Ceph step by step along with tips and use cases for deploying Ceph's features and functionality on certain storage workloads.

Now, it's time to begin reading about the ingredients you'll need to cook up your own Ceph software-defined storage deployment. But hurry: exciting new features, such as production-ready CephFS and support for containers, are already in the pipeline, and I am looking forward to seeing Karan's next book a year from now.

Dr. Wolfgang Schulze

Director of Global Storage Consulting, Red Hat

About the Author

Karan Singh is an IT expert and tech evangelist, living with his beautiful wife, Monika, in Finland. He holds a bachelor's degree, with honors, in computer science, and a master's degree in system engineering from BITS, Pilani. Apart from this, he is a certified professional for technologies such as OpenStack, NetApp, Oracle Solaris, and Linux.

Karan is currently working as a Storage and Cloud System Specialist at CSC – IT Center for Science Ltd., focusing all his energies on developing IaaS cloud solutions based on OpenStack and Ceph, and on building economical multi-petabyte storage systems using Ceph.

Karan possesses a rich skill set and strong work experience in a variety of storage solutions, cloud technologies, automation tools, and Unix systems. He is also the author of the very first book on Ceph, titled Learning Ceph, published in 2015.

Karan devotes a part of his time to R&D and to learning new technologies. When not working on Ceph and OpenStack, Karan can be found working with emerging technologies or automating things. He loves writing about technologies and is an avid blogger at www.ksingh.co.in. You can reach him on Twitter @karansingh010, or by e-mail at <[email protected]>.

I'd like to thank my wife, Monika, for preparing delicious food while I was writing this book. Kiitos MJ, you are a great chef, Minä rakastan sinua.

I would like to take this opportunity to thank my company, CSC – IT Center for Science Ltd., and all my colleagues with whom I have worked and made memories. CSC, you are an amazing place to work, kiitos.

I'd also like to express my thanks to the vibrant Ceph community and its ecosystem for developing, improving, and supporting Ceph.

Finally, my sincere thanks to the entire Packt Publishing team, and also to the technical reviewers, for their state-of-the-art work during the course of this project.

About the Reviewers

Christian Eichelmann has worked as a system engineer and an IT architect at a number of different companies in Germany for several years. He has been using Ceph since its early alpha releases and is currently running several petabyte-scale clusters. He also developed ceph-dash, a popular monitoring dashboard for Ceph.

Haruka Iwao is an ads solutions engineer with Google. She worked as a storage solutions architect at Red Hat and has contributed to the Ceph community, especially in Japan. She also has work experience as a site reliability engineer at a few start-ups in Tokyo, and she is interested in site reliability engineering and large-scale computing. She studied distributed filesystems in her master's course at the University of Tsukuba.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why Subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Preface

We are a part of a digital world that is producing an enormous amount of data every second. Data growth is unimaginable, and it's predicted that humankind will possess 40 zettabytes of data by 2020. Well, that's not too much, but how about 2050? Should we guesstimate a yottabyte? The obvious question arises: do we have any way to store this gigantic amount of data, or are we prepared for the future? To me, Ceph is the ray of hope and the technology that can be a possible answer to the data storage needs of the next decade. Ceph is the future of storage.

There's a great saying: "Software is eating the world." Well, that's true. Seen from another angle, software is the feasible way forward for various computing needs, such as computing weather, networking, storage, datacenters, and burgers, ummm… well, not burgers just yet. As you already know, the idea behind a software-defined solution is to build all the intelligence into the software itself and use commodity hardware to solve your greatest problems. And I think this software-defined approach should be the answer to the future's computing problems.

Ceph is a true open source, software-defined storage solution, purpose-built to handle unprecedented data growth with linear performance improvement. It provides a unified storage experience through file, object, and block storage interfaces from the same system. The beauty of Ceph is its distributed, scalable nature and its performance; reliability and robustness come along with these attributes. Furthermore, it is pocket-friendly, that is, economical, giving you more value for every dollar you spend.

Ceph is the next big thing to happen to the storage industry. Its enterprise-class features, such as scalability, reliability, erasure coding, and cache tiering (and counting), have matured significantly in the last few years. Organizations such as CERN, Yahoo, and DreamHost, to name a few, have deployed multi-PB Ceph clusters that are running successfully.

The block and object interfaces of Ceph were introduced a while ago and are now fully developed. Until last year, CephFS was the only component lacking production readiness. This year, my bet is on CephFS, as it's going to be production-ready in Ceph Jewel. I can't wait to see CephFS production adoption stories. There are a few more areas where Ceph is gaining popularity, such as all-flash arrays (AFA), database workloads, storage for containers, and hyperconverged infrastructure. Well, Ceph has just begun; the best is yet to come.

In this book, we will take a deep dive to understand Ceph, covering its components and architecture, including how it works. The Ceph Cookbook focuses on hands-on knowledge, providing you with step-by-step guidance in the form of recipes. Right from the first chapter, you will gain practical experience of Ceph by following the recipes. With each chapter, you will learn about and play around with interesting concepts of Ceph. I hope that, by the end of this book, you will feel competent with Ceph, both conceptually and practically, and that you will be able to operate your Ceph storage infrastructure with confidence and success.

Happy Learning

Karan Singh

What this book covers

Chapter 1, Ceph – Introduction and Beyond, covers an introduction to Ceph, gradually moving on to RAID and its challenges, followed by an architectural overview of Ceph. Finally, we will go through Ceph installation and configuration.

Chapter 2, Working with Ceph Block Device, covers an introduction to the Ceph Block Device and the provisioning of Ceph block devices. We will also go through RBD snapshots and clones, as well as storage options for OpenStack Cinder, Glance, and Nova.

Chapter 3, Working with Ceph Object Storage, deep dives into Ceph object storage, including RGW standard and federated setups, and S3 and OpenStack Swift access. Finally, we will set up a file sync and share service using ownCloud.

Chapter 4, Working with the Ceph Filesystem, covers an introduction to CephFS, deploying MDS, and accessing CephFS via the kernel driver, FUSE, and NFS-Ganesha. You will also learn how to access CephFS via the ceph-dokan Windows client.

Chapter 5, Monitoring Ceph Clusters using Calamari, includes Ceph monitoring via the CLI, an introduction to Calamari, and the setting up of the Calamari server and clients. We will also cover monitoring a Ceph cluster via the Calamari GUI, as well as troubleshooting Calamari.

Chapter 6, Operating and Managing a Ceph Cluster, covers Ceph service management and scaling up and scaling down a Ceph cluster. This chapter also includes failed disk replacement and upgrading Ceph infrastructure.

Chapter 7, Ceph under the Hood, explores the Ceph CRUSH map and the internals of CRUSH, followed by Ceph authentication and authorization. This chapter also covers dynamic cluster management and an understanding of Ceph PGs. Finally, we will create Ceph pools on specific OSDs.

Chapter 8, Production Planning and Performance Tuning for Ceph, covers the planning of a production cluster deployment and hardware and software planning for Ceph. This chapter also includes Ceph recommendations and performance tuning. Finally, it covers erasure coding and cache tiering.

Chapter 9, The Virtual Storage Manager for Ceph, is dedicated to the Virtual Storage Manager (VSM), covering its introduction and architecture. We will also go through the deployment of VSM, then create a Ceph cluster using VSM and manage it.

Chapter 10, More on Ceph, the final chapter of the book, covers Ceph benchmarking, as well as Ceph troubleshooting using the admin socket, the REST API, and the ceph-objectstore tool. This chapter also covers the deployment of Ceph using Ansible, and Ceph memory profiling.

What you need for this book

The various software components required to follow the instructions in the chapters are as follows:

  • VirtualBox 4.0 or higher (https://www.virtualbox.org/wiki/Downloads)
  • Git (http://www.git-scm.com/downloads)
  • Vagrant 1.5.0 or higher (https://www.vagrantup.com/downloads.html)
  • CentOS operating system 7.0 or higher (http://wiki.centos.org/Download)
  • Ceph software packages version 0.87.0 or higher (http://ceph.com/resources/downloads/)
  • S3 client, typically s3cmd (http://s3tools.org/download)
  • Python-swift client
  • ownCloud 7.0.5 or higher (https://download.owncloud.org/download/repositories/stable/owncloud/)
  • NFS Ganesha
  • Ceph FUSE
  • ceph-dokan
  • Ceph Calamari (https://github.com/ceph/calamari.git)
  • Diamond (https://github.com/ceph/Diamond.git)
  • Ceph Calamari client, romana (https://github.com/ceph/romana)
  • Virtual Storage Manager 2.0 or higher (https://github.com/01org/virtual-storage-manager/releases/tag/v2.1.0)
  • Ansible 1.9 or higher (http://docs.ansible.com/ansible/intro_installation.html)
  • OpenStack RDO (http://rdo.fedorapeople.org/rdo-release.rpm)

Who this book is for

This book is aimed at storage and cloud system engineers, system administrators, and technical architects and consultants who are interested in building software-defined storage solutions around Ceph to power their cloud and virtual infrastructure. If you have a basic knowledge of GNU/Linux and storage systems, with no experience of software-defined storage solutions and Ceph, but are eager to learn, this book is for you.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it, How it works, There's more, and See also).

To give clear instructions on how to complete a recipe, we use these sections as follows:

Getting ready

This section tells you what to expect in the recipe, and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make the reader more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "To do this, we need to edit /etc/nova/nova.conf on the OpenStack node and perform the steps that are given in the following section."

A block of code is set as follows:

inject_partition=-2
images_type=rbd
images_rbd_pool=vms
images_rbd_ceph_conf=/etc/ceph/ceph.conf

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

inject_partition=-2
images_type=rbd
images_rbd_pool=vms
images_rbd_ceph_conf=/etc/ceph/ceph.conf

Any command-line input or output is written as follows:

# rados -p cache-pool ls

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "Navigate to the Options defined in nova.virt.libvirt.volume section and add the following lines of code:"

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Ceph – Introduction and Beyond

In this chapter, we will cover the following recipes:

  • Ceph – the beginning of a new era
  • RAID – the end of an era
  • Ceph – the architectural overview
  • Planning the Ceph deployment
  • Setting up a virtual infrastructure
  • Installing and configuring Ceph
  • Scaling up your Ceph cluster
  • Using Ceph clusters with a hands-on approach

Introduction

Ceph is currently the hottest Software Defined Storage (SDS) technology, and it is shaking up the entire storage industry. It is an open source project that provides unified software-defined solutions for block, file, and object storage. The core idea of Ceph is to provide a distributed storage system that is massively scalable and high-performing, with no single point of failure. From its roots, it has been designed to be highly scalable (up to the exabyte level and beyond) while running on general-purpose commodity hardware.

Ceph is gaining most of its traction in the storage industry due to its open, scalable, and reliable nature. This is the era of cloud computing and software-defined infrastructure, where we need a storage backend that is purely software-defined and, more importantly, cloud-ready. Ceph fits in here very well, regardless of whether you are running a public, private, or hybrid cloud.

Today's software systems are very smart and make the best use of commodity hardware to run gigantic-scale infrastructure. Ceph is one of them; it intelligently uses commodity hardware to provide robust, enterprise-grade, and highly reliable storage systems.

Ceph has been raised and nourished with an architectural philosophy that includes the following:

  • Every component must scale linearly
  • There should not be any single point of failure
  • The solution must be software-based, open source, and adaptable
  • Ceph software should run on readily available commodity hardware
  • Every component must be self-managing and self-healing wherever possible

The foundation of Ceph lies in objects, which are its building blocks, and object storage like Ceph is a perfect fit for the current and future needs of unstructured data storage. Object storage has advantages over traditional storage solutions: we can achieve platform and hardware independence using it. Ceph works meticulously with objects and replicates them across the cluster to ensure reliability; in Ceph, objects are not tied to a physical path, which makes them location-independent. Such flexibility enables Ceph to scale linearly from the petabyte to the exabyte level.

Ceph provides great performance, enormous scalability, power, and flexibility to organizations. It helps them get rid of expensive proprietary storage silos. Ceph is indeed an enterprise-class storage solution that runs on commodity hardware; it is a low-cost yet feature-rich storage system. The Ceph universal storage system provides block, file, and object storage under one hood, enabling customers to use storage as they want.
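
As a quick taste of this unified model, here is a minimal sketch of how all three interfaces can be exercised against the same cluster using the standard Ceph command-line tools. The pool name, monitor address, and mount point are placeholder assumptions, and each interface is covered in detail in later chapters:

# rbd create test-image --size 1024 --pool rbd    # block: create a 1 GB RBD image
# rados -p rbd put object-1 /etc/hosts            # object: store a file as a RADOS object
# mount -t ceph 192.168.1.101:6789:/ /mnt/cephfs -o name=admin,secretfile=/etc/ceph/admin.secret    # file: mount CephFS (requires an MDS)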

Ceph Releases

Ceph is being developed and improved at a rapid pace. On July 3, 2012, Sage Weil announced the first LTS release of Ceph, with the code name Argonaut. Since then, we have seen seven new releases come up. Ceph releases are categorized as LTS (Long Term Support) releases and development releases, and every alternate Ceph release is an LTS release. For more information, please visit https://Ceph.com/category/releases/.

Ceph release name | Ceph release version | Released on
------------------+----------------------+------------------
Argonaut          | v0.48 (LTS)          | July 3, 2012
Bobtail           | v0.56 (LTS)          | January 1, 2013
Cuttlefish        | v0.61                | May 7, 2013
Dumpling          | v0.67 (LTS)          | August 14, 2013
Emperor           | v0.72                | November 9, 2013
Firefly           | v0.80 (LTS)          | May 7, 2014
Giant             | v0.87.1              | February 26, 2015
Hammer            | v0.94 (LTS)          | April 7, 2015
Infernalis        | v9.0.0               | May 5, 2015
Jewel             | v10.0.0              | November 2015
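
To check which release your own nodes are running, you can ask the installed binaries for their version; the version number reported by this standard command maps to a code name in the table above:

# ceph -v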

Tip

Here is a fact: Ceph release names follow alphabetical order; the next one will be a "K" release.

Note

The term "Ceph" is a common nickname given to pet octopuses and is considered a short form of "Cephalopod", which is a class of marine animals that belong to the mollusk phylum. Ceph has octopuses as its mascot, which represents Ceph's highly parallel behavior, similar to octopuses.

Ceph – the beginning of a new era

Data storage requirements have grown explosively over the last few years. Research shows that data in large organizations is growing at a rate of 40 to 60 percent annually, and many companies are doubling their data footprint each year. IDC analysts estimated that, worldwide, there were 54.4 exabytes of total digital data in the year 2000. By 2007, this had reached 295 exabytes, and by 2020, it is expected to reach 44 zettabytes worldwide. Such data growth cannot be managed by traditional storage systems; we need a system like Ceph, which is distributed, scalable, and, most importantly, economically viable. Ceph has been designed especially to handle today's as well as the future's data storage needs.

Software Defined Storage (SDS)

SDS is what is needed to reduce the TCO of your storage infrastructure. In addition to reduced storage costs, SDS offers flexibility, scalability, and reliability. Ceph is a true SDS solution; it runs on commodity hardware with no vendor lock-in and provides a low cost per GB. Unlike traditional storage systems, where hardware is married to software, SDS leaves you free to choose commodity hardware from any manufacturer and to design a heterogeneous hardware solution for your own needs. Ceph's software-defined storage on top of this hardware provides all the intelligence you need and takes care of everything, providing all the enterprise storage features right from the software layer.

Cloud storage

Storage is one of the trickiest components of a cloud infrastructure. Every cloud infrastructure needs a storage system that is reliable, low-cost, and scalable, with tighter integration than its other components. There are many traditional storage solutions on the market that claim to be cloud ready, but today we need not only cloud readiness but a lot more besides. We need a storage system that is fully integrated with cloud systems and that can deliver a lower TCO without any compromise on reliability and scalability. Cloud systems are software-defined and built on top of commodity hardware; similarly, they need a storage system that follows the same methodology, that is, software-defined on top of commodity hardware, and Ceph is the best choice available for cloud use cases.

Ceph has been rapidly evolving and closing the gap to become a true cloud storage backend. It is grabbing center stage with every major open source cloud platform, namely OpenStack, CloudStack, and OpenNebula. Moreover, Ceph has succeeded in building beneficial partnerships with cloud vendors such as Red Hat, Canonical, Mirantis, SUSE, and many more. These companies are favoring Ceph in a big way, including it as an official storage backend for their OpenStack cloud distributions, thus making Ceph a red-hot technology in the cloud storage space.

The OpenStack project is one of the finest examples of open source software powering public and private clouds. It has proven itself as an end-to-end open source cloud solution. OpenStack is a collection of programs, and components such as Cinder, Glance, and Swift provide its storage capabilities. These OpenStack components require a reliable, scalable, all-in-one storage backend like Ceph. For this reason, the OpenStack and Ceph communities have been working together for many years to develop a fully compatible Ceph storage backend for OpenStack.

Cloud infrastructure based on Ceph provides much-needed flexibility to service providers for building Storage-as-a-Service and Infrastructure-as-a-Service solutions, which they cannot achieve with other traditional enterprise storage solutions, as these are not designed to fulfill cloud needs. By using Ceph, service providers can offer low-cost, reliable cloud storage to their customers.

Unified next generation storage architecture

The definition of unified storage has changed lately. A few years ago, the term "unified storage" referred to providing file and block storage from a single system. Now, because of recent technological advancements such as cloud computing, big data, and the Internet of Things, a new kind of storage has been evolving: object storage. Thus, all storage systems that do not support object storage are not really unified storage solutions. A truly unified storage system is like Ceph; it supports block, file, and object storage from a single system.

In Ceph, the term "unified storage" is more meaningful than what existing storage vendors claim to provide. Ceph has been designed from the ground up to be future-ready, and it is constructed such that it can handle enormous amounts of data. When we call Ceph "future-ready", we mean to focus on its object storage capabilities, which are a better fit for today's mix of unstructured data than blocks or files. Everything in Ceph relies on intelligent objects, whether it's block storage or file storage. Rather than managing blocks and files underneath, Ceph manages objects and supports block- and file-based storage on top of them. Objects provide enormous scaling with increased performance by eliminating metadata operations. Ceph uses an algorithm to dynamically compute where an object should be stored and retrieved from.

The traditional storage architecture of SAN and NAS systems is very limited. Basically, it follows the tradition of controller high availability; that is, if one storage controller fails, data is served from the second controller. But what if the second controller fails at the same time, or even worse, the entire disk shelf fails? In most cases, you will end up losing your data. This kind of storage architecture, which cannot sustain multiple failures, is definitely not what we want today. Another drawback of traditional storage systems is their data storage and access mechanism. They maintain a central lookup table to keep track of metadata, which means that every time a client sends a request for a read or write operation, the storage system first performs a lookup in a huge metadata table, and only after receiving the real data location does it perform the client operation. For a smaller storage system, you might not notice the performance hit, but think of a large storage cluster: you would definitely be bound by performance limits with this approach. It would even restrict your scalability.

Ceph does not follow such a traditional storage architecture; in fact, its architecture has been completely reinvented. Rather than storing and manipulating metadata, Ceph introduces a newer way: the CRUSH algorithm. CRUSH stands for Controlled Replication Under Scalable Hashing. Instead of performing a lookup in a metadata table for every client request, the CRUSH algorithm computes on demand where the data should be written to or read from. By computing metadata, the need to manage a centralized table for it disappears. Modern computers are amazingly fast and can perform a CRUSH lookup very quickly; moreover, this computing load, which is generally modest, can be distributed across cluster nodes, leveraging the power of distributed storage. In addition to this, CRUSH has a unique property: infrastructure awareness. It understands the relationships between the various components of your infrastructure and stores your data in distinct failure zones, such as a disk, node, rack, row, or datacenter room, among others. CRUSH stores all the copies of your data such that the data is available even if a few components fail in a failure zone. It is due to CRUSH that Ceph can handle multiple component failures and provide reliability and durability.
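
You can watch this on-demand computation for yourself: the standard ceph osd map command runs the CRUSH calculation for any pool and object name, with no central table lookup, and prints the placement group the object hashes to along with the set of OSDs that CRUSH selected for it. The pool and object names below are placeholders:

# ceph osd map rbd object-1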

The CRUSH algorithm makes Ceph self-managing and self-healing. In the event of a component failure in a failure zone, CRUSH senses which component has failed and determines its effect on the cluster. Without any administrative intervention, CRUSH self-manages and self-heals by performing a recovery operation for the data lost due to the failure. CRUSH regenerates the data from the replica copies that the cluster maintains. If you have configured the Ceph CRUSH map correctly, it makes sure that at least one copy of your data is always accessible. Using CRUSH, we can design a highly reliable storage infrastructure with no single point of failure. It makes Ceph a highly scalable and reliable storage system that is future-ready.
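
As a quick sketch of how you would observe this behavior on a running cluster (monitoring is covered in depth in Chapter 5), these standard commands report cluster health, stream live events including recovery progress, and show which OSDs are up or down in the CRUSH hierarchy:

# ceph health detail
# ceph -w
# ceph osd tree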

RAID – the end of an era

RAID technology has been the fundamental building block for storage systems for years. It has proven successful for almost every kind of data that has been generated in the last 3 decades. But all eras must come