Achieve enterprise automation in your Linux environment with this comprehensive guide
Key Features
Book Description
Automation is paramount if you want to run Linux in your enterprise effectively. It helps you minimize costs by reducing manual operations, ensuring compliance across data centers, and accelerating deployments for your cloud infrastructures.
Complete with detailed explanations, practical examples, and self-assessment questions, this book will teach you how to manage your Linux estate and leverage Ansible to achieve effective levels of automation. You'll learn important concepts on standard operating environments that lend themselves to automation, and then build on this knowledge by applying Ansible to achieve standardization throughout your Linux environments.
By the end of this Linux automation book, you'll be able to build, deploy, and manage an entire estate of Linux servers with higher reliability and lower overheads than ever before.
What you will learn
Who this book is for
This book is for anyone who has a Linux environment to design, implement, and maintain. Open source professionals including infrastructure architects and system administrators will find this book useful. You're expected to have experience in implementing and maintaining Linux servers along with knowledge of building, patching, and maintaining server infrastructure. Although not necessary, knowledge of Ansible or other automation technologies will be beneficial.
Page count: 616
Publication year: 2020
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Vijin Boricha
Acquisition Editor: Rohit Rajkumar
Content Development Editor: Alokita Amanna
Senior Editor: Rahul Dsouza
Technical Editor: Prachi Sawant
Copy Editor: Safis Editing
Project Coordinator: Vaidehi Sawant
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Nilesh Mohite
First published: January 2020
Production reference: 1240120
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78913-161-1
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Few would disagree when I say that the world of technology has grown ever more complex over the last couple of decades since the internet came to prominence. More and more products have arrived, promising us solutions to tame the growing complexity. Along with the promises come a raft of experts, there to help us through what is actually yet more complexity.
2012 saw the first release of Ansible. By 2013, it was gaining significant traction since its promise of power through simplicity was not an empty one. Here was a technology rooted in a simple truth—solving problems with technology really means solving problems for people. Therefore, people matter. A tool that is easy to pick up and learn? What an amazing thought! Early adopters were those who saw through the functionality list to realize that here was a people-pleasing game changer.
I first met James at one of his technical Ansible talks a few years ago. It was still relatively early days for Ansible, although we'd just been acquired by Red Hat. At that first meeting, I realized that here was a fellow who understood the link between people and Ansible's powerful simplicity. I've been lucky enough to see James speak on a number of occasions since, with two standout talks coming to mind.
At AnsibleFest 2018 in Austin, Texas, James gave a great talk about a client engagement where he presided over a business-critical database upgrade—on a Friday afternoon. What's the golden rule we all tout in tech? Don't make business-critical changes on a Friday! Yet James's charismatic storytelling had the audience enthralled. The second occasion was more recent, at an Ansible London meetup. Taking a very different approach to the usual tech-heavy talks, James presented the audience with a tale of positive psychology, a story that had Ansible as the underlying tool supporting people. It turned out to be a great success, sparking a lively interaction across the audience during the Q&A session that followed.
Scalability isn't just about a technology; it is about people. If you want a technology to scale, it must be easy for people to adopt, to master, and to share. James is a model of scalability himself, as he so readily shares his knowledge. He also shows in this book that Ansible is an orchestrator, a conductor of the symphony if you like, with the ability to span an enterprise. I'm sure you'll enjoy reading it as much as I've enjoyed every interaction I've had with James.
Mark Phillips
Product Marketing Manager, Red Hat Ansible
I've worked alongside James for several years and consider him to be one of the foremost Ansible experts in the world. I've been witness to his help in the digital modernization efforts of large and small organizations with the help of automation and DevOps practices.
In Hands-On Enterprise Automation on Linux, James generously shares his experience with a practical, no-nonsense approach to managing heterogeneous Linux environments. If you learn best through a hands-on approach, then this is the book for you. James provides plenty of in-depth examples in each chapter so that you can cement your understanding and feel prepared to take Ansible into a live environment.
Ready to become an automation rockstar and revolutionize your IT ops team? Then read on!
Ben Strauss
Security Automation Manager, MindPoint Group
James Freeman is an accomplished IT consultant and architect with over 20 years' experience in the technology industry. He has more than 7 years of first-hand experience in solving real-world enterprise problems in production environments using Ansible, frequently introducing Ansible to businesses and CTOs for the first time. He has a passion for positive psychology and its application in the world of technology. In addition, he has authored and facilitated bespoke Ansible workshops and training sessions, and has presented on Ansible at international conferences and meetups.
Gareth Coffey is an automation consultant for Cachesure, based in London, developing bespoke solutions to enable companies to migrate services to public and private cloud platforms. Gareth has been working with Unix/Linux-based systems for over 15 years. During that time, he has worked with a multitude of different programming languages, including C, PHP, Node.js, and various automation and orchestration tool sets. As well as consulting, Gareth runs his own start-up – Progressive Ops, developing cloud-based services aimed at helping start-up companies deploy resources across multiple cloud providers, with a focus on security.
Iain Grant is a senior engineer with over 20 years' experience as an IT professional, in both small and enterprise companies, where he has held a wide variety of positions, including trainer, programmer, firmware engineer, and system administrator. During this time, he has worked on multiple operating systems, ranging from OpenVMS, through Windows, to Linux, where he has also contributed to the Alpha Linux kernel. He currently works in an enterprise environment looking after over 300 Linux servers, with responsibility for their automation and management.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Enterprise Automation on Linux
Dedication
About Packt
Why subscribe?
Foreword
Contributors
About the author
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Core Concepts
Building a Standard Operating Environment on Linux
Understanding the challenges of Linux environment scaling
Challenges of non-standard environments
Early growth of a non-standard environment
Impacts of non-standard environments
Scaling up non-standard environments
Addressing the challenges
Security
Reliability
Scalability
Longevity
Supportability
Ease of use
What is an SOE?
Defining the SOE
Knowing what to include
Exploring SOE benefits
Example benefits of an SOE in a Linux environment
Benefits of SOE to software testing
Knowing when to deviate from standards
Ongoing maintenance of SOEs
Summary
Questions
Further reading
Automating Your IT Infrastructure with Ansible
Technical requirements
Exploring the Ansible playbook structure
Exploring inventories in Ansible
Understanding roles in Ansible
Understanding Ansible variables
Understanding Ansible templates
Bringing Ansible and the SOE together
Summary
Questions
Further reading
Streamlining Infrastructure Management with AWX
Technical requirements
Introduction to AWX
AWX reduces training requirements
AWX enables auditability
AWX supports version control
AWX helps with credential management
Integrating AWX with other services
Installing AWX
Running your playbooks from AWX
Setting up credentials in AWX
Creating inventories in AWX
Creating a project in AWX
Creating a template in AWX
Running a playbook from AWX
Automating routine tasks with AWX
Summary
Questions
Further reading
Section 2: Standardizing Your Linux Servers
Deployment Methodologies
Technical requirements
Knowing your environment
Deploying to bare-metal environments
Deploying to traditional virtualization environments
Deploying to cloud environments
Docker deployments
Keeping builds efficient
Keeping your builds simple
Making your builds secure
Creating efficient processes
Ensuring consistency across Linux images
Summary
Questions
Further reading
Using Ansible to Build Virtual Machine Templates for Deployment
Technical requirements
Performing the initial build
Using ready-made template images
Creating your own virtual machine images
Using Ansible to build and standardize the template
Transferring files into the image
Installing packages
Editing configuration files
Validating the image build
Putting it all together
Cleaning up the build with Ansible
Summary
Questions
Further reading
Custom Builds with PXE Booting
Technical requirements
PXE booting basics
Installing and configuring PXE-related services
Obtaining network installation images
Performing your first network boot
Performing unattended builds
Performing unattended builds with kickstart files
Performing unattended builds with pre-seed files
Adding custom scripts to unattended boot configurations
Customized scripting with kickstart
Customized scripting with pre-seed
Summary
Questions
Further reading
Configuration Management with Ansible
Technical requirements
Installing new software
Installing a package from operating system default repositories
Installing non-native packages
Installing unpackaged software
Making configuration changes with Ansible
Making small configuration changes with Ansible
Maintaining configuration integrity
Managing configuration at an enterprise scale
Making scalable static configuration changes
Making scalable dynamic configuration changes
Summary
Questions
Further reading
Section 3: Day-to-Day Management
Enterprise Repository Management with Pulp
Technical requirements
Installing Pulp for patch management
Installing Pulp
Building repositories in Pulp
Building RPM-based repositories in Pulp
Building DEB-based repositories in Pulp
Patching processes with Pulp
RPM-based patching with Pulp
DEB-based patching with Pulp
Summary
Questions
Further reading
Patching with Katello
Technical requirements
Introduction to Katello
Installing a Katello server
Preparing to install Katello
Patching with Katello
Patching RPM-based systems with Katello
Patching DEB-based systems with Katello
Summary
Questions
Further reading
Managing Users on Linux
Technical requirements
Performing user account management tasks
Adding and modifying users with Ansible
Removing users with Ansible
Centralizing user account management with LDAP
Microsoft AD
FreeIPA
Enforcing and auditing configuration
Managing sudoers with Ansible
Auditing user accounts with Ansible
Summary
Questions
Further reading
Database Management
Technical requirements
Installing databases with Ansible
Installing MariaDB server with Ansible
Installing PostgreSQL Server with Ansible
Importing and exporting data
Automating MariaDB data loading with Ansible
Performing routine maintenance
Routine maintenance on PostgreSQL with Ansible
Summary
Questions
Further reading
Performing Routine Maintenance with Ansible
Technical requirements
Tidying up disk space
Monitoring for configuration drift
Understanding process management with Ansible
Rolling updates with Ansible
Summary
Questions
Further reading
Section 4: Securing Your Linux Servers
Using CIS Benchmarks
Technical requirements
Understanding CIS Benchmarks
What is a CIS Benchmark?
Exploring CIS Benchmarks in detail
Applying security policy wisely
Applying the SELinux security policy
Mounting of filesystems
Installing Advanced Intrusion Detection Environment (AIDE)
Understanding CIS Service benchmarks
X Windows
Allowing hosts by network
Local firewalls
Overall guidance on scoring
Scripted deployment of server hardening
Ensuring SSH root login is disabled
Ensuring packet redirect sending is disabled
Running CIS Benchmark scripts from a remote location
Summary
Questions
Further reading
CIS Hardening with Ansible
Technical requirements
Writing Ansible security policies
Ensuring remote root login is disabled
Building up security policies in Ansible
Implementing more complex security benchmarks in Ansible
Making appropriate decisions in your playbook design
Application of enterprise-wide policies with Ansible
Testing security policies with Ansible
Summary
Questions
Further reading
Auditing Security Policy with OpenSCAP
Technical requirements
Installing your OpenSCAP server
Running OpenSCAP Base
Installing the OpenSCAP Daemon
Running SCAP Workbench
Considering other OpenSCAP tools
Evaluating and selecting policies
Installing SCAP Security Guide
Understanding the purpose of XCCDF and OVAL policies
Installing other OpenSCAP policies
Scanning the enterprise with OpenSCAP
Scanning the Linux infrastructure with OSCAP
Running regular scans with the OpenSCAP Daemon
Scanning with SCAP Workbench
Interpreting results
Summary
Questions
Further reading
Tips and Tricks
Technical requirements
Version control for your scripts
Integrating Ansible with Git
Organizing your version control repositories effectively
Version control of roles in Ansible
Inventories – maintaining a single source of truth
Working with Ansible dynamic inventories
Example – working with the Cobbler dynamic inventory
Running one-off tasks with Ansible
Summary
Questions
Further reading
Assessments
Chapter 1 - Building a Standard Operating Environment on Linux
Chapter 2 - Automating Your IT Infrastructure with Ansible
Chapter 3 - Streamlining Infrastructure Management with AWX
Chapter 4 - Deployment Methodologies
Chapter 5 - Using Ansible to Build Virtual Machine Templates for Deployment 
Chapter 6 - Custom Builds with PXE Booting
Chapter 7 - Configuration Management with Ansible
Chapter 8 - Enterprise Repository Management with Pulp
Chapter 9 - Patching with Katello
Chapter 10 - Managing Users on Linux
Chapter 11 - Database Management
Chapter 12 - Performing Routine Maintenance with Ansible
Chapter 13 - Using CIS Benchmarks
Chapter 14 - CIS Hardening with Ansible
Chapter 15 - Auditing Security Policy with OpenSCAP
Chapter 16 - Tips and Tricks
Other Books You May Enjoy
Leave a review - let other readers know what you think
Welcome to Hands-On Enterprise Automation on Linux, your guide to a collection of the most valuable processes, methodologies, and tools for streamlining and efficiently managing your Linux deployments at enterprise scale. This book will provide you with the knowledge and skills required to standardize your Linux estate and manage it at scale, using open source tools including Ansible, AWX (Ansible Tower), Pulp, Katello, and OpenSCAP. You will learn about the creation of standard operating environments, and how to define, document, manage, and maintain these standards using Ansible. In addition, you will acquire knowledge of security hardening standards, such as the CIS Benchmarks. Throughout the book, practical, hands-on examples will be provided for you to try for yourself, on which you can build your own code, and to demonstrate the principles being covered.
This book is for anyone who has a Linux environment to design, implement, and care for. It is intended to appeal to a wide range of open source professionals, from infrastructure architects through to system administrators, including professionals up to C level. Proficiency in the implementation and maintenance of Linux servers and familiarity with the concepts involved in building, patching, and maintaining a Linux server infrastructure are assumed. Prior knowledge of Ansible and other automation tools is not essential but may be beneficial.
Chapter 1, Building a Standard Operating Environment on Linux, provides a detailed introduction to standardized operating environments, a core concept that will be referred to throughout this hands-on book, and one that you will need to understand before embarking on this journey.
Chapter 2, Automating Your IT Infrastructure with Ansible, provides a detailed, hands-on breakdown of an Ansible playbook, including inventories, roles, variables, and best practices for developing and maintaining playbooks; a crash course enabling you to learn just enough Ansible to begin your automation journey.
Chapter 3, Streamlining Infrastructure Management with AWX, explores, with the help of practical examples, the installation and utilization of AWX (also available as Ansible Tower) so as to build good business processes around your Ansible automation infrastructure.
Chapter 4, Deployment Methodologies, enables you to understand the various methods available in relation to large-scale deployments in Linux environments, and how to leverage these to the best advantage of the enterprise.
Chapter 5, Using Ansible to Build Virtual Machine Templates for Deployment, explores the best practices for deploying Linux by building virtual machine templates that will be deployed at scale on a hypervisor in a practical and hands-on manner.
Chapter 6, Custom Builds with PXE Booting, looks at the process of PXE booting for when the templated approach to server builds may not be possible (for example, where bare-metal servers are still being used), and how to script this to build standard server images over the network.
Chapter 7, Configuration Management with Ansible, provides practical examples of how to manage your build once it enters service, so as to ensure that consistency remains a byword without limiting innovation.
Chapter 8, Enterprise Repository Management with Pulp, looks at how to perform patching in a controlled manner to prevent inconsistencies re-entering even the most carefully standardized environment through the use of the Pulp tool.
Chapter 9, Patching with Katello, builds on our work involving the Pulp tool by introducing you to Katello, providing even more control over your repositories whilst providing a user-friendly graphical user interface.
Chapter 10, Managing Users on Linux, provides a detailed look at user account management using Ansible as the orchestration tool, along with the use of centralized authentication systems such as LDAP directories.
Chapter 11, Database Management, looks at how Ansible can be used both to automate deployments of databases, and to execute routine database management tasks, on Linux servers.
Chapter 12, Performing Routine Maintenance with Ansible, explores some of the more advanced ongoing maintenance that Ansible can perform on a Linux server estate.
Chapter 13, Using CIS Benchmarks, provides an in-depth examination of the CIS server hardening benchmarks and how to apply them on Linux servers.
Chapter 14, CIS Hardening with Ansible, looks at how a security hardening policy can be rolled out across an entire estate of Linux servers in an efficient, reproducible manner with Ansible.
Chapter 15, Auditing Security Policy with OpenSCAP, provides a hands-on look at the installation and use of OpenSCAP to audit Linux servers for policy violations on an ongoing basis, since security standards can be reversed by either malicious or otherwise well-meaning end users.
Chapter 16, Tips and Tricks, explores a number of tips and tricks to keep your Linux automation processes running smoothly in the face of the ever-changing demands of the enterprise.
To follow the examples in this book, it is recommended that you have access to at least two Linux machines for testing, though more are preferable if you wish to develop the examples more fully. These can be either physical or virtual machines—all examples were developed on a set of Linux virtual machines, but should work just as well on physical ones. In Chapter 5, Using Ansible to Build Virtual Machine Templates for Deployment, we make use of nested virtualization on a KVM virtual machine to build a Linux image. The exact hardware requirements for this are listed at the beginning of that chapter. This will require either access to a physical machine with the appropriate CPU to run the examples on, or a hypervisor that supports nested virtualization (for example, VMware or Linux KVM).
Please be aware that some examples in this book could be disruptive to other services on your network; where there is such a risk, this is highlighted at the beginning of each chapter. I recommend you try out the examples in an isolated test network unless/until you are confident that they will not have any impact on your operations.
Although other Linux distributions are mentioned in the book, we focus on two key Linux distributions—CentOS 7.6 (though if you have access to it, you are welcome to use Red Hat Enterprise Linux 7.6, which should work just as well in most examples), and Ubuntu Server 18.04. All test machines were built from the official ISO images, using the minimal installation profile.
As such, where additional software is required, we take you through the steps needed to install it so that you can complete the examples. If you choose to complete all the examples, you will install software such as AWX, Pulp, Katello, and OpenSCAP. The only exception to this is FreeIPA, which is mentioned in Chapter 10, Managing Users on Linux. Installing a directory server for your enterprise is a huge topic that sadly requires more space than we have in this book—hence, you may wish to explore this topic independently.
The text assumes that you will run Ansible from one of your Linux test machines, but Ansible can actually be run on any machine with Python 2.7 or Python 3 (version 3.5 or higher) installed. Windows is supported as a control machine, but only through a Linux distribution running in the Windows Subsystem for Linux (WSL) layer available on newer versions of Windows. Supported operating systems for Ansible include (but are not limited to) Red Hat, Debian, Ubuntu, CentOS, macOS, and FreeBSD.
This book uses the Ansible 2.8.x series of releases, although a few examples are specific to Ansible 2.9.x, which was released during the course of writing. Ansible installation instructions can be found at https://docs.ansible.com/ansible/intro_installation.html.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Enterprise-Automation-on-Linux. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781789131611_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "To start with, let's create a role called loadmariadb."
A block of code is set as follows:
- name: Ensure PostgreSQL service is installed and started at boot time
  service:
    name: postgresql
    state: started
    enabled: yes
Any command-line input or output is written as follows:
$ mkdir /var/lib/tftpboot/EFIx64/centos7
Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
The objective of this section is to understand the systems administration fundamentals and techniques that will be covered in this book. First, we will cover a hands-on introduction to Ansible, the tool that will be used throughout this book for automation and purposes such as package management and advanced systems administration en masse.
This section comprises the following chapters:
Chapter 1, Building a Standard Operating Environment on Linux
Chapter 2, Automating Your IT Infrastructure with Ansible
Chapter 3, Streamlining Infrastructure Management with AWX
This chapter provides a detailed exploration of the Standard Operating Environment (SOE) concept in Linux. Although we will go into much greater detail later, in short, an SOE is an environment where everything is created and modified in a standard way. For example, this would mean that all Linux servers are built in the same way, using the same software versions. This is an important concept because it makes managing the environment much easier and reduces the workload for those looking after it. Although this chapter is quite theoretical in nature, it sets the groundwork for the rest of this book.
We will start by looking at the fundamental definition of such an environment, and then proceed to explore why it is desirable to want to create one. From there, we will look at some of the pitfalls of an SOE to give you a good perspective on how to maintain the right balance in such an environment, before finally discussing how an SOE should be integrated into day-to-day maintenance processes. The effective application of this concept enables efficient and effective management of Linux environments at very large scales.
In this chapter, we will cover the following topics:
Understanding the challenges of Linux environment scaling
What is an SOE?
Exploring SOE benefits
Knowing when to deviate from standards
Ongoing maintenance of SOEs
Before we delve into the definition of an SOE, let's explore the challenges of scaling a Linux environment without standards. An exploration of this will help us to understand the definition itself, as well as how to define the right standards for a given scenario.
It is important to consider that many challenges experienced by enterprises with technology estates (whether Linux or otherwise) do not start out as such. In the early stages of growth, in fact, many systems and processes are entirely sustainable, and in the next section, we will look at this early stage of environment growth as a precursor to understanding the challenges associated with large-scale growth.
In a surprisingly large number of companies, Linux environments begin life without any form of standardization. Often, they grow organically over time. Deployments start out small, perhaps just covering a handful of core functions, and as time passes and requirements grow, so does the environment. Skilled system administrators often make changes by hand on a per-server basis, deploying new services and growing the server estate as business demands dictate.
This organic growth is the path of least resistance for most companies: project deadlines are often tight, and both budget and resources are scarce. Hence, when a skilled Linux resource is available, that person can assist with just about all of the tasks required, from simple maintenance tasks to commissioning complex application stacks. This saves a great deal of time and money on architecture and makes good use of the skillset of the staff on hand, as they can address immediate issues and deployments rather than spending time on architectural design. Quite simply, it makes sense, and the author has experienced this at several companies, even high-profile multi-national ones.
Let's take a deeper look at this from a technical standpoint. There are numerous flavors of Linux, numerous applications that perform (at a high level) the same function, and numerous ways to solve a given problem. For example, if you want to script a task, do you write it in a shell script, Perl, Python, or Ruby? For some tasks, all can achieve the desired end result. Different people have different preferred ways of approaching problems and different preferred technology solutions, and often it is found that a Linux environment has been built using a technology that was the flavor of the month when it was created or that was a favorite of the person responsible for it. There is nothing wrong with this in and of itself, and initially, it does not cause any problems.
If organic growth brings with it one fundamental problem, it is this: scale. Making changes by hand and always using the latest and greatest technology is great when the environment size is relatively small, and often provides an interesting challenge, hence keeping technical staff feeling motivated and valued. It is vital for those working in technology to keep their skills up to date, so it is often a motivating factor to be able to employ up-to-date technologies as part of the day job.
When the number of servers enters the hundreds, never mind thousands (or even greater!), this whole organic process breaks down. What was once an interesting challenge becomes laborious and tedious, even stressful. The learning curve for new team members is steep. A new hire may find themselves with a disparate environment with lots of different technologies to learn, and possibly a long period of training before they can become truly effective. Long-serving team members can end up being silos of knowledge, and should they depart the business, their loss can cause continuity issues. Problems and outages become more numerous as the non-standard environment grows in an uncontrolled manner, and troubleshooting becomes a lengthy endeavor—hardly ideal when trying to achieve a 99.99% service uptime agreement, where every second of downtime matters! Hence, in the next section, we will look at how to address these challenges with an SOE.
From this, we realize our requirement for standardization. Building a suitable SOE is all about the following:
Realizing economies of scale
Being efficient in day-to-day operations
Making it easy for all involved to get up to speed quickly and easily
Being aligned with the growing needs of the business
After all, if an environment is concise in its definition, then it is easier for everyone involved in it to understand and work with. This, in turn, means tasks are completed quicker and with greater ease. In short, standardization can bring cost savings and improved reliability.
It must be stressed that this is a concept and not an absolute. There is no right or wrong way to build such an environment, though there are best practices. Throughout this chapter, we will explore the concept further and help you to identify core best practices associated with SOEs so that you can make informed decisions when defining your own.
Let's proceed to explore this in more detail. Every enterprise has certain demands of their IT environments, whether they are based on Linux, Windows, FreeBSD, or any other technology. Sometimes, these are well understood and documented, and sometimes, they are simply implicit—that is to say, everyone assumes the environment meets these standards, but there is no official definition. These requirements often include the following:
Security
Reliability
Scalability
Longevity
Supportability
Ease of use
These, of course, are all high-level requirements, and very often, they intersect with each other. Let's explore these in more detail.
Security in an environment is established by several factors. Let's look at some questions to understand the factors involved:
Is the configuration secure?
Have we allowed the use of weak passwords?
Is the superuser, root, allowed to log in remotely?
Are we logging and auditing all connections?
Now, in a non-standard environment, how can you truly say that these requirements are all enforced across all of your Linux servers? To do so requires a great deal of faith that they have all been built the same way, that they had the same security parameters applied, and that no one has ever revisited the environment to change anything. In short, it requires fairly frequent auditing to ensure compliance.
However, where the environment has been standardized, and all servers have been built from a common source or using a common automation tool (we shall demonstrate this later in this book), it is much easier to say with confidence that your Linux estate is secure.
Security is also enforced by patches, which ensure you are not running any software with vulnerabilities that could allow an attacker to compromise your servers. Some Linux distributions have longer lives than others. For example, Red Hat Enterprise Linux (and derivatives such as CentOS) and the Ubuntu LTS releases all have long, predictable life cycles and make good candidates for your Linux estate.
As such, they should be part of your standards. By contrast, if a bleeding edge Linux distribution such as Fedora has been used because, perhaps, it had the latest packages required at the time, you can be sure that the life cycle will be short, and that updates would cease in the not too distant future, hence leaving you open to potential unpatched vulnerabilities and the need to upgrade to a newer release of Fedora.
Even if the upgrade to a newer version of Fedora is performed, sometimes packages get orphaned—that is to say, they do not get included in the newer release. This might be because they have been superseded by a different package. Whatever the cause, upgrading one distribution to another could cause a false sense of security and should be avoided unless thoroughly researched. In this way, standardization helps to ensure good security practices.
Many enterprises expect their IT operations to be up and running 99.99% of the time (or better). Part of the route to achieving this is robust software, application of relevant bug fixes, and well-defined troubleshooting procedures. This ensures that in the worst case scenario of an outage, the downtime is as minimal as possible.
Standardization again helps here—as we discussed in the preceding section on security, a good choice of underlying operating system ensures that you have ongoing access to bug fixes and updates, and if you know that your business needs a vendor backup to ensure business continuity, then the selection of a Linux operating system with a support contract (available with Red Hat or Canonical, for example) makes sense.
Equally, when servers are all built to a well-defined and understood standard, making changes to them should yield predictable results as everyone knows what they are working with. If all servers are built slightly differently, then a well-meaning change or update could have unintended consequences and result in costly downtime.
Again with standardization, even if the worst-case scenario occurs, everyone involved should know how to approach the problem because they will know that all servers have been built on a certain base image and have a certain configuration. This knowledge and confidence reduce troubleshooting times and ultimately downtime.
All enterprises desire their business to grow, and in most cases, this means that IT environments need to scale up to deal with increased demand. In an environment where the servers are built in a non-standard manner, scaling up becomes much more of a challenge.
For example, if scaling horizontally (adding more identical servers to an existing service), the new servers should all have the same configuration as the existing ones. Without standards, the first step is to work out how the initial set of servers was built, and then to clone this and make the necessary changes to produce the additional servers.
This process is somewhat cumbersome whereas, with a standardized environment, the investigative step is completely unnecessary, and horizontal scaling becomes a predictable, repeatable, business-as-usual task. It also ensures greater reliability as there should be no unintended results from the new servers in the case that a non-standard configuration item was missed. Human beings are incredible, intelligent beings capable of sending a man to the moon, and yet they are equally capable of overlooking a single line in a configuration file. The idea of standardization is to mitigate this risk, and hence make it quick and efficient to scale an environment either up or out using a well-thought-out operating system template, the concept of which we will explore as we proceed through this chapter.
Sometimes when deploying a service, a particular software version is needed. Let's take the example of a web application that runs on PHP. Now, suppose that your particular enterprise has, for historical reasons, standardized on CentOS 6 (or RHEL 6). This operating system only ships with PHP 5.3, meaning that if you suddenly take on an application that only supports PHP 7.0 and above, you need to figure out how to host this.
One apparently obvious solution to this would be to roll out a Fedora virtual machine image. After all, it shares similar technologies to CentOS and RHEL and has much more up-to-date libraries included with it. The author has direct experience of this kind of solution in several roles! However, let's take a look at the bigger picture.
RHEL (and CentOS, which is based upon this) has a lifespan of around 10 years, depending on the point at which you purchased it. In an enterprise, this is a valuable proposition—it means that you can guarantee that any servers you build will have patches and support for up to 10 years (and possibly longer with extended life cycle support) from the point at which you built them. This ties in nicely with our previous points around security, reliability, and supportability (in the following section).
However, any servers that you build on Fedora will have a lifespan of somewhere in the region of 12-18 months (depending on the Fedora release cycle)—in an enterprise setting, having to redeploy a server after, say, 12-18 months is a headache that is not needed.
This is not to say there is never a case for deploying on Fedora or any other fast-moving Linux platform—it is simply to state that in an enterprise where security and reliability are vitally important, you are unlikely to want a Linux platform with a short life cycle as the short term gain (newer library support) would be replaced in 12-18 months with the pain of a lack of updates and the need to rebuild/upgrade the platform.
Of course, this does depend very much on your approach to your infrastructure—some enterprises take a very container-like approach to their servers and re-deploy them with every new software release or application deployment. When your infrastructure and build standards are defined by code (such as Ansible), then it is entirely possible to do this with a fairly minimal impact on your day-to-day operations, and it is unlikely that any single server would be around for long enough for the operating system to become outdated or unsupported.
At the end of the day, the choice is yours and you must establish which path you feel provides you with the most business benefit without putting your operations at risk. Part of standardization is to make sound, rational decisions on technology and to adopt them wherever feasible, and your standard could include frequent rebuilds such that you can use a fast-moving operating system such as Fedora. Equally, you might decide that your standard is that servers will have long lives and be upgraded in place, and in this case, you would be better choosing an operating system such as an Ubuntu LTS release or RHEL/CentOS.
In the next section, we will look in greater detail at how an SOE benefits supportability.
As we have already discussed, having a standardized environment brings with it two benefits. The first is that a well-chosen platform means a long vendor support life cycle. This, in turn, means long support from either the vendor (in the case of a product such as RHEL) or the community (in the case of CentOS). Some operating systems such as Ubuntu Server are available with either community support or a paid contract directly from Canonical.
Supportability doesn't just mean support from the vendor or the Linux community at large, however. Remember that, in an enterprise, your staff is your front-line support before anyone external steps in. Now, imagine having a crack team of Linux staff, and presenting them with a server estate composed of Debian, SUSE, CentOS, Fedora, Ubuntu, and Manjaro. There are similarities between them, but also a huge number of differences. Across them, there are four different package managers for installing and managing software packages, and that's just one example.
Whilst entirely supportable, it does present more of a challenge for your staff and means that, for anyone joining the company, you require both a broad and a deep set of Linux experience—either that or an extensive on-boarding process to get them up to speed.
With a standardized environment, you might end up with more than one operating system, but nonetheless, if you can meet all of your requirements with, say, CentOS 7 and Ubuntu Server 18.04 LTS (and know that you are covered for the next few years because of your choices), then you immediately reduce the workload on your Linux team and enable them to spend more time creatively solving problems (for example, automating solutions with Ansible!) and less time figuring out the nuances between operating systems. As we have also discussed, in the event of an issue, they will be more familiar with each OS and hence need to spend less time debugging, reducing downtime.
This brings us nicely into the subject of ease of use at scale, and we will provide an overview of this in the next section.
This final category overlaps heavily with the last two—that is to say that, quite simply, the more standardized your environment, the easier it is for a given set of employees to get to grips with it. This automatically promotes all of the benefits we have discussed so far around reducing downtime, easier recruitment and on-boarding of staff, and so on.
Having set out the challenges that an SOE helps to address, we will proceed in the next section to look at the anatomy of such an environment to understand it from a technical standpoint.
Now that we've explored the reasons why an SOE is important to the enterprise and understood at a high level the solutions for these problems, let's look in detail at an SOE. We will begin by defining the SOE itself.
Let's take a quick look at this from a more practical standpoint. As we have already said, an SOE is a concept, not an absolute. It is, at its simplest level, a common server image or build standard that is deployed across a large number of servers throughout a company. Here, all required tasks are completed in a known, documented manner.
To start with, there is the base operating system—and, as we have discussed, there are hundreds of Linux distributions to choose from. Some are quite similar from a system administration perspective (for example, Debian and Ubuntu), whilst some are markedly different (for example, Fedora and Manjaro). By way of a simple example, let's say you wanted to install the Apache Web Server on Ubuntu 18.04 LTS—you would enter the following commands:
$ sudo apt-get update
$ sudo apt-get install apache2
Now, if you wanted to do the same thing but on CentOS 7, you would enter the following:
$ sudo yum install httpd
As you can see, there is nothing in common between these commands—not even the name of the package, even though the end result in both cases is an installation of Apache. On a small scale, this is not an issue, but as the server count grows, so does the complexity of managing such an environment.
The base operating system is just the start. Our example above was installing Apache, yet we could also install nginx or even lighttpd. They are, after all, also web servers.
Then, there is configuration. Do you want users to be able to log in as root over SSH? Do you need a certain level of logging for audit or debug purposes? Do you need local or centralized authentication? The list is myriad, and as you can see, if left unchecked could grow into a massive headache.
This is where the SOE comes in. It is effectively a specification, and at a high level, it might say the following:
Our standard base operating system is Ubuntu 18.04 LTS.
Our standard web server will be Apache 2.4.
SSH logins are enabled, but only for users with SSH keys and not root.
All user logins must be logged and archived for audit purposes.
Except for a few local "break glass" accounts, all accounts must be centrally managed (for example, by LDAP or Active Directory).
Our corporate monitoring solution must be integrated (for example, the Nagios NCPA agent must be installed and configured to communicate with our Nagios server).
All system logs must be sent to the corporate central log management system.
Security hardening must be applied to the system.
The preceding is simply an example, and it is by no means complete; however, it should begin to give you an idea of what an SOE looks like at a high level. As we proceed through this chapter, we will delve deeper into this subject and give more examples to build up a clear definition.
Before we proceed, let's take a look in a little more detail at what to include in the environment. We have outlined in the previous section a very simplistic definition for an SOE. Part of any good SOE operating process is to have a pre-defined operating system build that can be deployed at a moment's notice. There are multiple ways this might be achieved and we will discuss these later in this book—however, for the time being, let's assume that a base image of Ubuntu 18.04 LTS as suggested previously has been built. What do we integrate into this standard build?
We know, for example, that our login policy is going to be applied throughout the organization—hence, when the build is created, /etc/ssh/sshd_config must be customized to include PermitRootLogin no and PasswordAuthentication no. There is no point in performing this step in the post-deployment configuration, as this would have to be performed on each and every single deployment. Quite simply, this would be inefficient.
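By way of illustration, the corresponding fragment of /etc/ssh/sshd_config would look something like the following (only the two directives named above come from our example policy; the comments are added here for clarity):

```
# Disallow direct remote logins as the superuser
PermitRootLogin no

# Require SSH key authentication; disable password logins entirely
PasswordAuthentication no
```

Baking these lines into the standard image means every server deployed from it starts life compliant with the login policy.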
There are also important automation considerations for our operating system image. We know that Ansible itself communicates over SSH, and so we know that we are going to require some kind of credentials (it is quite likely this will be SSH key-based) for Ansible to run against all of the deployed servers. There is little point in having to manually roll out Ansible credentials to every single machine before you can actually perform any automation, and so it is important to consider the kind of authentication you want Ansible to use (for example, password- or SSH key-based), and to create the account and corresponding credentials when you build the image. The exact method for doing this will depend upon your corporate security standards, but I would advocate as a potential solution the following:
Creating a local account on the standard image for Ansible to authenticate against
Giving this account appropriate sudo rights to ensure all desired automation tasks can be performed
Setting the local password for this account, or adding the SSH public key from an Ansible key-pair to the authorized_keys file for the local Ansible account you created
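The steps above can be sketched as follows. This is an illustrative, assumption-laden fragment rather than a prescription from this book: the account name ansible, the key content, and the staging-directory approach (so the generated files can be reviewed before being baked into the standard image) are all hypothetical choices. The account itself would be created by your image-build tooling alongside these files.

```shell
# Hypothetical sketch: stage the sudo and SSH key configuration for a
# local 'ansible' automation account under $PREFIX, ready to be copied
# into the standard image. All names and key material are illustrative.
PREFIX=./image-overlay
ANSIBLE_USER=ansible

# Passwordless sudo drop-in granting the automation account full rights
mkdir -p "$PREFIX/etc/sudoers.d"
echo "$ANSIBLE_USER ALL=(ALL) NOPASSWD: ALL" > "$PREFIX/etc/sudoers.d/$ANSIBLE_USER"
chmod 0440 "$PREFIX/etc/sudoers.d/$ANSIBLE_USER"

# authorized_keys file carrying the Ansible control node's public key
mkdir -p "$PREFIX/home/$ANSIBLE_USER/.ssh"
echo "ssh-ed25519 AAAA...example ansible@controller" \
    > "$PREFIX/home/$ANSIBLE_USER/.ssh/authorized_keys"
chmod 700 "$PREFIX/home/$ANSIBLE_USER/.ssh"
chmod 600 "$PREFIX/home/$ANSIBLE_USER/.ssh/authorized_keys"
```

Whether you grant blanket NOPASSWD rights as shown, or something more restrictive, is a decision for your corporate security standards.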
Moving on from user accounts and authentication, consider also the Nagios Cross-Platform Agent (NCPA). We know in our example that all deployed servers are going to need to be monitored, and so it is a given that the NCPA agent must be installed, and its token defined so that it can communicate with the Nagios server. Again, there is no point doing this on every single server after the standard image is deployed.
What about the web server though? It is sensible to have a standard, as it means all who are responsible for the environment can become comfortable with the technology. This makes administration easier and is especially beneficial for automation, as we shall see in the next section. However, unless you only ever deploy web servers running on Linux, this probably shouldn't be included as part of the standard build.
As a sound principle, the standard builds should be as simple and lightweight as possible. There is no point in having additional services running on them, taking up memory and CPU cycles, when they are redundant. Equally, having unconfigured services increases the attack surface for any potential attacker and so for security reasons, it is advisable to leave them out.
In short, the standard build should only include configuration and/or services that are going to be common to every server deployed. This approach is sometimes referred to as Just enough Operating System or JeOS for short, and it is the best starting point for your SOE.
Having understood the basic principles of an SOE, we will proceed in the next section to look in more detail at the benefits an SOE brings to your enterprise.
By now, you should have some idea of what an SOE is, and how it brings economies of scale and greater efficiency to a Linux environment. Now, let's build on that and look in more detail at an example of the importance of standardization.
To say that there are commonalities in a Linux environment is to say that the servers that comprise it all share attributes and features. For example, they might all be built upon Ubuntu Linux, or they might all have Apache as their web server.
We can explore this concept with an example. Suppose that you have 10 Linux web servers behind a load balancer and that they are all serving simple static content. Everything is working fine, but then a configuration change is mandated. Perhaps this is to change the document root of each web server to point to a new code release that has been deployed to them by another team.
As the person responsible, you know that because the overall solution is load balanced, all servers should be serving the same content. Therefore, the configuration change is going to be required on each and every one. That means 10 configurations changes to make if you do it by hand.
You could, of course, do this by hand, but this would be tedious and certainly isn't the best use of time for a skilled Linux admin. It is also error-prone—a typo could be made on one of the 10 servers and not spotted. Or the admin could be interrupted by an outage elsewhere and only a subset of the server configurations changed.
The better solution would be to write a script to make the change. This is the very basis of automation and it is almost certainly going to be a better use of time to run a single script once against 10 servers than to manually make the same change 10 times over. Not only is it more efficient, but if the same change became required in a month, the script could be reused with just minimal adjustment.
Now, let's throw a spanner into the works. What if, for reasons unknown, someone built five of the web servers using Apache on CentOS 7, and the other five using nginx on Ubuntu 18.04 LTS? The end result would, after all, be the same—at a basic level, they are both web servers. However, if you want to change the document root in Apache on CentOS 7, you would need to do the following:
1. Locate the appropriate configuration file in /etc/httpd/conf.d.
2. Make the required change to the DocumentRoot parameter.
3. Reload the web server with systemctl reload httpd.service.
If you had to do the same thing for nginx on Ubuntu 18.04 LTS, you would do the following:
1. Locate the correct configuration file in /etc/nginx/sites-available.
2. Make the required change to the root parameter.
3. Ensure that the site configuration file is enabled by symlinking it into /etc/nginx/sites-enabled; otherwise, nginx will not actually see the configuration file.
4. Reload the web server with systemctl reload nginx.service.
As you can see from this rather simplistic (albeit contrived) example, a lack of commonality is the enemy of automation. To cope with this case, your script would need to do the following:
Detect the operating system on each server. This in itself is non-trivial—there is no one way to detect a Linux operating system, so your script would have to walk through a series of checks, including the following:
The contents of /etc/os-release, if it exists
The output of lsb_release, if it is installed
The contents of /etc/redhat-release, if it exists
The contents of /etc/debian_version, if it exists
Other OS-specific files as required, if none of the preceding produce meaningful results
Run different modification commands in different directories to effect the change as discussed previously.
Run different commands to reload the web server, again as detailed previously.
Hence, the script becomes complex, more difficult to write and maintain, and certainly more difficult to make reliable.
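Even the detection step alone illustrates the point. The following is a rough sketch of the checks just described, with assumptions clearly flagged: the family labels returned are illustrative, and a real script would need to handle many more edge cases.

```shell
# Illustrative sketch of the OS-detection checks described above.
# The returned labels are examples only.
detect_os() {
    if [ -r /etc/os-release ]; then
        # Most modern distributions ship this file with an ID field
        . /etc/os-release
        echo "$ID"
    elif command -v lsb_release >/dev/null 2>&1; then
        lsb_release -is | tr '[:upper:]' '[:lower:]'
    elif [ -r /etc/redhat-release ]; then
        echo "rhel-family"
    elif [ -r /etc/debian_version ]; then
        echo "debian-family"
    else
        echo "unknown"
    fi
}

os_family="$(detect_os)"
echo "Detected OS family: $os_family"
```

And this is before the script has modified a single configuration file; the per-distribution edit and reload logic still has to be layered on top.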
Although this particular example is unlikely to occur in real life, it does serve to make an important point—automation is much easier to implement when the environment is built to a given standard. If a decision is made that all web servers are to be based on CentOS 7, to run Apache 2, and have the site configuration named after the service name, then our automation becomes so much easier. In fact, you could even run a simple sed command to complete the change; for example, suppose the new web application was deployed to /var/www/newapp:
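A minimal sketch of such a command follows, demonstrated here against a sample copy of the configuration file; the file name vhost.conf and the original document root /var/www/html are assumptions for the purpose of illustration, and on a real server the file would live under /etc/httpd/conf.d/.

```shell
# Create a sample Apache virtual host file to demonstrate against
cat > vhost.conf <<'EOF'
<VirtualHost *:80>
    DocumentRoot /var/www/html
</VirtualHost>
EOF

# Point DocumentRoot at the newly deployed application in one substitution
sed -i 's!DocumentRoot /var/www/html!DocumentRoot /var/www/newapp!g' vhost.conf

# Show the updated directive
grep DocumentRoot vhost.conf
```

Against a standardized estate, the same one-line sed (followed by a reload of httpd) could be run unchanged on every web server, which is precisely the payoff of the standard.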