Leverage the power of Kubernetes to build an efficient software delivery pipeline.
Key Features
Book Description
Kubernetes has been widely adopted across public clouds and on-premise data centers. As we're living in an era of microservices, knowing how to use and manage Kubernetes is an essential skill for everyone in the IT industry.
This book is a guide to everything you need to know about Kubernetes—from simply deploying a container to administrating Kubernetes clusters wisely. You'll learn about DevOps fundamentals, as well as deploying a monolithic application as microservices and using Kubernetes to orchestrate them. You will then gain an insight into the Kubernetes network, extensions, authentication and authorization.
With the DevOps spirit in mind, you'll learn how to allocate resources to your application and prepare to scale them efficiently. Knowing the status and activity of the application and clusters is crucial, so we'll learn about monitoring and logging in Kubernetes. Having an improved ability to observe your services means that you will be able to build a continuous delivery pipeline with confidence. At the end of the book, you'll learn how to run managed Kubernetes services on three top cloud providers: Google Cloud Platform, Amazon Web Services, and Microsoft Azure.
What you will learn
Who this book is for
This book is for anyone who wants to learn containerization and clustering in a practical way using Kubernetes. No prerequisite skills are required; however, essential DevOps skills and public/private cloud knowledge will help you move through the book faster. More advanced readers will gain a deeper understanding of all the tools and techniques described in the book.
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Gebin George
Acquisition Editor: Shrilekha Inani
Content Development Editor: Deepti Thore
Technical Editor: Varsha Shivhare
Copy Editor: Safis Editing
Language Support Editors: Storm Mann, Mary McGowan
Project Coordinator: Jagdish Prabhu
Proofreader: Safis Editing
Indexer: Rekha Nair
Graphics: Jisha Chirayil
Production Coordinator: Aparna Bhagat
First published: October 2017
Second edition: January 2019
Production reference: 1280119
Published by Packt Publishing Ltd.
Livery Place, 35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78953-399-6
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Hideto Saito has around 20 years of experience in the computer industry. In 1998, while working for Sun Microsystems Japan, he was impressed with Solaris OS, OPENSTEP, and Sun Ultra Enterprise 10000 (AKA StarFire). Then, he decided to pursue the UNIX and macOS X operating systems.
In 2006, he relocated to Southern California as a software engineer to develop products and services running on Linux and macOS X. He was especially renowned for his quick Objective-C code when he was drunk. He is also an enthusiast of Japanese anime, drama, and motorsports, and loves Japanese Otaku culture.
Hui-Chuan Chloe Lee is a DevOps and software developer. She has worked in the software industry on a wide range of projects for over 5 years. As a technology enthusiast, Chloe loves trying and learning about new technologies, which makes her life happier and more fulfilled. In her free time, she enjoys reading, traveling, and spending time with the people she loves.
Cheng-Yang Wu has been tackling infrastructure and system reliability since he received his master's degree in computer science from National Taiwan University. His laziness prompted him to master DevOps skills to maximize his efficiency at work so as to squeeze in writing code for fun. He enjoys cooking as it's just like working with software – a perfect dish always comes from balanced flavors and fine-tuned tastes.
Guang Ya Liu is a Senior Technical Staff Member (STSM) for IBM Cloud Private and is now focusing on cloud computing, container technology, and distributed computing. He is also a member of the IBM Academy of Technology. He used to be an OpenStack Magnum Core member from 2015 to 2017, and now serves as an Istio maintainer, Kubernetes member, Kubernetes Federation V2 maintainer, and Apache Mesos committer and PMC member.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
DevOps with Kubernetes Second Edition
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to DevOps
Software delivery challenges
Waterfall and static delivery
Agile and digital delivery
Software delivery on the cloud
Continuous integration
Continuous delivery
Configuration management
Infrastructure as code
Orchestration
The microservices trend
Modular programming
Package management
The MVC design pattern
Monolithic applications
Remote procedure call
RESTful design
Microservices
Automation and tools
Continuous integration tools
Configuration management tools
Monitoring and logging tools
Communication tools
The public cloud
Summary
DevOps with Containers
Understanding containers
Resource isolation
Linux containers
Containerized delivery
Getting started with containers
Installing Docker for Ubuntu
Installing Docker for CentOS
Installing Docker for macOS
The life cycle of a container
The basics of Docker
Layers, images, containers, and volumes
Distributing images
Connecting containers
Working with a Dockerfile
Writing your first Dockerfile
The syntax of a Dockerfile
Organizing a Dockerfile
Multi-stage builds
Multi-container orchestration
Piling up containers
An overview of Docker compose
Composing containers
Summary
Getting Started with Kubernetes
Understanding Kubernetes
Kubernetes components
Master components
API server (kube-apiserver)
Controller manager (kube-controller-manager)
etcd
Scheduler (kube-scheduler)
Node components
Kubelet
Proxy (kube-proxy)
Docker
The interaction between the Kubernetes master and nodes
Getting started with Kubernetes
Preparing the environment
kubectl
Kubernetes resources
Kubernetes objects
Namespaces
Name
Label and selector
Annotation
Pods
ReplicaSet
Deployments
Services
ClusterIP
NodePort
LoadBalancer
ExternalName (kube-dns version >= 1.7)
Service without selectors
Volumes
Secrets
Retrieving secrets via files
Retrieving secrets via environment variables
ConfigMap
Using ConfigMap via volume
Using ConfigMap via environment variables
Multi-container orchestration
Summary
Managing Stateful Workloads
Kubernetes volume management
Container volume life cycle
Sharing volume between containers within a pod
Stateless and stateful applications
Kubernetes' persistent volume and dynamic provisioning
Abstracting the volume layer with a persistent volume claim
Dynamic provisioning and StorageClass
Problems with ephemeral and persistent volume settings
Replicating pods with a persistent volume using StatefulSet
Submitting Jobs to Kubernetes
Submitting a single Job to Kubernetes
Submitting a repeatable Job
Submitting a parallel Job
Scheduling running a Job using CronJob
Summary
Cluster Administration and Extension
Kubernetes namespaces
Context
Creating a context
Switching the current context
Kubeconfig
Service account
Authentication and authorization
Authentication
Service account token authentication
User account authentication
Authorization
Role-based access control (RBAC)
Roles and ClusterRoles
RoleBinding and ClusterRoleBinding
Admission control
NamespaceLifecycle
LimitRanger
ServiceAccount
PersistentVolumeLabel
DefaultStorageClass
ResourceQuota
DefaultTolerationSeconds
PodNodeSelector
AlwaysPullImages
DenyEscalatingExec
Other admission controller plugins
Dynamic admission control
Admission webhook
Custom resources
Custom resources definition
Summary
Kubernetes Network
Kubernetes networking
Docker networking
Container-to-container communications
Pod-to-pod communications
Pod communication within the same node
Pod communication across nodes
Pod-to-service communications
External-to-service communications
Ingress
Network policy
Service mesh
Summary
Monitoring and Logging
Inspecting a container
The Kubernetes dashboard
Monitoring in Kubernetes
Monitoring applications
Monitoring infrastructure
Monitoring external dependencies
Monitoring containers
Monitoring Kubernetes
Getting monitoring essentials for Kubernetes
Hands-on monitoring
Getting to know Prometheus
Deploying Prometheus
Working with PromQL
Discovering targets in Kubernetes
Gathering data from Kubernetes
Visualizing metrics with Grafana
Logging events
Patterns of aggregating logs
Collecting logs with a logging agent per node
Running a sidecar container to forward written logs
Ingesting Kubernetes state events
Logging with Fluent Bit and Elasticsearch
Extracting metrics from logs
Incorporating data from Istio
The Istio adapter model
Configuring Istio for existing infrastructure
Mixer templates
Handler adapters
Rules
Summary
Resource Management and Scaling
Scheduling workloads
Optimizing resource utilization
Resource types and allocations
Quality of Service (QoS) classes
Placing pods with constraints
Node selector
Affinity and anti-affinity
Node affinity
Inter-pod affinity
Prioritizing pods in scheduling
Elastically scaling
Horizontal pod autoscaler
Incorporating custom metrics
Managing cluster resources
Resource quotas of namespaces
Creating a ResourceQuota
Request pods with default compute resource limits
Node administration
Pod eviction
Taints and tolerations
Summary
Continuous Delivery
Updating resources
Triggering updates
Managing rollouts
Updating DaemonSet and StatefulSet
DaemonSet
StatefulSet
Building a delivery pipeline
Choosing tools
End-to-end walk-through of the delivery pipeline 
The steps explained
env
script
after_success
deploy
Gaining a deeper understanding of pods
Starting a pod
Liveness and readiness probes
Custom readiness gate
init containers
Terminating a pod
Handling SIGTERM
SIGTERM isn't sent to the application process
SIGTERM doesn't invoke the termination handler
Container life cycle hooks
Tackling pod disruptions
Summary
Kubernetes on AWS
Introduction to AWS
Public cloud
API and infrastructure as code
AWS components
VPC and subnet
Internet gateways and NAT-GW
Security group
EC2 and EBS
ELB
Amazon EKS
Deep dive into AWS EKS
Launching the EKS control plane
Adding worker nodes
Cloud provider on EKS
Storage class
Load balancer
Internal load balancer
Internet-facing load balancer
Updating the Kubernetes version on EKS
Upgrading the Kubernetes master
Upgrading worker nodes
Summary
Kubernetes on GCP
Introduction to GCP
GCP components
VPC
Subnets
Firewall rules
VM instances
Load balancing
Health check
Backend service
Creating a LoadBalancer
Persistent Disk
Google Kubernetes Engine (GKE)
Setting up your first Kubernetes cluster on GKE
Node pool
Multi-zone clusters
Cluster upgrade
Kubernetes cloud provider
StorageClass
L4 LoadBalancer
L7 LoadBalancer (ingress)
Summary
Kubernetes on Azure
Introduction to Azure
Resource groups
Azure virtual network
Network security groups
Application security groups
Subnets
Azure virtual machines
Storage account
Load balancers
Azure disks
Azure Kubernetes service
Setting up your first Kubernetes cluster on AKS
Node pools
Cluster upgrade
Monitoring and logging
Kubernetes cloud provider
Role-based access control
StorageClass
L4 LoadBalancer
Ingress controller
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
This book explains fundamental concepts and useful skills for implementing DevOps principles with containers and Kubernetes. Our journey starts by introducing the core concepts of containers and Kubernetes, and we explore the various features provided by Kubernetes, such as persisting states and data for containers, different types of workloads, cluster network, and cluster management and extension. In order to supervise the cluster activity, we implement a monitoring and logging infrastructure in Kubernetes. For better availability and efficiency, we also learn how to autoscale containers and build a continuous delivery pipeline. Lastly, we learn how to operate the hosted Kubernetes platforms from the top three major public cloud providers.
This book is intended for DevOps professionals with some software development experience who want to scale, automate, and shorten software delivery time to market.
Chapter 1, Introduction to DevOps, walks you through the evolution from the past to what we call DevOps today, and the tools that you should know in this field. Demand for people with DevOps skills has been growing rapidly over the last few years. DevOps practices have accelerated software development and delivery speed, as well as improving business agility.
Chapter 2, DevOps with Containers, helps you learn the fundamentals of working with containers. With the increasing trend toward microservices, containers are a handy and essential tool for every DevOps practitioner because of the agility they bring to managing heterogeneous services in a uniform way.
Chapter 3, Getting Started with Kubernetes, explores the key components and API objects in Kubernetes, and how to deploy and manage containers in a Kubernetes cluster.
Chapter 4, Managing Stateful Workloads, describes pod controllers for different workloads, along with the volume management feature for maintaining the state of an application.
Chapter 5, Cluster Administration and Extension, navigates you through the access control features of Kubernetes, and looks at the built-in admission controllers that provide finer granularity of control over your cluster. Furthermore, we'll also learn how to build our own custom resource to extend the cluster with customized features.
Chapter 6, Kubernetes Network, explains how default networking and routing rules work in Kubernetes. We'll also learn how to expose HTTP and HTTPS routes for external access. At the end of this chapter, the network policy and service mesh features are also introduced for better resiliency.
Chapter 7, Monitoring and Logging, shows you how to monitor a resource's usage at the application, container, and node levels using Prometheus. This chapter also shows how to collect logs from your applications, the service mesh, and Kubernetes with the Elasticsearch, Fluent Bit/Fluentd, and Kibana stack. Ensuring the service is up and healthy is one of the major responsibilities of DevOps.
Chapter 8, Resource Management and Scaling, describes how to leverage the core of Kubernetes, the scheduler, to scale the application dynamically, thereby efficiently utilizing the resources of our cluster.
Chapter 9, Continuous Delivery, explains how to build a continuous delivery pipeline with GitHub/DockerHub/TravisCI. It also explains how to manage updates, eliminate the potential impact when doing rolling updates, and prevent possible failure. Continuous delivery is an approach to speed up your time-to-market.
Chapter 10, Kubernetes on AWS, walks you through AWS components and explains how to provision a cluster with the AWS-hosted Kubernetes service—EKS. EKS provides lots of integration with existing AWS services. We'll learn how to utilize those features in this chapter.
Chapter 11, Kubernetes on GCP, helps you learn the concept of GCP and how to run your applications in GCP's Kubernetes service offering—Google Kubernetes Engine (GKE). GKE has the most native support for Kubernetes. We'll learn how to administer GKE in this chapter.
Chapter 12, Kubernetes on Azure, describes basic Azure components, such as Azure virtual network, Azure virtual machines, disk storage options, and much more. We'll also learn how to provision and run a Kubernetes cluster with Azure Kubernetes Service.
This book will guide you through the methodology of software development and delivery with Docker containers and Kubernetes using macOS and public cloud services (AWS, GCP, and Azure). You will need to install minikube, AWS CLI, Cloud SDK, and the Azure CLI to run the code samples in this book.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packt.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/DevOps-with-Kubernetes-Second-Edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781789533996_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "In light of this, cgroups is utilized here to limit resource usage."
A block of code is set as follows:
ENV key value
ENV key1=value1 key2=value2 ...
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
metadata:
  name: nginx-external
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
Any command-line input or output is written as follows:
$ docker run -p 80:5000 busybox /bin/sh -c \
  "while :; do echo -e 'HTTP/1.1 200 OK\n\ngood day'|nc -lp 5000; done"
$ curl localhost
good day
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "After clicking Create, the console will bring us to the following view for us to explore."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
Over the past few years, the software delivery cycle has been moving increasingly fast, while at the same time application deployment has become more and more complicated. This increases the workload of all roles involved in the release cycle, including software developers, Quality Assurance (QA) teams, and IT operators. In order to deal with rapidly-changing software systems, a new concept called DevOps was introduced in 2009, which is dedicated to helping the whole software delivery pipeline evolve in order to make it faster and more robust.
This chapter covers the following topics:
How has the software delivery methodology changed?
What is a microservices architecture? Why do people choose to adopt this architecture?
What is DevOps? How can it make software systems more resilient?
The Software Development Life Cycle (SDLC), or the way in which we build applications and deliver them to the market, has evolved significantly over time. In this section, we'll focus on the changes made and why.
Back in the 1990s, software was delivered in a static way—using a physical floppy disk or CD-ROM. The SDLC always took years per cycle, because it wasn't easy to (re)deliver applications to the market.
At that time, one of the major software development methodologies was the waterfall model. This is made up of various phases, as shown in the following diagram:
Once one phase was started, it was hard to go back to the previous phase. For example, after starting the Implementation phase, we wouldn't be able to return to the Design phase to fix a technical extensibility issue, because any changes would impact the overall schedule and cost. Everything was hard to change, so new designs would be relegated to the next release cycle.
The waterfall method required precise coordination between every department, including development, logistics, marketing, and distributors. The waterfall model and static delivery sometimes took several years per cycle and required tremendous effort.
A few years later, when the internet became more widely used, the software delivery method changed from physical to digital, using methods such as online downloads. For this reason, many software companies (also known as dot-com companies) tried to figure out how to shorten the SDLC process in order to deliver software that was capable of beating their competitors.
Many developers started to adopt new methodologies, such as incremental, iterative, or agile models, in the hope that these could help shorten the time to market. With these new methods, if new bugs were found, patches could be delivered to customers electronically. Starting with Windows 98, Microsoft Windows updates were also introduced in this manner.
In agile or digital models, software developers write relatively small modules, instead of the entire application. Each module is delivered to a QA team, while the developers continue to work on new modules. When the desired modules or functions are ready, they will be released as shown in the following diagram:
This model makes the SDLC cycle and software delivery faster and easily adjustable. The cycle ranges from a few weeks to a few months, which is short enough to make quick changes if necessary.
Although this model was favored by the majority at the time, application software delivery meant that software binaries, often in the form of an EXE program, had to be installed and run on the customer's PC. The infrastructure (such as the server or the network), however, was very static and had to be set up beforehand. Therefore, this model didn't tend to include the infrastructure in the SDLC.
A few years later, smartphones (such as the iPhone) and wireless technology (such as Wi-Fi and 4G networks) became popular and widely used. Application software was transformed from binaries to online services. The web browser became the interface to application software, which meant that it no longer required installation. The infrastructure became very dynamic: in order to accommodate rapidly-changing application requirements, it now had to be able to grow in both capacity and performance.
This was made possible through virtualization technology and Software-Defined Networking (SDN). Now, cloud services, such as Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure, are often used. These can create and manage on-demand infrastructure easily.
The infrastructure is one of the most important components within the scope of the software development and delivery cycle. Because applications are installed and operated on the server side, rather than on a client-side PC, the software and service delivery cycle now takes between just a few days and a few weeks.
As mentioned previously, the software delivery environment is constantly changing, while the delivery cycle is getting increasingly shorter. In order to achieve this rapid delivery with a higher quality, developers and QA teams have recently started to adopt automation technologies. One of these is Continuous Integration (CI). This includes various tools, such as Version Control Systems (VCSs), build servers, and testing automation tools.
VCSs help developers keep track of software source code changes on central servers. They preserve code revisions and prevent the source code from being overwritten by different developers. This makes it easier to keep the source code consistent and manageable for every release. Centralized build servers connect to VCSs to retrieve the source code, either periodically or automatically whenever a developer pushes code to the VCS, and then trigger a new build. If the build fails, the build server notifies the developer rapidly, so broken code is caught as soon as it enters the VCS. Testing automation tools are also integrated with the build server. These invoke the unit test program after the build succeeds, then notify the developer and QA team of the result, so buggy code is identified as soon as it's committed.
The entire CI flow is shown in the following diagram:
CI helps both developers and QA teams not only to increase quality, but also to shorten the cycle of archiving an application or a module package. In the age of electronic delivery to the customer, however, CI alone is not enough: delivery to the customer means deploying the application to the server.
CI plus deployment automation is an ideal process for server-side applications to provide a service to customers. However, there are some technical challenges that need to be resolved, such as how to deploy the software to the server; how to gracefully shut down the existing application; how to replace and roll back the application; how to upgrade or replace system libraries that also need to be updated; and how to modify the user and group settings in the OS if necessary.
An infrastructure includes servers and networks. We normally have different environments for different software release stages, such as development, QA, staging, and production. Each environment has its own server configuration and IP ranges.
Continuous Delivery (CD) is a common way of resolving the previously mentioned challenges. This is a combination of CI, configuration management, and orchestration tools:
Configuration management tools help to configure OS settings, such as creating a user or group, or installing system libraries. They also act as orchestrators, keeping multiple managed servers consistent with our desired state.
A configuration management tool is not just a programming script, because a script is not necessarily idempotent. This means that if we execute a script twice, we might get an error, for example if we try to create the same user twice. Configuration management tools, however, watch the state, so if a user has already been created, a configuration management tool won't do anything. If we delete the user accidentally or even intentionally, the configuration management tool will create the user again.
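To see what idempotence means in practice, here is a minimal shell sketch (the username alice is purely illustrative): the plain command fails on a second run, while the state-checking variant can be executed any number of times, which is essentially what a configuration management tool does for us:

$ sudo useradd alice    ## first run succeeds; second run fails: "useradd: user 'alice' already exists"
$ id alice >/dev/null 2>&1 || sudo useradd alice    ## idempotent: only acts if the user is absent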
Configuration management tools also support the deployment or installation of software to the server. We simply describe what kind of software package we need to install, then the configuration management tool will trigger the appropriate command to install the software package accordingly.
As well as this, if you tell a configuration management tool to stop your application, to download and replace it with a new package (if applicable), and to restart the application, your servers will always be up to date with the latest software version. Via the configuration management tool, you can also perform blue-green deployments easily.
The configuration management tool supports not only bare-metal environments and VMs, but also cloud infrastructure. If you need to create and configure networks, storage, and VMs in the cloud, the configuration management tool helps to set up the cloud infrastructure from a configuration file, as shown in the following diagram:
Configuration management has some advantages compared to a Standard Operating Procedure (SOP). It helps to maintain configuration files via VCS, which can trace the history of all revisions.
It also helps to replicate the environment. For example, let's say we want to create a disaster recovery site in the cloud. If you follow the traditional approach, which involves using the SOP to build the environment manually, it's hard to predict and detect human or operational errors. On the other hand, if we use the configuration management tool, we can build an environment in the cloud quickly and automatically.
The orchestration tool is part of the configuration management tool set. However, this tool is more intelligent and dynamic with regard to configuring and allocating cloud resources. The orchestration tool manages several server resources and networks. Whenever the administrator wants to increase application and network capacity, the orchestration tool can determine whether a server is available and can then deploy and configure the application and the network automatically. Although the orchestration tool falls outside the SDLC itself, it helps with capacity management in the CD pipeline.
To conclude, the SDLC has evolved significantly such that we can now achieve rapid delivery using various processes, tools, and methodologies. Now, software delivery takes place anywhere and anytime, and software architecture and design is capable of producing large and rich applications.
As mentioned previously, software architecture and design has continued to evolve based on the target environment and the volume of the application. This section will discuss the history and evolution of software design.
As the size of an application increases, developers try to divide it into several modules. Each module aims to be independent and reusable, and each is maintained by a different developer team. The main application simply initializes, imports, and uses these modules. This makes the process of building a larger application more efficient.
The following example shows the dependencies for nginx (https://www.nginx.com) on CentOS 7. It indicates that nginx uses OpenSSL (libcrypto.so.10, libssl.so.10), the POSIX thread (libpthread.so.0) library, the regular expression PCRE (libpcre.so.1) library, the zlib (libz.so.1) compression library, the GNU C (libc.so.6) library, and so on:
$ /usr/bin/ldd /usr/sbin/nginx
linux-vdso.so.1 => (0x00007ffd96d79000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007fd96d61c000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fd96d400000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007fd96d1c8000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007fd96cf67000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007fd96ccf9000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007fd96c90e000)
libz.so.1 => /lib64/libz.so.1 (0x00007fd96c6f8000)
libprofiler.so.0 => /lib64/libprofiler.so.0 (0x00007fd96c4e4000)
libc.so.6 => /lib64/libc.so.6 (0x00007fd96c122000)
...
The Java programming language, and several scripting languages such as Python, Ruby, and JavaScript, have their own module or package management tools. Java, for example, has Maven (http://maven.apache.org), Python uses pip (https://pip.pypa.io), RubyGems (https://rubygems.org) is used for Ruby, and npm (https://www.npmjs.com) is used for JavaScript.
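For instance, with Python's pip, downloading a package together with all of its dependencies is a single command (the package name here is only an example):

$ pip install requests                ## fetch one package from PyPI
$ pip install -r requirements.txt     ## or install everything a project declares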
Package management tools not only allow you to download the necessary packages, but can also register the module or package that you implement. The following screenshot shows the Maven repository for the AWS SDK:
When you add dependencies to your application, Maven downloads the necessary packages. The following screenshot is the result you get when you add the aws-java-sdk dependency to your application:
Modular programming helps you to accelerate software development speed. However, applications nowadays have become more sophisticated. They require an ever-increasing number of modules, packages, and frameworks, and new features and logic are continuously added. Typical server-side applications usually use authentication methods such as LDAP, connect to a centralized database such as RDBMS, and then return the result to the user. Developers have recently found themselves required to utilize software design patterns in order to accommodate a bunch of modules in an application.
One of the most popular application design patterns is Model-View-Controller (MVC). This defines three layers: the Model layer is in charge of data queries and persistence, such as loading and storing data to a database; the View layer is in charge of the User Interface (UI) and the Input/Output (I/O); and the Controller layer is in charge of business logic, which lies in between the View and the Model:
There are some frameworks that help developers to make MVC easier, such as Struts (https://struts.apache.org/), SpringMVC (https://projects.spring.io/spring-framework/), Ruby on Rails (http://rubyonrails.org/), and Django (https://www.djangoproject.com/). MVC is one of the most successful software design patterns, and is used as the foundation of modern web applications and services.
MVC defines a borderline between every layer, which allows several developers to jointly develop the same application. However, it also causes some negative side effects. The size of the source code within the application keeps getting bigger. This is because the database code (the Model), the presentation code (the View), and the business logic (the Controller) are all within the same VCS repository. This eventually has an impact on the software development cycle. This type of application is called a monolithic application: it contains a lot of code that builds into a giant EXE or WAR program.
There's no concrete measurement that defines an application as monolithic, but a typical monolithic app tends to have more than 50 modules or packages and more than 50 database tables, and requires more than 30 minutes to build. If we need to add or modify one of those modules, the changes made might affect a lot of code, so developers try to minimize code changes within the application. This reluctance can mean that problems aren't dealt with in a timely manner, and that developers hesitate to maintain the application code. For this reason, developers now tend to divide monolithic applications into smaller pieces and connect them over the network.
In fact, dividing an application into small pieces and connecting them via a network was first attempted back in the 1990s, when Sun Microsystems introduced the Sun Remote Procedure Call (Sun RPC). This allows you to use a module remotely. One of the most popular implementations is the Network File System (NFS). The NFS client and the NFS server can communicate over the network, even if the server and the client use different CPUs and OSes.
Some programming languages also support RPC-style functionality. UNIX and the C language have the rpcgen tool, which generates a stub code that contains some complicated network communication code. The developer can use this over the network to avoid difficult network-layer programming.
Java has Java Remote Method Invocation (RMI), which is similar to Sun RPC, but specific to the Java language. The RMI Compiler (RMIC) generates the stub code that connects remote Java processes, allowing them to invoke methods and return results. The following diagram shows the procedure flow of Java RMI:
Objective-C has distributed objects and .NET has remoting, both of which work in a similar fashion. Most modern programming languages have RPC capabilities out of the box. These RPC designs are capable of dividing a single application into multiple processes (programs), and individual programs can have separate source code repositories. While RPC designs worked well, machine resources (CPU and memory) were limited during the 1990s and early 2000s. Another disadvantage was that the same programming language was intended to be used throughout, and these designs were intended for a client/server model architecture rather than a distributed architecture. In addition, there was little security consideration when these designs were developed, so they are not recommended for use over a public network.
In the early 2000s, web services using SOAP over HTTP/SSL as the data transport were developed. These used XML for data presentation and the Web Services Description Language (WSDL) to define services. Then, Universal Description, Discovery, and Integration (UDDI) was used as the service registry to look up web service applications. However, as machine resources were not plentiful at the time, and due to the complexity of programming and maintaining web services, this approach was not widely accepted by developers.
In the 2010s, machines and even smartphones were able to access plenty of CPU resources, and network bandwidths of a few hundred Mbps were everywhere. Developers started to utilize these resources to make application code and system structures as easy as possible, making the software development cycle quicker.
Nowadays, there are sufficient hardware resources available, so it makes sense to use HTTP/SSL as the RPC transport. In addition, based on experience, developers have chosen to make this process easier as follows:
By making HTTP and SSL/TLS the standard transport
By using HTTP methods for Create/Read/Update/Delete (CRUD) operations, namely POST, GET, PUT, and DELETE
By using the URI as the resource identifier; for example, the user with the ID 123 would have the URI /user/123/
By using JSON for standard data presentation
These concepts are known as Representational State Transfer (REST), or RESTful design. RESTful design has been widely accepted by developers and has become the de facto standard for distributed applications. RESTful applications allow the use of any programming language, as they are HTTP-based. It is possible to have, for example, Java as the RESTful server and Python as the client.
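As a rough sketch of this mapping (the host api.example.com and the user endpoints are hypothetical, not taken from the book's sample code), the four operations could look as follows with curl:

$ curl -X POST -H 'Content-Type: application/json' -d '{"name": "alice"}' \
  http://api.example.com/user/                       ## create a new user
$ curl -X GET http://api.example.com/user/123/       ## read user 123
$ curl -X PUT -H 'Content-Type: application/json' -d '{"name": "bob"}' \
  http://api.example.com/user/123/                   ## update user 123
$ curl -X DELETE http://api.example.com/user/123/    ## delete user 123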
RESTful design brings freedom and opportunities to the developer. It makes it easy to perform code refactoring, to upgrade a library, and even to switch to another programming language. It also encourages the developer to build a distributed modular design made up of multiple RESTful applications, which are called microservices.
If you have multiple RESTful applications, you might be wondering how to manage multiple source code repositories in a VCS and how to deploy multiple RESTful servers. However, CI and CD automation makes it easier to build and deploy multiple RESTful server applications. For this reason, the microservices design is becoming increasingly popular with web application developers.
Although microservices have the word micro in their name, they are actually pretty heavy compared to applications from the 1990s or early 2000s. They use full stack HTTP/SSL servers and contain entire MVC layers.
The microservices design has the following advantages:
Stateless: They don't store user sessions in the system, which helps to scale the application.
No shared data store: Microservices should have their own data stores, such as databases, and shouldn't share them with other applications. This helps to encapsulate the backend database so that it's easier to refactor and update the database schema within a single microservice.
Versioning and compatibility: Microservices may change and update their APIs, but they should define versions, such as /api/v1 and /api/v2, that maintain backward compatibility. This helps to decouple other microservices and applications.
Integrate CI/CD: Microservices should adopt the CI and CD process to eliminate management effort.
There are some frameworks that can help to build microservice-based applications, such as Spring Boot (https://projects.spring.io/spring-boot/) and Flask (http://flask.pocoo.org). However, there are many HTTP-based frameworks, so developers are free to choose any preferred framework or programming language. This is the beauty of the microservices design.
The following diagram is a comparison between the monolithic application design and the microservices design. It shows that each microservice has the same layered structure as a monolithic design: they both contain an interface layer, a business logic layer, a model layer, and a data store. The difference, however, is that the application is constructed from multiple microservices, and different applications can share the same microservices:
The developer can add the necessary microservices and modify existing microservices with a rapid software delivery method that won't affect an existing application or service. This is an important breakthrough. It represents an entire software development environment and methodology that's widely accepted by developers.
Although CI and CD automation processes help to develop and deploy microservices, the number of resources involved, such as VMs, OSes, libraries, disk volumes, and networks, is far greater than for monolithic applications. There are some tools that can support these large automation environments on the cloud.
As discussed previously, automation is the best way to achieve rapid software delivery, and it solves the issue of managing microservices. However, automation tools aren't ordinary IT or infrastructure applications such as Active Directory, BIND (DNS), or Sendmail (MTA). In order to achieve automation, we need engineers who have both a developer's skill set, particularly in scripting languages, and an operator's skill set, with knowledge of VMs, networks, and storage operations.
DevOps is short for development and operations. It refers to the practice of building automation processes such as CI, infrastructure as code, and CD, using a range of DevOps tools.
One of the most popular VCS tools is Git (https://git-scm.com). A developer uses Git to check in and check out code all the time. There are various Git hosting services, including GitHub (https://github.com) and Bitbucket (https://bitbucket.org). These allow you to create and save your Git repositories and collaborate with other users over the internet. The following screenshot shows a sample pull request on GitHub:
There's a lot of variety when it comes to build servers. Jenkins (https://jenkins.io) is one of the most well-established applications, along with TeamCity (https://www.jetbrains.com/teamcity/). As well as self-hosted build servers, there are also hosted services, otherwise known as Software as a Service (SaaS), such as Codeship (https://codeship.com) and Travis CI (https://travis-ci.org), which can integrate with other SaaS tools. The build server is capable of invoking external commands, such as unit test programs. This makes the build server a key tool within the CI pipeline.
The following screenshot shows a sample build using Codeship. We check out the code from GitHub and invoke Maven for building (mvn compile) and unit testing (mvn test) our sample application:
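In shell terms, those build steps boil down to something like the following sketch (the repository URL is a placeholder):

$ git clone https://github.com/example/sample-app.git   ## check out the source
$ cd sample-app
$ mvn compile                                           ## build the application
$ mvn test                                              ## run the unit tests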
There are a variety of configuration management tools available. The most popular ones include Puppet (https://puppet.com), Chef (https://www.chef.io), and Ansible (https://www.ansible.com).
AWS OpsWorks (https://aws.amazon.com/opsworks/) provides a managed Chef platform on AWS Cloud. The following screenshot shows a Chef recipe (configuration) of an installation of the Amazon CloudWatch Log agent using AWS OpsWorks. AWS OpsWorks automates the installation of the CloudWatch Log agent when launching an EC2 instance:
AWS CloudFormation (https://aws.amazon.com/cloudformation/) helps to achieve infrastructure as code. It supports the automation of AWS operations, so that we can perform the following functions:
Creating a VPC
Creating a subnet on VPC
Creating an internet gateway on VPC
Creating a routing table to associate a subnet to the internet gateway
Creating a security group
Creating a VM instance
Associating a security group to a VM instance
The CloudFormation configuration is written in JSON, as shown in the following screenshot:
CloudFormation supports parameterization, so it's easy to create an additional environment with different parameters (such as VPC and CIDR ranges) using the same JSON configuration file. It also supports update operations. If we need to change a part of the infrastructure, there's no need to recreate the whole thing. CloudFormation can identify the delta in the configuration and perform only the necessary infrastructure operations on your behalf.
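To illustrate that parameterization (the stack names, template file, and parameter key below are hypothetical, not from an actual template in this book), the same template can be launched twice with different values via the AWS CLI:

$ aws cloudformation create-stack --stack-name staging \
    --template-body file://network.json \
    --parameters ParameterKey=CidrBlock,ParameterValue=10.0.0.0/16
$ aws cloudformation create-stack --stack-name production \
    --template-body file://network.json \
    --parameters ParameterKey=CidrBlock,ParameterValue=10.1.0.0/16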
AWS CodeDeploy (https://aws.amazon.com/codedeploy/) is another useful automation tool that focuses on software deployment. It allows the user to define the deployment steps in a YAML file, in which you can carry out the following actions:
Specify where to download and install the application
Specify how to stop the application
Specify how to install the application
Specify how to start and configure an application
The following screenshot is an example of the AWS CodeDeploy configuration file, appspec.yml:
Once you start to manage microservices using a cloud infrastructure, there are various monitoring tools that can help you to manage your servers.
Amazon CloudWatch is the built-in monitoring tool for AWS. No agent installation is needed; it automatically gathers metrics from AWS instances and allows the user to visualize them in order to carry out DevOps tasks. It also supports setting alerts based on criteria that you define. The following screenshot shows the Amazon CloudWatch metrics for an EC2 instance:
Amazon CloudWatch also supports the gathering of an application log. This requires us to install an agent on an EC2 instance. Centralized log management is useful when you need to start managing multiple microservice instances.
ELK is a popular stack, which stands for Elasticsearch (https://www.elastic.co/products/elasticsearch), Logstash (https://www.elastic.co/products/logstash), and Kibana (https://www.elastic.co/products/kibana). Logstash aggregates application logs, transforms them into JSON format, and then sends them to Elasticsearch. Elasticsearch is a distributed JSON database, and Kibana can visualize the data stored in Elasticsearch. The following Kibana example shows an nginx access log:
Grafana (https://grafana.com) is another popular visualization tool. It's usually connected to time series databases such as Graphite (https://graphiteapp.org) or InfluxDB (https://www.influxdata.com). A time series database is designed to store flat, de-normalized, numeric data, such as CPU usage or network traffic. Unlike an RDBMS, a time series database is optimized to save data space and run faster queries on historical numeric data. Most DevOps monitoring tools use time series databases in the backend.
The following Grafana screenshot shows some Message Queue Server statistics:
When you start to use several DevOps tools, you need to go back and forth to visit several consoles to check whether the CI and CD pipelines work properly or not. In particular, the following events need to be monitored:
Merging the source code to GitHub
Triggering the new build on Jenkins
Triggering AWS CodeDeploy to deploy the new version of the application
These events need to be tracked. If there's any trouble, the DevOps team needs to discuss it with the developers and the QA team. However, communication can be a problem here, because the DevOps team is required to capture each event one by one and then pass it on as appropriate. This is inefficient.
There are some communication tools that help to integrate these different teams. They allow anyone to join to look at the events and communicate. Slack (https://slack.com) and HipChat (https://www.hipchat.com) are the most popular communication tools.
These tools also support integration with SaaS services so that DevOps teams can see events on a single chat room. The following screenshot is a Slack chat room that integrates with Jenkins:
CI, CD, and automation work can be achieved easily with cloud technology. In particular, public cloud APIs help DevOps teams build CI and CD tooling. Public clouds such as Amazon Web Services (https://aws.amazon.com), Google Cloud Platform (https://cloud.google.com), and Microsoft Azure (https://azure.microsoft.com) provide APIs for DevOps teams to control their cloud infrastructure. DevOps teams can also reduce resource wastage, because with the cloud you pay as you go, only when resources are needed. The public cloud will continue to grow in step with the software development cycle and architecture design; these are all essential to carrying your application or service to success.
The following screenshot shows the web console for Amazon Web Services:
Google Cloud Platform also has a web console, as shown here:
Here's a screenshot of the Microsoft Azure console as well:
All three cloud services have a free trial period that a DevOps engineer can use to try and understand the benefits of cloud infrastructure.
In this chapter, we've discussed the history of software development methodology, programming evolution, and DevOps tools. These methodologies and tools support a faster software delivery cycle. The microservices design also helps to produce continuous software updates. However, microservices increase the complexity of the management of an environment.
In Chapter 2, DevOps with Containers, we will describe the Docker container technology, which helps to compose microservice applications and manage them in a more efficient and automated way.
We're now familiar with a wide variety of DevOps tools that can help us to automate tasks and manage configuration throughout the delivery journey of an application. Challenges still lie ahead, however, as applications have now become more diverse than ever. In this chapter, we'll add another skill to our tool belt: the container. In particular, we'll talk about the Docker container. In doing this, we'll seek to understand the following:
Key concepts related to containers
Running Docker applications
Building Docker applications with Dockerfile
Orchestrating multiple containers with Docker compose
One of the key features of containers is isolation. In this section, we'll establish a proper understanding of this powerful tool by looking at how a container achieves isolation and why this matters in the software development life cycle.
When an application launches, it consumes CPU time, occupies memory space, links to its dependent libraries, writes to the disk, transmits packets, and may access other devices as well. Everything it uses up is a kind of resource, which is shared by all the programs on the same host. To increase the efficiency of resource utilization, we may try to put as many applications as possible on a single machine. However, the complexity involved in making every application work in a box effectively increases exponentially, even if we just want to run two applications, let alone work with tons of applications and machines. Because of this, the idea to separate the resources of a physical computing unit into isolated pieces soon became a paradigm in the industry.
You may have heard of terms such as Virtual Machines (VMs), BSD jails, Solaris containers, Linux containers, Docker, and others. All of these promise us similar isolation concepts but use fundamentally distinct mechanisms, so the actual level of isolation differs. For example, the implementation of a VM involves full virtualization of the hardware layer with a hypervisor. If you want to run an application on a VM, you have to start from a full operating system. In other words, the resources are isolated between guest operating systems running on the same hypervisor. In contrast, Linux and Docker containers are built on top of Linux primitives, which means they can only run in an operating system with those capabilities. BSD jails and Solaris containers work in a similar fashion, but on other operating systems. The following diagram illustrates the isolation relationship of the Linux container and VMs. The container isolates an application on the operating system layer, while VM-based separation is achieved by the underlying hypervisor or host operating system:
A Linux container is made up of several building blocks, the two most important of which are namespaces and control groups (cgroups). Both of these are Linux kernel features. Namespaces provide logical partitions of certain kinds of system resources, such as the mounting point (mnt), the process ID (PID), and the network (net). To further understand the concept of isolation, let's look at some simple examples on the pid namespace. The following examples are from Ubuntu 18.04.1 and util-linux 2.31.1.
When we type ps axf in our Terminal, we'll see a long list of running processes:
$ ps axf
  PID TTY      STAT   TIME COMMAND
    2 ?        S      0:00 [kthreadd]
    4 ?        I<     0:00  \_ [kworker/0:0H]
    5 ?        I      0:00  \_ [kworker/u2:0]
    6 ?        I<     0:00  \_ [mm_percpu_wq]
    7 ?        S      0:00  \_ [ksoftirqd/0]
...
Let's now enter a new pid namespace with unshare, which is able to disassociate a process resource part by part into a new namespace. We'll then check the processes again:
$ sudo unshare --fork --pid --mount-proc=/proc /bin/sh
# ps axf
  PID TTY      STAT   TIME COMMAND
    1 pts/0    S      0:00 /bin/sh
    2 pts/0    R+     0:00 ps axf
You'll find that the pid of the shell process at the new namespace becomes 1 and all other processes have disappeared. This means you've successfully created a pid container. Let's switch to another session outside the namespace and list the processes again:
$ ps axf ## from another terminal
  PID TTY      STAT   TIME COMMAND
...
 1260 pts/0    Ss     0:00  \_ -bash
 1496 pts/0    S      0:00  |   \_ sudo unshare --fork --pid --mount-proc=/proc /bin/sh
 1497 pts/0    S      0:00  |       \_ unshare --fork --pid --mount-proc=/proc /bin/sh
 1498 pts/0    S+     0:00  |           \_ /bin/sh
 1464 pts/1    Ss     0:00  \_ -bash
...
You can still see the other processes and your shell process within the new namespace. With the pid namespace's isolation, processes inhabiting different namespaces can't see each other. However, if one process uses a considerable amount of system resources, such as the memory, it could cause the system to run out of that resource and become unstable. In other words, an isolated process could still disrupt other processes or even crash the whole system if we don't impose resource usage restrictions on it.
The following diagram illustrates the PID namespaces and how an Out-Of-Memory (OOM) event can affect other processes outside a child namespace. The numbered blocks are the processes in the system, and the numbers are their PIDs. Blocks with two numbers are processes created with the child namespace, where the second number represents their PIDs in the child namespace. In the upper part of the diagram, there's still free memory available in the system. Later on, however, in the lower part of the diagram, the processes in the child namespace exhaust the remaining memory in the system. Due to the lack of free memory, the host kernel then starts the OOM killer to release memory, the victims of which are likely to be processes outside the child namespace. In the example here, processes 8 and 13 in the system are killed:
In light of this, cgroups is utilized here to limit resource usage. Like namespaces, this can impose constraints on different kinds of system resources. Let's continue from our pid namespace, generate some load on the CPU with yes > /dev/null, and then monitor it with top:
## in the container terminal
# yes > /dev/null & top
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
    2 root      20   0    7468    788    724 R  99.7  0.1   0:15.14 yes
    1 root      20   0    4628    780    712 S   0.0  0.1   0:00.00 sh
    3 root      20   0   41656   3656   3188 R   0.0  0.4   0:00.00 top
Our CPU load reaches 100%, as expected. Let's now limit it with the CPU cgroup. cgroups are organized as folders under /sys/fs/cgroup/. First, we need to switch to the host session:
## on the host session
$ ls /sys/fs/cgroup
blkio cpu cpuacct cpu,cpuacct cpuset devices freezer hugetlb memory net_cls net_cls,net_prio net_prio perf_event pids rdma systemd unified
Each folder represents the resources it controls. It's pretty easy to create a cgroup and control processes with it: just create a folder under the resource type with any name and append the process IDs you'd like to control to tasks. Here, we want to throttle the CPU usage of our yes process, so create a new folder under cpu and find out the PID of the yes process:
## also on the host terminal
$ ps ax | grep yes | grep -v grep
 1658 pts/0    R      0:42 yes
$ sudo mkdir /sys/fs/cgroup/cpu/box && \
  echo 1658 | sudo tee /sys/fs/cgroup/cpu/box/tasks > /dev/null
We've just added the yes process into the newly created box CPU group, but the policy remains unset, so the process still runs without any restrictions. Set a limit by writing the desired number into the corresponding file: cpu.cfs_quota_us is the CPU time, in microseconds, that the group may consume per scheduling period (cpu.cfs_period_us, 100000 microseconds by default), so a quota of 50000 caps the group at roughly 50% of one core. Check the CPU usage again:
$ echo 50000 | sudo tee /sys/fs/cgroup/cpu/box/cpu.cfs_quota_us > /dev/null
## go back to namespaced terminal, check stats with top
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
