Kubernetes has continued to grow and achieve broad adoption across various industries, helping you to orchestrate and automate container deployments on a massive scale.
Based on the recent release of Kubernetes 1.12, Getting Started with Kubernetes gives you a complete understanding of how to install a Kubernetes cluster. The book focuses on core Kubernetes constructs, such as pods, services, replica sets, replication controllers, and labels. You will understand cluster-level networking in Kubernetes, and learn to set up external access to applications running in the cluster.
As you make your way through the book, you'll understand how to manage deployments and perform updates with minimal downtime. In addition to this, you will explore operational aspects of Kubernetes, such as monitoring and logging, later moving on to advanced concepts such as container security and cluster federation. You'll get to grips with integrating your build pipeline and deployments within a Kubernetes cluster, and be able to understand and interact with open source projects. In the concluding chapters, you'll orchestrate updates behind the scenes, avoid downtime on your cluster, and deal with underlying cloud provider instability within your cluster.
By the end of this book, you'll have a complete understanding of the Kubernetes platform and will start deploying applications on it.
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Vijin Boricha
Acquisition Editor: Rahul Nair
Content Development Editor: Sharon Raj
Technical Editor: Komal Karne
Copy Editor: Safis Editing
Project Coordinator: Drashti Panchal
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Tom Scaria
Production Coordinator: Shantanu Zagade
First published: December 2015
Second edition: May 2017
Third edition: October 2018
Production reference: 2061118
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78899-472-9
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Jonathan Baier is an emerging technology leader living in Brooklyn, New York. He has had a passion for technology since an early age. When he was 14 years old, he was so interested in the family computer (an IBM PCjr) that he pored over the several hundred pages of BASIC and DOS manuals. Then, he taught himself to code a very poorly written version of Tic-Tac-Toe. During his teenage years, he started a computer support business. Throughout his life, he has dabbled in entrepreneurship. He currently works as Senior Vice President of Cloud Engineering and Operations for Moody's Corporation in New York.
Jesse White is a 15-year veteran and technology leader in New York City's very own Silicon Alley, where he is a pillar of the vibrant engineering ecosystem. As founder of DockerNYC and an active participant in the open source community, you can find Jesse at a number of leading industry events, including DockerCon and VelocityConf, giving talks and workshops.
Jakub Pavlik is a co-founder, former CTO, and chief architect of TCP Cloud (acquired by Mirantis in 2016). Jakub and his team worked for several years on the IaaS cloud platform based on the OpenStack-Salt, Kubernetes, and Open Contrail projects, which they deployed and operated for global service providers. Leveraging his skills in architecture, implementation, and operation, his TCP Cloud team was acquired by #1 pure play OpenStack company Mirantis. Currently a director of engineering, together with other skilled professionals, Jakub builds and operates a new generation of edge-computing platforms at Volterra Inc. He is also an enthusiast of Linux OS, ice hockey (with Pepa), and films, and loves his wife, Hanulka.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Getting Started with Kubernetes Third Edition
Dedication
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Introduction to Kubernetes
Technical requirements
A brief overview of containers
What is a container?
cgroups
Namespaces 
Union filesystems
Why are containers so cool?
The advantages of Continuous Integration/Continuous Deployment
Resource utilization
Microservices and orchestration
Future challenges
Our first clusters
Running Kubernetes on GCE
Kubernetes UI
Grafana
Command line
Services running on the master
Services running on the minions
Tearing down a cluster
Working with other providers
CLI setup
IAM setup
Cluster state storage
Creating your cluster
Other modes
Resetting the cluster
Investigating other deployment automation
Local alternatives
Starting from scratch
Cluster setup
Installing Kubernetes components (kubelet and kubeadm)
Setting up a master
Joining nodes
Networking
Joining the cluster
Summary
Questions
Further reading
Building a Foundation with Core Kubernetes Constructs
Technical requirements
The Kubernetes system
Nucleus
Application layer
Governance layer
Interface layer
Ecosystem
The architecture
The Master
Cluster state
Cluster nodes
Master
Nodes (formerly minions)
Core constructs
Pods
Pod example
Labels
The container's afterlife
Services
Replication controllers and replica sets
Our first Kubernetes application
More on labels
Replica sets
Health checks
TCP checks
Life cycle hooks or graceful shutdown
Application scheduling
Scheduling example
Summary
Questions
Further reading
Working with Networking, Load Balancers, and Ingress
Technical requirements
Container networking
The Docker approach
Docker default networks
Docker user-defined networks
The Kubernetes approach
Networking options
Networking comparisons
Weave
Flannel
Project Calico
Canal
Kube-router
Balanced design
Advanced services
External services
Internal services
Custom load balancing
Cross-node proxy
Custom ports
Multiple ports
Ingress
Types of ingress
Migrations, multicluster, and more
Custom addressing
Service discovery
DNS
Multitenancy
Limits
A note on resource usage
Summary
Questions
Further reading
Implementing Reliable Container-Native Applications
Technical requirements
How Kubernetes manages state
Deployments
Deployment use cases
Scaling
Updates and rollouts
History and rollbacks
Autoscaling
Jobs
Other types of jobs
Parallel jobs
Scheduled jobs
DaemonSets
Node selection
Summary
Questions
Exploring Kubernetes Storage Concepts
Technical requirements
Persistent storage
Temporary disks
Cloud volumes
GCE Persistent Disks
AWS Elastic Block Store
Other storage options
PersistentVolumes and Storage Classes
Dynamic volume provisioning
StatefulSets
A stateful example
Summary
Questions
Further reading
Application Updates, Gradual Rollouts, and Autoscaling
Technical requirements
Example setup
Scaling up
Smooth updates
Testing, releases, and cutovers
Application autoscaling
Scaling a cluster
Autoscaling
Scaling up the cluster on GCE
Scaling up the cluster on AWS
Scaling manually
Managing applications
Getting started with Helm
Summary
Questions
Further reading
Designing for Continuous Integration and Delivery
Technical requirements
Integrating Kubernetes with a continuous delivery pipeline
gulp.js
Prerequisites
gulp.js build example
The Kubernetes plugin for Jenkins
Prerequisites
Installing plugins
Configuring the Kubernetes plugin
Helm and Minikube
Bonus fun
Summary
Questions
Further reading
Monitoring and Logging
Technical requirements
Monitoring operations
Built-in monitoring
Exploring Heapster
Customizing our dashboards
FluentD and Google Cloud Logging
FluentD
Maturing our monitoring operations
GCE (Stackdriver)
Signing up for GCE monitoring
Alerts
Beyond system monitoring with Sysdig
Sysdig Cloud
Detailed views
Topology views
Metrics
Alerting
The Sysdig command line
The Csysdig command-line UI
Prometheus
Prometheus summary
Prometheus installation choices
Tips for creating an Operator
Installing Prometheus
Summary
Questions
Further reading
Operating Systems, Platforms, and Cloud and Local Providers
Technical requirements
The importance of standards
The OCI Charter
The OCI
Container Runtime Interface
Trying out CRI-O
More on container runtimes
CNCF
Standard container specification
CoreOS
rkt
etcd
Kubernetes with CoreOS
Tectonic
Dashboard highlights
Hosted platforms
Amazon Web Services
Microsoft Azure
Google Kubernetes Engine
Summary
Further reading
Designing for High Availability and Scalability
Technical requirements
Introduction to high availability
How do we measure availability?
Uptime and downtime
Uptime
Downtime
The five nines of availability
HA best practices
Anti-fragility
HA clusters
HA features of the major cloud service providers
HA approaches for Kubernetes
Prerequisites
Setting up
Stacked nodes
Installing workers
Cluster life cycle
Admission controllers
Using admission controllers
The workloads API
Custom resource definitions
Using CRDs
Summary
Questions
Further reading
Kubernetes SIGs, Incubation Projects, and the CNCF
Technical requirements
Setting up Git for contributions
Git's benefits
CNCF structure
What Kubernetes isn't
Kubernetes SIGs
How to get involved
Summary
Questions
Further reading
Cluster Federation and Multi-Tenancy
Technical requirements
Introduction to federation
Why federation?
The building blocks of federation
Key components
Federated services
Setting up federation
Contexts
New clusters for federation
Initializing the federation control plane
Adding clusters to the federation system
Federated resources
Federated configurations
Federated horizontal pod autoscalers
How to use federated HPAs
Other federated resources
Events
Jobs
True multi-cloud
Getting to multi-cloud
Deleting the cluster
Summary
Questions
Further reading
Cluster Authentication, Authorization, and Container Security
Basics of container security
Keeping containers contained 
Resource exhaustion and orchestration security
Image repositories
Continuous vulnerability scanning
Image signing and verification
Kubernetes cluster security
Secure API calls
Secure node communication
Authorization and authentication plugins
Admission controllers
RBAC
Pod security policies and context
Enabling PodSecurityPolicies
Additional considerations
Securing sensitive application data (secrets)
Summary
Questions
Further reading
Hardening Kubernetes
Ready for production
Ready, set, go
Lessons learned from production
Setting limits
Scheduling limits
Memory limit example
Scheduling CPU constraints
CPU constraints example
Securing a cluster
Third-party companies
Private registries
Google Kubernetes Engine
Azure Kubernetes Service
ClusterHQ
Portworx
Shippable
Twistlock
Aqua Sec
Mesosphere (Kubernetes on Mesos)
Deis
OpenShift
Summary
Questions
Further reading
Kubernetes Infrastructure Management
Technical requirements
Planning a cluster
Picking what's right
Securing the cluster
Tuning examples
Upgrading the cluster
Upgrading PaaS clusters
Scaling the cluster
On GKE and AKS
DIY clusters
Node maintenance
Additional configuration options
Summary
Questions
Further reading
Assessments
Chapter 1: Introduction to Kubernetes
Chapter 2: Building a Foundation with Core Kubernetes Constructs
Chapter 3: Working with Networking, Load Balancers, and Ingress
Chapter 4: Implementing Reliable, Container-Native Applications
Chapter 5: Exploring Kubernetes Storage Concepts
Chapter 6: Application Updates, Gradual Rollouts, and Autoscaling
Chapter 7: Designing for Continuous Integration and Delivery
Chapter 8: Monitoring and Logging
Chapter 10: Designing for High Availability and Scalability
Chapter 11: Kubernetes SIGs, Incubation Projects, and the CNCF
Chapter 12: Cluster Federation and Multi-Tenancy
Chapter 13: Cluster Authentication, Authorization, and Container Security
Chapter 14: Hardening Kubernetes
Chapter 15: Kubernetes Infrastructure Management
Other Books You May Enjoy
Leave a review - let other readers know what you think
This book is a guide to getting started with Kubernetes and overall container management. We will walk you through the features and functions of Kubernetes and show how it fits into an overall operations strategy. You'll learn what hurdles lurk when moving containers off the developer's laptop and managing them at a larger scale. You'll also see how Kubernetes is the perfect tool to help you face these challenges with confidence.
Whether you've got your head down in development, you're up to your neck in operations, or you're looking forward as an executive, Kubernetes and this book are for you. Getting Started with Kubernetes will help you understand how to move your container applications into production with best practices and step-by-step walkthroughs tied to a real-world operational strategy. You'll learn how Kubernetes fits into your everyday operations, which can help you prepare for production-ready container application stacks.
Having some familiarity with Docker containers, general software development, and operations at a high level will be helpful.
Chapter 1, Introduction to Kubernetes, is a brief overview of containers and the how, what, and why of Kubernetes orchestration, exploring how it impacts your business goals and everyday operations.
Chapter 2, Building a Foundation with Core Kubernetes Constructs, uses a few simple examples to explore core Kubernetes constructs, namely pods, services, replication controllers, replica sets, and labels. Basic operations, including health checks and scheduling, will also be covered.
Chapter 3, Working with Networking, Load Balancers, and Ingress, covers cluster networking for Kubernetes and the Kubernetes proxy. It also takes a deeper dive into services, and shows a brief overview of some higher-level isolation features for multi-tenancy.
Chapter 4, Implementing Reliable, Container-Native Applications, covers both long-running application deployments and short-lived jobs. We will also look at using DaemonSets to run containers on all or subsets of nodes in the cluster.
Chapter 5, Exploring Kubernetes Storage Concepts, covers storage concerns and persistent data across pods and the container life cycle. We will also look at new constructs for working with stateful applications in Kubernetes.
Chapter 6, Application Updates, Gradual Rollouts, and Autoscaling, is a quick look at how to roll out updates and new features with minimal disruption to uptime. We will also look at scaling for applications and the Kubernetes cluster.
Chapter 7, Designing for Continuous Integration and Delivery, explains how to integrate Kubernetes into your continuous delivery pipeline. We will see how to use a K8s cluster with gulp.js and Jenkins as well.
Chapter 8, Monitoring and Logging, teaches how to use and customize built-in and third-party monitoring tools on your Kubernetes cluster. We will look at built-in logging and monitoring, the Google Cloud Monitoring/Logging service, and Sysdig.
Chapter 9, Operating Systems, Platforms, and Cloud and Local Providers, starts off by covering the Open Container Project and its mission to provide an open container specification, looking at how having open standards encourages a diverse ecosystem of container implementations (such as Docker, rkt, Kurma, and JetPack). The second half of this chapter covers available OSes, such as CoreOS and Project Atomic, and their advantages as host OSes, including performance and support for various container implementations.
Chapter 10, Designing for High Availability and Scalability, uncovers the Kubernetes Workload capability, which allows us to leverage all App Workload APIs, such as the DaemonSet, Deployment, ReplicaSet, and StatefulSet APIs, in order to create foundations for long-running, stateless, and stateful workloads. We will describe and implement admission control to validate and/or mutate objects within the cluster.
Chapter 11, Kubernetes SIGs, Incubation Projects, and the CNCF, discusses the new globally distributed collaboration model of Kubernetes and its partner projects. We'll describe the three tiers of organization around SIGs, the difference between incubating and graduated projects, and how the CNCF is evolving the idea of an open source project into a distributed foundation.
Chapter 12, Cluster Federation and Multi-Tenancy, explores the new federation capabilities and how to use them to manage multiple clusters. We will also cover the federated version of the core constructs and the integration to public cloud vendor DNS.
Chapter 13, Cluster Authentication, Authorization, and Container Security, gets into the options for container security, from the container run-time level to the host itself. We will discuss how to apply these concepts to workloads running in a Kubernetes cluster and some of the security concerns and practices that relate specifically to running your Kubernetes cluster.
Chapter 14, Hardening Kubernetes, and How to Find Out More about Third-Party Extensions and Tools, covers some of the extensions available from vendors for enterprise-grade deployments. Additionally, we'll look at a brief survey of some of the existing tools and services that work with Kubernetes for monitoring, security, and storage.
Chapter 15, Kubernetes Infrastructure Management, focuses on how to make changes to the infrastructure that powers your Kubernetes infrastructure, whether it be a purely public cloud platform or a hybrid installation. We'll discuss methods for handling underlying instance and resource instability, and strategies for running highly available workloads on partially available underlying hardware.
This book will cover downloading and running the Kubernetes project. You'll need access to a Linux system (VirtualBox will work if you are on Windows) and some familiarity with the command shell.
Additionally, you should have a Google Cloud Platform account. You can sign up for a free trial here: https://cloud.google.com/.
Also, an AWS account is necessary for a few sections of the book. You can sign up for a free trial here: https://aws.amazon.com/.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Getting-Started-with-Kubernetes-third-edition. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781788994729_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The last two main pieces of the Master nodes are kube-controller-manager and cloud-controller-manager."
A block of code is set as follows:
"conditions": [ { "type": "Ready", "status": "True" }
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
"conditions": [ {
"type": "Ready"
, "status": "True" }
Any command-line input or output is written as follows:
$ kubectl describe pods/node-js-pod
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Click on Jobs and then long-task from the list, so we can see the details."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this book, we will help you build, scale, and manage production-ready Kubernetes clusters. Each section of this book will empower you with the core container concepts and the operational context of running modern web services that need to be available 24 hours a day, 7 days a week, 365 days a year. As we progress, you'll be given concrete, code-based examples that you can deploy into running clusters in order to get real-world feedback on Kubernetes' many abstractions. By the end of this book, you will have mastered the core conceptual building blocks of Kubernetes, and will have a firm understanding of how to handle the following paradigms:
Orchestration
Scheduling
Networking
Security
Storage
Identity and authentication
Infrastructure management
This chapter will set the stage for why Kubernetes matters, giving an overview of modern container history, diving into how containers work, and explaining why it's important to schedule, orchestrate, and manage a container platform well. We'll tie this back to concrete objectives and goals for your business and product. This chapter will also give a brief overview of how Kubernetes orchestration can enhance our container management strategy and how we can get a basic Kubernetes cluster up, running, and ready for container deployments.
In this chapter, we will cover the following topics:
Introducing container operations and management
The importance of container management
The advantages of Kubernetes
Downloading the latest Kubernetes
Installing and starting up a new Kubernetes cluster
The components of a Kubernetes cluster
You'll need to have the following tools installed:
Python
AWS CLI
Google Cloud CLI
Minikube
We'll go into the specifics of these tools' installation and configuration as we go through this chapter. If you already know how to do this, you can go ahead and set them up now.
Believe it or not, containers and their precursors have been around for over 15 years in the Linux and Unix operating systems. If you look deeper into the fundamentals of how containers operate, you can see their roots in the chroot technology that was introduced all the way back in 1979. Since the early 2000s, FreeBSD, Linux, Solaris, OpenVZ, Warden, and finally Docker have all made significant attempts at encapsulating containerization technology for the end user.
While the Linux VServer project and its first commit (running several general-purpose Linux servers on a single box with a high degree of independence and security (http://ieeexplore.ieee.org/document/1430092/?reload=true)) may represent one of the most interesting junctures in container history, it's clear that Docker set the container ecosystem on fire back in late 2013, when the company went all in on containers and rebranded from dotCloud to Docker. Their mass marketing of the container's appeal set the stage for the broad market adoption we see today, and is a direct precursor of the massive container orchestration and scheduling platforms we're writing about here.
Over the past five years, containers have grown in popularity like wildfire. Where containers were once relegated to developer laptops, testing, or development environments, you'll now see them as the building blocks of powerful production systems. They're running highly secure banking workloads and trading systems, powering IoT, keeping our on-demand economy humming, and scaling up to millions of containers to keep the products of the 21st century running at peak efficiency in both the cloud and private data centers. Furthermore, containerization technology permeates our technological zeitgeist, with every technology conference in the world devoting a significant portion of their talks and sessions to building, running, or developing in containers.
At the beginning of this compelling story lies Docker and its suite of developer-friendly tools. Docker for macOS and Windows, Compose, Swarm, and Registry have been incredibly powerful tools that have shaped workflows and changed how companies develop software. They've built a bridge for containers to exist at the very heart of the Software Delivery Life Cycle (SDLC), and a remarkable ecosystem has sprung up around those containers. Just as Malcolm McLean revolutionized the physical shipping world in the 1950s by creating a standardized shipping container, which is used today for everything from ice cube trays to automobiles, Linux containers are revolutionizing the software development world by making application environments portable and consistent across the infrastructure landscape.
We'll pick this story up as containers go mainstream, go to production, and go big within organizations. We'll look at what makes a container next.
Containers are a type of operating system virtualization, much like the virtual machines that preceded them. There are also lesser-known types of virtualization, such as Application Virtualization, Network Virtualization, and Storage Virtualization. While these technologies have been around since the 1960s, Docker's encapsulation of the container paradigm represents a modern implementation of resource isolation that utilizes built-in Linux kernel features such as chroot, control groups (cgroups), UnionFS, and namespaces to provide fully isolated resource control at the process level.
Containers use these technologies to create lightweight images that act as a standalone, fully encapsulated piece of software that carries everything it needs inside the box. This can include application binaries, any system tools or libraries, environment-based configuration, and runtime. This special property of isolation is very important, as it allows developers and operators to leverage the all-in-one nature of a container to run without issue, regardless of the environment it's run on. This includes developer laptops and any kind of pre-production or production environment.
This decoupling of application packaging mechanism from the environment on which it runs is a powerful concept that provides a clear separation of concerns between engineering teams. This allows developers to focus on building the core business capabilities into their application code and managing their own dependencies, while operators can streamline the continuous integration, promotion, and deployment of said applications without having to worry about their configuration.
At the core of container technology are three key concepts:
cgroups
Namespaces
Union filesystems
cgroups work by allowing the host to share and also limit the resources each process or container can consume. This is important for both resource utilization and security, as it prevents denial-of-service (DoS) attacks on the host's hardware resources. Several containers can share CPU and memory while staying within the predefined constraints. cgroups allow containers to provision access to memory, disk I/O, network, and CPU. You can also access devices (for example, /dev/foo). cgroups also power the soft and hard limits of container constraints that we'll discuss in later chapters.
There are seven major cgroups:
Memory cgroup: This keeps track of page access by the group, and can define limits for physical, kernel, and total memory.
Blkio cgroup: This tracks the I/O usage per group, across the read and write activity per block device. You can throttle by group per device, on operations versus bytes, and for reads versus writes.
CPU cgroup: This keeps track of user and system CPU time and usage per CPU. This allows you to set weights, but not limits.
Freezer cgroup: This is useful in batch management systems that are often stopping and starting tasks in order to schedule resources efficiently. The SIGSTOP signal is used to suspend a process, and the process is generally unaware that it is being suspended (or resumed, for that matter).
CPUset cgroup: This allows you to pin a group to a specific CPU within a multi-core CPU architecture. You can pin by application, which will prevent it from moving between CPUs. This can improve the performance of your code by increasing the amount of local memory access or minimizing thread switching.
Net_cls/net_prio cgroup: This keeps tabs on the egress traffic class (net_cls) or priority (net_prio) that is generated by the processes within the cgroup.
Devices cgroup: This controls what read/write permissions the group has on device nodes.
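To see cgroups in action, you can interact with the cgroup virtual filesystem directly. The following is a minimal sketch, assuming a host where the cgroup v1 memory controller is mounted at /sys/fs/cgroup/memory (the demo group name is arbitrary):
$ sudo mkdir /sys/fs/cgroup/memory/demo                                        # create a new memory cgroup
$ echo 104857600 | sudo tee /sys/fs/cgroup/memory/demo/memory.limit_in_bytes   # cap the group at 100 MiB
$ echo $$ | sudo tee /sys/fs/cgroup/memory/demo/cgroup.procs                   # move the current shell into the group
Container runtimes do this plumbing for you; for example, docker run --memory=100m --cpus=1 sets the equivalent memory and CPU cgroup limits for a container.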
Namespaces offer another form of isolation for process interaction within operating systems, creating the workspace we call a container. Linux namespaces are created via a syscall named unshare, while clone and setns allow you to manipulate namespaces in other manners.
Namespaces limit the visibility a process has of other processes, networking, filesystems, and user ID components. Container processes are limited to seeing only what is in the same namespace; processes in other containers, and processes on the host itself, are not directly visible from within the container. Additionally, Docker gives each container its own networking stack that protects the sockets and interfaces in a similar fashion.
If cgroups limit how much of a thing you can use, namespaces limit what things you can see. The following diagram shows the composition of a container:
In the case of the Docker engine, the following namespaces are used:
pid: Provides process isolation via an independent set of process IDs from other namespaces. These are nested.
net: Manages network interfaces by virtualizing the network stack through providing a loopback interface, and can create physical and virtual network interfaces that exist in a single namespace at a time.
ipc: Manages access to interprocess communication.
mnt: Controls filesystem mount points. These were the first kind of namespace created in the Linux kernel, and can be private or shared.
uts: The Unix time-sharing system isolates version IDs and the kernel by allowing a single system to provide different host and domain naming schemes to different processes. The gethostname and sethostname system calls use this namespace.
user: This namespace allows you to map UID/GID from container to host, and prevents the need for extra configuration in the container.
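You can experiment with namespaces from the shell using the unshare wrapper from util-linux, which invokes the syscall of the same name. The following is a minimal sketch (the hostname used is arbitrary):
$ sudo unshare --fork --pid --uts --mount-proc bash   # start a shell in new pid and uts namespaces
# hostname demo-container                             # changes the hostname only inside the new uts namespace
# ps aux                                              # the pid namespace shows only this shell and ps
# exit                                                # return to the host's namespaces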
Union filesystems are also a key advantage of using Docker containers. Containers run from an image. Much like an image in the VM or cloud world, it represents state at a particular point in time. Container images snapshot the filesystem, but tend to be much smaller than a VM. The container shares the host kernel and generally runs a much smaller set of processes, so the filesystem and bootstrap period tend to be much smaller—though those constraints are not strictly enforced. Second, the union filesystem allows for the efficient storage, download, and execution of these images. Containers use the idea of copy-on-write storage, which is able to create a brand new container immediately, without having to wait on copying out a whole new filesystem. This is similar to thin provisioning in other systems, where storage is allocated as needed:
Copy-on-write storage keeps track of what's changed, and in this way is similar to distributed version control systems (DVCS) such as Git. There are a number of options available to the end user that leverage copy-on-write storage:
AUFS and overlay at the file level
Device mapper at the block level
BTRFS and ZFS at the filesystem level
The easiest way to understand union filesystems is to think of them like a layer cake with each layer baked independently. The Linux kernel is our base layer; then, we might add an OS such as Red Hat Linux or Ubuntu.
Next, we might add an application such as nginx or Apache. Every change creates a new layer. Finally, as you make changes and new layers are added, you'll always have a top layer (think frosting) that is a writable layer. Union filesystems leverage this strategy to make each layer lightweight and speedy.
In Docker's case, the storage driver is responsible for stacking these layers on top of each other and providing a single pane of glass to view these systems. The thin writable layer on the top of this stack of layers is where you'll do your work: the writable container layer. We can consider each layer below to be container image layers:
What makes this truly efficient is that Docker caches the layers the first time we build them. So, let's say that we have an image with Ubuntu and then add Apache and build the image. Next, we build MySQL with Ubuntu as the base. The second build will be much faster because the Ubuntu layer is already cached. Essentially, our chocolate and vanilla layers, from the preceding diagram, are already baked. We simply need to bake the pistachio (MySQL) layer, assemble, and add the icing (the writable layer).
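To watch the layer cache at work, you can build two images from the same base. The following is a minimal sketch (the image names, Dockerfile names, and packages are illustrative):
$ cat > Dockerfile.apache <<'EOF'
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y apache2
EOF
$ docker build -f Dockerfile.apache -t demo-apache .
$ cat > Dockerfile.mysql <<'EOF'
FROM ubuntu:16.04
RUN apt-get update && apt-get install -y mysql-server
EOF
$ docker build -f Dockerfile.mysql -t demo-mysql .   # the ubuntu:16.04 base layer is reused from the cache
$ docker history demo-mysql                          # lists each image layer and its size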
What's also really exciting is that not only has the open source community embraced containers and Kubernetes, but the cloud providers have also deeply embraced the container ecosystem, and invested millions of dollars in supporting tooling, ecosystem, and management planes that can help manage containers. This means you have more options to run container workloads, and you'll have more tools to manage the scheduling and orchestration of the applications running on your clusters.
We'll explore some specific opportunities available to Kubernetes users, but at the time of this book's publishing, all of the major cloud service providers (CSPs) are offering some form of hosted or managed Kubernetes:
Amazon Web Services: AWS offers Elastic Container Service for Kubernetes (EKS) (for more information, visit https://aws.amazon.com/eks/), a managed service that simplifies running Kubernetes clusters in their cloud. You can also roll your own clusters with kops (for more information, visit https://kubernetes.io/docs/setup/custom-cloud/kops/). This product is still in active development.
Google Cloud Platform: GCP offers Google Kubernetes Engine (GKE) (for more information, visit https://cloud.google.com/kubernetes-engine/), a powerful cluster manager that can deploy, manage, and scale containerized applications in the cloud. Google has been running containerized workloads for over 15 years, and this platform is an excellent choice for sophisticated workload management.
Microsoft Azure: Azure offers the Azure Kubernetes Service (AKS) (for more information, visit https://azure.microsoft.com/en-us/services/kubernetes-service/), which aims to simplify the deployment, management, and operations of a full-scale Kubernetes cluster. This product is still in active development.
When you take advantage of one of these systems, you get built-in management of your Kubernetes cluster, which allows you to focus on the optimization, configuration, and deployment of your cluster.
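As a quick taste of how little work a managed offering requires, the following sketch creates a small GKE cluster with the gcloud CLI (the cluster name and zone are placeholders, and your project must have the Kubernetes Engine API enabled):
$ gcloud container clusters create demo-cluster --num-nodes=3 --zone=us-central1-b
$ gcloud container clusters get-credentials demo-cluster --zone=us-central1-b   # writes kubectl credentials
$ kubectl get nodes                                                             # the three nodes should report Ready
Later in this chapter, we'll stand up our own cluster on GCE instead, but the workflow above is representative of GKE, EKS, and AKS alike.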
ThoughtWorks defines Continuous Integration as a development practice that requires developers to integrate code into a shared repository several times a day. By having a continuous process of building and deploying code, organizations are able to instill quality control and testing as part of the everyday work cycle. The result is that updates and bug fixes happen much faster and the overall quality improves.
However, there has always been a challenge in creating development environments that match those of testing and production. Often, inconsistencies in these environments make it difficult to gain the full advantage of Continuous Delivery. Continuous Integration is the first step in speeding up your organization's software delivery life cycle, which helps you get your software features in front of customers quickly and reliably.
The concept of Continuous Delivery/Deployment uses Continuous Integration to enable developers to have truly portable deployments. Containers that are deployed on a developer's laptop are easily deployed on an in-house staging server. They are then easily transferred to the production server running in the cloud. This is facilitated by the nature of containers, which are built from files that specify parent layers, as we discussed previously. One advantage of this is that it becomes very easy to ensure OS, package, and application versions are the same across development, staging, and production environments. Because all the dependencies are packaged into the layers, the same host server can have multiple containers running a variety of OS or package versions. Furthermore, we can have various languages and frameworks on the same host server without the typical dependency clashes we would get in a VM with a single operating system.
This sets the stage for Continuous Delivery/Deployment of the application, as the operations teams or the developers themselves can focus on getting deployments and application rollouts correct, without having to worry about the intricacies of dependencies.
Continuous Delivery is the embodiment and process wherein all code changes are automatically built, tested (Continuous Integration), and then released into production (Continuous Delivery). If this process captures the correct quality gates, security guarantees, and unit/integration/system tests, the development teams will constantly release production-ready and deployable artifacts that have moved through an automated and standardized process.
It's important to note that CD requires the engineering teams to automate more than just unit tests. In order to utilize CD in sophisticated scheduling and orchestration systems such as Kubernetes, teams need to verify application functionality across many dimensions before they're deployed to customers. We'll explore deployment strategies that Kubernetes has to offer in later chapters.
Lastly, it's important to keep in mind that utilizing Kubernetes with CI/CD reduces the risk of the many common problems that technology firms face:
Long release cycles: If it takes a long time to release code to your users, then there's potential functionality that they're missing out on, and this results in lost revenue. If you have a manual testing or release process, it's going to slow down getting changes to production, and therefore in front of your customers.
Fixing code is hard: When you shorten the release cycle, you're able to discover and remediate bugs closer to the point of creation. This lowers the cost of a fix, as there's a correlation between bug introduction and bug discovery times.
Release better: The more you release, the better you get at releasing. Challenging your developers and operators to build automation, monitoring, and logging around the processes of CI/CD will make your pipeline more robust. As you release more often, the amount of difference between releases also decreases. A smaller difference allows teams to troubleshoot potential breaking changes more quickly, which in turn gives them more time to refine the release process further. It's a virtuous cycle!
The well-defined isolation and layered filesystem also make containers ideal for running systems with a very small footprint and domain-specific purpose. A streamlined deployment and release process means we can deploy quickly and often. As such, many companies have reduced their deployment time from weeks or months to days, and in some cases hours. This development life cycle lends itself extremely well to small, targeted teams working on small chunks of a larger application.
As we break down an application into very specific domains, we need a uniform way to communicate between all the various pieces and domains. Web services have served this purpose for years, but the added isolation and granular focus that containers bring have paved the way for microservices.
A definition for microservices can be a bit nebulous, but a definition from Martin Fowler, a respected author and speaker on software development, says this: "The microservice architectural style is an approach to developing a single application as a suite of small services, each running in its own process and communicating with lightweight mechanisms, often an HTTP resource API."
As the pivot to containerization continues and microservices evolve within an organization, teams will soon need a strategy to maintain many containers and microservices. Some organizations will have hundreds or even thousands of containers running in the years ahead.
Life cycle processes alone are an important piece of operation and management. How will we automatically recover when a container fails? Which upstream services are affected by such an outage? How will we patch our applications with minimal downtime? How will we scale up our containers and services as our traffic grows?
Networking and processing are also important concerns. Some processes are part of the same service and may benefit from proximity to the network. Databases, for example, may send large amounts of data to a particular microservice for processing. How will we place containers near each other in our cluster? Is there common data that needs to be accessed? How will new services be discovered and made available to other systems?
Resource utilization is also key. The small footprint of containers means that we can optimize our infrastructure for greater utilization. Extending the savings that started with the elastic cloud will take us even further toward minimizing wasted hardware. How will we schedule workloads most efficiently? How will we ensure that our important applications always have the right resources? How can we run less important workloads on spare capacity?
Finally, portability is a key factor in moving many organizations to containerization. Docker makes it very easy to deploy a standard container across various operating systems, cloud providers, and on-premise hardware or even developer laptops. However, we still need tooling to move containers around. How will we move containers between different nodes on our cluster? How will we roll out updates with minimal disruption? What process do we use to perform blue-green deployments or canary releases?
Whether you are starting to build out individual microservices and separating concerns into isolated containers or you simply want to take full advantage of the portability and immutability in your application development, the need for management and orchestration becomes clear. This is where orchestration tools such as Kubernetes offer the biggest value.
Kubernetes is supported on a variety of platforms and OSes. For the examples in this book, I used an Ubuntu 16.04 Linux VirtualBox (https://www.virtualbox.org/wiki/Downloads) for my client and Google Compute Engine (GCE) with Debian for the cluster itself. We will also take a brief look at a cluster running on Amazon Web Services (AWS) with Ubuntu.
We have a few options for setting up the prerequisites for our development environment. While we'll use a Linux client on our local machine in this example, you can also use the Google Cloud Shell to simplify your dependencies and setup. You can check out that documentation at https://cloud.google.com/shell/docs/, and then jump down to the gcloud auth login portion of the tutorial.
Getting back to the local installation, let's make sure that our environment is properly set up before we install Kubernetes. Start by updating the packages:
$ sudo apt-get update
You should see something similar to the following output:
$ sudo apt update
[sudo] password for user:
Hit:1 http://archive.canonical.com/ubuntu xenial InRelease
Ign:2 http://dl.google.com/linux/chrome/deb stable InRelease
Hit:3 http://archive.ubuntu.com/ubuntu xenial InRelease
Get:4 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Ign:5 http://dell.archive.canonical.com/updates xenial-dell-dino2-mlk InRelease
Hit:6 http://ppa.launchpad.net/webupd8team/sublime-text-3/ubuntu xenial InRelease
Hit:7 https://download.sublimetext.com apt/stable/ InRelease
Hit:8 http://dl.google.com/linux/chrome/deb stable Release
Get:9 http://archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Hit:10 https://apt.dockerproject.org/repo ubuntu-xenial InRelease
Hit:11 https://deb.nodesource.com/node_7.x xenial InRelease
Hit:12 https://download.docker.com/linux/ubuntu xenial InRelease
Ign:13 http://dell.archive.canonical.com/updates xenial-dell InRelease
<SNIPPED...>
Fetched 1,593 kB in 1s (1,081 kB/s)
Reading package lists... Done
Building dependency tree
Reading state information... Done
120 packages can be upgraded. Run 'apt list --upgradable' to see them.
$
Install Python and curl if they are not present:
$ sudo apt-get install python
$ sudo apt-get install curl
Install the gcloud SDK:
$ curl https://sdk.cloud.google.com | bash
Configure your GCP account information. This should automatically open a browser, from where we can log in to our Google Cloud account and authorize the SDK:
$ gcloud auth login
A default project should be set, but we can verify this with the following command:
$ gcloud config list project
We can modify this and set a new default project with the following command. Make sure to use project ID and not project name, as follows:
$ gcloud config set project <PROJECT ID>
You can turn on API access to your project at this point in the GCP dashboard, https://console.developers.google.com/project, or the Kubernetes script will prompt you to do so in the next section:
Next, you'll want to change to a directory where you can install the Kubernetes binaries. We'll set that up and then download the software:
$ mkdir ~/code/gsw-k8s-3
$ cd ~/code/gsw-k8s-3
Installing the latest Kubernetes version is done in a single step, as follows:
$ curl -sS https://get.k8s.io | bash
It may take a minute or two to download Kubernetes, depending on your connection speed. Earlier versions would automatically call the kube-up.sh script and start building our cluster. In version 1.5, we need to call the kube-up.sh script ourselves to launch the cluster. By default, it will use Google Cloud and GCE:
$ kubernetes/cluster/kube-up.sh
If you get an error at this point due to missing components, you'll need to add a few pieces to your local Linux box. If you're running the Google Cloud Shell, or are utilizing a VM in GCP, you probably won't see this error:
$ kubernetes_install cluster/kube-up.sh...
Starting cluster in us-central1-b using provider gce
... calling verify-prereqs
missing required gcloud component "alpha"
missing required gcloud component "beta"
$
You can see that these components are missing and are required for leveraging the kube-up.sh script:
You can update the components by adding them to your shell:
$ gcloud components install alpha beta
Your current Cloud SDK version is: 193.0.0
Installing components from version: 193.0.0
┌──────────────────────────────────────────────┐
│ These components will be installed. │
├───────────────────────┬────────────┬─────────┤
│ Name │ Version │ Size │
├───────────────────────┼────────────┼─────────┤
│ gcloud Alpha Commands │ 2017.09.15 │ < 1 MiB │
│ gcloud Beta Commands │ 2017.09.15 │ < 1 MiB │
└───────────────────────┴────────────┴─────────┘
For the latest full release notes, please visit:
https://cloud.google.com/sdk/release_notes
Do you want to continue (Y/n)? y
╔════════════════════════════════════════════════════════════╗
╠═ Creating update staging area ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gcloud Alpha Commands ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Installing: gcloud Beta Commands ═╣
╠════════════════════════════════════════════════════════════╣
╠═ Creating backup and activating new installation ═╣
╚════════════════════════════════════════════════════════════╝
Performing post processing steps...done.
Update done!
After you run the kube-up.sh script, you will see quite a few lines roll past. Let's take a look at them one section at a time:
The preceding screenshot shows the checks for prerequisites, as well as making sure that all components are up to date. This is specific to each provider. In the case of GCE, it will verify that the SDK is installed and that all components are up to date. If not, you will see a prompt at this point to install or update:
Now, the script is turning up the cluster. Again, this is specific to the provider. For GCE, it first checks to make sure that the SDK is configured for a default project and zone. If they are set, you'll see those in the output:
Next, it uploads the server binaries to Google Cloud storage, as seen in the Creating gs:... lines:
It then checks for any pieces of a cluster already running. Then, we finally start creating the cluster. In the output in the preceding screenshot, we can see it creating the master server, IP address, and appropriate firewall configurations for the cluster:
Finally, it creates the minions or nodes for our cluster. This is where our container workloads will actually run. It will continually loop and wait while all the minions start up. By default, the cluster will have four nodes (minions), but K8s supports having more than 1,000 (and soon beyond). We will come back to scaling the nodes later on in this book:
Attempt 1 to create kubernetes-minion-template
WARNING: You have selected a disk size of under [200GB]. This may result in poor I/O performance. For more information, see: https://developers.google.com/compute/docs/disks#performance.
Created [https://www.googleapis.com/compute/v1/projects/gsw-k8s-3/global/instanceTemplates/kubernetes-minion-template].
NAME MACHINE_TYPE PREEMPTIBLE CREATION_TIMESTAMP
kubernetes-minion-template n1-standard-2 2018-03-17T11:14:04.186-07:00
Created [https://www.googleapis.com/compute/v1/projects/gsw-k8s-3/zones/us-central1-b/instanceGroupManagers/kubernetes-minion-group].
NAME LOCATION SCOPE BASE_INSTANCE_NAME SIZE TARGET_SIZE INSTANCE_TEMPLATE AUTOSCALED
kubernetes-minion-group us-central1-b zone kubernetes-minion-group 0 3 kubernetes-minion-template no
Waiting for group to become stable, current operations: creating: 3
Group is stable
INSTANCE_GROUPS=kubernetes-minion-group
NODE_NAMES=kubernetes-minion-group-176g kubernetes-minion-group-s9qw kubernetes-minion-group-tr7r
Trying to find master named 'kubernetes-master'
Looking for address 'kubernetes-master-ip'
Using master: kubernetes-master (external IP: 104.155.172.179)
Waiting up to 300 seconds for cluster initialization.
Now that everything is created, the cluster is initialized and started. Assuming that everything goes well, we will get an IP address for the master:
... calling validate-cluster
Validating gce cluster, MULTIZONE=
Project: gsw-k8s-3
Network Project: gsw-k8s-3
Zone: us-central1-b
No resources found.
Waiting for 4 ready nodes. 0 ready nodes, 0 registered. Retrying.
No resources found.
Waiting for 4 ready nodes. 0 ready nodes, 0 registered. Retrying.
Waiting for 4 ready nodes. 0 ready nodes, 1 registered. Retrying.
Waiting for 4 ready nodes. 0 ready nodes, 4 registered. Retrying.
Found 4 node(s).
NAME STATUS ROLES AGE VERSION
kubernetes-master Ready,SchedulingDisabled <none> 32s v1.9.4
kubernetes-minion-group-176g Ready <none> 25s v1.9.4
kubernetes-minion-group-s9qw Ready <none> 25s v1.9.4
kubernetes-minion-group-tr7r Ready <none> 35s v1.9.4
Validate output:
NAME STATUS MESSAGE ERROR
etcd-1 Healthy {"health": "true"}
scheduler Healthy ok
controller-manager Healthy ok
etcd-0 Healthy {"health": "true"}
Cluster validation succeeded
Also, note that the configuration, along with the cluster management credentials, is stored in /home/<Username>/.kube/config.
Then, the script will validate the cluster. At this point, we are no longer running provider-specific code. The validation script will query the cluster via the kubectl.sh script. This is the central script for managing our cluster. In this case, it checks the number of minions found, registered, and in a ready state. It loops through, giving the cluster up to 10 minutes to finish initialization.
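You can repeat these checks yourself at any time with kubectl, which reads the credentials stored in ~/.kube/config. A minimal sketch:
$ kubectl get nodes              # each node should report a Ready status
$ kubectl get componentstatuses  # health of the scheduler, controller-manager, and etcd
$ kubectl cluster-info           # the master endpoint and cluster service URLs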
After a successful startup, a summary of the minions and the cluster component health is printed on the screen:
Done, listing cluster services:
Kubernetes master is running at https://104.155.172.179
GLBCDefaultBackend is running at https://104.155.172.179/api/v1/namespaces/kube-system/services/default-http-backend:http/proxy
Heapster is running at https://104.155.172.179/api/v1/namespaces/kube-system/services/heapster/proxy
KubeDNS is running at https://104.155.172.179/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
kubernetes-dashboard is running at https://104.155.172.179/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
Metrics-server is running at https://104.155.172.179/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
Grafana is running at https://104.155.172.179/api/v1/namespaces/kube-system/services/monitoring-grafana/proxy
InfluxDB is running at https://104.155.172.179/api/v1/namespaces/kube-system/services/monitoring-influxdb:http/proxy
