Kubernetes has taken the world by storm, becoming the standard infrastructure for DevOps teams to develop, test, and run applications. With significant updates in each chapter, this revised edition will help you acquire the knowledge and tools required to integrate Kubernetes clusters in an enterprise environment.
The book introduces you to Docker and Kubernetes fundamentals, including a review of basic Kubernetes objects. You'll get to grips with containerization and understand its core functionalities, such as creating ephemeral multi-node clusters using KinD. This edition replaces PodSecurityPolicies (PSP) with OPA/Gatekeeper for PSP-like enforcement. You'll integrate your cluster with cloud platforms and tools including MetalLB, ExternalDNS, OpenID Connect (OIDC), Open Policy Agent (OPA), Falco, and Velero. After learning to deploy your core cluster, you'll learn how to deploy Istio and how to run both monolithic applications and microservices in your service mesh. Finally, you will discover how to deploy an entire GitOps platform to Kubernetes using continuous integration and continuous delivery (CI/CD).
Kubernetes – An Enterprise Guide
Second Edition
Effectively containerize applications, integrate enterprise systems, and scale applications in your enterprise
Marc Boorshtein
Scott Surovich
BIRMINGHAM—MUMBAI
Kubernetes – An Enterprise Guide
Second Edition
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Producer: Suman Sen
Acquisition Editor – Peer Reviews: Saby Dsilva
Project Editor: Amisha Vathare
Content Development Editor: Alex Patterson
Copy Editor: Safis Editing
Technical Editor: Aniket Shetty
Proofreader: Safis Editing
Indexer: Manju Arasan
Presentation Designer: Ganesh Bhadwalkar
First published: November 2020
Second edition: December 2021
Production reference: 3170522
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80323-003-0
www.packt.com
Marc Boorshtein has been a software engineer and consultant for twenty years and is currently the CTO of Tremolo Security, Inc. Marc has spent most of his career building identity management solutions for large enterprises, U.S. Government civilian agencies, and local government public safety systems. Marc has recently focused on applying identity to DevOps and Kubernetes, building open source tools for automating the security of infrastructure. Marc is a CKAD, and can often be found in the Kubernetes Slack channels answering questions about authentication and authorization.
To my wife, for supporting me in building Tremolo Security and giving up a full-time salary to do so. To my sons for keeping me on my toes, my mom for raising me and giving me my persistence, and in memory of my dad, who pushed me to be my own boss and start my own company.
Scott Surovich has been in information technology for over 20 years and currently works at a global Tier 1 bank as the Global Container Engineering lead and the product owner for hybrid cloud Kubernetes deployments. Throughout his career he has worked on multiple engineering teams in large enterprises and government agencies. He is active in the community as a co-lead of the CNCF Financial Services Working Group and a contributor to multiple open-source projects.
Scott has authored and reviewed other Kubernetes books, and has written and curated multiple chapters for Google's book, Anthos in Action. He holds the CKA, CKAD, and Mirantis Kubernetes certifications, and he was one of the first people to receive Google's premier certification as a Google Certified Hybrid Multi-Cloud Fellow.
I would like to thank my wife, Kim, for always being supportive and understanding of my technology addiction. To my mother, Adele, and in memory of my father, Gene, for teaching me to never give up, and that I can do anything that I set my mind to. To my brother, Chris, for the friendly competitions that always kept me striving to be better. Finally, a special thank you to my colleagues for not only supporting the book, but my role and the platform offering in the bank.
Sergei Bulavintsev is a cloud solutions architect at Altoros. He is passionate about open source, cloud-native infrastructure, and tools that increase developers' productivity. He has successfully migrated multiple customers to the cloud and Kubernetes, advocating and implementing the GitOps approach. Sergei is an active member of his local cloud-native community and holds industry certifications such as CKA, CKAD, CKS, and RHCA lvl 2.
I would like to thank my wife, Elena, and our two children, Maria and Maxim, for their support and patience. I also thank my family, friends, and colleagues, who have helped me become who I am today.
Join the book's Discord workspace for a monthly Ask me Anything session with the authors:
https://packt.link/K8EntGuide
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Docker and Container Essentials
Technical requirements
Understanding the need for containerization
Understanding why Kubernetes is deprecating Docker
Introducing Docker
Understanding Docker
Containers are ephemeral
Docker images
Image layers
Persistent data
Accessing services running in containers
Installing Docker
Preparing to install Docker
Installing Docker on Ubuntu
Granting Docker permissions
Using the Docker CLI
docker help
docker run
docker ps
docker start and stop
docker attach
docker exec
docker logs
docker rm
Summary
Questions
Deploying Kubernetes Using KinD
Technical requirements
Introducing Kubernetes components and objects
Interacting with a cluster
Using development clusters
Why did we select KinD for this book?
Working with a base KinD Kubernetes cluster
Understanding the node image
KinD and Docker networking
Keeping track of the nesting dolls
Installing KinD
Installing KinD – prerequisites
Installing kubectl
Installing the KinD binary
Creating a KinD cluster
Creating a simple cluster
Deleting a cluster
Creating a cluster config file
Multi-node cluster configuration
Customizing the control plane and Kubelet options
Creating a custom KinD cluster
Installing Calico
Installing an Ingress controller
Reviewing your KinD cluster
KinD storage objects
Storage drivers
KinD storage classes
Using KinD's storage provisioner
Adding a custom load balancer for Ingress
Installation prerequisites
Creating the KinD cluster configuration
Deploying a custom HAProxy container
Understanding HAProxy traffic flow
Simulating a kubelet failure
Summary
Questions
Kubernetes Bootcamp
Technical requirements
An overview of Kubernetes components
Exploring the control plane
The Kubernetes API server
The Etcd database
kube-scheduler
kube-controller-manager
cloud-controller-manager
Understanding the worker node components
kubelet
kube-proxy
Container runtime
Interacting with the API server
Using the Kubernetes kubectl utility
Understanding the verbose option
General kubectl commands
Introducing Kubernetes resources
Kubernetes manifests
What are Kubernetes resources?
Reviewing Kubernetes resources
ConfigMaps
Endpoints
Events
Namespaces
Nodes
Persistent Volume Claims
PVs
Pods
Replication controllers
ResourceQuotas
Secrets
Service accounts
Services
CustomResourceDefinitions
DaemonSets
Deployments
ReplicaSets
StatefulSets
HorizontalPodAutoscalers
CronJobs
Jobs
Ingress
NetworkPolicies
PodSecurityPolicies
ClusterRoleBindings
ClusterRoles
RoleBindings
Roles
CSI drivers
CSI nodes
Storage classes
Summary
Questions
Services, Load Balancing, ExternalDNS, and Global Balancing
Technical requirements
Exposing workloads to requests
Understanding how services work
Creating a service
Using DNS to resolve services
Understanding different service types
The ClusterIP service
The NodePort service
The LoadBalancer service
The ExternalName service
Introduction to load balancers
Understanding the OSI model
Layer 7 load balancers
Name resolution and layer 7 load balancers
Using nip.io for name resolution
Creating Ingress rules
Layer 4 load balancers
Layer 4 load balancer options
Using MetalLB as a layer 4 load balancer
Installing MetalLB
Understanding MetalLB's configuration file
MetalLB components
Creating a LoadBalancer service
Adding multiple IP pools to MetalLB
Using multiple protocols
Multiple protocol issues
Using multiple protocols with MetalLB
Using shared-IPs
Enhancing load balancers for the enterprise
Making service names available externally
Setting up external-dns
Integrating external-dns and CoreDNS
Adding an ETCD zone to CoreDNS
Creating a LoadBalancer service with external-dns integration
Integrating CoreDNS with an enterprise DNS
Load balancing between multiple clusters
Introducing the Kubernetes Global Balancer
Requirements for K8GB
Deploying K8GB to a cluster
Understanding K8GB load balancing options
Customizing the Helm chart values
Using Helm to install K8GB
Deploying a highly available application using K8GB
Adding an application to K8GB using custom resources
Adding an application to K8GB using Ingress annotations
Understanding how K8GB provides global load balancing
Keeping the K8GB CoreDNS servers in sync
Summary
Questions
Integrating Authentication into Your Cluster
Technical requirements
Understanding how Kubernetes knows who you are
External users
Groups in Kubernetes
Service accounts
Understanding OpenID Connect
The OpenID Connect protocol
Following OIDC and the API's interaction
id_token
Other authentication options
Certificates
Service accounts
TokenRequest API
Custom authentication webhooks
Keystone
Configuring KinD for OpenID Connect
Addressing the requirements
Using LDAP and Active Directory with Kubernetes
Mapping Active Directory groups to RBAC RoleBindings
Kubernetes Dashboard access
Kubernetes CLI access
Enterprise compliance requirements
Pulling it all together
Deploying OpenUnison
Configuring the Kubernetes API to use OIDC
Verifying OIDC integration
Using your tokens with kubectl
Introducing impersonation to integrate authentication with cloud-managed clusters
What is Impersonation?
Security considerations
Configuring your cluster for impersonation
Testing Impersonation
Configuring Impersonation without OpenUnison
Impersonation RBAC policies
Default groups
Authenticating from pipelines
Using tokens
Using certificates
Avoiding anti-patterns
Summary
Questions
RBAC Policies and Auditing
Technical requirements
Introduction to RBAC
What's a Role?
Identifying a Role
Roles versus ClusterRoles
Negative Roles
Aggregated ClusterRoles
RoleBindings and ClusterRoleBindings
Combining ClusterRoles and RoleBindings
Mapping enterprise identities to Kubernetes to authorize access to resources
Implementing namespace multi-tenancy
Kubernetes auditing
Creating an audit policy
Enabling auditing on a cluster
Using audit2rbac to debug policies
Summary
Questions
Deploying a Secured Kubernetes Dashboard
Technical requirements
How does the dashboard know who you are?
Dashboard architecture
Authentication methods
Understanding dashboard security risks
Deploying an insecure dashboard
Using a token to log in
Deploying the dashboard with a reverse proxy
Local dashboards
Other cluster-level applications
Integrating the dashboard with OpenUnison
Summary
Questions
Extending Security Using Open Policy Agent
Technical requirements
Introduction to dynamic admission controllers
What is OPA and how does it work?
OPA architecture
Rego, the OPA policy language
Gatekeeper
Deploying Gatekeeper
Automated testing framework
Using Rego to write policies
Developing an OPA policy
Testing an OPA policy
Deploying policies to Gatekeeper
Building dynamic policies
Debugging Rego
Using existing policies
Enforcing memory constraints
Enabling the Gatekeeper cache
Mocking up test data
Building and deploying our policy
Mutating objects and default values
Summary
Questions
Node Security with GateKeeper
Technical requirements
What is node security?
Understanding the difference between containers and VMs
Container breakouts
Properly designing containers
Enforcing node security with GateKeeper
What about Pod security policies?
What are the differences between PSPs and GateKeeper?
Authorizing node security policies
Deploying and debugging node security policies
Generating security context defaults
Enforcing cluster policies
Debugging constraint violations
Scaling policy deployment in multi-tenant clusters
Summary
Questions
Auditing Using Falco, DevOps AI, and ECK
Technical requirements
Exploring auditing
Introducing Falco
Exploring Falco's configuration files
The Helm Values file
Customizing the Helm Values
Falco rules config files
Understanding rules
Understanding conditions (fields and values)
Using macros
Understanding lists
Creating and appending to custom rules
Editing an existing rule
Creating a new rule
Deploying Falco
Introducing Falcosidekick
Installing Falcosidekick
Understanding Kubeless
Installing Kubeless
Deploying a function using Kubeless
Introducing DevOps AI
Understanding automatic responses to events
Deploying the NGINX server and testing connectivity
Simulating an attack on the Pod
Observing Falco events
Using Falcosidekick-ui
Deploying our logging system
Creating a new namespace
Deploying the ECK operator
Deploying Elasticsearch, Filebeat, and Kibana
Using the ECK components to view logs
Creating a Kibana index
Browsing for events
Visualizations
Creating a dashboard
Creating a visualization for Falco event types
Summary
Questions
Backing Up Workloads
Technical requirements
Understanding Kubernetes backups
Performing an etcd backup
Backing up the required certificates
Backing up the etcd database
Introducing and setting up VMware's Velero
Velero requirements
Installing the Velero CLI
Installing Velero
Backup storage location
Deploying MinIO
Exposing MinIO and the console
Creating the S3 target configuration
Using Velero to back up workloads
Running a one-time cluster backup
Scheduling a cluster backup
Creating a custom backup
Managing Velero using the CLI
Using common Velero commands
Listing Velero objects
Retrieving details for a Velero object
Creating and deleting objects
Restoring from a backup
Restoring in action
Restoring a deployment from a backup
Backing up the namespace
Simulating a failure
Restoring a namespace
Using a backup to create workloads in a new cluster
Backing up the cluster
Building a new cluster
Restoring a backup to the new cluster
Installing Velero in the new cluster
Restoring a backup in a new cluster
Deleting the new cluster
Summary
Questions
An Introduction to Istio
Technical requirements
Why should you care about a service mesh?
Workload observability
Traffic management
Blue/Green Deployments
Canary Deployments
Finding issues before they happen
Security
Introduction to Istio concepts
Understanding the Istio components
Making the Control Plane Simple with Istiod
Breaking down the istiod pod
Understanding the istio-ingressgateway
Understanding the istio-egressgateway
Installing Istio
Downloading Istio
Installing Istio using a Profile
Introducing Istio resources
Authorization policies
Example 1: Denying and allowing all access
Example 2: Allowing only GET methods to a workload
Example 3: Allowing requests from a specific source
Gateways
Virtual Services
Destination rules
Peer authentications
Request authentication
Service entries
Sidecars
Envoy filters
Deploying add-on components to provide observability
Installing Prometheus
Installing Jaeger
Installing Kiali
Deploying an application into the service mesh
Deploying your first application into the mesh
Using Kiali to observe mesh workloads
The Kiali overview screen
Using the Graph view
Using the Application view
Using the Workloads view
Using the Services view
The Istio config view
Summary
Questions
Building and Deploying Applications on Istio
Technical requirements
Comparing microservices and monoliths
My history with microservices versus monolithic architecture
Comparing architectures in an application
Monolithic application design
Microservices design
Choosing between monoliths and microservices
Using Istio to help manage microservices
Deploying a monolith
Exposing our monolith outside our cluster
Configuring sticky sessions
Integrating Kiali and OpenUnison
Building a microservice
Deploying Hello World
Integrating authentication into our service
Authorizing access to our service
Telling your service who's using it
Authorizing user entitlements
Authorizing in service
Using OPA with Istio
Calling other services
Using OAuth2 Token Exchange
Passing tokens between services
Using simple impersonation
Do I need an API gateway?
Summary
Questions
Provisioning a Platform
Technical requirements
Designing a pipeline
Opinionated platforms
Securing your pipeline
Building our platform's requirements
Choosing our technology stack
Preparing our cluster
Deploying cert-manager
Deploying the Docker container registry
Deploying OpenUnison and GateKeeper
Deploying GitLab
Creating example projects
Deploying Tekton
Building Hello World
Building automatically
Deploying ArgoCD
Automating project onboarding using OpenUnison
Designing a GitOps strategy
Integrating GitLab
Integrating the TektonCD dashboard
Integrating ArgoCD
Updating OpenUnison
Deploying an application
Creating the application in Kubernetes
Getting access to developers
Deploying dev manifests
Deploying a Tekton pipeline
Running our pipeline
Promoting to production
Summary
Questions
Other Books You May Enjoy
Index
Kubernetes has taken the world by storm, becoming the standard infrastructure for DevOps teams to develop, test, and run applications. Most enterprises are either running it already, or are planning to run it in the next year. A look at job postings on any of the major job sites shows that just about every big-name company has Kubernetes positions open. The fast rate of adoption has led to Kubernetes-related positions growing by over 2,000% in the last 4 years.
One common problem that companies are struggling to address is the lack of enterprise Kubernetes knowledge. Since the technology is relatively new, and even newer for production workloads, companies have had issues trying to build teams to run clusters reliably. Finding people with basic Kubernetes skills is becoming easier, but finding people with knowledge on topics that are required for enterprise clusters is still a challenge.
We created this book to help DevOps teams expand their skills beyond the basics of Kubernetes. It draws on our years of experience working with clusters in multiple enterprise environments.
There are many books available that introduce Kubernetes and the basics of installing clusters, creating deployments, and using Kubernetes objects. Our plan was to create a book that would go beyond a basic cluster, and to keep the book a reasonable length, we will not re-hash the basics of Kubernetes. Readers should have some experience with Kubernetes before reading this book.
While the primary focus of the book is to extend clusters with enterprise features, the first section of the book will provide a refresher on key Docker topics and Kubernetes objects. It is important that you have a solid understanding of Kubernetes objects in order to get the most out of the more advanced chapters.
Chapter 1, Docker and Container Essentials, covers the problems Docker and Kubernetes address for developers. You will be introduced to Docker, including the Docker daemon, data, installation, and using the Docker CLI.
Chapter 2, Deploying Kubernetes Using KinD, covers KinD, a powerful tool that allows you to create a Kubernetes cluster ranging from a single node cluster to a full multi-node cluster. The chapter goes beyond a basic KinD cluster, explaining how to use a load-balancer running HAProxy to load-balance worker nodes. By the end of this chapter, you will understand how KinD works and how to create a custom multi-node cluster, which will be used for the exercises in the chapters.
Chapter 3, Kubernetes Bootcamp, provides a refresher on Kubernetes, and if you are new to Kubernetes, this chapter will cover most of the objects that a cluster includes. It will explain each object with a description of what each object does and its function in a cluster. It is meant to be a refresher, or a "pocket guide" to objects. It does not contain exhaustive details for each object (that would require a second book).
Chapter 4, Services, Load Balancing, ExternalDNS, and Global Balancing, explains how to expose a Kubernetes deployment using services. Each service type will be explained with examples, and you will learn how to expose them using both layer 7 and layer 4 load balancers. In this chapter, you will go beyond the basics of a simple Ingress controller, installing MetalLB to provide layer 4 access to services. You will also learn about two add-ons that benefit enterprise clusters: external-dns, an incubator project that provides dynamic name resolution for the services exposed by MetalLB, and K8GB, which provides native Kubernetes global load balancing.
Chapter 5, Integrating Authentication into Your Cluster, answers the question "once your cluster is built, how will users access it?" In this chapter we'll detail how OpenID Connect works and why you should use it to access your cluster. You'll also learn how to authenticate your pipelines, and finally we'll also cover several anti-patterns that should be avoided and explain why they should be avoided.
Chapter 6, RBAC Policies and Auditing, explains that once users have access to a cluster, you need to know how to limit their access. Whether you are providing an entire cluster to your users or just a namespace, you'll need to know how Kubernetes authorizes access via its role-based access control (RBAC) system. In this chapter, we'll detail how to design RBAC policies, how to debug them, and different strategies for multi-tenancy.
Chapter 7, Deploying a Secured Kubernetes Dashboard, covers the Kubernetes Dashboard, which is often the first thing users try to launch once a cluster is up and running. There's quite a bit of mythology around the security (or lack thereof).
Your cluster will be made up of other web applications too, such as network dashboards, logging systems, and monitoring dashboards. This chapter looks at how the dashboard is architected, how to properly secure it, and examples of how not to deploy it, with details as to why.
Chapter 8, Extending Security Using Open Policy Agent, provides you the guidance you need to deploy the OpenPolicyAgent and GateKeeper to enable policies that can't be implemented using RBAC. We'll cover how to deploy GateKeeper, how to write policies in Rego, and how to test your policies using OPA's built-in testing framework.
Chapter 9, Node Security with GateKeeper, deals with the security of the nodes that run your Pods. We will discuss how to securely design your containers so they are harder to abuse, and how to build policies using GateKeeper that constrain your containers from accessing resources they don't need.
Chapter 10, Auditing Using Falco, DevOps AI, and ECK, explains that Kubernetes includes event logging for API access, but it doesn't have the ability to capture container runtime events. To address this limitation, we will install a project that was donated to the CNCF by Sysdig called Falco. Using Falco, you will learn how to trigger actions based on events captured by Falco using Kubeless functions, and how to present the data that is captured by Falco using Falcosidekick to forward events to the Falcosidekick-UI and the ECK stack (Elastic Cloud on Kubernetes).
Chapter 11, Backing Up Workloads, explains how to create a backup of your cluster workloads for disaster recovery, or cluster migrations, using Velero. You will go hands-on to create an S3-compatible storage location using MinIO to create a backup of example workloads and restore the backup to a brand new cluster to simulate a cluster migration.
Chapter 12, An Introduction to Istio, explains that many enterprises use a service mesh to provide advanced features such as security, traffic routing, authentication, tracing, and observability to a cluster. This chapter will introduce you to Istio, a popular open-source mesh, and its architecture, along with the most commonly used resources it provides. You will deploy Istio to your KinD cluster with an example application and learn how to observe the behavior of an application using an observability tool called Kiali.
Chapter 13, Building and Deploying Applications on Istio, acknowledges that once you've deployed Istio, you'll want to develop and deploy applications that use it! This chapter starts with a walk-through of the differences between monoliths and microservices and how they're deployed. Next, we'll step through building a microservice to run in Istio and get into advanced topics like authentication, authorization, and service-to-service authentication for your services. You will also learn how to secure Kiali access by leveraging existing roles in Kubernetes using an OIDC provider and JSON Web Tokens.
Chapter 14, Provisioning a Platform, discusses how to build a platform for automating a multi-tenant cluster with GitLab, Tekton, ArgoCD, GateKeeper, and OpenUnison. We'll explore how to build pipelines and how to automate their creation. We'll explore how the objects that are used to drive pipelines are related to each other, how to build relationships between systems, and finally, how to create a self-service workflow for automating the deployment of pipelines.
You should have a basic understanding of Linux, basic commands, and tools such as Git and a text editor like vi.
The book chapters contain both theory and hands-on exercises. We feel that the exercises help to reinforce the theory, but they are not required to understand each topic. If you want to do the exercises in the book, you will need to meet the requirements in the table below.
Requirements for the chapter exercises:

Requirement      Version
Ubuntu Server    20.04 or higher
All exercises use Ubuntu, but most of them will work on other Linux installations. Chapter 10, Auditing Using Falco, DevOps AI, and ECK, has steps that are specific to Ubuntu, and the exercises will likely fail to deploy correctly on other Linux installations.
The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Kubernetes---An-Enterprise-Guide-2E. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803230030_ColorImages.pdf.
Here's a link to the YouTube channel (created and managed by the authors Marc Boorshtein and Scott Surovich) that contains videos of the labs from this book, so you can see them in action even before you start on your own: https://packt.link/N5qjd
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "The --name option will set the name of the cluster to cluster01, and --config tells the installer to use the cluster01-kind.yaml config file."
A block of code is set as follows:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: grafana
  name: grafana
  namespace: monitoring

Any command-line input or output is written as follows:
PS C:\Users\mlb> kubectl create ns not-going-to-work
namespace/not-going-to-work created

Bold: Indicates a new term, an important word, or words that you see on the screen, for example, in menus or dialog boxes. For example: "Hit the Finish Login button at the bottom of the screen."
Warnings or important notes appear like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome.
General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.
Once you've read Kubernetes – An Enterprise Guide, Second Edition, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
Containers are one of the most transformational technologies that we have seen in years. Technology companies, corporations, and end users have all adopted them to handle everyday workloads. Increasingly, commercial off-the-shelf (COTS) applications are transforming from traditional installations into fully containerized deployments. With such a large technology shift, it is essential for anyone in the information technology realm to learn about containers.
In this chapter, we will introduce the problems that containers address. After an introduction to why containers are important, we will introduce the runtime that launched the modern container frenzy, Docker, and explain its relationship to Kubernetes. We'll also cover how Kubernetes' recent deprecation of support for Docker as a runtime impacts the use of Docker and why you should still be familiar with how to use it. By the end of this chapter, you will understand how to install Docker and how to use the most common Docker CLI commands.
In this chapter, we will cover the following topics:
- Understanding the need for containerization
- Kubernetes deprecating Docker
- Understanding Docker
- Installing Docker
- Using the Docker CLI

Before we begin, you may have read that Kubernetes will be deprecating Docker as a compatible runtime in an upcoming release. This change will affect many businesses that work with containerization and Kubernetes. We will dig into it in the Understanding why Kubernetes is deprecating Docker section, but rest assured that Docker is still the best way to introduce you to containers and the advantages that they deliver. It will still be used on many systems that run containers locally, rather than with an orchestration platform like Kubernetes.
This chapter has the following technical requirements:
- An Ubuntu 20.04+ server with a minimum of 4 GB of RAM, though 8 GB is suggested.
- You can access the code for this chapter by going to this book's GitHub repository: https://github.com/PacktPublishing/Kubernetes---An-Enterprise-Guide-2E/tree/main/chapter1

You may have experienced a conversation like this at your office or school:
Developer: "Here's the new application. It went through weeks of testing and you are the first to get the new release."
….. A little while later …..
User: "It's not working. When I click the submit button, it shows an error about a missing dependency."
Developer: "That's weird; it's working fine on my machine."
This is one of the most frustrating things a developer can encounter when delivering an application. Often, the issues that creep up are related to a library that the developer had on their machine, but it wasn't included in the distribution of the package. It may seem like an easy fix for this would be to include all the libraries alongside the release, but what if this release contains a newer library that overwrites the older version, which may be required for a different application?
Developers need to consider their new releases, as well as any potential conflicts with any existing software on users' workstations. This often becomes a careful balancing act that requires larger deployment teams to test the application on different system configurations. It can also lead to additional rework for the developer or, in some extreme cases, full incompatibility with an existing application.
There have been various attempts to make application delivery easier over the years. First, there are solutions such as VMware's ThinApp, which virtualizes an application (not to be confused with virtualizing an operating system). It allows you to package the application and its dependencies into a single executable package. This packaging eliminates the issues of an application's dependencies conflicting with another application's dependencies since the application is in a self-contained package. This provided application isolation not only eliminates dependency issues but also provides an enhanced level of security and eases the burden of operating system migrations.
You may or may not have heard of application packaging, or the term application-on-a-stick, before reading this book. It sounds like a great solution to the "it worked on my machine" issue. There are many reasons it hasn't taken off as expected, though. For starters, most offerings are paid solutions that require a substantial investment. Besides licensing, they require a "clean PC," which means that for every application you want to virtualize, you need to start with a base system. The package you want to create uses the differences between the base installation and anything that was added after the initial system snapshot. The differences are then packaged into your distribution file, which can be executed on any workstation.
We've mentioned application virtualization to highlight that application issues such as "it works on my machine" have had different solutions over the years. Products such as ThinApp are just one attempt at solving the problem. Other attempts include running the application on a server using Citrix, Remote Desktop, Linux containers, chroot jails, and even virtual machines.
In December 2020, Kubernetes announced the deprecation of Docker as a supported container runtime. We thought it would be important to explain how the announcement affects any reason for using, or not using, Docker.
The announcement is only related to using Docker as the container runtime in a cluster – it is important to note that this is the only impact that removing Docker will have. You can still create new containers using Docker and they will run on any runtime that supports the Open Container Initiative (OCI) specification.
When you create a container using Docker, you are creating a container that is OCI compliant, so it will still run on Kubernetes clusters that are running any Kubernetes-compatible container runtime.
To fully explain the impact and the alternatives that will be supported, we need to understand what a container runtime is. A high-level definition would be that a container runtime is the software layer that runs and manages containers. Like many components that make up a Kubernetes cluster, the runtime is not included as part of Kubernetes – it is a pluggable module that needs to be supplied by a vendor, or by you, to create a functioning cluster.
There are many technical reasons that led up to the decision to deprecate Docker, but at a high level, the main concerns were:
- Docker contains multiple pieces inside of the Docker executable to support its own remote API and user experience (UX). Kubernetes only requires one component in the executable, dockerd, which is the runtime process that manages containers. All other pieces of the executable contribute nothing to using Docker in a Kubernetes cluster. These extra components made the binary bloated, and could lead to additional bugs, security, or performance issues.
- Docker does not conform to the Container Runtime Interface (CRI) standard, which was introduced to create a set of standards to easily integrate container runtimes in Kubernetes. Since it doesn't comply, the Kubernetes team has had extra work that only caters to supporting Docker.

When it comes to local container testing and development, you can still use Docker on your workstation or server. Building on the previous statement, if you build a container on Docker and the container successfully runs on your Docker runtime system, it will run on a Kubernetes cluster that is not using Docker as the runtime.
Removing Docker will have very little impact on most users of Kubernetes in new clusters. Containers will still run using any standard method, as they would with Docker as the container runtime. If you happen to manage a cluster, you may need to learn new commands when you troubleshoot a Kubernetes node – you will not have the docker command on the node to look at running containers, or to clean up volumes, and so on.
At the time of writing this chapter, Kubernetes supports the following runtimes in place of Docker:
- containerd
- Rocket (rkt)
- CRI-O
- Frakti
- cri-containerd: https://github.com/containerd/cri
- singularity-cri: https://github.com/sylabs/singularity-cri

This list will evolve; you can always view the latest supported runtimes on the Kubernetes GitHub at https://github.com/kubernetes/community/blob/master/contributors/devel/sig-node/container-runtime-interface.md.
Since we are focusing on general containers and we will be using Docker as our runtime to create KinD clusters, we will not go into too many details on the alternative runtimes. They are only being mentioned here to explain the alternatives that can be used on a cluster.
For more details on the impact of deprecating Docker, refer to the article called Don't Panic: Kubernetes and Docker on the Kubernetes.io site at https://kubernetes.io/blog/2020/12/02/dont-panic-kubernetes-and-docker/.
Now, let's introduce Docker and how you can use it to create and manage containers.
The industry and even end users needed something that was easier and cheaper – enter Docker containers. Containers are not a new technology; they have been used in various forms for years. What Docker did was make them accessible to the average developer.
Docker brought an abstraction layer to the masses. It was easy to use and didn't require a clean PC for every application before creating a package, thus offering a solution for dependency issues, but most attractive of all, it was free. Docker became a standard for many projects on GitHub, where teams would often create a Docker container and distribute the Docker image or Dockerfile to team members, providing a standard testing or development environment. This adoption by end users is what eventually brought Docker to the enterprise and, ultimately, what made it the standard it has become today.
While there are many books on Docker, this book focuses on the base topics of Docker that are used to interact with containers. This book will be focusing on what you will need to know when trying to use a local Kubernetes environment. There is a long and interesting history of Docker and how it evolved into the standard container image format that we use today. We encourage you to read about the company and how they ushered in the container world we know today.
While our focus is not to teach Docker inside out, we felt that those of you who are new to Docker would benefit from a quick primer on general container concepts.
If you have some Docker experience and understand terminology such as ephemeral and stateless, feel free to continue to the Installing Docker section.
This book was created with the assumption that you have some basic knowledge of Docker and container concepts. We realize that not everyone may have played with Docker or containers in the past, so we wanted to present a crash course on container concepts and using Docker.
If you are new to containers, we suggest reading the documentation that can be found on Docker's website for additional information: https://docs.docker.com/.
The first thing to understand is that container images are ephemeral.
For those of you who are new to Docker, the term "ephemeral" means short-lived. By design, a container can be destroyed at any time and brought back up with no interaction from a user. In the following example, someone interactively adds files to a web server. These added files are only temporary since the base image does not have these files included in it.
This means that once a container is created and running, any changes that are made to the image will not be saved once the container is removed, or destroyed, from the Docker host. Let's look at an example:
1. You start a container running a web server using NGINX on your host without any base HTML pages.
2. Using a Docker command, you execute a copy command to copy some web files into the container's filesystem.
3. To test that the copy was successful, you browse to the website and confirm that it is serving the correct web pages.
4. Happy with the results, you stop the container and remove it from the host. Later that day, you want to show a co-worker the website and you start your NGINX container. You browse to the website again, but when the site opens, you receive a 404 error (page not found error).

What happened to the files you uploaded before you stopped and removed the container from the host?
The reason your web pages cannot be found after the container was restarted is that all containers are ephemeral.
Whatever is in the base container image is all that will be included each time the container is initially started. Any changes that you make inside a container are short-lived.
If you need to add permanent files to an existing image, you need to rebuild the image with the files included or, as we will explain in the Persistent data section later in this chapter, you could mount a Docker volume in your container. At this point, the main concept to understand is that containers are ephemeral.
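To make this concrete, the following is a minimal sketch of the scenario above using the docker CLI; the container name and the /app path are illustrative assumptions based on the bitnami/nginx image defaults:

# Start an NGINX container with no custom HTML baked into the image
docker run --name web-test -d bitnami/nginx:latest

# Copy a page into the running container's filesystem (container layer only)
docker cp index.html web-test:/app/index.html

# Remove the container; the container layer, and the copied file, are deleted
docker stop web-test && docker rm web-test

# A new container from the same image starts clean; index.html is gone
docker run --name web-test -d bitnami/nginx:latest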
But wait! You may be wondering, "If containers are ephemeral, how did I add web pages to the server?". Ephemeral just means that changes will not be saved; it doesn't stop you from making changes to a running container.
Any changes made to a running container will be written to a temporary layer, called the container layer, which is a directory on the local host filesystem. The Docker storage driver is in charge of handling requests that use the container layer. This location will store any changes in the container's filesystem so that when you add the HTML pages to the container, they will be stored on the local host. The container layer is tied to the container ID of the running image and it will remain on the host system until the container is removed from Docker, either by using the CLI or by running a Docker prune job.
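If you need to reclaim the space used by the container layers of stopped containers, a prune is one way to do it; both of the following are standard docker CLI commands:

# Remove all stopped containers, including their container layers
docker container prune

# More aggressively, also remove unused networks and dangling images
docker system prune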
If a container is ephemeral and the image cannot be written to, how can you modify data in the container? Docker uses image layering to create multiple linked layers that appear as a single filesystem.
At a high level, a Docker image is a collection of image layers, each with a JSON file that contains metadata for the layer. These are all combined to create the running application that you interact with when a container image is started.
You can read more about the contents of an image on Docker's GitHub at https://github.com/moby/moby/blob/master/image/spec/v1.md.
As we mentioned in the previous section, a running container uses a container layer that is "on top" of the base image layer, as shown in the following diagram:
Figure 1.1: Docker image layers
The image layers cannot be written to since they are in a read-only state, but the temporary container layer is in a writeable state. Any data that you add to the container is stored in this layer and will be retained as long as the container is running.
To deal with multiple layers efficiently, Docker implements copy-on-write, which means that if a file already exists, it will not be created. However, if a file is required that does not exist in the current image, it will be written. In the container world, if a file exists in a lower layer, the layers above it do not need to include it. For example, if layer 1 had a file called /opt/nginx/index.html in it, layer 2 does not need the same file in its layer.
This explains how the system handles files that either exist or do not exist, but what about a file that has been modified? There will be times when you'll need to "replace" a file that is in a lower layer. You may need to do this when you are building an image or as a temporary fix to a running container issue. The copy-on-write system knows how to deal with these issues. Since images are read from the top down, the container uses only the highest layer's copy of a file. If your system had a /opt/nginx/index.html file in layer 1 and you modified and saved the file, the running container would store the new file in the container layer. Since the container layer is the topmost layer, the new copy of index.html would always be read before the older version in the image layer.
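As a quick illustration of layering and copy-on-write, consider this hypothetical Dockerfile; each COPY instruction creates a new image layer, and the file in the later layer shadows the copy in the earlier one (the filenames and /app path are assumptions for the example):

FROM bitnami/nginx:latest
# Layer: adds the original page
COPY index.html /app/index.html
# Layer: "replaces" the page; the old copy still exists in the lower layer,
# but the container only ever reads the topmost version
COPY index-v2.html /app/index.html

After building the image, you can list its layers with docker history <image name>.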
Being limited to ephemeral-only containers would severely limit the use cases for Docker. It is very likely that you will have some use cases that will require persistent storage, or data that will remain if you stop a container.
This may seem like we are contradicting our earlier statement that containers are ephemeral, but that is still true. When you store data in the container image layer, the base image does not change. When the container is removed from the host, the container layer is also removed. If the same image is used to start a new container, a new container image layer is also created. So, the container is ephemeral, but by adding a Docker volume to the container, you can store data outside of the container, thus gaining data persistency.
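A minimal sketch of a named volume providing persistence follows; the volume and container names are illustrative:

# Create a named volume that is managed by Docker
docker volume create web-data

# Mount the volume into the container; files written under /app land in the volume
docker run --name web-test -d -v web-data:/app bitnami/nginx:latest

# Removing the container does not remove the volume or its data
docker rm -f web-test
docker volume ls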
Unlike a physical machine or a virtual machine, containers do not connect to a network directly. When a container needs to send or receive traffic, it goes through the Docker host system using a bridged NAT network connection. This means that when you run a container and you want to receive incoming traffic requests, you need to expose the ports for each of the containers that you wish to receive traffic on. On a Linux-based system, iptables has rules to forward traffic to the Docker daemon, which will service the assigned ports for each container.
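For example, to receive traffic from outside the host, you publish a port when starting the container; here the host port choice is arbitrary, and we assume bitnami/nginx listening on its default port of 8080:

# Forward host port 8080 to the container's port 8080
docker run --name web-test -d -p 8080:8080 bitnami/nginx:latest

# Requests to the host are now forwarded to the container
curl http://localhost:8080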
That completes the introduction to base containers and Docker. In the next section, we will explain how to install Docker on a host.
The hands-on exercises in this book will require that you have a working Docker host. You can follow the steps in this book, or you can execute the script located in this book's GitHub repository, in the chapter1 directory, called install-docker.sh.
Today, you can install Docker on just about every hardware platform out there. Each version of Docker acts and looks the same on each platform, making development and using Docker easy for people who need to develop cross-platform applications. By making the functions and commands the same between different platforms, developers do not need to learn a different container runtime to run images.
The following is a table of Docker's available platforms. As you can see, there are installations for multiple operating systems, as well as multiple CPU architectures:
Figure 1.2: Available Docker platforms
Images that are created using one architecture cannot run on a different architecture. This means that you cannot create an image based on x86 hardware and expect that same image to run on your Raspberry Pi running an ARM processor. It's also important to note that while you can run a Linux container on a Windows machine, you cannot run a Windows container on a Linux machine.
The installation procedures that are used to install Docker vary between platforms. Luckily, Docker has documented many of the installation procedures on their website: https://docs.docker.com/install/.
In this chapter, we will install Docker on an Ubuntu 20.04 system. If you do not have an Ubuntu machine to install on, you can still read about the installation steps, as each step will be explained and does not require that you have a running system to understand the process. If you have a different Linux installation, you can use the installation procedures outlined on Docker's site at https://docs.docker.com/. Steps are provided for CentOS, Debian, Fedora, and Ubuntu, and there are generic steps for other Linux distributions.
Before we start the installation, we need to consider what storage driver to use. The storage driver is what provides the union filesystem, which manages the layers of the container and how the writeable layer of the container is accessed.
In most installations, you won't need to change the default storage driver, since a default option will be selected. If you are running a Linux kernel of version 4.0 or newer, your Docker installation will use the overlay2 storage driver; earlier kernels will install the AUFS storage driver.
For reference, along with the overlay2 and AUFS drivers, Docker supports the btrfs storage driver. However, AUFS and btrfs are rarely used in new systems and are only mentioned here as a reference.
If you would like to learn about each storage driver, take a look at the following Docker web page, which details each driver and its use cases: https://docs.docker.com/storage/storagedriver/select-storage-driver/.
Now that you understand the storage driver requirements, the next step is to select an installation method. You can install Docker using one of three methods:
- Add the Docker repositories to your host system
- Install the package manually
- Use a supplied installation script from Docker

The first option is considered the best since it allows for easy installation of, and updates to, the Docker engine. The second option is useful for enterprises that do not have internet access on their servers, also known as air-gapped servers. The third option is used to install edge and testing versions of Docker and is not suggested for production use.
Since the preferred method is to add Docker's repository to our host, we will use that option and explain the process we should use to add the repository and install Docker.
Now that we have finished preparing everything, let's install Docker. (If you ran the install script from the book repo, you do not need to execute any of the installation steps)
1. The first step is to update the package index by executing apt-get update:

sudo apt-get update

2. Next, we need to add any packages that may be missing on the host system to allow HTTPS apt access:

sudo apt-get install -y apt-transport-https ca-certificates curl gnupg lsb-release

3. To pull packages from Docker's repository, we need to add their keys. You can add keys by using the following command, which will download the gpg key and add it to your system:

curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg

4. Now, add Docker's repository to your host system:
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/nullWith all the prerequisites completed, you can install Docker on your server:
sudo apt-get update && sudo apt-get install -y docker-ce docker-ce-cli containerd.io

6. Docker is now installed on your host, but like most new services, Docker is not currently running and has not been configured to start with the system. To start Docker and enable it on startup, use the following command:

sudo systemctl enable docker && sudo systemctl start docker

Now that we have Docker installed, let's get some configuration out of the way. First, we'll grant permissions to Docker.
In a default installation, Docker requires root access, so you will need to run all Docker commands as root. Rather than using sudo with every Docker command, you can add your user account to a new group on the server that provides Docker access without requiring sudo for every command.
If you are logged on as a standard user and try to run a Docker command, you will receive an error:
Got permission denied while trying to connect to the Docker daemon socket at unix:///var/run/docker.sock: Get http://%2Fvar%2Frun%2Fdocker.sock/v1.40/images/json: dial unix /var/run/docker.sock: connect: permission denied

To allow your user, or any other user you may want to add, to execute Docker commands, you need to add the users to a new group called docker that was created during the installation of Docker. The following is an example command you can use to add the currently logged-on user to the group:
sudo usermod -aG docker $USER

To apply the new group membership to your account, you can either log off and log back into the Docker host, or activate the group changes using the newgrp command:
newgrp docker

Finally, you can test that it works by running the standard hello-world image (note that we do not require sudo to run the Docker command):
docker run hello-world

You should see the output shown below, which verifies that your user has access to Docker:
Unable to find image 'hello-world:latest' locally
latest: Pulling from library/hello-world
2db29710123e: Pull complete
Digest: sha256:37a0b92b08d4919615c3ee023f7ddb068d12b8387475d64c622ac30f45c29c51
Status: Downloaded newer image for hello-world:latest

Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the hello-world image from Docker Hub (amd64).
3. The Docker daemon created a new container from that image, which runs the executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it to your terminal.

To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash

Share images, automate workflows, and more with a free Docker ID: https://hub.docker.com/
For more examples and ideas, visit https://docs.docker.com/get-started/
Now that we've granted Docker permission to run without sudo, we can start unlocking the commands at our disposal by learning how to use the Docker CLI.
You used the Docker CLI when you ran the hello-world container to test your installation. The Docker command is what you will use to interact with the Docker daemon. Using this single executable, you can do the following, and more:
- Start and stop containers
- Pull and push images
- Run a shell in an active container
- Look at container logs
- Create Docker volumes
- Create Docker networks
- Prune old images and volumes

This chapter is not meant to include an exhaustive explanation of every Docker command; instead, we will explain some of the common commands that you will need to use to interact with the Docker daemon and containers. Since we consider volumes and networking to be very important topics to understand for this book, we will go into additional details on those topics.
You can break down Docker commands into two categories: general Docker commands and Docker management commands. The standard Docker commands allow you to manage containers, while management commands allow you to manage Docker options such as managing volumes and networking.
It's common to forget an option or the syntax for a command, and Docker realizes this. Whenever you get stuck trying to remember a command, you can always use the docker help command to refresh your memory.
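For example, all of the following are standard ways to get help from the CLI:

# Top-level help: lists the general and management commands
docker help

# Help for a single command, including every supported option
docker help run
docker run --help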
To run a container, use the docker run command with the provided image name. Before executing a docker run command, you should understand the options you can supply when starting a container.
In its simplest form, an example command you can use to run an NGINX web server would be docker run bitnami/nginx:latest. While this will start a container running NGINX, it will run in the foreground, showing logs of the application running in the container. Press Ctrl + C to stop running the container:
nginx 22:52:27.42
nginx 22:52:27.42 Welcome to the Bitnami nginx container
nginx 22:52:27.43 Subscribe to project updates by watching https://github.com/bitnami/bitnami-docker-nginx
nginx 22:52:27.43 Submit issues and feature requests at https://github.com/bitnami/bitnami-docker-nginx/issues
nginx 22:52:27.44
nginx 22:52:27.44 INFO ==> ** Starting NGINX setup **
nginx 22:52:27.49 INFO ==> Validating settings in NGINX_* env vars
nginx 22:52:27.50 INFO ==> Initializing NGINX
nginx 22:52:27.53 INFO ==> ** NGINX setup finished! **
nginx 22:52:27.57 INFO ==> ** Starting NGINX **

To run a container as a background process, you need to add the -d option to your Docker command, which will run your container in detached mode. Now, when you run a detached container, you will only see the container ID, instead of the interactive, or attached, screen:
Figure 1.3: Container ID displayed
[root@localhost ~]# docker run -d bitnami/nginx:latest
13bdde13d0027e366a81d9a19a56c736c28feb6d8354b363ee738d2399023f80
[root@localhost ~]#

By default, containers will be given a random name once they are started. In our previous detached example, the container has been given the name silly_keldysh:
CONTAINER ID        IMAGE                  NAMES
13bdde13d002        bitnami/nginx:latest   silly_keldysh

If you do not assign a name to your container, it can quickly get confusing as you start to run multiple containers on a single host. To make management easier, you should always start your container with a name that will make it easier to manage. Docker provides another option with the run command: the --name option. Building on our previous example, we will name our container nginx-test. Our new docker run command will be as follows:
docker run --name nginx-test -d bitnami/nginx:latest

Just like running any detached image, this will return the container ID, but not the name you provided. In order to verify the container ran with the name nginx-test, we can list the containers using the docker ps command.
Every day, you will need to retrieve a list of running containers or a list of containers that have been stopped. The Docker CLI has an option called ps that will list all running containers, or if you add an extra option to the ps command, all containers that are running and have been stopped. The output will list the containers, including their container ID, image tag, entry command, the creation date, status, ports, and the container name. The following is an example of containers that are currently running:
Figure 1.4: Currently running containers
CONTAINER ID   IMAGE                  COMMAND                  STATUS
13bdde13d002   bitnami/nginx:latest   "/opt/bitnami/script…"   Up 4 hours
3302f2728133   registry:2             "/entrypoint.sh /etc…"   Up 3 hours

This is helpful if the container you are looking for is currently running. What if the container was stopped, or even worse, what if you started the container and it failed to start and then stopped? You can view the status of all containers, including previously run containers, by adding the -a option to the docker ps command. When you execute docker ps -a, you will see the same output from a standard ps command, but you will notice that the list may include additional containers.
How can you tell what containers are running versus which ones have stopped? If you look at the STATUS field of the list, the running containers will show a running time; for example, Up xx hours, or Up xx days. However, if the container has been stopped for any reason, the status will show when it stopped; for example, Exited (0) 10 minutes ago.
IMAGE                  COMMAND                  CREATED          STATUS
bitnami/nginx:latest   "/opt/bitnami/script…"   10 minutes ago   Up 10 minutes
bitnami/nginx:latest   "/opt/bitnami/script…"   12 minutes ago   Exited (0) 10 minutes ago

A stopped container does not mean there was an issue running the image. There are containers that may execute a single task and, once completed, the container may stop gracefully. One way to determine whether an exit was graceful or whether it was due to a failed startup is to check the logs of the container.
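If you only want to see stopped containers, docker ps also supports filtering; a small sketch using the standard --filter option:

# List only containers that have exited
docker ps -a --filter "status=exited"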
To stop a running container, use the docker stop command with the name of the container you want to stop. You may wish to stop a container to free up resources on the host, since you may have limited resources and can only run a few containers simultaneously.
If you need to start that container at a future time for additional testing or development, execute docker start container name, which will start the container with all of the options that it was originally started with, including any networks or volumes that were assigned.
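Putting the two together, a short sketch using the nginx-test container from earlier:

# Stop the container; its container layer and settings are retained
docker stop nginx-test

# Start it again later with the same options, networks, and volumes
docker start nginx-test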
You may need to access a container interactively to troubleshoot an issue or to look at a log file. One method to connect to a running container is to use the docker attach container name command. When you attach to a running container, you will connect to the running container's process, so if you attach to a container running a process, you are not likely to just see a command prompt of any kind. In fact, you may see nothing but a blank screen for some time until the container outputs some data to the screen.
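A short sketch of attaching to the nginx-test container; note that pressing Ctrl + C while attached will, by default, stop the container, so the detach key sequence is worth knowing:

# Attach to the running container's process (its stdout/stderr)
docker attach nginx-test

# Detach without stopping the container using the default key sequence:
# press Ctrl + P, then Ctrl + Q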
