Knowing how to use Rancher enables you to manage multiple clusters and applications without being locked into a vendor’s platform. This book will guide you through Rancher’s capabilities while deepening your understanding of Kubernetes and helping you to take your applications to a new level.
The book begins by introducing you to Rancher and Kubernetes, helping you to learn and implement best practices. As you progress through the chapters, you’ll understand the strengths and limitations of Rancher and Kubernetes and discover all the different ways to deploy Rancher. You’ll also find out how to design and deploy Kubernetes clusters to match your requirements. The concluding chapters will show you how to set up a continuous integration and continuous deployment (CI/CD) pipeline for deploying applications into a Rancher cluster, along with covering supporting services such as image registries and Helm charts.
By the end of this Kubernetes book, you’ll be able to confidently deploy your mission-critical production workloads on Rancher-managed Kubernetes clusters.
Manage enterprise Kubernetes seamlessly with Rancher
Matthew Mattox
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Rahul Nair
Publishing Product Manager: Meeta Rajani
Senior Editor: Athikho Sapuni Rishana
Content Development Editor: Sayali Pingale
Technical Editor: Nithik Cheruvakodan
Copy Editor: Safis Editing
Associate Project Manager: Neil Dmello
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Sinhayna Bais
Marketing Coordinator: Nimisha Dua
Senior Marketing Coordinator: Sanjana Gupta
First published: June 2022
Production reference: 1070622
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
978-1-80324-609-3
www.packt.com
To my wife, Samantha Mattox, for being my foundation during this life journey. I want to say thank you for putting up with all of the long and late nights, helping when I have writer's block, and always pushing me forward to achieve my goals and aspirations.
– Matthew Mattox
Since 2019, Matthew Mattox has served as a SUSE principal support engineer, providing client-focused support. Having experience in both engineering and DevOps, Matthew has a deep understanding of Kubernetes, Docker, Rancher, Longhorn, and OPA Gatekeeper. Apart from designing custom solutions to solve changing problems, he was named "Bullfighter of the Year" for his outstanding work at Rancher Labs. One of his primary goals is to make IT a profit center within your company, not a cost center.
I want to thank all of the people at Rancher/SUSE for all that they have taught me.
In particular, I would like to thank…
Bala Gopalan – Thank you for giving me the tools to succeed.
Ahmad Emneina – Thank you for being a great mentor and always being someone I can count on when I run out of ideas.
Alena Prokharchyk – Thank you for teaching me all of the ins and outs of Rancher; you really inspired me to get into software development.
Hussein Galal – Thank you for teaching me what Kubernetes is and what it can do, and for always being someone I can count on.
Also, thank you to all of the Rancher/SUSE team for their support.
Lucas Ramage is a co-founder of Infinite Omicron, a company focused on cybersecurity, DevOps, and open source technologies. Prior roles include those of a cyber security consultant and embedded firmware engineer. He also has a bachelor's degree in computer science. For the last 10 years, he has been an active contributor to multiple Linux distributions, including Alpine, Gentoo, and OpenWrt, and also the Android-based distribution, Termux. In addition to maintaining open source projects, he also mentors for Google Summer of Code. He enjoys sharing knowledge and helping others to grow.
I am truly grateful to the open source community and to all of my colleagues and friends who I have been able to learn from over the years. Thank you all for sharing your knowledge with me and the rest of the world. I am also very thankful to my family and my faith for supporting me in life.
Rancher Labs is now part of SUSE.
SUSE is a global leader in innovative, reliable, secure enterprise-grade open source solutions, relied upon by more than 60% of the Fortune 500 to power their mission-critical workloads. SUSE specializes in business-critical Linux, Enterprise container management, and Edge solutions, and collaborates with partners and communities to empower customers to innovate everywhere – from the data center to the cloud, to the edge, and beyond.
SUSE's solutions already power everything from autonomous driving to CAT scan and mammogram machines. Its open source software is embedded inside air traffic control systems, weather forecasting technologies, trains, and satellites.
Business-critical Linux: The SUSE Linux Enterprise family provides a stable, secure, and well-supported Linux operating system for mission-critical workloads such as SAP S/4HANA and other solutions.
Enterprise container management: SUSE Rancher solutions enable customers to standardize cloud-native workload operations across all devices and landscapes, including with end-to-end security meeting the highest standards thanks to SUSE's NeuVector technology.
Edge solutions: The new Edge offerings bring the best of SUSE's Linux and container technologies together. This is helping SUSE to truly innovate at scale by pushing business applications to where they are needed most.
SUSE puts the open back in open source, giving customers the agility to tackle innovation challenges today and the freedom to evolve their strategy and solutions tomorrow. The company employs more than 2,000 people globally. SUSE is listed on the Frankfurt Stock Exchange.
Visit suse.com to learn more and follow us on our social handles: @SUSE and @Rancher_Labs.
Rancher and Kubernetes have been driving the wave of DevOps adoption for both on-premises and cloud workloads. This book will guide you through the history of Rancher and Kubernetes and how they came into being. We will dive into how to design, build, and manage your Rancher environment. We will then build upon Rancher, deploying a range of cluster types, including RKE, RKE2, k3s, EKS, and GKE. With each of these cluster types, we will go over how they work, design a solution around them, and finally, deploy them using Rancher.
We will then shift into getting your clusters production-ready. This includes how we back up and restore the different cluster types and monitor the health of our clusters and the application hosted on them. Then, we will dive into how to provide the additional services needed outside of core Kubernetes services, including persistent storage with Longhorn, security/compliance using OPA Gatekeeper, and how to bring dynamic scaling to our clusters.
We will then close the book by covering how to build and deploy our application in a Rancher/Kubernetes ecosystem using tools such as Drone CI for our CI/CD pipeline and Harbor for hosting build artifacts. We will then dive into the deep topic of Helm charts and how they bring package management to our clusters. Finally, we will close by covering resource management and cost reporting to address the goal of turning IT from a black hole into which you throw money into the profit center it can be.
This book primarily targets DevOps engineers looking to deploy Kubernetes with Rancher, including how Rancher changed the way clusters are built and managed using RKE (Rancher Kubernetes Engine) and RKE2/k3s. It is also for people who want to learn more about Day 2 operations in the Kubernetes and Rancher ecosystem.
Chapter 1, Introduction to Rancher and Kubernetes, explores the history of Rancher and its earlier products, and how Kubernetes changed the whole picture.
Chapter 2, Rancher and Kubernetes High-Level Architecture, discusses the different products that make up the Rancher ecosystem, including the Rancher server, RKE1/2, and k3s.
Chapter 3, Creating a Single Node Rancher, delves into a single node Rancher install, and the limitations of using it in addition to how to migrate to an HA setup.
Chapter 4, Creating an RKE and RKE2 Cluster, looks at RKE1 and 2, how they work, and the rules for architecting a solution using them.
Chapter 5, Deploying Rancher on a Hosted Kubernetes Cluster, covers how to install Rancher on a hosted Kubernetes cluster such as Google Kubernetes Engine (GKE), Amazon Elastic Kubernetes Service (EKS), Azure Kubernetes Service (AKS), or DigitalOcean's Kubernetes service (DOKS).
Chapter 6, Creating an RKE Cluster Using Rancher, demonstrates how to use Rancher to deploy a downstream RKE cluster along with the rules of architecting this type of cluster.
Chapter 7, Deploying a Hosted Cluster with Rancher, uses cloud providers to deploy hosted Kubernetes clusters using Rancher for managing the cluster over time.
Chapter 8, Importing an Externally Managed Cluster into Rancher, shows how to bring any kind of Kubernetes into Rancher and how Rancher can gain access to imported clusters.
Chapter 9, Cluster Configuration Backup and Recovery, describes how you back up an RKE1/2 cluster using etcd backups in addition to how to restore a cluster from a backup.
Chapter 10, Monitoring and Logging, explains how to use Rancher monitoring to deploy Prometheus, Grafana, and alert manager for monitoring the health of your cluster, in addition to how to use Banzai Cloud Logging to capture your pod logs.
Chapter 11, Bring Storage to Kubernetes Using Longhorn, explores why you would need persistent storage in your Kubernetes cluster and how Longhorn can solve this problem, including how Longhorn works and how to architect a solution using Longhorn.
Chapter 12, Security and Compliance Using OPA Gatekeeper, talks about how to enforce standards and security in your Kubernetes cluster using tools such as OPA Gatekeeper and NeuVector.
Chapter 13, Scaling in Kubernetes, delves into using Horizontal Pod Autoscaler (HPA), Vertical Pod Autoscaler (VPA), and Cluster Autoscaler to dynamically scale your environment.
Chapter 14, Load Balancer Configuration and SSL Certificates, explains how to publish applications hosted in Kubernetes to the outside world using ingresses and load balancers.
Chapter 15, Rancher and Kubernetes Troubleshooting, explains how to recover from some of the most common failures and issues, including how to detect and prevent them in the future along with how to reproduce these issues in a lab environment.
Chapter 16, Setting Up a CI/CD Pipeline and Image Registry, explains what a CI/CD pipeline is and how we can use it to deploy applications in a standardized and controlled process, along with deploying Drone CI and Harbor to support your applications.
Chapter 17, Creating and Using Helm Charts, describes Helm charts and how we can use Helm to package applications from both public and private sources and then publish them to a Kubernetes cluster.
Chapter 18, Resource Management, explains how to manage resources inside your Kubernetes cluster along with monitoring and controlling the cost of hosted applications in Kubernetes.
This book assumes that you have a basic understanding of Linux server administration, including basic Bash scripting, installing packages, and automating tasks at scale. In addition, we are going to assume that you have a basic understanding of most cloud platforms, such as AWS, GCP, vSphere, or Azure.
It is also recommended to have a lab environment to deploy Rancher and RKE1/2 clusters. An important note: most cloud providers offer trial credits that should be more than enough to spin up small lab clusters.
Finally, Kubernetes and Rancher are ever-changing, so it is important to remember that version numbers will need to be changed as time moves on. So, it is highly recommended to review the release notes of each product and software before picking a version.
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803246093_ColorImages.pdf.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Rancher-Deep-Dive. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "It is pretty common to rename the local cluster to something more helpful, such as rancher-prod or rancher-west."
A block of code is set as follows:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm upgrade --install -n monitoring monitoring prometheus-community/kube-prometheus-stack
Bold: Indicates a new term, an important word, or words that you see on screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "With Rancher logging, it is recommended to deploy via the App Marketplace in the Rancher UI by going to Cluster Tools and clicking on the Logging app."
Tips or Important Notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you've read Rancher Deep Dive, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
By the end of this part of the book, you should be able to architect a Rancher/Kubernetes solution that meets your application's needs.
This part of the book comprises the following chapters:
Chapter 1, Introduction to Rancher and Kubernetes
Chapter 2, Rancher and Kubernetes High-Level Architecture
This chapter will focus on the history of Rancher and Kubernetes. We will cover what products and solutions came before Rancher and Kubernetes and how they have evolved into what they are today. At the end of this chapter, you should have a good understanding of the origins of Rancher and Kubernetes and their core concepts. This knowledge is essential for you to understand why Rancher and Kubernetes are what they are.
In this chapter, we're going to cover the following main topics:
The history of Rancher Labs as a company
Rancher's earlier products
What is Rancher's core philosophy?
Where did Kubernetes come from?
What problem is Kubernetes trying to solve?
Comparing Kubernetes with Docker Swarm and OpenShift
Rancher Labs was founded in 2014 in Cupertino, California, by Sheng Liang, Shannon Williams, Darren Shepherd, and Will Chan. It was a container management platform before Kubernetes was a thing. From the beginning, Rancher was built on the idea that everything should be open source and community-driven. With Rancher being an open source company, all of the products it has released (including Rancher, RancherOS, RKE, K3s, Longhorn, and more) have been 100% open source. Rancher Labs' flagship product is Rancher. Primarily, Rancher is a management and orchestration platform for containerized workloads both on-premises and in the cloud. Rancher can do this because it has always been vendor-neutral; that is, Rancher can deploy a workload to anything from physical hardware in your data center to cloud VMs in AWS to even a Raspberry Pi in a remote location.
When Rancher v1.0 was released in March of 2016, it only supported Docker Swarm and Rancher Cattle clusters. Docker Swarm was the early cluster orchestration tool that created a number of the core concepts that we still use today; for instance, the idea that an application should be defined as a group of containers that can be created and destroyed at any time. Another concept is that containers should live on a virtual network that is accessible on all nodes in a cluster. You can expose your containers via a load balancer which, in the case of Docker Swarm, is just a basic TCP load balancer.
While the Rancher server was being created, Rancher Labs was also working on its own Docker clustering software, called Cattle, which shipped when Rancher went General Availability (GA) with the launch of Rancher v1.0. Cattle was designed to address the limitations of Docker Swarm, which spanned several different areas.
The first was networking. Originally, Docker Swarm's networking overlay was built on Internet Protocol Security (IPsec), with the idea that each node in the cluster would be assigned a subnet, a class C subnet by default. Each node would create an IPsec tunnel to all other nodes in the cluster and then use basic routing rules to direct traffic to the node where a given container was hosted. For example, let's say a container on node01 with an IP address of 192.168.11.22 wants to connect to another container hosted on node02 with an IP address of 192.168.12.33. The Swarm networking layer uses basic Linux routing to send anything inside the 192.168.12.0/24 subnet to node02 over the IPsec tunnel. This core concept is still in use today by the majority of Kubernetes CNI providers. The main issue is in managing the health of these tunnels over time and dealing with compatibility issues between the nodes. Cattle addressed this issue by moving IPsec into a container and then wrapping a management layer around it to handle the creation, deletion, and monitoring of the tunnels.
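As a rough illustration of that per-node subnet routing, the rules boil down to something like the following (the node names, tunnel interface, and addresses are hypothetical; in practice, the tunnels and routes were managed by Swarm or Cattle, not by hand):

# On node01: anything destined for node02's container subnet goes over the tunnel.
ip route add 192.168.12.0/24 via 10.0.0.2 dev ipsec0
# On node02: the mirror-image rule sends node01's subnet back the other way.
ip route add 192.168.11.0/24 via 10.0.0.1 dev ipsec0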
The second main issue was to do with load balancing. With Docker Swarm, we were limited to very basic TCP/layer4 load balancing. We didn't have sessions, SSL, or connection management. This is because load balancing was all done by iptable rules. Cattle addressed this issue by deploying HAProxy on all nodes in the cluster. Following this, Cattle used a custom container, called rancher-metadata, to dynamically build HAProxy's config every time a container was created or deleted.
The third issue was storage. With Docker Swarm, there weren't any storage options outside of bind mounting to the host filesystem. This meant that you had to create a clustered filesystem or shared network storage and then manually map it to all of your Docker hosts. Cattle addressed this by creating rancher-nfs, a tool that can mount NFS shares inside a container and create a bind mount. As Rancher went on, other storage providers were added, such as AWS and VMware.
As time moved forward at Rancher, the next giant leap came when authentication providers were added: Rancher provides access to the clusters it manages by integrating with external authentication providers such as Active Directory, LDAP, and GitHub. This is a distinguishing feature of Rancher, as Kubernetes on its own still doesn't integrate very well with external authentication providers.
Rancher is built around several core design principles:
Open source: All code, components, and services that make up Rancher or come with Rancher must be open source. Because of this, Rancher has a large community built around it, with users providing feedback and documentation and contributing code.
No lock-ins: Rancher is designed with no vendor lock-in, including being locked into Rancher itself. With containerization evolving so quickly, Rancher needs to enable users to change technologies with as little impact as possible. A core requirement of all products and solutions that Rancher provides is that they can be used with or without the Rancher server. An example of this is Longhorn; there are zero dependencies between Rancher and Longhorn. This means that at any time, a user can uninstall one without impacting the other. This includes the ability to uninstall Rancher without losing your clusters. Rancher does this by having a process in place for a user to take over the management of a cluster directly and kick Rancher out of the picture.
Everything is a Kubernetes object: With the release of Rancher v2.0 in May 2018, after approximately a year and a half of work, Rancher made the switch from storing all of its resources and configurations inside a MySQL database to storing everything as a Kubernetes object. This is done by using custom resources (CRDs) in Kubernetes. For example, the definition of a cluster in Rancher is stored as a Custom Resource Definition (CRD) called clusters.management.cattle.io, and the same goes for nodes as objects under nodes.management.cattle.io, which are scoped to a namespace named after the cluster ID. Because of this, users and applications can directly query Rancher objects without needing to talk to Rancher's API (see the kubectl sketch after this list). The reason for this change was mainly to do with scalability. With Cattle and MySQL, all cluster-related tasks had to go back to the Rancher server. So, as you scaled up the size of your clusters and the number of clusters, you had to scale up the Rancher server, too. This resulted in customers hitting issues such as "task storms," where a single node rebooting in a cluster causes a flood of requests to the Rancher server, which, in turn, causes other tasks to time out, which then causes more requests. In the end, the only thing you can do is shut everything down and slowly bring it back up.
Everything is stateless: Because everything is a Kubernetes object, there is no need for a database for Rancher. All Rancher pods are stateless, meaning they can be destroyed at any time for any reason. Additionally, Rancher can rely on Kubernetes controllers to simply spin up new pods without needing Rancher to do anything.
Controller model: All Rancher services are designed around the Kubernetes controller model. A control loop is always running, watching the current state and comparing it to the desired state. If any differences are found, it applies the application logic to make the current state match the desired state. Alongside this, Rancher uses the same leader election process as the Kubernetes core components. This ensures there is only one source of truth and that specific controllers handle failover after a failure.
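As a rough illustration of the everything-is-a-Kubernetes-object principle, you can query Rancher's CRDs directly with kubectl against the cluster Rancher is installed on (the cluster ID shown is a placeholder):

# List the cluster objects Rancher manages (stored as CRDs rather than in MySQL).
kubectl get clusters.management.cattle.io
# Inspect a single cluster object; "c-abc12" is a placeholder cluster ID.
kubectl get clusters.management.cattle.io c-abc12 -o yaml
# Node objects live in a namespace named after the cluster ID they belong to.
kubectl -n c-abc12 get nodes.management.cattle.io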
The name Kubernetes originates from Greek, meaning helmsman or pilot. Kubernetes is abbreviated to K8s due to the number of letters between the K and the S. Kubernetes was initially created by engineers at Google from an internal project called Borg. Google's Borg system is a cluster manager that was designed to run Google's internal applications. These applications are made up of tens of thousands of microservices hosted on clusters worldwide, with each cluster being made up of tens of thousands of machines. Borg provided three main benefits. The first benefit was the abstraction of resource and failure management, so application designers could focus on application development. The second benefit was high reliability and availability by design: all parts of Borg were designed, from the beginning, to be highly available. This was achieved by making applications stateless, so that any component could be destroyed at any time for any reason without impacting availability and, at the same time, could be scaled horizontally to hundreds of instances across clusters. The third benefit was efficiency; Borg was designed to have minimal overhead on the compute resources being managed.
Kubernetes can be traced directly back to Borg, as many of the developers at Google who worked on Kubernetes were formerly developers on the Borg project. Because of this, many of Borg's core concepts were incorporated into Kubernetes, with the main difference being that Borg was custom-made for Google, while Kubernetes needed to be more generalized and flexible. There are four main features that have been derived from Borg:
Pods: A pod is the smallest unit of scheduling in Kubernetes. This object can include one or more containers, with each container in the pod sharing resources such as an IP address, volumes, and other local resources. One of the main design principles is that a pod should be disposable and shouldn't change after creation. Another primary principle is that all application configuration should be handled at the pod level. For example, a database connection string should be defined as part of the pod's definition instead of in the application code. This is done so that any changes to the configuration of an application won't require the code to be recompiled and redeployed. Additionally, the pod takes the concept of paired processes from Borg, with the classic example being a log collector. This is because, typically, your container should only have one primary process running inside it.
An example of this is a web server: the server creates logs, but how do you ship those logs to a log server such as Splunk? One option is to add a custom agent to your application container, which is easy. But now you are managing more than one process inside a container, you have duplicate code in your environment, and, most importantly, you have to do error handling for both your main application and this additional logging agent. This is where sidecars come into play, allowing you to bolt containers together inside a pod in a repeatable and consistent manner (see the sketch after this list).
Services: One of Borg's primary roles was the life cycle management of applications and pods. Because of this, a pod's name and IP address are ephemeral and can change at any time for any reason. So, the concept of a service was created as an abstraction layer wherein you define a service object that references a pod or pods by using labels. Kubernetes then handles the mapping of the service record to its pods and load balances the traffic for the service among the pods that make it up. Service records allow Kubernetes to add and remove pods without disrupting applications because the service-to-pod mapping can simply be changed without the requesting client being aware.
Labels: Because Borg was designed to manage containers at scale, things such as a hostname were impractical for mapping a pod to its running application. The idea was that if you define a set of labels for your application, those can be added to its pods, allowing Kubernetes to track instances at scale. Labels are arbitrary key-value pairs that can be assigned to any Kubernetes resource, including pods, services, nodes, and more. One example set is application=web_frontend, environment=production, department=marketing. Each of these labels can be used in a label selector rule, for example, to create a service record. This has the side benefit of making the reporting and tracking of usage much easier.
Every pod has an IP: When Borg was created, all of the containers on a host would share the host's IP address and then use different ports for each container. This allowed Borg to use a standard IP network. However, this created a burden on infrastructure and application teams, as Borg needed to schedule ports for containers, and applications needed a set of predefined ports for their containers. Kubernetes avoids this by giving every pod its own IP address.
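To make the pod, sidecar, label, and service concepts concrete, here is a minimal sketch, assuming a hypothetical log-shipping image: a web server pod with a log-collector sidecar sharing a volume, and a service that selects the pod by its labels.

apiVersion: v1
kind: Pod
metadata:
  name: web-frontend
  labels:
    application: web_frontend
    environment: production
spec:
  containers:
  - name: web                        # primary process: the web server
    image: nginx:1.21
    volumeMounts:
    - name: logs
      mountPath: /var/log/nginx
  - name: log-shipper                # sidecar: ships the logs to a log server
    image: example.com/log-shipper:latest   # hypothetical image
    volumeMounts:
    - name: logs
      mountPath: /logs
      readOnly: true
  volumes:
  - name: logs
    emptyDir: {}
---
apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  selector:                          # any pod carrying these labels backs this service
    application: web_frontend
    environment: production
  ports:
  - port: 80                         # stable service port
    targetPort: 80                   # the nginx container port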
Kubernetes was designed to solve several problems. The primary areas are as follows:
Availability: Everyone, from the application owner to the developers to the end users, has come to expect 24x7x365 uptime, with outages and downtime being a four-letter word in IT. With containerization and microservices, this bar has only gotten higher. Kubernetes addresses this issue by scheduling containers across nodes and using the desired state versus the actual state. The idea is that any failure is just a change in the actual state that triggers the controllers to schedule pods until the actual state matches the desired state.
CI/CD: Traditional development was carried out using monolithic applications, with a few significant releases per year. This required a ton of developers working for months to test their releases and a ton of manual processes to deploy their applications. Kubernetes addresses this issue by being driven by the desired state and config files. This means implementing a DevOps workflow that allows developers to automate steps and continuously integrate, test, and deploy code. All of this enables teams to fail fast and fix fast.
Efficiency: Traditional IT was a black hole that companies threw money into. One of the reasons behind this was high availability. For one application, you would need at least two servers for each component of your production application. Also, you would require additional servers for each of your lower environments (such as DEV, QAS, Test, and more). Today, companies want to be as efficient with their IT spending as possible. Kubernetes addresses this need by making spinning up environments very easy. With CI/CD, you can simply create a new namespace, deploy your application, run whatever tests you want, and then tear down the namespace to reclaim its resources.
Automated scaling: Traditionally, you would design and build your environment around your peak workload. For instance, let's say your application is mainly busy during business hours and is idle during off-peak hours. You are wasting money because you pay the same amount for your compute resources at 100% utilization as at 1%. However, traditionally, it would take days or even weeks to spin up a new server, install your application, configure it, and, finally, update the load balancer. This made it impossible to scale up and down rapidly, so some companies just decided to scale up and stay there. Kubernetes addresses this issue by making it easy to scale up or down, as it just involves a simple change to the desired state.
Let's say that an application currently has two web servers, and you want to add a pod to handle the load. Just change the number of replicas to three; because the current state no longer matches the desired state, the controllers kick in and start spinning up a new pod. This can be automated using Kubernetes' built-in Horizontal Pod Autoscaler (HPA), which uses several metrics ranging from simple metrics such as CPU and memory to custom metrics such as overall application response times. Additionally, Kubernetes can use its Vertical Pod Autoscaler (VPA) to automatically tune your CPU and memory limits over time. Following this, Kubernetes can use node scaling to dynamically add and remove nodes in your clusters as resources are required. This means your application might have 10 pods on 10 worker nodes during the day but drop to only 1 pod on 1 worker node after hours. This means you can save the cost of 9 nodes for 16 hours per day plus the weekends, all without your application having to do anything.
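As a rough sketch of that scaling workflow (the deployment name and thresholds are hypothetical):

# Manually change the desired state from two replicas to three.
kubectl scale deployment web-frontend --replicas=3

# Or let an HPA manage replicas between 1 and 10 based on CPU usage.
kubectl autoscale deployment web-frontend --min=1 --max=10 --cpu-percent=70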
We will compare Kubernetes with Docker Swarm, and then with OpenShift, in the following sections.
Kubernetes and Docker Swarm are open source container orchestration platforms that have several identical core functions but significant differences.
Kubernetes is a complex system with several components that all need to work together to make the cluster operate, making it more challenging to set up and administrate. Kubernetes requires you to manage a database (etcd), including taking backups and creating SSL certificates for all of the different components.
Docker Swarm is far simpler, with everything just being included in Docker. All you need to do is create a manager and join nodes to the swarm. However, because everything is baked-in, you don't get the higher-level features such as autoscaling, node provisioning, and more.
Kubernetes uses a flat network model, with all pods sharing a large network subnet and a separate network used for services. Additionally, Kubernetes allows you to customize and change network providers. For example, if a particular provider, such as Canal, doesn't meet your needs, say, because it can't do network-level encryption, you can switch to another provider, such as Weave, which can encrypt pod traffic.
Docker Swarm networking is fairly basic. By default, Docker Swarm creates IPsec tunnels between all nodes in the cluster, using IPsec for encryption. Performance can be good because modern CPUs provide hardware acceleration for AES; however, you can still take a performance hit depending on your hardware and workload. Additionally, with Docker Swarm, you can't switch network providers; you only get what is provided.
Kubernetes uses YAML and its API to enable users to define applications and their resources. Because of this, there are tools such as Helm that allow application owners to define their application in a templatized format, making it very easy for applications to be published in a user-friendly format called Helm charts.
Docker Swarm is built on the Docker CLI with a minimal API for management. The only package management tool is Docker Compose, which hasn't been widely adopted due to its limited customization and the high degree of manual work required to deploy it.
Kubernetes has been built from the ground up to be highly available and to handle a range of failures, including detecting unhealthy pods using advanced features such as probes that run commands inside the pods to verify their health. This extends to all of the management components, such as kube-scheduler, kube-apiserver, and more. Each of these components is designed to be stateless, with built-in leader election and failover management.
Docker Swarm achieves high availability mainly through its ability to replicate services across nodes, with the Swarm manager nodes running in an active-standby configuration in case of a failure.
Kubernetes pods can be exposed using simple layer 4 (TCP/UDP mode) load-balancing services. Then, for external access, Kubernetes has two options. The first is NodePort, which acts as a simple method of port forwarding from the node's IP address to an internal service record. The second is for more complex applications, where Kubernetes can use an ingress controller to provide layer 7 (HTTP/HTTPS mode) load balancing, routing, and SSL management.
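A minimal sketch of both options, with hypothetical names and hostnames, might look like this:

apiVersion: v1
kind: Service
metadata:
  name: web-frontend
spec:
  type: NodePort            # exposes the service on a port of every node's IP
  selector:
    application: web_frontend
  ports:
  - port: 80
    targetPort: 80
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-frontend
spec:
  rules:
  - host: web.example.com   # layer 7 host-based routing handled by the ingress controller
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web-frontend
            port:
              number: 80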
Docker Swarm load balancing is DNS-based, meaning Swarm uses round-robin DNS to distribute incoming requests between containers. Because of this, Docker Swarm is limited to layer 4 only, with no option to use any of the higher-level features such as SSL and host-based routing.
Kubernetes provides several tools with which to manage the cluster and its applications, including kubectl for command-line access and even a web UI via the Kubernetes Dashboard service. It even offers higher-level UIs such as Rancher and Lens. This is because Kubernetes is built around a REST API that is highly flexible, meaning that applications and users can easily integrate their tools with Kubernetes.
Docker Swarm doesn't offer a built-in dashboard. There are some third-party dashboards such as Swarmpit, but there hasn't been very much adoption around these tools and very little standardization.
Kubernetes provides a built-in RBAC model allowing fine-grained control over Kubernetes resources. For example, you can grant one pod permission to read just a single secret while another pod is given access to all secrets in a namespace. This is because Kubernetes authentication is built on SSL certificates and tokens, which allows Kubernetes to simply pass the certificate and token as files mounted inside a pod. This makes it straightforward for applications to gain access to the Kubernetes API.
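A minimal sketch of that kind of fine-grained rule, granting a hypothetical service account read access to a single secret, might look like this:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: read-db-secret
  namespace: production
rules:
- apiGroups: [""]
  resources: ["secrets"]
  resourceNames: ["db-credentials"]   # access to only this one secret
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: read-db-secret
  namespace: production
subjects:
- kind: ServiceAccount
  name: web-frontend                  # the pod runs under this service account
  namespace: production
roleRef:
  kind: Role
  name: read-db-secret
  apiGroup: rbac.authorization.k8s.io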
The Docker Swarm security model is primarily network-based using TLS (mTLS) and is missing many fine-grained controls and integrations, with Docker Swarm only having the built-in roles of none, view only, restricted control, scheduler, and full control. This is because the access model for Docker Swarm was built for cluster administration and not application integration. In addition to this, originally, the Docker API only supported basic authentication.
Both Kubernetes and OpenShift share a lot of features and architectures. Both follow the same core design practices, but they differ in terms of how they are executed.
Kubernetes lacks a built-in networking solution and relies on third-party plug-ins such as Canal, Flannel, and Weave to provide networking for the cluster.
OpenShift provides a built-in network solution called Open vSwitch. This is a VXLAN-based software-defined network stack that can easily be integrated with Red Hat's other products. There is some support for third-party network plugins, but they are limited and much harder to support.
Kubernetes takes the approach of being as flexible as possible when deploying applications to the cluster, allowing users to deploy any Linux distribution they choose, including supporting Windows-based images and nodes. This is because Kubernetes is vendor-agnostic.
OpenShift takes the approach of standardizing the whole stack on Red Hat products, such as RHEL for the node's operating system. Technically, there is little to nothing stopping OpenShift from running on other Linux distributions such as Ubuntu. Additionally, OpenShift puts limits on the types of container images that are allowed to run inside the cluster. Again, technically, there isn't much preventing a user from deploying an Ubuntu image on an OpenShift cluster, but they will most likely run into issues around supportability.
Kubernetes had a built-in tool for pod-level security called Pod Security Policies (PSPs). PSPs were used to enforce limits on pods, such as blocking a pod from running as root or binding to a host's filesystem. PSPs were deprecated in v1.21 due to several limitations of the tool and are being replaced by third-party tools such as OPA Gatekeeper, which supports the same kinds of security rules but with a different enforcement model.
OpenShift has a much stricter security mindset, with the option to be secure as a default, and it doesn't require cluster hardening like Kubernetes.
In this chapter, we learned about Rancher's history and how it got its start. Following this, we went over Rancher's core philosophy and how it was designed around Kubernetes. Then, we covered where Kubernetes got its start and its core philosophy. We then dived into what the core problems are that Kubernetes is trying to solve. Finally, we examined the pros and cons of Kubernetes, Docker Swarm, and OpenShift.
In the next chapter, we will cover the high-level architecture and processes of Rancher and its products, including RKE, K3s, and RancherD.
This chapter will cover the high-level processes of Rancher, Rancher Kubernetes Engine (RKE), RKE2 (also known as RKE Government), K3s, and RancherD. We will discuss the core design philosophy of each of these products and explore the ways in which they are different. We'll dive into Rancher's high-level architecture and see how Rancher server pods communicate with downstream clusters using the Cattle agents, which include both the Cattle-cluster-agent and the Cattle-node-agent. We'll also look at how the Rancher server uses RKE and how Rancher-machine provisions downstream nodes and Kubernetes (K8s) clusters. After that, we'll cover the high-level architecture of K8s, including kube-api-server, kube-controller-manager, and kube-scheduler. We'll also discuss how each of these components maintains the state of the cluster. Finally, we'll examine how an end user can change the desired state and how the controllers can update the current state.
In this chapter, we're going to cover the following main topics:
What is the Rancher server?
What are RKE and RKE2?
What is K3s (five less than K8s)?
What is RancherD?
What controllers run inside the Rancher server pods?
What does the Cattle agent do?
How does Rancher provision nodes and clusters?
What are kube-apiserver, kube-controller-manager, kube-scheduler, etcd, and kubelet?
How do the current state and the desired state work?
The Rancher server forms the core of the Rancher ecosystem; almost every other component, product, or tool depends on it or connects to it via the Rancher API. The Rancher server is usually shortened to just Rancher, and in this section, when I say Rancher, I will be talking about the Rancher server.
The heart of Rancher is its API. The Rancher API is built on a custom API framework called Norman that acts as a translation layer between the Rancher API and the K8s API. Everything in Rancher uses the Rancher or K8s API to communicate. This includes the Rancher user interface (UI), which is 100% API-driven.
So, how do you connect to the Rancher API? The Rancher API is a standard RESTful API. This means that a request flows from an external HTTP or TCP load balancer into the ingress controller, and then the request is routed to one of the Rancher server pods. Norman then translates the request into a K8s request, which then calls a CustomResource object. Of course, because everything is being stored in a CustomResource object in K8s, the Rancher request flow is stateless and doesn't require session persistence. Finally, once the CustomResource object is created, changed, or deleted, the controller for the object type will take over and process that request. We'll go deeper into the different controllers later in this chapter.
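As a rough sketch of that flow (the hostname and token are placeholders, and the exact endpoints depend on your Rancher version), a client can hit the same REST API that the UI uses:

# List the clusters Rancher manages via its v3 REST API.
curl -s -H "Authorization: Bearer token-xxxxx:yyyyyyyy" \
  https://rancher.example.com/v3/clusters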
What do I need, RKE or RKE2? Traditionally, when building a K8s cluster, you would need to carry out several steps. First, you'd need to generate a root CA key as well as the certificates for the different K8s components and push them out to every server that was part of the cluster. Second, you'd install and configure etcd, which would include setting up the systemd service on your management nodes. Next, you would need to bootstrap the etcd cluster and verify that all etcd nodes were communicating and replicating correctly. At this point, you would install kube-apiserver and connect it back to your etcd cluster. Finally, you would need to install kube-controller-manager and kube-scheduler and connect them back to kube-apiserver. And that was just to bring up the control plane for your cluster; even more steps would be needed to join your worker nodes to the cluster.
This process is called K8s the hard way, and it's called that for a reason, as this process can be very complicated and can change over time. And in the early days of K8s, this was the only way to create K8s clusters. Because of this, users needed to make large scripts or Ansible Playbooks to create their K8s clusters. These scripts would need lots of care and feeding to get up and running, with even more work required to keep them working as K8s continually changed.
Rancher saw this issue and knew that for K8s to become mainstream, it needed to be crazy easy to build clusters for both end users and the Rancher server. Initially, in the Rancher v1.6 days, Rancher would build K8s clusters on its container clustering software called Cattle. Because of this, everything needed had to run as a container, and this was the starting point of RKE.
RKE is Rancher's cluster orchestration tool for creating and managing Cloud Native Computing Foundation (CNCF)-certified K8s clusters on a wide range of operating systems with a range of configurations. The core concept of RKE is that everything that makes up the K8s cluster should run entirely within Docker containers. Because of this, RKE doesn't care what operating system it's deployed on, as long as that operating system can run Docker containers. This is because RKE is not installing binaries on the host, configuring services, or anything similar.
RKE is a Golang application that runs on most Linux/Unix-based systems. When a user wants to create a K8s cluster using RKE, they must first define the cluster using a file called cluster.yml (see Figure 2.1). RKE then uses that configuration file to create all of the containers needed to start the cluster, that is, etcd, kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet. Please see the How does Rancher provision nodes and clusters? section in this chapter for further details on nodes and clusters.
Figure 2.1 – A code snippet from the cluster.yml file
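As a rough sketch, a minimal cluster.yml might look like the following (the addresses, SSH user, and version string are placeholders; check the RKE release notes for supported versions):

# cluster.yml, consumed by the `rke up` command
nodes:
  - address: 192.168.1.10                  # placeholder IP
    user: rancher                          # SSH user with access to the Docker socket
    role: [controlplane, etcd, worker]
  - address: 192.168.1.11
    user: rancher
    role: [worker]
kubernetes_version: v1.21.9-rancher1-1     # placeholder version string
services:
  etcd:
    snapshot: true                         # enable recurring etcd snapshots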
RKE2 is Rancher's next-generation K8s solution and is also known as RKE Government. RKE2 was designed to update and address some of the shortfalls of RKE, and it also brought over the crazy easy setup methods from K3s. RKE2 is also a fully CNCF-certified K8s distribution. But RKE2 was created specifically with Rancher's US federal government customers in mind, as they have several special requirements for their K8s use, the first being that it is highly secure by default.
When setting up RKE, you must follow a hardening guide and take several manual steps to comply with CIS benchmarks. RKE2, on the other hand, is designed to be secure with little to no action required by the cluster administrator. US federal customers also need their K8s clusters to be FIPS-enabled (FIPS stands for the United States Federal Information Processing Standards). In addition, because RKE2 is built on K3s, it inherits a number of its features, the first being support for ARM64-based systems. So, you could set up RKE2 on a Raspberry Pi if you chose to. This provides users with the flexibility to mix and match ARM64 and AMD64 nodes in the same cluster, which means customers can run workloads such as multi-arch builds using the Drone Continuous Integration (CI) platform inside their cluster. It also provides support for low-power and cost-effective ARM64 nodes.
The second feature inherited from K3s is self-bootstrapping. In RKE, you would need to define the cluster as YAML and then use the RKE binary to try to create and manage the cluster. But with RKE2, once the first node has been created, all of the other nodes simply join the cluster using a registration endpoint running on the master nodes. Note that this does require an external load balancer or a round-robin DNS record to be successful. Because RKE2 can manage itself, it allows you to do very cool tasks, such as defining a K8s upgrade with kubectl and just letting the cluster take care of it for you.
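As a rough sketch of that join flow, an additional server node only needs a small config file pointing at the registration endpoint before the RKE2 service is started (the hostname and token are placeholders):

# /etc/rancher/rke2/config.yaml on the joining node
server: https://rke2.example.com:9345   # registration endpoint behind a load balancer or round-robin DNS
token: <cluster-join-token>             # taken from the first server node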
The third feature that RKE2 inherited from K3s was built-in Helm support. This is because RKE2 was built with Rancher's fleet feature in mind, where all of the cluster services (such as cert-manager, Open Policy Agent (OPA) Gatekeeper, and more) should be deployed in an automated process using Helm. But the most significant change from RKE in RKE2 was the move from Docker to containerd. With RKE, you must have Docker installed on all nodes before RKE can manage them. This is because the core K8s components like etcd and kube-apiserver are static containers that are deployed outside the K8s cluster. RKE2 leverages what are known as static pods. These are unique pods that are managed directly by kubelet and not by kube-controller-manager or kube-scheduler. Because these pods don't require the K8s cluster to be up and running in order to start, the core K8s components such as etcd and kube-apiserver can just be pods – just like any other application in the cluster. This means that if you run kubectl -n kube-system get pods, you can see your etcd containers, and you can even open a shell to them or capture logs, just like you would with any other pod.
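As a rough sketch of the built-in Helm support mentioned above, RKE2 (like K3s) includes a helm-controller that reconciles HelmChart objects, so a cluster service can be described declaratively like this (the chart, repo, and values shown are just an example):

apiVersion: helm.cattle.io/v1
kind: HelmChart
metadata:
  name: cert-manager
  namespace: kube-system
spec:
  repo: https://charts.jetstack.io
  chart: cert-manager
  targetNamespace: cert-manager
  valuesContent: |-
    installCRDs: true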
Last but not least, the most crucial feature of RKE2 is that it's fully open source with no paywall, just like every other Rancher product.
K3s is a fully CNCF-certified K8s distribution. This means that the YAML you would deploy to a standard K8s cluster can be deployed to a K3s cluster unchanged. K3s was created because traditional K8s clusters, and even RKE clusters, were designed to run at scale, meaning that they would require three etcd nodes, two control plane nodes, and three or more worker nodes for a standard configuration. In this case, the minimum size for nodes would be around four cores with 8 GB of RAM for the etcd and control plane nodes, and two cores with 4 GB of RAM for the worker nodes. And those are just the baseline requirements; K8s clusters at scale, for example, a 50-node cluster, might have worker nodes with 64 cores and 512 GB of RAM. But when you start looking at deploying K8s at the edge, where physical space, power, and compute resources are all at a premium, standard K8s and RKE are just too big. So, the question is: how do we shrink K8s?
K3s was based on the following core principles: no legacy code, no duplicate code, and no extras. With RKE and other standard K8s distributions, each component exists as its own separate binary with its own runtime. At Rancher, they asked themselves a question:
Hey, there is a lot of duplicate code running here. What if we just merged kube-apiserver, kube-controller-manager, kube-scheduler, and kubelet into a single binary?
And that was how K3s was born. K3s only has master and worker nodes, with the master node running all of the core components. The next big breakthrough was what they did with etcd. etcd is not small; it eats memory like it's going out of style and doesn't play nicely in a cluster of one. This is where Kine comes into the picture.
The Kine database shim makes standard SQL databases such as SQLite3, MySQL, or Postgres look like an etcd database. So, as far as kube-apiserver knows, it's talking to an etcd cluster. The CPU and memory footprint is much smaller because you can run a database like SQLite3 in place of etcd. It is important to note that Rancher does not customize any of the standard K8s libraries in the core components, which allows K3s to stay up to date with upstream K8s. The next big area of savings in K3s was in-tree storage drivers and cloud providers. Upstream K8s has several storage drivers built into the core components. For example, there are storage drivers that allow K8s to connect to the AWS API and use Amazon EBS volumes to provide storage directly to pods. This is great if you are running in AWS, but if you are running in VMware, then this code is just wasting resources. It's the same the other way around, with VMware's vSphere having a storage provider for mounting Virtual Machine Disks (VMDKs) to nodes. The idea was that most of these storage and cloud providers are not used. For example, if I'm running a cluster on Amazon, why do I need libraries and tools for Azure? Plus, there are out-of-tree alternatives that can be deployed as pods instead of being baked in, and most of the major storage providers are moving to out-of-tree provisioning anyway. So, K3s removes them, which eliminates a significant overhead. Because of all these optimizations, K3s fits in a binary of around 40 MB and can run on a node with only 512 MB of RAM.
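A rough sketch of pointing K3s at an external SQL datastore through Kine instead of etcd (the connection string is a placeholder):

# Start a K3s server backed by MySQL via the Kine shim.
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="mysql://user:pass@tcp(db.example.com:3306)/k3s"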
The other significant change from K8s in K3s was the idea that it should be crazy easy to spin up a K3s cluster. For example, creating a single-node K3s cluster only requires running the curl -sfL https://get.k3s.io | sh - command, with the only dependency being a Linux ARM64 or AMD64 operating system with curl installed.
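Joining an additional agent (worker) node is similarly small; the URL and token below are placeholders, with the token read from /var/lib/rancher/k3s/server/node-token on the server:

# Join a worker (agent) node to an existing K3s server.
curl -sfL https://get.k3s.io | K3S_URL=https://k3s-server.example.com:6443 \
  K3S_TOKEN=<node-token> sh -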
