As containers have become the new de facto standard for packaging applications and their dependencies, understanding how to implement, build, and manage them is now an essential skill for developers, system administrators, and SRE/operations teams. Podman and its companion tools Buildah and Skopeo make a great toolset to boost the development, execution, and management of containerized applications.
Starting with the basic concepts of containerization and its underlying technology, this book will help you get your first container up and running with Podman. You'll explore the complete toolkit and go over the development of new containers, their lifecycle management, troubleshooting, and security aspects. Together with Podman, the book illustrates Buildah and Skopeo to complete the tools ecosystem and cover the complete workflow for building, releasing, and managing optimized container images. Podman for DevOps provides a comprehensive view of the full-stack container technology and its relationship with the operating system foundations, along with crucial topics such as networking, monitoring, and integration with systemd, docker-compose, and Kubernetes.
By the end of this DevOps book, you'll have developed the skills needed to build and package your applications inside containers as well as to deploy, manage, and integrate them with system services.
Page count: 571
Publication year: 2022
Containerization reimagined with Podman and its companion tools
Alessandro Arrichiello
Gianni Salinetti
BIRMINGHAM—MUMBAI
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Rahul Nair
Publishing Product Manager: Niranjan Naikwadi
Senior Editor: Sangeeta Purkayastha
Content Development Editor: Nihar Kapadia
Technical Editor: Nithik Cheruvakodan
Copy Editor: Safis Editing
Project Coordinator: Shagun Saini
Proofreader: Safis Editing
Indexer: Sejal Dsilva
Production Designer: Sinhayna Bais
Marketing Coordinator: Nimisha Dua
First published: May 2022
Production reference: 1080422
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80324-823-3
www.packt.com
To my son, Giovanni, for allowing me to steal time from our playtime. To my wife, Tecla, for being my loving partner and supporting me.
– Alessandro Arrichiello
To my son, Filippo, who teaches me to learn every day and enjoy the little things in life. To my beloved wife, Anna Veronica, for all the strength and inspiration she brings to our family.
– Gianni Salinetti
Containers, their various runtimes, and Kubernetes are seeing fierce momentum in the technology and computing worlds. They are no longer just the darling of system administrators and Kubernetes distributed workloads. Containers are now endemic in Continuous Integration (CI) tasks, cloud workloads, and microservices. Starting with programmers, containers have even broken into the desktop space, where Integrated Development Environments (IDEs) can use containers as backends for things such as testing language versions or compiling code. We can attribute the latest invigoration to the simplification of container images and the ability to distribute them in container registries. Not bad for a decades-old technology that used to simply focus on the isolation of a computing process.
Podman for DevOps begins with a detailed exploration of container history, from its inception to now. It then transitions into the various container technologies and arrives at the two most common ones: Docker and Podman (short for Pod Manager). The early chapters provide a comprehensive examination of Docker and Podman and describe the pros and cons of both. These comparisons demonstrate Podman's novelty and strengths.
Gianni and Alessandro then settle on Podman, beginning with an exploration of its architecture. They then follow the architecture by illustrating the various applications in the Podman stack, such as conmon and network tooling. After laying the groundwork for how Podman works, they meticulously review each Podman command in an example-oriented approach. Finally, Gianni and Alessandro provide a thorough review of Buildah, Podman's best friend and a best-of-breed application for building container images.
When I write about containers and Podman, one of my challenges when explaining concepts can be providing too many details or oversimplifying things. Gianni and Alessandro have found a perfect medium between both ends by supplying ample amounts of detail. I appreciated the carefully crafted explanations when the topic required them. Not only was the level of detail appropriate, but they also used a very wide scope when writing about Podman and containers. As I read the book, I was able to relate to their superb use of examples and they did not add layers of abstraction that can make learning difficult. Podman for DevOps was a pleasure to read. As a subject matter expert, I am certain it will be a perfect resource for those both new to and experienced with Podman and containers.
Brent J. Baude, Senior Principal Software Engineer
Podman Architect
Alessandro Arrichiello is a solution architect for Red Hat Inc. with a special focus on telco technologies. He has a passion for GNU/Linux systems, which began at age 14 and continues today. He has worked with tools for automating enterprise IT: configuration management and continuous integration through virtual platforms. Alessandro is also a writer for the Red Hat Developer Blog, on which he has authored several articles about container architecture and technology. He now helps telecommunication customers with adopting container orchestration environments such as Red Hat OpenShift and Kubernetes, infrastructure as a service such as OpenStack, edge computing, and data center automation.
Gianni Salinetti is a solution architect from Rome working for Red Hat Inc. with a special focus on cloud-native computing and hybrid cloud strategies. He started working with GNU/Linux back in 2001 and developed a passion for open source software. His main fields of interest are application orchestration, automation, and systems performance tuning. He is also an advocate of DevSecOps and GitOps practices. He is a former Red Hat instructor, having taught many classes about GNU/Linux, OpenStack, JBoss middleware, Ansible, Kubernetes, and Red Hat OpenShift. He won Red Hat EMEA awards as the best DevOps, cloud, and middleware instructor. He is also an author for the Red Hat Developer Blog and actively contributes to webinars and events.
Nicolò Amato has over 20 years of experience working in the field of IT, 16 of which were at Hewlett Packard Enterprise, Accenture, DXC, and Red Hat Inc. Working in both technical and development roles has given him a broad base of skills and the ability to work with a diverse range of clients. His time was spent designing and implementing complex infrastructures for clients with the aim to migrate traditional services to hybrid, multi-cloud, and edge environments, evolving them into cloud-native services. He is enthusiastic about new technologies and he likes to be up to date – in particular with open source, which he considers one of the essences of technology that regulates the evolution of information technology.
Pierluigi Rossi is a solution architect for Red Hat Inc. His passion for GNU/Linux systems began 20 years ago and continues today. He has built a strong business and technical know-how on enterprise and cutting-edge technologies, working for many companies on different verticals and roles in the last 20 years. He has worked with virtualization and containerization tools (open source and not). He has also participated in several projects for corporate IT automation. He is now working on distributed on-premises and cloud environments involving IaaS, PaaS (OpenShift and Kubernetes), and automation. He loves open source in all its shades, and he enjoys sharing ideas and solutions with customers, colleagues, and community members.
Marco Alessandro Fagotto has been in the IT industry for 13 years, ranging across frontend and backend support, administration, system configuration, and security roles. Working in both technical and development roles has given him a broad base of skills and the ability to work with a diverse range of clients. He is a Red Hat Certified Professional, always looking for new technology and solutions to explore due to his interest in the fast evolution of the open source world.
DevOps best practices encourage the adoption of containers as the foundation of cloud-native ecosystems. As containers have become the new de facto standard for packaging applications and their dependencies, understanding how to implement, build, and manage them is now an essential skill for developers, system administrators, and SRE/operations teams. Podman and its companion tools, Buildah and Skopeo, make a great toolset to boost the development, execution, and management of containerized applications. Starting from the basic concepts of containerization and its underlying technologies, this book will help you get your first container up and running with Podman. The book explores the complete toolkit and illustrates the development of new containers, their life cycle management, troubleshooting, and security aspects.
By the end of Podman for DevOps, you'll have the skills needed to build and package your applications inside containers as well as deploy, manage, and integrate them with system services.
The book is for cloud developers looking to learn how to build and package applications inside containers, and system administrators who want to deploy, manage, and integrate containers with system services and orchestration solutions. This book provides a detailed comparison between Docker and Podman to aid you in learning Podman quickly.
Chapter 1, Introduction to Container Technology, covers the key concepts of container technology, a bit of history, and the underlying foundational elements that make things work.
Chapter 2, Comparing Podman and Docker, takes you through the architectures of Docker versus Podman, looking at high-level concepts and the main differences between them.
Chapter 3, Running the First Container, teaches you how to set up the prerequisites for running and managing your first container with Podman.
Chapter 4, Managing Running Containers, helps you understand how to manage the life cycles of your containers, starting/stopping/killing them to properly manage the services.
Chapter 5, Implementing Storage for the Container's Data, covers the basics of storage requirements for containers, the various offerings available, and how to use them.
Chapter 6, Meet Buildah – Building Containers from Scratch, is where you begin to learn the basic concepts of Buildah, Podman's companion tool that is responsible for assisting system administrators as well as developers during the container creation process.
Chapter 7, Integrating with Existing Application Build Processes, teaches you techniques and methods to integrate Buildah into a build process for your existing applications.
Chapter 8, Choosing the Container Base Image, covers more about the container base image format, trusted sources, and their underlying features.
Chapter 9, Pushing Images to a Container Registry, teaches you what a container registry is, how to authenticate against one, and how to work with images by pushing and pulling them.
Chapter 10, Troubleshooting and Monitoring Containers, shows you how to inspect running or failing containers, search for issues, and monitor the health status of containers.
Chapter 11, Securing Containers, goes into more detail on security in containers, the main issues, and the important step of updating container images during runtime.
Chapter 12, Implementing Container Networking Concepts, teaches you about the Container Network Interface (CNI), how to expose a container to the external world, and finally, how to interconnect two or more containers running on the same machine.
Chapter 13, Docker Migration Tips and Tricks, sees you learn how to migrate from Docker to Podman in the easiest way by using some of the built-in features of Podman, as well as some tricks that may help in the process.
Chapter 14, Interacting with systemd and Kubernetes, shows you how to integrate a container as a system service in the underlying operating host, enabling its management with the common sysadmin's tools. Podman interaction features with Kubernetes will also be explored.
In this book, we will guide you through the installation and use of Podman 3 or later, and its companion tools, Buildah and Skopeo. The default Linux distribution used in the book is Fedora Linux 34 or later, but any other Linux distribution can be used. All commands and code examples have been tested using Fedora 34 or 35 and Podman 3 or 4, but they should also work with future releases.
If you are using the digital version of this book, we advise you to type the commands yourself or access the code from the book's GitHub repository (a link is available in the next section).
Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/Podman-for-DevOps. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781803248233_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "We just defined a name for our repo, ubi8-httpd, and we chose to link this repository to a GitHub repository push."
A block of code is set as follows:
[Unit]
Description=Podman API Socket
Documentation=man:podman-system-service(1)
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
$ podman ps
CONTAINER ID  IMAGE                                        COMMAND               CREATED        STATUS            PORTS  NAMES
685a339917e7  registry.fedoraproject.org/f29/httpd:latest  /usr/bin/run-http...  3 minutes ago  Up 3 minutes ago         clever_zhukovsky
Any command-line input or output is written as follows:
$ skopeo login -u admin -p p0dman4Dev0ps# --tls-verify=false localhost:5000
Login Succeeded!
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "… and prints a crafted HTML page with the Hello World! message when it receives a GET / request."
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you've read Podman for DevOps, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
This chapter will take you through the basic concepts of container technology, the main features of Podman and its companion tools, the main differences between Podman and Docker, and finally, will put the theory of running and managing containers into practice.
This part of the book comprises the following chapters:
Chapter 1, Introduction to Container Technology
Chapter 2, Comparing Podman and Docker
Chapter 3, Running the First Container
Chapter 4, Managing Running Containers
Chapter 5, Implementing Storage for the Container's Data

Container technology has old roots in operating system history. For example, did you know that part of container technology was born back in the 1970s? Despite their simple and intuitive approach, there are many concepts behind containers that deserve a deeper analysis to fully grasp and appreciate how they made their way into the IT industry.
We're going to explore this technology to better understand how it works under the hood, the theory behind it, and its basic concepts. Knowing the mechanics and the technology behind the tools will let you easily approach and learn the whole technology's key concepts.
Then, we will also explore container technology's purpose and why it has spread to every company today. Did you know that nowadays 50% of the world's organizations run half of their application base as containers in production?
Let's dive into this great technology!
In this chapter, we're going to ask the following questions:
What are containers?
Why do I need a container?
Where do containers come from?
Where are containers used today?

This chapter does not require any technical prerequisites, so feel free to read it without worrying about installing or setting up any kind of software on your workstation!
However, if you are new to containers, you will find many technical concepts here that are useful for understanding the next chapters. We recommend going through this chapter carefully and coming back to it when needed. Previous knowledge of the Linux operating system will be helpful in understanding the technical concepts covered in this book.
In the following chapters, we will learn many new concepts with practical examples that will require active interaction with a Linux shell environment. In the practical examples, we will use the following conventions:
For any shell command preceded by the $ character, we will use a standard (non-root) user on the Linux system.
For any shell command preceded by the # character, we will use the root user on the Linux system.
Any output or shell command that would be too long to display on a single line in a code block will be interrupted with the \ character and then continued on a new line.

This section describes container technology from the ground up, beginning with basic concepts such as processes, filesystems, system calls, and process isolation, up to container engines and runtimes. The purpose of this section is to describe how containers implement process isolation. We also describe what differentiates containers from virtual machines and highlight the best use cases of both.
Before asking ourselves what a container is, we should answer another question: what is a process?
According to The Linux Programming Interface, an enjoyable book by Michael Kerrisk, a process is an instance of an executing program. A program is a file holding information necessary to execute the process. A program can be dynamically linked to external libraries, or it can be statically linked in the program itself (the Go programming language uses this approach by default).
This leads us to an important concept: a process is executed in the machine CPU and allocates a portion of memory containing program code and variables used by the code itself. The process is instantiated in the machine's user space and its execution is orchestrated by the operating system kernel. When a process is executed, it needs to access different machine resources such as I/O (disk, network, terminals, and so on) or memory. When the process needs to access those resources, it performs a system call into the kernel space (for example, to read a disk block or send packets via the network interface).
The process indirectly interacts with the host disks using a filesystem, a multi-layer storage abstraction that facilitates write and read access to files and directories.
How many processes usually run on a machine? A lot. They are orchestrated by the OS kernel with complex scheduling logic that makes each process behave as if it were running on a dedicated CPU core, while in fact the same core is shared among many of them.
The same program can instantiate many processes of its kind (for example, multiple web server instances running on the same machine). Conflicts, such as many processes trying to access the same network port, must be managed accordingly.
Nothing prevents us from running a different version of the same program on the host, assuming that system administrators will have the burden of managing potential conflicts of binaries, libraries, and their dependencies. This could become a complex task, which is not always easy to solve with common practices.
This brief introduction was necessary to set the context.
Containers are a simple and smart answer to the need of running isolated process instances. We can safely affirm that containers are a form of application isolation that works on many levels:
Filesystem isolation: Containerized processes have a separate filesystem view, and their programs are executed from the isolated filesystem itself.
Process ID isolation: Containerized processes run under an independent set of process IDs (PIDs).
User isolation: User IDs (UIDs) and group IDs (GIDs) are isolated to the container. A process's UID and GID can differ inside and outside a container, allowing it to run with a privileged UID or GID inside the container only.
Network isolation: This kind of isolation relates to the host network resources, such as network devices, IPv4 and IPv6 stacks, routing tables, and firewall rules.
IPC isolation: Containers provide isolation for host IPC resources, such as POSIX message queues or System V IPC objects.
Resource usage isolation: Containers rely on Linux control groups (cgroups) to limit or monitor the usage of certain resources, such as CPU, memory, or disk. We will discuss cgroups in more detail later in this chapter.

From an adoption point of view, the main purpose of containers, or at least the most common use case, is to run applications in isolated environments. To better understand this concept, we can look at the following diagram:
Figure 1.1 – Native applications versus containerized ones
Applications running natively on a system that does not provide containerization features share the same binaries and libraries, as well as the same kernel, filesystem, network, and users. This could lead to many issues when an updated version of an application is deployed, especially conflicting library issues or unsatisfied dependencies.
On the other hand, containers offer a consistent layer of isolation for applications and their related dependencies that ensures seamless coexistence on the same host. A new deployment only consists of the execution of the new containerized version, as it will not interact or conflict with the other containers or native applications.
Linux containers are enabled by different native kernel features, with the most important being Linux namespaces. Namespaces abstract specific system resources (notably, the ones described before, such as network, filesystem mount, users, and so on) and make them appear as unique to the isolated process. In this way, the process has the illusion of interacting with the host resource, for example, the host filesystem, while an alternative and isolated version is being exposed.
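A quick way to see these abstractions in action is the /proc filesystem, where every process exposes the namespaces it belongs to. The following is a minimal check you can run in any Linux shell:

```shell
# Every process exposes its namespace membership under /proc/<PID>/ns
# as a set of symbolic links, one per namespace kind:
ls -l /proc/self/ns
# Two processes share a namespace when the link targets show the same
# inode number, for example pid:[4026531836]:
readlink /proc/self/ns/pid
```

Comparing these inode numbers between a host shell and a containerized process is a simple way to verify which namespaces have actually been unshared.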
Currently, we have a total of eight kinds of namespaces:
PID namespaces: These isolate the process ID number in a separate space, allowing processes in different PID namespaces to retain the same PID.
User namespaces: These isolate user and group IDs, the root directory, keyrings, and capabilities. This allows a process to have a privileged UID and GID inside the container while simultaneously having unprivileged ones outside the namespace.
UTS namespaces: These allow the isolation of the hostname and NIS domain name.
Network namespaces: These allow isolation of networking system resources, such as network devices, IPv4 and IPv6 protocol stacks, routing tables, firewall rules, port numbers, and so on. Users can create virtual network devices called veth pairs to build tunnels between network namespaces.
IPC namespaces: These isolate IPC resources such as System V IPC objects and POSIX message queues. Objects created in an IPC namespace can be accessed only by the processes that are members of the namespace. Processes use IPC to exchange data, events, and messages in a client-server mechanism.
cgroup namespaces: These isolate cgroup directories, providing a virtualized view of the process's cgroups.
Mount namespaces: These provide isolation of the mount point list that is seen by the processes in the namespace.
Time namespaces: These provide an isolated view of system time, letting processes in the namespace run with a time offset against the host time.

Now, let's move on to resource usage.
cgroups are a native feature of the Linux kernel whose purpose is to organize processes in a hierarchical tree and limit or monitor their resource usage.
The kernel cgroups interface, similar to what happens with /proc, is exposed with a cgroupfs pseudo-filesystem. This filesystem is usually mounted under /sys/fs/cgroup in the host.
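As a quick illustration (the exact layout depends on whether the host uses cgroups V1 or V2, discussed later in this section), the pseudo-filesystem can be explored with ordinary file commands:

```shell
# The cgroup membership of the current shell:
cat /proc/self/cgroup
# On a cgroup v2 host, the controllers available in the unified
# hierarchy are listed in a single file under the mount point:
cat /sys/fs/cgroup/cgroup.controllers 2>/dev/null
```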
cgroups offer a series of controllers (also called subsystems) that can be used for different purposes, such as limiting the CPU time share of a process, memory usage, freeze and resume processes, and so on.
The organizational hierarchy of controllers has changed through time, and there are currently two versions, V1 and V2. In cgroups V1, different controllers could be mounted against different hierarchies. Instead, cgroups V2 provide a unified hierarchy of controllers, with processes residing in the leaf nodes of the tree.
cgroups are used by containers to limit CPU or memory usage. For example, users can limit CPU quota, which means limiting the number of microseconds the container can use the CPU over a given period, or limit CPU shares, the weighted proportion of CPU cycles for each container.
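As a minimal sketch of the quota semantics on a cgroup v2 host (the cgroup path below is hypothetical, and writing the limit requires root), the cpu.max interface file holds a quota and a period expressed in microseconds:

```shell
# cpu.max holds "<quota> <period>" in microseconds; "max" means unlimited.
# A quota of 50000 over a period of 100000 caps the group at half a CPU core:
quota=50000
period=100000
echo "CPU cap: $((100 * quota / period))% of one core"
# prints: CPU cap: 50% of one core

# Writing the limit (root only; /sys/fs/cgroup/mygroup is a hypothetical cgroup):
#   echo "50000 100000" > /sys/fs/cgroup/mygroup/cpu.max
```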
Now that we have illustrated how process isolation works (both for namespaces and resources), we can illustrate a few basic examples.
A useful fact to know is that GNU/Linux operating systems offer all the features necessary to run a container manually. This result can be achieved by working with specific system calls (notably unshare() and clone()) and utilities such as the unshare command.
For example, to run a process, let's say /bin/sh, in an isolated PID namespace, users can rely on the unshare command:
# unshare --fork --pid --mount-proc /bin/sh
The result is the execution of a new shell process in an isolated PID namespace. Users can try to monitor the process view and will get an output such as the following:
sh-5.0# ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1 0.0 0.0 226164 4012 pts/4 S 22:56 0:00 /bin/sh
root 4 0.0 0.0 227968 3484 pts/4 R+ 22:56 0:00 ps aux
Interestingly, the shell process of the preceding example is running with PID 1, which is correct, since it is the very first process running in the new isolated namespace.
However, the PID namespace is the only one abstracted, while all the other system resources remain the host's original ones. If we want to add more isolation, for example on the network stack, we can add the --net flag to the previous command:
# unshare --fork --net --pid --mount-proc /bin/sh
The result is a shell process isolated on both PID and network namespaces. Users can inspect the network IP configuration and realize that the host native devices are no longer directly seen by the unshared process:
sh-5.0# ip addr show
1: lo: <LOOPBACK> mtu 65536 qdisc noop state DOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
The preceding examples are useful to understand a very important concept: containers are strongly related to Linux native features. The OS provides a solid and complete interface that helped container runtime development, and the capability to isolate namespaces and resources was the key that unlocked container adoption. The role of the container runtime is to abstract the complexity of the underlying isolation mechanisms, with mount point isolation being probably the most crucial of them. Therefore, it deserves a better explanation.
We have seen so far examples of unsharing that did not impact mount points and the filesystem view from the process side. To gain the filesystem isolation that prevents binary and library conflicts, users need to create another layer of abstraction for the exposed mount points.
This result is achieved by leveraging mount namespaces and bind mounts. First introduced in 2002 with the Linux kernel 2.4.19, mount namespaces isolate the list of mount points seen by the process. Each mount namespace exposes a discrete list of mount points, thus making processes in different namespaces aware of different directory hierarchies.
With this technique, it is possible to expose to the executing process an alternative directory tree that contains all the necessary binaries and libraries of choice.
Despite seeming a simple task, the management of a mount namespace is anything but straightforward and easy to master. For example, users must handle different archive versions of directory trees from different distributions, extract them, and bind mount them in separate namespaces. We will see later that the first container implementations in Linux followed this approach.
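The steps above can be sketched with the same unshare utility (run as root; /tmp/rootfs stands for a hypothetical, pre-extracted directory tree containing the binaries and libraries of choice):

```shell
# Enter a new mount namespace, then bind mount the alternative tree.
# Mounts created inside the namespace are invisible to the host.
unshare --mount /bin/sh -c '
  mount --bind /tmp/rootfs /mnt
  ls /mnt
'
# Once the namespaced shell exits, the bind mount disappears with it.
```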
The success of containers is also bound to an innovative, multi-layered, copy-on-write approach of managing the directory trees that introduced a simple and fast method of copying, deploying, and using the tree necessary to run the container – container images.
We must thank Docker for the introduction of this smart method of storing data for containers. Later, images would become an Open Container Initiative (OCI) standard specification (https://github.com/opencontainers/image-spec).
Images can be seen as a filesystem bundle that is downloaded (pulled) and unpacked in the host before running the container for the first time.
Images are downloaded from repositories called image registries. Those repositories can be seen as specialized object storages that hold image data and related metadata. There are both public and free-to-use registries (such as quay.io or docker.io) and private registries that can be executed in the customer private infrastructure, on-premises, or in the cloud.
Images can be built by DevOps teams to fulfill special needs or embed artifacts that must be deployed and executed on a host.
During the image build process, developers can inject pre-built artifacts or source code that is compiled in the build container itself. To optimize image size, it is possible to create multi-stage builds with a first stage that compiles the source code using a base image with the necessary compilers and runtimes, and a second stage where the built artifacts are injected into a minimal, lightweight image, optimized for fast startup and a minimal storage footprint.
The recipe of the build process is defined in a special text file called a Dockerfile, which defines all the necessary steps to assemble the final image.
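As a sketch of the multi-stage pattern described above, consider the following Dockerfile (the image tags, the hello.go source file, and the binary name are illustrative assumptions, not examples from this book):

```dockerfile
# Stage 1: build the artifact using a full toolchain image
FROM docker.io/library/golang:1.17 AS builder
WORKDIR /src
COPY hello.go .
# Disable cgo so the binary has no libc dependency and runs on a minimal base
RUN CGO_ENABLED=0 go build -o /hello hello.go

# Stage 2: copy only the built binary into a small runtime image;
# the toolchain from stage 1 is not part of the final image
FROM docker.io/library/alpine:3.15
COPY --from=builder /hello /usr/local/bin/hello
ENTRYPOINT ["/usr/local/bin/hello"]
```

Assuming the Dockerfile and hello.go sit in the current directory, a build could then be started with a command such as `podman build -t hello .`, producing a final image that contains the compiled binary but none of the build tooling.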
After building them, users can push their images to public or private registries for later use or for complex, orchestrated deployments.
The following diagram summarizes the build workflow:
Figure 1.2 – Image build workflow
We will cover the build topic more extensively later in this book.
What makes a container image so special? The smart idea behind images is that they can be considered as a packaging technology. When users build their own image with all the binaries and dependencies installed in the OS directory tree, they are effectively creating a self-consistent object that can be deployed everywhere with no further software dependencies. From this point of view, container images are an answer to the long-debated sentence, It works on my machine.
Developer teams love them because they can be certain of the execution environment of their applications, and operations teams love them because they simplify the deployment process by removing the tedious task of maintaining and updating a server's library dependencies.
Another smart feature of container images is their copy-on-write, multi-layered approach. Instead of having a single bulk binary archive, an image is made up of many tar archives called blobs or layers. Layers are composed together using image metadata and squashed into a single filesystem view. This result can be achieved in many ways, but the most common approach today is by using union filesystems.
OverlayFS (https://www.kernel.org/doc/html/latest/filesystems/overlayfs.html) is the most used union filesystem nowadays. It is maintained in the kernel tree, despite not being completely POSIX-compliant.
According to kernel documentation, "An overlay filesystem combines two filesystems – an 'upper' filesystem and a 'lower' filesystem." This means that it can combine multiple directory trees and provide a unique, squashed view. The directories are the layers and are referred to as lowerdir and upperdir, which respectively define the low-level directory and the one stacked on top of it. The unified view is called merged. OverlayFS supports up to 128 layers.
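These pieces can be seen in action with a manual overlay mount. The following sketch assumes util-linux's unshare is available and a kernel recent enough (roughly 5.11 or later) to allow unprivileged OverlayFS mounts inside a user namespace; all paths are illustrative:

```shell
# Prepare a lower layer (the read-only base), an upper layer (where changes
# land), the work directory OverlayFS requires, and the merged mount point
mkdir -p /tmp/ovl/lower /tmp/ovl/upper /tmp/ovl/work /tmp/ovl/merged
echo "from lower" > /tmp/ovl/lower/base.txt

# Mount the overlay inside a user+mount namespace so no root is needed
unshare --user --map-root-user --mount sh -c '
  mount -t overlay overlay \
    -o lowerdir=/tmp/ovl/lower,upperdir=/tmp/ovl/upper,workdir=/tmp/ovl/work \
    /tmp/ovl/merged
  cat /tmp/ovl/merged/base.txt            # the file is served from lowerdir
  echo "from upper" > /tmp/ovl/merged/new.txt
'
# After the namespace exits, the write is found in upperdir on the host,
# while lowerdir is untouched – the copy-on-write behavior images rely on
ls /tmp/ovl/upper
ls /tmp/ovl/lower
```

Container engines perform essentially this dance for every image layer, with each blob acting as one lowerdir entry.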
OverlayFS is not aware of the concept of container image; it is merely used as a foundation technology to implement the multi-layered solution used by OCI images.
OCI images also implement the concept of immutability. The layers of an image are all read-only and cannot be modified. The only way to change something in the lower layers is to rebuild the image with appropriate changes.
Immutability is an important pillar of the cloud computing approach. It simply means that an infrastructure (such as an instance, container, or even complex clusters) can only be replaced by a different version and not modified to achieve the target deployment. Therefore, we usually do not change anything inside a running container (for example, installing packages or updating config files manually), even though it could be possible in some contexts. Rather, we replace its base image with a new updated version. This also ensures that every copy of the running containers stays in sync with others.
When a container is executed, a new read/write thin layer is created on top of the image. This layer is ephemeral, thus any changes on top of it will be lost after the container is destroyed:
Figure 1.3 – A container's layers
This leads to another important statement: we do not store anything inside containers. Their only purpose is to offer a working and consistent runtime environment for our applications. Data must be accessed externally, by using bind mounts inside the container itself or network storage (such as Network File System (NFS), Simple Storage Service (S3), Internet Small Computer System Interface (iSCSI), and so on).
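This split between an ephemeral layer and externally mounted data can be simulated with plain namespaces, without any container engine. The following sketch assumes util-linux's unshare is available; a tmpfs plays the role of the container's thin read/write layer, and a bind mount plays the role of a volume (all paths are illustrative):

```shell
# A host directory that will act as the persistent "volume"
mkdir -p /tmp/host-data
echo "important data" > /tmp/host-data/db.txt

mkdir -p /tmp/cnt-root
unshare --user --map-root-user --mount sh -c '
  # Simulate the container ephemeral layer with a tmpfs: it vanishes
  # together with the mount namespace
  mount -t tmpfs tmpfs /tmp/cnt-root
  mkdir -p /tmp/cnt-root/data
  # Bind mount the host directory inside the "container" tree
  mount --bind /tmp/host-data /tmp/cnt-root/data
  echo "scratch" > /tmp/cnt-root/tmp.txt           # ephemeral write
  echo "updated data" > /tmp/cnt-root/data/db.txt  # persistent write
'
cat /tmp/host-data/db.txt   # the bind-mounted write survived
ls /tmp/cnt-root            # the tmpfs contents are gone
```

With a real engine, the equivalent would be a volume flag such as `podman run -v /tmp/host-data:/data ...`: data written under /data persists, while everything else written in the container's filesystem is lost when the container is removed.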
Containers' mount isolation and the layered design of images provide a consistent, immutable infrastructure, but more security restrictions are necessary to prevent malicious processes from escaping the container sandbox to steal the host's sensitive information or use the host to attack other machines. The following subsection introduces security considerations and shows how container runtimes can limit these behaviors.
From a security point of view, there is a hard truth to share: running a process inside a container does not, by itself, make it more secure than any other process.
A malicious attacker can still make its way through the host filesystem and memory resources. To achieve better security isolation, additional features are available:
- Mandatory access control: SELinux or AppArmor can be used to enforce container isolation against the parent host. These subsystems, and their related command-line utilities, use a policy-based approach to better isolate the running processes in terms of filesystem and network access.
- Capabilities: When an unprivileged process is executed in the system (that is, a process with an effective UID different from 0), it is subject to permission checking based on its credentials (its effective UID). Those permissions, or privileges, are called capabilities and can be enabled independently, granting an unprivileged process limited privileged access to specific resources. When running a container, we can add or drop capabilities.
- Secure Computing Mode (Seccomp): This is a native kernel feature that can be used to restrict the syscalls that a process is able to make from user space to kernel space. By identifying the strictly necessary privileges a process needs to run, administrators can apply seccomp profiles to limit the attack surface.

Applying the preceding security features manually is not always easy and immediate, as some of them involve a steep learning curve. Tools that automate and simplify (possibly in a declarative way) these security constraints provide high value.
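To make the seccomp point concrete, a minimal profile in the JSON format accepted by Docker and Podman might look as follows. This is only a sketch: the allowlist here is deliberately tiny and far too small for any real workload, and the file name is an assumption:

```json
{
  "defaultAction": "SCMP_ACT_ERRNO",
  "architectures": ["SCMP_ARCH_X86_64"],
  "syscalls": [
    {
      "names": ["read", "write", "close", "openat", "futex", "exit", "exit_group"],
      "action": "SCMP_ACT_ALLOW"
    }
  ]
}
```

Any syscall not listed fails with an error (SCMP_ACT_ERRNO). Such a profile can be applied with `podman run --security-opt seccomp=profile.json ...`, and capabilities can be trimmed similarly, for example with `podman run --cap-drop=ALL --cap-add=NET_BIND_SERVICE ...`.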
We will discuss security topics in further detail later in this book.
Despite being feasible and particularly useful from a learning point of view, running and securing containers manually is an unreliable and complex approach. It is too hard to reproduce and automate in production environments and can easily lead to configuration drift among different hosts.
This is the reason container engines and runtimes were born – to help automate the creation of a container and all the related tasks that culminate in a running container.
The two concepts are quite different but are often confused, so they deserve some clarification:
A container engine is a software tool that accepts and processes requests from users to create a container with all the necessary arguments and parameters. It can be seen as a sort of orchestrator, since it takes care of putting in place all the necessary actions to have the container up and running; yet it is not the effective executor of the container (that is the container runtime's role). Engines usually solve the following problems:

- Providing a command line and/or REST interface for user interaction
- Pulling and extracting container images (discussed later in this book)
- Managing container mount points and bind-mounting the extracted image
- Handling container metadata
- Interacting with the container runtime

We have already stated that when a new container is instantiated, a thin R/W layer is created on top of the image; this task is performed by the container engine, which takes care of presenting a working stack of the merged directories to the container runtime.
The container ecosystem offers a wide choice of container engines. Docker is, without doubt, the most well-known (despite not being the first) engine implementation, along with Podman (the core subject of this book), CRI-O, rkt, and LXD.
A container runtime is a low-level piece of software used by container engines to run containers in the host. The container runtime provides the following functionalities:
- Starting the containerized process in the target mount point (usually provided by the container engine) with a set of custom metadata
- Managing the cgroups' resource allocation
- Managing mandatory access control policies (SELinux and AppArmor) and capabilities
There are many container runtimes nowadays, and most of them implement the OCI runtime spec reference (https://github.com/opencontainers/runtime-spec). This is an industry standard that defines how a runtime should behave and the interface it should implement.
The most common OCI runtime is runc, used by the most notable engines, along with other implementations such as crun, kata-containers, railcar, rkt, and gVisor.
This modular approach lets container engines swap the container runtime as needed. For example, when Fedora 31 came out, it introduced a new default cgroups hierarchy called cgroups v2. runc did not support cgroups v2 in the beginning, and Podman simply swapped runc with another OCI-compatible container runtime (crun) that was already compliant with the new hierarchy. Now that runc finally supports cgroups v2, Podman will be able to safely use it again with no impact on the end user.
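This swap is a matter of configuration: the default runtime can be selected in Podman's main configuration file. A minimal sketch, assuming the crun binary is installed and using the keys documented in containers.conf(5):

```toml
# /etc/containers/containers.conf (system-wide) or
# ~/.config/containers/containers.conf (rootless, per-user)
[engine]
# The OCI runtime Podman invokes to actually run containers
runtime = "crun"
```

The same choice can also be made per invocation, for example with `podman --runtime /usr/bin/crun run ...`, which is handy for testing a different runtime without changing the system default.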
After introducing container runtimes and engines, it's time for one of the most debated and asked questions during container introductions – the difference between containers and virtual machines.
Until now, we have talked about isolation achieved with native OS features and enhanced with container engines and runtimes. Many users could be tricked into thinking that containers are a form of virtualization.
Nothing could be further from the truth; containers are not virtual machines.
So, what is the main difference between a container and a virtual machine? Before answering, we can look at the following diagram:
Figure 1.4 – A system call to a kernel from a container
A container, despite being isolated, holds a process that directly interacts with the host kernel using system calls. The process may not be aware of the host namespaces, but it still needs to context-switch into kernel space to perform operations such as I/O access.
On the other hand, a virtual machine is always executed on top of a hypervisor, running a guest operating system with its own filesystem, networking, storage (usually as image files), and kernel. The hypervisor is software that provides a layer of hardware abstraction and virtualization to the guest OS, enabling a single bare-metal machine running on capable hardware to instantiate many virtual machines. The hardware seen by the guest OS kernel is mostly virtualized hardware, with some exceptions:
Figure 1.5 – Architecture – virtualization versus containers
This means that when a process performs a system call inside a virtual machine, it is always directed to the guest OS kernel.
To recap, we can affirm that containers share the same kernel with the host, while virtual machines have their own guest OS kernel.
This statement implies a lot of considerations.
From a security point of view, virtual machines provide better isolation from potential attacks. However, some recent CPU-level attacks (most notably Spectre and Meltdown) can exploit CPU vulnerabilities to access VMs' address spaces.
Containers have refined isolation features and can be configured to comply with strict security standards and benchmarks (such as the CIS Docker Benchmark, NIST, and HIPAA guidelines) that make them quite hard to exploit.
From a scalability point of view, containers are faster to spin up than VMs. Running a new container instance is a matter of milliseconds if the image is already available in the host. These fast results are also achieved by the kernel-less nature of the container. Virtual machines must boot a kernel and initramfs, pivot into the root filesystem, run some kind of init (such as systemd), and start a variable number of services.
A VM will usually consume more resources than a container. To spin up a guest OS, we usually need to allocate more RAM, CPU, and storage than the resources needed to start a container.
Another great differentiator between VMs and containers is the focus on workloads. The best practice for containers is to spin up a container for every specific workload. On the other hand, a VM can run different workloads together.
Imagine a LAMP or WordPress architecture: on non-production or small production environments, it would not be strange to have everything (Apache, PHP, MySQL, and WordPress) installed on the same virtual machine. This design would be split into a multi-container (or multi-tier) architecture, with one container running the frontend (Apache-PHP-WordPress) and one container running the MySQL database. The container running MySQL could access storage volumes to persist the database files. At the same time, it would be easier to scale up/down the frontend containers.
Now that we understand how containers work and what differentiates them from virtual machines, we can move on to the next big question: why do I need a container?
This section describes the benefits and the value of containers in modern IT systems, and how containers can provide benefits for both technology and business.
The preceding question could be rephrased as, what is the value of adopting containers in production?
IT has become a fast, market-driven environment where changes are dictated by business and technological enhancements. When adopting emerging technologies, companies always look at their Return on Investment (ROI) while striving to keep the Total Cost of Ownership (TCO) under reasonable thresholds. This is not always easy to attain.
This section will try to uncover the most important benefits.
The technologies that power containers are open source and have become open standards widely adopted by many vendors and communities. Open source software, today adopted by large companies, vendors, and cloud providers, has many advantages and provides great value for the enterprise. Open source is often associated with high-value and innovative solutions – that's simply the truth!
First, community-driven projects usually have a great evolutionary boost that helps mature the code and bring new features continuously. Open source software is available to the public and can be inspected and analyzed. This is a great transparency feature that also has an impact on software reliability, both in terms of robustness and security.
One of the key aspects is that it promotes an evolutionary paradigm where only the best software is adopted, contributed, and supported; container technology is a perfect example of this behavior.
We have already stated that containers are a technology that enables users to package and isolate applications with their entire runtime environment, which means all the files necessary to run. This feature unlocks one key benefit – portability.
This means that a container image can be pulled and executed on any host that has a container engine running, regardless of the OS distribution underneath. A CentOS or nginx image can be pulled indifferently from a Fedora or Debian Linux distribution running a container engine and executed with the same configuration.
Again, if we have a fleet of many identical hosts, we can choose to schedule the application instance on one of them (for example, using load metrics to choose the best fit) with the awareness of having the same result when running the container.
Container portability also reduces vendor lock-ins and provides better interoperability between platforms.
As stated before, containers help solve the old it works on my machine pattern between development and operations teams when it comes to deploying applications for production.
As a smart and easy packaging solution for applications, they meet the developers' need to create self-consistent bundles with all the necessary binaries and configurations to run their workloads seamlessly. As a self-consistent way to isolate processes and guarantee separation of namespaces and resource usage, they are appreciated by operations teams, who are no longer forced to maintain complex dependency constraints or segregate every single application inside VMs.
From this point of view, containers can be seen as facilitators of DevOps best practices, where developers and operators work closer to deploy and manage applications without rigid separations.
Developers who want to build their own container images are expected to be more aware of the OS layer built into the image and work closely with operations teams to define build templates and automations.
Containers are built for the cloud, designed with an immutable approach in mind. The immutability pattern clearly states that changes in the infrastructure (be it a single container or a complex cluster) must be applied by redeploying a modified version and not by patching the current one. This helps to increase a system's predictability and reliability.
When a new application version must be rolled out, it is built into a new image and a new container is deployed in place of the previous version. Build pipelines can be implemented to manage complex workflows, from application build and image creation, image registry push and tagging, until deployment in the target host. This approach drastically shortens provisioning time while reducing inconsistencies.
We will see later in this book that dedicated container orchestration solutions such as Kubernetes also provide ways to automate the scheduling patterns of large fleets of hosts and make containerized workloads easy to deploy, monitor, and scale.
Compared to virtual machines, containers have a lightweight footprint that drives much greater efficiency in the consumption of compute and memory resources. By providing a way to simplify workload execution, container adoption brings great cost savings.
IT resource optimization is achieved by reducing the computational cost of applications: if an application server that was running on top of a virtual machine can be containerized and executed on a host along with other containers (with dedicated resource limits and requests), computing resources can be saved and reused.
Whole infrastructures can be redesigned with this new paradigm in mind; a bare-metal machine previously configured as a hypervisor can be reallocated as a worker node of a container orchestration system that runs more granular containerized applications.
Microservice architectures split applications into multiple services that perform fine-grained functions and are part of the application as a whole.
Traditional applications have a monolithic approach where all the functions are part of the same instance. The purpose of microservices is to break the monolith into smaller parts that interact independently.
Monolithic applications fit well into containers, but microservice applications are an ideal match for them.
Having one container for every single microservice helps to achieve important benefits, such as the following:
- Independent scalability of microservices
- More clearly defined responsibilities for development teams
- Potential adoption of different technology stacks across the different microservices
- More control over security aspects (such as public-facing exposed services, mTLS connections, and so on)

Orchestrating microservices can be a daunting task when dealing with large and articulated architectures. The adoption of orchestration platforms such as Kubernetes, service mesh solutions such as Istio or Linkerd, and tracing tools such as Jaeger and Kiali becomes crucial to achieving control over complexity.
Where do containers come from? Container technology is not a new topic in the computer industry, as we will see in the next paragraphs. It has deep roots in OS history, and we'll discover that it could even be older than us!
This section rewinds the tape and recaps the most important milestones of containers in OS history, from Unix to GNU/Linux machines – a useful glance at the past to understand how the underlying idea evolved through the years.
If we want to create a timeline for our time travel through container history, the first and oldest stop is 1979 – the year of Unix V7. At that time, an important system call was introduced in the Unix kernel – the chroot system call.
Important Note
A system call (or syscall) is a method used by an application to request something from the OS's kernel.
This system call allows an application to change the root directory for itself and its children, preventing the running software from escaping that jail. It prohibits the running application from accessing any files or directories outside the given subtree, which was really a game changer at the time.
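A chroot jail can still be tried today with a few commands. The sketch below builds a tiny root filesystem containing just a shell, then enters it; since chroot normally requires root, it uses a user namespace (via util-linux's unshare) to run unprivileged, which assumes a kernel with unprivileged user namespaces enabled:

```shell
# Build a minimal root filesystem containing just a shell and its libraries
jail=/tmp/jail
mkdir -p "$jail/bin"
cp /bin/sh "$jail/bin/"
# Copy the shared libraries the shell needs (paths vary by distribution)
for lib in $(ldd /bin/sh | grep -o '/[^ )]*'); do
  mkdir -p "$jail$(dirname "$lib")"
  cp "$lib" "$jail$lib"
done

# Inside the jail, / is now the subtree we built: pwd reports /, and
# ls / shows only the directories we created above
unshare --user --map-root-user chroot "$jail" /bin/sh -c 'pwd; ls /'
```

Anything not copied into /tmp/jail simply does not exist from the jailed process's point of view, which is exactly the filesystem-level isolation chroot introduced.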
Some years later, in 1982, this system call was also introduced in BSD systems.
Unfortunately, this feature was not built with security in mind, and over the years, OS documentation and security literature have strongly discouraged the use of chroot jails as a security mechanism to achieve isolation.
Chroot was only the first milestone in the journey towards complete process isolation in *nix systems. The next was, from a historic point of view, the introduction of FreeBSD jails.
Making some steps forward in our history trip, we jump back (or forward, depending on where we're looking from) to 2000, when the FreeBSD OS approved and released a new concept that extends the old and good chroot system call – FreeBSD jails.
Important Note
FreeBSD is a free and open source Unix-like operating system first released in 1993, born from the Berkeley Software Distribution, which was originally based on Research Unix.
As we briefly reported previously, chroot was a great feature back in the '80s, but the jail it creates can easily be escaped and has many limitations, so it was not adequate for complex scenarios. For that reason, FreeBSD jails were built on top of the chroot syscall with the goal of extending and enlarging its feature set.
In a standard chroot environment, a running process is limited and isolated only at the filesystem level; everything else, such as running processes, system resources, the networking subsystem, and system users, is shared between the processes inside the chroot and the host system.
Looking at FreeBSD jails, the main feature is the virtualization of the networking subsystem, system users, and processes; as you can imagine, this greatly improves the flexibility and overall security of the solution.
Let's schematize the four key features of a FreeBSD jail:
- A directory subtree: This is what we already saw with the chroot jail. Once a subtree is defined, the running process is limited to it and cannot escape from it.
- An IP address: This is a great revolution; finally, we can define an independent IP address for our jail and let our running process be isolated even from the host system's network.
- A hostname: Used inside the jail, this is, of course, different from the host system's.
- A command: This is the executable to be run inside the jail. The executable has a relative path that is self-contained in the jail.

One plus of this kind of jail is that every instance also has its own users and a root account that has no privileges or permissions over the other jails or the underlying host system.
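On a modern FreeBSD system, these four elements map directly onto a jail definition in the jail.conf(5) format. The following is an illustrative sketch only; the jail name, paths, and IP address are assumptions:

```
# /etc/jail.conf – one block per jail
www {
    path = "/usr/jail/www";               # the directory subtree
    ip4.addr = 192.168.0.10;              # the jail's own IP address
    host.hostname = "www.example.org";    # the jail's hostname
    exec.start = "/bin/sh /etc/rc";       # the command run at startup
}
```

With such a definition in place, the jail would typically be started with `service jail start www` or `jail -c www`.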
Another interesting feature of FreeBSD jails is that we have two ways of installing/creating a jail:
- From binaries, reflecting the ones we might install with the underlying OS
- From source, building from scratch what's needed by the final application

Moving back to our time machine, we must jump forward only a few years, to 2004 to be exact, to finally meet the first wording we can recognize – Solaris Containers.
Important Note
Solaris is a proprietary Unix OS born from SunOS in 1993, originally developed by Sun Microsystems.
To be honest, Solaris Containers was only a transitory name for Solaris Zones, a virtualization technology built into the Solaris OS, helped also by a special filesystem, ZFS, which allows storage snapshots and cloning.
A zone is a virtualized application environment, built from the underlying operating system, that allows complete isolation between the base host system and any other applications running inside other zones.
The cool feature that Solaris Zones introduced is the concept of a branded zone. A branded zone is a completely different environment from the underlying OS and can contain different binaries, toolkits, or even a different OS!
Finally, to ensure isolation, a Solaris zone can have its own networking, its own users, and even its own time zone.
Let's jump forward four years more and meet Linux Containers (LXC). We're in 2008, when Linux's first complete container management solution was released.
LXC cannot be reduced to just a manager for one of the first Linux container implementations, because its authors developed many of the kernel features that are now also used by other container runtimes on Linux.
LXC has its own low-level container runtime, and its authors built it with the goal of offering an isolated environment as close as possible to VMs, but without the overhead of simulating hardware and running a brand-new kernel instance. LXC achieves this goal and its isolation thanks to the following kernel functionalities:

- Namespaces
- Mandatory access control
- Control groups (also known as cgroups)

Let's recap the kernel functionalities that we saw earlier in the chapter.
A namespace isolates processes that abstract a global system resource. If a process makes changes to a
