34,79 €
DevOps is a set of practices that help remove barriers between developers and system administrators, and is implemented by Google through site reliability engineering (SRE).
With the help of this book, you'll explore the evolution of DevOps and SRE, before delving into SRE technical practices such as SLA, SLO, SLI, and error budgets that are critical to building reliable software faster and balance new feature deployment with system reliability. You'll then explore SRE cultural practices such as incident management and being on-call, and learn the building blocks to form SRE teams. The second part of the book focuses on Google Cloud services to implement DevOps via continuous integration and continuous delivery (CI/CD). You'll learn how to add source code via Cloud Source Repositories, build code to create deployment artifacts via Cloud Build, and push it to Container Registry. Moving on, you'll understand the need for container orchestration via Kubernetes, comprehend Kubernetes essentials, apply via Google Kubernetes Engine (GKE), and secure the GKE cluster. Finally, you'll explore Cloud Operations to monitor, alert, debug, trace, and profile deployed applications.
By the end of this SRE book, you'll be well-versed with the key concepts necessary for gaining Professional Cloud DevOps Engineer certification with the help of mock tests.
Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:
Seitenzahl: 579
Veröffentlichungsjahr: 2021
A practical guide to SRE and achieving Google's Professional Cloud DevOps Engineer certification
Sandeep Madamanchi
BIRMINGHAM—MUMBAI
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Wilson D'souza
Publishing Product Manager: Vijin Boricha
Senior Editor: Rahul D'souza
Content Development Editor: Nihar Kapadia
Technical Editor: Sarvesh Jaywant
Copy Editor: Safis Editing
Project Coordinator: Neil D'mello
Proofreader: Safis Editing
Indexer: Vinayak Purushotham
Production Designer: Nilesh Mohite
First published: July 2021
Production reference: 3280621
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83921-801-9
www.packt.com
To my wonderful daughter – Adwita. As I see you grow every day, you motivate me to learn and inspire me to be a better person. Thank You!!
Sandeep Madamanchi works at a company called Variant that re-engineers trucking through technology, acting as the Head of Cloud Infrastructure and Data Engineering. He is a continuous learner, focused on building highly resilient, secure, and self-healing cloud infrastructure. He advocates SRE and its practices to achieve reliability and operational stability. His vision is to provide infrastructure as a service to core engineering, analytics, and ML specialized teams through DevOps, DataOps, and MLOps practices.
Prior to Variant, he was Director of R&D at Manhattan Associates; a leader in supply chain software and worked there for over 14 years. He holds certifications across AWS and GCP; which includes the Professional Cloud DevOps Engineer Certification.
I want to thank my family, friends, and mentors. A big shout out to the Packt team (Neil, Nihar, Rahul, Riyan) led by Vijin Boricha for their professionalism, and to the technical reviewers for their constructive feedback. A big "thank you" to my best friend – Mridula, who kickstarted my cloud journey. Finally, I would like to express my gratitude to my parents (both of whom are educators) and to my grandparents for their support during my learning phase.
Richard Rose has worked in IT for over 20 years and holds many industry certifications. He is the author of Hands-On Serverless Computing with Google Cloud. As someone who works with the cloud every day, Rich is a strong proponent of cloud certifications. For him, gaining expertise in an area is not the end of the journey. It is the start of realizing your potential.
In his spare time, he enjoys spending time with his family and playing the guitar. Recently, he has updated his blog to include his development-side projects and general computing tips.
I would like to acknowledge the time, support, and energy of Dawn, Bailey, Elliot, Noah, and Amelia. Without your assistance, it would be impossible to achieve the possible.
Vaibhav Chopra has extensive experience in the fields of DevOps, SRE, Cloud Infra, and NFV. He currently works as a senior engineering manager for Orange, and prior to that, he worked with Ericsson, Amdocs, and Netcracker over a period of 11 years.
He is an open source enthusiast and is passionate about the latest technologies, as well as being a multi-cloud expert and a seasoned blogger.
LFN has awarded him for his contribution in open source projects and he has participated as a trainer and speaker at multiple conferences. In his current role, he drives the team in relation to various CNCF projects, largely revolving around container security, resiliency, and benchmarking, to build a production-grade hybrid cloud (GCP Anthos and Openstack) solution.
I would like to thank 3 ladies, my mother Anu, my wife, Swati, my daughter, Saanvika, who have supported me to contribute extra time towards this and to my friends, who continuously motivate and inspire me to learn new things.
Bruno S. Brasil is a cloud engineer who has always used Linux and then started working in on-premises environments, participating in the modernization and migration to cloud solutions. After working with several cloud providers, he chose Google Cloud as his focus of expertise. Since then, he has worked on projects of this type as a consultant and engineer, for several types of businesses, ranging from digital banks and marketplaces to start-ups. He has always focused on implementing best practices in the development of infrastructure as code, disseminating the DevOps culture and implementing SRE strategies. He is enthusiastic about the open source community and believes that this is the most important path in terms of the growth of new professionals.
I would like to thank my family and friends who have always supported me since I chose to follow the path of technology. It was difficult to begin with, but the fact that I am here today is because of the support I have received from them. I would also like to thank the open source community, everyone who shares their experiences, solutions, and time, for organizing events of all kinds. This is essential to making knowledge accessible and democratic.
This book is a comprehensive guide to Site Reliability Engineering (SRE) fundamentals, Google's approach to DevOps. In addition, the book dives into critical services from Google Cloud to implement Contiuous Integration/Continous Deployment (CI/CD) with a focus on containerized deployments via Kubernetes. The book also serves as preparation material for the Professional Cloud DevOps Engineer certification from Google Cloud with chapter-based tests and mock tests included.
This book is ideal for cloud system administrators and network engineers interested in resolving cloud-based operational issues. IT professionals looking to enhance their careers in administering Google Cloud services will benefit. Users who want to learn about applying SRE principles and are focused on implementing DevOps in Google Cloud Platform (GCP) will also benefit. Basic knowledge of cloud computing and GCP services, an understanding of CI/CD, and hands-on experience with Unix/Linux infrastructure are recommended. Those interested in passing the Professional Cloud DevOps Engineer certification will find this book useful.
The Professional Cloud DevOps Engineer certification is advanced in nature. To better prepare for this certification, it is recommended to either prepare for or be certified in the Associate Cloud Engineer certification or Professional Cloud Architect certification. Though these certifications are not prerequisites for the Professional Cloud DevOps Engineer certification, they help to better prepare and be acquainted with services on GCP.
Chapter 1, DevOps, SRE, and Google Cloud Services for CI/CD, covers DevOps, which is a set of practices that builds, tests, and releases code in a repeatable and iterative manner. These practices are aimed to break down the metaphoric wall between development and operation teams. SRE is a prescriptive way from Google to implement DevOps that aligns incentives between the development and operations teams to build and maintain reliable systems. In addition, Google recommends a cloud-native development paradigm where complex systems are decomposed into multiple services using microservices architecture.
This chapter will cover topics that include the DevOps life cycle, the evolution of SRE, an introduction to key technical and cultural SRE practices, and the benefits of using a cloud-native development paradigm. The chapter will also introduce services on GCP to implement cloud-native development and apply SRE concepts.
Chapter 2, SRE Technical Practices – Deep Dive, covers reliability, which is the most critical feature of a service and should be aligned with business objectives. SRE prescribes specific technical practices to measure characteristics that define and track reliability. These technical practices include the Service-Level Agreement (SLA), Service-Level Objective (SLO), Service-Level Indicator (SLI), error budgets, and eliminating toil through automation.
This chapter goes in depth into SRE technical practices. This chapter will cover topics that include the blueprint for a well-defined SLA, defining reliability expectations via SLOs, understanding reliability targets and their implications, categorizing user journeys, sources to measure SLIs, exploring ways to make a service reliable by tracking error budgets, and eliminating toil through automation. The chapter concludes by walking through two scenarios where the impact of SLAs, SLOs, and error budgets are illustrated relative to the SLI being measured.
Chapter 3, Understanding Monitoring and Alerting to Target Reliability, discusses how the key to implementing SRE technical practices is to ensure that SLAs, SLOs, and SLIs are never violated. This makes it feasible to balance new feature releases and still maintain system reliability since the error budget is not exhausted. Monitoring, alerting, and time series are fundamental concepts to track SRE technical practices.
This chapter goes in depth into monitoring, alerting, and time series. This chapter will cover topics that include monitoring sources, monitoring types, monitoring strategies, the golden signals that are recommended to be measured and monitored, potential approaches and key attributes to define an alerting strategy, and the structure and cardinality of time series.
Chapter 4, Building SRE Teams and Applying Cultural Practices, covers how for a long time, Google considered SRE as their secret sauce to achieving system reliability while maintaining a balance with new feature release velocity. Google achieved this by applying a prescribed set of cultural practices such as incident management, being on call, and psychological safety. SRE cultural practices are required to implement SRE technical practices and are strongly recommended by Google for organizations that would like to start their SRE journey. In addition, Google has put forward aspects that are critical to building SRE teams along with an engagement model.
This chapter goes in depth into building SRE teams, which includes topics around different implementations of SRE teams, details around staffing SRE engineers, and insights into the SRE engagement model. This chapter will also cover topics on SRE cultural practices that include facets of effective incident management, factors to consider while being on call, and factors to overcome to foster psychological safety. The chapter concludes with a cultural practice that is aimed to reduce organizational silos by sharing vision and knowledge and by fostering collaboration.
Chapter 5, Managing Source Code Using Cloud Source Repositories, looks at how source code management is the first step in a CI flow. Code is stored in a source code repository such as GitHub or Bitbucket so that developers can continuously make code changes and the modified code is integrated into the repository. Cloud Source Repositories (CSR) is a service from Google Cloud that provides source code management through private Git repositories.
This chapter goes in depth into CSR. This chapter will cover topics that include key features of CSR, steps to create and access the repository, how to perform one-way sync from GitHub/Bitbucket to CSR, and common operations in CSR such as browsing repositories, performing universal code search, and detecting security keys. The chapter concludes with a hands-on lab that illustrates how code can be deployed in Cloud Functions by pulling code hosted from CSR.
Chapter 6, Building Code Using Cloud Build, and Pushing to Container Registry, looks at how once code is checked into a source code management system such as CSR, the next logical step in a CI flow is to build code, create artifacts, and push to a registry that can store the generated artifacts. Cloud Build is a service from Google Cloud that can build the source code whereas Container Registry is the destination where the created build artifacts are stored.
This chapter goes in depth into Cloud Build and Container Registry. This chapter will cover topics that include understanding the need for automation, processes to build and create container images, key essentials of Cloud Build, strategies to optimize the build speed, key essentials of Container Registry, the structure of Container Registry, and Container Analysis. The chapter concludes with a hands-on lab to build, create, push, and deploy a container to Cloud Run using Cloud Build triggers. This hands-on lab also illustrates a way to build a CI/CD pipeline as it includes both CI and automated deployment of containers to Cloud Run, a GCP compute option that runs containers.
Chapter 7, Understanding Kubernetes Essentials to Deploy Containerized Applications, covers Kubernetes, or K8s, which is an open source container orchestration system that can run containerized applications but requires significant effort in terms of setting up and ongoing maintenance. Kubernetes originated as an internal cluster management tool from Google; Google donated it to the Cloud Native Computing Foundation (CNCF) as an open source project in 2014.
This chapter goes in depth into K8s. This chapter will cover topics that include key features of K8s, elaboration of cluster anatomy, which includes components of the master control plane, node components, key Kubernetes objects such as Pods, Deployments, StatefulSets, DaemonSet, Job, CronJob, and Services, and critical factors that need to be considered while scheduling Pods. The chapter concludes with a deep dive into possible deployment strategies in Kubernetes, which includes Recreate, Rolling Update, Blue/Green, and Canary.
Chapter 8, Understanding GKE Essentials to Deploy Containerized Applications, covers Google Kubernetes Engine (GKE), which is a managed version of K8s; that is, an open source container orchestration system to automate application deployment, scaling, and cluster management. GKE requies less effort in terms of cluster creation and ongoing maintenance.
This chapter goes in depth into GKE. This chapter will cover topics that include GKE core features such as GKE node pools, GKE cluster configurations, GKE autoscaling, networking in GKE, which includes Pod and service networking, GKE storage options, and Cloud Operations for GKE. There are two hands-on labs in this chapter. The first hands-on lab is placed at the start of the chapter and illustrates cluster creation using Standard mode, deploying workloads, and exposing the Pod as a service. The chapter concludes with another hands-on lab like the first one but with the cluster creation mode as Autopilot.
Chapter 9, Securing the Cluster Using GKE Security Constructs, covers how securing a Kubernetes cluster is a critical part of deployment. Native Kubernetes provide some essential security features that focus on how a request being sent to the cluster is authenticated and authorized. It is also important to understand how the master plane components are secured along with the Pods running the applications. Additionally, GKE provides security features that are fundamental to harden a cluster's security.
This chapter goes in depth into GKE security features. This chapter will cover topics that include essential security patterns in Kubernetes, control plane security, and Pod security. This chapter concludes by discussing various GKE-specific security features such as GKE private clusters, container-optimized OS, shielded GKE nodes, restricting traffic among Pods using a network policy, deploying time security services via binary authorizations, and using a workload identity to access GCP services from applications running inside the GKE cluster.
Chapter 10, Exploring GCP Cloud Operations, looks at how once an application is deployed, the next critical phase of the DevOps life cycle is continuous monitoring as it provides a feedback loop to ensure the reliability of a service or a system. As previously discussed in Chapter 2, SRE Technical Practices – Deep Dive, SRE prescribes specific technical tools or practices that help in measuring characteristics that define and track reliability, such as SLAs, SLOs, SLIs, and error budgets. SRE prescribes the use of observability to track technical practices. Observability on GCP is established through Cloud Operations.
This chapter goes in depth into Cloud Operations. This chapter will cover topics that include Cloud Monitoring essentials such as workspaces, dashboards, Metrics Explorer, uptime checks, configuring alerting, and the need for a monitoring agent and access controls specific to Cloud Monitoring. The chapter will also cover topics that include Cloud Logging essentials such as audit logs with their classification, summarizing log characteristics across log buckets, logs-based metrics, access controls specific to Cloud Logging, network-based log types, and the use of logging agents. This chapter concludes by discussing various essentials tied to Cloud Debugger, Cloud Trace, and Cloud Profiler.
It is recommended that you have prior knowledge on topics that include Docker, an introduction to native Kubernetes, a working knowledge of Git, getting hands on with key services on Google Cloud such as Cloud Operations, compute services, and hands-on usage of Google Cloud SDK or Cloud Shell. Additionally, hands-on knowledge of a programming language of choice such as Python, Java, or Node.js will be very useful. The code samples in this book are, however, written in Python.
If you are using the digital version of this book, we advise you to type the code yourself or access the code via the GitHub repository (link available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
Irrespective of whether you are working toward the Professional Cloud DevOps Engineer certification, it is recommended to attempt the mock exam after completing all the chapters. This will be a good way to assess the learnings absorbed from the content of the book.
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Google-Cloud-Platform-for-DevOps-Engineers. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: http://www.packtpub.com/sites/default/files/downloads/9781839218019_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "You can use the my-first-csr repository."
A block of code is set as follows:
steps:
- name: 'gcr.io/cloud-builders/docker'
args: ['build', '-t', 'gcr.io/$PROJECT_ID/builder-myimage', '.']
- name: 'gcr.io/cloud-builders/docker'
args: ['push', 'gcr.io/$PROJECT_ID/builder-myimage']
- name: 'gcr.io/cloud-builders/gcloud'
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: my-vpa
Any command-line input or output is written as follows:
gcloud builds submit --config <build-config-file> <source-code-path>
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Navigate to Source Repositories within GCP and select the Get Started option."
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, selecting your book, clicking on the Errata Submission Form link, and entering the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
The core focus of this section is Site Reliability Engineering (SRE). The section starts with the evolution of DevOps and its life cycle and introduces SRE as a prescriptive way to implement DevOps. The emphasis is on defining the reliability of a service and ways to measure it. In this section, SRE technical practices such as Service-Level Agreement (SLA), Service-Level Objective (SLO), Service-Level Indicator (SLI), error budget, and eliminating toil are introduced and are further elaborated. Fundamentals and critical concepts around monitoring, alerting, and time series are also explored to track SRE technical practices. However, SRE technical practices cannot be implemented within an organization without bringing cultural change. Google prescribes a set of SRE cultural practices such as incident management, being on call, psychological safety, and the need to foster collaboration. This section also explores the aspects critical to building SRE teams and provides insights into the SRE engagement model. It's important to note that the start of the section introduces services in Google Cloud to implement DevOps through the principles of SRE; however, the details of those services are explored in the next section.
This part of the book comprises the following chapters:
Chapter 1, DevOps, SRE, and Google Cloud Services for CI/CDChapter 2, SRE Technical Practices – Deep DiveChapter 3, Understanding Monitoring and Alerting to Target ReliabilityChapter 4, Building SRE Teams and Applying Cultural Practices