The AWS Certified DevOps Engineer certification is one of the highest AWS credentials, widely recognized in the cloud computing and software development industries. This book is an extensive guide to help you strengthen your DevOps skills as you work with your AWS workloads on a day-to-day basis.
You'll begin by learning how to create and deploy a workload using the AWS code suite of tools, and then move on to adding monitoring and fault tolerance to your workload. You'll explore enterprise scenarios that'll help you to understand various AWS tools and services. This book is packed with detailed explanations of essential concepts to help you get to grips with the domains needed to pass the DevOps professional exam. As you advance, you'll delve into AWS with the help of hands-on examples and practice questions to gain a holistic understanding of the services covered in the AWS DevOps professional exam. Throughout the book, you'll find real-world scenarios that you can easily incorporate in your daily activities when working with AWS, making you a valuable asset for any organization.
By the end of this AWS certification book, you'll have gained the knowledge needed to pass the AWS Certified DevOps Engineer exam, and be able to implement different techniques for delivering each service in real-world scenarios.
Pass the DOP-C01 exam and prepare for the real world using case studies and real-life examples
Adam Book
BIRMINGHAM—MUMBAI
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Wilson Dsouza
Publishing Product Manager: Rahul Nair
Senior Editor: Arun Nadar
Content Development Editor: Rafiaa Khan
Technical Editor: Arjun Varma
Copy Editor: Safis Editing
Project Coordinator: Shagun Saini
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Joshua Misquitta
First published: February 2022
Production reference: 1031121
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80107-445-2
www.packt.com
Writing a book is harder than I thought but more rewarding than I ever conceived. I'm eternally grateful for my wife Mariel, who not only told me this was an opportunity that I couldn't turn down, but for also keeping our two daughters, Sara and Emma, at bay so I had time to write and edit. I'd also like to thank the whole Packt team for presenting me with this opportunity. Finally, thank you to Presidio for all the support they have shown me when I said I was going to be taking this challenge on.
To all the people who take on the challenge of getting AWS certified and passing. – Adam Book
Adam Book has been programming since the age of six and has repeatedly been tapped by founders and CEOs as one of the key people to help start their online or cloud businesses.
Adam has been developing applications and websites and has been involved in cloud computing and data center transformation professionally since 1996, and he has made it a primary professional focus to bring the benefits of cloud computing to his clients. In addition, he has led technology teams through transformative changes, such as the shift to programming in sprints with Agile formats, and has mentored organizations on the benefits of pair programming in local and remote formats, along with test-driven and business-driven development.
Adam is a passionate cloud evangelist with a proven track record of using the AWS platform from the very beginning to successfully migrate thousands of application workloads to the cloud and guiding businesses in understanding cloud economics to create business cases and identify operating model gaps.
Adam has passed the AWS Professional exams, both Architect and DevOps Engineer, multiple times along with the underlying associate exams. He has also achieved the Security Specialty Certification from AWS. In addition to this, Adam has helped multiple organizations achieve Premier Partner status, along with being instrumental in the attainment of numerous AWS Competencies.
Sebastian Krueger started his AWS certification journey in 2014 and since then he has passed over 17 AWS Certification Exams (including re-certifications). He is the Founder of the Wellington (NZ) AWS User Group and has been an Organiser of the community group ever since. In 2019 he was a Founder of the AWS User Group Aotearoa and served as its first President.
After he founded the successful AWS consultancy 'API Talent' in New Zealand in 2014, the company was acquired by Deloitte in 2018. Sebastian then served as a Partner in the Cloud Engineering practice.
Since 2021, he has been working at AWS in Wellington, New Zealand as a Principal Solution Architect.
Sebastian has also been a co-author of Java CAPS Basics (2008) and Tech Reviewer of Serverless Single Page Apps (2016) and Learn AWS Serverless Computing (2019).
Stoyan Zdravkov Yanev is a certified solutions architect and passionate cloud evangelist. His experience and specialization lie in building modern event-driven architectures and creating scalable Kubernetes infrastructures. Stoyan has broad knowledge of securing and scaling enterprise projects, which he built up during his time at companies such as NewsUK, VMware, MentorMate, and Schwarz IT. He is also the author of the book AWS Fundamentals, which will be published in 2022. Stoyan now works as a senior DevOps/cloud consultant focusing on the middle market and helping smaller organizations achieve the same benefits as large enterprises.
I would like to thank my wife Gergana for her continued support and encouragement with everything that I do. You have always pushed me towards new adventures, accomplishing my goals, and doing what is right. I appreciate what you have done for us and I love you. I'd also like to thank my son Boris for showing me what is actually important in life.
Stoyan Zdravkov Yanev
More and more companies are making the move to the cloud, and more specifically, the Amazon Web Services (AWS) cloud, every day. Once in the cloud, these companies and enterprises are looking to streamline their processes and Software Development Life Cycles (SDLCs) through the use of techniques found in DevOps practices. This includes automating build and release processes so that development teams can focus on writing the code and features that customers desire. This also includes making sure that logging and monitoring are in place, not just for the sake of checking a checkbox on a deployment list, but instead to empower teams to quickly find the root causes of their issues, be they performance, error, or security related.
The need for skilled, certified AWS individuals is at an all-time high and growing. Passing the AWS DevOps Engineer Professional exam instantly sets you apart from others, showing that you have taken the time and effort not only to learn these valuable skills but also to meet the rigorous standard that is the AWS Professional certification.
Certifications offered by AWS, especially the Professional certification exams, are not easy. Those who work and hire in this industry know this. Not only are the exams themselves time-consuming, at around 3 hours to take, but they are also constantly being updated.
There is a lot of information to digest and understand for this exam. You not only need to know the topics that are covered in the blueprint provided by AWS, but you also need to have a solid foundation of AWS knowledge in order to use those services. It helps if you have real-world experience or at least hands-on practice with using these services. This is where the exercises included in this book come in. They serve not as an ending point but hopefully as a starting point to build on and create other projects from, so that when you press start on your test, you have the confidence, skills, and knowledge you need to pass the certification exam and take your career to the next level.
This book is designed to help you gain a firm grasp of the services that are presented in the AWS DevOps Professional exam. This is accomplished in a variety of ways. There are sample architectures presented for many of the services so that you can visualize how different services work together. There are plenty of hands-on examples to try so that you can see how things work in the real world. There are also examples of best-fit use cases and anti-patterns for the different services. These are especially important to understand when evaluating the Professional exam questions, which are presented in a large scenario format. Understanding where one service is the best fit and where it is not a fit can help you decide on the correct answer.
Finally, this book is meant to be not only a study guide in passing the test but also a reference guide in performing your day-to-day activities once you have passed. This is where the beyond comes in. There is extra information presented that is not required for the exam, and this is included on purpose. I wanted to share some of the years of experience I have gained from working with all types of businesses on AWS, from small and medium-sized companies to large Fortune 100 enterprises.
No matter where you are in your AWS journey, if you have the desire to become certified as a DevOps professional engineer, then this book will help you gain an understanding of the fundamental concepts and services, as well as examine the essential services that are covered by the exam blueprint.
With the opening chapter, we lay the foundation of what good looks like in the world of AWS, and although this seems like a lot of theory, it helps you work through many of the formidable scenarios that are presented in the exam questions. There are also plenty of hands-on exercises for using services that you may not be familiar with so that you have the confidence and experience both during the exam and afterward as well.
At the end of each chapter, there are knowledge assessments to help self-check to see whether you have a grasp of the information that is needed to not only pass this challenging exam but also succeed in the other tasks that lie ahead for you in the ever-changing world of cloud computing.
Chapter 1, Amazon Web Service Pillars, focuses on the foundational pillars that make up the Well-Architected Framework in AWS. By understanding these pillars, you will gain a better feel for the context of the questions being asked in the certification exam.
Chapter 2, Fundamental AWS Services, examines a large number of fundamental AWS services that are imperative to know going forward with future chapters. This may seem like a review for some that have already passed some of the lower associate exams. However, it can also serve as a quick refresher and provide a few tips that were previously unknown.
Chapter 3, Identity and Access Management and Working with Secrets in AWS, focuses on the fundamental security building blocks of AWS, which are identity and access management using the IAM service. After a quick look at the shared security model from AWS and the concepts of authorization versus authentication, we review creating users and groups. Providing access to other accounts via cross-account access is also covered in this chapter with a practical exercise. In this fundamental security chapter, we also talk about other essential security services that may appear in test questions, such as AWS Directory Service, Secrets Manager, and Systems Manager Parameter Store. There are comparisons on when to use and not to use the different versions of AWS Directory Service, along with which service would be better to store your secrets. Finally, we take a look at Amazon Cognito and how it can help with application authentication.
Chapter 4, Amazon S3 Blob Storage, focuses on one of the key services in AWS Simple Storage Service, or S3. Even though this service is easy to start using right away, it also has a number of features and functions available to it that you must be aware of if you are trying to become certified on AWS.
Chapter 5, Amazon DynamoDB, explains the native NoSQL database DynamoDB. It looks at not only some of the essential concepts of DynamoDB but also topics such as streams, understanding global tables, using DynamoDB Accelerator, and even using Web Federation to connect to your DynamoDB tables.
Chapter 6, Understanding CI/CD and the SDLC, focuses on many theoretical aspects of continuous integration, continuous delivery, and continuous deployment. We then look at the SDLC along with which services map to different stages of the SDLC.
Chapter 7, Using CloudFormation Templates to Deploy Workloads, teaches you about using Infrastructure as Code using the native CloudFormation service. First, we'll go over the basics of CloudFormation templates, but then we'll quickly ramp up to examples of creating a changeset for a basic template, and then move on to intrinsic functions and nested stacks. Using the knowledge of CloudFormation templates, we discuss how ServiceCatalog can be used to serve up templated designs for developers and non-developers in a quick and easy fashion. This chapter closes by going over the Cloud Development Kit, which can be programmed in your language of choice and then used to create CloudFormation templates.
Chapter 8, Creating Workloads with CodeCommit and CodeBuild, guides you through the initial steps of the SDLC using native AWS tooling. We start by creating a brand-new group and user, who is a developer, with a whole new set of permissions that are scoped to just this user's role. After creating an initial CodeCommit repository, we have our developer use Git to commit code onto a feature branch and then request a merge to the main branch. Next, we examine the CodeBuild service by having the service build a container using AWS CodeBuild.
Chapter 9, Deploying Workloads with CodeDeploy and CodePipeline, shows you how to create DevOps pipelines using the native AWS CodePipeline service. This is a chapter where many of the services that we have been talking about and practicing previously come into play. The pipeline example being used is crafted in a CloudFormation template. The developer user that we previously created also needs expanded access in order to view and run our pipeline, so there is an exercise to elaborate their IAM permissions. Also discussed in this chapter is how to deploy workloads using the AWS CodeDeploy service.
Chapter 10, Using AWS OpsWorks to Manage and Deploy Your Application Stack, focuses on how to create stacks and layers to deploy infrastructure and applications using the AWS OpsWorks service. There is a comparison of the different versions of OpsWorks available, along with an exercise to create a stack with layers and an application.
Chapter 11, Using Elastic Beanstalk to Deploy Your Application, walks through one of the key services on the DevOps Professional exam – Elastic Beanstalk. Creating and deploying an application in Elastic Beanstalk using the EB CLI not only lets you see things through the lens of the developer, but also allows you to think about how you would automate these types of tasks in the real world.
Chapter 12, Lambda Deployments and Versioning, explores the concepts of serverless and using the AWS Lambda platform for serverless computing. With the cost savings available from running compute needs on an on-demand, pay-per-usage basis, this is becoming a more and more desired state in organizations today. We talk about not only how to deploy and monitor Lambda functions but also how to implement versions and aliases. At the end of the chapter, we even go through orchestrating multiple functions in a step function.
Chapter 13, Blue/Green Deployments, focuses on blue/green deployment strategies and the different variations of those strategies, including which service can use the various strategies and how to implement the different strategies depending on the services that you are utilizing. There are specific strategies that you can employ when using EC2 instances and autoscaling groups, and there are others that are available when using a Lambda function. Ensuring that your end users and customers have a seamless experience, even if you have an issue during deployment, is what this chapter is truly about.
Chapter 14, CloudWatch and X-Ray's Role in DevOps, shows you the role that monitoring and logging play using the native CloudWatch and X-Ray services from AWS. Log streams and searching through logs can be tedious tasks and sometimes feel like looking for a needle in a haystack. The same can be said of performance problems. Adding the X-Ray service to your Lambda application can help you quickly pinpoint where the issues are and know where to remedy the issues.
Chapter 15, CloudWatch Metrics and Amazon EventBridge, shows you how to use the metrics from various services and then tie them into the Amazon EventBridge service to create automated alerts for your systems. We discuss which metrics are some of the most useful for keeping a watch over different vital services. We also walk through creating dashboards in the Amazon CloudWatch console.
Chapter 16, Various Logs Generated (VPC Flow Logs, Load Balancer Logs, and CloudTrail Logs), examines the other types of logs that can be generated by AWS services beyond CloudWatch Logs. These logs all provide valuable troubleshooting information and may need to be turned on some or all of the time. Knowing where to retrieve these logs and how to search them is a task that you may be called upon to do as a DevOps professional.
Chapter 17, Advanced and Enterprise Logging Scenarios, shows you real-world scenarios and architectures for building and processing log files. This includes incorporating not only the CloudWatch and CloudTrail services but also services such as Elasticsearch, Kinesis, and Lambda for the real-time processing of multiple log streams. Understanding the concepts of how to gather and process massive amounts of log files is important both for real-world engagements and for potential scenarios that could appear on the DevOps Professional certification exam.
Chapter 18, Auto Scaling and Lifecycle Hooks, covers how autoscaling and autoscaling groups work in detail. This includes examining the autoscaling life cycle and life cycle hooks. There is an exercise that walks you through creating a launch template, which is the successor of the launch configuration. We also go through a practice of removing and terminating instances inside of an autoscaling group.
Chapter 19, Protecting Data in Flight and at Rest, illustrates how the use of services such as Key Management Service and Amazon Certificate Manager helps protect data that is both sitting at rest as well as in transit. If you are building systems using Infrastructure as Code, you need to incorporate these key pieces into your system so that your data is safe from the very start.
Chapter 20, Enforcing Standards and Compliance with Systems Manager's Role and AWS Config, focuses on how to use automation to keep your AWS environment in a compliant state. Using the AWS Config service, you can keep a constant check on what is being created in your AWS environment. Combine this with rules that flag violations of what is not allowed in your environment, and you can either send alerts or perform automated enforcement and remediation. Add to this the capabilities of Systems Manager, which can automatically install software on instances using runbooks for needed compliance items such as virus scanners, or perform regular operating system upgrades; then, creating an audit trail of performed tasks becomes much easier for your organization.
Chapter 21, Using Amazon Inspector to Check your Environment, shows you how to add automated security scans to your DevOps life cycle using the Amazon Inspector service. We look at how to configure the Inspector service in both an automated and manual manner and then view and understand the different reports that Inspector generates.
Chapter 22, Other Policy and Standards Services to Know, covers some of the services that have a tendency to appear on the DevOps Professional exam but did not make it into other chapters. These include services such as Amazon GuardDuty, Amazon Macie, and Server Migration Service. We also go over AWS Organizations once again, along with its integration with the Service Catalog service, to make sure that you have a full understanding of how those services work hand in hand.
Chapter 23, Overview of the DevOps Professional Certification Test, explains the testing process itself. It also has a list of extra resources that you should use in conjunction with this book to read and study for the exam, as well as some tips for studying.
Chapter 24, Practice Exam 1, is primarily meant to be a readiness check for you. This chapter presents questions in the format you will see on the exam and then gives you the answers, along with an explanation of why you would choose them.
Prior knowledge in the following subjects will help you get the most understanding of this book:
Knowledge of software architecture, programming languages, and application design
Knowledge of relational and non-relational databases
Knowledge of AWS regions and geographical locations
Understanding of JSON and YAML file formats
Basic knowledge of operating systems and system commands
Knowledge of logging and monitoring for applications and system availability
Setting up the AWS CLI will be necessary to complete many of the exercises that have been presented throughout the book. A step-by-step walkthrough of installing the CLI is given in Chapter 2, Fundamental AWS Services, if you do not already have the CLI installed on your computer.
If you are using the digital version of this book, we advise you to type the code yourself or access the code from the book's GitHub repository (a link is available in the next section). Doing so will help you avoid any potential errors related to the copying and pasting of code.
You can download the example code files for this book from GitHub at https://github.com/PacktPublishing/AWS-Certified-DevOps-Engineer-Professional-Certification-and-Beyond. If there's an update to the code, it will be updated in the GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots and diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801074452_ColorImages.pdf.
There are a number of text conventions used throughout this book.
Code in text: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "You will see both the username and password returned unencrypted in the SecretString field for you to use."
A block of code is set as follows:
{
"Project_ID": {"N": "0100"},
"Dept": {"S": "Test Team"},
"Dept_ID": {"N": "0001"},
"Project_Name": {"S": "Serverless Forms"},
"Owner": {"S": "Jerry Imoto"},
"Builds": {"NS": ["2212121"] },
"Language": {"S": "python" },
"Contact": {"S": "[email protected]" }
}
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
[default]
exten => s,1,Dial(Zap/1|30)
exten => s,2,Voicemail(u100)
exten => s,102,Voicemail(b100)
exten => i,1,Voicemail(s0)
Any command-line input or output is written as follows:
$ aws iam list-groups --output text
$ aws iam create-group --group-name Admins
Bold: Indicates a new term, an important word, or words that you see onscreen. For instance, words in menus or dialog boxes appear in bold. Here is an example: "You have the option to encrypt the reports either with SSE-S3 or with a Key Management Service (KMS) key of your choosing."
Tips or important notes
Appear like this.
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, email us at [email protected] and mention the book title in the subject of your message.
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata and fill in the form.
Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Once you've read AWS Certified DevOps Engineer - Professional Certification and Beyond, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
In this part, we will look at the fundamentals of the AWS cloud, including the basis of the Well-Architected Framework, security, and storage.
This part of the book comprises the following chapters:
Chapter 1, Amazon Web Service Pillars
Chapter 2, Fundamental AWS Services
Chapter 3, Identity and Access Management and Working with Secrets in AWS
Chapter 4, Amazon S3 Blob Storage
Chapter 5, Amazon DynamoDB
DevOps is, at its heart, a combination of the skills of development and operations and breaking down the walls between these two different teams. DevOps includes enabling developers to perform operational tasks easily. DevOps also involves empowering operational team members to create their Infrastructure as Code and use other coding techniques, such as continuous integration pipelines, to spin up the same infrastructure in multiple regions quickly.
In this book, we will go through the services and concepts that are part of the DevOps professional exam so that you have a solid understanding from a practical standpoint, in terms of both explanations and hands-on exercises.
Becoming Amazon Web Services (AWS) Certified not only gives you instant validation of the technical skills that you hold and maintain – it also strengthens you as a technical professional. The AWS DevOps Engineer Professional Certification is a cumulative test that incorporates the base knowledge of fundamental AWS services, including system operations capabilities for running, managing, and monitoring workloads in AWS. This is in addition to developing and deploying code to functions, containers, and instances.
We will look at the test itself in more depth in Chapter 23, Overview of the DevOps Professional Certification Test, as well as provide tips for taking the exam.
The AWS pillars are the five principles that guide architects and developers in generally accepted cloud architecture and design. They are subtly referenced in the DevOps Professional exam, but the pillars and their guidelines are tenets of best practice for working with any cloud service provider, especially Amazon Web Services. These are all guiding principles in DevOps practices and pipelines, and having a sound understanding of these five items will not only help you come exam time but will also serve you throughout your DevOps career journey.
In this chapter, we're going to cover the following main topics:
Operational excellence
Security
Reliability
Performance efficiency
Cost optimization
At first glance, you may be wondering why we aren't just jumping right into AWS, continuous integration/continuous delivery (CI/CD), and other DevOps topics. The main reason is that these five pillars are the foundational fabric of the exams. In addition, they will help you provide the most effective, dependable, and efficient environment for your company or clients. These design principles are not only important when architecting for success on Amazon Web Services, or any cloud provider for that matter, but also in guiding the practices that you use throughout your day-to-day endeavors.
Once you become familiar with these pillars, you will see them and their themes in the testing questions as you go down your path for certification. This is especially true when working to obtain the DevOps Professional Certification as there are specific sections for Operations, Security, and Reliability.
The following are the five pillars of a well-architected framework:
Operational excellence
Security
Reliability
Performance efficiency
Cost optimization
Use these pillars as the guiding principles, not only for designing your workloads in AWS but also for improving and refactoring current workloads. Every organization should strive to achieve well-architected applications and systems. Therefore, improving any AWS applications you are working on will make you a valuable asset. Now, let's look at each of these pillars in detail.
As we look at the operational excellence pillar, especially in the context of DevOps, this is one of the most important service pillars, if not the most important, for your day-to-day responsibilities. We will start by thinking about how our teams are organized; after all, the DevOps movement came about from breaking down the silos between Development and Operations teams.
Question – How does your team determine what its priorities are?
* Does it talk to customers (whether they're internal or external)?
* Does it get its direction from product owners who have drawn out a roadmap?
Amazon outlines five design principles that incorporate operational excellence in the cloud:
Performing Operations as Code
Refining operations frequently
Making small, frequent, and reversible changes
Anticipating failure
Learning from all operational failures
Let's take a look at each of these operational design principles in detail to see how they relate to your world as a DevOps engineer. As you go through the design principles of not only this pillar but all the service pillars, you will find that the best practices are spelled out, along with different services, to help you complete the objective.
With the advent of Infrastructure as Code, the cloud allows teams to create their applications using code alone, without the need to interact with a graphical interface. Moreover, it allows them to define, in code, the underlying networking, services, datastores, and more that are required to run their applications and workloads. Moving most, if not all, of the operations to code does quite a few things for a team:
Distributes knowledge quickly and prevents only one person on the team from being able to perform an operation
Allows for a peer review of the environment to be conducted, along with quick iterations
Allows changes and improvements to be tested quickly, without the production environment being disrupted
In AWS, you can perform Operations as Code using a few different services, such as CloudFormation, the Cloud Development Kit (CDK), language-specific software development kits (SDKs), or the command-line interface (CLI).
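As a minimal sketch of what Operations as Code can look like (the template filename and stack name here are placeholders), a workload defined in a CloudFormation template can be validated and deployed entirely from the CLI:
# Check the template for syntax errors before deploying
$ aws cloudformation validate-template --template-body file://workload.yaml
# Create or update the stack from the same template that went through code review
$ aws cloudformation deploy --template-file workload.yaml --stack-name my-workload-dev
Because the template lives in version control, the same two commands can be run by any team member, or by a pipeline, against any account or region.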
As you run your workload in the cloud, you should be in a continual improvement process for not only your application and infrastructure but also your methods of operation. Teams that run in an agile process are familiar with having a retrospective meeting after each sprint to ask three questions: what went well, what didn't go well, and what has room for improvement?
Operating a workload in the cloud presents the same opportunities for retrospection and to ask those same three questions. It doesn't have to be after a sprint, but it should occur after events such as the following:
Automated, manual, or hybrid deployments
Automated, manual, or hybrid testing
After a production issue
Running a game day simulation
After each of these situations, you should be able to look at your current operational setup and see what could be better. If you have step-by-step runbooks that have been created for incidents or deployments, ask yourself and your team whether there were any missing steps or steps that are no longer needed. If you had a production issue, did you have the correct monitoring in place to troubleshoot that issue?
As we build and move workloads into the cloud, instead of placing multiple systems on a single server, the best design practices are to break any large monolith designs into smaller, decoupled pieces. With the pieces being smaller, decoupled, and more manageable, you can work with smaller changes that are more reversible, should a problem arise.
The ability to reverse changes can also come in the form of good coding practices. AWS CodeCommit allows Git tags in code repositories. By tagging each release once it has been deployed, you can quickly redeploy a previous version of your working code, should a problem arise in the code base. Lambda has a similar feature called versions.
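As a rough illustration (the tag, repository, and function names here are hypothetical), marking a release so it can be reversed later might look like this:
# Tag the commit that was just deployed so this exact code can be redeployed
$ git tag -a v1.4.0 -m "Release 1.4.0"
$ git push origin v1.4.0
# Snapshot the current Lambda code and configuration as an immutable version
$ aws lambda publish-version --function-name order-processor --description "release v1.4.0"
If a deployment goes wrong, pointing the pipeline back at the previous tag, or an alias back at the previous Lambda version, reverses the change in minutes rather than hours.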
Don't expect that just because you are moving to the cloud, and the service that your application relies on is labeled as a managed service, you no longer need to worry about failures. Failures happen, maybe not often; however, when running a business, any sort of downtime can translate into lost revenue. Having a plan to mitigate risks (and also testing that plan) can genuinely mean the difference between keeping your service-level agreement (SLA) and having to apologize, or, even worse, having to give customers credits or refunds.
Things fail from time to time, but when they do, it's important not to dwell on the failures. Instead, perform post-mortem analysis and find the lessons that can make the team and the workloads stronger and more resilient for the future. Sharing learning across teams helps bring everyone's perspective into focus. One of the main questions that should be asked and answered after failure is, Could the issue be resolved with automatic remediation?
One of the significant issues in larger organizations today is that, in their quest to be great, they stop being good. Sometimes, you need to be good at the things you do, especially on a daily basis; it can be a stepping stone to greatness. However, the eternal quest for excellence without a retrospective on what is preventing you from being good can sometimes be an exercise in spinning your wheels and not gaining traction.
Let's take a look at the following relevant example, which shows the implementation of automated patching for the instances in an environment:
Figure 1.1 – Operational excellence – automated patching groups
If you have instances in your environment that you are self-managing and that need to be kept current with patch updates, then you can use Systems Manager Patch Manager to help automate the task of keeping your operating systems up to date. This can be done on a regular basis using a Systems Manager maintenance window task.
The initial step would be to make sure that the SSM Agent (SSM being short for Simple Systems Manager, the service's former name) is installed on the machines that you want to keep up to date with patching.
Next, you would create a patching baseline, which includes rules for auto-approving patches within days of their release, as well as a list of both approved and rejected patches.
After that, you may need to modify the IAM role on the instance to make sure that the SSM service has the correct permissions.
Optionally, you can set up patch management groups. In the preceding diagram, we can see that we have two different types of servers, and they are both running on the same operating system. However, they are running different functions, so we would want to set up one patching group for the Linux servers and one group for the Database servers. The Database servers may only get critical patches, whereas the Linux servers may get the critical patches as well as the update patches.
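The following commands are a rough sketch of two of those steps using the AWS CLI; the baseline name, approval rule, baseline ID, and patch group name are examples only:
# Create a baseline that auto-approves security and bug-fix patches 7 days after release
$ aws ssm create-patch-baseline --name "linux-servers-baseline" --operating-system AMAZON_LINUX_2 --approval-rules "PatchRules=[{PatchFilterGroup={PatchFilters=[{Key=CLASSIFICATION,Values=[Security,Bugfix]}]},ApproveAfterDays=7}]"
# Attach the baseline to the instances tagged with the LinuxServers patch group
$ aws ssm register-patch-baseline-for-patch-group --baseline-id pb-0123456789abcdef0 --patch-group "LinuxServers"
A separate baseline with stricter rules could then be registered for the Database patch group, and a maintenance window would run the actual patching task on a schedule.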
Next is the Security pillar of the AWS Well-Architected Framework. Today, security is at the forefront of everyone's minds. Bad actors are constantly trying to find vulnerabilities in any code and infrastructure (both on-premises and in the cloud). Looking back at the lessons learned from the first 10 years of AWS, CTO Werner Vogels said, "Protecting your customers should always be your number one priority… And it certainly has been for AWS" (Vogels, 2016).
It is everyone's job these days to follow secure practices across all cloud systems. This protection includes the infrastructure and networking components that serve the application, the use of secure coding practices, and data protection, ultimately ensuring that the customer has a secure experience.
When you think about security, there are four main areas that the security pillar focuses on. They are shown in the following diagram:
Figure 1.2 – The four main areas of security in the security pillar
The security pillar is constructed of seven principles:
Implementing a strong identity foundation
Enabling traceability
Applying security at all layers
Automating security best practices
Protecting data in transit and at rest
Keeping people away from data
Preparing for security events
As we move through this book, you will find practical answers and solutions to some of the security principles introduced here in the security pillar. This will help you develop the muscle memory needed to instill security in everything you build, rather than putting your piece out there and letting the security team worry about it. Remember, security is everyone's responsibility. Initially, we will look at these security principles in a bit more detail.
Building a strong identity foundation starts with implementing the principle of least privilege: no user or role should have more (or fewer) permissions than it actually needs to perform its job or duties. Taking this a step further, if you are using IAM to manage your users, then ensure that a password policy is in place to confirm that passwords are rotated on a regular basis and don't become too stale. It is also a good idea to check that the IAM password policy is in sync with your corporate password policy.
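As one possible example (the exact values should come from your corporate policy, so treat these numbers as placeholders), the account password policy can be set and verified from the CLI:
# Enforce length, complexity, rotation, and reuse rules for IAM user passwords
$ aws iam update-account-password-policy --minimum-password-length 14 --require-uppercase-characters --require-lowercase-characters --require-numbers --require-symbols --max-password-age 90 --password-reuse-prevention 24
# Confirm the policy that is now in effect
$ aws iam get-account-password-policy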
Also, as your organization grows and managing users and permissions starts to become a more complex task, you should look to establish centralized identity management, either with AWS Single Sign-On or by connecting a corporate Active Directory server.
Security events can leave you in a reactive state; however, your ability to react can rely on the amount of information you can gather about the event. Putting proper monitoring, logging, alerting, and the ability to audit your environments and systems in place before an event happens is crucial to being able to perform the correct assessments and steps, when the need arises.
Capturing enough logs from a multitude of sources can be done with AWS services such as CloudWatch Logs, VPC Flow Logs, CloudTrail, and others. We will look at logging and monitoring extensively in Part 3 of this book as it is important to the DevOps Professional exam.
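As a short, hedged example of turning two of these sources on (the trail name, bucket, VPC ID, log group, and role ARN are placeholders, and the S3 bucket must already have a policy that allows CloudTrail to write to it):
# Record management API activity across all regions
$ aws cloudtrail create-trail --name management-events --s3-bucket-name my-central-log-bucket --is-multi-region-trail
$ aws cloudtrail start-logging --name management-events
# Capture accepted and rejected network traffic for a VPC into CloudWatch Logs
$ aws ec2 create-flow-logs --resource-type VPC --resource-ids vpc-0123456789abcdef0 --traffic-type ALL --log-group-name vpc-flow-logs --deliver-logs-permission-arn arn:aws:iam::111111111111:role/vpc-flow-logs-role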
Think about the following scenario:
Someone has gained access to a server via a weak password and compromised some data. You feel that you are currently capturing many logs; however, would you be able to figure out the following?
The username used to access the system
The IP address from which the access originated
The time the access started
The records that were changed, modified, or deleted
How many systems were affected
Securing all the levels of your environment helps protect you by extending the reach of your defenses throughout your environment. To address network-level security, different VPCs can be secured using simple techniques such as security groups and network ACLs. Seasoned AWS professionals know that additional security layers expand your security footprint, for example, at the edge (the network access points to the AWS cloud), at the operating system level, and even by making a shift left to secure the application code itself.
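To make the network-layer piece concrete, here is a minimal sketch (the security group ID and CIDR range are placeholders) of restricting inbound traffic to HTTPS from a known address range:
# Allow only TLS traffic from the corporate network range into this security group
$ aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 443 --cidr 203.0.113.0/24
Layering a network ACL on the subnet, certificates at the load balancer, and hardened AMIs at the operating system level then builds up the additional layers described previously.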
As you and your team become more educated about secure practices in the cloud, repetitive tasks should become automated. This allows you to react more quickly to events that are happening, and even to react before you realize that something is happening.
This is a topic worth diving into headfirst. As a DevOps specialist, you are used to taking repetitive manual processes and making them more efficient with automation. Automation can take the form of automatically analyzing logs, removing or remediating resources that don't comply with your organization's security posture, and intelligently detecting threats.
Amazon Web Services has come out with tools to help with this process, including GuardDuty, CloudWatch, EventBridge, and AWS Config.
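As a simple, hedged starting point (this assumes the AWS Config configuration recorder is already running in the account; the rule shown is one of the AWS managed rules):
# Turn on intelligent threat detection for the account
$ aws guardduty create-detector --enable
# Continuously flag S3 buckets that allow public read access
$ aws configservice put-config-rule --config-rule '{"ConfigRuleName":"s3-bucket-public-read-prohibited","Source":{"Owner":"AWS","SourceIdentifier":"S3_BUCKET_PUBLIC_READ_PROHIBITED"}}'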
Bad actors are all around, constantly looking for exploitable data that travels across the internet unprotected. You definitely can't rely on end users to use best practices such as secure communications over a VPN, so it is up to you and your team to put best practices in place on the server side. Basic items such as implementing certificates on your load balancers, on your CloudFront distribution, or even at the server level allow transmissions to be encrypted while going from point to point.
By the same token, figuratively speaking, enabling Transport Layer Security (TLS) or IPsec at the protocol layer helps ensure that network communications are authenticated.
There are AWS services to help protect your data, both in transit and at rest, such as AWS Certificate Manager, AWS Shield, AWS Web Application Firewall (the other WAF), and Amazon CloudFront. The Key Management Service (KMS) can also help protect your data at rest by allowing you to create, use, and rotate cryptographic keys easily.
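As a brief sketch of these services from the CLI (the domain name and key ID are placeholders, and DNS validation of the certificate still has to be completed separately):
# Request a public TLS certificate for use on a load balancer or CloudFront distribution
$ aws acm request-certificate --domain-name www.example.com --validation-method DNS
# Create a customer-managed KMS key for encrypting data at rest and turn on key rotation
$ aws kms create-key --description "workload data-at-rest key"
$ aws kms enable-key-rotation --key-id 1234abcd-12ab-34cd-56ef-1234567890ab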
For a deeper look at protecting data in transit and at rest, see Chapter 19, Protecting Data in Flight and at Rest.
There are ways to automate how data is accessed, rather than allowing individuals to access the data directly. It is a better idea to have the data accessed by mechanisms that can be validated through a change control process, such as Systems Manager runbooks or Lambda functions. The opposite of this would be allowing direct access to data sources through either a bastion host or an Elastic IP address/CNAME.
Providing this direct access can lead either to human mistakes or to a compromised username and password, which can ultimately result in data loss or leakage.
Even if you enact all the security principles described previously, there is no guarantee that a security event won't occur in the future. You are much better off practicing and having a prepared set of steps that you can enact quickly if the need ever arises.
You may need to create one or more runbooks or playbooks that outline the steps for tasks such as snapshotting an AMI for forensic analysis and moving it to a secured account (if available). If the time comes when these steps are necessary, there will be questions coming from many different places, and the answers will have a timeline aspect to them. If the team responsible for performing these duties has never practiced any of these tasks, and no guide has been established to help them through the process, then valuable cycles will be wasted just trying to get organized.
The following is the Shared Responsibility Model between AWS and you, the customer:
Figure 1.3 – The AWS shared responsibility model
Questions to ask
* How do you protect your root account?
- Is there a Multi-Factor Authentication (MFA) device on the root account?
- Is there no use of the root account?
* How do you assign IAM users and groups?
* How do you delegate API/CLI access?
Next, let's learn about the five design principles for reliability in the cloud.
There are five design principles for reliability in the cloud:
Automating recovery from failure
Testing recovery procedures
Scaling horizontally to increase aggregate workload availability
Stopping guessing capacity
Managing changes in automation
When you think of automating recovery from failure, the first thing most people think of is a technology solution. However, this is not necessarily the context being referred to in the reliability pillar. These points of failure really should be based on Key Performance Indicators (KPIs) set by the business.
As part of the recovery process, it's important to know both the Recovery Time Objective (RTO) and Recovery Point Objective (RPO) of the organization or workload:
RTO (Recovery Time Objective): The maximum acceptable delay between the service being interrupted and restored
RPO (Recovery Point Objective): The maximum acceptable amount of time since the last data recovery point (backup) (Amazon Web Services, 2021)
In your cloud environment, you should not only test that your workload functions properly but also that it can recover from single or multiple component failures, whether they happen at the service, Availability Zone, or regional level.
Using practices such as Infrastructure as Code, CD pipelines, and regional backups, you can quickly spin up an entirely new environment. This could include your application and infrastructure layers, which will give you the ability to test that things work the same as in the current production environment and that data is restored correctly. You can also time how long the restoration takes and work to improve it by automating the recovery time.
Taking the proactive measure of documenting each of the necessary steps in a runbook or playbook allows for knowledge sharing, as well as fewer dependencies on specific team members who built the systems and processes.
When coming from a data center environment, planning for peak capacity means finding a machine that can run all the different components of your application. Once you hit the maximum resources for that machine, you need to move to a bigger machine.
As you move from development to production or as your product or service grows in popularity, you will need to scale out your resources. There are two main methods for achieving this: scaling vertically or scaling horizontally:
Figure 1.4 – Horizontal versus vertical scaling
One of the main issues with scaling vertically is that you will hit the ceiling at some point in time, moving to larger and larger instances. At some point, you will find that there is no longer a bigger instance to move up to, or that the larger instance will be too cost-prohibitive to run.
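By contrast, scaling horizontally adds or removes smaller instances as load changes. As a rough sketch (the group name, launch template, and subnet IDs are placeholders), an Auto Scaling group spreads that capacity across Availability Zones:
# Keep between 2 and 10 identical web servers running across two subnets
$ aws autoscaling create-auto-scaling-group --auto-scaling-group-name web-asg --launch-template LaunchTemplateName=web-server-template,Version='$Latest' --min-size 2 --max-size 10 --desired-capacity 2 --vpc-zone-identifier "subnet-0aaa1111,subnet-0bbb2222"
Scaling policies can then adjust the desired capacity automatically based on metrics such as CPU utilization or request count.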
