It’s easy to learn and deploy resources in Microsoft Azure without worrying about resource optimization. However, for production or mission-critical workloads, it’s crucial that you follow best practices for resource deployment to attain security, reliability, operational excellence, and performance. Alongside these aspects, you need to account for cost, as it’s a leading driver of almost every organization’s cloud transformation.
In this book, you’ll learn to leverage the Microsoft Well-Architected Framework to optimize your workloads in Azure. This framework is a set of recommended practices developed by Microsoft based on five aligned pillars: cost optimization, performance, reliability, operational excellence, and security. You’ll explore each of these pillars and discover how to perform an assessment to determine the quality of your existing workloads. Throughout the book, you’ll uncover different design patterns and procedures related to each of the Well-Architected Framework pillars.
By the end of this book, you’ll be well equipped to collect and assess data from an Azure environment and make the improvements your Azure workloads need.
Leverage the Well-Architected Framework to boost performance, scalability, and cost efficiency
Rithin Skaria
BIRMINGHAM—MUMBAI
Copyright © 2023 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Preet Ahuja
Publishing Product Manager: Preet Ahuja
Senior Editor: Shruti Menon
Technical Editor: Irfa Ansari
Copy Editor: Safis Editing
Project Coordinator: Ashwin Kharwa
Proofreader: Safis Editing
Indexer: Hemangini Bari
Production Designer: Ponraj Dhandapani
Marketing Coordinator: Rohan Dobhal
First published: July 2023
Production reference: 1110723
Published by Packt Publishing Ltd.
Grosvenor House
11 St Paul's Square
Birmingham
B3 1RB, UK.
ISBN 978-1-83763-292-3
www.packtpub.com
Cloud computing has revolutionized the way enterprises think about the IT infrastructure that enables their business needs. It has provided agility to businesses and enabled them to scale their operations and innovate at an unprecedented pace. However, while cloud computing has placed immense power in the hands of IT departments, it can become a liability if not used responsibly. As organizations move more workloads to the cloud, they face new challenges in managing their cloud costs.
This book is a comprehensive guide to cloud optimization using the Well-Architected Framework. While it is written in the context of Microsoft Azure, the principles articulated can be extended to any hyperscale public cloud. This book provides you with practical advice on how to optimize your cloud environment while maintaining cost, performance, reliability, operational excellence, and security. Rithin has done an excellent job of distilling complex concepts into easy-to-understand language. This book is a must-read for anyone who wants to get the most out of their cloud investment.
Rithin is a great colleague and it was an honor when he reached out to me for an opportunity to preview this book and write this foreword. I am confident that it will be a valuable resource for anyone who is looking to optimize their cloud environment.
Jatinder Pal Singh
Director – Solutions Architecture, Microsoft Qatar
Rithin Skaria is a prominent supporter of cloud technologies, in addition to his roles as a speaker, consultant, and published author with a specialization in the design and enhancement of cloud architecture. He has spent over a decade managing, implementing, and designing IT infrastructure solutions for public and private clouds. At present, he works with Microsoft Qatar as a cloud solution architect, placing particular emphasis on Azure solutions. Rithin holds an impressive array of over 18 certifications in diverse technologies such as Azure, Linux, Microsoft 365, and Kubernetes, and he is a Microsoft Certified Trainer. His substantial contributions to the Microsoft worldwide Open Source community have earned him recognition as one of its engagement leaders. He has also spoken at several events and conferences, including Microsoft Spark.
My heartfelt thanks to my family, manager, colleagues at Microsoft, and everyone who provided their unwavering support and guidance on this journey of writing this book.
Suraj S. Pujari is a cloud solution architect at Microsoft India with more than 13 years of experience in IT. His technical capabilities span helping customers with digital transformation, migration, solution designing, and modernizing on the cloud. He works with a wide range of small, medium, and large businesses in the banking and manufacturing domains. In his free time, he likes to do yoga and play with his little one.
I would like to thank my family and friends for all the support that I have received while staying up late and doing reviews of this book, and finally, the Packt team for giving me this opportunity and bearing with me throughout the process.
Microsoft has different frameworks designed for every stage of your cloud transformation journey. This section comprises two chapters. In the first chapter, we will go through the fundamentals of the Well-Architected Framework, and in the second chapter, we will cover the differences between the Well-Architected Framework and the Cloud Adoption Framework. Understanding the differences between the frameworks will help you make the right design decisions on your cloud transformation journey. This part contains the following chapters:
Chapter 1, Planning Workloads with the Well-Architected Framework
Chapter 2, Distinguishing between the Cloud Adoption Framework and Well-Architected Framework
Microsoft has developed different frameworks for Azure; the prominent ones are the Cloud Adoption Framework (CAF) and the Well-Architected Framework (WAF). There are other frameworks that are subsets of these prominent ones. In this book, we will be covering the WAF and its five pillars.
Important note
Do not confuse this with Azure Web Application Firewall, which is also commonly abbreviated to WAF. Any reference to WAF in this book means the Well-Architected Framework.
Just to give you a quick introduction, the WAF deals with a set of best practices and guidelines developed by Microsoft for optimizing your workloads in Azure. As described in the opening paragraph, this framework has five pillars, and the optimization is aligned with these pillars. Let’s not take a deep dive into these pillars at this point; nevertheless, we will certainly cover all aspects of the five pillars as we progress. Further, we will cover the elements of the WAF. When we discuss elements, we will talk about cloud design patterns. This is a lengthy topic, and it’s recommended that you refer to the Cloud Design Patterns documentation (https://docs.microsoft.com/en-us/azure/architecture/patterns/) if you are new to this topic. You will see the pattern names coming up when we discuss design principles, but as patterns are out of scope for this book, we will not take a deep dive into this topic.
In this chapter, we will learn why there is a need for the WAF, its pillars, and its elements.
Microsoft Azure has incredible documentation that can help any beginner deploy their first workload in Azure. With the help of this well-planned documentation and its tutorials, deployment is not a tedious task. Now, the question is: Are these workloads optimized and running in the best possible shape?
When it comes to optimizing, some considerations include the following:
- What is the cost of running this workload?
- What is the business continuity (BC) and disaster recovery (DR) strategy?
- Are the workloads secured against common internet attacks?
- Are there any performance issues during peak hours?

These are some common considerations related to optimization. Nonetheless, considerations may vary from workload to workload. We need to understand the best practices and guidelines for each of our workloads, and if it’s a complex solution, then finding the best practices for each service can be a daunting task. This is where the Microsoft Azure WAF comes into the picture.
Quoting Microsoft’s documentation: “The Azure Well-Architected Framework is a set of guiding tenets that can be used to improve the quality of a workload.”
While some organizations have already completed their cloud adoption journey, others are still in the transition and early stages. As the documentation states, this framework is a clear recipe for improving the quality of mission-critical workloads we migrate to the cloud. Incorporating the best practices outlined by Microsoft will produce a high-standard, durable, and cost-effective cloud architecture.
Now that we know the outcome of leveraging the WAF, let’s look at its pillars. The framework comprises five interconnected pillars of architectural excellence, as follows:
- Cost optimization
- Operational excellence
- Performance efficiency
- Reliability
- Security

The assessment of the workload will be aligned with these pillars, and the pillars are interconnected. Let’s take an example to understand what interconnected means.
Consider the case of a web application running on a virtual machine (VM) scale set. We can improve the performance by enabling autoscaling so that the number of instances is increased automatically whenever there is a performance bottleneck. On the other hand, when we enable autoscaling, we are only using the extra compute power whenever we need it; this way, we only pay for the extra instances at the time of need, not 24x7.
As you can see in this scenario, both performance and cost optimization are achieved by enabling autoscaling. Similarly, we can connect these pillars and improve the quality of the workload. Nonetheless, there will be trade-offs as well—for example, trying to improve reliability will increase the cost; we will discuss this later in this book.
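To make this concrete, here is a minimal Azure CLI sketch of metric-based autoscaling for a VM scale set. The resource group, scale set, and setting names are hypothetical placeholders, and the thresholds are only illustrative:

```bash
# Create an autoscale setting for an existing VM scale set (placeholder names)
az monitor autoscale create \
  --resource-group myRG \
  --resource myScaleSet \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name webAutoscale \
  --min-count 2 --max-count 10 --count 2

# Scale out by one instance when average CPU stays above 70% for 5 minutes
az monitor autoscale rule create \
  --resource-group myRG \
  --autoscale-name webAutoscale \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1

# Scale back in when average CPU drops below 30%, so idle instances are not billed
az monitor autoscale rule create \
  --resource-group myRG \
  --autoscale-name webAutoscale \
  --condition "Percentage CPU < 30 avg 5m" \
  --scale in 1
```

With rules such as these, capacity follows demand: performance improves during peaks and cost drops during quiet hours.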
Let’s take a closer look at these pillars in the next section.
As you read in the previous section, Microsoft has divided its optimization plans, targeting five pillars of architectural excellence. Even though we have dedicated chapters for each of the pillars, for the time being, let’s cover some key concepts related to each of the pillars.
The following figure shows the five pillars of the WAF:
Figure 1.1 – The five pillars of the WAF
We will start with the first pillar, cost optimization.
One of the main reasons for organizations to adopt the cloud is its cost-effectiveness. The total cost of ownership (TCO) is much less in the cloud as the end customer doesn’t need to purchase any physical servers or set up data centers. Due to the agility of the cloud, they can deploy, scale, and decommission as required. With the help of the Azure TCO calculator (https://azure.microsoft.com/en-us/pricing/tco/calculator/), customers can estimate cost savings before migrating to Azure. Once they are migrated, the journey doesn’t end there; migrations mostly go with the lift-and-shift strategy where the workloads are deployed with a similar size as on-premises. The challenge here is that with on-premises, there is no cost for individual VMs or servers as the customer will make a capital investment and purchase the servers. The only cost will be for licensing, maintenance, electricity, cooling, and labor. In the case of Azure, the cost will be pay-as-you-go; for n number of hours, you must pay n times the per-hour cost, and the price of the server varies with size and location. If the servers were wrongly sized on-premises, then during the migration we will replicate that mistake in the cloud. With the servers running underutilized, you are paying extra every hour, every day, and every month. For this reason, we need cost optimization after migration.
It’s recommended that organizations conduct cost reviews every quarter to understand anomalies, plan the budget, and forecast usage. With the help of cost optimization, we will find underutilized and idle resources, often referred to as waste, and eliminate them. Eliminating this waste will improve the cost profile of your workloads and result in cost savings. In Chapter 3, Implementing Cost Optimization, we will assess a demo Azure environment and see how we can develop a remediation plan. Once we figure out the weak points in our infrastructure, we can resize, eliminate, or enforce policies for cost optimization.
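To illustrate what a remediation can look like, the following Azure CLI sketch lists Advisor cost recommendations and then right-sizes or deallocates an underutilized VM; the resource names and target size are hypothetical:

```bash
# List cost recommendations, such as underutilized or idle virtual machines
az advisor recommendation list --category Cost --output table

# Right-size an oversized VM flagged as underutilized (placeholder names and size)
az vm resize --resource-group myRG --name app-vm-01 --size Standard_B2ms

# Deallocate an idle VM so that compute charges stop accruing
az vm deallocate --resource-group myRG --name test-vm-01
```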
Operations and procedures required to run a production application are covered by operational excellence. When we are deploying our applications to our resources, we need to make sure that we have a reliable, predictable, and repeatable process for deployment. In Azure, we can automate the deployment process, which will eliminate any human errors. Bug fixes can be easily deployed if we have a fast and reliable deployment. Most importantly, whenever there is an issue post-deployment, we can always roll back to the last known good configuration.
In Chapter 4, Achieving Operational Excellence, we will learn about key topics related to operational excellence. For the time being, let’s name the topics and explore them later. The key topics are application design, monitoring, app performance management, code deployment, infrastructure provisioning, and testing.
Operational excellence mainly concentrates on DevOps patterns for application deployment and processes related to deployment. This includes guidance on application design and the build process, as well as automating deployments using DevOps principles.
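As a small example of a repeatable, predictable deployment process, the following Azure CLI sketch previews and then applies an infrastructure-as-code template; the resource group, template file, and parameter are assumptions:

```bash
# Preview what the template would change before touching the environment
az deployment group what-if \
  --resource-group myRG \
  --template-file main.bicep \
  --parameters environment=prod

# Apply the same template; rerunning it gives the same result, removing manual, error-prone steps
az deployment group create \
  --resource-group myRG \
  --template-file main.bicep \
  --parameters environment=prod
```

Because the deployment is driven by a versioned template, rolling back is largely a matter of redeploying the last known good version.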
As we saw in the case of cost optimization, we scale the workloads to meet demand with the help of autoscaling; this ability to scale is what we cover in the performance efficiency pillar. In Azure, we can define the minimum number of instances that are adequate to run our application during non-peak hours. During peak hours, we can define an autoscaling policy by which the number of instances can be increased. The increase can be controlled by a metric (CPU, memory, and so on) or a schedule. Nevertheless, we can also define the maximum number of instances to stop scaling after a certain number to control billing. To be honest, this autoscaling scenario was not at all possible before the cloud. Earlier, administrators used to create oversized instances that could handle both peak and non-peak hours. But with Azure, this has changed; the advantage here is that Azure will collect all metrics out of the box, and we can easily figure out bottlenecks.
Proper planning is required to define the scaling requirements. In Azure, how we define scaling varies from resource to resource. Some resource tiers don’t offer autoscaling, so you must go with manual scaling, while others support both automatic and manual scaling. One thing to note here is that performance efficiency is not only about autoscaling; it also includes data performance, content delivery performance, caching, and background jobs. Thus, we can infer that this pillar deals with the overall performance efficiency of our application.
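For example, the platform metrics that Azure collects out of the box can be queried directly to spot bottlenecks before defining scaling rules; the resource ID below is a placeholder:

```bash
# Retrieve hourly average CPU for a VM to check for sustained saturation
az monitor metrics list \
  --resource "/subscriptions/<sub-id>/resourceGroups/myRG/providers/Microsoft.Compute/virtualMachines/app-vm-01" \
  --metric "Percentage CPU" \
  --interval PT1H \
  --aggregation Average
```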
In Chapter 5, Improving Applications with Performance Efficiency, we will take a deep dive into performance patterns, practices, and performance checklists.
The word “reliability” means consistent performance and, in this context, it refers to keeping the application running through redundancy. When we build and deploy our applications in Azure, we need to make sure that they are reliable. In our on-premises environment, we use different redundancy techniques to make sure that our application and data are available even if there is a failure. For example, we use Redundant Array of Independent Disks (RAID) on-premises, where we replicate the data across multiple disks to increase data reliability.
In Azure or any other cloud, the first and foremost thing we need to admit is that there are chances of failure and it’s not completely failproof. Keeping this in mind, we need to design our applications in a reliable manner by making use of different cloud features. Incorporating these techniques in the design will avoid a single point of failure (SPOF).
The level of reliability is often driven by the service-level agreement (SLA) required by the application or end users. For example, a single VM with a premium disk offers 99.9% uptime, but if a failure happens on the host server in the Azure data center, your VM will face downtime. Here, we can leverage availability sets or availability zones, which help you deploy multiple VMs across fault domains/update domains or zones. By doing so, the SLA can be increased to 99.95% for availability sets and 99.99% for availability zones. Always keep in mind that to get this SLA, you need at least two VMs deployed across the availability set or zones. Earlier, we read that the pillars of the WAF are interconnected and work hand in hand. In this case, however, increasing reliability means deploying multiple instances of your application, which essentially means your costs will increase. The pillars do work together, but sometimes there will be trade-offs, as we have seen in this scenario.
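As a minimal Azure CLI sketch of zonal redundancy, the two commands below place VMs in different availability zones of the same region; the resource names and image alias are assumptions:

```bash
# Spread two VMs across availability zones so a zonal failure does not take the workload down
az vm create --resource-group myRG --name web-vm-1 --image Ubuntu2204 --zone 1
az vm create --resource-group myRG --name web-vm-2 --image Ubuntu2204 --zone 2
```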
Security in public clouds was, and always will be, a concern for enterprise customers because of the complexity involved and the new types of attacks that attackers keep devising. Coping with these attacks is always a challenge, and finding the right skills to mitigate them is not easy for organizations. In Azure, we follow the shared responsibility model; the model defines the responsibilities of Microsoft and its customers based on the type of service. If we take Infrastructure-as-a-Service (IaaS) solutions such as VMs, more responsibility lies with the customer, and Microsoft is responsible for the security of the underlying infrastructure. The responsibility shifts more toward Microsoft if you choose a Platform-as-a-Service (PaaS) solution.
It’s very important to leverage the different security options provided by Azure to improve the security of our workloads. In the security pillar, we will assess the workloads and make sure they align with the security best practices outlined by Microsoft. As we progress, in Chapter 7, Leveraging the Security Pillar, we will take a holistic approach to security and how to build secure applications.
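To give one concrete example of such an option, the Azure CLI command below enables Microsoft Defender for Cloud coverage for virtual machines at the subscription level; treat the plan name and tier as indicative rather than exhaustive:

```bash
# Enable paid Defender for Cloud protection for virtual machines in the current subscription
az security pricing create --name VirtualMachines --tier Standard
```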
Cost optimization, operational excellence, performance efficiency, reliability, and security are the five pillars of the WAF. When it comes to the elements of the WAF, this is different from the pillars. If we place the WAF in the center, then we have six supporting elements. These elements support the pillars with the principles and datasets required for the assessment.
As you know, the WAF is a set of best practices developed by Microsoft; these best practices are further categorized into five interconnected pillars. Now, the question is: where exactly are these best practices documented? In other words, the practices should be developed first before we can categorize them into different pillars. This is where the elements come into the picture. The elements act as a supporting foundation for the pillars.
As per Microsoft’s documentation, the supporting elements for the WAF are the following:
- Azure Well-Architected Review
- Azure Advisor
- Documentation
- Partners, support, and service offers
- Reference architecture
- Design principles

Now, we will look at each of these elements in turn. Let’s start with the Azure Well-Architected Review.
Assessment of the workload is required for the creation of the remediation plan; there is no way around this assessment. In the Well-Architected Review, there will be a set of questions prepared by Microsoft to understand the processes and practices in your environment. There will be a separate questionnaire for each pillar of the WAF. For example, the questionnaire for cost optimization will contain questions related to Azure Reserved Instances, tagging, Azure Hybrid Benefit, and so on. Meanwhile, the operational excellence questionnaire will have questions related to DevOps practices and approaches. There will be different possible answers to these questions, varying from recommended methods to non-recommended methods. Customers can answer based on their environment, and the system will generate a plan with recommendations that can be implemented to align their environment with the WAF.
The review can be taken by anyone from the Microsoft Assessments portal (https://docs.microsoft.com/en-us/assessments/?mode=home). In the portal, you must select Azure Well-Architected Review, as shown in the following screenshot:
Figure 1.2 – Accessing Microsoft Assessments
Once you select Azure Well-Architected Review, you will be presented with a popup asking whether you want to create a new assessment or create a milestone. If you want to create a new assessment, then you can go for New Assessment, or choose Create a milestone for an existing assessment. We will not conduct an assessment at this point; each pillar of the WAF has its own dedicated chapter, and we will perform the assessment there.
With that, we will move on to the next element of the framework, which is Azure Advisor.
If you have worked on Microsoft Azure, you will know that Azure Advisor is the personalized cloud consultant developed by Microsoft for you. Azure Advisor can generate recommendations for you, and you can leverage this tool to improve the quality of workloads. Looking at Figure 1.3, we can see that the recommendations are categorized into different groups, and the group names are the same as the pillars of the WAF:
Figure 1.3 – Azure Advisor
With the help of Azure Advisor, you can do the following:
- Get best practices and recommendations aligned to the pillars of the WAF
- Enhance the cost optimization, performance, reliability, and operational excellence of workloads using actionable recommendations, thus improving the quality of the workloads
- Postpone recommendations if you don’t want to act immediately

Advisor has a score based on the number of actionable recommendations; this score is called Advisor Score. If the score is lower than 100%, that means there are recommendations, and we need to remediate them to improve the score. As you can see in Figure 1.3, the Advisor Score total for the environment is 81%, and the Score by category values are on the right side.
The good thing about Azure Advisor is that recommendations will be generated as soon as you start using the subscription. You don’t have to deploy any agents, make any additional configurations, or pay to use the Advisor service. The recommendations are generated with the help of machine learning (ML) algorithms based on usage, and they will also be refreshed periodically. Advisor can be accessed from the Azure portal, and it has a rich REST API if you prefer to retrieve the recommendations programmatically and build your own dashboard.
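For instance, recommendations can be pulled with the Azure CLI, or by calling the REST API directly to feed a custom dashboard; the subscription ID is a placeholder and the API version shown is an assumption:

```bash
# List all Advisor recommendations for the current subscription
az advisor recommendation list --output table

# Or call the Advisor REST API directly
az rest --method get \
  --url "https://management.azure.com/subscriptions/<sub-id>/providers/Microsoft.Advisor/recommendations?api-version=2020-01-01"
```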
In the coming chapters, we will be relying a lot on Azure Advisor for collecting recommendations for each of the pillars.
Now that we have covered the second element of the WAF, let’s move on to the next one.
Microsoft’s documentation has done an excellent job of helping people who are new to Azure. All documentation related to the WAF is documented at https://docs.microsoft.com/en-us/azure/architecture/framework/. As a matter of fact, this book is a demystified version of this documentation with additional examples and real-world scenarios.
As with all documentation, the WAF documentation is lengthy and refined, but for a beginner, the amount of information in the documentation can be overwhelming. This book distills the key insights and essentials from the documentation, providing you with everything you need to get started. The following screenshot shows the documentation for the framework:
Figure 1.4 – WAF documentation
As you can see in the preceding screenshot, the contents are organized according to the pillars, and finally, the documentation is concluded with steps to implement the recommendations. You could call this the Holy Bible of WAF. Everything related to the WAF is found in this documentation and we would strongly recommend bookmarking the link to stay updated.
All documentation for Azure is available at https://docs.microsoft.com/en-us/azure/?product=popular. The documentation covers how to get started, the CAF, and the WAF, and includes learning modules and product manuals for every Azure service. Apart from the documentation, this site offers sample code, tutorials, and more. Regardless of the language you write your code in, Azure documentation provides SDK guides for Python, .NET, JavaScript, Java, and Go. On top of that, documentation is also available for scripting languages such as PowerShell, the Azure CLI, and infrastructure as code (IaC) solutions such as Bicep, ARM templates, and Terraform.
Deploying complex solutions by adhering to the best practices can be challenging for new customers. This is where we can rely on Microsoft partners. The Microsoft Partner Network (MPN) is massive, and you can leverage Azure partners for technical assistance and support to empower your organization. You can find Azure partners and Azure Expert Managed Service Providers (MSPs) at https://azure.microsoft.com/en-us/partners/. MSPs can aid with automation, cloud operations, and service optimization. You can also seek assistance for migration, deployment, and consultation. Based on the service you are working with and the region you belong to, you can find a partner with the required skills closer to you.
Once the partner deploys the solution, there will be break-fix issues that you need assistance with. Microsoft Support can help you with any break-fix scenarios. For example, if one of your VMs is unavailable or a storage account is inaccessible, you can open a support request. Billing and subscription support is free of cost and does not require you to purchase any support plans. However, for technical assistance, you need to purchase a support plan. A quick comparison of these plans is shown in the following table:
|  | Basic | Developer | Standard | ProDirect |
| --- | --- | --- | --- | --- |
| Price | Free | $29/month | $100/month | $1,000/month |
| Scope | All Azure customers | Trial and non-production environments | Production workloads | Mission-critical workloads |
| Billing support | Yes | Yes | Yes | Yes |
| Number of support requests | Unlimited | Unlimited | Unlimited | Unlimited |
| Technical support | No | Yes | Yes | Yes |
| 24/7 support | N/A | During business hours via email only | Yes (email/phone) | Yes (email/phone) |
Table 1.1 – Comparison of Azure support plans
A full comparison is available at https://azure.microsoft.com/en-us/support/plans/. The Basic plan does not include technical support, and the Developer plan can only open Severity C cases with Microsoft Support. To open Severity B or Severity A cases, you must have a Standard or ProDirect plan. Severity C has an SLA of 8 business hours and is recommended for issues with minimal business impact, while Severity B is for moderate impact with an SLA of 4 hours. If the case opened is a Severity A case, then the SLA is 1 hour. Severity A is reserved for critical business impact issues where production is down. Having a ProDirect plan offers extra perks to customers, such as training, a dedicated ProDirect manager, and operations support. The ProDirect plan also has a Support API that customers can use to create support cases programmatically. For example, if a VM is down, by combining the power of Azure alerts and action groups, we can make a call to the Support API to create a request automatically.
In addition to these plans, there is a Unified/Premier contract that is above the ProDirect plan and is ideal for customers who want to cover Azure, Microsoft 365, and Dynamics 365. Microsoft support is available in English, Spanish, French, German, Italian, Portuguese, traditional Chinese, Korean, and Japanese to support global customers. Keep in mind that the plans cannot be transferred from one customer to another. Based on your requirement, you can purchase a plan and you will be charged every month.
Service offers deal with different subscription types for customers. There are different types of Azure subscriptions having different billing models. A complete list of available offers is listed at https://azure.microsoft.com/en-in/support/legal/offer-details/. When it comes to organizations, the most common options are Enterprise Agreement (EA), Cloud Solution Provider (CSP), and Pay-As-You-Go; these are commercial subscriptions. Organizations deploy their workloads in these subscriptions, and they will be charged based on consumption. How they get charged depends solely on the offer type. For example, EA customers make an upfront payment and utilize the credits for Azure; any charges above the credit limit will be invoiced as an overage. Both Pay-As-You-Go and CSP will get monthly invoices. In CSP, an invoice will be generated by the partner; however, in Pay-As-You-Go, the invoice comes directly from Microsoft.
There are other types of subscriptions used for development, testing, and learning purposes, such as Visual Studio subscriptions, Azure Pass, Azure for Students, the Free Trial, and so on. However, these are credit-based subscriptions, and they are not backed by SLAs. Hence, they cannot be used for hosting production workloads.
The next element we are going to cover is reference architecture.
If you write code, you might have come across a scenario where you were unable to resolve an error and found the solution on Stack Overflow or another forum. Reference architecture serves a similar purpose, whereby Microsoft provides guidance on how an architecture should be implemented. With the help of reference architecture, we can design scalable, secure, reliable, and optimized applications by following a defined methodology.
Reference architecture is part of the application architecture fundamentals. The application architecture fundamentals comprise a series of steps in which we decide on the architecture style, the technology, the application architecture, and, finally, alignment with the WAF. This will be used for developing the architecture, design, and implementation. The following diagram shows the series of steps:
Figure 1.5 – Application architecture fundamentals
In the preceding diagram, you can see that the first choice is the architectural style, and this is the most fundamental thing we must decide on. For example, we could take a three-tier application approach or go for microservices architecture.
Once that’s decided, then the next decision is about the services involved. Let’s say your application is a three-tier application and has a web frontend. This frontend can be deployed in Azure Virtual Machines, Azure App Service, Azure Container Instances, or even Azure Kubernetes Service (AKS). Similarly, for the data store, we can decide whether we need to go for a relational or non-relational database. Based on your requirements, you can select from a variety of database services offered by Microsoft Azure. Likewise, we can also choose the service that will host the mid-tier.
After selecting the technology, we need to choose the application architecture. This is the stage at which we define how the application will be structured, building on the style and services chosen in the previous stages. Microsoft has several design principles and reference architectures that can be leveraged at this stage. We will cover the design principles in the next section.
The reference architectures can be accessed from https://docs.microsoft.com/en-us/azure/architecture/browse/?filter=reference-architecture, and this is a good starting point to begin with the architecture for your solution. You might get an exact match as per your requirement; nevertheless, we can tweak these architectures as required. Since these architectures are developed by Microsoft by keeping the WAF pillars in mind, you can deploy with confidence as these solutions are scalable, secure, and reliable. The following screenshot shows the portal for viewing reference architectures:
Figure 1.6 – Browsing reference architectures
The portal offers filtering by product type and category. From hundreds of reference diagrams, you can filter and find the one that matches your requirements. For example, a simple search for 3D video rendering returns two reference architectures, as shown in the following screenshot:
Figure 1.7 – Filtering reference architectures
Clicking on the reference architecture takes you to a complete explanation of the architecture components, data flow, potential use cases, considerations, and best practices aligned with the WAF. The best part is you will have the Deploy to Azure button, which lets you directly deploy the solution to Azure. The advantage is the architecture is already aligned with the WAF and you don’t have to spend time assessing the solution again.
With that, let’s move on to the last element of the WAF—design principles.
In Figure 1.5, we saw that reference diagrams and design principles are part of the third stage of application architecture fundamentals. In the previous section, we saw how we can use the reference architecture, and now we will see how to leverage the design principles. There are 11 design principles you should incorporate into your design discussions. Let’s understand each of the design principles.
As with on-premises, failures can happen in the cloud as well. We need to acknowledge this fact; the cloud is not a silver bullet for all the issues that you faced on-premises but does offer massive advantages compared to on-premises infrastructure. The bottom line is failures can happen, hardware can fail, and network outages can happen. While designing our mission-critical workloads, we need to anticipate this failure and design for healing. We can take a three-branched approach to tackle the failure:
- Track and detect failures
- Respond to failures using monitoring systems
- Log and monitor failures to build insights and telemetry

The way you want to respond to failures will entirely depend on your services and the availability requirements. For example, say you have a database and would like to fail over to a secondary region during a primary region outage. Setting up this replication will sync your data to a secondary region and fail over whenever the primary region fails to serve the application. Keep in mind that replicating data to another region can be more expensive than having a database in a single region.
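As a hedged sketch of such a setup, the Azure CLI commands below geo-replicate an Azure SQL database to a server in a secondary region and then promote the secondary; all resource names are hypothetical:

```bash
# Create a readable secondary of the database on a server in another region
az sql db replica create \
  --resource-group primaryRG \
  --server primary-sql-server \
  --name appdb \
  --partner-resource-group secondaryRG \
  --partner-server secondary-sql-server

# During a primary region outage, promote the secondary to become the new primary
az sql db replica set-primary \
  --resource-group secondaryRG \
  --server secondary-sql-server \
  --name appdb
```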
Regional outages are generally uncommon, but while designing for healing, you should also consider this scenario. Your focus should be on handling hardware failures, network outages, and so on because they are very common and can affect the uptime of your application. There are recommendations provided by Microsoft on how to design for healing—these are called design patterns. The recommended patterns are presented here:
- Circuit breaker
- Bulkhead
- Load leveling
- Failover
- Retry

As mentioned at the beginning of this chapter, design patterns are not within the scope of this book. Again, thanks to Microsoft, all patterns are listed at https://docs.microsoft.com/en-us/azure/architecture/patterns/. Let’s move on to the next design principle.
SPOFs in architecture can be eliminated by having redundancy. Earlier, we discussed RAID storage in the Reliability subsection of the What are the pillars of the WAF? section, where multiple disks are used to improve data redundancy. Azure has different redundancy options based on the service that you are using. Here are some of the recommendations:
- Understand the business requirements: Redundancy is directly proportional to complexity and cost, and not every solution requires you to set up redundancy. If your business demands a higher level of redundancy, be prepared for the cost implications and complexity, and the demand should be justifiable. If not, you will end up with a higher cost than you budgeted for.
- Use a load balancer: A single VM is a SPOF and is not recommended for hosting mission-critical workloads. Instead, you need to deploy multiple VMs and place them behind a load balancer. On top of that, you can consider deploying the VMs across multiple availability zones for improved SLAs and availability. Once the VMs are behind the load balancer, health probes can verify whether a VM is available before a user request is routed to that backend VM (see the sketch after this list).
- Database replication: PaaS solutions such as Azure SQL Database and Cosmos DB have out-of-the-box replication within the same region. In addition to that, you can replicate the data to another region with the help of the geo-replication feature. If the primary region goes down, the database can fail over to the secondary region for any read or write requests.
- Database partitioning: With the help of database partitioning, we can improve the scalability as well as the availability of the data. If one shard goes down, only a subset of total transactions will be affected; meanwhile, other shards are still reachable.
- Multi-region deployment: Regional outages are uncommon; however, we need to account for regional failure as well, based on the application requirements. Deploying the infrastructure to multiple regions can help in improving
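The following Azure CLI sketch illustrates the load balancer recommendation from the list above: a Standard load balancer with a health probe that checks backend VMs before traffic is routed to them. All names are placeholders:

```bash
# Create a Standard SKU load balancer with a frontend and a backend pool
az network lb create \
  --resource-group myRG \
  --name webLb \
  --sku Standard \
  --frontend-ip-name webFrontend \
  --backend-pool-name webBackendPool

# Health probe that marks a backend VM unhealthy if HTTP checks on port 80 fail
az network lb probe create \
  --resource-group myRG \
  --lb-name webLb \
  --name httpProbe \
  --protocol Http \
  --port 80 \
  --path /

# Load-balancing rule that only routes traffic to VMs the probe reports as healthy
az network lb rule create \
  --resource-group myRG \
  --lb-name webLb \
  --name httpRule \
  --protocol Tcp \
  --frontend-port 80 \
  --backend-port 80 \
  --frontend-ip-name webFrontend \
  --backend-pool-name webBackendPool \
  --probe-name httpProbe
```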