Explore practical use cases to learn everything from Linux components and functionalities through to hardware and software support
Key Features
Book Description
It is very important to understand the flexibility of an infrastructure when designing an efficient environment. In this book, you will cover everything from Linux components and functionalities through to hardware and software support, which will help you to implement and tune effective Linux-based solutions.
This book starts with an overview of Linux design methodology. Next, you will focus on the core concepts of designing a solution. As you progress, you will gain insights into the kinds of decisions you need to make when deploying a high-performance solution using Gluster File System (GlusterFS). In the next set of chapters, the book will guide you through using Kubernetes as an orchestrator for deploying and managing containerized applications. In addition, you will learn how to apply and configure Kubernetes for your NGINX application. You'll then learn how to implement an ELK stack, which is composed of Elasticsearch, Logstash, and Kibana. In the concluding chapters, you will focus on installing and configuring a SaltStack solution to manage different Linux distributions, and explore a variety of design best practices. By the end of this book, you will be well-versed in designing a high-performing computing environment for complex applications to run on.
By the end of the book, you will have delved into the detailed technical considerations of designing a solution, and you will have dissected every aspect in order to implement and tune open source Linux-based solutions.
What you will learn
Who this book is for
This intermediate-level book is for Linux system administrators, Linux support engineers, DevOps engineers, Linux consultants, or any other open source technology professional looking to learn or expand their knowledge in architecting, designing, and implementing solutions based on Linux and open source software. Prior experience in Linux is required.
The e-book can be read in Legimi apps or in any app that supports the following format:
Page count: 399
Year of publication: 2019
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Vijin Boricha
Acquisition Editor: Rohit Rajkumar
Content Development Editor: Jordina Dcunha
Technical Editor: Mamta Yadav
Copy Editor: Safis Editing
Project Coordinator: Nusaiba Ansari
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Graphics: Jisha Chirayil
Production Coordinator: Jyoti Chauhan
First published: April 2019
Production reference: 1270419
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78953-410-8
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Denis Salamanca is a technology enthusiast living in Costa Rica with his fiancée and step-son. He has been working in IT since he was 20 and has worked for the most influential and leading companies in the industry, including VMware, Microsoft, and Hewlett-Packard Enterprise. He currently holds more than 10 technical certifications across different fields, such as cloud, storage, Linux, Docker, and virtualization. He has also participated in the development of Linux certifications and is part of the CompTIA Linux Subject Matter Experts and Technical Advisory Committee.
His love for technology has driven him to work in different positions and fields across his career, and this has helped him to develop an understanding of the different points of view that a technical solution requires.

Esteban Flores has been meddling with computers since he was 8 years old. His life as an IT expert began when he lost a lot of important data belonging to his family by saying he was "fixing the computer." He's worked for top-tier companies, including Hewlett-Packard Enterprise, VMware, Akamai, and Microsoft. With 10 years' experience, his passion for cutting-edge technology has driven him to work in different roles during his professional career. Storage technologies have always been his forte, focusing mostly on performance tuning and optimization. A photographer in his free time, he's been doing Linux-related things since his first job, finding amazement in its flexibility to run on everything from a small laptop all the way up to the world's fastest supercomputers.
Donald Tevault—but you can call him "Donnie"—has been working with Linux since way back in 2006. He's a professional Linux trainer, with the LPI Level 3 - Security and the GIAC Incident Handler certifications. Donnie is also a fellow Packt Publishing author, having published Mastering Linux Security and Hardening as his first book. He's the brains behind the BeginLinux Guru channel on YouTube, and works as a Linux consultant for the VDOO IoT security company.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Linux for Architects
About Packt
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewer
Packt is searching for authors like you
Dedication
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Section 1: High-Performance Storage Solutions with GlusterFS
Introduction to Design Methodology
Defining the stages of solution design and why they matter
Analyzing the problem and asking the right questions
Technical standpoint
Business standpoint
Functional standpoint
Considering possible solutions
POC
Implementing the solution
Summary
Questions
Further reading
Defining GlusterFS Storage
Technical requirements
What is a cluster?
Computing a cluster
Storage clusters
What is GlusterFS?
SDS
Cost reduction
Scalability
Control
The market is moving toward SDS
Massive storage
Block, file, and object storage
Block storage
File storage
Object storage
Why choose GlusterFS?
GlusterFS features
Commodity hardware – GlusterFS runs on pretty much anything
GlusterFS can be deployed on private, public, or hybrid clouds
No single point of failure
Scalability
Asynchronous geo-replication
Performance
Self-healing
Flexibility
Remote direct memory access (RDMA)
Gluster volume types
Distributed
Replicated
Distributed replicated
Dispersed
Distributed dispersed
The need for highly redundant storage
Disaster recovery
RTO
RPO
Synchronous replication
Asynchronous replication
The need for high performance
Parallel I/O
Summary
Questions
Further reading
Architecting a Storage Cluster
Technical requirements
GlusterFS compute requirements
RAM
Why is cache important?
CPU
Cloud considerations
How much space do you need?
GlusterFS volume type
Distributed
Replicated
Dispersed
Space required by the application
Projected growth
Performance considerations
Throughput
Latency
IOPS
I/O size
GlusterFS performance
Volume type
Brick layout
Number of nodes
Tuning parameters
The best approach for high availability
Replicated
Dispersed
Geo-replication
How the workload defines requirements
Documentation
System tools
File type and size
Ask the right questions
Summary
Questions
Further reading
Using GlusterFS on the Cloud Infrastructure
Technical requirements
Setting up the bricks used for backend storage
Azure deployment
ZFS as the backend for the bricks
Installing ZFS
Configuring the zpools
Adding the ZFS cache to the pool (optional)
Installing GlusterFS on the nodes
Installing the packages
Creating the trusted pool
Creating the volumes
Creating a dispersed volume
Mounting the volume
Optimizing performance
GlusterFS tuning
ZFS
ARC
L2ARC
Summary
Questions
Further reading
Analyzing Performance in a Gluster System
Technical requirements
An overview of the implementation
An overview of the cluster
Performance testing
Performance theory
Performance tools
The ZFS zpool iostat command
iostat
The FIO tester
Availability testing
Scaling
Summary
Questions
Further reading
Section 2: High-Availability NGINX Web Application Using Kubernetes
Creating a Highly Available Self-Healing Architecture
Microservices
Creating container images
FROM
LABEL
RUN
ENV
COPY
EXPOSE
CMD and ENTRYPOINT
Building container images using best practices
Container orchestration
Kubernetes
Summary
Questions
Further reading
Bibliography/sources
Understanding the Core Components of a Kubernetes Cluster
The Kubernetes control plane
The kube-apiserver
The kube-controller-manager
The kube-scheduler
The etcd database
Kubernetes worker nodes
Container runtime
The kubelet
The kube-proxy
Kubernetes objects
Pods – the basis of Kubernetes
Deployments
Services
Kubernetes and persistent storage
Volumes
Persistent Volumes, Persistent Volume Claims, and Storage Classes
Summary
Questions
Further reading
Architecting a Kubernetes Cluster
Kube-sizing
etcd considerations
kube-apiserver sizing
Worker nodes
Load balancer considerations
Storage considerations
Network requirements
Kubernetes DNS-based service discovery
Customizing kube objects
Namespacing
Limiting namespace resources
Customizing pods
Summary
Questions
Further reading
Deploying and Configuring Kubernetes
Infrastructure deployment
Installing Azure CLI
Configuring Azure CLI
High-level design overview
Provisioning network resources
Provisioning compute resources
Preparing the management VM
Generating certificates
Certificate authority
Client certificates
Control plane certificates
Sending our certificates home
Kubeconfigs
Installing kubectl
Control plane kubeconfigs
Kube-controller-manager
Kube-scheduler
Kubelet configs
Kube-proxy
Moving configs around
Installing the control plane
ETCD
Installing etcd
Encrypting etcd data
Installing the Kubernetes controller binaries
Kube-apiserver
Kube-controller-manager
Kube-scheduler
Starting the control plane
Setting RBAC permissions for kubelets
Cluster role
Cluster role binding
Load-balancer setup
Creating the load-balancer
Azure load-balancer
The backend pool
Health probes
Load-balancing rules
Worker node setup
Downloading and preparing binaries
Adding the Kubernetes repository
Installing dependencies and kubectl
Downloading and storing worker binaries
Containerd setup
The kubelet
kube-proxy
Starting services
Kubernetes networking
Getting the nodes ready
Configuring remote access
Installing Weave Net
DNS server
Managing Kubernetes on the cloud
Summary
Questions
Further reading
Bibliography/sources
Section 3: Elastic Stack
Monitoring with the ELK Stack
Technical requirements
Understanding the need for monitoring
Decisions made through historical data
Proactively detect problems
Understand environment performance
Plan for budget
Centralized logs
Elasticsearch overview
Fast
Scalable
Highly available
Logstash
Grok
Custom patterns
Kibana brings everything together
Summary
Questions
Further reading
Designing an ELK Stack
Technical requirements
Elasticsearch CPU requirements
CPU count
CPU speed
CPU performance impact
Startup
Index per second
Search latency
Recommendations
Test/dev
Production
Memory sizing for Elasticsearch
Filesystem cache
Disable swap
Undersizing memory
Unable to start
OOM killer
Recommendations
Storage configuration for Elasticsearch
Capacity
Performance
Considerations
Logstash and Kibana requirements
Logstash
Kibana
Summary
Questions
Further reading
Using Elasticsearch, Logstash, and Kibana to Manage Logs
Technical requirements
Deployment overview
Installing Elasticsearch
The RPM repository
The Elasticsearch data directory
Partitioning the disk
Formatting the filesystem
Persistent mounting using fstab
Configuring Elasticsearch
Elasticsearch YAML
Cluster name
Discovery settings
Node name
Network host
Path settings
Starting Elasticsearch
Adding an Elasticsearch node
Installing Logstash and Kibana
Configuring Logstash
Logstash YAML
Logstash pipelines
Configuring Kibana
Kibana YAML
The coordinating node
Starting Logstash and Kibana
What are Beats?
Filebeat
Metricbeat
Let's not skip a beat – installing Beats
Configuring Beats clients
Filebeat YAML
Metricbeat YAML
Next steps
Summary
Questions
Further reading
Section 4: System Management Using Saltstack
Solving Management Problems with Salty Solutions
Centralizing system management
New technologies and system management
Recovering control of our own infrastructure
Centralized tools to disperse problems
Coding for a desired state
Understanding NaCl
Introducing Salt
The SaltStack platform
Salt capabilities
Remote command execution modules
The sys module
The pkg module
The test module  
Salt states
Grains of Salt
Salt pillars
Summary
Questions
Further reading
Getting Your Hands Salty
Hands-on with Salt
Scenario
Terraforming our initial infrastructure
Setting up Terraform
Creating IaC
Installing Salt with package managers
Installing Salt on CentOS with yum
Ubuntu apt-getting Salt
Installing Salt via the bootstrap script
Master and minion handshake
Working with Salt
Creating WebServer formulas
Creating load-balancing formulas
Summary
Design Best Practices
Designing for the occasion
On-premises environments
Bare metal server
Virtual machines
Cloud environments
The journey to the cloud
Assessing
Migrating
Lift and shift
Refactor
Rearchitecting
Rebuild
Optimizing
DevOps
Monolithic waterfalls
Agile solutions to monolithic problems
Continuous culture for CI/CD
Summary
Questions
Further reading
Assessments
Chapter 1: Introduction to Design Methodology
Chapter 2: Defining GlusterFS Storage
Chapter 3: Architecting a Storage Cluster
Chapter 4: Using GlusterFS on the Cloud Infrastructure
Chapter 5: Analyzing Performance in a Gluster System
Chapter 6: Creating a Highly Available Self-Healing Architecture
Chapter 7: Understanding the Core Components of a Kubernetes Cluster
Chapter 8: Architecting a Kubernetes Cluster
Chapter 9: Deploying and Configuring Kubernetes
Chapter 10: Monitoring with the ELK Stack
Chapter 11: Designing an ELK Stack
Chapter 12: Using Elasticsearch, Logstash, and Kibana to Manage Logs
Chapter 13: Solving Management Problems with Salty Solutions
Chapter 14: Getting Your Hands Salty
Chapter 15: Design Best Practices
Other Books You May Enjoy
Leave a review - let other readers know what you think
Welcome to Hands-On Linux for Architects, an in-depth look at what goes through the mind of an architect when dealing with Linux-based solutions. This book will help you achieve the level of knowledge required to architect and implement different IT solutions.
Additionally, it will show you the flexibility of open source software by demonstrating some of the most widely used products of the industry, presenting you with a solution and analyzing every aspect, from the very beginning of the design phase, all the way up to the implementation stage, where we will build, from the ground up, the infrastructure proposed in our design.
Delve into the technical aspects of designing a solution, dissecting every aspect in depth in order to implement and tune open source Linux-based solutions.
This book is aimed at Linux system administrators, Linux support engineers, DevOps engineers, Linux consultants, and any other type of open source technology professional looking to learn or expand their knowledge in architecting, designing, and implementing solutions based on Linux and open source software.
Chapter 1, Introduction to Design Methodology, kicks off the book by analyzing a proposed problem, as well as what the correct questions are to ask when designing a solution, in order to extract the necessary information to define the correct problem statement.
Chapter 2, Defining GlusterFS Storage, goes through what GlusterFS is and defines a storage cluster.
Chapter 3, Architecting a Storage Cluster, explores the design aspects of implementing a clustered storage solution using GlusterFS and its various components.
Chapter 4, Using GlusterFS on the Cloud Infrastructure, explains the configuration necessary to implement GlusterFS on the cloud.
Chapter 5, Analyzing Performance in a Gluster System, details the previously configured solution, explaining the configurations put in place, as well as testing the implementation for performance.
Chapter 6, Creating a Highly Available Self-Healing Architecture, talks about how the IT industry has evolved from using monolithic applications into cloud-native, containerized, highly available microservices.
Chapter 7, Understanding the Core Components of a Kubernetes Cluster, explores the core Kubernetes components, giving a view of each one and how they can help us solve our customer's problem.
Chapter 8, Architecting a Kubernetes Cluster, dives into the requirements and configurations for a Kubernetes cluster.
Chapter 9, Deploying and Configuring Kubernetes, goes into the actual installation and configuration of a Kubernetes cluster.
Chapter 10, Monitoring with the ELK Stack, explains what each component of the Elastic Stack is and how they're connected.
Chapter 11, Designing an ELK Stack, covers the design considerations when deploying an Elastic Stack.
Chapter 12, Using Elasticsearch, Logstash, and Kibana to Manage Logs, describes the implementation, installation, and configuration of the Elastic Stack.
Chapter 13, Solving Management Problems with Salty Solutions, discusses the business needs to have a centralized management utility for infrastructure, such as Salt.
Chapter 14, Getting Your Hands Salty, examines how to install and configure Salt.
Chapter 15, Design Best Practices, takes you through some of the different best practices needed to design a resilient and failure-proof solution.
Some basic Linux knowledge is needed, as this book does not explain the basics of Linux management.
The examples given in this book can be implemented either in the cloud or on-premises. Some of the setups were deployed on Microsoft's cloud platform, Azure, so having an account with Azure to follow the examples is recommended. Azure does offer a free trial to evaluate and test deployments before committing—more information can be found at https://azure.microsoft.com/free/. Additionally, more information on Azure's offerings can be found at: https://azure.microsoft.com.
Since the book entirely revolves around Linux, having a way to connect to the internet is a requirement. This can be done from a Linux desktop (or laptop), a macOS Terminal, or Windows Subsystem for Linux (WSL).
All of the examples illustrated in this book make use of open source software that can be easily obtained from either the available repositories or from their respective sources, without the need of a paying license.
Be sure to drop by the projects pages to show some love—a lot of effort goes into developing them:
https://github.com/gluster/glusterfs
https://github.com/zfsonlinux/zfs
https://github.com/kubernetes/kubernetes
https://github.com/elastic/elasticsearch
https://github.com/saltstack/salt
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the SUPPORT tab.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/-Hands-On-Linux-for-Architects. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789534108_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The two key points in this command are the address-prefix flag and the subnet-prefix flag."
A block of code is set as follows:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: gluster-pvc
spec:
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
SHELL ["/bin/bash", "-c"]
RUN echo "Hello I'm using bash"
Any command-line input or output is written as follows:
yum install -y zfs
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "To confirm that data is being sent to the cluster, go to Discover on the Kibana screen."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
In this section, the reader will understand the decisions that need to be made when deploying a high-performance storage solution using GlusterFS.
This section contains the following chapters:
Chapter 1, Introduction to Design Methodology
Chapter 2, Defining GlusterFS Storage
Chapter 3, Architecting a Storage Cluster
Chapter 4, Using GlusterFS on the Cloud Infrastructure
Chapter 5, Analyzing Performance in a Gluster System
These days, IT solutions require increased performance and data availability, and designing a robust implementation that meets these requirements is a challenge that many IT experts face every day.
In this chapter, you will learn the basics of architecting IT solutions from a bird's-eye view, whether the environment is virtualized infrastructure, bare metal, or even the public cloud, as the basic concepts of solution design apply to any environment.
You will explore the following subjects:
Defining the stages of solution design and why they matter
Analyzing the problem and asking the right questions
Considering possible solutions
Implementing the solution
Fully understanding the aspects that you need to consider when architecting a solution is crucial for the success of the project, as this will determine which software, hardware, and configuration will help you achieve the desired state that meets the needs of your customers.
Like many things, designing solutions is a step-by-step process that involves not only technical aspects, and not necessarily only technical parties. Usually, you will be engaged by an account manager, a project manager, or, if you are lucky, a CTO who understands the technical side of the requirements. They are looking for an expert who can help them deliver a solution to a customer. These requests usually do not contain all the information you will need to deliver your solution, but they are a start in understanding your goal.
For example, imagine that you receive an email from a project manager with the following statement.
From the previous statement, you can only get a general idea of what is required; no specifics have been given. So, you only know the basic information: we require a solution that can sustain at least 10,000 website hits. For a design, this is not good enough, as you require as much information as possible to resolve the problems exposed by your customer. This is where you have to ask for as many details as possible so that you can provide an accurate set of proposals, which will be your customer's first impression of the project. This part is critical, as it will confirm whether you have correctly understood your customer's vision.
It is also important to understand that you need to deliver several different solutions for the customer, as the customer is the one who decides which one fits their business needs the most. Remember that each solution has its own advantages and disadvantages. After the customer decides which way to go, you will have what is necessary to move on to the implementation of your proposal, which can always trigger more challenges. It will require, more often than not, some final customized tuning or changes that were not considered in the initial Proof of Concept (POC).
From our previous analysis, you can see four well-defined stages of the process that you need to follow in order to reach the final delivery illustrated in the following diagram:
There are many more stages and design methodologies that we could cover, but since they're not in the scope of this book, we will be focusing on these four general stages to help you understand the process in which you will be architecting your solutions.
After getting the initial premise, you need to break it into smaller pieces in order to understand what is required. Each piece will raise different questions that you will later ask your customers. These questions will help fill in the gaps for your POC, ensuring that they cover the business needs from every standpoint: the business standpoint, the functional standpoint, and, finally, the technical standpoint. One good way to keep track of the questions that arise, and of the business needs they resolve, is to keep a checklist recording which standpoint each question is asked from and what it resolves or answers.
It is also important to note that, as questions become answers, they can also come with constraints or other obstacles that will also need to be addressed and mentioned during the POC stage. The customer will have to agree with them and will be decisive when selecting the final solution.
From our previous example, you can analyze the premise by dissecting it into standpoints.
From this perspective, we will analyze all technical aspects of the premise – anything that you will need to provide the initial technical requirements of your solution.
We will analyze it in the following way:
You can understand from the premise that your customer needs some kind of solution that can sustain a certain number of website hits, but you can't be certain whether the web server is already set up and the customer only needs a load-balancing solution. Alternatively, maybe the customer needs both a web server, such as NGINX or Apache, and the load-balancing solution.
The customer mentions at least 10,000 hits to their website, but they didn't mention if these hits are concurrent per second, daily, weekly, or even monthly.
You can also see that they need to stay available during updates and be able to continue serving their website if the company has an outage, but all these statements are very general, since availability is measured in nines: the more nines you have, the better. In reality, this is a percentage of uptime across the year; 99.9% availability, for example, allows only about 526 minutes of downtime per year, while 99% allows roughly 5,256 minutes (about 3.7 days). Outages are also very hard to predict, and it's almost impossible to say that you will never have an outage, so you need to plan for one. You have to have a Recovery Point Objective (RPO) and a Recovery Time Objective (RTO) for your solution in case of a disaster. The customer didn't mention these, and it is crucial to understand how much downtime the business can sustain.
When it comes to budget, this is usually a business matter, but the technical aspects are directly affected by it. It looks like the budget for the project is tight, and the customer wants to spend as little as possible on their solution, but they haven't mentioned exact numbers, which you will require in order to fit your proposals to it. Little to no upfront cost? What does this mean? Are we repurposing existing resources and building a new solution? How can we implement a design with no upfront cost? One way to overcome a low budget, or no upfront cost, at least on the software side, is to use open source software (OSS), but this is something that we need to ask the customer about.
Gaining momentum can only mean that they are predicting that their userbase will grow over time, but you need an estimate of how much and how fast they predict it will grow, as this implies that you have to leave the solution ready to be scaled vertically or horizontally. Scaling vertically means leaving room to increase resources later, taking into account the business's procurement process if you need to buy more resources such as RAM, CPU, or storage. Scaling horizontally will also involve a procurement process, plus a considerable amount of time to integrate a new node/server/VM/container into the solution. None of this is included in the premise, and it's vital information.
Here, we have a comparison of horizontal and vertical scaling. Horizontal scaling adds more nodes, while vertical scaling adds more resources to the existing nodes:
The following is a list of example questions that you could ask to clarify the gray areas:
Is this solution for a new/existing website or web server?
When you say 10,000 hits, are these concurrent per second or is it daily/weekly/monthly?
Do you have any estimates or current data of how large your userbase is?
Considering that the budget is low, can we use OSS?
Do you have the technical resources to support the solution in case we use OSS?
Do you have any sort of update infrastructure in place, or version control software implemented already?
When you say little to no upfront cost, does this mean that you already have hardware, resources, or infrastructure (virtual or cloud) available that we could recycle and/or reuse for our new solution?
Are there any disaster recovery sites in place that we could use to provide high availability?
If your userbase grows, will this generate more storage requirements or only compute resources?
Do you plan on performing any backups? What is your backup scheme?
From the technical perspective, once you start designing your POCs, more questions will arise based on the software or hardware used in the solution. You will need to know how these components fit into the customer's existing infrastructure, if there is one, or what is needed for them to adjust to it.
Here, we will be analyzing the statement from a business perspective, taking into account all the aspects that can affect our design:
A main requirement is performance, as this affects how many hits the solution can sustain. Since this is one of the main objectives of the solution, it needs to be sized to meet business expectations.
Budget seems to be the main constraint that will affect the project's design and scope.
There is no mention of the actual available budget.
Availability requirements affect how the business should react in case of an outage. As there's no specific service level agreement (SLA), this needs to be clarified to adjust to the business needs.
A main concern is the upfront cost. This can be lowered considerably by utilizing OSS, as there are no licensing fees.
It has been mentioned that the solution needs to remain up during maintenance operations. This might indicate that the customer is willing to invest in maintenance operations for further upgrades or enhancements.
The statement "we're also expecting this to gain momentum" indicates that the amount of resources the solution needs will change over time, thus directly affecting the amount of money consumed by it.
The following are questions to ask when clarifying doubts from a business standpoint:
Based on the performance requirements, what is the business impact when performance goes below the expected baseline?
What is the actual budget for the project?
Does the budget take into account maintenance operations?
Considering the possible unplanned outages and maintenance, how much time exactly can your website be down per year? Will this affect business continuity?
If an outage happens, how much time can the application tolerate not receiving data?
Do we have data of any sort from which we can estimate how much your userbase will grow?
Do you have any procurement process in place?
How much time does it take to approve the acquisition of new hardware or resources?
From the functional standpoint, you will be reviewing the functional side of the solution:
You know that the customer requires 10,000 hits, but what types of users will be using this website?
You can see that it requires 10,000 hits, but the premise does not specify what the users will be doing with the site.
The premise states that they need the solution to be available during updates. By this, we assume that the application will be updated, but how?
To clarify the gaps in the functional standpoint, we can ask for the following information:
What type of users will be using your application?
What will your users be doing in your website?
How often will this application be updated or maintained?
Who will be maintaining and supporting this solution?
Will this website be for internal company users or external users?
It is important to note that the functional standpoint overlaps considerably with the business standpoint, as both are trying to address similar problems.
Once you have gathered all the information, you can build a document summarizing the requirements of your solution; ensure that you go through it with the customer and that they agree on what is required to consider the solution complete.
Once all the doubts that arose during the initial premise have been cleared, you can move on and construct a more elaborate and specific statement that includes all the information gathered. We will continue working with our previous statement and, assuming that our customer responded to all of our previous questions, we can construct a more detailed statement, as follows.
As you can see, this is a more complete statement that you can already start working from. You know that it will utilize an existing virtual infrastructure, OSS is a go, high availability is also required, and it will be updated via an update and version control infrastructure that is already in place, so possibly only monitoring agents will be needed for your new solution.
A very simplified overview with not many details of the possible design is as follows:
In the diagram, you can see that it's a web server cluster that provides high availability and load balancing to the clients and applications that are consuming the solution.
As you are already utilizing much of the existing infrastructure, there are fewer options for possible POCs, so this design will be very straightforward. Nonetheless, there are certain variables that we can play with to provide our customer with several different options. For instance, for the web server, we can have one solution with Apache and another with NGINX, or a combination of both, with Apache hosting the website and NGINX providing load balancing.
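As a rough sketch of the combined option, the NGINX side reduces to an upstream block that proxies requests to the Apache nodes. This is a minimal illustration rather than the final design; the hostnames and ports below are placeholders, not values from the customer's statement:

```nginx
# Minimal NGINX load-balancing sketch; web01/web02 are placeholder
# Apache backends, not hosts from the customer's environment.
upstream apache_pool {
    least_conn;                      # route to the least-busy backend
    server web01.example.com:8080;
    server web02.example.com:8080;
}

server {
    listen 80;
    location / {
        proxy_pass http://apache_pool;
        proxy_set_header Host $host;
    }
}
```

With a configuration along these lines, scaling horizontally is mostly a matter of adding another server line to the pool.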
With a complete statement and several options already defined, we can proceed to provide a POC based on one of the possible routes.
A POC is the process of demonstrating an idea or method, in our case a solution, with the aim of verifying a given functionality. Additionally, it provides a broad overview of how the solution will behave within an environment, allowing further testing to be able to fine-tune for specific workloads and use cases.
Any POC will have its advantages and disadvantages, but the main focus is for customers and architects to explore the different concepts of the solution in an actual working environment. It is important to note that you, as an architect, have a heavy influence on which POC will be used as the final solution, but the customer is the one who chooses which constraints and advantages suit their business better.
With the example of choosing NGINX as a load balancer to provide high availability and performance improvements to Apache web servers hosting the application files, we can implement a working solution with scaled-down resources. Instead of deploying four nodes for the final solution, we can deploy just two to demonstrate the load-balancing features, as well as provide a practical demonstration of high availability by purposely bringing one of them down.
Here's a diagram describing the previous example:
This does not require the full four-node cluster that was envisioned during the design phase, as we're not testing the full performance of the entire solution. Performance or load testing can be done by having fewer concurrent users provide a close-to-actual workload for the application. While having fewer users will never yield exact performance numbers for the full implementation, it delivers a good baseline with data that can later be extrapolated to approximate the actual performance.
As an example of performance testing, instead of having 2,000 users load the application, we can use a quarter of the userbase and half of the resources. This considerably decreases the amount of resources needed, while still providing enough data to analyze the performance of the final solution.
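The measured numbers can then be extrapolated. As a hedged sketch (linear scaling with node count is an assumption that real clusters rarely meet exactly, and the figures below are illustrative), a first-order estimate looks like this:

```shell
# First-order extrapolation from a scaled-down POC to the full cluster.
# Assumes roughly linear scaling with node count, which is optimistic;
# treat the result as a ballpark figure, not a guarantee.
extrapolate_rps() {  # usage: extrapolate_rps <poc_rps> <poc_nodes> <full_nodes>
    awk -v r="$1" -v p="$2" -v f="$3" 'BEGIN { printf "%.0f\n", r * f / p }'
}

# A POC sustaining 120 requests/second on 2 nodes suggests roughly
# 240 requests/second on the full 4-node cluster.
extrapolate_rps 120 2 4
```

The value of the exercise is the baseline itself; the extrapolation only tells you whether the full-size design is in the right order of magnitude.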
Also, as in the information gathering stage, a document describing the different POCs is a good idea, as it can serve as a starting point if the customer wants to construct a similar solution in the future.
Once the customer has selected the optimal route based on their business needs, we can start constructing our design. At this stage, you will face different obstacles, as implementing the POC in a development or QA environment might vary from production. Things that worked in QA or development may now fail in production, and different variables might be in place; all of this only arises at the implementation stage, and you need to be aware that, in a worst-case scenario, it might mean changing a large part of the initial design.
This stage requires hands-on work with the customer and the customer's environment, so it is of utmost importance to ensure that the changes you make won't affect the current production. Working with the customer is also important, because this will familiarize their IT team with the new solution; this way, when the sign-off is done, they will be familiar with it and its configuration.
The creation of an implementation guide is one of the most important parts at this stage, since it will document each step and every minor configuration made to the solution. It will also help in the future in case an issue appears and the support team needs to know how it was configured in order to be able to solve the problem.
Designing a solution requires different approaches. This chapter went through the basics of the design stages and why each of them matters.
The first stage involves analyzing the problem the design aims to solve while, at the same time, asking the right questions. This will help define the actual requirements and narrow the scope to the real business needs. Working only from the initial problem statement will create problems further down the road, making this stage extremely important, as it prevents unnecessary back and forth later.
Then, we considered the possible paths or solutions we can take to solve the already defined problem. With the right questions asked in the previous stage, we should be able to construct several options for the customer to select, and can later implement a POC. POCs help both customers and architects understand how the solution will behave in an actual working environment. Normally, POCs are scaled-down versions of the final solution, making implementation and testing more agile.
Finally, the implementation stage deals with the actual configuration and hands-on aspects of the project. Based on the findings during the POC, changes can be made to accommodate the specifics of each infrastructure. Documentation delivered through this stage will help align parties to ensure that the solution is implemented as expected.
In the next chapter, we will jump into solving a problem that affects every type of implementation, regardless of cloud provider, software, or design, showing the necessity of high-performance redundant storage.
What are the stages of a solution design?
Why is it important to ask the right questions when designing a solution?
Why should we deliver several design options?
What questions can be asked to obtain information that can help design a better solution?
What is a POC?
What happens in the implementation stage?
How does the POC help with the final implementation?
In subsequent chapters, we'll go through the process of creating solutions for specific problems. As these solutions will be implemented in Linux, we recommend reading Fundamentals of Linux by Oliver Pelz (https://www.packtpub.com/networking-and-servers/fundamentals-linux).
Every day, applications require faster storage that can sustain thousands of concurrent I/O requests. GlusterFS is a highly scalable, redundant filesystem that can deliver high-performance I/O to many clients simultaneously. We will define the core concept of a cluster and then introduce how GlusterFS plays an important role.
In the preceding chapter, we went through the different aspects of designing solutions to provide high availability and performance to applications that have many requirements. In this chapter, we'll go through solving a very specific problem, that is, storage.
In this chapter, we will cover the following topics:
Understanding the core concept of a cluster
The reason for choosing GlusterFS
Explaining software-defined storage (SDS)
Exploring the differences between file, object, and block storage
Explaining the need for high performance and highly available storage
This chapter will focus on defining GlusterFS. You can refer to the project's home page at https://github.com/gluster/glusterfs or https://www.gluster.org/.
Additionally, the project's documentation can be found at https://docs.gluster.org/en/latest/.
We can leverage the many advantages of SDS, which allows for easy scalability and enhanced fault tolerance. GlusterFS is a piece of software that can create highly scalable storage clusters while providing maximum performance.
Before we go through how we can solve this specific need, we first need to define what a cluster is, why it exists, and what problems a cluster might be able to solve.
Put simply, a cluster is a set of computers (often called nodes) that work in tandem on the same workload and can distribute load across all available members to increase performance, while at the same time allowing for self-healing and availability. Note that the term server wasn't used, as, in reality, any computer can be added to a cluster: nodes can range from a simple Raspberry Pi to multi-CPU servers, and clusters can range from a small two-node configuration to thousands of nodes in a data center.
Here is an example of a cluster:
