Explore software engineering methodologies, techniques, and best practices in Go programming to build easy-to-maintain software that can effortlessly scale on demand
Book Description
Over the last few years, Go has become one of the favorite languages for building scalable and distributed systems. Its opinionated design and built-in concurrency features make it easy for engineers to author code that efficiently utilizes all available CPU cores.
This Golang book distills industry best practices for writing lean Go code that is easy to test and maintain, and helps you to explore their practical implementation by creating a multi-tier application called Links 'R' Us from scratch. You'll be guided through all the steps involved in designing, implementing, testing, deploying, and scaling an application. Starting with a monolithic architecture, you'll iteratively transform the project into a service-oriented architecture (SOA) that supports the efficient out-of-core processing of large link graphs. You'll learn about various cutting-edge and advanced software engineering techniques such as building extensible data processing pipelines, designing APIs using gRPC, and running distributed graph processing algorithms at scale. Finally, you'll learn how to compile and package your Go services using Docker and automate their deployment to a Kubernetes cluster.
By the end of this book, you'll know how to think like a professional software developer or engineer and write lean and efficient Go code.
Who this book is for
This Golang programming book is for developers and software engineers looking to use Go to design and build scalable distributed systems effectively. Knowledge of Go programming and basic networking principles is required.
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Richa Tripathi
Acquisition Editor: Karan Gupta
Content Development Editor: Tiksha Sarang
Senior Editor: Storm Mann
Technical Editor: Pradeep Sahu
Copy Editor: Safis Editing
Project Coordinator: Francy Puthiry
Proofreader: Safis Editing
Indexer: Tejal Daruwale Soni
Production Designer: Arvindkumar Gupta
First published: January 2020
Production reference: 1230120
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83855-449-1
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Fully searchable for easy access to vital information
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Achilleas Anagnostopoulos has been writing code in a multitude of programming languages since the mid-90s. His main interest lies in building scalable, microservice-based distributed systems where components are interconnected via gRPC or message queues. Achilleas has over 4 years of experience building production-grade systems using Go and occasionally enjoys pushing the language to its limits through his experimental gopher-os project: a 64-bit kernel written entirely in Go. He is currently a member of the Juju team at Canonical, contributing to one of the largest open source Go code bases in existence.
Eduard Bondarenko is a long-time software developer. He prefers concise, expressive, and well-commented code, and has worked with many programming languages, such as Ruby, Go, Java, and JavaScript.
Eduard has reviewed a couple of programming books and enjoyed the broad and interesting range of topics they covered. Besides programming, he likes to spend time with his family, play soccer, and travel.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Hands-On Software Engineering with Golang
Dedication
About Packt
Why subscribe?
Contributors
About the author
About the reviewer
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Code in Action
Download the color images
Conventions used
Get in touch
Reviews
Section 1: Software Engineering and the Software Development Life Cycle
A Bird's-Eye View of Software Engineering
What is software engineering?
Types of software engineering roles
The role of the software engineer (SWE)
The role of the software development engineer in test (SDET)
The role of the site reliability engineer (SRE)
The role of the release engineer (RE)
The role of the system architect
A list of software development models that all engineers should know
Waterfall
Iterative enhancement
Spiral
Agile
Lean
Eliminate waste
Create knowledge
Defer commitment
Build in quality
Deliver fast
Respect and empower people
See and optimize the whole
Scrum
Scrum roles
Essential Scrum events
Kanban
DevOps
The CAMS model
The three ways model
Summary
Questions
Further reading
Section 2: Best Practices for Maintainable and Testable Go Code
Best Practices for Writing Clean and Maintainable Go Code
The SOLID principles of object-oriented design
Single responsibility
Open/closed principle
Liskov substitution
Interface segregation
Dependency inversion
Applying the SOLID principles
Organizing code into packages
Naming conventions for Go packages
Circular dependencies
Breaking circular dependencies via implicit interfaces
Sometimes, code repetition is not a bad idea!
Tips and tools for writing lean and easy-to-maintain Go code
Optimizing function implementations for readability
Variable naming conventions
Using Go interfaces effectively
Zero values are your friends
Using tools to analyze and manipulate Go programs
Taking care of formatting and imports (gofmt, goimports)
Refactoring code across packages (gorename, gomvpkg, fix)
Improving code quality metrics with the help of linters
Summary
Questions
Further reading
Dependency Management
What's all the fuss about software versioning?
Semantic versioning
Comparing semantic versions
Applying semantic versioning to Go packages
Managing the source code for multiple package versions
Single repository with versioned folders
Single repository – multiple branches
Vendoring – the good, the bad, and the ugly
Benefits of vendoring dependencies
Is vendoring always a good idea?
Strategies and tools for vendoring dependencies
The dep tool
The Gopkg.toml file
The Gopkg.lock file
Go modules – the way forward
Fork packages
Summary
Questions
Further reading
The Art of Testing
Technical requirements
Unit testing
Mocks, stubs, fakes, and spies – commonalities and differences
Stubs and spies!
Mocks
Introducing gomock
Exploring the details of the project we want to write tests for
Leveraging gomock to write a unit test for our application
Fake objects
Black-box versus white-box testing for Go packages – an example
The services behind the facade
Writing black-box tests
Boosting code coverage via white-box tests
Table-driven tests versus subtests
Table-driven tests
Subtests
The best of both worlds
Using third-party testing frameworks
Integration versus functional testing
Integration tests
Functional tests
Functional tests part deux – testing in production!
Smoke tests
Chaos testing – breaking your systems in fun and interesting ways!
Tips and tricks for writing tests
Using environment variables to set up or skip tests
Speeding up testing for local development
Excluding classes of tests via build flags
This is not the output you are looking for – mocking calls to external binaries
Testing timeouts is easy when you have all the time in the world!
Summary
Questions
Further reading
Section 3: Designing and Building a Multi-Tier System from Scratch
The Links 'R' Us Project
System overview – what are we going to be building?
Selecting an SDLC model for our project
Iterating faster using an Agile framework
Elephant carpaccio – how to iterate even faster!
Requirements analysis
Functional requirements
User story – link submission
User story – search
User story – crawl link graph
User story – calculate PageRank scores
User story – monitor Links 'R' Us health
Non-functional requirements
Service-level objectives
Security considerations
Being good netizens
System component modeling
The crawler
The link filter
The link fetcher
The content extractor
The link extractor
The content indexer
The link provider
The link graph
The PageRank calculator
The metrics store
The frontend
Monolith or microservices? The ultimate question
Summary
Questions
Further reading
Building a Persistence Layer
Technical requirements
Running tests that require CockroachDB
Running tests that require Elasticsearch
Exploring a taxonomy of database systems
Key-value stores
Relational databases
NoSQL databases
Document databases
Understanding the need for a data layer abstraction
Designing the data layer for the link graph component
Creating an ER diagram for the link graph store
Listing the required set of operations for the data access layer
Defining a Go interface for the link graph
Partitioning links and edges for processing the graph in parallel
Iterating Links and Edges
Verifying graph implementations using a shared test suite
Implementing an in-memory graph store
Upserting links
Upserting edges
Looking up links
Iterating links/edges
Removing stale edges
Setting up a test suite for the graph implementation
Scaling across with a CockroachDB-backed graph implementation
Dealing with DB migrations
An overview of the DB schema for the CockroachDB implementation
Upserting links
Upserting edges
Looking up links
Iterating links/edges
Removing stale edges
Setting up a test suite for the CockroachDB implementation
Designing the data layer for the text indexer component
A model for indexed documents
Listing the set of operations that the text indexer needs to support
Defining the Indexer interface
Verifying indexer implementations using a shared test suite
An in-memory Indexer implementation using bleve
Indexing documents
Looking up documents and updating their PageRank score
Searching the index
Iterating the list of search results
Setting up a test suite for the in-memory indexer
Scaling across an Elasticsearch indexer implementation
Creating a new Elasticsearch indexer instance
Indexing and looking up documents
Performing paginated searches
Updating the PageRank score for a document
Setting up a test suite for the Elasticsearch indexer
Summary
Questions
Further reading
Data-Processing Pipelines
Technical requirements
Building a generic data-processing pipeline in Go
Design goals for the pipeline package
Modeling pipeline payloads
Multistage processing
Stageless pipelines – is that even possible?
Strategies for handling errors
Accumulating and returning all errors
Using a dead-letter queue
Terminating the pipeline's execution if an error occurs
Synchronous versus asynchronous pipelines
Synchronous pipelines
Asynchronous pipelines
Implementing a stage worker for executing payload processors
FIFO
Fixed and dynamic worker pools
1-to-N broadcasting
Implementing the input source worker
Implementing the output sink worker
Putting it all together – the pipeline API
Building a crawler pipeline for the Links 'R' Us project
Defining the payload for the crawler
Implementing a source and a sink for the crawler
Fetching the contents of graph links 
Extracting outgoing links from retrieved webpages
Extracting the title and text from retrieved web pages
Inserting discovered outgoing links to the graph
Indexing the contents of retrieved web pages
Assembling and running the pipeline
Summary
Questions
Further reading
Graph-Based Data Processing
Technical requirements
Exploring the Bulk Synchronous Parallel model
Building a graph processing system in Go
Queueing and delivering messages
The Message interface
Queues and message iterators
Implementing an in-memory, thread-safe queue
Modeling the vertices and edges of graphs
Defining the Vertex and Edge types
Inserting vertices and edges into the graph
Sharing global graph state through data aggregation
Defining the Aggregator interface
Registering and looking up aggregators
Implementing a lock-free accumulator for float64 values
Sending and receiving messages
Implementing graph-based algorithms using compute functions
Achieving vertical scaling by executing compute functions in parallel
Orchestrating the execution of super-steps
Creating and managing Graph instances
Solving interesting graph problems
Searching graphs for the shortest path
The sequential Dijkstra algorithm
Leveraging a gossip protocol to run Dijkstra in parallel
Graph coloring
A sequential greedy algorithm for coloring undirected graphs
Exploiting parallelism for undirected graph coloring
Calculating PageRank scores
The model of the random surfer
An iterative approach to PageRank score calculation
Reaching convergence – when should we stop iterating?
Web graphs in the real world – dealing with dead ends
Defining an API for the PageRank calculator
Implementing a compute function to calculate PageRank scores
Summary
Further reading
Communicating with the Outside World
Technical requirements
Designing robust, secure, and backward-compatible REST APIs
Using human-readable paths for RESTful resources
Controlling access to API endpoints
Basic HTTP authentication
Securing TLS connections from eavesdropping
Authenticating to external service providers using OAuth2
Dealing with API versions
Including the API version as a route prefix
Negotiating API versions via HTTP Accept headers
Building RESTful APIs in Go
Building RPC-based APIs with the help of gRPC
Comparing gRPC to REST
Defining messages using protocol buffers
Defining messages
Versioning message definitions
Representing collections
Modeling field unions
The Any type
Implementing RPC services
Unary RPCs
Server-streaming RPCs
Client-streaming RPCs
Bi-directional streaming RPCs
Security considerations for gRPC APIs
Decoupling Links 'R' Us components from the underlying data stores
Defining RPCs for accessing a remote link-graph instance
Defining RPCs for accessing a text-indexer instance
Creating high-level clients for accessing data stores over gRPC 
Summary
Questions
Further reading
Building, Packaging, and Deploying Software
Technical requirements
Building and packaging Go services using Docker
Benefits of containerization
Best practices for dockerizing Go applications
Selecting a suitable base container for your application
A gentle introduction to Kubernetes
Peeking under the hood
Summarizing the most common Kubernetes resource types
Running a Kubernetes cluster on your laptop!
Building and deploying a monolithic version of Links 'R' Us
Distributing computation across application instances
Carving the UUID space into non-overlapping partitions
Assigning a partition range to each pod
Building wrappers for the application services
The crawler service
The PageRank calculator service
Serving a fully functioning frontend to users
Specifying the endpoints for the frontend application
Performing searches and paginating results
Generating convincing summaries for search results
Highlighting search keywords
Orchestrating the execution of individual services
Putting everything together
Terminating the application in a clean way
Dockerizing and starting a single instance of the monolith
Deploying and scaling the monolith on Kubernetes
Setting up the required namespaces
Deploying CockroachDB and Elasticsearch using Helm
Deploying Links 'R' Us
Summary
Questions
Further reading
Section 4: Scaling Out to Handle a Growing Number of Users
Splitting Monoliths into Microservices
Technical requirements
Monoliths versus service-oriented architectures
Is there something inherently wrong with monoliths?
Microservice anti-patterns and how to deal with them
Monitoring the state of your microservices
Tracing requests through distributed systems
The OpenTracing project
Stepping through a distributed tracing example
The provider service
The aggregator service
The gateway
Putting it all together
Capturing and visualizing traces using Jaeger
Making logging your trusted ally
Logging best practices
The devil is in the (logging) details
Shipping and indexing logs inside Kubernetes
Running a log collector on each Kubernetes node
Using a sidecar container to collect logs
Shipping logs directly from the application
Introspecting live Go services
Building a microservice-based version of Links 'R' Us
Decoupling access to the data stores
Breaking down the monolith into distinct services
Deploying the microservices that comprise the Links 'R' Us project
Deploying the link-graph and text-indexer API services
Deploying the web crawler
Deploying the PageRank service
Deploying the frontend service
Locking down access to our Kubernetes cluster using network policies
Summary
Questions
Further reading
Building Distributed Graph-Processing Systems
Technical requirements
Introducing the master/worker model
Ensuring that masters are highly available
The leader-follower configuration
The multi-master configuration
Strategies for discovering nodes
Recovering from errors
Out-of-core distributed graph processing
Describing the system architecture, requirements, and limitations
Modeling a state machine for executing graph computations
Establishing a communication protocol between workers and masters
Defining a job queue RPC service
Establishing protocol buffer definitions for worker payloads
Establishing protocol buffer definitions for master payloads
Defining abstractions for working with bi-directional gRPC streams
Remote worker stream
Remote master stream
Creating a distributed barrier for the graph execution steps
Implementing a step barrier for individual workers
Implementing a step barrier for the master
Creating custom executor factories for wrapping existing graph instances
The workers' executor factory
The master's executor factory
Coordinating the execution of a graph job
Simplifying end user interactions with the dbspgraph package
The worker job coordinator
Running a new job
Transitioning through the stages of the graph's state machine
Handling incoming payloads from the master
Using the master as an outgoing message relay
The master job coordinator
Running a new job
Transitioning through the stages for the graph's state machine
Handling incoming worker payloads
Relaying messages between workers
Defining package-level APIs for working with master and worker nodes
Instantiating and operating worker nodes
Instantiating and operating master nodes
Handling incoming gRPC connections
Running a new job
Deploying a distributed version of the Links 'R' Us PageRank calculator
Retrofitting master and worker capabilities to the PageRank calculator service
Serializing PageRank messages and aggregator values
Defining job runners for the master and the worker
Implementing the job runner for master nodes
The worker job runner
Deploying the final Links 'R' Us version to Kubernetes
Summary
Questions
Further reading
Metrics Collection and Visualization
Technical requirements
Monitoring from the perspective of a site reliability engineer
Service-level indicators (SLIs)
Service-level objectives (SLOs)
Service-level agreements (SLAs)
Exploring options for collecting and aggregating metrics
Comparing push versus pull systems
Capturing metrics using Prometheus
Supported metric types
Automating the detection of scrape targets
Static and file-based scrape target configuration
Querying the underlying cloud provider
Leveraging the API exposed by Kubernetes
Instrumenting Go code
Registering metrics with Prometheus
Vector-based metrics
Exporting metrics for scraping
Visualizing collected metrics using Grafana
Using Prometheus as an end-to-end solution for alerting
Using Prometheus as a source for alert events
Handling alert events
Grouping alerts together
Selectively muting alerts
Configuring alert receivers
Routing alerts to receivers
Summary
Questions
Further reading
Epilogue
Assessments
Chapter 1
Chapter 2
Chapter 3
Chapter 4
Chapter 5
Chapter 6
Chapter 7
Chapter 8
Chapter 9
Chapter 10
Chapter 11
Chapter 12
Chapter 13
Other Books You May Enjoy
Leave a review - let other readers know what you think
Over the last few years, Go has gradually turned into one of the industry's favorite languages for building scalable and distributed systems. The language's opinionated design and built-in concurrency features make it relatively easy for engineers to author code that efficiently utilizes all available CPU cores.
This book distills the industry's best practices for writing lean Go code that is easy to test and maintain and explores their practical implementation by creating a multi-tier application called Links 'R' Us from scratch. You will be guided through all the steps involved in designing, implementing, testing, deploying, and scaling the application. You'll start with a monolithic architecture and iteratively transform the project into a Service-Oriented Architecture (SOA) that supports efficient out-of-core processing of large link graphs. You will learn about various advanced and cutting-edge software engineering techniques such as building extensible data-processing pipelines, designing APIs using gRPC, and running distributed graph processing algorithms at scale. Finally, you will learn how to compile and package your Go services using Docker and automate their deployment to a Kubernetes cluster.
By the end of this book, you will start to think like a professional developer/engineer who can put theory into practice by writing lean and efficient Go code.
This book is for developers and software engineers interested in effectively using Go to design and build scalable distributed systems. This book will also be useful for amateur-to-intermediate level developers who aspire to become professional software engineers.
Chapter 1, A Bird's-Eye View of Software Engineering, explains the difference between software engineering and programming and outlines the different types of engineering roles that you may encounter in small, medium, and large organizations. What's more, the chapter summarizes the basic software development life cycle models that every software engineer (SWE) should be aware of.
Chapter 2, Best Practices for Writing Clean and Maintainable Go Code, explains how the SOLID design principles can be applied to Go projects and provides useful tips for organizing your Go code in packages and writing code that is easy to maintain and test.
Chapter 3, Dependency Management, highlights the importance of versioning Go packages and discusses tools and strategies for vendoring your project dependencies.
Chapter 4, The Art of Testing, advocates the use of primitives such as stubs, mocks, spies, and fake objects for writing comprehensive unit tests for your code. Furthermore, the chapter enumerates the pros and cons of different types of tests (for example, black- versus white-box, integration versus functional) and concludes with an interesting discussion on advanced testing techniques such as smoke testing and chaos testing.
Chapter 5, The Links 'R' Us Project, introduces the hands-on project that we will be building from scratch in the following chapters.
Chapter 6, Building a Persistence Layer, focuses on the design and implementation of the data access layer for two of the Links 'R' Us project components: the link graph and the text indexer.
Chapter 7, Data-Processing Pipelines, explores the basic principles behind data-processing pipelines and implements a framework for constructing generic, concurrent-safe, and reusable pipelines using Go primitives such as channels, contexts, and go-routines. The framework is then used to develop the crawler component for the Links 'R' Us project.
Chapter 8, Graph-Based Data Processing, explains the theory behind the Bulk Synchronous Parallel (BSP) model of computation and implements, from scratch, a framework for executing parallel algorithms against graphs. As a proof of concept, we will be using this framework to investigate parallel versions of popular graph-based algorithms (namely, shortest path and graph coloring) with our efforts culminating in the complete implementation of the PageRank algorithm, a critical component of the Links 'R' Us project.
Chapter 9, Communicating with the Outside World, outlines the key differences between RESTful and gRPC-based APIs with respect to subjects such as routing, security, and versioning. In this chapter, we will also define gRPC APIs for making the link graph and text indexer data stores for the Links 'R' Us project accessible over the network.
Chapter 10, Building, Packaging, and Deploying Software, enumerates the best practices for dockerizing your Go applications and optimizing their size. In addition, the chapter explores the anatomy of a Kubernetes cluster and enumerates the essential list of Kubernetes resources that we can use. As a proof of concept, we will be creating a monolithic version of the Links 'R' Us project and will deploy it to a Kubernetes cluster that you will spin up on your local machine.
Chapter 11, Splitting Monoliths into Microservices, explains the SOA pattern and discusses some common anti-patterns that you should be aware of and pitfalls that you want to avoid when switching from a monolithic design to microservices. To put the ideas from this chapter to the test, we will be breaking down the monolithic version of the Links 'R' Us project into microservices and deploying them to Kubernetes.
Chapter 12, Building Distributed Graph-Processing Systems, combines the knowledge from the previous chapters to create a distributed version of the graph-based data processing framework, which can be used for massive graphs that do not fit in memory (out-of-core processing).
Chapter 13, Metrics Collection and Visualization, enumerates the most popular solutions for collecting and indexing metrics from applications with a focus on Prometheus. After discussing approaches to instrumenting your Go code to capture and export Prometheus metrics, we will delve into the use of tools such as Grafana for metrics visualization, and Alertmanager for setting up alerts based on the aggregated values of collected metrics.
Chapter 14, Epilogue, provides suggestions for furthering your understanding of the material by extending the hands-on project that we have built throughout the chapters of the book.
To get the most out of this book and experiment with the accompanying code, you need to have a fairly good understanding of programming in Go as well as sufficient experience working with the various tools that comprise the Go ecosystem.
In addition, the book assumes that you have a solid grasp of basic networking theory.
Finally, some of the more technical chapters in the book utilize technologies such as Docker and Kubernetes. While a priori knowledge of these technologies is not strictly required, any prior experience using these (or equivalent) systems will certainly prove beneficial in better understanding the topics discussed in those chapters.
You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
1. Log in or register at www.packt.com.
2. Select the Support tab.
3. Click on Code Downloads.
4. Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Hands-On-Software-Engineering-with-Golang. In case there's an update to the code, it will be updated on the existing GitHub repository.
We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
To see the Code in Action please visit the following link: http://bit.ly/37QWeR2.
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781838554491_ColorImages.pdf.
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "In the following code, you can see the definition of a generic Sword type for our upcoming game."
A block of code is set as follows:
    type Sword struct {
        name string // Important tip for RPG players: always name your swords!
    }

    // Damage returns the damage dealt by this sword.
    func (Sword) Damage() int {
        return 2
    }
When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:
    type Sword struct {
        name string // Important tip for RPG players: always name your swords!
    }

    // Damage returns the damage dealt by this sword.
    func (Sword) Damage() int {
        return 2
    }
Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "The following excerpt is part of a system that collects and publishes performance metrics to a key-value store."
Feedback from our readers is always welcome.
General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/support/errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packt.com.
The objective of part one is to familiarize you with the concept of software engineering, the stages of the software development life cycle, and the various roles of software engineers.
This section comprises the following chapter:
Chapter 1, A Bird's-Eye View of Software Engineering
Through the various stages of my career, I have met several people who knew how to code; people whose skill level ranged from beginner to what some would refer to as guru. All those people had different backgrounds and worked for both start-ups and large organizations. For some, coding was seen as a natural progression from their CS studies, while others turned to coding as part of a career change.
Regardless of all these differences, all of them had one thing in common: when asked to describe their current role, all of them used the term software engineer. It is quite a common practice for job candidates to use this term in their CVs as a means to set themselves apart from a globally distributed pool of software developers. A quick random sampling of job specs published online reveals that a lot of companies – and especially high-profile start-ups – also seem to subscribe to this way of thinking, as evidenced by their search for professionals to fill software engineering roles. In reality, as we will see in this chapter, the term software engineer is more of an umbrella term that covers a wide gamut of bespoke roles, each one combining different levels of software development expertise with specialized skills pertaining to topics such as system design, testing, build tools, and operations management.
So, what is software engineering and how does it differ from programming? What set of skills should a software engineer possess and which models, methodologies, and frameworks are at their disposal for facilitating the delivery of complex pieces of software? These are some of the questions that will be answered in this chapter.
This chapter covers the following topics:
A definition of software engineering
The types of software engineering roles that you may encounter in contemporary organizations
An overview of popular software development models and which one to select based on the project type and requirements
Before we dive deeper into this chapter, we need to establish an understanding of some of the basic terms and concepts around software engineering. For starters, how do we define software engineering and in what ways does it differ from software development and programming in general? To begin answering this question, we will start by examining the formal definition of software engineering, as published in IEEE's Standard Glossary of Software Engineering Terminology [7]:
"Software engineering: The application of a systematic, disciplined, quantifiable approach to the development, operation, and maintenance of software; that is, the application of engineering to software."
The main takeaway from this definition is that authoring code is just one of the many facets of software engineering. At the end of the day, any capable programmer can take a well-defined specification and convert it into a fully functioning program without thinking twice about the need to produce clean and maintainable code. A disciplined software engineer, on the other hand, would follow a more systematic approach by applying common design patterns to ensure that the produced piece of software is extensible, easier to test, and well documented in case another engineer or engineering team assumes ownership of it in the future.
Besides the obvious requirement for authoring high-quality code, the software engineer is also responsible for thinking about other aspects of the systems that will be built. Some questions that the software engineer must be able to answer include the following:
What are the business use cases that the software needs to support?
What components comprise the system and how do they interact with each other?
Which technologies will be used to implement the various system components?
How will the software be tested to ensure that its behavior matches the customer's expectations?
How does load affect the system's performance and what is the plan for scaling the system?
To be able to answer these questions, the software engineer needs a special set of skills that, as you are probably aware, go beyond programming. These extra responsibilities and required skills are the main factors that differentiate a software engineer from a software developer.
As we discussed in the previous section, software engineering is an inherently complex, multi-stage process. In an attempt to manage this complexity, organizations around the world have invested a lot of time and effort over the years to break the process down into a set of well-defined stages and train their engineering staff to efficiently deal with each stage.
Some software engineers strive to work across all the stages of the Software Development Life Cycle (SDLC), while others have opted to specialize in and master a particular stage of the SDLC. This gave rise to a variety of software engineering roles, each one with a different set of responsibilities and a required set of skills. Let's take a brief look at the most common software engineering roles that you may encounter when working with both small- and large-sized organizations.
The software engineer (SWE) is the most common role that you are bound to interact with in any organization, regardless of its size. Software engineers play a pivotal role not only in designing and building new pieces of software, but also in operating and maintaining existing and legacy systems.
Depending on their experience level and technical expertise, SWEs are classified into three categories:
Junior engineer: A junior engineer is someone who has recently started their software development career and lacks the necessary experience to build and deploy production-grade software. Companies are usually keen on hiring junior engineers as it allows them to keep their hiring costs low. Furthermore, companies often pair promising junior engineers with senior engineers in an attempt to grow them into mid-level engineers and retain them for longer.
Mid-level engineer: A typical mid-level engineer is someone who has at least three years of software development experience. Mid-level engineers are expected to have a solid grasp of the various aspects of the software development life cycle and are the ones who can exert a significant impact on the amount of code that's produced for a particular project. To this end, they not only contribute code, but also review and offer feedback on the code that's contributed by other team members.
Senior engineer: This class of engineer is well-versed in a wide array of disparate technologies; their breadth of knowledge makes them ideal for assembling and managing software engineering teams, as well as serving as mentors and coaches for less senior engineers. From their years of experience, senior engineers acquire a deep understanding of a particular business domain. This trait allows them to serve as a liaison between their teams and the other, technical or non-technical, business stakeholders.
Another way to classify software engineers is by examining the main focus of their work:
Frontend engineers work exclusively on software that customers interact with. Examples of frontend work include the UI for a desktop application, a single-page web application for a software as a service (SaaS) offering, and a mobile application running on a phone or other smart device.
Backend engineers specialize in building the parts of a system that implement the actual business logic and deal with data modeling, validation, storage, and retrieval.
Full stack engineers are developers who have a good understanding of both frontend and backend technologies and no particular preference for doing frontend or backend work. This class of developers is more versatile, as they can easily move between teams, depending on the project requirements.
The software development engineer in test (SDET) is a role whose origins can be traced back to Microsoft's engineering teams. In a nutshell, SDETs are individuals who, just like their SWE counterparts, take part in software development, but their primary focus lies in software testing and performance.
An SDET's primary responsibility is to ensure that the development team produces high-quality software that is free from defects. A prerequisite for achieving this goal is to be cognizant of the different types of approaches to testing software, including, but not limited to, unit testing, integration testing, white/black-box testing, end-to-end/acceptance testing, and chaos testing. We will be discussing all of these testing approaches in more detail in the following chapters.
The main tool that SDETs use to meet their goals is testing automation. Development teams can iterate much faster when a Continuous Integration (CI) pipeline is in place to automatically test their changes across different devices and CPU architectures. Besides setting up the infrastructure for the CI pipeline and integrating it with the source code repository system that the team uses, SDETs are often tasked with authoring and maintaining a separate set of tests. These tests fall into the following two categories:
Acceptance tests: A set of scripted end-to-end tests that ensure the complete system adheres to all the customer's business requirements before a new version is given the green light for a release.
Performance regression tests: Another set of quality control tests that monitor a series of performance metrics across builds and alert you when a metric exceeds a particular threshold. These tests prove to be a great asset when a service-level agreement (SLA) has been signed: seemingly innocuous changes to the code (for example, switching to a different data structure implementation) may trigger a breach of the SLA, even though all the unit tests pass.
Finally, SDETs collaborate with support teams to transform incoming support tickets into bug reports that the development team can work on. The combination of software development and debugging skills, in conjunction with the SDET's familiarity with the system under development, makes them uniquely capable of tracking down the location of bugs in production code and coming up with example cases (for example, a particular data input or a sequence of actions) that allow developers to reproduce the exact set of conditions that trigger each bug.
The role of the site reliability engineer (SRE) came into the spotlight in 2016 when Google published a book on the subject of Site Reliability Engineering [4]. This book outlined the best practices and strategies that are used internally by Google to run their production systems and has since led to the wide adoption of the role by the majority of companies operating in the SaaS space.
The term was initially coined sometime around 2003 by Ben Treynor, the founder of Google's site reliability team. A site reliability engineer is a software engineer with a strong technical background who also focuses on the operations side of deploying and running production-grade services.
According to the original role definition, SREs spend approximately 50% of their time developing software and the other 50% dealing with ops-related aspects such as the following:
Working on support tickets or responding to alerts
Being on-call
Running manual tasks (for example, upgrading systems or running disaster recovery scenarios)
It is in the best interests of SREs to increase the stability and reliability of the services they operate. After all, no one enjoys being paged at 2 a.m. when a service melts down due to a sudden spike in the volume of incoming requests. The end goal is always to produce services that are highly available and self-healing; services that can automatically recover from a variety of faults without the need for human intervention.
The basic mantra of SREs is to eliminate potential sources of human errors by automating repeated tasks. One example of this philosophy is the use of a Continuous Deployment (CD) pipeline to minimize the amount of time that's required to deploy software changes to production. The benefits of this type of automation become apparent when a critical issue affecting production is identified and a fix must be deployed as soon as possible.
Ultimately, software is designed and built by humans so bugs will undoubtedly creep in. Rather than relying on a rigorous verification process to prevent defects from being deployed to production, SREs operate under the premise that we live in a non-perfect world: systems do crash and buggy software will, at some point, get deployed to production. To detect defective software deployments and mitigate their effects on end users, SREs set up monitoring systems that keep track of various health-related metrics for each deployed service and can trigger automatic rollbacks if a deployment causes an increase in a service's error rate.
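To illustrate the kind of health signal that such a monitoring system might consume, here is a minimal sketch of a Go service exposing a health-check endpoint. The /healthz path and the pingDependencies helper are assumptions made for this example rather than a convention prescribed by this book:

    package main

    import (
        "log"
        "net/http"
    )

    // pingDependencies is a hypothetical stand-in for checks against the
    // service's real dependencies (databases, downstream APIs, and so on).
    func pingDependencies() error { return nil }

    func main() {
        // A monitoring system can probe this endpoint periodically; a
        // non-200 response can feed an alert or trigger an automatic
        // rollback of the latest deployment.
        http.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
            if err := pingDependencies(); err != nil {
                http.Error(w, err.Error(), http.StatusServiceUnavailable)
                return
            }
            w.WriteHeader(http.StatusOK)
            _, _ = w.Write([]byte("ok"))
        })
        log.Fatal(http.ListenAndServe(":8080", nil))
    }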
In a world where complex, monolithic systems are broken down into multiple microservices and continuous delivery has become the new norm, debugging older software releases that are still deployed out in the wild becomes a major pain point for software engineers.
To understand why this can be a pain point, let's take a look at a small example: you arrive at work on a sunny Monday morning only to find out that one of your major customers has filed a bug against the microservice-based software your team is responsible for. To make things even worse, that particular customer is running a long-term support (LTS) release of the software, which means that some, if not all, of the microservices that run on the customer's machines are based on code that is at least a couple of hundred commits behind the current state of development. So, how can you actually come up with a bug reproducer and check whether the bug has already been fixed upstream?
This is where the concept of reproducible builds comes into play. By reproducible builds, we mean that at any point in time we should be able to compile a particular version of all the system components where the resulting artifacts match, bit by bit, the ones that have been deployed by the customer.
A release engineer (RE) is effectively a software engineer who collaborates with all the engineering teams to define and document all the required steps and processes for building and releasing code to production. A prerequisite for a release engineer is having deep knowledge of all the tools and processes that are required for compiling, versioning, testing, and packaging software. Typical tasks for REs include the following:
Authoring makefiles
Implementing workflows for containerizing software artifacts (for example, as Docker or rkt images)
Ensuring all teams use exactly the same build tool (compilers, linkers, and so on) versions and flags
Ensuring that builds are both repeatable and hermetic: changes to external dependencies (for example, third-party libraries) between builds of the same software version should have no effect on the artifacts that are produced by each build
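In the Go ecosystem, much of this repeatability comes from pinning dependency versions in the module files. The following go.mod is a minimal sketch with an illustrative module path and dependency versions; together with the checksums recorded in go.sum, it ensures that two builds of the same software version resolve exactly the same third-party code:

    module example.com/linksrus // illustrative module path

    go 1.13

    require (
        github.com/google/uuid v1.1.1
        google.golang.org/grpc v1.24.0
    )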
The last role that we will be discussing in this section, and one that you will only probably encounter when working on bigger projects or collaborating with large organizations, is the system architect. While software engineering teams focus on building the various components of the system, the architect is the one person who sees the big picture: what components comprise the system, how each component must be implemented, and how all the components fit and interact with each other.
In smaller companies, the role of the architect is usually fulfilled by one of the senior engineers. In larger companies, the architect is a distinct role that's filled by someone with both a solid technical background and strong analytical and communication skills.
Apart from coming up with a high-level, component-based design for the system, the architect is also responsible for making decisions regarding the technologies that will be used during development and setting the standards that all the development teams must adhere to.
Even though architects have a technical background, they rarely get to write any code. As a matter of fact, architects tend to spend a big chunk of their time in meetings with the various internal or external stakeholders, authoring design documents or providing technical direction to the software engineering teams.
The software engineering definition from the previous section alludes to the fact that software engineering is a complicated, multi-stage process. In an attempt to provide a formal description of these stages, academia has put forward the concept of the SDLC.
Over the years, there has been an abundance of alternative model proposals for facilitating software development. The following diagram is a timeline illustrating the years when some of the most popular SDLC models were introduced:
In the upcoming sections, we will explore each of the preceding models in more detail.
The waterfall model is probably the most widely known model out there for implementing the SDLC. It was introduced by Winston Royce in 1970 [11] and defines a series of steps that must be sequentially completed in a particular order. Each stage produces a certain output, for example, a document or some artifact, that is, in turn, consumed by the step that follows.
The following diagram outlines the basic steps that were introduced by the waterfall model:
Requirement collection: During this stage, the customer's requirements are captured and analyzed, and a requirements document is produced.
Design: Based on the requirements document's contents, analysts plan the system's architecture. This step is usually split into two sub-steps: the logical system design, which models the system as a set of high-level components, and the physical system design, where the appropriate technologies and hardware components are selected.
Implementation: The implementation stage is where the design documents from the previous step get transformed by software engineers into actual code.
Verification: The verification stage follows the implementation stage and ensures that the piece of software that was implemented actually satisfies the set of customer requirements that were collected during the requirements gathering step.
Maintenance: The final stage in the waterfall model is when the developed software is deployed and operated by the customer:
One thing to keep in mind is that the waterfall model operates under the assumption that all customer requirements can be collected early on, especially before the project implementation stage begins. Having the full set of requirements available as a set of use cases makes it easier to get a more accurate estimate of the amount of time that's required for delivering the project and the development costs involved. A corollary to this is that software engineers are provided with all the expected use cases and system interactions in advance, thus making testing and verifying the system much simpler.
The waterfall model comes with a set of caveats that make it less favorable to use when building software systems. One potential caveat is that the model describes each stage in an abstract, high-level way and does not provide a detailed view into the processes that comprise each step or even tackle cross-cutting processes (for example, project management or quality control) that you would normally expect to execute in parallel through the various steps of the model.
While this model does work for small- to medium-scale projects, it tends, at least in my view, not to be as efficient for projects such as the ones commissioned by large organizations and/or government bodies. To begin with, the model assumes that analysts are always able to elicit the correct set of requirements from customers. This is not always the case as, oftentimes, customers are not able to accurately describe their requirements or tend to identify additional requirements just before the project is delivered. In addition to this, the sequential nature of this model means that a significant amount of time may elapse between gathering the initial requirements and the actual implementation. During this time – what some would refer to as an eternity in software engineering terms – the customer's requirements may shift. Changes in requirements necessitate additional development effort and this directly translates into increased costs for the deliverable.
The iterative enhancement model that's depicted in the following diagram was proposed in 1975 by Basili and Turner [2] in an attempt to improve on some of the caveats of the waterfall model. By recognizing that requirements may potentially change for long-running projects, the model advocates executing a set of evolution cycles or iterations, with each one being allocated a fixed amount of time out of the project's time budget:
Instead of starting with the full set of specifications, each cycle focuses on building some parts of the final deliverable and refining the set of requirements from the cycle that precedes it. This allows the development team to make full use of any information available at that particular point in time and ensure that any requirement changes can be detected early on and acted upon.
One important rule when applying the iterative model is that the output of each cycle must be a usable piece of software. The last iteration is the most important as its output yields the final software deliverable. As we will see in the upcoming sections, the iterative model has exerted quite a bit of influence in the evolution of most of the contemporary software development models.
The spiral development model was introduced by Barry Boehm in 1986 [5] as an approach to minimize risk when developing large-scale projects associated with significant development costs.
In the context of software engineering, risks are defined as any kind of situation or sequence of events that can cause a project to fail to meet its goals. Examples of various degrees of failure include the following:
Missing the delivery deadline
Exceeding the project budget
Delivering software on time that depends on hardware that isn't available yet
As illustrated in the following diagram, the spiral model combines the ideas and concepts from the waterfall and iterative models with a risk assessment and analysis process. As Boehm points out, a very common mistake that people who are unfamiliar with the model tend to make when seeing this diagram for the first time is to assume that the spiral model is just a sequence of incremental waterfall steps that have to be followed in a particular order for each cycle. To dispel this misconception, Boehm provided the following definition for the spiral model:
"The spiral development model is a risk-driven process model generator."
Under this definition, risk is the primary factor that helps project stakeholders answer the following questions:
What steps should we follow next?
How long should we keep following those steps before we need to reevaluate risk?
At the beginning of each cycle, all the potential sources of risk are identified and mitigation plans are proposed to address any risk concerns. This set of risks is then ordered in terms of importance (for example, the impact on the project and the likelihood of occurrence) and used as input by the stakeholders when planning the steps for the next spiral cycle.
Another common misconception about the spiral model is that the development direction is one-way and can only spiral outward, that is, no backtracking to a previous spiral cycle is allowed. This is generally not the case: stakeholders always try to make informed decisions based on the information that's available to them at a particular point in time. As the project's development progresses, circumstances may change: new requirements may be introduced or additional pieces of previously unknown information may become available. In light of the new information that's available to them, stakeholders may opt to reevaluate prior decisions and, in some cases, roll back development to a previous spiral iteration.
When we talk about agile development, we usually refer to a broader family of software development models that were initially proposed during the early 90s. Agile is a sort of umbrella term that encompasses not only a set of frameworks but also a fairly long list of best practices for software development. If we had to come up with a more specific definition for agile, we would probably define it as follows:
The popularity of agile development and agile frameworks, in particular, skyrocketed with the publication of the Manifesto for Agile Software Development in 2001 [3]. At the time of writing this book, agile development practices have become the de facto standard for the software industry, especially in the field of start-up companies.
In the upcoming sections, we will be digging a bit deeper into some of the most popular models and frameworks in the agile family. While doing a deep dive on each model is outside the scope of this book, a set of additional resources will be provided at the end of this chapter if you are interested in learning more about the following models.
Lean software development is one of the earliest members of the agile family of software development models. It was introduced by Mary and Tom Poppendieck in 2003 [10]. Its roots go back to the lean manufacturing techniques that were introduced by Toyota's production system in the 70s. When applied to software development, the model advocates seven key principles.
This is one of the key philosophies of the lean development model. Anything that does not directly add value to the final deliverable is considered waste and must be removed.
Typical cases of things that are characterized as waste by this model are as follows:
Introduction of non-essential, that is, nice-to-have features when development is underway.
Overly complicated decision-making processes that force development teams to remain idle while waiting for a feature to be signed off – in other words: bureaucracy!
Unnecessary communication between the various project stakeholders and the development teams. This disrupts the focus of the development team and hinders their development velocity.
The development team should never assume that the customers' requirements are static. Instead, the assumption should always be that they are dynamic and can change over time. Therefore, it is imperative for the development team to come up with appropriate strategies to ensure that their view of the world is always aligned with the customer's.
