Developing large-scale systems that continuously grow in scale and complexity requires a thorough understanding of how software projects should be implemented. Software developers, architects, and technical management teams rely on high-level software design patterns such as microservices architecture, event-driven architecture, and the strategic patterns prescribed by domain-driven design (DDD) to make their work easier.
This book covers these proven architecture design patterns with a forward-looking approach to help Python developers manage application complexity—and get the most value out of their test suites.
Starting with the initial stages of design, you will learn about the main building blocks and the mental flow to use at the start of a project. The book covers various architectural patterns, like microservices, web services, and event-driven structures, and how to choose the one best suited to your project. After establishing a foundation of required concepts, you will progress into development, debugging, and testing to produce high-quality code that is ready for deployment. You will also learn about ongoing operations, that is, how to keep working on the system after it is deployed to end users, as the software development lifecycle is never finished.
By the end of this Python book, you will have developed "architectural thinking": a different way of approaching software design, including making changes to ongoing systems.
Page count: 741
Publication year: 2022
Python Architecture Patterns
Master API design, event-driven structures, and package management in Python
Jaime Buelta
BIRMINGHAM—MUMBAI
"Python" and the Python Logo are trademarks of the Python Software Foundation.
Python Architecture Patterns
Copyright © 2022 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Producer: Tushar Gupta
Acquisition Editor – Peer Reviews: Saby D'silva
Project Editor: Parvathy Nair
Content Development Editor: Alex Patterson
Copy Editor: Safis Editor
Technical Editor: Tejas Mhasvekar
Proofreader: Safis Editor
Indexer: Pratik Shirodkar
Presentation Designer: Pranit Padwal
First published: January 2022
Production reference: 2020222
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80181-999-2
www.packt.com
Jaime Buelta has been a professional programmer for 20 years and a full-time Python developer for over 10. During that time, he has been exposed to a lot of different technologies while working for different industries and helping them achieve their goals; these industries include aerospace, industrial systems, video game online services, finance services and educational tools. He has been writing technical books since 2018, reflecting on lessons learned over his career, including Python Automation Cookbook and Hands On Docker for Microservices in Python. He is currently living in Dublin, Ireland.
Writing a book is always more than a single person's work. It involves not only the people directly involved in polishing and improving the drafts, but also many conversations and talks with exceptional people in the Python and tech community that shape the ideas in it. It also wouldn't be possible without the love and support from Dana, my amazing wife.
Pradeep Pant is a computer programmer, software architect, AI researcher and open source advocate. Pradeep has been writing computer programs for more than 2 decades in various programming languages and platforms, such as microprocessor/Assembly, C, C++, Perl, Python, R, JavaScript, AI/ML, Linux, the cloud and many more. Pradeep holds a master's degree in physics and another master's in computer science. In his free time, Pradeep likes to write about his tech journey and learnings at https://pradeeppant.com.
Pradeep works with Ockham BV, a Belgium-based software development company. The company develops software in the quality and document management systems space.
Pradeep can be contacted through email or through professional networks:
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/ppant/
GitHub: https://github.com/ppant
Join the book’s Discord workspace for a monthly Ask me Anything session with the authors:
https://packt.link/PythonArchitechture
Preface
Who this book is for
What this book covers
To get the most out of this book
Get in touch
Share your thoughts
Introduction to Software Architecture
Defining the structure of a system
Division into smaller units
In-process communication
Conway's Law – Effects on software architecture
Application example – Overview
Security aspects of software architecture
Summary
Part I: Design
API Design
Abstractions
Using the right abstractions
Leaking abstractions
Resources and action abstractions
RESTful interfaces
A more practical definition
Headers and statuses
Designing resources
Resources and parameters
Pagination
Designing a RESTful API process
Using the Open API specification
Authentication
Authenticating HTML interfaces
Authenticating RESTful interfaces
Self-encoded tokens
Versioning the API
Why versioning?
Internal versus external versioning
Semantic versioning
Simple versioning
Frontend and backend
Model View Controller structure
HTML interfaces
Traditional HTML interfaces
Dynamic pages
Single-page apps
Hybrid approach
Designing the API for the example
Endpoints
Review of the design and implementation
Summary
Data Modeling
Types of databases
Relational databases
Non-relational databases
Key-value stores
Document stores
Wide-column databases
Graph databases
Small databases
Database transactions
Distributed relational databases
Primary/replica
Sharding
Pure sharding
Mixed sharding
Table sharding
Advantages and disadvantages of sharding
Schema design
Schema normalization
Denormalization
Data indexing
Cardinality
Summary
The Data Layer
The Model layer
Domain-Driven Design
Using ORM
Independence from the database
Independence from SQL and the Repository pattern
No problems related to composing SQL
The Unit of Work pattern and encapsulating the data
CQRS, using different models for read and write
Database migrations
Backward compatibility
Relational schema changes
Changing the database without interruption
Data migrations
Changes without enforcing a schema
Dealing with legacy databases
Detecting a schema from a database
Syncing the existing schema to the ORM definition
Summary
Part II: Architectural Patterns
The Twelve-Factor App Methodology
Intro to the Twelve-Factor App
Continuous Integration
Scalability
Configuration
The Twelve Factors
Build once, run multiple times
Dependencies and configurations
Scalability
Monitoring and admin
Containerized Twelve-Factor Apps
Summary
Web Server Structures
Request-response
Web architecture
Web servers
Serving static content externally
Reverse proxy
Logging
Advanced usages
uWSGI
The WSGI application
Interacting with the web server
Processes
Process lifecycle
Python worker
Django MVT architecture
Routing a request towards a View
The View
HttpRequest
HttpResponse
Middleware
Django REST framework
Models
URL routing
Views
Serializer
External layers
Summary
Event-Driven Structures
Sending events
Asynchronous tasks
Subdividing tasks
Scheduled tasks
Queue effects
Single code for all workers
Cloud queues and workers
Celery
Configuring Celery
Celery worker
Triggering tasks
Connecting the dots
Scheduled tasks
Celery Flower
Flower HTTP API
Summary
Advanced Event-Driven Structures
Streaming events
Pipelines
Preparation
Base task
Image task
Video task
Connecting the tasks
Running the task
Defining a bus
More complex systems
Testing event-driven systems
Summary
Microservices vs Monolith
Monolithic architecture
The microservices architecture
Which architecture to choose
A side note about similar designs
The key factor – team communication
Moving from a monolith to microservices
Challenges for the migration
A move in four acts
1. Analyze
2. Design
3. Plan
4. Execute
Containerizing services
Building and running an image
Building and running a web service
uWSGI configuration
nginx configuration
Start script
Building and running
Caveats
Orchestration and Kubernetes
Summary
Part III: Implementation
Testing and TDD
Testing the code
Different levels of testing
Unit tests
Integration tests
System tests
Testing philosophy
How to design a great test
Structuring tests
Test-Driven Development
Introducing TDD into new teams
Problems and limitations
Example of the TDD process
Introduction to unit testing in Python
Python unittest
Pytest
Testing external dependencies
Mocking
Dependency injection
Dependency injection in OOP
Advanced pytest
Grouping tests
Using fixtures
Summary
Package Management
The creation of a new package
Trivial packaging in Python
The Python packaging ecosystem
PyPI
Virtual environments
Preparing an environment
A note on containers
Python packages
Creating a package
Development mode
Pure Python package
Cython
Python package with binary code
Uploading your package to PyPI
Creating your own private index
Summary
Part IV: Ongoing operations
Logging
Log basics
Producing logs in Python
Detecting problems through logs
Detecting expected errors
Capturing unexpected errors
Log strategies
Adding logs while developing
Log limitations
Summary
Metrics
Metrics versus logs
Kinds of metrics
Generating metrics with Prometheus
Preparing the environment
Configuring Django Prometheus
Checking the metrics
Starting a Prometheus server
Querying Prometheus
Proactively working with metrics
Alerting
Summary
Profiling
Profiling basics
Types of profilers
Profiling code for time
Using the built-in cProfile module
Line profiler
Partial profiling
Example web server returning prime numbers
Profiling the whole process
Generating a profile file per request
Memory profiling
Using memory_profiler
Memory optimization
Summary
Debugging
Detecting and processing defects
Investigation in production
Understanding the problem in production
Logging a request ID
Analyzing data
Increasing logging
Local debugging
Python introspection tools
Debugging with logs
Debugging with breakpoints
Summary
Ongoing Architecture
Adjusting the architecture
Scheduled downtime
Maintenance window
Incidents
Postmortem analysis
Premortem analysis
Load testing
Versioning
Backward compatibility
Incremental changes
Deploying without interruption
Feature flags
Teamwork aspects of changes
Summary
Other Books You May Enjoy
Index
The evolution of software means that, over time, systems grow to be more and more complex, and require more and more developers working on them in a coordinated fashion. As the size increases, a general structure emerges. This structure, if not well planned, can become really chaotic and difficult to work with.
The challenge of software architecture is to plan and design this structure. A well-designed architecture enables different teams to interact with each other while at the same time having a clear understanding of their own responsibilities and goals.
The architecture of a system should be designed in a way that day-to-day software development is possible with minimal resistance, allowing for adding features and expanding the system. The architecture in a live system is also always in flux, and can be adjusted and expanded as well, reshaping the different software elements in a deliberate and smooth fashion.
In this book, we will look at the different aspects of software architecture, from the top level to some of the lower-level details that support the higher view. The book is structured in four sections, covering all the different aspects of the life cycle:
Design before writing any code
Architectural patterns to use proven approaches
Implementation of the design in actual code
Ongoing operation to cover changes, and verification that it's all working as expected
Throughout the book, we will cover different techniques across all these aspects.
This book is for software developers who want to expand their knowledge of software architecture, whether they are experienced developers who want to expand and solidify their intuitions about complex systems, or less experienced developers who want to learn and grow their abilities by facing bigger systems with a broader view.
We will use code written in Python for the examples. Though you're not required to be an expert, some basic knowledge of Python is advisable.
Chapter 1, Introduction to Software Architecture, presents the topic of what software architecture is and why it is useful, as well as presenting a design example.
The first section of the book covers the Design phase, before the software is written:
Chapter 2, API Design, shows the basics of designing useful APIs that abstract the operations conveniently.
Chapter 3, Data Modeling, talks about the particularities of storage systems and how to design the proper data representation for the application.
Chapter 4, The Data Layer, goes over the code handling of the stored data, and how to make it fit for purpose.
Next, we will present a section that covers the different Architectural patterns available, which reuse proven structures:
Chapter 5, The Twelve-Factor App Methodology, shows how this methodology includes good practices that can be useful when operating with web services and can be applied in a variety of situations.
Chapter 6, Web Server Structures, explains web services and the different elements to take into consideration when settling on both the operational and the software design.
Chapter 7, Event-Driven Structures, describes another kind of system that works asynchronously, receiving information without returning an immediate response.
Chapter 8, Advanced Event-Driven Structures, explains more advanced usages for asynchronous systems, and some different patterns that can be created.
Chapter 9, Microservices vs Monolith, presents these two architectures for complex systems, and goes over their differences.
The Implementation section of the book covers how the code is written:
Chapter 10, Testing and TDD, talks about the fundamentals of testing and how Test-Driven Development can be used in the coding process.
Chapter 11, Package Management, follows the process of creating reusable parts of code and how to distribute them.
Finally, the last section deals with Ongoing operations, where the system is running and requires monitoring while it is being adjusted and changed:
Chapter 12, Logging, describes how to record what working systems are doing.
Chapter 13, Metrics, discusses aggregating different values to see how the whole system is behaving.
Chapter 14, Profiling, explains how to understand how code is executed to improve its performance.
Chapter 15, Debugging, covers the process of digging deep into the execution of code to find and fix errors.
Chapter 16, Ongoing Architecture, describes how to successfully operate architectural changes on running systems.
The code bundle for the book is hosted on GitHub at https://github.com/PacktPublishing/Python-Architecture-Patterns. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://static.packt-cdn.com/downloads/9781801819992_ColorImages.pdf
There are a number of text conventions used throughout this book.
CodeInText: Indicates code words in text, object names, module names, folder names, filenames, file extensions, pathnames, dummy URLs and user input. Here is an example: "For this recipe, we need to import the requests module."
A block of code is set as follows:
def leonardo(number):
    if number in (0, 1):
        return 1
    # EXAMPLE COMMENT
    return leonardo(number - 1) + leonardo(number - 2) + 1
Note that code may be edited for concision and clarity. Refer to the full code when necessary, which is available on GitHub.
Any command-line input or output is written as follows (notice the $ symbol):
$ python example_script.py parameters
Any input in the Python interpreter is written as follows (notice the >>> symbol). Expected output will be reflected without the >>> symbol:
>>> import logging
>>> logging.warning('This is a warning')
WARNING:root:This is a warning
To enter the Python interpreter, call the python3 command with no parameters:
$ python3
Python 3.9.7 (default, Oct 13 2021, 06:45:31)
[Clang 13.0.0 (clang-1300.0.29.3)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>>
Any command-line input or output is written as follows:
$ cp example.txt copy_of_example.txt
Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this: "Select System info from the Administration panel."
Warnings or important notes appear like this.
Tips and tricks appear like this.
Feedback from our readers is always welcome.
General feedback: Email [email protected], and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.
Once you've read Python Architecture Patterns, we'd love to hear your thoughts! Please click here to go straight to the Amazon review page for this book and share your feedback.
Your review is important to us and the tech community and will help us make sure we're delivering excellent quality content.
The objective of this chapter is to present an introduction to what software architecture is and where it's useful. We will look at some of the basic techniques used when defining the architecture of a system and a baseline example of the web services architecture.
This chapter includes a discussion of the implications that software structure has for team structure and communication. As successfully building any non-trivial piece of software depends heavily on communication and collaboration between one or more teams of developers, this factor should be taken into consideration. Also, the structure of the software can have a profound effect on how different elements are accessed, so how software is structured has ramifications for security.
Also, in this chapter, there will be a brief introduction to the architecture of an example system that we will be using to present the different patterns and discussions throughout the rest of the book.
In this chapter, we'll cover the following topics:
Defining the structure of a system
Dividing into smaller units
Conway's Law in software architecture
General overview of the example
Security aspects of software architecture
Let's dive in.
At its core, software development is about creating and managing complex systems.
In the early days of computing, programs were relatively simple. At most, they could perhaps calculate a parabolic trajectory or factorize numbers. The very first computer program, designed in 1843 by Ada Lovelace, was meant to calculate a sequence of Bernoulli numbers. A hundred years after that, during the Second World War, some of the first electronic computers were built to break encryption codes. As the possibilities of the new invention started to be explored, more and more complex operations and systems were designed. Tools like compilers and high-level languages multiplied the number of possibilities, and the rapid advancement of hardware allowed more and more operations to be performed. This quickly created a need to manage the growing complexity and apply consistent engineering principles to the creation of software.
More than 50 years after the birth of the computing industry, the software tools at our disposal are incredibly varied and powerful. We stand on the shoulders of giants to build our own software. We can quickly add a lot of functionalities with relatively little effort, either leveraging high-level languages and APIs or using out-of-the-box modules and packages. With this great power comes the great responsibility of managing the explosion of complexity that it produces.
In the simplest terms, software architecture defines the structure of a software system. This architecture can develop organically, usually in the early stages of a project, but after system growth and a few change requests, the need to think carefully about the architecture becomes more and more important. As the system becomes bigger, the structure becomes more difficult to change, which affects future efforts. It's easier to make changes that follow the structure than changes that go against it.
Making certain changes difficult is not necessarily a bad thing. Changes that should be made difficult could involve elements that need to be overseen by different teams, or elements that can affect external customers. While the main focus is to create a system that's easy and efficient to change in the future, a smart architectural design will strike a proper balance of ease and difficulty based on the requirements. Later in the chapter, we will study security as a clear example of when to keep certain operations difficult to implement.
At the core of software architecture, then, is looking at the big picture: focusing on where the system is going to be in the future and being able to materialize this view, while also helping the present situation. The usual trade-off between short-term wins and long-term operation is very important in development, and its most common outcome is the creation of technical debt. Software architecture deals mostly with long-term implications.
The considerations for software architecture can be quite numerous and there needs to be a balance between them. Some examples may include:
Business vision, if the system is going to be commercially exploited. This may include requirements coming from stakeholders like marketing, sales, or management. Business vision is typically driven by customers.
Technical requirements, like being sure that the system is scalable and can handle a certain number of users, or that the system is fast enough for its use case. A news website requires different update times than a real-time trading system.
Security and reliability concerns, the seriousness of which depends on how risky or critical the application and the data stored are.
Division of tasks, to allow multiple teams, perhaps specialized in different areas, to work in a flexible way at the same time on the same system. As systems grow, the need to divide them into semi-autonomous, smaller components becomes more pressing. Small projects may live longer with a "single-block" or monolithic approach.
Use of specific technologies, for example, to allow integration with other systems or to leverage the existing knowledge in the team.
These considerations will influence the structure and design of a system. In a sense, the software architect is responsible for implementing the application vision and matching it with the specific technologies and teams that will develop it. That makes the software architect an important intermediary between the business teams and the technology teams, as well as between the different technology teams. Communication is a critical aspect of the job.
To enable successful communication, a good architecture should define boundaries between the different aspects and assign clear responsibilities. The software architect should, in addition to defining clear boundaries, facilitate the creation of interface channels between the system components and follow up on the implementation details.
Ideally, the architectural design should happen at the beginning of system design, with a well thought-out design based on the requirements for the project. This is the general approach in this book because it's the best way to explain the different options and techniques. But it's not the most common use case in real life.
One of the main challenges for a software architect is working with existing systems that need to be adapted, making incremental approaches toward a better system, all while not interrupting the normal daily operation that keeps the business running.
The main technique for software architecture is to divide the whole system into smaller elements and describe how they interact with each other. Each smaller element, or unit, should have a clear function and interface.
For example, a common architecture for a typical system could be a web service architecture composed of:
A database that stores all the data in MySQL
A web worker that serves dynamic HTML content written in PHP
An Apache web server that handles all the web requests, returns any static files, like CSS and images, and forwards the dynamic requests to the web worker
Figure 1.1: Typical web architecture
This architecture and tech stack has been extremely popular since the early 2000s and is known as LAMP, an acronym made from the different open source projects involved: (L)inux as an operating system, (A)pache, (M)ySQL, and (P)HP. Nowadays, the technologies can be swapped for equivalent ones, like using PostgreSQL instead of MySQL or Nginx instead of Apache, while still using the LAMP name. The LAMP architecture can be considered the default starting point when designing web-based client/server systems using HTTP, creating a solid and proven foundation to start building a more complex system.
As you can see, every different element has a distinct function in the system. They interact with each other in clearly defined ways. This is known as the Single-Responsibility principle. When presented with new features, most use cases will fall clearly within one of the elements of the system. Any style changes will be handled by the web server and dynamic changes by the web worker. There are dependencies between the elements, as the data stored in the database may need to be changed to support dynamic requests, but they can be detected early in the process.
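To make the division more concrete, here is a minimal sketch of the web worker element alone. It assumes Python's WSGI interface in place of PHP behind FastCGI, and the standard library's sqlite3 in place of MySQL; the messages table and the function names are hypothetical. The worker only renders dynamic HTML and talks to the database through SQL, leaving static files and HTTP handling to the web server.

```python
import html
import sqlite3
from typing import Callable, Iterable


def init_db(db_path: str) -> None:
    """Create the hypothetical messages table if it doesn't exist yet."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS messages (id INTEGER PRIMARY KEY, text TEXT)"
        )
        conn.commit()
    finally:
        conn.close()


def add_message(db_path: str, text: str) -> None:
    """Store a message; all persistent data lives in the database."""
    conn = sqlite3.connect(db_path)
    try:
        conn.execute("INSERT INTO messages (text) VALUES (?)", (text,))
        conn.commit()
    finally:
        conn.close()


def make_web_worker(db_path: str) -> Callable:
    """Build a WSGI app that renders the stored messages as dynamic HTML."""

    def app(environ: dict, start_response: Callable) -> Iterable[bytes]:
        # A fresh connection per request: the worker keeps no state of its own.
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute("SELECT text FROM messages ORDER BY id").fetchall()
        finally:
            conn.close()
        items = "".join(f"<li>{html.escape(text)}</li>" for (text,) in rows)
        body = f"<ul>{items}</ul>".encode("utf-8")
        start_response(
            "200 OK",
            [("Content-Type", "text/html; charset=utf-8"),
             ("Content-Length", str(len(body)))],
        )
        return [body]

    return app
```

For a quick local try, `wsgiref.simple_server.make_server("", 8000, make_web_worker("app.db")).serve_forever()` can stand in for the web server; a production setup would place a dedicated web server in front of the worker instead.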
We will describe this architecture in greater detail in Chapter 9.
Each element has different requirements and characteristics:
The database needs to be reliable, as it stores all the data. Maintenance tasks, like backups and recovery, will be important. The database won't be updated very frequently, as databases are very stable. Changes to the table schemas will be made through restarts in the web worker.
The web worker needs to be scalable and not store any state. Instead, any data will be sent to and received from the database. This element will be updated often. Multiple copies can be run, either on the same machine or on multiple ones, to allow horizontal scalability.
The web server will require some changes for new styling, but that won't happen very often. Once the configuration is properly set up, this element will remain quite stable. Only one web server per machine is required, as it's capable of load-balancing between multiple web workers.
As we can see, the work balance between elements is very different, as the web worker will be the focus for most new work, while the other two elements will be much more stable. The database will require specific work for us to be sure that it's in good shape, as it's arguably the most critical element of the three. The other two can recover quickly if there's a problem, but any corruption in the database will generate a lot of problems.
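Why statelessness matters for the web worker can be shown with a toy, in-process model. This is only a sketch under simplifying assumptions: plain functions play the role of worker processes, a dict plays the role of the shared database, and FrontServer and make_worker are hypothetical names. The front server answers static requests itself and distributes dynamic ones in round-robin order, which works because any worker can serve any request.

```python
import itertools
from typing import Callable, Dict, List

# A handler takes a request path and returns a response body.
Handler = Callable[[str], str]


def make_worker(name: str, database: Dict[str, str]) -> Handler:
    """Hypothetical stateless web worker: everything it needs comes from `database`."""

    def worker(path: str) -> str:
        return f"{name}: {database.get(path, 'not found')}"

    return worker


class FrontServer:
    """Toy stand-in for the web server element (Apache or nginx in a LAMP setup)."""

    def __init__(self, static_files: Dict[str, str], workers: List[Handler]) -> None:
        self.static_files = static_files
        # Round-robin balancing is only safe because no worker keeps state of its own.
        self._next_worker = itertools.cycle(workers)

    def handle(self, path: str) -> str:
        # Static content (CSS, images) is answered directly, without reaching a worker.
        if path in self.static_files:
            return self.static_files[path]
        # Dynamic requests are forwarded to the next worker in turn.
        return next(self._next_worker)(path)
```

Scaling horizontally here means passing a longer list of workers; because they all read from the same data, each one returns the same answer for the same request.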
The most critical and valuable element of a system is almost always the stored data.
The communication protocols are also unique. The web worker talks to the database using SQL statements. The web server talks to the web worker using a dedicated interface, normally FastCGI or a similar protocol. The web server communicates with the external clients via HTTP requests. The web server and the database don't talk to each other.
These three protocols are different. This doesn't have to be the case for all systems; different components can share the same protocol. For example, there can be multiple RESTful interfaces, which is common in microservices.
The typical way of looking at different units is as different processes running independently, but that's not the only option. Two different modules inside the same process can still follow the Single-Responsibility principle.
The Single-Responsibility principle can be applied at different levels and is used to define the divisions between functions or other blocks. So, it can be applied in smaller and smaller scopes. It's turtles all the way down! But, from the point of view of architecture, the higher-level elements are the most important, as it's the higher level that defines the structure. Knowing how far to go in terms of detail is clearly important, but when taking an architectural approach, it is better to err on the "big picture" side rather than the "too much detail" one.
A clear example of this would be a library that's maintained independently, but it could also be certain modules within a code base. For example, you could create a module that performs all the external HTTP calls and handles all the complexity of keeping connections, retries, handling errors, and so on, or you could create a module to produce reports in multiple formats, based on some parameters.
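As a sketch of the first kind of module, the following hypothetical http_client module concentrates retries and error handling for external HTTP calls in one place. It uses only urllib from the standard library; the fetch name, its parameters, and the exponential backoff policy are illustrative assumptions, not a prescribed design. The opener and sleep parameters are injectable so the retry logic can be exercised without a network.

```python
"""Hypothetical http_client module: the single place that deals with retries and errors."""
import time
import urllib.error
import urllib.request
from typing import Callable, Optional

__all__ = ["fetch"]

# Signature of the low-level function that actually performs the request.
Opener = Callable[[str, float], bytes]


def _urllib_opener(url: str, timeout: float) -> bytes:
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


# Errors treated as transient. HTTPError (e.g. a 404) is a subclass of URLError,
# so this simplified version retries those as well.
_TRANSIENT_ERRORS = (urllib.error.URLError, ConnectionError, TimeoutError)


def fetch(
    url: str,
    *,
    retries: int = 3,
    backoff: float = 0.5,
    timeout: float = 5.0,
    opener: Opener = _urllib_opener,
    sleep: Callable[[float], None] = time.sleep,
) -> bytes:
    """Return the body at `url`, retrying transient failures with exponential backoff."""
    last_error: Optional[Exception] = None
    for attempt in range(retries):
        try:
            return opener(url, timeout)
        except _TRANSIENT_ERRORS as exc:
            last_error = exc
            if attempt < retries - 1:
                sleep(backoff * 2 ** attempt)  # 0.5s, 1s, 2s, ... with the defaults
    raise RuntimeError(f"could not fetch {url} after {retries} attempts") from last_error
```

The rest of the code base only calls fetch(url) and never needs to know how connections, retries, or errors are handled.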
The important characteristic is that in order to create an independent element, the API needs to be clearly defined and the responsibility needs to be well defined. It should be possible for the module to be extracted into a different repo and installed as a third-party element for it to be considered truly independent.
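The report module mentioned above could expose an equally small API. In this minimal sketch, with hypothetical names and only CSV and JSON as formats, callers depend solely on render_report. Everything prefixed with an underscore is an internal detail, which is what would allow the module to be moved to its own repository and installed as a third-party package without changing its callers.

```python
"""Hypothetical reports module: callers only need render_report()."""
import csv
import io
import json
from typing import Any, Callable, Dict, List

__all__ = ["render_report"]

Row = Dict[str, Any]


def _to_csv(rows: List[Row]) -> str:
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def _to_json(rows: List[Row]) -> str:
    return json.dumps(rows, indent=2)


# Adding a format means adding an entry here; the public API does not change.
_RENDERERS: Dict[str, Callable[[List[Row]], str]] = {"csv": _to_csv, "json": _to_json}


def render_report(rows: List[Row], fmt: str = "csv") -> str:
    """Render rows (dicts sharing the same keys) in the requested format."""
    try:
        renderer = _RENDERERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported report format: {fmt!r}") from None
    return renderer(rows)
```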
Creating a big component with only internal divisions is a well-known pattern called a monolithic architecture. The LAMP architecture described above is an example of that, as most of the code is defined inside the web worker. Monoliths are the de facto starting point of most projects, as normally at the start there's no big plan, and dividing things strictly into multiple components doesn't bring a big advantage when the code base is small. As the code base and system grow more and more complex, dividing elements inside the monolith starts to make sense, and later it may make sense to split it into several components. We will discuss monoliths further in Chapter 9, Microservices vs Monolith.
Inside the same component, communication is typically straightforward, as internal APIs will be used. In the vast majority of cases, the same programming language will be used.
A critical concept to always keep in mind while dealing with architectural designs is Conway's Law. Conway's Law is a well-known adage that postulates that the systems introduced in organizations mirror the communication pattern of the organization structure (https://www.thoughtworks.com/insights/articles/demystifying-conways-law):
Any organization that designs a system (defined broadly) will produce a design whose structure is a copy of the organization's communication structure.
– Melvin E. Conway
This means that the structure of the organization's people is replicated, either explicitly or otherwise, to form the software structure created by an organization. In a very simple example, a company that has two big departments – say, purchases and sales – will tend to create two big systems, one focused on buying and another on selling, that talk to each other, instead of other possible structures, like a system with divisions by product.
This can feel natural; after all, communication between teams is more difficult than communication within teams. Communication between teams would need to be more structured and require more active work. Communication inside a single group would be more fluid and less rigid. These elements are key for the design of a good software architecture.
The main thing for the successful application of any software architecture is that the team structure needs to follow the designed architecture quite closely. Trying to deviate too much will result in difficulties, as the tendency will be to structure, de facto, everything following group divisions. In the same way, changing the architecture of a system would likely necessitate restructuring the organization. This is a difficult and painful process, as anyone who has experienced a company reorganization will attest.
Division of responsibilities is also a key aspect. A single software element should have a clear owner, and this shouldn't be distributed across multiple teams. Different teams have different goals and focuses, which will complicate the long-term vision and create tensions.
The reverse, a single team taking ownership of multiple elements, is definitely possible but also requires careful consideration to ensure that this doesn't overstress the team.
If there's a big imbalance in the mapping of work units to teams (for example, too many work units for one team and too few for another team), it is likely that there's a problem with the architecture of the system.
As remote work becomes more common and teams increasingly become located in different parts of the world, communication is also impacted. That's why it has become very common to set up different branches to take care of different elements of the system and to use detailed APIs to overcome the physical barriers of geographical distance. Communication improvements also have an effect on the capacity for collaboration, making remote work more effective and allowing fully remote teams to work closely together on the same code base.
The recent COVID-19 crisis has greatly increased the trend of remote working, especially in software. This is resulting in more people working remotely and in better tools that are adapted to work in this way. While time zone differences are still a big barrier to communication, more and more companies and teams are learning to work effectively in full-remote mode. Remember that Conway's Law is very much dependent on the communication dependencies of organizations, but communication itself can change and improve.
Conway's Law should not be considered an impediment to overcome but a reflection of the fact that organizational structure has an impact on the structure of the software. Software architecture is tightly related to how different teams are coordinated and responsibilities are divided. It has an important human communication component.
Keeping this in mind will help you design a successful software architecture so that the communication flow is fluid at all times and you can identify problems in advance. Software architecture is, of course, closely tied to the human factor, as the architecture will ultimately be implemented and maintained by engineers.
In this book, we will be using an application as an example to demonstrate the different elements and patterns presented. This application will be simple but divided into different elements for demonstration purposes. The full code for the example is available on GitHub, and different parts of it will be presented in the different chapters. The example is written in Python, using well-known frameworks and modules.
The example application is a web application for microblogging, very similar to Twitter. In essence, users will write short text messages that will be available for other users to read.
The architecture of the example system is described in this diagram:
Figure 1.2: Example architecture
It has the following high-level functional elements:
A public website in HTML that can be accessed. This includes functionality for login, logout, writing new micro-posts, and reading other users' micro-posts (no need to be logged in for this).
A public RESTful API, to allow the usage of other clients (mobile, JavaScript, and so on) instead of the HTML site. This will authenticate the users using OAuth and perform actions similar to the website.
These two elements, while distinct, will be made into a single application, as shown in the diagram. The front-facing part of the application will include a web server, as we saw in the LAMP architecture description, which has not been displayed here for simplicity.
An important element to take into consideration when creating an architecture is the security requirements. Not every application is the same, so some can be more relaxed in this aspect than others. For example, a banking application needs to be 100 times more secure than, say, an internet forum for discussing cats. The most common example of this is the storage of passwords. The most naive approach to passwords is to store them, in plain text, associated with a username or email address – say, in a file or a database table. When the user tries to log in, we receive the input password, compare it with the one stored previously, and, if they are the same, we allow the user to log in. Right?
Well, this is a very bad idea, because it can produce serious problems:
If an attacker has access to the storage for the application, they'll be able to read the passwords of all the users. Users tend to reuse passwords (even if it's a bad idea), so, paired with their emails, they'll be exposed to attacks on multiple applications, not only the breached one.
This may seem unlikely, but keep in mind that any copy of the data stored is susceptible to attack, including backups.
To make things secure, data needs to be structured in a way that's as protected as possible from access or even copying, without exposing the real passwords of users. The usual solution to this is to have the following schema:
The password itself is not stored. Instead, a cryptographic hash of the password is stored. This applies a mathematical function to the password and generates a replicable sequence of bits, but the reverse operation is computationally very difficult.
As the hash is deterministic based on the input, a malicious actor could detect duplicated passwords, as their hashes are the same. To avoid this problem, a random sequence of characters, called a salt, is generated for each account. It is added to the password before hashing, meaning two users with the same password but different salts will have different hashes.
Both the resulting hash and the salt are stored.
When a user tries to log in, their input password is combined with the stored salt and hashed, and the result is compared with the stored hash. If they match, the user is logged in.
Note that in this design, the actual password is unknown to the system. It's not stored anywhere and is only accepted temporarily to compare it with the expected hash, after being processed.
This example is presented in a simplified way. There are multiple ways of using this schema and different ways of comparing a hash. For example, the bcrypt function has a configurable cost factor that repeats its expensive internal computation many times; raising it increases the time required to produce a valid hash, making it more resistant to brute-force attacks.
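A minimal sketch of the salted-hash schema using only Python's standard library. It swaps bcrypt (a third-party package in Python) for hashlib.pbkdf2_hmac, which likewise applies the hash many times through its iterations parameter. The iteration count below is an illustrative assumption; check current guidance before choosing one for a real system.

```python
import hashlib
import hmac
import secrets

# Illustrative work factor (assumption): higher values slow down brute force
ITERATIONS = 600_000


def hash_password(password, salt=None):
    """Return (salt, hash). The plain password itself is never stored."""
    if salt is None:
        salt = secrets.token_bytes(16)  # A new random salt for each account
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest


def verify_password(password, salt, stored_hash):
    """Hash the login attempt with the stored salt and compare the results."""
    _, candidate = hash_password(password, salt)
    # compare_digest avoids leaking information through timing differences
    return hmac.compare_digest(candidate, stored_hash)


# Only the salt and the hash would be saved in the user table
salt, stored = hash_password("correct horse")
print(verify_password("correct horse", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))  # False
```

Because every account gets its own random salt, two users choosing the same password end up with different stored hashes.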
This kind of system is more secure than one that stores the password directly, as the password is not known by the people operating the system, nor is it stored anywhere.
The problem of mistakenly displaying the password of a user in status logs may still happen! Extra care should be taken to make sure that sensitive information is not being logged by mistake.
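One way to take that extra care in Python is a logging filter that masks secrets before a record is written. This is a sketch under an assumption: sensitive values appear in messages as password=... or token=...; a real system would need patterns that match its own log formats.

```python
import io
import logging
import re

# Assumption: secrets show up as "password=<value>" or "token=<value>"
SENSITIVE = re.compile(r"(password|token)=\S+", re.IGNORECASE)


class RedactSecretsFilter(logging.Filter):
    """Masks sensitive values in the final message of every record."""

    def filter(self, record):
        message = record.getMessage()  # Message with its arguments applied
        record.msg = SENSITIVE.sub(r"\1=***", message)
        record.args = ()  # Already merged into msg above
        return True  # Never drop the record, only redact it


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactSecretsFilter())
logger = logging.getLogger("auth")
logger.addHandler(handler)
logger.setLevel(logging.INFO)
logger.propagate = False

logger.info("Login attempt user=%s password=%s", "alice", "hunter2")
print(stream.getvalue())
```

Attaching the filter to the handler means every record passing through that output is checked, regardless of which code logged it.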
In certain cases, the same approach as for passwords can be taken to encrypt other stored data, so that only customers can access their own data. For example, you can enable end-to-end encryption for a communication channel.
Security has a very close relationship with the architecture of a system. As we saw before, the architecture defines which aspects are easy and difficult to change and can make some unsafe things impossible to do, like knowing the password of a user, as we described in the previous example. Other options include not storing data from the user to keep privacy or reducing the data exposed in internal APIs, for example. Software security is a very difficult problem and is often a double-edged sword, and trying to make a system more secure can have the side effect of making operations long-winded and inconvenient.
In this chapter, we looked at what software architecture is and when it is required, as well as its focus on the long-term approach, which is characteristic of the discipline. We learned that the underlying structure of software is difficult to change and that that aspect should be taken into consideration when designing and changing a software system.
We described how the most important thing is to divide a complex system into smaller parts and assign clear goals and objectives to each of them, keeping in mind that these smaller parts can use multiple programming languages and refer to different scopes. We also described the LAMP architecture and how it's a widely successful starting point when creating simple web service systems.
We talked about how Conway's Law affects the architecture of a system, as underlying team structures have a direct impact on the implementation and structure of software. After all, software is operated and developed by humans, and human communication needs to be accounted for to implement it successfully.
We described the example that we will use throughout the book to describe the different elements and patterns we will present. Finally, we commented on the security aspects of software architecture and how creating barriers to accessing data as part of the structural design of a system can mitigate security issues.
In the next section of the book, we will talk about the different aspects of designing a system.
We will first spend some time explaining the basic steps to designing a system. My suggestion is as follows: "Design is the first stage of any successful system, and encompasses everything that you work on before you begin implementation." In this section, we will focus on the general principles and core aspects of each element of the system.
Two core elements should be at the forefront when designing each part of the system: the interface, or how an element of the system connects to the rest, and data storage, or how this element stores information that can be retrieved later.
Both are critical. The interface defines what the system is and its functionality from the point of view of any user. A well-designed interface hides the implementation details and provides some abstractions that allow for a consistent and comprehensive way of performing actions.
The heart of virtually every successful working system is the data. This is where the value of the system lies. Any seasoned engineer will tell you that an organization can rebuild a system whose code has been lost, as long as the data is available, but recovering from a total loss of the data is far harder, even if the application code is still available.
The storage of data is, then, the core of the system. There are many options we can choose from when it comes to storing our data. What kind of database? Store the data in one data storage facility, or several? The traditional way of using raw access to the database, typically in plain SQL statements, is not the most efficient option, and it's prone to problems when complex systems are involved. Other kinds of databases exist that don't even use SQL. We will look at multiple options along with their pros and cons.
Changing how the data is stored in the system is hard once the system is in operation. It isn't impossible but will require a lot of work. The storage option is arguably the founding stone when designing a new system, so be sure that the chosen option fits your requirements. It can be difficult to design something that isn't overly complex but also allows the allocated space to grow as the application starts to store more and more data as it's used.
This section of the book comprises the following chapters:
API Design, describing how to create useful, yet flexible, interfaces
Data Modeling, with different ways of handling and representing data to ensure that this critical aspect is well thought through from the outset
In this chapter, we will talk about the basic application programming interface (API) design principles. We will see how to start our design by defining useful abstractions that will create the foundation for the design.
We will then present the principles for RESTful interfaces, covering both the strict, academic definition and a more practical definition to help when making designs. We will look at design approaches and techniques to help create a useful API based on standard practices. We will also spend some time talking about authentication, as this is a critical element for most APIs.
We will focus in this book on RESTful interfaces, as they are the most common right now. Before that, there were other alternatives, including Remote Procedure Call (RPC) in the 1980s, a way to make a remote function call, or Simple Object Access Protocol (SOAP) in the early 2000s, which standardized the format of the remote call. Current RESTful interfaces are easier to read and take advantage of the already established usage of HTTP more strongly, although, in essence, they could potentially be integrated via these older specifications.
They are still available nowadays, although predominantly in older systems.
We will cover how to create a versioning system for the API, attending to the different use cases that can be affected.
We will see the difference between the frontend and the backend, and their interaction. Although the main objective of the chapter is to talk about API interfaces, we will also talk about HTML interfaces to see the differences and how they interact with other APIs.
Finally, we will describe the design for the example that we will use later in the book.
In this chapter, we'll cover the following topics:
Abstractions
RESTful interfaces
Authentication
Versioning the API
Frontend and backend
HTML interfaces
Designing the API for the example
Let's take a look at abstractions first.
An API allows us to use a piece of software without totally understanding all the different steps that are involved. It presents a clear menu of actions that can be performed, enabling an external user, who doesn't necessarily understand the complexities of the operation, to perform them efficiently. It presents a simplification of the process.
These actions can be purely functional, where the output is only related to the input; for example, a mathematical function that calculates the barycenter of a planet and a star, given their orbits and masses.
Alternatively, they can deal with state, as the same action repeated twice may have different effects; for example, retrieving the time in the system. Perhaps another call allows the time zone of the computer to be set, so two subsequent calls to retrieve the time may return very different results.
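A minimal Python sketch of both kinds of actions. The barycenter is simplified to two bodies at known positions on a single axis, an assumption made for brevity (the text mentions orbits, which would need full positions over time), and Clock is a hypothetical class standing in for the system time.

```python
from datetime import datetime, timedelta, timezone


def barycenter(m1, x1, m2, x2):
    """Pure action: the output depends only on the input.

    Simplified (assumption) to two bodies at positions x1 and x2 on one axis.
    """
    return (m1 * x1 + m2 * x2) / (m1 + m2)


class Clock:
    """Hypothetical stateful API: results depend on earlier calls."""

    def __init__(self):
        self._tz = timezone.utc

    def set_timezone(self, offset_hours):
        """Changes state that affects every later call to now()."""
        self._tz = timezone(timedelta(hours=offset_hours))

    def now(self):
        return datetime.now(self._tz)


print(barycenter(1.0, 0.0, 1.0, 10.0))  # 5.0: equal masses, halfway
clock = Clock()
print(clock.now())
clock.set_timezone(9)
print(clock.now())  # Same instant, reported with a different offset
```

Calling barycenter twice with the same arguments always gives the same result, while the two calls to clock.now() differ because of the set_timezone call in between.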
In both cases, the APIs are defining abstractions. Retrieving the time of the system in a single operation is simple enough, but perhaps the details of doing so are not so easy. It may involve reading some piece of hardware that keeps track of time in a specific way.
Different hardware may report the time differently, but the result should always be translated into a standard format. Time zones and daylight saving time need to be applied. All this complexity is handled by the developers of the module that exposes the API and provides a clear and understandable contract with any user. "Call this function, and the time in ISO format will be returned."
While we are mainly talking about APIs, and throughout the book we will describe mostly ones related to online services, the concept of abstractions really can be applied to anything. A web page to manage a user is an abstraction, as it defines the concept of "user account" and the associated parameters. Another omnipresent example is the "Shopping cart" for e-commerce. It's good to create a clear mental image, as it helps to create a clearer and more consistent interface for the user.
This is, of course, a simple example, but APIs can hide a tremendous amount of complexity under their interfaces. A good example to think about is a program like curl. Even when just sending an HTTP request to a URL and printing the returned headers, there is a huge amount of complexity associated with this:
$ curl -IL http://google.com
HTTP/1.1 301 Moved Permanently
Location: http://www.google.com/
Content-Type: text/html; charset=UTF-8
Date: Tue, 09 Mar 2021 20:39:09 GMT
Expires: Thu, 08 Apr 2021 20:39:09 GMT
Cache-Control: public, max-age=2592000
Server: gws
Content-Length: 219
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN

HTTP/1.1 200 OK
Content-Type: text/html; charset=ISO-8859-1
P3P: CP="This is not a P3P policy! See g.co/p3phelp for more info."
Date: Tue, 09 Mar 2021 20:39:09 GMT
Server: gws
X-XSS-Protection: 0
X-Frame-Options: SAMEORIGIN
Transfer-Encoding: chunked
Expires: Tue, 09 Mar 2021 20:39:09 GMT
Cache-Control: private
Set-Cookie: NID=211=V-jsXV6z9PIpszplstSzABT9mOSk7wyucnPzeCz-TUSfOH9_F-07V6-fJ5t9L2eeS1WI-p2G_1_zKa2Tl6nztNH-ur0xF4yIk7iT5CxCTSDsjAaasn4c6mfp3fyYXMp7q1wA2qgmT_hlYScdeAMFkgXt1KaMFKIYmp0RGvpJ-jc; expires=Wed, 08-Sep-2021 20:39:09 GMT; path=/; domain=.google.com; HttpOnly
This makes a call to google.com and displays the headers of the response using the -I flag. The -L flag makes curl automatically follow redirects, which is what is happening here: the first response is a 301 redirect to www.google.com.
Making a remote connection to a server requires a lot of different moving parts:
DNS access to translate the server address www.google.com to an actual IP address.
The communication between client and server, which involves using the TCP protocol to generate a persistent connection and guarantee the reception of the data.
Redirection based on the result from the first request, as the server returns a code pointing to another URL. This was done owing to the usage of the -L flag.
If the redirection points to an HTTPS URL, as is common, a verification and encryption layer needs to be added on top of that.
Each of these steps also makes use of other APIs to perform smaller actions, which could involve the functionality of the operating system or even calling remote servers such as the DNS one to obtain data from there.
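The same hiding of complexity applies to HTTP libraries. The following Python sketch uses urllib, which follows redirects by default, much like curl with -L. To stay self-contained and offline, it assumes a hypothetical local server on 127.0.0.1 (so no DNS lookup or TLS is involved) that answers /old with a 301 response, as google.com did in the output above.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectingHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /old answers 301, anything else 200."""

    def do_GET(self):
        if self.path == "/old":
            # Mimic the "301 Moved Permanently" response from the curl output
            self.send_response(301)
            self.send_header("Location", "/new")
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"final page"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep the example output quiet


def fetch(url):
    """A single call: the connection and the redirect stay hidden."""
    # An empty ProxyHandler ignores any proxy configured in the environment
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(url, timeout=5) as response:
        return response.status, response.geturl(), response.read()


server = HTTPServer(("127.0.0.1", 0), RedirectingHandler)  # Port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
port = server.server_address[1]

status, final_url, body = fetch(f"http://127.0.0.1:{port}/old")
print(status, final_url, body)
server.shutdown()
server.server_close()
```

The caller of fetch only sees the final status, URL and body; the intermediate 301 never surfaces.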
Here, the curl interface is used from the command line. While a strict definition of an API would exclude a human as the end user, there's not really a big difference. Good APIs should be easily testable by human users. Command-line interfaces can also be easily automated by bash scripts or other languages.
But, from the point of view of the user of curl, this is not very relevant. It is simplified to the point where a single command line with a few flags can perform a well-defined operation without worrying about the format to get data from the DNS or how to encrypt a request using SSL.
For a successful interface, the key is to create a series of abstractions and present them to the user so that they can perform actions. The most important question when designing a new API is, therefore, to decide which are the best abstractions.
When the process happens organically, the abstractions are decided mostly on the go. There is an initial idea, acknowledged as an understanding of the problem, that then gets tweaked.
For example, it's very common to start a user management system by adding different flags to the users. So, a user has permission to perform action A, and then a parameter to perform action B, and so on. By adding one flag at a time, come the tenth flag, the process becomes very confusing.
Then, a new abstraction can be used: roles and permissions. Certain kinds of users, such as admins, can perform different actions. A user has a role, and the role is what describes the related permissions.
Note that this simplifies the problem, as it's easy to understand and manage. However, moving from "an individual collection of flags" to "several roles" can be a complicated process. There is a reduction in the number of possible options. Perhaps some existing users have a peculiar combination of flags. All this needs to be handled carefully.
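To make the difference concrete, here is a small Python sketch of moving from per-user flags to roles. The flag and role names are hypothetical. The mapping function returns None for peculiar combinations so they can be reviewed by hand rather than silently gaining or losing access.

```python
# Hypothetical roles, each one a named set of permissions
ROLE_PERMISSIONS = {
    "reader": frozenset({"read_posts"}),
    "author": frozenset({"read_posts", "write_posts"}),
    "admin": frozenset({"read_posts", "write_posts", "delete_posts", "manage_users"}),
}


def role_for_flags(flags):
    """Find the role whose permissions match a user's flags exactly.

    Returns None when no role covers the combination, so the migration
    can flag it for manual handling.
    """
    wanted = frozenset(flags)
    for role, permissions in ROLE_PERMISSIONS.items():
        if permissions == wanted:
            return role
    return None


def can(role, permission):
    """With roles, a check only needs the role, not a list of flags."""
    return permission in ROLE_PERMISSIONS[role]


users = {
    "alice": {"read_posts", "write_posts"},
    "bob": {"read_posts", "delete_posts"},  # A peculiar combination
}
for name, flags in users.items():
    print(name, role_for_flags(flags))
```

Requiring an exact match is a deliberately conservative choice; mapping to the closest role instead would change some users' permissions without anyone noticing.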
While designing a new API, it is good to try to explicitly describe the inherent abstractions that the API uses to clarify them, at least at a high level. This also lets you think about the API as a user would and see if things add up.
One of the most useful viewpoints in the work of software developers is to detach yourself from your "internal view" and take the position of the actual user of the software. This is more difficult than it sounds, but it's certainly a skill worth developing. This will make you a better designer. Don't be afraid to ask a friend or coworker to detect blind spots in your design.
However, every abstraction has its limits.
When an abstraction is leaking details from the implementation, and not presenting a perfectly opaque image, it's called a leaky abstraction.
While a good API should try to avoid this, sometimes it happens. This can be caused by underlying bugs in the code serving the API, or sometimes directly by the way the code behaves in certain operations.
A common case for this is relational databases. SQL abstracts the process of searching data from how it is actually stored in the database. You can search with complex queries and get the result, and you don't need to know how the data is structured. But sometimes, you'll find out that a particular query is slow, and reorganizing the query has a big impact on its speed. This is a leaky abstraction.
This is very common, and it's the reason why there are dedicated tools to help ascertain what is going on when running a SQL query, as the query itself is very detached from the implementation. The main one is the EXPLAIN command.
Operating systems are good examples of systems that generate good abstractions that don't leak the majority of the time. Even so, there are lots of examples of leaks: not being able to read or write a file due to a lack of space (a less common problem now than three decades ago); breaking a connection with a remote server due to a network problem; or not being able to create a new connection due to reaching a limit in terms of the number of open file descriptors.
Leaky abstractions are, to a certain degree, unavoidable. They are the result of not living in a perfect world. Software is fallible. Understanding and preparing for that is critical.
"All non-trivial abstractions, to some degree, are leaky."– Joel Spolsky's Law of Leaky Abstractions
When designing an API, it is important to take this fact into account for several reasons:
To present clear errors and hints externally. A good design will always include cases for things going wrong and try to present them clearly with proper error codes or error handling.
To deal with errors that could come from dependent services internally. Dependent services can fail or have other kinds of problems. The API should abstract this to a certain degree, recovering from the problem if possible and, if recovery is impossible, failing gracefully and returning a proper result.
The best design not only handles things when they work as expected, but also prepares for unexpected problems and makes sure that they can be analyzed and corrected.
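A minimal Python sketch of both points: transient failures of a dependency are retried internally, and when recovery is impossible the caller gets a clear error with a code. ServiceUnavailable, APIError and the fetch callable are hypothetical names, and the retry count and delay are illustrative values.

```python
import time


class ServiceUnavailable(Exception):
    """Raised by a (hypothetical) dependent service when it fails."""


class APIError(Exception):
    """Error presented to API clients, with an explicit status code."""

    def __init__(self, status, message):
        super().__init__(message)
        self.status = status
        self.message = message


def call_dependency(fetch, retries=3, delay=0.1):
    """Call a dependent service, hiding transient failures when possible."""
    for attempt in range(1, retries + 1):
        try:
            return fetch()
        except ServiceUnavailable:
            if attempt == retries:
                # Recovery is impossible: fail gracefully with a clear error
                raise APIError(503, "Dependent service unavailable, retry later")
            time.sleep(delay)  # Wait a little before the next attempt


attempts = []


def flaky():
    """Hypothetical dependency that fails twice before answering."""
    attempts.append(1)
    if len(attempts) < 3:
        raise ServiceUnavailable()
    return {"status": "ok"}


print(call_dependency(flaky, delay=0))
```

The caller of call_dependency never sees ServiceUnavailable; it either gets a result or an APIError that is meaningful outside the service.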
A very useful pattern to consider when designing an API is to produce a set of resources that can perform actions. This pattern uses two kinds of elements: resources and actions.
Resources are passive elements that are referenced, while actions are performed on resources.
For example, let's define a very simple interface to play a simple game guessing coin tosses. This is a game consisting of three guesses for three coin tosses, and the user wins if at least two of these guesses are correct.
The resource and actions may be as follows:
Resource   Actions  Details
HEADS      None     A coin toss result.
TAILS      None     A coin toss result.
GAME       START    Start a new GAME.
GAME       READ     Returns the current round (1 to 3) and the current correct guesses.
COIN_TOSS  TOSS     Toss the coin. If the GUESS hasn't been produced, it returns an error.
COIN_TOSS  GUESS    Accepts HEADS or TAILS as the guess.
COIN_TOSS  RESULT   Returns HEADS or TAILS and whether the GUESS was correct.
A possible sequence for a single game could be:
GAME START > (GAME 1)
GAME 1 COIN_TOSS GUESS HEADS
GAME 1 COIN_TOSS TOSS
GAME 1 COIN_TOSS RESULT > (TAILS, INCORRECT)
GAME 1 COIN_TOSS GUESS HEADS
GAME 1 COIN_TOSS TOSS
GAME 1 COIN_TOSS RESULT > (HEADS, CORRECT)
GAME 1 READ > (ROUND 2, 1 CORRECT, IN PROCESS)
GAME 1 COIN_TOSS GUESS HEADS
GAME 1 COIN_TOSS TOSS
GAME 1 COIN_TOSS RESULT > (HEADS, CORRECT)
GAME 1 READ > (ROUND 3, 2 CORRECT, YOU WIN)
Note how each resource has its own set of actions that can be performed. Actions can be repeated if that's convenient, but it's not required. Resources can be combined into a hierarchical representation (like here, where COIN_TOSS depends on a higher GAME resource). Actions can require parameters that can be other resources.
However, the abstractions are organized around having a consistent set of resources and actions. This way of explicitly organizing an API is useful as it clarifies what is passive and what's active in the system.
Object-oriented programming (OOP) uses these abstractions, as everything is an object that can receive messages to perform some actions. Functional programming, on the other hand, doesn't fit neatly into this structure, as "actions" can work like resources.
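As an object-oriented sketch, the coin toss game could map resources to classes and actions to methods. This is one possible Python translation under stated assumptions, not a prescribed design: FixedResults is a hypothetical stand-in for a random generator so that the game sequence shown earlier can be replayed, and READ reports the number of completed rounds, matching that sequence.

```python
import random
from enum import Enum


class Side(Enum):
    HEADS = "HEADS"
    TAILS = "TAILS"


class GameError(Exception):
    """Raised when an action is performed out of order."""


class CoinToss:
    """COIN_TOSS resource, nested under a GAME resource."""

    def __init__(self, rng):
        self._rng = rng
        self.guess = None
        self.result = None

    def guess_side(self, side):
        """GUESS action: accepts HEADS or TAILS (as a Side or its name)."""
        self.guess = Side(side)

    def toss(self):
        """TOSS action: returns an error if no GUESS has been produced."""
        if self.guess is None:
            raise GameError("GUESS is required before TOSS")
        self.result = self._rng.choice(list(Side))

    def read_result(self):
        """RESULT action: the side and whether the GUESS was correct."""
        if self.result is None:
            raise GameError("TOSS is required before RESULT")
        return self.result, self.result == self.guess


class Game:
    """GAME resource; creating an instance is the START action."""

    ROUNDS = 3

    def __init__(self, rng=None):
        self._rng = rng or random.Random()
        self.tosses = []

    def coin_toss(self):
        """Access the current COIN_TOSS, starting a new round when needed."""
        if self.tosses and self.tosses[-1].result is None:
            return self.tosses[-1]
        if len(self.tosses) >= self.ROUNDS:
            raise GameError("the game is over")
        self.tosses.append(CoinToss(self._rng))
        return self.tosses[-1]

    def read(self):
        """READ action: completed rounds, correct guesses and status."""
        done = [t for t in self.tosses if t.result is not None]
        correct = sum(t.result == t.guess for t in done)
        if len(done) < self.ROUNDS:
            status = "IN PROCESS"
        else:
            status = "YOU WIN" if correct >= 2 else "YOU LOSE"
        return len(done), correct, status


class FixedResults:
    """Hypothetical stand-in for random.Random that replays known results."""

    def __init__(self, results):
        self._results = iter(results)

    def choice(self, options):
        return next(self._results)


# Replays the sequence shown above: TAILS, HEADS, HEADS with HEADS guesses
game = Game(rng=FixedResults([Side.TAILS, Side.HEADS, Side.HEADS]))
for _ in range(Game.ROUNDS):
    toss = game.coin_toss()
    toss.guess_side("HEADS")
    toss.toss()
    print(toss.read_result(), game.read())
```

Resources stay passive objects holding state, and every change happens through an explicit action method, which is the split the pattern aims for.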
This is a common pattern, and it's used in RESTful interfaces, as we will see next.
RESTful interfaces are incredibly common these days, and for good reason. They've become the de facto standard in web services that serve other applications.
Representational State Transfer (REST) was defined in 2000 in a Ph.D. dissertation by Roy Fielding, and it uses HTTP standards as a basis to create a definition of a software architecture style.
For a system to be considered RESTful, it should follow certain rules:
Client-server architecture. It works through remote calling.
Stateless. All the information related to a particular request should be contained in the request itself, making it independent from the specific server serving the request.
Cacheability. The cacheability of the responses should be clear, either to say they are cacheable or not.
Layered system. The client cannot tell if they are connected to a final server or if there's an intermediate server.
Uniform interface, with four prerequisites:
Resource identification in requests, meaning a resource is unequivocally represented, and its representation is independent.
Resource manipulation through representations, allowing clients to have all the required information to make changes when they have the representation.
Self-descriptive messages, meaning messages are complete in themselves.
Hypermedia as the Engine of Application State, meaning the client can walk through the system using referenced hyperlinks.
Code on demand. This is an optional requirement, and it's normally not used. Servers can submit code in response to help perform operations or improve the client; for example, submitting JavaScript to be executed in the browser.
This is the most formal definition. As you can see, it's not necessarily based on HTTP requests. For more convenient usage, we need to limit the possibilities somewhat and set a common framework.
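To make the stateless and resource-identification constraints more concrete, here is a framework-free Python sketch of how they usually look when mapped onto HTTP methods, URLs and JSON. Requests are plain dicts instead of real HTTP messages, and the routes, token and data are hypothetical; a real service would use a web framework and proper authentication.

```python
import json
import re

# Hypothetical in-memory data and credentials, for illustration only
POSTS = {1: {"id": 1, "user": "alice", "text": "Hello"}}
TOKENS = {"secret-token": "alice"}
ROUTES = []


def route(method, pattern):
    """Register a handler for an HTTP method and a resource URL pattern."""
    def register(func):
        ROUTES.append((method, re.compile(pattern), func))
        return func
    return register


def authenticated_user(request):
    # Stateless: credentials travel with every request, no server session
    header = request["headers"].get("Authorization", "")
    return TOKENS.get(header.removeprefix("Bearer "))


@route("GET", r"/api/posts/(\d+)")
def read_post(request, post_id):
    post = POSTS.get(int(post_id))
    return (200, post) if post else (404, {"error": "post not found"})


@route("POST", r"/api/posts")
def create_post(request):
    user = authenticated_user(request)
    if user is None:
        return 401, {"error": "authentication required"}
    new_id = max(POSTS, default=0) + 1
    POSTS[new_id] = {"id": new_id, "user": user, "text": request["json"]["text"]}
    return 201, POSTS[new_id]


def handle(request):
    """Dispatch a request dict and return (status, JSON body)."""
    for method, pattern, func in ROUTES:
        match = pattern.fullmatch(request["path"])
        if match and method == request["method"]:
            status, payload = func(request, *match.groups())
            return status, json.dumps(payload)
    return 404, json.dumps({"error": "unknown resource"})


print(handle({"method": "GET", "path": "/api/posts/1", "headers": {}}))
print(handle({"method": "POST", "path": "/api/posts", "headers": {},
              "json": {"text": "Hi"}}))
print(handle({"method": "POST", "path": "/api/posts",
              "headers": {"Authorization": "Bearer secret-token"},
              "json": {"text": "Hi"}}))
```

Each request carries everything needed to process it, so any server instance holding the same data could answer it; that is the property the Stateless constraint protects.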
When people talk colloquially about RESTful interfaces, normally they are understood as interfaces based on HTTP resources using JSON formatted requests. This is wholly compatible with the definition that we've seen before, but taking some key elements into consideration.
These key elements are sometimes ignored, leading to pseudo-RESTful interfaces, which don't have the same properties.
The main one is that
