Mastering Concurrency in Python E-Book

Quan Nguyen

Publisher: Packt Publishing
Category: Professional literature
Language: English

Description

Python is one of the most popular programming languages, with numerous libraries and frameworks that facilitate high-performance computing. Concurrency and parallelism in Python are essential when it comes to multiprocessing and multithreading; they behave differently, but their common aim is to reduce the execution time. This book serves as a comprehensive introduction to various advanced concepts in concurrent engineering and programming.
Mastering Concurrency in Python starts by introducing the concepts and principles in concurrency, right from Amdahl's Law to multithreading programming, followed by elucidating multiprocessing programming, web scraping, and asynchronous I/O, together with common problems that engineers and programmers face in concurrent programming. Next, the book covers a number of advanced concepts in Python concurrency and how they interact with the Python ecosystem, including the Global Interpreter Lock (GIL). Finally, you'll learn how to solve real-world concurrency problems through examples.
By the end of the book, you will have gained extensive theoretical knowledge of concurrency and the ways in which concurrency is supported by the Python language.

Details

Page count: 551

Year of publication: 2018

Mastering Concurrency in Python

Create faster programs using concurrency, asynchronous, multithreading, and parallel programming

Quan Nguyen

BIRMINGHAM - MUMBAI

Mastering Concurrency in Python

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Richa Tripathi

Acquisition Editor: Shahnish Khan

Content Development Editor: Zeeyan Pinheiro

Technical Editor: Romy Dias

Copy Editor: Safis Editing

Project Coordinator: Vaidehi Sawant

Proofreader: Safis Editing

Indexer: Rekha Nair

Graphics: Alishon Mendonsa

Production Coordinator: Aparna Bhagat

First published: November 2018

Production reference: 1231118

Published by Packt Publishing Ltd., Livery Place, 35 Livery Street, Birmingham B3 2PB, UK.

ISBN 978-1-78934-305-2

www.packtpub.com

To Tiffany, my incredible mentor and friend. Your guidance and friendship made all of this possible

– Quan Nguyen

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

Packt.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Quan Nguyen is a Python enthusiast and data scientist. He is currently a data analysis engineer at Micron Technology, Inc. With a strong background in mathematics and statistics, Quan is interested in the fields of scientific computing and machine learning. With data analysis being his focus, Quan also enjoys incorporating technology automation into everyday tasks through programming.

Quan's passion for Python programming has led him to be heavily involved in the Python community. He started as a primary contributor for the book Python for Scientists and Engineers and various open source projects on GitHub. Quan is also a writer for the Python Software Foundation and an occasional content contributor for DataScience.com (part of Oracle).

I'm grateful to my parents for their unwavering support. Special thanks to my sister, who somehow always managed to remind me of the truly important things in life. To aunt Y and uncle Nam: thank you for helping me in ways I never knew I needed.

A big thanks to my friends at Sigma Nu for always pushing me forward. To Karan, who started this amazing journey. Thank you, Zeeyan and Romy, for your dedication. To technical reviewers, for your insightful feedback.

About the reviewers

Romain Picard is currently a data science engineer. He has been working in the digital TV and telecommunications industry for 20 years. His daily work consists of data manipulation, machine learning model training, and model deployment. Most of these tasks are based on Python code.

He was previously a media software architect and a software developer. In these previous positions, he designed and developed TV and OTT players that have been used in millions of set-top boxes. Romain is especially interested in algorithms, and is constantly hunting for the most effective algorithm for each given use case.

Yogendra Sharma is a developer with experience of the architecture, design, and development of scalable and distributed applications. He was awarded a bachelor's degree from Rajasthan Technical University in computer science. With a core interest in microservices and Spring, he also has hands-on experience with technologies such as AWS Cloud, Python, J2EE, Node.js, JavaScript, Angular, MongoDB, and Docker. Currently, he works as an IoT and cloud architect at Intelizign Engineering Services, Pune.

Simone Marzola is a software engineer and technical lead with 10 years of experience. He is passionate about Python and machine learning, which have led him to be an active contributor in open source communities such as Mozilla Services and the Pylons Project, as well as involvement in European conferences as a speaker. Simone has been a lecturer on the BIG DIVE data science and machine learning course. He is currently a CTO and Scrum Master at Oval Money.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Title Page

Mastering Concurrency in Python

Dedication

About Packt

Why subscribe?

Packt.com

Contributors

About the author

About the reviewers

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download the color images

Code in Action

Conventions used

Get in touch

Reviews

Advanced Introduction to Concurrent and Parallel Programming

Technical requirements

What is concurrency?

Concurrent versus sequential

Example 1 – checking whether a non-negative number is prime

Concurrent versus parallel

A quick metaphor

Not everything should be made concurrent

Embarrassingly parallel

Inherently sequential

Example 2 – inherently sequential tasks

I/O bound

The history, present, and future of concurrency

The history of concurrency

The present

The future

A brief overview of mastering concurrency in Python

Why Python?

Setting up your Python environment

General setup

Downloading example code

Summary

Questions

Further reading

Amdahl's Law

Technical requirements

Amdahl's Law

Terminology

Formula and interpretation

The formula for Amdahl's Law

A quick example

Implications

Amdahl's Law's relationship to the law of diminishing returns

How to simulate in Python

Practical applications of Amdahl's Law

Summary

Questions

Further reading

Working with Threads in Python

Technical requirements

The concept of a thread

Threads versus processes

Multithreading

An example in Python

An overview of the threading module

The thread module in Python 2

The threading module in Python 3

Creating a new thread in Python

Starting a thread with the thread module

Starting a thread with the threading module

Synchronizing threads

The concept of thread synchronization

The threading.Lock class

An example in Python

Multithreaded priority queue

A connection between real-life and programmatic queues

The queue module

Queuing in concurrent programming

Multithreaded priority queue

Summary

Questions

Further reading

Using the with Statement in Threads

Technical requirements

Context management

Starting from managing files

The with statement as a context manager

The syntax of the with statement

The with statement in concurrent programming

Example of deadlock handling

Summary

Questions

Further reading

Concurrent Web Requests

Technical requirements

The basics of web requests

HTML

HTTP requests

HTTP status code

The requests module

Making a request in Python

Running a ping test

Concurrent web requests

Spawning multiple threads

Refactoring request logic

The problem of timeout

Support from httpstat.us and simulation in Python

Timeout specifications

Good practices in making web requests

Consider the terms of service and data-collecting policies

Error handling

Update your program regularly

Avoid making a large number of requests

Summary

Questions

Further reading

Working with Processes in Python

Technical requirements

The concept of a process

Processes versus threads

Multiprocessing

Introductory example in Python

An overview of the multiprocessing module

The process class

The Pool class

Determining the current process, waiting, and terminating processes

Determining the current process

Waiting for processes

Terminating processes

Interprocess communication

Message passing for a single worker

Message passing between several workers

Summary

Questions

Further reading

Reduction Operators in Processes

Technical requirements

The concept of reduction operators

Properties of a reduction operator

Examples and non-examples

Example implementation in Python

Real-life applications of concurrent reduction operators

Summary

Questions

Further reading

Concurrent Image Processing

Technical requirements

Image processing fundamentals

Python as an image processing tool

Installing OpenCV and NumPy

Computer image basics

RGB values

Pixels and image files

Coordinates inside an image

OpenCV API

Image processing techniques

Grayscaling

Thresholding

Applying concurrency to image processing

Good concurrent image processing practices

Choosing the correct way (out of many)

Spawning an appropriate number of processes

Processing input/output concurrently

Summary

Questions

Further reading

Introduction to Asynchronous Programming

Technical requirements

A quick analogy

Asynchronous versus other programming models

Asynchronous versus synchronous programming

Asynchronous versus threading and multiprocessing

An example in Python

Summary

Questions

Further reading

Implementing Asynchronous Programming in Python

Technical requirements

The asyncio module

Coroutines, event loops, and futures

Asyncio API

The asyncio framework in action

Asynchronously counting down

A note about blocking functions

Asynchronous prime-checking

Improvements from Python 3.7

Inherently blocking tasks

concurrent.futures as a solution for blocking tasks

Changes in the framework

Examples in Python

Summary

Questions

Further reading

Building Communication Channels with asyncio

Technical requirements

The ecosystem of communication channels

Communication protocol layers

Asynchronous programming for communication channels

Transports and protocols in asyncio

The big picture of asyncio's server client

Python example

Starting a server

Installing Telnet

Simulating a connection channel

Sending messages back to clients

Closing the transports

Client-side communication with aiohttp

Installing aiohttp and aiofiles

Fetching a website's HTML code

Writing files asynchronously

Summary

Questions

Further reading

Deadlocks

Technical requirements

The concept of deadlock

The Dining Philosophers problem

Deadlock in a concurrent system

Python simulation

Approaches to deadlock situations

Implementing ranking among resources

Ignoring locks and sharing resources

An additional note about locks

Concluding note on deadlock solutions

The concept of livelock

Summary

Questions

Further reading

Starvation

Technical requirements

The concept of starvation

What is starvation?

Scheduling

Causes of starvation

Starvation's relationship to deadlock

The readers-writers problem

Problem statement

The first readers-writers problem

The second readers-writers problem

The third readers-writers problem

Solutions to starvation

Summary

Questions

Further reading

Race Conditions

Technical requirements

The concept of race conditions

Critical sections

How race conditions occur

Simulating race conditions in Python

Locks as a solution to race conditions

The effectiveness of locks

Implementation in Python

The downside of locks

Turning a concurrent program sequential

Locks do not lock anything

Race conditions in real life

Security

Operating systems

Networking

Summary

Questions

Further reading

The Global Interpreter Lock

Technical requirements

An introduction to the Global Interpreter Lock

An analysis of memory management in Python

The problem that the GIL addresses

Problems raised by the GIL

The potential removal of the GIL from Python

How to work with the GIL

Implementing multiprocessing, rather than multithreading

Getting around the GIL with native extensions

Utilizing a different Python interpreter

Summary

Questions

Further reading

Designing Lock-Based and Mutex-Free Concurrent Data Structures

Technical requirements

Lock-based concurrent data structures in Python

LocklessCounter and race conditions

Embedding locks in the data structure of the counter

The concept of scalability

Analysis of the scalability of the counter data structure

Approximate counters as a solution for scalability

The idea behind approximate counters

Implementing approximate counters in Python

A few considerations for approximate counter designs

Mutex-free concurrent data structures in Python

The impossibility of being lock-free in Python

Introduction to the network data structure

Implementing a simple network data structure in Python and race conditions

RCU as a solution

Building on simple data structures

Summary

Questions

Further reading

Memory Models and Operations on Atomic Types

Technical requirements

Python memory model

The components of Python memory manager

Memory model as a labeled directed graph

In the context of concurrency

Atomic operations in Python

What does it mean to be atomic?

The GIL reconsidered

Innate atomicity in Python

Atomic versus nonatomic

Simulation in Python

Summary

Questions

Further reading

Building a Server from Scratch

Technical requirements

Low-level network programming via the socket module

The theory of server-side communication

The API of the socket module

Building a simple echo server

Building a calculator server with the socket module

The underlying calculation logic

Implementing the calculator server

Building a non-blocking server

Analyzing the concurrency of the server

Generators in Python

Asynchronous generators and the send method

Making the server non-blocking

Summary

Questions

Further reading

Testing, Debugging, and Scheduling Concurrent Applications

Technical requirements

Scheduling with APScheduler

Installing APScheduler

Not a scheduling service

APScheduler functionalities

APScheduler API

Scheduler classes

Executor classes

Trigger keywords

Common scheduler methods

Examples in Python

Blocking scheduler

Background scheduler

Executor pool

Running on the cloud

Testing and concurrency in Python

Testing concurrent programs

Unit testing

Static code analysis

Testing programs concurrently

Debugging concurrent programs

Debugging tools and techniques

Debugging and concurrency

Summary

Questions

Further reading

Assessments

Chapter 1

Chapter 2

Chapter 3

Chapter 4

Chapter 5

Chapter 6

Chapter 7

Chapter 8

Chapter 9

Chapter 10

Chapter 11

Chapter 12

Chapter 13

Chapter 14

Chapter 15

Chapter 16

Chapter 17

Chapter 18

Chapter 19

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Concurrency can be notoriously difficult to get right, but fortunately, the Python programming language makes working with concurrency tractable and easy. This book shows how Python can be used to program high-performance, robust, concurrent programs with its unique form of programming.

Designed for any curious developer with an interest in building fast, non-blocking, and resource-thrifty systems applications, this book will cover the best practices and patterns to help you incorporate concurrency into your systems. Additionally, emerging topics in Python concurrent programming will be discussed, including the new AsyncIO syntax, the widely accepted view that "locks don't lock anything," the use of atomic message queues, concurrent application architecture, and best practices.

We will tackle complex concurrency concepts and models via hands-on and engaging code examples. Having read this book, you will have gained a deep understanding of the principal components in the Python concurrency ecosystem, as well as a practical appreciation of different approaches to a real-life concurrency problem.

Who this book is for

If you're a developer who's familiar with Python and you want to learn to build high-performance applications that scale by leveraging single-core, multi-core, or distributed concurrency, then this book is for you.

What this book covers

Chapter 1, Advanced Introduction to Concurrent and Parallel Programming, introduces you to the concept of concurrency, and demonstrates an instance in which concurrent programming can significantly improve the speed of a Python program.

Chapter 2, Amdahl's Law, takes a theoretical approach and discusses the limitations of concurrency in improving the speed of applications. We will take a look at what concurrency truly provides and how we can best incorporate it.

Chapter 3, Working with Threads in Python, introduces the formal definition of threading and covers a different approach to implementing threading in a Python program. In this chapter, we will also discuss a major element in concurrent programming—the concept of synchronization.

Chapter 4, Using the with Statement in Threads, combines the concept of context management with threading in the overall context of concurrent programming in Python. We will be introduced to the main idea behind context management and how it is used in various programming practices, including threading.

Chapter 5, Concurrent Web Requests, covers one of the main applications of concurrent programming: web scraping. It also covers the concept of web scraping, along with other relevant elements, before discussing how threading can be applied to web scraping programs in order to achieve significant speedup.

Chapter 6, Working with Processes in Python, shows the formal definition of multiprocessing and how Python supports it. We will also learn more about the key differences between threading and multiprocessing, which are often confused with one another.

Chapter 7, Reduction Operators in Processes, pairs the concepts of reduction operations and multiprocessing together as a concurrent programming practice. This chapter will go over the theoretical foundation of reduction operations and how it is relevant to multiprocessing as well as programming in general.

Chapter 8, Concurrent Image Processing, goes into a specific application of concurrency: image processing. The basic ideas behind image processing, in addition to some of the most common processing techniques, are discussed. We will, of course, see how concurrency, specifically multiprocessing, can speed up the task of image processing.

Chapter 9, Introduction to Asynchronous Programming, considers the formal concept of asynchronous programming as one of the three major concurrent programming models, alongside threading and multiprocessing. We will learn how asynchronous programming is fundamentally different from the other two, but can still speed up concurrent applications.

Chapter 10, Implementing Asynchronous Programming in Python, goes in depth into the API that Python provides to facilitate asynchronous programming. Specifically, we will learn about the asyncio module, which is the main tool for implementing asynchronous programming in Python, and the general structure of an asynchronous application.

Chapter 11, Building Communication Channels with asyncio, combines the knowledge of asynchronous programming covered in previous chapters with the topic of network communication. Specifically, we will look into using the aiohttp module as a tool to make asynchronous HTTP requests to web servers, as well as the aiofiles module, which implements asynchronous file reading and writing.

Chapter 12, Deadlocks, introduces the first of the problems that are commonly faced in concurrent programming. We will learn about the classical dining philosophers problem as an example of how deadlocks can cause concurrent programs to stop functioning. This chapter will also cover a number of potential approaches to deadlocks as well as relevant concepts, such as livelocks and distributed deadlocks.

Chapter 13, Starvation, considers another common problem in concurrent applications. The chapter uses the narrative of the classical readers-writers problem to explain the concept of starvation and its causes. We will, of course, also discuss potential solutions to these problems via hands-on examples in Python.

Chapter 14, Race Conditions, addresses arguably the most well-known concurrency problem: race conditions. We will also discuss the concept of a critical section, which is an essential element in the context of race conditions specifically, and concurrent programming in general. The chapter will then cover mutual exclusion as a potential solution for this problem.

Chapter 15, The Global Interpreter Lock, introduces the infamous GIL, which is considered the biggest challenge in concurrent programming in Python. We will learn about the reason behind the GIL's implementation and the problems that it raises. This chapter concludes with some thoughts regarding how Python programmers and developers should think about and interact with the GIL.

Chapter 16, Designing Lock-Based and Mutex-Free Concurrent Data Structures, analyzes the process of designing two common concurrent data structures involving locks as a synchronization mechanism: lock-based and mutex-free. Several advanced analyses of the implementation of the data structures, as well as the performance thereof, are incorporated into the chapter so that readers will develop a critical mindset when it comes to designing concurrent applications.

Chapter 17, Memory Models and Operations on Atomic Types, includes theoretical topics that involve the underlying structure of the Python language and how programmers can take advantage of that in their concurrent applications. The concept of atomic operations is also introduced to readers in this chapter.

Chapter 18, Building a Server from Scratch, walks readers through the process of building a non-blocking server on a low level. We will learn about network programming functionalities that the socket module in Python provides and how we can use them to implement a functioning server. We will also apply the general structure of an asynchronous program discussed earlier in the book to convert a blocking server into a non-blocking one.

Chapter 19, Testing, Debugging, and Scheduling Concurrent Applications, covers higher-level uses of concurrent programs. The chapter will first cover how concurrency can be applied to the task of scheduling Python applications via the APScheduler module. We will then discuss the complexities that arise from concurrency in the topics of testing and debugging Python programs.

To get the most out of this book

Readers of this book should know how to execute Python programs in a development environment, or simply from a command prompt. They should also be familiar with general syntax and practices in Python programming (variables, functions, importing packages, and so on). Some basic computer science knowledge of elements such as pixels, the execution stack, and bytecode instructions is assumed at various points throughout this book.

The final section of Chapter 1, Advanced Introduction to Concurrent and Parallel Programming, covers the process of getting your Python environment set up. Chapters in this book might discuss the use of external libraries or tools that have to be installed via a package manager such as pip or Anaconda; specific instructions on how to install those libraries are included in the corresponding chapters.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Visit www.packt.com.

Select the SUPPORT tab.

Click on Code Downloads & Errata.

Enter the name of the book in the box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Concurrency-in-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781789343052_ColorImages.pdf.

Code in Action

Visit the following link to check out videos of the code being run: http://bit.ly/2BsvQj6

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in, and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Advanced Introduction to Concurrent and Parallel Programming

This first chapter of Mastering Concurrency in Python will provide an overview of what concurrent programming is (in contrast to sequential programming). We will briefly discuss the differences between a program that can be made concurrent and one that cannot. We will go over the history of concurrent engineering and programming, and we will provide a number of examples of how concurrent programming is used in the present day. Finally, we will give a brief introduction to the approach that will be taken in this book, including an outline of the chapter structure and detailed instructions for how to download the code and create a working Python environment.

The following topics will be covered in this chapter:

The concept of concurrency

Why some programs cannot be made concurrent, and how to differentiate them from programs that can

The history of concurrency in computer science: how it is used in the industry today, and what can be expected in the future

The specific topics that will be covered in each section/chapter of the book

How to set up a Python environment, and how to check out/download code from GitHub

Technical requirements

Check out the following video to see the Code in Action: http://bit.ly/2TAMAeR

What is concurrency?

It is estimated that the amount of data that needs to be processed by computer programs doubles every two years. The International Data Corporation (IDC), for example, estimates that, by 2020, there will be 5,200 GB of data for every person on earth. With this staggering volume of data come insatiable demands for computing power, and, while numerous computing techniques are being developed and utilized every day, concurrent programming remains one of the most prominent ways to effectively and accurately process data.

While some might be intimidated when the word concurrency appears, the notion behind it is quite intuitive, and it is very common, even in a non-programming context. However, this is not to say that concurrent programs are as simple as sequential ones; they are indeed more difficult to write and understand. Yet, once a correct and effective concurrent structure is achieved, significant improvement in execution time will follow, as you will see later on.

Concurrent versus sequential

Perhaps the most obvious way to understand concurrent programming is to compare it to sequential programming. While a sequential program is in one place at a time, in a concurrent program, different components are in independent, or semi-independent, states. This means that components in different states can be executed independently, and therefore at the same time (as the execution of one component does not depend on the result of another). The following diagram illustrates the basic differences between these two types:

Difference between concurrent and sequential programs

One immediate advantage of concurrency is an improvement in execution time. Again, since some tasks are independent and can therefore be completed at the same time, less time is required for the computer to execute the whole program.
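As a rough illustration of this advantage, the following minimal sketch runs the same four independent tasks first one after another and then at the same time. The task function and its 0.1-second delay are hypothetical stand-ins for real work, and the thread pool from Python's standard concurrent.futures module is just one of several ways to run tasks concurrently.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def task(name, delay=0.1):
    """Hypothetical independent unit of work; the sleep stands in for real work."""
    time.sleep(delay)
    return name


names = ["a", "b", "c", "d"]

# Sequential: each task starts only after the previous one has finished.
start = time.perf_counter()
sequential_results = [task(name) for name in names]
sequential_time = time.perf_counter() - start

# Concurrent: the four independent tasks are in progress at the same time.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(names)) as executor:
    concurrent_results = list(executor.map(task, names))
concurrent_time = time.perf_counter() - start

print(f"Sequential: {sequential_time:.2f} seconds")
print(f"Concurrent: {concurrent_time:.2f} seconds")
```

Because the tasks do not depend on each other, the concurrent version should take roughly as long as a single task, while the sequential version takes about as long as all four combined.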

Concurrent versus parallel

At this point, if you have had some experience in parallel programming, you might be wondering whether concurrency is any different from parallelism. The key difference is that, in parallel programs, a number of processing flows (mainly CPUs and cores) work independently, all at once, whereas in concurrent programs, different processing flows (mostly threads) might access and use a shared resource at the same time.

Since this shared resource can be read and overwritten by any of the different processing flows, some form of coordination is required at times, when the tasks that need to be executed are not entirely independent from one another. In other words, it is important for some tasks to be executed after the others, to ensure that the programs will produce the correct results.

Difference between concurrency and parallelism

The preceding figure illustrates the difference between concurrency and parallelism: while in the upper section, parallel activities (in this case, cars) that do not interact with each other can run at the same time, in the lower section, some tasks have to wait for others to finish before they can be executed.
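The kind of coordination that a shared resource requires can be sketched with a lock from Python's standard threading module. In this minimal, hypothetical example, the shared resource is a counter stored in a dictionary, and four threads each increment it 10,000 times; the names state and increment are illustrative only.

```python
import threading

state = {"counter": 0}  # the shared resource
lock = threading.Lock()


def increment(times):
    for _ in range(times):
        # Only one thread at a time may read and update the counter.
        with lock:
            state["counter"] += 1


threads = [threading.Thread(target=increment, args=(10_000,)) for _ in range(4)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(state["counter"])  # 40000
```

While one thread holds the lock, the others have to wait, just like the tasks in the lower section of the figure.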

We will look at more examples of these distinctions later on.

A quick metaphor

Concurrency is a quite difficult concept to fully grasp immediately, so let's consider a quick metaphor, in order to make concurrency and its differences from parallelism easier to understand.

Although some neuroscientists might disagree, let's briefly assume that different parts of the human brain are responsible for performing separate, exclusive body part actions and activities. For example, the left hemisphere of the brain controls the right side of the body, and hence, the right hand (and vice versa); or, one part of the brain might be responsible for writing, while another solely processes speaking.

Now, let's consider the first example, specifically. If you want to move your left hand, the right side of your brain (and only the right side) has to process that command to move, which means that the left side of your brain is free to process other information. So, it is possible to move and use the left and right hands at the same time, in order to do different things. Similarly, it is possible to be writing and talking at the same time.

That is parallelism: where different processes don't interact with, and are independent of, each other. Remember that concurrency is not quite like parallelism. Even though there are instances where processes are executed together, concurrency also involves sharing the same resources. If parallelism is similar to using your left and right hands for independent tasks at the same time, concurrency can be associated with juggling, where the two hands perform different tasks simultaneously, but they also interact with the same object (in this case, the juggling balls), and some form of coordination between the two hands is therefore required.

Not everything should be made concurrent

Not all programs are created equal: some can be made parallel or concurrent relatively easily, while others are inherently sequential, and thus cannot be executed concurrently, or in parallel. An extreme example of the former is embarrassingly parallel programs, which can be divided into different parallel tasks, between which there is little or no dependency or need for communication.

Embarrassingly parallel

A common example of an embarrassingly parallel program is the 3D video rendering handled by a graphics processing unit, where each frame or pixel can be processed with no interdependency. Password cracking is another embarrassingly parallel task that can easily be distributed on CPU cores. In a later chapter, we will tackle a number of similar problems, including image processing and web scraping, which can be made concurrent/parallel intuitively, resulting in significantly improved execution times.
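To give a feel for why password cracking splits up so easily, here is a minimal sketch under simplifying assumptions: the password is a hypothetical three-letter lowercase string, it is stored as a SHA-256 digest, and the search space is divided by first letter. Each call to the illustrative search function works on its own slice and never communicates with the others. A thread pool is used here only to keep the sketch short; for CPU-heavy work in CPython, a process pool would usually be the more natural choice.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor
from itertools import product
from string import ascii_lowercase

# Hypothetical target: the digest of an unknown three-letter password.
target = hashlib.sha256(b"cab").hexdigest()


def search(first_letter):
    """Check every three-letter candidate that starts with first_letter."""
    for rest in product(ascii_lowercase, repeat=2):
        candidate = first_letter + "".join(rest)
        if hashlib.sha256(candidate.encode()).hexdigest() == target:
            return candidate
    return None


# One independent task per first letter; no task needs another's result.
with ThreadPoolExecutor() as executor:
    hits = [hit for hit in executor.map(search, ascii_lowercase) if hit]

print(hits)  # ['cab']
```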

Inherently sequential

In opposition to embarrassingly parallel tasks, the execution of some tasks depends heavily on the results of others. In other words, those tasks are not independent, and thus, cannot be made parallel or concurrent. Furthermore, if we were to try to implement concurrency into those programs, it could cost us more execution time to produce the same results. Let's go back to our prime-checking example from earlier; the following is the output that we saw:

> python example1.py

Result 1: [10000000000037, 10000000000051, 10000000000099, 10000000000129, 10000000000183, 10000000000259, 10000000000267, 10000000000273, 10000000000279, 10000000000283, 10000000000313, 10000000000343, 10000000000391, 10000000000411, 10000000000433, 10000000000453]

Took: 3.41 seconds.

Result 2: [10000000000183, 10000000000037, 10000000000129, 10000000000273, 10000000000259, 10000000000343, 10000000000051, 10000000000267, 10000000000279, 10000000000099, 10000000000283, 10000000000313, 10000000000391, 10000000000433, 10000000000411, 10000000000453]

Took: 2.33 seconds.

Pay close attention, and you will see that the two results from the two methods are not identical; the primes in the second result list are out of order. (Recall that, in the second method, to apply concurrency we specified splitting the tasks into different groups to be executed simultaneously, and the order of the results we obtained is the order in which each task finished being executed.) This is a direct result of using concurrency in our second method: we split the tasks to be executed by the program into different groups, and our program processed the tasks in these groups at the same time.

Since tasks across different groups were executed simultaneously, there were tasks that were behind other tasks in the input list, and yet were executed before those other tasks. For example, the number 10000000000183 was behind the number 10000000000129 in our input list, but was processed prior to, and therefore in front of, the number 10000000000129 in our output list. In fact, if you execute the program again and again, the second result will vary in almost every run.

Evidently, this situation is not desirable if the result we'd like to obtain needs to be in the order of the input we originally had. Of course, in this example, we can simply modify the result by using some form of sorting, but it will cost us extra execution time in the end, which might make it even more expensive than the original sequential approach.
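The difference between completion order and input order can be reproduced with a small sketch. It is not the prime-checking program itself; slow_identity is a hypothetical task whose running time equals its input, so that later inputs can finish earlier. The sketch also shows one alternative to sorting: reading the results back in the order in which the tasks were submitted.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed


def slow_identity(x):
    """Hypothetical task: larger inputs take longer to finish."""
    time.sleep(x)
    return x


inputs = [0.3, 0.1, 0.2]

with ThreadPoolExecutor(max_workers=3) as executor:
    futures = [executor.submit(slow_identity, x) for x in inputs]
    # Results in the order in which the tasks finished.
    completion_order = [future.result() for future in as_completed(futures)]
    # Results in the order in which the tasks were submitted.
    input_order = [future.result() for future in futures]

print("Completion order:", completion_order)
print("Input order:     ", input_order)
```

On a typical run, the completion order starts with the shortest task, while the second list matches the input. Restoring the input order is not free either, since the program has to keep track of every pending task until all of them are done.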

A concept that is commonly used to illustrate the innate sequentiality of some tasks is pregnancy: the number of women will never reduce the length of pregnancy. As opposed to parallel or concurrent tasks, where an increase in the number of processing entities will improve the execution time, adding more processors in inherently sequential tasks will not. Famous examples of inherent sequentiality include iterative algorithms: Newton's method, iterative solutions to the three-body problem, or iterative numerical approximation methods.
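Newton's method makes this dependency easy to see. The following minimal sketch approximates a square root; the function name newton_sqrt, the starting guess, and the number of steps are illustrative choices. Each new estimate is computed from the previous one, so the steps cannot be handed to different processors and run at the same time.

```python
def newton_sqrt(a, x0=1.0, steps=6):
    """Approximate the square root of a with Newton's method."""
    x = x0
    history = [x]
    for _ in range(steps):
        # The next estimate needs the current one, so the loop
        # has to run strictly in order.
        x = 0.5 * (x + a / x)
        history.append(x)
    return history


estimates = newton_sqrt(2.0)
for step, estimate in enumerate(estimates):
    print(step, estimate)
```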

I/O bound

Another way to think about sequentiality is the concept (in computer science) of a condition called I/O bound, in which the time it takes to complete a computation is mainly determined by the time spent waiting for input/output (I/O) operations to be completed. This condition arises when the rate at which data is requested is slower than the rate at which it is consumed, or, in short, more time is spent requesting data than processing it.

In an I/O bound state, the CPU must stall its operation, waiting for data to be processed. This means that, even if the CPU gets faster at processing data, processes tend to not increase in speed in proportion to the increased CPU speed, since they get more I/O-bound. With faster computation speed being the primary goal of new computer and processor designs, I/O bound states are becoming undesirable, yet more and more common, in programs.
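A quick back-of-the-envelope calculation shows why a faster CPU helps so little here. The numbers are hypothetical: a task that spends 9 seconds waiting for I/O and 1 second computing, run once on a baseline CPU and once on a CPU that is twice as fast.

```python
def total_time(io_wait, cpu_time, cpu_speedup):
    """Total running time when only the CPU-bound part gets faster."""
    return io_wait + cpu_time / cpu_speedup


baseline = total_time(io_wait=9.0, cpu_time=1.0, cpu_speedup=1.0)
faster = total_time(io_wait=9.0, cpu_time=1.0, cpu_speedup=2.0)

print(f"Baseline CPU:     {baseline:.1f} seconds")
print(f"Twice as fast:    {faster:.1f} seconds")
print(f"Overall speedup:  {baseline / faster:.2f}x")
```

Doubling the CPU speed shortens the run from 10 seconds to 9.5 seconds, because the waiting time is untouched.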

As you have seen, there are a number of situations in which the application of concurrent programming results in decreased processing speed, and they should thus be avoided. It is therefore important for us to not see concurrency as a golden ticket that can produce unconditionally better execution times, and to understand the differences between the structures of programs that benefit from concurrency and programs that do not.

The history, present, and future of concurrency

The field of concurrent programming has enjoyed significant popularity since the early days of computer science. In the following sub-topics, we will discuss how concurrent programming started and evolved throughout its history, its current usage in the industry, and some predictions regarding how concurrency will be used in the future.

The history of concurrency

The concept of concurrency has been around for quite some time. The idea developed from early work on railroads and telegraphy in the nineteenth and early twentieth centuries, and some terms have even survived to this day (such as semaphore, which indicates a variable that controls access to a shared resource in concurrent programs). Concurrency was first applied to address the question of how to handle multiple trains on the same railroad system, in order to avoid collisions and maximize efficiency, and how to handle multiple transmissions over a given set of wires in early telegraphy.

A significant portion of the theoretical groundwork for concurrent programming was actually laid in the 1960s. The early algorithmic language ALGOL 68, which was defined in 1968 as a successor to ALGOL 60, includes features that support concurrent programming. The academic study of concurrency officially started with a seminal paper in 1965 from Edsger Dijkstra, who was a pioneer in computer science, best known for the path-finding algorithm that was named after him.

That seminal paper is considered the first paper in the field of concurrent programming, in which Dijkstra identified and solved the mutual exclusion problem. Mutual exclusion, which is a property of concurrency control that prevents race conditions (which we will discuss later on), went on to become one of the most discussed topics in concurrency.

Yet, there was no considerable interest after that. From around 1970 to early 2000, processors were said to double in executing speed every 18 months. During this period, programmers did not need to concern themselves with concurrent programming, as all they had to do to have their programs run faster was wait. However, in the early 2000s, a paradigm shift in the processor business took place; instead of making increasingly big and fast processors for computers, manufacturers started focusing on smaller, slower processors, which were put together in groups. This was when computers started to have multicore processors.

Nowadays, an average computer has more than one core. So, if a programmer writes all of their programs to be non-concurrent in any way, they will find that their programs utilize only one core or one thread to process data, while the rest of the CPU sits idle, doing nothing (as we saw in the Example 1 – Checking whether a non-negative number is prime section). This is one reason for the recent push in concurrent programming.

Another reason for the increasing popularity of concurrency is the growing field of graphical, multimedia, and web-based application development, in which concurrency is widely used to solve complex and meaningful problems. For example, concurrency is a major player in web development: each new request made by a user typically comes in as its own process (this is called multiprocessing; see Chapter 6, Working with Processes in Python) or is asynchronously coordinated with other requests (this is called asynchronous programming; see Chapter 9, Introduction to Asynchronous Programming). If any of those requests needs to access a shared resource (a database, for example) where data can be changed, concurrency should be taken into consideration.

The present

Considering the present day, where explosive growth of the internet and data sharing happens every second, concurrency is more important than ever. The current use of concurrent programming emphasizes correctness, performance, and robustness.

Some concurrent systems, such as operating systems or database management systems, are generally designed to operate indefinitely, including automatic recovery from failure, and not terminate unexpectedly. As mentioned previously, concurrent systems use shared resources, and thus they require some form of semaphore in their implementation, to control and coordinate access to those resources.
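As a small illustration of how a semaphore controls access, the following sketch uses threading.Semaphore from Python's standard library to let at most two threads use a resource at once. The resource itself, the limit of two, and the stats dictionary used to record how many threads were inside at the same time are all hypothetical.

```python
import threading
import time

MAX_USERS = 2  # hypothetical limit on simultaneous users of the resource
semaphore = threading.Semaphore(MAX_USERS)

stats = {"active": 0, "peak": 0}
stats_lock = threading.Lock()


def use_resource():
    with semaphore:  # blocks while MAX_USERS threads are already inside
        with stats_lock:
            stats["active"] += 1
            stats["peak"] = max(stats["peak"], stats["active"])
        time.sleep(0.05)  # stand-in for real work on the shared resource
        with stats_lock:
            stats["active"] -= 1


threads = [threading.Thread(target=use_resource) for _ in range(6)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print("Peak simultaneous users:", stats["peak"])
```

Although six threads are started, the recorded peak never exceeds the limit passed to the semaphore.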

Concurrent programming is quite ubiquitous in the field of software development. Following are a few examples where concurrency is present:

Concurrency plays an important role in most common programming languages: C++, C#, Erlang, Go, Java, Julia, JavaScript, Perl, Python, Ruby, Scala, and so on.

Again, since almost every computer today has more than one core in its CPU, desktop applications need to be able to take advantage of that computing power, in order to provide truly well-designed software.

Multicore processors used in MacBook Pro computers

The iPhone 4S, which was released in 2011, has a dual-core CPU, so mobile development also has to stay connected to concurrent applications.

As for video games, two of the biggest players on the current market are the Xbox 360, which is a multi-CPU system, and Sony's PS3, which is essentially a multicore system.

Even the current iteration of the $35 Raspberry Pi is built around a quad-core system.

It is estimated that on average, Google processes over 40,000 search queries every second, which equates to over 3.5 billion searches per day, and 1.2 trillion searches per year, worldwide. Apart from having massive machines with incredible processing power, concurrency is the best way to handle that amount of data requests.

A large percentage of today's data and applications are stored in the cloud. Since computing instances on the cloud are relatively small in size, almost every web application is therefore forced to be concurrent, processing different small jobs simultaneously. As it gains more customers and has to process more requests, a well-designed web application can simply utilize more servers while keeping the same logic; this corresponds to the property of robustness that we mentioned earlier.

Even in the increasingly popular fields of artificial intelligence and data science, major advances have been made, in part due to the availability of high-end graphics cards (GPUs), which are used as parallel computing engines. In every notable competition on the biggest data science website (https://www.kaggle.com/), almost all prize-winning solutions feature some form of GPU usage during the training process. With the sheer amount of data that big data models have to comb through, concurrency provides an effective solution. Some AI algorithms are even designed to break their input data down into smaller portions and process them independently, which is a perfect opportunity to apply concurrency in order to achieve better model-training time.

The future

In this day and age, computer/internet users expect instant output, no matter what applications they are using, and developers often find themselves struggling with the problem of providing better speed for their applications. In terms of usage, concurrency will continue to be one of the main players in the field of programming, providing unique and innovative solutions to those problems. As mentioned earlier, whether it be video game design, mobile apps, desktop software, or web development, concurrency is, and will be, omnipresent in the near future.

Given the need for concurrency support in applications, some might argue that concurrent programming will also become more standard in academia. Even though specific topics in concurrency and parallelism are being covered in computer science courses, in-depth, complex subjects on concurrent programming (both theoretical and applied subjects) will be implemented in undergraduate and graduate courses, to better prepare students for the industry, where concurrency is being used every day. Computer science courses on building concurrent systems, studying data flows, and analyzing concurrent and parallel structures will only be the beginning.

Others might have a more skeptical view of the future of concurrent programming. Some say that concurrency is really about dependency analysis: a sub-field of compiler theory that analyzes execution-order constraints between statements/instructions, and determines whether it is safe for a program to reorder or parallelize its statements. Furthermore, since only a very small number of programmers truly understand concurrency and all of its intricacies, there will be a push for compilers, along with support from the operating system, to take on the responsibility of actually implementing concurrency into the programs they compile on their own.

Specifically, in the future programmers will not have to concern themselves with the concepts and problems of concurrent programming, nor should they. An algorithm implemented on the compiler-level should look at the program being compiled, analyze the statements and instructions, produce a dependency graph to determine the optimal order of execution for those statements and instructions, and apply concurrency/parallelism where it is appropriate and efficient. In short, the combination of the low number of programmers understanding and being able to effectively work with concurrent systems and the possibility of automating the design of concurrency will lead to a decrease in interest in concurrent programming.
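The core of such an analysis can be sketched in a few lines. This is only a toy model, not how any particular compiler works: the statements and their dependencies are hypothetical and written out by hand, and the illustrative function parallel_stages groups them into stages, where every statement in a stage depends only on statements from earlier stages and could therefore run at the same time as the others in its stage.

```python
# Hypothetical statements, each mapped to the statements it depends on.
dependencies = {
    "a = load()": [],
    "b = load()": [],
    "c = a + 1": ["a = load()"],
    "d = b * 2": ["b = load()"],
    "e = c + d": ["c = a + 1", "d = b * 2"],
}


def parallel_stages(deps):
    """Group statements into stages that respect every dependency."""
    remaining = {node: set(parents) for node, parents in deps.items()}
    stages = []
    while remaining:
        # A statement is ready once all of its dependencies have been scheduled.
        ready = sorted(node for node, parents in remaining.items() if not parents)
        if not ready:
            raise ValueError("cyclic dependency between statements")
        stages.append(ready)
        for node in ready:
            del remaining[node]
        for parents in remaining.values():
            parents.difference_update(ready)
    return stages


stages = parallel_stages(dependencies)
for number, stage in enumerate(stages, start=1):
    print(f"Stage {number}: {stage}")
```

Here the two load statements form the first stage, the two computations that use them form the second, and the final sum has to wait for both.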

In the end, only time will tell what the future holds for concurrent programming. We programmers can only look at how concurrency is currently being used in the real world, and determine whether it is worth learning or not: which, as we have seen in this case, it is. Furthermore, even though there are strong connections between designing concurrent programs and dependency analysis, I personally see concurrent programming as a more intricate and involved process, which might be very difficult to achieve through automation.

Concurrent programming is indeed extremely complicated and very hard to get right, but that also means the knowledge gained through the process will be beneficial and useful to any programmer, and I see that as a good enough reason to learn about concurrency. The ability to analyze the problems of program speedup, restructure your programs into different independent tasks, and coordinate those tasks to use the same resources, are the main skills that programmers build while working with concurrency, and knowledge of these topics will help them with other programming problems, as well.

A brief overview of mastering concurrency in Python

Python is one of the most popular programming languages out there, and for good reason. The language comes with numerous libraries and frameworks that facilitate high-performance computing, whether it be software development, web development, data analysis, or machine learning. Yet, there have been discussions among developers criticizing Python, which often revolve around the Global Interpreter Lock (GIL) and the difficulty of implementing concurrent and parallel programs that it leads to.

While concurrency and parallelism do behave differently in Python than in other common programming languages, it is still possible for programmers to implement Python programs that run concurrently or in parallel, and achieve significant speedup for their programs.

Mastering Concurrency in Python will serve as a comprehensive introduction to various advanced concepts in concurrent engineering and programming in Python. This book will also provide a detailed overview of how concurrency and parallelism are being used in real-world applications. It is a perfect blend of theoretical analyses and practical examples, which will give you a full understanding of the theories and techniques regarding concurrent programming in Python.

This book will be divided into six main sections. It will start with the idea behind concurrency and concurrent programming—the history, how it is being used in the industry today, and finally, a mathematical analysis of the speedup that concurrency can potentially provide. Additionally, the last section in this chapter (which is our next section) will cover instructions for how to follow the coding examples in this book, including setting up a Python environment on your own computer, downloading/cloning the code included in this book from GitHub, and running each example from your computer.

The next three sections will cover three of the main implementation approaches in concurrent programming: threads, processes, and asynchronous I/O, respectively. These sections will include theoretical concepts and principles for each of these approaches, the syntax and various functionalities that the Python language provides to support them, discussions of best practices for their advanced usage, and hands-on projects that directly apply these concepts to solve real-world problems.

Section five will introduce readers to some of the most common problems that engineers and programmers face in concurrent programming: deadlock, starvation, and race conditions. Readers will learn about the theoretical foundations and causes for each problem, analyze and replicate each of them in Python, and finally implement potential solutions. The last chapter in this section will discuss the aforementioned GIL, which is specific to the Python language. It will cover the GIL's integral role in the Python ecosystem, some challenges that the GIL poses for concurrent programming, and how to implement effective workarounds.

In the last section of the book, we will be working on various advanced applications of concurrent Python programming. These applications will include the design of lock-free and lock-based concurrent data structures, memory models and operations on atomic types, and how to build a server that supports concurrent request processing from scratch. The section will also cover the best practices when testing, debugging, and scheduling concurrent Python applications.

Throughout this book, you will be building essential skills for working with concurrent programs, just through following the discussions, the example code, and the hands-on projects. You will understand the fundamentals of the most important concepts in concurrent programming, how to implement them in Python programs, and how to apply that knowledge to advanced applications. By the end of Mastering Concurrency in Python, you will have a unique combination of extensive theoretical knowledge regarding concurrency, and practical know-how of the various applications of concurrency in the Python language.

Why Python?

As mentioned previously, one of the difficulties that developers face while working with concurrency in the Python programming language (specifically, CPython, the reference implementation of Python written in C) is its GIL. The GIL is a mutex that protects access to Python objects, preventing multiple threads from executing Python byte codes at once. This lock is necessary mainly because CPython's memory management is not thread-safe. CPython uses reference counting to implement its memory management. Without a lock, multiple threads could execute Python code and update the reference count of the same object simultaneously; this situation is undesirable, as it can cause an incorrect handling of data, and it is why we say that this type of memory management is not thread-safe. To address this problem, the GIL, as the name suggests, is a lock that allows only one thread at a time to access Python code and objects. However, this also means that, to implement multithreading programs in CPython, developers need to be aware of the GIL and work around it. That is why many have problems with implementing concurrent systems in Python.
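Reference counting itself can be observed from Python code. The following minimal sketch assumes that it is run on CPython, where sys.getrefcount() reports how many references to an object currently exist; the exact numbers vary between interpreter versions, so only the change between the two readings is meaningful.

```python
import sys

obj = object()
before = sys.getrefcount(obj)

holder = [obj]  # storing obj in a list creates one more reference to it
after = sys.getrefcount(obj)

print("References before:", before)
print("References after: ", after)
```

Every assignment, function call, or container operation can change counts like these. If two threads changed the same count at once without any coordination, the count could end up wrong, which is the problem the GIL is there to prevent.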