Harness the power of multiple computers using Python through this fast-paced informative guide
This book is for Python developers who have developed Python programs for data processing and now want to learn how to write fast, efficient programs that perform CPU-intensive data processing tasks.
CPU-intensive data processing tasks have become crucial considering the complexity of the various big data applications that are used today. Reducing the CPU utilization per process is very important to improve the overall speed of applications.
This book will teach you how to perform parallel execution of computations by distributing them across multiple processors in a single machine, thus improving the overall performance of a big data processing task. We will cover synchronous and asynchronous models, shared memory and file systems, communication between various processes, synchronization, and more. This example-based, step-by-step guide will show you how to make the best of your hardware configuration using Python for distributing applications.
Page count: 208
Year of publication: 2016
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: April 2016
Production reference: 1060416
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-969-1
www.packtpub.com
Author
Francesco Pierfederici
Reviewer
James King
Commissioning Editor
Veena Pagare
Acquisition Editor
Aaron Lazar
Content Development Editor
Parshva Sheth
Technical Editor
Abhishek R. Kotian
Copy Editor
Neha Vyas
Project Coordinator
Nikhil Nair
Proofreader
Safis Editing
Indexer
Rekha Nair
Graphics
Disha Haria
Production Coordinator
Melwyn Dsa
Cover Work
Melwyn Dsa
Francesco Pierfederici is a software engineer who loves Python. He has been working in the fields of astronomy, biology, and numerical weather forecasting for the last 20 years.
He has built large distributed systems that make use of tens of thousands of cores at a time and run on some of the fastest supercomputers in the world. He has also written a lot of applications of dubious usefulness but that are great fun. Mostly, he just likes to build things.
I would like to thank my wife, Alicia, for her unreasonable patience during the gestation of this book. I would also like to thank Parshva Sheth and Aaron Lazar at Packt Publishing and the technical reviewer, James King, who were all instrumental in making this a better book.
James King is a software developer with a broad range of experience in distributed systems. He is a contributor to many open source projects including OpenStack and Mozilla Firefox. He enjoys mathematics, horsing around with his kids, games, and art.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Parallel and distributed computing is a fascinating subject that only a few years ago developers in only a very few large companies and national labs were privy to. Things have changed dramatically in the last decade or so, and now everybody can build small- and medium-scale distributed applications in a variety of programming languages including, of course, our favorite one: Python.
This book is a very practical guide for Python programmers who are starting to build their own distributed systems. It starts off by illustrating the bare minimum theoretical concepts needed to understand parallel and distributed computing in order to lay the basic foundations required for the rest of the (more practical) chapters.
It then looks at some first examples of parallelism using nothing more than modules from the Python standard library. The next step is to move beyond the confines of a single computer and start using more and more nodes. This is accomplished using a number of third-party libraries, including Celery and Pyro.
The remaining chapters investigate a few deployment options for our distributed applications. The cloud and classic High Performance Computing (HPC) clusters, together with their strengths and challenges, take center stage.
Finally, the thorny issues of monitoring, logging, profiling, and debugging are touched upon.
All in all, this is very much a hands-on book, teaching you how to use some of the most common frameworks and methodologies to build parallel and distributed systems in Python.
Chapter 1, An Introduction to Parallel and Distributed Computing, takes you through the basic theoretical foundations of parallel and distributed computing.
Chapter 2, Asynchronous Programming, describes the two main programming styles used in distributed applications: synchronous and asynchronous programming.
Chapter 3, Parallelism in Python, shows you how to do more than one thing at the same time in your Python code, using nothing more than the Python standard library.
Chapter 4, Distributed Applications – with Celery, teaches you how to build simple distributed applications using Celery and some of its competitors: Python-RQ and Pyro.
Chapter 5, Python in the Cloud, shows how you can deploy your Python applications on the cloud using Amazon Web Services.
Chapter 6, Python on an HPC Cluster, shows how to deploy your Python applications on a classic HPC cluster, typical of many universities and national labs.
Chapter 7, Testing and Debugging Distributed Applications, talks about the challenges of testing, profiling, and debugging distributed applications in Python.
Chapter 8, The Road Ahead, looks at what you have learned so far and which directions interested readers could take to push their development of distributed systems further.
The following software and hardware are recommended:
All software mentioned in this book is free of charge and can be downloaded from the Internet with the exception of PBS Pro, which is commercial. Most of the PBS Pro functionality, however, is available in its close sibling Torque, which is open source.
This book is for developers who already know Python and want to learn how to parallelize their code and/or write distributed systems. While a Unix environment is assumed, most if not all of the examples should also work on Windows systems.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
The first modern digital computer was invented in the late 1930s and early 1940s (that is, arguably, the Z1 from Konrad Zuse in 1936), probably before most of the readers of this book—let alone the author—were born. These last seventy-odd years have seen computers become faster and cheaper at an amazing rate, a pace that is unique across industries. Just think that today's smartphones (for example, the latest iPhones or Android phones) are faster than the fastest computer in the world from just 20 years ago. Not to mention the amazing feat of miniaturization: those supercomputers used to take up entire rooms; now they fit in our pockets.
These years have also seen, among others, two key inventions relevant to the topic at hand. One is the ability to cram more than one processor on a single motherboard (and even multiple CPU cores on a single processor). This development was crucial in allowing computations to be performed truly concurrently. As we know, processors are able to perform only one task at a time; however, as we will see later on in the chapter, they are fast enough to give the illusion of being able to run multiple tasks at the same time. To be able to perform more than one action exactly at the same time, you need access to more than one processor.
The other critical invention is high-speed computer networking. This allowed, for the first time, a potentially enormous number of computers to communicate with each other. These networked machines can either be located in the same office or building (in a so-called Local Area Network (LAN)) or be spread out across different buildings, cities, or even across the planet (in a Wide Area Network (WAN)).
By now, most of us are familiar with multiprocessor/multicore computers, and indeed, the chances are pretty high that the phone in our pocket, the tablet in our hands, or the laptop we take on the road has a handful of cores already. The graphics card, also called the Graphics Processing Unit (GPU), in these devices is more often than not massively parallel, with hundreds or even thousands of processing units. Computer networks, too, are all around us, starting from the most famous of them all, the Internet, to the Wi-Fi in our homes and coffee shops and the 4G mobile networks our phones use.
In the rest of this chapter, we will lay some working definitions of the topics that we will explore in the rest of the book. We will be introducing the concepts of parallel and distributed computing. We will give some examples of each that are taken from familiar topics and technologies. Some general advantages and disadvantages of each architecture and programming paradigm will be discussed as well.
Before proceeding with our definitions and a little bit of theory, let's clarify a possible source of confusion. In this and the following chapters, we will use the term processor and the term CPU core (or even simply core) interchangeably, unless otherwise specified. This is, of course, technically incorrect; a processor has one or more cores, and a computer has one or more processors, as cores do not exist in isolation. Depending on the algorithm and its performance requirements, running on multiple processors or on a single processor using multiple cores can make quite a bit of difference in speed, assuming, of course, that the algorithm can be parallelized in the first place. For our intents and purposes, however, we will ignore these differences and refer the reader to more advanced texts for further exploration of this topic.
Definitions of parallel computing abound. However, for the purpose of this book, a simple definition will suffice, which is as follows:
Parallel computing is the simultaneous use of more than one processor to solve a problem.
Typically, this definition is further specialized by requiring that the processors reside on the same motherboard. This is mostly to distinguish parallel computing from distributed computing (which is discussed in the next section).
The idea of splitting work among many workers is as old as human civilization, is not restricted to the digital world, and finds an immediate and obvious application in modern computers equipped with higher and higher numbers of compute units.
There are, of course, many reasons why parallel computing might be useful and even necessary. The simplest one is performance; if we can indeed break up a long-running computation into smaller chunks and parcel them out to different processors, then we can do more work in the same amount of time.
Other times, and just as often, parallel computing techniques are used to present users with responsive interfaces while the system is busy with some other task. Remember that one processor executes just one task at a time. Applications with GUIs need to offload work to a separate thread of execution running on another processor so that one processor is free to update the GUI and respond to user inputs.
The following figure illustrates this common architecture, where the main thread is processing user and system inputs using what is called an event loop. Tasks that require a long time to execute and those that would otherwise block the GUI are offloaded to a background or worker thread:
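This main-thread/worker-thread split can be sketched with nothing more than the standard library. In this minimal illustration (the task names are invented for the example), a background thread drains a queue of slow jobs while the main thread, standing in for the event loop, remains free to accept new work:

```python
import queue
import threading
import time

def worker(tasks: queue.Queue) -> None:
    """Background thread: runs long tasks so the main loop stays responsive."""
    while True:
        job = tasks.get()
        if job is None:          # sentinel value: time to shut down
            break
        time.sleep(0.01)         # stand-in for a slow operation
        print(f"finished: {job}")
        tasks.task_done()

tasks = queue.Queue()
t = threading.Thread(target=worker, args=(tasks,), daemon=True)
t.start()

# The "event loop": hand slow jobs to the worker and stay interactive.
for job in ("copy images", "create thumbnails", "update gallery"):
    tasks.put(job)

tasks.join()     # wait for the worker to drain the queue
tasks.put(None)  # tell the worker to exit
t.join()
```

In a real GUI application, the main thread would be running the toolkit's event loop instead of the `for` loop above, but the offloading pattern is the same.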
A simple real-world example of this parallel architecture could be a photo organization application. When we connect a digital camera or a smartphone to our computers, the photo application needs to perform a number of actions; all the while its user interface needs to stay interactive. For instance, our application needs to copy images from the device to the internal disk, create thumbnails, extract metadata (for example, date and time of the shot), index the images, and finally update the image gallery. While all of this happens, we are still able to browse images that are already imported, open them, edit them, and so on.
Of course, all these actions could very well be performed sequentially on a single processor—the same processor that is handling the GUI. The drawback would be a sluggish interface and an extremely slow overall application. Performing these steps in parallel keeps the application snappy and its users happy.
The astute reader might jump up at this point and rightfully point out that older computers, with a single processor and a single core, could already perform multiple things at the same time (by way of multitasking). What happened back then (and even today, when we launch more tasks than there are processors and cores on our computers) was that the one running task gave up the CPU (either voluntarily or forcibly by the OS, for example, in response to an IO event) so that another task could run in its place. These interrupts would happen over and over again, with various tasks acquiring and giving up the CPU many times over the course of the application's life. In those cases, users had the impression of multiple tasks running concurrently, as the switches were extremely fast. In reality, however, only one task was running at any given time.
The typical tools used in parallel applications are threads. On systems such as Python (as we will see in Chapter 3, Parallelism in Python) where threads have significant limitations, programmers resort to launching (oftentimes, by means of forking) subprocesses instead. These subprocesses replace (or complement) threads and run alongside the main application process.
The first technique is called multithreaded programming. The second is called multiprocessing. It is worth noting that multiprocessing should not be seen as inferior or as a workaround with respect to using multiple threads.
There are many situations where multiprocessing is preferable to multiple threads. Interestingly, even though they both run on a single computer, a multithreaded application is an example of shared-memory architecture, whereas a multiprocess application is an example of distributed memory architecture (refer to the following section to know more).
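As a minimal sketch of the multiprocessing style (the task function here is invented for illustration), the standard library's multiprocessing.Pool farms CPU-bound work out to separate worker processes, each with its own private memory; arguments and results are serialized and passed between processes rather than shared:

```python
import multiprocessing

def cpu_bound(n: int) -> int:
    """A small CPU-bound task: sum of squares below n."""
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    # Each worker is a separate OS process (distributed-memory style):
    # inputs and outputs are pickled and shipped between processes.
    with multiprocessing.Pool(processes=4) as pool:
        results = pool.map(cpu_bound, [10, 100, 1000])
    print(results)  # [285, 328350, 332833500]
```

Because the processes share no memory, there is no need for locks around `results`; the trade-off is the cost of copying data between processes.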
For the remainder of this book, we will adopt the following working definition of distributed computing:
Distributed computing is the simultaneous use of more than one computer to solve a problem.
As in the case of parallel computing, this definition is oftentimes further restricted. The usual restriction is the requirement that these computers appear to their users as a single machine, therefore hiding the distributed nature of the application. In this book, we will be happy with the more general definition.
Distributing computation across multiple computers is again a pretty obvious strategy when using systems that are able to speak to each other over the (local or otherwise) network. In many respects, in fact, this is just a generalization of the concepts of parallel computing that we saw in the previous section.
Reasons to build distributed systems abound. Oftentimes, the reason is the ability to tackle a problem so big that no individual computer could handle it at all, or at least, not in a reasonable amount of time. An interesting example from a field that is probably familiar to most of us is the rendering of 3D animation movies, such as those from Pixar and DreamWorks.
Given the sheer number of frames to render for a full-length feature (30 frames per second over a two-hour movie is a lot!), movie studios need to spread the full rendering job across large numbers of computers (computer farms, as they are called).
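The arithmetic behind that claim is easy to check (the per-frame render cost below is an assumption for illustration only; real render times vary widely by scene and studio):

```python
# 30 frames/s * 3600 s/h * 2 h of footage
frames = 30 * 60 * 60 * 2
print(frames)  # 216000 frames

# Assume, purely for illustration, 10 CPU-hours per frame.
hours_per_frame = 10
cpu_hours = frames * hours_per_frame
print(cpu_hours)  # 2160000 CPU-hours, i.e. roughly 246 years on a single core
```

Even with a wildly optimistic per-frame cost, the only way to finish in months rather than centuries is to render many frames at once on many machines.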
Other times, the very nature of the application being developed requires a distributed system. This is the case, for instance, for instant messaging and video conferencing applications. For these pieces of software, performance is not the main driver. It is just that the problem that the application solves is itself distributed.
In the following figure, we see a very common web application architecture (another example of a distributed application), where multiple users connect to the website over the network. At the same time, the application itself communicates with systems (such as a database server) running on different machines in its LAN:
Another interesting example of distributed systems, which might be a bit counterintuitive, is the CPU-GPU combination. These days, graphics cards are very sophisticated computers in their own right. They are highly parallel and offer compelling performance for a large number of compute-intensive problems, not just for displaying images on screen. Tools and libraries exist to allow programmers to make use of GPUs for general-purpose computing (for example, CUDA from NVIDIA, OpenCL, and OpenACC, among others).
However, the system composed by the CPU and GPU is really an example of a distributed system, where the network is replaced by the PCI bus. Any application exploiting both the CPU and the GPU needs to take care of data movement between the two subsystems just like a more traditional application running across the network!
It is worth noting that, in general, adapting the existing code to run across computers on a network (or on the GPU) is far from a simple exercise. In these cases, I find it quite helpful to go through the intermediate step of using multiple processes on a single computer first (refer to the discussion in the previous section). Python, as we will see in Chapter 3, Parallelism in Python, has powerful facilities for doing just that (refer to the concurrent.futures module).
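The concurrent.futures module mentioned above makes that intermediate step very small. In this minimal sketch (the transformation function is invented for the example), a pool of worker processes runs the calls in parallel; swapping ProcessPoolExecutor for ThreadPoolExecutor is a one-line change, which is part of what makes the module such a convenient stepping stone:

```python
from concurrent.futures import ProcessPoolExecutor

def transform(x: int) -> int:
    """Stand-in for a CPU-intensive transformation."""
    return x * x

if __name__ == "__main__":
    # executor.map() distributes the calls across a pool of worker
    # processes and yields the results in input order.
    with ProcessPoolExecutor() as executor:
        squares = list(executor.map(transform, range(5)))
    print(squares)  # [0, 1, 4, 9, 16]
```

Once the work is expressed as independent function calls like this, moving those calls off the local machine and onto a task queue such as Celery (covered in Chapter 4) is a much smaller step.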
Once I evolve my application so that it uses multiple processes to perform operations in parallel, I start thinking about how to turn these processes into separate applications altogether, which are no longer part of my monolithic core.
Special attention must be given to the data—where to store it and how to access it. In simple cases, a shared filesystem (for example, NFS on Unix systems) is enough; other times, a database and/or a message bus is needed. We will see some concrete examples from Chapter 4, Distributed Applications – with Celery, onwards. It is important to remember that, more often than not, data, rather than CPU, is the real bottleneck.
Conceptually, parallel computing and distributed computing look very similar—after all, they both are about
