This book will teach you parallel programming techniques using examples in Python and will help you explore the many ways in which you can write code that allows more than one process to happen at once. Starting with an introduction to the world of parallel computing, it moves on to cover the fundamentals of Python. This is followed by an exploration of the thread-based parallelism model using the Python threading module, including synchronizing threads with locks, mutexes, semaphores, and queues, as well as the GIL and the thread pool.
Next, you will be taught about process-based parallelism, where you will synchronize processes using message passing and learn about the performance of MPI Python modules. You will then go on to learn the asynchronous parallel programming model using the Python asyncio module, along with handling exceptions. Moving on, you will discover distributed computing with Python and learn how to install a broker, use the Celery Python module, and create a worker.
You will also learn about PyCSP, the SCOOP framework, and the Disco module in Python. Further on, you will learn GPU programming with Python using the PyCUDA module, along with evaluating performance limitations.
Pages: 296
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2015
Production reference: 1210815
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78528-958-3
www.packtpub.com
Author
Giancarlo Zaccone
Reviewers
Aditya Avinash
Ravi Chityala
Mike Galloy
Ludovic Gasc
Commissioning Editor
Sarah Crofton
Acquisition Editor
Meeta Rajani
Content Development Editor
Rashmi Suvarna
Technical Editor
Mrunmayee Patil
Copy Editor
Neha Vyas
Project Coordinator
Judie Jose
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Sheetal Aute
Disha Haria
Jason Monterio
Abhinash Sahu
Production Coordinator
Conidon Miranda
Cover Work
Conidon Miranda
Giancarlo Zaccone has more than 10 years of experience in managing research projects, both in scientific and industrial domains. He worked as a researcher at the National Research Council (CNR), where he was involved in a few parallel numerical computing and scientific visualization projects.
He currently works as a software engineer at a consulting company, developing and maintaining software systems for space and defense applications.
Giancarlo holds a master's degree in physics from the University of Naples Federico II and has completed a second-level postgraduate master's program in scientific computing from the Sapienza University of Rome.
You can know more about him at https://it.linkedin.com/in/giancarlozaccone.
Aditya Avinash is a graduate student who focuses on computer graphics and GPUs. His areas of interest are compilers, drivers, physically based rendering, and real-time rendering. His current focus is on making a contribution to MESA (the open source graphics driver stack for Linux), where he will implement OpenGL extensions for the AMD backend. This is something that he is really excited about. He also likes writing compilers to translate high-level abstraction code into GPU code. He has developed Urutu, which gives GPUs thread-level parallelism with Python. For this, NVIDIA funded him with a couple of Tesla K40 GPUs. Currently, he is working on RockChuck, translating the Python code (written using data parallel abstraction) into GPU/CPU code, depending on the available backend. This project was started after he reviewed the opinions of a lot of Python programmers who wanted data parallel abstraction for Python and GPUs.
He has a computer engineering background, where he designed hardware and software to fit certain applications (ASIC). From this, he gained experience of how to use FPGAs and HDLs. Apart from this, he mainly programs using Python and C++. In C++, he uses OpenGL, CUDA, OpenCL, and other multicore programming APIs. Since he is a student, most of his work is not affiliated with any institution or person.
Ravi Chityala is a senior engineer at Elekta Inc. He has more than 12 years of experience in image processing and scientific computing. He is also a part-time instructor at the University of California, Santa Cruz Extension, San Jose, CA, where he teaches advanced Python to programmers. He began using Python as a scripting tool and fell in love with the language's simplicity, power, and expressiveness. He now uses it for web development, scientific prototyping and computing, and as glue to automate processes. He combined his experience in image processing and his love for Python and coauthored the book Image Acquisition and Processing using Python, published by CRC Press.
Mike Galloy is a software developer who focuses on high-performance computing and visualization in scientific programming. He works mostly on IDL, but occasionally uses C, CUDA, and Python. He currently works for the National Center for Atmospheric Research (NCAR) at the Mauna Loa Solar Observatory. Previously, he worked for Tech-X Corporation, where he was the main developer for GPULib, a library of IDL bindings for GPU-accelerated computation routines. He is the creator and main developer of the open source projects, IDLdoc, mgunit, and rIDL, as well as the author of the book Modern IDL.
Ludovic Gasc is a senior software developer and engineer at Eyepea and ALLOcloud, a highly renowned open source VoIP and unified communications company in Europe.
Over the last 5 years, he has developed redundant distributed systems for the telecom sector that are based on Python, AsyncIO, PostgreSQL, and Redis.
You can contact him on his blog at http://www.gmludo.eu.
He is also the creator of API-Hour: Write efficient network daemons (HTTP, SSH) with ease. For more information, visit http://www.api-hour.io.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
The study of computer science should cover not only the principles on which computational processing is based, but should also reflect the current state of knowledge of these fields. Today, technology requires that professionals from all branches of computer science know both the software and the hardware, whose interaction at all levels is the key to understanding the basics of computational processing.
For this reason, in this book, special focus is given to the relationship between hardware architectures and software.
Until recently, programmers could rely on the work of the hardware designers, compilers, and chip manufacturers to make their software programs faster or more efficient without the need for changes.
This era is over. So now, if a program is to run faster, it must become a parallel program.
Although the goal of many researchers is to ensure that programmers are not aware of the parallel nature of the hardware for which they write their programs, it will take many years before this actually becomes possible. Nowadays, most programmers need to thoroughly understand the link between hardware and software so that the programs can be run efficiently on modern computer architectures.
To introduce the concepts of parallel programming, the Python programming language has been adopted. Python is fun and easy to use, and its popularity has grown steadily in recent years. Python was developed more than 20 years ago by Guido van Rossum, who derived Python's syntactic simplicity and ease of use largely from ABC, a teaching language that was developed in the 1980s.
In addition to this specific context, Python was created to solve real-life problems, and it borrows a wide variety of typical characteristics from programming languages such as C++, Java, and Scheme. This is one of its most remarkable features, and it has led to Python's broad appeal among professional software developers, the scientific research community, and computer science educators. One of the reasons why Python is so well liked is that it provides the best balance between the practical and conceptual approaches. It is an interpreted language, so you can start doing things immediately without getting lost in the problems of compilation and linking. Python also provides an extensive software library that can be used for all sorts of tasks, ranging from web development and graphics to, of course, parallel computing. This practical aspect is a great way to engage readers and allows them to carry out the projects presented in this book.
This book contains a wide variety of examples that are inspired by many situations, and these offer you the opportunity to solve real-life problems. This book examines the principles of software design for parallel architectures, insisting on the importance of clarity of the programs and avoiding the use of complex terminology in favor of clear and direct examples. Each topic is presented as part of a complete, working Python program, which is followed by the output of the program in question.
The modular organization of the chapters provides a proven path from the simplest topics to the most advanced ones, but it is also suitable for readers who only want to learn about a few specific issues.
I hope that the structure and content of this book make a useful contribution to your understanding of parallel programming techniques and to their wider dissemination.
Chapter 1, Getting Started with Parallel Computing and Python, gives you an overview of parallel programming architectures and programming models. This chapter introduces the Python programming language, the characteristics of the language, its ease of use and learning, extensibility, and richness of software libraries and applications. It also shows you how to make Python a valuable tool for any application, and also, of course, for parallel computing.
Chapter 2, Thread-based Parallelism, discusses thread parallelism using the threading Python module. Through complete programming examples, you will learn how to synchronize and manipulate threads to implement your multithreading applications.
Chapter 3, Process-based Parallelism, will guide you through the process-based approach to parallelizing a program. A complete set of examples will show you how to use the multiprocessing Python module. This chapter will also explain how to perform communication between processes using the message passing parallel programming paradigm via the mpi4py Python module.
Chapter 4, Asynchronous Programming, explains the asynchronous model for concurrent programming. In some ways, it is simpler than the threaded one because there is a single instruction stream and tasks explicitly relinquish control instead of being suspended arbitrarily. This chapter will show you how to use the Python asyncio module to organize each task as a sequence of smaller steps that must be executed in an asynchronous manner.
Chapter 5, Distributed Python, introduces you to distributed computing: the process of logically aggregating several computing units, which may even be geographically distributed, to collaboratively run a single computational task in a transparent and coherent way. This chapter will present some of the solutions that Python offers for implementing these architectures, using the OO approach, Celery, SCOOP, and remote procedure calls, such as Pyro4 and RPyC. It will also include different approaches, such as PyCSP, and finally Disco, a Python framework based on the MapReduce paradigm.
Chapter 6, GPU Programming with Python, describes modern Graphics Processing Units (GPUs), which provide breakthrough performance for numerical computing at the cost of increased programming complexity. In fact, the programming models for GPUs require the programmer to manually manage the data transfer between the CPU and the GPU. This chapter will teach you, through programming examples and use cases, how to exploit the computing power provided by GPU cards using the powerful Python modules PyCUDA, NumbaPro, and PyOpenCL.
All the examples in this book can be tested on a Windows 7 32-bit machine. A Linux environment will also be useful.
The Python versions needed to run the examples are:
The following modules (all of which are freely downloadable) are required:
This book is intended for software developers who want to use parallel programming techniques to write powerful and efficient code. After reading this book, you will be able to master both the basics and the advanced features of parallel computing. The Python programming language is easy to use and allows nonexperts to easily understand the topics covered in this book.
This book contains the following sections:
This section tells us what to expect in the recipe and describes how to set up any software or any preliminary settings needed for the recipe.
This section characterizes the steps that are to be followed to "cook" the recipe.
This section usually consists of a concise but detailed explanation of what happened in the previous section.
This section consists of additional information about the recipe in order to make the reader more knowledgeable about it.
This section may provide references related to the recipe.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.
To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide on www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you would report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the errata submission form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded on our website, or added to any list of existing errata, under the Errata section of that title. Any existing errata can be viewed by selecting your title from http://www.packtpub.com/support.
Piracy of copyright material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works, in any form, on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors, and our ability to bring you valuable content.
You can contact us at <[email protected]> if you are having a problem with any aspect of the book, and we will do our best to address it.
In this chapter, we will cover the following recipes:
This chapter gives you an overview of parallel programming architectures and programming models. These concepts are useful for inexperienced programmers who are approaching parallel programming techniques for the first time, and the chapter can also serve as a basic reference for experienced programmers. The dual characterization of parallel systems is also presented in this chapter: the first characterization is based on the architecture of the system, and the second is based on parallel programming paradigms. Parallel programming will always be a challenge for programmers; this programming-based approach is described further when we present the design procedure of a parallel program. The chapter ends with a brief introduction to the Python programming language. The characteristics of the language, its ease of use and learning, and the extensibility and richness of its software libraries and applications make Python a valuable tool for any application and also, of course, for parallel computing. In the final part of the chapter, the concepts of threads and processes are introduced in relation to their use in the language.
A typical way to solve a large problem is to divide it into smaller, independent parts so that all the pieces can be solved simultaneously. A parallel program is one that uses this approach, that is, one that uses multiple processors working together on a common task. Each processor works on its own section (the independent part) of the problem. Furthermore, processors may exchange data during the computation.
Nowadays, many software applications require more computing power. One way to achieve this is to increase the clock speed of the processor; another is to increase the number of processing cores on the chip. Raising the clock speed increases heat dissipation, thereby decreasing the performance per watt, and it also requires special cooling equipment.
Increasing the number of cores seems to be a more feasible solution, because power consumption and heat dissipation remain well under the limit, whereas pushing the clock speed further brings no significant gain in performance.
To address this problem, computer hardware vendors decided to adopt multi-core architectures, which are single chips that contain two or more processors (cores). Similarly, GPU manufacturers introduced hardware architectures based on multiple computing cores. In fact, today's computers almost always contain multiple, heterogeneous computing units, each formed by a variable number of cores, as in the most common multi-core architectures.
Therefore, to take advantage of the available computational resources, it has become essential to adopt the programming paradigms, techniques, and tools of parallel computing.
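The divide-and-solve idea described above can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: it assumes a sum of squares as the "large problem", splits it into independent chunks, solves each chunk in a separate worker process with concurrent.futures.ProcessPoolExecutor, and then combines the partial results. The function names and the default of four workers are illustrative choices. The helper prefers the "fork" start method where the platform offers it and otherwise uses the platform default (spawn on Windows), in which case the `if __name__ == "__main__":` guard is required.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def _mp_context():
    """Prefer 'fork' where available (Linux, macOS); else the platform default."""
    return mp.get_context("fork" if "fork" in mp.get_all_start_methods() else None)


def split(n, parts):
    """Divide range(n) into at most `parts` contiguous, non-overlapping chunks."""
    step = max(1, -(-n // parts))  # ceiling division, never zero
    return [(lo, min(lo + step, n)) for lo in range(0, n, step)]


def partial_sum_of_squares(bounds):
    """Solve one independent piece of the problem: sum of i*i for i in [start, stop)."""
    start, stop = bounds
    return sum(i * i for i in range(start, stop))


def parallel_sum_of_squares(n, workers=4):
    """Each worker process solves its own chunk; the partial results are then combined."""
    with ProcessPoolExecutor(max_workers=workers, mp_context=_mp_context()) as pool:
        return sum(pool.map(partial_sum_of_squares, split(n, workers)))


if __name__ == "__main__":
    n = 1_000_000
    print(parallel_sum_of_squares(n))
    print(sum(i * i for i in range(n)))  # sequential version, for comparison
```

Because the chunks are independent, no data is exchanged until the final combination step; a problem whose parts depend on each other would also need communication between the workers during the computation.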
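Based on the number of instruction streams and data streams that can be processed simultaneously, computer systems are classified into four categories:
Single Instruction, Single Data (SISD)
Single Instruction, Multiple Data (SIMD)
Multiple Instruction, Single Data (MISD)
Multiple Instruction, Multiple Data (MIMD)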
This classification is known as Flynn's taxonomy.
The SISD computing system is a uniprocessor machine. It executes a single instruction that operates on a single data stream. In SISD, machine instructions are processed sequentially.
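In a clock cycle, the CPU executes the following operations:
Fetch: the CPU fetches the next instruction, and the data it needs, from memory into its registers.
Decode: the CPU decodes the instruction.
Execute: the instruction is carried out on the data, and the result of the operation is stored in a register.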
Once the execution stage is complete, the CPU sets itself to begin another CPU cycle.
The SISD architecture schema
The algorithms that run on these types of computers are sequential (or serial), since they do not contain any parallelism. Examples of SISD computers are hardware systems with a single CPU.
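The main elements of these architectures (von Neumann architectures) are:
Central memory unit: stores both the instructions and the data of the program.
CPU: fetches the instructions and/or data from the memory unit, decodes the instructions, and executes them sequentially.
I/O system: handles the input data and output data of the program.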
The conventional single processor computers are classified as SISD systems. The following figure specifically shows which areas of a CPU are used in the stages of fetch, decode, and execute:
CPU's components in the fetch-decode-execute phase
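The fetch-decode-execute cycle can be illustrated with a toy interpreter. The following sketch assumes a made-up instruction set (LOAD, ADD, MUL, HALT) and register names R0 to R4; none of these come from a real CPU. It models a single processor executing one instruction stream on one data stream, strictly one instruction per cycle, which is the sequential behavior of an SISD machine.

```python
def run(program):
    """Toy SISD machine: one processor, one instruction stream, one data stream.

    Instructions are tuples in a made-up (hypothetical) instruction set:
      ("LOAD", reg, value)  -> reg = value
      ("ADD", dst, a, b)    -> dst = a + b
      ("MUL", dst, a, b)    -> dst = a * b
      ("HALT",)             -> stop and return the registers
    """
    registers = {}
    pc = 0  # program counter: address of the next instruction
    while True:
        # Fetch: read the instruction at the program counter and advance it.
        instruction = program[pc]
        pc += 1
        # Decode: split the instruction into an opcode and its operands.
        opcode, *operands = instruction
        # Execute: carry out the operation and store the result in a register.
        if opcode == "LOAD":
            reg, value = operands
            registers[reg] = value
        elif opcode == "ADD":
            dst, a, b = operands
            registers[dst] = registers[a] + registers[b]
        elif opcode == "MUL":
            dst, a, b = operands
            registers[dst] = registers[a] * registers[b]
        elif opcode == "HALT":
            return registers
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# (2 + 3) * 4, executed one instruction per cycle.
program = [
    ("LOAD", "R0", 2),
    ("LOAD", "R1", 3),
    ("ADD", "R2", "R0", "R1"),
    ("LOAD", "R3", 4),
    ("MUL", "R4", "R2", "R3"),
    ("HALT",),
]
print(run(program)["R4"])  # 20
```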
In the MISD model, n processors, each with its own control unit, share a single memory unit. In each clock cycle, the data received from the memory is processed by all processors simultaneously, each in accordance with the instructions received from its own control unit. In this case, parallelism (instruction-level parallelism) is obtained by performing several operations on the same piece of data. The types of problems that can be solved efficiently on these architectures are rather special, such as those regarding data encryption; for this reason, MISD computers did not find a place in the commercial sector. MISD computers are more of an intellectual exercise than a practical configuration.
The MISD architecture scheme
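Python does not expose an MISD machine, but the idea of several different instruction streams acting on the same piece of data can be mimicked in software. In this loose analogy (an assumption for illustration, not real MISD hardware), three threads each apply a different hash algorithm from the standard hashlib module to the same bytes. The three algorithm names are simply ones that hashlib guarantees to provide.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

# Three different "instruction streams": each worker runs its own algorithm.
ALGORITHMS = ("sha256", "sha512", "blake2b")


def digest(algorithm, data):
    """One worker: apply a single algorithm to the shared input."""
    return algorithm, hashlib.new(algorithm, data).hexdigest()


def misd_like(data):
    """Apply several different operations to the same data concurrently."""
    with ThreadPoolExecutor(max_workers=len(ALGORITHMS)) as pool:
        futures = [pool.submit(digest, name, data) for name in ALGORITHMS]
        return dict(f.result() for f in futures)


if __name__ == "__main__":
    for name, value in misd_like(b"parallel").items():
        print(name, value[:16])
```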
A SIMD computer consists of n identical processors, each with its own local memory, where it can store data. All processors work under the control of a single instruction stream; in addition, there are n data streams, one for each processor. The processors work simultaneously on each step and execute the same instruction, but on different data elements. This is an example of data-level parallelism. SIMD architectures are much more versatile than MISD architectures: numerous problems covering a wide range of applications can be solved by parallel algorithms on SIMD computers. Another interesting feature is that the algorithms for these computers are relatively easy to design, analyze, and implement. The limitation is that only problems that can be divided into a number of identical subproblems, each of which is then solved simultaneously through the same set of instructions, can be addressed with a SIMD computer. Among the supercomputers developed according to this paradigm, we must mention the Connection Machine (Thinking Machines, 1985) and the MPP (NASA, 1983). As we will see in Chapter 6, GPU Programming with Python, the advent of modern graphics processing units (GPUs), built with many embedded SIMD units, has led to a more widespread use of this computational paradigm.
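The lockstep behavior of a SIMD machine can be simulated in plain Python. In this sketch, a single list of operations plays the role of the instruction stream, and each element of the data list plays the role of one processing element's local data. It is only a sequential simulation of the control flow, written under that assumption; it does not use real SIMD hardware, and the two-instruction program is a hypothetical example.

```python
def simd_run(instructions, lanes):
    """Simulate a SIMD machine: one instruction stream, several data streams.

    At every step, the single control unit broadcasts the same instruction,
    and every processing element applies it to its own data element.
    """
    data = list(lanes)
    for instruction in instructions:  # the single instruction stream
        # Same operation, different data elements, all in the same step.
        data = [instruction(x) for x in data]
    return data


# Hypothetical two-instruction program: multiply by 2, then add 1.
program = [lambda x: x * 2, lambda x: x + 1]
print(simd_run(program, [1, 2, 3, 4]))  # [3, 5, 7, 9]
```

Array libraries such as NumPy express the same idea by applying one operation to a whole array at once.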
MIMD is the most general and most powerful class of parallel computers according to Flynn's classification. It has n processors, n instruction streams, and n data streams. Each processor has its own control unit and local memory, which makes MIMD architectures more computationally powerful than SIMD architectures. Each processor operates under the control of a flow of instructions issued by its own control unit; therefore, the processors can potentially run different programs on different data, solving subproblems that are different and can be part of a single larger problem. In MIMD architectures, parallelism is achieved at the level of threads and/or processes. This also means that the processors usually operate asynchronously. Computers in this class are used to solve problems that lack the regular structure required by the SIMD model. Nowadays, this architecture is applied to many PCs, supercomputers, and computer networks. However, there is a drawback to consider: asynchronous algorithms are difficult to design, analyze, and implement.
The MIMD architecture scheme
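In MIMD, each processor can run a different program on different data. The following minimal sketch assumes three arbitrary example tasks (counting primes, measuring word lengths, and computing a Fibonacci number) and submits each one to its own worker process with ProcessPoolExecutor; the processes run independently and asynchronously, and the results are collected at the end. As in any multiprocessing code, the `if __name__ == "__main__":` guard matters on platforms that use the spawn start method.

```python
import multiprocessing as mp
from concurrent.futures import ProcessPoolExecutor


def _mp_context():
    """Prefer 'fork' where available (Linux, macOS); else the platform default."""
    return mp.get_context("fork" if "fork" in mp.get_all_start_methods() else None)


def count_primes(limit):
    """Program A: count the primes below `limit` by trial division."""
    return sum(
        1 for n in range(2, limit)
        if all(n % d for d in range(2, int(n ** 0.5) + 1))
    )


def word_lengths(text):
    """Program B: a completely different task on completely different data."""
    return [len(word) for word in text.split()]


def fibonacci(n):
    """Program C: the n-th Fibonacci number, computed iteratively."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a


def mimd_run():
    """Each process follows its own instruction stream on its own data stream."""
    tasks = [
        (count_primes, 100),
        (word_lengths, "multiple instruction multiple data"),
        (fibonacci, 30),
    ]
    with ProcessPoolExecutor(max_workers=len(tasks), mp_context=_mp_context()) as pool:
        futures = [pool.submit(fn, arg) for fn, arg in tasks]
        return [f.result() for f in futures]


if __name__ == "__main__":
    print(mimd_run())
```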
Another aspect that we need to consider when evaluating a parallel architecture is memory organization, or rather, the way in which data is accessed. No matter how fast the processing unit is, if the memory cannot maintain and provide instructions and data at a sufficient speed, there will be no improvement in performance. The main problem that must be overcome to make the response time of the memory compatible with the speed of the processor is the memory cycle time, which is defined as the time that elapses between two successive memory operations. The cycle time of the processor is typically much shorter than the cycle time of the memory. When the processor starts transferring data (to or from the memory), the memory remains occupied for the entire memory cycle: during this period, no other device (an I/O controller, another processor, or even the processor that made the request) can use the memory, because it is committed to responding to the request.
The memory organization in MIMD architecture
Solutions to the memory access problem resulted in a dichotomy of MIMD architectures. In the first type of system, known as the shared memory system, there is a single global virtual memory, and all processors have equal access to the data and instructions in this memory. The other type of system is the distributed memory model, wherein each processor has a local memory that is not accessible to other processors. The difference between shared memory and distributed memory lies in the structure of the virtual memory, that is, the memory as seen from the perspective of the processor. Physically, almost every system memory is divided into distinct components that are independently accessible. What distinguishes a shared memory from a distributed memory is how the processing units manage memory access. Suppose a processor executes the instruction load R0, i, which means load the contents of memory location i into register R0: what should happen? In a system with shared memory, i is a global address, and memory location i is the same for every processor. If two processors were to execute this instruction at the same time, they would load the same information into their R0 registers. In a distributed memory system, i is a local address. If two processors were to execute this instruction at the same time, different values may end up in their respective R0 registers, since, in this case, each processor's location i is a cell of its own local memory. The distinction between shared memory and distributed memory is very important for programmers because it determines the way in which different parts of a parallel program must communicate. In a shared memory system, it is sufficient to build a data structure in memory and pass references to it to the parallel subroutines. By contrast, a distributed memory machine must make copies of the shared data in each local memory.
These copies are created by sending a message containing the data to be shared from one processor to another. A drawback of this memory organization is that sometimes, these messages can be very large and take a relatively long transfer time.
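The practical difference between the two memory models can be sketched in Python, with threads standing in for processors that share memory and processes standing in for processors with private memory. In this sketch, which is an analogy under those assumptions, the global variable counter plays the role of memory location i: all threads update the same location (so they need a lock), whereas each process updates its own private copy and must send its result back as a message through a multiprocessing queue.

```python
import multiprocessing as mp
import threading

counter = 0  # plays the role of memory location i


def _mp_context():
    """Prefer 'fork' where available (Linux, macOS); else the platform default."""
    return mp.get_context("fork" if "fork" in mp.get_all_start_methods() else None)


def increment_shared(times, lock):
    """Shared memory: every thread reads and writes the same global location."""
    global counter
    for _ in range(times):
        with lock:  # serialize access to the shared location
            counter += 1


def shared_memory_demo(n_threads=4, times=1000):
    global counter
    counter = 0
    lock = threading.Lock()
    threads = [threading.Thread(target=increment_shared, args=(times, lock))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter  # n_threads * times: all threads updated one location


def increment_local(times, queue):
    """Distributed memory: this process updates its own copy of `counter`,
    so the result must be sent back explicitly as a message."""
    global counter
    for _ in range(times):
        counter += 1
    queue.put(counter)


def message_passing_demo(n_procs=4, times=1000):
    global counter
    counter = 0
    ctx = _mp_context()
    queue = ctx.Queue()
    procs = [ctx.Process(target=increment_local, args=(times, queue))
             for _ in range(n_procs)]
    for p in procs:
        p.start()
    results = [queue.get() for _ in procs]  # receive one message per process
    for p in procs:
        p.join()
    return counter, results  # the parent's own copy of counter is unchanged


if __name__ == "__main__":
    print(shared_memory_demo())
    print(message_passing_demo())
```

The messages here are single integers; with large data structures, the cost of copying them between processes is exactly the transfer-time drawback described above.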
The schema of a shared memory multiprocessor system is shown in the following figure. The physical connections here are quite simple. The bus