This book is for the scientists, engineers, programmers, or analysts looking for a high-quality, open source mathematical library. Knowledge of Python is assumed. Also, some affinity, or at least interest, in mathematics and statistics is required. However, I have provided brief explanations and pointers to learning resources.
Number of pages: 343
Year of publication: 2015
Copyright © 2015 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: November 2011
Second edition: April 2013
Third edition: June 2015
Production reference: 1160615
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78528-196-9
www.packtpub.com
Author
Ivan Idris
Reviewers
Alexandre Devert
Davide Fiacconi
Ardo Illaste
Commissioning Editor
Amarabha Banerjee
Acquisition Editors
Shaon Basu
Usha Iyer
Rebecca Youe
Content Development Editor
Neeshma Ramakrishnan
Technical Editor
Rupali R. Shrawane
Copy Editors
Charlotte Carneiro
Vikrant Phadke
Sameen Siddiqui
Project Coordinator
Shweta H. Birwatkar
Proofreader
Safis Editing
Indexer
Rekha Nair
Graphics
Sheetal Aute
Jason Monteiro
Production Coordinator
Aparna Bhagat
Cover Work
Aparna Bhagat
Ivan Idris has an MSc in experimental physics. His graduation thesis had a strong emphasis on applied computer science. After graduating, he worked for several companies as a Java developer, data warehouse developer, and QA Analyst. His main professional interests are business intelligence, big data, and cloud computing. Ivan enjoys writing clean, testable code and interesting technical articles. He is the author of NumPy Beginner's Guide, NumPy Cookbook, Learning NumPy Array, and Python Data Analysis. You can find more information about him and a blog with a few examples of NumPy at http://ivanidris.net/wordpress/.
I would like to take this opportunity to thank the reviewers and the team at Packt Publishing for making this book possible. Also thanks go to my teachers, professors, colleagues, Wikipedia contributors, Stack Overflow contributors, and other authors who taught me science and programming. Last but not least, I would like to acknowledge my parents, family, and friends for their support.
Davide Fiacconi is completing his PhD in theoretical astrophysics at the Institute for Computational Science at the University of Zurich. He did his undergraduate and graduate studies at the University of Milan-Bicocca, studying the evolution of collisional ring galaxies using hydrodynamic numerical simulations. Davide's research now focuses on the formation and coevolution of supermassive black holes and galaxies, using both massively parallel simulations and analytical techniques. In particular, his interests include the formation of the first supermassive black hole seeds, the dynamics of binary black holes, and the evolution of high-redshift galaxies.
Ardo Illaste is a data scientist. He wants to provide everyone with easy access to data for making major life and career decisions. He completed his PhD in computational biophysics, prior to fully delving into data mining and machine learning. Ardo has worked and studied in Estonia, the USA, and Switzerland.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.
I dedicate this book to my aunt Lies who recently passed away. Rest in peace.
Scientists, engineers, and quantitative data analysts face many challenges nowadays. Data scientists want to be able to perform numerical analysis on large datasets with minimal programming effort. They also want to write readable, efficient, and fast code that is as close as possible to the mathematical language they are used to. A number of accepted solutions are available in the scientific computing world.
The C, C++, and Fortran programming languages have their benefits, but they are not interactive and are considered too complex by many. The common commercial alternatives, such as MATLAB, Maple, and Mathematica, provide powerful scripting languages, but these are still more limited than a general-purpose programming language. Other open source tools similar to MATLAB exist, such as R, GNU Octave, and Scilab. They too lack the power of a language such as Python.
Python is a popular general-purpose programming language that is widely used in the scientific community. You can access legacy C, Fortran, or R code easily from Python. It is object-oriented and considered to be of a higher level than C or Fortran. It allows you to write readable and clean code with minimal fuss. However, it lacks an out-of-the-box MATLAB equivalent. That's where NumPy comes in. This book is about NumPy and related Python libraries, such as SciPy and matplotlib.
NumPy (short for numerical Python) is an open source Python library for scientific computing. It lets you work with arrays and matrices in a natural way. The library contains a long list of useful mathematical functions, including routines for linear algebra, Fourier transforms, and random number generation. LAPACK, a linear algebra library, is used by the NumPy linear algebra module if you have it installed on your system. Otherwise, NumPy provides its own implementation. LAPACK is a well-known library, originally written in Fortran, on which MATLAB relies as well. In a way, NumPy replaces some of the functionality of MATLAB and Mathematica, allowing rapid interactive prototyping.
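As an illustration of the three areas just mentioned, here is a minimal sketch. The matrix, the signal, and the seed are made-up values chosen for this example and do not come from the book's code bundle.

```python
import numpy as np

# A small linear system A x = b, written much like the mathematics.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([9.0, 8.0])

# Linear algebra: solve for x.
x = np.linalg.solve(A, b)

# Fourier transform: transform a short signal and transform it back.
signal = np.array([0.0, 1.0, 0.0, -1.0])
spectrum = np.fft.fft(signal)
recovered = np.fft.ifft(spectrum).real

# Random number generation: five uniform draws from a seeded generator.
rng = np.random.RandomState(42)
samples = rng.uniform(0.0, 1.0, size=5)

print(x)
print(recovered)
print(samples.shape)
```

Each call works on whole arrays, which is what makes this style convenient for interactive prototyping.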
We will not be discussing NumPy from a contributor's perspective, but from a user's perspective. NumPy is a very active project and has a lot of contributors. Maybe one day you will be one of them!
NumPy is based on its predecessor Numeric. Numeric was first released in 1995 and is now deprecated. Neither Numeric nor NumPy made it into the standard Python library for various reasons. However, you can install NumPy separately, which will be explained in Chapter 1, NumPy Quick Start.
In 2001, a number of people inspired by Numeric created SciPy, an open source scientific computing Python library that provides functionality similar to that of MATLAB, Maple, and Mathematica. Around this time, people were growing increasingly unhappy with Numeric. Numarray was created as an alternative to Numeric. It is also deprecated now. It was better in some areas than Numeric, but worked very differently. For that reason, SciPy kept on depending on the Numeric philosophy and the Numeric array object. As is customary with the latest and greatest software, the arrival of Numarray led to the development of an entire ecosystem around it, with a range of useful tools.
In 2005, Travis Oliphant, an early contributor to SciPy, decided to do something about this situation. He tried to integrate some of Numarray's features into Numeric. A complete rewrite took place, and it culminated in the release of NumPy 1.0 in 2006. At that time, NumPy had all the features of Numeric and Numarray, and more. Tools were available to facilitate the upgrade from Numeric and Numarray. The upgrade is recommended since Numeric and Numarray are not actively supported any more.
Originally, the NumPy code was a part of SciPy. It was later separated and is now used by SciPy for array and matrix processing.
NumPy code is much cleaner than straight Python code that tries to accomplish the same task. Fewer loops are required because operations work directly on arrays and matrices. The many convenience and mathematical functions make life easier as well. The underlying algorithms have stood the test of time and have been designed with high performance in mind.
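To make the point about loops concrete, the following sketch computes the same result twice: once with plain Python lists and once with NumPy arrays. The function names are hypothetical and chosen only for this illustration.

```python
import numpy as np


def python_sum_of_powers(n):
    """Plain Python: explicit iteration over every element."""
    squares = [i ** 2 for i in range(n)]
    cubes = [i ** 3 for i in range(n)]
    return [s + c for s, c in zip(squares, cubes)]


def numpy_sum_of_powers(n):
    """NumPy: the arithmetic is applied to whole arrays at once."""
    a = np.arange(n, dtype=np.int64)
    return a ** 2 + a ** 3


print(python_sum_of_powers(5))
print(numpy_sum_of_powers(5))
```

The NumPy version reads almost like the formula itself, and the element-wise work happens inside the library rather than in Python-level loops.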
NumPy's arrays are stored more efficiently than an equivalent data structure in base Python, such as a list of lists. Array I/O is significantly faster too. The improvement in performance scales with the number of elements of the array. For large arrays, it really pays off to use NumPy. Files as large as several terabytes can be memory-mapped to arrays, so that only the parts you actually access are read from or written to disk.
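Memory mapping can be tried on a small scale with `numpy.memmap`. The sketch below assumes a temporary file and a tiny array of 1,000 values; the file name `data.dat` is made up for the example, and a real use case would involve a far larger file.

```python
import os
import tempfile

import numpy as np

# A throwaway file location, used only for this illustration.
path = os.path.join(tempfile.mkdtemp(), "data.dat")

# Create a file-backed array and fill it with the values 0..999.
writer = np.memmap(path, dtype=np.float64, mode="w+", shape=(1000,))
writer[:] = np.arange(1000)
writer.flush()   # push pending changes to disk
del writer       # release the mapping

# Re-open the file read-only; data is paged in as slices are accessed.
reader = np.memmap(path, dtype=np.float64, mode="r", shape=(1000,))
chunk_mean = float(reader[100:200].mean())
print(chunk_mean)
```

The slice `reader[100:200]` behaves like an ordinary array, even though the data lives in a file.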
The drawback of NumPy arrays is that they are more specialized than plain lists. Outside the context of numerical computations, NumPy arrays are less useful. The technical details of NumPy arrays will be discussed in later chapters.
Large portions of NumPy are written in C. This makes NumPy faster than pure Python code. A NumPy C API exists as well, and it allows further extension of functionality with the help of the C language. The C API falls outside the scope of the book. Finally, since NumPy is open source, you get all the related advantages. The price is the lowest possible: free as in beer. You don't have to worry about licenses every time somebody joins your team or you need an upgrade of the software. The source code is available for everyone. This of course is beneficial to code quality.
If you are a Java programmer, you might be interested in Jython, the Java implementation of Python. In that case, I have bad news for you. Unfortunately, Jython runs on the Java Virtual Machine and cannot access NumPy because NumPy's modules are mostly written in C. You could say that Jython and Python are two totally different worlds, though they do implement the same specifications. There are some workarounds for this discussed in NumPy Cookbook - Second Edition, Packt Publishing, written by Ivan Idris.
Chapter 1, NumPy Quick Start, guides you through the steps needed to install NumPy on your system and create a basic NumPy application.
Chapter 2, Beginning with NumPy Fundamentals, introduces NumPy arrays and fundamentals.
Chapter 3, Getting Familiar with Commonly Used Functions, teaches you the most commonly used NumPy functions—the basic mathematical and statistical functions.
Chapter 4, Convenience Functions for Your Convenience, tells you about functions that make working with NumPy easier. This includes functions that select certain parts of your arrays, for instance, based on a Boolean condition. You also learn about polynomials and manipulating the shapes of NumPy objects.
Chapter 5, Working with Matrices and ufuncs, covers matrices and universal functions. Matrices are well-known in mathematics and have their representation in NumPy as well. Universal functions (ufuncs) work on arrays element by element, or on scalars. ufuncs expect a set of scalars as the input and produce a set of scalars as the output.
Chapter 6, Moving Further with NumPy Modules, discusses a number of basic modules of universal functions. These functions can typically be mapped to their mathematical counterparts, such as addition, subtraction, division, and multiplication.
Chapter 7, Peeking into Special Routines, describes some of the more specialized NumPy functions. As NumPy users, we sometimes find ourselves having special requirements. Fortunately, NumPy satisfies most of our needs.
Chapter 8, Assuring Quality with Testing, teaches you how to write NumPy unit tests.
Chapter 9, Plotting with matplotlib, covers matplotlib, a very useful Python plotting library, in depth. NumPy cannot be used on its own to create graphs and plots. matplotlib integrates nicely with NumPy and has plotting capabilities comparable to MATLAB.
Chapter 10, When NumPy Is Not Enough – SciPy and Beyond, covers more details about SciPy. We know that SciPy and NumPy are historically related. SciPy, as mentioned in the History section, is a high-level Python scientific computing framework built on top of NumPy. It can be used in conjunction with NumPy.
Chapter 11, Playing with Pygame, is the dessert of this book. You learn how to create fun games with NumPy and Pygame. You also get a taste of artificial intelligence in this chapter.
Appendix A, Pop Quiz Answers, has the answers to all the pop quiz questions within the chapters.
Appendix B, Additional Online Resources, contains links to Python, mathematics, and statistics websites.
Appendix C, NumPy Functions' References, lists some useful NumPy functions and their descriptions.
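As a small preview of some features named in the chapter summaries above (Boolean selection, reshaping, and polynomials from Chapter 4, universal functions from Chapter 5), here is a hedged sketch. The arrays and the polynomial are arbitrary examples, not taken from those chapters.

```python
import numpy as np

a = np.arange(6)               # the integers 0 to 5

# Select elements based on a Boolean condition.
evens = a[a % 2 == 0]

# Change the shape of an array without copying its values by hand.
grid = a.reshape(2, 3)

# Polynomials: roots of x^2 - 3x + 2, given by its coefficients.
roots = np.roots([1, -3, 2])

# A universal function (ufunc) works element by element.
shifted = np.add(a, 10)

print(evens)
print(grid)
print(np.sort(roots))
print(shifted)
```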
To try out the code samples in this book, you will need a recent build of NumPy. This means that you will need one of the Python versions supported by NumPy as well. Some code samples make use of matplotlib for illustration purposes. matplotlib is not strictly required to follow the examples, but it is recommended that you install it too. The chapter about SciPy has one example involving SciKits.
Here is a list of the software used to develop and test the code examples:
Needless to say, you don't need exactly this software and these versions on your computer. Python and NumPy constitute the absolute minimum you will need.
This book is for the scientists, engineers, programmers, or analysts looking for a high-quality, open source mathematical library. Knowledge of Python is assumed. Also, some affinity, or at least interest, in mathematics and statistics is required. However, I have provided brief explanations and pointers to learning resources.
In this book, you will find several headings that appear frequently (Time for action, What just happened?, Have a go hero, and Pop quiz).
To give clear instructions on how to complete a procedure or task, we use the following sections.
Instructions often need some extra explanation to ensure that they make sense, so they are followed by these sections.
This section explains the working of the tasks or instructions that you have just completed.
You will also find some other learning aids in the book.
These are short multiple-choice questions intended to help you test your own understanding.
These are practical challenges that give you ideas to experiment with what you have learned.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/NumpyBeginner'sGuide_Third_Edition_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
Let's get started. We will install NumPy and related software on different operating systems and have a look at some simple code that uses NumPy. This chapter briefly introduces the IPython interactive shell. SciPy is closely related to NumPy, so you will see the SciPy name appearing here and there. At the end of this chapter, you will find pointers on how to find additional information online if you get stuck or are uncertain about the best way to solve problems.
In this chapter, you will cover the following topics:
NumPy is based on Python, so you need to have Python installed. On some operating systems, Python is already installed. However, you need to check whether the Python version is compatible with the NumPy version you want to install. There are many implementations of Python, including commercial implementations and distributions. In this book, we focus on the standard CPython implementation, which is guaranteed to be compatible with NumPy.
NumPy has binary installers for Windows, various Linux distributions, and Mac OS X at http://sourceforge.net/projects/numpy/files/. There is also a source distribution, if you prefer that. You need to have Python 2.4.x or above installed on your system. We will go through the various steps required to install Python on the following operating systems:
Install, for instance, the Python 2.7 port by running the following command:
Linear Algebra PACKage (LAPACK) does not need to be present but, if it is, NumPy will detect it and use it during the installation phase. It is recommended that you install LAPACK for serious numerical analysis as it has useful numerical linear algebra functionality.
We installed Python on Debian, Ubuntu, Windows, and Mac OS X.
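Once Python and NumPy are both installed, a quick check along the following lines can confirm which versions you have and which linear algebra libraries NumPy was built against. This is a sketch rather than part of the book's installation steps, and the exact output differs from system to system.

```python
import sys

import numpy as np

# Version of the running Python interpreter, for example 3.x.y or 2.7.x.
python_version = "{}.{}.{}".format(*sys.version_info[:3])

# Version string of the installed NumPy.
numpy_version = np.__version__

print("Python", python_version)
print("NumPy", numpy_version)

# Print build information, including any BLAS/LAPACK libraries that were found.
np.show_config()
```

If the import of `numpy` fails, the installation did not succeed for the interpreter you are running.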
You can download the example code files for all the Packt books you have purchased from your account at https://www.packtpub.com/. If you purchased this book elsewhere, you can visit https://www.packtpub.com/books/content/support and register to have the files e-mailed directly to you.
Before we start the NumPy introduction, let's take a brief tour of the Python help system, in case you have forgotten how it works or are not very familiar with it. The Python help system allows you to look up documentation from the interactive Python shell. A shell is an interactive program that accepts commands and executes them for you.
Depending on your operating system, you can access the Python shell with special applications, usually a terminal of some sort.
Type the following in the prompt:
Another message appears and the prompt changes as follows:
Pressing Ctrl + D together again ends the Python shell session.
We learned about the Python interactive shell and the Python help system.
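The same documentation is also available outside the interactive `help>` prompt. The sketch below assumes Python 3 and uses the built-in `abs` function purely as an example subject. In a shell you would simply type `help(abs)`; the standard `pydoc` module returns the same text as a string, which is handier in a script.

```python
import pydoc

# In an interactive session, help(abs) would page through this text.
# render_doc returns it as a plain string instead.
doc_text = pydoc.render_doc(abs, renderer=pydoc.plaintext)

# Show only the first few lines to keep the output short.
first_lines = doc_text.splitlines()[:6]
print("\n".join(first_lines))
```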
In the Time for action – using the Python help system section, we used the Python shell to look up documentation. We can also use Python as a calculator. By the way, this is just a refresher, so if you are completely new to Python, I recommend taking some time to learn the basics. If you put your mind to it, learning basic Python should not take you more than a couple of weeks.
We can use Python as a calculator as follows:
We will discuss what this result is about in several later chapters of this book. Take the cube of 2 as follows:
We used the Python shell as a calculator and performed addition, multiplication, division, and exponentiation.
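The four operations can be tried along the following lines. This is a sketch that assumes Python 3, and the operands are chosen only for illustration. Division is the one to watch: in Python 2, dividing two integers with `/` discards the fractional part, whereas Python 3 returns a float and provides `//` for floor division.

```python
total = 2 + 2            # addition
product = 2 * 2          # multiplication
quotient = 3 / 2         # true division in Python 3 (Python 2 would give 1)
floor_quotient = 3 // 2  # floor division, the same in Python 2 and 3
cube = 2 ** 3            # exponentiation

print(total, product, quotient, floor_quotient, cube)
```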
If you haven't programmed in Python for a while or are a Python novice, you may be confused about the Python 2 versus Python 3 discussions. In a nutshell, Python 3, the latest version, is not backward compatible with the older Python 2 because the Python development team felt that some issues were fundamental and therefore warranted a radical change. The Python team has committed to maintaining Python 2 until 2020. This may be problematic for the people who still depend on Python 2 in some way. The consequence for the print() function is that we have two types of syntax.
We can print using the print() function as follows:
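A minimal sketch of the two styles follows, assuming Python 3 as the interpreter. The old Python 2 statement form appears only in a comment, because it is a syntax error in Python 3. The messages and the `buffer` variable are made up for this illustration.

```python
import io

# Function syntax: required in Python 3.
print("Hello")

# Statement syntax, valid in Python 2 only:
#     print "Hello"

# Because print() is a function, it accepts keyword arguments such as
# sep and file. Here the output goes to an in-memory text buffer.
buffer = io.StringIO()
print("Hello", "NumPy", sep=", ", file=buffer)
captured = buffer.getvalue()
print(captured, end="")
```

Writing the parenthesized form with a single argument is a common way to keep simple scripts usable under both versions.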