Learning IPython for Interactive Computing and Data Visualization, Second Edition - Cyrille Rossant - E-Book

Cyrille Rossant

Description

Python is a user-friendly and powerful programming language. IPython offers a convenient interface to the language and its analysis libraries, while the Jupyter Notebook is a rich environment well-adapted to data science and visualization. Together, these open source tools are widely used by beginners and experts around the world, and in a huge variety of fields and endeavors.

This book is a beginner-friendly guide to the Python data analysis platform. After an introduction to the Python language, IPython, and the Jupyter Notebook, you will learn how to analyze and visualize data on real-world examples, how to create graphical user interfaces for image processing in the Notebook, and how to perform fast numerical computations for scientific simulations with NumPy, Numba, Cython, and ipyparallel. By the end of this book, you will be able to perform in-depth analyses of all sorts of data.

You can read this e-book in the Legimi apps or in any app that supports the following formats:

EPUB
MOBI

Number of pages: 194

Year of publication: 2015

Table of Contents

Learning IPython for Interactive Computing and Data Visualization Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
1. Getting Started with IPython
What are Python, IPython, and Jupyter?
Jupyter and IPython
What this book covers
References
Installing Python with Anaconda
Downloading Anaconda
Installing Anaconda
Before you get started...
Opening a terminal
Finding your home directory
Manipulating your system path
Testing your installation
Managing environments
Common conda commands
References
Downloading the notebooks
Introducing the Notebook
Launching the IPython console
Launching the Jupyter Notebook
The Notebook dashboard
The Notebook user interface
Structure of a notebook cell
Markdown cells
Code cells
The Notebook modal interface
Keyboard shortcuts available in both modes
Keyboard shortcuts available in the edit mode
Keyboard shortcuts available in the command mode
References
A crash course on Python
Hello world
Variables
String escaping
Lists
Loops
Indentation
Conditional branches
Functions
Positional and keyword arguments
Passage by assignment
Errors
Object-oriented programming
Functional programming
Python 2 and 3
Going beyond the basics
Ten Jupyter/IPython essentials
Using IPython as an extended shell
Learning magic commands
Mastering tab completion
Writing interactive documents in the Notebook with Markdown
Creating interactive widgets in the Notebook
Running Python scripts from IPython
Introspecting Python objects
Debugging Python code
Benchmarking Python code
Profiling Python code
Summary
2. Interactive Data Analysis with pandas
Exploring a dataset in the Notebook
Provenance of the data
Downloading and loading a dataset
Making plots with matplotlib
Descriptive statistics with pandas and seaborn
Manipulating data
Selecting data
Selecting columns
Selecting rows
Filtering with boolean indexing
Computing with numbers
Working with text
Working with dates and times
Handling missing data
Complex operations
Group-by
Joins
Summary
3. Numerical Computing with NumPy
A primer to vector computing
Multidimensional arrays
The ndarray
Vector operations on ndarrays
How fast are vector computations in NumPy?
How an ndarray is stored in memory
Why operations on ndarrays are fast
Creating and loading arrays
Creating arrays
Loading arrays from files
Basic array manipulations
Computing with NumPy arrays
Selection and indexing
Boolean operations on arrays
Mathematical operations on arrays
A density map with NumPy
Other topics
Summary
4. Interactive Plotting and Graphical Interfaces
Choosing a plotting backend
Inline plots
Exported figures
GUI toolkits
Dynamic inline plots
Web-based visualization
matplotlib and seaborn essentials
Common plots with matplotlib
Customizing matplotlib figures
Interacting with matplotlib figures in the Notebook
High-level plotting with seaborn
Image processing
Further plotting and visualization libraries
High-level plotting
Bokeh
Vincent and Vega
Plotly
Maps and geometry
The matplotlib Basemap toolkit
GeoPandas
Leaflet wrappers: folium and mplleaflet
3D visualization
Mayavi
VisPy
Summary
5. High-Performance and Parallel Computing
Accelerating Python code with Numba
Random walk
Universal functions
Writing C in Python with Cython
Installing Cython and a C compiler for Python
Implementing the Eratosthenes Sieve in Python and Cython
Distributing tasks on several cores with IPython.parallel
Direct interface
Load-balanced interface
Further high-performance computing techniques
MPI
Distributed computing
C/C++ with Python
GPU computing
PyPy
Julia
Summary
6. Customizing IPython
Creating a custom magic command in an IPython extension
Writing a new Jupyter kernel
Displaying rich HTML elements in the Notebook
Displaying SVG in the Notebook
JavaScript and D3 in the Notebook
Customizing the Notebook interface with JavaScript
Summary
Index

Learning IPython for Interactive Computing and Data Visualization Second Edition

Copyright © 2015 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2013

Second edition: October 2015

Production reference: 1151015

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-698-9

www.packtpub.com

Credits

Author

Cyrille Rossant

Reviewers

Damián Avila

Nicola Rainiero

G Scott Stukey

Commissioning Editor

Kartikey Pandey

Acquisition Editors

Kartikey Pandey

Richard Brookes-Bland

Content Development Editor

Arun Nadar

Technical Editor

Pranil Pathare

Copy Editor

Stephen Copestake

Project Coordinator

Shweta H Birwatkar

Proofreader

Safis Editing

Indexer

Monica Ajmera Mehta

Production Coordinator

Conidon Miranda

Cover Work

Conidon Miranda

About the Author

Cyrille Rossant is a researcher in neuroinformatics, and is a graduate of Ecole Normale Superieure, Paris, where he studied mathematics and computer science. He has worked at Princeton University, University College London, and College de France. As part of his data science and software engineering projects, he gained experience in machine learning, high-performance computing, parallel computing, and big data visualization.

He is one of the main developers of VisPy, a high-performance visualization package in Python. He is the author of the IPython Interactive Computing and Visualization Cookbook, Packt Publishing, an advanced-level guide to data science and numerical computing with Python, and the sequel to this book.

I am grateful to Nick Fiorentini for his help during the revision of the book. I would also like to thank my family and notably my wife Claire for their support.

About the Reviewers

Damián Avila is a software developer and data scientist (formerly a biochemist) from Córdoba, Argentina.

His main interests are data science, visualization, finance, and IPython/Jupyter-related projects.

In the open source area, he is a core developer for several popular projects, such as IPython/Jupyter, Bokeh, and Nikola. He has also started his own projects, the most popular being RISE, an extension that enables live slides in the Jupyter Notebook. He has also written several tutorials about the Scientific Python tools (available on GitHub) and presented several talks at international conferences.

Currently, he is working at Continuum Analytics.

Nicola Rainiero is a civil geotechnical engineer with a background in the construction industry as a self-employed design engineer. He also specializes in the renewable energy field and has collaborated with the Sant'Anna University of Pisa on two European projects, REGEOCITIES and PRISCA, using qualitative and quantitative data analysis techniques.

He aims to simplify his work with open software, both by using existing tools and by developing new ones, sometimes with good results and sometimes with poor ones. You can reach Nicola on his website at http://rainnic.altervista.org.

A special thanks to Packt Publishing for this opportunity to participate in the reviewing of this book. I thank my family, especially my parents, for their physical and moral support.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Data analysis skills are now essential in scientific research, engineering, finance, economics, journalism, and many other domains. With its high accessibility and vibrant ecosystem, Python is one of the most appreciated open source languages for data science.

This book is a beginner-friendly introduction to the Python data analysis platform, focusing on IPython (Interactive Python) and its Notebook. While IPython is an enhanced interactive Python terminal specifically designed for scientific computing and data analysis, the Notebook is a graphical interface that combines code, text, equations, and plots in a unified interactive environment.

The first edition of Learning IPython for Interactive Computing and Data Visualization was published in April 2013, several months before the release of IPython 1.0. This new edition targets IPython 4.0, released in August 2015. In addition to reflecting the novelties of this new version of IPython, the present book is also more accessible to non-programmer beginners. The first chapter contains a brand new crash course on Python programming, as well as detailed installation instructions.

Since the first edition of this book, IPython's popularity has grown significantly, with an estimated user base of several million people and ongoing collaborations with large companies like Microsoft, Google, IBM, and others. The project itself has undergone important changes, with a refactoring into a language-independent interface called the Jupyter Notebook, and a set of backend kernels in various languages. The Notebook is no longer reserved for Python; it can now also be used with R, Julia, Ruby, Haskell, and many more languages (50 at the time of this writing!).

The Jupyter project received significant funding in 2015 from the Leona M. and Harry B. Helmsley Charitable Trust, the Gordon and Betty Moore Foundation, and the Alfred P. Sloan Foundation, which will allow the developers to focus on the growth and maturity of the project in the years to come.

Here are a few references:

Home page for the Jupyter project at http://jupyter.org/
Announcement of the funding for Jupyter at https://blog.jupyter.org/2015/07/07/jupyter-funding-2015/
Detail of the project's grant at https://blog.jupyter.org/2015/07/07/project-jupyter-computational-narratives-as-the-engine-of-collaborative-data-science/

What this book covers

Chapter 1, Getting Started with IPython, is a thorough and beginner-friendly introduction to Anaconda (a popular Python distribution), the Python language, the Jupyter Notebook, and IPython.

Chapter 2, Interactive Data Analysis with pandas, is a hands-on introduction to interactive data analysis and visualization in the Notebook with pandas, matplotlib, and seaborn.

Chapter 3, Numerical Computing with NumPy, details how to use NumPy for efficient computing on multidimensional numerical arrays.

Chapter 4, Interactive Plotting and Graphical Interfaces, explores many capabilities of Python for interactive plotting, graphics, image processing, and interactive graphical interfaces in the Jupyter Notebook.

Chapter 5, High-Performance and Parallel Computing, introduces the various techniques you can employ to accelerate your numerical computing code, namely parallel computing and compilation of Python code.

Chapter 6, Customizing IPython, shows how IPython and the Jupyter Notebook can be extended for customized use-cases.

What you need for this book

The following software is required for the book:

Anaconda with Python 3
Windows, Linux, or OS X can be used as a platform

Who this book is for

This book targets anyone who wants to analyze data or perform numerical simulations of mathematical models.

Since our world is becoming more and more data-driven, knowing how to analyze data effectively is an essential skill to learn. If you're used to spreadsheet programs like Microsoft Excel, you will appreciate Python for its much larger range of analysis and visualization possibilities. Knowing this general-purpose language will also let you share your data and analysis with other programs and libraries.

In conclusion, this book will be useful to students, scientists, engineers, analysts, journalists, statisticians, economists, hobbyists, and all data enthusiasts.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Run it with a command like bash Anaconda3-2.3.0-Linux-x86_64.sh (if necessary, replace the filename by the one you downloaded)."

A block of code is set as follows:

def load_ipython_extension(ipython):
    """This function is called when the extension is loaded.
    It accepts an IPython InteractiveShell instance.
    We can register the magic with the `register_magic_function`
    method of the shell instance."""
    ipython.register_magic_function(cpp, 'cell')

Any command-line input or output is written as follows:

$ python
Python 3.4.3 |Anaconda 2.3.0 (64-bit)| (default, Jun 4 2015, 15:29:08)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "To create a new notebook, click on the New button, and select Notebook (Python 3)."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors. You can also report any issues at https://github.com/ipython-books/minibook-2nd-code/issues.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You will also find the book's code on this GitHub repository: https://github.com/ipython-books/minibook-2nd-code.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/6989OS_ColouredImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Getting Started with IPython

In this chapter, we will cover the following topics:

What are Python, IPython, and Jupyter?
Installing Python with Anaconda
Introducing the Notebook
A crash course on Python
Ten Jupyter/IPython essentials

What are Python, IPython, and Jupyter?

Python is an open source general-purpose language created by Guido van Rossum in the late 1980s. It is widely used by system administrators and developers for many purposes: for example, automating routine tasks or creating a web server. Python is a flexible and powerful language, yet it is sufficiently simple to be taught to school children with great success.

In the past few years, Python has also emerged as one of the leading open platforms for data science and high-performance numerical computing. This might seem surprising as Python was not originally designed for scientific computing. Python's interpreted nature makes it much slower than lower-level languages like C or Fortran, which are more amenable to number crunching and the efficient implementation of complex mathematical algorithms.

However, the performance of these low-level languages comes at a cost: they are hard to use and they require advanced knowledge of how computers work. In the late 1990s, several scientists began investigating the possibility of using Python for numerical computing by interoperating it with mainstream C/Fortran scientific libraries. This would combine the ease of use of Python with the performance of C/Fortran: the dream of any scientist!

Consequently, the past 15 years have seen the development of widely used libraries such as NumPy (providing a practical array data structure), SciPy (scientific computing), matplotlib (graphical plotting), pandas (data analysis and statistics), scikit-learn (machine learning), SymPy (symbolic computing), and Jupyter/IPython (efficient interfaces for interactive computing). Python, along with this set of libraries, is sometimes referred to as the SciPy stack or PyData platform.
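As a rough illustration of why delegating work to compiled code pays off, the following sketch compares a hand-written Python loop with the built-in sum function, which is implemented in C in the standard CPython interpreter. The function name python_sum and the list size are arbitrary choices made for this illustration and do not come from the book. Timings depend on the machine, so none are quoted here.

```python
import timeit


def python_sum(values):
    """Add the numbers one by one in an interpreted Python loop."""
    total = 0
    for v in values:
        total += v
    return total


# Arbitrary test data chosen for this illustration.
values = list(range(100000))

# Both approaches compute the same result.
assert python_sum(values) == sum(values)

# Time 20 repetitions of each approach.
loop_time = timeit.timeit(lambda: python_sum(values), number=20)
builtin_time = timeit.timeit(lambda: sum(values), number=20)
print("Python loop: {:.4f} s".format(loop_time))
print("Built-in sum: {:.4f} s".format(builtin_time))
```

Libraries such as NumPy apply the same idea on a larger scale: the loop over the data runs in compiled code rather than in the interpreter.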

Tip

Competing platforms

Python has several competitors. For example, MATLAB (by MathWorks) is commercial software focusing on numerical computing that is widely used in scientific research and engineering. SPSS (by IBM) is commercial software for statistical analysis. Python, however, is free and open source, and that's one of its greatest strengths. Alternative open source platforms include R (specialized in statistics) and Julia (a young language for high-performance numerical computing).

More recently, this platform has gained popularity in non-academic communities such as finance, engineering, statistics, and data science.

This book provides a solid introduction to the whole platform by focusing on one of its main components: Jupyter/IPython.

Jupyter and IPython

IPython was created in 2001 by Fernando Perez (the I in IPython stands for "interactive"). It was originally meant to be a convenient command-line interface to the scientific Python platform. In scientific computing, trial and error is the rule rather than the exception, and this requires an efficient interface that allows for interactive exploration of algorithms, data, and graphs.

In 2011, IPython introduced the interactive Notebook. Inspired by commercial software such as Maple (by Maplesoft) or Mathematica (by Wolfram Research), the Notebook runs in a browser and provides a unified web interface where code, text, mathematical equations, plots, graphics, and interactive graphical controls can be combined into a single document. This is an ideal interface for scientific computing. Here is a screenshot of a notebook:

Example of a notebook

It quickly became clear that this interface could be used with languages other than Python such as R, Julia, Lua, Ruby, and many others. Further, the Notebook is not restricted to scientific computing: it can be used for academic courses, software documentation, or book writing thanks to conversion tools targeting Markdown, HTML, PDF, ODT, and many other formats. Therefore, the IPython developers decided in 2014 to acknowledge the general-purpose nature of the Notebook by giving a new name to the project: Jupyter.

Jupyter features a language-independent Notebook platform that can work with a variety of kernels. Implemented in any language, a kernel is the backend of the Notebook interface. It manages the interactive session, the variables, the data, and so on. By contrast, the Notebook interface is the frontend of the system. It manages the user interface, the text editor, the plots, and so on. IPython is henceforth the name of the Python kernel for the Jupyter Notebook. Other kernels include IR, IJulia, ILua, IRuby, and many others (50 at the time of this writing).
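The division of labor between kernel and frontend can be sketched with a toy model. The classes ToyKernel and ToyFrontend below are hypothetical names invented for this illustration and are not part of Jupyter. Real kernels run in a separate process and exchange messages with the frontend, whereas this sketch keeps everything in one process for simplicity.

```python
import contextlib
import io


class ToyKernel:
    """Backend: keeps the session's variables and executes code."""

    def __init__(self):
        # The namespace persists between executions, like an interactive session.
        self.namespace = {}

    def execute(self, code):
        """Run a piece of code and return whatever it printed."""
        buffer = io.StringIO()
        with contextlib.redirect_stdout(buffer):
            exec(code, self.namespace)
        return buffer.getvalue()


class ToyFrontend:
    """Frontend: sends text to a kernel and handles the reply."""

    def __init__(self, kernel):
        self.kernel = kernel

    def run_cell(self, code):
        # The frontend never interprets the code itself.
        return self.kernel.execute(code)


kernel = ToyKernel()
frontend = ToyFrontend(kernel)
frontend.run_cell("x = 21")
print(frontend.run_cell("print(x * 2)"), end="")
```

Because the frontend only passes text back and forth, the same interface could in principle drive a kernel written for any language, which is the property the paragraph above describes.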

In August 2015, the IPython/Jupyter developers achieved the "Big Split" by splitting the previous monolithic IPython codebase into a set of smaller projects, including the language-independent Jupyter Notebook (see https://blog.jupyter.org/2015/08/12/first-release-of-jupyter/). For example, the parallel computing features of IPython are now implemented in a standalone Python package named ipyparallel, the IPython widgets are implemented in ipywidgets, and so on. This separation makes the code of the project more modular and facilitates third-party contributions. IPython itself is now a much smaller project than before since it only features the interactive Python terminal and the Python kernel for the Jupyter Notebook.

Note

You will find the list of changes in IPython 4.0 at http://ipython.readthedocs.org/en/latest/whatsnew/version4.html. Many internal IPython imports have been deprecated due to the code reorganization. Warnings are raised if you attempt to perform a deprecated import. Also, the profiles have been removed and replaced with a single default profile. However, you can simulate this functionality with environment variables. You will find more information at http://jupyter.readthedocs.org.
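Code that must run both before and after the reorganization can try the new package name first and fall back to the old location. The helper import_first_available below is a hypothetical function written for this illustration, not part of IPython or Jupyter. It assumes that the parallel computing features live in ipyparallel after the split and lived in IPython.parallel before it.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that imports, or None."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ImportError:
            # This candidate is not installed; try the next one.
            continue
    return None


# Try the standalone package first, then the pre-4.0 location.
parallel = import_first_available(["ipyparallel", "IPython.parallel"])
if parallel is None:
    print("No parallel computing package is installed.")
else:
    print("Using", parallel.__name__)
```

Putting the new name first means that no deprecated import is attempted on an up-to-date installation.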

What this book covers

This book covers the Jupyter Notebook 1.0 and focuses on its Python kernel, IPython 4.0. In this chapter, we will introduce the platform, the Python language, the Jupyter Notebook interface, and IPython. In the remaining chapters, we will cover data analysis and scientific computing in Jupyter/IPython with the help of mainstream scientific libraries such as NumPy, pandas, and matplotlib.

Note

This book gives you a solid introduction to Jupyter and the SciPy platform. The IPython Interactive Computing and Visualization Cookbook (http://ipython-books.github.io/cookbook/) is the sequel to this introductory-level book. In 15 chapters and more than 500 pages, it contains a hundred recipes covering a wide range of interactive numerical computing techniques and data science topics. The IPython Cookbook is an excellent addition to the present IPython minibook if you're interested in delving into the platform in much greater detail.

References

Here are a few references about IPython and the Notebook:

The main Jupyter page at: http://jupyter.org/
The main Jupyter documentation at: https://jupyter.readthedocs.org/en/latest/
The main IPython page at: http://ipython.org/
Jupyter on GitHub at: https://github.com/jupyter
Try Jupyter online at: https://try.jupyter.org/
The IPython Notebook in research, a Nature note at http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261

Installing Python with Anaconda

Although Python is an open-source, cross-platform language, installing it with the usual scientific packages used to be overly complicated. Fortunately, there is now an all-in-one scientific Python distribution, Anaconda