E-Book
32,39 €

Learning IPython for Interactive Computing and Data Visualization, Second Edition E-Book

Cyrille Rossant

0,0

32,39 €

Sammeln Sie Punkte in unserem Gutscheinprogramm und kaufen Sie E-Books und Hörbücher mit bis zu 100% Rabatt.

Mehr erfahren.

Herausgeber: Packt Publishing
Kategorie: Fachliteratur
Sprache: Englisch

Beschreibung

Python is a user-friendly and powerful programming language. IPython offers a convenient interface to the language and its analysis libraries, while the Jupyter Notebook is a rich environment well-adapted to data science and visualization. Together, these open source tools are widely used by beginners and experts around the world, and in a huge variety of fields and endeavors.

This book is a beginner-friendly guide to the Python data analysis platform. After an introduction to the Python language, IPython, and the Jupyter Notebook, you will learn how to analyze and visualize data on real-world examples, how to create graphical user interfaces for image processing in the Notebook, and how to perform fast numerical computations for scientific simulations with NumPy, Numba, Cython, and ipyparallel. By the end of this book, you will be able to perform in-depth analyses of all sorts of data.

Details

Das E-Book können Sie in Legimi-Apps oder einer beliebigen App lesen, die das folgende Format unterstützen:

EPUB

MOBI

Seitenzahl: 194

Veröffentlichungsjahr: 2015

Bewertungen

0,0

Rezensionen(0 Rezensionen)

Ähnliche

IPython Interactive Computing and Visualization Cookbook

Cyrille Rossant

Der Weg zum erfolgreichen Unternehmer

Stefan Merath

Der Weg zum erfolgreichen Unternehmer

Stefan Merath

Denke (nach) und werde reich

Napoleon Hill

30 Minuten Resilienz

Ulrich Siegrist

Krebszellen mögen keine Himbeeren - Der große Bestseller - Vollständig überarbeitet und aktualisiert

Richard Béliveau

Die Hormonrevolution

Michael E Platt

Der Crash ist die Lösung

Matthias Weik

Günter, der innere Schweinehund, lernt verkaufen

Stefan Frädrich

Mission erfüllt

Owen Mark

Die Leber wächst mit ihren Aufgaben

Dr. med. Eckart von Hirschhausen

Macht, was ihr liebt!

Anja Förster

Der größte Raubzug der Geschichte

Matthias Weik

Unsere Hunde - gesund durch Homöopathie

Hans Günter Wolff

Die Jahrhundertlüge, die nur Insider kennen

Heiko Schrang

Organisation für Komplexität

Niels Pfläging

Radikal führen

Reinhard K. Sprenger

30 Minuten Sympathisch und souverän: So geht Vortragen!

Thomas Lorenz

BLACKOUT - Morgen ist es zu spät

Marc Elsberg

The Truth About Employee Engagement

Patrick M. Lencioni

Leseprobe

Learning IPython for Interactive Computing and Data Visualization Second Edition

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with IPython

What are Python, IPython, and Jupyter?

Jupyter and IPython

What this book covers

References

Installing Python with Anaconda

Downloading Anaconda

Installing Anaconda

Before you get started...

Opening a terminal

Finding your home directory

Manipulating your system path

Testing your installation

Managing environments

Common conda commands

References

Downloading the notebooks

Introducing the Notebook

Launching the IPython console

Launching the Jupyter Notebook

The Notebook dashboard

The Notebook user interface

Structure of a notebook cell

Markdown cells

Code cells

The Notebook modal interface

Keyboard shortcuts available in both modes

Keyboard shortcuts available in the edit mode

Keyboard shortcuts available in the command mode

References

A crash course on Python

Hello world

Variables

String escaping

Lists

Loops

Indentation

Conditional branches

Functions

Positional and keyword arguments

Passage by assignment

Errors

Object-oriented programming

Functional programming

Python 2 and 3

Going beyond the basics

Ten Jupyter/IPython essentials

Using IPython as an extended shell

Learning magic commands

Mastering tab completion

Writing interactive documents in the Notebook with Markdown

Creating interactive widgets in the Notebook

Running Python scripts from IPython

Introspecting Python objects

Debugging Python code

Benchmarking Python code

Profiling Python code

Summary

2. Interactive Data Analysis with pandas

Exploring a dataset in the Notebook

Provenance of the data

Downloading and loading a dataset

Making plots with matplotlib

Descriptive statistics with pandas and seaborn

Manipulating data

Selecting data

Selecting columns

Selecting rows

Filtering with boolean indexing

Computing with numbers

Working with text

Working with dates and times

Handling missing data

Complex operations

Group-by

Joins

Summary

3. Numerical Computing with NumPy

A primer to vector computing

Multidimensional arrays

The ndarray

Vector operations on ndarrays

How fast are vector computations in NumPy?

How an ndarray is stored in memory

Why operations on ndarrays are fast

Creating and loading arrays

Creating arrays

Loading arrays from files

Basic array manipulations

Computing with NumPy arrays

Selection and indexing

Boolean operations on arrays

Mathematical operations on arrays

A density map with NumPy

Learning IPython for Interactive Computing and Data Visualization Second Edition

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: April 2013

Second edition: October 2015

Production reference: 1151015

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78398-698-9

www.packtpub.com

Credits

Author

Cyrille Rossant

Reviewers

Damián Avila

Nicola Rainiero

G Scott Stukey

Commissioning Editor

Kartikey Pandey

Acquisition Editors

Kartikey Pandey

Richard Brookes-Bland

Content Development Editor

Arun Nadar

Technical Editor

Pranil Pathare

Copy Editor

Stephen Copestake

Project Coordinator

Shweta H Birwatkar

Proofreader

Safis Editing

Indexer

Monica Ajmera Mehta

Production Coordinator

Conidon Miranda

Cover Work

Conidon Miranda

About the Author

Cyrille Rossant is a researcher in neuroinformatics, and is a graduate of Ecole Normale Superieure, Paris, where he studied mathematics and computer science. He has worked at Princeton University, University College London, and College de France. As part of his data science and software engineering projects, he gained experience in machine learning, high-performance computing, parallel computing, and big data visualization.

He is one of the main developers of VisPy, a high-performance visualization package in Python. He is the author of the IPython Interactive Computing and Visualization Cookbook, Packt Publishing, an advanced-level guide to data science and numerical computing with Python, and the sequel of this book.

I am grateful to Nick Fiorentini for his help during the revision of the book. I would also like to thank my family and notably my wife Claire for their support.

About the Reviewers

Damián Avila is a software developer and data scientist (formerly a biochemist) from Córdoba, Argentina.

His main focus of interest is data science, visualization, finance, and IPython/Jupyter-related projects.

In the open source area, he is a core developer for several interesting and popular projects, such as IPython/Jupyter, Bokeh, and Nikola. He has also started his own projects, being RISE, an extension to enable amazing live slides in the Jupyter notebook, the most popular one. He has also written several tutorials about the Scientific Python tools (available at Github) and presented several talks at international conferences.

Currently, he is working at Continuum Analytics.

Nicola Rainiero is a civil geotechnical engineer with a background in the construction industry as a self-employed designer engineer. He is also specialized in the renewable energy field and has collaborated with the Sant'Anna University of Pisa for two European projects, REGEOCITIES and PRISCA, using qualitative and quantitative data analysis techniques.

He has an ambition to simplify his work with open software and use and develop new ones; sometimes obtaining good results, at other times, negative. You can reach Nicola on his website at http://rainnic.altervista.org.

A special thanks to Packt Publishing for this opportunity to participate in the reviewing of this book. I thank my family, especially my parents, for their physical and moral support.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by PacktCopy and paste, print, and bookmark contentOn demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

Data analysis skills are now essential in scientific research, engineering, finance, economics, journalism, and many other domains. With its high accessibility and vibrant ecosystem, Python is one of the most appreciated open source languages for data science.

This book is a beginner-friendly introduction to the Python data analysis platform, focusing on IPython (Interactive Python) and its Notebook. While IPython is an enhanced interactive Python terminal specifically designed for scientific computing and data analysis, the Notebook is a graphical interface that combines code, text, equations, and plots in a unified interactive environment.

The first edition of Learning IPython for Interactive Computing and Data Visualization was published in April 2013, several months before the release of IPython 1.0. This new edition targets IPython 4.0, released in August 2015. In addition to reflecting the novelties of this new version of IPython, the present book is also more accessible to non-programmer beginners. The first chapter contains a brand new crash course on Python programming, as well as detailed installation instructions.

Since the first edition of this book, IPython's popularity has grown significantly, with an estimated user base of several millions of people and ongoing collaborations with large companies like Microsoft, Google, IBM, and others. The project itself has been subject to important changes, with a refactoring into a language-independent interface called the Jupyter Notebook, and a set of backend kernels in various languages. The Notebook is no longer reserved to Python; it can now also be used with R, Julia, Ruby, Haskell, and many more languages (50 at the time of this writing!).

The Jupyter project has received significant funding in 2015 from the Leona M. and Harry B. Helmsley Charitable Trust, the Gordon and Betty Moore Foundation, and the Alfred P. Sloan Foundation, which will allow the developers to focus on the growth and maturity of the project in the years to come.

Here are a few references:

Home page for the Jupyter project at http://jupyter.org/Announcement of the funding for Jupyter at https://blog.jupyter.org/2015/07/07/jupyter-funding-2015/Detail of the project's grant at https://blog.jupyter.org/2015/07/07/project-jupyter-computational-narratives-as-the-engine-of-collaborative-data-science/

What this book covers

Chapter 1, Getting Started with IPython, is a thorough and beginner-friendly introduction to Anaconda (a popular Python distribution), the Python language, the Jupyter Notebook, and IPython.

Chapter 2, Interactive Data Analysis with pandas, is a hands-on introduction to interactive data analysis and visualization in the Notebook with pandas, matplotlib, and seaborn.

Chapter 3, Numerical Computing with NumPy, details how to use NumPy for efficient computing on multidimensional numerical arrays.

Chapter 4, Interactive Plotting and Graphical Interfaces, explores many capabilities of Python for interactive plotting, graphics, image processing, and interactive graphical interfaces in the Jupyter Notebook.

Chapter 5, High-Performance and Parallel Computing, introduces the various techniques you can employ to accelerate your numerical computing code, namely parallel computing and compilation of Python code.

Chapter 6, Customizing IPython, shows how IPython and the Jupyter Notebook can be extended for customized use-cases.

What you need for this book

The following software is required for the book:

Anaconda with Python 3Windows, Linux, or OS X can be used as a platform

Who this book is for

This book targets anyone who wants to analyze data or perform numerical simulations of mathematical models.

Since our world is becoming more and more data-driven, knowing how to analyze data effectively is an essential skill to learn. If you're used to spreadsheet programs like Microsoft Excel, you will appreciate Python for its much larger range of analysis and visualization possibilities. Knowing this general-purpose language will also let you share your data and analysis with other programs and libraries.

In conclusion, this book will be useful to students, scientists, engineers, analysts, journalists, statisticians, economists, hobbyists, and all data enthusiasts.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "Run it with a command like bash Anaconda3-2.3.0-Linux-x86_64.sh (if necessary, replace the filename by the one you downloaded)."

A block of code is set as follows:

def load_ipython_extension(ipython): """This function is called when the extension is loaded. It accepts an IPython InteractiveShell instance. We can register the magic with the `register_magic_function` method of the shell instance.""" ipython.register_magic_function(cpp, 'cell')

Any command-line input or output is written as follows:

$ pythonPython 3.4.3 |Anaconda 2.3.0 (64-bit)| (default, Jun 4 2015, 15:29:08) [GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linuxType "help", "copyright", "credits" or "license" for more information.>>>

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "To create a new notebook, click on the New button, and select Notebook (Python 3)."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors. You can also report any issues at https://github.com/ipython-books/minibook-2nd-code/issues.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files from your account at http://www.packtpub.com for all the Packt Publishing books you have purchased. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you. You will also find the book's code on this GitHub repository: https://github.com/ipython-books/minibook-2nd-code.

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/6989OS_ColouredImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.

Chapter 1. Getting Started with IPython

In this chapter, we will cover the following topics:

What are Python, IPython, and Jupyter?Installing Python with AnacondaIntroducing the NotebookA crash course on PythonTen Jupyter/IPython essentials

What are Python, IPython, and Jupyter?

Python is an open source general-purpose language created by Guido van Rossum in the late 1980s. It is widely-used by system administrators and developers for many purposes: for example, automating routine tasks or creating a web server. Python is a flexible and powerful language, yet it is sufficiently simple to be taught to school children with great success.

In the past few years, Python has also emerged as one of the leading open platforms for data science and high-performance numerical computing. This might seem surprising as Python was not originally designed for scientific computing. Python's interpreted nature makes it much slower than lower-level languages like C or Fortran, which are more amenable to number crunching and the efficient implementation of complex mathematical algorithms.

However, the performance of these low-level languages comes at a cost: they are hard to use and they require advanced knowledge of how computers work. In the late 1990s, several scientists began investigating the possibility of using Python for numerical computing by interoperating it with mainstream C/Fortran scientific libraries. This would bring together the ease-of-use of Python with the performance of C/Fortran: the dream of any scientist!

Consequently, the past 15 years have seen the development of widely-used libraries such as NumPy (providing a practical array data structure), SciPy (scientific computing), matplotlib (graphical plotting), pandas (data analysis and statistics), scikit-learn (machine learning), SymPy (symbolic computing), and Jupyter/IPython (efficient interfaces for interactive computing). Python, along with this set of libraries, is sometimes referred to as the SciPy stack or PyData platform.

Tip

Competing platforms

Python has several competitors. For example, MATLAB (by Mathworks) is a commercial software focusing on numerical computing that is widely-used in scientific research and engineering. SPSS (by IBM) is a commercial software for statistical analysis. Python, however, is free and open source, and that's one of its greatest strengths. Alternative open source platforms include R (specialized in statistics) and Julia (a young language for high-performance numerical computing).

More recently, this platform has gained popularity in other non-academic communities such as finance, engineering, statistics, data science, and others.

This book provides a solid introduction to the whole platform by focusing on one of its main components: Jupyter/IPython.

Jupyter and IPython

IPythonwas created in 2001 by Fernando Perez (the I in IPython stands for "interactive"). It was originally meant to be a convenient command-line interface to the scientific Python platform. In scientific computing, trial and error is the rule rather than the exception, and this requires an efficient interface that allows for interactive exploration of algorithms, data, and graphs.

In 2011, IPython introduced the interactive Notebook. Inspired by commercial software such as Maple (by Maplesoft) or Mathematica (by Wolfram Research), the Notebook runs in a browser and provides a unified web interface where code, text, mathematical equations, plots, graphics, and interactive graphical controls can be combined into a single document. This is an ideal interface for scientific computing. Here is a screenshot of a notebook:

Example of a notebook

It quickly became clear that this interface could be used with languages other than Python such as R, Julia, Lua, Ruby, and many others. Further, the Notebook is not restricted to scientific computing: it can be used for academic courses, software documentation, or book writing thanks to conversion tools targeting Markdown, HTML, PDF, ODT, and many other formats. Therefore, the IPython developers decided in 2014 to acknowledge the general-purpose nature of the Notebook by giving a new name to the project: Jupyter.

Jupyter features a language-independent Notebook platform that can work with a variety of kernels. Implemented in any language, a kernel is the backend of the Notebook interface. It manages the interactive session, the variables, the data, and so on. By contrast, the Notebook interface is the frontend of the system. It manages the user interface, the text editor, the plots, and so on. IPython is henceforth the name of the Python kernel for the Jupyter Notebook. Other kernels include IR, IJulia, ILua, IRuby, and many others (50 at the time of this writing).

In August 2015, the IPython/Jupyter developers achieved the "Big Split" by splitting the previous monolithic IPython codebase into a set of smaller projects, including the language-independent Jupyter Notebook (see https://blog.jupyter.org/2015/08/12/first-release-of-jupyter/). For example, the parallel computing features of IPython are now implemented in a standalone Python package named ipyparallel, the IPython widgets are implemented in ipywidgets, and so on. This separation makes the code of the project more modular and facilitates third-party contributions. IPython itself is now a much smaller project than before since it only features the interactive Python terminal and the Python kernel for the Jupyter Notebook.

Note

You will find the list of changes in IPython 4.0 at http://ipython.readthedocs.org/en/latest/whatsnew/version4.html. Many internal IPython imports have been deprecated due to the code reorganization. Warnings are raised if you attempt to perform a deprecated import. Also, the profiles have been removed and replaced with a unique default profile. However, you can simulate this functionality with environment variables. You will find more information at http://jupyter.readthedocs.org.

What this book covers

This book covers the Jupyter Notebook 1.0 and focuses on its Python kernel, IPython 4.0. In this chapter, we will introduce the platform, the Python language, the Jupyter Notebook interface, and IPython. In the remaining chapters, we will cover data analysis and scientific computing in Jupyter/IPython with the help of mainstream scientific libraries such as NumPy, pandas, and matplotlib.

Note

This book gives you a solid introduction to Jupyter and the SciPy platform. The IPython Interactive Computing and Visualization Cookbook (http://ipython-books.github.io/cookbook/) is the sequel of this introductory-level book. In 15 chapters and more than 500 pages, it contains a hundred recipes covering a wide range of interactive numerical computing techniques and data science topics. The IPython Cookbook is an excellent addition to the present IPython minibook if you're interested in delving into the platform in much greater detail.

References

Here are a few references about IPython and the Notebook:

The main Jupyter page at: http://jupyter.org/The main Jupyter documentation at: https://jupyter.readthedocs.org/en/latest/The main IPython page at: http://ipython.org/Jupyter on GitHub at: https://github.com/jupyterTry Jupyter online at: https://try.jupyter.org/The IPython Notebook in research, a Nature note at http://www.nature.com/news/interactive-notebooks-sharing-the-code-1.16261

Installing Python with Anaconda

Although Python is an open-source, cross-platform language, installing it with the usual scientific packages used to be overly complicated. Fortunately, there is now an all-in-one scientific Python distribution, Anaconda