IPython Interactive Computing and Visualization Cookbook - Cyrille Rossant - E-Book

IPython Interactive Computing and Visualization Cookbook E-Book

Cyrille Rossant

Description

Learn to use IPython and Jupyter Notebook for your data analysis and visualization work.

Key Features

  • Leverage the Jupyter Notebook for interactive data science and visualization
  • Become an expert in high-performance computing and visualization for data analysis and scientific modeling
  • Comprehensive coverage of scientific computing through many hands-on, example-driven recipes with detailed, step-by-step explanations

Book Description

Python is one of the leading open source platforms for data science and numerical computing. IPython and the associated Jupyter Notebook offer efficient interfaces to Python for data analysis and interactive visualization, and they constitute an ideal gateway to the platform.

IPython Interactive Computing and Visualization Cookbook, Second Edition contains many ready-to-use, focused recipes for high-performance scientific computing and data analysis, from the latest IPython/Jupyter features to the most advanced tricks, to help you write better and faster code. You will apply these state-of-the-art methods to various real-world examples, illustrating topics in applied mathematics, scientific modeling, and machine learning.

The first part of the book covers programming techniques: code quality and reproducibility, code optimization, high-performance computing through just-in-time compilation, parallel computing, and graphics card programming. The second part tackles data science, statistics, machine learning, signal and image processing, dynamical systems, and pure and applied mathematics.

What you will learn

  • Master all features of the Jupyter Notebook
  • Code better: write high-quality, readable, and well-tested programs; profile and optimize your code; and conduct reproducible interactive computing experiments
  • Visualize data and create interactive plots in the Jupyter Notebook
  • Write blazingly fast Python programs with NumPy, ctypes, Numba, Cython, OpenMP, GPU programming (CUDA), parallel IPython, Dask, and more
  • Analyze data with Bayesian or frequentist statistics (Pandas, PyMC, and R), and learn from actual data through machine learning (scikit-learn)
  • Gain valuable insights into signals, images, and sounds with SciPy, scikit-image, and OpenCV
  • Simulate deterministic and stochastic dynamical systems in Python
  • Familiarize yourself with math in Python using SymPy and Sage: algebra, analysis, logic, graphs, geometry, and probability theory

Who this book is for

This book is intended for anyone interested in numerical computing and data science: students, researchers, teachers, engineers, analysts, and hobbyists. A basic knowledge of Python/NumPy is recommended. Some skills in mathematics will help you understand the theory behind the computational methods.

Cyrille Rossant, PhD, is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he has gained experience in numerical computing, parallel computing, and high-performance data visualization. He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, the prequel of this cookbook.

Page count: 572

Year of publication: 2018


Table of Contents

IPython Interactive Computing and Visualization Cookbook Second Edition
Why subscribe?
PacktPub.com
Contributors
About the author
Packt is Searching for Authors Like You
Preface
Who this book is for
What this book covers
Part 1 – Interactive Computing with Jupyter
Part 2 – Standard Methods in Data Science and Applied Mathematics
To get the most out of this book

Installing Python
GitHub repositories
Download the example code files
Download the color images
Conventions used
Sections
Getting ready
How to do it…
How it works…
There's more…
See also
Get in touch
Reviews
1. A Tour of Interactive Computing with Jupyter and IPython
Introduction
What is Python?
What is IPython?
What is Jupyter?
What is the SciPy ecosystem?
What's new in the SciPy ecosystem?
How to install Python
References
Introducing IPython and the Jupyter Notebook
Getting ready
How to do it...
There's more...
See also
Getting started with exploratory data analysis in the Jupyter Notebook
How to do it...
How it works...
There's more...
See also
Introducing the multidimensional array in NumPy for fast array computations
How to do it...
How it works...
There's more...
See also
Creating an IPython extension with custom magic commands
How to do it...
How it works...
The InteractiveShell class
Loading an extension
There's more...
See also
Mastering IPython's configuration system
How to do it...
How it works...
Configurables
Magics
There's more...
See also
Creating a simple kernel for Jupyter
How to do it...
How it works...
There's more...
2. Best Practices in Interactive Computing
Introduction
Learning the basics of the Unix shell
Getting ready
How to do it...
There's more...
See also
Using the latest features of Python 3
How to do it...
There's more...
Learning the basics of the distributed version control system Git
Getting ready
How to do it...
How it works...
There's more...
See also
A typical workflow with Git branching
Getting ready
How to do it...
How it works...
There's more...
See also
Efficient interactive computing workflows with IPython
How to do it...
The IPython terminal
IPython and text editor
The Jupyter Notebook
Integrated Development Environments
There's more...
See also
Ten tips for conducting reproducible interactive computing experiments
How to do it...
How it works...
There's more...
See also
Writing high-quality Python code
How to do it...
How it works...
There's more...
See also
Writing unit tests with pytest
Getting ready
How to do it...
How it works...
There's more...
Test coverage
Workflows with unit testing
Unit testing and continuous integration
Debugging code with IPython
How to do it...
The post-mortem mode
Step-by-step debugging
There's more...
3. Mastering the Jupyter Notebook
Introduction
The Notebook ecosystem
Architecture of the Jupyter Notebook
Connecting multiple clients to one kernel
JupyterHub
Security in notebooks
References
Teaching programming in the Notebook with IPython Blocks
Getting ready
How to do it...
Converting a Jupyter notebook to other formats with nbconvert
Getting ready
How to do it...
How it works...
There's more...
Mastering widgets in the Jupyter Notebook
Getting ready
How to do it...
There's more...
See also
Creating custom Jupyter Notebook widgets in Python, HTML, and JavaScript
How to do it...
There's more...
See also
Configuring the Jupyter Notebook
How to do it...
There's more...
See also
Introducing JupyterLab
Getting ready
How to do it...
There's more...
See also
4. Profiling and Optimization
Introduction
Evaluating the time taken by a command in IPython
How to do it...
How it works...
There's more...
See also
Profiling your code easily with cProfile and IPython
How to do it...
How it works...
There's more...
See also
Profiling your code line-by-line with line_profiler
Getting ready
How to do it...
How it works...
There's more...
See also
Profiling the memory usage of your code with memory_profiler
Getting ready
How to do it...
How it works...
There's more...
See also
Understanding the internals of NumPy to avoid unnecessary array copying
Getting ready
How to do it...
How it works...
Why are NumPy arrays efficient?
What is the difference between in-place and implicit-copy operations?
Why can't some arrays be reshaped without a copy?
What are NumPy broadcasting rules?
There's more...
See also
Using stride tricks with NumPy
Getting ready
How to do it...
How it works...
See also
Implementing an efficient rolling average algorithm with stride tricks
How to do it...
See also
Processing large NumPy arrays with memory mapping
How to do it...
How it works...
There's more...
See also
Manipulating large arrays with HDF5
Getting ready
How to do it...
How it works...
There's more...
See also
5. High-Performance Computing
Introduction
CPython and concurrent programming
Compiler-related installation instructions
Using Python to write faster code
How to do it...
There's more...
See also
Accelerating pure Python code with Numba and Just-In-Time compilation
Getting ready
How to do it...
How it works...
There's more...
See also
Accelerating array computations with NumExpr
Getting ready
How to do it...
How it works...
See also
Wrapping a C library in Python with ctypes
Getting ready
How to do it...
How it works...
There's more...
See also
Accelerating Python code with Cython
Getting ready
How to do it...
How it works...
There's more...
See also
Optimizing Cython code by writing less Python and more C
How to do it...
How it works...
There's more...
See also
Releasing the GIL to take advantage of multi-core processors with Cython and OpenMP
Getting ready
How to do it...
How it works...
See also
Writing massively parallel code for NVIDIA graphics cards (GPUs) with CUDA
Getting ready
How to do it...
How it works...
There's more...
See also
Distributing Python code across multiple cores with IPython
Getting started
How to do it...
How it works...
There's more...
References
See also
Interacting with asynchronous parallel tasks in IPython
Getting ready
How to do it...
How it works...
There's more...
See also
Performing out-of-core computations on large arrays with Dask
Getting ready
How to do it...
There's more...
See also
Trying the Julia programming language in the Jupyter Notebook
Getting ready
How to do it...
How it works...
There's more...
6. Data Visualization
Introduction
Using Matplotlib styles
How to do it...
There's more...
See also
Creating statistical plots easily with seaborn
How to do it...
There's more...
See also
Creating interactive web visualizations with Bokeh and HoloViews
Getting ready
How to do it...
There's more...
Visualizing a NetworkX graph in the Notebook with D3.js
Getting ready
How to do it...
There's more...
See also
Discovering interactive visualization libraries in the Notebook
Getting started
How to do it...
There's more...
Creating plots with Altair and the Vega-Lite specification
Getting started...
How to do it...
How it works...
There's more...
See also
7. Statistical Data Analysis
Introduction
What is statistical data analysis?
A bit of vocabulary
Exploration, inference, decision, prediction
Univariate and multivariate methods
Frequentist and Bayesian methods
Parametric and nonparametric inference methods
Exploring a dataset with pandas and Matplotlib
How to do it...
There's more...
Getting started with statistical hypothesis testing — a simple z-test
Getting ready
How to do it...
How it works...
There's more...
See also
Getting started with Bayesian methods
Getting ready
How to do it...
How it works...
Bayes' theorem
Computation of the posterior distribution
Maximum a posteriori estimation
There's more...
Credible interval
Conjugate distributions
Non-informative (objective) prior distributions
See also
Estimating the correlation between two variables with a contingency table and a chi-squared test
How to do it...
How it works...
Pearson's correlation coefficient
Contingency table and chi-squared test
There's more...
See also
Fitting a probability distribution to data with the maximum likelihood method
Getting ready
How to do it...
How it works...
There's more...
See also
Estimating a probability distribution nonparametrically with a kernel density estimation
Getting ready
How to do it...
How it works...
See also
Fitting a Bayesian model by sampling from a posterior distribution with a Markov chain Monte Carlo method
Getting ready
How to do it...
How it works...
There's more...
See also
Analyzing data with the R programming language in the Jupyter Notebook
Getting ready
How to do it...
How it works...
There's more...
See also
8. Machine Learning
Introduction
A bit of vocabulary
Learning from data
Supervised learning
Unsupervised learning
Feature selection and feature extraction
Overfitting, underfitting, and the bias-variance tradeoff
Model selection
Machine learning references
Getting started with scikit-learn
Getting ready
How to do it...
How it works...
scikit-learn API
Ordinary Least Squares regression
Polynomial interpolation with linear regression
Ridge regression
Cross-validation and grid search
There's more...
Predicting who will survive on the Titanic with logistic regression
How to do it...
How it works...
There's more...
See also
Learning to recognize handwritten digits with a K-nearest neighbors classifier
How to do it...
How it works...
There's more...
See also
Learning from text – Naive Bayes for Natural Language Processing
How to do it...
How it works...
There's more...
See also
Using support vector machines for classification tasks
How to do it...
How it works...
There's more...
See also
Using a random forest to select important features for regression
How to do it...
How it works...
There's more...
See also
Reducing the dimensionality of a dataset with a principal component analysis
How to do it...
How it works...
There's more...
See also
Detecting hidden structures in a dataset with clustering
How to do it...
How it works...
There's more...
See also
9. Numerical Optimization
Introduction
The objective function
Local and global minima
Constrained and unconstrained optimization
Deterministic and stochastic algorithms
References
Finding the root of a mathematical function
How to do it...
How it works...
There's more…
See also
Minimizing a mathematical function
How to do it...
How it works...
There's more...
See also
Fitting a function to data with nonlinear least squares
How to do it...
How it works...
See also
Finding the equilibrium state of a physical system by minimizing its potential energy
How to do it...
How it works...
There's more...
See also
10. Signal Processing
Introduction
Analog and digital signals
The Nyquist–Shannon sampling theorem
Compressed sensing
References
Analyzing the frequency components of a signal with a Fast Fourier Transform
How to do it...
How it works...
The discrete Fourier transform
Inverse Fourier transform
There's more...
See also
Applying a linear filter to a digital signal
How to do it...
How it works...
What are linear filters?
Linear filters and convolutions
The FIR and IIR filters
Filters in the frequency domain
The low-, high-, and band-pass filters
There's more...
See also
Computing the autocorrelation of a time series
How to do it...
How it works...
There's more...
See also
11. Image and Audio Processing
Introduction
Images
Sounds
References
Manipulating the exposure of an image
Getting ready
How to do it...
How it works...
There's more...
See also
Applying filters on an image
How to do it...
How it works...
There's more...
See also
Segmenting an image
How to do it...
How it works...
There's more...
See also
Finding points of interest in an image
How to do it...
How it works...
There's more...
Detecting faces in an image with OpenCV
Getting ready
How to do it...
How it works...
There's more...
Applying digital filters to speech sounds
Getting ready
How to do it...
How it works...
There's more...
See also
Creating a sound synthesizer in the Notebook
How to do it...
How it works...
There's more...
See also
12. Deterministic Dynamical Systems
Introduction
Types of dynamical systems
Differential equations
References
Plotting the bifurcation diagram of a chaotic dynamical system
How to do it...
There's more...
See also
Simulating an elementary cellular automaton
How to do it...
How it works...
There's more...
Simulating an ordinary differential equation with SciPy
How to do it...
How it works...
There's more...
See also
Simulating a partial differential equation — reaction-diffusion systems and Turing patterns
How to do it...
How it works...
There's more...
13. Stochastic Dynamical Systems
Introduction
References
Simulating a discrete-time Markov chain
How to do it...
How it works...
There's more...
See also
Simulating a Poisson process
How to do it...
How it works...
There's more...
See also
Simulating a Brownian motion
How to do it...
How it works...
There's more...
See also
Simulating a stochastic differential equation
How to do it...
How it works...
There's more...
See also
14. Graphs, Geometry, and Geographic Information Systems
Introduction
Graphs
Problems in graph theory
Random graphs
Graphs in Python
Geometry in Python
Geographical information systems in Python
References
Manipulating and visualizing graphs with NetworkX
Getting ready
How to do it...
There's more...
See also
Drawing flight routes with NetworkX
Getting ready
How to do it...
See also
Resolving dependencies in a directed acyclic graph with a topological sort
How to do it...
How it works...
There's more...
Computing connected components in an image
How to do it...
How it works...
There's more...
Computing the Voronoi diagram of a set of points
Getting ready
How to do it...
How it works...
There's more...
See also
Manipulating geospatial data with Cartopy
Getting ready
How to do it...
There's more...
See also
Creating a route planner for a road network
Getting ready
How to do it...
How it works...
There's more...
15. Symbolic and Numerical Mathematics
Introduction
LaTeX
Diving into symbolic computing with SymPy
Getting ready
How to do it...
How it works...
See also
Solving equations and inequalities
How to do it...
There's more...
Analyzing real-valued functions
How to do it...
There's more...
Computing exact probabilities and manipulating random variables
How to do it...
How it works...
There's more...
A bit of number theory with SymPy
Getting ready
How to do it...
How it works...
There's more...
Finding a Boolean propositional formula from a truth table
How to do it...
How it works...
There's more...
Analyzing a nonlinear differential system — Lotka-Volterra (predator-prey) equations
Getting ready
How to do it...
How it works...
There's more...
Getting started with Sage
Getting ready
How to do it...
There's more...
See also
Index

IPython Interactive Computing and Visualization Cookbook Second Edition

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Veena Pagare

Acquisition Editor: Dominic Shakeshaft

Project Editor: Suzanne Coutinho

Technical Editors: Bhagyashree Rai, Nidhisha Shetty

Proofreader: Safis Editing

Indexer: Aishwarya Gangawane

Graphics: Tom Scaria

Production Coordinator: Shantanu Zagade

First published: September 2014

Second Edition: January 2018

Production reference: 1290118

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-863-2

www.packtpub.com

mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

  • Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
  • Learn better with Skill Plans built especially for you
  • Get a free eBook or video every month
  • Mapt is fully searchable
  • Copy and paste, print, and bookmark content

PacktPub.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Cyrille Rossant, PhD, is a neuroscience researcher and software engineer at University College London. He is a graduate of École Normale Supérieure, Paris, where he studied mathematics and computer science. He has also worked at Princeton University and Collège de France. While working on data science and software engineering projects, he has gained experience in numerical computing, parallel computing, and high-performance data visualization.

He is the author of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing, the prequel of this cookbook.

I'm grateful to everyone who gave their feedback on this book, including Matthias Bussonnier, Thomas Caswell, Guillaume Gay, Brian Granger, Matthew Rocklin, Steven Silvester, and Jake VanderPlas. I'd also like to thank my family for their support.

Packt is Searching for Authors Like You

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Preface

We are becoming awash in the flood of digital data from scientific research, engineering, economics, politics, journalism, business, and many other domains. As a result, analyzing, visualizing, and harnessing data is the occupation of an increasingly large and diverse set of people. Quantitative skills such as programming, numerical computing, mathematics, statistics, and data mining, which form the core of data science, are more and more appreciated in a seemingly endless plethora of fields.

Python, a widely-known programming language, is also one of the leading open platforms for data science. IPython is a mature Python project that provides scientist-friendly interactive access to Python. It is part of the broader Project Jupyter, which aims to provide high-quality environments for interactive computing, data analysis, visualization, and the authoring of interactive scientific documents. Jupyter is estimated to have several million users today.

The prequel of this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition (Packt Publishing), was published in 2015, two years after the first edition. It is a beginner-level introduction to data science and numerical computing with Python, IPython, and Jupyter.

This book, the first edition of which was published in 2014, continues that journey by presenting more than 100 recipes for interactive scientific computing and data science. These recipes not only cover programming topics such as numerical computing, high-performance computing, parallel computing, and interactive visualization, but also data analysis topics such as statistics, data mining, machine learning, signal processing, graph theory, numerical optimization, and many others.

This second edition is fully compatible with the latest versions of the platform and its libraries. It includes new recipes to better leverage the latest features of Python 3, and it introduces promising new projects such as JupyterLab, Altair, and Dask.

Note

By design, this book privileges breadth over depth. It covers a particularly wide range of libraries and techniques, but not comprehensively. We give many references that let you deepen your knowledge of individual methods. The goal of this book is not to make you an expert in the subjects covered, but to give you a glimpse of the extremely diverse set of applications that you can tackle with the platform.

Each recipe in this book covers a specific technique and is available online as a Jupyter notebook. This interactive document lets you read, execute, and modify the code, which makes the learning process more engaging and dynamic.

Almost all of this book's content is available online on the GitHub platform (http://ipython-books.github.io/). Updates and corrections will be regularly published there, so you should make sure you check out the latest version of the book online.

Who this book is for

This book targets researchers, engineers, data scientists, teachers, students, analysts, journalists, economists, and hobbyists interested in data analysis and numerical computing.

Readers familiar with the scientific Python ecosystem will find many resources to sharpen their skills in high-performance interactive computing with IPython and Jupyter.

Readers who need to implement algorithms for domain-specific applications will appreciate the introductions to a wide variety of topics in data analysis and applied mathematics.

Readers who are new to numerical computing with Python should start with the prequel of this book, Learning IPython for Interactive Computing and Data Visualization, Second Edition (Packt Publishing), published in 2015.

What this book covers

This book is split into two parts:

Part 1 (chapters 1 to 6) covers relatively advanced methods in interactive numerical computing, high-performance computing, and data visualization.

Part 2 (chapters 7 to 15) introduces standard methods in data science and mathematical modeling. Many of these methods are applied to real-world data.

Part 1 – Interactive Computing with Jupyter

Chapter 1, A Tour of Interactive Computing with Jupyter and IPython, contains a brief introduction to data analysis and numerical computing with IPython and Jupyter. It not only covers common packages such as Python, NumPy, pandas, and Matplotlib, but also advanced IPython/Jupyter topics such as interactive widgets in the Notebook, custom magic commands, configurable IPython extensions, and custom Jupyter kernels.

Chapter 2, Best Practices in Interactive Computing, details best practices to write reproducible, high-quality code: task automation, version control with Git, workflows with IPython and Jupyter, unit testing, continuous integration, debugging, and other related topics. The importance of these subjects in computational research and data analysis cannot be overstated.

Chapter 3, Mastering the Jupyter Notebook, covers topics related to the Jupyter Notebook, notably the Notebook format, notebook conversions, and interactive widgets.

Chapter 4, Profiling and Optimization, covers methods to make your code faster and more efficient: CPU and memory profiling in Python, advanced optimization techniques with NumPy (including large array manipulations), and memory mapping of huge arrays. These techniques are essential for big data analysis.

Chapter 5, High-Performance Computing, covers techniques to make your code much faster: code acceleration with Numba and Cython, wrapping C libraries in Python with ctypes, parallel computing with IPython and Dask, OpenMP, and General-Purpose Computing on Graphics Processing Units (GPGPU) with CUDA. The chapter ends with an introduction to the Julia language, a high-performance numerical computing programming language that can be used in the Jupyter Notebook.

Chapter 6, Data Visualization, introduces several static and interactive visualization libraries, such as Matplotlib, seaborn, Bokeh, D3.js, Altair, and others.

Part 2 – Standard Methods in Data Science and Applied Mathematics

Chapter 7, Statistical Data Analysis, covers methods for getting insights into data. It introduces classic frequentist and Bayesian methods for hypothesis testing, parametric and nonparametric estimation, and model inference. The chapter leverages Python libraries such as pandas, SciPy, statsmodels, and PyMC. The last recipe introduces the statistical language R, which can be easily used in the Jupyter Notebook.

Chapter 8, Machine Learning, covers methods to learn and make predictions from data. Using the scikit-learn Python package, this chapter illustrates fundamental data mining and machine learning concepts such as supervised and unsupervised learning, classification, regression, feature selection, feature extraction, overfitting, regularization, cross-validation, and grid search. Algorithms addressed in this chapter include logistic regression, Naive Bayes, K-nearest neighbors, support vector machines, random forests, and others. These methods are applied to various types of datasets: numerical data, images, and text.

Chapter 9, Numerical Optimization, covers minimizing and maximizing mathematical functions. This topic is pervasive in data science, notably in statistics, machine learning, and signal processing. This chapter illustrates a few root-finding, minimization, and curve-fitting routines with SciPy.

Chapter 10, Signal Processing, covers extracting relevant information from complex and noisy data. These steps are sometimes required prior to running statistical and data mining algorithms. This chapter introduces basic signal processing methods such as Fourier transforms and digital filters.

Chapter 11, Image and Audio Processing, covers signal processing methods for images and sounds. It introduces image filtering, segmentation, computer vision, and face detection with scikit-image and OpenCV. It also presents methods for audio processing and synthesis.

Chapter 12, Deterministic Dynamical Systems, describes the dynamical processes underlying particular types of data. It illustrates simulation techniques for discrete-time dynamical systems, as well as for ordinary differential equations and partial differential equations.

Chapter 13, Stochastic Dynamical Systems, describes the dynamical random processes underlying particular types of data. It illustrates simulation techniques for discrete-time Markov chains, point processes, and stochastic differential equations.

Chapter 14, Graphs, Geometry, and Geographic Information Systems, covers analysis and visualization methods for graphs, flight networks, road networks, maps, and geographic data.

Chapter 15, Symbolic and Numerical Mathematics, introduces SymPy, a computer algebra system that brings symbolic computing to Python. The chapter ends with an introduction to Sage, another Python-based system for computational mathematics.

To get the most out of this book


This book is accessible to beginners. However, it may be easier for you if you are familiar with the contents of Learning IPython for Interactive Computing and Data Visualization, Second Edition, Packt Publishing (also called the "IPython minibook"), the prequel of this book. The minibook introduces Python programming, the IPython console, the Jupyter Notebook, numerical computing with NumPy, basic data analysis with pandas, and plotting with Matplotlib. This book tackles scientific programming topics that rely on all of these tools.

Part 2 is a bit more theoretical. It is easier to read if you know the basics of calculus, linear algebra, and probability theory (real-valued functions, integrals and derivatives, differential equations, matrices, vector spaces, probabilities, random variables, and so on). These chapters introduce different topics in data science and applied mathematics, and how to apply them with Python: statistics, machine learning, numerical optimization, signal processing, dynamical systems, graph theory, and others.

Installing Python

This book uses the free Anaconda distribution (https://www.anaconda.com/download/). It includes Python 3, IPython, Jupyter, and almost all of the packages that we will be using in this book. Anaconda also includes a powerful packaging system named Conda. The introduction of this book's first chapter gives you more details.

The code of this book has been written for Python 3 and is incompatible with Python 2 (although minimal to no changes would be required to make it compatible).

GitHub repositories

This book has a website: http://ipython-books.github.io. The text, the code, and the data from the book are available on several GitHub repositories at https://github.com/ipython-books/. You can also run the code interactively in your web browser without installing anything on your computer, thanks to the Binder project.

Be sure to check out http://ipython-books.github.io and the repositories to get the latest updates and corrections. You can also propose your own corrections and suggestions on GitHub by opening issues or pull requests.

You can also follow the author online (http://cyrille.rossant.net) and on Twitter (@cyrillerossant).

Download the example code files

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

Log in or register at http://www.packtpub.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles.

A block of code is set as follows:

>>> print("Hello world!")
Hello world!

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

>>> print("Hello world!")
Hello world!

Any command-line input or output is written as follows:

# cp /usr/src/asterisk-addons/configs/cdr_mysql.conf.sample /etc/asterisk/cdr_mysql.conf

Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "Select System info from the Administration panel."

Note

Warnings or important notes appear in a box like this.

Tip

Tips and tricks appear like this.

Sections

In this book, you will find several headings that appear frequently (Getting ready, How to do it..., How it works..., There's more..., and See also).

To give clear instructions on how to complete a recipe, use these sections as follows:

Getting ready

This section tells you what to expect in the recipe and describes how to set up any software or any preliminary settings required for the recipe.

How to do it…

This section contains the steps required to follow the recipe.

How it works…

This section usually consists of a detailed explanation of what happened in the previous section.

There's more…

This section consists of additional information about the recipe in order to make you more knowledgeable about the recipe.

See also

This section provides helpful links to other useful information for the recipe.

Get in touch

Feedback from our readers is always welcome.

General feedback: Email <[email protected]> and mention the book's title in the subject of your message. If you have questions about any aspect of this book, please email us at <[email protected]>.

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at <[email protected]> with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packtpub.com.

Chapter 1. A Tour of Interactive Computing with Jupyter and IPython

In this chapter, we will cover the following topics:

Introducing IPython and the Jupyter Notebook
Getting started with exploratory data analysis in the Jupyter Notebook
Introducing the multidimensional array in NumPy for fast array computations
Creating an IPython extension with custom magic commands
Mastering IPython's configuration system
Creating a simple kernel for Jupyter

Introduction

In this introduction, we will give a broad overview of Python, IPython, Jupyter, and the scientific Python ecosystem.

What is Python?

Python is a high-level, open-source, general-purpose programming language originally conceived by Guido van Rossum in the late 1980s (the name was inspired by the British comedy Monty Python's Flying Circus). This easy-to-use language is commonly used by system administrators as a glue language, linking various system components together. It is also a robust language for large-scale software development. In addition, Python comes with an extremely rich standard library (the batteries included philosophy), which covers string processing, internet protocols, operating system interfaces, and many other domains.

In the last twenty years, Python has been increasingly used for scientific computing and data analysis as well. Other competing platforms include commercial software such as MATLAB, Maple, Mathematica, Excel, SPSS, SAS, and others. Competing open-source platforms include Julia, R, Octave, and Scilab. These tools are dedicated to scientific computing, whereas Python is a general-purpose programming language that was not initially designed for scientific computing.

However, a wide ecosystem of tools has been developed to bring Python to the level of these other scientific computing systems. Today, the main advantage of Python, and one of the main reasons why it is so popular, is that it brings scientific computing features to a general-purpose language that is used in many research areas and industries. This makes the transition from research to production much easier.

What is IPython?

IPython is a Python library that was originally meant to improve the default interactive console provided by Python, and to make it scientist-friendly. In 2011, ten years after the first release of IPython, the IPython Notebook was introduced. This web-based interface to IPython combines code, text, mathematical expressions, inline plots, interactive figures, widgets, graphical interfaces, and other rich media within a standalone sharable web document. This platform provides an ideal gateway to interactive scientific computing and data analysis. IPython has become essential to researchers, engineers, data scientists, and teachers and their students.

What is Jupyter?

Within a few years, IPython gained incredible popularity among the scientific and engineering communities. The Notebook started to support more and more programming languages beyond Python. In 2014, the IPython developers announced the Jupyter project, an initiative created to improve the implementation of the Notebook and make it language-agnostic by design. The name of the project reflects the importance of three of the main scientific computing languages supported by the Notebook: Julia, Python, and R.

Today, Jupyter is an ecosystem in itself that comprises several alternative Notebook interfaces (JupyterLab, nteract, Hydrogen, and others), interactive visualization libraries, and authoring tools compatible with notebooks. Jupyter has its own conference named JupyterCon. The project received funding from several companies as well as the Alfred P. Sloan Foundation and the Gordon and Betty Moore Foundation.

What is the SciPy ecosystem?

SciPy is the name of a Python package for scientific computing, but it also refers, more generally, to the collection of all Python tools that have been developed to bring scientific computing features to Python.

In the late 1990s, Travis Oliphant and others started to build efficient tools to deal with numerical data in Python: Numeric, Numarray, and finally, NumPy. SciPy, which implements many numerical computing algorithms, was also created on top of NumPy. In the early 2000s, John Hunter created Matplotlib to bring scientific graphics to Python. At the same time, Fernando Perez created IPython to improve interactivity and productivity in Python. In the late 2000s, Wes McKinney created pandas for the manipulation and analysis of numerical tables and time series. Since then, hundreds of engineers and researchers have worked collaboratively on this platform to make SciPy one of the leading open source platforms for scientific computing and data science.

Note

Many of the SciPy tools are supported by NumFOCUS, a nonprofit that was created as a legal structure to promote the sustainable development of the ecosystem. NumFOCUS is supported by several large companies including Microsoft, IBM, and Intel.

SciPy has its own conferences, too: SciPy (in the US) and EuroSciPy (in Europe) (see https://conference.sci).

What's new in the SciPy ecosystem?

What are some of the main changes in the SciPy ecosystem since the first edition of this book, published in 2014? We give here a very brief selection.

Tip

Feel free to skip this section if you are new to the platform.

The latest version of IPython at the time of writing is IPython 6.0, released in April 2017. It is the first version of IPython that is no longer compatible with Python 2. This decision allowed the developers to make the internal code simpler and to make better use of the new features of the language.

IPython now has a web-based Terminal interface that can be used along with notebooks. Keyboard shortcuts can be edited directly from the Notebook interface. Multiple cells can be selected and copy/pasted between notebooks. There is a new restart-and-run-all button and a find-and-replace option in the Notebook. See http://ipython.readthedocs.io/en/stable/whatsnew/version6.html for more details.

NumPy, whose latest version 1.13 was released in June 2017, now supports the @ matrix multiplication operator between matrices (this operation was previously accessible only via the np.dot() function). Operations such as a + b + c use less memory and are faster on some systems (temporary elision). The new np.block() function lets one define block matrices. The new np.stack() function joins a sequence of arrays along a new axis. See https://docs.scipy.org/doc/numpy-1.13.0/release.html for more details.
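As a brief illustration of the three NumPy features mentioned above, here is a minimal sketch. The arrays a and b are made-up examples and are not taken from the book's recipes.

```python
import numpy as np

# Two small example matrices (hypothetical values, for illustration only).
a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

# The @ operator performs matrix multiplication; for 2D arrays it gives
# the same result as np.dot().
product = a @ b
same_as_dot = np.array_equal(product, np.dot(a, b))

# np.block() assembles a larger matrix from a nested list of blocks.
zeros = np.zeros((2, 2), dtype=int)
blocks = np.block([[a, zeros],
                   [zeros, b]])

# np.stack() joins a sequence of arrays along a new axis.
stacked = np.stack([a, b], axis=0)

print(product)
print(blocks.shape)
print(stacked.shape)
```

Here, blocks is a 4 x 4 block-diagonal matrix, and stacked is a three-dimensional array whose first axis indexes the two original matrices.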

SciPy 1.0 was released in October 2017. For the developers, the 1.0 version means that the library has reached some stability and maturity after 16 years of development. See https://docs.scipy.org/doc/scipy/reference/release.html for more details.

Matplotlib, whose version 2.1 was released in October 2017, has improved styling and a much better default color palette, with the viridis colormap replacing jet. See https://github.com/matplotlib/matplotlib/releases for more details.
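To see the change of default colormap for yourself, the following sketch may help. It assumes Matplotlib 2.0 or later, and the Gaussian test image is a made-up example. The Agg backend is selected only so that the code runs without a display; in the Jupyter Notebook you would typically omit that line.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import matplotlib.pyplot as plt
import numpy as np

# Default colormap for images, as shipped with the installed Matplotlib.
default_cmap = matplotlib.rcParamsDefault["image.cmap"]
print(default_cmap)

# A made-up test image: a two-dimensional Gaussian bump.
x = np.linspace(-3, 3, 100)
data = np.exp(-x[:, np.newaxis] ** 2 - x[np.newaxis, :] ** 2)

# Display the same data with the new default and with the old one.
fig, axes = plt.subplots(1, 2, figsize=(8, 4))
for ax, name in zip(axes, ["viridis", "jet"]):
    ax.imshow(data, cmap=name)
    ax.set_title(name)
    ax.set_axis_off()
```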

pandas 0.21 was released in October 2017. pandas now supports categorical data. Several features have been deprecated in recent years, including the .ix syntax and Panels (which may be replaced by the xarray library). See https://pandas.pydata.org/pandas-docs/stable/release.html for more details.
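The following sketch illustrates categorical data and the indexers that are commonly used in place of .ix, namely .loc (label-based) and .iloc (position-based). The DataFrame and its column names are hypothetical and serve only as an example.

```python
import pandas as pd

# A small made-up table (hypothetical column names and values).
df = pd.DataFrame(
    {"size_class": ["small", "large", "medium", "small"],
     "value": [1.0, 2.5, 3.0, 4.5]},
    index=["a", "b", "c", "d"],
)

# Convert a column of strings to an ordered categorical type.
df["size_class"] = pd.Categorical(
    df["size_class"],
    categories=["small", "medium", "large"],
    ordered=True,
)
categories = list(df["size_class"].cat.categories)

# Instead of the deprecated .ix syntax, select by label with .loc
# and by integer position with .iloc.
by_label = df.loc["b", "value"]
by_position = df.iloc[1, 1]

print(categories)
print(by_label, by_position)
```

Both selections return the same cell, once addressed by its row and column labels and once by its integer positions.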

How to install Python

In this book, we use the Anaconda distribution, which is available at https://www.anaconda.com/download/. Anaconda works on Linux, macOS, and Windows. You should install the latest version of Anaconda (5.0.1 at the time of writing) with the latest 64-bit version of Python (3.6 at the time of writing). Python 2.7 is an old version that will be officially unsupported in 2020.

Anaconda comes with Python, IPython, Jupyter, NumPy, SciPy, pandas, Matplotlib, and almost all of the other scientific packages we will be using in this book. The list of all packages is available at https://docs.anaconda.com/anaconda/packages/pkg-docs.

Note

Miniconda is a light version of Anaconda with only Python and a few other essential packages. You can install only the packages you need one by one using the conda package manager of Anaconda.

In this book, we won't cover the various other ways of installing a scientific Python distribution.

The Anaconda website should give you all the instructions to install Anaconda on your system. To install new packages, you can use the conda package manager that comes with Anaconda. For example, to install the ipyparallel package (which is currently not installed by default in Anaconda), type conda install ipyparallel in a system shell.

Tip

A short introduction to system shells is given in the Learning the basics of the Unix shell section of Chapter 2, Best Practices in Interactive Computing.

Another way of installing packages is with conda-forge, available at https://conda-forge.org/. This is a community-driven effort to automatically build the latest versions of packages available on GitHub, and make them available with conda. If a package is not available with conda install somepackage, one may use instead conda install --channel conda-forge somepackage if the package is supported by conda-forge.

Tip

GitHub is a commercial service that provides free and paid hosting for software repositories. It is one of the most popular platforms for open source collaborative development.

pip is the Python package manager. Contrary to conda, pip works with any Python distribution, not just with Anaconda. Packages installable by pip are stored on the Python Package Index (PyPI) available at https://pypi.python.org/pypi.

Almost all Python packages available in conda are also available in pip, but the inverse is not true. In practice, if a package is not available in conda or conda-forge, it should be available with pip install somepackage. conda packages typically include binaries compiled for the most common platforms, whereas that is not necessarily the case with pip packages. pip packages may contain source code that has to be compiled locally (which requires that a compatible compiler is installed and configured), but they may also contain compiled binaries.
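The order of attempts described above (conda first, then conda-forge, then pip) can be sketched in a few lines of Python. The helper functions is_installed() and install_hints() are hypothetical names introduced for this illustration. The sketch also assumes that the package name and the importable module name are identical, which is not true for every package.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be found by the import system."""
    return importlib.util.find_spec(module_name) is not None


def install_hints(package_name):
    """Return the install commands to try, in the order discussed in the text."""
    return [
        "conda install {}".format(package_name),
        "conda install --channel conda-forge {}".format(package_name),
        "pip install {}".format(package_name),
    ]


for name in ["json", "ipyparallel"]:
    if is_installed(name):
        print(name, "is already installed")
    else:
        print(name, "is missing; try, in order:")
        for command in install_hints(name):
            print("   ", command)
```

The commands are only printed, not executed. They are meant to be typed in a system shell.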

References

Here are a few references:

The Python web page at https://www.python.org
Python on Wikipedia at https://en.wikipedia.org/wiki/Python_%28programming_language%29
Python's standard library at https://docs.python.org/3/library/
Conversation with Guido van Rossum on the birth of Python available at http://www.artima.com/intv/pythonP.html
History of scientific Python available at http://fr.slideshare.net/shoheihido/sci-pyhistory
History of the Jupyter Notebook at http://blog.fperez.org/2012/01/ipython-notebook-historical.html
JupyterCon at https://conferences.oreilly.com/jupyter/jup-ny

Here are a few resources on scientific Python:

Introduction to Python for Computational Science and Engineering, at https://github.com/fangohr/introduction-to-python-for-computational-science-and-engineering
Statistical Computing and Computation, at http://people.duke.edu/~ccc14/sta-663-2017/
SciPy 2017 videos at https://www.youtube.com/playlist?list=PLYx7XA2nY