Taking an approach that uses the latest developments in the Python ecosystem, you’ll first be guided through the Jupyter ecosystem, key visualization libraries, and powerful data cleaning techniques before you train your first predictive model. You’ll then explore a variety of approaches to classification, such as support vector machines, random forests, and k-nearest neighbors, to build on your knowledge before moving on to advanced topics.
After covering classification, you’ll go on to discover ethical web scraping and interactive visualizations, which will help you professionally gather and present your analysis. Next, you’ll start building your keystone deep learning application, one that aims to predict the future price of Bitcoin based on historical public data. You’ll then be guided through a trained neural network, which will help you explore common deep learning network architectures (convolutional, recurrent, and generative adversarial networks) and deep reinforcement learning. Later, you’ll delve into model optimization and evaluation. You’ll do all this while working on a production-ready web application that combines TensorFlow and Keras to produce meaningful user-friendly results.
By the end of this book, you’ll be equipped with the skills you need to tackle and develop your own real-world deep learning projects confidently and effectively.
Copyright © 2019 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Acquisitions Editors: Aditya Date, Koushik Sen
Content Development Editors: Tanmayee Patil, Rina Yadav
Production Coordinator: Ratan Pote
First published: August 2018
Production reference: 2260719
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78980-474-4
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.Packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Alex Galea has been professionally practicing data analytics since graduating with a Master's degree in Physics from the University of Guelph, Canada. He developed a keen interest in Python while researching quantum gases as part of his graduate studies. Alex is currently doing web data analytics, where Python continues to play a key role in his work. He is a frequent blogger about data-centric projects that involve Python and Jupyter Notebooks.
Luis Capelo is a Harvard-trained analyst and programmer who specializes in the design and development of data science products. He is based in New York City, USA.
He is the head of the Data Products team at Forbes, where they both investigate new techniques for optimizing article performance and create clever bots that help them distribute their content. Previously, he led a team of world-class scientists at the Flowminder Foundation, where they developed predictive models for assisting the humanitarian community. Prior to that, he worked for the United Nations as part of the Humanitarian Data Exchange team (founders of the Center for Humanitarian Data).
He is a native of Havana, Cuba, and the founder and owner of a small consultancy firm dedicated to supporting the nascent Cuban private sector.
Elie Kawerk likes to solve problems using the analytical skills he has accumulated over the years. He uses the data science process, including statistical methods and machine learning, to extract insights from data and get value out of it.
His formal training is in computational physics. He used to simulate atomic and molecular physics phenomena with the help of supercomputers using the good old FORTRAN language; this involved a lot of linear algebra and quantum physics equations.
Manoj Pandey is a Python programmer and the founder and organizer of PyData Delhi. He works on research and development from time to time, and is currently working with RaRe Technologies on their incubator program for a computational linear algebra project. Prior to this, he has worked with Indian startups and small design/development agencies, and teaches Python/JavaScript to many on Codementor.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Applied Deep Learning with Python
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the authors
About the reviewers
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Conventions used
Get in touch
Reviews
Jupyter Fundamentals
Basic Functionality and Features
What is a Jupyter Notebook and Why is it Useful?
Navigating the Platform
Introducing Jupyter Notebooks
Jupyter Features
Exploring some of Jupyter's most useful features
Converting a Jupyter Notebook to a Python Script
Python Libraries
Import the external libraries and set up the plotting environment
Our First Analysis - The Boston Housing Dataset
Loading the Data into Jupyter Using a Pandas DataFrame
Load the Boston housing dataset
Data Exploration
Explore the Boston housing dataset
Introduction to Predictive Analytics with Jupyter Notebooks
Linear models with Seaborn and scikit-learn
Activity: Building a Third-Order Polynomial Model
Linear models with Seaborn and scikit-learn
Using Categorical Features for Segmentation Analysis
Create categorical fields from continuous variables and make segmented visualizations
Summary
Data Cleaning and Advanced Machine Learning
Preparing to Train a Predictive Model
Determining a Plan for Predictive Analytics
Preprocessing Data for Machine Learning
Exploring data preprocessing tools and methods
Activity: Preparing to Train a Predictive Model for the Employee-Retention Problem
Training Classification Models
Introduction to Classification Algorithms
Training two-feature classification models with scikit-learn
The plot_decision_regions Function
Training k-nearest neighbors for our model
Training a Random Forest
Assessing Models with k-Fold Cross-Validation and Validation Curves
Using k-fold cross-validation and validation curves in Python with scikit-learn
Dimensionality Reduction Techniques
Training a predictive model for the employee retention problem
Summary
Web Scraping and Interactive Visualizations
Scraping Web Page Data
Introduction to HTTP Requests
Making HTTP Requests in the Jupyter Notebook
Handling HTTP requests with Python in a Jupyter Notebook
Parsing HTML in the Jupyter Notebook
Parsing HTML with Python in a Jupyter Notebook
Activity: Web Scraping with Jupyter Notebooks
Interactive Visualizations
Building a DataFrame to Store and Organize Data
Building and merging Pandas DataFrames
Introduction to Bokeh
Introduction to interactive visualizations with Bokeh
Activity: Exploring Data with Interactive Visualizations
Summary
Introduction to Neural Networks and Deep Learning
What are Neural Networks?
Successful Applications
Why Do Neural Networks Work So Well?
Representation Learning
Function Approximation
Limitations of Deep Learning
Inherent Bias and Ethical Considerations
Common Components and Operations of Neural Networks
Configuring a Deep Learning Environment
Software Components for Deep Learning
Python 3
TensorFlow
Keras
TensorBoard
Jupyter Notebooks, Pandas, and NumPy
Activity: Verifying Software Components
Exploring a Trained Neural Network
MNIST Dataset
Training a Neural Network with TensorFlow
Training a Neural Network
Testing Network Performance with Unseen Data
Activity: Exploring a Trained Neural Network
Summary
Model Architecture
Choosing the Right Model Architecture
Common Architectures
Convolutional Neural Networks
Recurrent Neural Networks
Generative Adversarial Networks
Deep Reinforcement Learning
Data Normalization
Z-score
Point-Relative Normalization
Maximum and Minimum Normalization
Structuring Your Problem
Activity: Exploring the Bitcoin Dataset and Preparing Data for a Model
Using Keras as a TensorFlow Interface
Model Components
Activity: Creating a TensorFlow Model Using Keras
From Data Preparation to Modeling
Training a Neural Network
Reshaping Time-Series Data
Making Predictions
Overfitting
Activity: Assembling a Deep Learning System
Summary
Model Evaluation and Optimization
Model Evaluation
Problem Categories
Loss Functions, Accuracy, and Error Rates
Different Loss Functions, Same Architecture
Using TensorBoard
Implementing Model Evaluation Metrics
Evaluating the Bitcoin Model
Overfitting
Model Predictions
Interpreting Predictions
Activity: Creating an Active Training Environment
Hyperparameter Optimization
Layers and Nodes - Adding More Layers
Adding More Nodes
Layers and Nodes - Implementation
Epochs
Epochs - Implementation
Activation Functions
Linear (Identity)
Hyperbolic Tangent (Tanh)
Rectified Linear Unit
Activation Functions - Implementation
Regularization Strategies
L2 Regularization
Dropout
Regularization Strategies – Implementation
Optimization Results
Activity: Optimizing a Deep Learning Model
Summary
Productization
Handling New Data
Separating Data and Model
Data Component
Model Component
Dealing with New Data
Re-Training an Old Model
Training a New Model
Activity: Dealing with New Data
Deploying a Model as a Web Application
Application Architecture and Technologies
Deploying and Using Cryptonic
Activity: Deploying a Deep Learning Application
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
This Learning Path takes a step-by-step approach to teach you how to get started with data science, machine learning, and deep learning. Each module is designed to build on the learning of the previous chapter. The book contains multiple demos that use real-life business scenarios for you to practice and apply your new skills in a highly relevant context.
In the first part of this Learning Path, you will learn entry-level data science. You'll learn about commonly used libraries that are part of the Anaconda distribution, and then explore machine learning models with real datasets to give you the skills and exposure you need for the real world.
In the second part, you'll be introduced to neural networks and deep learning. You will then learn how to train, evaluate, and deploy TensorFlow and Keras models as real-world web applications. By the time you are done reading, you will have the knowledge to build applications in the deep learning environment and create elaborate data visualizations and predictions.
If you’re a Python programmer stepping out into the world of data science, this is the right way to get started. It is also ideal for experienced developers, analysts, or data scientists who want to work with TensorFlow and Keras. We assume that you are familiar with Python, web application development, Docker commands, and concepts of linear algebra, probability, and statistics.
Chapter 1, Jupyter Fundamentals, covers the fundamentals of data analysis in Jupyter. We will start with usage instructions and features of Jupyter such as magic functions and tab completion. We will then transition to data science-specific material. We will run an exploratory analysis in a live Jupyter Notebook. We will use visual aids such as scatter plots, histograms, and violin plots to deepen our understanding of the data. We will also perform simple predictive modeling.
Chapter 2, Data Cleaning and Advanced Machine Learning, shows how predictive models can be trained in Jupyter Notebooks. We will talk about how to plan a machine learning strategy. This chapter also explains machine learning terminology, such as supervised learning, unsupervised learning, classification, and regression. We will discuss methods for preprocessing data using scikit-learn and pandas.
Chapter 3, Web Scraping and Interactive Visualizations, explains how to scrape web page tables and then use interactive visualizations to study the data. We will start by looking at how HTTP requests work, focusing on GET requests and their response status codes. Then, we will go into the Jupyter Notebook and make HTTP requests with Python using the Requests library. We will see how Jupyter can be used to render HTML in the notebook, along with actual web pages that can be interacted with. After making requests, we will see how Beautiful Soup can be used to parse text from the HTML, and we will use this library to scrape tabular data.
Chapter 4, Introduction to Neural Networks and Deep Learning, helps you set up and configure a deep learning environment and start looking at individual models and case studies. It also discusses neural networks and the ideas behind them, along with their origins, and explores their power.
Chapter 5, Model Architecture, shows how to predict Bitcoin prices using a deep learning model.
Chapter 6, Model Evaluation and Optimization, shows how to evaluate a neural network model. We will modify the network's hyperparameters to improve its performance.
Chapter 7, Productization, explains how to create a working application from a deep learning model. We will deploy our Bitcoin prediction model as an application that is capable of handling new data by creating new models.
This book will be most applicable to professionals and students who are interested in data analysis and want to enhance their knowledge of developing applications with TensorFlow and Keras. For the best experience, you should have knowledge of programming fundamentals and some experience with Python. In particular, having some familiarity with Python libraries such as Pandas, matplotlib, and scikit-learn will be useful.
You can download the example code files for this book from your account at www.packtpub.com. If you purchased this book elsewhere, you can visit www.packtpub.com/support and register to have the files emailed directly to you.
You can download the code files by following these steps:
Log in or register at www.packtpub.com.
Select the SUPPORT tab.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box and follow the onscreen instructions.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR/7-Zip for Windows
Zipeg/iZip/UnRarX for Mac
7-Zip/PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/TrainingByPackt/Applied-Deep-Learning-with-Python. In case there's an update to the code, it will be updated on the existing GitHub repository.
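If you prefer working from the command line, one common way to fetch the code bundle (a sketch assuming Git is installed; the destination folder name is up to you) is:
git clone https://github.com/TrainingByPackt/Applied-Deep-Learning-with-Python.git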
We also have other code bundles from our rich catalog of books and videos available at https://github.com/TrainingByPackt/Applied-Deep-Learning-with-Python. Check them out!
Feedback from our readers is always welcome.
General feedback: Email [email protected] and mention the book title in the subject of your message. If you have questions about any aspect of this book, please email us at [email protected].
Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report this to us. Please visit www.packtpub.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.
Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.
If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.
Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!
For more information about Packt, please visit packtpub.com.
Jupyter Notebooks are one of the most important tools for data scientists using Python. This is because they're an ideal environment for developing reproducible data analysis pipelines. Data can be loaded, transformed, and modeled all inside a single Notebook, where it's quick and easy to test out code and explore ideas along the way. Furthermore, all of this can be documented "inline" using formatted text, so you can make notes for yourself or even produce a structured report. Other comparable platforms - for example, RStudio or Spyder - present the user with multiple windows, which promotes arduous tasks such as copying and pasting code around and rerunning code that has already been executed. These tools also tend to involve Read-Eval-Print Loops (REPLs), where code is run in a terminal session that holds state in memory but keeps no documented record of the work. This type of development environment is bad for reproducibility and not ideal for development either. Jupyter Notebooks solve all these issues by giving the user a single window where code snippets are executed and outputs are displayed inline. This lets users develop code efficiently and allows them to look back at previous work for reference, or even to make alterations.
We'll start the chapter by explaining exactly what Jupyter Notebooks are and continue to discuss why they are so popular among data scientists. Then, we'll open a Notebook together and go through some exercises to learn how the platform is used. Finally, we'll dive into our first analysis and perform an exploratory analysis in the section Basic Functionality and Features.
By the end of this chapter, you will be able to:
Learn what a Jupyter Notebook is and why it's useful for data analysis
Use Jupyter Notebook features
Study Python data science libraries
Perform simple exploratory data analysis
In this section, we first demonstrate the usefulness of Jupyter Notebooks with examples and through discussion. Then, in order to cover the fundamentals of Jupyter Notebooks for beginners, we'll see the basic usage of them in terms of launching and interacting with the platform. For those who have used Jupyter Notebooks before, this will be mostly a review; however, you will certainly see new things in this topic as well.
Jupyter Notebooks are locally run web applications which contain live code, equations, figures, interactive apps, and Markdown text. The standard language is Python, and that's what we'll be using for this book; however, note that a variety of alternatives are supported. This includes the other dominant data science language, R:
Those familiar with R will know about R Markdown. Markdown documents allow for Markdown-formatted text to be combined with executable code. Markdown is a simple language used for styling text on the web. For example, most GitHub repositories have a README.md Markdown file. This format is useful for basic text formatting. It's comparable to HTML but allows for much less customization.
Commonly used symbols in Markdown include hashes (#) to make text into a heading, square and round brackets to insert hyperlinks, and stars to create italicized or bold text:
Having seen the basics of Markdown, let's come back to R Markdown, where Markdown text can be written alongside executable code. Jupyter Notebooks offer the equivalent functionality for Python, although, as we'll see, they function quite differently than R Markdown documents. For example, R Markdown assumes you are writing Markdown unless otherwise specified, whereas Jupyter Notebooks assume you are inputting code. This makes it more appealing to use Jupyter Notebooks for rapid development and testing.
From a data science perspective, there are two primary types of Jupyter Notebook, depending on how they are used: lab-style and deliverable.
Lab-style Notebooks are meant to serve as the programming analog of research journals. These should contain all the work you've done to load, process, analyze, and model the data. The idea here is to document everything you've done for future reference, so it's usually not advisable to delete or alter previous lab-style Notebooks. It's also a good idea to accumulate multiple date-stamped versions of the Notebook as you progress through the analysis, in case you want to look back at previous states.
Deliverable Notebooks are intended to be presentable and should contain only select parts of the lab-style Notebooks. For example, this could be an interesting discovery to share with your colleagues, an in-depth report of your analysis for a manager, or a summary of the key findings for stakeholders.
In either case, an important concept is reproducibility. If you've been diligent in documenting your software versions, anyone receiving the reports will be able to rerun the Notebook and compute the same results as you did. In the scientific community, where reproducibility is becoming increasingly difficult, this is a breath of fresh air.
Now, we are going to open up a Jupyter Notebook and start to learn the interface. Here, we will assume you have no prior knowledge of the platform and go over the basic usage.
Navigate to the companion material directory in the terminal.
Start a new local Notebook server here by typing the following into the terminal:
jupyter notebook
A new window or tab of your default browser will open the Notebook Dashboard to the working directory. Here, you will see a list of folders and files contained therein.
Click on a folder to navigate to that particular path and open a file by clicking on it. Although its main use is editing IPYNB Notebook files, Jupyter functions as a standard text editor as well.
Reopen the terminal window used to launch the app. We can see the NotebookApp being run on a local server. In particular, you should see a line like this:
[I 20:03:01.045 NotebookApp] The Jupyter Notebook is running at: http://localhost:8888/?token=e915bb06866f19ce462d959a9193a94c7c088e81765f9d8a
Going to that HTTP address will load the app in your browser window, as was done automatically when starting the app. Closing the window does not stop the app; this should be done from the terminal by typing Ctrl + C.
Close the app by typing Ctrl + C in the terminal. You may also have to confirm by entering y. Close the web browser window as well.
When loading the NotebookApp, there are various options available to you. In the terminal, see the list of available options by running the following:
jupyter notebook --help
One such option is to specify a specific port. Open a NotebookApp at local port 9000 by running the following:
jupyter notebook --port 9000
The primary way to create a new Jupyter Notebook is from the Jupyter Dashboard. Click New in the upper-right corner and select a kernel from the drop-down menu (that is, select something in the Notebooks section):
Kernels provide programming language support for the Notebook. If you have installed Python with Anaconda, that version should be the default kernel. Conda virtual environments will also be available here.
With the newly created blank Notebook, click on the top cell and type print('hello world'), or any other code snippet that writes to the screen. Execute it by clicking on the cell and pressing Shift + Enter, or by selecting Run Cell in the Cell menu.
Any stdout or stderr output from the code will be displayed beneath as the cell runs. Furthermore, the string representation of the object written in the final line will be displayed as well. This is very handy, especially for displaying tables, but sometimes we don't want the final object to be displayed. In such cases, a semicolon (;) can be added to the end of the line to suppress the display.
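As a minimal sketch of this behavior (assuming pandas is installed; the example DataFrame is purely illustrative), compare the following two cells:
# Cell 1: the final expression's string representation is rendered beneath the cell
import pandas as pd
df = pd.DataFrame({'price': [1.0, 2.5, 3.2]})
df.describe()
# Cell 2: the same call with a trailing semicolon produces no inline display
df.describe();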
New cells expect and run code input by default; however, they can be changed to render Markdown instead.
Click into an empty cell and change it to accept Markdown-formatted text. This can be done from the drop-down menu icon in the toolbar or by selecting Markdown from the Cell menu. Write some text in here (any text will do), making sure to utilize Markdown formatting symbols such as #.
Focus on the toolbar at the top of the Notebook:
There is a Play icon in the toolbar, which can be used to run cells. As we'll see later, however, it's handier to use the keyboard shortcut Shift + Enter to run cells. Right next to this is a Stop icon, which can be used to stop cells from running. This is useful, for example, if a cell is taking too long to run:
New cells can be manually added from the Insert menu:
Cells can be copied, pasted, and deleted using icons or by selecting options from the Edit menu:
Cells can also be moved up and down this way:
There are useful options under the Cell menu to run a group of cells or the entire Notebook:
Experiment with the toolbar options to move cells up and down, insert new cells, and delete cells.
An important thing to understand about these Notebooks is the shared memory between cells. It's quite simple: every cell existing on the sheet has access to the global set of variables. So, for example, a function defined in one cell could be called from any other, and the same applies to variables. As one would expect, anything within the scope of a function will not be a global variable and can only be accessed from within that specific function.
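As a quick illustration (the function and variable names here are just examples), a function defined in one cell can be called from any later cell because both share the Notebook's global namespace:
# Cell 1: define a helper function
def normalize(values):
    total = sum(values)
    return [value / total for value in values]
# Cell 2: names defined in earlier cells are still in scope
weights = normalize([1, 2, 3])
print(weights)  # prints [0.1666..., 0.3333..., 0.5]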
Open the Kernel menu to see the selections. The Kernel menu is useful for stopping script executions and restarting the Notebook if the kernel dies. Kernels can also be swapped here at any time, but it is inadvisable to use multiple kernels for a single Notebook due to reproducibility concerns.
Open the File menu to see the selections. The File menu contains options for downloading the Notebook in various formats. In particular, it's recommended to save an HTML version of your Notebook, where the content is rendered statically and can be opened and viewed "as you would expect" in web browsers.
The Notebook name will be displayed in the upper-left corner. New Notebooks will automatically be named Untitled.
Change the name of your IPYNB Notebook file by clicking on the current name in the upper-left corner and typing the new name. Then, save the file.
Close the current tab in your web browser (exiting the Notebook) and go to the Jupyter Dashboard tab, which should still be open. (If it's not open, then reload it by copying and pasting the HTTP link from the terminal.)
Since we didn't shut down the Notebook, but just saved and exited it, it will have a green book symbol next to its name in the Files section of the Jupyter Dashboard and will be listed as Running on the right side, next to the last modified date. Notebooks can be shut down from here.
Quit the Notebook you have been working on by selecting it (checkbox to the left of the name) and clicking the orange Shutdown button:
Jupyter has many appealing features that make for efficient Python programming. These include an assortment of things, from methods for viewing docstrings to executing Bash commands. Let's explore some of these features together in this section.
From the Jupyter Dashboard, navigate to the chapter-1 directory and open the chapter-1-workbook.ipynb file by selecting it. The standard file extension for Jupyter Notebooks is .ipynb, which was introduced back when they were called IPython Notebooks.
Scroll down to Subtopic Jupyter Features in the Jupyter Notebook. We start by reviewing the basic keyboard shortcuts. These are especially helpful to avoid having to use the mouse so often, which will greatly speed up the workflow. Here are the most useful keyboard shortcuts. Learning to use these will greatly improve your experience with Jupyter Notebooks as well as your own efficiency:
Shift + Enter is used to run a cell
The Esc key is used to leave a cell
The M key is used to change a cell to Markdown (after pressing Esc)
The Y key is used to change a cell to code (after pressing Esc)
Arrow keys move cells (after pressing Esc)
The Enter key is used to enter a cell
Moving on from shortcuts, the help option is useful for beginners and experienced coders alike. It can help provide guidance at each uncertain step.
Users can get help by adding a question mark to the end of any object and running the cell. Jupyter finds the docstring for that object and returns it in a pop-out window at the bottom of the app.
Run the Getting Help section cells and check out how Jupyter displays the docstrings at the bottom of the Notebook. Add a cell in this section and get help on the object of your choice:
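For instance, a minimal example of this help mechanism (assuming pandas has been imported; any object would do) is a cell such as the following, which opens the docstring pane at the bottom of the app when run:
import pandas as pd
# Appending ? to an object and running the cell displays its docstring
pd.DataFrame?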
Tab completion can be used to do the following:
List available modules when importing external libraries
List available modules of imported external libraries
Complete function and variable names
Tab completion can be especially useful when you need to know the available input arguments for a module, when exploring a new library, to discover new modules, or simply to speed up your workflow. It will save you time writing out variable and function names and reduce bugs from typos. Tab completion works so well that you may have difficulty coding Python in other editors after today!
Click into an empty code cell in the Tab Completion section and try using tab completion in the ways suggested immediately above. For example, the first suggestion can be done by typing import (including the space after) and then pressing the Tab key:
Last but not least of the basic Jupyter Notebook features are magic commands. These consist of one or two percent signs followed by the command. Magics starting with %% will apply to the entire cell, and magics starting with % will only apply to that line. This will make sense when seen in an example.
Scroll to the Jupyter Magic Functions section and run the cells containing %lsmagic and %matplotlib inline:
%lsmagic lists the available options. We will discuss and show examples of some of the most useful ones. The most common magic command you will probably see is %matplotlib inline, which allows matplotlib figures to be displayed in the Notebook without having to explicitly use plt.show().
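As a small sketch of the inline plotting behavior (assuming matplotlib is installed; the plotted data is arbitrary), a cell like this renders its figure directly beneath the cell:
%matplotlib inline
import matplotlib.pyplot as plt
plt.plot([1, 2, 3, 4], [1, 4, 9, 16])  # the figure appears inline, with no plt.show() required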
The timing functions are very handy and come in two varieties: a standard timer (%time or %%time) and a timer that measures the average runtime of many iterations (%timeit and %%timeit).
Run the cells in the Timers section. Note the difference between using one and two percent signs.
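If you want to experiment outside the companion workbook, a rough sketch of both variants could look like this (each snippet goes in its own cell, and %%timeit must be the first line of its cell; the computations are arbitrary examples):
# %time (line magic) reports how long this single statement takes to run
%time sum(range(1_000_000))
%%timeit
# %%timeit (cell magic) averages the runtime of the whole cell over many runs
total = 0
for i in range(1_000):
    total += i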
Even when using a Python kernel (as you are currently doing), other languages can be invoked using magic commands. The built-in options include JavaScript, R, Perl, Ruby, and Bash. Bash is particularly useful, as you can use Unix commands to find out where you are currently (pwd), what's in the directory (ls), make new folders (mkdir), and write file contents (cat / head / tail).
Run the first cell in the Using bash in the notebook section. This cell writes some text to a file in the working directory, prints the directory contents, prints an empty line, and then writes back the contents of the newly created file before removing it:
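The exact cell is in the companion workbook; as a hedged sketch, a %%bash cell performing those steps might look like this (the file name is illustrative):
%%bash
echo "Some text for the demo file" > demo.txt
ls
echo ""
cat demo.txt
rm demo.txt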
Run the following cells containing only ls and pwd. Note how we did not have to explicitly use the Bash magic command for these to work.
There are plenty of external magic commands that can be installed. A popular one is ipython-sql, which allows for SQL code to be executed in cells.
If you've not already done so, install ipython-sql now. Open a new terminal window and execute the following code:
pip install ipython-sql
Run the %load_ext sql cell to load the external command into the Notebook:
This allows for connections to remote databases so that queries can be executed (and thereby documented) right inside the Notebook.
Run the cell containing the SQL sample query:
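The sample query itself lives in the companion workbook; purely as a hedged sketch of typical ipython-sql usage (the in-memory SQLite connection and the query below are illustrative, not the book's exact example), the pattern looks like this:
# Connect to a temporary in-memory SQLite database
%sql sqlite://
# Run a query against the active connection; the result is rendered as a table in the Notebook
%sql SELECT 1 AS sanity_check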