Big data analytics is driving innovation in scientific research, digital marketing, policy-making and much more. Matplotlib offers a simple but powerful plotting interface, versatile plot types and robust customization.
Matplotlib 2.x By Example illustrates the methods and applications of various plot types through real-world examples. It begins by giving readers the basic know-how to create and customize plots with Matplotlib. It further covers how to plot different types of economic data in the form of 2D and 3D graphs, which give insights from a deluge of data from public repositories, such as Quandl Finance. You will learn to visualize geographical data on maps and implement interactive charts.
By the end of this book, you will be well versed in using Matplotlib in your day-to-day work to perform advanced data visualization. This book will guide you to prepare high-quality figures for manuscripts and presentations. You will learn to create intuitive infographics and to make your message crisply understandable.
Number of pages: 251
Year of publication: 2017
BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2017
Production reference: 1240817
ISBN 978-1-78829-526-0
www.packtpub.com
Authors: Allen Chi Shing Yu, Claire Yik Lok Chung, Aldrin Kay Yuen Yim
Copy Editor: Vikrant Phadkay
Reviewer: Nikhil Borkar
Project Coordinator: Nidhi Joshi
Commissioning Editor: Sunith Shetty
Proofreader: Safis Editing
Acquisition Editor: Tushar Gupta
Indexer: Tejal Daruwale Soni
Content Development Editor: Mayur Pawanikar
Graphics: Tania Dutta
Technical Editor: Vivek Arora
Production Coordinator: Arvindkumar Gupta
Allen Chi Shing Yu, PhD, is a Chevening Scholar, 2017-18, and an MSc student in computer science at the University of Oxford. He holds a PhD degree in Biochemistry from the Chinese University of Hong Kong, and he has used Python and Matplotlib extensively during his 10 years of experience in the field of bioinformatics and big data analysis. During his research career, Allen has published 12 international scientific research articles and presented at four international conferences, including on-stage presentations at the Congress On the Future of Engineering Software (COFES) 2011, USA, and Genome Informatics 2014, UK. Other research highlights include discovering a novel subtype of Spinocerebellar ataxia (SCA40), identifying the cause of pathogenesis for a family with Spastic paraparesis, leading the gold medalist team in the 2011 International Genetically Engineered Machine (iGEM) competition, and co-developing a number of cancer genomics projects.
Apart from academic research, Allen is also the co-founder of Codex Genetics Limited, which aims to provide personalized medicine service in Asia through the use of the latest genomics technology. With the financial and business support from the HKSAR Innovation and Technology Commission, Hong Kong Science Park, and the Chinese University of Hong Kong, Codex Genetics has curated and transformed recent advances in cancer and neuro-genomics research into clinically actionable insights.
Claire Yik Lok Chung is now a PhD student at the Chinese University of Hong Kong working on Bioinformatics, after receiving her BSc degree in Cell and Molecular Biology. With her passion for scientific research, she joined three labs during her college study, including the synthetic biology lab at the University of Edinburgh. Her current projects include soybean genomic analysis using optical mapping and next-generation sequencing data. Claire started programming 10 years ago, and uses Python and Matplotlib daily to tackle Bioinformatics problems and to bring convenience to life. Being interested in information technology in general, she leads the Campus Network Support Team in college and is constantly keeping up with the latest technological trends by participating in PyCon HK 2016. She is motivated to acquire new skills through self-learning and is keen to share her knowledge and experience. In addition to science, she has developed skills in multilingual translation and graphic design, and found these transferable skills useful at work.
Aldrin Kay Yuen Yim is a PhD student in computational and system biology at Washington University School of Medicine. Before joining the university, his research primarily focused on big data analytics and bioinformatics, which led to multiple discoveries, including a novel major allergen class (designated as Group 24th Major allergen by WHO/IUIS Allergen Nomenclature subcommittee) through a multi-omic approach analysis of dust mites (JACI 2015), as well as the identification of the salt-tolerance gene in soybean through large-scale genomic analysis (Nat. Comm. 2014). He also loves to explore sci-fi ideas and put them into practice, that is, the development of a DNA-based information storage system (iGEM 2010, Frontiers in Bioengineering and Biotechnology 2014). Aldrin’s current research interest focuses on neuro development and diseases, such as exploring the heterogeneity of cell types within the nervous system, as well as the gender dimorphism in brain cancers (JCI Insight 2017).
Aldrin is also the founding CEO of Codex Genetics Limited, which is currently servicing two research hospitals and the cancer registry of Hong Kong.
Nikhil Borkar holds a CQF designation and a postgraduate degree in quantitative finance. He also holds certified financial crime examiner and certified anti-money laundering professional qualifications. He is a registered research analyst with the Securities and Exchange Board of India (SEBI) and has a keen grasp of laws and regulations pertaining to securities and investment. He is currently working as an independent FinTech and legal consultant. Prior to this, he worked with Morgan Stanley Capital International as a Global RFP project manager. He is self-motivated, intellectually curious, and hardworking. He loves to approach problems using a multi-disciplinary, holistic approach. Currently, he is actively working on machine learning, artificial intelligence, and deep learning projects. He has expertise in the following areas:
Quantitative investing: equities, futures and options, and derivatives engineering
Econometrics: time series analysis, statistical modeling
Algorithms: parametric, non-parametric, and ensemble machine learning algorithms
Code: R programming, Python, Scala, Excel VBA, SQL, and big data ecosystems
Data analysis: Quandl and Quantopian
Strategies: trend following, mean reversion, cointegration, Monte Carlo simulations, Value at Risk, Credit Risk Modeling and Credit Rating
Data visualization: Tableau and Matplotlib
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1788295269.
If you'd like to join our team of regular reviewers, you can email us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Hello Plotting World!
Hello Matplotlib!
What is Matplotlib?
What's new in Matplotlib 2.0?
Changes to the default style
Color cycle
Colormap
Scatter plot
Legend
Line style
Patch edges and color
Fonts
Improved functionality or performance
Improved color conversion API and RGBA support
Improved image support
Faster text rendering
Change in the default animation codec
Changes in settings
New configuration parameters (rcParams)
Style parameter blacklist
Change in Axes property keywords
Setting up the plotting environment
Setting up Python
Windows
Using Python
macOS
Linux
Installing the Matplotlib dependencies
Installing the pip Python package manager
Installing Matplotlib with pip
Setting up Jupyter notebook
Why Jupyter notebook?
Installing Jupyter notebook
Using Jupyter notebook
Starting a Jupyter notebook session
Editing and running code
Jotting down notes in Markdown mode
Viewing Matplotlib plots
Saving the notebook project
All set to go!
Plotting our first graph
Loading data for plotting
Data structures
List
Numpy array
pandas dataframe
Loading data from files
The basic Python way
The Numpy way
The pandas way
Importing the Matplotlib pyplot module
Plotting a curve
Viewing the figure
Saving the figure
Setting the output format
PNG (Portable Network Graphics)
PDF (Portable Document Format)
SVG (Scalable Vector Graphics)
Post (Postscript)
Adjusting the resolution
Summary
Figure Aesthetics
Basic structure of a Matplotlib figure
Glossary of objects in a Matplotlib figure
Setting colors in Matplotlib
Single letters for basic built-in colors
Names of standard HTML colors
RGB or RGBA color code
Hexadecimal color code
Depth of grayscale
Using specific colors in the color cycle
Aesthetic and readability considerations
Adjusting text formats
Font
Font appearance
Font size
Font style
Font weight
Font family
Checking available fonts in system
LaTeX support
Customizing lines and markers
Lines
Choosing dash patterns
Setting capstyle of dashes
Advanced example
Markers
Choosing markers
Adjusting marker sizes
Customizing grids, ticks, and axes
Grids
Adding grids
Ticks
Adjusting tick spacing
Removing ticks
Drawing ticks in multiples
Automatic tick settings
Setting ticks by the number of data points
Set scaling of ticks by mathematical functions
Locating ticks by datetime
Customizing tick formats
Removing tick labels
Fixing labels
Setting labels with strings
Setting labels with user-defined functions
Formatting axes by numerical values
Setting label sizes
Trying out the ticker locator and formatter
Rotating tick labels
Axes
Nonlinear axis
Logarithmic scale
Changing the base of the log scale
Advanced example
Symmetrical logarithmic scale
Logit scale
Using style sheets
Applying a style sheet
Resetting to default styles
Customizing a style sheet
Title and legend
Adding a title to your figure
Adding a legend
Test your skills
Summary
Figure Layout and Annotations
Adjusting the layout
Adjusting the size of the figure
Adjusting spines
Adding subplots
Adding subplots using pyplot.subplot
Using pyplot.subplots() to specify handles
Sharing axes between subplots
Adjusting margins
Setting dimensions when adding subplot axes with figure.add_axes
Modifying subplot axes dimensions via pyplot.subplots_adjust
Aligning subplots with pyplot.tight_layout
Auto-aligning figure elements with pyplot.tight_layout
Stacking subplots of different dimensions with subplot2grid
Drawing inset plots
Drawing a basic inset plot
Using inset_axes
Annotations
Adding text annotations
Adding text and arrows with axis.annotate
Adding a textbox with axis.text
Adding arrows
Labeling data values on a bar chart
Adding graphical annotations
Adding shapes
Adding image annotations
Summary
Visualizing Online Data
Typical API data formats
CSV
JSON
XML
Introducing pandas
Importing online population data in the CSV format
Importing online financial data in the JSON format
Visualizing the trend of data
Area chart and stacked area chart
Introducing Seaborn
Visualizing univariate distribution
Bar chart in Seaborn
Histogram and distribution fitting in Seaborn
Visualizing a bivariate distribution
Scatter plot in Seaborn
Visualizing categorical data
Categorical scatter plot
Strip plot and swarm plot
Box plot and violin plot
Controlling Seaborn figure aesthetics
Preset themes
Removing spines from the figure
Changing the size of the figure
Fine-tuning the style of the figure
More about colors
Color scheme and color palettes
Summary
Visualizing Multivariate Data
Getting End-of-Day (EOD) stock data from Quandl
Grouping the companies by industry
Converting the date to a supported format
Getting the percentage change of the closing price
Two-dimensional faceted plots
Factor plot in Seaborn
Faceted grid in Seaborn
Pair plot in Seaborn
Other two-dimensional multivariate plots
Heatmap in Seaborn
Candlestick plot in matplotlib.finance
Visualizing various stock market indicators
Building a comprehensive stock chart
Three-dimensional (3D) plots
3D scatter plot
3D bar chart
Caveats of Matplotlib 3D
Summary
Adding Interactivity and Animating Plots
Scraping information from websites
Non-interactive backends
Interactive backends
Tkinter-based backend
Interactive backend for Jupyter Notebook
Plot.ly-based backend
Creating animated plots
Installation of FFmpeg
Creating animations
Summary
A Practical Guide to Scientific Plotting
General rules of effective visualization
Planning your figure
Do we need the plot?
Choosing the right plot
Targeting your audience
Crafting your graph
The science of visual perception
The Gestalt principles of visual perception
Getting organized
Ordering plots and data series logically
Grouping
Giving emphasis and avoiding clutter
Color and hue
Size and weight
Spacing
Typography
Use minimal marker shapes
Styling plots for slideshows, posters, and journal articles
Display time
Space allowed
Distance from the audience
Adaptations
Summary of styling plots for slideshows, posters, and journal articles
Visualizing statistical data more intuitively
Stacked bar chart and layered histogram
Replacing bar charts with mean-and-error plots
Indicating statistical significance
Methods for dimensions reduction
Principal Component Analysis (PCA)
t-distributed Stochastic Neighbor Embedding (t-SNE)
Summary
Exploratory Data Analytics and Infographics
Visualizing population health information
Map-based visualization for geographical data
Combining geographical and population health data
Survival data analysis on cancer
Summary
Big data analytics drives innovation in scientific research, digital marketing, policy making, and much more. With the increasing amount of data from sensors and user activities to APIs and databases, there is a need to visualize data effectively in order to communicate the insights to the target audience.
Matplotlib offers a simple but powerful plotting library that helps to resolve the complexity in big data visualization, and turns overwhelming data into useful information. The library offers versatile plot types and robust customizations to transform data into persuasive and actionable figures. With the recent introduction of version 2, Matplotlib has further established its pivotal role in Python visualization.
Matplotlib 2.x By Example illustrates the methods and applications of various plot types through real-world examples. It begins by giving readers the basic know-how on how to create and customize plots with Matplotlib. It further covers how to plot different types of economic data in the form of 2D and 3D graphs, which give insights from a deluge of data from public repositories such as Quandl Finance and data.gov. By extending the power of Matplotlib using toolkits such as GeoPandas, Lifelines, Mplot3d, NumPy, Pandas, Plot.ly, Scikit-learn, SciPy, and Seaborn, you will learn how to visualize geographical data on maps, implement interactive charts, and craft professional scientific visualizations from complex datasets. By the end of this book, you will become well-versed with Matplotlib in your day-to-day work and be able to create advanced data visualizations.
In the first part of this book, you will learn the basics of creating a Matplotlib plot:
Chapter 1, Hello Plotting World!, covers the basic constituents of a Matplotlib figure, as well as the latest features of Matplotlib version 2.
Chapter 2, Figure Aesthetics, explains how to customize the style of components in a Matplotlib figure.
Chapter 3, Figure Layout and Annotations, explains how to add annotations and subplots, which allow a more comprehensive representation of the data.
Once we have a solid foundation of the basics of Matplotlib, in part two of this book, you will learn how to mix and match different techniques to create increasingly complex visualizations:
Chapter 4, Visualizing Online Data, teaches you how to design intuitive infographics for effective storytelling through the use of real-world datasets.
Chapter 5, Visualizing Multivariate Data, gives you an overview of the plot types that are suitable for visualizing datasets with multiple features or dimensions.
Chapter 6, Adding Interactivity and Animating Plots, shows you that Matplotlib is not limited to creating static plots. You will learn how to create interactive charts and animations.
Finally, in part three of this book, you will learn some practical considerations and data analysis routines that are relevant to scientific plotting:
Chapter 7, A Practical Guide to Scientific Plotting, explains that data visualization is an art that's closely coupled with statistics. As a data scientist, you will learn how to create visualizations that are not only understandable by yourself, but legible to your target audiences.
Chapter 8, Exploratory Data Analytics and Infographics, guides you through more advanced topics in geographical infographics and exploratory data analytics.
These are the prerequisites for this book:
Basic Python knowledge is expected. Interested readers can refer to Learning Python by Fabrizio Romano if they are relatively new to Python programming.
A working installation of Python 3.4 or later is required. The default Python distribution can be obtained from https://www.python.org/download/. Readers may also explore other Python distributions, such as Anaconda (https://www.continuum.io/downloads), which provides better package dependency management.
A Windows 7+, macOS 10.10+, or Linux-based computer with 4 GB RAM or above is recommended.
The code examples are based on Matplotlib 2.x, Seaborn 0.8.0, Pandas 0.20.3, Numpy 1.13.1, SciPy 0.19.1, pycountry 17.5.14, stockstats 0.2.0, BeautifulSoup4 4.6.0, requests 2.18.4, plotly 2.0.14, scikit-learn 0.19.0, GeoPandas 0.2.1, PIL 1.1.6, and lifelines 0.11.1. Brief instructions for installing these packages are included in the chapters, but readers can refer to the official documentation pages for more details.
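As a quick way to check a setup against the prerequisites above, the following minimal sketch reports the running Python version and the installed versions of a few of the listed packages. The helper name installed_versions and the choice of packages are illustrative assumptions, not part of the book's code bundle; the sketch relies only on the standard library and on the common (but not guaranteed) convention that a package exposes a __version__ attribute.

```python
import importlib
import sys

# Import names of a few packages from the list above (a subset chosen for illustration).
PACKAGES = ["matplotlib", "numpy", "pandas", "seaborn", "scipy"]


def installed_versions(names):
    """Map each import name to its version string, or None if it cannot be imported.

    Hypothetical helper for this sketch; it is not provided by any of the packages.
    """
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most packages expose __version__, but this is a convention, not a rule.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# The book asks for Python 3.4 or later.
python_ok = sys.version_info >= (3, 4)
print("Python {}.{} (3.4 or later: {})".format(
    sys.version_info[0], sys.version_info[1], python_ok))

report = installed_versions(PACKAGES)
for name, version in sorted(report.items()):
    print("{:<12}{}".format(name, version or "not installed"))
```

Newer releases than the ones listed may behave differently from the book's examples, so a version mismatch reported here is a hint to consult the official documentation rather than an error in itself.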
This book aims to help anyone interested in data visualization to get insights from big data with Python and Matplotlib 2.x. Well-visualized data aids analysis and communication regardless of the field. This book will guide Python novices to quickly pick up Matplotlib plotting skills through step-by-step tutorials. Data scientists will learn to prepare high-quality figures for publications. News editors and copywriters will learn how to create intuitive infographics to make their message crisply understandable.
Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of. To send us general feedback, simply email [email protected], and mention the book's title in the subject of your message. If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files emailed directly to you. You can download the code files by following these steps:
Log in or register to our website using your email address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Matplotlib-2.x-By-Example. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/Matplotlib2xByExample_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books, whether in the text or the code, we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title. To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy. Please contact us at [email protected] with a link to the suspected pirated material. We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
To learn programming, we often start with printing the "Hello world!" message. For graphical plots that contain all the elements from data, axes, labels, lines and ticks, how should we begin?
This chapter gives an overview of Matplotlib's functionalities and latest features. We will guide you through the setup of the Matplotlib plotting environment. You will learn to create a simple line graph, and to view and save your figures. By the end of this chapter, you will be confident enough to start building your own plots, and be ready to learn about customization and more advanced techniques in the coming sections.
Come and say "Hello!" to the world of plots!
Here is a list of topics covered in this chapter:
What is Matplotlib?
Setting up the Python environment
Installing Matplotlib and its dependencies
Setting up the Jupyter notebook
Plotting the first simple line graph
Loading data into Matplotlib
Exporting the figure
Welcome to the world of Matplotlib 2.0! Follow our simple example in the chapter and draw your first "Hello world" plot.
Matplotlib is a versatile Python library that generates plots for data visualization. With the numerous plot types and refined styling options available, it works well for creating professional figures for presentations and scientific publications. Matplotlib provides a simple way to produce figures to suit different purposes, from slideshows, high-quality poster printing, and animations to web-based interactive plots. Besides typical 2D plots, basic 3D plotting is also supported.
On the development side, the hierarchical class structure and object-oriented plotting interface of Matplotlib make the plotting process intuitive and systematic. While Matplotlib provides a native graphical user interface for real-time interaction, it can also be easily integrated into popular IPython-based interactive development environments, such as Jupyter notebook and PyCharm.
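To give a flavor of the object-oriented interface described above, here is a minimal sketch of a "Hello world" style plot. The data values, the label text, and the output filename hello_plot.png are made up for illustration; the non-interactive Agg backend is chosen only so that the script runs without a display, and is not a requirement of the library.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Made-up data for illustration only
x = [0, 1, 2, 3, 4]
y = [0, 1, 4, 9, 16]

# The Figure is the top-level container; the Axes is the plotting area inside it
fig, ax = plt.subplots()

# Plotting methods are called on the Axes object and return the artists they create
line, = ax.plot(x, y, label="y = x^2")
ax.set_xlabel("x")
ax.set_ylabel("y")
ax.set_title("Hello plotting world!")
ax.legend()

# Write the figure to disk; the format is inferred from the file extension
fig.savefig("hello_plot.png")
```

Each element of the plot (figure, axes, line, labels) is an object that can be inspected or modified after creation, which is what makes the hierarchy useful once figures grow more complex.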
Matplotlib 2.0 features many improvements, including the appearance of default styles, image support, and text rendering speed. We have selected a number of important changes to highlight later. The details of all new changes can be found on the documentation site at http://matplotlib.org/devdocs/users/whats_new.html.
If you are already using previous versions of Matplotlib, you may want to pay more attention to this section to update your coding habits. If you are totally new to Matplotlib or even Python, you may jump ahead to start using Matplotlib first, and revisit here later.
The most prominent change to Matplotlib in version 2.0 is to the default style. You can find the list of changes here: http://matplotlib.org/devdocs/users/dflt_style_changes.html. Details of style setting will be covered in Chapter 2, Figure Aesthetics.
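One way to see the effect of the new defaults is to compare a few settings against the bundled "classic" style sheet, which is intended to reproduce the pre-2.0 appearance. The sketch below is an illustration under that assumption; the three rcParams keys are an arbitrary selection, not a complete list of what changed.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# A small, arbitrary selection of settings affected by the 2.0 style change
KEYS = ["image.cmap", "lines.linewidth", "font.size"]

# Start from Matplotlib's built-in defaults, ignoring any local matplotlibrc
matplotlib.rcdefaults()
current = {key: matplotlib.rcParams[key] for key in KEYS}

# Temporarily apply the "classic" style sheet and read the same settings
with plt.style.context("classic"):
    classic = {key: matplotlib.rcParams[key] for key in KEYS}

print("{:<18}{:<10}{}".format("setting", "classic", "default"))
for key in KEYS:
    print("{:<18}{!s:<10}{!s}".format(key, classic[key], current[key]))
```

Because plt.style.context restores the previous settings when the block ends, the comparison leaves the session's defaults untouched.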
For quick plotting without having to set colors for each data series, Matplotlib uses a list of colors called the default property cycle, whereby each series is assigned one of the default colors in the cycle. In Matplotlib 2.0, the list has been changed from the original red, green, blue, cyan, magenta, yellow, and black, noted as