Explore Python tools like pandas, Jupyter Notebooks, and Matplotlib to build data pipelines and data visualizations
pandas is a Python library that lets you manipulate, transform, and analyze data. It is a popular framework for exploratory data visualization and for analyzing datasets and data pipelines based on their properties.
This book will be your practical guide to exploring datasets using pandas. You will start by setting up Python, pandas, and Jupyter Notebooks, and learn how to use Jupyter Notebooks to run Python code. We then show you how to get data into pandas and do some exploratory analysis, before you learn how to manipulate and reshape data using pandas methods. You will also learn how to deal with missing data in your datasets, how to draw charts and plots using pandas and Matplotlib, and how to create effective visualizations for your audience. Finally, you will wrap up your newly gained pandas knowledge by learning how to export data from pandas into some popular file formats.
By the end of this book, you will have a better understanding of exploratory analysis and how to build exploratory data pipelines with Python.
If you are a budding data scientist looking to learn the popular pandas library, or a Python developer looking to step into the world of data analysis, this book is the ideal resource to get you started. Some programming experience in Python will help you get the most out of this book.
Harish Garg is a data analyst, author, and software developer who is passionate about data science and Python. He is a graduate of Udacity's Data Analyst Nanodegree program. He has 17 years of industry experience in data analysis using Python, developing and testing enterprise and consumer software, managing projects and software teams, and creating training material and tutorials. He also worked for 11 years for Intel Security (previously McAfee, Inc.). He regularly contributes articles and tutorials on data analysis and Python. He is also active in the open data community and is a contributing member of the Data4Democracy open data initiative. He has written data analysis pieces for the Takshashila think tank.
Page count: 87
Year of publication: 2018
Copyright © 2018 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Pavan Ramchandani
Acquisition Editor: Nelson Morris
Content Development Editor: Karan Thakkar
Technical Editor: Suwarna Patil
Copy Editor: Safis Editing
Project Coordinator: Nidhi Joshi
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Graphics: Jisha Chirayil
Production Coordinator: Arvindkumar Gupta
First published: September 2018
Production reference: 1290918
Published by Packt Publishing Ltd. Livery Place 35 Livery Street Birmingham B3 2PB, UK.
ISBN 978-1-78961-963-8
www.packtpub.com
Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Spend less time learning and more time coding with practical eBooks and videos from over 4,000 industry professionals
Improve your learning with Skill Plans built especially for you
Get a free eBook or video every month
Mapt is fully searchable
Copy and paste, print, and bookmark content
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
Title Page
Copyright and Credits
Mastering Exploratory Analysis with pandas
Packt Upsell
Why subscribe?
Packt.com
Contributors
About the author
Packt is searching for authors like you
Preface
Who this book is for
What this book covers
To get the most out of this book
Download the example code files
Download the color images
Conventions used
Get in touch
Reviews
Working with Different Kinds of Datasets
Using advanced options while reading data from CSV files
Importing modules
Advanced read options
Manipulating columns, index locations, and names
Specifying a different row as a header
Specifying a column as an index
Choosing a subset of columns to be read
Handling missing and NA data
Choosing whether to skip over blank rows
Data parsing options
Skipping rows from the footer or end of the file
Reading the subset of a file or a certain number of rows
Reading data from Excel files
Basic Excel read
Specifying which sheet should be read
Reading data from multiple sheets
Finding out sheet names
Choosing header or column labels
No header
Skipping rows at the beginning
Skipping rows at the end
Choosing columns
Column names
Setting an index while reading data
Handling missing data while reading
Reading data from other popular formats
Reading a JSON file
Reading JSON data into pandas
Reading HTML data
Reading a PICKLE file
Reading SQL data
Reading data from the clipboard
Summary
Data Selection
Introduction to datasets
Selecting data from the dataset
Multi-column selection
Dot notation
Selecting multiple rows and columns from a pandas DataFrame
Selecting a single row and multiple columns
Selecting values from a range of rows and all columns
Sorting a pandas DataFrame
Filtering rows of a pandas DataFrame
Applying multiple filter criteria to a pandas DataFrame
Filtering based on multiple conditions – AND
Filtering based on multiple conditions – OR
Filtering using the isin method
Using the isin method with multiple conditions
Using the axis parameter in pandas
Usage of the axis parameter
Axis usage examples
More examples of the axis keyword
The axis keyword
Using string methods in pandas
Checking for a substring
Changing the values of a series or column into uppercase
Changing the values into lowercase
Finding the length of every value of a column
Removing white spaces
Replacing parts of a column's values
Changing the datatype of a pandas series
Changing an int datatype column to a float
Changing the datatype while reading data
Converting string to datetime
Summary
Manipulating, Transforming, and Reshaping Data
Modifying a pandas DataFrame using the inplace parameter
Using the groupby method
Handling missing values in pandas
Indexing in pandas DataFrames
Renaming columns in a pandas DataFrame
Removing columns from a pandas DataFrame
Working with date and time series data
Handling SettingWithCopyWarning
Applying a function to a pandas series or DataFrame
Merging and concatenating multiple DataFrames into one
Summary
Visualizing Data Like a Pro
Controlling plot aesthetics
Our first plot with seaborn
Changing the plot style with set_style
Setting the plot background to a white grid
Setting the plot background to dark
Setting the background to white
Adding ticks
Customizing styles
Style parameters
Plotting context presets
Choosing the colors for plots
Changing the color palette
Building custom color palettes
Plotting categorical data
Scatterplot
Swarm plot
Box plot
Violin plot
Bar plot
Wide-form plot
Plotting with Data-Aware Grids
Plotting with the FacetGrid() method
Plotting with the PairGrid() method
Plotting with the pairplot() function
Summary
Other Books You May Enjoy
Leave a review - let other readers know what you think
In this book, you will learn in depth about pandas, a Python library for manipulating, transforming, and analyzing data. It is a popular framework for exploratory data visualization, which is a method for analyzing datasets and data pipelines based on their properties.
This book will be your practical guide to exploring datasets using pandas. You will start by setting up Python, pandas, and Jupyter Notebooks. You will learn how to use Jupyter Notebooks to run Python code. We will then show you how to get data into pandas and perform some exploratory analysis. You will learn how to manipulate and reshape data using pandas methods. You will also learn how to deal with missing data in your datasets, how to draw charts and plots using pandas and Matplotlib, and how to create effective visualizations for your audience. Finally, we will wrap up your newly gained pandas knowledge by teaching you how to get data out of pandas and into a number of popular file formats.
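As a rough preview of that workflow (get data in, explore it, handle missing values, reshape, and export), here is a minimal pandas sketch. The inline CSV text and its `city`, `year`, and `sales` columns are hypothetical stand-ins for a real file, and filling the gap with the column mean is just one of several possible strategies.

```python
import io

import pandas as pd

# Hypothetical sample data standing in for a CSV file on disk
raw_csv = io.StringIO(
    "city,year,sales\n"
    "Pune,2017,120\n"
    "Pune,2018,\n"
    "Delhi,2017,200\n"
    "Delhi,2018,240\n"
)

# 1. Get data into pandas (the empty sales field becomes NaN)
df = pd.read_csv(raw_csv)

# 2. Quick exploratory look at structure and summary statistics
df.info()
print(df.describe())

# 3. Deal with missing data: fill the gap with the column mean
df["sales"] = df["sales"].fillna(df["sales"].mean())

# 4. Reshape: one row per city, one column per year
wide = df.pivot(index="city", columns="year", values="sales")

# 5. Get data out of pandas; with no path argument these return strings
csv_text = wide.to_csv()
json_text = df.to_json(orient="records")
```

Passing a file path to `to_csv` or `to_json` writes to disk instead of returning a string.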
This book is for budding data scientists looking to learn the popular pandas library, or Python developers looking to step into the world of data analysis. If you fall into either of those categories, this book is the ideal resource to get you started.
Chapter 1, Working with Different Kinds of Datasets, teaches you how to use advanced options when reading data from CSV and Excel files, and how to read data from other popular formats such as JSON, HTML, pickle, SQL, and the clipboard.
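To give a flavor of the advanced CSV read options covered in this chapter, here is a minimal sketch using `pandas.read_csv`. The sample text, its column names, and the `n/a` marker are hypothetical; note that `skipfooter` is only supported by the Python parsing engine, so the sketch sets `engine="python"`.

```python
import io

import pandas as pd

# Hypothetical CSV: a title line above the real header, an NA marker,
# a blank line, and a trailing footer line that is not data
raw = (
    "Quarterly report\n"
    "id,name,joined,score,notes\n"
    "1,Asha,2018-01-05,88,ok\n"
    "2,Ravi,2018-02-11,n/a,late\n"
    "\n"
    "3,Mira,2018-03-20,93,ok\n"
    "Generated by export tool"
)

df = pd.read_csv(
    io.StringIO(raw),
    header=1,                # use the second line as column labels
    index_col="id",          # make the id column the row index
    usecols=["id", "name", "joined", "score"],  # read only a subset of columns
    na_values=["n/a"],       # treat this marker as missing (NaN)
    skip_blank_lines=True,   # ignore empty lines (this is the default)
    parse_dates=["joined"],  # parse this column as datetimes
    skipfooter=1,            # drop the trailing footer line
    engine="python",         # skipfooter requires the Python engine
)
print(df)
```

Adding `nrows=2` would instead read only the first two data rows, which is handy for previewing large files.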
Chapter 2, Data Selection, looks at how to use the pandas series data structure to select data. You will also learn how to sort and filter data from pandas DataFrames and how to change datatypes in pandas series.
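The selection, sorting, filtering, string, and datatype operations described above can be sketched on a small DataFrame. The `movies` data and its column names are invented for illustration; the calls themselves (`loc`, `sort_values`, boolean masks with `&`, `isin`, the `.str` accessor, `astype`, and `pd.to_datetime`) are standard pandas.

```python
import pandas as pd

# Hypothetical movie data used only for illustration
movies = pd.DataFrame({
    "title": [" Alpha ", "Beta", "Gamma", "Delta"],
    "genre": ["Drama", "Comedy", "Drama", "Action"],
    "rating": [8, 6, 7, 9],
    "released": ["2015-06-01", "2016-03-15", "2017-11-20", "2018-01-30"],
})

# Select a single column (a Series) and multiple columns (a DataFrame)
ratings = movies["rating"]
subset = movies[["title", "rating"]]

# Label-based selection of rows and columns; loc slices include both ends
first_two = movies.loc[0:1, ["title", "genre"]]

# Sort by rating, highest first
by_rating = movies.sort_values("rating", ascending=False)

# Multiple conditions: & means AND, | means OR; each condition needs parentheses
good_dramas = movies[(movies["genre"] == "Drama") & (movies["rating"] >= 7)]

# Keep rows whose genre is in a given list
drama_or_action = movies[movies["genre"].isin(["Drama", "Action"])]

# String methods via the .str accessor: strip white space, then uppercase
movies["title"] = movies["title"].str.strip().str.upper()

# Change datatypes: int to float, and string to datetime
movies["rating"] = movies["rating"].astype(float)
movies["released"] = pd.to_datetime(movies["released"])
```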
Chapter 3, Manipulating, Transforming, and Reshaping Data, explores how to modify pandas DataFrames. You will also learn how to use the groupby method, how to handle missing values, and how to use indexing in pandas DataFrames. This chapter will also teach you how to work with date and time series data, how to apply functions to pandas series or DataFrames, and how to merge and concatenate multiple DataFrames.
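Here is a compact sketch of several of these manipulations chained together on a hypothetical `sales` table (the data, column names, and the `regions` lookup table are invented for illustration). It fills missing values, converts dates, groups, applies a function, renames and drops columns, sets an index, and merges and concatenates DataFrames.

```python
import numpy as np
import pandas as pd

# Hypothetical sales records used only for illustration
sales = pd.DataFrame({
    "store": ["A", "A", "B", "B"],
    "date": ["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02"],
    "units": [10, np.nan, 7, 5],
})

# Handle missing values: replace NaN with 0
sales["units"] = sales["units"].fillna(0)

# Work with dates: convert strings to datetime and extract the weekday name
sales["date"] = pd.to_datetime(sales["date"])
sales["weekday"] = sales["date"].dt.day_name()

# Aggregate with groupby: total units per store
totals = sales.groupby("store")["units"].sum()

# Apply a function to every value of a Series
sales["units_doubled"] = sales["units"].apply(lambda u: u * 2)

# Rename and remove columns, returning new DataFrames instead of using inplace
renamed = sales.rename(columns={"units": "quantity"})
trimmed = renamed.drop(columns=["units_doubled"])

# Use a column as the row index
indexed = trimmed.set_index("store")

# Merge with another DataFrame on a shared key, and concatenate row-wise
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})
merged = trimmed.merge(regions, on="store", how="left")
stacked = pd.concat([trimmed, trimmed], ignore_index=True)
```

Returning new objects rather than passing `inplace=True` keeps each intermediate step available for inspection, which suits exploratory work.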
Chapter 4, Visualizing Data Like a Pro, will show you how to control plot aesthetics, including how to choose colors for plots. You will also learn how to plot categorical data and get to grips with plotting with data-aware grids.
