Extracting valuable business insights is no longer a ‘nice-to-have’, but an essential skill for anyone who handles data in their enterprise. Hands-On Data Analysis with Pandas is here to help beginners and those who are migrating their skills into data science get up to speed in no time.
This book will show you how to analyze your data, get started with machine learning, and work effectively with the Python libraries often used for data science, such as pandas, NumPy, matplotlib, seaborn, and scikit-learn.
Using real-world datasets, you will learn how to use the pandas library to perform data wrangling to reshape, clean, and aggregate your data. Then, you will learn how to conduct exploratory data analysis by calculating summary statistics and visualizing the data to find patterns. In the concluding chapters, you will explore some applications of anomaly detection, regression, clustering, and classification using scikit-learn to make predictions based on past data.
This updated edition will equip you with the skills you need to use pandas 1.x to efficiently perform various data manipulation tasks, reliably reproduce analyses, and visualize your data for effective decision making – valuable knowledge that can be applied across multiple domains.
Page count: 807
Year of publication: 2021
A Python data science handbook for data collection, wrangling, analysis, and visualization
Stefanie Molin
BIRMINGHAM—MUMBAI
Copyright © 2021 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author(s), nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Group Product Manager: Kunal Parikh
Publishing Product Manager: Sunith Shetty
Senior Editor: Roshan Ravikumar
Content Development Editor: Athikho Sapuni Rishana
Technical Editor: Sonam Pandey
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Shankar Kalbhor
First published: July 2019
Second edition: April 2021
Production reference: 1270421
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-80056-345-2
www.packt.com
To everyone that made the first edition such a success.
As educators, we are inclined to teach across the medium that we best learn from. I personally gravitated towards video content early in my career. As I produce more online content, surprisingly, one of the most frequently asked questions I get is: What book would you recommend for someone getting started in data science?
Initially, I was baffled as to why people would turn to books when there are so many great online resources out there. However, after reading Hands-On Data Analysis with Pandas, my perception of books for learning data science began to change.
The first thing I loved about Hands-On Data Analysis with Pandas was the structure. The book gives you just the right amount of information at the right time to keep you progressing at a natural pace. Starting with light foundations in statistics and concepts gives the perfect amount of cognitive glue to keep theory and practice comfortably bound together.
After the foundations, you are introduced to the star of the show: pandas. Stefanie uses practical examples (not the same old datasets you have used before) to bring the module to life. I use pandas almost every day, and I still learned quite a few tricks across these sections.
As a software engineer, Stefanie knows the importance of quality documentation. She has all of the data, examples, and more in a tidy GitHub repo. Through these examples, the book truly earns the "Hands-On" moniker in its title.
The latter portion of the book gives the reader a taste of what is possible with a strong foundation in pandas. Stefanie leads you just a little bit deeper into the more advanced machine learning concepts. Once again, she provides just enough information to get you excited about taking the next step in your learning journey without inundating you with overly technical jargon.
I could sense the pride Stefanie took in this work through our conversations. While the book is a great resource for people looking to learn data science tools, it was also a way for her to solidify her own knowledge and push her boundaries. In my opinion, you want to learn from people who are creating not only for the community but also for their own learning. People with intrinsic motivation like this are willing to go the extra mile to make that extra revision or get the wording perfect.
I hope you enjoy learning from this book as much as I did. To those who asked me the question above, I have a simple answer: This one.
Ken Jee
YouTuber & Head of Data Science @ Scouts Consulting Group
Honolulu, HI (03/09/2021)
Recent advancements in computing and artificial intelligence have completely changed the way we understand the world. Our current ability to record and analyze data has already transformed industries and inspired big changes in society.
Stefanie Molin's Hands-On Data Analysis with Pandas is much more than an introduction to the subject of data analysis or the pandas Python library; it's a guide to help you become part of this transformation.
Not only will this book teach you the fundamentals of using Python to collect, analyze, and understand data, but it will also expose you to important software engineering, statistical, and machine learning concepts that you will need to be successful.
Using examples based on real data, you will be able to see firsthand how to apply these techniques to extract value from data. In the process, you will learn important software development skills, including writing simulations, creating your own Python packages, and collecting data from APIs.
Stefanie possesses a rare combination of skills that makes her uniquely qualified to guide you through this process. Being both an expert data scientist and a strong software engineer, she can talk authoritatively not only about the intricacies of the data analysis workflow but also about how to implement it correctly and efficiently in Python.
Whether you are a Python programmer interested in learning more about data analysis, or a data scientist learning how to work in Python, this book will get you up to speed fast, so you can begin to tackle your own data analysis projects right away.
Felipe Moreno
New York, June 10, 2019
Felipe Moreno has been working in information security for the last two decades. He currently works for Bloomberg LP, where he leads the Security Data Science team within the Chief Information Security Office and focuses on applying statistics and machine learning to security problems.
Stefanie Molin is a data scientist and software engineer at Bloomberg LP in NYC, tackling tough problems in information security, particularly revolving around anomaly detection, building tools for gathering data, and knowledge sharing. She has extensive experience in data science, designing anomaly detection solutions, and utilizing machine learning in both R and Python in the AdTech and FinTech industries. She holds a B.S. in operations research from Columbia University's Fu Foundation School of Engineering and Applied Science, with minors in economics, and entrepreneurship and innovation. In her free time, she enjoys traveling the world, inventing new recipes, and learning new languages spoken among both people and computers.
Writing this book was a tremendous amount of work, but I have grown a lot through the experience: as a writer, as a technologist, and as a person. This wouldn't have been possible without the help of my friends, family, and colleagues. I'm very grateful to you all. In particular, I want to thank Aliki Mavromoustaki, Felipe Moreno, Suphannee Sivakorn, Lucy Hao, Javon Thompson, and Ken Jee. (The full version of my acknowledgments can be found in the code repository; see the preface for the link.)
Aliki Mavromoustaki is the lead data scientist at Tasman Analytics. She works with direct-to-consumer companies to deliver scalable infrastructure and implement event-driven analytics. Previously, she worked at Criteo, an AdTech company that employs machine learning to help digital commerce companies target valuable customers. Aliki has worked on optimizing marketing campaigns and designed statistical experiments comparing Criteo products. Aliki holds a PhD in fluid dynamics from Imperial College London and was an assistant adjunct professor in applied mathematics at UCLA.
Our journey begins with an introduction to data analysis and statistics, which will lay a strong foundation for the concepts we will cover throughout the book. Then, we will set up our Python data science environment, which contains everything we will need to work through the examples, and get started with learning the basics of pandas.
This section comprises the following chapters:
Chapter 1, Introduction to Data Analysis
Chapter 2, Working with Pandas DataFrames