Statistics remain the backbone of modern analysis tasks, helping you to interpret the results produced by data science pipelines. This book is a detailed guide covering the math and various statistical methods required for undertaking data science tasks.
The book starts by showing you how to preprocess data and inspect distributions and correlations from a statistical perspective. You’ll then get to grips with the fundamentals of statistical analysis and apply its concepts to real-world datasets. As you advance, you’ll find out how statistical concepts emerge from different stages of data science pipelines, understand the summary of datasets in the language of statistics, and use it to build a solid foundation for robust data products such as explanatory models and predictive models. Once you’ve uncovered the working mechanism of data science algorithms, you’ll cover essential concepts for efficient data collection, cleaning, mining, visualization, and analysis. Finally, you’ll implement statistical methods in key machine learning tasks such as classification, regression, tree-based methods, and ensemble learning.
By the end of this book, Essential Statistics for Non-STEM Data Analysts, you’ll have learned how to build and present a self-contained, statistics-backed data product to meet your business goals.
Page count: 388
Year of publication: 2020
Get to grips with the statistics and math knowledge needed to enter the world of data science with Python
Rongpeng Li
BIRMINGHAM—MUMBAI
Copyright © 2020 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Commissioning Editor: Sunith Shetty
Acquisition Editor: Devika Battike
Senior Editor: Roshan Kumar
Content Development Editor: Sean Lobo
Technical Editor: Sonam Pandey
Copy Editor: Safis Editing
Project Coordinator: Aishwarya Mohan
Proofreader: Safis Editing
Indexer: Pratik Shirodkar
Production Designer: Roshan Kawale
First published: November 2020
Production reference: 1111120
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.
ISBN 978-1-83898-484-7
www.packt.com
Packt.com
Subscribe to our online digital library for full access to over 7,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at packt.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.
Rongpeng Li is a data science instructor and a senior data scientist at Galvanize, Inc. He has previously been a research programmer at Information Sciences Institute, working on knowledge graphs and artificial intelligence. He has also been the host and organizer of the Data Analysis Workshop Designed for Non-STEM Busy Professionals at LA.
Michael Hansen (https://www.linkedin.com/in/michael-n-hansen/), a friend of mine, provided invaluable English language editing suggestions for this book. Michael has great attention to detail, which made him a great language reviewer. Thank you, Michael!
James Mott, PhD, is a senior education consultant with extensive experience in teaching statistical analysis, modeling, data mining, and predictive analytics. He has over 30 years of experience using SPSS products in his own research, including IBM SPSS Statistics, IBM SPSS Modeler, and IBM SPSS Amos. He has also been actively teaching about these products to IBM/SPSS customers for over 30 years. In addition, he is an experienced historian with expertise in the research and teaching of 20th century United States political history and quantitative methods. His specialties are data mining, quantitative methods, statistical analysis, teaching, and consulting.
Yidan Pan obtained her PhD in system, synthetic, and physical biology from Rice University. Her research interest is profiling mutagenesis at genomic and transcriptional levels with molecular biology wet labs, bioinformatics, statistical analysis, and machine learning models. She believes that this book will give its readers a lot of practical skills for data analysis.
If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.
In this section, you will learn how to preprocess data and inspect distributions and correlations from a statistical perspective.
This section consists of the following chapters:
Chapter 1, Fundamentals of Data Collection, Cleaning, and Preprocessing
Chapter 2, Essential Statistics for Data Assessment
Chapter 3, Visualization with Statistical Graphs