82,79 €
Learn to solve challenging data science problems by building powerful machine learning models using Python
This Learning Path is for Python programmers who are looking to use machine learning algorithms to create real-world applications. It is ideal for Python professionals who want to work with large and complex datasets and Python developers and analysts or data scientists who are looking to add to their existing skills by accessing some of the most powerful recent trends in data science. Experience with Python, Jupyter Notebooks, and command-line execution together with a good level of mathematical knowledge to understand the concepts is expected. Machine learning basic knowledge is also expected.
Machine learning is increasingly spreading in the modern data-driven world. It is used extensively across many fields such as search engines, robotics, self-driving cars, and more. Machine learning is transforming the way we understand and interact with the world around us.
In the first module, Python Machine Learning Cookbook, you will learn how to perform various machine learning tasks using a wide variety of machine learning algorithms to solve real-world problems and use Python to implement these algorithms.
The second module, Advanced Machine Learning with Python, is designed to take you on a guided tour of the most relevant and powerful machine learning techniques and you'll acquire a broad set of powerful skills in the area of feature selection and feature engineering.
The third module in this learning path, Large Scale Machine Learning with Python, dives into scalable machine learning and the three forms of scalability. It covers the most effective machine learning techniques on a map reduce framework in Hadoop and Spark in Python.
This Learning Path will teach you Python machine learning for the real world. The machine learning techniques covered in this Learning Path are at the forefront of commercial practice.
This Learning Path combines some of the best that Packt has to offer in one complete, curated package. It includes content from the following Packt products:
This course is a smooth learning path that will teach you how to get started with Python machine learning for the real world, and develop solutions to real-world problems. Through this comprehensive course, you'll learn to create the most effective machine learning techniques from scratch and more!
Sie lesen das E-Book in den Legimi-Apps auf:
Seitenzahl: 1091
Veröffentlichungsjahr: 2016
Learn to solve challenging data science problems by building powerful machine learning models using Python
A course in three modules
BIRMINGHAM - MUMBAI
Copyright © 2016 Packt Publishing
All rights reserved. No part of this course may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this course to ensure the accuracy of the information presented. However, the information contained in this course is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this course.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this course by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
Published on: October 2016
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78712-321-2
www.packtpub.com
Authors
Prateek Joshi
John Hearty
Bastiaan Sjardin
Luca Massaron
Alberto Boschetti
Reviewers
Dr. Vahid Mirjalili
Jared Huffman
Ashwin Pajankar
Oleg Okun
Kai Londenberg
Content Development Editor
Aishwarya Pandere
Production Coordinator
Nilesh Mohite
Machine learning is becoming increasingly pervasive in the modern data-driven world. It is used extensively across many fields, such as search engines, robotics, self-driving cars, and so on. In this course, you will explore various real-life scenarios where you can use machine learning. You will understand what algorithms you should use in a given context using this exciting recipe-based guide.
This course starts by talking about various realms in machine learning followed by practical examples.
Module 1, Python Machine Learning Cookbook, teaches you about the algorithms that we use to build recommendation engines. We will learn how to apply these algorithms to collaborative filtering and movie recommendations.
Module 2, Advanced Machine Learning with Python, explains how to apply several semi-supervised learning techniques including CPLE, self learning, and S3VM.
Module 3, Large Scale Machine Learning with Python, covers interesting deep learning techniques together with an online method for neural networks. Although TensorFlow is only in its infancy, the framework provides elegant machine learning solutions.
Module 1: While we believe that the world is moving forward with better versions coming out, a lot of developers still enjoy using Python 2.x. A lot of operating systems have Python 2.x built into them. This course is focused on machine learning in Python as opposed to Python itself. It also helps in maintaining compatibility with libraries that haven’t been ported to Python 3.x. Hence the code in the book is oriented towards Python 2.x. In that spirit, we have tried to keep all the code as agnostic as possible to the Python versions.
Module 2: The entirety of this course’s content leverages openly available data and code,including open source Python libraries and frameworks. While each chapter’s example code is accompanied by a README file documenting all the libraries required to run the code provided in that chapter’s accompanying scripts, the content of these files is collated here for your convenience. It is recommended that some libraries required for earlier chapters be available when working with code from any later chapter. These requirements are identified using bold text. Particularly, it is important to set up the first chapter’s required libraries for any content later in the book.
Module 3: The execution of the code examples provided in this book requires an installation of Python 2.7 or higher versions on macOS, Linux, or Microsoft Windows.
The examples throughout the book will make frequent use of Python’s essential libraries, such as SciPy, NumPy, Scikit-learn, and StatsModels, and to a minor extent, matplotlib and pandas, for scientific and statistical computing. We will also make use of an out-of-core cloud computing application called H2O. This book is highly dependent on Jupyter and its Notebooks powered by the Python kernel. We will use its most recent version, 4.1, for this book. The first chapter will provide you with all the step-by-step instructions and some useful tips to set up your Python environment, these core libraries, and all the necessary tools.
This Learning Path is for Python programmers who are looking to use machine learning algorithms to create real-world applications. Python professionals intending to work with large and complex datasets and Python developers and analysts or data scientists who are looking to add to their existing skills by accessing some of the most powerful recent trends in data science will find this Learning Path useful. Experience with Python, Jupyter Notebooks, and command-line execution together with a good level of mathematical knowledge to understand the concepts is expected. Machine learning basics is also expected.
Feedback from our readers is always welcome. Let us know what you think about this course—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the course’s title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt course, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this course from your account at http://www.packtpub.com. If you purchased this course elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
You can also download the code files by clicking on the Code Files button on the course’s webpage at the Packt Publishing website. This page can be accessed by entering the course’s name in the Search box. Please note that you need to be logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
The code bundle for the course is also hosted on GitHub at https://github.com/PacktPublishing/Python-Real-World-Machine-Learning. We also have other code bundles from our rich catalog of books, videos, and courses available at https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our courses—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this course. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your course, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the course in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this course, you can contact us at <[email protected]>, and we will do our best to address the problem.
Python Machine Learning Cookbook
100 recipes that teach you how to perform various machine learning tasks in the real world
In this chapter, we will cover the following recipes:
If you are familiar with the basics of machine learning, you will certainly know what supervised learning is all about. To give you a quick refresher, supervised learning refers to building a machine learning model that is based on labeled samples. For example, if we build a system to estimate the price of a house based on various parameters, such as size, locality, and so on, we first need to create a database and label it. We need to tell our algorithm what parameters correspond to what prices. Based on this data, our algorithm will learn how to calculate the price of a house using the input parameters.
Unsupervised learning is the opposite of what we just discussed. There is no labeled data available here. Let's assume that we have a bunch of datapoints, and we just want to separate them into multiple groups. We don't exactly know what the criteria of separation would be. So, an unsupervised learning algorithm will try to separate the given dataset into a fixed number of groups in the best possible way. We will discuss unsupervised learning in the upcoming chapters.
We will use various Python packages, such as NumPy, SciPy, scikit-learn, and matplotlib, during the course of this book to build various things. If you use Windows, it is recommended that you use a SciPy-stack compatible version of Python. You can check the list of compatible versions at http://www.scipy.org/install.html. These distributions come with all the necessary packages already installed. If you use Mac OS X or Ubuntu, installing these packages is fairly straightforward. Here are some useful links for installation and documentation:
Make sure that you have these packages installed on your machine before you proceed.
Now that we know how to build a regressor, it's important to understand how to evaluate the quality of a regressor as well. In this context, an error is defined as the difference between the actual value and the value that is predicted by the regressor.
Let's quickly understand what metrics can be used to measure the quality of a regressor. A regressor can be evaluated using many different metrics, such as the following:
There is a module in scikit-learn that provides functionalities to compute all the following metrics. Open a new Python file and add the following lines:
Keeping track of every single metric can get tedious, so we pick one or two metrics to evaluate our model. A good practice is to make sure that the mean squared error is low and the explained variance score is high.
In this chapter, we will cover the following recipes:
In the field of machine learning, classification refers to the process of using the characteristics of data to separate it into a certain number of classes. This is different from regression that we discussed in the previous chapter where the output is a real number. A supervised learning classifier builds a model using labeled training data and then uses this model to classify unknown data.
A classifier can be any algorithm that implements classification. In simple cases, this classifier can be a straightforward mathematical function. In more real-world cases, this classifier can take very complex forms. In the course of study, we will see that classification can be either binary, where we separate data into two classes, or it can be multiclass, where we separate data into more than two classes. The mathematical techniques that are devised to deal with the classification problem tend to deal with two classes, so we extend them in different ways to deal with the multiclass problem as well.
Evaluating the accuracy of a classifier is an important step in the world machine learning. We need to learn how to use the available data to get an idea as to how this model will perform in the real world. In this chapter, we will look at recipes that deal with all these things.
Despite the word regression being present in the name, logistic regression is actually used for classification purposes. Given a set of datapoints, our goal is to build a model that can draw linear boundaries between our classes. It extracts these boundaries by solving a set of equations derived from the training data.
