An approachable guide to applying advanced machine learning methods to everyday problems
Python programmers and data scientists - put your skills to the test with this practical guide dedicated to real-world machine learning that makes a real impact.
Machine Learning is transforming the way we understand and interact with the world around us. But how much do you really understand it? How confident are you interacting with the tools and models that drive it?
Python Machine Learning Blueprints puts your skills and knowledge to the test, guiding you through the development of some awesome machine learning applications and algorithms with real-world examples that demonstrate how to put concepts into practice.
You'll learn how to use clustering techniques to discover bargain airfares, and apply linear regression to find yourself a cheap apartment – and much more. Everything you learn is backed by a real-world example, whether it's data manipulation or statistical modelling.
That way you're never left floundering in theory – you'll simply be collecting and analyzing data in a way that makes a real impact.
Packed with real-world projects, this book takes you beyond the theory to demonstrate how to apply machine learning techniques to real problems.
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2016
Production reference: 1270716
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-475-2
www.packtpub.com
Author
Alexander T. Combs
Copy Editor
Priyanka Ravi
Reviewer
Kushal Khandelwal
Project Coordinator
Suzanne Coutinho
Commissioning Editor
Kartikey Pandey
Proofreader
Safis Editing
Acquisition Editors
Vivek Anantharaman
Manish Nainani
Indexer
Rekha Nair
Content Development Editor
Merint Thomas Mathew
Production Coordinator
Melwyn Dsa
Technical Editor
Abhishek R. Kotian
Cover Work
Melwyn Dsa
Alexander T. Combs is an experienced data scientist, strategist, and developer with a background in financial data extraction, natural language processing and generation, and quantitative and statistical modeling. He is currently a full-time lead instructor for a data science immersive program in New York City.
Writing a book is truly a massive undertaking that would not be possible without the support of others. I would like to thank my family for their love and encouragement and Jocelyn for her patience and understanding. I owe all of you tremendously.
Kushal Khandelwal is a data scientist and a full-stack developer. His interests include building scalable machine learning and image processing software applications. He is adept at coding in Python and contributes actively to various open source projects. He is currently serving as the Head of Technology at Truce.in, a farmer-centric start-up, where he is building scalable web applications to assist farmers.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Get notified! Find out when new books are published by following @PacktEnterprise on Twitter or the Packt Enterprise Facebook page.
Machine learning is rapidly becoming a fixture in our data-driven world. It is relied upon in fields ranging from robotics and medicine to retail and publishing. In this book, you will learn how to build real-world machine learning applications step by step.
Working through easy-to-understand projects, you will learn how to process various types of data and how and when to apply different machine learning techniques such as supervised or unsupervised learning.
Each of the projects in this book provides educational as well as practical value. For example, you'll learn how to use clustering techniques to find bargain airfares and how to use linear regression to find a cheap apartment. This book will teach you how to use machine learning to collect, analyze, and act on massive quantities of data in an approachable, no-nonsense manner.
Chapter 1, The Python Machine Learning Ecosystem, delves into Python, which has a deep and active developer community, many of whom come from the scientific community. This has provided Python with a rich array of libraries for scientific computing. In this chapter, we will discuss the features of these key libraries and how to prepare your environment to best utilize them.
Chapter 2, Build an App to Find Underpriced Apartments, guides us to build our first machine learning application, and we begin with a minimal but practical example: building an application to identify underpriced apartments. By the end of this chapter, we will create an application that will make finding the right apartment a bit easier.
Chapter 3, Build an App to Find Cheap Airfares, demonstrates how to build an application that continually monitors fare pricing. Once an anomalous price appears, our app will generate an alert that we can quickly act on.
Chapter 4, Forecast the IPO Market using Logistic Regression, shows how we can use machine learning to decide which IPOs are worth a closer look and which ones we may want to skip.
Chapter 5, Create a Custom Newsfeed, covers how to build a system that understands your taste in news and will send you a personally tailored newsletter each day.
Chapter 6, Predict whether Your Content Will Go Viral, examines some of the most shared content and attempts to find the common elements that differentiate it from the content that people are less willing to share.
Chapter 7, Forecast the Stock Market with Machine Learning, discusses how to build and test a trading strategy. There are countless pitfalls to avoid when trying to devise your own system, and it is a nearly impossible task. However, it can be a lot of fun, and sometimes, it can even be profitable.
Chapter 8, Build an Image Similarity Engine, helps you construct an advanced, image-based deep learning application. We will also cover deep learning algorithms to understand why they are so important and why there is so much hype surrounding them.
Chapter 9, Build a Chatbot, demonstrates how to construct a chatbot from scratch. Along the way, you'll learn more about the history of the field and its future prospects.
Chapter 10, Build a Recommendation Engine, explores the different varieties of recommendation systems. We'll see how they're implemented commercially and how they work. We will also implement our own recommendation engine to find GitHub repos.
All you need is Python 3.x and a desire to build real-world machine learning projects. You can refer to the detailed software list provided along with the code files of this book.
This book targets Python programmers, data scientists, and architects with a good knowledge of data science and all those who want to build complete Python-based machine learning systems.
Feedback from our readers is always welcome. Let us know what you think about this book - what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of your preferred archive extraction tool.
The code bundle for the book is also hosted on GitHub at https://github.com/packtpublishing/pythonmachinelearningblueprints. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books - maybe a mistake in the text or the code - we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to the list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at [email protected] with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
Machine learning is rapidly changing our world. As the centerpiece of artificial intelligence, it is difficult to go a day without reading how it will transform our lives. Some argue it will lead us into a Singularity-style techno-utopia. Others suggest we are headed towards a techno-pocalypse marked by constant battles with job-stealing robots and drone death squads. But while the pundits may enjoy discussing these hyperbolic futures, the more mundane reality is that machine learning is rapidly becoming a fixture of our daily lives. Through subtle but progressive improvements in how we interact with computers and the world around us, machine learning is quietly improving our lives.
If you shop at online retailers such as Amazon.com, use streaming music or movie services such as Spotify or Netflix, or even just perform a Google search, you have encountered a machine learning application. The data generated by the users of these services is collected, aggregated, and fed into models that improve the services by creating tailored experiences for each user.
Now is an ideal time to dive into developing machine learning applications, and as you will discover, Python is an ideal choice with which to develop these applications. Python has a deep and active developer community, and many of these developers come from the scientific community as well. This has provided Python with a rich array of libraries for scientific computing. In this book, we will discuss and use a number of these libraries from this Python scientific stack.
In the chapters that follow, we'll learn step by step how to build a wide variety of machine learning applications. But before we begin in earnest, we'll spend the remainder of this chapter discussing the features of these key libraries and how to prepare your environment to best utilize them.
We'll cover the following topics in this chapter: the data science workflow, the key libraries of the Python scientific stack, and how to set up your environment to make the best use of them.
Building machine learning applications, while similar in many respects to the standard engineering paradigm, differs in one crucial way: the need to work with data as a raw material. The success of a data project will, in large part, depend on the quality of the data you acquire, as well as how it's handled. And because working with data falls into the domain of data science, it is helpful to understand the data science workflow:
The process proceeds through six steps in the following order: acquisition, inspection and exploration, cleaning and preparation, modeling, evaluation, and finally deployment. There is often a need to circle back to prior steps, such as when inspecting and preparing the data or when evaluating and modeling, but at a high level the process can be described in terms of these six steps.
Let's now discuss each step in detail.
Data for machine learning applications can come from any number of sources; it may be e-mailed as a CSV file, it may come from pulling down server logs, or it may require building a custom web scraper. The data may also come in any number of formats. In most cases, it will be text-based data, but as we'll see, machine learning applications may just as easily be built utilizing images or even video files. Regardless of the format, once the data is secured, it is crucial to understand what's in the data—as well as what isn't.
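For a concrete, if minimal, sketch of the acquisition step, the snippet below pulls a CSV file into a pandas DataFrame and takes a first look at what was received. The file name listings.csv and its columns are hypothetical placeholders for whatever source your own project uses, whether that's an e-mailed file, a log export, or scraper output.

```python
import pandas as pd

# Hypothetical CSV of apartment listings; replace the path with whatever
# source your project actually uses (e-mailed file, log export, scraper output).
listings = pd.read_csv('listings.csv')

# A quick look at the first few rows and the overall shape tells us
# what we actually received.
print(listings.head())
print(listings.shape)
```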
Once the data has been acquired, the next step is to inspect and explore it. At this stage, the primary goal is to sanity-check the data, and the best way to accomplish this is to look for things that are either impossible or highly unlikely. As an example, if the data has a unique identifier, check to see that there is indeed only one; if the data is price-based, check whether it is always positive; and whatever the data type, check the most extreme cases. Do they make sense? A good practice is to run some simple statistical tests on the data and visualize it. Additionally, it is likely that some data is missing or incomplete. It is critical to take note of this during this stage, as it will need to be addressed later during the cleaning and preparation stage. Models are only as good as the data that goes into them, so it is crucial to get this step right.
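These sanity checks can be expressed in a few lines of pandas. In the sketch below, the listings.csv file and the listing_id and price columns are hypothetical; substitute the identifier and value columns from your own data.

```python
import pandas as pd

# Continuing with the hypothetical listings data from the previous sketch.
listings = pd.read_csv('listings.csv')

# Summary statistics surface impossible or extreme values at a glance.
print(listings.describe())

# If there is a supposed unique identifier, verify that it really is unique.
print(listings['listing_id'].is_unique)

# Price-based data should normally be positive; count any violations.
print((listings['price'] <= 0).sum())

# Missing or incomplete values need to be noted now and addressed
# later, during the cleaning and preparation stage.
print(listings.isnull().sum())
```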
When all the data is in order, the next step is to place it in a format that is amenable to modeling. This stage encompasses a number of processes such as filtering, aggregating, imputing, and transforming. The type of actions that are necessary will be highly dependent on the type of data as well as the type of library and algorithm utilized. For example, with natural-language-based text, the transformations required will be very different from those required for time series data. We'll see a number of examples of these types of transformations throughout the book.
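As a rough illustration of these operations, the sketch below filters, imputes, transforms, and aggregates the hypothetical listings data; the column names are illustrative only, and real projects will need transformations specific to their own data.

```python
import pandas as pd

# Hypothetical listings data again; the column names are illustrative.
listings = pd.read_csv('listings.csv')

# Filtering: drop rows with clearly impossible prices.
listings = listings[listings['price'] > 0]

# Imputing: fill missing square footage with the median value.
listings['sqft'] = listings['sqft'].fillna(listings['sqft'].median())

# Transforming: derive a feature that is more useful for modeling.
listings['price_per_sqft'] = listings['price'] / listings['sqft']

# Aggregating: a per-neighborhood summary for a quick sanity check.
print(listings.groupby('neighborhood')['price_per_sqft'].mean())
```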
Once the data preparation is complete, the next phase is modeling. In this phase, an appropriate algorithm is selected and a model is trained on the data. There are a number of best practices to adhere to during this stage, and we will discuss them in detail, but the basic steps involve splitting the data into training, testing, and validation sets. This splitting up of the data may seem counterintuitive, especially since more data typically yields better models, but as we'll see, doing so gives us better feedback on how the model will perform in the real world and prevents us from committing the cardinal sin of modeling: overfitting.
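A minimal sketch of this split-then-train pattern, using scikit-learn on the hypothetical listings data, might look like the following; the feature and target columns are illustrative, and a validation set or cross-validation would be added in practice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Continuing with the hypothetical listings data from the earlier sketches.
listings = pd.read_csv('listings.csv')

# Illustrative feature columns and target; use whatever your data provides.
X = listings[['sqft', 'bedrooms']]
y = listings['price']

# Hold out a portion of the data so the model can be judged on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a simple model on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)
```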
Once the model is built and making predictions, the next step is to understand how well it does that. This is the question that evaluation seeks to answer. There are a number of ways to measure the performance of a model, and again this is largely dependent on the type of data and the model used, but on the whole, we are seeking to answer the question of how close the model's predictions are to the actual values. There is an array of confusing-sounding terms, such as root mean-square error, Euclidean distance, and F1 score, but in the end, they are all just measures of the distance between the actual values and the estimated predictions.
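Continuing the sketch from the modeling step, computing one such measure, the root mean-square error, takes only a few lines with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Using the model and held-out test set from the previous sketch.
predictions = model.predict(X_test)

# Root mean-square error: a single number summarizing how far, on average,
# the predictions fall from the actual values.
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print('RMSE: {:.2f}'.format(rmse))
```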
Once the model's performance is satisfactory, the next step is deployment. This can take a number of forms depending on the use case, but common scenarios include utilization as a feature within another larger application, a bespoke web application, or even just a simple cron job.
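As one possible sketch of the web-application scenario, the snippet below exposes a previously saved model behind a small Flask endpoint. The file name, route, and feature names are assumptions for illustration, not part of any project in this book.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical file produced by saving the trained model after the modeling
# step, for example with joblib.dump(model, 'apartment_model.pkl').
model = joblib.load('apartment_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON payload containing the same features the model was trained on.
    payload = request.get_json()
    prediction = model.predict([[payload['sqft'], payload['bedrooms']]])
    return jsonify({'predicted_price': float(prediction[0])})

if __name__ == '__main__':
    app.run()
```

A client could then POST a JSON document such as {"sqft": 750, "bedrooms": 2} to /predict and receive a predicted price in response; a cron-based deployment would instead run the prediction script on a schedule and write its output wherever it is needed.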
