An approachable guide to applying advanced machine learning methods to everyday problems
Python programmers and data scientists - put your skills to the test with this practical guide dedicated to real-world machine learning that makes a real impact.
Machine Learning is transforming the way we understand and interact with the world around us. But how much do you really understand it? How confident are you interacting with the tools and models that drive it?
Python Machine Learning Blueprints puts your skills and knowledge to the test, guiding you through the development of some awesome machine learning applications and algorithms with real-world examples that demonstrate how to put concepts into practice.
You'll learn how to use clustering techniques to discover bargain airfares, and apply linear regression to find yourself a cheap apartment – and much more. Everything you learn is backed by a real-world example, whether it's data manipulation or statistical modelling.
That way you're never left floundering in theory – you'll simply be collecting and analyzing data in a way that makes a real impact.
Packed with real-world projects, this book takes you beyond the theory to demonstrate how to apply machine learning techniques to real problems.
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, nor its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2016
Production reference: 1270716
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78439-475-2
www.packtpub.com
Author
Alexander T. Combs
Copy Editor
Priyanka Ravi
Reviewer
Kushal Khandelwal
Project Coordinator
Suzanne Coutinho
Commissioning Editor
Kartikey Pandey
Proofreader
Safis Editing
Acquisition Editors
Vivek Anantharaman
Manish Nainani
Indexer
Rekha Nair
Content Development Editor
Merint Thomas Mathew
Production Coordinator
Melwyn Dsa
Technical Editor
Abhishek R. Kotian
Cover Work
Melwyn Dsa
Alexander T. Combs is an experienced data scientist, strategist, and developer with a background in financial data extraction, natural language processing and generation, and quantitative and statistical modeling. He is currently a full-time lead instructor for a data science immersive program in New York City.
Writing a book is truly a massive undertaking that would not be possible without the support of others. I would like to thank my family for their love and encouragement and Jocelyn for her patience and understanding. I owe all of you tremendously.
Kushal Khandelwal is a data scientist and a full-stack developer. His interests include building scalable machine learning and image processing software applications. He is adept at coding in Python and contributes actively to various open source projects. He is currently serving as the Head of Technology at Truce.in, a farmer-centric start-up, where he is building scalable web applications to assist farmers.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Get notified! Find out when new books are published by following @PacktEnterprise on Twitter or the Packt Enterprise Facebook page.
Machine learning is rapidly becoming a fixture in our data-driven world. It is relied upon in fields ranging from robotics and medicine to retail and publishing. In this book, you will learn how to build real-world machine learning applications step by step.
Working through easy-to-understand projects, you will learn how to process various types of data and how and when to apply different machine learning techniques such as supervised or unsupervised learning.
Each of the projects in this book provides educational as well as practical value. For example, you'll learn how to use clustering techniques to find bargain airfares and how to use linear regression to find a cheap apartment. This book will teach you how to use machine learning to collect, analyze, and act on massive quantities of data in an approachable, no-nonsense manner.
Chapter 1, The Python Machine Learning Ecosystem, delves into Python, which has a deep and active developer community, many of whom come from the scientific community. This has provided Python with a rich array of libraries for scientific computing. In this chapter, we will discuss the features of these key libraries and how to prepare your environment to best utilize them.
Chapter 2, Build an App to Find Underpriced Apartments, guides us to build our first machine learning application, and we begin with a minimal but practical example: building an application to identify underpriced apartments. By the end of this chapter, we will create an application that will make finding the right apartment a bit easier.
Chapter 3, Build an App to Find Cheap Airfares, demonstrates how to build an application that continually monitors fare pricing. Once an anomalous price appears, our app will generate an alert that we can quickly act on.
Chapter 4, Forecast the IPO Market using Logistic Regression, shows how we can use machine learning to decide which IPOs are worth a closer look and which ones we may want to skip.
Chapter 5, Create a Custom Newsfeed, covers how to build a system that understands your taste in news and will send you a personally tailored newsletter each day.
Chapter 6, Predict whether Your Content Will Go Viral, examines some of the most shared content and attempts to find the common elements that differentiate it from the content that people are less willing to share.
Chapter 7, Forecast the Stock Market with Machine Learning, discusses how to build and test a trading strategy. There are countless pitfalls to avoid when trying to devise your own system, and it is a nearly impossible task. However, it can be a lot of fun, and sometimes, it can even be profitable.
Chapter 8, Build an Image Similarity Engine, helps you construct an advanced, image-based deep learning application. We will also cover deep learning algorithms to understand why they are so important and why there is so much hype surrounding them.
Chapter 9, Build a Chatbot, demonstrates how to construct a chatbot from scratch. Along the way, you'll learn more about the history of the field and its future prospects.
Chapter 10, Build a Recommendation Engine, explores the different varieties of recommendation systems. We'll see how they're implemented commercially and how they work. We will also implement our own recommendation engine to find GitHub repos.
All you need is Python 3.x and a desire to build real-world machine learning projects. You can refer to the detailed software list provided along with the code files of this book.
This book targets Python programmers, data scientists, and architects with a good knowledge of data science and all those who want to build complete Python-based machine learning systems.
Feedback from our readers is always welcome. Let us know what you think about this book - what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of your preferred archive extraction tool.
The code bundle for the book is also hosted on GitHub at https://github.com/packtpublishing/pythonmachinelearningblueprints. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books - maybe a mistake in the text or the code - we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to the list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at [email protected] with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
Machine learning is rapidly changing our world. As the centerpiece of artificial intelligence, it is difficult to go a day without reading how it will transform our lives. Some argue it will lead us into a Singularity-style techno-utopia. Others suggest we are headed towards a techno-pocalypse marked by constant battles with job-stealing robots and drone death squads. But while the pundits may enjoy discussing these hyperbolic futures, the more mundane reality is that machine learning is rapidly becoming a fixture of our daily lives. Through subtle but progressive improvements in how we interact with computers and the world around us, machine learning is quietly improving our lives.
If you shop at online retailers such as Amazon.com, use streaming music or movie services such as Spotify or Netflix, or even just perform a Google search, you have encountered a machine learning application. The data generated by the users of these services is collected, aggregated, and fed into models that improve the services by creating tailored experiences for each user.
Now is an ideal time to dive into developing machine learning applications, and as you will discover, Python is an ideal choice with which to develop these applications. Python has a deep and active developer community, and many of these developers come from the scientific community as well. This has provided Python with a rich array of libraries for scientific computing. In this book, we will discuss and use a number of these libraries from this Python scientific stack.
In the chapters that follow, we'll learn step by step how to build a wide variety of machine learning applications. But before we begin in earnest, we'll spend the remainder of this chapter discussing the features of these key libraries and how to prepare your environment to best utilize them.
We'll cover the following topics in this chapter: the data science workflow, the key libraries of the Python scientific stack, and how to set up your environment to make the best use of them.
Building machine learning applications, while similar in many respects to the standard engineering paradigm, differs in one crucial way: the need to work with data as a raw material. The success of a data project will, in large part, depend on the quality of the data you acquire, as well as how it's handled. And because working with data falls into the domain of data science, it is helpful to understand the data science workflow:
The process proceeds through six steps in the following order: acquisition, inspection and exploration, cleaning and preparation, modeling, evaluation, and finally deployment. There is often a need to circle back to prior steps, such as when inspecting and preparing the data or when evaluating and modeling, but at a high level the process can be described in terms of these six steps.
Let's now discuss each step in detail.
Data for machine learning applications can come from any number of sources; it may be e-mailed as a CSV file, it may come from pulling down server logs, or it may require building a custom web scraper. The data may also come in any number of formats. In most cases, it will be text-based data, but as we'll see, machine learning applications may just as easily be built utilizing images or even video files. Regardless of the format, once the data is secured, it is crucial to understand what's in the data—as well as what isn't.
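For a concrete, if minimal, sketch of the acquisition step, the snippet below pulls a CSV file into a pandas DataFrame and takes a first look at what was received. The file name listings.csv and its columns are hypothetical placeholders for whatever source your own project uses, whether that's an e-mailed file, a log export, or scraper output.

```python
import pandas as pd

# Hypothetical CSV of apartment listings; replace the path with whatever
# source your project actually uses (e-mailed file, log export, scraper output).
listings = pd.read_csv('listings.csv')

# A quick look at the first few rows and the overall shape tells us
# what we actually received.
print(listings.head())
print(listings.shape)
```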
Once the data has been acquired, the next step is to inspect and explore it. At this stage, the primary goal is to sanity-check the data, and the best way to accomplish this is to look for things that are either impossible or highly unlikely. As an example, if the data has a unique identifier, check to see that there is indeed only one; if the data is price-based, check whether it is always positive; and whatever the data type, check the most extreme cases. Do they make sense? A good practice is to run some simple statistical tests on the data and visualize it. Additionally, it is likely that some data is missing or incomplete. It is critical to take note of this during this stage, as it will need to be addressed later during the cleaning and preparation stage. Models are only as good as the data that goes into them, so it is crucial to get this step right.
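These sanity checks can be expressed in a few lines of pandas. In the sketch below, the listings.csv file and the listing_id and price columns are hypothetical; substitute the identifier and value columns from your own data.

```python
import pandas as pd

# Continuing with the hypothetical listings data from the previous sketch.
listings = pd.read_csv('listings.csv')

# Summary statistics surface impossible or extreme values at a glance.
print(listings.describe())

# If there is a supposed unique identifier, verify that it really is unique.
print(listings['listing_id'].is_unique)

# Price-based data should normally be positive; count any violations.
print((listings['price'] <= 0).sum())

# Missing or incomplete values need to be noted now and addressed
# later, during the cleaning and preparation stage.
print(listings.isnull().sum())
```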
When all the data is in order, the next step is to place it in a format that is amenable to modeling. This stage encompasses a number of processes such as filtering, aggregating, imputing, and transforming. The type of actions that are necessary will be highly dependent on the type of data as well as the type of library and algorithm utilized. For example, with natural-language-based text, the transformations required will be very different from those required for time series data. We'll see a number of examples of these types of transformations throughout the book.
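As a rough illustration of these operations, the sketch below filters, imputes, transforms, and aggregates the hypothetical listings data; the column names are illustrative only, and real projects will need transformations specific to their own data.

```python
import pandas as pd

# Hypothetical listings data again; the column names are illustrative.
listings = pd.read_csv('listings.csv')

# Filtering: drop rows with clearly impossible prices.
listings = listings[listings['price'] > 0]

# Imputing: fill missing square footage with the median value.
listings['sqft'] = listings['sqft'].fillna(listings['sqft'].median())

# Transforming: derive a feature that is more useful for modeling.
listings['price_per_sqft'] = listings['price'] / listings['sqft']

# Aggregating: a per-neighborhood summary for a quick sanity check.
print(listings.groupby('neighborhood')['price_per_sqft'].mean())
```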
Once the data preparation is complete, the next phase is modeling. In this phase, an appropriate algorithm is selected and a model is trained on the data. There are a number of best practices to adhere to during this stage, and we will discuss them in detail, but the basic steps involve splitting the data into training, testing, and validation sets. This splitting up of the data may seem counterintuitive, especially since more data typically yields better models, but as we'll see, doing so gives us better feedback on how the model will perform in the real world and prevents us from committing the cardinal sin of modeling: overfitting.
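A minimal sketch of this split-then-train pattern, using scikit-learn on the hypothetical listings data, might look like the following; the feature and target columns are illustrative, and a validation set or cross-validation would be added in practice.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Continuing with the hypothetical listings data from the earlier sketches.
listings = pd.read_csv('listings.csv')

# Illustrative feature columns and target; use whatever your data provides.
X = listings[['sqft', 'bedrooms']]
y = listings['price']

# Hold out a portion of the data so the model can be judged on unseen examples.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a simple model on the training portion only.
model = LinearRegression()
model.fit(X_train, y_train)
```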
Once the model is built and making predictions, the next step is to understand how well it does that. This is the question that evaluation seeks to answer. There are a number of ways to measure the performance of a model, and again this is largely dependent on the type of data and the model used, but on the whole, we are seeking to answer the question of how close the model's predictions are to the actual values. There is an array of confusing-sounding terms, such as root mean-square error, Euclidean distance, and F1 score, but in the end, they are all just measures of the distance between the actual values and the estimated predictions.
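Continuing the sketch from the modeling step, computing one such measure, the root mean-square error, takes only a few lines with scikit-learn:

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Using the model and held-out test set from the previous sketch.
predictions = model.predict(X_test)

# Root mean-square error: a single number summarizing how far, on average,
# the predictions fall from the actual values.
rmse = np.sqrt(mean_squared_error(y_test, predictions))
print('RMSE: {:.2f}'.format(rmse))
```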
Once the model's performance is satisfactory, the next step is deployment. This can take a number of forms depending on the use case, but common scenarios include utilization as a feature within another larger application, a bespoke web application, or even just a simple cron job.
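As one possible sketch of the web-application scenario, the snippet below exposes a previously saved model behind a small Flask endpoint. The file name, route, and feature names are assumptions for illustration, not part of any project in this book.

```python
import joblib
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical file produced by saving the trained model after the modeling
# step, for example with joblib.dump(model, 'apartment_model.pkl').
model = joblib.load('apartment_model.pkl')

@app.route('/predict', methods=['POST'])
def predict():
    # Expect a JSON payload containing the same features the model was trained on.
    payload = request.get_json()
    prediction = model.predict([[payload['sqft'], payload['bedrooms']]])
    return jsonify({'predicted_price': float(prediction[0])})

if __name__ == '__main__':
    app.run()
```

A client could then POST a JSON document such as {"sqft": 750, "bedrooms": 2} to /predict and receive a predicted price in response; a cron-based deployment would instead run the prediction script on a schedule and write its output wherever it is needed.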
