Python Machine Learning - Wei-Meng Lee - E-Book

Description

Python makes machine learning easy for beginners and experienced developers

With computing power increasing exponentially and costs decreasing at the same time, there is no better time to learn machine learning using Python. Machine learning tasks that once required enormous processing power are now possible on desktop machines. However, machine learning is not for the faint of heart—it requires a good foundation in statistics, as well as programming knowledge. Python Machine Learning will help coders of all levels master one of the most in-demand programming skillsets in use today.

Readers get started with fundamental topics such as an introduction to machine learning and data science. For each learning algorithm, a real-life scenario shows how Python is used to solve the problem at hand. Topics include:

  • Python data science: manipulating and visualizing data
  • Data cleansing
  • Understanding machine learning algorithms
  • Supervised learning algorithms
  • Unsupervised learning algorithms
  • Deploying machine learning models

Python Machine Learning is essential reading for students, developers, or anyone with a keen interest in taking their coding skills to the next level.


Page count: 299

Year of publication: 2019

Python® Machine Learning

Wei-Meng Lee

Introduction

This book covers machine learning, one of the hottest topics of recent years. With computing power increasing exponentially and prices falling at the same time, there is no better time for machine learning. Tasks that once required huge processing power are now possible on desktop machines. Nevertheless, machine learning is not for the faint of heart: it requires a good foundation in statistics as well as programming knowledge. Most books on the market are either too superficial or go into so much depth that they leave beginning readers gasping for air.

This book will take a gentle approach to this topic. First, it will cover some of the fundamental libraries used in Python that make machine learning possible. In particular, you will learn how to manipulate arrays of numbers using the NumPy library, followed by using the Pandas library to deal with tabular data. Once that is done, you will learn how to visualize data using the matplotlib library, which allows you to plot different types of charts and graphs so that you can visualize your data easily.

Once you have a firm foundation in the basics, I will discuss machine learning using Python and the Scikit‐learn library. This will give you a solid understanding of how the various machine learning algorithms work behind the scenes.

For this book, I will cover the common machine learning algorithms, such as regression, clustering, and classification.

This book also contains a chapter where you will learn how to perform machine learning using Microsoft Azure Machine Learning Studio, which allows developers to start building machine learning models using drag‐and‐drop, without needing to code and, most importantly, without requiring a deep knowledge of machine learning.

Finally, I will discuss how you can deploy the models that you have built, so that they can be used by client applications running on mobile and desktop devices.

My main intention is to make this book accessible to as many developers as possible. To get the most out of it, you should have some basic knowledge of Python programming and a foundational understanding of basic statistics. Just as you will never learn to swim by reading a book, you will not learn machine learning by reading alone, so I strongly suggest that you try out the sample code as you go through the chapters. Go ahead and modify the code and see how the output varies; you will often be surprised by what you can do.

All the sample code in this book is available as Jupyter Notebooks, which you can download from Wiley’s support page for this book (www.wiley.com/go/leepythonmachinelearning) and try out immediately.

Without further delay, welcome to Python Machine Learning!

CHAPTER 1: Introduction to Machine Learning

Welcome to Python Machine Learning! The fact that you are reading this book is a clear indication of your interest in this exciting topic.

This book covers machine learning, one of the hottest programming topics of recent years. Machine learning (ML) is a collection of algorithms and techniques used to design systems that learn from data. These systems are then able to make predictions or deduce patterns from the supplied data.

With computing power increasing exponentially and prices falling at the same time, there is no better time for machine learning. Machine learning tasks that once required huge processing power are now possible on desktop machines. Nevertheless, machine learning is not for the faint of heart: it requires a good foundation in mathematics and statistics, as well as programming knowledge. The majority of the books on the market on machine learning go into too much detail, which often leaves beginning readers gasping for air. Most discussions of machine learning revolve heavily around statistical theories and algorithms, so unless you are a mathematician or a PhD candidate, you will likely find them difficult to digest. What most people want, developers in particular, is a foundational understanding of how machine learning works and, most importantly, how to apply it in their applications. It is with this motive in mind that I set out to write this book.

This book will take a gentle approach to machine learning. I will attempt to do the following:

  • Cover the libraries in Python that lay the foundation for machine learning, namely NumPy, Pandas, and matplotlib.
  • Discuss machine learning using Python and the Scikit‐learn library. Where possible, I will manually implement the relevant machine learning algorithm using Python. This will allow you to understand how the various machine learning algorithms work behind the scenes. Once this is done, I will show how to use the Scikit‐learn library, which makes it really easy to integrate machine learning into your own apps.
  • Cover the common machine learning algorithms: regression, clustering, and classification.

TIP

It is not the intention of this book to go into a deep discussion of machine learning algorithms. Although there are chapters that discuss some of the mathematical concepts behind the algorithms, it is my intention to make the subject easy to understand and hopefully motivate you to learn further.

Machine learning is indeed a very complex topic. But instead of discussing the complex mathematical theories behind it, I will cover it using easy‐to‐understand examples and walk you through numerous code samples. This code‐intensive book encourages readers to try out the numerous examples in the various chapters, which are designed to be independent, compact, and easy to follow and understand.

What Is Machine Learning?

If you have ever written a program, you will be familiar with the diagram shown in Figure 1.1. You write a program, feed some data into it, and get your output. For example, you might write a program to perform some accounting tasks for your business. In this case, the data collected would include your sales records, your inventory lists, and so on. The program would take in the data and calculate your profit or loss based on your sales records. It might also produce some charts showing your sales performance. In this case, the output is the profit/loss statement, together with the charts.

Figure 1.1: In traditional programming, the data and the program produce the output

For many years, traditional desktop and web programming dominated the landscape, and many algorithms and methodologies evolved to make programs run more efficiently. In recent years, however, machine learning has taken over the programming world. Machine learning has transformed the paradigm in Figure 1.1 into a new paradigm, which is shown in Figure 1.2. Instead of feeding the data to the program, you now use the data and the output that you have collected to derive your program (also known as the model). Using the same accounting example, with the machine learning paradigm you would take the detailed sales records (which together serve as both the data and the output) and use them to derive a set of rules for making predictions. You may use this model to predict the most popular items that will sell next year, or which items will be less popular going forward.

Figure 1.2: In machine learning, the data and the output produce the program
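To make the contrast between the two paradigms concrete, here is a minimal sketch in plain Python. The unit price, the sales figures, and the function names are hypothetical, and the fitting step (least squares through the origin) is chosen only because it is short; it is not a method taken from the book.

```python
# Traditional programming: the rule is written by hand.
def total_sales(quantity, unit_price=2.5):
    """Program + data -> output."""
    return quantity * unit_price


# Machine learning: the rule is derived from data and known outputs.
def learn_unit_price(quantities, totals):
    """Data + output -> program (model).

    Fits totals ~ price * quantity by least squares through the origin.
    """
    numerator = sum(q * t for q, t in zip(quantities, totals))
    denominator = sum(q * q for q in quantities)
    return numerator / denominator


# Hypothetical sales records: quantities sold and the totals observed.
quantities = [1, 2, 3, 4]
totals = [2.5, 5.0, 7.5, 10.0]

learned_price = learn_unit_price(quantities, totals)


def model(quantity):
    """The derived 'program': predicts a total for a new quantity."""
    return quantity * learned_price


print(learned_price)
print(model(10))
```

In the first function the price is known in advance; in the second it is recovered from past records, which is the sense in which the data and the output "produce the program."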

TIP

Machine learning is about finding patterns in data.

What Problems Will Machine Learning Be Solving in This Book?

So, what exactly is machine learning? Machine learning (ML) is a collection of algorithms and techniques used to design systems that learn from data. ML algorithms have a strong mathematical and statistical basis, but they do not take into account domain knowledge. ML draws on the following disciplines:

  • Scientific computing
  • Mathematics
  • Statistics

A good application of machine learning is trying to determine if a particular credit card transaction is fraudulent. Given past transaction records, the data scientist's job is to clean up and transform the data based on domain knowledge so that the right ML algorithm can be applied to solve the problem (in this case, to determine if a transaction is fraudulent). A data scientist needs to know which method of machine learning will best help in completing this task and how to apply it. The data scientist does not necessarily need to know how that method works, although knowing this will always help in building a more accurate learning model.

In this book, there are three main types of problems that we want to solve using machine learning. These problem types are as follows:

  • Classification: Is this A or B?
  • Regression: How much or how many?
  • Clustering: How is this organized?

Classification

In machine learning, classification is identifying which of a set of categories a new observation belongs to, based on a set of training data containing observations whose categories are known. Here are some examples of classification problems:

  • Predicting the winner of the 2020 U.S. presidential election
  • Predicting if a tumor is cancerous
  • Classifying the different types of flowers

A classification problem with two classes is known as a two‐class classification problem. Those with more than two classes are known as multi‐class classification problems.

The outcome of a classification problem is a discrete value indicating the predicted class in which an observation lies. The outcome of a classification problem can also be a continuous value, indicating the likelihood of an observation belonging to a particular class. For example, candidate A is predicted to win the election with a probability of 0.65 (or 65 percent). Here, 0.65 is the continuous value indicating the confidence of the prediction, and it can be converted to a class value (“win” in this case) by selecting the prediction with the highest probability.
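The conversion from per-class probabilities to a single class value can be sketched in a few lines of plain Python. The probability values and labels below are hypothetical stand-ins for the output of a trained model.

```python
def predict_class(probabilities):
    """Return the class label with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical model output for a two-class problem (values sum to 1).
election = {"candidate A wins": 0.65, "candidate B wins": 0.35}
print(predict_class(election))

# The same rule applies to a multi-class problem.
flower = {"setosa": 0.10, "versicolor": 0.70, "virginica": 0.20}
print(predict_class(flower))
```

The probabilities are the continuous outcome; the label returned by `predict_class` is the discrete one.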

Chapters 7 through 9 discuss classification in more detail.

Regression

Regression helps in forecasting the future by estimating the relationship between variables. Unlike classification (which predicts the class to which an observation belongs), regression returns a continuous output variable. Here are some examples of regression problems:

  • Predicting the sales figure for a particular item for the next quarter
  • Predicting the temperatures for next week
  • Predicting the lifespan of a particular model of tire

Chapters 5 and 6 discuss regression in more detail.

Clustering

Clustering helps in grouping similar data points into intuitive groups. Given a set of data, clustering helps you discover how they are organized by grouping them into natural clumps.

Examples of clustering problems are as follows:

  • Which viewers like the same genre of movies
  • Which models of hard drives fail in the same way

Clustering is very useful when you want to discover a specific pattern in the data. Chapter 10 will discuss clustering in more detail.

Types of Machine Learning Algorithms

Machine learning algorithms fall into two broad categories:

  • Supervised learning algorithms are trained with labeled data, that is, data composed of examples of the desired answers. For instance, a model that identifies fraudulent credit card use would be trained from a dataset with labeled data points of known fraudulent and valid charges. Most machine learning is supervised.
  • Unsupervised learning algorithms are used on data with no labels, and the goal is to find relationships in the data. For instance, you might want to find groupings of customer demographics with similar buying habits.

Supervised Learning

In supervised learning, a labeled dataset is used. A labeled dataset is one in which each record has been tagged with a label that gives the data an informative meaning. A model trained on the labeled data can then predict the label of new, unlabeled data. For example, a dataset may contain a series of records with the following fields, which record the size of various houses and the prices for which they were sold:

House Size, Price Sold

In this very simple example, Price Sold is the label. When plotted on a chart (see Figure 1.3), this dataset can help you predict the price of a house that is yet to be sold. Predicting a price for the house is a regression problem.

Figure 1.3: Using regression to predict the expected selling price of a house
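As an illustration of the house-price example, the sketch below fits a straight line by ordinary least squares in plain Python and uses it to predict a price. The house sizes, prices, and units are made up for the example, and a straight line is only one possible model.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical records: house size (square feet) and price sold (the label).
sizes = [1000, 1500, 2000, 2500]
prices = [200_000, 290_000, 410_000, 500_000]

slope, intercept = fit_line(sizes, prices)

# Predict the price of a house that has not been sold yet.
new_size = 1800
predicted_price = slope * new_size + intercept
print(round(predicted_price))
```

The predicted value is a continuous number rather than a category, which is what makes this a regression problem.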

Using another example, suppose that you have a dataset containing the following:

Tumor Size, Age, Malignant

The Malignant field is a label indicating if a tumor is cancerous. When you plot the dataset on a chart (see Figure 1.4), you will be able to classify it into two distinct groups, with one group containing the cancerous tumors and the other containing the benign tumors. Using this grouping, you can now predict if a new tumor is cancerous or not. This type of problem is known as a classification problem.

Figure 1.4: Using classification to categorize data into distinct classes
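The tumor example can be sketched with a hand-written k-nearest neighbors classifier. This technique is used here only because it is short to write in plain Python; the records, units, and the choice of k are hypothetical, and the features are left unscaled for simplicity. `math.dist` requires Python 3.8 or later.

```python
from collections import Counter
from math import dist


def knn_predict(samples, labels, query, k=3):
    """Classify `query` by majority vote among its k nearest samples."""
    nearest = sorted(range(len(samples)),
                     key=lambda i: dist(samples[i], query))[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical labeled records: (tumor size in mm, age in years).
samples = [(5, 30), (7, 35), (6, 42), (25, 60), (30, 65), (28, 58)]
# The label: True means malignant, False means benign.
labels = [False, False, False, True, True, True]

print(knn_predict(samples, labels, (8, 38)))    # close to the benign group
print(knn_predict(samples, labels, (27, 62)))   # close to the malignant group
```

A new tumor is assigned the label that is most common among the known tumors closest to it on the chart.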

TIP

Chapter 6 through Chapter 9 will discuss supervised learning algorithms in more detail.

Unsupervised Learning

In unsupervised learning, the dataset used is not labeled. An easy way to visualize unlabeled data is to consider the dataset containing the waist size and leg length of a group of people:

Waist Size, Leg Length

Using unsupervised learning, your job is to try to find a pattern in the dataset. You may plot the dataset in a chart, as shown in Figure 1.5.

Figure 1.5: Plotting the unlabeled data

You can then use a clustering algorithm to find patterns in the dataset. The end result might be the discovery of three distinct clusters in the data, as shown in Figure 1.6.

Figure 1.6: Clustering the points into distinct groups
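One way to find such groups is k-means clustering. The following is a bare-bones sketch in plain Python, not a production implementation: the measurements are invented, the number of clusters is assumed to be known, and the deterministic starting centroids are a simplification made so that the result is repeatable. `math.dist` requires Python 3.8 or later.

```python
from math import dist


def k_means(points, k, iterations=10):
    """A bare-bones k-means: returns (centroids, cluster index per point)."""
    # Deterministic start: the first point, then repeatedly the point
    # farthest from the centroids chosen so far (a simplifying assumption).
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )

    assignments = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda j: dist(p, centroids[j])) for p in points
        ]
        # Update step: move each centroid to the mean of its points.
        for j in range(k):
            members = [p for p, a in zip(points, assignments) if a == j]
            if members:
                centroids[j] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Hypothetical unlabeled records: (waist size, leg length) in inches.
people = [(28, 30), (29, 31), (30, 30),
          (34, 34), (35, 35), (36, 34),
          (40, 30), (41, 31), (42, 30)]

centroids, assignments = k_means(people, k=3)
print(assignments)
```

No labels are supplied at any point; the cluster numbers are arbitrary identifiers that the algorithm assigns to the groups it finds.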

TIP

Chapter 10 will discuss unsupervised learning algorithms in more detail.

Getting the Tools

For this book, all of the examples are tested using Python 3 and the Scikit‐learn library, a Python library that implements various types of machine learning algorithms, such as classification, regression, clustering, decision trees, and more. Besides Scikit‐learn, you will also be using some complementary Python libraries: NumPy, Pandas, and matplotlib.

While you can install the Python interpreter and the other libraries individually on your computer, the most trouble‐free way to install all of these libraries is to install the Anaconda package. Anaconda is a free Python distribution that comes with all of the libraries that you need to create data science and machine learning projects.
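Whichever installation route you take, a quick check like the following reports which of the libraries named above are present. It is a sketch that assumes Python 3.8 or later (for `importlib.metadata`) and uses the distribution names under which these libraries are published on PyPI; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names for the libraries used in this book.
REQUIRED = ["numpy", "pandas", "matplotlib", "scikit-learn"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(REQUIRED).items():
    print(f"{name}: {version or 'not installed'}")
```

Run it with the same interpreter that you plan to use for the examples, since each Python installation has its own set of packages.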

Anaconda includes the following:

  • The core Python language
  • The various Python packages (libraries)
  • conda, Anaconda's own package manager for updating Anaconda and packages
  • Jupyter Notebook (formerly known as IPython Notebook), a web‐based editor for working with Python projects

With Anaconda, you have the flexibility to install different languages (R, JavaScript, Julia, and so on) to work in Jupyter Notebook.

Obtaining Anaconda

To download Anaconda, go to https://www.anaconda.com/download/. You will be able to download Anaconda for these operating systems (see Figure 1.7):

  • Windows
  • macOS
  • Linux

Figure 1.7: Downloading Anaconda for Python 3

Download the Python 3 version for the platform you are using.

NOTE

At the time of this writing, Python is in version 3.7.

TIP

For this book, we will be using Python 3. So be sure to download the correct version of Anaconda containing Python 3.

Installing Anaconda

Installing Anaconda is mostly uneventful. Double‐click the file that you have downloaded, and follow the instructions displayed on the screen. Note that Anaconda for Windows has the option to be installed only for the local user. This option does not require administrator rights, so it is very useful for users who are installing Anaconda on company‐issued computers, which are usually locked down with limited user privileges.

Once Anaconda is installed, you will want to launch Jupyter Notebook. Jupyter Notebook is an open source web application, which allows you to create and share documents that contain documentation, code, and more.

Running Jupyter Notebook for Mac

To launch Jupyter from macOS, launch Terminal and type the following command:

$ jupyter notebook

You will see the following:

$ jupyter notebook

[I 18:57:03.642 NotebookApp] JupyterLab extension loaded from /Users/weimenglee/anaconda3/lib/python3.7/site-packages/jupyterlab

[I 18:57:03.643 NotebookApp] JupyterLab application directory is /Users/weimenglee/anaconda3/share/jupyter/lab

[I 18:57:03.648 NotebookApp] Serving notebooks from local directory: /Users/weimenglee/Python Machine Learning

[I 18:57:03.648 NotebookApp] The Jupyter Notebook is running at:

[I 18:57:03.648 NotebookApp] http://localhost:8888/?token=3700cfe13b65982612c0e1975ce3a68107399b07f89b85fa

[I 18:57:03.648 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).

[C 18:57:03.649 NotebookApp]

Copy/paste this URL into your browser when you connect for the first time,

to login with a token:

http://localhost:8888/?token=3700cfe13b65982612c0e1975ce3a68107399b07f89b85fa

[I 18:57:04.133 NotebookApp] Accepting one-time-token-authenticated connection from ::1

Essentially, Jupyter Notebook starts a web server listening at port 8888. After a while, a web browser will launch (see Figure 1.8).

Figure 1.8: The Jupyter Notebook Home page
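If the browser does not open, one way to confirm that the server is up is to test whether anything accepts connections on that port. The sketch below uses only the standard library; the function name is hypothetical, and port 8888 is taken from the log above (Jupyter may pick a different port if 8888 is already in use, so check the URL it prints).

```python
import socket


def is_listening(host="localhost", port=8888, timeout=1.0):
    """Return True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


print(is_listening())
```

A result of `True` only shows that some server is listening on the port; it does not prove that the server is Jupyter Notebook.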

TIP

The Home page of Jupyter Notebook shows the content of the directory from where it is launched. Hence, it is always useful to change to the directory that contains your source code first, prior to launching Jupyter Notebook.

Running Jupyter Notebook for Windows

The best way to launch Jupyter Notebook in Windows is to launch it from the Anaconda Prompt. The Anaconda Prompt automatically runs the batch file located at C:\Anaconda3\Scripts\activate.bat with the following argument:

C:\Anaconda3\Scripts\activate.bat C:\Anaconda3

TIP