Machine Learning for the Web E-Book

Andrea Isoni

Publisher: Packt Publishing
Category: Science and New Technologies
Language: English

Description

Explore the web and make smarter predictions using Python

About This Book

Targets two big and prominent markets where sophisticated web apps are of need and importance.
Practical examples of building machine learning web applications, which are easy to follow and replicate.
A comprehensive tutorial on Python libraries and frameworks to get you up and started.

Who This Book Is For

The book is aimed at upcoming and new data scientists who have little experience with machine learning or users who are interested in and are working on developing smart (predictive) web applications. Knowledge of Django would be beneficial. The reader is expected to have a background in Python programming and good knowledge of statistics.

What You Will Learn

Get familiar with the fundamental concepts and some of the jargon used in the machine learning community
Use tools and techniques to mine data from websites
Grasp the core concepts of Django framework
Get to know the most useful clustering and classification techniques and implement them in Python
Acquire all the necessary knowledge to build a web application with Django
Successfully build and deploy a movie recommendation system application using the Django framework in Python

In Detail

Python is a general-purpose programming language that is also comparatively easy to learn. Hence, it is the language of choice for data scientists to prototype, visualize, and run data analyses on small and medium-sized data sets. This is a unique book that helps bridge the gap between machine learning and web development. It focuses on the difficulties of implementing predictive analytics in web applications. We focus on the Python language, frameworks, tools, and libraries, showing you how to build a machine learning system. You will explore the core machine learning concepts and then develop and deploy the data into a web application using the Django framework. You will also learn to carry out web, document, and server mining tasks, and build recommendation engines. Later, you will explore Python's impressive Django framework and will find out how to build a simple, modern web app with machine learning features.

Style and approach

Instead of being overwhelmed with multiple concepts at once, this book provides a step-by-step approach that will guide you through one topic at a time.

An intuitive step-by-step guide that will focus on one key topic at a time. Building upon the acquired knowledge in each chapter, we will connect the fundamental theory and practical tips by illustrative visualizations and hands-on code examples.

Details

Page count: 277

Publication year: 2016

Excerpt

Machine Learning for the Web

Credits

Foreword

About the Author

About the Reviewers

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Introduction to Practical Machine Learning Using Python

General machine-learning concepts

Machine-learning example

Installing and importing a module (library)

Preparing, manipulating and visualizing data – NumPy, pandas and matplotlib tutorials

Using NumPy

Arrays creation

Array manipulations

Array operations

Linear algebra operations

Statistics and mathematical functions

Understanding the pandas module

Exploring data

Manipulate data

Matplotlib tutorial

Scientific libraries used in the book

When to use machine learning

Summary

2. Unsupervised Machine Learning

Clustering algorithms

Distribution methods

Expectation maximization

Mixture of Gaussians

Centroid methods

k-means

Density methods

Mean – shift

Hierarchical methods

Training and comparison of the clustering methods

Dimensionality reduction

Principal Component Analysis (PCA)

PCA example

Singular value decomposition

Summary

3. Supervised Machine Learning

Model error estimation

Generalized linear models

Linear regression

Ridge regression

Lasso regression

Logistic regression

Probabilistic interpretation of generalized linear models

k-nearest neighbours (KNN)

Naive Bayes

Multinomial Naive Bayes

Gaussian Naive Bayes

Decision trees

Support vector machine

Kernel trick

A comparison of methods

Regression problem

Classification problem

Hidden Markov model

A Python example

Summary

4. Web Mining Techniques

Web structure mining

Web crawlers (or spiders)

Indexer

Ranking – PageRank algorithm

Web content mining

Parsing

Natural language processing

Information retrieval models

TF-IDF

Latent Semantic Analysis (LSA)

Doc2Vec (word2vec)

Word2vec – continuous bag of words and skip-gram architectures

Mathematical description of the CBOW model

Doc2Vec extension

Movie review query example

Postprocessing information

Latent Dirichlet allocation

Model

Example

Opinion mining (sentiment analysis)

Summary

5. Recommendation Systems

Utility matrix

Similarities measures

Collaborative Filtering methods

Memory-based Collaborative Filtering

User-based Collaborative Filtering

Item-based Collaborative Filtering

Simplest item-based Collaborative Filtering – slope one

Model-based Collaborative Filtering

Alternating least squares (ALS)

Stochastic gradient descent (SGD)

Non-negative matrix factorization (NMF)

Singular value decomposition (SVD)

CBF methods

Item features average method

Regularized linear regression method

Association rules for learning recommendation system

Log-likelihood ratios recommendation system method

Hybrid recommendation systems

Evaluation of the recommendation systems

Root mean square error (RMSE) evaluation

Classification metrics

Summary

6. Getting Started with Django

HTTP – the basics of the GET and POST methods

Installation and server creation

Settings

Writing an app – most important features

Models

URL and views behind HTML web pages

HTML pages

URL declarations and views

Admin

Shell interface

Commands

RESTful application programming interfaces (APIs)

Summary

7. Movie Recommendation System Web Application

Application setup

Models

Commands

User sign up login/logout implementation

Information retrieval system (movies query)

Rating system

Recommendation systems

Admin interface and API

Summary

8. Sentiment Analyser Application for Movie Reviews

Application usage overview

Search engine choice and the application code

Scrapy setup and the application code

Scrapy settings

Scraper

Pipelines

Crawler

Django models

Integrating Django with Scrapy

Commands (sentiment analysis model and delete queries)

Sentiment analysis model loader

Deleting an already performed query

Sentiment reviews analyser – Django views and HTML

PageRank: Django view and the algorithm code

Admin and API

Summary

Index

Machine Learning for the Web

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: July 2016

Production reference: 1250716

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-660-7

www.packtpub.com

Credits

Author

Andrea Isoni

Reviewers

Chetan Khatri

Pavan Kumar Kolluru

Dipanjan Sarkar

Commissioning Editor

Akram Hussain

Acquisition Editor

Sonali Vernekar

Content Development Editor

Arun Nadar

Technical Editor

Sushant S Nadkar

Copy Editor

Vikrant Phadkay

Project Coordinator

Ritika Manoj

Proofreader

Safis Editing

Indexer

Mariammal Chettiyar

Graphics

Disha Haria

Kirk D'Penha

Abhinash Sahu

Production Coordinator

Arvindkumar Gupta

Cover Work

Arvindkumar Gupta

Foreword

What is machine learning? In the past year, whether it was during a conference, a seminar or an interview, a lot of people have asked me to define machine learning. There is a lot of curiosity around what it is, as human nature requires us to define something before we begin to understand what its potential impact on our lives may be, what this new thing may mean for us in the future.

Similar to other disciplines that become suddenly popular, machine learning is not new. A lot of people in the scientific community have been working with algorithms to automate repetitive activities for several years now. An algorithm whose parameters are fixed is called a static algorithm; its output is predictable and a function only of the input variables. On the other hand, when the parameters of the algorithm are dynamic and a function of external factors (most frequently, the previous outputs of the same algorithm), it is called dynamic. Its output is no longer a function only of the input variables, and that is the founding pillar of machine learning: a set of instructions that can learn from the data generated during previous iterations to produce a better output the next time.
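
The distinction can be sketched in a few lines of Python. This is only a minimal illustration and not code from the book: the names static_threshold and DynamicThreshold, the starting threshold of 10.0, and the update rate of 0.5 are all assumptions chosen for the example. For simplicity, the dynamic version updates its parameter from the values it has already seen rather than from its own previous outputs.

```python
def static_threshold(value, threshold=10.0):
    """Static algorithm: the parameter is fixed, so the output
    depends only on the input value."""
    return value > threshold


class DynamicThreshold(object):
    """Dynamic algorithm: the parameter is updated after every call,
    so the output also depends on the data seen so far."""

    def __init__(self, threshold=10.0, rate=0.5):
        self.threshold = threshold
        self.rate = rate  # hypothetical update rate, chosen for illustration

    def __call__(self, value):
        result = value > self.threshold
        # Move the threshold toward the latest value, so that earlier
        # inputs influence later outputs.
        self.threshold += self.rate * (value - self.threshold)
        return result


inputs = [20.0, 12.0, 12.0]
dynamic = DynamicThreshold()

static_outputs = [static_threshold(v) for v in inputs]
dynamic_outputs = [dynamic(v) for v in inputs]

print(static_outputs)   # [True, True, True]
print(dynamic_outputs)  # [True, False, False]
```

The static function gives the same answer for 12.0 every time, while the dynamic one changes its answer after the large first value raises its threshold.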

Scientists, developers, and engineers have been dealing with fuzzy logic, neural networks, and other kinds of machine learning techniques for years, but it is only now that this discipline is becoming popular, as its applications have left the lab and are now used in marketing, sales, and finance—basically, every activity that requires the repetition of the same operation over and over again could benefit from machine learning.

The implications are easy to grasp and will have a deep impact on our society. The best way I can think of to describe what will likely happen in the next 5 to 10 years with machine learning is to recall what happened during the industrial revolution. Before the advent of the steam engine, lots of people were performing highly repetitive physical tasks, often risking their lives or their health for minimum wages; thanks to the industrial revolution, society evolved and machines took over the relevant parts of manufacturing processes, leading to improved yields, more predictable and stable outputs, improved product quality, and new kinds of jobs controlling the machines that were replacing physical labor. This was the first time in the history of mankind that man had delegated the responsibility for the creation of something else to a thing we had designed and invented. In the same way, machine learning will change the way data operations are performed, reducing the need for human intervention and leaving optimization to machines and algorithms. Operators will no longer have direct control over data, but they will control algorithms that, in turn, will control data. This will allow faster execution of operations, larger datasets will be manageable by fewer people, errors will be reduced, and more stable and predictable outcomes will be guaranteed. Like many things that have a deep impact on our society, it is as easy to love as it is to hate. Lovers will praise the benefits that machine learning will bring to their lives; haters will criticize the fact that, in order to be effective, machine learning needs lots of iterations and, hence, lots of data. Usually, the data we feed algorithms with is our own personal information.

In fact, the main applications where machine learning is taking off as a tool to improve productivity are marketing and customer support, where a deep knowledge of the customer is required to give him/her the personal service that will make the difference between a purchase and a visit or between a happy and an unhappy customer.

In marketing, for example, marketers are starting to take into consideration information, such as location, device, past purchases, what websites one has visited, weather conditions, to name just a few of the parameters that determine whether a company would decide to display its ads to a specific set of customers.

Long gone are the days of broadcasting marketing messages through untraceable media, such as TV or newspapers. Today's marketers want to know everything about who clicks and buys their products so that they can optimize creatives, spend, and allocate budget to make the best possible use of the resources at their disposal. This leads to unprecedented levels of personalization that, when exploited properly, make customers feel valued as individuals and not part of a socio-demographic group.

It is intriguing and challenging at the same time, but there is no doubt that the winners of the next decade will be those companies or individuals who can understand unstructured data and make decisions based on them in a scalable way: I see no other way than machine learning to achieve such a feat.

Andrea Isoni's book is a step into this world; reading it will be like a peek down the rabbit hole, where you'll be able to see a few applications of these techniques, mostly applied to web development, where machine learning serves to create customized websites and allow customers to see their own, optimized version of a service.

If you want to futureproof your career, this is a must read; anyone dealing with data in the next decade will need to be proficient in these techniques to succeed.

Davide Cervellin, @ingdave

Head of EU Analytics at eBay

About the Author

Andrea Isoni is a data scientist, PhD, and professional physicist with extensive experience in software development positions. He has an extensive knowledge of machine learning algorithms and techniques. He also has experience with multiple languages and technologies, such as Python, C/C++, Java, JavaScript, C#, SQL, HTML, and Hadoop.

About the Reviewers

Chetan Khatri is a data science researcher who has a total of 4.6 years of experience in research and development. He works as a principal engineer, data and machine learning, at Nazara Technologies Pvt. Ltd, where he leads data science practice in the gaming business and the subscription telecom business. He has worked with a leading data company and a Big 4 company, where he managed the Data Science Practice Platform and one of the Big 4 company's resources team. Previously, he worked with R & D Lab and Eccella Corporation. He completed his master's degree in computer science with a minor in data science at KSKV Kachchh University as a gold medalist.

He contributes to society in various ways, including giving talks to sophomore students at universities and giving talks on the various fields of data science in academia and at various conferences, thus helping the community by providing a data science platform. He has excellent correlative knowledge of both academic research and industry best practices. He loves to participate in Data Science Hackathons. He is one of the founding members of PyKutch—A Python Community. Currently, he is exploring deep neural networks and reinforcement learning with parallel and distributed computing for government data.

I would like to thank Prof. Devji Chhanga, Head of the Computer Science Department, University of Kachchh, for routing me to the correct path and for his valuable guidance in the field of data science research. I would also like to thank my beloved family.

Pavan Kumar Kolluru is an interdisciplinary engineer with expertise in Big Data; digital images and processing; remote sensing (hyperspectral data and images); and programming in Python, R, and MATLAB. His major emphasis is on Big Data, using machine learning techniques, and its algorithm development.

His quest is to find a link between different disciplines so as to make their processes computationally easier and automatic.

As a data (image and signal) processing professional/trainer, he has worked on multi/hyper spectral data, which gave him expertise in processing, information extraction, and segmentation with advanced processing using OOA, random sets, and Markov random fields.

As a programmer/trainer, he concentrates on Python and R languages, serving both the corporate and educational fraternities. He also trained various batches in Python and packages (signal, image, data analytics, and so on).

As a machine learning researcher/trainer, he has expertise in classifications (Sup and Unsup), modeling and data understanding, regressions, and data dimensionality reduction (DR). This led him to develop a novel machine learning algorithm on Big Data (images or signals) that performs DR and classifications in a single framework in his M.Sc. research, fetching distinction marks for it. He trained engineers from various corporate giants on Big Data analysis using Hadoop and MapReduce. His expertise in Big Data analysis is in HDFS, Pig, Hive, and Spark.

Dipanjan Sarkar is a Data Scientist at Intel, the world's largest silicon company, which is on a mission to make the world more connected and productive. He primarily works on analytics, business intelligence, application development, and building large-scale intelligent systems. He received his Master's degree in Information Technology from the International Institute of Information Technology, Bangalore. His areas of specialization include software engineering, data science, machine learning, and text analytics.

Dipanjan's interests include learning about new technology, disruptive start-ups, data science, and more recently deep learning. In his spare time he loves reading, writing, gaming, and watching popular sitcoms. He has authored a book on machine learning titled R Machine Learning by Example (Packt Publishing) and has also acted as a technical reviewer for several books on machine learning and data science from Packt Publishing.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Preface

Data science, and machine learning in particular, are emerging as leading topics in the commercial tech environment as a way to evaluate the ever-increasing amount of data generated by users. This book will explain how to use Python to develop a commercial web application using Django and how to employ some specific libraries (sklearn, scipy, nltk, Django, and some others) to manipulate and analyze (through machine learning techniques) data that is generated or used in the application.

What this book covers

Chapter 1, Introduction to Practical Machine Learning Using Python, discusses the main machine learning concepts together with the libraries used by data science professionals to handle the data in Python.

Chapter 2, Machine Learning Techniques – Unsupervised Learning, describes the algorithms used to cluster datasets and to extract the main features from the data.

Chapter 3, Supervised Machine Learning, presents the most relevant supervised algorithms to predict the labels of a dataset.

Chapter 4, Web Mining Techniques, discusses the main techniques to organize, analyze, and extract information from web data.

Chapter 5, Recommendation Systems, covers the most popular recommendation systems used in a commercial environment to date in detail.

Chapter 6, Getting Started with Django, introduces the main Django features and characteristics to develop a web application.

Chapter 7, Movie Recommendation System Web Application, describes an example to put in practice the machine learning concepts developed in Chapter 5, Recommendation Systems and Chapter 6, Getting Started with Django, recommending movies to final web users.

Chapter 8, Sentiment Analyser Application on Movie Reviews, covers another example to use the knowledge explained in Chapter 3, Supervised Machine Learning, Chapter 4, Web Mining Techniques, and Chapter 6, Getting Started with Django, analyzing the sentiment of the movies' reviews online and their importance.

What you need for this book

The reader should have a computer with Python 2.7 installed to be able to run (and modify) the code discussed throughout the chapters.
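
Before starting, it may help to confirm that the interpreter and the libraries named in the preface are available. The following is a minimal sketch, not part of the book's code bundle: the function name check_environment and the list REQUIRED_MODULES are assumptions introduced here, and the list only contains the import names of the libraries mentioned above. It is written to run on Python 2.7 as well as on later versions.

```python
import importlib
import sys

# Import names of the libraries listed in the preface (assumed list,
# extend it as the chapters introduce further packages).
REQUIRED_MODULES = ["sklearn", "scipy", "nltk", "django"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python version: %d.%d" % sys.version_info[:2])
    for name, found in sorted(check_environment().items()):
        print("%-8s %s" % (name, "ok" if found else "missing"))
```

Any module reported as missing can then be installed with the package manager of your choice before running the examples.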

Who this book is for

Any person with some programming (in Python) and statistics background who is curious about machine learning and/or pursuing a career in data science will benefit from reading this book.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.

You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-the-Web. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/Machine-Learning-for-the-Web.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <[email protected]> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.