Explore the web and make smarter predictions using Python
The book is aimed at upcoming and new data scientists who have little experience with machine learning, or at users who are interested in developing smart (predictive) web applications. Knowledge of Django would be beneficial. The reader is expected to have a background in Python programming and good knowledge of statistics.
Python is a general-purpose programming language that is comparatively easy to learn. This makes it a language of choice for data scientists to prototype, visualize, and run data analyses on small and medium-sized datasets. This is a unique book that helps bridge the gap between machine learning and web development. It focuses on the difficulties of implementing predictive analytics in web applications. We focus on the Python language, frameworks, tools, and libraries, showing you how to build a machine learning system. You will explore the core machine learning concepts and then develop a web application with the Django framework and deploy the data into it. You will also learn to carry out web, document, and server mining tasks, and build recommendation engines. Later, you will explore Python's impressive Django framework and find out how to build a simple, modern web app with machine learning features.
Instead of being overwhelmed with multiple concepts at once, this book provides a step-by-step approach that will guide you through one topic at a time.
An intuitive step-by-step guide that focuses on one key topic at a time. Building upon the knowledge acquired in each chapter, we connect the fundamental theory and practical tips with illustrative visualizations and hands-on code examples.
Page count: 277
Year of publication: 2016
Copyright © 2016 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author nor Packt Publishing, nor its dealers and distributors, will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: July 2016
Production reference: 1250716
Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham B3 2PB, UK.
ISBN 978-1-78588-660-7
www.packtpub.com
Author
Andrea Isoni
Reviewers
Chetan Khatri
Pavan Kumar Kolluru
Dipanjan Sarkar
Commissioning Editor
Akram Hussain
Acquisition Editor
Sonali Vernekar
Content Development Editor
Arun Nadar
Technical Editor
Sushant S Nadkar
Copy Editor
Vikrant Phadkay
Project Coordinator
Ritika Manoj
Proofreader
Safis Editing
Indexer
Mariammal Chettiyar
Graphics
Disha Haria
Kirk D'Penha
Abhinash Sahu
Production Coordinator
Arvindkumar Gupta
Cover Work
Arvindkumar Gupta
What is machine learning? In the past year, whether at a conference, a seminar, or an interview, a lot of people have asked me to define machine learning. There is a lot of curiosity around what it is, as human nature requires us to define something before we can begin to understand its potential impact on our lives and what this new thing may mean for us in the future.
Like other disciplines that suddenly become popular, machine learning is not new. Many people in the scientific community have been working for several years with algorithms that automate repetitive activities. An algorithm whose parameters are fixed is called a static algorithm: its output is predictable and a function only of the input variables. On the other hand, when the parameters of the algorithm are dynamic and a function of external factors (most frequently, the previous outputs of the same algorithm), it is called a dynamic algorithm. Its output is no longer a function only of the input variables, and that is the founding pillar of machine learning: a set of instructions that can learn from the data generated during previous iterations to produce a better output the next time.
Scientists, developers, and engineers have been dealing with fuzzy logic, neural networks, and other kinds of machine learning techniques for years, but it is only now that this discipline is becoming popular, as its applications have left the lab and are now used in marketing, sales, and finance. Basically, every activity that requires repeating the same operation over and over again could benefit from machine learning.
The implications are easy to grasp and will have a deep impact on our society. The best way I can think of to describe what will likely happen with machine learning in the next 5 to 10 years is to recall what happened during the industrial revolution. Before the advent of the steam engine, lots of people performed highly repetitive physical tasks, often risking their lives or their health for minimum wages. Thanks to the industrial revolution, society evolved and machines took over the relevant parts of manufacturing processes, leading to improved yields, more predictable and stable outputs, better product quality, and new kinds of jobs controlling the machines that were replacing physical labor. This was the first time in the history of mankind that humans had delegated the responsibility for creating something to a thing they had designed and invented. In the same way, machine learning will change the way data operations are performed, reducing the need for human intervention and leaving optimization to machines and algorithms. Operators will no longer have direct control over data; instead, they will control algorithms that, in turn, will control data. This will allow faster execution of operations, larger datasets will be manageable by fewer people, errors will be reduced, and outcomes will be more stable and predictable.
As with many things that have a deep impact on our society, machine learning is as easy to love as it is to hate. Supporters will praise the benefits that machine learning will bring to their lives; critics will point out that, in order to be effective, machine learning needs lots of iterations and, hence, lots of data. Usually, the data we feed algorithms with is our own personal information.
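To make the static/dynamic distinction concrete, here is a minimal Python sketch; it is an illustration, not code from the book. The static function keeps a fixed parameter, so its output depends only on the input, while the adaptive predictor updates its parameter from the error of its previous output. The least-mean-squares (LMS) update rule, the learning rate, and the toy data generated by y = 3x are all illustrative assumptions.

```python
def static_predict(x, w=2.0):
    """Static algorithm: the parameter w is fixed, so the output
    depends only on the input x."""
    return w * x


class AdaptivePredictor(object):
    """Dynamic algorithm (illustrative): the parameter w is updated
    after every iteration from the error of the previous output,
    using a least-mean-squares (LMS) step chosen for this sketch."""

    def __init__(self, w=0.0, learning_rate=0.1):
        self.w = w
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x

    def update(self, x, target):
        # Compare the current output with the observed value and move w
        # in the direction that reduces the squared error.
        error = target - self.predict(x)
        self.w += self.learning_rate * error * x
        return error


# Hypothetical observations generated by y = 3x, repeated over many iterations.
data = [(1.0, 3.0), (2.0, 6.0), (0.5, 1.5), (1.5, 4.5)] * 25

model = AdaptivePredictor()
for x, y in data:
    model.update(x, y)

print(static_predict(4.0))            # 8.0: the fixed parameter never changes
print(round(model.predict(4.0), 2))   # 12.0: w has converged to about 3
```

The static function would return 8.0 for an input of 4.0 no matter how much data it saw, whereas the adaptive predictor's answer depends on the history of observations it has processed, which is the property the foreword describes.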
In fact, the main applications where machine learning is taking off as a tool to improve productivity are marketing and customer support, where a deep knowledge of the customer is required to give him or her the personal service that makes the difference between a purchase and a mere visit, or between a happy and an unhappy customer.
In marketing, for example, marketers are starting to take into account information such as location, device, past purchases, websites visited, and weather conditions, to name just a few of the parameters that determine whether a company decides to display its ads to a specific set of customers.
Long gone are the days of broadcasting marketing messages through untraceable media, such as TV or newspapers. Today's marketers want to know everything about who clicks and buys their products so that they can optimize creatives, spend, and allocate budget to make the best possible use of the resources at their disposal. This leads to unprecedented levels of personalization that, when exploited properly, make customers feel valued as individuals and not part of a socio-demographic group.
It is intriguing and challenging at the same time, but there is no doubt that the winners of the next decade will be those companies or individuals who can understand unstructured data and make decisions based on them in a scalable way: I see no other way than machine learning to achieve such a feat.
Andrea Isoni's book is a step into this world; reading it will be like a peek down the rabbit hole, where you'll be able to see a few applications of these techniques, mostly applied to web development, where machine learning serves to create customized websites and allow customers to see their own optimized version of a service.
If you want to future-proof your career, this is a must-read; anyone dealing with data in the next decade will need to be proficient in these techniques to succeed.
Davide Cervellin, @ingdave
Head of EU Analytics at eBay
Andrea Isoni is a data scientist and physicist with a PhD and extensive experience in software development positions. He has extensive knowledge of machine learning algorithms and techniques. He also has experience with multiple languages and technologies, such as Python, C/C++, Java, JavaScript, C#, SQL, HTML, and Hadoop.
Chetan Khatri is a data science researcher with a total of 4.6 years of experience in research and development. He works as a principal engineer, data and machine learning, at Nazara Technologies Pvt. Ltd, where he leads the data science practice in the gaming and subscription telecom businesses. He has worked with a leading data company and a Big 4 company, where he managed the Data Science Practice Platform and one of the Big 4 company's resources teams. Previously, he worked with R & D Lab and Eccella Corporation. He completed his master's degree in computer science, with a minor in data science, at KSKV Kachchh University as a gold medalist.
He contributes to society in various ways, including giving talks to sophomore students at universities and on various fields of data science in academia and at conferences, thus helping the community by providing a data science platform. He has excellent knowledge that connects academic research with industry best practices. He loves to participate in data science hackathons. He is one of the founding members of PyKutch, a Python community. Currently, he is exploring deep neural networks and reinforcement learning with parallel and distributed computing for government data.
I would like to thank Prof. Devji Chhanga, Head of the Computer Science Department, University of Kachchh, for setting me on the right path and for his valuable guidance in the field of data science research. I would also like to thank my beloved family.
Pavan Kumar Kolluru is an interdisciplinary engineer with expertise in Big Data; digital images and processing; remote sensing (hyperspectral data and images); and programming in Python, R, and MATLAB. His major emphasis is on Big Data, machine learning techniques, and algorithm development.
His quest is to find links between different disciplines so as to make their processes computationally easier and automated.
As a data (image and signal) processing professional/trainer, he has worked on multispectral and hyperspectral data, which gave him expertise in processing, information extraction, and segmentation with advanced processing using OOA, random sets, and Markov random fields.
As a programmer/trainer, he concentrates on the Python and R languages, serving both the corporate and educational fraternities. He has also trained various batches in Python and its packages (signal, image, data analytics, and so on).
As a machine learning researcher/trainer, he has expertise in classification (supervised and unsupervised), modeling and data understanding, regression, and data dimensionality reduction (DR). This led him to develop, in his M.Sc. research, a novel machine learning algorithm on Big Data (images or signals) that performs DR and classification in a single framework, earning distinction marks for it. He has trained engineers from various corporate giants on Big Data analysis using Hadoop and MapReduce. His expertise in Big Data analysis covers HDFS, Pig, Hive, and Spark.
Dipanjan Sarkar is a data scientist at Intel, the world's largest silicon company, which is on a mission to make the world more connected and productive. He primarily works on analytics, business intelligence, application development, and building large-scale intelligent systems. He received his master's degree in information technology from the International Institute of Information Technology, Bangalore. His areas of specialization include software engineering, data science, machine learning, and text analytics.
Dipanjan's interests include learning about new technology, disruptive start-ups, data science, and, more recently, deep learning. In his spare time, he loves reading, writing, gaming, and watching popular sitcoms. He has authored a book on machine learning titled R Machine Learning by Example, Packt Publishing, and has also acted as a technical reviewer for several books on machine learning and data science from Packt Publishing.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www2.packtpub.com/books/subscription/packtlib
Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.
Data science, and machine learning in particular, are emerging as leading topics in the commercial tech environment for evaluating the ever-increasing amount of data generated by users. This book explains how to use Python to develop a commercial web application with Django and how to employ some specific libraries (sklearn, scipy, nltk, Django, and some others) to manipulate and analyze, through machine learning techniques, the data that is generated or used in the application.
Chapter 1, Introduction to Practical Machine Learning Using Python, discusses the main machine learning concepts together with the libraries used by data science professionals to handle the data in Python.
Chapter 2, Machine Learning Techniques – Unsupervised Learning, describes the algorithms used to cluster datasets and to extract the main features from the data.
Chapter 3, Supervised Machine Learning, presents the most relevant supervised algorithms to predict the labels of a dataset.
Chapter 4, Web Mining Techniques, discusses the main techniques to organize, analyze, and extract information from web data.
Chapter 5, Recommendation Systems, covers in detail the most popular recommendation systems used in commercial environments to date.
Chapter 6, Getting Started with Django, introduces the main Django features and characteristics to develop a web application.
Chapter 7, Movie Recommendation System Web Application, describes an example that puts into practice the machine learning concepts developed in Chapter 5, Recommendation Systems, and Chapter 6, Getting Started with Django, by recommending movies to final web users.
Chapter 8, Sentiment Analyser Application on Movie Reviews, covers another example that uses the knowledge explained in Chapter 3, Supervised Machine Learning, Chapter 4, Web Mining Techniques, and Chapter 6, Getting Started with Django, to analyze the sentiment of online movie reviews and their importance.
The reader should have a computer with Python 2.7 installed to be able to run (and modify) the code discussed throughout the chapters.
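Before running the chapters' code, a quick check like the following can report the interpreter version and whether the libraries named in the preface can be imported. This is a hypothetical helper, not part of the book's code bundle; the import names `sklearn`, `scipy`, `nltk`, and `django` are assumed to be the ones the installed packages expose. It avoids Python 3-only syntax so that it runs under both Python 2.7 and Python 3.

```python
import importlib
import sys

# Assumed import names of the libraries mentioned in the preface
# (scikit-learn is imported as sklearn, Django as django).
REQUIRED_MODULES = ["sklearn", "scipy", "nltk", "django"]


def check_environment(modules=REQUIRED_MODULES):
    """Return a dict mapping each module name to True if it imports."""
    status = {}
    for name in modules:
        try:
            importlib.import_module(name)
            status[name] = True
        except Exception:  # ImportError, or an error raised by a broken install
            status[name] = False
    return status


if __name__ == "__main__":
    print("Python version: %d.%d" % sys.version_info[:2])
    if sys.version_info[:2] != (2, 7):
        print("Note: the book's code targets Python 2.7.")
    for name, ok in sorted(check_environment().items()):
        print("%-8s %s" % (name, "OK" if ok else "missing"))
```

Catching `Exception` rather than only `ImportError` is a deliberate choice here: a partially broken installation can fail with other errors at import time, and the goal is simply to report which libraries are usable.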
Any person with some programming (in Python) and statistics background who is curious about machine learning and/or pursuing a career in data science will benefit from reading this book.
Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail <[email protected]>, and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
You can also download the code files by clicking on the Code Files button on the book's webpage at the Packt Publishing website. This page can be accessed by entering the book's name in the Search box. Please note that you need to be logged in to your Packt account.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Machine-Learning-for-the-Web. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from http://www.packtpub.com/sites/default/files/downloads/Machine-Learning-for-the-Web.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books—maybe a mistake in the text or the code—we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at <[email protected]> with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at <[email protected]>, and we will do our best to address the problem.
