Building a Recommendation Engine with Scala E-Book

Saleem Ansari

Publisher: Packt Publishing
Category: Science and new technologies
Language: English

Description

Learn to use Scala to build a recommendation engine from scratch and empower your website users

About This Book

Learn the basics of a recommendation engine and its application in e-commerce
Discover the tools and machine learning methods required to build a recommendation engine
Explore different kinds of recommendation engines using Scala libraries such as MLlib and Spark

Who This Book Is For

This book is written for those who want to learn the different tools in the Scala ecosystem to build a recommendation engine. No prior knowledge of Scala or recommendation engines is assumed.

What You Will Learn

Discover the tools in the Scala ecosystem
Understand the challenges faced in e-commerce systems and learn how you can solve those challenges with a recommendation engine
Familiarise yourself with machine learning algorithms provided by the Apache Spark framework
Build different versions of recommendation engines from practical code examples
Enhance the user experience by learning from user feedback
Dive into the various techniques of recommender systems such as collaborative, content-based, and cross-recommendations

In Detail

With an increase of data in online e-commerce systems, the challenges in assisting users with narrowing down their search have grown dramatically. The various tools available in the Scala ecosystem enable developers to build a processing pipeline to meet those challenges and create a recommendation system to accelerate business growth and leverage brand advocacy for your clients.

This book provides you with the Scala knowledge you need to build a recommendation engine.

You'll be introduced to Scala and other related tools to set the stage for the project and familiarise yourself with the different stages in the data processing pipeline, including the stages at which you can leverage the power of Scala and related tools. You'll also discover different machine learning algorithms using MLlib.

As the book progresses, you will gain detailed knowledge of what constitutes a collaborative filtering based recommendation and explore different methods to improve the recommendations shown to users.

Style and approach

A step-by-step guide full of real-world, hands-on examples of Scala recommendation engines. Each example is placed in context with explanation and visuals.

Details

Number of pages: 156

Year of publication: 2016

Excerpt

Building a Recommendation Engine with Scala

Credits

About the Author

About the Reviewers

www.PacktPub.com

Support files, eBooks, discount offers, and more

Why subscribe?

Free access for Packt account holders

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Errata

Piracy

Questions

1. Introduction to Scala and Machine Learning

Setting up Scala, SBT, and Apache Spark

A quick introduction to Scala

Case classes

Tuples

Scala REPL

SBT – Scala Build Tool

Apache Spark

Setting up a standalone Apache Spark cluster

Apache Spark – MLlib

Machine learning and recommendation engines

Summary

2. Data Processing Pipeline Using Scala

Entree – a sample dataset for recommendation systems

Data analysis of the Entree dataset

ETL – extract transform load

Extract

Transform

Load

Extraction and transformation for machine learning

Types of data

Discrete

Continuous

Categorical

Cleaning the data

Missing data

Normalization

Standardization

Setting up MongoDB and Apache Kafka

Setting up MongoDB

Setting up Apache Kafka

Data processing pipeline for Entree

How does it relate to information retrieval?

Summary

3. Conceptualizing an E-Commerce Store

Importance of recommender systems in e-commerce

Converting browsers into buyers

Making cross-sell happen

Increased loyalty time

Types of recommendation methods

Frequently bought together

An example of frequent patterns

People to people correlation

Customer reviews and ratings

People who were also interested in other similar items

Recommendation from others' views

Example of similar items

Manual

Automatic

Ephemeral

Persistent

The architecture of the project

Batch versus online

Summary

4. Machine Learning Algorithms

Hands on with Spark/MLlib

Data types

Vector

Matrix

Labeled point

Statistics

Summary statistics

Correlation

Sampling

Hypothesis testing

Random data generation

Feature extraction and transformation

Term frequency-inverse document frequency (TF-IDF)

Word2Vec

StandardScaler

Normalizer

Feature selection

Dimensionality reduction

Classification/regression

Linear methods

Naive Bayes

Decision trees

Ensembles

Clustering

K-Means

Expectation-maximization

Power iteration clustering

Latent Dirichlet Allocation

LDA example

Association analysis

Frequent pattern mining (FPGrowth)

Summary

5. Recommendation Engines and Where They Fit in?

Populating the Amazon dataset

Creating a web app with user/product pages

Creating a Play framework application

The home page

Product Groups

Product view

Customer views

Adding recommendation pages

The Top Rated view

The Most Popular view

The Monthly Trends view

Summary

6. Collaborative Filtering versus Content-Based Recommendation Engines

Content-based recommendation

Similarity measures

Pearson correlation

Challenges with Pearson correlation

Euclidean distance

Challenges with Euclidean distance

Cosine measure

Spearman correlation

Tanimoto coefficient

Log likelihood test

Content-based recommendation steps

Clustering for performance

Collaborative filtering based recommendation

What is ALS?

ALS in Apache Spark

ALS on Amazon ratings

Content-based versus collaborative filtering

Summary

7. Enhancing the User Experience

Adding product search

Setting up Elasticsearch

Adding recommendation listings

Understanding recommendation behavior

Why is that so?

Logging

Ranking

Diversification

Justification

Evaluation

Summary

8. Learning from User Feedback

Introducing PredictionIO

Installing PredictionIO

Unified recommender

Summary

Index

Building a Recommendation Engine with Scala

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2016

Production reference: 1281215

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78528-258-4

www.packtpub.com

Credits

Author

Saleem Ansari

Reviewers

Eric Le Goff

Andrii Kravets

Loránd Szakács

Commissioning Editor

Nadeem Bagban

Acquisition Editor

Vinay Argekar

Content Development Editor

Zeeyan Pinheiro

Technical Editor

Siddhi Rane

Copy Editor

Ting Baker

Project Coordinator

Suzanne Coutinho

Proofreader

Safis Editing

Indexer

Rekha Nair

Graphics

Kirk D'Penha

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Author

Saleem Ansari is a full-stack developer with over 8 years of industry experience. He has a special interest in machine learning and information retrieval. Having implemented data ingestion and processing pipelines in both Core Java and Ruby, he knows the challenges that huge data sets pose in such systems. He has worked for companies such as Red Hat, Impetus Technologies, Belzabar Software, and Exzeo Software. He is also a passionate member of the free and open source software (FOSS) community. He started his journey with FOSS in 2004. The very next year, he formed JMILUG, the Linux Users Group at Jamia Millia Islamia University, New Delhi. Since then, he has been contributing to FOSS by organizing community activities and contributing code to various projects (for more information, visit http://github.com/tuxdna). He also mentors students about FOSS and its benefits.

In 2015, he reviewed two books related to Apache Mahout, namely Learning Apache Mahout and Apache Mahout Essentials, both published by Packt Publishing.

He blogs at http://tuxdna.in/ and can be reached at <[email protected]> via e-mail.

I dedicate this book to my parents.

I would like to acknowledge the amazing people who have helped me push forward while writing this book. First off, I would like to thank Vinay Argekar and Zeeyan Pinheiro from Packt Publishing, who have been of immense help and guidance right from the beginning of this book. I would like to especially thank the reviewers, Eric Le Goff and Andrii Kravets. I wouldn't have leveled up the content if I had not received their critical reviews and suggestions. So much kudos to you guys! I would like to give another special mention to Pat Ferrel from the Apache Mahout and PredictionIO project. He helped me understand the unified recommender algorithm that is mentioned in the book.

My appreciation also goes to my family and friends, who have been supportive while I was writing this book.

About the Reviewers

Eric Le Goff is a senior developer and an open source evangelist. Located in Bordeaux, France, he has more than 15 years of experience in large-scale system designs and server-side developments in both start-ups and established corporations, from digital signature solutions to financial institutions and risk management.

A former board member at the OW2 consortium (an international open source community for infrastructure), he is also a Scala enthusiast with Coursera certifications such as Functional Programming Principles in Scala and Principles of Reactive Programming.

He is also passionate about NoSQL solutions (M101J: MongoDB for Java developers certified).

He has reviewed the book Scala for Java Developers, Packt Publishing.

First, thanks go to my wife, Corine, who constantly supports everything that I undertake. I would also like to thank all the contributors and the open source community at large. Finally, I'd like to thank Martin Odersky and his team for creating Scala.

Andrii Kravets is a highly motivated, agile-minded engineer with more than 5 years of experience in software development and software project management who wants to make the world better. He has a lot of experience with high-loaded distributed projects, big data, JVM languages, machine learning, and building complex web solutions.

He is currently making the world better at TransferWise.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

With the growth of the Internet and the widespread adoption of e-commerce and social media, a lot of new services have arrived in recent years. We shop online, we communicate online, we stay up-to-date online, and so on. The volume of data has grown enormously, and this has made it increasingly tough for service providers to provide only the relevant data. Recommendation engines help us provide only the relevant data to a consumer.

In this book, we will use the Scala programming language and the many tools that are available in its ecosystem, such as Apache Spark, Play Framework, Spray, Kafka, and PredictionIO, to build a recommendation engine. We will reach that stage step by step with a real-world dataset and a fully functional application that gives readers hands-on experience. We discuss the key topics in detail so that readers can get started on their own. You will learn the challenges and approaches used to build a recommendation engine.

You must have some understanding of the Scala programming language, SBT, and command-line tools. An understanding of different machine learning and data processing concepts is beneficial but not required. You will learn the tools necessary for writing data-munging programs and experimenting using Scala.

What this book covers

Chapter 1, Introduction to Scala and Machine Learning, is a fast-paced introduction to Scala, SBT, Spark, MLlib, and other related tools. We basically set the stage for the upcoming experiments.

Chapter 2, Data Processing Pipeline Using Scala, explores ways to compose a data processing pipeline using Scala. We do this by taking a sample dataset from the recommendation system and then building the pipeline.

Chapter 3, Conceptualizing an E-Commerce Store, discusses the need for a recommendation engine. We discuss different ways in which we can present recommendations; we will also explore the architecture of our project.

Chapter 4, Machine Learning Algorithms, discusses some machine learning algorithms that are relevant while building different aspects of a recommender system. We will also have hands-on exercises dealing with Apache Spark's MLlib library.

Chapter 5, Recommendation Engines and Where They Fit in?, implements our first recommender system on a dataset for products. We will continue by populating the dataset, creating a web application, and adding recommendation pages and product/customer trends.

Chapter 6, Collaborative Filtering versus Content-Based Recommendation Engines, focuses on tuning the recommendations so that they are user-specific rather than global in nature. We will implement content-based and collaborative filtering-based recommendations. Then, we will compare these two approaches.

Chapter 7, Enhancing the User Experience, discusses some tricks that add more spice to the overall user experience. We will add product search and recommendations listing and also discuss recommendation behavior.

Chapter 8, Learning from User Feedback, discusses a case study of PredictionIO. We will have a look at a hybrid recommender called unified recommender that is implemented using PredictionIO.

What you need for this book

Before you start reading this book, ensure that you have all the necessary software installed. The prerequisites for this book are as follows:

Java: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
Scala: http://www.scala-lang.org/download/
SBT: http://www.scala-sbt.org/download.html
MongoDB: http://www.mongodb.org/downloads
Apache Spark: https://spark.apache.org/downloads.html
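As a rough orientation for how these prerequisites fit together, a minimal SBT build definition might look like the sketch below. The project name and all version numbers are assumptions chosen for illustration (a Scala 2.11 and Spark 1.6 combination from around the time the book was published); they are not taken from the book, so check them against the example code you download.

```scala
// build.sbt: a minimal sketch; the name and versions are assumptions, not from the book
name := "recommendation-engine-sketch"  // hypothetical project name

scalaVersion := "2.11.7"                // assumed Scala version

// Spark core and MLlib; %% appends the Scala binary version (here _2.11) to the artifact name
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"  % "1.6.0",
  "org.apache.spark" %% "spark-mllib" % "1.6.0"
)
```

With a file like this in the project root, running sbt console should open a Scala REPL with the Spark classes on the classpath. MongoDB is installed separately and is not covered by this build file.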

Who this book is for

This book is intended for those developers who are keen on understanding how a recommender system is built from scratch. It is assumed that you have a basic understanding of the Scala programming language and you can also handle regular data-munging tasks.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.