Building a Recommendation Engine with Scala - Saleem Ansari - E-Book

Description

Learn to use Scala to build a recommendation engine from scratch and empower your website users

About This Book

  • Learn the basics of a recommendation engine and its application in e-commerce
  • Discover the tools and machine learning methods required to build a recommendation engine
  • Explore different kinds of recommendation engines using Scala libraries such as MLlib and Spark

Who This Book Is For

This book is written for those who want to learn the different tools in the Scala ecosystem to build a recommendation engine. No prior knowledge of Scala or recommendation engines is assumed.

What You Will Learn

  • Discover the tools in the Scala ecosystem
  • Understand the challenges faced in e-commerce systems and learn how you can solve those challenges with a recommendation engine
  • Familiarise yourself with machine learning algorithms provided by the Apache Spark framework
  • Build different versions of recommendation engines from practical code examples
  • Enhance the user experience by learning from user feedback
  • Dive into the various techniques of recommender systems such as collaborative, content-based, and cross-recommendations

In Detail

With an increase of data in online e-commerce systems, the challenges in assisting users with narrowing down their search have grown dramatically. The various tools available in the Scala ecosystem enable developers to build a processing pipeline to meet those challenges and create a recommendation system to accelerate business growth and leverage brand advocacy for your clients.

This book provides you with the Scala knowledge you need to build a recommendation engine.

You'll be introduced to Scala and related tools to set the stage for the project, and you'll familiarise yourself with the stages of the data processing pipeline, including where you can leverage the power of Scala and related tools. You'll also discover different machine learning algorithms using MLlib.

As the book progresses, you will gain detailed knowledge of what constitutes a collaborative filtering-based recommendation and explore different methods to improve the recommendations shown to users.

Style and approach

A step-by-step guide full of real-world, hands-on examples of Scala recommendation engines. Each example is placed in context with explanation and visuals.

Page count: 156

Year of publication: 2016


Table of Contents

Building a Recommendation Engine with Scala
Credits
About the Author
About the Reviewers
www.PacktPub.com
Support files, eBooks, discount offers, and more
Why subscribe?
Free access for Packt account holders
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Errata
Piracy
Questions
1. Introduction to Scala and Machine Learning
Setting up Scala, SBT, and Apache Spark
A quick introduction to Scala
Case classes
Tuples
Scala REPL
SBT – Scala Build Tool
Apache Spark
Setting up a standalone Apache Spark cluster
Apache Spark – MLlib
Machine learning and recommendation engines
Summary
2. Data Processing Pipeline Using Scala
Entree – a sample dataset for recommendation systems
Data analysis of the Entree dataset
ETL – extract transform load
Extract
Transform
Load
Extraction and transformation for machine learning
Types of data
Discrete
Continuous
Categorical
Cleaning the data
Missing data
Normalization
Standardization
Setting up MongoDB and Apache Kafka
Setting up MongoDB
Setting up Apache Kafka
Data processing pipeline for Entree
How does it relate to information retrieval?
Summary
3. Conceptualizing an E-Commerce Store
Importance of recommender systems in e-commerce
Converting browsers into buyers
Making cross-sell happen
Increased loyalty time
Types of recommendation methods
Frequently bought together
An example of frequent patterns
People to people correlation
Customer reviews and ratings
People who were also interested in other similar items
Recommendation from others' views
Example of similar items
Manual
Automatic
Ephemeral
Persistent
The architecture of the project
Batch versus online
Summary
4. Machine Learning Algorithms
Hands on with Spark/MLlib
Data types
Vector
Matrix
Labeled point
Statistics
Summary statistics
Correlation
Sampling
Hypothesis testing
Random data generation
Feature extraction and transformation
Term frequency-inverse document frequency (TF-IDF)
Word2Vec
StandardScaler
Normalizer
Feature selection
Dimensionality reduction
Classification/regression
Linear methods
Naive Bayes
Decision trees
Ensembles
Clustering
K-Means
Expectation-maximization
Power iteration clustering
Latent Dirichlet Allocation
LDA example
Association analysis
Frequent pattern mining (FPGrowth)
Summary
5. Recommendation Engines and Where They Fit in?
Populating the Amazon dataset
Creating a web app with user/product pages
Creating a Play framework application
The home page
Product Groups
Product view
Customer views
Adding recommendation pages
The Top Rated view
The Most Popular view
The Monthly Trends view
Summary
6. Collaborative Filtering versus Content-Based Recommendation Engines
Content-based recommendation
Similarity measures
Pearson correlation
Challenges with Pearson correlation
Euclidean distance
Challenges with Euclidean distance
Cosine measure
Spearman correlation
Tanimoto coefficient
Log likelihood test
Content-based recommendation steps
Clustering for performance
Collaborative filtering based recommendation
What is ALS?
ALS in Apache Spark
ALS on Amazon ratings
Content-based versus collaborative filtering
Summary
7. Enhancing the User Experience
Adding product search
Setting up Elasticsearch
Adding recommendation listings
Understanding recommendation behavior
Why is that so?
Logging
Ranking
Diversification
Justification
Evaluation
Summary
8. Learning from User Feedback
Introducing PredictionIO
Installing PredictionIO
Unified recommender
Summary
Index

Building a Recommendation Engine with Scala


Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: January 2016

Production reference: 1281215

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78528-258-4

www.packtpub.com

Credits

Author

Saleem Ansari

Reviewers

Eric Le Goff

Andrii Kravets

Loránd Szakács

Commissioning Editor

Nadeem Bagban

Acquisition Editor

Vinay Argekar

Content Development Editor

Zeeyan Pinheiro

Technical Editor

Siddhi Rane

Copy Editor

Ting Baker

Project Coordinator

Suzanne Coutinho

Proofreader

Safis Editing

Indexer

Rekha Nair

Graphics

Kirk D'Penha

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Author

Saleem Ansari is a full-stack developer with over 8 years of industry experience. He has a special interest in machine learning and information retrieval. Having implemented data ingestion and processing pipelines in both Core Java and Ruby, he knows the challenges that huge data sets pose in such systems. He has worked for companies such as Red Hat, Impetus Technologies, Belzabar Software, and Exzeo Software. He is also a passionate member of the free and open source software (FOSS) community. He started his journey with FOSS in 2004. The very next year, he formed JMILUG, the Linux Users Group at Jamia Millia Islamia University, New Delhi. Since then, he has been contributing to FOSS by organizing community activities and contributing code to various projects (for more information, visit http://github.com/tuxdna). He also mentors students about FOSS and its benefits.

In 2015, he reviewed two books related to Apache Mahout, namely Learning Apache Mahout and Apache Mahout Essentials; both the books were produced by Packt Publishing.

He blogs at http://tuxdna.in/ and can be reached at <[email protected]> via e-mail.

I dedicate this book to my parents.

I would like to acknowledge the amazing people who have helped me push forward while writing this book. First off, I would like to thank Vinay Argekar and Zeeyan Pinheiro from Packt Publishing, who have been of immense help and guidance right from the beginning of this book. I would like to especially thank the reviewers, Eric Le Goff and Andrii Kravets. I wouldn't have leveled up the content if I had not received their critical reviews and suggestions. So much kudos to you guys! I would like to give another special mention to Pat Ferrel from the Apache Mahout and PredictionIO project. He helped me understand the unified recommender algorithm that is mentioned in the book.

All my appreciation is due to my family and friends, who have been supportive while I was writing this book.

About the Reviewers

Eric Le Goff is a senior developer and an open source evangelist. Located in Bordeaux, France, he has more than 15 years of experience in large-scale system designs and server-side developments in both start-ups and established corporations, from digital signature solutions to financial institutions and risk management.

A former board member at the OW2 consortium (an international open source community for infrastructure), he is also a Scala enthusiast with Coursera certifications such as Functional Programming Principles in Scala and Principles of Reactive Programming.

He is also passionate about NoSQL solutions (M101J: MongoDB for Java developers certified).

He has reviewed the book Scala for Java Developers, Packt Publishing.

First, thanks go to my wife, Corine, who constantly supports everything that I undertake. I also would like to include all the contributors and the open source community at large. Finally, I'd like to thank Martin Odersky and his team for creating Scala.

Andrii Kravets is a highly motivated, agile-minded engineer with more than 5 years of experience in software development and software project management who wants to make the world better. He has a lot of experience with high-load distributed projects, big data, JVM languages, machine learning, and building complex web solutions.

He is currently making the world better at TransferWise.

www.PacktPub.com

Support files, eBooks, discount offers, and more

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <[email protected]> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt's online digital book library. Here, you can search, access, and read Packt's entire library of books.

Why subscribe?

  • Fully searchable across every book published by Packt
  • Copy and paste, print, and bookmark content
  • On demand and accessible via a web browser

Free access for Packt account holders

If you have an account with Packt at www.PacktPub.com, you can use this to access PacktLib today and view 9 entirely free books. Simply use your login credentials for immediate access.

Preface

With the growth of the Internet and the widespread adoption of e-commerce and social media, a lot of new services have arrived in recent years. We shop online, we communicate online, we stay up-to-date online, and so on. The volume of data has grown enormously, and this has made it increasingly tough for service providers to surface only the relevant data. Recommendation engines help us provide only the relevant data to a consumer.

In this book, we will use the Scala programming language and the many tools that are available in its ecosystem, such as Apache Spark, Play Framework, Spray, Kafka, and PredictionIO, to build a recommendation engine. We will reach that stage step by step with a real-world dataset and a fully functional application that gives readers hands-on experience. We discuss the key topics in detail so that readers can get started on their own. You will learn the challenges and approaches used to build a recommendation engine.

You must have some understanding of the Scala programming language, SBT, and command-line tools. An understanding of different machine learning and data processing concepts is beneficial but not required. You will learn the tools necessary for writing data-munging programs and experimenting using Scala.

What this book covers

Chapter 1, Introduction to Scala and Machine Learning, is a fast-paced introduction to Scala, SBT, Spark, MLlib, and other related tools. We basically set the stage for the upcoming experiments.

Chapter 2, Data Processing Pipeline Using Scala, explores ways to compose a data processing pipeline using Scala. We do this by taking a sample dataset for recommendation systems and then building the pipeline.

Chapter 3, Conceptualizing an E-Commerce Store, discusses the need for a recommendation engine. We discuss different ways in which we can present recommendations; we will also explore the architecture of our project.

Chapter 4, Machine Learning Algorithms, discusses some machine learning algorithms that are relevant while building different aspects of a recommender system. We will also have hands-on exercises dealing with Apache Spark's MLlib library.

Chapter 5, Recommendation Engines and Where They Fit in?, implements our first recommender system on a dataset for products. We will continue by populating the dataset, creating a web application, and adding recommendation pages and product/customer trends.

Chapter 6, Collaborative Filtering versus Content-Based Recommendation Engines, focuses on tuning recommendations so that they are user-specific rather than global in nature. We will implement content-based and collaborative filtering-based recommendations. Then, we will compare these two approaches.

Chapter 7, Enhancing the User Experience, discusses some tricks that add more spice to the overall user experience. We will add product search and recommendation listings and also discuss recommendation behavior.

Chapter 8, Learning from User Feedback, discusses a case study of PredictionIO. We will have a look at a hybrid recommender called unified recommender that is implemented using PredictionIO.
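
To give a flavour of the similarity measures that Chapter 6 lists (the cosine measure among them), here is a minimal sketch in plain Scala. It is not code from the book: the object name CosineSimilaritySketch, the Ratings type alias, and the sample users and items are all hypothetical and chosen only for illustration. It assumes ratings are held as a sparse map from item ID to rating.

```scala
// Hypothetical illustration; not taken from the book's example code.
object CosineSimilaritySketch {
  // Sparse rating vector: item ID -> rating (assumed representation).
  type Ratings = Map[String, Double]

  // Cosine of the angle between two sparse rating vectors.
  // Returns 0.0 when either vector has zero length.
  def cosine(a: Ratings, b: Ratings): Double = {
    // Only items rated by both users contribute to the dot product.
    val dot = a.keySet.intersect(b.keySet).toSeq.map(k => a(k) * b(k)).sum
    val normA = math.sqrt(a.values.map(v => v * v).sum)
    val normB = math.sqrt(b.values.map(v => v * v).sum)
    if (normA == 0.0 || normB == 0.0) 0.0 else dot / (normA * normB)
  }

  def main(args: Array[String]): Unit = {
    // Made-up sample data.
    val alice: Ratings = Map("book" -> 5.0, "lamp" -> 3.0)
    val bob: Ratings   = Map("book" -> 4.0, "lamp" -> 2.0, "desk" -> 1.0)
    val carol: Ratings = Map("desk" -> 5.0)

    println(f"alice ~ bob:   ${cosine(alice, bob)}%.3f")
    println(f"alice ~ carol: ${cosine(alice, carol)}%.3f")
  }
}
```

Users with no items in common score 0.0 here, which hints at why sparse rating data is a practical challenge for similarity-based approaches.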

What you need for this book

Before you start reading this book, ensure that you have all the necessary software installed. The prerequisites for this book are as follows:

  • Java: http://www.oracle.com/technetwork/java/javase/downloads/jdk7-downloads-1880260.html
  • Scala: http://www.scala-lang.org/download/
  • SBT: http://www.scala-sbt.org/download.html
  • MongoDB: http://www.mongodb.org/downloads
  • Apache Spark: https://spark.apache.org/downloads.html

Who this book is for

This book is intended for those developers who are keen on understanding how a recommender system is built from scratch. It is assumed that you have a basic understanding of the Scala programming language and you can also handle regular data-munging tasks.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book—what you liked or may have disliked. Reader feedback is important for us to develop titles that you really get the most out of.

To send us general feedback, simply send an e-mail to <[email protected]>, and mention the book title in the subject of your message.