Information is an element of knowledge that can be stored, processed or transmitted. It is linked to the concepts of communication, data, knowledge and representation. With the steady increase in the mass of information, it is difficult to know what information to look for and where to find it. Computer techniques exist to facilitate this search and to allow relevant information to be extracted. This book introduces the notions inherent to recommendation, based on, among other things, information search, filtering, machine learning and collaborative approaches. It also deals with the evaluation of such systems and presents various applications.
Number of pages: 116
Year of publication: 2015
Cover
Title
Copyright
Introduction
1 A Few Important Details Before We Begin
1.1. Information systems
1.2. Decision support systems
1.3. Recommender systems
1.4. Comparisons
1.5. Recommendation versus personalization
2 Recommender Systems
2.1. Introduction
2.2. Classification of recommender systems
2.3. User profiles
2.4. Data mining
2.5. Content-based approaches
2.6. Collaborative filtering approaches
2.7. Knowledge-based approaches
2.8. Hybrid approaches
2.9. Other approaches
3 Key Concepts, Useful Measures and Techniques
3.1. Vector space model
3.2. Similarity measures
3.3. Dimensionality reduction
3.4. Classification/clustering
3.5. Other techniques
3.6. Comparisons
4 Practical Implementations
4.1. Commercial applications
4.2. Databases
4.3. Collaborative environments
4.4. Smart cities
4.5. Early warning systems
5 Evaluating the Quality of Recommender Systems
5.1. Data sets, sparsity and errors
5.2. Measures
Conclusion
Bibliography
Index
End User License Agreement
List of Tables
1: A Few Important Details Before We Begin
Table 1.1. Comparison table: operational information systems, decision support systems and recommender systems
2: Recommender Systems
Table 2.1. Extract from the user profile for Marie
Table 2.2. Extract from a book catalog
Table 2.3. Extract from user profile: Marie
Table 2.4. Matches between book characteristics and Marie’s preferences (profile)
Table 2.5. Advantages and disadvantages of different recommendation approaches
3: Key Concepts, Useful Measures and Techniques
Table 3.1. Context of the approaches presented in this book
Table 3.2. Advantages and disadvantages of the vector model and syntactic approaches
4: Practical Implementations
Table 4.1. Values for cities Smallville, Metropolis and Gotham
Table 4.2. Value intervals for Gotham
5: Evaluating the Quality of Recommender Systems
Table 5.1. Data sets
Table 5.2. Types of error [JAN 10]
List of Illustrations
2: Recommender Systems
Figure 2.1. The recommender system seen as a black box [JAN 10]
Figure 2.2. Stages and methods used in approaches based on data mining [RIC 11]
Figure 2.3. Content-based recommender system seen as a black box [JAN 10]
Figure 2.4. Collaborative filtering recommender system seen as a black box [JAN 10]
Figure 2.5. Knowledge-based recommender system seen as a black box [JAN 10]
Figure 2.6. Hybrid recommender system seen as a black box [JAN 10]
Figure 2.7. Monolithic hybridization design [JAN 10]
Figure 2.8. Parallelized hybridization design [JAN 10]
Figure 2.9. Pipelined hybridization design [JAN 10]
Figure 2.10. Representation of scores as a bipartite graph
3: Key Concepts, Useful Measures and Techniques
Figure 3.1. Example of a decision tree [JAN 10]
4: Practical Implementations
Figure 4.1. “Your Amazon” on Amazon.com
Figure 4.2. In-cart recommendations on Amazon.com
coordinated by Camille Rosenthal-Sabroux
Volume 4
Elsa Negre
First published 2015 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2015
The rights of Elsa Negre to be identified as the author of this work have been asserted by her in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2015948079
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-84821-754-6
The development of Web and communications technologies since the early 1990s has fostered initiatives aiming to create opportunities for communication and information sharing. Information and data are increasingly present in our daily lives. This constant flux is often the result of developments in Information and Communication Technologies (ICT)¹. Moreover, the possibilities offered by ICT, which have increased almost exponentially, have given rise to a massive volume of data requiring processing [BAT 13]. The world is increasingly “digital” and individuals are increasingly affected by these changes. The digital infrastructure has created an information environment that is “as imperceptible to us as water is to a fish” [MCL 11]. A kind of parallel exists between humans and technology: on the one hand, individuals are making increasing use of technology and becoming “hyper-connected”; on the other hand, digital systems are becoming increasingly user-centered [VII 14].
Systems therefore need to allow users to synthesize information and to explore data. Data exploration is a process focused on the search for relevant information within a set of data, intended to detect hidden correlations or new information. In the current context of “information overload”, and with the increase in calculation and storage capacity, it is difficult to know exactly what information to look for and where to look for it. There is therefore a need for computing techniques that make this search, and the extraction of relevant information, easier. One such technique is recommendation.
The key question is how to guide users in their exploration of data so that they can find relevant information.
The recommendation process aims to guide users in their exploration of the large quantities of data available by identifying relevant information. It constitutes a specific form of information filtering, intended to present information items (films, music, books, images, Websites, etc.) that are likely to be of interest to the user. In general, the recommendation process aims to predict the user’s “opinion” of each item, based on certain reference characteristics, and to recommend those items with the best “opinion” rating.
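The predict-then-recommend loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the book's method: `recommend`, `lookup_opinion` and the `PREDICTIONS` table are hypothetical names, the predicted opinions are invented values on a 1 to 5 scale, and the "prediction" is a simple lookup standing in for a real model.

```python
from typing import Callable, Dict, List, Set


def recommend(
    user: str,
    items: List[str],
    predict_opinion: Callable[[str, str], float],
    already_seen: Set[str],
    n: int = 3,
) -> List[str]:
    """Return the n unseen items with the highest predicted opinion for `user`."""
    # Predict an "opinion" score for every item the user has not seen yet.
    scored = [(predict_opinion(user, item), item) for item in items if item not in already_seen]
    # Best predicted opinion first; ties are broken by item name for determinism.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item for _, item in scored[:n]]


# Hypothetical predicted opinions on a 1-5 scale (illustrative values only).
PREDICTIONS: Dict[str, Dict[str, float]] = {
    "marie": {"film_a": 4.5, "book_b": 3.0, "song_c": 4.8, "site_d": 2.1},
}


def lookup_opinion(user: str, item: str) -> float:
    # Stand-in predictor: unknown user/item pairs get a default score of 0.0.
    return PREDICTIONS.get(user, {}).get(item, 0.0)


if __name__ == "__main__":
    catalog = ["film_a", "book_b", "song_c", "site_d"]
    print(recommend("marie", catalog, lookup_opinion, already_seen={"book_b"}, n=2))
```

Here `book_b` is skipped because Marie has already seen it, and the call returns `['song_c', 'film_a']`. In a real system, `lookup_opinion` would be replaced by an actual predictor, such as one of the approaches covered in Chapter 2.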
This book is structured as follows:
Chapter 1 introduces the notions inherent in systems that handle data and information. It aims to clarify ambiguities associated with information systems, decision support systems and recommender systems, before establishing a clear distinction between recommendation and personalization.
Chapter 2 presents the most widespread approaches used in presenting recommendations to users: content-based approaches, collaborative approaches, knowledge-based approaches and hybrid approaches.
Chapter 3 describes the different techniques used in recommender systems (similarities between users or items, analysis of relationships between users or items, classification of users or items, etc.).
The concepts presented in Chapters 1, 2 and 3 are illustrated in Chapter 4, showing how recommendation approaches and the associated techniques are used and implemented in practice across a variety of domains.
Chapter 5 presents different ways in which the quality of recommender systems can be evaluated.
Finally, the conclusion provides a summary of the book, with a presentation of the current challenges that need to be tackled.
Note that this book does not claim to provide an exhaustive and detailed list of all possible approaches and techniques, but it constitutes an introduction and overview of recommender systems and the way in which they operate.
¹ The notions of Information and Communication Technologies (ICT) and New Information and Communication Technologies (NICT) include techniques associated with computing, audiovisual, multimedia, the Internet and telecommunications, allowing users to communicate, access information sources and store, process, produce and transmit information in a variety of forms: text, music, sound, image, video and interactive graphical interfaces [WIK 15a].
Savoir pour prévoir, afin de pouvoir (Know in order to predict, and thus to act)
Auguste Comte, Course of Positive Philosophy, 1830
In computer science, the concept of information can have multiple meanings. However, most people agree that information is an item of knowledge that may be conserved, processed or communicated, and is thus linked to notions of communication, data, knowledge, meaning, representation, etc.
In this chapter, we aim to remove the ambiguities surrounding the terms “information system”, “decision support system” and “recommender system”, clearly establishing the relative positions of these different systems.
For sustainable development, organizations need to respond to two key challenges: (i) management of an increasingly large quantity of data (both internal and external), enabling increasingly easy access, and (ii) transformation of this quantity of data into information that is useful for efficient accomplishment of their actions, while adapting to a continuously evolving environment.
An information system is an organized set of hardware, software, human resources, data, procedures, etc., which are used to collect, regroup, categorize, process and disseminate information in a given environment [DEC 92]. The general aim of an information system is thus to support an organization in achieving its objectives (essentially of a strategic nature).
Information systems are traditionally grouped into three types: design systems (Computer-Aided Design (CAD), etc.), industrial systems (management of machines, industrial process control, etc.) and management systems (marketing, human resources, etc.). We focus on the third type of system, which can also be split into two subcategories: operational information systems (used to carry out operations) and decision support systems.
Information therefore needs to be robust and durable, as it has an influence on company strategy, but also be able to evolve and adapt for different collaborators, processes, etc. This requirement often leads to the automation of decisions, in operational terms, and predictive analysis of evolutions, for strategic purposes. Real-time knowledge of both past and present situations is a key factor in ensuring the strategic success of companies.
Further details on information systems may be found in [ROS 09] (in French) and [STA 92] (in English).
Unlike operational (or transactional) systems, which are specific to a company’s activities and are intended to assist with everyday management tasks, decision support systems are used to facilitate the definition and implementation of strategies. However, the goal is not to define a strategy once and for all, but to be able to adapt to an environment in a continuous manner and in a better way than the competition. Traditional decision support systems are used to analyze activities that have already been performed in order to obtain information relevant to future activities; to do this, they use more or less recent information (in the best cases, this is updated daily). More advanced decision support systems manage more recent information (some are updated in quasi-real time), automate the decision process and provide real-time operational support (Internet call centers, for example) [BRU 11].
One of the best-known concepts encountered in decision support systems is the data warehouse. Often considered to be the core of any decision support system, a data warehouse integrates and stores significant volumes of data from a wide variety of sources:
– internal sources: software packages (Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), etc.), databases, files, Web services, etc.;
– external sources (clients, suppliers, etc.);
– non-computerized sources (letters, memoranda, minutes of meetings, etc.);
in order to make these data easily accessible for querying and for the purposes of decision analysis. The data warehouse is defined as “a subject-oriented, integrated, time-variant, nonvolatile collection of data in support of management’s decision-making process” [INM 94].
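To make the four properties in this definition concrete, here is a toy sketch using Python's built-in `sqlite3` module. It is an illustration under assumptions, not a real warehouse design: the `sales_fact` table, the source names and the record layouts (`erp_rows`, `partner_raw`) are all hypothetical. The sketch is subject-oriented because one table is organized around a single subject (sales). It is integrated because differently shaped source records are mapped onto one schema, and time-variant because every row is stamped with the period it describes. Finally, it is nonvolatile because rows are only ever inserted.

```python
import sqlite3
from typing import Dict, List, Tuple


def build_warehouse() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # Subject-oriented: a single fact table organized around the "sales" subject.
    conn.execute(
        "CREATE TABLE sales_fact (source TEXT, product TEXT, amount_eur REAL, period TEXT)"
    )
    return conn


def load(conn: sqlite3.Connection, source: str, rows: List[Dict], period: str) -> None:
    # Time-variant: every row is stamped with the period it describes.
    # Nonvolatile: rows are only inserted, never updated or deleted.
    conn.executemany(
        "INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
        [(source, r["product"], r["amount_eur"], period) for r in rows],
    )


def totals_by_product(conn: sqlite3.Connection, period: str) -> List[Tuple[str, float]]:
    # A typical decision-analysis query: aggregate across all integrated sources.
    cur = conn.execute(
        "SELECT product, SUM(amount_eur) FROM sales_fact "
        "WHERE period = ? GROUP BY product ORDER BY product",
        (period,),
    )
    return cur.fetchall()


# Hypothetical internal source (e.g. an ERP export), already in the common schema.
erp_rows = [{"product": "book", "amount_eur": 120.0}, {"product": "film", "amount_eur": 80.0}]
# Hypothetical external source with different field names and units (cents).
partner_raw = [{"item": "book", "price_cents": 4500}]
# Integrated: map the external layout onto the warehouse schema before loading.
partner_rows = [{"product": r["item"], "amount_eur": r["price_cents"] / 100} for r in partner_raw]

if __name__ == "__main__":
    dw = build_warehouse()
    load(dw, "erp", erp_rows, "2015-Q1")
    load(dw, "partner", partner_rows, "2015-Q1")
    print(totals_by_product(dw, "2015-Q1"))
```

Loading both sources for the same period and querying it yields `[('book', 165.0), ('film', 80.0)]`, combining the internal and external figures for books.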
Further details on decision support systems and data warehouses may be found in [FER 13] (in French) and [KIM 02] (in English).
Data exploration is a process that involves searching for relevant information within a set of data, with the intention of detecting hidden correlations or new information. Users face ever-increasing quantities of information, due to increased calculation and storage capacity [LYM 03]¹, which makes it increasingly difficult to know exactly what information to look for and where to look for it.
There is therefore a need for IT (Information Technology) techniques to facilitate this search process, along with the extraction of relevant information. One of these techniques is information recommendation. The main aim of this technique is to guide users in their exploration of data in order to obtain relevant information. This is done through the use of recommendation tools, with the intention of providing users with relevant information as quickly as possible. The recommendation process guides users in their exploration of quantities of available information by identifying items that appear to be relevant. This technique represents a specific form of information filtering, aiming to present items (movies, music, books, news, images, Websites, etc.) that are likely to be of interest to the user. Generally, based on certain reference characteristics², the recommendation process aims to predict the “opinion” a user will have of each item and to recommend items with the best predicted “opinion”.
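The following sketch shows one way to read "predicting an opinion from reference characteristics" together with "information filtering". The specifics are assumptions made purely for illustration. Reference characteristics are modeled as item tags, and the user profile is a hypothetical dictionary of preference weights in [0, 1]. The predicted opinion is the mean weight of an item's tags, and filtering keeps items at or above a threshold. This is a simplified content-based style of scoring, not necessarily the method the book develops later.

```python
from typing import Dict, List, Set

# Hypothetical user profile: preference weight per characteristic, in [0, 1].
PREFERENCES: Dict[str, float] = {"thriller": 0.9, "history": 0.6, "romance": 0.1}

# Hypothetical catalog: each item described by its reference characteristics.
CATALOG: Dict[str, Set[str]] = {
    "book_1": {"thriller"},
    "book_2": {"history", "romance"},
    "book_3": {"thriller", "history"},
    "book_4": {"cooking"},
}


def predict_opinion(preferences: Dict[str, float], characteristics: Set[str]) -> float:
    """Mean preference weight over the item's characteristics (0.0 if it has none)."""
    if not characteristics:
        return 0.0
    # Characteristics absent from the profile count as weight 0.0.
    return sum(preferences.get(c, 0.0) for c in characteristics) / len(characteristics)


def filter_items(
    preferences: Dict[str, float], catalog: Dict[str, Set[str]], threshold: float = 0.5
) -> List[str]:
    """Keep items whose predicted opinion reaches the threshold, best first."""
    scores = {item: predict_opinion(preferences, chars) for item, chars in catalog.items()}
    kept = [item for item, score in scores.items() if score >= threshold]
    # Sort by descending predicted opinion, then by name to make ties deterministic.
    return sorted(kept, key=lambda item: (-scores[item], item))


if __name__ == "__main__":
    print(filter_items(PREFERENCES, CATALOG))
```

With these invented values, `book_1` scores 0.9, `book_3` 0.75, `book_2` 0.35 and `book_4` 0.0, so the default threshold keeps `['book_1', 'book_3']`. Lowering the threshold trades relevance for coverage.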
