Recommender Systems - Gerald Kembellec - E-Book

Gérald Kembellec

Description

Acclaimed by various online content platforms (books, music, movies) and auction sites, recommender systems are key elements of digital strategies. While they were originally developed to improve the performance of information systems, the focus has now shifted largely toward optimizing the customer relationship, with the main objective of maximizing potential sales. Taking a transdisciplinary approach, Recommender Systems brings together contributions linking information science and communications, marketing, sociology, mathematics and computing. It deals with the understanding of the underlying models for recommender systems and describes their historical perspective. It also analyzes their development in content offerings and assesses their impact on user behavior.

Number of pages: 344

Year of publication: 2014

Contents

Preface

1 General Introduction to Recommender Systems

1.1. Putting it into perspective

1.2. An interdisciplinary subject

1.3. The fundamentals of algorithms

1.4. Content offers and recommender systems

1.5. Current issues

1.6. Bibliography

2 Understanding Users’ Expectations For Recommender Systems: the Case of Social Media

2.1. Introduction: the omnipresence of recommender systems

2.2. The social approach to prescription

2.3. Users who do not focus on the prescriptions of platforms

2.4. A guide for considering recommender systems adapted to different forms of social media

2.5. Conclusion

2.6. Bibliography

3 Recommender Systems and Social Networks: what are the Implications for Digital Marketing?

3.1. Social recommendations: an ancient practice revived by the digital age

3.2. Social recommendations: how are they used for e-commerce?

3.3. Conclusion

3.4. Bibliography

4 Recommender Systems and Diversity: Taking Advantage of the Long Tail and the Diversity of Recommendation Lists

4.1. The stakes associated with diversity within recommender systems

4.2. Recommendation algorithms and diversity: trends, evaluation and optimization

4.3. Conclusion and new directions

4.4. Bibliography

5 iSoNTRE: Intelligent Transformer of Social Networks Into a Recommendation Engine Environment

5.1. Summary

5.2. Introduction

5.3. Latest developments, definition and history

5.4. iSoNTRE

5.5. Experiments

5.6. Conclusion

5.7. Bibliography

6 A Two-Level Recommendation Approach for Document Search

6.1. Introduction

6.2. Tag recommendation: a brief state of the art

6.3. The hypertagging system

6.4. Recommendation approach

6.5. Evaluation

6.6. Conclusion

6.7. Bibliography

7 Combining Configuration and Recommendation to Enable an Interactive Guidance of Product Line Configuration

7.1. Introduction

7.2. Context

7.3. Overview of the proposed approach

7.4. Preliminary evaluation

7.5. Discussion and related work

7.6. Conclusion and future work

7.7. Bibliography

8 Semio-Cognitive Spaces: the Frontier of Recommender Systems

8.1. Introduction

8.2. Latest developments: finalized activities, recommender systems and the relevance of information

8.3. Observable interests for decision theory: a combination of content-based, collaboration-based and knowledge-based recommendations

8.4. Discussion and conclusions

8.5. Conclusions: recommender systems linked to finalized activities

8.6. Acknowledgments

8.7. Bibliography

9 The French-Speaking Literary Prescription Market in Networks

9.1. Introduction

9.2. The economy of prescription

9.3. Methodology

9.4. The competitive structure of the market of online social networks of readers

9.5. The organization of prescription

9.6. Conclusion: what legitimacy for literary prescription?

9.7. Appendix: list of interviews undertaken

9.8. Bibliography

10 Presentation of Offered Services: Babelio, A Recommendation Engine Dedicated to Books

10.1. Introduction

10.2. The problem of qualitative pertinence

10.3. The problem of quantitative pertinence

10.4. Balancing recall and precision

10.5. The issue of sparse data

10.6. Performance and scalability

10.7. A few issues specific to books

11 Presentation of the Offer of Services: Nomao, Recommender Systems and Information Search

11.1. Introduction: the actors of Internet recommendation

11.2. Approaches to recommendation

11.3. Nomao: a local outlets search and recommendation engine

11.4. Prospects: the move toward interactive recommender systems

11.5. Appendix

List of Authors

Index

First published 2014 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK

www.iste.co.uk

John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA

www.wiley.com

© ISTE Ltd 2014

The rights of Gérald Kembellec, Ghislaine Chartron and Imad Saleh to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2014953032

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-84821-768-3

Preface

Recommender Engines (and Systems)

Research on the mass media has helped us to highlight the activity of the receiver in the decoding of information brought about by primary sociability. This approach was developed [PAS 99, CAR 99, CEF 03] in order to propose methods that take into account audiences of fans, and therefore to understand the reception toward informational content. Other researchers have proposed the “two-step flow” [KAT 08, RAM 11] approach, which is based purely on the model of the transmission and reception of content, but which also introduced the intermediary role of the “opinion leader”, whose influence is instrumental in the choice of products by consumers. This approach is based not only on people who distribute information, but also on others who add social and economic value. Mellet [MEL 09] led this research within the framework of viral marketing for the choice of products or services.

“The exponential increase of available information [on the Web], as well as their high heterogeneity” [PIC 11, p. 38] and the rapid rise in the number of e-business and e-commerce services “have often overwhelmed users, causing them to make bad decisions” [PIC 11, p. 38] and to lose themselves in the mass of information. Indeed, this rapid rise in the number of websites has created a need among consumers for help in finding the best choice among the many available products or pieces of information; recommendations, which result from filtering the possible alternatives, provide that help.

The advent of Web 2.0 in 1999 completely transformed the relationship between the producers and consumers of content in three aspects [BAS 11]:

1) the reduction of hurdles in the production of content: any person can create content (video, music, text, etc.). This leads to an overabundance of content, created by both amateurs and professionals;
2) the accessibility of content on different platforms: anyone can make his/her content accessible by publishing it on different editorial platforms (YouTube, Dailymotion, personal blog, etc.). This leads to a profusion of content, improved through sorting, selection and information quality control mechanisms. This information can be produced by systems (such as web tracking) or public activity. Currently, it is possible to rank content by the number of views, visits or its time of existence using aggregation tools, RSS feeds, “folksonomies”, etc.;
3) social expression: social platforms, such as Facebook, encourage Internet users to “recommend”, “collaborate”, “share” and “publish” information based on their preferences, interests and activities, leading to an overabundance of opinions on information or an object.

Recommender systems can be used to resolve cognitive overload and the overabundance of Web content. In order to tackle these problems, the system based on recommendations analyzes anonymous data, which can eventually prove to be interesting for users [PIC 11, p. 39; SAL 06], for example navigation history, user profiles, purchasing contexts, the analysis of actions and tracks left by the user. The user consults recommendations. “They can choose whether or not to accept it and can provide, immediately or later on, implicit or explicit feedback. All actions and feedback by users can be saved into the database and used for generating new recommendations for future interactions with the system” [PIC 11, p. 39].

Moreover, the research by Simon [SIM 71] has shown that an abundance of information creates a scarcity of attention, and a need to allocate that attention efficiently among the many sources of information that can consume it. Value within the digital context therefore lies not only in informational content, but also in attention, which is seen as the key resource of the digital environment [KES 10]. “An attention economy is taking over the culture market of the Internet more efficiently than any other informational vector”. Salaün [SAL 11] states that due to the Internet, the attention market has reversed the trend of advertising displays. Indeed, advertising messages have begun to accompany the progression of the Internet user and not that of the documentation¹.

Research in marketing has shown that the larger the choice, the more hesitant the users become, which leads them to resorting to either mimicry or recommendations [MEL 11]. The recommendation has become a central component within the digital environment of the overabundance of informational content. It is interesting to ask ourselves about the involvement of the recommendation in the reception of content and the difference between online and offline recommendation. How do we judge the pertinence of the recommendation? Does the public create a hierarchy of these recommendations? “Do sharing [and collaboration] lead to the extension or the reduction of information diversity? How do we rank meta-information [or metadata] added to the content when shared, provided by the identity of the person [...], even personal opinion”? [BAS 11, p. 6].

It is also certainly important to study “the new balance caused by digital technology between recommendation and information reception” [BAS 11, p. 4] through a sociological analysis, by asking which ones are “opinion leaders” on the Web, and also asking about their motivations, approaches, influences and roles within the public or private sphere. Moreover, recommendation also poses an ethical problem in the sense that users share their opinions, preferences, competencies and know-how, in other words self-exposition in an open world without any true regulations [BAS 11, GEO 11a, GEO 11b, CAR 11, MER 14]. However, the act of recommendation can be seen as a helpful social practice generalized in a way that has not been seen before in the history of media.

This book echoes the seminar held on the topic at the CNAM in 2012², bringing together a good number of the talks from that day as well as additional contributions that enhance the analysis.

What is the history of recommender engines? What are the most efficient recommendation methods and models? What impact do they have on the habits of users? Who are the main actors of these services? What are the major developments in the field of content publishing platforms? These are all aspects that are looked at here and which the assembled contributions aim to shed light on.

In a time where “Big Data” is completely transforming marketing and e-commerce jobs by analyzing the behaviors of consumers in depth, the interest in recommendation has not stopped growing, especially in its predictive capacity in order to optimize the “matching of products and services” [BER 14a, p. 21]. Big Data has brought interesting solutions by analyzing web audiences (pages visited, search history, metadata, etc.), consumer profiles, tracks left, for proposing recommendations for targeted purchases. Big Data is consequently useful for “determining the best location for a sales outlet [by analyzing] Open Data, geographic demographics, socio-economic data […] and the market and competition” [BER 14b, p. 22]. The combination of Big Data and recommender engines will most likely be major assets in the coming years.

Acknowledgments

The editors of the book Gérald Kembellec, Ghislaine Chartron and Imad Saleh would like to thank all the members of the Peer Review Committee:

– Evelyne Broudoux, MCF, CNAM, DICEN laboratory;
– Camille Prime Claverie, MCF, University Paris Ouest Nanterre;
– Orélie Desfriches Doria, Dr, Research engineer, Labex CAP (CNAM);
– Lucile Desmoulins, MCF, University Paris-Est Marne-la-Vallée;
– Widad Mustafa el Hadi, PR, University Lille 3, Gériico Laboratory;
– Camille Paloque-Berges, Dr, Research engineer, Laboratory for the history of technosciences in society (HT2S) at CNAM;
– Sahbi Sidhom, MCF, Researcher, University of Lorraine laboratory, LORIA.

Due to their efforts and full commitment in their corrections, this book follows a scientific approach assured by a double-blind evaluation.

Bibliography

[BAS 11] BASTARD I., Les logiques du partage d’information sur Internet, Thesis project, Orange Labs, 2011.

[BER 14a] BÉRANGER F., Big Data Analyse et valorisation de masses de données, Smile, Livre Blanc Open Sources solutions, p. 21, 2014.

[BER 14b] BÉRANGER F., Big Data Analyse et valorisation de masses de données, Smile, Livre Blanc Open Sources solutions, p. 22, 2014.

[CAR 11] CARDON D., La décomposition de l’identité numérique, 2011. Available at http://spiralconnect.univ-lyon1.fr/webapp/page/page.html?id=1459331.

[CEF 03] CEFAÏ D., PASQUIER D., (eds.), Les sens du public. Publics politiques, publics médiatiques, Presses Universitaires de France, Paris, 2003.

[GEO 11a] GEORGES F., “L’identité numérique sous emprise culturelle. De l’expression de soi à sa standardisation”, Les cahiers du numérique, vol. 7, no. 1, pp. 31–48, 2011.

[GEO 11b] GEORGES F., “Pratiques informationnelles et identité numérique”, Etudes de communication, no. 35, pp. 105–120, 2011.

[HAL 94] HALL S., ALBARET M., GAMBERINI M.C., “Codage/Décodage”, Réseaux, vol. 12, no. 68, pp. 27–39, 1994.

[KAT 08] KATZ E., LAZARSFELD P.F., Influence personnelle. Ce que les gens font des médias, Armand Colin/National Audiovisual Institute, Paris, 1975/2008.

[KES 10] KESSOUS E., MELLET K., ZOUINAR M., “L’économie de l’attention entre protection des ressources cognitives et extraction de la valeur”, Sociologie du travail, vol. 52, no. 3, pp. 359–373, 2010.

[MEL 09] MELLET K., “Aux sources du marketing viral”, Réseaux, no. 157–158, pp. 267–292, 2009.

[MEL 11] MELLET K., “Le marketing en ligne, ‘Cultures numériques’”, Communications, no. 88, pp. 103–111, 2011.

[MER 14] MERZEAU L., “Identity commons: du marquage au partage”, in COUTANT A., STENGER T., (eds.), Identités numériques, L’Harmattan, Paris, pp. 35–49, 2014.

[PAS 99] PASQUIER D., La culture des sentiments: l’expérience télévisuelle des adolescentes, Les Editions de la MSH, 1999.

[PIC 11] PICOT-CLÉMENTE R., Une architecture générique de Systèmes de recommandation de combinaison d’items: application au domaine du tourisme, PhD Thesis, University of Franche-Comté, p. 39, 2011.

[SAL 06] SALEH I., MKADMI A., REYES E., “L’hypermédia au service du travail collaboratif”, in SALEH I., (ed.), Les hypermédias: conception et réalisation, Hermès, Paris, 2006.

[SAL 11] SALAÜN J.-M., “Economie de l’information: les fondamentaux”, Documentaliste-Sciences de l’Information, vol. 48, no. 3, pp. 24–35, 2011.

[SIM 71] SIMON H.A., “Designing organizations for an information-rich world”, in GREENBERGER M., (ed.), Computers, Communications and the Public Interest, Johns Hopkins Press, pp. 37–72, 1971.

¹ http://www.dicen-idf.org/evenement/journee-etude-moteurs-recommandation.

² http://www.dicen-idf.org/evenement/journee-etude-moteurs-recommandation.

Preface written by Gérald KEMBELLEC, Ghislaine CHARTRON and Imad SALEH.

1

General Introduction to Recommender Systems

1.1. Putting it into perspective

Before the emergence of modern information systems, individuals developed the habit of recommending products or services through “word of mouth”, sharing certain social or cultural affinities [OBR 77, SHA 95]. This approach, which can be qualified as social, pursued the principle of sharing an individual experience with others, in areas, at first, as wide as culture or handicraft and then industry. Beyond the reputation tied to the intrinsic quality of a product, there were assessments that emerged through the prism of sociocultural mediums which also improved products and services.

Today, the offers proposed on the Internet, whether information or products, are increasing day by day. Beyond a certain threshold, too much information can lead to a deterioration of the quality of the message, which we refer to as information overload [LEV 98, CHE 09]. For the end user in search of information, it is useful for the system to carry out preprocessing that filters out the least important elements, in line with the user’s expectations. The development of automated recommender systems (RecSys) is therefore a foreseeable response that helps to resolve the problem of information overload, to add value to content and to focus the user’s attention in such a context of overabundance.

The first recommender systems, using “collaborative filtering”, had the aim of using the volume of community evaluations in order to propose personalized cultural advice, based on evaluation statistics and the correlation of user profiles [RES 94].

As early as 2000, Burke remarked that many commercial websites such as Amazon or eBay had understood the value of contextualizing the peripheral hyperlinks offered alongside the content consulted by the user [BUR 00]. Commercial search engines have even created related products such as “Google AdSense” in order to optimize advertising profits by taking advantage of recommendations based on the contents of queries, or even e-mails¹. The principle is simply to offer private advertisers the possibility of placing hyperlinks to their website in the margin of content selected by the user. This second method is called the content-based method.

With the arrival of social networks, be they in the public or professional spheres, sharing and the evaluation of content have become a mass worldwide phenomenon. As a result of this unprecedented generation of data, mercantile diversions are common and have led AFNOR² to propose standards for controlling the phenomenon [AFN 13].

1.2. An interdisciplinary subject

The first notable papers confirming recommender systems as a dedicated area of study and research involved computer science specialists as well as economists invested in the emerging development of e-commerce. The issue of information systems unified them; it has become a decisive factor in the decision-making of organizations. Thus, the precursory paper by Paul Resnick (AT&T) and Hal R. Varian (Berkeley School of Information Management) in 1997 focused on the functional analysis of five precursory recommender systems by mostly concentrating on the business model and risks of corruption of such systems [RES 97]. In 2000, Robin Burke, a researcher in computer science, prioritized mentioning the emergence of large catalogs and the required assistance for the consumer in making their choices; his articles focused on the design of algorithms and their performance [BUR 00]. E-commerce and recommendation algorithms were originally linked.

The data in Table 1.1, collected from the digital library of the Association for Computing Machinery (ACM) by searching for “Recommender System” in the titles of articles, show the growing interest in this subject between 1999 and 2013. This count remains partial compared to the set of articles published by other publishers on this subject over the same period. The growth of information as well as the major development of online commercial platforms explains for the most part the stakes associated with the issue; its development goes hand in hand with the optimization of information systems and the needs of e-marketing.

Table 1.1. Increase in the number of articles dedicated to recommender systems in the library of the ACM (http://dl.acm.org/)

Period        Number of articles
1999–2003     64
2004–2008     318
2008–2013     740

The ACM international conference on recommender systems (RecSys) was first held in 2007 and gathers many specialists in the field. The 8th edition of the conference will be held in Silicon Valley at the end of 2014³.

The literature shows that the computer approach is focused on the performance of algorithms, their robustness, the design and comparison of systems based on semantic, social as well as hybrid data. The proposed evaluation is often centered around the interaction with the technical system, but does not take into account the more qualitative approaches centered around the user. The computer approach also takes into consideration questions related to the transparency, clarification, trust and measurement of recommendation diversity. The ongoing renewal mainly includes combinations with other technologies: notably the Web of data, Big Data and automated sentiment analysis.

E-commerce approaches are mostly focused on new techniques which can direct potential clients to targeted products and services. The combinations of different types of recommendation have been tested in fields such as tourism and cultural industries (selling of books, music, on-demand video). Recommender systems are considered to be marketing tools and technologies specific to “business intelligence”, a set of methods and technologies which transform data into useful information for decision-making in industry.

From the point of view of information science, identified works are more recent; they highlight the use of such systems for developing discovery functions in digital libraries and library catalogs [WAK 12]. Qualitative evaluations of recommendations, the perspective of users and psychological factors are all perspectives of analysis which are specific to recommender systems and which open up new areas of research in this field with the help of abundant literature on techniques and algorithms. Several conferences are focused, however, on the user experience with these recommender systems by assessing their acceptance or rejection placed in this context. It is notably the aspects of visualization, clarification, transparency, trust, and help in decision-making which are the objects of investigations by researchers from various subject areas⁴.

1.3. The fundamentals of algorithms

Here, we introduce the foundations of recommendation systems, models and methods to provide a better context for the later chapters. This conceptual appropriation is intended to be neutral and factual; it will pave the way for the presentation of more involved points of view in the rest of this book.

1.3.1. Collaborative filtering

Historically, the first system proposed was based on collaborative filtering. This method assumes that users authenticate on the content management platform and, of course, provide personal input. Once the system has proposed a document to the user, on the basis of criteria gathered during the creation of the profile and/or the user’s queries to an additional internal search engine, it offers the user the possibility of rating the document. This rating can be an intrinsic assessment of the document, or an assessment of its relevance to the context of the search and its main intentions.

This rating will be preserved within the system to be reused. According to the “memory-based” or heuristic collaborative filtering, ratings can help predict the assessment of a user α of an item based on that of another user β, having regularly rated in a similar way. In order to determine which user β is most similar to user α, the Pearson correlation is often used [RES 94]. This method is also referred to as “Word of Mouth” [SHA 95] or “People-to-People Correlation” [SCH 99].

Let r be the Pearson correlation coefficient which in our case compares ratings, from 0 to 10, of 2 users for a collection of items. We note that this function is integrated into modern spreadsheets⁵. The correlation will be weak if the coefficient is less than 0.5 and strong if it tends toward 1.

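Pearson correlation, in its standard form:

\[
r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\;\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}}
\]

[1.1]

where x_i and y_i are the ratings given to item i by the two users, n is the number of items rated by both, and \bar{x} and \bar{y} are each user’s mean rating over those items.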

Example of the computation of the similarity between users having rated a set of items. Table 1.2 displays a collection of user assessments for certain items.

Table 1.2. Example of a sample of ratings

Table 1.3 displays the correlation coefficients computed two by two for the collection. The values in bold show strongly correlated users.

Table 1.3. Similarity of users based on their Pearson correlation

In the example, for the values presented in Table 1.2, the results displayed in Table 1.3 show that each user can benefit from the assessments of at least one other user with a similar profile to theirs (correlation close to 1).
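The pairwise computation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the ratings below are hypothetical (they are not the values of Table 1.2), and the names `ratings`, `pearson` and `user_similarity` are introduced here for the example.

```python
from itertools import combinations
from math import sqrt

# Hypothetical ratings from 0 to 10; NOT the values of Table 1.2.
ratings = {
    "user_a": {"item_1": 8, "item_2": 2, "item_3": 9, "item_4": 4},
    "user_b": {"item_1": 7, "item_2": 3, "item_3": 10, "item_4": 5},
    "user_c": {"item_1": 1, "item_2": 9, "item_3": 2, "item_4": 8},
}


def pearson(x, y):
    """Pearson correlation coefficient r of two equally long rating lists."""
    n = len(x)
    if n < 2:
        return 0.0  # too few shared items to correlate
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant rater carries no correlation signal
    return cov / sqrt(var_x * var_y)


def user_similarity(u, v):
    """Correlate two users over the items that both of them have rated."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    return pearson([ratings[u][i] for i in common],
                   [ratings[v][i] for i in common])


# Coefficients computed two by two, as in a similarity table.
similarities = {
    (u, v): user_similarity(u, v)
    for u, v in combinations(sorted(ratings), 2)
}

for (u, v), r in similarities.items():
    print(f"{u} / {v}: r = {r:+.2f}")
```

With these made-up values, user_a and user_b rate in a very similar way (coefficient close to 1), whereas user_c rates in the opposite direction (negative coefficient), so only the first pair would exchange recommendations.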

Once the number of user ratings has reached a sufficient volume, it can be used for a more precise prediction method, referred to as “model-based” prediction, which uses user profiles [BRE 98]. In this second method, profile types are established by grouping the users who have given similar ratings. It is these profile types, or models, that are then used to generate recommendations.
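As a rough illustration of the idea of profile types, the sketch below groups users with similar ratings and predicts a missing rating from the average of the group. It is a deliberately simple k-means-style grouping chosen for this example, not the specific models of [BRE 98]; the ratings, the seed users and the function names are all hypothetical, and every user is assumed to have rated at least one item.

```python
# Hypothetical ratings (0-10) for four items; None marks an unrated item.
ratings = {
    "user_a": [9, 8, 1, 2],
    "user_b": [8, 9, 2, 1],
    "user_c": [1, 2, 9, 8],
    "user_d": [2, 1, 8, None],
}


def distance(profile, centroid):
    """Mean squared gap over the items the user has actually rated."""
    gaps = [(p - c) ** 2 for p, c in zip(profile, centroid) if p is not None]
    return sum(gaps) / len(gaps)


def build_profile_types(ratings, seeds, rounds=5):
    """Group users around centroids seeded from fully rated users."""
    centroids = [list(ratings[s]) for s in seeds]
    groups = {}
    for _ in range(rounds):
        # Assignment step: each user joins the closest profile type.
        groups = {k: [] for k in range(len(centroids))}
        for user, profile in ratings.items():
            nearest = min(groups, key=lambda k: distance(profile, centroids[k]))
            groups[nearest].append(user)
        # Update step: each profile type becomes the mean of its members.
        for k, members in groups.items():
            for item in range(len(centroids[k])):
                known = [ratings[u][item] for u in members
                         if ratings[u][item] is not None]
                if known:  # keep the old value if nobody rated this item
                    centroids[k][item] = sum(known) / len(known)
    return centroids, groups


def predict(user, item, centroids, groups):
    """Predict a rating from the profile type the user belongs to."""
    for k, members in groups.items():
        if user in members:
            return centroids[k][item]
    raise KeyError(user)


centroids, groups = build_profile_types(ratings, seeds=["user_a", "user_c"])
print(groups)
print(predict("user_d", 3, centroids, groups))
```

Here user_d falls into the same profile type as user_c, so the missing rating for the last item is predicted from that group rather than from a single neighbor.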

1.3.1.1. Advantages and drawbacks of collaborative filtering

The first advantage of recommendations based on collaborative filtering is that familiarity with the area of knowledge is not required for searching for information [BUR 02]. This system also allows recommendations to be extended to genres which are correlated to the area of knowledge by using the other interests of similar profiles. This elicited serendipity is referred to by Burke as “cross-genre niches” [BUR 02]. According to Poirier et al., because of its independence from the representation of data, this technique can be applied to contexts where analysis of the content is difficult to automate [POI 10]. We also add that for image, audio and video documents, metadata is rarely available. In this context, outside of collaborative filtering (or a significant preliminary descriptive crowdsourcing effort), there would be no alternative recommendation method. The last positive aspect is that the quality of the recommendation proposed through collaborative filtering increases with the use of the system.

Claypool et al. have highlighted a certain number of problems in initial recommendation methods [CLA 99]. For example, in the initial state, a recommender system based on collaborative filtering is unusable due to a “cold start”. This cold start problem manifests itself in the following way: without ratings, no recommendation is possible. This difficulty is reproduced every time an item or user is added. With an overly low number of evaluations for a vast corpus, the data will be too sparse to establish enough correlations. This phenomenon is referred to as “sparsity” [CLA 99].
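To make these two notions concrete, the following sketch measures the sparsity of a small rating log and lists the users and items that are still in a cold start situation. The log, the identifiers and the function names are hypothetical; sparsity is taken here in its usual sense, as the share of user and item pairs without a rating.

```python
# Hypothetical rating log: (user, item, rating from 0 to 10).
log = [
    ("u1", "i1", 8),
    ("u1", "i2", 3),
    ("u2", "i1", 7),
    ("u3", "i3", 9),
]
users = {"u1", "u2", "u3", "u4"}          # u4 has just registered
items = {"i1", "i2", "i3", "i4", "i5"}    # i4 and i5 have just been added


def sparsity(log, users, items):
    """Share of the user x item matrix that holds no rating."""
    rated_cells = {(u, i) for u, i, _ in log}
    return 1 - len(rated_cells) / (len(users) * len(items))


def cold_start(log, users, items):
    """Users and items without any rating: nothing can be correlated yet."""
    rated_users = {u for u, _, _ in log}
    rated_items = {i for _, i, _ in log}
    return sorted(users - rated_users), sorted(items - rated_items)


print(f"sparsity: {sparsity(log, users, items):.0%}")
print("cold start:", cold_start(log, users, items))
```

In this toy case only 4 of the 20 possible cells are filled, and the newly added user and items cannot receive or appear in any collaborative recommendation until ratings exist for them.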

It has also been shown that collaborative filtering favors popularity. The more an item is favorably rated, the more it will be recommended and therefore rated again. This self-generated notoriety therefore seems to be a result of age rather than of the actual quality as perceived by users. This problem can be offset, or on the contrary intensified, by a weakness of social recommendation systems, namely rating fraud through multiple identities. It can be tempting to modify recommendations from a marketing perspective by leaving ratings under multiple identities. This technique is referred to as “shilling” and is the object of many studies [LAM 04, BUR 06].

1.3.2. Content filtering

The other classic filtering method is based on the description and analysis of the content proposed by the system. This process is mainly based on text analysis techniques, but can be extended to various forms of content containing metadata. Digital text documents which are already well equipped with a wealth of metadata and linked to catalog records illustrate this point.

The content-based recommendation technique is based on the relationship between the user and metadata associated with the items stored in the knowledge base [BOU 04, LEE 06].

The user can voluntarily enter their preferences during their signup to the service: they are “provided”. The other possibility is to compute preferences through the observation of their behavior [ADO 05]. In this case, they are “calculated” and put into vectors.

User preferences are represented in the form of a vector containing the most representative preferences of the user. These key terms can have a statistically determined value depending on their frequency in documents visited and/or rated by the user within the corpus [BAL 97]. For example, it is possible to use the tf-idf algorithm to weight key terms from texts [SAL 88].

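Frequency of a term in a document, in its usual normalized form:

\[
\mathrm{tf}(m, d) = \frac{n_{m,d}}{\sum_{k} n_{k,d}}
\]

[1.2]

where n_{m,d} is the number of occurrences of term m in document d.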

The inverse document frequency [JON 72] is then computed as the logarithm of the quotient of the cardinality of the whole corpus C and the cardinality of the sub-corpus C′ of documents of C containing term m. The number 1 is added to the denominator so that the function remains defined when the term is absent from the corpus.

Inverse of the frequency of a word in the corpus:

\[
\mathrm{idf}(m) = \log \frac{|C|}{1 + |C'|}, \qquad C' = \{ d \in C : m \in d \}
\]

[1.3]
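The two weights can be combined as in the following sketch, which follows the description given above (including the 1 added to the denominator). The tiny corpus, the whitespace tokenization, the natural logarithm and the function names are assumptions made for the example.

```python
from collections import Counter
from math import log

# Hypothetical corpus; documents are tokenized on whitespace.
corpus = {
    "doc_1": "recommender systems filter information for users",
    "doc_2": "collaborative filtering uses ratings from users",
    "doc_3": "content filtering uses metadata and text analysis",
}


def tf(term, text):
    """Occurrences of the term divided by the number of words in the text."""
    words = text.split()
    return Counter(words)[term] / len(words)


def idf(term, corpus):
    """log(|C| / (1 + |C'|)), where C' holds the documents containing the term."""
    containing = sum(1 for text in corpus.values() if term in text.split())
    return log(len(corpus) / (1 + containing))


def tf_idf(term, doc_id, corpus):
    """Weight of a term in one document of the corpus."""
    return tf(term, corpus[doc_id]) * idf(term, corpus)


for term in ("recommender", "users", "unknown"):
    print(term, round(tf_idf(term, "doc_1", corpus), 4))
```

One consequence of adding 1 to the denominator is visible on such a small corpus: a term present in two of the three documents receives a weight of zero, and a term present in every document would receive a negative weight. On a large corpus this effect is negligible.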

This basic algorithm is rarely used on its own and has been replaced by more recent and sophisticated combinations, such as those offered by Terrier [OUN 05], notably Okapi BM25, but it remains the basis for weighting the representative terms of documents in text corpora.

Methods based on the vectorization of queries show promising results. Berry et al. have suggested representing the query in matrix form through the popular latent semantic indexing (LSI) algorithm. The algorithm creates a vector space of reduced dimensions which offers a representation in n dimensions of a set of documents [DUM 88]. When a query is submitted, its numerical representation is compared with those of the documents in the database using the cosine measure, and the algorithm returns the documents at the smallest distance. This method can be adapted to recommending documents according to the needs of users.
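The comparison step can be illustrated with a minimal vector-space sketch. It is not an implementation of LSI: the dimension reduction is omitted, and the query and documents are compared directly as raw term-count vectors. The documents, the query and the function names are hypothetical.

```python
from collections import Counter
from math import sqrt

# Hypothetical document base.
documents = {
    "doc_1": "collaborative filtering of user ratings",
    "doc_2": "text analysis of document metadata",
    "doc_3": "ratings and user profiles",
}


def vectorize(text):
    """Represent a text as a sparse vector of term counts."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine of the angle between two sparse term vectors."""
    dot = sum(u[t] * v[t] for t in u if t in v)
    norm_u = sqrt(sum(c * c for c in u.values()))
    norm_v = sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty text is similar to nothing
    return dot / (norm_u * norm_v)


def rank(query, documents):
    """Return document identifiers, most similar to the query first."""
    q = vectorize(query)
    scores = {d: cosine(q, vectorize(t)) for d, t in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)


print(rank("user ratings", documents))
```

A higher cosine corresponds to a smaller angular distance, so the first identifiers returned are the documents closest to the query. Replacing the query by a user preference vector would turn the same ranking into a content-based recommendation.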

1.3.2.1. Advantages and drawbacks of content filtering

The advantages of content filtering are similar to those observed in collaborative filtering [BUR 00]. Thus, knowledge of the area is not required by the user, since recommendations are based on corpus data. The accuracy of the system recommendations will also evolve with the size of the corpus. However, a system based solely on corpus data will not be able to propose “serendipity” in the absence of user correlations. Furthermore, as pointed out by Poirier, each user is absolutely independent of others. Thus, a user who would have appropriately filled their profile with their interests will receive recommendations even if they are the only one to be registered [POI 10].

The main drawback of a content-based recommender system, as with collaborative systems, is the case of new users, who have no established profile and therefore no “observed” reference data. Moreover, it is very difficult to index non-text-based data. Finally, users are typecast into a particular search context, namely the one already recorded as their area of interest. This problem is referred to as “overspecialization”: it eliminates any possibility of serendipity through the proposal of related subjects.

1.3.3. Hybrid methods

In its simplest form, the hybridization of recommender systems is the combination of collaborative filtering and content-based methods. This view of hybridization was refined by Burke and then by Adomavicius and Tuzhilin [BUR 02, ADO 05].

Burke listed the following seven hybridization techniques [BUR 02]:

– weighted: the recommendation value of an item is computed as the weighted sum of the scores given by the available methods. For example, P-Tango [CLA 99] initially gives an equal value to collaborative filtering and content-based filtering. This weighting is then adjusted according to confirmation from users;
– switching: the system chooses to apply either a data-based method or social filtering depending on the search context of the user;
– mixed: this technique presents recommendations from several traditional methods together, with the aim of limiting the drawbacks of each classic method;
– feature combination: this method enriches the data integrated a priori into the system with the ratings of users, which are added to the database a posteriori. The recommendation is computed over all of the data;
– cascade: this process consists of a double analysis of user profiles. The first pass is used to highlight potential candidates, the second to refine the selection of users;
– feature augmentation: this technique is similar to the previous one for the first pass. If the number of candidates is too high after the first pass, a second pass carries out a further discrimination by integrating the data of recommended items;
– meta-level: as with the two previous methods, it involves filtering users twice in order to determine similarities. The difference is that the first pass generates a model or typical profile of the user.
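The first two techniques in this list lend themselves to a short illustration. The following is a minimal sketch under stated assumptions, not Burke’s or P-Tango’s implementation: the function names are hypothetical, the default weight of 0.5 simply mirrors the equal initial weighting mentioned above, and the switching criterion (the number of ratings available for the user) is an assumed stand-in for the “search context”.

```python
def weighted_hybrid(cf_score, cb_score, cf_weight=0.5):
    """Weighted hybrid: linear combination of a collaborative and a content-based score."""
    return cf_weight * cf_score + (1.0 - cf_weight) * cb_score

def switching_hybrid(cf_score, cb_score, n_user_ratings, min_ratings=5):
    """Switching hybrid: use a single method, chosen according to the context.

    Assumed criterion: fall back to the content-based score when the user
    has too few ratings for collaborative filtering to be reliable.
    """
    return cf_score if n_user_ratings >= min_ratings else cb_score

# Hypothetical scores on a 1-5 scale for one (user, item) pair.
print(weighted_hybrid(4.0, 2.0))                       # equal weights
print(weighted_hybrid(4.0, 2.0, cf_weight=0.75))       # weight shifted towards collaborative filtering
print(switching_hybrid(4.0, 2.0, n_user_ratings=1))    # too few ratings: content-based score
```

In a weighted hybrid, the weight would typically be adjusted over time as user feedback confirms or contradicts each method’s predictions.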

Adomavicius and Tuzhilin have proposed a classification of hybrid recommendation methods into four approaches [ADO 05]:

– combining separate recommenders: the collaborative method and the content-based method are applied separately, then their predictions are combined;
– adding content-based characteristics to collaborative models: this system uses the classic collaborative “People-to-People Correlation” approach, to which it adds recommendations based on the classification of the content and the interests indicated by users;
– adding collaborative characteristics to content-based models: the principle of this model is not to reverse the previous one, but to incorporate characteristics of the “model-based” group profile collaborative method into the content-based approach;
– single unifying recommendation model: construction of a general model which incorporates the characteristics of both models within a single algorithm.

1.3.4. Conclusion on historical recommendation models

The timelines of the first two types of recommendation model overlapped in the 1990s.

Collaborative filtering recommender systems are based on the statistical processing of opinions expressed by users. Data-based (content-based) methods, for their part, rely on automatic language processing techniques, namely automatic indexing and the weighting of representative terms. In order to mitigate the weaknesses inherent to these initial models, hybrid methods have emerged since the end of the 1990s. We will now examine the ways in which these different algorithms have been implemented in online applications.

1.4. Content offers and recommender systems

1.4.1. Culture and recommender systems

1.4.1.1. Recommendation and cinema

Historically, researchers (GroupLens) have mostly been interested in applying recommender systems to the cultural domain, through cinema and film ratings [ALS 97]. Users are given access to film database interfaces in return for ratings. This method, used in MovieLens, is exactly the one presented in section 1.3.1 [SCH 07]. Based on the ratings of each user, it is possible to provide recommendations.
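To illustrate how ratings alone can produce a recommendation, the following sketch predicts a missing rating from the ratings of correlated users. It is a generic user-based example under stated assumptions, not the MovieLens implementation: the Pearson correlation, the restriction to positively correlated users, the function names and the toy ratings are all choices made for this example.

```python
import math

def pearson(ratings_a, ratings_b):
    """Pearson correlation between two users, computed over the items both have rated."""
    common = set(ratings_a) & set(ratings_b)
    if len(common) < 2:
        return 0.0
    mean_a = sum(ratings_a[i] for i in common) / len(common)
    mean_b = sum(ratings_b[i] for i in common) / len(common)
    num = sum((ratings_a[i] - mean_a) * (ratings_b[i] - mean_b) for i in common)
    den_a = math.sqrt(sum((ratings_a[i] - mean_a) ** 2 for i in common))
    den_b = math.sqrt(sum((ratings_b[i] - mean_b) ** 2 for i in common))
    return num / (den_a * den_b) if den_a and den_b else 0.0

def predict(user, item, ratings):
    """Similarity-weighted average of the ratings given to `item` by positively correlated users."""
    num = den = 0.0
    for other, other_ratings in ratings.items():
        if other == user or item not in other_ratings:
            continue
        sim = pearson(ratings[user], other_ratings)
        if sim > 0:
            num += sim * other_ratings[item]
            den += sim
    return num / den if den else None  # None: no usable neighbour

# Hypothetical ratings on a 1-5 scale.
ratings = {
    "ann": {"film_1": 5, "film_2": 1, "film_3": 4},
    "bob": {"film_1": 4, "film_2": 2, "film_3": 5, "film_4": 5},
    "cid": {"film_1": 1, "film_2": 5, "film_4": 1},
}

print(predict("ann", "film_4", ratings))
```

In this toy data, "ann" correlates positively with "bob" and negatively with "cid", so only the rating given by "bob" contributes to the prediction for "film_4".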

The French cinema listings website Allociné displays, alongside each film presented, a contextual selection of films with similar ratings. The recommender system is refined through feedback from the Internet user: star ratings, the popular Facebook “Like” mechanism or the question “Would you like to watch this film yes/no” (see Figure 1.1, top left). The website also makes it possible to rate films in batches, either on a scale of 1 to 10 if the user has seen the film, or by indicating whether or not the user is interested (see Figure 1.1, bottom right). The principle is to rate a large number of films in succession so that the system can build the most accurate possible profile of the user’s preferences. Subsequent suggestions become more accurate as the number of rated films increases.

Figure 1.1. Allociné’s rating context

1.4.1.2. Recommendation and literature

For the recommendation of literary works, we mention the social network for readers Goodreads and the French network Babelio6. Goodreads initially modeled its recommendation system on metadata sourced from Amazon, and filtering was therefore based on this data. The partnership which linked the social network with the online selling giant then ended, and Goodreads adopted Discovereads, a social algorithm developed at Stanford which applies only data from users to a corpus of metadata pertaining to the contents of books7.

Figure 1.2. Goodreads suggestions

Thus, with its 8 million users, its database of 300 million rated books and a correlation algorithm, Goodreads can offer each user reliable recommendations once 20 books have been rated. A layer of data-based hybridization is added through a typology of books: they are classified according to a taxonomy of literary genres (Graphic Novels, Historical Fiction, Science Fiction, Thriller, etc.) within which users define their preferred genres. The upper section of Figure 1.2 shows recommendations based on the contents of bibliographic records, drawn from summaries or metadata. The left side of Figure 1.2 illustrates a social suggestion based solely on user ratings. Furthermore, as illustrated in the bottom right section of Figures 1.2 and 1.3, book “covers” can be used to organize books into “virtual bookshelves” [MAN 99, HUD 11, DES 12]. These shelves are reused by the system in order to propose relevant reads to other users.

Figure 1.3. Goodreads virtual bookshelves

The Babelio system is quite similar to Goodreads, with a webcam scanner for adding books using barcodes or ISBN codes. Babelio allows users to rate, annotate and classify books and to assign them “labels”. Its recommender system relies more on data than on collaborative filtering. Indeed, Babelio offers contextualized recommendations on the same page as the listing of a book (see the “word cloud” of the book in Figure 1.4). The suggestion takes the form “Do you like this book? Babelio suggests (... similar books)” (see Figure 1.4, top right). The system also offers to display other books by the same writer or by authors considered to be “similar”, without specifying in what way (see Figure 1.4, bottom right).

Social recommendation is also present, to a lesser extent, through the possibility of accessing the libraries of users who have liked a particular book. This social offering starts from the assumption that the books of “friends” are also those that will interest the user. This concept is referred to as “homophily”, or the proximity of readers’ social networks [GUI 11, AIE 10]. Once a user has been identified as “similar” to our profile and accepted as such, we can rely on their ratings and preferences. Babelio seems to prioritize this system, with a focus on user comments and summaries of books rather than on the usual correlation algorithm.

Figure 1.4. Recommendations and data with Babelio

Babelio has been ported for use within a traditional OPAC system8. The municipal library of Toulouse is equipped with this platform. The social suggestions network is described as “very rich and very active” [KRA 11]. However, the institution’s own users do not provide the “critical mass” of data needed to run a social recommendation network autonomously [WAK 12]. The condition for offering a usable service from the start is therefore to rely on platforms which already contain content (in order to avoid a cold start). The assessment of this offering is positive, with an estimated 80% of suggestions judged “coherent” and an improvement in Babelio’s user base.

1.4.1.3. Recommendation and general culture

Hunch’s personalized recommender system proposes general cultural recommendations. From the user’s point of view, initialization is tightly controlled, since one must first answer closed-ended or semi-open questions on preferences and interests. The user must then rate an initial selection of cultural items, presented as a sequence of videos, books, images or even infographics related to the selected topics. This data-based step helps to create a profile and to assign the Internet user to a user group, so that more tailored selections of cultural items can be proposed. Each item is assigned to a theme, with a percentage indicating its compatibility with the user profile. After a few dozen validations (or invalidations) of suggestions, correlations with users who have similar interests are offered. It is important to note that the users’ collections of topics are not necessarily identical, but their evaluations within the topics they share will correspond quite closely. The aim of these correlations is to establish experimental relations between users in order to begin proposing hyperlinks to properly indexed content on topics likely to be of interest. Users can thus discover new content through others who share the same preferences on common topics of interest. This system is very effective for discovering new content adapted to individual preferences on targeted themes. This efficiency has not escaped the notice of commercial players: the commercial giant eBay has acquired Hunch in order to adapt it to its online sales platform. The use of recommender systems for commercial ends is not an isolated case; many major online commercial players turn recommendation into very effective personalized sales systems9.

1.4.2. Recommender systems and the e-commerce of content

More broadly, the use of recommender systems is growing rapidly within the framework of e-commerce. It has become rare to find an e-commerce service which does not provide purchasing recommendations. The aim of these systems is to contextually provide, with varying levels of success, buying advice on “interesting” products in relation to the “needs” of the client.

Figure 1.5. Recommendations on Amazon

The best-known example is the recommender system used by Amazon’s online bookstore. As illustrated at the bottom of Figure 1.5, a link is proposed next to every title: “Customers who purchased this book also purchased ...”. These suggestions come from the analysis and evaluation of the buying habits of Amazon customers. The upper left section of the same illustration shows recommendations based on content, in other words on the metadata from product listings. On the right side, a hybrid offer personalizes a recommendation of best sellers in relation to previous purchases and the listing currently being visited.

1.4.3. The behavior of users

Online personalization proposes recommendations for products and services based on the online purchases of clients or their browsing habits. Personalization applications reduce information overload and provide services of added value.

However, their adoption could be held back by clients’ concerns regarding the confidentiality of information. A study has thus been dedicated to determining whether the trade-off between confidentiality and the quality of a recommender service has a significant impact on its adoption by customers [LI 12]. Somewhat unexpectedly, the study showed that users are ready to divulge personal information when using services which they deem to be of high quality. The results even show that clients who are likely to use online personalization are also likely to pay for such a service [LI 12].

Without elaborating on the monetization of the system, an analysis of forums related to French recommendation services indicates immediate uptake and high involvement from users. This acceptance can even lead to a feeling of frustration when the service is interrupted10.

Conversely, the high subscription rate of the online bibliographical management service Mendeley in French-speaking scientific environments can be attributed, in part, to its recommendation service [KEM 12, 13].