Digital Libraries and Crowdsourcing - Mathieu Andro - E-Book


Mathieu Andro

Description

Instead of outsourcing tasks to service providers in countries where labor is cheap, libraries around the world increasingly appeal to crowds of Internet users, making their relationship with users more collaborative. These Internet users can be volunteers or paid workers, and they may contribute consciously, unconsciously or through games. They can provide the workforce, skills, knowledge or financial resources that libraries need in order to achieve otherwise unimaginable goals.


Number of pages: 331

Year of publication: 2018




Table of Contents

Cover

Title

Copyright

Preface

Introduction

1 A Conceptual Introduction to the Concept of Crowdsourcing in Libraries: A New Paradigm?

1.1. A rapidly growing economic model

1.2. Origin, definition and scope of crowdsourcing

1.3. Historical chronology of crowdsourcing

1.4. Philosophical and political controversies

1.5. Economic, sociological and legal consequences

1.6. Managerial, library science and technological consequences

2 Overview of Several Crowdsourcing Projects Applied to the Digitization of Libraries

2.1. Putting content online and participative curation: the Oxford’s Great War Archive and Europeana 1914–1918

2.2. Digitization on demand in the form of crowdfunding applied to digital libraries: the European eBooks on Demand network

2.3. Printing on demand (POD): the Espresso Book Machine

2.4. Participative OCR correction and participative transcription of manuscripts

2.5. Folksonomy, cataloguing and participative indexing

3 Overview and Keys to Success

3.1. Typologies and taxonomies of projects

3.2. Communication and marketing for recruiting volunteers

3.3. The question of motivations

3.4. Sociology of the contributors and community management

3.5. The question of the quality of the contributions

3.6. The evaluation of crowdsourcing projects

3.7. Change management

Conclusion

Bibliography

Index

End User License Agreement

List of Tables

1 A Conceptual Introduction to the Concept of Crowdsourcing in Libraries: A New Paradigm?

Table 1.1.

Multicriteria definitions of crowdsourcing

2 Overview of Several Crowdsourcing Projects Applied to the Digitization of Libraries

Table 2.1.

Statistics of EOD orders from the Bibliothèque interuniversitaire de Santé, from [KLO 14], translated by us

Table 2.2.

Rates offered by various institutions offering digitization and printing on demand

Table 2.3.

Examples of OCRization

Table 2.4.

Statistics of the number of Internet users necessary to correct a word, after [VON 08b]

Table 2.5.

Statistics collected in the literature regarding the reCAPTCHA project

Table 2.6.

Comparative costs between OCR correction via the AMT and via a service provider

Table 2.7.

Estimate of the costs not paid for OCR correction services because of the use of crowdsourcing

3 Overview and Keys to Success

Table 3.1.

Model of public participation inspired by [BON 09]

Table 3.2.

Activities of a digitization project crossed with the types of crowdsourcing. For a color version of the table, see www.iste.co.uk/andro/libraries.zip

Table 3.3.

Existing types of crowdsourcing applied to digitization

Table 3.4.

Types of crowdsourcing applied to digitization that remain to be invented

Table 3.5.

Taxonomy of crowdsourcing applied to digitization

Table 3.6.

Data collected in the literature about the sociology of the contributors to different projects

Table 3.7.

Distribution of the working time of crowdsourcing staff according to activities and missions, from [SMI 11]

Table 3.8.

Use of social metadata made by cultural institutions, according to the OCLC study [SMI 11]

Table 3.9.

Statistics before and after crowdsourcing for the California Digital Newspaper Collection, from [GEI 12]

Table 3.10.

Indicators of quantitative analysis of OCR correction or transcription projects

Table 3.11.

Indicators of quantitative analysis of content indexing projects

Table 3.12.

Indicators of quantitative analysis of digitization on demand projects

Table 3.13.

Other indicators of evolution

Table 3.14.

Calculation of what OCR correction would have cost without use of crowdsourcing for several representative projects, from [AND 15]

List of Illustrations

1 A Conceptual Introduction to the Concept of Crowdsourcing in Libraries: A New Paradigm?

Figure 1.1.

The artwork Ten Thousand Cents. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.2.

An artwork juxtaposing sheep

Figure 1.3.

13th-century sword whose photograph was published by the British Library. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.4.

Change in the number of searches for the word “crowdsourcing” on Google for each country, according to Google Trends. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.5.

Countries represented in the survey conducted by OCLC about social metadata, from [SMI 11]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.6.

Change in the number of publications on crowdsourcing indexed by Google Scholar applied to the digitization of libraries. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.7.

Relationships between human computation, collective intelligence and crowdsourcing, according to [HAR 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.8.

Position of crowdsourcing among neighboring areas, according to [SCH 10]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.9.

The first form of crowdfunding. From http://gallica.bnf.fr/ark:/12148/btv1b8509563b (consulted June 23, 2016). For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.10.

Percentage of Wikipedians by birthdate, according to Wikipedia

2 Overview of Several Crowdsourcing Projects Applied to the Digitization of Libraries

Figure 2.1.

Location of the members of eBooks on Demand network on July 8, 2014, from https://www.facebook.com/eod.ebooks/app_402463363098062 (consulted June 23, 2016). For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.2.

Extract from an EOD activity report, from [KLO 14]

Figure 2.3.

Orders per price class during the 2009–2011 period at the National Library of Slovenia, from [BRU 12]

Figure 2.4.

The form in which users prefer to consult documents, according to the survey related by [MUH 09]

Figure 2.5.

Positive/negative perception according to prices and delivery times, according to the survey related by [MUH 09]

Figure 2.6.

Areas of interest for users, from [GST 11]

Figure 2.7.

Reasons why users placed orders, from [GST 11]

Figure 2.8.

Photograph of an Espresso Book Machine, from ondemandbooks.com. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.9.

Distribution of EBM throughout the world, according to http://www.ondemandbooks.com/ebm_locations.php (consulted on July 9, 2014). For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.10.

Screen capture of a raw OCR text. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.11.

Screen capture of a digitized newspaper and its OCR. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.12.

Change in the number of line corrections on TROVE according to statistics obtained from the site itself (source: http://trove.nla.gov.au/system/stats?env=prod)

Figure 2.13.

Screen capture of TROVE. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.14.

Budget of the Transcribe Bentham project, according to [CAU 12b]

Figure 2.15.

Evolution of the number of accounts, manuscripts transcribed and completed between September 8, 2010 and March 8, 2011, according to [CAU 12b]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.16.

Button used by Transcribe Bentham. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.17.

The transcription interface of Transcribe Bentham, from [BRO 12]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.18.

Diagram representing how Internet users discovered the Transcribe Bentham project, according to [CAU 12a]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.19.

Diagram representing the distribution of contributors to Transcribe Bentham according to age, according to [CAU 12a]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.20.

Motivations of the volunteers of the Transcribe Bentham project, from [CAU 12a]

Figure 2.21.

Screen capture of the game Mole Hunt. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.22.

Screen capture of the game Mole Bridge. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.23.

Proportion of work carried out by the top 1%, 10% and 25% of contributors, from [CHR 11]

Figure 2.24.

Diagram explaining how reCAPTCHA works, according to the site Google.com. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.25.

Another diagram explaining how reCAPTCHA works, from [IPE 11]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.26.

The Turkish chess player, Tuerkischer schachspieler windisch by Karl Gottlieb von Windisch, 1783, public domain via Wikimedia Commons

Figure 2.27.

Number of HITs in November 2013, according to the Mechanical Turk tracker

Figure 2.28.

Distribution of Indian workers and American workers on AMT by sex, according to [IPE 10b]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.29.

Birth year of workers on the AMT, according to [IPE 10b]

Figure 2.30.

Educational level of workers on the AMT, according to [IPE 10b]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.31.

Average time dedicated to the AMT, according to [IPE 10b]

Figure 2.32.

Average income made from the AMT, according to [IPE 10b]

Figure 2.33.

Number of workers stating that AMT is their primary source of income, according to [IPE 10b]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.34.

Types of motivation according to the greater or lesser dedication of workers on the AMT platform, according to [KAU 11]

Figure 2.35.

Number of corrections on TROVE between 2008 and 2012, according to [HAG 13]

Figure 2.36.

Change in the amount of content compared to that of the number of corrections on TROVE, according to [HAG 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.37.

Proportion of genealogists among the contributors, according to a CDNC/Cambridge Public Library survey. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.38.

Distribution of volunteers by age group, according to a CDNC/Cambridge Public Library survey. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.39.

The types of documents distributed on TROVE compared to the types of documents that are corrected there, according to [HAG 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.40.

Most corrected types of documents on TROVE, according to [HAG 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.41.

Classification of contributors according to the number of lines corrected for the TROVE and CDNC projects, according to [ZAR 14]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.42.

Portion of the work accomplished by each contributor to the Old Weather project offering to transcribe meteorological observations, from Brumfield, manuscripttranscription.blogspot.fr, 2013. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.43.

Screen capture of the game Art Collector, first round, from [PAR 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.44.

Screen capture of the game Art Collector, round 2, choice of a piece, from [PAR 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.45.

Screen capture of the game Art Collector, round 2, trying to win a work, from [PAR 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 2.46.

Gender and age of the players of Art Collector, according to [PAR 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

3 Overview and Keys to Success

Figure 3.1.

Taxonomy of crowdsourcing, from [HAR 13]

Figure 3.2.

Taxonomy of the 4Cs of crowdsourcing, from [REN 14b]

Figure 3.3.

Time evolution since 2011 and forecast of the future gamification market, from [OLL 13]

Figure 3.4.

Serious games and gamification, from [DET 11a]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.5.

Screen capture of the What’s on the menu? press release: “Help the New York Public Library improve a unique collection. We need you! Help transcribe. It’s easy! No registration required!” from [VER 13]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.6.

Taxonomy of the motivations of volunteers in a crowdsourcing project. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.7.

Maslow’s Hierarchy of needs, By user: Factoryjoe (Mazlow's Hierarchy of Needs.svg) [CC BY-SA 3.0 (https://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons (consulted October 4, 2017). For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.8.

Diagram showing that a handful of Internet users are the source of the majority of contributions, from Brumfield, manuscripttranscription.blogspot.fr, 2013. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.9.

Distribution of staff activities in management of crowdsourcing projects, from [SMI 11]

Figure 3.10.

The working time of crowdsourcing project staff, from [SMI 11]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.11.

Frequency with which sites put new content online, from [SMI 11b]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.12.

The criteria for success, from [SMI 11]

Figure 3.13.

Number of unique visitors per month for crowdsourcing projects, from [SMI 11]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 3.14.

Number of contributors per month for cultural institutions, from [SMI 11]. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip


Digital Tools and Uses Set

coordinated by Imad Saleh

Volume 5

Digital Libraries and Crowdsourcing

Mathieu Andro

First published 2018 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd

27-37 St George’s Road

London SW19 4EU

UK

www.iste.co.uk

John Wiley & Sons, Inc.

111 River Street

Hoboken, NJ 07030

USA

www.wiley.com

© ISTE Ltd 2018

The rights of Mathieu Andro to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Control Number: 2017958934

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-78630-161-1

Preface

Instead of outsourcing certain tasks to service providers in countries where labor is cheap, libraries throughout the world are relying more and more on groups of Internet users, making their relationship with users more collaborative. After a conceptual chapter on the consequences of this new economic model for society and for libraries, the book presents an overview of projects in on-demand digitization, participative OCR correction (especially in the form of games, or gamification) and folksonomy. This panorama leads to a synthesis of crowdsourcing applied to digitization and digital libraries, analyzed from the perspective of information and communication sciences.

Acknowledgments

I would like to thank Imad Saleh, Professor at the Paragraphe Laboratory of Paris 8 University, for having agreed to supervise my thesis project, for his kindness and for his advice throughout the entire project; Samuel Szoniecky, Senior Lecturer at the Paragraphe Laboratory of Paris 8 University, for having agreed to be the co-director of my thesis and for having invited me to speak with his students; Ghislaine Chartron (Professor at the National Conservatory of Arts and Crafts), Stéphane Chaudiron (Professor at Charles-de-Gaulle University, Lille 3), Céline Paganelli (Senior Lecturer with HDR, accreditation to supervise research, at Paul Valéry University, Montpellier) and Alain Garnier (CEO of Jamespot and crowdsourcing advisor at the Groupement Français des Industries de l’Information, GFII) for having agreed to be examiners of my thesis; François Houllier, Institut National de la Recherche Agronomique (INRA), for letting me participate alongside him in a task force on citizen science in order to submit a report on the subject at the request of the relevant ministers; Odile Hologne, from the department for the promotion of scientific and technical information of the Institut National de la Recherche Agronomique (INRA), for having encouraged experimentation around INRA’s Numalire project within the framework of my work; Filippo Gropallo and Denis Maingreaud, from the companies Orange and Yabé, for their Numalire project, in which they allowed me to participate, and for their collaboration throughout this research project; Marc Maisonneuve and Emmanuelle Asselin, from the consulting firm TOSCA, for their collaboration on the book that we published together on software and platforms for developing digital libraries; Gaëtan Tröger, Ecole des Ponts ParisTech, for his collaboration in the study that we carried out on the visibility and consultation statistics of digital libraries; Pauline Rivière, Sainte-Geneviève Library, and Anaïs Dupuy-Olivier, Académie de Médecine, for their collaboration on the report on the Numalire experiment that we wrote together; Robert Miller, Internet Archive, for our collaboration at Sainte-Geneviève Library, which became the first library in France to participate in the Internet Archive; Stéphane Ipert, Centre de Conservation du Livre, for the collaborations and interesting discussions that we had; Pierre Beaudoin and Rémi Mathis, former and current presidents of Wikimedia France, an association with which collaborations with Wikisource were carried out (National Veterinary School of Toulouse in 2008) or only envisioned (Sainte-Geneviève Library); Valérie Chansigaud, science historian and Wikipedia contributor, with whom first contact was established at the museum, followed by a pilot experiment in digitization and participative OCR correction conducted in 2008 at the National Veterinary School of Toulouse; Gilonne d’Origny, from the company ondemandbooks.com, with whom we attempted, unsuccessfully, the first installation of an Espresso Book Machine in France; Daniel Teeter, from the company Amazon, for the interesting partnership opportunity that was nearly established; Juan Pirlot de Corbion, founder of chapitre.com and YouScribe, for the passionate discussions that we had over the course of our meetings; Daniel Benoilid, founder of the paid crowdsourcing company Foule Factory, for the discussions that we had; Jean-Pierre Gerault, CEO of the company I2S, leader in the manufacture of scanners for heritage digitization, president of the Comité Richelieu and CEO of Publishroom, for the interesting discussions that we had; Arnaud Beaufort, National Library of France, whom I met during the Wikimedia days at the National Assembly and with whom I later had an interesting conversation; Silvia Gstrein and Veronika Gründhammer, University of Innsbruck, for having invited me to speak at the Ebooks on Demand 2014 conference; Yves Desrichard and Armelle de Boisse, Ecole Nationale Supérieure des Sciences de l’Information et des Bibliothèques, for having allowed me to speak during the “Quoi de neuf en bibliothèques ?” days over the past five years; Thierry Claerr, Ministry of Culture and Communication, who allowed me to speak regularly at the ENSSIB and asked me to contribute to a collaborative work, and with whom I had some very enriching discussions; Jean-Marie Feurtet, Agence Bibliographique de l’Enseignement Supérieur, for our collaboration on a project to pool a shared digital library and for having invited me to speak at ABES in 2011; Nicolas Turenne, Institut National de la Recherche Agronomique (INRA), for having invited me to present the preliminary results of this work at the seminar entitled “Digital Traces” (Cortext group, Institute for Research and Innovation in Society); Pierre-Benoît Joly, director of the Institute for Research and Innovation in Society (IFRIS), for having invited me to teach a master’s level course in Digital Studies and Innovation (NUMI); SNCF for the comfort of the train trips I took while writing this thesis; Google for the Google Drive service, which was used to write the thesis while giving real-time access to my director, my collaborators and my contacts, who could then add comments; my wife Véronique and my three children Terence, Orégane and Eloïse.

I also want to thank the following people for the constructive comments that they added to the text of the thesis made available in its first draft on Google Drive: Christine Young (proofreading the article in English), Wilfrid Niobet (one idea, eight comments, six corrections), Célya Gruson-Daniel (three comments, four corrections), Olivia Dejean (nine corrections), Michaël Jeulin (seven corrections), Catherine Thiolon (ten comments), Caroline Dandurand (five comments), Diane Le Hénaff (three comments), Sophie Aubin (two comments), Nicolas Ricci (one comment), Pauline Rivière (one comment), Frédérique Bordignon (one comment), Sylvie Cocaud (one comment), Marjolaine Hamelin (one comment), Silvère Hanguehard (one comment), Christine Sireyjol (one comment), Odile Viseux (one comment), Véronique Decognet (one comment), Dominique Fournier (two corrections) and all of the “unknown soldiers” who remained anonymous in their comments (82 corrections).

Mathieu ANDRO November 2017

Introduction

Libraries already outsource certain tasks, such as entering bibliographic records, cataloguing, indexing or OCR correction, to service providers in countries where labor is inexpensive. This outsourcing has remained within a limited contractual framework and has not profoundly changed the underlying ways in which libraries work. With the development of crowdsourcing, however, it becomes possible to imagine outsourcing some of these tasks not to service providers but to “crowds” of Internet users, and therefore to have amateurs carry out some of the professionals’ work. Crowdsourcing thus changes the paradigm upon which libraries are based, which currently centers largely on the creation and conservation of collections. It also changes the relationship between the service providers, namely the librarians, and their consumers, namely the users, who are themselves becoming active producers of services. Crowdsourcing could also call into question the collection management policies of libraries, which anticipate needs by building a supply that is not directly or immediately determined by demand. This is especially the case with on-demand digitization through crowdfunding, a form of crowdsourcing that calls not on the work of crowds but on their financial resources, and with printing on demand, which is inseparable from it. With these on-demand economic models, collection management policy is finally shared with users, who decide what will be digitized and/or printed. In this way, the collections become the work of the users.

This book aims to answer the question of whether to rely on crowdsourcing, for library professionals as well as for students, researchers in information and communication sciences and, more generally, people interested in collective intelligence projects. It is the result of a thesis in information and communication sciences that combines action research, an experiment and an analysis of the literature [AND 16]. The main contributions of this thesis have already been summarized in an article [AND 17].

Beyond the questions of costs and benefits, advantages and disadvantages, the book addresses how the librarian’s profession might evolve and refocus on its distinctive skills. It also has the scientific goal of contributing to the theoretical and conceptual knowledge of crowdsourcing as an economic model.

This work is limited to the application of crowdsourcing to digitization and digital libraries. Since the 1990s, the digitization of documents has been widespread in libraries. Today, with mass digitization and the development of gigantic digital libraries such as Google Books, which has crossed the threshold of 30 million books, Internet Archive, HathiTrust or Europeana, the “harvester” of European digital libraries, it is becoming more and more difficult to identify printed matter that has not yet been digitized and still deserves to be, among the 130 million1 titles printed since the invention of printing.

A significant part of what libraries have digitized has never been put online. This leads to duplicate digitization, and the files lie “sleeping” on CD-ROMs, DVDs or external hard drives whose lifetime is limited. Developing a digital library can, in fact, be expensive in terms of software administration and servers, and the result can be disappointing in terms of functionality, durability, costs and visibility. In 2012, we published a study of the software programs YooLib (Polinum), Invenio (CERN), ORI-OAI (universities), DSpace (DuraSpace), DigiTool (Ex Libris), Mnesys (Naoned), ContentDM (OCLC), EPrints (University of Southampton), Greenstone (University of Waikato) and Omeka (George Mason University) [AND 12]. In this study, we found that it was more advantageous for libraries to participate in a shared digital library such as Internet Archive, from the point of view of costs (free), functionality (optical character recognition and conversion into EPUB and MOBI for e-readers, implemented directly on archive.org) and permanent archiving (multiple mirror servers around the world), as well as visibility. Indeed, the position of a website in Google’s search results depends in part on its PageRank, which itself depends largely on the number and weight of the links that point to it. Under these conditions, a digital library with a large amount of content tends to attract more links, and therefore a better PageRank and better visibility on the web, and will generate much more web traffic than a small digital library with very little content.
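The link-based argument above can be illustrated with the textbook power-iteration formulation of PageRank, using the damping factor of 0.85 commonly cited from Brin and Page’s original description. This is a minimal sketch under stated assumptions, not Google’s ranking system, which combines link analysis with many other signals; the function name `pagerank`, the page names and the link graph are all hypothetical. It shows only that, all else being equal, a page that receives links from more pages ends up with a higher score.

```python
# Simplified PageRank (power iteration), for illustration only.
# The link graph below is hypothetical; real search ranking combines
# PageRank-style link analysis with many other signals.

def pagerank(links, damping=0.85, iterations=100):
    """Return a dict mapping each page to its PageRank score.

    `links` maps each page to the list of pages it links to.
    Scores sum to 1; pages with no outgoing links ("dangling"
    pages) spread their score evenly over every page.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)  # deterministic iteration order
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}

    for _ in range(iterations):
        # Every page receives the same baseline "random jump" share.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its score equally among the pages it links to.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical web: a large shared digital library that many sites link to,
# and a small institutional digital library with a single inbound link
# from an outside blog.
links = {
    "blog_a": ["big_library"],
    "blog_b": ["big_library"],
    "blog_c": ["big_library"],
    "blog_d": ["small_library"],
    "wiki": ["big_library", "small_library"],
    "big_library": ["wiki"],
    "small_library": ["wiki"],
}

ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda item: -item[1]):
    print(f"{page:15} {score:.3f}")
```

In this toy graph the two libraries have identical outgoing links; the large library outranks the small one solely because more pages link to it.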

As Waibel [WAI 08] maintains, there are two schools of thought: an old school that believes each library needs to create its own digital library and try to attract Internet users to it, and a new school that believes that, by going beyond institutional communication and better satisfying the needs of Internet users, libraries would be better off participating in the collective digital libraries that Internet users already visit, such as Internet Archive or even Flickr. This is also our point of view. With enough web traffic, libraries can encourage Internet users to participate.

The introductory part of the book attempts to articulate its context and the methodology that was used.

Chapter 1 addresses the philosophical, political and economic representations of crowdsourcing and its consequences regarding the way in which libraries function. This conceptual chapter contains, in particular:

– a critical discussion regarding the definition of crowdsourcing;

– an original chronology of its historical origins;

– an analysis of its conceptual origins in philosophical currents that are sometimes diametrically opposed and, in particular, a conceptual contribution around the law of value;

– a reflection on the concept of the wisdom of crowds;

– an analysis of the diverse critiques of crowdsourcing applied to digital libraries that some people could today describe as the “uberization” of digital libraries.

Chapter 2 presents a selection of projects, organized by type of task, including:

– putting content online and participative curation;

– digitization and printing on demand in the form of crowdfunding;

– participative correction of OCR and participative transcription of manuscripts;

– folksonomy.

This chapter contains data and information collected from the literature for each project.

Original analyses for each major type of project are given in the conclusion of Chapter 2.

In Chapter 3, analyses from the point of view of information and communication sciences and a state of the art are offered with, notably:

– an original taxonomy of crowdsourcing in digital libraries distinguishing explicit (or conscious), voluntary and paid crowdsourcing and implicit (or unconscious) crowdsourcing, gamification and crowdfunding;

– an analysis of the motivations of libraries and the conditions necessary for the development of crowdsourcing projects;

– a taxonomy of the motivations of Internet users who contribute to their projects;

– analyses of the possible rewards and remuneration;

– clarification regarding the communication necessary for recruitment;

– a discussion of the community management specific to this type of project;

– analyses of the quality of the data produced and of its reintegration;

– a reflection on the evaluation of crowdsourcing projects.

1 The number of books printed since Gutenberg’s invention of the printing press was estimated at 129,864,880 by Leonid Taycher, a Google engineer, in a blog post published on August 5, 2010.

1 A Conceptual Introduction to the Concept of Crowdsourcing in Libraries: A New Paradigm?

1.1. A rapidly growing economic model

1.1.1. What made this new economic model possible

The number of Internet users keeps growing, as does the time they spend online. Building the online encyclopedia Wikipedia took some 100 million cumulative hours. As Clay Shirky pointed out at the Wiki-Conference NYC on August 28, 2008, Americans watch 200 billion hours of television every year; if they devoted that time to creative activities instead, they could create 2,000 projects the size of Wikipedia each year (200 billion hours divided by the 100 million hours Wikipedia required).

During a 2011 TED talk, Luis von Ahn1 pointed out that humanity built the pyramids and dug the Panama Canal with only 100,000 people, whereas the Internet and social networks now make it possible to bring together 750 million people, for example in a project such as reCAPTCHA, which corrects Optical Character Recognition (OCR) output. Cultural institutions therefore potentially have access to an enormous “reservoir of goodwill”, provided they know how to tap into it.

Participatory models emerged with the development of Web 2.0. The term was coined by DiNucci in 1999 [NGU 12] or by Dale Dougherty in 2004 [SAR 14], and popularized by Tim O’Reilly in 2005 [TRA 08]. Crowdsourcing now means that Internet users no longer have to settle for passively consuming Web content within a hierarchical, one-way and static dissemination model (Web 1.0), but can actively participate in its development. The dissemination of information has become reciprocal, interactive and dynamic. The Internet user therefore ceases to be a consumer, a reader and a passive recipient who is content to browse, and becomes a producer, an author, an active source of information, a contributor who can take part in writing and modifying content on the Web (comments, tags, wikis, social networks, etc.) and in producing data and metadata. The authority of data has thus moved from the server to the client [BAI 12]. As telecommunications expert Benjamin Bayart emphasizes, if printing taught people to read, the Internet is now teaching them to write2.

Well before Web 2.0, the invention of “self-service”, which gave consumers direct access to merchandise without a vendor as intermediary and which libraries adopted in the form of open-shelf collections, was an early form of integrating the consumer into the production process. This economic model was invented by Aristide Boucicaut in his department store “Le Bon Marché”, whose slogan was “self-service, free to touch”. As described in Zola’s Au bonheur des dames (translated into English as The Ladies’ Delight or The Ladies’ Paradise), it gave customers the opportunity to handle the merchandise actively and freely, without a shopkeeper as intermediary, and ultimately to take over part of the work of merchants and store owners. Broadly speaking, production thus seems to have gradually lost its central place to consumption and to the consumer society that developed after the Second World War.

Later, the “just-in-time” model developed at Toyota consisted of manufacturing products “on demand” for the customer, with supply synchronized with and driven by demand, in order to avoid unsold stock. This model of “manufacturing without waste”, also called “lean manufacturing” or “fat-free manufacturing”, consists of producing only what is strictly needed, with the right means, at the moment it is needed and at the lowest possible cost to the producer, by handing the decision to start production over to the consumer. The model arose from the difficulty Japanese stores had in storing merchandise for lack of space, and from the need to resupply only when stock ran out. It was also largely inspired by the way supermarkets operate. Similarly, the clothing chain Zara keeps only a single month’s worth of inventory and can therefore adapt its production more closely to market trends, producing models according to sales [SUR 04]. Advertising, too, contributes to integrating the consumer into the production process: when we watch a television program or visit a website, we produce statistics and data, and when we view advertisements, we also produce value. We can therefore speak of an attention economy [CIT 14]. The decision to visit one site rather than another could thus be likened to a vote, one that contributes to the producers’ output and revenues. In libraries, this model has found applications in on-demand digitization financed by participatory funding (crowdfunding) and in printing on demand, both of which are addressed in this book.

Today, crowdsourcing extends the relatively old movement of integrating the consumer into the production process. Born from a cultural shift toward more participative and collaborative approaches, it was made technologically possible by Web 2.0, that is, by the possibility of having large numbers of people with free time on the Web work remotely on collective projects. It draws particular inspiration from the way communities of free software developers work. By calling on a crowd of Internet users, it becomes possible to carry out very quickly tasks that would previously have been impossible to complete, or even to imagine, or that would have required enormous amounts of time. In short, crowdsourcing “is a way to find a needle in a haystack”, as Lebraty and Lobre [LEB 15] put it. Sagot et al. [SAG 11] speak of the “myriadization of divided work” and of microworking. We could also speak of the “taskification” of work. As Levi [LEV 14] recalls, crowdsourcing has some similarities with the construction of medieval cathedrals, which required the capacity to “think big”, to delegate, to organize every task and, above all, to mobilize a large number of people around a common vision and goal. To take a more recent example, it is also what Alfred Sloan of General Motors described as “group management”, which consists of involving numerous collaborators in making the most important decisions.

We illustrate this idea with contemporary works of art in Figures 1.1 and 1.2.

Figure 1.1. The artwork Ten Thousand Cents3. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip

Figure 1.2. An artwork juxtaposing sheep4

Beyond art, crowdsourcing has already found applications in many areas. In video, for example, YouTube and Dailymotion could not function without content posted online by Internet users. Crowdsourcing has also found applications in music, politics, fashion, banking, tourism, innovation, cartography, the search for missing planes, medicine, scientific research, publishing, translation and journalism. It is also a topical issue in the GLAM sector (galleries, libraries, archives and museums), and in digital libraries in particular, which are the subject of this book.

1.1.2. Application to digital libraries

For libraries, digitizing their collections and disseminating them on the Web means sharing the same space as their users, which opens the way to many synergies and collaborations. The amount of content that cultural institutions make available on the Web has grown exponentially, and there is no shortage of painstaking work to be done indexing, describing and correcting it. Their budgets and workforces, however, have followed the opposite trend and often fall far short of what is needed. As a result, many goals are out of reach, and other projects are unimaginable without outside help. In addition, the physical and online audiences of these institutions are less and less content with the role of passive consumers of cultural information and increasingly want to get involved in serving heritage and culture. In cultural institutions, openness to interaction with a participating public and with volunteers long preceded the emergence of Web 2.0. However, the relational Web has fostered a participative culture on which the model of crowdsourcing in libraries draws.

In digital libraries, crowdsourcing thus makes it possible to complete tasks that, for lack of financial and human resources, could not be undertaken without the help of volunteer Internet users. It can be used, for example, to improve the quality of metadata or to enrich it (comments, tags, analyses, etc.), to draw on the knowledge and skills of scholars, to build communities around projects, to increase the use of the resources produced, to raise public awareness of the conservation of shared cultural heritage, and to generate more interaction, innovative ideas and collaboration. Within the online public, for instance, one person might be able to identify a church in a photograph, a scholar might provide information about its construction and history, and an elderly villager might recognize a person in the photo. The knowledge available to teams of librarians is far too limited to answer all of these questions, whereas the knowledge present in the crowd of Internet users is virtually limitless.

The British Library understood this well when, on August 3, 2015, it published a call to Internet users on britishlibrary.typepad.co.uk titled “Help Us Decipher this Inscription”. Between August 3 and 18, 2015, the post was shared almost 32,000 times and generated more than 11,000 shares on Facebook and 9,000 tweets; between August 3 and 10, it also received 115 comments directly on the blog.

Figure 1.3. 13th-century sword whose photograph was published by the British Library5. For a color version of the figure, see www.iste.co.uk/andro/libraries.zip