Collective Intelligence and Digital Archives

Digital Tools and Uses Set, coordinated by Imad Saleh

This book presents the most up-to-date research from different areas of digital archives, showing how and why collective intelligence is being developed to organize and better communicate new masses of information. Current archive digitization projects produce an enormous amount of digital data (Big Data). Thanks to the proactive approach of large public institutions, this data is increasingly accessible. Despite the recent stabilization of technical and legal frameworks, the use of this data has yet to be enriched by processes such as collective intelligence. Exploring the fields of digital humanities, audiovisual archives, preservation of cultural heritage, crowdsourcing and the recovery of scientific archives, this book presents and analyzes concrete examples of collective intelligence applied to digital archives.
Number of pages: 357
Publication year: 2017
Cover
Title
Copyright
1 Ecosystems of Collective Intelligence in the Service of Digital Archives
1.1. Digital archives
1.2. Collective intelligence
1.3. Knowledge ecosystems
1.4. Examples of ecosystems of knowledge
1.5. Solutions
1.6. Bibliography
2 Tools for Modeling Digital Archive Interpretation
2.1. What archives are we speaking of? Definition, issues and collective intelligence methods
2.2. Digital archive visualization tools: lessons from the Biolographes experiment
2.3. Prototype for influence network modeling
2.4. Limits and perspectives
2.5. Conclusion
2.6. Bibliography
3 From the Digital Archive to the Resource Enriched Via Semantic Web: Process of Editing a Cultural Heritage
3.1. Influencing the intelligibility of a heritage document
3.2. Mobilizing differential semantics
3.3. Applying an interpretive process to the archive
3.4. Assessment of the semiotic study
3.5. Popularizing the data web in the editorialization approach
3.6. Archive editorialization in the Famille™ architext
3.7. Assessment of the archive’s recontextualization
3.8. Bibliography
4 Studio Campus AAR: A Semantic Platform for Analyzing and Publishing Audiovisual Corpuses
4.1. Introduction
4.2. Context and issues
4.3. Editing knowledge graphs – the Studio Campus AAR example
4.4. Application to media analysis
4.5. Application to the management of individuals
4.6. Application to information searches
4.7. Application to corpus management
4.8. Application to author publication
4.9. Conclusion
4.10. Bibliography
5 Digital Libraries and Crowdsourcing: A Review
5.1. The concept of crowdsourcing in libraries
5.2. Taxonomy and panorama of crowdsourcing in libraries
5.3. Analyses of crowdsourcing in libraries from an information and communication perspective
5.4. Conclusions on collective intelligence and the wisdom of crowds
5.5. Bibliography
6 Conservation and Promotion of Cultural Heritage in the Context of the Semantic Web
6.1. Introduction
6.2. The knowledge resources and models relative to cultural heritage
6.3. Difficulties and possible solutions
6.4. Conclusion
6.5. Bibliography
7 On Knowledge Organization and Management for Innovation: Modeling with the Strategic Observation Approach in Material Science
7.1. General introduction
7.2. Research context: KM and innovation process
7.3. Methodological approach
7.4. Conceptual modeling for innovation: technological transfer
7.5. Conclusion: principal results and recommendations
7.6. Bibliography
List of Authors
Index
End User License Agreement
5 Digital Libraries and Crowdsourcing: A Review
Table 5.1. Chronology of crowdsourcing in libraries
Table 5.2. The conceptual origins of crowdsourcing
Table 5.3. Taxonomy of crowdsourcing and panorama of the projects for digital libraries
6 Conservation and Promotion of Cultural Heritage in the Context of the Semantic Web
Table 6.1. The three main classes used by Europeana
Table 6.2. Some important relationships defined by the EDM schema
Table 6.3. An example of domain terms
Table 6.4. The result of evaluation measurements for a sample of classes
7 On Knowledge Organization and Management for Innovation: Modeling with the Strategic Observation Approach in Material Science
Table 7.1. Prefixed structures to the sentence: introductory proposition
Table 7.2. Noun structures in a sentence: NP
Table 7.3. Relative structures in a sentence: relative clause (REL)
Table 7.4. Verbal structures in a sentence: VP
1 Ecosystems of Collective Intelligence in the Service of Digital Archives
Figure 1.1. Principles of differential semantics
Figure 1.2. Modeling an empty being
Figure 1.3. Recursive cycle of sources
Figure 1.4. Mapping the influence networks. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 1.5. Editing the archives via the semantic web. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 1.6. Studio Campus AAR
Figure 1.7. Crowdsourcing in libraries. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 1.8. Conservation and promotion of cultural heritage
Figure 1.9. Management of knowledge for innovation
2 Tools for Modeling Digital Archive Interpretation
Figure 2.1. Publication rhythm of scientific poetry books
Figure 2.2. Map of the locations visited by Jules Michelet. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 2.3. Relationship between the types and locations of stays (1st attempt). For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 2.4. Palladio network representation
Figure 2.5. RelFinder, Wikipedia’s structured data exploration tool. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 2.6. Keshif faceted browser
Figure 2.7. Table, a Gephi-like software for data visualization
Figure 2.8. Table – a wide variety of graphics for refining readings of relationship types
Figure 2.9. The specialties of the scholars Michelet visited throughout his life
Figure 2.10. Are the scholars visited by Michelet young researchers or old authorities of renown?
Figure 2.11. Comparison of the categorization choices among researchers. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 2.12. Edit Influ: addition of an actor reference
Figure 2.13. Edit Influ: addition of a conceptual reference
Figure 2.14. Edit Influ: addition of a reference to a document
Figure 2.15. Edit Influ: addition of a relationship
3 From the Digital Archive to the Resource Enriched Via Semantic Web: Process of Editing a Cultural Heritage
Figure 3.1. Frame from shot no. 48
Figure 3.2. Frame from shot no. 47
Figure 3.3. Frame from shot no. 49
Figure 3.4. Screenshot of the “Ancenis” note in French on the website Wikidata.org
Figure 3.5. Stage of semantically enriching the Famille architext
Figure 3.6. Manual annotation stage on the Famille architext
Figure 3.7. Navigation window of the Famille architext via graph. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 3.8. Advanced research interface on the Famille architext
Figure 3.9. Subject creation interface on the Famille architext
4 Studio Campus AAR: A Semantic Platform for Analyzing and Publishing Audiovisual Corpuses
Figure 4.1. The Campus AAR environment with its Studio; graphic created by F. Lemaitre
Figure 4.2. Screenshot of the Studio Campus AAR menu bar
Figure 4.3. Simple restriction on Stanford’s Protégé
Figure 4.4. Recursive restriction on Stanford’s Protégé
Figure 4.5. Tree of restrictions
Figure 4.6. SPARQL query generated from the tree represented in Figure 4.5
Figure 4.7. Constraints relaxation
Figure 4.8. Knowledge graph editing interfaces
Figure 4.9. Example of a knowledge graph for medium analysis
Figure 4.10. Example of a generated visual segment map and the associated metadata
Figure 4.11. Dynamic formula
Figure 4.12. Suggestions of named entities
Figure 4.13. Representation of semantic queries
Figure 4.14. Graph of a semantic query
Figure 4.15. Results of a semantic query
Figure 4.16. Simplified architecture of the transformation engine
Figure 4.17. General architecture of the publication system
Figure 4.18. Publication graph and web page
5 Digital Libraries and Crowdsourcing: A Review
Figure 5.1. Page from an old thesis saved at the National Veterinary School of Toulouse for which OCR correction is proposed (via Wikisource)
Figure 5.2. “Türkischer Schachspieler” by Karl Gottlieb von Windisch, 1783. Public domain, via Wikimedia Commons
Figure 5.3. Screenshot of the Digitalkoot OCR correction game [CHR 11]
Figure 5.4. Diagram explaining how reCAPTCHA works according to https://www.google.com/recaptcha
Figure 5.5. The Espresso Book Machine according to http://ondemandbooks.com
Figure 5.6. Number of publications indexed in Google Scholar as a function of their years of publication and responding to the search “crowdsourcing AND library AND digitization”
Figure 5.7. Relative influence of different countries in the thesis bibliography (216 publications). For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Figure 5.8. Taxonomy of the motivations of Internet users who participate in crowdsourcing projects in libraries
Figure 5.9. A few Internet users produce the largest part of contributions (according to Brumfield’s blog manuscripttranscription.blogspot.fr in 2013)
6 Conservation and Promotion of Cultural Heritage in the Context of the Semantic Web
Figure 6.1. An example of Dublin Core elements
Figure 6.2. LIDO use with Europeana elements
Figure 6.3. An example of LIDO use
Figure 6.4. An example of using the EDM model
Figure 6.5. The creation of semantic links between the local schema and Europeana
Figure 6.6. Semantic richness/CIDOC CRM event-oriented approach
Figure 6.7. An example of an ontological pathway
Figure 6.8. Example of functional divisions and ontological pathways
Figure 6.9. Graphic interface of the “Path Finder” tool
Figure 6.10. An extract of the detailed information on YALTA in the TGN thesaurus
Figure 6.11. An example of the extension of CIDOC CRM by CRM-EH
Figure 6.12. An example of the Iconclass classification for thematic browsing
Figure 6.13. An example of several correct models of the same ideas
7 On Knowledge Organization and Management for Innovation: Modeling with the Strategic Observation Approach in Material Science
Figure 7.1. Identification of a cognitive writing grammar [SID 02]
Figure 7.2. MASK method structuring (source: http://aries.serge.free.fr)
Figure 7.3. Schema of the C-K theory K [SOU 10]
Figure 7.4. Example of a syntactic graph in NooJ for extracting a simple NP
Figure 7.5. Example of a syntactic graph in NooJ for extracting embedded NPs
Figure 7.6. Stability of syntagmatic and cognitive grammar on a corpus
Figure 7.7. Implementation of NooJ to extract NP from corpuses
Figure 7.8. XML corpus used to extract information
Figure 7.9. Example of the NooJ class dictionary
Figure 7.10. NooJ syntactical graphs for extracting NPs (NP or N′)
Figure 7.11. Labeling chemical formulas present in a specialized text
Figure 7.12. Example of a K tree view for the innovative creation of the “spintronic”
Digital Tools and Uses Set
Coordinated by Imad Saleh
Volume 1
Edited by
Samuel Szoniecky
Nasreddine Bouhaï
First published 2017 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2017
The rights of Samuel Szoniecky and Nasreddine Bouhaï to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2016957668
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN 978-1-78630-060-7
The management of digital archives is crucial today and will remain so for years to come. It is estimated that every two days, humanity produces as much digital information as was produced during the two million years that preceded our era. Added to this human production is the information that machines continuously generate. With digital memory becoming ever cheaper, most of this information is stored in vast databases. By 2025, all of this “big data” will amount to nearly eight zettabytes (a zettabyte is a trillion gigabytes) [SAD 15]. Today, very few human activities do not generate digital archives; each day we feed digital workflows even outside our use of computers, telephones and other digital devices. It is enough to turn on a light, run errands, take public transport or watch television to produce digital traces that, for the most part, will never be accessible to us, but which are compiled, indexed and processed in server farms and management centers.
The status of these digital archives is obviously not the same when dealing with a tweet sent automatically by a cow, the digitization of a course by Gilles Deleuze or the 3D modeling of the Citadelle Laferrière near Cap-Haïtien. Even if these archives are all ultimately composed of 0s and 1s, and are therefore formally comparable to one another, their importance is not equivalent; it varies with the contexts of space, time and actors confronted with this information. The tweet sent by a digital device tracking a cow’s activities is probably unimportant to most of us, but for the milk producer who wants to follow his herd’s movements and correlate milk composition with the pastures grazed, it is important to know that a certain pasture influences the amount of fat in the milk. Similarly, a passage in Gilles Deleuze’s courses where he speaks of importance as a fundamental criterion may seem almost meaningless to some, while it takes on very great importance for the researcher interested in the relationship between ethics and ontology, and also for the reader of these lines, who at this very moment is thinking about the concept simply by reading it:
“What does that mean, this category? The important. No, it is agreed; that is aggravating, but it is not important. What is this calculation? Isn’t it that? Isn’t it the category of the remarkable or the important that would allow us to establish proportions between the two intransigent meanings of the word proportion? Which depends on and results from the intensive part of myself and which rather refers to the extensive parts that I have.”
These proportions between inner being and outer having are quite easily transposed into the domain of digital archives. Due to their dynamic, evolving and interactive character, digital archives are ecosystems in which each element can be analyzed as an existence made up of “intensive parts” and “extensive parts”. The example of the digitization of the fort at Cap-Haïtien sheds light on the importance of digital archives illustrating this “intensive/extensive” double dimension that Deleuze emphasizes: the correlation between an exterior dimension connected to having and the material, and an interior dimension connected to being and the immaterial. In the case of this historic monument, classified as a UNESCO World Heritage Site, digital archiving is a chance to develop both a material and an immaterial heritage in one of the poorest countries in the world. The creation of an international research program focusing on augmented reality, the teaching and training of students on these issues, and the mobilization of artists for innovative uses of these technologies are three examples of immaterial heritage development. At the same time, these activities make it possible to consider material heritage development through an economy that uses these digital archives to create new services aimed at tourists on cruises passing through the country. Here, the impact of the digital archive goes beyond the scope of a company, or even of knowledge, by affecting the whole economy of a country through the joint development of material and immaterial heritage.
Consequently, the fundamental issue of digital archives consists in examining their importance at both the material and the immaterial level, in order to estimate their relevance in terms of the balance between the finality of the digitization process and the uses made of it. Given the scope that digital archives have today and their impact on our lives, we must examine their importance at both the personal and the collective level. These investigations can only be carried out through long-term collective work, through a pooling of analyses and the constitution of a collective intelligence capable of giving humanity the means to avoid handing over to machines the full responsibility for the semantic choices necessary for the interpretation of archives [CIT 10]. Some solutions already exist or are being developed, such as the initiatives taken by the W3C to harmonize information management practices; others remain to be discovered, whether technical, epistemological, political or ethical.
It is rather trivial to explain collective intelligence through the analogy of the anthill [FER 97] or other insect societies [PEN 06]. This conception leads to a very partial vision of the phenomenon and, in the case of human organizations, to a questionable ethical position. A collective intelligence modeled on insect societies tends to reduce the human participant to a simple, basic being whose complexity must be removed so that each individual reacts like the whole. As Bernard Stiegler remarks, therein lie the stakes of a war for control of societies through symbols [STI 04]. This is, moreover, one of the recurring criticisms of collective intelligence: that it is intelligent in name only, and serves merely to centralize memory the better to control it, without allowing new knowledge to emerge [MAY 06].
What sets humans apart from ants is their ability to reflect on the information flows within them and thus exercise a reflective consciousness [LEV 11]. As Yves Prié explains, reflexivity is the ability to get back in touch with oneself in order to construct, from memory, representations allowing the regulation of one’s actions [PRI 11]. This definition, which places reflexivity in an individual context, can nevertheless also be understood in a collective framework, where individuals share their reflexivity to work together in accordance with each individual’s consciousness. There we find the basic principles of a science that aims to elaborate consensus, and which allows us to define collective intelligence as the sharing of reflexivity in order to accomplish an action that could not be done by a single person.
But before they can benefit from this collective “ability to act” [RAB 05], the actors must agree to redirect their personal interests towards an altruistic sharing of their will. This becomes possible by formalizing and sharing knowledge while accepting its validation by collective constraints, so as to make the work interoperable and reusable by a community. The whole difficulty of collective intelligence lies in this ability of individuals to accept restraining their own expression through formalism, for it quite often challenges habits of thought. We must not deceive ourselves about the primary motivations of humans, which do not necessarily tend towards the ethical development of harmonious collaboration. As Yves Citton states, it is sometimes necessary to use tricks to make practices evolve and to anchor them in new social organizations [CIT 08]. It is telling that the research conducted by Anita Woolley to define a collective intelligence factor confirms that the abandonment of selfish interests in favor of an altruistic approach increases a group’s capacity for collective intelligence. It shows that each individual’s intelligence has far less impact than the social sensitivity of a group’s members, in particular their ability to listen without monopolizing the discussion [WOO 10].
Restraining individual intelligence in favor of a collective action today passes through technical devices, and particularly through graphic interfaces that formalize semiotic systems whose goal is to facilitate individual expression within the constraints necessary for sharing it. The use of a computer language such as wiki markup is a clear example of accepting such a constraint to make an individual expression interoperable and to complete an encyclopedia project. These collective intelligence projects do not stop at a computer language; they bring with them an entire knowledge ecosystem within which the projects can develop through the successive accumulation of individual actions.
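The role of wiki syntax as a formalizing constraint can be sketched minimally. The converter below is a toy illustration (not MediaWiki's actual parser), showing how a shared notation turns an individual sentence into an interoperable, hyperlinked contribution:

```python
import re

# [[Target]] and [[Target|label]] links are the constraint through which an
# individual contribution becomes interoperable with the rest of the
# encyclopedia. Minimal illustrative converter, not MediaWiki's.

def wiki_links_to_html(text):
    """Turn [[Target]] and [[Target|label]] into HTML anchors."""
    def repl(match):
        target = match.group(1)
        label = match.group(2) or target
        return f'<a href="/wiki/{target.replace(" ", "_")}">{label}</a>'
    return re.sub(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]", repl, text)

print(wiki_links_to_html("See [[Collective intelligence|this notion]]."))
# -> See <a href="/wiki/Collective_intelligence">this notion</a>.
```

The constraint (a fixed link syntax) is precisely what lets each contribution be machine-read and woven into the collective whole.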
These are the issues whose solutions we are going to analyze, taking concrete examples from domains as diverse as corporate innovation and personal archives, all of which share the use of collective intelligence to exploit digital archives. To give strong coherence to these diverse examples and to handle the full complexity of the issues they present, we will analyze the solutions through the analogy of ecosystems. In these solutions, which implement collective intelligence approaches around the use of digital archives, we will see how such practices can be analyzed by understanding information not as inert objects but as autonomous beings that develop distinctive ways of life [LAT 12].
The goal of our proposed model is to develop a generic method for analyzing ecosystems of knowledge, which form a complex universe of simultaneously complementary and antagonistic relationships between a multitude of human, mechanical, institutional, conceptual, documentary and other actors. With this model, we hope to provide researchers with the means to describe their fields of research and the arguments they defend through the modeling of informational beings. The aim is to render analyses interoperable through the automatic comparison of these beings. To achieve comparative analyses of these ecosystems, we model informational beings by crossing Gilles Deleuze’s Spinozist logical principles [DEL 68] with those of Philippe Descola [DES 05]. From Deleuze, we retain the three dimensions of existence (extensive parts, relationships, essences) correlated with three types of knowledge (shocks, logic, intuition). From Descola, we use the ontological matrices that characterize the relationships between physicalities and interiorities. More specifically, we focus on the ontology of analogism, which corresponds well to the case of digital archives and collective intelligence, given the unlimited transformational capacity of digital physicalities and the multiplicity of interiority relationships proposed by collective intelligence:
“This continued struggle between a vertiginous ocean and relationship networks, always in the process of multiplying their connections, strictly defines analogism, a word that wonderfully summarizes and paints our objective world, our cognitive tasks, our subjective dreams, and the groups that are born today and will do the politics of the future” [SER 09, p. 85].
With the help of these principles, we form unique representations that we describe as monads. They are made up of four groups: documents, actors, concepts and relationships. Within each group, the elements maintain relations of differential semantics [BAC 07, p. 142] according to the relative position of an element along two axes, that of the father and that of the brother in a tree.
Figure 1.1. Principles of differential semantics
The levels defined by the position of the elements in a father-son hierarchy tree, combined with the number of elements in each group, give a precise metric of the monad. This metric makes it possible to know the level of complexity of a being and thus to automatically compare interpretations that cover the same documents, the same actors and the same concepts. We call this metric the Existential Complexity Index (ECI), and we are developing a tool to calculate it automatically from the modeling of a being.
Figure 1.2. Modeling an empty being
Each monad, with its associated ECI, is a unique description of the state of an ecosystem of knowledge at a given moment for a given person. This state gives a particular perspective on the ecosystem; it does not seek exhaustiveness, but rather the expression of an interpretation that serves to support arguments and creates the potential for the controversies from which the consensus necessary for collective action may emerge.
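The ECI is described here only qualitatively. As a hedged sketch, the toy metric below assumes the index simply combines each group's element count with the elements' depths in the father-son tree; the weighting and all names are illustrative assumptions, not the authors' actual tool:

```python
# Hypothetical sketch of an Existential Complexity Index (ECI).
# Assumption (ours, for illustration): ECI = element count per group
# plus the total depth of those elements in the group's father-son tree.

def depth(tree, node):
    """Depth of `node` in a father -> children mapping (roots have depth 0)."""
    for father, children in tree.items():
        if node in children:
            return depth(tree, father) + 1
    return 0

def eci(monad):
    """Toy ECI over the four groups: documents, actors, concepts, relationships."""
    total = 0
    for group in monad.values():
        tree, elements = group["tree"], group["elements"]
        total += len(elements) + sum(depth(tree, e) for e in elements)
    return total

# A monad with one group filled in: two concepts under a root concept.
monad = {
    "concepts": {
        "tree": {"root": ["ethics", "ontology"]},
        "elements": ["root", "ethics", "ontology"],
    },
    "documents": {"tree": {}, "elements": []},
    "actors": {"tree": {}, "elements": []},
    "relationships": {"tree": {}, "elements": []},
}
print(eci(monad))  # 3 elements + depths (0 + 1 + 1) -> 5
```

A deeper or more populated tree raises the index, which matches the idea that the metric captures the level of complexity of a modeled being.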
The research conducted in the field of digital humanities produces new archive sources that challenge the division traditionally used by historians and literary scholars to distinguish between “primary” sources, produced by the object of study, and “secondary” sources, produced by the research activity itself. The use of digital technologies leads to the creation of “secondary” archives in the form of databases that, if accessible and interoperable, automatically become new “primary” sources, whether for a reflexive analysis of research activities or for other researchers studying the same field. The creation of these digital archives, and more specifically the durable dimension of their use, conditions the researcher’s work by placing the emphasis on formalization, so that the work becomes open, interoperable and lasting. This scientific imperative is increasingly imposed on researchers by the simple fact that they work on projects in which the digital dimension is central, since it guarantees financing. The question then arises: how can this data be produced and made visible without being an expert in computer science or knowledge engineering?
Figure 1.3. Recursive cycle of sources
Muriel Louâpre and Samuel Szoniecky tackle this question by analyzing the work performed in the framework of the ANR Biolographes project. This very concrete terrain allows them to examine the nature of the digital archives produced by research and to extract the features particular to the human sciences. After a presentation of the digital practices implemented in this type of research, the specific case of visualization methods is addressed through a review of the primary tools available on the Web, in order to critique their epistemological and practical limits. Using the same body of data, the authors show the utility of these tools for quickly testing the coherence of data, visualizing networks, multiplying approaches and defining new research perspectives. Finally, they reflect on a generic method for modeling influence networks, using a prototype developed specifically to help researchers describe their interpretations so that they are interoperable with other perspectives. The goal of this process is to provide cognitive cartographies serving as an aid in the elaboration of scientific consensus.
Figure 1.4. Mapping the influence networks. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
From these reflections emerges the result of a sometimes difficult dialogue between researchers from different fields of expertise. Faced with the digital “black box”, digital models can be imposed on researchers whose information-processing needs are too often not made concrete. Even if the lure of a button that can simply be pushed to obtain the relevant information fades after the disappointments and frustrations of dialogue with the machine, the lack of training in knowledge engineering sometimes remains flagrant. Beyond knowing what the machine can do, it is important for humanities researchers who use digital technology to understand how it also reorganizes collective work and research practices.
As explained above, there are many examples of digital archive creation, and they concern not only the field of research but also cultural heritage. Lénaik Leyoudec is interested in the process of editing these digital archives, asking how the meaning and intelligibility of heritage documents can be preserved. To explore these issues, he draws on the differential semantics defended by Rastier [RAS 01] and Bachimont [BAC 07] (Figure 1.1) to derive an interpretive approach broken down into three consecutive phases: semiotic analysis, document validation method and architext editing. Like Muriel Louâpre and Samuel Szoniecky, Lénaik Leyoudec emphasizes the interpretation of the digital archive and the need to equip this process in order to preserve it as well as possible.
In the framework of an experiment on various audiovisual collections containing “semiotic objects” belonging to the register of “private cinema”, a precise analysis of the cinematographic structure shows how the interpretive approach makes it possible to define “memory indicators” at different levels, depending on whether the interest is in a specific shot (micro), a related group of shots (meso) or all of the segments (macro). This first level of semiotic analysis is enriched by an analysis of the cinematographic indicators specific to family films, bringing out “perceptive saliences” as so many “memorial diegeses” that will serve as the basis for editing the archive. The proposed editing principle transcribes the memory indicators into annotations following a generic typology: “person”, “place”, “object”, “date” and “activity”. What is at stake at this stage of editing is the mobilization of Linked Open Data resources such as Wikidata.
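The five-type annotation typology lends itself to a small data model. The sketch below is a hypothetical illustration (the class, its fields and the placeholder Wikidata identifier are ours, not Leyoudec's device), showing how a memory indicator could be transcribed as a typed annotation anchored to Linked Open Data:

```python
from dataclasses import dataclass
from typing import Optional

# The five generic annotation types named in the chapter.
ANNOTATION_TYPES = {"person", "place", "object", "date", "activity"}

@dataclass
class Annotation:
    """A memory indicator transcribed as a typed annotation on a film segment."""
    type: str                          # one of ANNOTATION_TYPES
    label: str                         # human-readable value, e.g. "Ancenis"
    start: float                       # segment start on the timeline, in seconds
    end: float                         # segment end
    wikidata_id: Optional[str] = None  # Linked Open Data anchor (placeholder id)

    def __post_init__(self):
        if self.type not in ANNOTATION_TYPES:
            raise ValueError(f"unknown annotation type: {self.type!r}")
        if self.end < self.start:
            raise ValueError("segment must end after it starts")

# Annotating a shot showing the town of Ancenis; the Wikidata id is a
# made-up placeholder, not looked up.
ann = Annotation(type="place", label="Ancenis", start=12.0, end=18.5,
                 wikidata_id="Q000000")
print(ann.type, ann.label)
```

Constraining the `type` field to the fixed typology is what makes individual annotations comparable and interoperable across a collection, in the spirit of the collective intelligence approach described here.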
Building on this ambition, a digital device was developed to respond specifically to the needs of family film editing. Devised as an ecosystem of “écrits d’écrans” (screen writings) bringing a semiotic polyphony into play, this tool accompanies the user in the interpretation process by facilitating document annotation. In particular, through a timeline representing the sequences of the audiovisual flow, it allows the construction of a network graph for navigating between categories, a search interface for finding annotations, and a device for linking categories with Linked Open Data resources.
Figure 1.5. Editing the archives via the semantic web. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
Numerous questions were raised by this experiment, matching general issues concerning digital archives and the place of collective intelligence in their validation. One of the primary issues concerns the preservation of the document’s integrity. Each edition of the document, each interpretation, modifies the primary resource, sometimes enriching it and sometimes altering it. Archaeologists know well that as a dig moves forward, they destroy sources of information. Conversely, digital technology allows continuous archiving of resources and their annotations; everything can be preserved. But is this really the most important thing? Is it better to enrich digital memories or to stimulate humans’ interpretive experience? If preference is given to the latter approach, it is clearly not necessary to preserve everything, for the simple reason that nothing exhausts humans’ interpretive capacity, as is shown by the multitude of interpretations of a single book over millennia, or of a simple sunrise.
The previous solution proposes a tool dedicated to the analysis of family films, using Linked Open Data to increase the interoperability of interpretations; other researchers are working on similar tools aimed at facilitating the subjective appropriation of audiovisual data so as to transform them into meaningful objects. The ANR Studio Campus AAR project has allowed the development of a tool dedicated primarily to academia and research, which increasingly use audiovisual data as research and educational material. In this context, the archive is devised as a hub serving as a reference between the different communities that form communication ecosystems, leading to a semiotic turn given the specificity of the activities concerning these data.
Structured as a layering of strata oscillating between content and expression, the semiotics of audiovisual data unfolds according to genres (fiction, documentary, etc.) and a compositional hierarchy that imposes organizational structures and constrains interaction with the data. To describe this system of signs, the tool’s creators use the landscape analogy to define a metalanguage and methods of description. In doing so, they make the concrete management of audiovisual data analysis, publication and reediting activities possible.
The Studio Campus AAR sets out to accompany users along two complementary lines: activities of construction and activities of audiovisual data appropriation. These activities are made up of steps, themselves structured into procedures that serve as the basis for orchestrating data rewriting practices at the thematic, narrative, expressive, discursive and rhetorical levels. These writing/rewriting operations mobilize complex cognitive operations in an intercultural context of reuse.
Devised as a software infrastructure based on cognitive and semiotic approaches, this tool aims to provide actors in the audiovisual world with the means to work on a document in order to transform it into an intellectual resource for cultural education, research and mediation. To achieve this, the solution is organized around an RDF database and a work environment offering the functionalities necessary for activities of reuse: adding an archive, analyzing it with an ontology, managing individuals, publication/republication, search, and modeling the discourse universe.
Figure 1.6. Studio Campus AAR
At the heart of this platform, knowledge graph editing is a crucial point, particularly for giving those not specialized in knowledge engineering the means to model and analyze corpuses with languages originating from W3C work, such as RDF, OWL2 and SPARQL. The means of achieving this consist of providing example ontologies, or ontological structures in the form of patterns defining restriction trees. Once the graphs are edited, they can be resolved by different argumentation algorithms that automatically analyze the corpus to derive content suggestions. The graphical representation of a knowledge graph is another challenge that the Studio Campus AAR is trying to tackle, particularly to reduce the complexity of editing and to meet the criteria of simplicity, adaptability, dynamism and reusability.
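The core operation behind SPARQL evaluation over such knowledge graphs is triple pattern matching. The following stdlib-only sketch illustrates the principle on hand-built triples; the identifiers are invented, and a real platform would of course rely on an RDF store and the SPARQL language itself rather than Python lists.

```python
# A tiny RDF-like graph as a list of (subject, predicate, object) triples.
# All identifiers are illustrative placeholders.
graph = [
    ("ex:film1", "rdf:type", "ex:FamilyFilm"),
    ("ex:film1", "ex:depicts", "ex:person1"),
    ("ex:person1", "rdf:type", "ex:Person"),
]

def match(graph, s=None, p=None, o=None):
    """Return triples matching a pattern; None plays the role of a SPARQL variable."""
    return [t for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]

# Everything stated about film1 (analogous to: SELECT ?p ?o WHERE { ex:film1 ?p ?o }):
print(match(graph, s="ex:film1"))
```

A SPARQL engine composes many such pattern matches through joins; the one-pattern case above is enough to show why a graph model lends itself to querying corpuses without a fixed schema.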
There are various applications of these knowledge graphs, covering the full range of audiovisual analysis needs. First of all, media analysis, which consists, for example, of describing the subjects mentioned in the document as strata laid out on the audiovisual stream’s timeline. This description draws on various ontological reference resources and SKOS vocabularies, proposing description patterns via dynamic forms that suggest ontology entities as the user types. These principles also apply to the management of individuals, which can be gathered for faceted querying, complementing the information search applications based on SPARQL queries. Other applications of this tool include corpus management and author publication.
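Faceted querying of described individuals can be sketched very simply: each individual carries facet/value pairs, an index groups individuals by facet value, and a filter intersects facet constraints. The data and facet names below are invented for illustration and do not come from the Studio Campus AAR itself.

```python
from collections import defaultdict

# Hypothetical described individuals with two facets each.
individuals = [
    {"name": "interview_01", "genre": "documentary", "decade": "1960s"},
    {"name": "picnic_02", "genre": "family film", "decade": "1960s"},
    {"name": "lecture_03", "genre": "documentary", "decade": "1980s"},
]

def facet_index(items, facet):
    """Group individual names under each value of one facet."""
    index = defaultdict(list)
    for item in items:
        index[item[facet]].append(item["name"])
    return dict(index)

def filter_by(items, **facets):
    """Keep individuals satisfying every facet constraint at once."""
    return [i["name"] for i in items
            if all(i.get(k) == v for k, v in facets.items())]

print(filter_by(individuals, genre="documentary", decade="1960s"))
```

In an RDF setting, the same intersection would typically be expressed as a SPARQL query with one triple pattern per facet.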
Finally, Studio Campus AAR offers a complete platform for analyzing audiovisual documents by means of knowledge graphs using formal reference languages (RDF, OWL2, SKOS, etc.) that make the analyses produced durable and interoperable. In this sense, this tool illustrates the formalization work that digital archives require if they are to provide knowledge on which collective intelligence can be developed.
Before digital archives can even be promoted, they must first be created by digitizing sources that are not yet digital. This task is very simple when the source is recorded directly with digital tools such as a word processor or a digital camera; it becomes much more difficult when the sources come from a library, where their volume and sometimes their fragility complicate the passage from an analog to a digital version, and more difficult still when the resulting digital data cannot yet be understood by machines. Mathieu Andro and Imad Saleh introduce an original typology of the collective intelligence solutions that can be put into practice to optimize this task, through an analysis of the notion of “crowdsourcing” and how it is practiced in libraries.
“Crowdsourcing” literally means mobilizing the crowd as a resource to carry out a task, but definitions differ according to whether outsourcing, conscious involvement, volunteering, collaboration, etc. are considered. Whatever the case may be, these practices can be traced far back in time, for example to the appeals made in the 18th century to solve scientific problems such as determining a ship’s longitude at sea. The conceptual origins of the notion also have roots in socialist, Marxist, anarchist, humanist or liberal ideologies, which in fact place the debate in the political domain, particularly around questions of the “uberization” of libraries.
Figure 1.7. Crowdsourcing in libraries. For a color version of the figure, see www.iste.co.uk/szoniecky/collective.zip
To analyze these collective collaboration practices, categorizing by degree of participant engagement offers a useful quantitative criterion, but one that can be enriched by other, more qualitative criteria. The authors propose, for example, distinguishing implicit practices such as gamification or ludification, which consist in appealing to participants’ desire to play. “Crowdfunding” constitutes another large category of “crowdsourcing”, in which participation is essentially financial, as with digitization or printing on demand, making it possible to have users pay for part of the work done.
In libraries, outsourcing micro-tasks to Internet users presents various stakes. Beyond reducing the cost of correcting errors made by optical character recognition (OCR) tools, these practices would allow collections to be reedited, enriching existing book-level indexes with more precise categorization at the page or even sentence level. However, library management is not always open to outside participation, notably for fear of devaluing employees’ jobs, particularly their expertise in categorizing and indexing. Among the other difficulties holding back these collective intelligence projects are the cost of employing a person dedicated to animating communities, a role often perceived as useless; the low quality of what is produced and its poor reintegration into information systems; and the difficulty of evaluating these projects.
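One common way to address the quality problem in crowdsourced OCR correction is redundancy: several participants transcribe the same line, and a correction is accepted only when a majority agree. The chapter does not prescribe a specific mechanism; the threshold and data below are illustrative assumptions.

```python
from collections import Counter

def consensus(corrections, threshold=0.5):
    """Accept the most frequent submission for one OCR line only if a
    strict majority of participants agree on it; otherwise return None."""
    counts = Counter(corrections)
    best, n = counts.most_common(1)[0]
    return best if n / len(corrections) > threshold else None

# Three participants corrected the same OCR line; two agree.
submissions = ["the quick brown fox", "the quick brown fox", "the quiek brown fox"]
print(consensus(submissions))
```

Lines that fail to reach consensus can be routed back into the task queue, which is one way the "low quality of production" concern mentioned above can be mitigated at the cost of more participant effort.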
It can be seen here that “crowdsourcing” projects in libraries touch on various issues that allow a better understanding of the relationships between digital archives and collective intelligence. Despite all of these difficulties, and the fact that crowds are not always very sensible, “crowdsourcing” is nevertheless a practice that brings about numerous innovations in technology, economics, politics and even personal development. Let us hope that these experiences will lead to concrete solutions for coexisting better in hyper-connected societies.
Human activities leave numerous material and immaterial traces that together make up our cultural heritage, whose durable and interoperable promotion today proceeds through knowledge modeling. To this end, the community in this domain has developed formal languages that take the form of metadata standards such as Dublin Core, LIDO, MODS and EDM. These are complemented by controlled vocabularies such as SKOS and RAMEAU, by lexical databases such as WordNet and by ontologies such as CIDOC CRM. However, four primary difficulties hamper knowledge modeling for cultural heritage: data acquisition, knowledge modeling itself, usage and interoperability.
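To make the notion of a metadata standard concrete, the following sketch builds a minimal Dublin Core record with Python's standard XML library. The element names (`title`, `creator`, `type`, `date`) and the namespace come from the Dublin Core element set; the record content is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC)

# A minimal descriptive record for a (fictional) digitized family film.
record = ET.Element("record")
for name, value in [("title", "Family film, summer 1962"),
                    ("creator", "Anonymous"),
                    ("type", "MovingImage"),
                    ("date", "1962")]:
    el = ET.SubElement(record, f"{{{DC}}}{name}")
    el.text = value

print(ET.tostring(record, encoding="unicode"))
```

Because every institution emitting such records uses the same fifteen-element vocabulary, records from different collections can be aggregated and searched together, which is precisely the interoperability stake discussed below.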
Concerning the acquisition of data, a balance must be struck between the complexity of heritage objects, the complexity of implicit expert knowledge and the complexity of formal languages. For example, it is often difficult for experts who have their own vocabularies and systems of description to use ontologies whose organization and way of working are different. To facilitate this communication between implicit user knowledge and formal knowledge, it is possible to model ontological paths that guide the user in the formal description of his or her knowledge. Another way to perform this task consists of automating data input through automatic language processing technology or through the integration of different data sources. In this case, the problem of contradictory data must nevertheless be managed, for example through the use of named graphs.
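The named-graph idea can be sketched with quads: each statement carries a fourth element naming the graph (here, the source) asserting it, so contradictory claims coexist without being merged. All identifiers below are illustrative placeholders, not part of any real dataset.

```python
# (subject, predicate, object, named graph) quads: two sources disagree
# about the period of the same object.
quads = [
    ("ex:vase1", "ex:period", "ex:BronzeAge", "graph:museumA"),
    ("ex:vase1", "ex:period", "ex:IronAge",   "graph:fieldNotes"),
]

def statements_about(quads, subject, predicate):
    """Return each asserted value keyed by the named graph that asserts it,
    so contradictions stay attributable to their source."""
    return {g: o for s, p, o, g in quads if s == subject and p == predicate}

print(statements_about(quads, "ex:vase1", "ex:period"))
```

Keeping provenance attached in this way lets a curator later decide which source to trust, instead of silently overwriting one claim with the other at integration time.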
Figure 1.8. Conservation and promotion of cultural heritage
The diversity of approaches for modeling information is a central issue. Depending on whether the models come from museums, libraries or archaeology, the approaches differ, and harmonizing them is not always straightforward. There are methods for automatically computing the closeness of different formal models, some of which extend basic ontological classes while others appeal to thesauri to enrich the field’s terms.
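A very reduced sketch of such approximate matching between two descriptive vocabularies, using only string similarity between term labels; real alignment tools also exploit class hierarchies and thesauri, as noted above. The terms and the cutoff are invented for illustration.

```python
from difflib import SequenceMatcher

# Two hypothetical vocabularies from different communities.
museum_terms = ["amphora", "burial site", "fresco"]
archaeology_terms = ["amphorae", "burial ground", "mosaic"]

def align(left, right, cutoff=0.7):
    """Pair each term with its closest label in the other vocabulary,
    keeping only pairs above a similarity cutoff."""
    pairs = []
    for l in left:
        best = max(right, key=lambda r: SequenceMatcher(None, l, r).ratio())
        if SequenceMatcher(None, l, best).ratio() >= cutoff:
            pairs.append((l, best))
    return pairs

print(align(museum_terms, archaeology_terms))
```

The example also shows the limit of purely lexical matching: "burial site" and "burial ground" denote the same concept but fall below the cutoff, which is why thesaurus-based enrichment is needed.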
User profiles condition the uses that will be made of a computer system for the conservation and promotion of cultural heritage. These uses evolve with users’ knowledge of semantic web technologies, their expertise in the domain and the nature of the terminology. Visualization and interaction interfaces therefore become a fundamental issue if collective intelligence is to develop effectively: if they are too complex, the tool will not be used; if they are too simple, they will not serve users’ needs.
As there are multiple ways of describing knowledge, interoperability becomes a challenge, particularly depending on the structuring choices made. Even if tools exist to compare these different structures, the first solution to this type of problem consists of using knowledge models with a high level of conceptualization, like the OAI-PMH protocol.
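OAI-PMH achieves interoperability with a deliberately small interface: a harvester sends one of six verbs as HTTP query parameters to a repository's base URL. The sketch below only builds such a request URL (no network call); the repository address is a placeholder, while the verb and parameter names come from the protocol itself.

```python
from urllib.parse import urlencode

# Placeholder base URL of an OAI-PMH repository.
BASE = "https://example.org/oai"

def list_records(metadata_prefix="oai_dc", set_spec=None):
    """Build a ListRecords harvesting request; 'metadataPrefix' selects the
    metadata format and the optional 'set' parameter restricts the harvest."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return f"{BASE}?{urlencode(params)}"

print(list_records())
```

The repository answers with an XML payload whose records use the requested metadata format (Dublin Core when `oai_dc` is asked for), which is what lets aggregators harvest heterogeneous collections uniformly.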
