The human mind is incredible. It solves with ease problems that will elude machines for decades to come. This book explores what happens when humans and machines work together to solve problems machines cannot yet solve alone. It explains how humans and computers can work together and how humans can have fun helping to face some of the most challenging problems of artificial intelligence. In this book, you will find designs for games that are entertaining and yet able to collect data to train machines for complex tasks such as natural language processing or image understanding. You will also find concepts and solutions for some of the various challenges of these games.
Homo Ludens in the Loop
© 2014 Markus Krause
Cover Design: Markus Krause
Publisher: tredition GmbH, Hamburg
Paperback
(ISBN: 978-3-8495-9205-9)
Hardcover
(ISBN: 978-3-8495-9206-6)
e-Book
(ISBN: 978-3-8495-9207-3)
This work, including all of its parts, is protected by copyright. Any use without the consent of the publisher and the author is prohibited. This applies in particular to electronic or other reproduction, translation, distribution, and making the work publicly available.
Markus Krause
Homo Ludens in the Loop
PLAYFUL HUMAN COMPUTATION SYSTEMS
A dissertation submitted in partial fulfillment
of the requirements for the degree of
“Doktor der Ingenieurwissenschaften” (Dr.-Ing.)
at the University of Bremen
This work was partially funded by
the Klaus Tschira Foundation
Supervisors:
Prof. Dr. Rainer Malaka
(University of Bremen, Bremen, Germany)
Prof. Luis von Ahn
(Carnegie Mellon University, Pittsburgh, USA)
For my son,
my wife,
and
my family,
who all helped me so much
through all these wild years.
Table of Contents
1. Introduction
2. State of the Art
2.1 Background
2.2 Related work
2.3 Common Concepts
3. FastTag
3.1 Introduction
3.2 Binary Verification
3.3 Conclusion
4. OnToGalaxy
4.1 Introduction
4.2 Design
4.3 Game Play
4.4 Task Design
4.5 User Experience
4.6 Conclusion
5. Webpardy
5.1 Introduction
5.2 The Webpardy Online Game
5.3 Quality Management
5.4 Results
5.5 Conclusion
6. Empathy
6.1 Introduction
6.2 General Game Idea
6.3 GameLab Tool-Suite
6.4 Experiment
6.5 Empathy Game Setup
6.6 GuessIt Game Setup
6.7 Crowdsourcing Setup
6.8 Results
6.9 Conclusion
6.10 Future Work
7. Dewknow
7.1 Introduction
7.2 General Idea
7.3 Response Integration
7.4 Calculating the Response Matrix
7.5 Calculating the Requests Probability Vector
7.6 Calculating the Best Fit Kappa
7.7 Evaluation
7.8 Conclusion
8. Bouncer
8.1 Introduction
8.2 Crowdsourced Translation
8.3 Study Design
8.4 The Feature Vector
8.5 Machine Learning Algorithms
8.6 Results
8.7 Conclusion & Future Work
9. Conclusion
9.1 Identification
9.2 Observation
9.3 Evaluation
9.4 Motivation
10. Bibliography
11. Appendix
11.1 Figures
11.2 Table of Tables
11.3 List of Synonyms
11.4 OnToGalaxy Scaled Figures
1. Introduction
Humans and machines have abilities so different that powerful systems emerge when these abilities are combined. Humans can handle a wide range of tasks, even building problem-solving machines; their strength is to solve problems effectively. Machines, on the other hand, solve only the narrow range of problems they are designed for, but they do so very efficiently. The goal of human computation is to combine the flexibility and effectiveness of humans with the power of machines to store, distribute, and process large amounts of data. This approach, however, introduces a variety of challenges. The aim of this thesis is to explore these challenges, in particular in the context of human computation systems with ludic elements, but also to draw general conclusions from relevant findings.
The most dominant challenge this thesis will investigate is to offer human contributors a valuable reward for their participation. One possible approach to this challenge is to design human computation systems in a way that makes their use an inherently pleasurable experience. A promising way to make tasks more pleasurable is to integrate human computation tasks into digital games, as pioneered by Luis von Ahn. Games with purposes other than enjoyment are also called “serious games”. In contrast to traditional “serious games”, human computation games are not a medium to “teach” human beings. Human computation games reverse the flow of information and let humans create data for computational systems. Chapters 4, 5, and 6 investigate new ways to design ludic elements for human computation. They explore the design space of systems with homo ludens in the loop and add new games to this space that broaden and deepen players’ gaming experiences.
A common challenge of human computation systems is data reliability. Humans are expected to be unreliable, especially in ludic environments, where playful interaction with the system to test its borders is expected. Players may therefore generate false data, either on purpose or for other reasons. Different strategies have evolved to deal with this issue. As human computation tasks are by definition not efficiently solvable by an algorithm, it is necessary to find strategies to handle this challenge. Chapters 6, 7, and 8 investigate different strategies based on probabilistic methods to ensure data reliability in ludic environments. The goal of these strategies is to maximize data quality while minimizing restrictions on game design.
Human computation systems generate useful data primarily by observing human behavior and interactions with the system. Designing interactions and developing strategies to gather and interpret human behavior is therefore a vital element. A variety of interaction designs and survey methods has been developed by different human computation approaches. Chapters 3 and 4 lay out a new interaction method to maximize data quality and to simplify and speed up task execution. Chapter 4 shows how choosing an appropriate observational method can allow for greater freedom in game design and for new mappings of tasks to ludic systems.
Finally, this thesis will investigate mappings between tasks and games. Problems that are of interest to human computation are those that are by definition not effectively or efficiently solvable by computational systems. In general, the challenge is to identify a problem or sub-problem that is hard to compute but easily solved by humans. Finding good candidate tasks is challenging, as many problems that are hard for computers are also hard for humans. However, many of the tasks efficiently solvable with human computation systems follow certain patterns. Five of these patterns will be discussed in Chapter 2.3. Each pattern takes advantage of a specific human ability, namely aesthetic judgment, making intuitive decisions, contextual reasoning, common sense knowledge, and free interaction with the physical world. This thesis primarily contains original work on tasks involving common sense knowledge in Chapters 3 and 4 and contextual reasoning in Chapters 3, 4, and 5. All chapters illustrate how certain tasks or task patterns can be mapped to digital games with only small changes to task and game design.
2. State of the Art
This chapter presents the current state of the art of human computation systems and digital games research. The first part of this chapter explores a number of basic concepts and common challenges of human computation systems and ties them to corresponding research projects and literature. It continues with a description of the literature in the field of digital games research and emphasizes its relevance for this thesis. The second part explores related work in the area of human computation with digital entertainment systems, highlighting the respective foci and strengths as well as explicating what separates these existing works from the approach envisioned in the previous chapter.
2.1 Background
Despite the fast-paced growth in speed and capacity and the increasing global interconnectedness of computational machines, human mental abilities still outperform computational systems in many domains. An early work about potential areas was published by Naor (1996), who mentions various problems useful as the source for automated Turing Tests, such as gender, handwriting, or speech recognition. The design of traditional computational systems that handle contextual and semantic problems, for example, remains a challenge, while human beings are often capable of solving such problems without much conscious cognitive effort thanks to common sense knowledge and contextual understanding. Examples of this application domain of human computation are tasks such as image or audio labeling and natural language understanding. Context is a common term in various scientific fields like linguistics and communication theory. In the scope of this work, context means the whole of the implicit information about an object, such as time, location, or personal and situational context.
Methods of crowdsourcing as well as human computation are applicable to various context- and semantics-related tasks. Prominent examples are resource labeling or tagging tasks as presented by various authors (Diakopoulos & Chiu, 2007; Ho, Chang, Lee, Hsu, & Chen, 2009; von Ahn & Dabbish, 2004). Yet another common task is audio annotation, as presented by Barrington et al. (2009) as well as Diakopoulos et al. (2008) or Kim (2008). More detailed descriptions of some of these approaches can be found later in this chapter. Furthermore, natural language understanding is also a promising application area for human computation, as shown by various authors. Callison-Burch et al. (2010) explore using Mechanical Turk for collecting data for human language technologies in a general way. Resnik et al. (2009) propose targeted paraphrasing as a new approach to obtaining cost-effective, reasonable-quality translation by monolingual speakers in combination with machine translation. They showed that it is possible to identify translation errors with only monolingual knowledge of the target language. They also demonstrated that it is possible to generate paraphrases with only monolingual knowledge of the source language. Other examples of natural language tasks were given by various authors (Chamberlain, Poesio, & Kruschwitz, 2008; Orkin & Roy, 2007; Siorpaes & Hepp, 2008).
In general, computational systems are considered to be very efficient at solving problems involving large numbers. For some NP-hard problems, however, humans are sometimes able to find solutions intuitively and much more efficiently. Specific application domains lie in combinatorial optimization tasks (Bonetta, 2009) and in solving packing problems (Andrea et al., 2002). Even though it is yet unclear whether optimal solutions for these problems are feasible, different approaches show that human mental abilities can outperform current computational systems. Humans are able to solve some of these problems in an intuitive manner and thereby overcome issues like local minimum/maximum traps (Corney et al., 2010). In contrast to an algorithm, which is based on the logical reasoning of its designer, intuition is the ability to gain insight into something, to form an opinion, or to find an ad-hoc solution without a conscious reasoning process. As there is still an ongoing discussion in various fields about the complex mental processes behind intuition, it is evident that intuition is not yet reproducible with current models in computer science. Human computation, on the other hand, allows for utilizing this human mental ability to find better solutions or algorithms to handle puzzle-like combinatorial problems. Human computation systems such as FoldIt (Bonetta, 2009), Plummings (Terry et al., 2009), Phylo (Kawrykow & Roumanis, 2011), and others exploit this human ability to solve different NP-hard problems. Corney et al. (2010) report on how packing problems have been used to capture human problem-solving strategies. They designed a task for Mechanical Turk and measured how human contributors solved the presented packing problems. They recorded the types of actions a contributor performed on individual shapes as well as the packing efficiency of the resulting solutions.
While humans can outperform algorithms in some situations, most NP-hard problems are also challenging for humans. Human computation systems dealing with such problems need contributors willing to participate for a relatively long term to find solutions that are better than algorithmic ones. Therefore, only relatively few tasks can be tackled, and strong incentives are necessary.

Presumably, computational systems that display a level of perception and understanding of aesthetics comparable to that of humans would be able to generate useful complex images, motion design, or audio environments. Human computation approaches in this application domain were explored by Talton et al. (2009) and Dawkins (1987), who make use of human aesthetic judgment in order to create natural-looking lighting for virtual environments, or to model objects in two- and three-dimensional space. Nevertheless, the problem space of aesthetic judgment is investigated by comparatively few approaches, even though it holds potential to assist in the development of more accurate simulation systems in various domains. Possible examples are physical simulations as well as crowd simulation systems for serious and entertainment purposes.
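To make the underlying mechanism concrete, the following is a minimal sketch of interactive evolution, the pattern behind systems such as Dawkins's biomorphs: a human's aesthetic choice takes the place of the fitness function. The parameter encoding, mutation scheme, and function names here are illustrative assumptions, not the implementation of any of the cited systems.

    import random

    def mutate(params: list[float], sigma: float = 0.1) -> list[float]:
        """Perturb each design parameter slightly to create a variant."""
        return [p + random.gauss(0.0, sigma) for p in params]

    def evolve(seed: list[float], pick, generations: int = 10, offspring: int = 8):
        """Each generation, a human 'pick' callback chooses the next parent."""
        parent = seed
        for _ in range(generations):
            candidates = [mutate(parent) for _ in range(offspring)]
            parent = pick(candidates)  # human aesthetic judgment acts as fitness
        return parent

    # Stand-in for a human judge: prefer candidates with a large first parameter.
    best = evolve([0.5, 0.5, 0.5], pick=lambda cs: max(cs, key=lambda c: c[0]))

In a real system, the pick callback would render each candidate (a lighting setup, a two- or three-dimensional shape) and let the player choose the most pleasing one.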
Lastly, the ability of a computational system to sense the physical world and act in it is usually limited. Humans, of course, can directly interact with their physical environment. Tuite et al. (2010) gave an example of a digital game to reconstruct real-world locations as detailed 3D models from photographic images. The game, called PhotoCity, was designed to collect a large quantity of photographic data. It is played outdoors with a camera, and players take photos to capture flags and take over virtual models of real buildings. Matyas also proposes games as a tool to collect geospatial data (Matyas, 2007; Matyas, Matyas & Schlieder, 2008). In his paper he used digital mobile games to collect geographic data through player communities of location-based games. He identified three types of geographic data players can collect: data about the localization/communication network, data about the geographic environment, and related non-geographic information. He also presented game design patterns that permit gathering this data. The approach was illustrated with the game CityExplorer. As with aesthetic judgment, real-world interaction has not been explored in depth yet. Because it is more readily available in mobile systems, this might change in the near future. Promising applications could be crowdsourced journalism or disaster management. Crowdsourced journalism in particular seems to be a vital idea, as seen in the events of the green revolution in many Arab countries in recent years.
Besides exploring the application domains of human computation, many projects have developed sound survey strategies to collect high-quality data from human contributors. A well-thought-out interaction design and sound survey strategies can help to reduce error rates or unwanted behavior. Using different workflows for the same task can reduce error rates, as described by Lin et al. (2012). The paper explores how dynamically switching between workflows, and therefore between different interaction designs, can improve data quality. Other approaches use task-independent data to detect unwanted behavior. Language evaluation, for instance, can take advantage of a language-independent feature vector that contains values about user behavior to predict whether a user's input is reliable (Kilian, Krause, Runge & Smeddinck, 2012).
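As a rough illustration of the feature-vector idea, the sketch below maps a few behavioral signals to a reliability guess. The concrete features, thresholds, and names (Behavior, looks_reliable) are assumptions chosen for illustration; they are not the features or the model of the cited work, which uses a trained classifier rather than a hand-set rule.

    from dataclasses import dataclass

    @dataclass
    class Behavior:
        seconds_on_task: float  # very fast answers are suspicious
        keystrokes: int         # pasted answers produce few keystrokes
        answer_length: int      # characters in the submitted answer

    def feature_vector(b: Behavior) -> list[float]:
        """Map raw behavior to a language-independent feature vector."""
        return [b.seconds_on_task, float(b.keystrokes), float(b.answer_length)]

    def looks_reliable(b: Behavior) -> bool:
        """A toy linear rule standing in for a trained classifier."""
        t, k, n = feature_vector(b)
        return t > 5.0 and k >= 0.5 * n  # typed, not pasted, and not rushed

    # A 40-character answer produced in 2 seconds with 1 keystroke: likely pasted.
    print(looks_reliable(Behavior(seconds_on_task=2.0, keystrokes=1, answer_length=40)))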
Human computation systems generate useful data primarily through the observation of human behavior and human interactions with computational systems. The observation strategies differ in terms of their quantitative requirements and the complexity of the task at hand. Labeling tasks, for instance, despite being relatively simple, require large amounts of data, as in the web image labeling task in the ESP game (von Ahn & Dabbish, 2004). These systems most often parallelize their process, as described by Little et al. (2010). When thousands or millions of tasks have to be processed, a key issue is the simplicity and clarity of the interaction design, so that a contributor directly understands the task at hand. One strategy to simplify tasks is to let contributors select answers from predefined sets instead of formulating free responses. Examples of such a simplification are given by Dasdan et al. (2009) as well as Krause et al. (2009).
Human computation systems can also be used to solve more complex tasks that necessitate a strong commitment from their contributors, such as Phylo (Kawrykow & Roumanis, 2011) or FoldIt (Bonetta, 2009). These systems typically gather fewer replies per task and benefit more from iterative approaches, as also described by Little et al. (2010). In cases where huge amounts of complex tasks have to be processed, a direct use of human computation can be expensive. Initial training data for artificial systems can, however, be acquired with human computation. Such systems can then handle these tasks more accurately than before. Various approaches in this direction have been presented (Brew & Greene, 2010; Lease, 2011; Quinn, Bederson, Yeh & Lin, 2010). Another example is Google's translation system, where users are able to propose better solutions to given translations. Corney et al. (2010) propose a method to learn problem-solving strategies from human contributors to enhance machine-based strategies.
Another challenge with respect to interaction design is the field of human subject surveys. For surveys it is crucial to have insight into various parameters like demographics, previous experience as a participant, and other factors. Schmidt discussed this as a major problem of crowdsourced human subject studies (Schmidt, 2010). Many researchers are also concerned that participants fill out surveys haphazardly in unsupervised environments. Kapelner and Chandler (2010) describe possible designs to deal with this circumstance.
A dominant and prevailing challenge for human computation systems is data reliability. Humans are expected to be unreliable and to generate inconsistent data. Different validation strategies have evolved to deal with this issue. As human computation tasks are by definition not efficiently solvable solely by algorithms, the validation process cannot be as trivial as comparing the answers of the contributors to the results of corresponding algorithms. A common approach is cross-validation, where replies to questions are only accepted if a pair of contributors agrees on the answer. Other methods calculate the reliability of the contributors and judge the quality of an answer based on statistical values calculated for that contributor.
User-centered validation strategies are common, especially in gaming contexts, to solve the quality management issue. A frequently seen approach is to pair contributors and only accept answers both can agree on. There are various instances of this approach (Barrington et al., 2009; Bernstein et al., 2009; Ho et al., 2009; von Ahn, Kedia & Blum, 2006). Standard methods for this type of validation are Input- and Output-Agreement. Output-Agreement games are a generalization of the ESP game. Two strangers are randomly paired. In each round, both are given the same input and must produce outputs based on the input. Game instructions indicate that players should try to produce the same output as their partners. Players cannot see one another's outputs or communicate with one another. Both players must produce the same output; they do not have to produce it at the same time, but must produce it while the input is displayed onscreen (Law & von Ahn, 2009). In Input-Agreement games, two players are shown either the same object or different objects, and each is asked to type a description of their given object. Based on these descriptions, the players must decide whether they have been given the same object (Law & von Ahn, 2009).
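The following is a minimal sketch of the Output-Agreement matching rule described above, under the simplifying assumptions of a single round and lower-cased label matching; the class and method names (Round, submit_label) are hypothetical and not taken from any of the cited games.

    class Round:
        """One output-agreement round: two players label the same input."""

        def __init__(self, input_id: str):
            self.input_id = input_id
            self.labels = {1: set(), 2: set()}  # labels typed so far, per player
            self.matched = None                 # agreed-upon label, once found

        def submit_label(self, player: int, label: str):
            """Record a label; accept it only once both players have produced it."""
            label = label.strip().lower()
            self.labels[player].add(label)
            other = 2 if player == 1 else 1
            if label in self.labels[other]:
                self.matched = label  # agreement reached: the label is accepted
            return self.matched

    round_ = Round("image_042")
    round_.submit_label(1, "dog")
    round_.submit_label(2, "puppy")
    print(round_.submit_label(2, "Dog"))  # -> "dog": both players agree

Note how neither player ever sees the other's guesses; agreement alone validates the label.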
Machine-based approaches rely on calculating the reliability of contributors, or on using simple classifiers to accumulate multiple answers into a consensus. Examples of such methods can be found in different publications. Methods such as Majority Vote and Naïve Bayes (Kumar & Lease, 2011) as well as expectation maximization are common (Ipeirotis, Provost & Wang, 2010). Another solution is to calculate trust values for each contributor. These values are calculated based on user responses to gold relations, which are interspersed to test the users' reliability (Krause & Smeddinck, 2012). Gold standard methods are a common practice, and even algorithms to automatically generate gold items are used in the field (Oleson et al., 2011). In some situations, a traditional computational system can, though it may seem unintuitive at first, evaluate the quality of a given answer while not being capable of generating the answer by itself. Examples are FoldIt and Phylo, where a computer can compare contributions to existing answers and measure the qualitative difference between them (Bonetta, 2009; Kawrykow & Roumanis, 2011).
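To make two of these aggregation ideas concrete, here is a small sketch of majority voting and of a gold-based trust weight; the data layout and function names are illustrative assumptions, not the cited authors' implementations.

    from collections import Counter

    def majority_vote(answers: list[str]) -> str:
        """Return the most frequent answer among all contributors."""
        return Counter(answers).most_common(1)[0][0]

    def gold_trust(responses: dict[str, str], gold: dict[str, str]) -> float:
        """Fraction of interspersed gold questions a contributor got right."""
        hits = sum(responses.get(q) == a for q, a in gold.items())
        return hits / len(gold) if gold else 0.0

    def weighted_vote(votes: list[tuple[str, float]]) -> str:
        """Aggregate (answer, trust) pairs by summing trust per answer."""
        weights = Counter()
        for answer, trust in votes:
            weights[answer] += trust
        return weights.most_common(1)[0][0]

    # Two trusted contributors outvote one untrusted contributor.
    print(weighted_vote([("cat", 0.9), ("cat", 0.8), ("dog", 0.2)]))  # -> "cat"

Weighting votes by trust lets a few reliable contributors override many unreliable ones, which is the basic effect the gold-standard methods above aim for.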
This thesis, contrary to many other publications in the fields of “serious games”, gamification, and human computation games, is devoted to the idea of play as an inherent aspect of humanity. Johan Huizinga described in his book Homo Ludens (1944) that all human development, personal as well as cultural, happens through games and play. Human beings become conscious of their abilities through play and through the experiences made in games. Huizinga thereby defines a conceptual space in which play takes place. He describes play as: “a free activity standing quite consciously outside ‘ordinary’ life as being ‘not serious’, but at the same time absorbing the player intensely and utterly. It is an activity connected with no material interest, and no profit is gained by it.”
This humanistic point of view sometimes seems to be forgotten when topics such as gamification and “serious games” are discussed. Many times it seems as if Tom Sawyer were playing his famous whitewashing trick to let humans do work without an adequate reward. This is sometimes done with the best of intentions, but it nonetheless neglects the nature of play. Even though the name of this thesis is inspired by Huizinga's work and many thoughts are based on philosophers such as Wittgenstein, this thesis will not provide a philosophical discussion of games or play. It will, however, refer to these ideas from time to time.
Philosophical and cultural investigations of games and play in general were done by different authors such as Wittgenstein, Caillois, and Suits (Caillois, 1961; Suits, 1978; Wittgenstein, 1953). An early work on the cultural meaning of digital games was written by Espen Aarseth (1997). In this book he laid out the idea that the medium itself is an important factor: sometimes the medium is as important as the message it transports and can carry a message of its own. This idea became the basis for the ludological theory of games. In contrast, another theory emerged at the same time that considered games a novel form of narrative, like films and books. This point of view is most often attributed to authors such as Murray (1998), Atkins (2003), and Jenkins (2003). Even though the debate between these two groups was an important phase for the field of game studies, this thesis will not go into detail on it and mentions it only for the sake of completeness. An interesting final comment was, however, given by Murray (2005).