Computational Paralinguistics - Björn Schuller - E-Book

Description

This book presents the methods, tools and techniques that are currently being used to recognise (automatically) the affect, emotion, personality and everything else beyond linguistics (‘paralinguistics’) expressed by or embedded in human speech and language.

It is the first book to provide such a systematic survey of paralinguistics in speech and language processing. The technology described has evolved mainly from automatic speech and speaker recognition and processing, but also takes into account recent developments within speech signal processing, machine intelligence and data mining.

Moreover, the book offers a hands-on approach by integrating actual data sets, software, and open-source utilities which will make the book invaluable as a teaching tool and similarly useful for those professionals already in the field.

Key features:

  • Provides an integrated presentation of basic research (in phonetics/linguistics and humanities) with state-of-the-art engineering approaches for speech signal processing and machine intelligence.
  • Explains the history and state of the art of all of the sub-fields which contribute to the topic of computational paralinguistics.
  • Covers the signal processing and machine learning aspects of the actual computational modelling of emotion and personality and explains the detection process from corpus collection to feature extraction and from model testing to system integration.
  • Details aspects of real-world system integration including distribution, weakly supervised learning and confidence measures.
  • Outlines machine learning approaches including static, dynamic and context‑sensitive algorithms for classification and regression.
  • Includes a tutorial on freely available toolkits, such as the open-source ‘openEAR’ toolkit for emotion and affect recognition co-developed by one of the authors, and a listing of standard databases and feature sets used in the field to allow for immediate experimentation enabling the reader to build an emotion detection model on an existing corpus.

Number of pages: 761

Year of publication: 2013

Contents

Cover

Title Page

Copyright Page

Dedication

Preface

Acknowledgements

List of Abbreviations

Part I: Foundations

1 Introduction

1.1 What is Computational Paralinguistics? A First Approximation

1.2 History and Subject Area

1.3 Form versus Function

1.4 Further Aspects

1.5 Summary and Structure of the Book

References

2 Taxonomies

2.1 Traits versus States

2.2 Acted versus Spontaneous

2.3 Complex versus Simple

2.4 Measured versus Assessed

2.5 Categorical versus Continuous

2.6 Felt versus Perceived

2.7 Intentional versus Instinctual

2.8 Consistent versus Discrepant

2.9 Private versus Social

2.10 Prototypical versus Peripheral

2.11 Universal versus Culture-Specific

2.12 Unimodal versus Multimodal

2.13 All These Taxonomies – So What?

References

3 Aspects of Modelling

3.1 Theories and Models of Personality

3.2 Theories and Models of Emotion and Affect

3.3 Type and Segmentation of Units

3.4 Typical versus Atypical Speech

3.5 Context

3.6 Lab versus Life, or Through the Looking Glass

3.7 Sheep and Goats, or Single Instance Decision versus Cumulative Evidence and Overall Performance

3.8 The Few and the Many, or How to Analyse a Hamburger

3.9 Reifications, and What You are Looking for is What You Get

3.10 Magical Numbers versus Sound Reasoning

References

4 Formal Aspects

4.1 The Linguistic Code and Beyond

4.2 The Non-Distinctive Use of Phonetic Elements

4.3 The Non-Distinctive Use of Linguistic Elements

4.4 Disfluencies

4.5 Non-Verbal, Vocal Events

4.6 Common Traits of Formal Aspects

References

5 Functional Aspects

5.1 Biological Trait Primitives

5.2 Cultural Trait Primitives

5.3 Personality

5.4 Emotion and Affect

5.5 Subjectivity and Sentiment Analysis

5.6 Deviant Speech

5.7 Social Signals

5.8 Discrepant Communication

5.9 Common Traits of Functional Aspects

References

6 Corpus Engineering

6.1 Annotation

6.2 Corpora and Benchmarks: Some Examples

References

Part II: Modelling

7 Computational Modelling of Paralinguistics: Overview

References

8 Acoustic Features

8.1 Digital Signal Representation

8.2 Short Time Analysis

8.3 Acoustic Segmentation

8.4 Continuous Descriptors

References

9 Linguistic Features

9.1 Textual Descriptors

9.2 Preprocessing

9.3 Reduction

9.4 Modelling

References

10 Supra-segmental Features

10.1 Functionals

10.2 Feature Brute-Forcing

10.3 Feature Stacking

References

11 Machine-Based Modelling

11.1 Feature Relevance Analysis

11.2 Machine Learning

11.3 Testing Protocols

References

12 System Integration and Application

12.1 Distributed Processing

12.2 Autonomous and Collaborative Learning

12.3 Confidence Measures

References

13 ‘Hands-On’: Existing Toolkits and Practical Tutorial

13.1 Related Toolkits

13.2 openSMILE

13.3 Practical Computational Paralinguistics How-to

References

14 Epilogue

Appendix

A.1 openSMILE Feature Sets Used at Interspeech Challenges

A.2 Feature Encoding Scheme

References

Index

This edition first published 2014 © 2014 John Wiley & Sons, Ltd

Registered office: John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. It is sold on the understanding that the publisher is not engaged in rendering professional services and neither the publisher nor the author shall be liable for damages arising herefrom. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Schuller, Björn. Computational paralinguistics : emotion, affect and personality in speech and language processing / Björn Schuller, Anton Batliner. – First Edition. pages cm. Includes bibliographical references and index. ISBN 978-1-119-97136-8 (hardback) 1. Psycholinguistics–Data processing. 2. Linguistic models–Data processing. 3. Paralinguistics. 4. Language and emotions. 5. Speech processing systems. 6. Human-computer interaction. 7. Emotive (Linguistics) 8. Computational linguistics. I. Batliner, Anton. II. Title. P37.5.D37S38 2014 401′.90285–dc23

2013019325

A catalogue record for this book is available from the British Library.

To Dagmar and Gisela

Preface

It might be safe to claim that 20 years ago, neither the term ‘computational paralinguistics’ nor the field it denotes existed. Some 10 years ago, the term did not yet exist either. However, in hindsight, the field had begun to exist if we think of the first steps towards the automatic processing of emotions in speech in the mid-1990s. Examples are Picard’s book on Affective Computing, published in 1997, and the International Speech Communication Association (ISCA) workshop on emotion and speech in 2000, to mention just two of the many topics and events related to and belonging to computational paralinguistics. The term ‘paralinguistics’ had already been coined in the 1950s – with different broad or narrow denotations; we will try and sketch this history in Part I of this book. Yet, in the realm of ‘hard core’ automatic (i.e., computational) processing of speech, the topic was still not fully acknowledged; as one of our colleagues said: ‘Emotion recognition, that’s esoterics with HMMs.’

Today, it might be safe to claim that computational paralinguistics has been established as a discipline in its own right – although surprisingly, not the term itself. It is only natural that, as a new and still somewhat exotic field, it has to cope with prejudices on the one hand, and unrealistic promises on the other hand.

This book represents the first attempt towards a unified overview of the field, its extremely ramified and diverse ‘genealogy’, its methodology, and the state of the art. ‘Computational paralinguistics’ is not an established subject that can be studied, and this fact is mirrored in the ‘scientific CVs’ of both authors. B.S. studied electrical engineering and information technology. However, his doctoral thesis dealt with one aspect of computational paralinguistics: the automatic recognition of human emotion in speech. During his habilitation period he broadened the scope of his work to ‘intelligent audio analysis’ – dealing with quite a number of further paralinguistic aspects, including those found in sung language and many other audio processing problems such as emotion in music and general sound. At the time of finalisation of the manuscript, he was a professor in computer science. A.B. started within philology, came from diachronic phonology to phonetics in general to prosody in particular, and via prosody in the interface for and within syntax and semantics to the automatic processing of acted and very soon naturalistic emotions – realising, moreover, that he had been dealing with different topics for a long time that all can be located within (computational) paralinguistics.

Originally, the intended focus of this book was on the computational processing of emotions and affect in speech and language, taking into account personality as well. In the early conceptual stages, however, we realised that this would be sub-optimal and thus decided to deal with everything ‘besides’ linguistics – namely, the computational processing of ‘para’-linguistics in a broad sense. However, we confine ourselves to the acoustic/phonetic/linguistic aspects, that is, we only deal with one modality, namely speech/language including non-verbal components, and disregard other modalities such as facial gestures or body posture. Moreover, we do not aim at a complete description of human–human or human–machine communication which would include the generation and production (synthesis) of speech, the interaction with other components within a multimodal system, the role within application systems, or real-life applications and their evaluation. Apart from the fact that most of these aspects would not be part of our core competence, we feel that it makes sense to try to establish computational paralinguistics as one building block amongst several others. Besides, there are already good overview and introductory books available on these other topics. And last but not least, it would be rather too complex for one book.

We wish to provide the reader with a sort of map presenting an overview of the field, and useful for finding one’s own way through. The scale of this map is medium-size, and we can only display a few of the houses in this virtual paralinguistic ‘city’ with their interiors, on an exemplary basis. In so doing, we hope to provide guidelines for the novice and to present at least a few new insights and perspectives to the expert. Many studies are referred to and core results are summarised. For all of them, the caveat holds that basically all such studies are restricted – confined to a specific choice of subjects, research questions, operationalisations, and features employed, just to mention a few of the decisive factors. There are errors such as the famous erroneous decimal point that made spinach more healthy than anything else – note that reports of this error might be erroneous themselves. And of course, there is much more that can go wrong – and hopefully, we will find out: scrutiny of results and replications of studies will eventually converge to more stable claims.

We decided not to describe basic phonetic/linguistic knowledge such as vowel or consonant charts, details on pitch versus F0, loudness, morphological and grammatical systems, basics on production and perception of speech, and the like. Such information can easily be obtained in introductory and overview books from the respective fields, as well as from online sources. In a similar way, selections had to be made for the computational aspects. For example, many approaches to linguistic modelling exist, and the fields of machine learning and signal processing each deserve at least one book in their own right. Thus, we limited our choices to the methods most established and common in the field – serving as a solid basis and inspiration for the interested reader to look further. A related source of information is the book’s homepage at http://www.cp.openaudio.eu, which includes features such as links to the openSMILE toolkit and (part of) the data described.

Björn W. Schuller and Anton M. Batliner

Munich, February 2013

Acknowledgements

We would like to thank in particular Florian Eyben and Felix Weninger at Technische Universität München in Munich, Germany, for their great help in finishing Chapters 7–13; we also extend our thanks to Martina Römpp for her help with the illustrations in this part, and to Tobias Bocklet and Florian Hönig at the Friedrich Alexander Universität in Erlangen-Nuremberg, Germany, for commenting on Chapter 5.

We stand on the proverbial shoulders of giants – and of many other ‘normal’ people who all contributed to our understanding. In the same way, we are grateful to colleagues who do not share our points of view: opinions are not only reinforced by encouragement but also by dissent which helps us to rethink or revise our own approaches and theoretical positions, or simply to stick with them.

There are too many people in the field(s) that supported us, discussed with us, and exchanged ideas with us, too many to list in detail, so we wish to express our gratitude to all of them in a generic way.

For the encouragement in the first place, excellent continued support and guidance as well as patience, we sincerely thank the editor and publisher – John Wiley & Sons.

Finally, we would like to thank the HUMAINE Association (henceforth the Association for the Advancement of Affective Computing) for providing an excellent network not only for affective computing, but also for the broader field of computational paralinguistics. The authors also acknowledge funding from the European Commission under grant agreement no. 289021 from the ASC Inclusion Project, which is committed to providing interactive serious emotion games for children with autism spectrum condition.

List of Abbreviations

ACF: Autocorrelation Function
ACM: Association for Computing Machinery
AD(H)D: Attention Deficit (Hyperactivity) Disorder
AEC: (FAU) Aibo Emotion Corpus
AL: Active Learning
ALC: Alcohol Language Corpus
AM: Acoustic Model
ANN: Artificial Neural Network
API: Application Programming Interface
ARFF: Attribute Relation File Format (WEKA)
ARMA: Autoregressive Moving Average
ASC: Autism Spectrum Condition
ASCII: American Standard Code for Information Interchange
ASD: Autism Spectrum Disorder
ASR: Automatic Speech Recognition
AUC: Area Under Curve
AVEC: Audiovisual Emotion Challenge
AVIC: (TUM) Audiovisual Interest Corpus
BAC: Blood Alcohol Content
BES: Berlin Emotional Speech Database
BFI: Big Five Inventory
BLSTM: Bidirectional Long Short-Term Memory
BoCNG: Bag-of-Character-N-Grams
BoNG: Bag-of-N-Grams
BoW: Bag-of-words
BPTT: Back-Propagation Through Time
BRAC: Breath Alcohol Concentration Test
BRNN: Bidirectional Recurrent Neural Network
C-AuDiT: Computer-Assisted Pronunciation and Dialogue Training
CALL: Computer-Aided Language Learning
CAPT: Computer-Aided Pronunciation Training
CC: (Pearson) Correlation Coefficient
CD: Compact Disc
CEICES: Combining Efforts for Improving Automatic Classification of Emotion in Speech
CFS: Correlation based Feature Selection
CGN: Spoken Dutch Corpus from Centre for Genetic Resources, the Netherlands
CR: Compression Rate
CSV: Comma Separated Value (format)
DAT: Digital Audio Tape
DBN: Dynamic Bayesian Networks
DCT: Discrete Cosine Transformation
DES: Danish Emotional Speech (Database)
DET: Detection Error Trade-off (curve)
DFT: Discrete Fourier Transform
DLL: Dynamic Link Library
DT: Decision Tree (machine learning) / Determiner (linguistics)
EC: Emotion Challenge (Interspeech 2009)
ECA: Embodied Conversational Agent
EER: Equal Error Rate
EM: Expectation Maximisation
EMMA: Extensible Multi-Modal Annotation (markup language, XML-style)
ERB: Equivalent Rectangular Bandwidth
ETSI: European Telecommunications Standards Institute
EU: European Union
EWE: Evaluator Weighted Estimator
F0: Fundamental Frequency
FAU: Friedrich Alexander University
FFT: Fast Fourier Transformation
FN: False Negatives
FNN: Feed-forward Neural Network
FNR: False Negative Rate/Ratio
FP: False Positives
FPR: False Positive Rate/Ratio
GM: Gaussian Mixture
GMM: Gaussian Mixture Model
GPL: GNU/General Public Licence
GUI: Graphical User Interface
HMM: Hidden Markov Model
HNR: Harmonics-to-Noise Ratio
HTK: Hidden Markov (model) Toolkit
ICA: Independent Component Analysis
IDF: Inverse Document Frequency
IG: Information Gain
IGR: Information Gain Ratio
IIR: Infinite Impulse Response
IP: Interruption Point
ISCA: International Speech Communication Association
ISLE: Italian and German Spoken Learners English (corpus)
ITU: International Telecommunication Union
KL: Kullback–Leibler (divergence/distance)
L1: first language
L2: second language
LDA: Linear Discriminant Analysis
LDC: Linguistic Data Consortium
LIWC: Linguistic Inquiry and Word Count
LLD: Low-Level Descriptor
LM: Language Model
LOO: Leave One Out
LP: Linear Predictor / Linear Prediction
LPC: Linear Predictive Coding
LPCC: Linear Predictive Cepstral Coefficient
LSF: Line Spectral (pair) Frequency
LSP: Line Spectral Pair
LSTM: Long Short-Term Memory
LSTM-RNN: Long Short-Term Memory Recurrent Neural Network
LVCSR: Large Vocabulary Continuous Speech Recognition
MAE: Mean Absolute Error
MAPE: Mean Absolute Percentage Error
MFB: Mel-Frequency Band
MFCC: Mel-Frequency Cepstral Coefficient
MIML: Multimodal Interaction Markup Language
ML: Maximum Likelihood
MLE: Mean Linear Error
MLP: Multilayer Perceptron
MSE: Mean Square Error
NCSC: NKI CCRT Speech Corpus
NHD: Null Hypothesis Decision
NHR: Noise-to-Harmonics Ratio
NHT: Null Hypothesis Testing
NL: Non-likeable
NMF: Non-negative Matrix Factorisation
NN: Noun (linguistic)
NP: Noun Phrase (linguistic)
NPV: Negative Predictive Value
OCEAN: Openness, Conscientiousness, Extraversion, Agreeableness, Neuroticism
OOV: Out Of Vocabulary (words)
openEAR: open-source Emotion and Affect Recognition (toolkit)
openSMILE: open-source Speech and Music Interpretation by Large Scale Extraction (toolkit)
PC: Paralinguistic Challenge (Interspeech 2010)
PCA: Principal Component Analysis
PCM: Pulse Code Modulation
PDA: Pitch Detection Algorithm
PDF: Probability Density Function
PLP: Perceptual Linear Prediction
PLP-CC: Perceptual Linear Prediction Cepstral Coefficient
PLTT: Post Laryngectomy Telephone Test
POS: Part Of Speech
PP: Prepositional Phrase (linguistic)
PR: Precision
RASTA: RelAtive SpecTrA
RASTA-PLP: RelAtive SpecTrA Perceptual Linear Prediction (Coefficients)
RE: Recall
RIFF: Resource Interchange File Format
RMS: Root Mean Square
RMSE: Root Mean Squared Error
RNN: Recurrent Neural Network
ROC: Receiver Operating Characteristic
SAL: Sensitive Artificial Listener
SAMPA: Speech Assessment Methods Phonetic Alphabet
SEMAINE: Sustained Emotionally coloured Machine-human Interaction using Non-verbal Expression (project)
SFFS: Sequential Floating Forward Search
SFS: Speech Filing System
SHS: Sub-Harmonic Summation
SI: International System of Units (French: Système international d’unités)
SIFT: Simplified Inverse Filtering Technique
SIMIS: Speech In Minimal Invasive Surgery (database)
SLC: Sleepy Language Corpus
SLD: Speaker Likability Database
SNR: Signal-to-Noise Ratio
SP: SPecificity
SPC: Speaker Personality Corpus
SSC: Speaker State Challenge
STC: Speaker Trait Challenge
SVM: Support Vector Machine
SVQ: Split Vector Quantisation
SVR: Support Vector Regression
T-expression: Ternary expression
TF: Term Frequency
TFIDF: Term Frequency and Inverse Document Frequency
TN: True Negatives
TP: True Positives
TPR: True Positive Rate/Ratio
TUM: Technische Universität München
UA: Unweighted Accuracy
UAR: Unweighted Average Recall
VAM: Vera-Am-Mittag (German TV show, corpus)
VB: Verb
VP: Verb Phrase
VQ: Vector Quantisation
WA: Weighted Accuracy / Word Accuracy (ASR)
WAR: Weighted Average Recall
WYALFIWYG: What You Are Looking For Is What You Get
WYSIWYG: What You See Is What You Get
XML: eXtensible Mark-up Language
ZCR: Zero Crossing Rate

Part I: Foundations

1 Introduction

1.1 What is Computational Paralinguistics? A First Approximation

So difficult it is to show the various meanings and imperfections of words when we have nothing else but words to do it with.

(John Locke)

The term computational paralinguistics is not yet a well-established term, in contrast to computational linguistics or even computational phonetics; the reader might like to try comparing the hits for each of these terms – or for any other combination of ‘computational’ with the name of a scientific field such as psychology or sociology – in a web search. This terminological gap is a little puzzling given the fact that there is a plethora of studies on, for example, affective computing (Picard 1997) and speech – which can partly be conceived as a sub-field of computational paralinguistics (as far as speech and language are concerned). But let us first take a look at the coarse meanings of the two words this term consists of: ‘computational’ and ‘paralinguistics’.

Here, ‘computational’ means roughly that something is done by a computer and not by a human being; this can mean analysing the phenomenon in question, or generating humanlike behaviour. Note that nowadays computers are used for practically all systematic and scientific work, even if it is only for listing data, detailed information on subjects, or annotations in an ASCII (American Standard Code for Information Interchange) file. In traditional phonetic or psychological approaches, this can go along with the use of highly sophisticated signal extraction and statistical programs. A borderline between the ‘simple’ use of computers for tedious work and the use of computers for actually modelling and performing human behaviour is of course difficult to define. Here, we simply mean both: doing the work with the help of computers, and letting computers do the work of analysing and processing.

‘Paralinguistics’ means ‘alongside linguistics’ (from the Greek preposition παρα); thus the phenomena in question are not typical linguistic phenomena such as the structure of a language, its phonetics, its grammar (syntax, morphology), or its semantics. It is concerned with how you say something rather than what you say.

In Figure 1.1 we try to narrow down the realm of paralinguistics in a reasonable way, as we conceive it and as we will deal with it in this book. Of course, there are other conceptualisations of paralinguistics, some broader, some narrower in scope. Figure 1.1 is a sort of flowchart that we will follow from top to bottom. A grey font indicates fields and topics that are not part of paralinguistics, for instance, the global science of mankind or of everything else that can be found in this world. Dashed lines lead to fields that are more or less disregarded in this book.

Figure 1.1 The realm of computational paralinguistics

The first word shown in black is ‘communication’, denoting that interactions between human beings are focal. Paralinguistics deals with speech and language, both of which are primarily means of communication; even a soliloquy has to be overheard and eventually recorded and processed by the computer in order to be an object of investigation. The same holds for a private diary in its written form: it might not be intended as communication with others, but as soon as it is read by someone else, it is. Of course, human communication is an important part of related fields such as psychology, sociology, or anthropology. Thus, we have to follow the flowchart further down to point out what distinguishes paralinguistics from all these related fields.

In traditional linguistics, the term ‘language’ refers to the (innate and/or acquired) mental competence, and the term ‘speech’ to the performance, that is, to the ability to convert this competence into motor signals, acoustic waves, and percepts. In this book we adhere to a shallower definition of these two terms, based on their use in speech and language technology. Language is more or less synonymous with ‘natural language’ which is modelled and processed within computational linguistics; speech is the object of investigation within automatic speech processing, that is, ‘spoken language’, as opposed to written language.
