Multimedia Semantics: Metadata, Analysis and Interaction (E-Book)

Description

In this book, the authors present the latest research results in the multimedia and Semantic Web communities, bridging the "Semantic Gap".

This book explains, collects and reports on the latest research results that aim at narrowing the so-called multimedia "Semantic Gap": the large disparity between descriptions of multimedia content that can be computed automatically and the richness and subjectivity of semantics in user queries and human interpretations of audiovisual media. Addressing the grand challenge posed by the "Semantic Gap" requires a multi-disciplinary approach (computer science, computer vision and signal processing, cognitive science, web science, etc.), and this is reflected in recent research in the area. The book therefore targets an interdisciplinary audience, in particular the Multimedia and Semantic Web communities, and provides both the fundamental knowledge and the latest state-of-the-art results from each community, with the goal of making the knowledge of one community available to the other.

Key Features:

  • Presents state-of-the-art research results in multimedia semantics: multimedia analysis, metadata standards and multimedia knowledge representation, and semantic interaction with multimedia
  • Contains real industrial problems exemplified by user case scenarios
  • Offers an insight into various standardisation bodies including W3C, IPTC and ISO MPEG
  • Contains contributions from academic and industrial communities from Europe, USA and Asia
  • Includes an accompanying website containing user cases, datasets, and software mentioned in the book, as well as links to the K-Space NoE and the SMaRT society web sites (http://www.multimediasemantics.com/)

This book will be a valuable reference for academic and industry researchers/practitioners in the multimedia, computational intelligence and computer science fields. Graduate students, project leaders, and consultants will also find this book of interest.


Page count: 605

Publication year: 2011




Table of Contents

Title Page

Copyright

Foreword

List of Figures

List of Tables

List of Contributors

Chapter 1: Introduction

Chapter 2: Use Case Scenarios

2.1 Photo Use Case

2.2 Music Use Case

2.3 Annotation in Professional Media Production and Archiving

2.4 Discussion

Acknowledgements

Chapter 3: Canonical Processes of Semantically Annotated Media Production

3.1 Canonical Processes

3.2 Example Systems

3.3 Conclusion and Future Work

Chapter 4: Feature Extraction for Multimedia Analysis

4.1 Low-Level Feature Extraction

4.2 Feature Fusion and Multi-modality

4.3 Conclusion

Chapter 5: Machine Learning Techniques for Multimedia Analysis

5.1 Feature Selection

5.2 Classification

5.3 Classifier Fusion

5.4 Conclusion

Chapter 6: Semantic Web Basics

6.1 The Semantic Web

6.2 RDF

6.3 RDF Schema

6.4 Data Models

6.5 Linked Data Principles

6.6 Development Practicalities

Chapter 7: Semantic Web Languages

7.1 The Need for Ontologies on the Semantic Web

7.2 Representing Ontological Knowledge Using OWL

7.3 A Language to Represent Simple Conceptual Vocabularies: SKOS

7.4 Querying on the Semantic Web

Chapter 8: Multimedia Metadata Standards

8.1 Selected Standards

8.2 Comparison

8.3 Conclusion

Chapter 9: The Core Ontology for Multimedia

9.1 Introduction

9.2 A Multimedia Presentation for Granddad

9.3 Related Work

9.4 Requirements for Designing a Multimedia Ontology

9.5 A Formal Representation for MPEG-7

9.6 Granddad's Presentation Explained by COMM

9.7 Lessons Learned

9.8 Conclusion

Chapter 10: Knowledge-Driven Segmentation and Classification

10.1 Related Work

10.2 Semantic Image Segmentation

10.3 Using Contextual Knowledge to Aid Visual Analysis

10.4 Spatial Context and Optimization

10.5 Conclusions

Chapter 11: Reasoning for Multimedia Analysis

11.1 Fuzzy DL Reasoning

11.2 Spatial Features for Image Region Labeling

11.3 Fuzzy Rule Based Reasoning Engine

11.4 Reasoning over Resources Complementary to Audiovisual Streams

Chapter 12: Multi-Modal Analysis for Content Structuring and Event Detection

12.1 Moving Beyond Shots for Extracting Semantics

12.2 A Multi-Modal Approach

12.3 Case Studies

12.4 Case Study 1: Field Sports

12.5 Case Study 2: Fictional Content

12.6 Conclusions and Future Work

Chapter 13: Multimedia Annotation Tools

13.1 State of the Art

13.2 SVAT: Professional Video Annotation

13.3 KAT: Semi-automatic, Semantic Annotation of Multimedia Content

13.4 Conclusions

Chapter 14: Information Organization Issues in Multimedia Retrieval Using Low-Level Features

14.1 Efficient Multimedia Indexing Structures

14.2 Feature Term Based Index

14.3 Conclusion and Future Trends

Acknowledgement

Chapter 15: The Role of Explicit Semantics in Search and Browsing

15.1 Basic Search Terminology

15.2 Analysis of Semantic Search

15.3 Use Case A: Keyword Search in ClioPatria

15.4 Use Case B: Faceted Browsing in ClioPatria

15.5 Conclusions

Chapter 16: Conclusion

References

Author Index

Subject Index

This edition first published 2011

© 2011 John Wiley & Sons Ltd.

Registered office

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Troncy, Raphaël.

Multimedia semantics : metadata, analysis and interaction / Raphaël Troncy, Benoit Huet, Simon Schenk.

p. cm.

Includes bibliographical references and index.

ISBN 978-0-470-74700-1 (cloth)

1. Multimedia systems. 2. Semantic computing. 3. Information retrieval. 4. Database searching. 5. Metadata. I. Huet, Benoit. II. Schenk, Simon. III. Title.

QA76.575.T76 2011

006.7—dc22

2011001669

A catalogue record for this book is available from the British Library.

ISBN: 9780470747001 (H/B)

ISBN: 9781119970224 (ePDF)

ISBN: 9781119970231 (oBook)

ISBN: 9781119970620 (ePub)

ISBN: 9781119970637 (mobi)

Foreword

I am delighted to see a book on multimedia semantics covering metadata, analysis, and interaction edited by three very active researchers in the field: Troncy, Huet, and Schenk. This is one of those projects that are very difficult to complete because the field is advancing rapidly in many different dimensions. At any time, you feel that many important emerging areas may not be covered well until you see the next important conference in the field. A state-of-the-art book remains a moving, often elusive, target. But this is only a part of the dilemma. There are two more difficult problems. First, multimedia itself is like the famous fable of the elephant and the blind men. Each person can experience only one aspect of the elephant and hence has an understanding of only a partial problem. Interestingly, in the context of the whole problem, it is not a partial perspective but often a wrong one. The second issue is the notorious semantic gap. The concepts and abstractions in computing are based on bits, bytes, lists, arrays, images, metadata and such, but the abstractions and concepts used by human users are based on objects and events. The gap between the concepts used by computers and those used by humans is termed the semantic gap, and it has been exceedingly difficult to bridge. This ambitious book aims to cover this important, difficult and rapidly advancing topic, and I am impressed that it succeeds in capturing a good picture of the state of the art as it exists in early 2011. On the one hand I am impressed, and on the other hand I am sure that many researchers in this field will be thankful to the editors and authors for providing all this material in a compact yet comprehensible form, in one book.

The book covers aspects of multimedia from feature extraction to ontological representations to semantic search. This encyclopedic coverage of semantic multimedia is appearing at the right time. Just when we thought it was almost impossible to find all the topics relevant to understanding emerging multimedia systems, as discussed in the use cases, this book appears. Of course, such a book can only provide breadth in a reasonable size, and I find that in covering that breadth the authors have taken care not to become so superficial that the coverage of a topic becomes meaningless. This book is an excellent reference source for anybody working in this area. Naturally, to keep such a book current, a new edition will have to be prepared in a few years; hopefully, electronic tools will make this feasible. I would definitely love to see a new edition in a few years.

I want to particularly emphasize the closing sentence of the book: there is no single standard or format that satisfactorily covers all aspects of audiovisual content descriptions; the ideal choice depends on the type of application, process and required complexity. I hope that serious efforts will start to develop such a single standard, taking into account all the rich metadata in smartphones that can be used to generate meaningful, automatically extractable tags rather than human-generated ones. We in academia often ignore the obvious and usable in favor of the obscure and complex; we seem to enjoy creating new problems more than solving challenging existing ones. Semantic multimedia is definitely a field in need of simple tools that use available data and information to cope with rapidly growing multimedia data volumes. I hope that, by pulling together all the relevant material, this book will facilitate the solution of such real problems.

Ramesh Jain

Donald Bren Professor in Information & Computer Sciences,

Department of Computer Science, Bren School of Information and Computer Sciences,

University of California, Irvine.

List of Figures

Figure 2.1 Artist recommendations based on information related to a specific user's interest
Figure 2.2 Recommended events based on artists mentioned in a user profile and geolocation
Figure 2.3 Management of a personal music collection using aggregated Semantic Web data by GNAT and GNARQL
Figure 2.4 Metadata flows in the professional audiovisual media production process
Figure 4.1 Color layout descriptor extraction
Figure 4.2 Color structure descriptor structuring element
Figure 4.3 HTD frequency space partition (6 frequency times, 5 orientation channels)
Figure 4.4 Real parts of the ART basis functions (12 angular and 3 radial functions)
Figure 4.5 CSS representation for the fish contour: (a) original image, (b) initialized points on the contour, (c) contour after t iterations, (d) final convex contour
Figure 4.6 Camera operations
Figure 4.7 Motion trajectory representation (one dimension)
Figure 4.8 Schematic diagram of instantaneous feature vector extraction
Figure 4.9 Zero crossing rate for a speech signal and a music signal. The ZCR tends to be higher for music signals
Figure 4.10 Spectral centroid variation for trumpet and clarinet excerpts. The trumpet produces brilliant sounds and therefore tends to have higher spectral centroid values
Figure 4.11 Frequency response of a mel triangular filterbank with 24 subbands
Figure 5.1 Schematic architecture for an automatic classification system (supervised case)
Figure 5.2 Comparison between SVM and FDA linear discrimination for a synthetic two-dimensional database. (a) Lots of hyperplanes (thin lines) can be found to discriminate the two classes of interest. SVM estimates the hyperplane (thick line) that maximizes the margin; it is able to identify the support vector (indicated by squares) lying on the frontier. (b) FDA estimates the direction in which the projection of the two classes is the most compact around the centroid (indicated by squares); this direction is perpendicular to the discriminant hyperplane (thick line)
Figure 6.1 Layer cake of important Semantic Web standards
Figure 6.2 A Basic RDF Graph
Figure 6.3 Example as a graph
Figure 8.1 Parts of the MPEG-7 standard
Figure 9.1 Family portrait near Pisa Cathedral and the Leaning Tower
Figure 9.2 COMM: design patterns in UML notation—basic design patterns (A), multimedia patterns (B, D, E) and modeling examples (C, F)
Figure 9.3 Annotation of the image from Figure 9.1 and its embedding into the multimedia presentation for granddad
Figure 10.1 Initial region labeling based on attributed relation graph and visual descriptor matching
Figure 10.2 Experimental results for an image from the beach domain: (a) input image; (b) RSST segmentation; (c) semantic watershed; (d) semantic RSST
Figure 10.3 Fuzzy relation representation: RDF reification
Figure 10.4 Graph representation example: compatibility indicator estimation
Figure 10.5 Contextual experimental results for a beach image
Figure 10.6 Fuzzy directional relations definition
Figure 10.7 Indicative region-concept association results
Figure 11.1 The FiRE user interface consists of the editor panel (upper left), the inference services panel (upper right) and the output panel (bottom)
Figure 11.2 The overall analysis chain
Figure 11.3 Hypothesis set generation
Figure 11.4 Definition of (a) directional and (b) absolute spatial relations
Figure 11.5 Scheme of Nest for image segmentation
Figure 11.6 Fuzzy confidence
Figure 11.7 Detection of moving objects in soccer broadcasts. In the right-hand image all the moving objects have been removed
Figure 12.1 Detecting close-up/mid-shot images: best-fit regions for face, jersey, and background
Figure 12.2 Goalmouth views capturing events in soccer, rugby, and hockey
Figure 12.3 ERR vs CRR for rugby video
Figure 12.4 Detecting events based on audiovisual features
Figure 12.5 FSMs used in detecting sequences where individual features are dominant
Figure 12.6 An index of character appearances based on dialogues in the movie Shrek
Figure 12.7 Main character interactions in the movie American Beauty
Figure 13.1 SVAT user interface
Figure 13.2 Detailed annotation interface for video segments
Figure 13.3 Global annotation dialogue
Figure 13.4 KAT screenshot during image annotation
Figure 13.5 Overview of KAT architecture
Figure 13.6 Available view positions in the default layout
Figure 13.7 Using Named Graphs to map COMM objects to repositories
Figure 13.8 COMM video decomposition for whole video
Figure 13.9 COMM video decomposition for video segment
Figure 14.1 Geometrical representation of PyrRec
Figure 14.2 Precision with respect to selectivity for color layout feature
Figure 14.3 Precision with respect to selectivity for edge histogram feature
Figure 14.4 Number of data accessed with respect to selectivity for colour structure feature
Figure 14.5 Number of data accessed with respect to selectivity for dominant colour feature
Figure 14.6 Time with respect to selectivity for colour structure feature
Figure 14.7 Time with respect to selectivity for homogeneous texture feature
Figure 14.8 Selection criterion distribution for 80-dimensional edge histogram
Figure 14.9 Retrieval system framework
Figure 14.10 Mean average precision (MAP) of color layout query
Figure 15.1 High level overview of text-based query search: (a) query construction; (b) search algorithm of the system; (c) presentation of the results. Dashed lines represent user feedback
Figure 15.2 Autocompletion suggestions are given while the user is typing. The partial query ‘toku’ is contained in the title of three artworks, there is one matching term from the AAT thesaurus and the artist Ando Hiroshige is found as he is also known as Tokubel
Figure 15.3 A user searches for ‘tokugawa’. The Japanese painting on the right matches this query, but is indexed with a thesaurus that does not contain the synonym ‘Tokugawa’ for this Japanese style. Through a ‘same-as’ link with another thesaurus that does contain this label, the semantic match can be made
Figure 15.4 Result graph of the E-Culture search algorithm for the query ‘Tokugawa’. The rectangular boxes on the left contain the literal matches, the colored boxes on the left contain a set of results, and the ellipses a single result. The ellipses in between are the resources traversed in the graph search
Figure 15.5 Presentation of the search results for the query ‘Togukawa’ in the E-Culture demonstrator. The results are presented in five groups (the first and third groups have been collapsed). Museum objects that are found through a similar path in the graph are grouped together
Figure 15.6 Faceted interface of the NewsML demonstrator. Four facets are active: document type, creation site, event and person. The value ‘photo’ is selected from the document type facet. The full query also contains the keyword ‘Zidane’, as is visible in the header above the results
Figure 15.7 Hierarchical organization of the values in the creation site facet. The value ‘Europe’ is selected and below it the four countries in which photos related to Zidane are created
Figure 15.8 Grouped presentation of search results. The photos related to Zidane are presented in groups with the same creation site

List of Tables

Table 3.1 Canonical processes and their relation to photo book production
Table 3.2 Description of dependencies between visual diary stages and the canonical process for media production
Table 5.1 Classifier fusion properties
Table 6.1 Most relevant RDF(S) entailment rules
Table 6.2 Overview of data models, from Angles and Gutierrez (2005)
Table 7.1 A core fragment of OWL2
Table 8.1 Comparison of selected multimedia metadata standards
Table 10.1 Comparison of segmentation variants and their combination with visual context, with evaluation scores per concept
Table 11.1 Semantics of concepts and roles
Table 11.2 Tableau expansion rules
Table 11.3 Knowledge base (TBox): features from text combined with detectors from video
Table 12.1 Performance of event detection across various sports: maximum CRR for 90% ERR
Table 12.2 Results of the cross-media feature selection (P, C, N, Previous, Current, Next; M, E, O, Middle, End, Other)
Table 12.3 Dual co-occurrence highlighted for different character relationships
Table 13.1 Number of RDF triples for MPEG-7 export of the same video with different metadata
Table 13.2 Number of RDF triples for MPEG-7 export of different videos with the same metadata
Table 14.1 Term numbers of homogenous texture in the TRECVid 2006 collection
Table 14.2 Number of relevant documents of dominant color, in the top 1000 returned documents, in the TRECVid 2006 collection
Table 14.3 Average number of relevant documents, in the top 1000 returned documents, of all query topics
Table 15.1 Functionality and interface support in the three phases of semantic search

List of Contributors

Thanos Athanasiadis

Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, 15780 Zographou, Greece

Yannis Avrithis

Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, 15780 Zographou, Greece

Werner Bailer

JOANNEUM RESEARCH, Forschungsgesellschaft mbH, DIGITAL—Institute for Information and Communication Technologies, Steyrergasse 17, 8010 Graz, Austria

Rachid Benmokhtar

EURECOM, 2229 Route des Crêtes, BP 193—Sophia Antipolis, France

Petr Berka

Faculty of Informatics and Statistics, University of Economics, Prague, Czech Republic

Susanne Boll

Media Informatics and Multimedia Systems Group, University of Oldenburg, Escherweg 2, 26121 Oldenburg, Germany

Paul Buitelaar

DFKI GmbH, Germany

Marine Campedel

Telecom ParisTech, 37–39 rue Dareau, 75014 Paris, France

Oscar Celma

BMAT, Barcelona, Spain

Krishna Chandramouli

Queen Mary University of London, Mile End Road, London, UK

Slim Essid

Telecom ParisTech, 37–39 rue Dareau, 75014 Paris, France

Thomas Franz

ISWeb—Information Systems and Semantic Web, University of Koblenz-Landau, Universitätsstraße 1, Koblenz, Germany

Lynda Hardman

Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Michael Hausenblas

Digital Enterprise Research Institute, National University of Ireland, IDA Business Park, Lower Dangan, Galway, Ireland

Michiel Hildebrand

Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Frank Hopfgartner

International Computer Science Institute, 1947 Center Street, Suite 600, Berkeley, CA, 94704, USA

Benoit Huet

EURECOM, 2229 Route des Crêtes, BP 193—Sophia Antipolis, France

Antoine Isaac

Vrije Universiteit Amsterdam, de Boelelaan 1081a, Amsterdam, The Netherlands

Joemon M. Jose

University of Glasgow, 18 Lilybank Gardens, Glasgow G12 8RZ, UK

Florian Kaiser

Technische Universität Berlin, Institut für Telekommunikationssysteme, Fachgebiet Nachrichtenübertragung, Einsteinufer 17, 10587 Berlin, Germany

Ioannis Kompatsiaris

Informatics and Telematics Institute, Centre for Research and Technology Hellas, 57001 Thermi-Thessaloniki, Greece

Bart Lehane

CLARITY: Centre for Sensor Web Technologies, Dublin City University, Ireland

Vasileios Mezaris

Informatics and Telematics Institute, Centre for Research and Technology Hellas, 57001 Thermi-Thessaloniki, Greece

Phivos Mylonas

Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, 15780 Zographou, Greece

Frank Nack

University of Amsterdam, Science Park 107, 1098 XG Amsterdam, The Netherlands

Jan Nemrava

Faculty of Informatics and Statistics, University of Economics, Prague, Czech Republic

Noel E. O'Connor

CLARITY: Centre for Sensor Web Technologies, Dublin City University, Ireland

Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Eyal Oren

Vrije Universiteit Amsterdam, de Boelelaan 1081a, Amsterdam, The Netherlands

Georgios Th. Papadopoulos

Informatics and Telematics Institute, Centre for Research and Technology Hellas, 57001 Thermi-Thessaloniki, Greece

Tomas Piatrik

Queen Mary University of London, Mile End Road, London E1 4NS, UK

Yves Raimond

BBC Audio & Music interactive, London, UK

Reede Ren

University of Surrey, Guildford, Surrey, GU2 7XH, UK

Gaël Richard

Telecom ParisTech, 37–39 rue Dareau, 75014 Paris, France

Carsten Saathoff

ISWeb—Information Systems and Semantic Web, University of Koblenz-Landau, Universitätsstraße 1, Koblenz, Germany

David A. Sadlier

CLARITY: Centre for Sensor Web Technologies, Dublin City University, Ireland

Andrew Salway

Burton Bradstock Research Labs, UK

Peter Schallauer

JOANNEUM RESEARCH, Forschungsgesellschaft mbH, DIGITAL—Institute for Information and Communication Technologies, Steyrergasse 17, 8010 Graz, Austria

Simon Schenk

University of Koblenz-Landau, Universitätsstraße 1, Koblenz, Germany

Ansgar Scherp

University of Koblenz-Landau, Universitätsstraße 1, Koblenz, Germany

Nikolaos Simou

Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, 15780 Zographou, Greece

Giorgos Stoilos

Image, Video and Multimedia Systems Laboratory, National Technical University of Athens, 15780 Zographou, Greece

Michael G. Strintzis

Informatics and Telematics Institute, Centre for Research and Technology Hellas, 57001 Thermi-Thessaloniki, Greece

Vojtěch Svátek

Faculty of Informatics and Statistics, University of Economics, Prague, Czech Republic

Raphaël Troncy

Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

Chapter 1

Introduction

Raphaël Troncy,1 Benoit Huet2 and Simon Schenk3

1Centrum Wiskunde & Informatica, Amsterdam, The Netherlands

2EURECOM, Sophia Antipolis, France

3University of Koblenz-Landau, Koblenz, Germany

Digital multimedia items can be found on most electronic equipment, ranging from mobile phones and portable audiovisual devices to desktop computers. Users are able to acquire, create, store, send, edit, browse and render such content at an increasingly fast rate. While it becomes easier to generate and store data, it also becomes more difficult to access and locate specific or relevant information. This book addresses, directly and in considerable depth, the issues related to representing and managing such multimedia items.

The major objective of this book is to gather together and report on recent work that aims to extract and represent the semantics of multimedia items. There has been significant work by the research community aimed at narrowing the large disparity between the low-level descriptors that can be computed automatically from multimedia content and the richness and subjectivity of semantics in user queries and human interpretations of audiovisual media—the so-called semantic gap.

Research in this area is important because the amount of information available as multimedia for the purposes of entertainment, security, teaching or technical documentation is overwhelming, but the understanding of the semantics of such data sources is very limited. This means that the ways in which it can be accessed by users are also severely limited, and so the full social or economic potential of this content cannot be realized.

Addressing the grand challenge posed by the semantic gap requires a multi-disciplinary approach, and this is reflected in recent research in this area. In particular, this book is closely tied to a recent Network of Excellence, ‘K-Space’ (Knowledge Space of Semantic Inference for Automatic Annotation and Retrieval of Multimedia Content), funded by the Sixth Framework Programme of the European Commission.

By its very nature, this book is targeted at an interdisciplinary community which incorporates many research communities, ranging from signal processing to knowledge representation and reasoning. For example, multimedia researchers who deal with signal processing, computer vision, pattern recognition, multimedia analysis, indexing, retrieval and management of ‘raw’ multimedia data are increasingly leveraging methods and tools from the Semantic Web field by considering how to enrich their methods with explicit semantics. Conversely, Semantic Web researchers consider multimedia as an extremely fruitful area of application for their methods and technologies and are actively investigating how to enhance their techniques with results from the multimedia analysis community. A growing community of researchers is now pursuing both approaches in various high-profile projects across the globe. However, it remains difficult for both sides of the divide to communicate with and learn from each other. It is our hope that this book will go some way toward easing this difficulty by presenting recent state-of-the-art results from both communities.

Whenever possible, the approaches presented in this book are motivated and illustrated by three selected use cases. The use cases have been selected to cover a broad range of multimedia types and real-world scenarios that are relevant to many users on the Web: photos on the Web, music on the Web, and the professional audiovisual media production process. They introduce the challenges of media semantics in three different areas, personal photo collections, music consumption, and audiovisual media production, as representatives of image, audio, and video content. The use cases, detailed in Chapter 2, motivate the challenges in the field and illustrate the kind of media semantics needed for future use of such content on the Web, as well as where we have only just begun to solve the problem.

Nowadays it is common to associate semantic annotations with media assets. However, there is no agreed way of sharing such information among systems. In Chapter 3 a small number of fundamental processes for media production are presented. The so-called canonical processes are described in the context of two existing systems, related to the personal photo use case: CeWe Color Photo Book and SenseCam.

Feature extraction is the initial step toward semantic processing of multimedia content. There has been a great deal of work in the signal processing research community over the last two decades on identifying the most appropriate features for understanding multimedia content. Chapter 4 provides an overview of some of the most frequently used low-level features for describing audiovisual content, including some from the MPEG-7 standard, together with a succinct description of the methodologies employed. For each feature relevant to the video use case, a discussion provides the reader with the essential information about its strengths and weaknesses. The plethora of low-level features available today has led the research community to study multi-feature and multi-modal fusion; a brief overview of feature fusion approaches is also given in Chapter 4, highlighting the need for the different features to be studied in a joint fashion.
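
As a toy illustration of one such low-level audio feature, the sketch below computes a frame-wise zero-crossing rate of the kind compared for speech and music in Figure 4.9. The frame and hop sizes are arbitrary choices for this example, not values prescribed by the chapter.

```python
import numpy as np

def zero_crossing_rate(signal, frame_size=1024, hop_size=512):
    """Return the per-frame zero-crossing rate of a 1-D audio signal.

    The ZCR counts sign changes within each frame, normalized by the
    frame length (cf. the speech/music comparison in Figure 4.9).
    """
    zcr = []
    for start in range(0, len(signal) - frame_size + 1, hop_size):
        frame = signal[start:start + frame_size]
        # A sign change between consecutive samples counts as one crossing.
        crossings = np.sum(np.abs(np.diff(np.sign(frame))) > 0)
        zcr.append(crossings / frame_size)
    return np.array(zcr)

# Toy usage: a 440 Hz tone sampled at 16 kHz.
fs = 16000
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440 * t)
print(zero_crossing_rate(tone)[:3])
```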

Machine learning is a field of active research with applications in a broad range of domains. While humans are able to categorize objects, images or sounds and to place them in specific classes according to some common characteristic or semantics, computers have difficulty achieving similar classifications. Machine learning can be useful, for example, in learning models for very well-known objects or settings. Chapter 5 presents some of the main machine learning approaches for setting up an automatic multimedia analysis system. Continuing the information processing flow described in the previous chapter, feature dimensionality reduction methods, supervised and unsupervised classification techniques, and late fusion approaches are described.
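
The following sketch illustrates the general shape of such a supervised pipeline with late (classifier-level) fusion, using scikit-learn components as stand-ins; the synthetic data, the chosen classifiers and the soft-vote fusion are illustrative assumptions, not the specific techniques evaluated in Chapter 5.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for low-level multimedia feature vectors.
X, y = make_classification(n_samples=400, n_features=60, n_informative=15, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Feature selection followed by two base classifiers, fused by soft voting (late fusion).
svm = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), SVC(probability=True))
knn = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=20), KNeighborsClassifier())
fusion = VotingClassifier(estimators=[("svm", svm), ("knn", knn)], voting="soft")

fusion.fit(X_train, y_train)
print("fused accuracy:", fusion.score(X_test, y_test))
```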

The Internet and the Web have become an essential communication channel. The Semantic Web improves the Web infrastructure with formal semantics and interlinked data, enabling flexible, reusable, and open knowledge management systems. Chapter 6 introduces the Semantic Web basics: the RDF(S) model for knowledge representation, and the existing web infrastructure composed of URIs identifying resources and representations served over the HTTP protocol. The chapter details the importance of open and interlinked Semantic Web datasets, outlines the principles for publishing such linked data on the Web, and discusses some prominent openly available linked data collections. In addition, it shows how RDF(S) can be used to capture and describe domain knowledge in shared ontologies, and how logical inferencing can be used to deduce implicit information based on such domain ontologies.
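
A minimal sketch of these RDF(S) basics, using the rdflib Python library; the example.org namespace and the resources in it are invented for illustration and are not part of the book's datasets.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/media#")  # illustrative namespace, not from the book
g = Graph()
g.bind("ex", EX)

# Domain knowledge: every Photo is a MediaItem.
g.add((EX.Photo, RDFS.subClassOf, EX.MediaItem))

# Instance data: a photo with a simple label annotation.
g.add((EX.photo42, RDF.type, EX.Photo))
g.add((EX.photo42, RDFS.label, Literal("Family portrait near Pisa Cathedral")))

print(g.serialize(format="turtle"))
# An RDFS reasoner would additionally entail: ex:photo42 rdf:type ex:MediaItem .
```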

Having defined the Semantic Web infrastructure, Chapter 7 addresses two questions concerning rich semantics: how can the conceptual knowledge useful for a range of applications be successfully ported to and exploited on the Semantic Web? And how can one efficiently access the information represented in the large RDF graphs that constitute the Semantic Web information sphere? These two issues are addressed through the presentation of SPARQL, the recently standardized Semantic Web query language, with an emphasis on aspects relevant to querying multimedia metadata represented in RDF, using COMM annotations as running examples.
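
Continuing the invented example.org vocabulary above (not actual COMM annotations), a minimal SPARQL query executed with rdflib might look as follows.

```python
from rdflib import Graph, Literal, Namespace, RDF, RDFS

EX = Namespace("http://example.org/media#")  # illustrative namespace, not from the book
g = Graph()
g.add((EX.photo42, RDF.type, EX.Photo))
g.add((EX.photo42, RDFS.label, Literal("Family portrait near Pisa Cathedral")))

# Find the labels of all resources typed as ex:Photo.
query = """
PREFIX ex: <http://example.org/media#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?photo ?label WHERE {
    ?photo a ex:Photo ;
           rdfs:label ?label .
}
"""
for photo, label in g.query(query):
    print(photo, label)
```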

Chapter 8 presents and discusses a number of commonly used multimedia metadata standards. These standards are compared with respect to a list of assessment criteria, using the use cases listed in Chapter 2 as a basis. Through these examples the limitations of the current standards are exposed. Some initial solutions provided by COMM for automatically converting and mapping between metadata standards are presented and discussed.

A multimedia ontology framework, COMM, that provides a formal semantics for multimedia annotations to enable interoperability of multimedia metadata among media tools is presented in Chapter 9. COMM maps the core functionalities of the MPEG-7 standard to a formal ontology, following an ontology design approach that utilizes the foundational ontology DOLCE to safeguard conceptual clarity and soundness as well as extensibility towards new annotation requirements.

Previous chapters having described multimedia processing and knowledge representation techniques, Chapter 10 examines how their coupling can improve analysis. The algorithms presented in this chapter address the photo use case scenario from two perspectives. The first is a segmentation perspective, using similarity measures and merging criteria defined at a semantic level to refine an initial data-driven segmentation. The second is a classification perspective, for which two knowledge-driven approaches are presented: one deals with visual context and treats it as an interaction between global classification and local region labels, while the other deals with spatial context and formulates its exploitation as a global optimization problem.
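
As a highly simplified sketch of the first, segmentation perspective: adjacent regions become candidates for merging when their fuzzy concept memberships are similar at the semantic level. The concept set, the cosine similarity measure and the threshold below are illustrative assumptions, not the chapter's actual merging criteria.

```python
import numpy as np

def semantic_similarity(labels_a, labels_b):
    """Cosine similarity between two regions' fuzzy concept-membership vectors."""
    a, b = np.asarray(labels_a, float), np.asarray(labels_b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def merge_pass(regions, adjacency, threshold=0.9):
    """One pass of semantic region merging: return pairs of adjacent regions
    whose concept memberships are similar enough to be merged."""
    return [(i, j) for (i, j) in adjacency
            if semantic_similarity(regions[i], regions[j]) >= threshold]

# Hypothetical memberships over (sand, sea, sky) for three adjacent regions.
regions = {0: [0.8, 0.1, 0.1], 1: [0.7, 0.2, 0.1], 2: [0.05, 0.1, 0.85]}
print(merge_pass(regions, adjacency=[(0, 1), (1, 2)]))  # [(0, 1)]
```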

Chapter 11 demonstrates how different reasoning algorithms operating on previously extracted knowledge can be applied to multimedia analysis in order to extract semantics from images and videos. The rich theoretical background, formality and soundness of reasoning algorithms can provide a very powerful framework for multimedia analysis. A fuzzy extension of an expressive description logic, together with FiRE, the fuzzy reasoning engine that supports it, is presented here. Then a model using explicitly represented knowledge about the typical spatial arrangements of objects is presented; fuzzy constraint reasoning is used to represent the problem and to find a solution that provides an optimal labeling with respect to both low-level and spatial features. Finally, the NEST expert system, used for estimating the dissimilarity of image regions, is described.
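
As a toy illustration of the fuzzy semantics such reasoning builds on (not the FiRE engine or the book's knowledge bases), the snippet below combines membership degrees with the Gödel t-norm and t-conorm, a common choice for interpreting concept conjunction and disjunction in fuzzy description logics.

```python
def t_norm(a, b):
    """Gödel t-norm: degree of membership in a concept conjunction."""
    return min(a, b)

def t_conorm(a, b):
    """Gödel t-conorm: degree of membership in a concept disjunction."""
    return max(a, b)

# Hypothetical degrees from an image analysis module: a region is
# Sand to degree 0.7 and adjacent to a Beach region to degree 0.9.
sand, near_beach = 0.7, 0.9

print(t_norm(sand, near_beach))    # degree of (Sand AND adjacent-to-Beach) -> 0.7
print(t_conorm(sand, near_beach))  # degree of (Sand OR adjacent-to-Beach)  -> 0.9
```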

Multimedia content structuring is to multimedia documents what tables of contents and indexes are to written documents: an efficient way to access relevant information. Chapter 12 shows how combined audio and visual (and sometimes textual) analysis can assist high-level metadata extraction from video content, both in structuring the content and in detecting key events depicted by it. This is validated through two case studies targeting different kinds of content. A quasi-generic event-level content structuring approach using combined audiovisual analysis and a suitable machine learning paradigm is described. It is also shown that higher-level metadata can be obtained using complementary, temporally aligned textual sources.
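
A minimal finite-state-machine sketch in the spirit of Figure 12.5: an ‘event’ is reported when one dominant feature is followed by another within a short window of shots. The feature names and the rule itself are invented for illustration and are not the detectors used in the case studies.

```python
def detect_events(shot_features, max_gap=3):
    """Flag shot indices where crowd-noise dominance is followed, within
    max_gap shots, by a visible scoreboard graphic (toy event rule)."""
    events, armed, gap = [], False, 0
    for i, feats in enumerate(shot_features):
        if feats.get("crowd_noise_dominant"):
            armed, gap = True, 0          # enter the "candidate event" state
        elif armed:
            gap += 1
            if feats.get("scoreboard_visible"):
                events.append(i)          # accepting state: report an event
                armed = False
            elif gap > max_gap:
                armed = False             # fall back to the idle state
    return events

shots = [{"crowd_noise_dominant": True}, {}, {"scoreboard_visible": True}, {}]
print(detect_events(shots))  # [2]
```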

Chapter 13 reviews several multimedia annotation tools and presents two of them in detail. The Semantic Video Annotation Tool (SVAT) targets professional users in audiovisual media production and archiving and provides an MPEG-7 based framework for annotating audiovisual media. It integrates different methods for automatic structuring of content and provides the means to semantically annotate the content. The K-Space Annotation Tool is a framework for semi-automatic semantic annotation of multimedia content based on COMM. The annotation tools are compared and issues are identified.

Searching large multimedia collections is the topic of Chapter 14. Owing to the inherently multi-modal nature of multimedia documents, there are two major challenges in the development of an efficient multimedia index structure: the extremely high-dimensional feature space representing the content on the one hand, and the variable types of feature dimensions on the other. The first index structure presented here divides the feature space into disjoint subspaces using a pyramid tree, and an index function is proposed for efficient document access. The second exploits the discrimination ability of a media collection to partition the document set; a new feature space, the feature term, is proposed to facilitate the identification of effective features as well as the development of retrieval models.
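
The pyramid technique underlying such structures maps each high-dimensional feature vector to a single scalar key (a pyramid number plus the height within that pyramid), so that an ordinary one-dimensional ordered index can be used. The sketch below shows a simplified version of that mapping, assuming features normalized to [0, 1]; it is not the chapter's PyrRec structure or its feature-term index.

```python
import numpy as np

def pyramid_value(v):
    """Map a point v in [0,1]^d to a 1-D pyramid key (simplified sketch).

    The point is assigned to the pyramid in whose dominant dimension it
    deviates most from the centre; the key is the pyramid number plus
    the height within that pyramid.
    """
    v = np.asarray(v, dtype=float)
    d = v.shape[0]
    dev = np.abs(0.5 - v)
    j = int(np.argmax(dev))            # dominant dimension
    i = j if v[j] < 0.5 else j + d     # pyramid number in [0, 2d)
    return i + dev[j]                  # pyramid number + height

# Normalized feature vectors become scalar keys that can be stored
# in any ordinary ordered index (e.g. a B+-tree).
print(pyramid_value([0.2, 0.8, 0.5]))  # 0.3
```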

In recent years several Semantic Web applications have been developed that support some form of search. Chapter 15 analyzes the state of the art in this domain. The various roles played by semantics in query construction, in the core search algorithm and in the presentation of search results are investigated. The focus is on queries based on simple textual entry forms and on queries constructed by navigation (e.g. faceted browsing). A systematic understanding of the different design dimensions that play a role in supporting search over Semantic Web data is provided. The study is conducted in the context of image search and illustrates two use cases: one highlights the use of semantic functionality to support search, while the other demonstrates the use of faceted navigation to explore an image collection.
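
As a schematic illustration of faceted browsing over annotated media (a toy sketch with invented data, not the ClioPatria or NewsML demonstrators): selecting a facet value restricts the result set, and the remaining facet values are re-counted for display.

```python
from collections import Counter

# Toy annotated collection (field names and values are invented).
items = [
    {"title": "Final header", "document_type": "photo", "creation_site": "France", "person": "Zidane"},
    {"title": "Press conference", "document_type": "photo", "creation_site": "Italy", "person": "Zidane"},
    {"title": "Match report", "document_type": "article", "creation_site": "France", "person": "Zidane"},
]

def facet_counts(results, facet):
    """Count how many results carry each value of a facet."""
    return Counter(item[facet] for item in results)

def apply_facet(results, facet, value):
    """Restrict the result set to items matching the selected facet value."""
    return [item for item in results if item[facet] == value]

# Select 'photo' in the document-type facet, then show remaining choices
# in the creation-site facet (cf. the grouped presentation in Figure 15.8).
photos = apply_facet(items, "document_type", "photo")
print(facet_counts(photos, "creation_site"))
```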

In conclusion, we trust that this book goes some way toward illuminating some recent exciting results in the field of semantic multimedia. From the wide spectrum of topics covered, it is clear that significant effort is being invested by both the Semantic Web and multimedia analysis research communities. We believe that a key objective of both communities should be to continue and broaden interdisciplinary efforts in this field with a view to extending the significant progress made to date.

Chapter 2

Use Case Scenarios

Werner Bailer,1 Susanne Boll,2 Oscar Celma,3 Michael Hausenblas4 and Yves Raimond5

1JOANNEUM RESEARCH—DIGITAL, Graz, Austria

2University of Oldenburg, Oldenburg, Germany

3BMAT, Barcelona, Spain

4Digital Enterprise Research Institute, National University of Ireland, IDA Business Park, Lower Dangan, Galway, Ireland

5BBC Audio & Music Interactive, London, UK

In this book, research approaches to extracting, deriving, processing, modeling, using and sharing the semantics of multimedia are presented. We motivate these approaches with three selected use cases that are referred to throughout the book to illustrate the respective content of each chapter. These use cases are partially based on previous work done in the W3C Multimedia Semantics Incubator Group (MMSEM–XG) and the W3C Media Annotations Working Group, and have been selected to cover a broad range of multimedia types and real-world scenarios that are relevant to many users on the Web.

Continue reading in the full edition!
