Belonging to the wider academic field of computer vision, video analytics has aroused a phenomenal surge of interest since the turn of the millennium. Video analytics is intended to solve the problem of the inability to exploit video streams in real time for the purposes of detection or anticipation. It involves analyzing the videos using algorithms that detect and track objects of interest over time and that indicate the presence of events or suspect behavior involving these objects. The aims of this book are to highlight the operational attempts of video analytics, to identify possible driving forces behind potential evolutions in years to come, and above all to present the state of the art and the technological hurdles which have yet to be overcome. The need for video surveillance is introduced through two major applications (the security of rail transportation systems and a posteriori investigation). The characteristics of the videos considered are presented through the cameras which enable their capture and the compression methods which allow us to transport and store them. Technical topics are then discussed: the analysis of objects of interest (detection, tracking and recognition) and "high-level" video analysis, which aims to give a semantic interpretation of the observed scene (events, behaviors, types of content). The book concludes with the problem of performance evaluation.
Contents
Introduction
I.1. General presentation
I.2. Objectives of the book
I.3. Organization of the book
Chapter 1 Image Processing: Overview and Perspectives
1.1. Half a century ago
1.2. The use of images
1.3. Strengths and weaknesses of image processing
1.4. What is left for the future?
1.5. Bibliography
Chapter 2 Focus on Railway Transport
2.1. Introduction
2.2. Surveillance of railway infrastructures
2.3. Onboard surveillance
2.4. Conclusion
2.5. Bibliography
Chapter 3 A Posteriori Analysis for Investigative Purposes
3.1. Introduction
3.2. Requirements in tools for assisted investigation
3.3. Collection and storage of data
3.4. Exploitation of the data
3.5. Conclusion
3.6. Bibliography
Chapter 4 Video Surveillance Cameras
4.1. Introduction
4.2. Constraints
4.3. Nature of the information captured
4.4. Video formats
4.5. Technologies
4.6. Interfaces: from analog to IP
4.7. Smart cameras
4.8. Conclusion
4.9. Bibliography
Chapter 5 Video Compression Formats
5.1. Introduction
5.2. Video formats
5.3. Principles of video compression
5.4. Compression standards
5.5. Conclusion
5.6. Bibliography
Chapter 6 Compressed Domain Analysis for Fast Activity Detection
6.1. Introduction
6.2. Processing methods
6.3. Uses of analysis of the compressed domain
6.4. Conclusion
6.5. Acronyms
6.6. Bibliography
Chapter 7 Detection of Objects of Interest
7.1. Introduction
7.2. Moving object detection
7.3. Detection by modeling of the objects of interest
7.4. Conclusion
7.5. Bibliography
Chapter 8 Tracking of Objects of Interest in a Sequence of Images
8.1. Introduction
8.2. Representation of objects of interest and their associated visual features
8.3. Geometric workspaces
8.4. Object-tracking algorithms
8.5. Updating of the appearance models
8.6. Multi-target tracking
8.7. Object tracking using a PTZ camera
8.8. Conclusion
8.9. Bibliography
Chapter 9 Tracking Objects of Interest Through a Camera Network
9.1. Introduction
9.2. Tracking in a network of cameras whose fields of view overlap
9.3. Tracking through a network of cameras with non-overlapping fields of view
9.4. Conclusion
9.5. Bibliography
Chapter 10 Biometric Techniques Applied to Video Surveillance
10.1. Introduction
10.2. The databases used for evaluation
10.3. Facial recognition
10.4. Iris recognition
10.5. Research projects
10.6. Conclusion
10.7. Bibliography
Chapter 11 Vehicle Recognition in Video Surveillance
11.1. Introduction
11.2. Specificity of the context
11.3. Vehicle modeling
11.4. Exploitation of object models
11.5. Increasing observability
11.6. Performances
11.7. Conclusion
11.8. Bibliography
Chapter 12 Activity Recognition
12.1. Introduction
12.2. State of the art
12.3. Ontology
12.4. Suggested approach: the ScReK system
12.5. Illustrations
12.6. Conclusion
12.7. Bibliography
Chapter 13 Unsupervised Methods for Activity Analysis and Detection of Abnormal Events
13.1. Introduction
13.2. An example of a topic model: PLSA
13.3. PLSM and temporal models
13.4. Applications: counting, anomaly detection
13.5. Conclusion
13.6. Bibliography
Chapter 14 Data Mining in a Video Database
14.1. Introduction
14.2. State of the art
14.3. Pre-processing of the data
14.4. Activity analysis and automatic classification
14.5. Results and evaluations
14.6. Conclusion
14.7. Bibliography
Chapter 15 Analysis of Crowded Scenes in Video
15.1. Introduction
15.2. Literature review
15.3. Data-driven crowd analysis in videos
15.4. Density-aware person detection and tracking in crowds
15.5. Conclusions and directions for future research
15.6. Acknowledgments
15.7. Bibliography
Chapter 16 Detection of Visual Context
16.1. Introduction
16.2. State of the art of visual context detection
16.3. Fast shared boosting
16.4. Experiments
16.5. Conclusion
16.6. Bibliography
Chapter 17 Example of an Operational Evaluation Platform: PPSL
17.1. Introduction
17.2. Use of video surveillance: approach and findings
17.3. Current use contexts and new operational concepts
17.4. Requirements in smart video processing
17.5. Conclusion
Chapter 18 Qualification and Evaluation of Performances
18.1. Introduction
18.2. State of the art
18.3. An evaluation program: ETISEO
18.4. Toward a more generic evaluation
18.5. The Quasper project
18.6. Conclusion
18.7. Bibliography
List of Authors
Index
First published 2013 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:
ISTE Ltd
27-37 St George’s Road
London SW19 4EU
UK
www.iste.co.uk
John Wiley & Sons, Inc.
111 River Street
Hoboken, NJ 07030
USA
www.wiley.com
© ISTE Ltd 2013
The rights of Jean-Yves Dufour to be identified as the author of this work have been asserted by him in accordance with the Copyright, Designs and Patents Act 1988.
Library of Congress Control Number: 2012946584
British Library Cataloguing-in-Publication Data
A CIP record for this book is available from the British Library
ISBN: 978-1-84821-433-0
Video surveillance consists of remotely watching public or private spaces using cameras. The images captured by these cameras are usually transmitted to a control center and immediately viewed by operators (real-time exploitation) and/or recorded and then analyzed on request (a posteriori exploitation) following a particular event (an accident, an assault, a robbery, an attack, etc.), for the purposes of investigation and/or evidence gathering.
Convenience stores and the rail and air transport sectors are, in fact, the largest users of video surveillance. These three sectors alone account for over 60% of the cameras installed worldwide. Today, even the smallest sales points have four cameras per 80 m² of shop floor. Surveillance of traffic areas, to help ensure the smooth flow of traffic and the capacity for swift intervention in case of an accident, brings the figure up to 80% in terms of the number of installations. The protection of other critical infrastructures accounts for a further 10% of installations. The proliferation of cameras in pedestrian urban areas is a more recent phenomenon, and accounts for the rest of the distribution.
Over the past 30+ years, we have seen a constant increase in the number of cameras in urban areas. In many people's minds, the reason behind this trend is a concern for personal protection, sparked first by a rise in crime (a steady increase in assaults in public areas) and then by the increase in terrorism over the past 10 years. However, this aspect cannot mask the multiplication of cameras in train stations, airports and shopping centers.
The defense of people and assets, which states are so eager to guarantee, has benefited greatly from two major technological breakthroughs: first, the advent of very high capacity digital video recorders (DVRs) and, second, the development of Internet protocol (IP) networks and so-called IP cameras. The latter breakthrough enables the images delivered by cameras to be distributed to various processing centers. This facilitates the (re)configuration of the system and the transmission of all the data (images, metadata, commands, etc.) over the same channel.
Today, we are reaping the benefits of these technological advances for the protection of critical infrastructures. Indeed, it is becoming easier to ensure interoperability with other protection or security systems (access monitoring, barriers, fire alarms, etc.). This facility is often accompanied by a poorer quality of images than those delivered by CCTV cameras.
Currently, the evolution of the urban security market is leading to the worldwide deployment of very extensive systems, consisting of hundreds or even thousands of cameras. While such systems, operated in clusters, have long been the panacea for transport operators, they have become unavoidable in urban areas.
All these systems generate enormous quantities of video data, which make real-time exploitation by humans alone near-impossible, or at best extremely long and very costly in terms of human resources. These systems have now come to be used essentially as operational aids. They are a tool for planning and for supporting the intervention of a protective force, be it in an urban area or in major transport centers.
“Video analytics”1 is intended to solve the problem of the inability to exploit video streams in real time for the purposes of detection or anticipation. It involves having the videos analyzed by algorithms that detect and track objects of interest (usually people or vehicles) over time, and that indicate the presence of events or suspect behavior involving these objects. The aim is to be able to alert operators to suspicious situations in real time, to economize on bandwidth by transmitting only data that are pertinent for surveillance, and to improve searching capabilities in the archived sequences by adding data relating to the content (metadata) to the videos.
The “Holy Grail” of video analytics can be summed up as three main automatic functions: real-time detection of expected or unexpected events, the capability to replay in real time the events leading up to an observed situation, and the capacity to analyze the video a posteriori to retrace the origin of an event.
Belonging to the wider academic domain of computer vision, video analytics has aroused a phenomenal surge of interest since the early 2000s, resulting – in concrete terms – in the proliferation of companies developing video analytics software worldwide and the setting up of a large number of collaborative projects (e.g. SERKET, CROMATICA, PRISMATICA, ADVISOR, CARETAKER, VIEWS, BOSS, LINDO, VANAHEIM and VICOMO, all funded by the European Union).
Video analytics is also the topic of various academic gatherings. For instance, on a near-yearly basis since 1998, the Institute of Electrical and Electronics Engineers (IEEE) has organized an international conference, Advanced Video and Signal-based Surveillance (AVSS), which has become a reference point in the domain and provides a regular meeting place for people from research, industry and governmental agencies.
Although motion detection, object detection and tracking, and license plate recognition technologies have now been shown to be effective in controlled environments, very few systems are, as yet, sufficiently robust to changing environments and the complexity of urban scenes. Furthermore, the recognition of objects and individuals in complex scenes, along with the recognition of complex or "unusual" behavior, remains one of the greatest challenges faced by researchers in this domain.
Furthermore, new applications, such as consumer behavior analysis and the search for target videos on the Internet, could accelerate the rise of video analytics.
The aims of this book are to highlight the operational attempts of video analytics, to identify possible driving forces behind potential evolutions in years to come and, above all, to present the state of the art and the technological hurdles that have yet to be overcome. This book is intended for an audience of students and young researchers in the field of computer vision, and for engineers involved in large-scale video surveillance projects.
In Chapter 1, Henri Maitre, a pioneer and an eminent actor in the domain of image analysis today, provides an overview of the major issues that have been addressed and the advances that have been achieved since the advent of this discipline in the 1970s–1980s. The new challenges that have arisen today are also presented, along with the most promising technical approaches to overcome these challenges. These approaches will be illustrated in certain chapters of the book.
The subsequent chapters have been sequenced so as to successively deal with the applications of video analytics and the nature of the data processed, before going into detail about the technical aspects, which constitute the core of this book, and finishing with the subject of performance evaluation.
Chapters 2 and 3 deal with the applications of video analytics and present two important examples: the security of rail transport, which tops the list of users of video surveillance (both chronologically and in terms of the volume of activity generated), and an a posteriori investigation using video data. These chapters list the requirements in terms of video analytics functions, as well as the constraints and main characteristics identified for these two applications. Chapter 2 also discusses the research programs conducted in France and Europe in the field of transport, which have enabled significant advances in this domain.
Chapters 4 and 5 present the characteristics of the videos considered, by way of the sensors used to generate them and issues of transport and storage that, in particular, give rise to the need for compression. In Chapter 4, the recent evolutions in video surveillance cameras are presented, as are the new modes of imaging that could, in the future, enhance the perception of the scenes. Chapter 5 presents the formats of video images and the principles of video compression used in video surveillance.
Chapters 6–11 present the problems related to the analysis of objects of interest (people or vehicles) observed in a video, based on a processing chain that is classic in image analysis: detection, tracking and recognition of these objects. Each chapter deals with one function, presenting the main characteristics and constraints, as well as the problems that need to be solved and the state of the art of the methods proposed to tackle them. Chapter 6 presents detection and tracking approaches based on the direct analysis of the information contained in the compressed video, so as to reduce, as far as possible, the computation time of the "low-level" operations of video analysis. Object detection is presented in Chapter 7, which describes the various approaches used today (background subtraction, estimation and exploitation of the motion apparent in the images, and detection based on models that can be either explicit or estimated by way of automatic learning). Object tracking is dealt with in Chapter 8 (tracking within the field of view of a single camera) and Chapter 9, which extends the problem to observation by a network of cameras and considers two distinct configurations: (1) a single object is perceived at the same time by several cameras and (2) a single object is seen at different times by different cameras. In the latter case, the particular problem of "re-identification" of the object arises. Chapter 10 presents the application and adaptation to video surveillance of two functions used in biometrics: facial recognition and iris recognition. Chapter 11 focuses on the function of automatic vehicle recognition.
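To make the notion of moving-object detection more concrete, the following minimal sketch (written for this overview, not taken from Chapter 7) illustrates detection by background subtraction in Python. It assumes OpenCV 4 is available; the file name "surveillance.avi", the choice of a Gaussian-mixture background model (MOG2) and the blob-area threshold are illustrative assumptions rather than choices made by the chapter's authors.

# Minimal background-subtraction detector (illustrative sketch, assumes OpenCV 4).
import cv2

cap = cv2.VideoCapture("surveillance.avi")         # hypothetical video source
subtractor = cv2.createBackgroundSubtractorMOG2()  # adaptive Gaussian-mixture background model

while True:
    ok, frame = cap.read()
    if not ok:
        break
    mask = subtractor.apply(frame)                 # foreground/background segmentation mask
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (3, 3))
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove isolated noise pixels
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    for c in contours:
        if cv2.contourArea(c) > 500:               # keep only blobs large enough to be objects
            x, y, w, h = cv2.boundingRect(c)
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
    cv2.imshow("detections", frame)
    if cv2.waitKey(30) & 0xFF == 27:               # press Esc to quit
        break

cap.release()
cv2.destroyAllWindows()

Each foreground blob larger than the threshold is reported as a candidate object of interest, which a tracker such as those presented in Chapter 8 could then follow over time.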
Chapters 12–16 deal with the "higher level" analysis of the video, aimed at lending semantic content to the scenes observed. Such an analysis might relate to the actions or behaviors of individuals (Chapters 12–14) or crowds (Chapter 15), or indeed to the overall characteristics of the scene being observed (Chapter 16). Chapter 12 examines the approaches that use a description of the activities in the form of scenarios, with a particular emphasis on representation of knowledge, modeling of the scenarios by the users and automatic recognition of these scenarios. Chapters 13 and 14 relate to the characterization of the activities observed by a camera over long periods of observation, and to the use of that characterization to detect "abnormal" activity, using two different approaches: the first (Chapter 13) operates on "visual words", constructed from simple features of the video such as position in the image, apparent motion and indicators of size or shape; and the second (Chapter 14) uses data-mining techniques to analyze trajectories constituted by prior detection and tracking of objects of interest. Chapter 15 gives an overview of the recent projects that have dealt with the various issues associated with crowd scene analysis, and presents two specific contributions: one relating to the creation of a crowd analysis algorithm using information previously acquired on a large database of crowd videos and the other touching on the problem of detection and tracking of people in crowd scenes, in the form of optimization of an energy function combining the estimation of the crowd density and the location of individuals. Finally, Chapter 16 relates to the determination of the visual context (or "scene recognition"), which consists of detecting the presence or absence of pre-established visual concepts in a given image, providing information about the general atmosphere in the image (indoor or outdoor scene; photo taken at night, during the day or at sunrise/sunset; an urban or suburban scene; the presence of vegetation, buildings, etc.). A visual concept may also refer to the technical characteristics of an image (level of blur, quality of the image) or to a more subjective impression of a photograph (amusing, worrying, aesthetically pleasing, etc.).
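As an illustration of the "visual words" mentioned above, the following toy sketch (an example written for this introduction, not code from Chapter 13) quantizes a pixel position and its apparent motion into a discrete word index; the cell size and the number of direction bins are arbitrary illustrative values. Documents of such words, accumulated over short time windows, are the kind of input a topic model such as PLSA can operate on.

# Toy quantization of low-level observations into "visual words" (illustrative values).
import numpy as np

CELL = 20      # spatial cell size in pixels (illustrative)
N_DIRS = 8     # number of coarse motion-direction bins (illustrative)

def visual_word(x, y, dx, dy, width):
    """Map a pixel location and its apparent motion to an integer word index."""
    cell_x, cell_y = x // CELL, y // CELL
    cells_per_row = width // CELL
    angle = np.arctan2(dy, dx)                               # motion direction in radians
    direction = int(((angle + np.pi) / (2 * np.pi)) * N_DIRS) % N_DIRS
    return (cell_y * cells_per_row + cell_x) * N_DIRS + direction

# Example: a pixel at (85, 40) in a 320-pixel-wide frame, moving right and slightly down.
print(visual_word(85, 40, dx=1.0, dy=0.3, width=320))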
The final two chapters (Chapters 17 and 18) deal with performance evaluation. Chapter 17 presents the aims of a structure called Pôle Pilote de Sécurité Locale (PPSL) - Pilot Center for Urban Security, set up to create and implement quasi-real-world tests of new technologies for local and urban security, involving both the end users (police, firefighters, ambulance services, etc.) and the designers. Chapter 18 discusses the issue of performance evaluation of the algorithms. It first presents the main initiatives that have seen the light of day with a view to comparing systems on shared functional requirements, with evaluation protocols and shared data. It then focuses on the ETISEO2 competition, which has enabled significant advances to be made, offering – besides annotated video sequences – metrics meant for a particular task and tools to facilitate the evaluation. The objective qualification of an algorithmic solution in relation to measurable factors (such as the contrast of the object) remains an unsolved problem on which little work has been done to date. An approach is put forward to make progress in this area, and the chapter closes with a brief presentation of the research program QUASPER R&D, which aims to define the scientific and technical knowledge required for the implementation of a platform for qualification and certification of perception systems.
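By way of illustration, the sketch below shows the kind of frame-level scoring that underlies such evaluations: detected bounding boxes are matched to ground-truth boxes by intersection-over-union, and precision and recall are then computed. It is a generic example written for this introduction; the 0.5 overlap threshold is a common convention, not a metric prescribed by ETISEO or QUASPER.

# Generic detection scoring by IoU matching (illustrative, not an official ETISEO metric).
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter)

def precision_recall(detections, ground_truth, threshold=0.5):
    """Match each detection to at most one ground-truth box and compute precision/recall."""
    matched = set()
    tp = 0
    for det in detections:
        best = max(range(len(ground_truth)),
                   key=lambda i: iou(det, ground_truth[i]),
                   default=None)
        if best is not None and best not in matched and iou(det, ground_truth[best]) >= threshold:
            matched.add(best)
            tp += 1
    precision = tp / len(detections) if detections else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall

# Example: one correct detection, one false alarm, one missed object -> (0.5, 0.5).
dets = [(10, 10, 50, 60), (200, 200, 230, 240)]
gt = [(12, 8, 52, 58), (100, 100, 140, 150)]
print(precision_recall(dets, gt))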
1 Literature on the topic usually uses the term video analytics, but we may also come across the terms video content analysis, intelligent video surveillance or smart video surveillance.
2 Evaluation du Traitement et de l’Interprétation de SEquences vidEO (Evaluation for video processing and understanding).
“The power of the image, one might say? What is really at stake, rather, is the extreme richness of the most evolved of our senses: sight, or, to put it better, the most remarkable of our functions of contact with the environment: vision, eye and brain. Indeed, in terms of the quantity of information conveyed and the complexity of its processing, for the human being hardly any function other than that of reproduction can stand comparison with the function of vision.”
D. Estournet1
In an exercise in prospective thinking, it is always helpful to look back toward the foundation of the domain in question, and to examine the context of its emergence and then that of its evolutions, to identify the reasons for its hurdles or, conversely, the avenues of its progress. Above all, the greatest advantage can be found in revisiting the promises made by the discipline, comparing them with what has actually been achieved and measuring the differences.
Today, the field of image processing is a little over 50 years old. Indeed, it was in the 1960s that elementary techniques began to emerge – in parallel but often independently of one another – which gradually came together to form image processing as we now know it, which is partly the subject of this book.
Of these techniques, we will begin by discussing the extension to two or three dimensions (2D or 3D) of signal processing methods. In this exercise, among other great names, the following have distinguished themselves: R.M. Mersereau, L.R. Rabiner, J.H. McClellan, T.S. Huang, J.L. Shanks, B.R. Hunt, H.C. Andrews, A. Bijaoui, etc., recognized for their contributions to both 1D and 2D. The aim of their work was to enable images to benefit from all the modeling, prediction, filtering and restoration tools that were becoming established at the time in acoustics, radar and speech. Based on the discovery of fast transforms and their extension to 2D, these works naturally gave rise to spectral analysis of images – a technique that is still very much in use today. However, this route is pockmarked by insightful but unfulfilled, abandoned projects that have hitherto not been widely exploited – relating, for example, to the stability of multidimensional filters or 2D recursive processes – because the principle of causality that governs temporal signals had long thwarted image processors, who expected to find it in the television signal, for instance. From then on, this field of signal processing became particularly fertile. It is directly at the root of the extremely fruitful approaches of tomographic reconstruction, which is nowadays indispensable in medical diagnostics and physical experimentation, and of wavelet theory, which is useful in image analysis and compression. More recently, it is to be found at the heart of the sparse approaches, which harbor many hopes of producing the next “great leap forward” in image processing.
A second domain, also developed in the 1960s, was based on discrete – and often binary – representations of images. Using completely different tools, the pioneers of this domain turned their attention to other properties of images: the connectivity, the morphology, and the topology of the shapes and spatial meshes that are a major component of an image. Turning away from a continuous, faithful representation of the signal, they set about identifying abstract properties: relative position, inside and outside, contact and inclusion, thereby opening the way to shape semantics on the one hand, and to a verbal description of space, which naturally gave rise to scene analysis, on the other. In this discipline as well, a number of great names can be cited: A. Rosenfeld, T. Pavlidis, M. Eden, M.J.E. Golay, A. Guzman, H. Freeman, G. Matheron and J. Serra.
