Steganography is the art of communicating a secret message while hiding the very existence of that message. This book is an introduction to steganalysis as part of the wider trend of multimedia forensics, as well as a practical tutorial on machine learning in this context. It looks at a wide range of feature vectors proposed for steganalysis, with performance tests and comparisons. Python programs and algorithms are provided to allow readers to modify and reproduce the outcomes discussed in the book.
Number of pages: 466
Year of publication: 2012
Table of Contents
Title Page
Copyright
Preface
Chapter 1: Introduction
1.1 Real Threat or Hype?
1.2 Artificial Intelligence and Learning
1.3 How to Read this Book
Chapter 2: Steganography and Steganalysis
2.1 Cryptography versus Steganography
2.2 Steganography
2.3 Steganalysis
2.4 Summary and Notes
Chapter 3: Getting Started with a Classifier
3.1 Classification
3.2 Estimation and Confidence
3.3 Using libSVM
3.4 Using Python
3.5 Images for Testing
3.6 Further Reading
Chapter 4: Histogram Analysis
4.1 Early Histogram Analysis
4.2 Notation
4.3 Additive Independent Noise
4.4 Multi-dimensional Histograms
4.5 Experiment and Comparison
Chapter 5: Bit-plane Analysis
5.1 Visual Steganalysis
5.2 Autocorrelation Features
5.3 Binary Similarity Measures
5.4 Evaluation and Comparison
Chapter 6: More Spatial Domain Features
6.1 The Difference Matrix
6.2 Image Quality Measures
6.3 Colour Images
6.4 Experiment and Comparison
Chapter 7: The Wavelets Domain
7.1 A Visual View
7.2 The Wavelet Domain
7.3 Farid's Features
7.4 HCF in the Wavelet Domain
7.5 Denoising and the WAM Features
7.6 Experiment and Comparison
Chapter 8: Steganalysis in the JPEG Domain
8.1 JPEG Compression
8.2 Histogram Analysis
8.3 Blockiness
8.4 Markov Model-based Features
8.5 Conditional Probabilities
8.6 Experiment and Comparison
Chapter 9: Calibration Techniques
9.1 Calibrated Features
9.2 JPEG Calibration
9.3 Calibration by Downsampling
9.4 Calibration in General
9.5 Progressive Randomisation
Chapter 10: Simulation and Evaluation
10.1 Estimation and Simulation
10.2 Scalar Measures
10.3 The Receiver Operating Curve
10.4 Experimental Methodology
10.5 Comparison and Hypothesis Testing
10.6 Summary
Chapter 11: Support Vector Machines
11.1 Linear Classifiers
11.2 The Kernel Function
11.3 ν-SVM
11.4 Multi-class Methods
11.5 One-class Methods
11.6 Summary
Chapter 12: Other Classification Algorithms
12.1 Bayesian Classifiers
12.2 Estimating Probability Distributions
12.3 Multivariate Regression Analysis
12.4 Unsupervised Learning
12.5 Summary
Chapter 13: Feature Selection and Evaluation
13.1 Overfitting and Underfitting
13.2 Scalar Feature Selection
13.3 Feature Subset Selection
13.4 Selection Using Information Theory
13.5 Boosting Feature Selection
13.6 Applications in Steganalysis
Chapter 14: The Steganalysis Problem
14.1 Different Use Cases
14.2 Images and Training Sets
14.3 Composite Classifier Systems
14.4 Summary
Chapter 15: Future of the Field
15.1 Image Forensics
15.2 Conclusions and Notes
Bibliography
Index
This edition first published 2012
© 2012, John Wiley & Sons Ltd
Registered office
John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom
For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.
The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.
Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.
Library of Congress Cataloging-in-Publication Data
Schaathun, Hans Georg.
Machine learning in image steganalysis / Hans Georg Schaathun.
p. cm.
Includes bibliographical references and index.
ISBN 978-0-470-66305-9 (hardback)
1. Machine learning. 2. Wavelets (Mathematics) 3. Data encryption (Computer science) I. Title.
Q325.5.S285 2012
006.3′1—dc23
2012016642
A catalogue record for this book is available from the British Library.
Preface
Books are conceived with pleasure and completed with pain. Sometimes the result is not quite the book which was planned.
This is also the case for this book. I have learnt a lot from writing this book, and I have discovered a lot which I originally hoped to include, but which simply could not be learnt from the literature and will require a substantial future research effort. If the reader learns half as much as the author, the book will have been a great success.
The book was written largely as a tutorial for new researchers in the field, be they postgraduate research students or people who have earned their spurs in a different field. Therefore, brief introductions to much of the basic theory have been included. However, the book should also serve as a survey of the recent research in steganalysis. The survey is not complete, but since there has been no other book¹ dedicated to steganalysis, and very few longer survey papers, I hope it will be useful for many.
I would like to take this opportunity to thank my many colleagues in the Computing Department at the University of Surrey, UK, and the Engineering departments at Ålesund University College, Norway, for all the inspiring discussions. Joint research with Dr Johann Briffa, Dr Stephan Wesemeyer and Mr (now Dr) Ainuddin Abdul Wahab has been essential to reach this point, and I owe them a lot. Most of all, thanks to my wife and son for their patience over the last couple of months. My wife has also been helpful with some of the proofreading and some of the sample photographs.
¹ We have later learnt that two other books appeared while this was being written: Rainer Böhme: Advanced Statistical Steganalysis and Mahendra Kumar: Steganography and Steganalysis in JPEG images.
Chapter 1
Introduction
Steganography is the art of communicating a secret message, from Alice to Bob, in such a way that Alice's evil sister Eve cannot even tell that a secret message exists. This is (typically) done by hiding the secret message within a non-sensitive one, and Eve should believe that the non-sensitive message she can see is all there is. Steganalysis, by contrast, is Eve's task of detecting the presence of a secret message when Alice and Bob employ steganography.
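To make the idea of hiding one message inside another concrete, here is a minimal sketch of least-significant-bit (LSB) embedding. This is a common textbook technique chosen purely for illustration; it is not prescribed by the paragraph above. The function names `embed` and `extract`, the pixel values and the message bits are all hypothetical.

```python
def embed(cover, bits):
    """Hide message bits in the least significant bit of each cover sample."""
    if len(bits) > len(cover):
        raise ValueError("message too long for this cover")
    stego = list(cover)
    for i, bit in enumerate(bits):
        # Clear the lowest bit of the sample, then set it to the message bit.
        stego[i] = (stego[i] & ~1) | bit
    return stego


def extract(stego, n_bits):
    """Read the message back from the lowest bit of the first n_bits samples."""
    return [sample & 1 for sample in stego[:n_bits]]


# Made-up 8-bit pixel values standing in for the non-sensitive cover message.
cover = [154, 200, 37, 98, 255, 0, 13, 76]
message = [1, 0, 1, 1, 0, 1]

stego = embed(cover, message)
recovered = extract(stego, len(message))
# Each sample changes by at most one grey level, which is why Eve
# cannot easily see the difference by inspecting the image.
max_change = max(abs(c - s) for c, s in zip(cover, stego))
```

Bob recovers the message by reading the lowest bits, whereas Eve sees only pixel values that differ from the cover by at most one. Detecting such small changes statistically is exactly the steganalysis problem.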
A frequently asked question is ‘Who needs steganalysis?’ Closely related is the question of who is using steganography. Unfortunately, satisfactory answers to these questions are harder to find.
A standard claim in the literature is that terrorist organisations use steganography to plan their operations. This claim seems to be founded on a report in USA Today, by Kelley (2001), where it was claimed that Osama bin Laden was using the Internet in an ‘e-jihad’ half a year before he became world famous in September 2001. The idea of the application is simple. Steganography, potentially, makes it possible to hide detailed plans with maps and photographs of targets within images, which can be left on public sites like eBay or Facebook as a kind of electronic dead-drop.
The report in USA Today was based on unnamed sources in US law enforcement, and there has been no other evidence in the public domain that terrorist organisations really are using steganography to plan their activities. Goth (2005) described it as hype, and predicted that the funding opportunities enjoyed by the steganography community in the early years of the millennium would fade. It rather seems that he was correct. At least the EU and European research councils have shown little interest in the topic.
Steganography has several problems which may make it unattractive for criminal users. Bagnall (2003) (quoted in Goth (2005)) points out that the acquisition, possession and distribution of tools and knowledge necessary to use steganography in itself establishes a traceable link which may arouse as much suspicion as an encrypted message. Establishing the infrastructure to use steganography securely, and keeping it secret during construction, is not going to be an easy exercise.
More recently, an unknown author in The Technical Mujahedin (Givner-Forbes, 2007; Unknown, 2007) has advocated the use of steganography in the jihad, giving some examples of software to avoid and approaches to evaluating algorithms for use. There is no doubt that the technology has some potential for groups with sufficient resources to use it well.
In June 2010 we heard of ten persons (alleged Russian agents) being arrested in the USA, and according to the news report the investigation turned up evidence of the use of steganography. It is too early to say if these charges will give steganalysis research a new push. Adee (2010) suggests that the spies may have been thwarted by old technology, using very old and easily detectable stego-systems. However, we do not know if the investigators first identified the use of steganography by means of steganalysis, or if they found the steganographic software used on the suspects' computers first.
So currently, as members of the general public and as academic researchers, we are unable to tell whether steganography is a significant threat or mainly a brain exercise for academics. We have no strong evidence of significant use, but then we also know that MI5 and MI6, and other secret services, who would be the first to know if such evidence existed, would hardly tell us about it. On the other hand, by developing public knowledge about the technology, we make it harder for criminal elements to use it successfully for their own purposes.
Most of the current steganalysis techniques are based on machine learning in one form or another. Machine learning is an area well worth learning, because of its wide applications within medical image analysis, robotics, information retrieval, computational linguistics, forensics, automation and control, etc. The underlying idea is simple: if a task is too complex for a human being to learn, let's train a machine to do it instead. At a philosophical level it is harder. What, after all, do we really mean by learning?
Learning is an aspect of intelligence, which is often defined as the ability to learn. Machine learning thus depends on some kind of artificial intelligence (AI). As a scientific discipline machine learning is counted as a sub-area of AI, which is a better-known idea, at least to the general public. However, our impression of what AI is may be shaped as much by science fiction as by science. Many of us would first think of the sentient computers and robots in the 1960s and 1970s literature, such as Isaac Asimov's famous robots who could only be kept from world domination by the three robotic laws deeply embedded in their circuitry.
As often as a dream, AI has been portrayed as a nightmare. Watching films like Terminator and The Matrix, maybe we should be glad that scientists have not yet managed to realise the dream of AI. Discussing AI as a scientific discipline today, it may be more fruitful to discuss the different sub-disciplines. The intelligent and sentient computer remains science fiction, but various AI-related properties have been realised with great success and valuable applications. Machine learning is one of these sub-disciplines.
The task in steganalysis is to take an object (communication) and classify it into one of two classes, either the class of steganograms or the class of clean messages. This type of problem, of designing an algorithm to map objects to classes, is known as pattern recognition or classification in the literature. Once upon a time pattern recognition was primarily based on statistics, and the approach was analytic, aiming to design a statistical model to predict the class. Unfortunately, in many applications, the problem is too complex to make this approach feasible. Machine learning provides an alternative to the analytic approach.
A learning classifier builds a statistical model to solve the classification problem, by brute-force study of a large number of statistics (so-called features) from a set of objects selected for training. Thus, we say that the classifier learns from the study of the training set, and the acquired learning can later be used to classify previously unseen objects. Contrary to the analytic models of statistical approaches, the model produced by machine learning does not have to be comprehensible for human users, as it is primarily for machine processing. Thus, more complex and difficult problems can be solved more accurately.
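The train-then-classify cycle described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses a nearest-centroid rule, one of the simplest possible learning classifiers, rather than any method from this book, and the two-dimensional feature values, the labels and the function names `train` and `classify` are made up for the example.

```python
from statistics import mean


def train(features, labels):
    """Learn one centroid (mean feature vector) per class from the training set."""
    centroids = {}
    for label in set(labels):
        rows = [f for f, l in zip(features, labels) if l == label]
        # zip(*rows) transposes the rows, so each column is one feature.
        centroids[label] = [mean(column) for column in zip(*rows)]
    return centroids


def classify(centroids, feature_vector):
    """Assign a previously unseen object to the class with the nearest centroid."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, feature_vector))
    return min(centroids, key=lambda label: squared_distance(centroids[label]))


# Toy training set (hypothetical values): two features per object.
train_x = [[0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
           [0.80, 0.90], [0.90, 0.80], [0.85, 0.75]]
train_y = ["clean", "clean", "clean", "stego", "stego", "stego"]

model = train(train_x, train_y)          # the learning step
print(classify(model, [0.2, 0.2]))       # an unseen object near the clean group
print(classify(model, [0.7, 0.9]))       # an unseen object near the stego group
```

Here the model is just two mean vectors, which a human can still read. Practical classifiers learn far more complex models, which is where the remark about models not needing to be comprehensible applies.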
With respect to machine learning, the primary objective of this book is to provide a tutorial to allow a reader with primary interest in steganography and steganalysis to use black box learning algorithms in steganalysis. However, we will also dig a little bit deeper into the theory, to inspire some readers to carry some of their experience into other areas of research at a later stage.
There are no ‘don't do this at home’ clauses in this book. Quite the contrary. The body of experimental data in the literature is still very limited, and we have not been able to run enough experiments to give you more than anecdotal evidence in this book. To choose the most promising methods for steganalysis, the reader will have to make his own comparisons with his own images. Therefore, the advice must be ‘don't trust me; try it yourself’. As an aid to this, the software used in this book can be found at: http://www.ifs.schaathun.net/pysteg/.
For this reason, the primary purpose of this book has been to provide a hands-on tutorial with sufficient detail to allow the reader to reproduce examples. At the same time, we aim to establish the links to theory, to make the connection to relevant areas of research as smooth as possible. In particular we spend time on statistical methods, to explain the limitations of the experimental paradigms and understand exactly how far the experimental results can be trusted.
In this first part of the book, we will give the background and context of steganalysis (Chapter 2), and a quick introduction and tutorial (Chapter 3) to provide a test platform for the next part.
Part II is devoted entirely to feature vectors for steganalysis. We have aimed for a broad survey of available features, but it is surely not complete. The best we can hope for is that it is more complete than any previous one. The primary target audience is research students and young researchers entering the area of steganalysis, but we hope that more experienced researchers will also find some topics of interest.
Part III investigates both the theory and methodology, and the context and challenges of steganalysis in more detail. More diverse than the previous parts, this will both introduce various classifier algorithms and take a critical view on the experimental methodology and applications in steganalysis. The classification algorithms introduced in Chapters 11 and 12 are intended to give an easy introduction to the wider area of machine learning. The discussions of statistics and experimental methods (Chapter 10), as well as applications and practical issues in steganalysis (Chapter 14) have been written to promote thorough and theoretically founded evaluation of steganalytic methods. With no intention of replacing any book on machine learning or statistics, we hope to inspire the reader to read more.
Chapter 2
Steganography and Steganalysis
Secret writing has fascinated mankind for several millennia, and it has been studied for many different reasons and motivations. The military and political purposes are obvious. The ability to convey secret messages to allies and to one's own units without revealing them to the enemy is obviously of critical importance for any ruler. Equally important are the applications in mysticism. Literacy, in its infancy, was a privilege of the elite. Some cultures would hold the ability to create or understand written messages as a sacred gift. Secret writing, further obscuring the written word, would further elevate the innermost circle of the elite. Evidence of this can be seen both in hieroglyphic texts in Egypt and Norse runes on the Scottish islands.
The term steganography was first coined by an occultist, namely Trithemius (c. 1500) (see also Fridrich (2009)). Over three volumes he discussed methods for encoding messages, occult powers and communication with spirits. The word ‘steganography’ is derived from the Greek words στεγανός (steganos) for ‘roof’ or ‘covered’ and γράφειν (grafein) ‘to write’. Covered, here, means that the message should be concealed in such a way that the uninitiated cannot tell that there is a secret message at all. Thus the very existence of the secret message is kept a secret, and the observer should think that only a mundane, innocent and non-confidential message is transmitted.
Use of steganography predates the term. A classic example was reported by Herodotus (440 BC, Book V). Histiæus of Miletus (late 6th century BC) wanted to send a message to his nephew, Aristagoras, to order a revolt. With all the roads being guarded, an encrypted message would surely be intercepted and blocked. Even though the encryption would keep the contents safe from enemy intelligence, it would be of no use without reaching the intended recipient. Instead, he took a trusted slave, shaved his head, and tattooed the secret message onto it. Then he let the hair regrow to cover the message before the slave was dispatched.
Throughout history, techniques for secret writing have most often been referred to as cryptography, from κρυπτός (cryptos) meaning ‘hidden’ and γράφειν (grafein, ‘to write’). The meaning of the word has changed over time, and today, the cryptography and steganography communities would tend to use it differently.
Cryptography, as an area of research, now covers a wide range of security-related problems, including secret communications, message authentication, identification protocols, entity authentication, etc. Some of these problems tended to have very simple solutions in the pre-digital era. Cryptography has evolved to provide solutions to these problems in the digital domain. For instance, historically, signatures and seals have been placed on paper copies to guarantee their authenticity. Cryptography has given us digital signatures, providing, in principle, similar protection to digital files. From being an occult discipline in Trithemius' time, it has developed into an area of mathematics in the 20th century. Modern cryptography is always based on formal methods and formal proofs of security, giving highly trusted modules to be used in the design of secure systems.
