This latest addition to the successful Network Biology series presents current methods for determining the entropy of networks, making it the first to cover the recently established Quantitative Graph Theory.
An excellent international team of editors and contributors provides an up-to-date outlook for the field, covering a broad range of graph entropy-related concepts and methods. The topics range from analyzing mathematical properties of methods right up to applying them in real-life areas.
Filling a gap in the contemporary literature, this is an invaluable reference for mathematicians, computer scientists, computational biologists, and structural chemists.
Related Title
Title Page
Copyright
List of Contributors
Preface
Chapter 1: Entropy and Renormalization in Chaotic Visibility Graphs
1.1 Mapping Time Series to Networks
1.2 Visibility Graphs and Entropy
1.3 Renormalization Group Transformations of Horizontal Visibility Graphs
1.4 Summary
1.5 Acknowledgments
References
Chapter 2: Generalized Entropies of Complex and Random Networks
2.1 Introduction
2.2 Generalized Entropies
2.3 Entropy of Networks: Definition and Properties
2.4 Application of Generalized Entropy for Network Analysis
2.5 Open Networks
2.6 Summary
References
Chapter 3: Information Flow and Entropy Production on Bayesian Networks
3.1 Introduction
3.2 Brief Review of Information Contents
3.3 Stochastic Thermodynamics for Markovian Dynamics
3.4 Bayesian Networks
3.5 Information Thermodynamics on Bayesian Networks
3.6 Examples
3.7 Summary and Prospects
References
Chapter 4: Entropy, Counting, and Fractional Chromatic Number
4.1 Entropy of a Random Variable
4.2 Relative Entropy and Mutual Information
4.3 Entropy and Counting
4.4 Graph Entropy
4.5 Entropy of a Convex Corner
4.6 Entropy of a Graph
4.7 Basic Properties of Graph Entropy
4.8 Entropy of Some Special Graphs
4.9 Graph Entropy and Fractional Chromatic Number
4.10 Symmetric Graphs with respect to Graph Entropy
4.11 Conclusion
Appendix 4.A
Proof of Lemma 4.6
References
Chapter 5: Graph Entropy: Recent Results and Perspectives
5.1 Introduction
5.2 Inequalities and Extremal Properties on (Generalized) Graph Entropies
5.3 Relationships between Graph Structures, Graph Energies, Topological Indices, and Generalized Graph Entropies
5.4 Summary and Conclusion
References
Chapter 6: Statistical Methods in Graphs: Parameter Estimation, Model Selection, and Hypothesis Test
6.1 Introduction
6.2 Random Graphs
6.3 Graph Spectrum
6.4 Graph Spectral Entropy
6.5 Kullback–Leibler Divergence
6.6 Jensen–Shannon Divergence
6.7 Model Selection and Parameter Estimation
6.8 Hypothesis Test between Graph Collections
6.9 Final Considerations
6.10 Conclusions
6.11 Acknowledgments
References
Chapter 7: Graph Entropies in Texture Segmentation of Images
7.1 Introduction
7.2 Graph Entropy-Based Texture Descriptors
7.3 Geodesic Active Contours
7.4 Texture Segmentation Experiments
7.5 Analysis of Graph Entropy-Based Texture Descriptors
7.6 Conclusion
References
Chapter 8: Information Content Measures and Prediction of Physical Entropy of Organic Compounds
8.1 Introduction
8.2 Method
8.3 Prediction of Physical Entropy
8.4 Conclusion
8.5 Acknowledgment
References
Chapter 9: Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data
9.1 Introduction
9.2 State of the Art
9.3 Identifying Collaboration Styles using Graph Entropy from Bibliometric Data
9.4 Method and Materials
9.5 Results
9.6 Discussion and Future Outlook
References
Index
End User License Agreement
Chapter 1: Entropy and Renormalization in Chaotic Visibility Graphs
Figure 1.1 Illustrative example of the natural visibility algorithm. In the upper part, we plot a periodic time series, and in the bottom part, we represent the graph generated through the natural visibility algorithm. Each datum in the series corresponds to a node in the graph, such that two nodes are connected if their corresponding data heights fulfill the visibility criterion of equation 1.1. Note that the degree distribution of the visibility graph is composed of a finite number of peaks, much in the vein of the discrete Fourier transform (DFT) of a periodic signal. We can thus interpret the visibility algorithm as a geometric transform.
Figure 1.2 Illustrative example of the Natural (a) and Horizontal (b) visibility algorithms. We plot the same time series and represent the graphs generated through both visibility algorithms below. Each datum in the series corresponds to a node in the graph, such that two nodes are connected if their corresponding data heights fulfill the visibility criteria of equations 1.1 and 1.2, respectively. Observe that the Horizontal Visibility graph is a subgraph of the Natural Visibility graph for the same time series.
Figure 1.3 Feigenbaum graphs from the logistic map. The main figure portrays the family of attractors of the logistic map and indicates the transition from periodic to chaotic behavior through period-doubling bifurcations. Beyond the accumulation point, the figure shows the merging of chaotic-band attractors, where aperiodic behavior appears interrupted by windows that, when entered from their left-hand side, display periodic motion that subsequently develops into period-doubling cascades with new accumulation points.
Figure 1.4 Periodic Feigenbaum graphs. The sequence of graphs associated with periodic attractors of increasing period undergoing a period-doubling cascade. The pattern that occurs for increasing values of the period is related to the universal ordering with which an orbit visits the points of the attractor. Observe that the hierarchical self-similarity of these graphs requires that the graph for a given period is a subgraph of that for the next.
Figure 1.5 Aperiodic Feigenbaum graphs. A sequence of graphs associated with chaotic series after chaotic-band reverse bifurcations, starting where the attractor extends along a single band and the degree distribution does not present any regularity (dashed links). After subsequent reverse bifurcations, the phase space is partitioned into disconnected chaotic bands, and the self-affine images of the attractor correspond to the Misiurewicz points. In all cases, the orbit visits each chaotic band in the same order as in the periodic region. This order of visits induces an ordered structure in the graphs (black links) analogous to that found for the period-doubling cascade.
Figure 1.6 Horizontal visibility network entropy and Lyapunov exponent for the logistic map. We plot the numerical values of both quantities (in each case the processed time series have a fixed size). The inset reproduces the same data but with a rescaled entropy. The surprisingly good match between both quantities is due to the Pesin identity (see text). Unexpectedly, the Lyapunov exponent within the periodic windows inside the chaotic region is also well captured by the entropy.
Figure 1.7 Graphical illustration of how the horizontal visibility (HV) graph inherits in its structure the dynamics of the associated intermittent series. In the top of the figure, we show a sample intermittent series generated by the logistic map close to (), producing laminar regions (black) mixed with chaotic bursts (white). In the bottom, we plot the associated HV graph. Laminar regions are mapped into nodes with a periodic backbone, whereas the actual pseudo-periodicity of the series is inherited in the graph by the existence of the so-called peak or interfacial nodes. Chaotic bursts are mapped into chaotic nodes, with a characteristic degree distribution.
Figure 1.8 Log–log plot of the block entropies constructed from degree distributions of -sequence of connectivities in the HV graphs as a function of : (squares), (upward-pointing triangles), (downward-pointing triangles), and (right triangles). A scaling of the form is found. (Inset panel) Log–log plot of the convergence of to the exponent associated to the Lyapunov exponent, as a function of . A relation of the form is found.
Figure 1.9 Examples of two standard circle map periodic series with dressed winding number , (a) and (b). As can be observed, the order of visits on the circle and the relative values of remain invariant and the associated HV graph is therefore the same in both cases.
Figure 1.10 Six levels of the Farey tree and the periodic motifs of the graphs associated with the corresponding rational fractions taken as dressed winding numbers in the circle map (for space reasons, only two of these are shown at the sixth level). (a) In order to show how graph concatenation works, we have highlighted an example using different gray tones on the left-hand side: as , is placed on the left-hand side, on the right-hand side and their extremes are connected to an additional link closing the motif . (b) Five steps in the Golden ratio route, (thick solid line); (c) Three steps in the Silver ratio route, (thick dashed line).
Figure 1.11 Renormalization process and network RG flow structure. (a) Illustration of the renormalization process: a node with degree is coarse-grained with one of its neighbors (indistinctively) into a block node that inherits the links of both nodes. This process coarse-grains every node with the degree of each renormalization step. (b) Example of an iterated renormalization process in a sample Feigenbaum graph at a periodic window with initial period after period-doubling bifurcations (an orbit of period). (c) RG flow diagram.
Figure 1.12 Illustration of the renormalization operator applied on the HV graph at . This graph renormalizes, after two iterations of , into an HV graph which is itself (i) invariant under and (ii) unstable under perturbations in , thus constituting a nontrivial (saddle) fixed point of the graph-theoretical RG flow.
Chapter 2: Generalized Entropies of Complex and Random Networks
Figure 2.1 The generalized mutual entropy , the generalized entropy , and the average correlator are plotted versus the mutual entropy order when the nonzero matrix elements of the connectivity matrix are replaced by the products of the nodes degrees (a), the inverses of the products of the nodes degrees (b), and a uniform distribution of real numbers in the range between 0 and 1 (c).
Figure 2.2 (a) Mutual entropies of randomly selected subsets of nodes of a matrix with size of 5000 nodes; circles, triangles, squares, and stars represent values of , and the difference of the entropies for and , respectively. Filled symbols are used for the original network, and the hollow symbols are for the perturbed version of it. (b) Plot of the average mutual entropies of randomly chosen subsets of nodes from a scale-free network with 5000 nodes, the bars represent the standard deviations of the mutual entropy at the corresponding subnetwork sizes. Averages and standard deviations have been calculated over a sample of 100 subnetworks for each subset size. (c) Semi-log plot of the average mutual entropies of the randomly chosen subsets of nodes.
Figure 2.3 Six main clusters are identified for a scale-free network of size of 5000 nodes showing a pattern of self-similarity.
Figure 2.4 Mutual entropy for clusters in the 5000-node scale-free network. The graphs correspond to (a), (b), (c), and (d).
Figure 2.5 Changes in the mutual entropy as a function of the size of the subnetwork with nodes included according to the size of the clusters.
Figure 2.6 The network with 100 nodes, each of them randomly connected to seven other nodes.
Figure 2.7 Arbitrarily chosen adjacency matrix for the network shown in Figure 2.6.
Figure 2.8 Randomly permuted adjacency matrix in Figure 2.7.
Figure 2.9 Randomly permuted adjacency matrix in Figure 2.7, after interchanging nodes (20,30) and (20,5).
Figure 2.10 The plots of (a) mutual entropy , (b) rescaled mutual entropy , and (c) relative entropy uncertainty versus the size of the open network. The triangles, circles, and squares represent the -degree values 0, 1, and 2, respectively, and the stars represent the difference between -degrees 1 and 2.
Figure 2.11 Degree distribution of the whole 5000 nodes network compared to that of cluster 6.
Figure 2.12 Plot of the mutual entropies of (a) order 2 and (b) the difference of the mutual entropies and for the clusters identified in a scale-free network with 5000 nodes.
Chapter 3: Information Flow and Entropy Production on Bayesian Networks
Figure 3.1 Schematic of the Szilard engine. The demon obtains a measurement outcome (left or right), corresponding to one bit of information. It then extracts work by feedback control.
Figure 3.2 Venn diagram showing the relationship between the Shannon entropy and the mutual information.
Figure 3.3 Discretization of the time evolution of the external parameter.
Figure 3.4 Example of a simple directed acyclic graph with and .
Figure 3.5 Schematic of the parents of . The set of the parents, , is defined as the set of the nodes that have directed edges toward . This Figure also illustrates the setup of Eq. (3.80).
Figure 3.6 Simple examples of Bayesian networks.
Figure 3.7 Schematic of the physical setup of Bayesian networks. The time evolution of system is given by the sequence of nodes , and the time evolution of is given by .
Figure 3.8 Schematics of informational quantities on Bayesian networks. (a) The initial correlation between and , (b) the final correlation between and , and (c) the transfer entropy from to .
Figure 3.9 Schematic of .
Figure 3.10 Bayesian networks of the examples. Example 1: Markov chain. Example 2: Feedback control with a single measurement. Example 3: Repeated feedback control with multiple adaptive measurements.
Figure 3.11 Bayesian networks of the examples. Example 4: Markovian information exchanges between two systems. (a) Entire dynamics. (b) A single transition. Example 5: Complex dynamics of three interacting systems.
Chapter 4: Entropy, Counting, and Fractional Chromatic Number
Figure 4.1 (a) A 5-cycle . (b) A triangle . (c) The graph .
Chapter 5: Graph Entropy: Recent Results and Perspectives
Figure 5.1 The trees with maximum value of among all trees with vertices for .
Figure 5.2 The trees with minimum value of among all trees with vertices for .
Figure 5.3 Minimal graphs for the entropy of orders 8 and 9.
Figure 5.4 Two minimal graphs for the graph entropy of order 8.
Chapter 6: Statistical Methods in Graphs: Parameter Estimation, Model Selection, and Hypothesis Test
Figure 6.1 Random graph models. Graphs with vertices generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and -regular (f) random graph models. In (a), the number of edges is equal to , where . In (b), the probability of connecting two vertices is equal to 0.007. In (c), we have , where is the radius used by the geometric model. In (d), we set the scale exponent to 1. In (e), the probability of reconnecting a vertex is equal to . In (f), the degree of each vertex is . (Takahashi et al. [6]. Used under CC BY 4.0 https://creativecommons.org/licenses/by/4.0/.)
Figure 6.2 Graph spectral density. Spectral density estimators for graphs with 500 vertices generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and -regular (f) random graph models. In (a), the number of edges is equal to . In (b), the probability of connecting two vertices is equal to 0.007. In (c), we have , where is the radius used by the geometric model. In (d), we set the scale exponent to 1. In (e), the probability of reconnecting a vertex is equal to . In (f), the degree of each vertex is .
Figure 6.3 Graph spectral entropy. The empirical graph spectral entropy (-axis) for the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and -regular (f) random graph models. In (a), (b), (c), and (e), the values on the -axis vary from 0 to 1. In (d), the values vary from 1 to 2. In (f), the values vary from 0 to 0.5. In (b), (c), (d), and (e), the -axis corresponds to the parameters , , , and , respectively. In (a), the parameter is obtained by multiplying the value on the -axis by . In (f), we multiply the value on the -axis by to obtain . For each model, the empirical spectral entropy was obtained from 50 graphs of size .
Figure 6.4 ROC curves under the null hypothesis. The dashed lines show the expected outcome. The continuous lines indicate the observed ROC curve (significance level vs. proportion of rejected null hypotheses) for the Jensen–Shannon test between two collections of graphs of size generated by the same random process (null hypothesis). In each scenario, the Jensen–Shannon test was applied to data sets generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), Watts–Strogatz (e), and -regular (f) random graph models. For each model, all graphs were constructed with the same parameters. In (a), the number of edges is equal to . In (b), the probability of connecting two vertices is equal to . In (c), we have , where is the radius used by the geometric model. In (d), we set the scale exponent to 0.5. In (e), the probability of reconnecting a vertex is equal to . In (f), the vertex degrees are equal to .
Figure 6.5 ROC curves under the alternative hypothesis. The dashed lines show the poorest possible outcome. The continuous lines indicate the observed ROC curve (significance level vs. proportion of rejected null hypotheses) for the Jensen–Shannon test between two collections of graphs of size generated by different random processes (alternative hypothesis). In each scenario, the Jensen–Shannon test was applied to data sets generated by the Erdős–Rényi (a), Gilbert (b), Geometric (c), Barabási–Albert (d), and Watts–Strogatz (e) random graph models. For each model, two different parameters are used to generate the two collections of graphs. In (a), the number of edges is equal to and , where . In (b), the probability of connecting two vertices is equal to 0.4 and 0.41. In (c), we have for one collection and for the other, where is the radius used by the geometric model. In (d), we set the scale exponent to 0.5 and 0.6. In (e), the probability of reconnecting a vertex is equal to and . In (f), the vertex degrees are equal to and .
Chapter 7: Graph Entropies in Texture Segmentation of Images
Figure 7.1 Left to right: (a) Texture patch flowers, pixels. – (b) Texture patch leaves, same size. – (c) Test image composed from (a) and (b), pixels. – Both texture patches are converted to gray scale, downscaled, and clipped from the VisTex database [43]. ©1995 Massachusetts Institute of Technology.
Figure 7.4 Texture segmentation of a synthetic image. Top row, left to right: (a) Test image ( pixels) showing a stripe-textured shape in front of a noise background. – (b–d) Texture descriptors based on graph entropies applied in adaptive patches; values rescaled. – Bottom row: Geodesic active contour segmentation of (a) using the texture descriptor from (b), same parameters as in Figure 7.2 except one. Left to right: (e) Initial contour. – (f), (g) Intermediate evolution steps. – (h) Steady state.
Figure 7.5 Texture segmentation of a real-world image. Top row, left to right: (a) Photograph of a zebra ( pixels), converted to gray scale. (Original image from http://en.wikipedia.org/wiki/File:Grants_Zebra.jpg, author Ruby 1x2, released to public domain.) – (b) Texture descriptor. – (c) Texture descriptor, parameters as in (b). – Middle row, left to right: (d) Texture descriptor, parameters as in (b). – (e) Texture descriptor, parameters as in (b). – (f) Initial contour for geodesic active contour segmentation. – Bottom row, left to right: (g) Geodesic active contours with edge-stopping function computed from the texture descriptors shown in (d) and (e), Perona–Malik edge-stopping function. – (h) Same as (g) at a later time. – (i) Same as (g) at the steady state.
Figure 7.2 Graph entropy-based texture descriptors applied to the test image from Figure 7.1c (values rescaled). Patch radii, contrast scale, and weighting parameter were fixed. Top row, left to right: (a)–(d). – Bottom row, left to right: (e)–(h).
Figure 7.3 Geodesic active contour segmentation of the image shown in Figure 7.1c. The edge map is computed from the graph entropy-based texture descriptor from Figure 7.2f using presmoothing, the Perona–Malik edge-stopping function (7.15), and an expansion force. Left to right: (a) Initial contour. – (b), (c) Intermediate evolution steps. – (d) Steady state.
Figure 7.6 Information functionals (in logarithmic scale) as functions of local dimension. Top row, left to right: (a)–(c). – Bottom row, left to right: (d)–(f).
Chapter 8: Information Content Measures and Prediction of Physical Entropy of Organic Compounds
Figure 8.1 Vertex-weighted graph and atomic symbol-labeled graph GP of ethane.
Chapter 9: Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data
Figure 9.1 Topological information content and parametric graph entropy distributions.
Figure 9.2 Both entropies plotted over community size.
Figure 9.3 Plot of the largest community in the coauthorship graph.
Figure 9.4 Plot of the 30th community in the coauthorship graph with author names.
Chapter 2: Generalized Entropies of Complex and Random Networks
Table 2.1 Correspondence of the symbols in Figure 2.4 to the clusters identified in Figure 2.3
Chapter 3: Information Flow and Entropy Production on Bayesian Networks
Table 3.1 Summary of notations
Chapter 6: Statistical Methods in Graphs: Parameter Estimation, Model Selection, and Hypothesis Test
Table 6.1 Parameter estimation
Table 6.2 Model selection simulation.
Chapter 9: Application of Graph Entropy for Knowledge Discovery and Data Mining in Bibliometric Data
Table 9.1 Largest 10 journals from DBLP and their article count
Table 9.2 The largest 10 identified communities
“Quantitative and Network Biology”
Series editors M. Dehmer and F. Emmert-Streib
Advisory Board:
Albert-László Barabási
Northeastern University & Harvard Medical School, USA
Douglas Lauffenburger
Massachusetts Institute of Technology, USA
Satoru Miyano
University of Tokyo, Japan
Ilya Shmulevich
Institute for Systems Biology & University of Washington, USA
Previous Volumes of this Series:
Volume 1 Dehmer, M., Emmert-Streib, F., Graber, A., Salvador, A. (eds.)
Applied Statistics for Network Biology
Methods in Systems Biology
2011
ISBN: 978-3-527-32750-8
Volume 2 Dehmer, M., Varmuza, K., Bonchev, D. (eds.)
Statistical Modelling of Molecular Descriptors in QSAR/QSPR
2012
ISBN: 978-3-527-32434-7
Volume 3 Emmert-Streib, F., Dehmer, M. (eds.)
Statistical Diagnostics for Cancer
Analyzing High-Dimensional Data
2013
ISBN: 978-3-527-32434-7
Volume 4 Emmert-Streib, F., Dehmer, M. (eds.)
Advances in Network Complexity
2013
ISBN: 978-3-527-33291-5
Volume 5 Dehmer, M., Emmert-Streib, F., Pickl, S. (eds.)
Computational Network Theory
2015
ISBN: 978-3-527-33724-8
Volume 6 Dehmer, M., Chen, Z., Li, X., Shi, Y., Emmert-Streib, F.
Mathematical Foundations and Applications of Graph Entropy
2017
ISBN: 978-3-527-33909-9
Quantitative and Network Biology
Series Editors: M. Dehmer and F. Emmert-Streib
Volume 6
Edited by
Matthias Dehmer, Frank Emmert-Streib, Zengqiang Chen, Xueliang Li, and Yongtang Shi
The Editors
Prof. Matthias Dehmer
Nankai University
College of Computer and Control Engineering
Tianjin 300071
PR China
and
UMIT - The Health & Life Sciences University
Department of Biomedical Computer Sciences and Mechatronics
Eduard-Wallnöfer-Zentrum 1
6060 Hall/Tyrol
Austria
Prof. Frank Emmert-Streib
Tampere University of Technology
Predictive Medicine and Analytics Lab
Department of Signal Processing
Tampere
Finland
Prof. Zengqiang Chen
Nankai University
College of Computer and Control Engineering
Tianjin 300071
PR China
Prof. Xueliang Li
Nankai University
Center for Combinatorics
94 Weijin Road
Tianjin 300071
China
Prof. Yongtang Shi
Nankai University
Center for Combinatorics
No.94 Weijin Road
Tianjin 300071
China
All books published by Wiley-VCH are carefully produced. Nevertheless, authors, editors, and publisher do not warrant the information contained in these books, including this book, to be free of errors. Readers are advised to keep in mind that statements, data, illustrations, procedural details or other items may inadvertently be inaccurate.
Library of Congress Card No.: applied for
British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library.
Bibliographic information published by the Deutsche Nationalbibliothek
The Deutsche Nationalbibliothek lists this publication in the Deutsche Nationalbibliografie; detailed bibliographic data are available on the Internet at <http://dnb.d-nb.de>.
© 2016 Wiley-VCH Verlag GmbH & Co. KGaA,
Boschstr. 12, 69469 Weinheim, Germany
All rights reserved (including those of translation into other languages). No part of this book may be reproduced in any form – by photoprinting, microfilm, or any other means – nor transmitted or translated into a machine language without written permission from the publishers. Registered names, trademarks, etc. used in this book, even when not specifically marked as such, are not to be considered unprotected by law.
Print ISBN: 978-3-527-33909-9
ePDF ISBN: 978-3-527-69322-1
ePub ISBN: 978-3-527-69325-2
Mobi ISBN: 978-3-527-69323-8
oBook ISBN: 978-3-527-69324-5
Fernando Jesús Ballesteros
University of Valencia (UV)
Astronomic Observatory
2 Jose Beltran Street
CP E-46980 Paterna
Valencia
Spain
André Calero Valdez
RWTH Aachen University
Human Technology Research Center
Campus Boulevard 57
52074 Aachen
Germany
Seyed Saeed Changiz Rezaei
1QBit Information Technology
458–550 Burrard Street
Vancouver, BC V6C 2B5
Canada
Suzana de Siqueira Santos
University of São Paulo
Department of Computer Science
Institute of Mathematics and Statistics
Rua do Matão
1010–Cidade Universitária
São Paulo–SP 05508-090
Brazil
Matthias Dehmer
Nankai University
College of Computer and Control Engineering
Tianjin 300071
PR China
and
UMIT–The Health & Life Sciences University
Department of Biomedical Computer Sciences and Mechatronics
Eduard-Wallnöfer-Zentrum 1
6060 Hall/Tyrol
Austria
Carlos Eduardo Ferreira
University of São Paulo
Department of Computer Science
Institute of Mathematics and Statistics
Rua do Matão
1010–Cidade Universitária
São Paulo–SP 05508-090
Brazil
André Fujita
University of São Paulo
Department of Computer Science
Institute of Mathematics and Statistics
Rua do Matão
1010–Cidade Universitária
São Paulo–SP 05508-090
Brazil
Vladimir Gudkov
University of South Carolina
Department of Physics and Astronomy
712 Main Street
Columbia, SC 29208
USA
Andreas Holzinger
Medical University Graz
Research Unit HCI-KDD
Institute for Medical Informatics
Statistics and Documentation
Austria
and
Graz University of Technology
Institute for Information Systems and Computer Media
Austria
Sosuke Ito
Department of Physics
Tokyo Institute of Technology
Oh-okayama 2-12-1
Meguro-ku
Tokyo 152-8551
Japan
Lucas Lacasa
Queen Mary University of London
School of Mathematical Sciences
Mile End Road
London E1 4NS
UK
Xueliang Li
Nankai University
Center for Combinatorics
94 Weijin Road
Tianjin 300071
PR China
Bartolo Luque
Polytechnic University of Madrid (UPM)
Department of Mathematics
3, Cardenal Cisneros
CP 28040 Madrid
Spain
Debnath Pal
Department of Computational and Data Sciences
Indian Institute of Science
C. V. Raman Avenue
Bangalore 560012
India
Chandan Raychaudhury
Department of Computational and Data Sciences
Indian Institute of Science
C. V. Raman Avenue
Bangalore 560012
India
Alberto Robledo
National University of Mexico (UNAM)
Institute of Physics and Center for Complexity Sciences
Mexico City
Distrito Federal
CP 01000 Mexico
Mexico
Takahiro Sagawa
The University of Tokyo
Department of Applied Physics
7-3-1 Hongo
Bunkyo-ku
Tokyo 113-8656
Japan
João Ricardo Sato
Federal University of ABC
Center of Mathematics, Computation, and Cognition
Rua Santa Adelia
166, Santo André
SP 09210-170
Brazil
Daniel Yasumasa Takahashi
Princeton University
Department of Psychology and Neuroscience Institute
Green Hall
Princeton, NJ 08544
USA
Meiqin Wei
Nankai University
Center for Combinatorics
94 Weijin Road
Tianjin 300071
PR China
Martin Welk
University for Health Sciences
Medical Informatics and Technology (UMIT)
Institute for Biomedical Computer Science
Biomedical Image Analysis Division
Eduard-Wallnöfer-Zentrum 1
6060 Hall/Tyrol
Austria
Graph entropy measures represent information-theoretic measures for characterizing networks quantitatively. The first concepts in this framework were developed in the 1950s for investigating biological and chemical systems. Seminal work on this problem was done by Rashevsky, Trucco, and Mowshowitz, who investigated entropy measures for quantifying the so-called structural information content of a graph. To date, numerous graph entropies have been developed and applied to various problems in theoretical and applied disciplines. Examples are biology, computational biology, mathematical chemistry, Web mining, and knowledge engineering. Many of the described quantitative measures have been used for capturing network complexity. However, network complexity is not a uniquely defined concept, and hence there is no single right measure. Consequently, developing efficient information-theoretic graph measures (graph entropies) has been an intricate task, and the overall process depends on the practical application at hand. That is why several of the graph entropies that have been developed have not yet been investigated mathematically.
Graph entropies have been explored from different perspectives in a variety of disciplines, including discrete mathematics, computer science, finance, computational biology, knowledge mining, structural chemistry, and applications thereof such as structure-oriented drug design (quantitative structure–activity relationship/quantitative structure–property relationship, QSAR/QSPR). From a theoretical viewpoint, exploring properties of graph entropy measures has been crucial but intricate. Because some well-known graph entropies are not computable on large networks (e.g., Körner's entropy), proving mathematical properties and interrelations has been even more important.
The main goal of this book is to present and explain methods for defining graph entropies meaningfully. Furthermore, it deals with applying different graph entropy concepts to completely different application areas. This book is intended for researchers, graduates, and advanced undergraduate students in the fields of mathematics, computer science, chemistry, computational physics, bioinformatics, and systems biology.
Many colleagues have provided us with input, help, and support before and during the preparation of this book. In particular, we would like to thank Abbe Mowshowitz, Maria and Gheorghe Duca, Andrey A. Dobrynin, Boris Furtula, Ivan Gutman, Bo Hu, D. D. Lozovanu, Alexei Levitchi, Andrei Perjan, Ricardo de Matos Simoes, Fred Sobik, Shailesh Tripathi, Kurt Varmuza, Guihai Yu, and Dongxiao Zhu. Apologies go to those whose names have been inadvertently missed. Furthermore, we would like to thank the editors from Wiley-Blackwell, who have always been available and helpful. Last but not least, Matthias Dehmer thanks the Austrian Science Fund (project P22029-N13) for supporting this book. Zengqiang Chen, Matthias Dehmer, Xueliang Li, and Yongtang Shi thank the National Natural Science Foundation of China and Nankai University for their support. Matthias Dehmer also thanks his sister, Marion Dehmer-Sehn, who passed away in 2012, for her love and support.
To the best of our knowledge, this book is the first of its kind dedicated exclusively to graph entropy. Existing books dealing with related topics such as network complexity and complex networks have a limited scope in the sense that they only consider specialized graph measures for specific applications. Therefore, we believe that this book will broaden the scope of scientists dealing with graph entropies. Finally, we hope this book conveys the enthusiasm and joy we have for this field and inspires fellow researchers in their own practical or theoretical work.
Hall/Tyrol, Tianjin, and Tampere
Matthias Dehmer, Frank Emmert-Streib, Zengqiang Chen, Xueliang Li, and Yongtang Shi
Bartolo Luque, Fernando Jesús Ballesteros, Alberto Robledo and Lucas Lacasa
In this chapter, we concentrate on a mapping from time series to graphs: the visibility algorithm introduced by Lacasa et al. [1]. Among its most relevant features, we stress its intrinsic nonlocality, low computational cost, straightforward implementation, and rather simple way of inheriting the time series properties in the structure of the associated graphs. These features make it easier to find connections between the underlying processes and the networks obtained from them by a direct analysis of the latter. In particular, in this chapter we focus on applying the visibility algorithm to three well-known routes to chaos. We define a graph entropy and a renormalization process for visibility graphs that characterize these routes, and we analyze the relationship between the renormalization flow and the extrema of the entropy function.
Disregarding any underlying process, we can consider a time series simply as an ordered set of values and transform this set into a different mathematical object with the aid of an abstract mapping [2, 3]. We can then ask which properties of the original set are conserved, which are transformed and how, and what we can say about one of the mathematical representations just by looking at the other. This exercise is of mathematical interest in itself. In addition, time series (or signals) are a universal means of extracting information from dynamical systems in any field of science. Therefore, the preceding mathematical mapping gains some unexpected practical interest, as it opens the possibility of analyzing a time series from an alternative point of view. Of course, the relevant information stored in the original time series should somehow be conserved in the mapping. The motivation is complete when the new representation belongs to a relatively mature mathematical field, where the information encoded in such a representation can be effectively disentangled and processed. This is, precisely, the first motivation to map time series into networks.
This motivation is reinforced by two interconnected factors: (i) Although a mature field, time series analysis has some limitations when it comes to studying so-called complex signals. Beyond the linear regime, there exists a wide range of phenomena usually embraced by the field of complex systems. Dynamical phenomena such as chaos, long-range correlated stochastic processes, intermittency, and multifractality are examples of complex phenomena, where time series analysis is pushed to its limits. Nonlinear time series analysis relies on techniques such as nonlinear correlation functions, embedding algorithms, multifractal spectra, and projection theorems, whose complexity grows in parallel with the complexity of the process or series under study. New approaches to deal with complexity are not only welcome, but needed. Approaches that handle the intrinsic nonlinearity by being intrinsically nonlinear themselves, and that deal with the possible multiscale character of the underlying process by being designed to naturally incorporate multiple scales, are precisely what the framework of networks and graph theory offers. (ii) The technological era brings the possibility of digitally analyzing myriads of data in a glimpse. Massive data sets can nowadays be parsed, and with the aid of well-suited algorithms, we can access and filter data from many processes, be they of physical, technological, or even social nature.
The idea of mapping time series into graphs is attractive because it bridges two prolific fields of modern science, nonlinear signal analysis and complex network theory; so much so that it has attracted the attention of several research groups, which have contributed to the topic with different mapping strategies. We shall briefly outline some of them.
Zhang and Small [4] developed a method that mapped each cycle of a pseudo-periodic time series into a node in a graph. The connection between nodes was established by a distance threshold in the reconstructed phase space when possible, or by the linear correlation coefficient between cycles in the presence of noise. Noisy periodic time series mapped into random graphs, whereas chaotic time series mapped into scale-free, small-world networks due to the presence of unstable periodic orbits. This method was subsequently applied to characterize cardiac dynamics.
Xu, in collaboration with Zhang and Small [5], concentrated on the relative frequencies of appearance of four-node motifs inside a particular graph in order to classify it into a particular superfamily of networks, each corresponding to a specific underlying dynamics of the mapped time series. In this case, the mapping consisted of embedding the time series in an appropriate phase space, where each point corresponds to a node in the network. A threshold was imposed not only on the minimum distance between two neighbors to be eligible (the temporal separation should be greater than the mean period of the data), but also on the maximum number of neighbors a node could have. Different superfamilies were found for chaotic, hyperchaotic, random, and noisy periodic underlying dynamics, and unique fingerprints were also found for specific dynamical systems within a family.
Donner et al. [6–8] presented a technique based on the recurrence properties in the phase space of a dynamical system. More precisely, the recurrence matrix obtained by imposing a threshold on the minimum distance between two points in phase space is interpreted as the adjacency matrix of an undirected, unweighted graph (as in Ref. [5]). Properties of such graphs at three different scales (local, intermediate, and global) were presented and studied on several paradigmatic systems (Hénon map, Rössler system, Lorenz system, and Bernoulli map). The variation of some of the graph properties with the distance threshold was analyzed, the use of specific measures such as the local clustering coefficient was proposed as a way to detect dynamically invariant objects (saddle points or unstable periodic orbits), and studying how the graph properties depend on the embedding dimension was suggested as a means to distinguish between chaotic and stochastic systems.
The Amaral Lab [9] contributed an idea, along the lines of Shirazi et al. [10], Strozzi et al. [11], and Haraguchi et al. [12], of a surjective mapping that admits an inverse operation. This approach opens the reciprocal possibility of benefiting from time series analysis to study the structure and properties of networks. Time series are treated as Markov processes, and values are grouped into quantiles, which correspond to nodes in the associated graph. Weighted and directed connections are established between nodes as a function of the probability of transition between quantiles. An inverse operation can be defined without a priori knowledge of the correspondence between nodes and quantiles, just by imposing a continuity condition on the time series by means of a cost function defined on the weighted adjacency matrix of the graph. A random walk is performed on the network, and a time series with properties equivalent to the original one is recovered. This method was applied to a battery of cases, including a periodic-to-random family of processes parameterized by a transition probability, a pair of chaotic systems (the Lorenz and Rössler attractors), and two human heart rate time series. Reciprocally, the inverse map was applied to the metabolic network of Arabidopsis thaliana and to the 1997 Internet network.
In the same vein of an inverse transformation, Shimada et al. [13] proposed a framework to transform a complex network into a time series, realized by multidimensional scaling. Applying the transformation method to the model proposed by Watts and Strogatz [14], they show that ring lattices are transformed into periodic time series, small-world networks into noisy periodic time series, and random networks into random time series. They also show that these relationships hold analytically by using circulant matrix theory and the perturbation theory of linear operators, and they generalize the results to several high-dimensional lattices.
Gao and Jin proposed in Ref. [15] a method for constructing complex networks from a time series, with each vector point of the reconstructed phase space represented by a single node and edges determined by the phase-space distance. Through investigating an extensive range of network topology statistics, they find that the constructed network inherits the main properties of the time series in its structure. Specifically, periodic series and noisy series convert into regular networks and random networks, respectively, while networks generated from chaotic series typically exhibit small-world and scale-free features. Furthermore, they associate different aspects of the dynamics of the time series with the topological indices of the network and demonstrate how such statistics can be used to distinguish different dynamical regimes. By analyzing chaotic time series corrupted by measurement noise, they also demonstrate the noise robustness of the method.
Sinatra et al. [16] introduced a method to convert an ensemble of sequences of symbols into a weighted directed network whose nodes are motifs, while the directed links and their weights are defined from statistically significant co-occurrences of two motifs in the same sequence. The analysis of communities in networks of motifs is shown to be able to correlate sequences with functions in the human proteome database, to detect hot topics from online social dialogs, and to characterize trajectories of dynamical systems.
Sun et al. [17] have also proposed a novel method to transform a time series into a weighted and directed network. For a given time series, they first generate a set of segments via a sliding window and then use a doubly symbolic scheme to characterize every windowed segment by combining absolute amplitude information with an ordinal pattern characterization. On the basis of this construction, a network can be built directly from the given time series: segments corresponding to different symbol pairs are mapped to network nodes, and the temporal succession between nodes is represented by directed links. With this conversion, the dynamics underlying the time series is encoded into the network structure. They illustrate the potential of their networks with a well-studied dynamical model as a benchmark example. Results show that network measures characterizing global properties can detect the dynamical transitions in the underlying system. Moreover, they use a random walk algorithm to sample loops in the networks and find that time series with different dynamics exhibit distinct cycle structures: the relative prevalence of loops with different lengths can be used to identify the underlying dynamics.
In the following, we first present two versions of the visibility algorithm, our own alternative to these mapping methods, along with their most notable properties, which in many cases can be derived analytically. On the basis of these properties, several applications are then addressed.
Let be a time series of data. The natural visibility algorithm [1] assigns each datum of the series to a node in the natural visibility graph (NVg). Two nodes and in the graph are connected if one can draw a straight line in the time series joining and that does not intersect any intermediate data height (see Figure 1.1 for a graphical illustration). Hence, and are two connected nodes if the following geometrical criterion is fulfilled within the time series:
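For concreteness (the symbols $(t_a, y_a)$, $(t_b, y_b)$ for two data points and $(t_c, y_c)$ for any intermediate datum are our notation, not the chapter's), the natural visibility criterion of Ref. [1] can be written as

$$ y_c < y_b + (y_a - y_b)\,\frac{t_b - t_c}{t_b - t_a} \qquad \text{for all } c \text{ with } t_a < t_c < t_b. $$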
It can be easily checked that by means of the present algorithm, the associated graph extracted from a time series is always:
Connected: each node sees at least its nearest neighbors (left- and right-hand sides).
Undirected: the way the algorithm is built up, there is no direction defined in the links.
Invariant under affine transformations of the series data: the visibility criterion is invariant under (unsigned) linear rescaling of both horizontal and vertical axis, as well as under horizontal and vertical translations.
“Lossy”: some information regarding the time series is inevitably lost in the mapping, since the network structure is completely determined by the adjacency matrix. For instance, two quantitatively different periodic series with the same period can have the same visibility graph.
Figure 1.1 Illustrative example of the natural visibility algorithm. In the upper part, we plot a periodic time series, and in the bottom part, we represent the graph generated through the natural visibility algorithm. Each datum in the series corresponds to a node in the graph, such that two nodes are connected if their corresponding data heights fulfill the visibility criterion of equation 1.1. Note that the degree distribution of the visibility graph is composed of a finite number of peaks, much in the vein of the discrete Fourier transform (DFT) of a periodic signal. We can thus interpret the visibility algorithm as a geometric transform.
(Luque et al. [18]. Reproduced with permission of American Physical Society.)
One straightforward question is: what does the visibility algorithm stand for? In order to deepen the geometric interpretation of the visibility graph, let us focus on a periodic series. It is straightforward that its visibility graph is a concatenation of a motif: a repetition of a pattern (see Figure 1.1). Now, what is the degree distribution of this visibility graph? Since the graph is just a repetition of a motif, the degree distribution is formed by a finite number of nonnull values, this number being related to the period of the associated periodic series. This behavior is reminiscent of the discrete Fourier transform (DFT), in which a periodic series is represented by a finite number of peaks (vibration modes) related to the series period. Using this analogy, we can understand the visibility algorithm as a geometric transform. Whereas the DFT decomposes a signal into a sum of (possibly infinitely many) modes, the visibility algorithm decomposes a signal into a concatenation of graph motifs, and the degree distribution simply makes a histogram of such “geometric modes.” While the time series is defined in the time domain and the DFT is defined in the frequency domain, the visibility graph is defined in the “visibility domain.” So far, this analogy is a simple metaphor to help our intuition; for instance, the transform is not a reversible one.
An alternative criterion for the construction of the visibility graph is defined as follows: let be a time series of data. The so-called horizontal visibility algorithm [18] assigns each datum of the series to a node in the horizontal visibility graph (HVg). Two nodes and in the graph are connected if one can draw a horizontal line in the time series joining and that does not intersect any intermediate data height (see Figure 1.2 for a graphical illustration). Hence, and are two connected nodes if the following geometrical criterion is fulfilled within the time series:
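With the same (assumed) notation as above, the horizontal visibility criterion of Ref. [18] can be written as

$$ y_a,\; y_b > y_c \qquad \text{for all } c \text{ with } t_a < t_c < t_b. $$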
Figure 1.2 Illustrative example of the Natural (a) and Horizontal (b) visibility algorithms. We plot the same time series and represent the graphs generated through both visibility algorithms below. Each datum in the series corresponds to a node in the graph, such that two nodes are connected if their corresponding data heights fulfill the visibility criteria of equations 1.1 and 1.2, respectively. Observe that the Horizontal Visibility graph is a subgraph of the Natural Visibility graph for the same time series.
(Luque et al. [18]. Reproduced with permission of American Physical Society.)
This algorithm is a simplification of the Natural Visibility algorithm (NVa). In fact, the HVg is always a subgraph of its associated NVg for the same time series (see Figure 1.2). Besides, the HVg will also be (i) connected, (ii) undirected, (iii) invariant under affine transformations of the series, and (iv) “lossy.” Some concrete properties of these graphs can be found in Refs [18–21]. The HVg method is considerably more tractable analytically than the NVg. Hence, for example, if is a bi-infinite sequence of independent and identically distributed random variables extracted from a continuous probability density , then its associated HVg has degree distribution:
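The distribution referred to here should be the result derived in Ref. [18], which holds independently of the particular continuous density of the data:

$$ P(k) = \frac{1}{3}\left(\frac{2}{3}\right)^{k-2}, \qquad k = 2, 3, 4, \ldots $$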
A lengthy constructive proof can be found in Ref. [18] and alternative, shorter proofs can be found in Ref. [22]. The mean degree of the HVg associated to an uncorrelated random process is then:
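Summing over the distribution quoted above, the mean degree works out to

$$ \bar{k} = \sum_{k=2}^{\infty} \frac{k}{3}\left(\frac{2}{3}\right)^{k-2} = 4. $$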
In fact, the mean degree of an HVg associated to an infinite periodic series of period (with no repeated values within a period) is
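The formula in question, as reported in Ref. [22] (with $T$ denoting the period), should read

$$ \bar{k}(T) = 4\left(1 - \frac{1}{2T}\right). $$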
A proof can be found in Ref. [22]. An interesting consequence is that every time series has an associated HVg with a mean degree between 2 and 4, where the lower bound is reached for constant series, whereas the upper bound is reached for aperiodic series [18].
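As a concrete illustration of both constructions, here is a minimal brute-force sketch (not taken from the chapter; function and variable names are ours, and equally spaced sampling is assumed so that indices stand in for time stamps):

```python
import numpy as np

def natural_visibility_edges(y):
    """Edge set of the natural visibility graph (NVg) of the series y."""
    n = len(y)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            # a and b are connected if every intermediate datum c lies strictly
            # below the straight line joining (a, y[a]) and (b, y[b]).
            if all(y[c] < y[b] + (y[a] - y[b]) * (b - c) / (b - a)
                   for c in range(a + 1, b)):
                edges.add((a, b))
    return edges

def horizontal_visibility_edges(y):
    """Edge set of the horizontal visibility graph (HVg) of the series y."""
    n = len(y)
    edges = set()
    for a in range(n):
        for b in range(a + 1, n):
            # a and b are connected if every intermediate datum is strictly
            # smaller than both endpoints (horizontal visibility criterion).
            if all(y[c] < min(y[a], y[b]) for c in range(a + 1, b)):
                edges.add((a, b))
    return edges

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    y = rng.random(500)                      # i.i.d. uniform series
    hv = horizontal_visibility_edges(y)
    nv = natural_visibility_edges(y)
    print("HVg is a subgraph of NVg:", hv <= nv)
    # For a long uncorrelated series the HVg mean degree approaches 4
    # (slightly less here because of finite-size/boundary effects).
    print("HVg mean degree:", 2 * len(hv) / len(y))
```

The subgraph check and the mean degree close to 4 for a long i.i.d. series act as quick sanity checks against the analytical results quoted above.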
To end this introduction, and without intending to be exhaustive, we believe it appropriate to point out some of the areas where, despite the method being recent, the visibility algorithm has already been applied with interesting results:
Aguilar-San Juan and Guzman-Vargas presented, in Ref. [23], a statistical analysis of earthquake magnitude sequences in terms of the visibility graph method. Magnitude time series from Italy, southern California, and Mexico are transformed into networks, and some organizational graph properties are discussed. Connectivities are characterized by a scale-free distribution with a notable effect for large scales due to either the presence or absence of large events. In addition, a scaling behavior is observed between different node measures such as betweenness centrality, clustering coefficient, nearest-neighbor connectivity, and earthquake magnitude. Moreover, parameters that quantify the difference between forward and backward links are proposed to evaluate the asymmetry of the visibility attachment mechanism. Their results show an alternating average behavior of these parameters as the earthquake magnitude changes. Finally, they evaluate the effects of reducing the temporal and spatial windows of observation upon the visibility network properties for main shocks.
Telesca et al. [24–26] have analyzed the synthetic seismicity generated by a simple stick-slip system with asperities by using the visibility graph method. The stick-slip system mimics the interaction between tectonic plates, whose asperities are given by sandpapers of different granularity degrees. The visibility graph properties of the seismic sequences have been related to the typical seismological parameter, the b-value of the Gutenberg–Richter law. A close linear relationship is found between the b-value of the synthetic seismicity and the slope of the least-squares line fitting the plot of the magnitude of each synthetic event against its connectivity degree; this relationship is also verified for real seismicity.
Elsner et al. [27] demonstrated the construction of a network from a time series of US hurricane counts and showed how it can be used to identify unusual years in the record. The network links years based on a line-of-sight visibility algorithm applied to the time series plot and is physically related to the variation in hurricanes from one year to the next. The authors find that the distribution of node degree is consistent with a random Poisson process. High hurricane-occurrence years that are surrounded by years with few hurricanes have many linkages. Of the environmental conditions known to affect coastal hurricane activity, they find that years with little sunspot activity during September (the peak month of the hurricane season) correspond best with the unusually high-linkage years.
A classic topic in fluid mechanics is the complex behavior exhibited by some fluids within a certain regime, characterized basically by a dimensionless quantity known as the Reynolds number; this behavior consists of a high-dimensional spatiotemporal form of chaos called turbulence. The multiscale nature of this phenomenon is reflected in the distributions of velocity increments and energy dissipation rates, which exhibit anomalous scalings suggesting some kind of multifractality. A first attempt to characterize an energy dissipation rate time series by means of the visibility algorithm was made by Liu et al. [28]. In this work, a series obtained from wind tunnel experimental measurements was mapped into a graph by the natural visibility version of the algorithm, yielding a power-law degree distribution. An edge-covering box-counting method was used to prove the nonfractality of the graph, and allometric scalings for the skeleton and random spanning trees of the graph were proposed, but no functional relation to any physical magnitude characterizing the phenomenon could be derived.
Yang et al. [29] mapped six exchange rate series and their corresponding detrended series into graphs by means of the NVa. The results suggest that, for certain purposes, these series can be modeled as fractional Brownian motions. The multifractal structure of the series was broken by shuffling them, and the shuffled series accordingly mapped into graphs with exponential degree distributions.
Qian et al. [30], in the same spirit as Liu et al. [28], built three different classes of spanning trees from the graphs associated with 30 world stock market indices and studied their allometric properties, finding universal allometric scaling behavior in one of the classes. No satisfactory explanation was found for this fact. They also built spanning trees from graphs associated with fractional Brownian motions with different Hurst exponents, finding discrepancies between their allometric behavior and that of the trees mapped from the stock market indices. These discrepancies were attributed to the nonlinear long-term correlations and fat-tailed distributions of the financial series.
Shao [31] used the visibility algorithm to construct the associated networks of time series of filtered data from five healthy subjects and five patients with congestive heart failure (CHF). He used the assortativity coefficient of the networks to distinguish healthy subjects from CHF patients. In contrast, Dong and Li [32], in a comment on that work, calculated the assortativity coefficients of heartbeat networks extracted from time series of healthy subjects and CHF patients and concluded that the assortativity coefficients of such networks fail as an effective indicator to differentiate healthy subjects from CHF patients at large.
Ahmadlou et al. [33] presented a new chaos-wavelet approach for electroencephalogram (EEG)-based diagnosis of Alzheimer's disease (AD) using the visibility graph. The approach is based on the research ideology that nonlinear features may not reveal differences between AD and control group in the band-limited EEG, but may represent noticeable differences in certain subbands. Hence, complexity of EEGs is computed using the VGs of EEGs and EEG subbands produced by wavelet decomposition. Two methods are used for computation of complexity of the VGs: one based on the power of scale-freeness of a graph structure and the other based on the maximum eigenvalue of the adjacency matrix of a graph. Analysis of variation is used for feature selection. Two classifiers are applied to the selected features to distinguish AD and control EEGs: a radial basis function neural network (RBFNN) and a two-stage classifier consisting of principal component analysis (PCA) and RBFNN. After comprehensive statistical studies, effective classification features and mathematical markers are presented.