Pattern Recognition: A Quality of Data Perspective

Władysław Homenda and Witold Pedrycz

Description

A new approach to the issue of data quality in pattern recognition. Detailing foundational concepts before introducing more complex methodologies and algorithms, this book is a self-contained manual for advanced data analysis and data mining. Top-down organization presents detailed applications only after methodological issues have been mastered, and step-by-step instructions help ensure successful implementation of new processes. By positioning data quality as a factor to be dealt with rather than overcome, the framework provided serves as a valuable, versatile tool in the analysis arsenal.

For decades, practical need has inspired intense theoretical and applied research into pattern recognition for numerous and diverse applications. Throughout, the limiting factor and perpetual problem has been data: its sheer diversity, abundance, and variable quality present the central challenge to pattern recognition innovation. Pattern Recognition: A Quality of Data Perspective repositions that challenge from a hurdle to a given and presents a new framework for comprehensive data analysis that is designed specifically to accommodate problem data.

Designed as both a practical manual and a discussion about the most useful elements of pattern recognition innovation, this book:

* Details fundamental pattern recognition concepts, including feature space construction, classifiers, rejection, and evaluation
* Provides a systematic examination of the concepts, design methodology, and algorithms involved in pattern recognition
* Includes numerous experiments, detailed schemes, and more advanced problems that reinforce complex concepts
* Acts as a self-contained primer toward advanced solutions, with detailed background and step-by-step processes
* Introduces the concept of granules and provides a framework for granular computing

Pattern recognition plays a pivotal role in data analysis and data mining, fields which are themselves being applied in an expanding sphere of utility. By facing the data quality issue head-on, this book provides students, practitioners, and researchers with a clear way forward amidst the ever-expanding data supply.




Table of Contents

Cover

Title Page

PREFACE

PART I: FUNDAMENTALS

CHAPTER 1:

PATTERN RECOGNITION: FEATURE SPACE CONSTRUCTION

1.1 CONCEPTS

1.2 FROM PATTERNS TO FEATURES

1.3 FEATURES SCALING

1.4 EVALUATION AND SELECTION OF FEATURES

1.5 CONCLUSIONS

APPENDIX 1.A

APPENDIX 1.B

REFERENCES

CHAPTER 2:

PATTERN RECOGNITION: CLASSIFIERS

2.1 CONCEPTS

2.2 NEAREST NEIGHBORS CLASSIFICATION METHOD

2.3 SUPPORT VECTOR MACHINES CLASSIFICATION ALGORITHM

2.4 DECISION TREES IN CLASSIFICATION PROBLEMS

2.5 ENSEMBLE CLASSIFIERS

2.6 BAYES CLASSIFIERS

2.7 CONCLUSIONS

REFERENCES

CHAPTER 3:

CLASSIFICATION WITH REJECTION: PROBLEM FORMULATION AND AN OVERVIEW

3.1 CONCEPTS

3.2 THE CONCEPT OF REJECTING ARCHITECTURES

3.3 NATIVE PATTERNS‐BASED REJECTION

3.4 REJECTION OPTION IN THE DATASET OF NATIVE PATTERNS: A CASE STUDY

3.5 CONCLUSIONS

REFERENCES

CHAPTER 4:

EVALUATING PATTERN RECOGNITION PROBLEM

4.1 EVALUATING RECOGNITION WITH REJECTION: BASIC CONCEPTS

4.2 CLASSIFICATION WITH REJECTION WITH NO FOREIGN PATTERNS

4.3 CLASSIFICATION WITH REJECTION: LOCAL CHARACTERIZATION

4.4 CONCLUSIONS

REFERENCES

CHAPTER 5:

RECOGNITION WITH REJECTION: EMPIRICAL ANALYSIS

5.1 EXPERIMENTAL RESULTS

5.2 GEOMETRICAL APPROACH

5.3 CONCLUSIONS

REFERENCES

PART II:

ADVANCED TOPICS: A FRAMEWORK OF GRANULAR COMPUTING

CHAPTER 6:

CONCEPTS AND NOTIONS OF INFORMATION GRANULES

6.1 INFORMATION GRANULARITY AND GRANULAR COMPUTING

6.2 FORMAL PLATFORMS OF INFORMATION GRANULARITY

6.3 INTERVALS AND CALCULUS OF INTERVALS

6.4 CALCULUS OF FUZZY SETS

6.5 CHARACTERIZATION OF INFORMATION GRANULES: COVERAGE AND SPECIFICITY

6.6 MATCHING INFORMATION GRANULES

6.7 CONCLUSIONS

REFERENCES

CHAPTER 7:

INFORMATION GRANULES: FUNDAMENTAL CONSTRUCTS

7.1 THE PRINCIPLE OF JUSTIFIABLE GRANULARITY

7.2 INFORMATION GRANULARITY AS A DESIGN ASSET

7.3 SINGLE‐STEP AND MULTISTEP PREDICTION OF TEMPORAL DATA IN TIME SERIES MODELS

7.4 DEVELOPMENT OF GRANULAR MODELS OF HIGHER TYPE

7.5 CLASSIFICATION WITH GRANULAR PATTERNS

7.6 CONCLUSIONS

REFERENCES

CHAPTER 8:

CLUSTERING

8.1 FUZZY C‐MEANS CLUSTERING METHOD

8.2 k‐MEANS CLUSTERING ALGORITHM

8.3 AUGMENTED FUZZY CLUSTERING WITH CLUSTERS AND VARIABLES WEIGHTING

8.4 KNOWLEDGE‐BASED CLUSTERING

8.5 QUALITY OF CLUSTERING RESULTS

8.6 INFORMATION GRANULES AND INTERPRETATION OF CLUSTERING RESULTS

8.7 HIERARCHICAL CLUSTERING

8.8 INFORMATION GRANULES IN PRIVACY PROBLEM: A CONCEPT OF MICROAGGREGATION

8.9 DEVELOPMENT OF INFORMATION GRANULES OF HIGHER TYPE

8.10 EXPERIMENTAL STUDIES

8.11 CONCLUSIONS

REFERENCES

CHAPTER 9:

QUALITY OF DATA: IMPUTATION AND DATA BALANCING

9.1 DATA IMPUTATION: UNDERLYING CONCEPTS AND KEY PROBLEMS

9.2 SELECTED CATEGORIES OF IMPUTATION METHODS

9.3 IMPUTATION WITH THE USE OF INFORMATION GRANULES

9.4 GRANULAR IMPUTATION WITH THE PRINCIPLE OF JUSTIFIABLE GRANULARITY

9.5 GRANULAR IMPUTATION WITH FUZZY CLUSTERING

9.6 DATA IMPUTATION IN SYSTEM MODELING

9.7 IMBALANCED DATA AND THEIR GRANULAR CHARACTERIZATION

9.8 CONCLUSIONS

REFERENCES

INDEX

End User License Agreement

List of Tables

Chapter 01

TABLE 1.1 Numerical features derived from two vectorial features: vertical projection and differential of vertical projections

TABLE 1.2 Matrix of the Pearson correlation coefficients for features outlined in Table 1.1

TABLE 1.3 Features ranking with two classifiers and different indices applied: classifiers SVM and k‐NN (k = 1), ANOVA index, and several clustering indices

TABLE 1.4 Features ranking with distance by rank: compared are all pairs of classifiers and indices, the scores less than 6000 for the SVM index and less than 7000 for the k‐NN index are bolded

TABLE 1.5 Features ranking with distance by segments cardinality: compared are initial segments of ranks created by classifiers and an index

TABLE 1.6 Performance time of Algorithm 1.10 on the MNIST dataset (LeCun et al., 1998) for three clustering indices and two classifiers

Chapter 04

TABLE 4.1 Confusion matrix for rejecting option of pattern recognition applied to two sets of native and foreign patterns without splitting the set of native patterns into classes

TABLE 4.2 Balanced confusion matrix from Table 4.1 with equalizing parameter applied

TABLE 4.3 Classification with rejection applied to handwritten digits (native set, 10 classes) and crossed‐out digits and handwritten Latin alphabet letters (foreign sets)

TABLE 4.4 Results of classification with rejection from Table 4.3 in terms of identification of native and foreign patterns, according to (4.6) and (4.7), while CC = 9558 according to (4.8)

TABLE 4.5 Characteristics of classification with rejection delineated by accuracy, sensitivity, and precision measures obtained from Tables 4.3 and 4.4

TABLE 4.6 Characteristics of multiclass classification with rejection delineated by precision, sensitivity, accuracy, and separability measures drawn from Tables 4.3 and 4.4

TABLE 4.7 Confusion matrix for classification without rejection applied to the set of native patterns only

TABLE 4.8 Confusion matrix for classification with local rejection for the native set of patterns only, CC = 9558

TABLE 4.9 Summary of recognition results without rejection and with rejection

TABLE 4.10 Characteristics of classification with and without rejection delineated by accuracy measures drawn from Tables 4.7 and 4.8

TABLE 4.11 References to class names, which are used in Tables 4.12 and 4.13

TABLE 4.12 Confusion matrix for classification without rejection applied to the set of symbols of music notation (selected 10 classes) standardized features

TABLE 4.13 Confusion matrix for classification with local rejection for selected 10 classes from the dataset of music notation symbols, standardized features

TABLE 4.14 Characteristics of classification with and without rejection delineated by accuracy measures drawn from Tables 4.12 and 4.13

Chapter 05

TABLE 5.1 Confusion matrices for the classifiers without and with rejection shown in Figures 3.13 and 3.15, respectively

TABLE 5.2 Comparison of classification results with rejection (global, local, and embedded architectures) on the train and test sets of native patterns of handwritten digits with classification results without a rejection mechanism

TABLE 5.3 Classification performance of classifiers constructed in features spaces of different dimensionalities: ranging from 24 to 4 features

TABLE 5.4 Confusion matrix for the low quality basic classifier

TABLE 5.5 Results of the considered models without rejection (upper part) and with rejection (bottom part)

TABLE 5.6 Comparison of experiments on classification with rejection (global, local, and embedded architectures) for two native datasets: handwritten digits and printed music notation

TABLE 5.7 Comparing results of rejection with the hyperrectangles model performed for datasets of handwritten digits and music notation

Chapter 06

TABLE 6.1 Selected examples of t‐norms and t‐conorms

TABLE 6.2 Selected examples of parametric t‐norms and t‐conorms

Chapter 08

TABLE 8.1 Values of the reconstruction criterion V obtained for selected values of c

TABLE 8.2 Values of the reconstruction criterion V obtained for selected numbers of clusters c; the drop in the values of V(c) is included in the last column, with the higher values of drop marked in boldface

TABLE 8.3 Values of the reconstruction error V obtained for the single‐, complete‐, and average‐linkage methods and selected values of c

Chapter 09

TABLE 9.1 Coverage produced by imputed data versus different values of p; the parameter β comes with the performance index maximized by the principle of justifiable granularity

List of Illustrations

Chapter 01

Figure 1.1 Pattern recognition schemes: direct mapping from the space of patterns into the space of classes (a) and composition of mappings from the space of patterns into the space of features and from the space of features into the space of classes (b).

Figure 1.2 A typical pattern recognition scheme.

Figure 1.3 Pattern recognition with rejection.

Figure 1.4 A treble clef, a symbol belonging to a data set of printed music notation, taken as an example of a pattern. The pattern is surrounded by a bounding box of width W = 22 and height H = 60 pixels. The bounding box is not part of the pattern; it has been added only for illustrative purposes.

Figure 1.5 Vectorial features: (a) original pattern, (b) horizontal projection, (c) vertical projection, (d) left margin, (e) right margin, (f) bottom margin, (g) top margin, (h) horizontal transition, and (i) vertical transition. Please note that the transition values are very small, so in order to enhance visibility, we multiplied them by 4.

Figure 1.6 Vectorial to vectorial transformations: (a) original pattern, (b) horizontal projection, (c) its histogram, (d) its smoothing, (e) its differentiation, (f) vertical projection, (g) its histogram, (h) its smoothing, and (i) its differentiation. Note: The values of the vertical histogram are multiplied by 4.

Figure 1.7 Vectorial to numerical transformations: (a) original pattern, (b) numerical features of vertical projection (min = 2, mean = 23, max = 34, min position = 22, max position = 13), (c) directions—white lines on the black pattern (W–E = 13, N–S = 28, NE–SW = 20, NW–SE = 11), (d) eccentricity, and (e) Euler numbers (treble clef: −2, flat: 0, sharp: 0, fermata: 2, mezzo forte: 2).

Figure 1.8 Quality of different sets of features selected with the greedy search method. The procedure was adding features one by one: in each iteration one best feature was added. Feature evaluation was performed using the ANOVA F‐test and PBM index. We display accuracy (vertical axis) versus feature sets cardinality (horizontal axis). Results concern sets of features ranging from 1 to 100. Plots present accuracy measured on test sets. Plots concern different sets: original, normalized, standardized digits, and musical symbols for the dataset (Homenda et al., 2017). Information about the kind of data is presented in each individual plot.

Figure 1.9 Evaluation of sets of features with ANOVA F‐test, PBM index, GDI‐41 index, and SVM classifier. Sets of features were selected with greedy forward with expansion limited to 1, 3, and 5. Displayed are scores of indices at the whole learning set of patterns and SVM classifier accuracy at the training set of patterns (vertical axis) as a function of cardinality of features sets ranging from 1 to 100 (horizontal axis). Results concern a dataset of handwritten digits.

Figure 1.10 Quality of classification using SVM classifier constructed based on selected sets of features. Sets of features were selected using the ANOVA F‐test and PBM index. Classification accuracy (vertical axis) is measured at the training set and at the test set for three values of expansion limit and for the best features in the individual index rank, cardinality of features sets ranging from 1 to 100 (horizontal axis). The first two rows of graphs concern F‐ANOVA; the last two rows—PBM. Results concern handwritten digits and musical symbols datasets.

Figure 1.11 Quality of classification using the SVM classifier constructed based on selected sets of features. Sets of features were selected using the GDI‐41 index and SVM classifier. Classification accuracy (vertical axis) is measured at the training set and at the test set for three values of expansion limit and for the best features in the individual index rank, cardinality of features sets ranging from 1 to 100 (horizontal axis). The first two rows of graphs concern F‐ANOVA; the last two rows, PBM. Results concern handwritten digits and musical symbols datasets.

Chapter 02

Figure 2.1 Illustration of the k‐NN classifier. The pattern marked with a black star is being classified depending on the value of k: for k = 1 to the class of plus signs, for k = 3 to the class of squares, for k = 5 to the class of squares, and for k = 7 to the class of triangles. We may conclude that the classification outcome for the pattern marked with a star greatly depends on parameter k. In contrast, it is clear that classification of two patterns marked with a hash (#) and an at (@) sign is independent of the parameter k for quite wide ranges of k.

Figure 2.2 Illustration of the SVM algorithm: a linearly separable case.

Figure 2.3 The SVM algorithm: a case when classes O1 and O−1 are not linearly separable.

Figure 2.4 The decision tree built for the training set of the wine dataset. Nodes are illustrated with ellipses (non‐leaf nodes) and rectangles (leaf nodes). Notation x‐y‐z (e.g., 43‐52‐35 in the root) informs about the number of patterns from classes O1, O2, and O3, respectively, that were assigned to a given node. A single number inside tree nodes, positioned above the x‐y‐z notation, informs about the label of the majority class in a node. The majority class is also indicated with a number of patterns in boldface, for example, in the root we have 43‐52‐35 that says that the second class is the majority class in this node. Inequalities on the edges indicate the split condition of the father’s set of patterns and the feature that was used to perform this split.

Figure 2.5 Classification procedure for the test set using the decision tree constructed in Figure 2.4. Numbers inside nodes (ellipses and rectangles) depict numbers of patterns from each class that fall into a given node. The majority class number (an integer in the first row in each node) concerns the set of training data. The majority class for the test set is indicated with a number written in boldface in the second row.

Figure 2.6 A simplified decision tree. In the full tree outlined in Figure 2.4, internal nodes for which the set of training patterns included less than 10% incorrectly classified patterns were turned to leaves. Classification results for the simplified (pruned) tree are displayed for the training set (left graph) and the test set (right graph).

Figure 2.7 Graphs of entropy, the Gini index, and index of incorrect classification for a case of two classes. (a) True values of indexes and (b) values of indexes scaled to the unit interval.

Figure 2.8 Illustration of the Bayes classifier. Minimal misclassification error is achieved for the crossing point of joined probability densities. Minimal misclassification error is shown as a lined area; reducible error is shown as a double lined area.

Figure 2.9 Illustration of rejecting regions of the Bayes classifier. A posteriori probabilities for both classes are smaller than δmax on the left and the right, while the absolute difference between a posteriori probabilities is smaller than δdiff in the middle.

Figure 2.10 Density function estimation for the first feature of the wine dataset. We present estimation of density function for classes 1, 2, and 3 in the first, second, and third columns, respectively. Results concern interval length h equal to 0.2, 0.4, and 0.8 in the first, second, and third rows, respectively.

Figure 2.11 Estimation of density function with the Gaussian kernel function for the first feature of the wine dataset. We present the estimation of the density function for classes 1, 2, and 3 and for interval length (h) equal to 0.2, 0.4, and 0.8, respectively.

Chapter 03

Figure 3.1 An idea of classification without rejection. (a) A classifier is constructed on the basis of a learning set of native patterns. (b) Constructed classifier is used in practice. Native and foreign patterns are presented to the classifier. Foreign patterns do not belong to any class, but anyway they have to be classified to a native class, which decreases the overall quality of data processing. In addition, we see that a few native patterns are classified to incorrect classes.

Figure 3.2 An idea of classification with rejection. Now most of foreign patterns are rejected. Please compare this sketch with Figure 3.1b that depicts analogous classification mechanism but without rejection, where foreign patterns are incorrectly classified to native classes.

Figure 3.3 The architecture of global rejection. Native and foreign patterns are separated prior to classification of native patterns. Since foreign patterns are eliminated at the first step of recognition with rejection, the set of patterns subjected to classification is assumed to include native patterns only.

Figure 3.4 An example of a dataset of native patterns contaminated with foreign patterns suitable for global rejection. Both sets of native patterns are easily separable from the set of foreign patterns.

Figure 3.5 The architecture of local rejection. Sets of native and foreign patterns are subjected to classification prior to foreign pattern rejection.

Figure 3.6 An example of a dataset consisting of native and foreign patterns suitable to be processed using the local rejection architecture. Native and foreign patterns are hardly separable. Hence, local rejecting would be more effective than global rejecting.

Figure 3.7 Illustration of embedded rejection architecture.

Figure 3.8 Native patterns—handwritten digits.

Figure 3.9 Native patterns—symbols of the printed music notation dataset. Below the name of each symbol, we give the number of samples in this class in the dataset.

Figure 3.10 An excerpt of printed music notation—visualization of imbalances in the dataset of printed music notation symbols with regard to shapes and sizes.

Figure 3.11 Samples of foreign patterns, two rows with samples of each dataset. From top to bottom: handwritten digits crossed out, handwritten Latin alphabet letters, garbage patterns of music notation.

Figure 3.12 The binary tree structure of the classifier built for the set of 10 classes of the MNIST database (cf. Yann LeCun et al., 1998). The set {0, 1, 2, 3, 4, 5, 6, 7, 8, 9} of 10 classes is split into two subsets, {2, 3, 5, 7} and {0, 1, 4, 6, 8, 9}, and then the set {2, 3, 5, 7} of four classes is split into two subsets, {3, 5} and {2, 7}, and so on.

Figure 3.13 Architecture of the example tree of classifiers. In this study, random forests and SVMs were used as binary classifiers placed in nodes of this tree.

Figure 3.14 Tree‐based classifier from Figure 3.13 furnished with the global rejection option. The illustration concerns the right branch of the tree (the left branch is skipped). Rejecting is performed with the same binary classifiers as in the basic C‐class classifier, that is, random forests or SVMs.

Figure 3.15 The tree‐based classifier from Figure 3.13 furnished with the local rejection option. The illustration concerns the right branch of the tree (the left branch is skipped). Rejecting is performed with random forests or SVM binary classifiers attached to each leaf of the tree.

Figure 3.16 Tree‐based classifier from Figure 3.13 furnished with the embedded rejection option. The illustration concerns the right branch of the tree. Rejecting is performed with SVM classifiers attached to internal nodes and leaves of the tree.

Chapter 04

Figure 4.1 The structure of binary tree‐based classifiers used in musical symbol classification.

Chapter 05

Figure 5.1 Structure of binary tree‐based classifiers and its textual representation for the handwritten digits dataset.

Figure 5.2 Structure of binary tree‐based classifiers for various sets of features: of size 24, 20, 16, 12, 8, and 4. The classifier for the full set of 24 features is fully represented in the textual form (it is the first shown structure). Dots represent digits (class labels), which are not changed in comparison with the classifier above.

Figure 5.3 Textual representation of the structure of binary tree‐based classifiers applied in classification of music notation symbols.

Figure 5.4 Construction of a hyperrectangles‐based area enclosing the training set of native patterns. The area is the union of hyperrectangles built on classes of patterns from the training set.

Figure 5.5 Construction of an area enclosing the training set of patterns. The area is the union (E) of three ellipsoids (E1, E2, E3) built on classes of patterns from the training set.

Figure 5.6 Construction of k‐spans Hk for two features and three classes of native patterns: (a) the H0 span is the union of rectangles enclosing all patterns of the classes. (b) The H3 span is constructed after removing the three outermost patterns from each class.

Figure 5.7 Geometrical rejection with hyperrectangles—characteristics for handwritten digits and symbols of music notation datasets. Plots outline acceptance rate (for the training and test sets of native patterns) and rejection rate (for the set of foreign patterns) as functions of iterations performed in Algorithm 5.2. Compared are two methods of removing patterns: one feature and volume.

Figure 5.8 Construction of ellipsoids‐based k‐spans Ek for two features and three classes of native patterns: (a) the E0 span is the union of three ellipsoids; each ellipsoid encloses all patterns of one class. (b) Spans E1, then E2, and then E3 are constructed by removing the three outermost patterns (one from each class) in each iteration.

Chapter 06

Figure 6.1 From a stream of numeric data to their granular description completed with the aid of information granules.

Figure 6.2 Histogram as an example of information granule with data coming from a single‐class (a) and two‐class problem (b).

Figure 6.3 Conceptual realization of information granules—a comparative view.

Figure 6.4 Examples of set‐theoretic operations on numeric intervals.

Figure 6.5 Examples of α‐cut and strong α‐cut.

Figure 6.6 Determination of specificity of (a) interval (set), (b) unimodal fuzzy set, and (c) multimodal fuzzy set.

Figure 6.7 Plots of coverage and specificity as a function of b: (a) σ = 1.0, (b) σ = 2.0.

Figure 6.8 Join and meet of interval information granules.

Chapter 07

Figure 7.1 Example plots of coverage and specificity (linear model) regarded as a function of b.

Figure 7.2 V(b) as a function of b: (a) σ = 1.0 and (b) σ = 2.0.

Figure 7.3 Development of information granules in the two‐dimensional case when using two distance functions: (a) Euclidean distance and (b) Chebyshev distance.

Figure 7.4 Triangular membership function with adjustable (optimized) bounds a and b.

Figure 7.5 Aggregation of experimental evidence through the principle of justifiable granularity: an elevation of type of information granularity.

Figure 7.6 Plots of pdfs of the data for which the principle of justifiable granularity is applied. Shown are also inhibitory data (governed by p2).

Figure 7.7 Plots of V(b) for γ = 1 (solid line) and γ = 0 (dotted line).

Figure 7.8 Original numeric mapping along with the interval bounds.

Figure 7.9 Linear mapping (dark line) along with its interval (granular) generalization; gray and dotted lines show the upper and lower bounds, respectively.

Figure 7.10 Nonlinear mapping and its granular (interval‐valued) generalization.

Figure 7.11 From fuzzy models to granular fuzzy models: a formation of a granular space of parameters.

Figure 7.12 Allocation of information granularity in aggregation problem.

Figure 7.13 Multistep prediction in granular time series; shown are successive steps of prediction.

Figure 7.14 Forming granular models of higher type: a concept.

Figure 7.15 Interval‐valued output of the rule‐based fuzzy model; gray lines—the bounds of the interval.

Figure 7.16 Interval‐valued output of the rule‐based fuzzy model: the bounds y− and y+ of the interval are shown in gray.

Figure 7.17 Characteristics of type‐2 granular rule‐based models; dotted lines show the bounds produced by the type‐2 (granular) intervals.

Figure 7.18 An overview of the design of granular classifiers; note a functionality of the preprocessing module forming a granular feature space. Two modes of performance evaluation: (a) considering class membership and (b) considering binary classification.

Chapter 08

Figure 8.1 Geometry of data distributed from the origin at some constant distance ρ.

Figure 8.2 Example of a dendrogram formed through hierarchical clustering; in the case of agglomerative bottom‐up clustering, starting from the number of clusters being equal to the number of data in X, larger groups are formed by aggregating the two clusters that are the closest to each other.

Figure 8.3 Computing distances in single‐linkage (a), complete‐linkage (b), and average‐linkage (c) variants of hierarchical clustering.

Figure 8.4 From data to hotspots and information granules of higher type.

Figure 8.5 Synthetic data coming as a mixture of normal distributions; various symbols denote data coming from the individual distributions.

Figure 8.6 Results of fuzzy clustering; included are prototypes superimposed over the existing data.

Figure 8.7 Reconstruction results obtained for the k‐means clustering; the plots include the obtained prototypes superimposed over the data.

Figure 8.8 Plots of granular prototypes for prototypes produced by k‐means. Shown are the numeric prototypes along with their radii (shown in dark color).

Figure 8.9 Granular prototypes constructed for results produced by the FCM clustering; shown are the numeric centers and their radii (shown in dark color).

Figure 8.10 Granular prototypes built on a basis of prototypes formed by k‐means. Inhibitory information is included when forming information granules; centers and radii are shown in dark color.

Figure 8.11 Dendrograms obtained for hierarchical clustering for c = 2, 5, and 8: (a) single linkage, (b) complete linkage, and (c) average linkage.

Chapter 09

Figure 9.1 Illustration of the principle of the selected imputation methods: (a) statistics‐based imputation and (b) hot deck imputation.

Figure 9.2 Main phases of processing with a visualization of the imputation module; here the features space is treated as an n‐dimensional space of real numbers Rn.

Figure 9.3 Process of granular imputation yielding a new features space.

Figure 9.4 Granular imputation realized with the principle of justifiable granularity.

Figure 9.5 Building augmented features space resulting from data imputation to be used for models of classification and prediction.

Figure 9.6 Location of z and the resulting information granule (circles, majority class). (a) z surrounded by data coming from the majority class, (b) z located in the region exhibiting a mixture of data coming from the majority and minority class, and (c) z located in the region occupied by data coming from the minority class.

Figure 9.7 From a set of imbalanced data to their balanced counterpart involving granular data.



WILEY SERIES ON METHODS AND APPLICATIONS IN DATA MINING

Series Editor: Daniel T. Larose

Discovering Knowledge in Data: An Introduction to Data Mining, Second Edition • Daniel T. Larose and Chantal D. Larose

Data Mining for Genomics and Proteomics: Analysis of Gene and Protein Expression Data • Darius M. Dziuda

Knowledge Discovery with Support Vector Machines • Lutz Hamel

Data Mining the Web: Uncovering Patterns in Web Content, Structure, and Usage • Zdravko Markov and Daniel T. Larose

Data Mining Methods and Models • Daniel T. Larose

Practical Text Mining with Perl • Roger Bilisoly

Data Mining and Predictive Analytics • Daniel T. Larose and Chantal D. Larose

Pattern Recognition: A Quality of Data Perspective • Władysław Homenda and Witold Pedrycz

PATTERN RECOGNITION

A Quality of Data Perspective

 

 

WŁADYSŁAW HOMENDA

The Faculty of Mathematics and Information Science, Warsaw University of Technology Warsaw, Poland

and

The Faculty of Economics and Informatics in Vilnius, University of BiaŁystok Vilnius, Lithuania

WITOLD PEDRYCZ

The Systems Research Institute, Polish Academy of Sciences Warsaw, Poland

and

The Department of Electrical and Computer Engineering, University of Alberta Edmonton, Alberta, Canada

 

 

 

 

 

 

 

This edition first published 2018. © 2018 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Władysław Homenda and Witold Pedrycz to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office: 111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty: The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on‐going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloging‐in‐Publication Data

Names: Homenda, Władysław, author. | Pedrycz, Witold, 1953- author.
Title: Pattern recognition : a quality of data perspective / by Władysław Homenda, Witold Pedrycz.
Description: Hoboken, NJ : John Wiley & Sons, 2018. | Series: Wiley series on methods and applications in data mining | Includes bibliographical references and index.
Identifiers: LCCN 2017045206 (print) | LCCN 2017055746 (ebook) | ISBN 9781119302834 (pdf) | ISBN 9781119302858 (epub) | ISBN 9781119302827 (cloth)
Subjects: LCSH: Pattern recognition systems. | Pattern perception. | Data mining.
Classification: LCC TK7882.P3 (ebook) | LCC TK7882.P3 H66 2018 (print) | DDC 006.4–dc23
LC record available at https://lccn.loc.gov/2017045206

Cover design by Wiley. Cover image: © turbodesign777/Getty Images

PREFACE

Pattern recognition has established itself as an advanced area with a well‐defined methodology, a plethora of algorithms, and well‐defined application areas. For decades, pattern recognition has been a subject of intense theoretical and applied research inspired by practical needs. Prudently formulated evaluation strategies and methods of pattern recognition, especially a suite of classification algorithms, constitute the crux of numerous pattern classifiers. There are numerous representative realms of applications including recognizing printed text and manuscripts, identifying musical notation, supporting multimodal biometric systems (voice, iris, signature), classifying medical signals (including ECG, EEG, EMG, etc.), and classifying and interpreting images.

With the abundance of data, their volume, and existing diversity arise evident challenges that need to be carefully addressed to foster further advancements of the area and meet the needs of the ever‐growing applications. In a nutshell, they are concerned with the data quality. This term manifests in numerous ways and has to be perceived in a very general sense. Missing data, data affected by noise, foreign patterns, limited precision, information granularity, and imbalanced data are commonly encountered phenomena one has to take into consideration in building pattern classifiers and carrying out comprehensive data analysis. In particular, one has to engage suitable ways of transforming (preprocessing) data (patterns) prior to their analysis, classification, and interpretation.

The quality of data impacts the very essence of pattern recognition and calls for thorough investigations of the principles of the area. Data quality exhibits a direct impact on architectures and the development schemes of the classifiers. This book aims to cover the essentials of pattern recognition by casting it in a new perspective of data quality—in essence, we advocate that a new framework of pattern recognition along with its methodology and algorithms has to be established to cope with the challenges of data quality. As a representative example, it is of interest to look at the problem of the so‐called foreign (odd) patterns. By foreign patterns we mean patterns not belonging to a family of classes under consideration. The ever‐growing presence of pattern recognition technologies increases the importance of identifying foreign patterns. For example, in recognition of printed texts, odd patterns (say, blots, grease, or damaged symbols) appear quite rarely. On the other hand, in recognition problems for some other sources, such as geodetic maps or musical notation, foreign patterns occur quite often and their presence cannot be ignored. Unlike printed text, such documents contain objects of irregular positioning, differing in size, overlapping, or having complex shape. Thus, too strict segmentation results in the rejection of many recognizable symbols. Due to the weak separability of recognized patterns, segmentation criteria need to be relaxed, and foreign patterns similar to recognized symbols have to be carefully inspected and rejected.

The exposure of the overall material is structured into two parts, Part I: Fundamentals and Part II: Advanced Topics: A Framework of Granular Computing. This arrangement reflects the general nature of the main topics being covered.

Part I addresses the principles of pattern recognition with rejection. The task of rejecting foreign patterns arises as an extension and an enhancement of the standard schemes and practices of pattern recognition. Essential notions of pattern recognition are elaborated on and carefully revisited in order to clarify how to augment existing classifiers with a new rejection option required to cope with the discussed category of problems. As stressed, this book is self‐contained, and this implies that a number of well‐known methods and algorithms are discussed to offer a complete overview of the area, to identify main objectives, and to present the main phases of pattern recognition. The key topics here involve problem formulation and understanding; feature space formation, selection, transformation, and reduction; pattern classification; and performance evaluation. Analyzed is the evolution of research on pattern recognition with rejection, including a historical perspective. Identified are current approaches along with present and forthcoming issues that need to be tackled to ensure further progress in this domain. In particular, new trends are identified and linked with existing challenges. The chapters forming this part revisit the well‐known material, as well as elaborate on new approaches to pattern recognition with rejection. Chapter 1 covers fundamental notions of feature space formation. The feature space is of paramount relevance, as it implies the quality of classifiers. The focus of the chapter is on the analysis and comparative assessment of the main categories of methods used in feature construction, transformation, and reduction. In Chapter 2, we cover a variety of approaches to the design of fundamental classifiers, including such well‐known constructs as k‐NN (nearest neighbor), the naïve Bayesian classifier, decision trees, random forests, and support vector machines (SVMs). Comparative studies are supported by a suite of illustrative examples. Chapter 3 offers a detailed formulation of the problem of recognition with rejection. It delivers a number of motivating examples and elaborates on the existing studies carried out in this domain. Chapter 4 covers a suite of evaluation methods required to realize tasks of pattern recognition with a rejection option. Along with classic performance evaluation approaches, a thorough discussion is presented on the multifaceted nature of pattern recognition evaluation mechanisms. The analysis is extended by dealing with balanced and imbalanced datasets. The discussion commences with an evaluation of a standard pattern recognition problem and then progresses toward pattern recognition with rejection. We tackle the issue of how to evaluate pattern recognition with rejection when the problem is further exacerbated by the presence of imbalanced data. A wide spectrum of measures is discussed and employed in experiments, including those of a comparative nature. In Chapter 5, we present an empirical evaluation of different rejecting architectures. An empirical verification is performed using datasets of handwritten digits and symbols of printed music notation. In addition, we propose a rejecting method based on a concept of geometrical regions. This method, unlike rejecting architectures, is a stand‐alone approach to support discrimination between native and foreign patterns. We study the usage of elementary geometrical regions, especially hyperrectangles and hyperellipsoids.

Part II focuses on the fundamental concept of information granules and information granularity. Information granules give rise to the area of granular computing—a paradigm of forming, processing, and interpreting information granules. Information granularity comes hand in hand with the key notion of data quality—it helps identify, quantify, and process patterns of a certain quality. The chapters are structured so as to offer a top‐down exposure of the material. Chapter 6 brings the fundamentals of information granules, delivering the key motivating factors, elaborating on the underlying formalisms (including sets, fuzzy sets, probabilities) along with the operations and transformation mechanisms, as well as the characterization of information granules. The design of information granules is covered in Chapter 7. Chapter 8 positions clustering in a new setting, revealing its role as a mechanism of building information granules. In the same vein, it is shown that the clustering results (predominantly of a numeric nature) are significantly augmented by bringing information granularity to the description of the originally constructed numeric clusters. A question of clustering information granules is posed and translated into some algorithmic augmentations of the existing clustering methods. Further studies on data quality and its quantification and processing are contained in Chapter 9. Here we focus on data (value) imputation and imbalanced data—the two dominant manifestations in which the quality of data plays a pivotal role. In both situations, the problem is captured through information granules that lead to the quantification of the quality of data as well as enrich the ensuing classification schemes.

This book exhibits a number of essential and appealing features:

Systematic exposure of the concepts, design methodology, and detailed algorithms. In the organization of the material, we adhere to the top‐down strategy starting with the concepts and motivation and then proceeding with the detailed design materializing in specific algorithms and a slew of representative applications.

A wealth of carefully structured and organized illustrative material. This book includes a series of brief illustrative numeric experiments, detailed schemes, and more advanced problems.

Self‐containment. We aimed at the delivery of self‐contained material providing all necessary prerequisites. If required, some parts of the text are augmented with a step‐by‐step explanation of more advanced concepts supported by carefully selected illustrative material.

Given the central theme of this book, we hope that this volume would appeal to a broad audience of researchers and practitioners in the area of pattern recognition and data analytics. It can serve as a compendium of actual methods in the area and offer a sound algorithmic framework.

This book would not have been possible without the support provided by organizations and individuals.

We fully acknowledge the support provided by the National Science Centre, grant No 2012/07/B/ST6/01501, decision no. UMO‐2012/07/B/ST6/01501.

Dr Agnieszka Jastrzebska has done a meticulous job by helping in the realization of experiments and producing graphic materials. We are grateful to the team of professionals at John Wiley, Kshitija Iyer, and Grace Paulin Jeeva S for their encouragement from the outset of the project and their continuous support through its realization.

Władysław Homenda and Witold Pedrycz

PART I: FUNDAMENTALS

CHAPTER 1: PATTERN RECOGNITION: FEATURE SPACE CONSTRUCTION

In this chapter, we proceed with a more detailed discussion on the essence and concepts of pattern recognition. We focus on the initial phase of the overall scheme, which is concerned with feature formation and analysis as well as feature selection. Let us emphasize that, in general, patterns come in various forms: images, voice recordings, text in some natural language, a sequence of structured information (tuples formed according to some key), and so on. A pattern described through a collection of features can be regarded as a generic chunk of information. Features, generally speaking, are descriptors of the patterns. Naturally, the number of features, their nature, and their quality influence the quality of ensuing modeling, especially classification. In this chapter, we look at these issues in detail.

The chapter is structured as follows. First, we formulate a theoretical basis of the problems of pattern recognition. We introduce the necessary notation and concepts required in the ensuing discussion across the overall book. We formally define features and the pattern recognition process. Next, we present practical approaches to feature extraction applied to visual pattern recognition. In particular, we use symbols of printed music notation as examples of patterns. Then, we discuss some elementary feature transformations. Finally, we present various strategies developed for feature selection.

1.1 CONCEPTS

Formally, a standard pattern recognition problem is a task of splitting a set of objects (patterns) O into subsets O1, O2, …, OC composed of objects belonging to the same class, such that

(1.1) O = O1 ∪ O2 ∪ ⋯ ∪ OC and Oi ∩ Oj = ∅ for i ≠ j, i, j = 1, 2, …, C

A task that results in the formation of subsets O1, O2, …, OC is defined by a mapping called a classifier

(1.2) Ψ: O → {O1, O2, …, OC}

where {O1, O2, …, OC} is the set of classes under consideration. For the sake of simplicity, we assume that the mapping Ψ takes values from the set of class indices {1, 2, …, C}, that is, class labels, instead of classes themselves.

Pattern recognition is usually performed on some observed set of features that characterize objects, rather than on the objects directly. Therefore, we formulate and distinguish a mapping from the space of objects O into the space of features X:

(1.3) φ: O → X

The mapping φ is called a features extractor. Subsequently, we consider a mapping from the space of features into the space of classes

(1.4) ψ: X → {1, 2, …, C}

Such a mapping is called a classification algorithm or simply a classifier. It is important to notice that the term classifier is used in different contexts: classification of objects, classification of features characterizing objects, and, more precisely, classification of vectors of features from the features space. The meaning of this term can be inferred from the context. Therefore, in the ensuing considerations we will not distinguish explicitly which meaning is being used. A composition of the two mappings mentioned previously constitutes the classifier: Ψ = ψ ∘ φ. In other words, the mapping Ψ can be decomposed into the two mappings φ and ψ realized in series.
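To make the decomposition concrete, here is a minimal Python sketch (our illustration only, not the book's implementation; the two toy features and the threshold rule are hypothetical placeholders) of a classifier built as the composition Ψ = ψ ∘ φ of a feature extractor and a classification algorithm.

```python
import numpy as np

def phi(pattern: np.ndarray) -> np.ndarray:
    """Feature extractor phi: map a binary raster (rows x columns) to a feature vector.
    Two toy features: the number of black pixels and the width/height aspect ratio."""
    height, width = pattern.shape
    return np.array([pattern.sum(), width / height])

def psi(features: np.ndarray) -> int:
    """Classification algorithm psi: map a feature vector to a class index.
    A hypothetical threshold rule stands in for a trained classifier."""
    return 1 if features[0] > 100 else 2

def Psi(pattern: np.ndarray) -> int:
    """Classifier obtained as the composition Psi = psi o phi."""
    return psi(phi(pattern))

# Usage: classify a random 60 x 22 binary pattern.
rng = np.random.default_rng(1)
pattern = (rng.random((60, 22)) > 0.7).astype(int)
print(Psi(pattern))
```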

In general, a classifier Ψ is not known, that is, we do not know the class that a pattern belongs to. However, in pattern recognition problems, it is assumed that the classifier Ψ is known for some subset of the space of patterns called a learning set; namely, labels of patterns are known in a supervised learning task. A learning set L is a subset of the set of objects for which class labels are known, that is, for any pattern o from the learning set L, the value Ψ(o) is known. Building a classifier with the use of known classes of objects from a learning set, that is, a known mapping Ψ restricted to L, is the ultimate goal of pattern recognition. A premise motivating this action is that we hope that having a good enough subset of all objects will let us construct a classifier that will be able to successfully assign correct class labels to all patterns. Summarizing, we explore the pattern recognition problem as a design problem aimed at the development of a classifier regarded as the following mapping:

Ψ: O → {O1, O2, …, OC}

assuming that we are provided with L, a subset of all objects with correctly assigned class labels. Such a classifier is decomposed into a feature extractor

φ: O → X

and a (features) classifier (or, in other words, a classification algorithm)

ψ: X → {1, 2, …, C}

as illustrated in Figure 1.1.

Figure 1.1 Pattern recognition schemes: direct mapping from the space of patterns into the space of classes (a) and composition of mappings from the space of patterns into the space of features and from the space of features into the space of classes (b).

Both the feature extractor and the classification algorithm are built based on a learning set L. The classifier ψ divides the features space into so‐called decision regions,

(1.5) DXl = {x ∈ X: ψ(x) = l}, l = 1, 2, …, C

and then, of course, the features extractor splits the space of objects into classes

(1.6) Ol = φ−1(DXl), l = 1, 2, …, C

or equivalently

(1.7) Ol = {o ∈ O: ψ(φ(o)) = l}, l = 1, 2, …, C

We assume that the classification algorithm splits the space of feature values, that is, it separates the space X into pairwise disjoint subsets, which cover the entire space X:

(1.8) X = DX1 ∪ DX2 ∪ ⋯ ∪ DXC, DXk ∩ DXl = ∅ for k ≠ l

Figure 1.2 illustrates the split of the space of features and the space of objects done by the classifier ψ and the feature extractor φ.

Figure 1.2 A typical pattern recognition scheme.

Recognition of objects is usually preceded by an extraction of patterns from a given problem. For instance, dealing with a printed text document or printed sheet music, before proceeding with recognition of symbols, they should be isolated from the environment. In this scenario, a pattern is typically a single symbol (say, a letter or a musical symbol), and patterns are located on a page containing some text message or sheet music with some piece of music. Only after the extraction from the environment are patterns subjected to recognition. If we consider patterns that originate from an image, the task of patterns isolation is usually called segmentation. It is preceded by the stage of preprocessing that facilitates the process of segmentation. In other words, preprocessing aims at introducing various modifications to the source image (e.g., binarization, scaling, etc.) that could help extract patterns of higher quality. For details one may consult chapter 2 of Krig (2014), where signal preprocessing in pattern recognition is addressed. It is worth noting that not all image acquisition is carried out in a perfect environment, namely, there are a number of possible sources of noise and data of low quality (including imbalanced classes and missing data, among others). There has been a range of studies specifically directed to develop methods for image preprocessing for poor quality signals, for instance, with difficult lighting conditions (Tan and Triggs, 2010) or noise (Haris et al., 1998).

Not all pattern recognition tasks have well‐defined and clearly delineated preprocessing and symbols extraction (segmentation) stages. Automatic patterns acquisition often produces excessive, undesirable symbols and ordinary garbage. Let us refer to such patterns as foreign patterns, in contrast to native patterns of proper, recognized classes (cf. Homenda et al., 2014, 2016). In such a case a classification module, which assigns all extracted symbols to designed classes (proper classes of native symbols, labeled and present in the learning set), will produce misclassification for every undesirable symbol and for every garbage symbol. In order to improve the performance of the classification procedure, it is required to construct such classifiers that could assign native symbols to correct class labels and reject undesirable and garbage symbols.

Rejection of symbols can be formally interpreted by considering a new class O0, to which we classify all undesirable and garbage symbols. In consequence, we can distinguish a classification decision region, which separates foreign symbols from useful ones through the classifier ψ:

(1.9) DX0 = {x ∈ X: ψ(x) = 0}

This new class (decision region) is a distinct subspace of the space X,

(1.10) X = DX0 ∪ DX1 ∪ DX2 ∪ ⋯ ∪ DXC

where, of course, all former classes are pairwise disjoint. Rejecting foreign symbols implies a certain problem. Unlike objects of proper classes, foreign symbols are usually not similar to one another and do not create a consistent class. They are not well positioned in the feature space. Moreover, most often they are not available at the stage of classifier construction. Therefore, instead of distinguishing a decision region corresponding to a family of foreign objects, it is reasonable to separate areas outside of decision regions of native objects (cf. Homenda et al., 2014). Of course, in such a case, we assume that decision regions of native symbols cover only their own areas and do not exhaust the whole feature space X. An area outside of decision regions of native objects can be formally defined in the following form:

(1.11) DX0 = X \ (DX1 ∪ DX2 ∪ ⋯ ∪ DXC)

This is illustrated in Figure 1.3.

Figure 1.3 Pattern recognition with rejection.
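As a rough illustration of the idea behind (1.11), the following sketch (a toy example under our own assumptions, not the method developed later in the book) rejects a feature vector that falls outside all native decision regions, with each native region approximated by the axis-aligned bounding hyperrectangle of that class's training data.

```python
import numpy as np

def fit_regions(X: np.ndarray, y: np.ndarray) -> dict:
    """Approximate each native class's decision region by the axis-aligned
    bounding hyperrectangle of its training feature vectors."""
    return {c: (X[y == c].min(axis=0), X[y == c].max(axis=0)) for c in np.unique(y)}

def classify_with_rejection(x: np.ndarray, regions: dict) -> int:
    """Return a native class label if x lies inside some region; otherwise 0,
    i.e., the rejection class O0 covering the area outside native decision regions."""
    for label, (low, high) in regions.items():
        if np.all(x >= low) and np.all(x <= high):
            return int(label)
    return 0

# Usage on synthetic two-feature data with two native classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (50, 2)), rng.normal(5.0, 1.0, (50, 2))])
y = np.array([1] * 50 + [2] * 50)
regions = fit_regions(X, y)
print(classify_with_rejection(np.array([0.2, -0.3]), regions))   # likely a native class
print(classify_with_rejection(np.array([20.0, 20.0]), regions))  # 0: rejected as foreign
```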

1.2 FROM PATTERNS TO FEATURES

From now on we will use the term pattern for objects being recognized as well as for features describing and representing these objects. The exact meaning of this term can be inferred from the context of its usage and, if necessary, it could be explicitly indicated.

Let us distinguish two kinds of features characterizing patterns:

Numerical features—features whose values form a set of real numbers

Categorical features—features that are valued in a finite set of values

Values of categorical features can be of any type, for instance, names over some alphabet. Since sets of values of categorical features are finite, they can be enumerated, and values of categorical features can be cast on their indices. Therefore, for the sake of simplicity, categorical features are considered to be numerical ones.
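For instance, a categorical feature can be cast onto indices as in the short sketch below (a generic illustration; the feature and its values are made up).

```python
# Enumerate the finite set of categorical values and replace each value by its index.
clef_type = ["treble", "bass", "alto", "treble", "alto"]   # hypothetical categorical feature
categories = sorted(set(clef_type))                         # ['alto', 'bass', 'treble']
to_index = {value: index for index, value in enumerate(categories)}
numeric = [to_index[value] for value in clef_type]          # [2, 1, 0, 2, 0]
print(numeric)
```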

The space of features X is the Cartesian product of the individual features X1, X2, …, XM, that is, X = X1 × X2 × ⋯ × XM. Therefore, mappings φ and ψ operate on vectors (x1, x2, …, xM)T. Such vectors are values of the mapping φ and arguments of the mapping ψ. The i‐th element of the vector is denoted xi, i = 1, 2, …, M, and it describes the value of the i‐th feature. For the sake of simplicity, a vector of values of features will be simply called a vector of features or a feature vector.

Now, let us focus on patterns represented as monochrome images, say black and white images, which are in fact rectangular tables with elements called black/white pixels. In this book, we concentrate the discussion on scanned handwritten digits, handwritten letters, and symbols of printed music notation, and we rarely switch to other types of patterns. This choice is motivated by the fact that, in the context of the methods studied here, such patterns have superior illustrative properties. However, it should be clear that the studied methods are of a general nature and could be easily applied to other types of patterns.

As mentioned before, pattern recognition rarely applies to patterns directly, that is, to patterns present in their original form. In almost each case of pattern recognition, features describing patterns are processed. This observation motivates us to discuss in subsequent sections selected features of monochrome images. These features are especially relevant if we consider processing of printed or handwritten letters, digits, symbols of printed music notation, symbols of geodetic maps, and so on. It is worth stressing that different features can be acquired and applied in order to process other kinds of patterns, for example, those present in speech recognition, signal processing, and others.

In our experiments with recognition of handwritten digits, letters, and symbols of printed music notation, we used the following groups of features: numerical, vectorial, vectorial transformed to vectorial, and vectorial transformed to numerical.

Let us consider a treble clef, an example of a pattern taken from a dataset of music notation symbols. A treble clef is depicted in Figure 1.4. It is a monochrome (black and white) image. We will be referring to such a pattern in terms of a raster scan: a rectangular collection of pixels that we locate inside a bounding box of width W and height H (cf. Figure 1.4). In other words, a bounding box is the smallest rectangle part of an image enclosing a pattern. In Figure 1.4, the bounding box is identified as a frame used in order to clearly identify the smallest rectangle of pixels enclosing a pattern.

Figure 1.4 A treble clef, a symbol belonging to a data set of printed music notation, taken as an example of a pattern. The pattern is surrounded by a bounding box of width W = 22 and height H = 60 pixels. The bounding box is not part of the pattern; it has been added only for illustrative purposes.

Specifically, a raster scan pattern is represented as a mapping:

(1.12) I: {1, 2, …, W} × {1, 2, …, H} → {0, 1}

where the value 1 corresponds to a black pixel and 0 to a white one.
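In code, such a pattern is conveniently stored as a binary array; the snippet below shows one possible representation consistent with the mapping above (the row/column orientation and the convention that 1 denotes a black pixel are our assumptions).

```python
import numpy as np

W, H = 22, 60  # bounding box of the treble clef example
# Rows index the height, columns the width; 1 denotes a black pixel, 0 a white one.
pattern = np.zeros((H, W), dtype=np.uint8)
pattern[5:55, 10:12] = 1  # a crude vertical stroke, just to have some black pixels

def I(i: int, j: int) -> int:
    """Value of the raster-scan mapping at column i, row j (1-based indices)."""
    return int(pattern[j - 1, i - 1])

print(I(11, 30))  # 1: the queried pixel lies inside the stroke
```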

1.2.1 Vectorial Features

Only a very limited number of numerical features, effectively employed in pattern recognition problems, can be derived directly from patterns. These features are discussed later in this chapter. However, many numerical features are derived indirectly from vectorial ones.

Vectorial features are usually created based on a bounding box of a given pattern (cf. Figures 1.5 and 1.6). Now, let us discuss the most prominent examples of vectorial features of monochrome images: projections, margins, and transitions.

Horizontal and vertical projections:

Horizontal projection is a vector of numbers of black pixels in rows.

Vertical projection is a vector of numbers of black pixels in columns.

Therefore, horizontal projection is a vector (of numbers) of length equal to the height of the bounding box (H), while vertical projection is a vector (of numbers) of length equal to the width of the bounding box (W):

(1.13)
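A minimal sketch of both projections for a binary array in the orientation assumed above (the function names are ours):

```python
import numpy as np

def horizontal_projection(pattern: np.ndarray) -> np.ndarray:
    """Number of black pixels in each row; the vector length equals the height H."""
    return pattern.sum(axis=1)

def vertical_projection(pattern: np.ndarray) -> np.ndarray:
    """Number of black pixels in each column; the vector length equals the width W."""
    return pattern.sum(axis=0)

pattern = np.array([[0, 1, 1, 0],
                    [1, 1, 1, 1],
                    [0, 0, 1, 0]])
print(horizontal_projection(pattern))  # [2 4 1]
print(vertical_projection(pattern))    # [1 2 3 1]
```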

Left, right, bottom, and top margins:

The left margin is the vector of indices of the last white pixels preceding the first black ones; pixel indexing starts from 1 at the left side of each row. The value is zero if a row begins with a black pixel; it is W (the width of the bounding box) if there is no black pixel in the row.

The right margin is the vector of indices of the last black pixels, with pixel indexing starting from 1 at the left side of each row; the value is 0 if there is no black pixel in the row.

The bottom and top margins are defined like the left and right margins; indexing starts from 1 and goes from bottom to top in each column.

Hence, left and right margins are vectors (of numbers) of length equal to the height of the bounding box, while bottom and top margins are vectors (of numbers) of length equal to its width; the detailed formulas for margins are as follows:

(1.14)
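Following the verbal definitions above, the left and right margins can be computed as in the sketch below (our naming and the 1 = black convention; the bottom and top margins are obtained analogously by scanning columns from bottom to top).

```python
import numpy as np

def left_margin(pattern: np.ndarray) -> np.ndarray:
    """For each row: index of the last white pixel preceding the first black one
    (0 if the row starts with a black pixel, W if the row contains no black pixel)."""
    H, W = pattern.shape
    result = np.full(H, W)
    for j in range(H):
        blacks = np.flatnonzero(pattern[j])
        if blacks.size > 0:
            result[j] = blacks[0]  # count of white pixels before the first black one
    return result

def right_margin(pattern: np.ndarray) -> np.ndarray:
    """For each row: 1-based index of the last black pixel (0 if there is none)."""
    H, W = pattern.shape
    result = np.zeros(H, dtype=int)
    for j in range(H):
        blacks = np.flatnonzero(pattern[j])
        if blacks.size > 0:
            result[j] = blacks[-1] + 1
    return result

pattern = np.array([[0, 1, 1, 0],
                    [1, 1, 1, 1],
                    [0, 0, 0, 0]])
print(left_margin(pattern))   # [1 0 4]
print(right_margin(pattern))  # [3 4 0]
```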

Horizontal and vertical transitions:

The horizontal transition is, for each row, the number of pairs of two consecutive white and black pixels.

The vertical transition is the number of such pairs in each column.

As in the case of projections, transitions are vectors of numbers of the respective lengths.

(1.15)
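The transition counts can be sketched as follows (again our naming; a white-to-black pair is counted once per occurrence in a row or column):

```python
import numpy as np

def horizontal_transitions(pattern: np.ndarray) -> np.ndarray:
    """For each row: the number of consecutive (white, black) pixel pairs."""
    return np.sum((pattern[:, :-1] == 0) & (pattern[:, 1:] == 1), axis=1)

def vertical_transitions(pattern: np.ndarray) -> np.ndarray:
    """For each column: the number of consecutive (white, black) pixel pairs."""
    return np.sum((pattern[:-1, :] == 0) & (pattern[1:, :] == 1), axis=0)

pattern = np.array([[0, 1, 0, 1],
                    [1, 1, 0, 0],
                    [0, 0, 1, 1]])
print(horizontal_transitions(pattern))  # [2 0 1]
print(vertical_transitions(pattern))    # [1 0 1 1]
```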

Figure 1.5 Vectorial features: (a) original pattern, (b) horizontal projection, (c) vertical projection, (d) left margin, (e) right margin, (f) bottom margin, (g) top margin, (h) horizontal transition, and (i) vertical transition. Please note that the transition values are very small, so in order to enhance visibility, we multiplied them by 4.

Figure 1.6