A comprehensive introduction to the theory and practice of contemporary data science analysis for railway track engineering
Featuring a practical introduction to state-of-the-art data analysis for railway track engineering, Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering addresses common issues with the implementation of big data applications while exploring the limitations, advantages, and disadvantages of more conventional methods. In addition, the book provides a unifying approach to analyzing large volumes of data in railway track engineering using an array of proven methods and software technologies. Dr. Attoh-Okine considers some of today's most notable applications and implementations and highlights when a particular method or algorithm is most appropriate. Throughout, the book presents numerous real-world examples to illustrate the latest big data applications of predictive analytics in railway engineering, such as the Union Pacific Railroad's use of big data to reduce train derailments, increase the velocity of shipments, and reduce emissions.
In addition to providing an overview of the latest software tools used to analyze the large amounts of data obtained by railways, Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering:
* Features a unified framework for handling large volumes of data in railway track engineering using predictive analytics, machine learning, and data mining
* Explores issues of big data and differential privacy and discusses the various advantages and disadvantages of more conventional data analysis techniques
* Implements big data applications while addressing common issues in railway track maintenance
* Explores the advantages and pitfalls of data analysis software such as R and Spark, as well as the Apache(TM) Hadoop® framework and its MapReduce programming model
Big Data and Differential Privacy is a valuable resource for researchers and professionals in transportation science, railway track engineering, design engineering, operations research, and railway planning and management. The book is also appropriate for graduate courses on data analysis and data mining, transportation science, operations research, and infrastructure management.
NII ATTOH-OKINE, PhD, PE is Professor in the Department of Civil and Environmental Engineering at the University of Delaware. The author of over 70 journal articles, his main areas of research include big data and data science; computational intelligence; graphical models and belief functions; civil infrastructure systems; image and signal processing; resilience engineering; and railway track analysis. Dr. Attoh-Okine has edited five books in the areas of computational intelligence and infrastructure systems and has served as an Associate Editor of various ASCE and IEEE journals.
Number of pages: 308
Year of publication: 2017
Cover
Title Page
Copyright
Preface
Acknowledgments
Chapter 1: Introduction
1.1 General
1.2 Track Components
1.3 Characteristics of Railway Track Data
1.4 Railway Track Engineering Problems
1.5 Wheel–Rail Interface Data
1.6 Geometry Data
1.7 Track Geometry Degradation Models
1.8 Rail Defect Data
1.9 Inspection and Detection Systems
1.10 Rail Grinding
1.11 Traditional Data Analysis Techniques
1.12 Remarks
References
Chapter 2: Data Analysis – Basic Overview
2.1 Introduction
2.2 Exploratory Data Analysis (EDA)
2.3 Symbolic Data Analysis
2.4 Imputation
2.5 Bayesian Methods and Big Data Analysis
2.6 Remarks
References
Chapter 3: Machine Learning: A Basic Overview
3.1 Introduction
3.2 Supervised Learning
3.3 Unsupervised Learning
3.4 Semi-Supervised Learning
3.5 Reinforcement Learning
3.6 Data Integration
3.7 Data Science Ontology
3.8 Imbalanced Classification
3.9 Model Validation
3.10 Ensemble Methods
3.11 Big and Small ()
3.12 Deep Learning
3.13 Data Stream Processing
3.14 Remarks
References
Chapter 4: Basic Foundations of Big Data
4.1 Introduction
4.2 Query
4.3 Taxonomy of Big Data Analytics in Railway Track Engineering
4.4 Data Engineering
4.5 Remarks
References
Chapter 5: Hilbert–Huang Transform, Profile, Signal, and Image Analysis
5.1 Hilbert–Huang Transform
5.2 Axle Box Acceleration
5.3 Analysis
5.4 Remarks
References
Chapter 6: Tensors – Big Data in Multidimensional Settings
6.1 Introduction
6.2 Notations and Definitions
6.3 Tensor Decomposition Models
6.4 Application
6.5 Remarks
References
Chapter 7: Copula Models
7.1 Introduction
7.2 Pair Copula: Vines
7.3 Computational Example
7.4 Remarks
References
Chapter 8: Topological Data Analysis
8.1 Introduction
8.2 Basic Ideas
8.3 A Simple Railway Track Engineering Application
8.4 Remarks
References
Chapter 9: Bayesian Analysis
9.1 Introduction
9.2 Markov Chain Monte Carlo (MCMC)
9.3 Approximate Bayesian Computation
9.4 Markov Chain Monte Carlo Application
9.5 ABC Application
9.6 Remarks
References
Chapter 10: Basic Bayesian Nonparametrics
10.1 General
10.2 Dirichlet Family
10.3 Dirichlet Process
10.4 Finite Mixture Modeling
10.5 Bayesian Nonparametric Railway Track
10.6 Remarks
References
Chapter 11: Basic Metaheuristics
11.1 Introduction
11.2 Remarks
References
Chapter 12: Differential Privacy
12.1 General
12.2 Differential Privacy
12.3 Remarks
References
Index
End User License Agreement
List of Illustrations
Chapter 1: Introduction
Figure 1.1 Track structure components
Figure 1.2 Classification of random data
Figure 1.3 Classification of deterministic data
Figure 1.4 Engineering signals
Figure 1.5 Wheel–rail contact impacts
Figure 1.6 Wheel–rail interface
Figure 1.7 Regions of wheel/rail contact
Figure 1.8 Different types of switches and crossings
Figure 1.9 Schematized standard turnout and its components
Figure 1.10 Classification of track geometry models based on parameters' uncertainty
Figure 1.11 Linear representation of track geometry degradation and restoration based on the standard deviation of roughness
Figure 1.12 Nonlinear representation of track geometry degradation and restoration based on the standard deviation of roughness
Figure 1.13 Rail defects distribution
Figure 1.14 Cross-section of a rail
Figure 1.15 Transverse, vertical, and horizontal planes of track
Figure 1.16 Surface regions of rail head
Figure 1.17 Repository of rail head surface classes: normal or noncritical surface
Figure 1.18 Rail defects per mile
Figure 1.19 Rolling contact fatigue. Courtesy: Johannes Bremsteller
Figure 1.20 Different rail maintenance strategies. Courtesy: Johannes Bremsteller
Chapter 2: Data Analysis – Basic Overview
Figure 2.1 Box plot for some track geometry parameters
Figure 2.2 Histogram for some track geometry parameters
Figure 2.3 Q–Q plot for some track geometry parameters
Figure 2.4 Time series data table – track surface inspection
Figure 2.5 Illustration of multivariate scatter plots for different inspection times
Chapter 3: Machine Learning: A Basic Overview
Figure 3.1 Overview of different classifier categories (Camps-Valls and Bruzzone (2009). Reproduced with the permission of John Wiley and Sons)
Figure 3.2 Illustration of data science ontology
Figure 3.3 Transformation of original data to feature space
Figure 3.4 Two hyperplanes (Fu et al. (2014). Reproduced with the permission of Springer)
Figure 3.5 Training and testing approach
Figure 3.6 Training, testing, and validation
Figure 3.7 Receiver operating characteristic (ROC) curve
Figure 3.8 Illustration of bagging procedure
Figure 3.9 Big and small
Figure 3.10 Bias/variance decomposition
Figure 3.11 Bias/variance – method selection
Figure 3.12 A graphical representation of a regional split applied to a univariate scatterplot (Sekulic and Kowalski (1992). Reproduced with the permission of John Wiley and Sons)
Figure 3.13 (a) Examples for surface defects and (b) non-defective samples (Soukup and Huber-Mörk, 2014). Reproduced with the permission of Springer
Figure 3.14 CNN architecture for surface defect detection: two convolutional and pooling layers and a final fully connected layer (Soukup and Huber-Mörk, 2014). Reproduced with the permission of Springer
Figure 3.15 RBM structure
Figure 3.16 Generic DBN
Figure 3.17 DBN layer-wise training process. “” is the input vector, while “” are DBN hidden layers. In each training iteration, one DBN layer is considered as a hidden RBM layer. DBN arrows indicate the direction of the generative model
Figure 3.18 Deep learning CNN model architecture
Figure 3.19 Representation of clustering (Galvan-Nunez and Attoh-Okine, 2016). Reproduced with the permission of American Society of Civil Engineers
Figure 3.20 Clustering process (Galvan-Nunez and Attoh-Okine, 2016). Reproduced with the permission of American Society of Civil Engineers
Figure 3.21 Example of a hash table (El-Metwally et al., 2014). Reproduced with the permission of Springer
Figure 3.22 Example of a Bloom filter (El-Metwally et al., 2014). Reproduced with the permission of Springer
Figure 3.23 Count–min sketch idea
Figure 3.24 IWS sample signal
Figure 3.25 IWS application
Chapter 4: Basic Foundations of Big Data
Figure 4.1 The big data analysis pipeline (Jagadish, 2015). Reproduced with the permission of Elsevier
Figure 4.2 Railway big data
Figure 4.3 Big data environment
Figure 4.4 Landscape
Figure 4.5 Taxonomy of data model
Figure 4.6 Big data versus traditional data
Figure 4.7 Big data taxonomy
Figure 4.8 Five Vs of big data
Figure 4.9 Data size (Adarkwa, 2015). Reproduced with the permission of University of Delaware
Figure 4.10 MapReduce architecture (Attoh-Okine, 2016). Reproduced with the permission of Cambridge University Press
Figure 4.11 Pseudocode of the k-means algorithm
Figure 4.12 Pseudocode of the MapReduce-based k-means algorithm
Figure 4.13 Apache Spark
Chapter 5: Hilbert–Huang Transform, Profile, Signal, and Image Analysis
Figure 5.1 Illustration of the HHT
Figure 5.2 Sifting process
Figure 5.3 Part of synthetic data
Figure 5.4 Signal and IMF components
Figure 5.5 Plot of instantaneous wave number against distance for highest wave number component IMFs
Figure 5.6 Wavelet transform of synthetic data
Figure 5.7 Analysis of cross-level
Figure 5.8 Comparative analysis of cross-level at different months (June and July)
Figure 5.9 Analysis of surface (right)
Figure 5.10 Comparative analysis of surface (right)
Figure 5.11 Analysis of alignment (right)
Figure 5.12 Comparative analysis of alignment (right)
Figure 5.13 Post-processing ensemble empirical mode decomposition. Courtesy: Ding and Lin, 2010
Figure 5.14 Using BEMD to remove shadows
Figure 5.15 Subtraction of images
Figure 5.16 Preprocessing of track images
Figure 5.17 Preprocessing of track images
Figure 5.18 Schematic view of the axle box acceleration measuring and diagnosis system (Oregui et al., 2016). Reproduced with the permission of John Wiley and Sons
Chapter 6: Tensors – Big Data in Multidimensional Settings
Figure 6.1 3D tensor fibers
Figure 6.2 3D tensor slices
Figure 6.3 Cross-level measurement at different dates
Figure 6.4 Track geometry parameters measurements
Figure 6.5 Data structure for the cross-level
Figure 6.6 Loading plot for cross-level
Figure 6.7 Correlation analysis (matrix)
Figure 6.8 Loading plot for distance points
Figure 6.9 Data structure for cross-level, surface (right) and alignment on same date
Figure 6.10 Correlation matrix
Figure 6.11 Loading plots
Figure 6.12 Loading plots for points
Chapter 7: Copula Models
Figure 7.1 Pairs plot of the track geometry data set with scatterplots above and contour plots with standard normal margins below the diagonal
Figure 7.2 (a) K-plot. (b) Chi-plot. (c) Empirical lambda function (black line), theoretical lambda function of a Student's t copula (gray line), as well as independence and comonotonicity limits (dashed lines)
Figure 7.3 Four-dimensional -vine, where Student's t copula, Frank copula, Normal/Gaussian copula, and independent copula with corresponding empirical tau values shown on the links with the copula family
Figure 7.4 Four-dimensional -vine, where Normal/Gaussian copula, Student's t copula, Frank copula, and independent copula with corresponding empirical tau values shown on the links with the copula family
Chapter 8: Topological Data Analysis
Figure 8.1 Illustration of simplex
Figure 8.2 Simplicial complex
Figure 8.3 Betti numbers
Figure 8.4 Filtration
Figure 8.5 Persistence diagram
Figure 8.6 Schematic representation of TDA
Figure 8.7 Application of TDA
Chapter 9: Bayesian Analysis
Figure 9.1 ABC steps
Figure 9.2
Figure 9.3 ABC steps cont'd
Figure 9.4 ABC step 1
Figure 9.5 ABC step 2
Figure 9.6 ABC step 3
Figure 9.7 ABC step 4
Figure 9.8 ABC step 5
Figure 9.9 Trace. (a) Intercept, (b) degradation rate, (c) white noise
Figure 9.10 Kernel density. (a) Intercept, (b) degradation rate, (c) white noise
Figure 9.11 Autocorrelation plot. (a) Intercept, (b) degradation rate, (c) white noise
Figure 9.12 Example of ABC simulations (histograms)
Chapter 10: Basic Bayesian Nonparametrics
Figure 10.1 Stick-breaking process
Figure 10.2 Chinese restaurant process
Figure 10.3 Chinese restaurant process continued
Chapter 11: Basic Metaheuristics
Figure 11.1 Relationship between data science with evolutionary algorithms and swarm intelligence (Cheng et al., 2016) Reproduced with the permission of BioMed Central Ltd
Chapter 12: Differential Privacy
Figure 12.1 Sensitivity function
Figure 12.2 General structure of differential privacy
Figure 12.3 An example of DP in rail tank safety
List of Tables
Chapter 1: Introduction
Table 1.1 Taxonomy of big data in railway engineering
Table 1.2 Engineering problems
Table 1.3 Track inspection technologies.
Table 1.4 Vertical track forces
Table 1.5 Indicators for each type of defect according to EN 13848-5 (Teixeira and Andrade, 2014) Reproduced with the permission of Springer
Table 1.6 Summary of literature review
Table 1.7 Rail defects classification
Table 1.8 Transverse defects
Table 1.9 Longitudinal defects
Table 1.10 Web defects
Table 1.11 Base defects
Table 1.12
Table 1.13 Surface defectsWheel burns
Table 1.14 NDT techniques for the rail industry.
Table 1.15 Automated visual railway component inspection methods
Table 1.16 Big data versus traditional data
Chapter 2: Data Analysis – Basic Overview
Table 2.1 Examples of symbolic variables
Chapter 3: Machine Learning: A Basic Overview
Table 3.1 Confusion matrix
Table 3.2 Sample of deep learning application in railway track engineering
Table 3.3 Dominant rail group structural distresses
Table 3.4 Dominant sleeper, fastening, and ballast structural distresses
Table 3.5 Definition of track classes
Table 3.6 Maintenance and repair strategies
Table 3.7 Decision table
Table 3.8 Consistent table
Table 3.9 Reduced set of independent variables of decision rules
Table 3.10 Equations to determine numerical parameters of the LogLog Counter
Table 3.11 Dependency between the sketch size and accuracy
Table 3.12 Streaming techniques
Table 3.13 Application of machine learning techniques.
Chapter 4: Basic Foundations of Big Data
Table 4.1 System for large data applications
Table 4.2 Comparison between big data and traditional data
Table 4.3 Traditional data warehousing versus big data issues
Table 4.4 Comparison between stream processing and batch processing
Table 4.5 Taxonomy of big data methods in railway track engineering
Table 4.6 Key definitions of railway track engineering
Table 4.7 Data definition
Table 4.8 Railway track engineering and big data
Chapter 5: Hilbert–Huang Transform, Profile, Signal, and Image Analysis
Table 5.1 Classification of track vertical defects upon their wavelengths
Table 5.2 Comparison between Fourier, wavelet, and HHT
Table 5.3 Some applications of HHT in railway track engineering analysis
Chapter 7: Copula Models
Table 7.1 Archimedean copulas
Table 7.2 Kendall's tau and Spearman's rho values
Table 7.3 Correlation matrix based on Kendall's tau
Table 7.4 The empirical Kendall's tau matrix and the sum over the absolute entries of each row for the track geometry data set
Table 7.5 The empirical Kendall's tau matrix and the sum over the absolute entries of each row for the derailment data set given alignment (right) () as first root
Table 7.6 Properties of pair-copula families considered
Table 7.7 Log-likelihood, number of parameters, AIC, and BIC for -vine and -vine copula models using maximum likelihood estimation (MLE) or sequential estimates
Chapter 8: Topological Data Analysis
Table 8.1 Equivalent definitions of the cycle and boundary groups
Chapter 9: Bayesian Analysis
Table 9.1 Conjugate prior distributions
Table 9.2 Cross-level data
Table 9.3 Selected case studies of Bayesian analysis in railway track engineering
Chapter 11: Basic Metaheuristics
Table 11.1 Selected examples
Wiley Series in Operations Research and Management Science
A complete list of the titles in this series appears at the end of this volume.
Nii O. Attoh-Okine
This edition first published 2017
© 2017 John Wiley & Sons, Inc.
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Nii O. Attoh-Okine to be identified as the author of this work has been asserted in accordance with law.
Registered Offices
John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office
111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on-going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.
Library of Congress Cataloguing-in-Publication Data
Names: Attoh-Okine, Nii O., author.
Title: Big data and differential privacy : analysis strategies for railway track engineering / Nii O. Attoh-Okine.
Other titles: Wiley series in operations research and management science.
Description: Hoboken, NJ : John Wiley & Sons, 2017. | Series: Wiley series in operations research and management science | Includes bibliographical references and index.
Identifiers: LCCN 2017005398 (print) | LCCN 2017010092 (ebook) | ISBN 9781119229049 (cloth) | ISBN 9781119229056 (pdf) | ISBN 9781119229063 (epub)
Subjects: LCSH: Railroad tracks--Mathematical models. | Data protection--Mathematics. | Big data. | Differential equations.
Classification: LCC TF241 .A88 2017 (print) | LCC TF241 (ebook) | DDC 625.1/4028557-dc23
LC record available at https://lccn.loc.gov/2017005398
Cover design: Wiley
Cover image: (Top Image) © Jaap Hart/iStockphoto; (Bottom Image) © mbbirdy/Gettyimages
Preface
The ability of railway track engineers to handle and process large and continuous streams of data will provide a considerable opportunity for railway agencies. It will help decision makers make informed decisions about the maintenance, reliability, and safety of railway tracks. We are now entering a period in which the challenge is to collect railway track data and analyze it within a defined period of time. Therefore, the tools and methods needed to achieve this analysis need to be addressed. Knowledge derived from big data analytics in railway track engineering will become one of the foundational elements of any railway organization and agency. Another key issue has been the protection of data by different railway organizations: although the data are available, they are rarely shared among different agencies. This makes differential privacy an issue of utmost importance in the railway industry. It is also not clear whether the industry has developed a clear way of both protecting the data and making them accessible to third parties.
Data science is an emerging field that has all the characteristics needed by railway track engineers to address and handle the enormous amounts of data generated by the various technology platforms currently in place. The major objective is for railway track engineers to gain an understanding of big data. With the right tools and methodologies, railway track big data will also uncover new directions for monitoring and collecting railway track data; beyond the engineering side, this will also have a major business impact on railway agencies.
This book provides the fundamental concepts railway engineers need to work with big data applications. The concepts serve as a foundation, and it is assumed that the reader has some understanding of railway engineering. The book does not attempt to address railway track engineering as a subject; rather, it addresses the use of data science and the big data paradigm in railway track applications. Colleagues in industry will find the book very handy, and it will also offer a new direction for graduate students interested in data science and the big data paradigm in infrastructure systems. The work in this book is intended to be accessible to an audience broader than those in railway track engineering.
Furthermore, I hope to shed a bright light on the enormous potential and future development that the big data paradigm will bring to railway track engineering. The amount of data railway agencies already have and the amount they are planning to collect in the future make this book an important milestone. This book attempts to bring together new and emerging topics in a coherent way, covering methodologies that can be used to solve a variety of railway track problems involving large data from various inspection technologies. In preparing the book, I tried to achieve the following objectives: (a) to develop some data science ontologies, (b) to provide the formulation of large railway track data using big data analytics, (c) to provide direction on how to present the data (visualization of the results), (d) to provide practical applications for the railway and infrastructure industry, and (e) to provide a new direction in railway track data analysis.
Finally, I assume full responsibility for any errors in the book. The opinions presented in the book represent my experiences in civil infrastructure systems, machine learning, signal analysis, and probability analysis.
January 2016
Nii O. Attoh-Okine
Newark, Delaware, USA
Acknowledgments
I would like to thank the staff of John Wiley & Sons, Inc., especially Susanne Steitz-Filler, for their time. I would also like to thank Dr. Allan Zarembski and Joe Palese, as well as Hugh Thompson of the FRA, for their support and encouragement. Thanks also to my current and former graduate students Dr. Yaw Adu-Gyamfi, Dr. Offei Adarkwa, and Emmanuel Martey for offering constructive criticism. Special thanks to Silvia Galvan-Nunez, who additionally provided me support with complex LaTeX issues. I would also like to thank Erin Huston for editing the first draft of the book. Finally, as always, I would like to thank my family: my two children, Nii Attoh and Naa Djama; my wife, Rebecca, for providing peace and an excellent working environment; and my brother, Ashalley Attoh-Okine, an excellent actuary and energy expert, who introduced me to many data analysis techniques that have been part of my research over the years. I dedicate the book to the memory of my parents, Madam Charkor Quaynor and Richard Ayi Attoh-Okine, and my maternal grandparents, Madam Botor Clottey and Robert Quaynor.
Chapter 1: Introduction
1.1 General
Currently, railroads collect enormous quantities of data through vehicle-based inspection cars, trackside (or wayside) monitoring systems, hand-held gauges, and visual inspections. In addition, these data are located geographically using the global positioning system (GPS). The data from these inspection systems are collected by hand or electronically through various sensors, video inspection, machine vision, and many other sources. Furthermore, the data are growing in both quantity and quality and are becoming more precise and diverse. Data sets of extremely large size are difficult to analyze using traditional approaches, since they may exceed the limits of a typical spreadsheet. Railway track data come in diverse forms, including categorical, numerical, and continuous values. The general characteristics of the data dictate which type of method is appropriate for analysis. For example, categorical and nominal values are unordered, while numerical and continuous values are assumed to be ordered or to represent ordinal data (Ramírez-Gallego et al., 2016).
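To make the spreadsheet limit concrete, the short sketch below estimates how many rows a single inspection run would produce. The 1,000 km of inspected track and the 0.25 m sampling spacing are hypothetical values chosen for illustration, not figures from any particular railroad; the only fixed number is the 1,048,576-row limit of an Excel worksheet (Excel 2007 and later). The function names are likewise illustrative.

```python
import math

EXCEL_MAX_ROWS = 1_048_576  # worksheet row limit in Excel 2007 and later


def positions_per_run(track_km: float, spacing_m: float) -> int:
    """Number of measurement positions (one spreadsheet row each) in one run.

    track_km  -- length of track covered by the run, in kilometres
    spacing_m -- assumed distance between successive samples, in metres
    (A header row is ignored.)
    """
    return round(track_km * 1000 / spacing_m)


def worksheets_needed(rows: int) -> int:
    """Number of full-size worksheets required to hold `rows` rows."""
    return math.ceil(rows / EXCEL_MAX_ROWS)


if __name__ == "__main__":
    # Hypothetical run: 1,000 km of track sampled every 0.25 m.
    rows = positions_per_run(track_km=1000, spacing_m=0.25)
    print(rows, worksheets_needed(rows))
```

Under these assumptions one run yields 4,000,000 positions, about four worksheets' worth, before any additional channels or repeat inspections are counted. Each extra channel or inspection date multiplies that figure.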
The development of advanced sensors and information technology in railway infrastructure monitoring and control has provided a platform for the rapid growth of data. This has created a new paradigm in the processing, storing, streaming, and visualization of data and information. Furthermore, technological advances make it possible to install sensors and smart chips in critical infrastructure to measure system performance, current condition, and other indicators of imminent failure. Many railway infrastructure components have communication capabilities that allow data to be uploaded on demand.
Big data refers to extremely large volumes of data originating from various sources: databases, audio and video, millions of sensors, and other systems. In some cases these sources provide structured outputs, but most outputs are unstructured, semi-structured, or poly-structured. Some of these data stream in at high velocity and must be processed at, or close to, the speed at which they are generated.
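One common way to keep up with a stream is to update summary statistics one reading at a time instead of storing the whole series. The sketch below uses Welford's online algorithm to maintain a running mean and sample standard deviation. The function name and the idea of feeding it single-channel sensor readings are illustrative assumptions, not a method prescribed in this chapter.

```python
import math
from typing import Iterable, Iterator, Tuple


def running_stats(readings: Iterable[float]) -> Iterator[Tuple[int, float, float]]:
    """Yield (count, mean, sample standard deviation) after each new reading.

    Uses Welford's online update, so memory use stays constant no matter
    how long the stream runs.
    """
    n = 0
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the current mean
    for x in readings:
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)  # combines the old and the updated mean
        std = math.sqrt(m2 / (n - 1)) if n > 1 else 0.0
        yield n, mean, std


if __name__ == "__main__":
    # Hypothetical readings arriving one at a time from a wayside sensor.
    stream = iter([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    for count, mean, std in running_stats(stream):
        print(f"n={count} mean={mean:.3f} std={std:.3f}")
```

Because the generator consumes any iterable lazily, the same function works on a finite list, a file read line by line, or an unbounded sensor feed.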
This chapter presents a general overview, a basic description, and the properties of the deterministic and random data encountered in railway track engineering. It relies heavily on the data output made possible by advances in sensors and information technology, which have led to extremely massive data sets. These large data sets have made the traditional analytical techniques used for railway track maintenance and safety issues somewhat obsolete.
The data obtained in railway track monitoring are collected by different sensors, at different times and under different environmental conditions, at different frequencies, and at different resolutions. The resulting data have different characteristics: discrete or continuous, spatial or temporal, signals and images, and categorical and objective, among others. All these characteristics and properties, together with the extreme volume of data collected, have made traditional analytical techniques very inefficient; issues like visualization and data streaming, which are critical in railway track maintenance and safety, are not adequately addressed. Traditional statistical techniques fail to scale up to the extremely large volumes of data collected by railway inspection vehicles and trackside monitoring devices. As a result, the growing amount of data generated by railway track inspection activities is outpacing the current capacity to explore and interpret these data and hence to address maintenance and safety issues appropriately.
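Because channels are recorded at different resolutions, a typical first step before comparing them is to bring them onto a common distance grid. The sketch below does this with plain linear interpolation. The function name, the 1 m and 0.5 m spacings, and the choice to hold end values constant outside the measured range are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def resample(positions: Sequence[float], values: Sequence[float],
             grid: Sequence[float]) -> List[float]:
    """Linearly interpolate (positions, values) onto the points in `grid`.

    `positions` must be strictly increasing (e.g. chainage in metres).
    Grid points outside the measured range take the nearest end value.
    """
    if len(positions) != len(values) or len(positions) < 2:
        raise ValueError("need at least two matching positions and values")
    out = []
    for g in grid:
        if g <= positions[0]:
            out.append(values[0])
        elif g >= positions[-1]:
            out.append(values[-1])
        else:
            i = bisect_right(positions, g)  # positions[i-1] <= g < positions[i]
            x0, x1 = positions[i - 1], positions[i]
            y0, y1 = values[i - 1], values[i]
            out.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return out


if __name__ == "__main__":
    # Hypothetical channel sampled every 1 m, resampled to a 0.5 m grid.
    pos = [0.0, 1.0, 2.0, 3.0]
    val = [0.0, 10.0, 20.0, 30.0]
    grid = [i * 0.5 for i in range(7)]  # 0.0, 0.5, ..., 3.0
    print(resample(pos, val, grid))
```

Linear interpolation is only one option. For signals with content near the sampling limit, a filtered (anti-aliased) resampling would usually be preferred.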
1.2 Track Components
The term “tracks” includes superstructure, substructure, and special structures (Figure 1.1). The superstructure is made of rails, ties, fasteners, turnouts, and crossings, while the substructure consists of ballast, subballast, the subgrade, and other drainage facilities. The superstructure and substructure are separated by the tie–ballast interface.
Figure 1.1 Track structure components
The main purpose of the railway track structure is to provide a safe and economical train transportation system by guiding the vehicle and transmitting loads through the track components to the subgrade. The carrying capacity and long-term durability of the track structure depend heavily on how the superstructure and substructure respond to and interact with each other when subjected to moving trains and environmental factors (Selig and Waters, 1994; Kerr, 2003).
The functions of the different track components have been presented by various authors, such as Hay (1982), Selig and Waters (1994), Esveld (2001), Kerr (2003), Sadeghi (2010), and Tzanakakis (2013). The aim of this section is to summarize these functions. The rails are the longitudinal steel members that are placed on spaced ties to guide the train wheels evenly and continuously. Their strength and stiffness must be sufficient to maintain a steady shape and smooth track configuration and to resist the various forces (vertical, lateral, and longitudinal) exerted by vehicles. In some cases, the rails also serve as electrical conductors for the signal circuit and as a ground line for the electric locomotive power circuit. The profiles (transverse and longitudinal) of the rail surface and the wheel surface have a major influence on the operation of vehicles on the track, and track defects may in some instances cause large dynamic loads that lead to derailment and safety issues, as well as accelerated degradation.
Most steel rail sections are connected either by bolted joints or by welding. Bolted joints create several problems, including rough-riding track, undesirable vibration, and additional impact loads; hence, continuous welded rail (CWR) has been the better solution. CWR addresses some of the disadvantages of bolted joints, although it has its own set of maintenance requirements.
The rail fastening system, or fastenings, includes all the components that connect the rail to the tie: the tie plate, spike, and anchor for wood ties, and the clip, insulator, and elastic fastener for concrete ties. The function of the fastenings is to retain the rail against the ties and to resist vertical, lateral, longitudinal, and overturning movements of the rail. They also attenuate wheel load impacts, increase track elasticity, and provide electrical isolation between the rails.
For concrete tie tracks, rail pads are installed at the rail seats to transfer the stresses and dynamic forces from the rail to the ties while attenuating them, which reduces the interaction force between the rail and the ties (Choi, 2014). The pads also provide adequate resistance to longitudinal and rotational movement of the rail and act as a conforming layer between rail and tie that prevents areas of high contact pressure. From a dynamic point of view, the rail pads influence the overall track stiffness.
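A rough way to see why pad stiffness matters is to treat the layers under one rail seat as linear springs acting in series, so that 1/k_eq = 1/k_1 + 1/k_2 + ... . This series-spring idealization is an assumption added here for illustration, not a model given in this section, and the stiffness values below are hypothetical rather than measured.

```python
def series_stiffness(*stiffnesses: float) -> float:
    """Equivalent stiffness of linear springs acting in series.

    Idealized sketch (an assumption): each layer under a rail seat, such as
    the rail pad and the tie/ballast support, is one linear spring, and
    1/k_eq = sum(1/k_i).
    """
    if not stiffnesses or any(k <= 0 for k in stiffnesses):
        raise ValueError("stiffnesses must be a non-empty set of positive values")
    return 1.0 / sum(1.0 / k for k in stiffnesses)


# Hypothetical stiffness values in kN/mm, chosen only for illustration.
support = 100.0
stiff_pad = series_stiffness(500.0, support)
soft_pad = series_stiffness(100.0, support)
print(f"stiff pad: {stiff_pad:.1f} kN/mm, soft pad: {soft_pad:.1f} kN/mm")
```

With these made-up numbers, replacing the stiffer pad with a softer one lowers the equivalent stiffness from about 83 to 50 kN/mm, which is the sense in which the pads shape overall track stiffness in this simplified picture.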
Ties are transverse beams that rest on the ballast, support the two rails, and tie them together. The main functions of ties are as follows:
Uniformly transfer and distribute loads from the rail to the ballast
Hold the fastening system to maintain proper track gage
Restrain the lateral, longitudinal, and vertical rail movement by anchorage of the superstructure to the ballast
Provide a cant to the rails to help develop proper wheel–rail contact by matching the inclination of the conical wheel shape
Provide an insulation layer
Allow fast drainage of fluid
Allow for proper ballast maintenance
Ballast is the layer of crushed stone at the top of the substructure in which the ties are embedded. It acts as an elastic support and transfers forces from the rail and ties to the subballast. Among its functions, it
Distributes load from ties uniformly over the subgrade
Anchors the track in place against lateral, vertical, and longitudinal movements
Absorbs shock from the dynamic load
Allows suitable global and local track settlement
Reduces freezing and thawing problems caused by frost action
Allows for proper drainage
Allows for maintenance of the track geometry
The subballast is the layer between the ballast and the subgrade. Among its functions, it
Reduces the stress at the bottom of the ballast layer to a level that protects the subgrade
Prevents the migration of fines from the subgrade into the ballast
Protects the subgrade from the ballast
Permits drainage of water that might otherwise flow upward from the subgrade
The subgrade is the final support of the track system. In some cases it is simply the existing soil at the site; where the existing formation is very weak, it is improved by techniques such as stabilization or modification of the formation with more suitable soil. Geosynthetic materials have also been used to improve subgrade performance and bearing capacity. The main functions of the subgrade are the following:
Provide support to the track structure
Bear and distribute the resultant load from the train vehicle through the track structure
Provide sufficient drainage
Railway track data are similar to data from other infrastructure systems. Their characteristics include the following:
Massive Data Sets. Railway track data collection and monitoring have produced extremely large data sets. In some cases, the raw data are processed and only a reduced version is stored, while in most cases only a smaller subset of the data is retained for further analysis.
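One way to picture "processing the data and storing only a reduced version" is to collapse dense measurements along the track into per-segment summaries. The sketch below assumes a hypothetical input of (position in meters, measured value) pairs and a fixed segment length; the function name, the summary fields, and the 200 m default are illustrative choices, not taken from the text.

```python
from collections import defaultdict
from statistics import mean


def summarize_by_segment(readings, segment_length_m=200.0):
    """Reduce raw (position_m, value) readings to per-segment summaries.

    Hypothetical sketch: instead of every raw sample, keep only the count,
    mean, and maximum absolute value for each fixed-length track segment.
    """
    if segment_length_m <= 0:
        raise ValueError("segment_length_m must be positive")
    buckets = defaultdict(list)
    for position_m, value in readings:
        buckets[int(position_m // segment_length_m)].append(value)
    return {
        seg: {"count": len(vals), "mean": mean(vals), "max_abs": max(abs(v) for v in vals)}
        for seg, vals in sorted(buckets.items())
    }


# Illustrative readings only, not real measurements.
raw = [(0.0, 1.0), (50.0, -3.0), (150.0, 2.0), (210.0, 0.5), (390.0, -0.5)]
print(summarize_by_segment(raw))
```

Which statistics to keep is the key design decision: a mean alone would hide the isolated -3.0 reading, which is why the sketch also retains the maximum absolute value.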
Unstructured Data, Heterogeneous Databases. Some railway track data are stored in databases. In most cases, different agencies and countries use different data formats, database management systems, and data manipulation algorithms. Most of these databases are still evolving, which in some cases makes analysis and data mining across them challenging. Some databases also contain unstructured images, plots, and tables, as well as links to other transportation and infrastructure documents of the agency, which complicates both analysis and reporting.
Information in the Form of Images. By its very nature, the analysis of railway track for both rail and geometry defects involves extracting meaningful information from massive amounts of track images, which has opened a new direction in railway track analysis.
Poor Quality of Data. Railway track data, especially image data, are in most cases of poor quality because of the railway track environment and sensor noise. In some cases, data are missing or entered incorrectly. Furthermore, data from different sources can vary in quality. Railway inspectors may also have incomplete knowledge of the mechanisms and initiation of different defects, which can lead to inconclusive reporting and analysis.
Multiresolution and Multisensor Data. Several different sensors are used to collect different types of information, so images of the same track may have different resolutions over time. Care must therefore be taken to account for these changes in resolution.
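A minimal sketch of one way to account for differing resolutions is to resample each sensor's measurements onto a common spatial grid before comparing them. Linear interpolation, clamping at the ends, and the grid spacing are all assumptions made for illustration; in practice the choice should reflect the sensors' actual resolution and the defect features of interest (downsampling fine data, for instance, normally calls for low-pass filtering first to avoid aliasing).

```python
from bisect import bisect_left


def resample_linear(positions, values, grid):
    """Linearly interpolate (positions, values) onto a common grid.

    positions must be strictly increasing. Grid points outside the measured
    range are clamped to the nearest end value rather than extrapolated.
    """
    if not positions or len(positions) != len(values):
        raise ValueError("positions and values must be non-empty and equal length")
    out = []
    for x in grid:
        if x <= positions[0]:
            out.append(values[0])
        elif x >= positions[-1]:
            out.append(values[-1])
        else:
            i = bisect_left(positions, x)  # positions[i - 1] < x <= positions[i]
            x0, x1 = positions[i - 1], positions[i]
            y0, y1 = values[i - 1], values[i]
            out.append(y0 + (y1 - y0) * (x - x0) / (x1 - x0))
    return out


# Hypothetical example: a coarse sensor sampled every 1.0 m is resampled
# onto a 0.5 m grid that could be shared with a finer sensor.
coarse_positions = [0.0, 1.0, 2.0]
coarse_values = [0.0, 2.0, 4.0]
common_grid = [0.0, 0.5, 1.0, 1.5, 2.0]
print(resample_linear(coarse_positions, coarse_values, common_grid))
```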
Noisy Data. Noise is unavoidable in railway track data collection, so noise-reduction methods need to be applied when the data are preprocessed for further analysis. For example, shadows and the orientation of the vehicle collecting the data can affect the images, and poor illumination can have a major impact on the quality of the obtained image.
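The text does not prescribe a particular noise-reduction method. As one hedged illustration, a sliding-window median filter suppresses isolated spikes in a 1-D measurement signal (for example, a hypothetical track geometry trace) while preserving step changes better than a moving average would. The window size is an assumed parameter, and image data would need a 2-D counterpart.

```python
from statistics import median


def median_filter(signal, window=5):
    """Sliding-window median filter for a 1-D signal.

    window must be an odd positive integer; near the ends of the signal
    the window is truncated to the samples that exist.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be an odd positive integer")
    half = window // 2
    n = len(signal)
    return [median(signal[max(0, i - half):min(n, i + half + 1)]) for i in range(n)]


# Illustrative signal with two isolated spikes (9.0 and -7.0).
noisy = [1.0, 1.1, 9.0, 1.2, 1.0, -7.0, 0.9, 1.1]
print(median_filter(noisy, window=3))
```

Both spikes are removed because neither ever forms the middle value of a three-sample window; a wider window would tolerate longer bursts of noise at the cost of smoothing out short genuine features.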
Missing Data. The risk of missing data is always present in railway track data collection, mostly because of sensor malfunction. Filling the gaps can be a daunting task, and care must again be taken in how missing data are handled in the analysis.
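A hedged sketch of one cautious gap-filling policy (an assumption, not a procedure from the text): linearly interpolate only short interior gaps, leave long gaps and gaps at the ends of the record unfilled, and return a mask so that later analysis can distinguish measured values from imputed ones. The `max_gap` threshold is hypothetical and would depend on the sampling interval and the scale of the defects of interest.

```python
def fill_short_gaps(values, max_gap=2):
    """Linearly interpolate runs of None that are no longer than max_gap.

    Returns (filled_values, filled_mask). Longer gaps, and gaps at either
    end of the series, are left as None so values are not silently invented.
    """
    filled = list(values)
    mask = [False] * len(values)
    n = len(values)
    i = 0
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        end = i  # first index after the gap (or n if the gap reaches the end)
        gap = end - start
        if start == 0 or end == n or gap > max_gap:
            continue  # leave edge gaps and long gaps unfilled
        left, right = filled[start - 1], filled[end]
        for k in range(start, end):
            frac = (k - start + 1) / (gap + 1)
            filled[k] = left + (right - left) * frac
            mask[k] = True
    return filled, mask


# Illustrative series: one short gap, one long gap, one trailing gap.
series = [1.0, None, 3.0, None, None, None, 7.0, None]
filled, mask = fill_short_gaps(series, max_gap=2)
print(filled, mask)
```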
Streaming Data. Some data sets collected during railway monitoring are streaming in nature; that is, data are collected and received continuously. Such data require specialized analysis methods that differ from the chunk-based (batch) methods used in traditional analysis.
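Streaming analysis typically processes one record at a time without retaining the whole stream. As a minimal sketch (the text names no specific algorithm; Welford's online method is swapped in here as a standard example), running statistics can be updated incrementally as each reading arrives, in contrast to batch methods that need the complete chunk in memory.

```python
import math


class RunningStats:
    """Welford's online algorithm for the mean and sample variance.

    Each new sample updates the statistics in O(1) time and memory, so the
    full stream never has to be stored.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x: float) -> None:
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self) -> float:
        """Sample variance (n - 1 denominator); nan until two samples arrive."""
        return self._m2 / (self.n - 1) if self.n > 1 else math.nan


stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:  # illustrative values
    stats.update(reading)
print(stats.n, stats.mean, stats.variance)
```

Welford's update is preferred over accumulating a running sum and sum of squares because it avoids the catastrophic cancellation that the naive formula suffers on long streams with a large mean.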
More broadly, data can be either random or deterministic. The classifications of random and deterministic data presented by Bendat (1998) are shown in Figures 1.2 and 1.3, respectively.
Figure 1.2 Classification of random data
(Bendat (1998). Reproduced with permission of John Wiley & Sons)
Figure 1.3 Classification of deterministic data
(Bendat (1998). Reproduced with permission of John Wiley & Sons)
Table 1.1 shows the general taxonomy of big data methods in railway engineering.
Table 1.1 Taxonomy of big data in railway engineering
| Analysis domain | Sources | Characteristics | Approaches | Comments |
|---|---|---|---|---|
| Structured data | Field data collection, sensors, data from scientific experiments | Structured records, real time | Data mining, statistical analysis | All infrastructure systems need field data |
| Unstructured data | Extreme events, sensors | Unstructured records, mixture of variables | Anomaly detection | Infrastructure inspection reports, specification updates |
| Text analytics | Logs, email, corporate documents, government rules and regulations, text content of web pages, citizen feedback and comments | Unstructured, rich textual, context, semantic, language dependent |
