Big Data and Differential Privacy - Nii O. Attoh-Okine - E-Book

Big Data and Differential Privacy E-Book

Nii O. Attoh-Okine

Description

A comprehensive introduction to the theory and practice of contemporary data science analysis for railway track engineering

Featuring a practical introduction to state-of-the-art data analysis for railway track engineering, Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering addresses common issues with the implementation of big data applications while exploring the limitations, advantages, and disadvantages of more conventional methods. In addition, the book provides a unifying approach to analyzing large volumes of data in railway track engineering using an array of proven methods and software technologies. Dr. Attoh-Okine considers some of today's most notable applications and implementations and highlights when a particular method or algorithm is most appropriate. Throughout, the book presents numerous real-world examples to illustrate the latest railway engineering big data applications of predictive analytics, such as the Union Pacific Railroad's use of big data to reduce train derailments, increase the velocity of shipments, and reduce emissions.

In addition to providing an overview of the latest software tools used to analyze the large amount of data obtained by railways, Big Data and Differential Privacy: Analysis Strategies for Railway Track Engineering:

* Features a unified framework for handling large volumes of data in railway track engineering using predictive analytics, machine learning, and data mining
* Explores issues of big data and differential privacy and discusses the various advantages and disadvantages of more conventional data analysis techniques
* Implements big data applications while addressing common issues in railway track maintenance
* Explores the advantages and pitfalls of data analysis software such as R and Spark, as well as the Apache(TM) Hadoop® data collection database and its popular implementation MapReduce

Big Data and Differential Privacy is a valuable resource for researchers and professionals in transportation science, railway track engineering, design engineering, operations research, and railway planning and management. The book is also appropriate for graduate courses on data analysis and data mining, transportation science, operations research, and infrastructure management.

NII ATTOH-OKINE, PhD, PE is Professor in the Department of Civil and Environmental Engineering at the University of Delaware. The author of over 70 journal articles, his main areas of research include big data and data science; computational intelligence; graphical models and belief functions; civil infrastructure systems; image and signal processing; resilience engineering; and railway track analysis. Dr. Attoh-Okine has edited five books in the areas of computational intelligence and infrastructure systems and has served as an Associate Editor of various ASCE and IEEE journals.

Page count: 308

Year of publication: 2017

Table of Contents

Cover

Title Page

Copyright

Preface

Acknowledgments

Chapter 1: Introduction

1.1 General

1.2 Track Components

1.3 Characteristics of Railway Track Data

1.4 Railway Track Engineering Problems

1.5 Wheel–Rail Interface Data

1.6 Geometry Data

1.7 Track Geometry Degradation Models

1.8 Rail Defect Data

1.9 Inspection and Detection Systems

1.10 Rail Grinding

1.11 Traditional Data Analysis Techniques

1.12 Remarks

References

Chapter 2: Data Analysis – Basic Overview

2.1 Introduction

2.2 Exploratory Data Analysis (EDA)

2.3 Symbolic Data Analysis

2.4 Imputation

2.5 Bayesian Methods and Big Data Analysis

2.6 Remarks

References

Chapter 3: Machine Learning: A Basic Overview

3.1 Introduction

3.2 Supervised Learning

3.3 Unsupervised Learning

3.4 Semi-Supervised Learning

3.5 Reinforcement Learning

3.6 Data Integration

3.7 Data Science Ontology

3.8 Imbalanced Classification

3.9 Model Validation

3.10 Ensemble Methods

3.11 Big and Small

3.12 Deep Learning

3.13 Data Stream Processing

3.14 Remarks

References

Chapter 4: Basic Foundations of Big Data

4.1 Introduction

4.2 Query

4.3 Taxonomy of Big Data Analytics in Railway Track Engineering

4.4 Data Engineering

4.5 Remarks

References

Chapter 5: Hilbert–Huang Transform, Profile, Signal, and Image Analysis

5.1 Hilbert–Huang Transform

5.2 Axle Box Acceleration

5.3 Analysis

5.4 Remarks

References

Chapter 6: Tensors – Big Data in Multidimensional Settings

6.1 Introduction

6.2 Notations and Definitions

6.3 Tensor Decomposition Models

6.4 Application

6.5 Remarks

References

Chapter 7: Copula Models

7.1 Introduction

7.2 Pair Copula: Vines

7.3 Computational Example

7.4 Remarks

References

Chapter 8: Topological Data Analysis

8.1 Introduction

8.2 Basic Ideas

8.3 A Simple Railway Track Engineering Application

8.4 Remarks

References

Chapter 9: Bayesian Analysis

9.1 Introduction

9.2 Markov Chain Monte Carlo (MCMC)

9.3 Approximate Bayesian Computation

9.4 Markov Chain Monte Carlo Application

9.5 ABC Application

9.6 Remarks

References

Chapter 10: Basic Bayesian Nonparametrics

10.1 General

10.2 Dirichlet Family

10.3 Dirichlet Process

10.4 Finite Mixture Modeling

10.5 Bayesian Nonparametric Railway Track

10.6 Remarks

References

Chapter 11: Basic Metaheuristics

11.1 Introduction

11.2 Remarks

References

Chapter 12: Differential Privacy

12.1 General

12.2 Differential Privacy

12.3 Remarks

References

Index

End User License Agreement

List of Illustrations

Chapter 1: Introduction

Figure 1.1 Track structure components

Figure 1.2 Classification of random data

Figure 1.3 Classification of deterministic data

Figure 1.4 Engineering signals

Figure 1.5 Wheel–rail contact impacts

Figure 1.6 Wheel–rail interface

Figure 1.7 Regions of wheel/rail contact

Figure 1.8 Different types of switches and crossings

Figure 1.9 Schematized standard turnout and its components

Figure 1.10 Classification of track geometry models based on parameters' uncertainty

Figure 1.11 Linear representation of track geometry degradation and restoration based on the standard deviation of roughness

Figure 1.12 Nonlinear representation of track geometry degradation and restoration based on the standard deviation of roughness

Figure 1.13 Rail defects distribution

Figure 1.14 Cross-section of a rail

Figure 1.15 Transverse, vertical, and horizontal places of track

Figure 1.16 Surface regions of rail head

Figure 1.17 Repository of rail head surface classes: normal or noncritical surface

Figure 1.18 Rail defects per mile

Figure 1.19 Rolling contact fatigue. Courtesy: Johannes Bremsteller

Figure 1.20 Different rail maintenance strategies. Courtesy: Johannes Bremsteller

Chapter 2: Data Analysis – Basic Overview

Figure 2.1 Box plot for some track geometry parameters

Figure 2.2 Histogram for some track geometry parameters

Figure 2.3 Q–Q plot for some track geometry parameters

Figure 2.4 Time series data table – track surface inspection

Figure 2.5 Illustration of multivariate scatter plots for different inspection times

Chapter 3: Machine Learning: A Basic Overview

Figure 3.1 Overview of different classifier categories (Camps-Valls and Bruzzone (2009). Reproduced with the permission of John Wiley and Sons)

Figure 3.2 Illustration of data science ontology

Figure 3.3 Transformation of original data to feature space

Figure 3.4 Two hyperplanes (Fu et al. (2014). Reproduced with the permission of Springer)

Figure 3.5 Training and testing approach

Figure 3.6 Training, testing, and validation

Figure 3.7 Receiver operating characteristic (ROC) curve

Figure 3.8 Illustration of bagging procedure

Figure 3.9 Big and small

Figure 3.10 Bias/variance decomposition

Figure 3.11 Bias/variance – method selection

Figure 3.12 A graphical representation of a regional split applied to a univariate scatterplot (Sekulic and Kowalski (1992). Reproduced with the permission of John Wiley and Sons)

Figure 3.13 (a) Examples for surface defects and (b) non-defective samples (Soukup and Huber-Mörk, 2014). Reproduced with the permission of Springer

Figure 3.14 CNN architecture for surface defect detection: two convolutional and pooling layers and a final fully connected layer (Soukup and Huber-Mörk, 2014). Reproduced with the permission of Springer

Figure 3.15 RBM structure

Figure 3.16 Generic DBN

Figure 3.17 DBN layer-wise training process. “” is the input vector, while “” are DBN hidden layers. In each training iteration, one DBN layer is considered as a hidden RBM layer. DBN arrows indicate the direction of the generative model

Figure 3.18 Deep learning CNN model architecture

Figure 3.19 Representation of clustering (Galvan-Nunez and Attoh-Okine, 2016). Reproduced with the permission of American Society of Civil Engineers

Figure 3.20 Clustering process (Galvan-Nunez and Attoh-Okine, 2016). Reproduced with the permission of American Society of Civil Engineers

Figure 3.21 Example of a hash table (El-Metwally et al., 2014). Reproduced with the permission of Springer

Figure 3.22 Example of a Bloom filter (El-Metwally et al., 2014). Reproduced with the permission of Springer

Figure 3.23 Count–min sketch idea

Figure 3.24 IWS sample signal

Figure 3.25 IWS application

Chapter 4: Basic Foundations of Big Data

Figure 4.1 The big data analysis pipeline (Jagadish, 2015). Reproduced with the permission of Elsevier

Figure 4.2 Railway big data

Figure 4.3 Big data environment

Figure 4.4 Landscape

Figure 4.5 Taxonomy of data model

Figure 4.6 Big data versus traditional data

Figure 4.7 Big data taxonomy

Figure 4.8 Five Vs of big data

Figure 4.9 Data size (Adarkwa, 2015). Reproduced with the permission of University of Delaware

Figure 4.10 MapReduce architecture (Attoh-Okine, 2016). Reproduced with the permission of Cambridge University Press

Figure 4.11 Pseudocode of the k-means algorithm

Figure 4.12 Pseudocode MapReduce-based k-means algorithm

Figure 4.13 Apache Spark

Chapter 5: Hilbert–Huang Transform, Profile, Signal, and Image Analysis

Figure 5.1 Illustration of the HHT

Figure 5.2 Sifting process

Figure 5.3 Part of synthetic data

Figure 5.4 Signal and IMF components

Figure 5.5 Plot of instantaneous wave number against distance for highest wave number component IMFs

Figure 5.6 Wavelet transform of synthetic data

Figure 5.7 Analysis of cross-level

Figure 5.8 Comparative analysis of cross-level at different months (June and July)

Figure 5.9 Analysis of surface (right)

Figure 5.10 Comparative analysis of surface (right)

Figure 5.11 Analysis of alignment (right)

Figure 5.12 Comparative analysis of alignment (right)

Figure 5.13 Post-processing ensemble empirical mode decomposition. Courtesy: Ding and Lin, 2010

Figure 5.14 Using BEMD to remove shadows

Figure 5.15 Subtraction of images

Figure 5.16 Preprocessing of track images

Figure 5.17 Preprocessing of track images

Figure 5.18 Schematic view of the axle box acceleration measuring and diagnosis system (Oregui et al., 2016). Reproduced with the permission of John Wiley and Sons

Chapter 6: Tensors – Big Data in Multidimensional Settings

Figure 6.1 3D tensor fibers

Figure 6.2 3D tensor slices

Figure 6.3 Cross-level measurement at different dates

Figure 6.4 Track geometry parameters measurements

Figure 6.5 Data structure for the cross-level

Figure 6.6 Loading plot for cross-level

Figure 6.7 Correlation analysis (matrix)

Figure 6.8 Loading plot for distance points

Figure 6.9 Data structure for cross-level, surface (right) and alignment on same date

Figure 6.10 Correlation matrix

Figure 6.11 Loading plots

Figure 6.12 Loading plots for points

Chapter 7: Copula Models

Figure 7.1 Pairs plot of the track geometry data set with scatterplots above and contour plots with standard normal margins below the diagonal

Figure 7.2 (a) -Plot. (b) Chi-plot. (c) Empirical lambda function (black line), theoretical lambda function of a Student's copula (gray line), as well as independence and comonotonicity limits (dashed lines)

Figure 7.3 Four-dimensional -vine, where Student's copula, Frank copula, Normal/Gaussian copula, and independent copula with corresponding empirical values shown on the links with the copula family

Figure 7.4 Four-dimensional -vine, where Normal/Gaussian copula, Student's copula, Frank copula, and independent copula with corresponding empirical tau values shown on the links with the copula family

Chapter 8: Topological Data Analysis

Figure 8.1 Illustration of simplex

Figure 8.2 Simplicial complex

Figure 8.3 Betti numbers

Figure 8.4 Filtration

Figure 8.5 Persistence diagram

Figure 8.6 Schematic representation of TDA

Figure 8.7 Application of TDA

Chapter 9: Bayesian Analysis

Figure 9.1 ABC steps

Figure 9.2

Figure 9.3 ABC steps cont'd

Figure 9.4 ABC step 1

Figure 9.5 ABC step 2

Figure 9.6 ABC step 3

Figure 9.7 ABC step 4

Figure 9.8 ABC step 5

Figure 9.9 Trace. (a) Intercept, (b) degradation rate, (c) white noise

Figure 9.10 Kernel density. (a) Intercept, (b) degradation rate, (c) white noise

Figure 9.11 Autocorrelation plot. (a) Intercept, (b) degradation rate, (c) white noise

Figure 9.12 Example of ABC simulations (histograms)

Chapter 10: Basic Bayesian Nonparametrics

Figure 10.1 Stick-breaking process

Figure 10.2 Chinese restaurant process

Figure 10.3 Chinese restaurant process continued

Chapter 11: Basic Metaheuristics

Figure 11.1 Relationship between data science with evolutionary algorithms and swarm intelligence (Cheng et al., 2016) Reproduced with the permission of BioMed Central Ltd

Chapter 12: Differential Privacy

Figure 12.1 Sensitivity function

Figure 12.2 General structure of differential privacy

Figure 12.3 An example of DP in rail tank safety

List of Tables

Chapter 1: Introduction

Table 1.1 Taxonomy of big data in railway engineering

Table 1.2 Engineering problems

Table 1.3 Track inspection technologies.

Table 1.4 Vertical track forces

Table 1.5 Indicators for each type of defect according to EN 13848-5 (Teixeira and Andrade, 2014) Reproduced with the permission of Springer

Table 1.6 Summary of literature review

Table 1.7 Rail defects classification

Table 1.8 Transverse defects

Table 1.9 Longitudinal defects

Table 1.10 Web defects

Table 1.11 Base defects

Table 1.12

Table 1.13 Surface defects: wheel burns

Table 1.14 NDT techniques for the rail industry.

Table 1.15 Automated visual railway component inspection methods

Table 1.16 Big data versus traditional data

Chapter 2: Data Analysis – Basic Overview

Table 2.1 Examples of symbolic variables

Chapter 3: Machine Learning: A Basic Overview

Table 3.1 Confusion matrix

Table 3.2 Sample of deep learning application in railway track engineering

Table 3.3 Dominant rail group structural distresses

Table 3.4 Dominant sleeper, fastening, and ballast structural distresses

Table 3.5 Definition of track classes

Table 3.6 Maintenance and repair strategies

Table 3.7 Decision table

Table 3.8 Consistent table

Table 3.9 Reduced set of independent variables of decision rules

Table 3.10 Equations to determine numerical parameters of the LogLog Counter

Table 3.11 Dependency between the sketch size and accuracy

Table 3.12 Streaming techniques

Table 3.13 Application of machine learning techniques.

Chapter 4: Basic Foundations of Big Data

Table 4.1 System for large data applications

Table 4.2 Comparison between big data and traditional data

Table 4.3 Traditional data warehousing versus big data issues

Table 4.4 Comparison between stream processing and batch processing

Table 4.5 Taxonomy of big data methods in railway track engineering

Table 4.6 Key definitions of railway track engineering

Table 4.7 Data definition

Table 4.8 Railway track engineering and big data

Chapter 5: Hilbert–Huang Transform, Profile, Signal, and Image Analysis

Table 5.1 Classification of track vertical defects upon their wavelengths

Table 5.2 Comparison between Fourier, wavelet, and HHT

Table 5.3 Some applications of HHT in railway track engineering analysis

Chapter 7: Copula Models

Table 7.1 Archimedean copulas

Table 7.2 Kendall and Spearman's values

Table 7.3 Correlation matrix based on Kendall's tau

Table 7.4 The empirical Kendall's matrix and the sum over the absolute entries of each row for the track geometry data set

Table 7.5 The empirical Kendall's matrix and the sum over the absolute entries of each row for the derailment data set given alignment (right) () as first root

Table 7.6 Properties of pair-copula families considered

Table 7.7 Log-likelihood, number of parameters, AIC, and BIC for -vine and -vine copula models using maximum likelihood estimation (MLE) or sequential estimates

Chapter 8: Topological Data Analysis

Table 8.1 Equivalent definitions of the cycle and boundary groups

Chapter 9: Bayesian Analysis

Table 9.1 Conjugate prior distributions

Table 9.2 Cross-level data

Table 9.3 Selected case studies of Bayesian analysis in railway track engineering

Chapter 11: Basic Metaheuristics

Table 11.1 Selected examples

Wiley Series in Operations Research and Management Science

 

A complete list of the titles in this series appears at the end of this volume.

Big Data and Differential Privacy

Analysis Strategies for Railway Track Engineering

 

 

Nii O. Attoh-Okine

 

 

 

 

 

This edition first published 2017

© 2017 John Wiley & Sons, Inc.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Nii O. Attoh-Okine to be identified as the author of this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

Editorial Office

111 River Street, Hoboken, NJ 07030, USA

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print-on-demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

The publisher and the authors make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of fitness for a particular purpose. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for every situation. In view of on-going research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. The fact that an organization or website is referred to in this work as a citation and/or potential source of further information does not mean that the author or the publisher endorses the information the organization or website may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. No warranty may be created or extended by any promotional statements for this work. Neither the publisher nor the author shall be liable for any damages arising herefrom.

Library of Congress Cataloguing-in-Publication Data

Names: Attoh-Okine, Nii O., author.

Title: Big data and differential privacy : analysis strategies for railway track engineering / Nii O. Attoh-Okine.

Other titles: Wiley series in operations research and management science.

Description: Hoboken, NJ : John Wiley & Sons, 2017. | Series: Wiley series in operations research and management science | Includes bibliographical references and index.

Identifiers: LCCN 2017005398 (print) | LCCN 2017010092 (ebook) | ISBN 9781119229049 (cloth) | ISBN 9781119229056 (pdf) | ISBN 9781119229063 (epub)

Subjects: LCSH: Railroad tracks--Mathematical models. | Data protection--Mathematics. | Big data. | Differential equations.

Classification: LCC TF241 .A88 2017 (print) | LCC TF241 (ebook) | DDC 625.1/4028557--dc23

LC record available at https://lccn.loc.gov/2017005398

Cover design: Wiley

Cover image: (Top Image) © Jaap Hart/iStockphoto; (Bottom Image) © mbbirdy/Gettyimages

Preface

The ability of railway track engineers to handle and process large and continuous streams of data will provide a considerable opportunity for railway agencies. It will help decision makers make informed decisions about the maintenance, reliability, and safety of railway tracks. A period is now beginning in which the problem is collecting railway track data and analyzing it within a defined period of time. Therefore, the tools and methods needed to achieve this analysis need to be addressed. Knowledge derived from big data analytics in railway track engineering will become one of the foundational elements of any railway organization and agency. Another key issue has been the protection of data by different railway organizations: although the data are available, they are rarely shared among different agencies. This makes the issue of differential privacy of utmost importance in the railway industry. It is also not clear whether the industry has developed a clear way of both protecting and accessing the data from third parties.

Data science is an emerging field that has all the characteristics needed by railway track engineers to address and handle the enormous amounts of data generated by the various technology platforms currently in place. The major objective is for railway track engineers to have an understanding of big data. With the right tools and methodologies, railway track big data will also uncover new directions for monitoring and collecting railway track data; apart from the engineering side, this will also have a major business impact on railway agencies.

This book provides the fundamental concepts needed to work with big data applications for railway engineers. The concepts serve as a foundation, and it is assumed that the reader has some understanding of railway engineering. The book does not attempt to address railway track engineering as a subject, but it does address the use of data science and the big data paradigm in railway track applications. Colleagues in industry will find the book very handy, but it will also serve as a new direction for graduate students interested in data science and the big data paradigm in infrastructure systems. The work in this book is intended to be accessible to an audience broader than those in railway track engineering.

Furthermore, I hope to shed a bright light on the enormous potential and future development that the big data paradigm will bring to railway track engineering. The amount of data railway agencies already have and the amount they are planning to collect in the future make this book an important milestone. This book attempts to bring together new emerging topics in a coherent way and to address the different methodologies that can be used to solve a variety of railway track problems in the analysis of large data from various inspection technologies. In preparing the book, I tried to achieve the following objectives: (a) to develop some data science ontologies, (b) to provide the formulation of large railway track data using big data analytics, (c) to provide direction on how to present the data (visualization of the results), (d) to provide practical applications for the railway and infrastructure industry, and (e) to provide a new direction in railway track data analysis.

Finally, I assume full responsibility for any errors in the book. The opinions presented in the book represent my experiences in civil infrastructure systems, machine learning, signal analysis, and probability analysis.

January, 2016

Nii O. Attoh-Okine

Newark, Delaware, USA

Acknowledgments

I would like to thank the staff of John Wiley & Sons, Inc., especially Susanne Steitz-Filler, for their time. I would also like to thank Dr. Allan Zarembski and Joe Palese and Hugh Thompson of FRA for their support and encouragement. Thanks also to my current and former graduate students Dr. Yaw Adu-Gyamfi, Dr. Offei Adarkwa, and Emmanuel Martey for offering constructive criticisms. Special thanks to Silvia Galvan-Nunez, who additionally supported me with the complex LaTeX issues. I would also like to thank Erin Huston for editing the first draft of the book. Finally, as always, I would like to thank my family: my two children, Nii Attoh and Naa Djama; my wife, Rebecca, for providing the peace and excellent working environment; and my brother, Ashalley Attoh-Okine, an excellent actuary and energy expert, who introduced me to so many data analysis techniques, which have been part of my research over the years. I dedicate the book to the memory of my parents, Madam Charkor Quaynor and Richard Ayi Attoh-Okine, and my maternal grandparents, Madam Botor Clottey and Robert Quaynor.

Chapter 1: Introduction

1.1 General

Currently, railroads collect enormous quantities of data through vehicle-based inspection cars, trackside (or wayside) monitoring systems, hand-held gauges, and visual inspections. In addition, these data are located geographically using the global positioning system (GPS). The data from these inspection systems are collected by hand or electronically, using various sensors, video inspection, machine vision, and many other sources. Furthermore, the data are growing in both quantity and quality and are becoming more precise and diverse. Data of extremely large sizes are difficult to analyze using traditional approaches since they may exceed the limits of a typical spreadsheet. Railway track data are present in diverse forms, including categorical, numerical, and continuous values. The general characteristics of the data dictate which type of method is appropriate for analysis. For example, categorical and nominal values are unsorted, while numerical and continuous values are assumed to be sorted or to represent ordinal data (Ramírez-Gallego et al., 2016).
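The point that the type of a variable dictates the appropriate analysis can be illustrated with a minimal Python sketch. The function name `summarize` and the sample records are hypothetical and invented for illustration; they are not taken from the book. The sketch assumes that categorical values arrive as strings and numerical values as numbers.

```python
from collections import Counter
from statistics import mean, median


def summarize(values):
    """Pick a summary that suits the variable type (illustrative only).

    Categorical (string) values have no inherent order, so only counts
    and the mode are reported. Numerical values can be sorted, so
    order-based statistics such as the median also apply.
    """
    if not values:
        raise ValueError("values must not be empty")
    if all(isinstance(v, str) for v in values):
        counts = Counter(values)
        return {
            "type": "categorical",
            "mode": counts.most_common(1)[0][0],
            "counts": dict(counts),
        }
    numbers = [float(v) for v in values]
    return {
        "type": "numerical",
        "min": min(numbers),
        "median": median(numbers),
        "mean": mean(numbers),
        "max": max(numbers),
    }


# Hypothetical inspection records; the values are made up for this example.
defect_type = ["wheel burn", "transverse", "wheel burn", "web"]
gage_mm = [1435.2, 1436.0, 1434.8, 1437.1]

print(summarize(defect_type))
print(summarize(gage_mm))
```

A real pipeline would more likely rely on a data-frame library with explicit column types; the standard library is used here only to keep the sketch self-contained.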

The development of advanced sensors and information technology in railway infrastructure monitoring and control has provided a platform for the expansive growth of data. This has created a new paradigm in the processing, storing, streaming, and visualization of data and information. Furthermore, changes in technology include the possibility of installing sensors and smart chips in critical infrastructure to measure system performance, current condition, and other indicators of imminent failures. Many of the railway infrastructure components have communication capabilities that allow data to be uploaded on demand.

Big data refers to extremely large volumes of data originating from various sources: databases, audio and video, millions of sensors, and other systems. Some of these sources provide structured outputs, but most are unstructured, semi-structured, or poly-structured. In some cases the data stream at high velocity and must be handled at the same speed as, or faster than, they are generated.

This chapter presents a general overview, basic description, and properties of the deterministic and random data encountered in railway track engineering. It relies heavily on the data output made possible by advances in sensors and information technology, developments that have led to extremely massive data sets. These large data sets have made the traditional analytical techniques used for railway track maintenance and safety issues somewhat obsolete.
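One way to picture what high-velocity streaming demands of an analysis is a statistic that is updated one reading at a time, without storing the stream. The sketch below uses Welford's one-pass algorithm for the running mean and sample variance. This technique is chosen here purely for illustration and is not described in the text above; the class name `RunningStats` and the sample readings are hypothetical.

```python
class RunningStats:
    """One-pass running mean and sample variance (Welford's algorithm)."""

    def __init__(self):
        self.n = 0        # number of readings seen so far
        self.mean = 0.0   # running mean
        self._m2 = 0.0    # running sum of squared deviations from the mean

    def update(self, x):
        """Fold one new reading into the statistics in constant time."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Sample variance; defined as 0.0 until two readings have arrived."""
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0


# Hypothetical sensor readings arriving one at a time.
stats = RunningStats()
for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
    stats.update(reading)

print(stats.n, stats.mean, stats.variance)
```

Because each update touches only three stored numbers, memory use stays constant however long the stream runs, which is the property that spreadsheet-style batch analysis lacks.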

The data obtained in railway track monitoring are collected by different sensors, at different times and under different environmental conditions, at different frequencies, and at different resolutions. The outputs have different characteristics: discrete or continuous, spatial or temporal, signals or images, and categorical or objective, among others. These characteristics and properties, together with the extreme volume of data collected, have made traditional analytical techniques very inefficient; issues such as visualization and data streaming, which are critical in railway track maintenance and safety, are not adequately addressed. Traditional statistical techniques fail to scale up to the extremely large volumes of data collected by railway inspection vehicles and trackside monitoring devices. Therefore, the growing amount of data generated by railway track inspection activities is outpacing the current capacity to explore and interpret these data and hence to appropriately address maintenance and safety issues.

1.2 Track Components

The term “tracks” includes superstructure, substructure, and special structures (Figure 1.1). The superstructure is made of rails, ties, fasteners, turnouts, and crossings, while the substructure consists of ballast, subballast, the subgrade, and other drainage facilities. The superstructure and substructure are separated by the tie–ballast interface.

Figure 1.1 Track structure components
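The grouping of components described above can be written down as a small lookup structure, which is often the first step when tagging inspection records by component. The sketch below is a minimal illustration; the names `TRACK_COMPONENTS` and `structure_of` are hypothetical. Special structures are left out because the text does not enumerate them.

```python
# Component grouping as listed in the text; special structures are omitted
# because their members are not enumerated there.
TRACK_COMPONENTS = {
    "superstructure": ["rails", "ties", "fasteners", "turnouts", "crossings"],
    "substructure": ["ballast", "subballast", "subgrade", "drainage facilities"],
}


def structure_of(component):
    """Return the structure group a component belongs to, or None if unknown."""
    name = component.strip().lower()
    for group, members in TRACK_COMPONENTS.items():
        if name in members:
            return group
    return None


print(structure_of("Rails"))
print(structure_of("ballast"))
print(structure_of("pantograph"))
```

Returning `None` for an unknown name, rather than raising an error, is a design choice that lets a caller count or log unmatched records instead of stopping at the first one.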

The main purpose of the railway track structure is to provide a safe and economical train transportation system through guiding the vehicle and transmitting loads through the track components to the subgrade. The carrying capacity and long-term durability of the track structure highly depend on how the superstructure and substructure respond to and interact with each other when subjected to moving trains and environmental factors (Selig and Waters, 1994; Kerr, 2003).

The function of different rail components has been presented by various authors, such as Hay (1982), Selig and Waters (1994), Esveld (2001), Kerr (2003), Sadeghi (2010), and Tzanakakis (2013). The aim of this section is to summarize this function. The rails are the longitudinal steel members that are placed on spaced ties to guide the train wheels evenly and continuously. Their strength and stiffness must be sufficient to maintain a steady shape and smooth track configuration and to resist various forces (vertical, lateral, and longitudinal) by vehicles. The rails also in some cases serve as electrical conductors for the signal circuit and also as a groundline for the electric locomotive power circuit. The profile of the rail surface (transverse and longitudinal) and wheel surface has a major influence on the operation of the vehicles on the track, and track defects may in some instances create and cause large dynamic loads that lead to derailment and safety issues, as well as accelerated degradation.

Most steel rail sections are connected either by bolted joints or by welding. Bolted joints create several problems, including a rough-riding track, undesirable vibration, and additional impact loads, among others; hence, continuous welded rail (CWR) has become the preferred solution. CWR addresses some of the disadvantages of bolted joints, although it has its own set of maintenance requirements.

The rail fastener systems, or fastenings, include all the components that connect the rail to the tie, such as the tie plate, spike, and anchor for wood ties and the clip, insulator, and elastic fastener for concrete ties. The function of the fastenings is to retain the rail against the ties and resist vertical, lateral, longitudinal, and overturning movements of the rail. They also attenuate wheel load impacts, increase track elasticity, and provide electrical isolation between the rails.

For concrete tie tracks, rail pads are installed on rail supporting points to reduce and transfer the stress and dynamic forces from the rail to the ties, and they reduce the interaction force between the rail and the ties (Choi, 2014). The pads also provide adequate resistance to longitudinal and rotational movement of the rail and provide a conforming layer between the rail and tie to avoid contact areas of high pressure. From a dynamic point of view, the rail pads tend to influence overall track stiffness.

Ties are transverse beams that rest on the ballast and support the rails. They span below and tie together the two rails. The main functions of ties are as follows:

Uniformly transfer and distribute loads from the rail to the ballast

Hold the fastening system to maintain proper track gage

Restrain the lateral, longitudinal, and vertical rail movement by anchorage of the superstructure to the ballast

Provide a cant to the rails to help develop proper wheel–rail contact by matching the inclination of the conical wheel shape

Provide an insulation layer

Allow fast drainage of fluid

Allow for proper ballast maintenance

Ballast is the layer of crushed stone that forms the top layer of the substructure and in which the ties are embedded. It acts as an elastic support and transfers forces from the rail and tie to the subballast. Among its functions, it

Distributes load from ties uniformly over the subgrade

Anchors the track in place against lateral, vertical, and longitudinal movements

Absorbs shock from the dynamic load

Allows suitable global and local track settlement

Avoids freeze-thaw problems caused by frost action

Allows for proper drainage

Allows for maintenance of the track geometry

The subballast is the layer between the ballast and the subgrade. Among its functions, it

Reduces the stress at the bottom of the ballast layers to a reasonable level to protect the subgrade

Prevents the migration of fines from the subgrade into the upper ballast layer

Protects the subgrade from the ballast

Permits drainage of water that might otherwise flow upward from the subgrade

The subgrade is the final support of the track system and, in some cases, is the existing soil at the location, unless the existing formation is very weak. In the case of a weak existing formation, techniques such as stabilization and modification of the existing elevation with more appropriate soil are used. Geosynthetic materials have also been added to improve subgrade performance and bearing capacity. The main functions of the subgrade are the following:

Provide support to the track structure

Bear and distribute the resultant load from the train vehicle through the track structure

Provide sufficient drainage

1.3 Characteristics of Railway Track Data

Railway track data are similar to data from other infrastructure systems. Their characteristics include the following:

Massive Data Sets. Railway track data collection and monitoring has resulted in extremely large data sets for infrastructure monitoring. In some cases, the actual data are processed and only the reduced version is stored, while in most cases smaller amounts of data are stored for further analysis.

Unstructured Data, Heterogeneous Databases. Some of the railway track data are stored in databases. In most cases, different agencies and countries have different data formats, different database management systems, and different data manipulation algorithms. Most of these databases are evolving, which in some cases makes analysis and data mining across them challenging. Some of the databases include unstructured images, plots, and tables, as well as links to other transportation and infrastructure documents of the agency. This can be challenging in terms of both analysis and reporting.

Information in the Form of Images. The analysis of railway track, in terms of both rail and geometry defects, by its very nature deals with issues associated with the extraction of meaningful information from massive amounts of railway track images, thus opening a new direction in railway track analysis.

Poor Quality of Data. Railway track data, especially image data, are in most cases of poor quality because of the railway track environment and sensor noise. In some cases, data are missing or entered incorrectly. Furthermore, data from different sources can vary in quality. Railway inspectors may also in some cases have incomplete knowledge of the mechanism and initiation of different defects. This may lead to inconclusive reporting and analysis.

Multiresolution and Multisensor Data. Several different sensors are used to collect different information and data. This may create a situation where images have different resolutions over time. Care must therefore be taken to account for the change in resolution in the analysis.

Noisy Data. Noisy data cannot be avoided in railway track data collection. Methods of reducing the noise need to be applied during preprocessing, before further analysis. For example, shadows and the orientation of the vehicle collecting the data can affect the images. Likewise, poor illumination can have a major impact on the obtained image.

Missing Data. The risk of missing data is always present in railway track data collection; this is mostly due to sensor malfunction. Filling the gaps can be a daunting task. Again, care must be taken with how missing data are handled.

Streaming Data. Some of the data sets collected during railway monitoring can be streaming in nature; that is, a constant stream of data is being collected and received. This requires a specialized set of analyses different from the chunk data methods used in traditional analysis.
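Three of the characteristics listed above (missing, noisy, and streaming data) correspond to concrete preprocessing steps. The following Python sketch is a minimal illustration under stated assumptions, not a method prescribed in this book: short gaps are filled by linear interpolation, isolated spikes are suppressed with a running median, and summary statistics are updated one sample at a time using Welford's online algorithm. The function and class names (`fill_short_gaps`, `median_filter`, `RunningStats`), the `max_gap` and `window` defaults, and the representation of a missing sample as `None` are all hypothetical choices made for illustration.

```python
from statistics import median


def fill_short_gaps(values, max_gap=3):
    """Linearly interpolate runs of None that are at most max_gap samples long.

    Longer runs, and runs touching either end of the series, are left as None
    so that large sensor outages remain visible (an assumption of this sketch).
    """
    filled = list(values)
    i, n = 0, len(filled)
    while i < n:
        if filled[i] is not None:
            i += 1
            continue
        start = i
        while i < n and filled[i] is None:
            i += 1
        gap = i - start
        if start > 0 and i < n and gap <= max_gap:
            left, right = filled[start - 1], filled[i]
            for k in range(gap):
                filled[start + k] = left + (right - left) * (k + 1) / (gap + 1)
    return filled


def median_filter(values, window=5):
    """Running median over an odd-length window centered on each sample.

    Missing samples (None) stay missing and are ignored as neighbours.
    The window is truncated at both ends of the series.
    """
    if window % 2 == 0:
        raise ValueError("window must be odd")
    half = window // 2
    out = []
    for i, v in enumerate(values):
        if v is None:
            out.append(None)
            continue
        lo, hi = max(0, i - half), i + half + 1
        out.append(median(x for x in values[lo:hi] if x is not None))
    return out


class RunningStats:
    """Welford's online mean and sample variance for a stream of samples."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # running sum of squared deviations from the mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self._m2 / (self.n - 1) if self.n > 1 else 0.0
```

In a batch setting, a measured series would typically pass through `fill_short_gaps` and then `median_filter`; in a streaming setting, `RunningStats.update` can be called as each sample arrives, so the full history never needs to be held in memory. Whether interpolation is appropriate at all depends on the sensor and the defect being studied, which is why long gaps are deliberately left unfilled here.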

More broadly, the data can be either random or deterministic. The classification of random data is shown in Figure 1.2, and that of deterministic data in Figure 1.3, as presented by Bendat (1998).

Figure 1.2 Classification of random data

(Bendat (1998). Reproduced with permission of John Wiley & Sons)

Figure 1.3 Classification of deterministic data

(Bendat (1998). Reproduced with permission of John Wiley & Sons)

Table 1.1 shows the general taxonomy of big data methods in railway engineering.

Table 1.1 Taxonomy of big data in railway engineering

| Analysis domain | Sources | Characteristics | Approaches | Comments |
|---|---|---|---|---|
| Structured data | Field data collection, sensors, data from scientific experiments | Structured records, real time | Data mining, statistical analysis | All infrastructure systems need field data |
| Unstructured data | Extreme events, sensors | Unstructured records, mixture of variables | Anomaly detection | Infrastructure inspection reports, specification updates |

Text analytics

Logs, email, corporate documents, government rules and regulations, text content of web pages, citizen feedback and comments

Unstructured, rich textual, context, semantic, language dependent