Dynamic System Reliability - Liudong Xing - E-Book

Dynamic System Reliability E-Book

Liudong Xing

Description

Offers timely and comprehensive coverage of dynamic system reliability theory.

This book focuses on hot issues of dynamic system reliability, systematically introducing the reliability modeling and analysis methods for systems with imperfect fault coverage, systems with functional dependence, systems subject to deterministic or probabilistic common-cause failures, systems subject to deterministic or probabilistic competing failures, and dynamic standby sparing systems. It presents recent developments of such extensions involving reliability modeling theory and reliability evaluation methods, and features numerous case studies based on real-world examples. The presented dynamic reliability theory can enable a more accurate representation of actual complex system behavior, thus more effectively guiding the reliable design of real-world critical systems.

Dynamic System Reliability: Modeling and Analysis of Dynamic and Dependent Behaviors begins by describing the evolution from the traditional static reliability theory to the dynamic system reliability theory, and provides a detailed investigation of dynamic and dependent behaviors in subsequent chapters. Although written for those with a background in basic probability theory and stochastic processes, the book includes a chapter reviewing the fundamentals that readers need in order to understand the other chapters, which cover advanced topics in reliability theory and case studies.

* The first book systematically focusing on dynamic system reliability modeling and analysis theory
* Provides a comprehensive treatment of imperfect fault coverage (single-level/multi-level or modular), functional dependence, common-cause failures (deterministic and probabilistic), competing failures (deterministic and probabilistic), and dynamic standby sparing
* Includes abundant illustrative examples and case studies based on real-world systems
* Covers recent advances in combinatorial models and algorithms for dynamic system reliability analysis
* Offers a rich set of references, providing helpful resources for readers to pursue further research and study of the topics

Dynamic System Reliability: Modeling and Analysis of Dynamic and Dependent Behaviors is an excellent book for undergraduate and graduate students, and engineers and researchers in reliability and related disciplines.


Page count: 418

Year of publication: 2019




Table of Contents

Cover

Foreword by Dr. Andre Kleyner, Series Editor

Preface

Nomenclature

1 Introduction

References

2 Fundamental Reliability Theory

2.1 Basic Probability Concepts

2.2 Reliability Measures

2.3 Fault Tree Modeling

2.4 Binary Decision Diagram

2.5 Markov Process

2.6 Reliability Software

References

3 Imperfect Fault Coverage

3.1 Different Types of IPC

3.2 ELC Modeling

3.3 Binary‐State System

3.4 Multi‐State System

3.5 Phased‐Mission System

3.6 Summary

References

4 Modular Imperfect Coverage

4.1 Modular Imperfect Coverage Model

4.2 Nonrepairable Hierarchical System

4.3 Repairable Hierarchical System

4.4 Summary

References

5 Functional Dependence

5.1 Logic OR Replacement Method

5.2 Combinatorial Algorithm

5.3 Case Study 1: Combined Trigger Event

5.4 Case Study 2: Shared Dependent Event

5.5 Case Study 3: Cascading FDEP

5.6 Case Study 4: Dual Event and Cascading FDEP

5.7 Summary

References

6 Deterministic Common‐Cause Failure

6.1 Explicit Method

6.2 Efficient Decomposition and Aggregation Approach

6.3 Decision Diagram–Based Aggregation Method

6.4 Universal Generating Function–Based Method

6.5 Summary

References

7 Probabilistic Common‐Cause Failure

7.1 Single‐Phase System

7.2 Multi‐Phase System

7.3 Impact of PCCF

7.4 Summary

References

8 Deterministic Competing Failure

8.1 Overview

8.2 PFGE Method

8.3 Single‐Phase System with Single FDEP Group

8.4 Single‐Phase System with Multiple FDEP Groups

8.5 Single‐Phase System with PFs Having Global and Selective Effects

8.6 Multi‐Phase System with Single FDEP Group

8.7 Multi‐Phase System with Multiple FDEP Groups

8.8 Summary

References

9 Probabilistic Competing Failure

9.1 Overview

9.2 System with Single Type of Component Local Failures

9.3 System with Multiple Types of Component Local Failures

9.4 System with Random Failure Propagation Time

9.5 Summary

References

10 Dynamic Standby Sparing

10.1 Types of Standby Systems

10.2 CTMC‐Based Method

10.3 Decision Diagram–Based Method

10.4 Approximation Method

10.5 Event Transition Method

10.6 Overview of Optimization Problems

10.7 Summary

References

Index

End User License Agreement

List of Tables

Chapter 2

Table 2.1 Properties of each communication line.

Table 2.2 SFT gate symbols and definitions.

Table 2.3 DFT gate symbols.

Chapter 3

Table 3.1 State space of multi‐state component j [22].

Table 3.2 Link state probabilities.

Table 3.3 Analysis results for the example network.

Table 3.4 Failure parameters (λ and μ are in 10⁻⁶ hr⁻¹) an...

Chapter 4

Table 4.1 Input failure parameters and coverage factors.

Table 4.2 Input repair rate parameters.

Table 4.3 Component state probabilities in CM_1.

Table 4.5 Component state probabilities in CM_3.

Chapter 5

Table 5.1 Input component failure parameters (/hr) and coverage factors.

Table 5.2 Occurrence probabilities of ITEs.

Table 5.3 Evaluation results of P(system fails | ITE_i) and Q.

Table 5.4 Final system unreliability.

Table 5.5 Input component failure parameters and coverage factors.

Table 5.6 Occurrence probabilities of ITEs.

Table 5.7 Evaluation results of P(system fails | ITE_i) and Q.

Table 5.8 Final system unreliability.

Table 5.9 Input component coverage factors.

Table 5.10 Occurrence probabilities of ITEs.

Table 5.11 Occurrence probabilities of ITEs for evaluating P(system fails | ITE_1).

Table 5.12 Evaluation results of P(system fails | ITE_{1,i}) and P(system fails | ITE_1)....

Table 5.13 Evaluation results of P(system fails | ITE_2).

Table 5.14 Final system unreliability.

Table 5.15 Input component coverage factors.

Table 5.16 Occurrence probabilities of ITEs.

Table 5.17 Occurrence probabilities of ITEs for evaluating P(system fails | ITE_1).

Table 5.18 Evaluation results of P(system fails | ITE_{1,i}) and P(system fails | ITE_1)....

Table 5.19 Final system unreliability.

Chapter 6

Table 6.1 CCE space for the example system.

Chapter 7

Table 7.1 Total component failure probability evaluation.

Table 7.2 Component total conditional failure probabilities.

Table 7.3 Unreliability of the example WSN PMS.

Chapter 8

Table 8.1 Unreliability of the example memory sub‐system.

Table 8.2 Failure rates of the example memory system components (/hr).

Table 8.3 Component failure events and failure rates.

Table 8.4 Event space for addressing PFSEs.

Table 8.5 Event space for addressing PFSEs.

Table 8.6 Failure parameters for components A, B, and C.

Table 8.7 Failure parameters for component E.

Table 8.8 Phase durations and component failure parameters.

Table 8.9 Phase durations and component failure parameters.

Chapter 9

Table 9.1 Component failure parameters.

Table 9.2 Isolation factor groups (IFGs) of the example WSN.

Table 9.3 PFDCs for the example WSN.

Table 9.4 Unreliability of the example WSN.

Table 9.5 Component failure parameters for the BSN system.

Table 9.6 Isolation factors of the example BSN.

Table 9.7 PFDCs for the example BSN.

Table 9.8 Unreliability results of the example BSN.

Table 9.9 Component time to failure distribution parameters (β, α).

Table 9.10 PFDCs.

Table 9.11 WSN unreliability for different β_{S_1 PT}.

Chapter 10

Table 10.1 Unreliability of the example cold standby processor subsystem.

Table 10.2 Unreliability of the example warm standby system.

Table 10.3 Results comparison for normal distribution.

Table 10.4 Component parameters of the example standby system.

List of Illustrations

Chapter 2

Figure 2.1 Examples of phased‐mission FTs (a) P(excellent) = 1 − P(TOP_excellent)...

Figure 2.2 An example of MFTs [34].

Figure 2.3 A non‐sink node in the BDD model.

Figure 2.4 An example FT illustrating ROBDD generation.

Figure 2.5 Final ROBDD for the example FT using A < B < C < D.

Figure 2.6 ROBDD for the example FT using A < D < B < C...

Figure 2.7 An example DFT illustrating the Markov analysis.

Figure 2.8 State transition diagram for the example DFT.

Chapter 3

Figure 3.1 Structure of IPCM for component i [7].

Figure 3.2 Inserting IPCM to a BDD path [18]. (a) Non‐sink node without IPCM; (...

Figure 3.3 BDD with a useless node i + 1 [19].

Figure 3.4 Insertion of useless node i + 1 and its IPCM [19].

Figure 3.5 FT of an example parallel system.

Figure 3.6 BDD for the example parallel system. (a) Without IPCM; (b) Expanded ...

Figure 3.7 Parallel system unreliability vs. coverage factor c.

Figure 3.8 FT of an example series system.

Figure 3.9 BDD for the example series system. (a) Without IPCM; (b) Expanded BD...

Figure 3.10 A non‐sink node in MMDD.

Figure 3.11 An example bridge network [22, 23].

Figure 3.12 MMDD for the example network: F_accept.

Figure 3.13 PDO for PMS.

Figure 3.14 A sub‐PMS BDD.

Figure 3.15 PMS FT model [25] (P(excellent) = 1 − P(TOP_excellent)).

Figure 3.16 PMS BDD generated using A_b < A_a < B_a < C_a < C_b < D_a < D_b < D_c.

Chapter 4

Figure 4.1 General structure of MIPCM [2].

Figure 4.2 Hierarchical FT solution to consider MIPCM [5].

Figure 4.3 An example of an HS [2].

Figure 4.4 FT at the top layer.

Figure 4.5 FT of CM_i at the middle layer.

Figure 4.6 FT of MM_{i,j} at the bottom layer.

Figure 4.7 CTMC for component A at layer i [2].

Figure 4.8 CTMC for MC, IC at the bottom layer.

Figure 4.9 CTMC for CPUC, PTC at the middle layer.

Chapter 5

Figure 5.1 An illustrative example. (a) Original DFT model; (b) DFT after the ...

Figure 5.2 DFT of an example memory system [17, 28].

Figure 5.3 Reduced FT models.

Figure 5.4 Example system with shared dependent events.

Figure 5.5 Reduced FT model.

Figure 5.6 CTMC for the example system (F: system failure).

Figure 5.7 An example system with cascading FDEP.

Figure 5.8 Reduced FT models.

Figure 5.9 Reduced FT models for evaluating P(system fails | ITE_1).

Figure 5.10 An example system containing cascading FDEP and dual events.

Figure 5.11 Reduced FT models.

Figure 5.12 Reduced FT models for evaluating P(system fails | ITE_1).

Figure 5.13 CTMC for the example system with dual events.

Chapter 6

Figure 6.1 An example FT.

Figure 6.2 Expanded FT considering CCFs.

Figure 6.3 BDD for the expanded FT.

Figure 6.4 BDD for evaluating UR_0.

Figure 6.5 Reduced FT for evaluating UR_1.

Figure 6.6 Root node for two s‐independent or s‐dependent CCs.

Figure 6.7 Root node for two mutually exclusive CCs.

Figure 6.8 FT model of an example computer system [18].

Figure 6.9 System DD for s‐independent or s‐dependent CCs.

Figure 6.10 System DD for two disjoint CCs.

Figure 6.11 Structure of an example series‐parallel system.

Chapter 7

Figure 7.1 General structure of the PCCF gate.

Figure 7.2 FT representing component X's total failure event [5].

Figure 7.3 An example of a computer system.

Figure 7.4 FT of the example computer system.

Figure 7.5 Expanded FT for the example computer system.

Figure 7.6 BDD model of the expanded FT.

Figure 7.7 BDD model for the example computer system.

Figure 7.8 FT of component total failure event in phase i [8].

Figure 7.9 An illustrative example of a WSN [8].

Figure 7.10 FT for the example WSN.

Figure 7.11 FT for the example WSN with PCCFs.

Figure 7.12 Expanded PMS FT.

Figure 7.13 PMS BDD for the example WSN system.

Figure 7.14 PMS BDD model for the example WSN system.

Figure 7.15 Unreliability of the example WSN PMS.

Chapter 8

Figure 8.1 An example of a computer system.

Figure 8.2 FT model of the example memory subsystem.

Figure 8.3 Reduced FT for P(system fails | E_1).

Figure 8.4 BDD model for evaluating Q(t).

Figure 8.5 Reduced FT for P(system fails | E_3).

Figure 8.6 FT of the example memory system.

Figure 8.7 Reduced FT for P(system failure | E_{1,0}).

Figure 8.8 BDD for P(system failure | E_{1,0}).

Figure 8.9 Reduced FT for P(system failure | E_{1,1}).

Figure 8.10 BDD for P(system failure | E_{1,1}).

Figure 8.11 Reduced FT for P(system failure | E_{1,2}).

Figure 8.12 BDD for evaluating P(system failure | E_{1,2}).

Figure 8.13 An example of a memory system [14].

Figure 8.14 FT of the example memory system.

Figure 8.15 Reduced FT for P(system fails | SE_0).

Figure 8.16 BDD for evaluating P(system fails | SE_0).

Figure 8.17 Reduced FT for P(system fails | SE_8).

Figure 8.18 An example PMS FT.

Figure 8.19 Reduced FT for evaluating Q_{1C}.

Figure 8.20 PMS BDD for Q_{1C}.

Figure 8.21 Reduced FT under event 1.

Figure 8.22 PMS BDD under event 1.

Figure 8.23 FT of an example three‐phase PMS.

Figure 8.24 CTMC for phase 1.

Figure 8.25 CTMC for phase 2.

Figure 8.26 State mapping from phase 1 to phase 2.

Figure 8.27 CTMC for phase 3.

Figure 8.28 State mapping from phase 2 to phase 3.

Figure 8.29 FT of the example PMS.

Figure 8.30 CTMC for phase 1.

Figure 8.31 CTMC for phase 2.

Figure 8.32 State mapping from phase 1 to phase 2.

Chapter 9

Figure 9.1 Structure of a PFD gate [2].

Figure 9.2 Example of a WSN for condition monitoring.

Figure 9.3 FT of the example WSN.

Figure 9.4 Reduced FT under FCE_1.

Figure 9.5 BDD for evaluating .

Figure 9.6 Reduced FT under PFDC_1.

Figure 9.7 BDD for evaluating .

Figure 9.8 Reduced FT under PFDC_2.

Figure 9.9 BDD for evaluating .

Figure 9.10 Reduced FT under PFDC_3.

Figure 9.11 BDD for evaluating .

Figure 9.12 s‐Relationships among PFGE, transmission LF and sensing LF.

Figure 9.13 An example of a BSN system for patient monitoring.

Figure 9.14 FT of the example BSN system.

Figure 9.15 FT under FCE_1.

Figure 9.16 BDD under FCE_1.

Figure 9.17 Reduced FT under PFDC_1.

Figure 9.18 Reduced FT model under PFDC_2.

Figure 9.19 Reduced FT model under PFDC_3.

Figure 9.20 An example of a smart home power generation system [9].

Figure 9.21 FT of the example WSN system.

Figure 9.22 FT under FCE_1.

Figure 9.23 FT under PFDC_1.

Figure 9.24 FT under PFDC_2.

Figure 9.25 FT under PFDC_3.

Chapter 10

Figure 10.1 DFT model of a cold standby system.

Figure 10.2 Markov model of the cold standby system with one spare.

Figure 10.3 DFT model of a warm standby system.

Figure 10.4 Markov model of the warm standby system with one spare.

Figure 10.5 An example of a cold standby system. (a) Original DFT; (b) FT after...

Figure 10.6 SBDD of the example cold standby system.

Figure 10.7 An example warm standby system. (a) Original DFT; (b) FT after repl...

Figure 10.8 SBDD for the example warm standby system.

Figure 10.9 The final SBDD of the hard disk system.

Figure 10.10 Unreliability comparison for exponential distribution.

Figure 10.11 Unreliability comparison for exponential distribution.

Figure 10.12 Unreliability of 1‐out‐of‐10 cold standby system with normal distr...

Figure 10.13 Performance level probabilities p_j(t).

Figure 10.14 Cumulative performance distributions P(G(t) ≥ x).


Wiley Series in Quality and Reliability Engineering

Dr. Andre Kleyner

Series Editor

The Wiley Series in Quality and Reliability Engineering aims to provide a solid educational foundation for both practitioners and researchers in the Q&R field and to expand the reader's knowledge base to include the latest developments in this field. The series will contribute to the teaching and practice of engineering.

The series coverage will contain, but is not exclusive to,

statistical methods;

physics of failure;

reliability modeling;

functional safety;

Six Sigma methods;

lead‐free electronics;

warranty analysis/management; and

risk and safety analysis.

Wiley Series in Quality and Reliability Engineering

Design for Safety

by Louis J. Gullo, Jack Dixon

February 2018

Next Generation HALT and HASS: Robust Design of Electronics and Systems

by Kirk A. Gray, John J. Paschkewitz

May 2016

Reliability and Risk Models: Setting Reliability Requirements, 2nd Edition

by Michael Todinov

September 2015

Applied Reliability Engineering and Risk Analysis: Probabilistic Models and Statistical Inference

by Ilia B. Frenkel, Alex Karagrigoriou, Anatoly Lisnianski, Andre V. Kleyner

September 2013

Design for Reliability

by Dev G. Raheja (Editor), Louis J. Gullo (Editor)

July 2012

Effective FMEAs: Achieving Safe, Reliable, and Economical Products and Processes using Failure Mode and Effects Analysis

by Carl Carlson

April 2012

Failure Analysis: A Practical Guide for Manufacturers of Electronic Components and Systems

by Marius Bazu, Titu Bajenescu

April 2011

Reliability Technology: Principles and Practice of Failure Prevention in Electronic Systems

by Norman Pascoe

April 2011

Improving Product Reliability: Strategies and Implementation

by Mark A. Levin, Ted T. Kalal

March 2003

Test Engineering: A Concise Guide to Cost‐Effective Design, Development and Manufacture

by Patrick O'Connor

April 2001

Integrated Circuit Failure Analysis: A Guide to Preparation Techniques

by Friedrich Beck

January 1998

Measurement and Calibration Requirements for Quality Assurance to ISO 9000

by Alan S. Morris

October 1997

Electronic Component Reliability: Fundamentals, Modeling, Evaluation, and Assurance

by Finn Jensen

November 1995

Dynamic System Reliability

Modeling and Analysis of Dynamic and Dependent Behaviors

Liudong Xing

University of Massachusetts Dartmouth, USA

 

Gregory Levitin

The Israel Electric Corporation

 

Chaonan Wang

Jinan University, China

Copyright

This edition first published 2019

© 2019 John Wiley & Sons Ltd

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.

The right of Liudong Xing, Gregory Levitin and Chaonan Wang to be identified as the authors of this work has been asserted in accordance with law.

Registered Offices

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

Editorial Office

The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, UK

For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.

Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.

Limit of Liability/Disclaimer of Warranty

While the publisher and authors have used their best efforts in preparing this work, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials or promotional statements for this work. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a specialist where appropriate. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

Library of Congress Cataloging‐in‐Publication Data applied for

Hardback: 9781119507635

Cover design: Wiley

Cover image: © sakkmesterke/Shutterstock

Foreword by Dr. Andre Kleyner, Series Editor

“Dynamic System Reliability: Modeling and Analysis of Dynamic and Dependent Behaviors”

by Xing, Levitin and Wang

The importance of quality and reliability to a system can hardly be disputed. Product failures in the field inevitably lead to losses in the form of repair costs, warranty claims, customer dissatisfaction, product recalls, loss of sales, and in extreme cases, loss of life.

Engineering systems are becoming more and more complex with added functions and capabilities. Modeling such complex systems, assessing their performance, and performing risk analysis and reliability prediction present an increasingly challenging task. Functional dependency, fault detection and coverage, common cause failures, redundancies, standby modes and other interactions among system components further complicate the modeling process, requiring new methods and approaches to address dynamic system reliability.

This book has been written by the leading experts in the field of dynamic reliability and multi‐state systems. It discusses many technical aspects of modeling the reliability of complex systems when the reliabilities of their components change with time due to various types of interactions and state changes.

This book will be a great addition to the Wiley Series in Quality and Reliability Engineering, which aims to provide a solid educational foundation for researchers and practitioners in the field of quality and reliability engineering and to expand the knowledge base by including the latest developments in these disciplines.

Despite its obvious importance, quality and reliability education is paradoxically lacking in today's engineering curriculum. Few engineering schools offer degree programs or even a sufficient variety of courses in quality or reliability methods. Therefore, the majority of quality and reliability practitioners receive their professional training from colleagues, professional seminars, publications and technical books. The lack of formal education opportunities in this field greatly emphasizes the importance of technical publications for professional development.

We hope that this book, as well as the whole series, will continue Wiley's tradition of excellence in technical publishing and provide a lasting and positive contribution to the teaching and practice of engineering.

Preface

Dynamic behavior and dependence are typical characteristics of modern engineering and computing systems and products. Specifically, system load, stress levels, redundancy levels, and other operating environment parameters can change with time, causing dynamics in the failure behavior of system components and in the reliability requirements of the entire system. In addition, system components may have significant dependencies or correlations in time or function during the mission process. Modeling the effects of these dynamic and dependent behaviors is crucial for accurate system reliability modeling and analysis, and for subsequent design optimization and maintenance activities.

Traditional system reliability models can define only the static logical structure of a system, but not the dynamic and dependent behaviors of the system and its components. Thus, reliability analysis results obtained using the traditional reliability models often deviate from the actual system reliability performance significantly, misleading system design, operation, and maintenance efforts. Therefore, the traditional reliability theory must be extended and enhanced for addressing the dependent and dynamic behaviors. This book presents recent developments of such extensions involving dynamic system reliability modeling theory, reliability evaluation methods, and case studies based on real‐world examples.

The topic of the book “Dynamic System Reliability” has gained increasing attention in the reliability and safety community in the past few decades. Research articles on this subject are continuously being published in peer‐reviewed journals and conference proceedings. However, to the best of the authors' knowledge, the subject has never been adequately or systematically included in any reliability book. Therefore, there is a great need for such a book covering recent developments on the dynamic system reliability modeling and analysis techniques. With an increased and sustained interest in this subject, it is the right time to publish this book.

This book particularly focuses on hot issues of dynamic system reliability, systematically introducing the reliability modeling and analysis methods for systems with imperfect fault coverage, systems with functional dependence, systems subject to deterministic or probabilistic common‐cause failures, systems subject to deterministic or probabilistic competing failures, and dynamic standby sparing systems.

In the Introduction, the book describes the evolution from the traditional static reliability theory to the dynamic system reliability theory, and provides an overview description of dynamic and dependent behaviors addressed in the subsequent chapters of the book.

In Chapter 2, the book reviews basic probability and reliability concepts, various reliability measures, different types of fault trees, fundamentals of binary decision diagrams (a combinatorial model for system reliability analysis), and Markov processes. Some reliability analysis software tools are also introduced.

Chapter 3 introduces an inherent behavior of fault‐tolerant systems called imperfect fault coverage. Just like any system component, the recovery mechanism of a system is rarely perfect; it can fail such that the system cannot adequately detect, locate, isolate, or recover from a fault occurring in the system. The uncovered component fault may propagate through the system, causing extensive damage to the system. Reliability models and evaluation methods for addressing imperfect fault coverage in binary‐state systems, multi‐state systems, and phased‐mission systems are discussed in this chapter.
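To make the effect of imperfect coverage concrete, the following is a minimal sketch (not taken from the book) for a parallel system under element‐level coverage. It assumes that any single uncovered failure brings the whole system down, that components fail independently, and that each component has a hypothetical failure probability `q[i]` and coverage factor `c[i]`. The function name and the numeric inputs are illustrative only.

```python
from math import prod

def unreliability_parallel_elc(q, c):
    """Unreliability of a parallel system under element-level imperfect coverage.

    q[i]: failure probability of component i (hypothetical input)
    c[i]: coverage factor, i.e. the probability that a failure of i is covered
    Assumption: one uncovered failure of any component fails the system.
    """
    # Probability that component i suffers an uncovered failure.
    u = [qi * (1.0 - ci) for qi, ci in zip(q, c)]
    # Probability that no component fails uncovered.
    p_no_uncovered = prod(1.0 - ui for ui in u)
    # Failure probability of each component, conditioned on no uncovered failure.
    q_cond = [qi * ci / (1.0 - ui) for qi, ci, ui in zip(q, c, u)]
    # With only covered failures, a parallel system fails if all components fail.
    ur_covered_only = prod(q_cond)
    return 1.0 - p_no_uncovered * (1.0 - ur_covered_only)

# Two identical components, q = 0.1, with and without perfect coverage.
print(unreliability_parallel_elc([0.1, 0.1], [0.9, 0.9]))
print(unreliability_parallel_elc([0.1, 0.1], [1.0, 1.0]))
```

With perfect coverage the result reduces to the familiar product of the component failure probabilities; lowering the coverage factor raises the unreliability even though the redundancy is unchanged.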

Chapter 4 discusses an extension of the traditional imperfect fault coverage concept to the modular imperfect fault coverage for systems with hierarchical structures. Due to the layered recovery of hierarchical systems, the extent of the damage from an uncovered component fault may exhibit multiple levels. This chapter introduces the modeling of such a modular imperfect fault coverage behavior as well as methods for considering the behavior in the reliability analysis of nonrepairable and repairable hierarchical systems.

Chapter 5 focuses on the functional dependence (Functional DEPendence, FDEP) behavior of complex systems, where the failure of one component (or in general the occurrence of a certain trigger event) causes other components (referred to as dependent components) within the same system to become unusable or inaccessible. The OR‐gate replacement method is discussed for systems with perfect fault coverage. The combinatorial algorithm is discussed for systems with imperfect fault coverage. Case studies involving combined trigger events, cascading effects, dual‐role events, and shared dependent events are also presented in this chapter.
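As a rough illustration of the OR‐gate replacement idea under perfect fault coverage, the sketch below treats each dependent component as unusable when either the trigger fails or the component itself fails. The system is a hypothetical parallel pair A and B that both depend on a trigger T; the names, the structure function, and the probabilities are assumptions made for this example, and the evaluation is a brute‐force enumeration rather than any algorithm from the book.

```python
from itertools import product

def system_fails(t, a, b):
    """Structure function after the OR replacement (True means failed)."""
    a_unusable = t or a   # A is lost if the trigger T fails or A itself fails
    b_unusable = t or b   # same replacement for B
    return a_unusable and b_unusable   # parallel pair fails only if both are lost

def unreliability(p_t, p_a, p_b):
    """Sum the probabilities of all failure combinations that fail the system."""
    total = 0.0
    for t, a, b in product([False, True], repeat=3):
        weight = ((p_t if t else 1 - p_t)
                  * (p_a if a else 1 - p_a)
                  * (p_b if b else 1 - p_b))
        if system_fails(t, a, b):
            total += weight
    return total

print(unreliability(0.01, 0.1, 0.2))
```

For this small structure the result agrees with the closed form p_T + (1 − p_T) · p_A · p_B, which shows how the shared trigger removes much of the benefit of the redundancy.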

Chapter 6 focuses on the reliability modeling of traditional deterministic common‐cause failures, where the occurrence of a root cause results in deterministic failures of multiple system components simultaneously or in a short time interval. Methods based on Decomposition and Aggregation, Decision Diagrams, and Universal Generating Functions are discussed.
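The decomposition and aggregation idea can be sketched with the total probability law: enumerate the common‐cause events, evaluate the system conditioned on each one, and sum the weighted results. The example below is a simplified illustration, not the book's algorithm. It assumes statistically independent common causes, a hypothetical parallel system, and made‐up probabilities.

```python
from itertools import product

def parallel_unreliability(q):
    """Failure probability of a parallel system with independent components."""
    result = 1.0
    for qi in q.values():
        result *= qi
    return result

def unreliability_with_ccf(q, causes):
    """Total-probability evaluation over the common-cause event space.

    q: dict mapping component name -> failure probability from independent causes
    causes: list of (occurrence probability, set of components failed by the cause)
    Assumption: the common causes occur independently of each other.
    """
    total = 0.0
    # Each common-cause event is one combination of causes occurring or not.
    for occurred in product([False, True], repeat=len(causes)):
        p_event = 1.0
        failed = set()
        for happens, (p_cause, affected) in zip(occurred, causes):
            p_event *= p_cause if happens else 1.0 - p_cause
            if happens:
                failed |= affected
        # Components hit by an occurring cause fail with certainty.
        q_given_event = {name: (1.0 if name in failed else qi)
                         for name, qi in q.items()}
        total += p_event * parallel_unreliability(q_given_event)
    return total

q = {"A": 0.1, "B": 0.2}
print(unreliability_with_ccf(q, [(0.05, {"A", "B"})]))
print(unreliability_with_ccf(q, [(0.05, {"A"}), (0.02, {"B"})]))
```

The event space grows exponentially with the number of common causes, which is one reason more efficient aggregation schemes are of interest.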

Chapter 7 discusses the extension of the traditional common‐cause failures to the probabilistic common‐cause failures, where the occurrence of a root cause results in failures of multiple system components with different probabilities. Both explicit and implicit methods are discussed for single‐phase and multi‐phase systems.

Chapter 8 presents the deterministic competing failure behavior in systems with the FDEP. This behavior is concerned with competitions in the time domain between the failure isolation and failure propagation effects, causing distinct system statuses. Reliability modeling of the deterministic competing effects is discussed for different types of systems, including single‐phase systems with a single FDEP group, single‐phase systems with multiple FDEP groups, single‐phase systems with both global and selective effects, multi‐phase systems with a single FDEP group, and multi‐phase systems with multiple FDEP groups.
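The competition in the time domain can be illustrated with a small calculation. The sketch below assumes, purely for illustration, that the trigger failure and the propagated failure of a dependent component have independent exponential times with hypothetical rates, and asks which of the two occurs first within the mission time. If the trigger fails first, the dependent component is taken to be isolated; if the propagated failure occurs first, it is taken to spread. This is a simplified reading of the behavior and not the PFGE method itself.

```python
from math import exp

def isolation_wins(rate_trigger, rate_pf, t):
    """P(the trigger fails first and does so within mission time t)."""
    total = rate_trigger + rate_pf
    return rate_trigger / total * (1.0 - exp(-total * t))

def propagation_wins(rate_trigger, rate_pf, t):
    """P(the propagated failure occurs first, within mission time t)."""
    total = rate_trigger + rate_pf
    return rate_pf / total * (1.0 - exp(-total * t))

def neither_occurs(rate_trigger, rate_pf, t):
    """P(neither event has occurred by time t)."""
    return exp(-(rate_trigger + rate_pf) * t)

# Hypothetical rates per hour and a 1000-hour mission.
lam_t, lam_pf, mission = 1e-4, 2e-4, 1000.0
print(isolation_wins(lam_t, lam_pf, mission))
print(propagation_wins(lam_t, lam_pf, mission))
```

The three probabilities sum to one, and the ratio of the first two equals the ratio of the rates, so a faster trigger makes isolation more likely to win the race.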

Chapter 9 focuses on probabilistic competing failures, which extend the deterministic competing failure behavior by considering probabilistic or uncertain failure isolation effects (commonly found in systems involving relayed wireless communications). Systems with a single type of local component failures, multiple different types of local component failures, and random propagation times are modeled and illustrated with real‐world examples from wireless sensor networks, body sensor systems, and smart homes.

Chapter 10 presents diverse methods for the reliability analysis of standby sparing systems, including the traditional Markov‐based method, the decision diagram–based method, the approximation method based on the central limit theorem, and the recently developed event transition method.
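For the simplest Markov‐based case, closed forms are available. The sketch below assumes one primary unit and one spare, exponential failure times, perfect switching, and no repair; the rate names `lam` (active failure rate) and `lam_standby` (failure rate of the spare while in standby) are hypothetical. It solves the two‐state continuous‐time Markov chain by hand rather than using any tool from the book.

```python
from math import exp

def cold_standby_reliability(lam, t):
    """One primary and one cold spare: the spare cannot fail while waiting."""
    return exp(-lam * t) * (1.0 + lam * t)

def warm_standby_reliability(lam, lam_standby, t):
    """One primary and one warm spare with standby failure rate lam_standby > 0."""
    # State 0: both units good; it is left at rate lam + lam_standby.
    both_good = exp(-(lam + lam_standby) * t)
    # State 1: exactly one unit left, operating and failing at rate lam.
    one_left = (lam + lam_standby) / lam_standby * (exp(-lam * t) - both_good)
    return both_good + one_left

lam, mission = 1e-3, 1000.0
print(cold_standby_reliability(lam, mission))
print(warm_standby_reliability(lam, 0.5e-3, mission))
print(warm_standby_reliability(lam, lam, mission))  # same as a hot (active) spare
```

Setting the standby rate equal to the active rate recovers the ordinary two‐unit parallel system, and letting it approach zero approaches the cold standby result, so the warm case sits between the two.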

The book has the following distinct features:

It is the first book systematically focusing on dynamic system reliability modeling and analysis theory.

It provides a comprehensive treatment of imperfect fault coverage (single‐level or multi‐level/modular), functional dependence, common‐cause failures (deterministic or probabilistic), competing failures (deterministic or probabilistic), and dynamic standby sparing.

It includes abundant illustrative examples and case studies based on real‐world systems.

It covers recent advances in combinatorial models and algorithms for dynamic system reliability analysis.

It has a rich set of references, providing helpful resources for readers to pursue further study and research of the subjects.

The book is intended for undergraduate and graduate students, engineers, and researchers in reliability and related disciplines. Readers should have a background in basic probability theory and stochastic processes. However, the book includes a chapter reviewing the fundamentals that readers need in order to understand the other chapters, which cover advanced topics in reliability theory and case studies. The book can provide readers with knowledge of and insights into complex system reliability behaviors, as well as skills for modeling and analyzing these behaviors to guide the reliability design of real‐world systems.

We would like to extend our sincere gratitude and appreciation to researchers who have developed some underlying concepts and models of this book, or have co‐authored with us on some subjects of the book, to name a few, Professor Joanne Bechta Dugan and Professor Barry W. Johnson from the University of Virginia, Professor Kishor S. Trivedi from Duke University, Dr. Suprasad V. Amari from BAE Systems, USA, Dr. Akhilesh Shrestha from Autoliv Inc., USA, Dr. Ola Tannous from Illinois Institute of Technology, USA, Dr. Prashanthi Boddu from Global Prior Art Inc., USA, Dr. Yujie Wang from the University of Electronic Science and Technology of China, Ms. Guilin Zhao from the University of Massachusetts Dartmouth, USA, Professor Yuchang Mo from Huaqiao University, China, and Professor Rui Peng from the University of Science and Technology Beijing, China. There are many other researchers to mention. We have tried to recognize their contributions in the bibliographical references of this book.

Finally, it has been our great pleasure to work with the editorial staff from Wiley, who have assisted in the publication of this book; their efforts and support are greatly appreciated.

June 8, 2018

Liudong Xing

Gregory Levitin

Chaonan Wang

Nomenclature

ACP: Application Communication Phase
BDD: Binary Decision Diagram
BEM: BDD Expansion Method
BSN: Body Sensor Network
CC: Common Cause
CCE: Common‐Cause Event
CCF: Common‐Cause Failure
CCG: Common‐Cause Group
cdf: cumulative distribution function
CLT: Central Limit Theorem
CM: Computing Module
CPR: Combinatorial Phase Requirement
CPUC: CPU Chip
CSP: Cold SPare
CTE: Combined Trigger Event
CTMC: Continuous Time Markov Chain
DC: Dependent Component
DD: Decision Diagram
DFT: Dynamic Fault Tree
EDA: Efficient Decomposition and Aggregation
ELC: Element Level Coverage
EMB: External Memory Block
FCE: Failure Competition Event
FDEP: Functional DEPendence
FDG: Functional Dependence Group
FLC: Fault Level Coverage
FT: Fault Tree
FTS: Fault Tolerant System
HS: Hierarchical System
HSP: Hot SPare
IC: Interface Chip
ICP: Infrastructure Communication Phase
IFG: Isolation Factor Group
i.i.d.: independent and identically distributed
IoT: Internet of Things
IPC: ImPerfect Coverage
IPCM: IPC Model
ite: if‐then‐else
ITE: Independent Trigger Event
LF: Local Failure
MC: Memory Chip
MFT: Multi‐state Fault Tree
MIPCM: Modular IPCM
MIU: Memory Interface Unit
MM: Memory Module
MMDD: Multi‐state Multi‐valued Decision Diagram
MRL: Mean Residual Life
MSS: Multi‐State System
MTBF: Mean Time Between Failures
MTTF: Mean Time To Failure
MTTR: Mean Time To Repair
NDC: NonDependent Component
OBDD: Ordered BDD
PAND: Priority AND
PCCE: Probabilistic Common‐Cause Event
PCCF: Probabilistic Common‐Cause Failure
PCCG: Probabilistic Common‐Cause Group
PDC: Performance Dependent Coverage
PDEP: Probabilistic‐DEPendent
pdf: probability density function
PDO: Phase Dependent Operation
PF: Propagated Failure
PFD: Probabilistic Functional Dependence
PFDC: Probabilistic Functional Dependence Case
PFGE: Propagated Failure with Global Effect
PFSE: Propagated Failure with Selective Effect
pmf: probability mass function
PMS: Phased‐Mission System
PTC: PorT Chip
RAP: Redundancy Allocation Problem
ROBDD: Reduced OBDD
r.v.: random variable
SBDD: Sequential BDD
SEA: Simple and Efficient Algorithm
SEQ: SEQuence enforcing
SESP: Standby Element Sequencing Problem
SFT: Static Fault Tree
ttf: time to failure
UF: Uncovered Failure
u‐function: universal generating function
WSN: Wireless Sensor Network
WSP: Warm SPare

1 Introduction

The advances and interdisciplinary integration of science and technology are making modern engineering and computing systems more and more complex. For modern systems (especially those in, for example, wireless sensor networks, the Internet of Things (IoT), smart power systems, space exploration, and cloud computing), dynamic behavior and dependence are typical characteristics. System load, operating conditions, stress levels, redundancy levels, and other operating environment parameters vary with time, causing dynamic failure behavior of the system components as well as dynamic system reliability requirements. In addition, components of these systems often have significant interactions or dependencies in time or function. Effects of these dynamic and dependent behaviors must be addressed for accurate system reliability modeling and analysis, which is crucial for verifying whether a system satisfies desired reliability requirements and for determining optimal design and operation policies that balance system parameters such as cost and reliability. As a result, reliability modeling and analysis of modern dynamic systems have become more challenging than ever.

Traditional reliability modeling methods, such as reliability block diagrams [1] and fault tree analysis [2], can define the static logical structure of the system, but they cannot describe dynamic state transitions of the system or component fault dependencies and propagations. It is difficult or impossible to accurately reflect the actual behavior of modern complex fault‐tolerant systems using the traditional reliability models. In other words, failing to address the effects of dynamic behavior and dependencies of modern systems leaves the results obtained from the traditional reliability models far from the actual system reliability performance, misleading the system design, operation, and maintenance efforts.

Different from the traditional static reliability modeling, the dynamic reliability theory considers that a system failure depends not only on the static logical combination of basic component failure events, but also on the timing of the occurrence of the events, correlations or interrelationship of the events, and impacts of operating environments. Therefore, the dynamic system reliability theory can provide a more accurate representation of actual complex system behavior, more effectively guiding the reliable design of real‐world critical systems. The dynamic system reliability theory is the evolution and improvement of the traditional reliability modeling theory, and its research will promote the development and application of complex systems engineering.

This book focuses on dynamic reliability modeling of fault‐tolerant systems with imperfect fault coverage, functional dependence, deterministic or probabilistic common‐cause failures, deterministic or probabilistic competing failures, as well as standby sparing.

Specifically, imperfect fault coverage is an inherent behavior of fault‐tolerant systems designed with redundancies and automatic system recovery or reconfiguration mechanisms [3–5]. Just like any system component, the system recovery mechanisms involving fault detection, fault location, fault isolation, and fault recovery will likely not be perfect; they can fail such that the system cannot adequately detect, locate, isolate, or recover from a fault occurring in the system. The uncovered component fault may propagate through the system, causing extensive damage to the system and sometimes failure of the entire system. Further, it is observed that the extent of the damage from an uncovered component fault occurring in a hierarchical system may exhibit multiple levels due to the layered recovery [6]. The traditional imperfect fault coverage concept has been extended to the modular imperfect fault coverage to model multiple levels of uncovered failure modes for components in hierarchical systems [7].
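The effect of an uncovered fault can be illustrated with a small numerical sketch. The three-state component model, the constant coverage factor c, and the function name below are assumptions introduced only for illustration; they are not taken from the book. Each of n identical parallel components is operating, failed and covered, or failed and uncovered, and a single uncovered failure is assumed to bring down the whole system.

```python
from itertools import product

def parallel_unreliability(q, c, n=2):
    """Unreliability of n identical parallel components with coverage factor c.

    Assumed per-component states (illustrative only):
      'ok'        with probability 1 - q
      'covered'   failure with probability q * c
      'uncovered' failure with probability q * (1 - c)
    The system works only if no component fails uncovered and at least
    one component is still operating.
    """
    state_prob = {"ok": 1 - q, "covered": q * c, "uncovered": q * (1 - c)}
    reliability = 0.0
    for states in product(state_prob, repeat=n):
        if "uncovered" in states or "ok" not in states:
            continue  # system is down in this state combination
        p = 1.0
        for s in states:
            p *= state_prob[s]
        reliability += p
    return 1 - reliability

perfect = parallel_unreliability(q=0.1, c=1.0)    # perfect coverage
imperfect = parallel_unreliability(q=0.1, c=0.9)  # 10% of faults uncovered
print(perfect, imperfect)
```

With these hypothetical numbers, lowering the coverage factor from 1.0 to 0.9 raises the system unreliability from about 0.01 to about 0.028, so the uncovered failures dominate even though redundancy is present.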

Functional dependence occurs in systems where the failure of one component (or, in general, the occurrence of a certain trigger event) causes other components (referred to as dependent components) within the same system to become unusable or inaccessible. A classic example is a computer network where computers can access the Internet through routers [8]. If the router fails, all computers connected to the router become inaccessible. It is said that these computers have functional dependence on the router.

In the case of systems with perfect fault coverage, the functional dependence behavior can be addressed as a logic OR relationship [9]. However, for systems with imperfect fault coverage, the logic OR replacement method can lead to overestimation of system unreliability because it allows the disconnected dependent components (in the case of the trigger event occurring) to contribute to the system uncovered failure probability if they can fail uncovered. In reality, because these dependent components have been disconnected or isolated, they cannot generate a propagation effect or bring the system down [10]. New algorithms are required for addressing the coupled functional dependence and imperfect fault coverage behavior.
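For the perfect‐coverage case, the logic OR replacement can be sketched on the router example. The sketch assumes one router and two computers that fail independently, and it treats the system as failed when both computers are inaccessible; the failure criterion, the probabilities, and the function name are hypothetical choices made only for illustration.

```python
from itertools import product

def both_computers_inaccessible(p_router, p_computer):
    """P(both computers are unreachable), assuming perfect fault coverage."""
    total = 0.0
    for router_up, c1_up, c2_up in product([True, False], repeat=3):
        # Probability of this combination of independent component states
        p = (1 - p_router) if router_up else p_router
        p *= (1 - p_computer) if c1_up else p_computer
        p *= (1 - p_computer) if c2_up else p_computer
        # OR replacement: a computer is inaccessible if it failed itself
        # OR the router (the trigger) failed
        c1_down = (not c1_up) or (not router_up)
        c2_down = (not c2_up) or (not router_up)
        if c1_down and c2_down:
            total += p
    return total

print(both_computers_inaccessible(p_router=0.1, p_computer=0.2))
```

The enumeration agrees with the closed form p_router + (1 - p_router) * p_computer**2. The sketch deliberately omits uncovered failures, which is exactly the case where the text notes that the OR replacement overestimates unreliability.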

In addition to the imperfect fault coverage, common‐cause failures are another class of behavior that can contribute significantly to the overall system unreliability [11–13]. Common‐cause failures are defined as “A subset of dependent events in which two or more component fault states exist at the same time, or in a short time interval, and are direct results of a shared cause” [11]. Most of the traditional common‐cause failure models assumed the deterministic failure of the multiple components affected by the shared root cause. Recent studies extended the concept to model probabilistic common‐cause failures, where the occurrence of a root cause results in failures of multiple system components with different probabilities [14–16].
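The difference between deterministic and probabilistic common‐cause failures can be sketched with the total probability law. The sketch assumes a parallel system with a single root cause, and it assumes that component failures are independent once we know whether the cause occurred. These assumptions, the function name, and all numbers are illustrative and are not the book's algorithm.

```python
import math

def parallel_unreliability_pccf(p_cause, p_indep, q_given_cause):
    """Unreliability of a parallel system subject to one probabilistic common cause.

    p_cause        probability that the root cause occurs
    p_indep[i]     probability that component i fails on its own
    q_given_cause[i] probability that the cause, if it occurs, fails component i
    """
    # Total failure probability of each component given that the cause occurred
    f_given_cause = [1 - (1 - p) * (1 - q)
                     for p, q in zip(p_indep, q_given_cause)]
    u_without_cause = math.prod(p_indep)       # all components fail on their own
    u_with_cause = math.prod(f_given_cause)    # all fail, cause present
    # Total probability over the two cases: cause absent, cause present
    return (1 - p_cause) * u_without_cause + p_cause * u_with_cause

probabilistic = parallel_unreliability_pccf(0.05, [0.1, 0.2], [0.5, 0.8])
deterministic = parallel_unreliability_pccf(0.05, [0.1, 0.2], [1.0, 1.0])
print(probabilistic, deterministic)
```

Setting every conditional probability to 1 recovers the deterministic case, in which the occurrence of the cause alone fails the system. With the hypothetical inputs above, the two results are about 0.0421 and 0.069.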

As one type of common‐cause failures, a propagated failure with global effect (PFGE) originating from a system component can cause the failure of the entire system [17]. Such a failure can occur due to the imperfect fault coverage or destructive effect of a component failure on other system components (like overheating, explosion, etc.). However, PFGE may not always cause the overall system failure in systems with functional dependence behavior. Specifically, if the trigger event occurs before PFGEs of all the dependent components, these PFGEs can be isolated deterministically and thus cannot affect other parts of the system. On the other hand, if PFGE of any dependent component occurs before the trigger event, the failure propagation effect takes place, crashing the entire system. Therefore, there exist competitions in the time domain between the failure isolation and failure propagation effects, causing distinct system statuses [18,19].
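The time‐domain competition can be made concrete for the simplest case of one trigger and one dependent component. The sketch assumes that the time to the trigger event and the time to the PFGE are independent and exponentially distributed; the rates, the mission time, and the function name are hypothetical. It returns the probability that isolation wins, that propagation wins, or that neither event occurs by mission time t.

```python
import math

def competing_outcomes(rate_trigger, rate_pfge, t):
    """Outcome probabilities for one trigger competing with one PFGE by time t.

    Assumes independent exponential times (an illustrative assumption).
    """
    total_rate = rate_trigger + rate_pfge
    # Probability that at least one of the two events occurs by time t
    any_event = 1 - math.exp(-total_rate * t)
    return {
        # Trigger occurs first: the later PFGE is isolated
        "isolated": rate_trigger / total_rate * any_event,
        # PFGE occurs first: the failure propagates and the system crashes
        "propagated": rate_pfge / total_rate * any_event,
        # Neither event has occurred by time t
        "neither": 1 - any_event,
    }

outcomes = competing_outcomes(rate_trigger=0.001, rate_pfge=0.002, t=1000.0)
print(outcomes)
```

The three probabilities sum to 1, and the split between isolation and propagation depends only on the ratio of the two rates. The sketch covers only which event occurs first. Whether the system survives after isolation still depends on the rest of the system structure.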

The pioneering works on addressing such competing failures in systems with functional dependence have focused on deterministic competing failures, where the occurrence of the trigger event, as long as it happens first, causes a deterministic (certain) isolation effect on any failures originating from the corresponding dependent components. Recent studies [20,21] have revealed that in some real‐world systems, e.g. systems involving relayed wireless communications, the failure isolation effect can be probabilistic or uncertain. Consider a specific example of a relay‐assisted wireless sensor network where some sensors preferably deliver their sensed information to the sink device through a relay node due to wireless signal attenuation. These sensors have functional dependence on the relay node. However, unlike in the deterministic competing failure case, when the relay fails, each sensor is not necessarily isolated because it may increase its transmission power to connect wirelessly to the sink device, with a probability that depends on the percentage of its remaining energy. A sensor is isolated only when its remaining energy is not sufficient to enable direct transmission to the sink device. Similarly, there exist time‐domain competitions between the probabilistic failure isolation effect and the failure propagation effect that can lead to dramatically different system statuses. The modeling of such probabilistic competing failures is naturally more complicated than modeling the deterministic competing failure behavior.

Another common dynamic behavior of modern systems, especially life‐critical or mission‐critical systems requiring fault tolerance and a high level of system reliability, is standby sparing. In standby sparing systems, one or several units are online and operating while some redundant units serve as standby spares, which are activated to resume the system mission when an online unit malfunctions [3]. Components in standby sparing systems often exhibit dynamic failure behaviors; they have different failure rates before and after they are activated to replace the failed online component [22–26].
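The effect of different failure rates before and after activation can be sketched for one online unit backed by one spare. The sketch assumes a nonrepairable system, exponential failure times, and a perfect switching mechanism; these assumptions, the function name, and the numbers are illustrative only. A standby rate of zero corresponds to a cold spare, a rate equal to the active rate corresponds to a hot spare, and values in between correspond to a warm spare.

```python
import math

def one_spare_reliability(rate_active, rate_standby, t):
    """Reliability at time t of one online unit plus one standby spare.

    Assumes exponential lifetimes and perfect switching (illustrative).
    rate_standby is the spare's failure rate while it waits in standby.
    """
    r_primary = math.exp(-rate_active * t)  # online unit survives the mission
    if rate_standby == 0:
        # Cold spare: it cannot fail before it is activated
        takeover = rate_active * t * r_primary
    else:
        # Primary fails at some time x, the spare survives standby until x,
        # then survives in active mode for the remaining t - x
        takeover = ((rate_active / rate_standby) * r_primary
                    * (1 - math.exp(-rate_standby * t)))
    return r_primary + takeover

lam, t = 0.001, 1000.0
cold = one_spare_reliability(lam, 0.0, t)
warm = one_spare_reliability(lam, 0.0002, t)
hot = one_spare_reliability(lam, lam, t)
print(cold, warm, hot)
```

Under these assumptions the cold spare gives the highest mission reliability and the hot spare the lowest, with the warm spare in between. The comparison ignores practical factors such as start‐up delay and switching failures.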

The dynamic behaviors described above abound in real‐world systems, as detailed in case studies in subsequent chapters. These behaviors not only affect the system structure function significantly but also complicate system reliability modeling and analysis. Ignoring the dynamics and dependence of failures, or simply performing system reliability analysis under the assumption that components behave independently of each other, often leads to excessive errors and even wrong conclusions. The following chapters present models and methods to address effects of the dynamic and dependent behaviors for different types of systems, covering binary‐state and multi‐state systems as well as single‐phase and multi‐phase systems.

The traditional reliability models are mostly applicable to binary‐state systems in which both the system and its components assume two and only two states (operation and failure). However, many practical systems are multi‐state systems [27–30], such as those involving imperfect fault coverage, standby sparing, multiple failure modes [31], work sharing [32], load sharing [33], performance sharing [34,35], performance degradation, and limited repair resources [36]. In these systems, both the system and its components can exhibit multiple states or performance levels varying from perfect function to complete failure. The nonbinary property and dependencies among different states of the same component must be addressed in modeling a multi‐state system.

In addition to addressing effects of the dynamic behavior for reliability modeling and analysis of multi‐state systems, this book considers multi‐phase systems, also known as phased‐mission systems. Traditional system reliability models generally assume that a system under study performs a single‐phase mission, during which the system does not change its task and configuration [37]. Due to an increased use of automation in diverse industries such as airborne weapon systems, aerospace, nuclear power, and communication networks, phased‐mission systems have become a more appropriate and accurate model for many reliability problems since the 1970s [38,39]. These systems perform a mission that involves multiple and consecutive phases with possibly different durations. During each phase, the system has to accomplish a specified and often different task. In addition, the system can be subject to different stress levels, environmental conditions, and reliability requirements. Thus, the system configuration, success criteria (structure function), and component behavior may vary from phase to phase [13,40]. These dynamics as well as statistical dependence across different phases for a given component make reliability modeling and analysis of multi‐phase systems more difficult than single‐phase systems.

In summary, dynamic reliability models and methods are presented in this book to address effects of single‐level or multi‐level (modular) imperfect fault coverage, functional dependence, deterministic or probabilistic common‐cause failures, deterministic or probabilistic competing failures, standby sparing, multi‐state, and multi‐phase behaviors.

References

1 Rausand, M. and Hoyland, A. (2003). System Reliability Theory: Models, Statistical Methods, and Applications, 2e. Wiley Inter‐Science.

2 Dugan, J.B. and Doyle, S.A. (1996). New Results in Fault‐Tree Analysis. In: Tutorial Notes of Annual Reliability and Maintainability Symposium, Las Vegas, Nevada, USA.

3 Johnson, B.W. (1989). Design and Analysis of Fault Tolerant Digital Systems. Addison‐Wesley.

4 Arnold, T.F. (1973). The concept of coverage and its effect on the reliability model of a repairable system. IEEE Transactions on Computers C‐22: 325–339.

5 Dugan, J.B. (1989). Fault trees and imperfect coverage. IEEE Transactions on Reliability 38 (2): 177–185.

6 Xing, L. and Dugan, J.B. (2001). Dependability analysis of hierarchical systems with modular imperfect coverage. In: Proceedings of The 19th International System Safety Conference, 347–356. Huntsville, AL.

7 Xing, L. (2005). Reliability modeling and analysis of complex hierarchical systems. International Journal of Reliability, Quality and Safety Engineering 12 (6): 477–492.

8 Xing, L., Levitin, G., Wang, C., and Dai, Y. (2013). Reliability of systems subject to failures with dependent propagation effect. IEEE Transactions on Systems, Man, and Cybernetics: Systems 43 (2): 277–290.

9 Merle, G., Roussel, J.M., and Lesage, J.J. (2010). Improving the Efficiency of Dynamic Fault Tree Analysis by Considering Gates FDEP as Static. In: Proceeding of European Safety and Reliability Conference, Rhodes, Greece.

10 Xing, L., Morrissette, B.A., and Dugan, J.B. (2014). Combinatorial reliability analysis of imperfect coverage systems subject to functional dependence. IEEE Transactions on Reliability 63 (1): 367–382.

11 NUREG/CR‐4780. (1988). Procedure for Treating Common‐Cause Failures in Safety and Reliability Studies. U.S. Nuclear Regulatory Commission; vol. I and II, Washington DC, USA.

12 Fleming, K.N., Mosleh, A., and Kelly, A.P. (1983). On the analysis of dependent failures in risk assessment and reliability evaluation. Nuclear Safety 24: 637–657.

13 Xing, L. and Levitin, G. (2013). BDD‐based reliability evaluation of phased‐mission systems with internal/external common‐cause failures. Reliability Engineering & System Safety 112: 145–153.

14 Xing, L. and Wang, W. (2008). Probabilistic common‐cause failures analysis. In: Proceedings of the Annual Reliability and Maintainability Symposium, Las Vegas, Nevada, 354–358.

15 Xing, L., Boddu, P., Sun, Y., and Wang, W. (2010). Reliability analysis of static and dynamic fault‐tolerant systems subject to probabilistic common‐cause failures. Proc. IMechE, Part O: Journal of Risk and Reliability 224 (1): 43–53.

16 Wang, C., Xing, L., and Levitin, G. (2014). Explicit and implicit methods for probabilistic common‐cause failure analysis. Reliability Engineering & System Safety 131: 175–184.

17 Xing, L. and Levitin, G. (2010). Combinatorial analysis of systems with competing failures subject to failure isolation and propagation effects. Reliability Engineering & System Safety 95 (11): 1210–1215.

18 Xing, L., Wang, C., and Levitin, G. (2012). Competing failure analysis in non‐repairable binary systems subject to functional dependence. Proc. IMechE, Part O: Journal of Risk and Reliability 226 (4): 406–416.

19 Wang, C., Xing, L., and Levitin, G. (2012). Competing failure analysis in phased‐mission systems with functional dependence in one of phases. Reliability Engineering & System Safety 108: 90–99.

20 Wang, Y., Xing, L., Wang, H., and Levitin, G. (2015). Combinatorial analysis of body sensor networks subject to probabilistic competing failures. Reliability Engineering & System Safety 142: 388–398.

21 Wang, Y., Xing, L., and Wang, H. (2017). Reliability of systems subject to competing failure propagation and probabilistic failure isolation. International Journal of Systems Science: Operations & Logistics 4 (3): 241–259.

22 Xing, L., Tannous, O., and Dugan, J.B. (2012). Reliability analysis of non‐repairable cold‐standby systems using sequential binary decision diagrams. IEEE Transactions on Systems, Man, and Cybernetics, Part A: Systems and Humans 42 (3): 715–726.

23 Zhai, Q., Peng, R., Xing, L., and Yang, J. (2013). BDD‐based reliability evaluation of k‐out‐of‐(n+k) warm standby systems subject to fault‐level coverage. Proc. IMechE, Part O: Journal of Risk and Reliability 227 (5): 540–548.

24 Levitin, G., Xing, L., and Dai, Y. (2013). Optimal sequencing of warm standby elements. Computers & Industrial Engineering 65 (4): 570–576.

25 Levitin, G., Xing, L., and Dai, Y. (2014). Cold vs. hot standby mission operation cost minimization for 1‐out‐of‐N systems. European Journal of Operational Research 234 (1): 155–162.

26 Levitin, G., Xing, L., and Dai, Y. (2014). Mission cost and reliability of 1‐out‐of‐N warm standby systems with imperfect switching mechanisms. IEEE Transactions on Systems, Man, and Cybernetics: Systems 44 (9): 1262–1271.

27 Zang, X., Wang, D., Sun, H., and Trivedi, K.S. (2003). A BDD‐based algorithm for analysis of multistate systems with multistate components. IEEE Transactions on Computers 52 (12): 1608–1618.

28 Caldarola, L. (1980). Coherent systems with multistate components. Nuclear Engineering and Design 58 (1): 127–139.

29 Xing, L. and Dai, Y. (2009). A new decision diagram based method for efficient analysis on multi‐state systems. IEEE Transactions on Dependable and Secure Computing 6 (3): 161–174.

30 Lisnianski, A. and Levitin, G. (2003). Multi‐state System Reliability: Assessment, Optimization and Applications. World Scientific.

31 Mo, Y., Xing, L., and Dugan, J.B. (2014). MDD‐based method for efficient analysis on phased‐mission systems with multimode failures.

IEEE Transactions on Systems, Man, and Cybernetics: Systems