Root Cause Failure Analysis
Provides the knowledge and failure analysis skills necessary for preventing and investigating process equipment failures
Process equipment and piping systems are essential for plant availability and performance. Regularly exposed to hazardous service conditions and damage mechanisms, these critical plant assets can result in major failures if not effectively monitored and assessed, potentially causing serious injuries and significant business losses. When used proactively, Root Cause Failure Analysis (RCFA) helps reliability engineers inspect the process equipment and piping system before any abnormal conditions occur. RCFA is equally important after a failure happens: it determines the impact of a failure, helps control the resultant damage, and identifies the steps for preventing future problems.
Root Cause Failure Analysis: A Guide to Improve Plant Reliability offers readers a clear understanding of the degradation mechanisms of process equipment and the concepts needed to perform industrial RCFA investigations. This comprehensive resource describes the methodology of RCFA and provides multiple techniques and industry practices for identifying, predicting, and evaluating equipment failures. Divided into two parts, the text first introduces Root Cause Analysis, explains the failure analysis process, and discusses the management of both human and latent error. The second part focuses on failure analysis of various components such as bolted joints, mechanical seals, steam traps, gearboxes, bearings, couplings, pumps, and compressors.
This authoritative volume:
* Illustrates how failures are associated with part integrity, a complete system, or the execution of an engineering process
* Describes how proper design, operation, and maintenance of the equipment help to enhance their reliability
* Covers analysis techniques and industry practices including 5-Why RCFA, fault tree analysis, Pareto charts, and Ishikawa diagrams
* Features a detailed case study of process plant machinery and a chapter on proactive measures for avoiding failures
Bridging the gap between engineering education and practical application, Root Cause Failure Analysis: A Guide to Improve Plant Reliability is an important reference and guide for industrial professionals, including process plant engineers, planning managers, operation and maintenance engineers, process designers, chemical engineers, and instrument engineers. It is also a valuable text for researchers, instructors, and students in relevant areas of engineering and science.
Page count: 667
Year of publication: 2021
Cover
Title Page
Copyright Page
Preface
About the Author
Acknowledgment
Dedication Page
1 FAILURE: How to Understand It, Learn from It and Recover from It
Failure Type
Benefits of Failure Analysis
Conclusion
2 What Is Root Cause Analysis
The Causes of TITANIC disaster
What Is Root Cause Analysis?
Top Reasons Why We Need to Perform RCFA
Conclusion
3 Root Cause Analysis Process
What is root cause analysis
Define the problem
Collection of data
Analyze Sequence of Events
Design Review
The Five Whys
Fishbone Diagram
Fault Tree Analysis
Identify the Root Cause
Recommend and Implement Solution
Conclusion
4 Managing Human Error and Latent Error to Overcome Failure
Review of Some of the Accidents
Types of Human Failure:
The Prevention of Human Error
Ways to Reduce Human Error
Conclusion
5 Metallurgical Failure
Understanding the Basics
Brittle vs. Ductile Fracture characteristics
Example: Failure of a Pipe
Stages in Ductile Fracture
Brittle Fracture Characteristics
Origin of Fractures (Ductile and Brittle)
Fatigue Failures
Stress Concentration
Stress Corrosion Cracking
Hydrogen Damage
Failure Investigation
Collection of Background Data and Samples
Data Analysis, Conclusions and Report
Conclusion
6 Pipe Failure
Classification of Failure Mechanisms
Causes of Premature Failures in Piping
How to Overcome Piping System Failure
How to Mitigate Corrosion
How to Reduce the Risk of Water Hammer
Inspection and Maintenance Plan to Avoid Failure
Understand the Ways in Which the Piping Can Fail
Conclusion
7 Failure of Flanged Joint
Creating the Seal
Forces Acting on a Gasket Joint
Integrity of the Bolted Flange Connections
Protection of Bolts
Gasket Reliability
Failure Related to Flange
Problems with Installation
Conclusion
8 Failure of Coupling
Flanged Rigid Couplings
Ribbed Rigid Couplings
Sleeve Rigid Couplings
Quill Shaft Rigid Couplings
Flexible Coupling
How General‐Purpose Couplings Work
Special‐Purpose Couplings
Coupling Selection for Reliability
Coupling Fit
Cause of Coupling Failure
Proper System Maintenance
Special‐Purpose Coupling Failure Mode
Conclusion
9 Bearing Failure
Anti‐Friction Bearings
AntiFriction Bearing – Type, Selection, and Failure Mode
Bearing Basics
Bearing Selection Process
Bearing Service Life
Bearing Tolerances, Fits, and Clearances
Sealing Devices
Journal Bearing
Journal Bearing Failure Mechanisms
Conclusion
10 Mechanical Seals Failure
Type of Mechanical Seal
Application
Cooling System and API Plans
Seal Installation, What to See
Conclusion
11 Centrifugal Pump Failure
Pump Failure Causes
Failure Due to Operational Reasons
Other Mechanical Consideration
Bearing Failure
Seal Failure
Conclusion
12 Reciprocating Pumps Failure
Working Principle
Power Pump Operation and Construction
Conclusion
13 Centrifugal Compressor Failure
Characteristics of Centrifugal Compressor
Major Components of a Centrifugal Compressor
Conclusion
14 Reciprocating Compressor Failure
Major Components
Reciprocating Compressor Failure Causes
Pressure Packing Failure
Piston Ring/Rider Ring Wear
Process Related Problems in Reciprocating Compressor
Maintenance Related Issues
Machine Monitoring
Conclusion
15 Lubrication Related Failure in Machinery
Introduction
Lubrication Related Failure in Sump and Circulating System of Turbomachinery
Lubrication Related Failure Specific to Reciprocating Compressor
Lubrication Program Management
Important Points to Be Considered for Developing an Effective Lubrication Program
Understanding Oil Analysis: How It Can Improve Reliability
Conclusion
16 Steam Traps Failure
Thermodynamic Trap
Float Trap
Inverted Bucket Trap
Thermostatic Traps
Thermostatic Metallic‐Expansion Trap
Balanced‐Pressure Thermostatic Trap
Bimetallic Trap
Selection Criteria
Common Problems of Steam Traps
Maintenance of Steam Traps
Conclusion
17 Proactive Measures to Avoid Failure
What Are Proactive Maintenance Tasks
Evolution of Different Type of Maintenance
Condition Monitoring Technologies
Proactive Inspection Program for Static Equipments
Condition Monitoring and The Internet of Things
Conclusion
Index
End User License Agreement
Chapter 4
Table 4.1 Industrial accidents caused by human error.
Chapter 9
Table 9.1 Max DN value of bearing.
Chapter 14
Table 14.1 Percentage failure due to separate compressor components.and proce...
Chapter 2
Figure 2.1 Events leading to compressor failure.
Chapter 4
Figure 4.1 Contributing factors to human error.
Chapter 5
Figure 5.1 Stress–strain diagram of a medium‐carbon structural steel.
Figure 5.2 Stress–strain curve of brittle and ductile material.
Figure 5.3 Ductile vs brittle fracture. (a) Very ductile, soft metals (e.g.,...
Figure 5.4 Different stages before ductile fracture. (a) Necking (b) formati...
Figure 5.5 Cup and cone fracture in Al.
Figure 5.6 Brittle fracture in a mild steel.
Figure 5.7 Ductile failure: ‐one piece ‐large deformation (after some amount...
Figure 5.8 Brittle failure: ‐many pieces ‐small deformation, (even when the ...
Figure 5.9 Schematic representation of the fatigue crack (three stage) pheno...
Figure 5.10 Stress concentration at corners.
Figure 5.11 Stress corrosion on a bar.
Figure 5.12 Stress corrosion cracking.
Chapter 7
Figure 7.1 Forces acting on a gasket.
Figure 7.2 Tensile stress strain diagram of fastener.
Figure 7.3 Use of sleeve around bolt.
Figure 7.4 Use of conical washer around bolt.
Figure 7.5 Gap in flange.
Figure 7.6 Flange cocked.
Figure 7.7 Mis‐aligned flange.
Figure 7.8 Out of parallel flange.
Figure 7.9 Wrong surface finish of flange.
Figure 7.10 Bolting sequence of flange.
Chapter 8
Figure 8.1 Flanged rigid coupling.
Figure 8.2 Cutaway view (DBC‐ Bolt circle dia.)
Figure 8.3 Ribbed rigid coupling.
Figure 8.4 Sleeve rigid coupling.
Figure 8.5 Quill shaft rigid coupling.
Figure 8.6 Gear coupling.
Figure 8.7 Grid coupling.
Figure 8.8 Disc coupling.
Figure 8.9 Pin bush type coupling.
Figure 8.10 Jaw coupling.
Figure 8.11 Corded tire couplings.
Figure 8.12 Disc couplings.
Figure 8.13 Diaphragm type couplings.
Figure 8.14 Hydraulic hub installation tool.
Figure 8.15 Couplings tire failure.
Figure 8.16 Improperly fitted key.
Figure 8.17 Gear teeth worn from excessive misalignment.
Figure 8.18 Disc pack coupling.
Figure 8.19 Disc pack failure.
Figure 8.20 Bolt damage.
Figure 8.21 Excessive angular misalignment and axial movement diaphragm fail...
Chapter 9
Figure 9.1 Type of antifriction bearings.
Figure 9.2 Bearing internal clearance.
Figure 9.3 Mounting force applied to the wrong ring.
Figure 9.4 Advanced spalling due to subsurface initiated fatigue of the mate...
Figure 9.5 Spalling (surface distress) caused by ineffective lubrication.
Figure 9.6 Moisture acids in a spherical roller bearing – Moisture corrosion...
Figure 9.7 Water in bearing.
Figure 9.8 False brinelling.
Figure 9.9 Fatigue fracture of the outer ring flange in a double row full co...
Figure 9.10 Typical plain journal bearing.
Figure 9.11 Pressure profile in a journal bearing.
Figure 9.12 Stribeck curve relating friction factor to viscosity, speed, and...
Figure 9.13 Sleeve journal bearing.
Figure 9.14 Two‐lobe lemon‐shaped sleeve bearing with pressure‐dam.
Figure 9.15 Tilting pad journal bearings.
Figure 9.16 Bearing melting.
Figure 9.17 Bearing damage due to abnormal load.
Figure 9.18 Regions of babbitt material creep in a hot‐running region of a c...
Figure 9.19 Severe wiping on a thrust shoe (circumferential scratching, narr...
Chapter 10
Figure 10.1 Cross section view of centrifugal pump.
Figure 10.2 Cross section view of Mechanical seal.
Figure 10.3 Leakage path in seal.
Figure 10.4 Components of Mechanical seal.
Figure 10.5 Pusher seal.
Figure 10.6 Cross section view of pusher seal.
Figure 10.7 High‐speed stationary welded metal bellow seal schematic.
Figure 10.8 API seal Plan 11 (Recirculation from the pump discharge through ...
Figure 10.9 H–Q curve of a centrifugal pump.
Figure 10.10 Balance pressure mechanical seal schematic drawing.
Figure 10.11 Operating envelope for a contacting seal.
Figure 10.12 Radial force and total dynamic head vs. capacity.
Figure 10.13 Shaft length – L3/D4.
Figure 10.14 Critical pump dimensions.
Figure 10.15 Critical pump features.
Figure 10.16 Concentric runout.
Figure 10.17 Perpendicular runout.
Figure 10.18 Shaft end float.
Chapter 11
Figure 11.1 Cross section view of centrifugal pump.
Figure 11.2 Pump failure causes.
Figure 11.3 Centrifugal pump component damage and causes as a function of op...
Figure 11.4 Recirculation flow pattern in impeller at low flows.
Figure 11.5 The pressure profile across a typical pump at a fixed flow condi...
Figure 11.6 Vortex breaker in cooling tower pump inlet.
Figure 11.7 Bearing cross section view.
Figure 11.8 Oil bath lubrication showing a typical oil level.
Chapter 12
Figure 12.1 Schematic view of reciprocating pump.
Figure 12.2 Schematic view of reciprocating pump parts.
Figure 12.3 The five wave forms with the five pressure variations.
Figure 12.4 Cross‐sectional view of a dampener.
Figure 12.5 Correct piping design.
Figure 12.6 Reciprocating pump stuffing box (nonlubricated) showing pressure...
Figure 12.7 Reciprocating pump disk valve.
Figure 12.8 The structure and wear pattern of the valve is an important piec...
Chapter 13
Figure 13.1 Process diagram for compressor.
Figure 13.2 Parts of centrifugal compressor.
Figure 13.3 Parts of compressor rotor.
Figure 13.4 Impeller.
Figure 13.5 Cross section view of horizontal split compressor.
Figure 13.6 Barrel type casing.
Figure 13.7 Compressor surge diagram.
Figure 13.8 Change of speed affect surge.
Figure 13.9 Process schematic diagram – power turbine/gas compressor lubrica...
Chapter 14
Figure 14.1 Compressor components.
Figure 14.2 Schematic view of piston rod packing.
Figure 14.3 Typical compressor rod packing system positioning in the compres...
Figure 14.4 Forces acting on a compressor piston.
Figure 14.5 Rod load diagram of a reciprocating compressor.
Figure 14.6 Poppet valve.
Figure 14.7 Ring valve.
Figure 14.8 Plate valve.
Figure 14.9 Cut‐away view of packing case.
Figure 14.10 Cross section view of packing over piston rod.
Figure 14.11 Piston ring and Rider ring mounted in the piston.
Figure 14.12 Web deflection measurement.
Figure 14.13 Rod drop measurement.
Figure 14.14 P–V diagram of a reciprocating compressor.
Figure 14.15 P–V diagram.
Figure 14.16 Plot of compression efficiency versus compression ratio.
Figure 14.17 (a)–(c) are P–V diagrams showing the changes that result ...
Figure 14.18 Plot of power versus suction pressure with constant discharge p...
Figure 14.19 Plot of capacity versus suction pressure with constant discharg...
Chapter 15
Figure 15.1 Moisture in oil its detection and its effect.
Figure 15.2 Oil life varies with base oil type and temperature.
Figure 15.3 Sampling port locations on a lube oil system that feeds three se...
Figure 15.4 Dispensing equipments for lubricants.
Figure 15.5 Lubricant pre filtration pump.
Figure 15.6 Sealable and cleanable oil‐handling containers with colour codin...
Chapter 16
Figure 16.1 Thermodynamic trap.
Figure 16.2 Float steam trap.
Figure 16.3 Inverted bucket trap.
Figure 16.4 Metallic‐expansion trap.
Figure 16.5 Balanced‐pressure thermostatic trap.
Figure 16.6 Bimetallic steam trap.
Chapter 17
Figure 17.1 Typical risk plot “total risk vs. quantity of equipment.”
Dr. Trinath Sahoo
This edition first published 2021
© 2021 by John Wiley & Sons, Inc. All rights reserved.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by law. Advice on how to obtain permission to reuse material from this title is available at http://www.wiley.com/go/permissions.
The right of Trinath Sahoo to be identified as the author of this work has been asserted in accordance with law.
Registered Office: John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA
Editorial Office: 111 River Street, Hoboken, NJ 07030, USA
For details of our global editorial offices, customer services, and more information about Wiley products visit us at www.wiley.com.
Wiley also publishes its books in a variety of electronic formats and by print‐on‐demand. Some content that appears in standard print versions of this book may not be available in other formats.
Limit of Liability/Disclaimer of Warranty
In view of ongoing research, equipment modifications, changes in governmental regulations, and the constant flow of information relating to the use of experimental reagents, equipment, and devices, the reader is urged to review and evaluate the information provided in the package insert or instructions for each chemical, piece of equipment, reagent, or device for, among other things, any changes in the instructions or indication of usage and for added warnings and precautions. While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this work and specifically disclaim all warranties, including without limitation any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives, written sales materials, or promotional statements for this work. This work is sold with the understanding that the publisher is not engaged in rendering professional services. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. The fact that an organization, website, or product is referred to in this work as a citation and/or potential source of further information does not mean that the publisher and authors endorse the information or services the organization, website, or product may provide or recommendations it may make. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
Library of Congress Cataloging‐in‐Publication Data
Names: Sahoo, Trinath, author.
Title: Root cause failure analysis : a guide to improve plant reliability / Trinath Sahoo.
Description: Hoboken, New Jersey : Wiley, 2021. | Includes bibliographical references and index.
Identifiers: LCCN 2020053092 (print) | LCCN 2020053093 (ebook) | ISBN 9781119615545 (hardback) | ISBN 9781119615590 (adobe pdf) | ISBN 9781119615613 (epub)
Subjects: LCSH: Root cause analysis. | Piping. | Industrial equipment.
Classification: LCC TA169.55.R66 S25 2021 (print) | LCC TA169.55.R66 (ebook) | DDC 658.2–dc23
LC record available at https://lccn.loc.gov/2020053092
LC ebook record available at https://lccn.loc.gov/2020053093
Cover Design: Wiley
Cover Images: © ch123/Shutterstock, Yakov Oskanov/Shutterstock
Process industries are home to a huge number of machines, piping systems, and structures, most of them critical to the industry’s mission. Failure of these items can cause loss of life, unscheduled shutdowns, increased maintenance and repair costs, and damaging litigation disputes. Experience shows that all too often, process machinery problems are never defined sufficiently; they are merely “solved” to “get back on stream.” Production pressures often override the need to analyze a situation thoroughly, and the problem and its underlying cause come back to haunt us later. Equipment downtime and component failure risk can be reduced only if potential problems are anticipated and avoided. To prevent recurrence of the problem, it is essential to carry out an investigation aimed at detecting the root cause of failure.
The ability to identify this weakest link and propose remedial measures is the key for a successful failure analysis investigation. This requires a multidisciplinary approach, which forms the basis of this book. The results of the investigation can also be used as the basis for insurance claims, for marketing purposes, and to develop new materials or improve the properties of existing ones.
The objective of this book is to help anyone involved with machinery reliability, be it in the design of new plants or the maintenance and operation of existing ones, to understand why the process machine fails, so some preventive measures can be taken to avoid another failure of the same kind.
An important feature of this book is that it not only demonstrates the methodology for conducting a successful failure analysis investigation, but also provides the necessary background.
The book is divided into two parts:
The first part discusses the benefit of failure analysis, including some definitions and examples. Here, we examine the failure analysis procedure, including some approaches suitable for different types of problems. We also look at how plant‐wide failure prevention efforts should be conducted, including a discussion about the importance of the role of the top management in the prevention of failure.
In the second part, different types of failure mechanisms that affect process equipment are discussed, with several examples of failures of bearings, seals, and other components.
Because it is simply impossible to deal with every conceivable type of failure, this book is structured to teach failure identification and analysis methods that can be applied to virtually all problem situations that might arise.
Trinath Sahoo
Trinath Sahoo, Ph.D., is the chief general manager at M/S Indian Oil Corporation Ltd. Dr. Sahoo has 30 years of experience in various fields such as engineering design, project management, asset management, maintenance management, lubrication, and reliability. He has published many papers in journals such as Hydrocarbon Processing, Chemical Engineering, Chemical Engineering Progress, and World Pumps. Some of his articles were adjudged best articles and published as cover stories in the magazines. He has also spoken at many international conferences. He was the convener for reliability enhancement projects for different refinery and petrochemical sites of M/S Indian Oil Corporation Ltd. Dr. Sahoo is the author of the bestselling book Process Plants: Shutdown and Turnaround Management. He holds a Ph.D. degree from the Indian Institute of Technology (ISM), Dhanbad, Jharkhand, India.
First and foremost, I would like to thank God, the Almighty, for His showers of blessings throughout to complete the book successfully. In the process of putting this book together, I realized how true this gift of writing is for me. You have given me the power to believe in my passion and pursue my dreams. I could never have done this without the faith I have in you, the Almighty.
I have to thank my parents for their love and support throughout my life. Thank you both for giving me strength to reach for the stars and chase my dreams.
For my wife Chinoo, all the good that comes from this book I look forward to sharing with you! Thanks for not just believing, but knowing that I could do this! I Love You Always and Forever!
To my children Sonu and Soha: You may outgrow my lap, but you will never outgrow my heart. Your growth provides a constant source of joy and pride to me and helped me to complete the book.
Without the experiences and support from my peers and team at Indian Oil, this book would not exist. You have given me the opportunity to lead a great group of individuals.
“Thanks to everyone on my publishing team.”
Only those who dare to fail greatly can ever achieve greatly.
Robert F. Kennedy.
Failure and fault are virtually inseparable in households, organizations, and cultures. But there is much more wisdom to be gained from failure than from success. Often we discover what works well by finding out what will not work; and “probably he who has never made a mistake never made a discovery.”
Thomas Edison’s associate, Walter S. Mallory, while discussing inventions, once said to him, “Isn’t it a shame that with the tremendous amount of work you have done you haven’t been able to get any results?” Edison replied, with a smile, “Results! Why, my dear, I have gotten a lot of results! I know several thousand things that won’t work.”
People see success as a positive and failure as a negative phenomenon. Edison’s reply emphasizes that failure isn’t a bad thing: you can learn and evolve from your past mistakes. In organizations, however, executives often believe that failure is bad. This widely held belief is misguided. Understanding failure’s causes and contexts helps to avoid the blame game and create an atmosphere of learning in the organization. Failure may sometimes be considered bad, sometimes inevitable, and sometimes even good. In most companies, the systems and procedures required to detect and analyze failures effectively are in short supply, and context‐specific learning strategies are often not appreciated. In many organizations, managers want to learn from failures to improve future performance, and they and their teams devote many hours to after‐action reviews, post‐mortems, and the like. But time after time these painstaking efforts lead to no real change. The reason is that managers think about failure in the wrong way.
To be able to learn from our failures, we need to develop a methodology to decode the “teachable moments” hidden within them. We need to find out what exactly those lessons are and how they can improve our chances of future success.
Although an infinite number of things can go wrong in machinery, systems, and processes, failures fall into three broad categories: preventable failures, failures in complex systems, and intelligent failures.
Most preventable failures are considered “bad.” They could have been foreseen but weren’t. This is the worst kind of failure, and it usually occurs because an employee didn’t follow best practices, didn’t have the right talent, or didn’t pay attention to detail. Such failures usually involve a deviation from specification in closely defined processes or from routine operations and maintenance practices. In such cases, however, the causes can be readily identified and solutions can be developed.
If you’ve experienced a preventable failure, it’s time to more deeply analyze the effort’s weaknesses and stick to what works in future. Employees can follow those new processes learned from past mistakes consistently, with proper training and support.
Human error used to be an area that was associated with high‐risk industries like aviation, rail, petrochemical and the nuclear industry. The high consequences of failure in these industries meant that there was a real obligation on companies to try to reduce the likelihood of all failure causes. Human error is also a high‐priority, preventable issue.
In complex organizations such as aircraft carriers, nuclear power plants, and petrochemical plants, system failure is a perpetual risk. A large number of failures are due to the inherent uncertainty in how such systems work.
The lesson from this type of failure is to create systems that spot small failures resulting from complex factors and to take corrective action before they snowball and destroy the whole system. This type of failure need not be considered bad; rather, it should prompt a review of how the complex system works. Most accidents in these systems result from a series of small failures that went unnoticed and unfortunately lined up in just the wrong way.
The complex systems are heavily and successfully defended against failure by construction of multiple layers of defense against failure. These defenses include obvious technical components (e.g. backup systems, “safety” features of equipment) and human components (e.g. training, knowledge) but also a variety of organizational, institutional, and regulatory defenses (e.g. policies and procedures, certification, work rules, team training). The effect of these measures is to provide a series of shields that normally divert operations away from accidents.
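The effect of stacking defenses can be illustrated with a small calculation. The sketch below is an illustration only, not a method from the book: it assumes the layers fail independently of one another, and the layer names and probabilities (`layer_failure_prob`) are made-up values chosen for the example.

```python
from math import prod

# Hypothetical probability that each defensive layer fails on a given demand.
# These numbers are illustrative assumptions, not plant data.
layer_failure_prob = {
    "backup system": 0.01,
    "operator training": 0.05,
    "procedures and work rules": 0.10,
}

def breach_probability(probs):
    """Probability that every layer fails at once, assuming independence."""
    return prod(probs)

# All three layers in place.
p_all = breach_probability(layer_failure_prob.values())

# Same system with the training layer removed.
p_without_training = breach_probability(
    p for name, p in layer_failure_prob.items() if name != "operator training"
)

print(f"all layers: {p_all:.6f}, without training: {p_without_training:.6f}")
```

Under these assumptions, removing a single layer makes a breach 20 times more likely. Real layers often share common causes, so the independence assumption tends to be optimistic.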
Intelligent failures occur when answers are not known in advance because the exact situation hasn’t been encountered before, so experimentation is necessary. Examples include testing a prototype, designing a new type of machinery, or operating a machine under different operating conditions. In these settings, “trial and error” is the common term for the kind of experimentation needed. These types of failure can be considered “good” because they provide valuable insight and new knowledge that help an organization learn from past mistakes for its future growth. The lesson here is clear: if something works, do more of it. If it doesn’t, go back to the drawing board.
Leaders can create and reinforce a culture that makes people feel comfortable surfacing and learning from failures, which avoids the blame game. When things go wrong, they should insist on finding out what happened rather than “who did it.” This requires consistently reporting failures, small and large; systematically analyzing them; and proactively taking steps to avoid recurrence.
Most organizations engage in all three kinds of work discussed above: routine, complex, and intelligent. Leaders must ensure that the right approach to learning from failure is applied in each of them. All organizations learn from failure through the following essential activities: detection, analysis, learning, and sharing.
Spotting big, painful, expensive failures is easy. But failures that are hidden stay hidden as long as they are unlikely to cause immediate or obvious harm. The goal should be to surface them early, before they can create a disaster when accompanied by other lapses in the system. High‐reliability organization (HRO) practices help prevent catastrophic failures in complex systems such as nuclear power plants and aircraft through early detection.
In a big petrochemical plant, top management religiously tracks each plant for anything even slightly out of the ordinary, immediately investigates whatever turns up, and informs all its other plants of any anomalies. But often these methods are not widely employed because senior executives remain reluctant to convey bad news to bosses and colleagues.
Most people avoid analyzing failure altogether because it is often emotionally unpleasant and can chip away at self‐esteem. Another reason is that analyzing organizational failures requires inquiry and openness, patience, and a tolerance for causal ambiguity. Hence, managers should be rewarded for thoughtful reflection; that is how the right culture can percolate through the organization.
Once a failure has been detected, it’s essential to find the root causes rather than relying on the obvious and superficial reasons. This requires the discipline to use sophisticated analysis to ensure that the right lessons are learned and the right remedies are employed. Engineers need to see that their organizations don’t just move on after a failure but stop to dig in and discover the wisdom contained in it.
A team of leading physicists, engineers, aviation experts, naval leaders, and even astronauts devoted months to an analysis of the Columbia disaster. They conclusively established not only the first‐order cause – a piece of foam had hit the shuttle’s leading edge during launch – but also second‐order causes: A rigid hierarchy and schedule‐obsessed culture at NASA made it especially difficult for engineers to speak up about anything but the most rock‐solid concerns.
Motivating people to go beyond first‐order reasons (procedures weren’t followed) to understanding the second‐ and third‐order reasons can be a major challenge. One way to do this is to use interdisciplinary teams with diverse skills and perspectives. Complex failures in particular are the result of multiple events that occurred in different departments or disciplines or at different levels of the organization. Understanding what happened and how to prevent it from happening again requires detailed, team‐based discussion and analysis.
Here are some common root causes and their corresponding corrective actions:
Design deficiency caused failure → Revisit in‐service loads and environmental effects, modify design appropriately.
Manufacturing defect caused failure → Revisit manufacturing processes (e.g. casting, forging, machining, heat treat, coating, assembly) to ensure design requirements are met.
Material defect caused failure → Implement raw material quality control plan.
Misuse or abuse caused failure → Educate user in proper installation, use, care, and maintenance.
Useful life exceeded → Educate user in proper overhaul/replacement intervals.
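The pairing of root causes and corrective actions listed above can be kept as a simple lookup table, so that every investigation closes with a recorded action. The sketch below is only an illustration: the dictionary name, the function name, and the lowercase keys are assumptions made for this example, while the entries themselves are taken from the list above.

```python
# Hypothetical lookup table; keys and names are illustrative assumptions,
# the root-cause/action pairs come from the list in the text.
CORRECTIVE_ACTIONS = {
    "design deficiency": "Revisit in-service loads and environmental effects, modify design appropriately.",
    "manufacturing defect": "Revisit manufacturing processes to ensure design requirements are met.",
    "material defect": "Implement raw material quality control plan.",
    "misuse or abuse": "Educate user in proper installation, use, care, and maintenance.",
    "useful life exceeded": "Educate user in proper overhaul/replacement intervals.",
}


def corrective_action(root_cause: str) -> str:
    """Return the recorded corrective action for a root-cause category."""
    key = root_cause.strip().lower()
    if key not in CORRECTIVE_ACTIONS:
        # An unknown category should prompt further analysis, not a guess.
        raise KeyError(f"No corrective action recorded for: {root_cause!r}")
    return CORRECTIVE_ACTIONS[key]


print(corrective_action("Material defect"))
```

Raising an error for an unlisted category is a deliberate choice in this sketch: a failure that fits none of the known categories deserves a closer look rather than a default remedy.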
There are various methods that failure analysts use – for example, Ishikawa “fishbone” diagrams, failure modes and effects analysis (FMEA), or fault tree analysis (FTA). Methods vary in approach, but all seek to determine the root cause of failure by looking at the characteristics and clues left behind.
Once the root cause of the failure has been determined, it is possible to develop a corrective action plan to prevent recurrence of the same failure mode. Understanding what caused one failure may allow us to improve upon our design process, manufacturing processes, material properties, or actual service conditions. This valuable insight may allow us to foresee and avoid potential problems before they occur in the future.
Failure is less painful when you extract the maximum value from it. If you learn from each mistake, large and small, share those lessons, and periodically check that these processes are helping your organization move more efficiently in the right direction, your return on failure will skyrocket. While it’s useful to reflect on individual failures, the real payoff comes when you spread the lessons across the organization. As one executive commented, “You need to build a review cycle where this is fed into a broader conversation.” When the information, ideas, and opportunities for improvement gained from a failure incident are passed on to others, their benefits are magnified. The information on root cause failure analysis should be made available to others in the organization so that they can learn too.
The best way to get risk‐averse managers and employees to learn to accept higher risks and their associated failures is to educate them on the many positive aspects and benefits of failure. Some of those many benefits include:
Failure tells you what to stop doing
– Obviously, failure reveals what doesn’t work, so you can avoid using similar unmodified approaches in the future. And over time, by continually eliminating failure factors, you obviously increase the probability of future success.
Failure is the best teacher
– Failure is only valuable if you use it to identify what worked and what didn’t work and to use that information to minimize future failures. In the corporate and engineering worlds, learning from failure starts with failure analysis. This is a process that helps you identify specifically what failed and then to understand the “root causes” of that failure (i.e. critical failure factors). But since failure and success factors are often closely related, the identification of the failure factors will likely aid you in identifying the critical success factors that cause an approach to succeed. The famous auto innovator Henry Ford revealed his understanding of learning from failure in this quote: “The only real mistake is the one from which we learn nothing.”
A failure factor in one area may apply to another area
– Failure analysis tells you what failed and why. But the best corporations develop processes that “spread the word” and warn others in your organization about what clearly doesn’t work so that others don’t need to learn the hard way. On the positive side, lessons learned from both successes and failures in one discipline may be able to be applied to another discipline or functional area.
Experience builds your capability to handle future major failures
– When a major failure does occur, your “rusty” employees and your out of date processes simply won’t be able to handle it. Both the military and healthcare managers have proven that the more often you train for and work through actual major failures, the better prepared you will be when an unplanned failure occurs in the future.
Many companies and organizations have been on the reliability journey for a number of years. There are many elements of a solid reliability program: establishing a reliability‐centered culture, tracking key metrics, running bad actor elimination programs, and establishing equipment reliability plans, to name a few. But one key element of a solid reliability program, and one that is very important to improving unit reliability metrics, is root cause failure analysis (RCFA). One of the interesting benefits for organizations that have fully embraced the RCFA work process across the entire organization is that over time the RCFA methodology starts to influence how people approach everyday problems; it becomes how they think about even the smallest failures, problems, or defects. The organization then starts to evolve into a culture that does not accept failure and provides a mindset that helps eliminate failures across the organization.
It is not uncommon to see industries caught in the vicious cycle of failure, repair, blame, failure, repair, blame, etc. When equipment fails prematurely, the people involved often ask whose fault it is. Many a time, the answer is “it is the other guy’s fault.”
If one were to ask an operator why the equipment failed, the immediate answer would be that it was the fault of the maintenance mechanic who had not fixed it properly. Along the same lines, a maintenance mechanic’s likely answer to that question would be “operator error.” At times, there is some validity to both these answers, but the honest and complete answer is much more complex. This chapter briefly introduces the concepts of failure analysis, root cause analysis, and the role of failure analysis as a general engineering tool for enhancing failure prevention.
Failure analysis is a process performed to determine the causes that may have contributed to the loss of functionality. These causes may come from a deficient design, poor material, mistakes in manufacturing, or incorrect operation and maintenance. Many a time there is no single cause and no single train of events that leads to a failure. Rather, there are factors that combine at a particular time to allow a failure to occur. Failure analysis involves a logical sequence of steps that lead the investigator through identifying the root causes of faults or problems.
Look at any well‐studied major disaster and ask if there was only one cause. Was there only one cause for the TITANIC? Three Mile Island? The Exxon Valdez mess? Bhopal? Chernobyl? It would be nice if there were only one cause per failure, because correcting the problem would then be easy. However, in reality, there are multiple causes behind every equipment failure. Let us take the case of the TITANIC failure.
The TITANIC passengers included some of the wealthiest and most prestigious people at that time. Captain Edward John Smith, one of the most experienced shipmasters on the Atlantic, was in command of the TITANIC. On the night of 14 April, although the wireless operators had received several ice warnings from other ships in the area, the TITANIC continued to rush through the darkness at nearly full steam. Suddenly, the lookouts spotted a massive iceberg less than a quarter of a mile off the bow of the ship. Immediately, the engines were thrown into reverse and the rudder turned hard left. Because of the tremendous mass of the ship, slowing and turning took an incredible distance, more than that available. Without enough distance to alter her course, the TITANIC sideswiped the iceberg, damaging nearly 300 feet of the right side of the hull above and below the waterline.
The two official investigations back in 1912 started with a conclusion: the TITANIC hit an iceberg and sank. They made somewhat of an attempt to answer why that happened without attaching too much blame. The result did not so much get to the root cause as identify the immediate cause.
In a Physics World retrospective on the disaster, which caused 1514 deaths on 14–15 April 1912, Richard Corfield describes an event cascade in which a perfect storm of circumstances conspired to doom the TITANIC. The iceberg that the TITANIC struck on its way from Southampton to New York is No. 1 on a top‐9 list of circumstances. Here are eight other suggested circumstances from Richard Corfield's article and other sources:
Climate caused more icebergs:
Weather conditions in the North Atlantic were particularly conducive to corralling icebergs at the intersection of the Labrador Current and the Gulf Stream, due to warmer‐than‐usual waters in the Gulf Stream. As a result, there were icebergs and sea ice concentrated in the very position where the collision happened.
The iron rivets were too weak:
Metallurgists Tim Foecke and Jennifer Hooper McCarty looked into the materials used for the building of the TITANIC at its Belfast shipyard and found that the steel plates toward the bow and the stern were held together with low‐grade iron rivets. Those rivets may have been used because higher‐grade rivets were in short supply, or because the better rivets couldn’t be inserted in those areas using the shipyard's crane‐mounted hydraulic equipment. The metallurgists said those low‐grade rivets would have ripped apart more easily during the collision, causing the ship to sink more quickly than it would have if stronger rivets had been used.
The ship was going too fast:
Many investigators have said that the ship’s captain, Edward J. Smith, was aiming to better the crossing time of the Olympic, the TITANIC’s older sibling in the White Star fleet. For some, the fact that the TITANIC was sailing full speed ahead despite concerns about icebergs was Smith’s biggest misstep. Simply put, the TITANIC was traveling way too fast in an area known to contain ice, which was one of the major reasons for the disaster.
Iceberg warnings went unheeded:
The TITANIC received multiple warnings about icefields in the North Atlantic over the wireless, but Corfield notes that the last and most specific warning was not passed along by senior radio operator Jack Phillips to Captain Smith, apparently because it didn't carry the prefix “MSG” (Masters’ Service Gram). That would have required a personal acknowledgment from the captain. “Phillips interpreted it as non‐urgent and returned to sending passenger messages to the receiver on shore at Cape Race, Newfoundland, before it went out of range,” Corfield writes.
The binoculars were locked up:
Corfield also says binoculars that could have been used by lookouts on the night of the collision were locked up aboard the ship – and the key was held by David Blair, an officer who was bumped from the crew before the ship’s departure from Southampton. Some historians have speculated that the fatal iceberg might have been spotted earlier if the binoculars were in use, but others say it wouldn’t have made a difference.
The steersman took a wrong turn:
Did the TITANIC’s steersman turn the ship toward the iceberg, dooming the ship? That’s the claim made by Louise Patten, who said the story was passed down from her grandfather, the most senior ship officer to survive the disaster. After the iceberg was spotted, the command was issued to turn “hard a starboard,” but as the command was passed down the line, it was misinterpreted as meaning “make the ship turn right” rather than “push the tiller right to make the ship head left,” Patten said. She said the error was quickly discovered, but not quickly enough to avert the collision. She also speculated that if the ship had stopped where it was hit, seawater would not have pushed into one interior compartment after another as it did, and the ship might not have sunk as quickly.
Reverse thrust reduced the ship's maneuverability:
Just before impact, first officer William McMaster Murdoch is said to have telegraphed the engine room to put the ship's engines into reverse. That would cause the left and right propellers to turn backward, but because of the configuration of the stern, the central propeller could only be halted, not reversed. Corfield said “the fact that the steering propeller was not rotating severely diminished the turning ability of the ship. It is one of the many bitter ironies of the Titanic tragedy that the ship might well have avoided the iceberg if Murdoch had not told the engine room to reduce and then reverse thrust.”
There were too few lifeboats:
Perhaps the biggest tragedy is that there were not enough lifeboats to accommodate all of the TITANIC's more than 2200 passengers and crew members. The lifeboats could accommodate only about 1200 people.
Do these nine causes cover everything, or are there still more factors I'm forgetting? Are there some lessons still unlearned from the TITANIC tragedy?
The TITANIC failure record shows that there is no single cause and no single train of events that leads to a failure. Rather, there are factors that combine at a particular time and place to allow a failure to occur. Sometimes the absence of any single one of the factors may have been enough to prevent the failure. Sometimes, though, it is impossible to determine, at least within the resources allotted for the analysis, whether any single factor was key. If failure analysts are to perform their jobs in a professional manner, they must look beyond the simplistic list of causes of failure that some people still believe. They must keep an open mind and always be willing to get help when a problem lies beyond their own experience.
A failure is often the result of multiple causes at different levels. Some causes might affect other causes that, in turn, create the visible problem. Causes can be classified as one of the following:
Symptoms. These are not regarded as actual causes, but rather as signs of existing problems.
First‐level causes. Causes that directly lead to a problem.
Higher‐level causes. Causes that lead to the first‐level causes. They may not directly cause the problem, but form links in the chain of cause‐and‐effect relationships that ultimately create the problem.
Failures often have compound reasons, where different factors combine to cause the problem. Examples of the levels of causes follow.
The highest‐level cause of a problem is called the root cause.
Hence, the root cause is “the evil at the bottom” that sets in motion the entire cause‐and‐effect chain causing the problem(s).
Trevor Kletz said:
…root cause investigation is like peeling an onion. The outer layers deal with technical causes, while the inner layers are concerned with weaknesses in the management system. I am not suggesting that technical causes are less important. But putting technical causes right will prevent only the LAST event from happening again; attending to the underlying causes may prevent MANY SIMILAR INCIDENCES.
The difference between failure analysis and root cause analysis is that failure analysis is a discipline used for identifying the physical roots of failures, whereas root cause analysis (RCA) is a discipline used in exploring some of the other contributors to failures, such as the human and latent root causes. Root cause analysis is intended to identify the fundamental cause(s) that, if corrected, will prevent recurrence. The principles of RCA may be applied to ensure that the real root cause is identified to initiate appropriate corrective actions. RCA helps in correcting and preventing failures, achieving higher levels of quality and reliability, and ultimately enhancing customer satisfaction.
Depending on the objectives of the RCA, one should decide how deeply one should analyze the case. These objectives are typically based on the risk associated with the failures and the complexity of the situation. The three levels of root cause analysis are physical roots, human roots, and latent roots. Physical roots, or the roots of equipment problems, are where many failure analyses stop. Physical root causes are derived from laboratory investigation or engineering analysis and are often component‐level or materials‐level findings. Human roots (i.e., people issues) involve human factors, where an error in human judgment may have caused the failure. Latent roots include roots that are organizational or procedural in nature, as well as environmental or other roots that are outside the realm of control.
The physical root is the physical mechanism that caused the failure; it may be fatigue, overload, wear, corrosion, or any combination of these. Examples include corrosion damage of a pipeline or a bearing that failed due to fatigue. Failure analysis must start with accurately determining the physical roots, for without that knowledge, the actual human and latent roots cannot be detected and corrected. The analysis may focus on the physics of the incident. In the case of the TITANIC, the iron rivets were too weak.
The steel plates of the TITANIC buckled because excessive stress was applied to the hull when the ship hit the iceberg. The strength of the steel and hull was not sufficient to prevent the hull from being breached as the steel plates buckled. The failure of the hull steel resulted from brittle fractures caused by the high sulfur content of the steel, the low temperature of the water on the night of the disaster, and the high impact loading of the collision with the iceberg. When the TITANIC hit the iceberg, the hull plates split open and continued cracking as the water flooded the ship.
The human roots are those human errors that result in the mechanisms that caused the physical failures. What error was committed that led to the physical cause?
Someone did the wrong thing, knowingly or unknowingly. We ask what caused the person to commit this mistake. A good example is that the TITANIC was sailing full speed ahead despite concerns about icebergs, which was Smith’s biggest misstep. The TITANIC was actually speeding up when it struck the iceberg, as it was the intention of White Star chairman and managing director Bruce Ismay to run the rest of the route to New York at full speed, arrive early, and prove the TITANIC’s superior performance. Ismay survived the disaster and testified at the inquiries that this speed increase was approved by Captain Smith and that the helmsman was operating under his captain’s direction.
All physical failures are triggered by humans. But humans are negatively influenced by latent forces. The goal is to identify and remove these latent forces. Latent causes reveal themselves in layers. One after the other, the layers can be peeled back, similar to peeling the layers off an onion. It often seems as if there is no end. These forces within the organizations are causing people to make serious mistakes.
Latent roots are the management system weaknesses, which include training, policies, procedures, and specifications. People make decisions based on these, and if the system is flawed, the decision will be in error and will be the triggering mechanism that causes the mechanical failure to occur. The most proactive of all industrial actions might be to identify and remove these latent traps. But all our attempts to identify and remove these latent causes of failure start with the human. Humans do things “inappropriately” for “latent” reasons. In order to understand these reasons, we must first understand what “errors” are being made. This puts people at risk, especially the “culprits.” Once exposed, they are in danger of being inappropriately disciplined.
In the TITANIC case, the voyage had been so hastily pushed that the crew had received no specific training and conducted no lifesaving drills on the TITANIC, and was unfamiliar with the lifeboats and their davit lowering mechanisms. Compounding this was a decision by White Star management to equip the TITANIC with only half the lifeboats necessary to handle the number of people onboard. The reasons are long established. White Star felt a full complement of lifeboats would give the ship an unattractive, cluttered look. They also clearly had a false confidence that the lifeboats would never be needed.
To understand the different levels of root causes, let us take one industrial case.
Consider this example: During the overhaul of a large reciprocating compressor, the maintenance supervisor discovers a damaged compressor rod requiring replacement. He decides to have a rod made in a local shop, which fabricates the rod with cut threads. But the OEM’s design department recommends that compressor rods for this frame size have rolled threads. As a result of the improper fabrication, the rod fails due to fatigue in the thread area and causes extensive secondary damage inside the compressor.
Figure 2.1 Events leading to compressor failure.
If you study this example, you can discern the following events leading to the costly failure:
The warehouse did not stock spares for this rod because it was a new compressor installation.
The maintenance supervisor decides to have a rod fabricated without drawings.
Neither the user nor the local shop investigated the thread requirements.
Because the compressor was not equipped with vibration shutdowns, it ran for a significant amount of time before it was shut down.
There were several chances to break the chain of events leading to the catastrophic compressor failure. If the project engineer had ordered spare parts through the OEM, this failure probably would have been avoided. If either the maintenance supervisor or the local machine shop had talked to the OEM, or studied the failed rod, they would have been aware of the importance of rolled threads. Lastly, if a vibration shutdown had been in place, the compressor would have shut down after only minimal damage. We see there were six major events leading to the secondary compressor damage. These events were as follows:
No procedure in place to order spare parts for newly purchased equipment (latent root).
The improper installation of the packing leads to rod scoring.
Because a spare rod is not available and plant management wants the compressor back in operation as soon as possible, it was decided to have a replacement rod fabricated at a local machine shop.
No one checks with the OEM about rod thread specifications (physical root).
The rod fails after two days of operation.
The broken rod causes extensive damage to the cylinder, packing box, distance piece, and cross‐head.
After examining the vestiges of the failure, the rotating equipment (RE) engineer would discover a fatigue failure in the threaded portion of the rod. From this, he would conclude an improper thread design led to a stress riser and a shortened fatigue life. After talking to the OEM, he writes a report recommending that all compressor rods in the plant have rolled threads.
This recommendation will surely reduce rod failures, but the investigation did not uncover the latent root of failure. The stress riser, due to the improper thread design, is called the “physical root,” because it did initiate the physical events leading to the secondary damage. However, there were significant events preceding the physical root that are of interest. If the RE engineer had the time and resources, he would have discovered that the absence of a procedure requiring new equipment to be purchased with adequate spares directly initiated the sequence of events. This basic event is called the “latent root.”
By requiring spare parts be purchased from the OEM for all new equipment, the latent root is eliminated, not only for this scenario but, potentially, for many other similar events. This example demonstrates the importance of finding the “latent root” of rotating equipment failures. Stopping at the “physical root” deprives the organization of a valuable opportunity for improvement. So, an RCFA is a detailed analysis of a complex, multi‐event failure, such as the example above, in which the investigator seeks to establish the sequence of events along with the initiating event. The initiating event is called the root cause, and factors that contributed to the severity of the failure or perpetuated the events leading to the failure are called contributing events.
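The compressor example can be made concrete with a short sketch that lists the six events in order and shows how much of the chain is left untouched when the investigation stops at the physical root. This is only an illustration under stated assumptions: the `Event` class, the `upstream_of` function, and the treatment of the events as a simple ordered list are choices made for this example, while the event descriptions and the "latent" and "physical" tags follow the six‐event list above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Event:
    """One event in the failure sequence (hypothetical structure)."""
    description: str
    root_type: Optional[str] = None  # "latent", "physical", or None


# The six events from the compressor example, in the order given in the text.
EVENTS = [
    Event("No procedure to order spare parts for new equipment", "latent"),
    Event("Improper packing installation leads to rod scoring"),
    Event("Replacement rod fabricated at a local machine shop"),
    Event("No one checks OEM rod thread specifications", "physical"),
    Event("Rod fails after two days of operation"),
    Event("Broken rod damages cylinder, packing box, distance piece, cross-head"),
]


def upstream_of(events: List[Event], root_type: str) -> List[Event]:
    """Return the events that precede the first event tagged root_type.

    These are the events an investigation leaves unexamined if it stops
    at that root.
    """
    for index, event in enumerate(events):
        if event.root_type == root_type:
            return events[:index]
    raise ValueError(f"no event tagged {root_type!r}")


for root in ("physical", "latent"):
    missed = upstream_of(EVENTS, root)
    print(f"Stopping at the {root} root leaves {len(missed)} earlier event(s) unexamined")
```

With the events tagged as in the text, stopping at the physical root leaves three earlier events unexamined, including the missing spare parts procedure, whereas tracing back to the latent root leaves none.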
Industry personnel generally divide failure analysis into three categories in order of complexity and depth of investigation.
They are:
Component failure analysis (CFA) looks at the specific physical cause of failure, such as fatigue, overload, or corrosion, of the machine element that failed, for example, a bearing or a gear. This type of analysis emphasizes finding the physical causes of the failure.
Root cause investigation (RCI) is conducted in greater depth than the CFA and goes substantially beyond the physical root of a problem. It seeks the human errors involved but does not address management system deficiencies.
Root cause analysis (RCA) includes everything the RCI covers plus the management system problems that allow the human errors and other system weaknesses to exist.
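The three categories nest inside one another: each deeper analysis covers the roots of the shallower one and adds a level. A minimal sketch of that relationship follows. The names `ROOTS_COVERED` and `shallowest_analysis` are hypothetical, and mapping CFA, RCI, and RCA onto the physical, human, and latent roots is this sketch's reading of the three definitions above.

```python
# Which root levels each analysis type reaches, per the definitions above.
# The dictionary layout and names are illustrative assumptions.
ROOTS_COVERED = {
    "CFA": {"physical"},
    "RCI": {"physical", "human"},
    "RCA": {"physical", "human", "latent"},
}

# Ordered from least to most complex and costly.
ANALYSIS_ORDER = ["CFA", "RCI", "RCA"]


def shallowest_analysis(required_roots):
    """Return the least complex analysis type that covers all required roots."""
    needed = set(required_roots)
    for name in ANALYSIS_ORDER:
        if needed <= ROOTS_COVERED[name]:
            return name
    raise ValueError(f"unknown root level(s): {sorted(needed - ROOTS_COVERED['RCA'])}")


print(shallowest_analysis({"physical"}))
print(shallowest_analysis({"physical", "human"}))
print(shallowest_analysis({"latent"}))
```

Asking for the shallowest adequate analysis reflects the point that deeper analyses cost more, so depth should be chosen to match the roots that actually need to be understood.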
Although the cost increases as the analyses become more complex, the benefit is that there is a much more complete recognition of the true origins of the problem. Using a CFA to determine the causes of a component failure answers why that specific part or machine failed and can be used to prevent similar future failures. Progressing to an RCI, we find the cost is 5–10 times that of a CFA, but the RCI adds a detailed understanding of the human errors contributing to the breakdown and can be used to eliminate groups of similar problems in the future. However, conducting an RCA may cost well into six figures and require several months. These costs may be intimidating to some, but the benefits obtained from correcting the major roots will eliminate huge classes of problems. The return will be many times the expenditure and will start to be realized within a few months of formal program implementation.
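The 5–10 times multiplier can be turned into a rough budgeting range. The sketch below assumes a purely hypothetical CFA cost of 2,000 (in any currency); only the multiplier comes from the text, and the function name is invented for this example.

```python
from typing import Tuple


def rci_cost_range(cfa_cost: float) -> Tuple[float, float]:
    """Estimate the RCI cost range as 5 to 10 times the CFA cost."""
    if cfa_cost < 0:
        raise ValueError("cost must be non-negative")
    return (5 * cfa_cost, 10 * cfa_cost)


# Hypothetical CFA cost, chosen only to illustrate the arithmetic.
low, high = rci_cost_range(2_000)
print(f"Estimated RCI cost: {low:,} to {high:,}")
```

For the assumed CFA cost of 2,000, the estimated RCI cost falls between 10,000 and 20,000. No similar multiplier is given for an RCA, only that it may run well into six figures, so the sketch does not attempt to estimate it.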
One thing that has to be recognized is that, because of the time, manpower, and costs involved, it is essentially impossible to conduct an RCA on every failure. The cost and possible benefits have to be recognized and judgments made to decide on the appropriate type of analysis.
An RCFA is normally justified for events associated with the partial or complete failure of critical production equipment, machinery, or systems. This type of incident can have a severe, negative impact on plant performance. Therefore, it often justifies the effort required to fully evaluate the event and to determine its root cause.
Many a time deviations in operating performance occur without the physical failure of equipment or components. Chronic deviations may justify the use of RCFA as a means of resolving the recurring problem.
RCFA can be used to resolve most quality‐related problems. However, the analysis should not be used for all quality problems.
Many of the problems or events that occur affect a plant’s ability to consistently meet expected production or capacity rates. These problems may be suitable for RCFA, but further evaluation is recommended before beginning an analysis. After the initial investigation, if the event can be fully qualified and a cost‐effective solution not found, then a full analysis should be considered. Note that an analysis normally is not performed on random, nonrecurring events or equipment failures.
Deviations in economic performance, such as high production or maintenance costs, often warrant the use of RCFA. The decision tree and specific steps required to resolve these problems vary depending on the type of problem and its forcing functions or causes.
