Design quality SAS software and evaluate SAS software quality
SAS Data Analytic Development is the developer's compendium for writing better-performing software and the manager's guide to building comprehensive software performance requirements. The text introduces and parallels the International Organization for Standardization (ISO) software product quality model, demonstrating 15 performance requirements that represent dimensions of software quality: reliability, recoverability, robustness, execution efficiency (i.e., speed), efficiency, scalability, portability, security, automation, maintainability, modularity, readability, testability, stability, and reusability. The text is intended to be read cover to cover or used as a reference tool to instruct, inspire, deliver, and evaluate software quality.
A common fault in many software development environments is a focus on functional requirements--the what and how--to the detriment of performance requirements, which specify instead how well software should function (assessed through software execution) or how easily software should be maintained (assessed through code inspection). Without the definition and communication of performance requirements, developers risk either building software that lacks intended quality or wasting time delivering software that exceeds performance objectives--thus, either underperforming or gold-plating, both of which are undesirable. Managers, customers, and other decision makers should also understand the dimensions of software quality, both to define performance requirements at project outset and to evaluate whether those objectives were met at software completion.
As data analytic software, SAS transforms data into information and ultimately knowledge and data-driven decisions. Not surprisingly, data quality is a central focus and theme of SAS literature; code quality, however, is far less commonly described and too often references only the speed or efficiency with which software should execute, omitting other critical dimensions of software quality. SAS® software project definitions and technical requirements often fall victim to this paradox, in which rigorous quality requirements exist for data and data products yet not for the software that undergirds them. By demonstrating the costs and benefits of software quality inclusion and the risks of software quality exclusion, stakeholders learn to value, prioritize, implement, and evaluate dimensions of software quality within risk management and project management frameworks of the software development life cycle (SDLC). Thus, SAS Data Analytic Development recalibrates business value, placing code quality on par with data quality, and performance requirements on par with functional requirements.
Preface
Objectives
Audience
Application of Content
Organization
Acknowledgments
About the Author
Chapter 1: Introduction
Distinguishing Data Analytic Development
Software Development Life Cycle (SDLC)
Risk
Chapter 2: Quality
Defining Quality
Software Product Quality Model
Quality in the SDLC
Chapter 3: Communication
Return Codes
System Numeric Return Codes
System Alphanumeric Return Codes
User-Generated Return Codes
Parallel Processing Communication
Part I: Dynamic Performance
Chapter 4: Reliability
Defining Reliability
Paths to Failure
ACL: The Reliability Triad
Reliability in the SDLC
Chapter 5: Recoverability
Defining Recoverability
Recoverability toward Reliability
Recoverability Matrix
TEACH Recoverability Principles
SPICIER Recoverability Steps
Recovering with Checkpoints
Recoverability in the SDLC
Chapter 6: Robustness
Defining Robustness
Robustness toward Reliability
Defensive Programming
Exception Handling
Robustness in the SDLC
Chapter 7: Execution Efficiency
Defining Execution Efficiency
Factors Affecting Execution Efficiency
False Dependencies
Parallel Processing
Execution Efficiency in the SDLC
Chapter 8: Efficiency
Defining Efficiency
Disambiguating Efficiency
Defining Resources
Efficiency in the SDLC
Chapter 9: Scalability
Defining Scalability
The Scalability Triad
Resource Scalability
Demand Scalability
Load Scalability
Scalability in the SDLC
Chapter 10: Portability
Defining Portability
Disambiguating Portability
3GL versus 4GL Portability
Facets of Portability
Portability in the SDLC
Chapter 11: Security
Defining Security
Confidentiality
Integrity
Availability
Security in the SDLC
Chapter 12: Automation
Defining Automation
Automation in SAS Software
SAS Processing Modes
Starting in Interactive Mode
Starting in Batch Mode
Automation in the SDLC
Part II: Static Performance
Chapter 13: Maintainability
Defining Maintainability
Maintenance
Maintenance in the SDLC
Failure to Maintain
Maintainability
Chapter 14: Modularity
Defining Modularity
From Monolithic to Modular
Modularity Principles
Benefits of Modularity
Chapter 15: Readability
Defining Readability
Plan to Get Hit by a Bus
Software Readability
External Readability
Chapter 16: Testability
Defining Testability
Software Testing
Testability
Chapter 17: Stability
Defining Stability
Achieving Stability
Stable Requirements
Defect-Free Code
Dynamic Flexibility
Stability and Beyond
Modularizing More Than Macros
Chapter 18: Reusability
Defining Reusability
Reuse
Reusability
From Reusability to Extensibility
Index
End User License Agreement
List of Figures
Chapter 1: Introduction
Figure 1.1 Software Development Environments
Figure 1.2 The Software Development Life Cycle (SDLC)
Figure 1.3 Waterfall Development Methodology
Figure 1.4 Agile Software Development
Chapter 2: Quality
Figure 2.1 ISO Software Product Quality Model
Figure 2.2 Software Quality Model Demonstrated in Chapter Organization
Figure 2.3 Interaction of Software Quality Constructs and Dimensions
Figure 2.4 Underperformance and Gold-Plating
Chapter 4: Reliability
Figure 4.1 Software Decay Continuum
Figure 4.2 Increasing Requirements in Operational Phase
Figure 4.3 Reliability Growth Curve with Decreasing Failure Rate
Figure 4.4 Traditional and End-User Development Reliability Growth Curves
Figure 4.5 MTBF Competing Definitions, Inclusive and Exclusive of Recovery
Chapter 5: Recoverability
Figure 5.1 Comparing MTTF and MTBF
Figure 5.2 MTBF Influencing Reliability Growth Curve
Figure 5.3 Recoverability Growth Curve with Decreasing Recovery Period
Figure 5.4 Interaction of Reliability and Recoverability
Figure 5.5 MTTR, RTO, and MTD
Chapter 6: Robustness
Figure 6.1 Exception Inheritance
Figure 6.2 Exception Inheritance in Software Reuse
Figure 6.3 Program Flow and Happy Trail
Chapter 7: Execution Efficiency
Figure 7.1 From Monolithic to Parallel Program Flow
Figure 7.2 Critical Path Analysis
Figure 7.3 Data Set False Dependencies
Figure 7.4 Data Set False Dependencies Removed
Figure 7.5 Diminishing Return Curves for Software Performance
Figure 7.6 Histogram of Real Time for Sorting 10 Million Observations
Figure 7.7 Histogram of Real Time and CPU Time for Sorting 10 Million Observations
Chapter 8: Efficiency
Figure 8.1 Memory Usage by SORT Procedure
Figure 8.2 Inefficiency Elbow Caused by RAM Exhaustion
Chapter 9: Scalability
Figure 9.1 Multithreaded versus Single-Threaded SORT Procedure
Figure 9.2 Contrasting Scalability Performance
Figure 9.3 Phase-Gate and Parallel Software Models
Figure 9.4 Inefficiency Elbow on SORT Procedure
Figure 9.5 Inefficiency Elbow Solutions
Figure 9.6 Scaling with Efficiency
Figure 9.7 Incremental Refactoring to Support Increased Software Performance
Chapter 10: Portability
Figure 10.1 SAS University Edition Performance Limitations
Chapter 12: Automation
Figure 12.1 SAS Display Manager at Startup
Chapter 14: Modularity
Figure 14.1 Transitive Property of Data Analytic Software
List of Tables
Chapter 1: Introduction
Table 1.1 Sample Risk Register
Chapter 2: Quality
Table 2.1 Common Objectives and Questions
Chapter 4: Reliability
Table 4.1 Sample Failure Log for SAS Software Run Daily for Two Months
Table 4.2 Inclusive versus Exclusive MTBF Calculations
Chapter 5: Recoverability
Table 5.1 Sample Failure Log for SAS Software Run Daily for Two Months
Chapter 6: Robustness
Table 6.1 Abbreviated Risk Register
Chapter 7: Execution Efficiency
Table 7.1 Control Table Demonstrating Parallel Processing
Chapter 9: Scalability
Table 9.1 Control Table Demonstrating Parallel Processing
Table 9.2 FULLSTIMER Metrics for SORT Procedure at Inefficiency Elbow
The Wiley & SAS Business Series presents books that help senior-level managers with their critical management decisions.
Titles in the Wiley & SAS Business Series include:
Agile by Design: An Implementation Guide to Analytic Lifecycle Management by Rachel Alt-Simmons
Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications
by Bart Baesens
Bank Fraud: Using Technology to Combat Losses
by Revathi Subramanian
Big Data, Big Innovation: Enabling Competitive Differentiation through Business Analytics
by Evan Stubbs
Business Forecasting: Practical Problems and Solutions
edited by Michael Gilliland, Len Tashman, and Udo Sglavo
Business Intelligence Applied: Implementing an Effective Information and Communications Technology Infrastructure
by Michael Gendron
Business Intelligence and the Cloud: Strategic Implementation Guide
by Michael S. Gendron
Business Transformation: A Roadmap for Maximizing Organizational Insights
by Aiman Zeid
Data-Driven Healthcare: How Analytics and BI Are Transforming the Industry
by Laura Madsen
Delivering Business Analytics: Practical Guidelines for Best Practice
by Evan Stubbs
Demand-Driven Forecasting: A Structured Approach to Forecasting, Second Edition
by Charles Chase
Demand-Driven Inventory Optimization and Replenishment: Creating a More Efficient Supply Chain
by Robert A. Davis
Developing Human Capital: Using Analytics to Plan and Optimize Your Learning and Development Investments
by Gene Pease, Barbara Beresford, and Lew Walker
Economic and Business Forecasting: Analyzing and Interpreting Econometric Results
by John Silvia, Azhar Iqbal, Kaylyn Swankoski, Sarah Watt, and Sam Bullard
Financial Institution Advantage and the Optimization of Information Processing
by Sean C. Keenan
Financial Risk Management: Applications in Market, Credit, Asset, and Liability Management and Firmwide Risk
by Jimmy Skoglund and Wei Chen
Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection
by Bart Baesens, Veronique Van Vlasselaer, and Wouter Verbeke
Harness Oil and Gas Big Data with Analytics: Optimize Exploration and Production with Data Driven Models
by Keith Holdaway
Health Analytics: Gaining the Insights to Transform Health Care
by Jason Burke
Heuristics in Analytics: A Practical Perspective of What Influences Our Analytical World
by Carlos Andre Reis Pinheiro and Fiona McNeill
Hotel Pricing in a Social World: Driving Value in the Digital Economy
by Kelly McGuire
Implement, Improve and Expand Your Statewide Longitudinal Data System: Creating a Culture of Data in Education
by Jamie McQuiggan and Armistead Sapp
Killer Analytics: Top 20 Metrics Missing from Your Balance Sheet
by Mark Brown
Mobile Learning: A Handbook for Developers, Educators, and Learners
by Scott McQuiggan, Lucy Kosturko, Jamie McQuiggan, and Jennifer Sabourin
The Patient Revolution: How Big Data and Analytics Are Transforming the Healthcare Experience
by Krisa Tailor
Predictive Analytics for Human Resources
by Jac Fitz-enz and John Mattox II
Predictive Business Analytics: Forward-Looking Capabilities to Improve Business Performance
by Lawrence Maisel and Gary Cokins
Statistical Thinking: Improving Business Performance, Second Edition
by Roger W. Hoerl and Ronald D. Snee
Too Big to Ignore: The Business Case for Big Data
by Phil Simon
Trade-Based Money Laundering: The Next Frontier in International Money Laundering Enforcement
by John Cassara
The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions
by Phil Simon
Understanding the Predictive Analytics Lifecycle
by Al Cordoba
Unleashing Your Inner Leader: An Executive Coach Tells All
by Vickie Bevenour
Using Big Data Analytics: Turning Big Data into Big Money
by Jared Dean
Visual Six Sigma, Second Edition
by Ian Cox, Marie Gaudard, and Mia Stephens
For more information on any of the above titles, please visit www.wiley.com.
Troy Martin Hughes
Copyright © 2016 by SAS Institute, Inc. All rights reserved.
Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Names: Hughes, Troy Martin, 1976– author.
Title: SAS data analytic development : dimensions of software quality / Troy Martin Hughes.
Description: Hoboken, New Jersey : John Wiley & Sons, 2016. | Includes index.
Identifiers: LCCN 2016021300 | ISBN 9781119240761 (cloth) | ISBN 9781119255918 (epub) | ISBN 9781119255703 (ePDF)
Subjects: LCSH: SAS (Computer file) | Quantitative research—Data processing.
Classification: LCC QA276.45.S27 H84 2016 | DDC 005.5/5—dc23
LC record available at https://lccn.loc.gov/2016021300
Cover Design: Wiley
Cover Image: © Emelyanov/iStockphoto
To Mom, who dreamed of being a writer and, through unceasing love, raised one, and Dad, who taught me to program before I could even reach the keys.
Because SAS practitioners are software developers, too!
Within the body of SAS literature, an overwhelming focus on data quality eclipses software quality. Whether discussed in books, white papers, technical documentation, or even posted job descriptions, nearly all references to quality in relationship to SAS describe the quality of data or data products.
The focus on data quality and diversion from traditional software development priorities is not without reason. Data analytic development is software development, but ultimate business value is delivered not through software products but rather through subsequent, derivative data products. In aligning quality only with data, however, data analytic development environments can place an overwhelming focus on software functional requirements to the detriment or exclusion of software performance requirements. When SAS literature does describe performance best practices, it typically demonstrates only how to make SAS software faster or more efficient while omitting other dimensions of software quality.
However, what about software reliability, scalability, security, maintainability, or modularity—or the host of other software quality characteristics? For all the SAS practitioners of the world—including developers, biostatisticians, econometricians, researchers, students, project managers, market analysts, data scientists, and others—this text demonstrates a model for software quality promulgated by the International Organization for Standardization (ISO) to facilitate the evaluation and pursuit of software quality.
Through hundreds of Base SAS software examples and more than 4,000 lines of code, SAS practitioners will learn how to define, prioritize, implement, and measure 15 dimensions of software quality. Moreover, nontechnical stakeholders, including project managers, functional managers, customers, sponsors, and business analysts, will learn to recognize the value of quality inclusion and the commensurate risk of quality exclusion. With this more comprehensive view of quality, SAS software quality is finally placed on par with SAS data quality.
Why this text and the relentless pursuit of SAS software quality? Because SAS practitioners, regardless of job title, are inherently software developers, too, and should benefit from industry standards and best practices. Software quality can and should be allowed to flourish in any environment.
The primary goal is to describe and demonstrate SAS software development within the framework of the ISO software product quality model. The model defines characteristics of software quality codified within the Systems and software Quality Requirements and Evaluation (SQuaRE) series (ISO/IEC 25000:2014). Through the 15 intertwined dimensions of software quality presented in this text, readers will be equipped to understand, implement, evaluate, and, most importantly, value software quality.
A secondary goal is to demonstrate the role and importance of the software development life cycle (SDLC) in facilitating software quality. Thus, the dimensions of quality are presented as enduring principles that influence software planning, design, development, testing, validation, acceptance, deployment, operation, and maintenance. The SDLC is demonstrated in a requirements-based framework in which ultimate business need spawns technical requirements that drive the inclusion (or exclusion) of quality in software. Requirements initially provide the backbone of software design and ultimately the basis against which the quality of completed software is evaluated.
A tertiary goal is to demonstrate SAS software development within a risk management framework that identifies the threats of poor quality software to business value. Poor data quality is habitually highlighted in SAS literature as a threat to business value, but poor code quality can equally contribute to project failure. This text doesn't suggest that all dimensions of software quality should be incorporated in all software, but rather aims to formalize a structure through which threats and vulnerabilities can be identified and their ultimate risk to software calculated. Thus, performance requirements are most appropriately implemented when the benefits of their inclusion as well as the risks of their exclusion are understood.
Savvy SAS practitioners are the intended audience and represent the professionals who utilize the SAS application to write software in the Base SAS language. An advanced knowledge of Base SAS, including the SAS macro language, is recommended but not required.
Other stakeholders who will benefit from this text include project sponsors, customers, managers, Agile facilitators, ScrumMasters, software testers, and anyone with a desire to understand or improve software performance. Nontechnical stakeholders may have limited knowledge of the SAS language, or software development in general, yet nevertheless generate requirements that drive software projects. These stakeholders will benefit through the introduction of quality characteristics that should be used to define software requirements and evaluate software performance.
The ISO software product quality model is agnostic to industry, team size, organizational structure (e.g., functional, projectized, matrix), development methodology (e.g., Agile, Scrum, Lean, Extreme Programming, Waterfall), and developer role (e.g., developer, end-user developer). The student researcher working on a SAS client machine will gain as much insight from this text as a team of developers working in a highly structured environment with separate development, test, and production servers.
While the majority of Base SAS code demonstrated is portable between SAS interfaces and environments, some input/output (I/O) and other system functions, options, and parameters are OS- or interface-specific. Code examples in this text have been tested in the SAS Display Manager for Windows, SAS Enterprise Guide for Windows, and the SAS University Edition. Functional differences among these applications are highlighted throughout the text, and discussed in chapter 10, “Portability.”
While this text includes hundreds of examples of SAS code that demonstrate the successful implementation and evaluation of quality characteristics, it differs from other SAS literature in that it doesn't represent a compendium of SAS software best practices, but rather the application of SAS code to support the software product quality model within the SDLC. Therefore, code examples demonstrate software performance rather than functionality.
Most software texts are organized around functionality—either a top-down approach in which a functional objective is stated and various methods to achieve that goal are demonstrated, or a bottom-up approach in which uses and caveats of a specific SAS function, procedure, or statement are explored. Because this text follows the ISO software product quality model and focuses on performance rather than functionality, it eschews the conventional organization of functionality-driven SAS literature. Instead, 15 chapters highlight a dynamic or static performance characteristic—a single dimension of software quality. Code examples often build incrementally throughout each chapter as quality objectives are identified and achieved, and related quality characteristics are highlighted for future reference and reading.
The text comprises 18 chapters, organized as an introductory overview followed by two parts:
Overview
Three chapters introduce the concept of quality, the ISO software product quality model, the SDLC, risk management, Agile and Waterfall development methodologies, exception handling, and other information and terms central to the text. Even the reader who is anxious to reach the more technically substantive performance chapters should skim Chapters 1, "Introduction," and 2, "Quality," to glean the context of software quality within data analytic development environments.
Part I. Dynamic Performance
These nine chapters introduce dynamic performance requirements—software quality attributes that are demonstrated, measured, and validated through software execution. For example, software efficiency can be demonstrated by running code and measuring run time and system resources such as CPU and memory usage. Chapters include “Reliability,” “Recoverability,” “Robustness,” “Execution Efficiency,” “Efficiency,” “Scalability,” “Portability,” “Security,” and “Automation.”
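For instance, a minimal sketch, assuming only Base SAS and the bundled SASHELP.CLASS data set, shows one way such metrics can be captured: the FULLSTIMER system option (revisited in chapter 9, "Scalability") writes real time, CPU time, and memory usage to the log after each step.

/* enable extended performance metrics in the SAS log */
options fullstimer;

/* any subsequent step now reports real time, CPU time, and memory */
proc sort data=sashelp.class out=work.class_sorted;
   by age;
run;

After execution, the FULLSTIMER output in the log supplies the run-time and resource measurements against which dynamic performance requirements can be validated.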
Part II. Static Performance
These six chapters introduce static performance requirements—software quality attributes that are assessed through code inspection rather than execution. For example, the extent to which software is modularized cannot be determined until the code is opened and inspected, either through manual review or automated test software. Chapters include “Maintainability,” “Modularity,” “Readability,” “Testability,” “Stability,” and “Reusability.”
Text formatting constructs are standardized to facilitate SAS code readability. Formatting is not intended to demonstrate best practices but rather standardization. All code samples are presented in lowercase, but the following conventions are used where code is referenced within the text (see the brief example after this list):
SAS libraries are capitalized, such as "the WORK library" or "the PERM.Burrito data set within the PERM library."
SAS data sets appear in sentence case, such as "the Chimichanga data set" or "the WORK.Tacos_are_forever data set."
SAS reserved words—including statements, functions, and procedure names—are capitalized, such as "the UPCASE function" or "the MEANS procedure."
The DATA step is always capitalized, such as "the DATA step can be deleted if the SQL procedure is implemented."
Variables used within the DATA step or SAS procedures are capitalized, such as "the variable CHAR1 is missing."
SAS user-defined formats are capitalized, such as "the MONTHS format."
SAS macros are capitalized and preceded with a percent sign, such as "the %LOCKITDOWN macro prevents file access collisions."
SAS macro variables are capitalized, such as "the &DSN macro variable is commonly defined to represent the data set name."
SAS parameters that are passed to macros are capitalized, such as "the DSN parameter in the %GOBIG macro invocation."
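For illustration only, a hypothetical snippet (the library path is invented; the other names mimic the examples above) shows how these constructs appear in lowercase within actual code:

/* hypothetical example: the path, data sets, and macro variable */
/* are illustrative only, mirroring the conventions listed above */
libname perm 'c:\perm';               /* the PERM library           */
%let dsn=tacos_are_forever;           /* the &DSN macro variable    */

data work.&dsn;                       /* a DATA step                */
   set perm.burrito;                  /* the PERM.Burrito data set  */
   char1=upcase(char1);               /* the UPCASE function, CHAR1 */
run;

proc means data=work.&dsn;            /* the MEANS procedure        */
run;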
So many people, through contributions to my life as well as endurance and encouragement throughout this journey, have contributed directly and indirectly and made this project possible.
To the family and friends I ignored for four months while road-tripping through 24 states to write this, thank you for your love, patience, understanding, and couches.
To my teachers who instilled a love of writing, thank you for years of red ink and encouragement: Sister Mary Katherine Gallagher, Estelle McCarthy, Lorinne McKnight, Dolores Cummings, Millie Bizzini, Patty Ely, Jo Berry, Liana Hachiya, Audrey Musson, Dana Trevethan, Cheri Rowton, Annette Simmons, and Dr. Robyn Bell.
To the mentors whose words continue to guide me, thank you for your leadership and friendship: Dr. Cathy Schuman, Dr. Barton Palmer, Dr. Kiko Gladsjo, Dr. Mina Chang, Dean Kauffman, Rich Nagy, Jim Martin, and Jeff Stillman.
To my SAS spirit guides, thank you not only for challenging the limits of the semicolon but also for sharing your successes and failures with the world: Dr. Gerhard Svolba, Art Carpenter, Kirk Paul Lafler, Susan Slaughter, Lora Delwiche, Peter Eberhardt, Ron Cody, Charlie Shipp, and Thomas Billings.
To SAS, thank you for distributing the SAS University Edition and for providing additional software free of charge, without which this project would have been impossible.
Finally, thank you to John Wiley & Sons, Inc. for support and patience throughout this endeavor.
Troy Martin Hughes has been a SAS practitioner for more than 15 years, has managed SAS projects in support of federal, state, and local government initiatives, and is a SAS Certified Advanced Programmer, SAS Certified Base Programmer, SAS Certified Clinical Trials Programmer, and SAS Professional V8. He has an MBA in information systems management and additional credentials, including: PMP, PMI-ACP, PMI-PBA, PMI-RMP, CISSP, CSSLP, CSM, CSD, CSPO, CSP, and ITIL v3 Foundation. He has been a frequent presenter and invited speaker at SAS user conferences, including SAS Global Forum, WUSS, MWSUG, SCSUG, SESUG, and PharmaSUG. Troy is a U.S. Navy veteran with two tours of duty in Afghanistan and, in his spare time, a volunteer firefighter and EMT.
Software development in which ultimate business value is delivered not through software products but rather through subsequent, derivative data products, including data sets, databases, analyses, reports, and data-driven decisions.
Data analytic development creates and implements software as a means to an end, but the software itself is never the end. Rather, the software is designed to automate data ingestion, cleaning, transformation, analysis, presentation, and other data-centric processes. Through the results generated, subsequent data products confer information and ultimately knowledge to stakeholders. Thus, a software product in and of itself may deliver no ultimate business value, although it is necessary to produce the golden egg—the valuable data product. As a data analytic development language, Base SAS is utilized to develop SAS software products (programs created by SAS practitioners) that are compiled and run on the SAS application (SAS editor and compiler) across various SAS interfaces (e.g., SAS Display Manager, SAS Enterprise Guide, SAS University Edition) purchased from or provided by SAS, Inc.
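To make the pattern concrete, a minimal hypothetical sketch (the CSV file and variable names are invented, echoing the fraud-analyst example that follows) shows software automating ingestion, cleaning, transformation, and analysis, with the resulting output, not the code, conferring the business value:

/* hypothetical sketch: ingest raw data from a CSV file */
proc import datafile='weekly_transactions.csv'
      out=work.transactions dbms=csv replace;
run;

/* clean and transform: drop incomplete rows, derive a flag */
data work.transactions_clean;
   set work.transactions;
   if missing(amount) then delete;
   flagged=(amount>10000);
run;

/* analyze: summarize flagged transactions for the report */
proc freq data=work.transactions_clean;
   tables flagged;
run;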
Data analytic software is often produced in a development environment known as end-user development, in which the developers of software themselves are also the users. Within end-user development environments, software is never transferred or sold to a third party but is used and maintained by the developer or development team. For example, a financial fraud analyst may be required to produce a weekly report that details suspicious credit card transactions to validate and calibrate fraud detection algorithms. The analyst is required to develop a repeatable SAS program that can generate results to meet customer needs. However, the analyst is an end-user developer because he is responsible for both writing the software and creating weekly reports based on the data output. Note that this example represents data analytic development within an end-user development environment.
Traditional, user-focused, or software applications development contrasts sharply with data analytic development because ultimate business value is conferred through the production and delivery of software itself. For example, when Microsoft developers build a product such as Microsoft Word or Excel, the working software product denotes business value because it is distributed to and purchased by third-party users. The software development life cycle (SDLC) continues after purchase, but only insofar as maintenance activities are performed by Microsoft, such as developing and disseminating software patches. In the following section, data analytic development is compared to and contrasted with end-user and traditional development environments.
So why bother distinguishing data analytic development environments? Because it's important to understand the strengths and weaknesses of respective development environments and because the software development environment can influence the relative quality and performance of software.
To be clear, data analytic development environments, end-user development environments, and traditional software development environments are not mutually exclusive. Figure 1.1 demonstrates the entanglement among these environments, showing that the majority of data analytic development is performed within end-user development environments.
Figure 1.1 Software Development Environments
The data-analytic-end-user hybrid represents the most common type of data analytic development for several reasons. Principally, data analytic software is created not from a need for software itself but rather from a need to solve some problem, produce some output, or make some decision. For example, the financial analyst who needs to write a report about fraud levels and fraud detection accuracy in turn authors SAS software to automate and standardize this solution. SAS practitioners building data analytic software are often required to have extensive domain expertise in the data they're processing, analyzing, or otherwise utilizing to ensure they produce valid data products and decisions. Thus, first and foremost, the SAS practitioner is a financial analyst, although primary responsibilities can include software development, testing, operation, and maintenance.
Technical aspects and limitations of Base SAS software also encourage data analytic development to occur within end-user development environments. Because Base SAS software is compiled at execution, it remains plain text not only in development and testing phases but also in production. This precludes the stabilization or hardening of software, such as encryption, that is necessary when software is developed for third-party users. For this reason, no market exists for companies that build and sell SAS software to third-party user bases, because the underlying code could be freely examined, replicated, and distributed. Moreover, without encryption, the accessibility of Base SAS code encourages SAS practitioners to explore and meddle with code, compromising its security and integrity.
The data-analytic-traditional hybrid is less common in SAS software but describes data analytic software in which the software does denote ultimate business value rather than a means to an end. This model is more common in environments in which separate development teams exist apart from analytic or software operational teams. For example, a team of SAS developers might write extract-transform-load (ETL) or analytic software that is provided to a separate team of analysts or other professionals who utilize the software and its resultant data products. The development team might maintain operations and maintenance (O&M) administrative activities for the software, including training, maintenance, and planning software end-of-life, but otherwise the team would not use or interact with the software it developed.
When data-analytic-traditional environments do exist, they typically produce software only for teams internal to their organization. Service level agreements (SLAs) sometimes exist between the development team and the team(s) they support, but the SAS software developed is typically neither sold nor purchased. Because SAS code is plain text and open to inspection, it's uncommon for a SAS development team to sell software beyond its organization. SAS consultants, rather, often operate within this niche, providing targeted developmental support to organizations.
The third and final hybrid environment, the end-user-traditional model, demonstrates software developed by and for SAS practitioners that is not data-focused. Rather than processing or analyzing variable data, the SAS software might operate as a stand-alone application, driven by user inputs. For example, if a rogue SAS practitioner spent a couple of weeks encoding and bringing to life Parker Brothers' legendary board game Monopoly in Base SAS, the software itself would be the ultimate product. Of course, whether or not the analyst was able to retain his job thereafter would depend on whether his management perceived any business value in the venture!
Because of the tendency of data analytic development to occur within end-user development environments, traditional development is not discussed further in this text except as a comparison where strengths and weaknesses exist between traditional and other development environments. The pros and cons of end-user development are discussed in the next section.
Many end-user developers may not even consider themselves to be software developers. I first learned SAS working in a Veterans Administration (VA) psychiatric ward, and my teachers were psychologists, psychiatrists, statisticians, and other researchers. We saw patients, recorded and entered data, wrote and maintained our own software, analyzed clinical trials data, and conducted research and published papers on a variety of psychiatric topics. We didn't have a single “programmer” on our staff, although more than half of my coworkers were engaged in some form of data analysis in varying degrees of code complexity. However, because we were clinicians first and researchers second, the idea that we were also software developers would have seemed completely foreign to many of the staff.
In fact, this identity crisis is exactly why I use “SAS practitioners” to represent the breadth of professionals who develop software in the Base SAS language—because so many of us may feel that we are only moonlighting as software developers, despite the vast quantity of SAS software we may produce. This text represents a step toward acceptance of our roles—however great or small—as software developers.
The principal advantage of end-user development is the ability for domain experts—those who understand both the ultimate project intent and its data—to design software. The psychiatrists didn't need a go-between to help convey technical concepts to them, because they themselves were building the software. Neither was a business analyst required to convey the ultimate business need and intent of the software to the developers, because the developers were the psychiatrists—the domain experts. Because end-user developers possess both domain knowledge and technical savvy, they are poised to rapidly implement technical solutions that fulfill business needs without business analysts or other brokers.
To contrast, traditional software development environments often demonstrate a domain knowledge divide in which high-level project intent and requirements must be translated to software developers (who lack domain expertise) and where some technical aspects of the software must be translated to customers (who lack technical expertise in computer science or software development). Over time, stakeholders will tend to broaden their respective job roles and knowledge but, if left unmitigated, the domain knowledge divide can lead to communication breakdown, misinterpretation of software intent or requirements, and less functional or lower quality software. In these environments, business analysts and other brokers play a critical role in ensuring a smooth communication continuum among domain knowledge, project needs and objectives, and technical requirements.
Traditional software development environments do outperform end-user development in some aspects. Because developers in traditional environments operate principally as software developers, they're more likely to be educated and trained in software engineering, computer science, systems engineering, or other technically relevant fields. They may not have domain-specific certifications or accreditations to upkeep (like clinicians or psychiatrists) so they can more easily seek out training and education opportunities specific to software development. For example, in my work in the VA hospital, when we received training, it was related to patient care, psychiatry, privacy regulations, or some other medically focused discipline. We read and authored journal articles and other publications on psychiatric topics—but never software development.
Because of this greater focus on domain-specific education, training, and knowledge, end-user developers are less likely to implement (and, in some cases, may be unaware of) established best practices in software development such as reliance on the SDLC, Agile development methodologies, and performance requirements such as those described in the International Organization for Standardization (ISO) software product quality model. Thus, end-user developers can be disadvantaged relative to traditional software developers, both in software development best practices and in best practices that describe the software development environment.
To overcome inherent weaknesses of end-user development, SAS practitioners operating in these environments should invest in software development learning and training opportunities commensurate with their software development responsibilities. While I survived my tenure in the VA psych ward and did produce much quality software, I would have improved my skills (and software) had I read fewer Diagnostic and Statistical Manual of Mental Disorders (DSM) case studies and more SAS white papers and computer science texts.
The SDLC describes discrete phases through which software passes from cradle to grave. In a more generic sense, the SDLC is also referenced as the systems development life cycle, which bears the same acronym. Numerous representations of the SDLC exist; Figure 1.2 shows a common depiction.
Figure 1.2 The Software Development Life Cycle (SDLC)
In many data analytic and end-user development environments, the SDLC is not in place, and software is produced using an undisciplined, laissez-faire method sometimes referred to as cowboy coding. Notwithstanding any weaknesses this may present, the ISO software product quality model benefits these relaxed development environments, regardless of whether the SDLC phases are formally recognized or implemented. Because the distinct phases of the SDLC are repeatedly referenced throughout this text, readers who lack experience in formalized development environments should learn the concepts associated with each phase so they can apply them (contextually, if not in practice) to their specific environment while reading this text.
Planning
Project needs are identified and high-level discussions occur, such as the “build-versus-buy” decision of whether to develop software, purchase a solution, or abandon the project. Subsequent discussion should define the functionality and performance of proposed software, thus specifying its intended quality.
Design
Function and performance, as they relate to technical implementation, are discussed. Whereas planning is needs-focused, design and later phases are solutions- and software-focused. In relation to quality, specific, measurable performance requirements should be created and, if formalized software testing is implemented, a test plan with test cases should be created.
Development
Software is built to meet project needs and requirements, including accompanying documentation and other artifacts.
Testing
Software is tested (against a test plan using test cases and test data, if these artifacts exist) and modified until it meets requirements.
Acceptance
Software is validated to meet requirements and formally accepted by stakeholders as meeting the intended functional and performance objectives.
Operation
Software is used for some intended duration. Where software maintenance is required, it occurs simultaneously with operation, although these discrete activities may be performed by different individuals or teams.
Maintenance
While software is in operation, maintenance or modification may be required. Types of maintenance are discussed in chapter 13, "Maintainability," and may be performed by users (in end-user development), by the original developers, or by a separate O&M team that supports software maintenance once development has concluded.
End of Life
Software is phased out and replaced at some point; however, this should be an intentional decision by stakeholders rather than a retreat from software that, due to poor quality, no longer meets functional or performance requirements.
Although the SDLC is often depicted and conceptualized as containing discrete phases, significant interaction can occur between phases. For example, during the design phase, a developer may take a couple of days to do some development work to test a theory to determine whether it will present a viable solution for the software project. Or, during testing, when significant vulnerabilities or defects are discovered, developers may need to overhaul software, including redesign and redevelopment. Thus, while SDLC phases are intended to represent the focus and majority of the work occurring at that time, their intent is not to exclude other activities that would naturally occur.
Roles such as customer, software developer, tester, and user are uniquely described in software development literature. While some cross-functional development teams do delineate responsibilities by role, in other environments, roles and responsibilities are combined. An extreme example of role combination is common in end-user development environments in which developers write, test, and use their own software—bestowing them with developer, tester, user, and possibly customer credentials. SAS end-user developers often have primary responsibilities in their respective business domain as researchers, analysts, scientists, and other professionals, but develop software to further these endeavors.
A stakeholder represents the “individual or organization having a right, share, claim, or interest in a system or in its possession of characteristics that meet their needs and expectations.”1 While the following distinct stakeholders are referenced throughout the text, readers should interpret and translate these definitions to their specific environments, in which multiple roles may be coalesced into a single individual and in which some roles may be absent:
Sponsor
"The individual or group that provides the financial resources, in cash or in kind, for the project."2 Sponsors are rarely discussed in this text but, as software funders, often dictate software quality requirements.
Customer
"The entity or entities for whom the requirements are to be satisfied in the system being defined and developed."3 The customer can be the product owner (in Agile or Scrum environments), project manager, sponsor, or other authority figure delegating requirements. This contrasts with some software development literature, especially Agile-related, in which the term customer often represents the software end user.
SAS Practitioner/Developer
These are the folks in the trenches writing SAS code. I use the terms practitioner and developer interchangeably, but intentionally chose SAS practitioner because it embodies the panoply of diverse professionals who use the SAS application to write SAS software to support their domain-specific work.
Software Tester
Testers perform a quality assurance function to determine if software meets needs, requirements, and other technical specifications. A tester may be the developer who authored the code, a separate developer (as in software peer review), or an individual or quality assurance team whose sole responsibility is to test software.
User
"The individual or organization that will use the project's product."4 In end-user development environments, users constitute the SAS practitioners who wrote the software, while in other environments, users may be analysts or other stakeholders who operate SAS software but who are not responsible for software development, testing, or maintenance activities.
Waterfall software development methodologies employ a stop-gate or phase-gate approach to software development in which discrete phases are performed in sequence. For example, Figure 1.3 demonstrates that planning concludes before design commences, and all design concludes before development commences. This approach is commonly referred to as big design up front (BDUF), because the end-state of software is expected to be fully imagined and prescribed in the initial design documentation, with emphasis on rigid adherence to this design.
Figure 1.3 Waterfall Development Methodology
For years, Waterfall methodologies have been anecdotally referred to as “traditional” software development. Since the rise of Agile software development methodologies in the early 2000s, however, an entire generation of software developers now exists who (fortunately) have never had to experience rigid Waterfall development, so the “traditional” nomenclature is outmoded. Waterfall development methodologies are often criticized because they force customers to predict all business needs up front and eschew flexibility of these initial designs; software products may be delivered on time, but weeks or months after customer needs or objectives have shifted to follow new business opportunities. Thus, the software produced may meet the original needs and requirements, but often fails to meet all current needs and requirements.
Despite the predominant panning of Waterfall development methodologies within contemporary software development literature, a benefit of Waterfall is its clear focus on SDLC phases, even if they are rigidly enforced. For example, because development follows planning and design, software developers only write software after careful consideration of business needs and identification of a way ahead to achieve those objectives. Further, because all software is developed before testing, the testing phase comprehensively validates function and performance against requirements. Thus, despite its rigidity, the phase-gate approach encourages quality controls between discrete phases of the SDLC.
Agile software development methodologies contrast with Waterfall methodologies in that Agile methodologies emphasize responsible flexibility through rapid, incremental, iterative design and development. Agile methodologies follow the Manifesto for Agile Software Development (AKA the Agile Manifesto) and include Scrum, Lean, Extreme Programming (XP), Crystal, Scaled Agile Framework (SAFe), Kanban, and others.
The Agile Manifesto was cultivated by a group of 17 software development gurus who met in Snowbird, Utah, in 2001 to elicit and define a body of knowledge that prescribes best practices for software development.
We are uncovering better ways of developing software by doing it and helping others do it.
Through this work we have come to value:
Individuals and interactions over processes and tools
Working software over comprehensive documentation
Customer collaboration over contract negotiation
Responding to change over following a plan
That is, while there is value in the items on the right, we value the items on the left more.5
In Agile development environments, software is produced through iterative development in which the entire SDLC occurs within a time-boxed iteration, typically from two to eight weeks. Within that iteration, software design, development, testing, validation, and production occur so that working software is released to the customer at the end of the iteration. At that point, customers prioritize additional functionality or performance to be included in future development iterations. Customers benefit because they can pursue new opportunities and business value during software development, rather than be forced to continue funding or leading software projects whose value decreases over an extended SDLC due to shifting business needs, opportunities, risks, and priorities. Figure 1.4 demonstrates Agile software development in which software is developed in a series of two-week iterations.
Figure 1.4 Agile Software Development
Agile is sometimes conceptualized as a series of miniature SDLC life cycles and, while this does describe the iterative nature of Agile development, it fails to fully capture Agile principles and processes. For example, because Agile development releases software iteratively, maintenance issues from previous iterations may bubble up to the surface during a current iteration, forcing developers (or their customers) to choose between performing necessary maintenance or releasing new functionality or performance as scheduled. Thus, a weakness ascribed to Agile is the inherent competition that exists between new development and maintenance activities, which is discussed in the “Maintenance in Agile Environments” section in chapter 13, “Maintainability.” This competition contrasts with Waterfall environments, in which software maintenance is performed primarily once software is in production and development tasks have largely concluded.
Despite this potential weakness, Agile has been lauded as a best practice in software development for more than a decade and has defined software development in the 21st century. Its prominence within traditional applications development environments, however, has not been mirrored within data analytic development environments. This is due in part to the predominance of end-user developers who support data analytic development and who are likely more focused on domain-specific best practices rather than software development methodologies and best practices.
Another weakness is found in the body of Agile literature itself, which often depicts an idealized “developer” archetype whose responsibilities seem focused narrowly on the development of releasable code rather than the creation of data products or participation in other activities that confer business value. In these software-centric Agile descriptions, common activities in data analytic development environments (such as data analysis or report writing) are often absent or only offhandedly referenced. Despite this myopia within Agile literature, Agile methodologies, principles, and techniques are wholly applicable to and advisable for data analytic development.
For those interested in exploring Agile methodologies, dozens of excellent resources exist, although these typically describe traditional software applications development. For an introduction to Agile methodologies to support data analytic development, I demonstrate the successful application of Agile to SAS software development in a separate text: When Software Development Is Your Means Not Your End: Abstracting Agile Methodologies for End-User Development and Analytic Application.
From a software development perspective, basic risks include functional and performance failure in software. For example, a nonmalicious threat (like big data) exploits a software vulnerability (like an underlying error that limits efficient scaling when big data are encountered), causing risk (inefficient performance or functional failure) to business value. These terms are defined in the text box “Threat, Vulnerability, and Risk.” While the Project Management Body of Knowledge (PMBOK) and other sources define positive risk as opportunity, only negative risks are discussed within this text.
Threat
"A risk that would have a negative effect on one or more project objectives."6 "A state of the system or system environment which can lead to adverse effect in one or more given risk dimensions."7
Vulnerability
"Weakness in an information system, or cryptographic system, or components (e.g., system security procedures, hardware design, internal controls) that could be exploited."8
Risk
"An uncertain event or condition that, if it occurs, has a positive or negative effect on one or more project objectives."9 "The combination of the probability of an abnormal event or failure and the consequence(s) of that event or failure to a system's components, operators, users, or environment."10
Software failure is typically caused by threats that exploit vulnerabilities, but neither all threats nor all vulnerabilities will lead to failure. Errors (human mistakes) may lie dormant in code as vulnerabilities that may or may not be known to developers. Unknown vulnerabilities include coding mistakes (defects) that have not yet resulted in failure, while known vulnerabilities include coding mistakes (latent defects) that are identified yet unresolved. The “Paths to Failure” section in chapter 4, “Reliability,” further defines and distinguishes these terms.
For example, the following SAS code is often represented in literature as a method to determine the number of observations in a data set:
proc sql noprint;
   select count(*) into :obstot
      from temp;
quit;
The code is effective for data sets that have fewer than 100 million observations but, once this threshold is crossed, the value of &OBSTOT shifts from standard numeric notation to scientific notation. For example, a data set having 10 million observations is represented as 10000000, while one having 100 million observations is represented as 1E8. To the SAS practitioner running this code only to view the number of observations in the log, the discrepancy causes no problems. However, if subsequent code attempts to evaluate or compare &OBSTOT, errors or incorrect results can occur because the value is in scientific notation, as the following output demonstrates:
%let obstot=1E8;
%if &obstot<5000000 %then %put LESS THAN 5 MILLION;
%else %put GREATER THAN 5 MILLION;

LESS THAN 5 MILLION
Obviously 100 million is not less than 5 million but, because of two underlying errors, a vulnerability exists in the code. The comparison fails because %IF evaluates its expression with %EVAL, which performs a character (rather than numeric) comparison when an operand such as 1E8 is not an integer. The vulnerability can be easily eliminated by correcting either of the two errors. The first error can be eliminated by changing the assignment of &OBSTOT to include a format that accommodates larger numbers, as demonstrated with the FORMAT= option in the SELECT clause. The second error can be eliminated by enclosing the numeric comparison inside the %SYSEVALF macro function, which interprets 1E8 as a number rather than text. Both solutions are demonstrated below; either correction in isolation eliminates the vulnerability and prevents the failure.
proc sql noprint;
   select count(*) format=15.0 into :obstot
      from temp;
quit;

%if %sysevalf(&obstot<5000000) %then %put LESS THAN 5 MILLION;
%else %put GREATER THAN 5 MILLION;

GREATER THAN 5 MILLION
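The 100 million threshold itself derives from the BEST8. format that PROC SQL applies by default when the INTO clause creates a macro variable: eight characters can hold an eight-digit count, but a nine-digit count is compressed into scientific notation. A quick, data-free sketch of this behavior (assuming a default SAS session, with the PUTN function applying formats through %SYSFUNC) follows, with the expected log output shown in comments:

/* BEST8. is the default format applied by the INTO clause */
%put %sysfunc(putn(10000000, best8.));   /* 8 digits fit the width: 10000000 */
%put %sysfunc(putn(100000000, best8.));  /* 9 digits do not: 1E8 */
/* a wider format, like the FORMAT=15.0 correction above, preserves all digits */
%put %sysfunc(putn(100000000, 15.));     /* 100000000 */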
Because the failure occurs only as the number of observations increases, this can be described as a scalability error. The SAS practitioner failed to imagine (and test) what would occur if a large data set were encountered. But if the 100 million observation threshold is never crossed, the code will continue to execute without failure despite still containing errors. This error type is discussed further in the “SAS Application Thresholds” section in chapter 9, “Scalability.”
Developers sometimes knowingly leave vulnerabilities in software. For example, a developer familiar with the previous software vulnerability (exploited by the threat of big data) might choose to ignore the error in software designed to process data sets of 10,000 or fewer observations. Because the risk is negligible, it can be accepted and the software released as is, vulnerability included. In other cases, threats may pose higher risks, yet the risks are still accepted because the cost to eliminate or mitigate them outweighs the benefit.
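In such cases, the accepted vulnerability can be documented where it lives. The following sketch demonstrates one possible commenting convention (hypothetical, not prescribed by SAS) that ties the vulnerable code to its risk register entry:

/* ACCEPTED RISK (see risk register item 2): the INTO clause applies */
/* the default BEST8. format, so a count of 100 million or more      */
/* observations would render in scientific notation. This job reads  */
/* data sets of 10,000 or fewer observations, so the risk is         */
/* accepted rather than eliminated.                                  */
proc sql noprint;
   select count(*) into :obstot
      from temp;
quit;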
Unexploited vulnerabilities don't diminish software reliability because no failure occurs. For example, the previous latent defect is never exploited because big data are never encountered. However, vulnerabilities do increase the risk of software failures; therefore, developers should be aware of specific risks to software. In this example, the risk posed is failure caused by the accidental processing of big data within the SQL procedure. When vulnerabilities are exploited and runtime errors or other failures occur, software reliability is diminished. The risk register, introduced in the next section, enables SAS practitioners to record known vulnerabilities, expected risks, and proposed solutions to best measure and manage risk level for software products.
A risk register is a “record of information about identified risks.”11 Risk is an inherent reality of all software applications, so risk registers (sometimes referred to as defect databases) document risks, threats, vulnerabilities, and related information throughout the SDLC. Developers and other stakeholders should decide which performance requirements to incorporate in software, but they likely will not include all performance requirements in all software. Because vulnerabilities will therefore exist in software, it's important that they be identified, investigated, and documented sufficiently to demonstrate the specific risks they pose to software operation.
A risk register qualitatively and quantitatively records known vulnerabilities and associated threats and risks to software function or performance, and can include the following elements:
Description of vulnerability
Location of vulnerability
Threat(s) that could exploit the vulnerability
Risk if vulnerability is exploited
Severity of risk
Probability of risk
Likelihood of discovery
Cost to eliminate or mitigate risk
Recommended resolution
Some risk registers, like the sample demonstrated in Table 1.1, are organized at the vulnerability (or defect) level, while others are organized at the threat or risk level. Vulnerability-level risk registers are common in software development because, while many threats lie outside the control of developers, programmatic solutions can often be implemented to eliminate or mitigate specific vulnerabilities. Moreover, a general threat like big data can exploit numerous, unrelated vulnerabilities within a single software product.
Table 1.1 depicts a simplified risk register for the two errors demonstrated in the preceding code. Risk severity, risk probability, likelihood of risk discovery, and cost to implement a solution are each rated on a scale of 1 to 5, in which 5 denotes more severe, more likely to occur, more difficult to discover, and more costly to repair.
Table 1.1 Sample Risk Register

Num | Vulnerability | Location | Risk | Risk Severity | Risk Probability | Risk Discovery | Risk Cost
1 | %SYSEVALF should be used in evaluation | less-than operator | scientific notation won't be interpreted correctly | 5 | 1 | 5 | 1
2 | missing FORMAT= option | SELECT statement of PROC SQL | scientific notation won't be interpreted correctly | 5 | 1 | 5 | 1
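Because a risk register is a living document that is updated throughout the SDLC, some teams maintain it as a data set rather than as a static table. A minimal sketch follows, with hypothetical data set and variable names mirroring Table 1.1:

data riskregister;
   /* pipe-delimited entries mirror the columns of Table 1.1 */
   infile datalines dlm='|';
   length num 8 vulnerability $50 location $40 risk $60
      severity probability discovery cost 8;
   input num vulnerability location risk severity probability
      discovery cost;
   datalines;
1|%SYSEVALF should be used in evaluation|less-than operator|scientific notation won't be interpreted correctly|5|1|5|1
2|missing FORMAT= option|SELECT statement of PROC SQL|scientific notation won't be interpreted correctly|5|1|5|1
;
run;

proc print data=riskregister;
run;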
The first and second risks describe separate vulnerabilities, each exploited by the threat of data sets containing 100 million or more observations. Despite the high severity (5) if the threat is encountered, the probability is low (1) because files of this size have never been encountered in this environment. If these two factors alone were considered, a development team might choose to accept the risk and release the software with the vulnerabilities, given how unlikely they are to occur. However, because the failure would be difficult to discover (5), as no warning or runtime error would be produced if the threat were encountered, and because the cost to implement a remedy (modifying one line of code) is low (1), the development team might instead decide to modify the code, thus eliminating the risk rather than accepting it.
Although not depicted in Table 1.1, the recommended resolution describes the path chosen to manage the risk, often distilled as avoidance, transfer, acceptance, or mitigation, each described in the following section. The recommended resolution may also contain a technical description of how the risk is being managed. For example, if a risk is being eliminated, the resolution might describe programmatically how the associated threat is being eliminated or controlled, or how the associated vulnerability is being eliminated so that it can't be exploited by the threat.
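For instance, a resolution that controls the threat itself (rather than patching the comparison) could gate execution on data set size before the count is ever used. A minimal sketch, with a hypothetical macro name and threshold, and assuming the data set exists:

%macro checksize(dsn=, maxobs=100000000);
   %local dsid nobs rc;
   /* query the logical observation count from the open data set */
   %let dsid=%sysfunc(open(&dsn));
   %let nobs=%sysfunc(attrn(&dsid,nlobs));
   %let rc=%sysfunc(close(&dsid));
   /* halt before big data can exploit the BEST8. vulnerability */
   %if %sysevalf(&nobs>=&maxobs) %then
      %put ERROR: &dsn has &nobs observations, at or above the &maxobs threshold.;
   %else %put NOTE: &dsn has &nobs observations and passed the size check.;
%mend checksize;

%checksize(dsn=temp);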
Risk Management
