SAS Data Analytic Development

Troy Martin Hughes

Description

Design quality SAS software and evaluate SAS software quality. SAS Data Analytic Development is the developer's compendium for writing better-performing software and the manager's guide to building comprehensive software performance requirements. The text introduces and parallels the International Organization for Standardization (ISO) software product quality model, demonstrating 15 performance requirements that represent dimensions of software quality: reliability, recoverability, robustness, execution efficiency (i.e., speed), efficiency, scalability, portability, security, automation, maintainability, modularity, readability, testability, stability, and reusability. The text is intended to be read cover to cover or used as a reference tool to instruct, inspire, deliver, and evaluate software quality.

A common fault in many software development environments is a focus on functional requirements--the what and how--to the detriment of performance requirements, which instead specify how well software should function (assessed through software execution) or how easily software should be maintained (assessed through code inspection). Without the definition and communication of performance requirements, developers risk either building software that lacks intended quality or wasting time delivering software that exceeds performance objectives--thus either underperforming or gold-plating, both of which are undesirable. Managers, customers, and other decision makers should also understand the dimensions of software quality, both to define performance requirements at project outset and to evaluate whether those objectives were met at software completion.

As data analytic software, SAS transforms data into information and ultimately into knowledge and data-driven decisions. Not surprisingly, data quality is a central focus and theme of SAS literature; however, code quality is far less commonly described and too often references only the speed or efficiency with which software should execute, omitting other critical dimensions of software quality. SAS® software project definitions and technical requirements often fall victim to this paradox, in which rigorous quality requirements exist for data and data products yet not for the software that undergirds them. By demonstrating the costs and benefits of software quality inclusion and the risk of software quality exclusion, stakeholders learn to value, prioritize, implement, and evaluate dimensions of software quality within risk management and project management frameworks of the software development life cycle (SDLC). Thus, SAS Data Analytic Development recalibrates business value, placing code quality on par with data quality, and performance requirements on par with functional requirements.


Page count: 904

Publication year: 2016




Table of Contents

Preface

Objectives

Audience

Application of Content

Organization

Acknowledgments

About the Author

Chapter 1: Introduction

Distinguishing Data Analytic Development

Software Development Life Cycle (SDLC)

Risk

Chapter 2: Quality

Defining Quality

Software Product Quality Model

Quality in the SDLC

Chapter 3: Communication

Return Codes

System Numeric Return Codes

System Alphanumeric Return Codes

User-Generated Return Codes

Parallel Processing Communication

Part I: Dynamic Performance

Chapter 4: Reliability

Defining Reliability

Paths to Failure

ACL: The Reliability Triad

Reliability in the SDLC

Chapter 5: Recoverability

Defining Recoverability

Recoverability toward Reliability

Recoverability Matrix

TEACH Recoverability Principles

SPICIER Recoverability Steps

Recovering with Checkpoints

Recoverability in the SDLC

Chapter 6: Robustness

Defining Robustness

Robustness toward Reliability

Defensive Programming

Exception Handling

Robustness in the SDLC

Chapter 7: Execution Efficiency

Defining Execution Efficiency

Factors Affecting Execution Efficiency

False Dependencies

Parallel Processing

Execution Efficiency in the SDLC

Chapter 8: Efficiency

Defining Efficiency

Disambiguating Efficiency

Defining Resources

Efficiency in the SDLC

Chapter 9: Scalability

Defining Scalability

The Scalability Triad

Resource Scalability

Demand Scalability

Load Scalability

Scalability in the SDLC

Chapter 10: Portability

Defining Portability

Disambiguating Portability

3GL versus 4GL Portability

Facets of Portability

Portability in the SDLC

Chapter 11: Security

Defining Security

Confidentiality

Integrity

Availability

Security in the SDLC

Chapter 12: Automation

Defining Automation

Automation in SAS Software

SAS Processing Modes

Starting in Interactive Mode

Starting in Batch Mode

Automation in the SDLC

Part II: Static Performance

Chapter 13: Maintainability

Defining Maintainability

Maintenance

Maintenance in the SDLC

Failure to Maintain

Maintainability

Chapter 14: Modularity

Defining Modularity

From Monolithic to Modular

Modularity Principles

Benefits of Modularity

Chapter 15: Readability

Defining Readability

Plan to Get Hit by a Bus

Software Readability

External Readability

Chapter 16: Testability

Defining Testability

Software Testing

Testability

Chapter 17: Stability

Defining Stability

Achieving Stability

Stable Requirements

Defect-Free Code

Dynamic Flexibility

Stability and Beyond

Modularizing More Than Macros

Chapter 18: Reusability

Defining Reusability

Reuse

Reusability

From Reusability to Extensibility

Index

End User License Agreement



List of Illustrations

Chapter 1: Introduction

Figure 1.1 Software Development Environments

Figure 1.2 The Software Development Life Cycle (SDLC)

Figure 1.3 Waterfall Development Methodology

Figure 1.4 Agile Software Development

Chapter 2: Quality

Figure 2.1 ISO Software Product Quality Model

Figure 2.2 Software Quality Model Demonstrated in Chapter Organization

Figure 2.3 Interaction of Software Quality Constructs and Dimensions

Figure 2.4 Underperformance and Gold-Plating

Chapter 4: Reliability

Figure 4.1 Software Decay Continuum

Figure 4.2 Increasing Requirements in Operational Phase

Figure 4.3 Reliability Growth Curve with Decreasing Failure Rate

Figure 4.4 Traditional and End-User Development Reliability Growth Curves

Figure 4.5 MTBF Competing Definitions, Inclusive and Exclusive of Recovery

Chapter 5: Recoverability

Figure 5.1 Comparing MTTF and MTBF

Figure 5.2 MTBF Influencing Reliability Growth Curve

Figure 5.3 Recoverability Growth Curve with Decreasing Recovery Period

Figure 5.4 Interaction of Reliability and Recoverability

Figure 5.5 MTTR, RTO, and MTD

Chapter 6: Robustness

Figure 6.1 Exception Inheritance

Figure 6.2 Exception Inheritance in Software Reuse

Figure 6.3 Program Flow and Happy Trail

Chapter 7: Execution Efficiency

Figure 7.1 From Monolithic to Parallel Program Flow

Figure 7.2 Critical Path Analysis

Figure 7.3 Data Set False Dependencies

Figure 7.4 Data Set False Dependencies Removed

Figure 7.5 Diminishing Return Curves for Software Performance

Figure 7.6 Histogram of Real Time for Sorting 10 Million Observations

Figure 7.7 Histogram of Real Time and CPU Time for Sorting 10 Million Observations

Chapter 8: Efficiency

Figure 8.1 Memory Usage by SORT Procedure

Figure 8.2 Inefficiency Elbow Caused by RAM Exhaustion

Chapter 9: Scalability

Figure 9.1 Multithreaded versus Single-Threaded SORT Procedure

Figure 9.2 Contrasting Scalability Performance

Figure 9.3 Phase-Gate and Parallel Software Models

Figure 9.4 Inefficiency Elbow on SORT Procedure

Figure 9.5 Inefficiency Elbow Solutions

Figure 9.6 Scaling with Efficiency

Figure 9.7 Incremental Refactoring to Support Increased Software Performance

Chapter 10: Portability

Figure 10.1 SAS University Edition Performance Limitations

Chapter 12: Automation

Figure 12.1 SAS Display Manager at Startup

Chapter 14: Modularity

Figure 14.1 Transitive Property of Data Analytic Software

List of Tables

Chapter 1: Introduction

Table 1.1 Sample Risk Register

Chapter 2: Quality

Table 2.1 Common Objectives and Questions

Chapter 4: Reliability

Table 4.1 Sample Failure Log for SAS Software Run Daily for Two Months

Table 4.2 Inclusive versus Exclusive MTBF Calculations

Chapter 5: Recoverability

Table 5.1 Sample Failure Log for SAS Software Run Daily for Two Months

Chapter 6: Robustness

Table 6.1 Abbreviated Risk Register

Chapter 7: Execution Efficiency

Table 7.1 Control Table Demonstrating Parallel Processing

Chapter 9: Scalability

Table 9.1 Control Table Demonstrating Parallel Processing

Table 9.2 FULLSTIMER Metrics for SORT Procedure at Inefficiency Elbow

Wiley & SAS Business Series

The Wiley & SAS Business Series presents books that help senior-level managers with their critical management decisions.

Titles in the Wiley & SAS Business Series include:

Agile by Design: An Implementation Guide to Analytic Lifecycle Management by Rachel Alt-Simmons

Analytics in a Big Data World: The Essential Guide to Data Science and Its Applications

by Bart Baesens

Bank Fraud: Using Technology to Combat Losses

by Revathi Subramanian

Big Data, Big Innovation: Enabling Competitive Differentiation through Business Analytics

by Evan Stubbs

Business Forecasting: Practical Problems and Solutions

edited by Michael Gilliland, Len Tashman, and Udo Sglavo

Business Intelligence Applied: Implementing an Effective Information and Communications Technology Infrastructure

by Michael Gendron

Business Intelligence and the Cloud: Strategic Implementation Guide

by Michael S. Gendron

Business Transformation: A Roadmap for Maximizing Organizational Insights

by Aiman Zeid

Data-Driven Healthcare: How Analytics and BI Are Transforming the Industry

by Laura Madsen

Delivering Business Analytics: Practical Guidelines for Best Practice

by Evan Stubbs

Demand-Driven Forecasting: A Structured Approach to Forecasting, Second Edition

by Charles Chase

Demand-Driven Inventory Optimization and Replenishment: Creating a More Efficient Supply Chain

by Robert A. Davis

Developing Human Capital: Using Analytics to Plan and Optimize Your Learning and Development Investments

by Gene Pease, Barbara Beresford, and Lew Walker

Economic and Business Forecasting: Analyzing and Interpreting Econometric Results

by John Silvia, Azhar Iqbal, Kaylyn Swankoski, Sarah Watt, and Sam Bullard

Financial Institution Advantage and the Optimization of Information Processing

by Sean C. Keenan

Financial Risk Management: Applications in Market, Credit, Asset, and Liability Management and Firmwide Risk

by Jimmy Skoglund and Wei Chen

Fraud Analytics Using Descriptive, Predictive, and Social Network Techniques: A Guide to Data Science for Fraud Detection

by Bart Baesens, Veronique Van Vlasselaer, and Wouter Verbeke

Harness Oil and Gas Big Data with Analytics: Optimize Exploration and Production with Data Driven Models

by Keith Holdaway

Health Analytics: Gaining the Insights to Transform Health Care

by Jason Burke

Heuristics in Analytics: A Practical Perspective of What Influences Our Analytical World

by Carlos Andre Reis Pinheiro and Fiona McNeill

Hotel Pricing in a Social World: Driving Value in the Digital Economy

by Kelly McGuire

Implement, Improve and Expand Your Statewide Longitudinal Data System: Creating a Culture of Data in Education

by Jamie McQuiggan and Armistead Sapp

Killer Analytics: Top 20 Metrics Missing from Your Balance Sheet

by Mark Brown

Mobile Learning: A Handbook for Developers, Educators, and Learners

by Scott McQuiggan, Lucy Kosturko, Jamie McQuiggan, and Jennifer Sabourin

The Patient Revolution: How Big Data and Analytics Are Transforming the Healthcare Experience

by Krisa Tailor

Predictive Analytics for Human Resources

by Jac Fitz-enz and John Mattox II

Predictive Business Analytics: Forward-Looking Capabilities to Improve Business Performance

by Lawrence Maisel and Gary Cokins

Statistical Thinking: Improving Business Performance, Second Edition

by Roger W. Hoerl and Ronald D. Snee

Too Big to Ignore: The Business Case for Big Data

by Phil Simon

Trade-Based Money Laundering: The Next Frontier in International Money Laundering Enforcement

by John Cassara

The Visual Organization: Data Visualization, Big Data, and the Quest for Better Decisions

by Phil Simon

Understanding the Predictive Analytics Lifecycle

by Al Cordoba

Unleashing Your Inner Leader: An Executive Coach Tells All

by Vickie Bevenour

Using Big Data Analytics: Turning Big Data into Big Money

by Jared Dean

Visual Six Sigma, Second Edition

by Ian Cox, Marie Gaudard, and Mia Stephens

For more information on any of the above titles, please visit www.wiley.com.

SAS® Data Analytic Development

Dimensions of Software Quality

Troy Martin Hughes


Copyright © 2016 by SAS Institute, Inc. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.

Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 646-8600, or on the Web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permissions.

Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.

SAS and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.

Wiley publishes in a variety of print and electronic formats and by print-on-demand. Some material included with standard print versions of this book may not be included in e-books or in print-on-demand. If this book refers to media such as a CD or DVD that is not included in the version you purchased, you may download this material at http://booksupport.wiley.com. For more information about Wiley products, visit www.wiley.com.

Library of Congress Cataloging-in-Publication Data:

Names: Hughes, Troy Martin, 1976– author.

Title: SAS data analytic development : dimensions of software quality / Troy Martin Hughes.

Description: Hoboken, New Jersey : John Wiley & Sons, 2016. | Includes index.

Identifiers: LCCN 2016021300 | ISBN 9781119240761 (cloth) | ISBN 9781119255918 (epub) | ISBN 9781119255703 (ePDF)

Subjects: LCSH: SAS (Computer file) | Quantitative research—Data processing.

Classification: LCC QA276.45.S27 H84 2016 | DDC 005.5/5—dc23

LC record available at https://lccn.loc.gov/2016021300

Cover Design: Wiley

Cover Image: © Emelyanov/iStockphoto

To Mom, who dreamed of being a writer and, through unceasing love, raised one, and Dad, who taught me to program before I could even reach the keys.

Preface

Because SAS practitioners are software developers, too!

Within the body of SAS literature, an overwhelming focus on data quality eclipses software quality. Whether discussed in books, white papers, technical documentation, or even posted job descriptions, nearly all references to quality in relationship to SAS describe the quality of data or data products.

The focus on data quality and diversion from traditional software development priorities is not without reason. Data analytic development is software development, but ultimate business value is delivered not through software products but rather through subsequent, derivative data products. In aligning quality only with data, however, data analytic development environments can place an overwhelming focus on software functional requirements to the detriment or exclusion of software performance requirements. When SAS literature does describe performance best practices, it typically demonstrates only how to make SAS software faster or more efficient while omitting other dimensions of software quality.

However, what about software reliability, scalability, security, maintainability, or modularity—or the host of other software quality characteristics? For all the SAS practitioners of the world—including developers, biostatisticians, econometricians, researchers, students, project managers, market analysts, data scientists, and others—this text demonstrates a model for software quality promulgated by the International Organization for Standardization (ISO) to facilitate the evaluation and pursuit of software quality.

Through hundreds of Base SAS software examples and more than 4,000 lines of code, SAS practitioners will learn how to define, prioritize, implement, and measure 15 dimensions of software quality. Moreover, nontechnical stakeholders, including project managers, functional managers, customers, sponsors, and business analysts, will learn to recognize the value of quality inclusion and the commensurate risk of quality exclusion. With this more comprehensive view of quality, SAS software quality is finally placed on par with SAS data quality.

Why this text and the relentless pursuit of SAS software quality? Because SAS practitioners, regardless of job title, are inherently software developers, too, and should benefit from industry standards and best practices. Software quality can and should be allowed to flourish in any environment.

OBJECTIVES

The primary goal is to describe and demonstrate SAS software development within the framework of the ISO software product quality model. The model defines characteristics of software quality codified within the Systems and software Quality Requirements and Evaluation (SQuaRE) series (ISO/IEC 25000:2014). Through the 15 intertwined dimensions of software quality presented in this text, readers will be equipped to understand, implement, evaluate, and, most importantly, value software quality.

A secondary goal is to demonstrate the role and importance of the software development life cycle (SDLC) in facilitating software quality. Thus, the dimensions of quality are presented as enduring principles that influence software planning, design, development, testing, validation, acceptance, deployment, operation, and maintenance. The SDLC is demonstrated in a requirements-based framework in which ultimate business need spawns technical requirements that drive the inclusion (or exclusion) of quality in software. Requirements initially provide the backbone of software design and ultimately the basis against which the quality of completed software is evaluated.

A tertiary goal is to demonstrate SAS software development within a risk management framework that identifies the threats of poor quality software to business value. Poor data quality is habitually highlighted in SAS literature as a threat to business value, but poor code quality can equally contribute to project failure. This text doesn't suggest that all dimensions of software quality should be incorporated in all software, but rather aims to formalize a structure through which threats and vulnerabilities can be identified and their ultimate risk to software calculated. Thus, performance requirements are most appropriately implemented when the benefits of their inclusion as well as the risks of their exclusion are understood.

AUDIENCE

Savvy SAS practitioners are the intended audience and represent the professionals who utilize the SAS application to write software in the Base SAS language. An advanced knowledge of Base SAS, including the SAS macro language, is recommended but not required.

Other stakeholders who will benefit from this text include project sponsors, customers, managers, Agile facilitators, ScrumMasters, software testers, and anyone with a desire to understand or improve software performance. Nontechnical stakeholders may have limited knowledge of the SAS language, or software development in general, yet nevertheless generate requirements that drive software projects. These stakeholders will benefit through the introduction of quality characteristics that should be used to define software requirements and evaluate software performance.

APPLICATION OF CONTENT

The ISO software product quality model is agnostic to industry, team size, organizational structure (e.g., functional, projectized, matrix), development methodology (e.g., Agile, Scrum, Lean, Extreme Programming, Waterfall), and developer role (e.g., developer, end-user developer). The student researcher working on a SAS client machine will gain as much insight from this text as a team of developers working in a highly structured environment with separate development, test, and production servers.

While the majority of Base SAS code demonstrated is portable between SAS interfaces and environments, some input/output (I/O) and other system functions, options, and parameters are OS- or interface-specific. Code examples in this text have been tested in the SAS Display Manager for Windows, SAS Enterprise Guide for Windows, and the SAS University Edition. Functional differences among these applications are highlighted throughout the text, and discussed in Chapter 10, "Portability."

While this text includes hundreds of examples of SAS code that demonstrate the successful implementation and evaluation of quality characteristics, it differs from other SAS literature in that it doesn't represent a compendium of SAS software best practices, but rather the application of SAS code to support the software product quality model within the SDLC. Therefore, code examples demonstrate software performance rather than functionality.

ORGANIZATION

Most software texts are organized around functionality—either a top-down approach in which a functional objective is stated and various methods to achieve that goal are demonstrated, or a bottom-up approach in which uses and caveats of a specific SAS function, procedure, or statement are explored. Because this text follows the ISO software product quality model and focuses on performance rather than functionality, it eschews the conventional organization of functionality-driven SAS literature. Instead, 15 chapters highlight a dynamic or static performance characteristic—a single dimension of software quality. Code examples often build incrementally throughout each chapter as quality objectives are identified and achieved, and related quality characteristics are highlighted for future reference and reading.

The text is divided into two parts comprising 18 total chapters:

Overview

Three chapters introduce the concept of quality, the ISO software product quality model, the SDLC, risk management, Agile and Waterfall development methodologies, exception handling, and other information and terms central to the text. Even the reader who is anxious to reach the more technically substantive performance chapters should skim Chapters 1, "Introduction," and 2, "Quality," to glean the context of software quality within data analytic development environments.

Part I. Dynamic Performance

These nine chapters introduce dynamic performance requirements—software quality attributes that are demonstrated, measured, and validated through software execution. For example, software efficiency can be demonstrated by running code and measuring run time and system resources such as CPU and memory usage. Chapters include “Reliability,” “Recoverability,” “Robustness,” “Execution Efficiency,” “Efficiency,” “Scalability,” “Portability,” “Security,” and “Automation.”
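For instance, a minimal sketch of capturing such execution metrics in Base SAS uses the FULLSTIMER system option, which writes real time, CPU time, and memory usage for each step to the SAS log. Sashelp.Class, used as input here, ships with SAS; the output data set name is arbitrary:

```sas
* minimal sketch: FULLSTIMER writes real time, CPU time, and memory
  usage for each DATA step and procedure to the SAS log;
options fullstimer;

proc sort data=sashelp.class out=work.class_sorted;
   by height;
run;

* the log now contains NOTE lines showing real time, user and system
  CPU time, and memory used by the SORT procedure;
```

Comparing these log metrics across runs or environments is one way dynamic performance requirements can be measured and validated.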

Part II. Static Performance

These six chapters introduce static performance requirements—software quality attributes that are assessed through code inspection rather than execution. For example, the extent to which software is modularized cannot be determined until the code is opened and inspected, either through manual review or automated test software. Chapters include “Maintainability,” “Modularity,” “Readability,” “Testability,” “Stability,” and “Reusability.”

Text formatting constructs are standardized to facilitate SAS code readability. Formatting is not intended to demonstrate best practices but rather standardization. All code samples are presented in lowercase, but the following conventions are used where code is referenced within the text:

SAS libraries are capitalized, such as the WORK library, or the PERM.Burrito data set within the PERM library.

SAS data sets appear in sentence case, such as the Chimichanga data set or the WORK.Tacos_are_forever data set.

SAS reserved words--including statements, functions, and procedure names--are capitalized, such as the UPCASE function or the MEANS procedure.

The DATA step is always capitalized, as in "the DATA step can be deleted if the SQL procedure is implemented."

Variables used within the DATA step or SAS procedures are capitalized, as in "the variable CHAR1 is missing."

SAS user-defined formats are capitalized, such as the MONTHS format.

SAS macros are capitalized and preceded with a percent sign, as in "the %LOCKITDOWN macro prevents file access collisions."

SAS macro variables are capitalized, as in "the &DSN macro variable is commonly defined to represent the data set name."

SAS parameters that are passed to macros are capitalized, such as the DSN parameter in the %GOBIG macro invocation.
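As a hypothetical illustration of these conventions (the macro, parameter, and data set names below are invented for this example, and Sashelp.Class ships with SAS), code referenced in the text as the %GOBIG macro, the DSN parameter, or the WORK.Tacos_are_forever data set would appear in lowercase like this:

```sas
* hypothetical sketch illustrating the naming conventions above;
data work.tacos_are_forever;          /* WORK.Tacos_are_forever data set */
   set sashelp.class;
   name=upcase(name);                 /* UPCASE function */
run;

%macro gobig(dsn=);                   /* %GOBIG macro with DSN parameter */
   proc means data=&dsn;              /* MEANS procedure; &DSN macro variable */
      var height weight;
   run;
%mend gobig;

%gobig(dsn=work.tacos_are_forever);
```

The code itself is lowercase; only references within the surrounding prose follow the capitalization conventions.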

Acknowledgments

So many people, through contributions to my life as well as endurance and encouragement throughout this journey, have directly and indirectly made this project possible.

To the family and friends I ignored for four months while road-tripping through 24 states to write this, thank you for your love, patience, understanding, and couches.

To my teachers who instilled a love of writing, thank you for years of red ink and encouragement: Sister Mary Katherine Gallagher, Estelle McCarthy, Lorinne McKnight, Dolores Cummings, Millie Bizzini, Patty Ely, Jo Berry, Liana Hachiya, Audrey Musson, Dana Trevethan, Cheri Rowton, Annette Simmons, and Dr. Robyn Bell.

To the mentors whose words continue to guide me, thank you for your leadership and friendship: Dr. Cathy Schuman, Dr. Barton Palmer, Dr. Kiko Gladsjo, Dr. Mina Chang, Dean Kauffman, Rich Nagy, Jim Martin, and Jeff Stillman.

To my SAS spirit guides, thank you not only for challenging the limits of the semicolon but also for sharing your successes and failures with the world: Dr. Gerhard Svolba, Art Carpenter, Kirk Paul Lafler, Susan Slaughter, Lora Delwiche, Peter Eberhardt, Ron Cody, Charlie Shipp, and Thomas Billings.

To SAS, thank you for distributing the SAS University Edition and for providing additional software free of charge, without which this project would have been impossible.

Finally, thank you to John Wiley & Sons, Inc. for support and patience throughout this endeavor.

About the Author

Troy Martin Hughes has been a SAS practitioner for more than 15 years, has managed SAS projects in support of federal, state, and local government initiatives, and is a SAS Certified Advanced Programmer, SAS Certified Base Programmer, SAS Certified Clinical Trials Programmer, and SAS Professional V8. He has an MBA in information systems management and additional credentials, including: PMP, PMI-ACP, PMI-PBA, PMI-RMP, CISSP, CSSLP, CSM, CSD, CSPO, CSP, and ITIL v3 Foundation. He has been a frequent presenter and invited speaker at SAS user conferences, including SAS Global Forum, WUSS, MWSUG, SCSUG, SESUG, and PharmaSUG. Troy is a U.S. Navy veteran with two tours of duty in Afghanistan and, in his spare time, a volunteer firefighter and EMT.

Chapter 1 Introduction

DATA ANALYTIC DEVELOPMENT

Software development in which ultimate business value is delivered not through software products but rather through subsequent, derivative data products, including data sets, databases, analyses, reports, and data-driven decisions.

Data analytic development creates and implements software as a means to an end, but the software itself is never the end. Rather, the software is designed to automate data ingestion, cleaning, transformation, analysis, presentation, and other data-centric processes. Through the results generated, subsequent data products confer information and ultimately knowledge to stakeholders. Thus, a software product in and of itself may deliver no ultimate business value, although it is necessary to produce the golden egg—the valuable data product. As a data analytic development language, Base SAS is utilized to develop SAS software products (programs created by SAS practitioners) that are compiled and run on the SAS application (SAS editor and compiler) across various SAS interfaces (e.g., SAS Display Manager, SAS Enterprise Guide, SAS University Edition) purchased from or provided by SAS Institute Inc.

Data analytic software is often produced in a development environment known as end-user development, in which the developers of software themselves are also the users. Within end-user development environments, software is never transferred or sold to a third party but is used and maintained by the developer or development team. For example, a financial fraud analyst may be required to produce a weekly report that details suspicious credit card transactions to validate and calibrate fraud detection algorithms. The analyst is required to develop a repeatable SAS program that can generate results to meet customer needs. However, the analyst is an end-user developer because he is responsible for both writing the software and creating weekly reports based on the data output. Note that this example represents data analytic development within an end-user development environment.

Traditional, user-focused, or software applications development contrasts sharply with data analytic development because ultimate business value is conferred through the production and delivery of software itself. For example, when Microsoft developers build a product such as Microsoft Word or Excel, the working software product denotes business value because it is distributed to and purchased by third-party users. The software development life cycle (SDLC) continues after purchase, but only insofar as maintenance activities are performed by Microsoft, such as developing and disseminating software patches. In the following section, data analytic development is compared to and contrasted with end-user and traditional development environments.

DISTINGUISHING DATA ANALYTIC DEVELOPMENT

So why bother distinguishing data analytic development environments? Because it's important to understand the strengths and weaknesses of respective development environments and because the software development environment can influence the relative quality and performance of software.

To be clear, data analytic development environments, end-user development environments, and traditional software development environments are not mutually exclusive. Figure 1.1 demonstrates the entanglement between these environments, showing that the majority of data analytic development is performed within end-user development environments.

Figure 1.1 Software Development Environments

The data-analytic-end-user hybrid represents the most common type of data analytic development for several reasons. Principally, data analytic software is created not from a need for software itself but rather from a need to solve some problem, produce some output, or make some decision. For example, the financial analyst who needs to write a report about fraud levels and fraud detection accuracy in turn authors SAS software to automate and standardize this solution. SAS practitioners building data analytic software are often required to have extensive domain expertise in the data they're processing, analyzing, or otherwise utilizing to ensure they produce valid data products and decisions. Thus, first and foremost, the SAS practitioner is a financial analyst, although primary responsibilities can include software development, testing, operation, and maintenance.

Technical aspects and limitations of Base SAS software also encourage data analytic development to occur within end-user development environments. Because Base SAS software is compiled at execution, it remains plain text not only in the development and testing phases but also in production. This precludes the stabilization or hardening of software through compilation or encryption, which is necessary when software is developed for third-party users. For this reason, no market exists for companies that build and sell SAS software to third-party user bases, because the underlying code could be freely examined, replicated, and distributed. Moreover, without encryption, the accessibility of Base SAS code invites SAS practitioners to explore and modify code, compromising its security and integrity.

The data-analytic-traditional hybrid is less common in SAS software but describes data analytic software in which the software itself does denote ultimate business value rather than serving only as a means to an end. This model is more common in environments in which separate development teams exist apart from analytic or software operational teams. For example, a team of SAS developers might write extract-transform-load (ETL) or analytic software that is provided to a separate team of analysts or other professionals who utilize the software and its resultant data products. The development team might perform operations and maintenance (O&M) activities for the software, including training, maintenance, and planning software end-of-life, but otherwise the team would not use or interact with the software it developed.

When data-analytic-traditional environments do exist, they typically produce software only for teams internal to their organization. Service level agreements (SLAs) sometimes exist between the development team and the team(s) they support, but the SAS software developed is typically neither sold nor purchased. Because SAS code is plain text and open to inspection, it's uncommon for a SAS development team to sell software beyond its organization. SAS consultants, rather, often operate within this niche, providing targeted developmental support to organizations.

The third and final hybrid environment, the end-user-traditional model, demonstrates software developed by and for SAS practitioners that is not data-focused. Rather than processing or analyzing variable data, the SAS software might operate as a stand-alone application, driven by user inputs. For example, if a rogue SAS practitioner spent a couple of weeks encoding and bringing to life Parker Brothers' legendary board game Monopoly in Base SAS, the software itself would be the ultimate product. Of course, whether or not the analyst was able to retain his job thereafter would depend on whether his management perceived any business value in the venture!

Because of the tendency of data analytic development to occur within end-user development environments, traditional development is not discussed further in this text except as a comparison where strengths and weaknesses exist between traditional and other development environments. The pros and cons of end-user development are discussed in the next section.

End-User Development

Many end-user developers may not even consider themselves to be software developers. I first learned SAS working in a Veterans Administration (VA) psychiatric ward, and my teachers were psychologists, psychiatrists, statisticians, and other researchers. We saw patients, recorded and entered data, wrote and maintained our own software, analyzed clinical trials data, and conducted research and published papers on a variety of psychiatric topics. We didn't have a single “programmer” on our staff, although more than half of my coworkers were engaged in some form of data analysis in varying degrees of code complexity. However, because we were clinicians first and researchers second, the idea that we were also software developers would have seemed completely foreign to many of the staff.

In fact, this identity crisis is exactly why I use “SAS practitioners” to represent the breadth of professionals who develop software in the Base SAS language—because so many of us may feel that we are only moonlighting as software developers, despite the vast quantity of SAS software we may produce. This text represents a step toward acceptance of our roles—however great or small—as software developers.

The principal advantage of end-user development is the ability for domain experts—those who understand both the ultimate project intent and its data—to design software. The psychiatrists didn't need a go-between to help convey technical concepts to them, because they themselves were building the software. Neither was a business analyst required to convey the ultimate business need and intent of the software to the developers, because the developers were the psychiatrists—the domain experts. Because end-user developers possess both domain knowledge and technical savvy, they are poised to rapidly implement technical solutions that fulfill business needs without business analysts or other brokers.

To contrast, traditional software development environments often demonstrate a domain knowledge divide in which high-level project intent and requirements must be translated to software developers (who lack domain expertise) and where some technical aspects of the software must be translated to customers (who lack technical expertise in computer science or software development). Over time, stakeholders will tend to broaden their respective job roles and knowledge but, if left unmitigated, the domain knowledge divide can lead to communication breakdown, misinterpretation of software intent or requirements, and less functional or lower quality software. In these environments, business analysts and other brokers play a critical role in ensuring a smooth communication continuum among domain knowledge, project needs and objectives, and technical requirements.

Traditional software development environments do outperform end-user development in some aspects. Because developers in traditional environments operate principally as software developers, they're more likely to be educated and trained in software engineering, computer science, systems engineering, or other technically relevant fields. They may not have domain-specific certifications or accreditations to upkeep (like clinicians or psychiatrists) so they can more easily seek out training and education opportunities specific to software development. For example, in my work in the VA hospital, when we received training, it was related to patient care, psychiatry, privacy regulations, or some other medically focused discipline. We read and authored journal articles and other publications on psychiatric topics—but never software development.

Because of this greater focus on domain-specific education, training, and knowledge, end-user developers are less likely to implement (and, in some cases, may be unaware of) established best practices in software development such as reliance on the SDLC, Agile development methodologies, and performance requirements such as those described in the International Organization for Standardization (ISO) software product quality model. Thus, end-user developers can be disadvantaged relative to traditional software developers, both in software development best practices as well as best practices that describe the software development environment.

To overcome inherent weaknesses of end-user development, SAS practitioners operating in these environments should invest in software development learning and training opportunities commensurate with their software development responsibilities. While I survived my tenure in the VA psych ward and did produce much quality software, I would have improved my skills (and software) had I read fewer Diagnostic and Statistical Manual of Mental Disorders (DSM) case studies and more SAS white papers and computer science texts.

SOFTWARE DEVELOPMENT LIFE CYCLE (SDLC)

The SDLC describes discrete phases through which software passes from cradle to grave. In a more generic sense, the SDLC is also referenced as the systems development life cycle, which bears the same acronym. Numerous representations of the SDLC exist; Figure 1.2 shows a common depiction.

Figure 1.2 The Software Development Life Cycle (SDLC)

In many data analytic and end-user development environments, the SDLC is not in place, and software is produced using an undisciplined, laissez-faire method sometimes referred to as cowboy coding. Notwithstanding any weaknesses this may present, the ISO software product quality model benefits these relaxed development environments, regardless of whether the SDLC phases are formally recognized or implemented. Because the distinct phases of the SDLC are repeatedly referenced throughout this text, readers who lack experience in formalized development environments should learn the concepts associated with each phase so they can apply them (contextually, if not in practice) to their specific environment while reading this text.

Planning

Project needs are identified and high-level discussions occur, such as the “build-versus-buy” decision of whether to develop software, purchase a solution, or abandon the project. Subsequent discussion should define the functionality and performance of proposed software, thus specifying its intended quality.

Design

Function and performance, as they relate to technical implementation, are discussed. Whereas planning is needs-focused, design and later phases are solutions- and software-focused. In relation to quality, specific, measurable performance requirements should be created and, if formalized software testing is implemented, a test plan with test cases should be created.

Development

Software is built to meet project needs and requirements, including accompanying documentation and other artifacts.

Testing

Software is tested (against a test plan using test cases and test data, if these artifacts exist) and modified until it meets requirements.

Acceptance

Software is validated to meet requirements and formally accepted by stakeholders as meeting the intended functional and performance objectives.

Operation

Software is used for some intended duration. Where software maintenance is required, this occurs simultaneously with operation although these discrete activities may be performed by different individuals or teams.

Maintenance

While software is in operation, maintenance or modification may be required. Types of maintenance are discussed in chapter 13, “Maintainability,” and may be performed by users (in end-user development), by the original developers, or by a separate O&M team that supports software maintenance once development has concluded.

End of Life

Software is phased out and replaced at some point; however, this should be an intentional decision by stakeholders rather than a retreat from software that, due to poor quality, no longer meets functional or performance requirements.

Although the SDLC is often depicted and conceptualized as containing discrete phases, significant interaction can occur between phases. For example, during the design phase, a developer may take a couple of days to do some development work to test a theory to determine whether it will present a viable solution for the software project. Or, during testing, when significant vulnerabilities or defects are discovered, developers may need to overhaul software, including redesign and redevelopment. Thus, while SDLC phases are intended to represent the focus and majority of the work occurring at that time, their intent is not to exclude other activities that would naturally occur.

SDLC Roles

Roles such as customer, software developer, tester, and user are uniquely described in software development literature. While some cross-functional development teams do delineate responsibilities by role, in other environments, roles and responsibilities are combined. An extreme example of role combination is common in end-user development environments in which developers write, test, and use their own software—bestowing them with developer, tester, user, and possibly customer credentials. SAS end-user developers often have primary responsibilities in their respective business domain as researchers, analysts, scientists, and other professionals, but develop software to further these endeavors.

A stakeholder represents the “individual or organization having a right, share, claim, or interest in a system or in its possession of characteristics that meet their needs and expectations.”1 While the following distinct stakeholders are referenced throughout the text, readers should interpret and translate these definitions to their specific environments, in which multiple roles may be coalesced into a single individual and in which some roles may be absent:

Sponsor

“The individual or group that provides the financial resources, in cash or in kind, for the project.”2 Sponsors are rarely discussed in this text but, as software funders, often dictate software quality requirements.

Customer

“The entity or entities for whom the requirements are to be satisfied in the system being defined and developed.”3 The customer can be the product owner (in Agile or Scrum environments), project manager, sponsor, or other authority figure delegating requirements. This contrasts with some software development literature, especially Agile-related, in which the term customer often represents the software end user.

SAS Practitioner/Developer

These are the folks in the trenches writing SAS code. I use the terms practitioner and developer interchangeably, but intentionally chose SAS practitioner because it embodies the panoply of diverse professionals who use the SAS application to write SAS software to support their domain-specific work.

Software Tester

Testers perform a quality assurance function to determine if software meets needs, requirements, and other technical specifications. A tester may be the developer who authored the code, a separate developer (as in software peer review), or an individual or quality assurance team whose sole responsibility is to test software.

User

“The individual or organization that will use the project's product.”4 In end-user development environments, users constitute the SAS practitioners who wrote the software, while in other environments, users may be analysts or other stakeholders who operate SAS software but who are not responsible for software development, testing, or maintenance activities.

Waterfall Software Development

Waterfall software development methodologies employ a stop-gate or phase-gate approach to software development in which discrete phases are performed in sequence. For example, Figure 1.3 demonstrates that planning concludes before design commences, and all design concludes before development commences. This approach is commonly referred to as big design up front (BDUF), because the end-state of software is expected to be fully imagined and prescribed in the initial design documentation, with emphasis on rigid adherence to this design.

Figure 1.3 Waterfall Development Methodology

For years, Waterfall methodologies have been anecdotally referred to as “traditional” software development. Since the rise of Agile software development methodologies in the early 2000s, however, an entire generation of software developers now exists who (fortunately) have never had to experience rigid Waterfall development, so the “traditional” nomenclature is outmoded. Waterfall development methodologies are often criticized because they force customers to predict all business needs up front and eschew flexibility of these initial designs; software products may be delivered on time, but weeks or months after customer needs or objectives have shifted to follow new business opportunities. Thus, the software produced may meet the original needs and requirements, but often fails to meet all current needs and requirements.

Despite the predominant panning of Waterfall development methodologies within contemporary software development literature, a benefit of Waterfall is its clear focus on SDLC phases, even if they are rigidly enforced. For example, because development follows planning and design, software developers only write software after careful consideration of business needs and identification of a way ahead to achieve those objectives. Further, because all software is developed before testing, the testing phase comprehensively validates function and performance against requirements. Thus, despite its rigidity, the phase-gate approach encourages quality controls between discrete phases of the SDLC.

Agile Software Development

Agile software development methodologies contrast with Waterfall methodologies in that Agile methodologies emphasize responsible flexibility through rapid, incremental, iterative design and development. Agile methodologies follow the Manifesto for Agile Software Development (AKA the Agile Manifesto) and include Scrum, Lean, Extreme Programming (XP), Crystal, Scaled Agile Framework (SAFe), Kanban, and others.

The Agile Manifesto was cultivated by a group of 17 software development gurus who met in Snowbird, Utah, in 2001 to elicit and define a body of knowledge that prescribes best practices for software development.

Manifesto for Agile Software Development

We are uncovering better ways of developing software by doing it and helping others do it.

Through this work we have come to value:

Individuals and interactions

over processes and tools

Working software

over comprehensive documentation

Customer collaboration

over contract negotiation

Responding to change

over following a plan

That is, while there is value in the items on the right, we value the items on the left more.5

In Agile development environments, software is produced through iterative development in which the entire SDLC occurs within a time-boxed iteration, typically from two to eight weeks. Within that iteration, software design, development, testing, validation, and production occur so that working software is released to the customer at the end of the iteration. At that point, customers prioritize additional functionality or performance to be included in future development iterations. Customers benefit because they can pursue new opportunities and business value during software development, rather than be forced to continue funding or leading software projects whose value decreases over an extended SDLC due to shifting business needs, opportunities, risks, and priorities. Figure 1.4 demonstrates Agile software development in which software is developed in a series of two-week iterations.

Figure 1.4 Agile Software Development

Agile is sometimes conceptualized as a series of miniature SDLC life cycles and, while this does describe the iterative nature of Agile development, it fails to fully capture Agile principles and processes. For example, because Agile development releases software iteratively, maintenance issues from previous iterations may bubble up to the surface during a current iteration, forcing developers (or their customers) to choose between performing necessary maintenance or releasing new functionality or performance as scheduled. Thus, a weakness ascribed to Agile is the inherent competition that exists between new development and maintenance activities, which is discussed in the “Maintenance in Agile Environments” section in chapter 13, “Maintainability.” This competition contrasts with Waterfall environments, in which software maintenance is performed primarily once software is in production and development tasks have largely concluded.

Despite this potential weakness, Agile has been lauded as a best practice in software development for more than a decade and has defined software development in the 21st century. Its prominence within traditional applications development environments, however, has not been mirrored within data analytic development environments. This is due in part to the predominance of end-user developers who support data analytic development and who are likely more focused on domain-specific best practices rather than software development methodologies and best practices.

Another weakness is found in the body of Agile literature itself, which often depicts an idealized “developer” archetype whose responsibilities seem focused narrowly on the development of releasable code rather than the creation of data products or participation in other activities that confer business value. In these software-centric Agile descriptions, common activities in data analytic development environments (such as data analysis or report writing) are often absent or only offhandedly referenced. Despite this myopia within Agile literature, Agile methodologies, principles, and techniques are wholly applicable to and advisable for data analytic development.

For those interested in exploring Agile methodologies, dozens of excellent resources exist, although these typically describe traditional software applications development. For an introduction to Agile methodologies to support data analytic development, I demonstrate the successful application of Agile to SAS software development in a separate text: When Software Development Is Your Means Not Your End: Abstracting Agile Methodologies for End-User Development and Analytic Application.

RISK

From a software development perspective, basic risks include functional and performance failure in software. For example, a nonmalicious threat (like big data) exploits a software vulnerability (like an underlying error that limits efficient scaling when big data are encountered), causing risk (inefficient performance or functional failure) to business value. These terms are defined in the text box “Threat, Vulnerability, and Risk.” While the Project Management Body of Knowledge (PMBOK) and other sources define positive risk as opportunity, only negative risks are discussed within this text.

THREAT, VULNERABILITY, AND RISK

Threat

“A risk that would have a negative effect on one or more project objectives.”6

“A state of the system or system environment which can lead to adverse effect in one or more given risk dimensions.”7

Vulnerability

“Weakness in an information system, or cryptographic system, or components (e.g., system security procedures, hardware design, internal controls) that could be exploited.”8

Risk

“An uncertain event or condition that, if it occurs, has a positive or negative effect on one or more project objectives.”9

“The combination of the probability of an abnormal event or failure and the consequence(s) of that event or failure to a system's components, operators, users, or environment.”10

Failure

Software failure is typically caused by threats that exploit vulnerabilities, but neither all threats nor all vulnerabilities will lead to failure. Errors (human mistakes) may lie dormant in code as vulnerabilities that may or may not be known to developers. Unknown vulnerabilities include coding mistakes (defects) that have not yet resulted in failure, while known vulnerabilities include coding mistakes (latent defects) that are identified yet unresolved. The “Paths to Failure” section in chapter 4, “Reliability,” further defines and distinguishes these terms.

For example, the following SAS code is often represented in literature as a method to determine the number of observations in a data set:

proc sql noprint;
   select count(*)
      into :obstot
      from temp;
quit;

The code is effective for data sets that have fewer than 100 million observations but, as this threshold is crossed, the value of &OBSTOT changes from standard numeric notation to scientific notation. For example, a data set having 10 million observations is represented as 10000000, while one having 100 million observations is represented as 1E8. To the SAS practitioner running this code only to view the number of observations in the log, this discrepancy causes no problems. However, if a subsequent procedure attempts to evaluate or compare &OBSTOT, runtime errors or erroneous comparisons can occur when the evaluated number is in scientific notation. This confusion is demonstrated in the following output:

%let obstot=1E8;
%if &obstot<5000000 %then %put LESS THAN 5 MILLION;
%else %put GREATER THAN 5 MILLION;

LESS THAN 5 MILLION

Obviously 100 million is not less than 5 million but, because of two underlying errors, a vulnerability exists in the code. The vulnerability can be easily eliminated by correcting either of the two errors. The first error can be eliminated by changing the assignment of &OBSTOT to include a format that accommodates larger numbers, as demonstrated with the FORMAT= option in the SELECT statement. The second error can be eliminated by enclosing the numeric comparison inside the %SYSEVALF macro function, which interprets 1E8 as a number rather than text. Both solutions are demonstrated, and either correction in isolation eliminates the vulnerability and prevents the failure.

proc sql noprint;
   select count(*) format=15.0
      into :obstot
      from temp;
quit;

%if %sysevalf(&obstot<5000000) %then %put LESS THAN 5 MILLION;
%else %put GREATER THAN 5 MILLION;

GREATER THAN 5 MILLION

Because the failure occurs only as the number of observations increases, this can be described as a scalability error. The SAS practitioner failed to imagine (and test) what would occur if a large data set were encountered. But if the 100 million observation threshold is never crossed, the code will continue to execute without failure despite still containing errors. This error type is discussed further in the “SAS Application Thresholds” section in chapter 9, “Scalability.”
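A further safeguard—not shown in the chapter but commonly cited in SAS literature—is to read the observation count directly from the data set descriptor with the OPEN, ATTRN, and CLOSE functions, so no numeric format is ever applied to the value. The following is a hedged sketch; the macro name OBSCOUNT and the data set TEMP are illustrative assumptions:

```sas
/* Sketch: return the logical observation count (NLOBS) from the   */
/* data set descriptor, avoiding PROC SQL and its default format.  */
%macro obscount(dsn);
   %local dsid nobs rc;
   %let dsid=%sysfunc(open(&dsn));            /* open the data set */
   %if &dsid %then %do;
      %let nobs=%sysfunc(attrn(&dsid,nlobs)); /* logical obs count */
      %let rc=%sysfunc(close(&dsid));         /* release data set  */
      %end;
   %else %let nobs=.;                         /* open failed       */
   &nobs
%mend;

%let obstot=%obscount(temp);
```

Note that ATTRN returns -1 for data set views, so this sketch applies to physical data sets; for views, counting rows remains necessary.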

Developers often intentionally introduce vulnerabilities into software. For example, a developer familiar with the previous software vulnerability (exploited by the threat of big data) might choose to ignore the error in software designed to process data sets of 10,000 or fewer observations. Because the risk is negligible, it can be accepted, and the software can be released as is—with the vulnerability. In other cases, threats may pose higher risks, yet the risks are still accepted because the cost to eliminate or mitigate them outweighs the benefit.

Unexploited vulnerabilities don't diminish software reliability because no failure occurs. For example, the previous latent defect is never exploited because big data are never encountered. However, vulnerabilities do increase the risk of software failures; therefore, developers should be aware of specific risks to software. In this example, the risk posed is failure caused by the accidental processing of big data within the SQL procedure. When vulnerabilities are exploited and runtime errors or other failures occur, software reliability is diminished. The risk register, introduced in the next section, enables SAS practitioners to record known vulnerabilities, expected risks, and proposed solutions to best measure and manage risk level for software products.

Risk Register

A risk register is a “record of information about identified risks.”11 Risk is an inherent reality of all software applications, so risk registers (sometimes referred to as defect databases) document risks, threats, vulnerabilities, and related information throughout the SDLC. Developers and other stakeholders should decide which performance requirements to incorporate in software, but likely will not include all performance requirements in all software. While vulnerabilities will exist in software, it's important they be identified, investigated, and documented sufficiently to demonstrate the specific risks they pose to software operation.

A risk register qualitatively and quantitatively records known vulnerabilities and associated threats and risks to software function or performance, and can include the following elements:

Description of vulnerability

Location of vulnerability

Threat(s) that can exploit the vulnerability

Risk if vulnerability is exploited

Severity of risk

Probability of risk

Likelihood of discovery

Cost to eliminate or mitigate risk

Recommended resolution
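These elements can be captured in any tracking tool, but a register can even be maintained as a SAS data set. The following sketch is illustrative only; all names and values are hypothetical:

```sas
/* Sketch: a vulnerability-level risk register maintained as a SAS
   data set, one row per identified vulnerability */
data work.risk_register;
   infile datalines dlm='|' dsd;
   length num 8 vulnerability $60 location $40 threat $30 risk $60
          severity probability discovery cost 8 resolution $20;
   input num vulnerability location threat risk
         severity probability discovery cost resolution;
datalines;
1|SYSEVALF missing from evaluation|less-than operator|big data|scientific notation misinterpreted|5|1|5|1|mitigate
2|no format statement|SELECT statement of PROC SQL|big data|scientific notation misinterpreted|5|1|5|1|mitigate
;
run;
```

A subsequent PROC PRINT or PROC SORT step can then filter or rank the register, for example, to surface risks whose severity is high but whose cost to remedy is low.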

Some risk registers, as demonstrated here, are organized at the vulnerability level, while others are organized at the threat or risk level. Vulnerability-level risk registers are common in software development because, while many threats lie outside the control of developers, programmatic solutions can often be implemented to eliminate or mitigate specific vulnerabilities. Moreover, a general threat like “big data” can exploit numerous, unrelated vulnerabilities within a single software product.

Table 1.1 depicts a simplified risk register for two of the errors mentioned in the code. The risk severity, risk probability, likelihood of risk discovery, and cost to implement a solution are rated on a scale of 1 to 5, in which 5 indicates a more severe risk, a more likely occurrence, a harder-to-discover failure, and a more costly repair.

Table 1.1 Sample Risk Register

Num | Vulnerability                          | Location                     | Risk                                               | Risk Severity | Risk Probability | Risk Discovery | Risk Cost
1   | %SYSEVALF should be used in evaluation | less-than operator           | scientific notation won't be interpreted correctly | 5             | 1                | 5              | 1
2   | no format statement                    | SELECT statement of PROC SQL | scientific notation won't be interpreted correctly | 5             | 1                | 5              | 1

The first and second risks describe separate vulnerabilities, each exploited by the threat of data sets containing 100 million or more observations. Despite the high severity (5) if the threat is encountered, the probability is low (1) because files of this size have never been encountered in this environment. If these two factors alone were considered, a development team might choose to accept the risk and release the software with the vulnerabilities, given how unlikely they are to occur. However, because the likelihood of discovery is low (5), as no warning or runtime error would be produced if the threat were encountered, and because the cost to implement a remedy (modifying one line of code) is low (1), the development team might instead decide to modify the code, thus eliminating the risk rather than accepting it.

Not depicted in Table 1.1, the recommended resolution describes the path chosen to manage the risk, often distilled as avoidance, transfer, acceptance, or mitigation, each described in the following section. The recommended resolution may contain a technical description of how the risk is being managed. For example, if a risk is being eliminated, the resolution might describe programmatically how the associated threat is being eliminated or controlled, or how the associated vulnerability is being eliminated so that it can't be exploited by the threat.

Risk Management

Risk management