Handbook of Software Fault Localization (E-Book)

Description

A comprehensive analysis of fault localization techniques and strategies

In Handbook of Software Fault Localization: Foundations and Advances, distinguished computer scientists Prof. W. Eric Wong and Prof. T.H. Tse deliver a robust treatment of up-to-date techniques, tools, and essential issues in software fault localization. The authors offer collective discussions of fault localization strategies with an emphasis on the most important features of each approach. The book also explores critical aspects of software fault localization, like multiple bugs, successful and failed test cases, coincidental correctness, faults introduced by missing code, the combination of several fault localization techniques, ties within fault localization rankings, concurrency bugs, spreadsheet fault localization, and theoretical studies on fault localization. Readers will benefit from the authors' straightforward discussions of how to apply cost-effective techniques to a variety of specific environments common in the real world. They will also enjoy the in-depth explorations of recent research directions on this topic.

Handbook of Software Fault Localization also includes:

* A thorough introduction to the concepts of software testing and debugging, their importance, typical challenges, and the consequences of poor efforts
* Comprehensive explorations of traditional fault localization techniques, including program logging, assertions, and breakpoints
* Practical discussions of slicing-based, program spectrum-based, and statistics-based techniques
* In-depth examinations of machine learning-, data mining-, and model-based techniques for software fault localization

Perfect for researchers, professors, and students studying and working in the field, Handbook of Software Fault Localization: Foundations and Advances is also an indispensable resource for software engineers, managers, and software project decision makers responsible for schedule and budget control.


Page count: 1147

Year of publication: 2023



IEEE Press
445 Hoes Lane
Piscataway, NJ 08854

IEEE Press Editorial Board
Sarah Spurgeon, Editor in Chief

Jón Atli Benediktsson  

Andreas Molisch  

Diomidis Spinellis  

Anjan Bose  

Saeid Nahavandi  

Ahmet Murat Tekalp  

Adam Drobot  

Jeffrey Reed  

Peter (Yong) Lian  

Thomas Robertazzi  

About IEEE Computer Society

IEEE Computer Society is the world’s leading computing membership organization and the trusted information and career‐development source for a global workforce of technology leaders including: professors, researchers, software engineers, IT professionals, employers, and students. The unmatched source for technology information, inspiration, and collaboration, the IEEE Computer Society is the source that computing professionals trust to provide high‐quality, state‐of‐the‐art information on an on‐demand basis. The Computer Society provides a wide range of forums for top minds to come together, including technical conferences, publications, and a comprehensive digital library, unique training webinars, professional training, and the Tech Leader Training Partner Program to help organizations increase their staff’s technical knowledge and expertise, as well as the personalized information tool my Computer. To find out more about the community for technology leaders, visit http://www.computer.org.

IEEE/Wiley Partnership

The IEEE Computer Society and Wiley partnership allows the CS Press authored book program to produce a number of exciting new titles in areas of computer science, computing, and networking with a special focus on software engineering. IEEE Computer Society members receive a 35% discount on Wiley titles by using their member discount code. Please contact IEEE Press for details.

To submit questions about the program or send proposals, please contact Mary Hatcher, Editor, Wiley‐IEEE Press: Email: [email protected], John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030‐5774.

Handbook of Software Fault Localization

Foundations and Advances

Edited by

W. Eric Wong

Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA

T.H. Tse

Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong

This edition first published 2023
Copyright © 2023 by the IEEE Computer Society. All rights reserved.

Published by John Wiley & Sons, Inc., Hoboken, New Jersey.
Published simultaneously in Canada.

No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per‐copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750‐8400, fax (978) 750‐4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748‐6011, fax (201) 748‐6008, or online at http://www.wiley.com/go/permission.

Trademarks: Wiley and the Wiley logo are trademarks or registered trademarks of John Wiley & Sons, Inc. and/or its affiliates in the United States and other countries and may not be used without written permission. All other trademarks are the property of their respective owners. John Wiley & Sons, Inc. is not associated with any product or vendor mentioned in this book.

Limit of Liability/Disclaimer of Warranty: While the publisher and authors have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor authors shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages. Further, readers should be aware that websites listed in this work may have changed or disappeared between when this work was written and when it is read.

For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762‐2974, outside the United States at (317) 572‐3993 or fax (317) 572‐4002.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com

Library of Congress Cataloging‐in‐Publication Data

Names: Wong, W. Eric, editor. | Tse, T.H., editor. | John Wiley & Sons, publisher.
Title: Handbook of software fault localization : foundations and advances / edited by W. Eric Wong, T.H. Tse.
Description: Hoboken, New Jersey : Wiley, [2023] | Includes bibliographical references and index.
Identifiers: LCCN 2022037789 (print) | LCCN 2022037790 (ebook) | ISBN 9781119291800 (paperback) | ISBN 9781119291817 (adobe pdf) | ISBN 9781119291824 (epub)
Subjects: LCSH: Software failures. | Software failures—Data processing. | Debugging in computer science. | Computer software—Quality control.
Classification: LCC QA76.76.F34 H36 2023 (print) | LCC QA76.76.F34 (ebook) | DDC 005.1—dc23/eng/20220920
LC record available at https://lccn.loc.gov/2022037789
LC ebook record available at https://lccn.loc.gov/2022037790

Cover designed by T.H. Tse

Editor Biographies

W. Eric Wong, PhD, is a Full Professor, Director of Software Engineering Program, and the Founding Director of Advanced Research Center for Software Testing and Quality Assurance in Computer Science at the University of Texas at Dallas. He also has an appointment as a Guest Researcher with the National Institute of Standards and Technology, an agency of the US Department of Commerce.

Professor Wong’s research focuses on helping practitioners improve software quality while reducing production cost. In particular, he is working on software testing, program debugging, risk analysis, safety, and reliability. He was the recipient of the ICST 2020 (The 13th IEEE International Conference on Software Testing, Verification, and Validation) Most Influential Paper award for his paper titled “Using Mutation to Automatically Suggest Fixes for Faulty Programs” published at ICST 2010. He also received a JSS 2020 (Journal of Systems and Software) Most Influential Paper award for his paper titled “A Family of Code Coverage‐based Heuristics for Effective Fault Localization” published in Volume 83, Issue 2, 2010. The conference version of this paper received the Best Paper Award at the 31st IEEE International Computer Software and Applications Conference.

Professor Wong received the 2014 IEEE Reliability Society Engineer of the Year award. In addition, he was the Editor-in-Chief of the IEEE Transactions on Reliability for six years ending on 31 May 2022. He has also been an Area Editor of Elsevier’s Journal of Systems and Software since 2017. Dr. Wong received his MS and PhD in Computer Science from Purdue University, West Lafayette, IN, USA. More details can be found at Professor Wong’s homepage https://personal.utdallas.edu/~ewong

T.H. Tse received his PhD in Information Systems from the London School of Economics and was a Visiting Fellow at the University of Oxford. He is an Honorary Professor in Computer Science with The University of Hong Kong after retiring from the full professorship in 2014. His research interest is in program testing and debugging. He has more than 270 publications, including a book in the Cambridge Tracts in Theoretical Computer Science series, Cambridge University Press. He is ranked internationally as no. 2 among experts in metamorphic testing. The 2010 paper titled “Adaptive Random Testing: the ART of Test Case Diversity” by Professor Tse and team has been selected as the Grand Champion of the Most Influential Paper Award by the Journal of Systems and Software.

Professor Tse is a Steering Committee Chair of the IEEE International Conference on Software Quality, Reliability, and Security; and an Associate Editor of IEEE Transactions on Reliability. He served on the Search Committee for the Editor-in-Chief of IEEE Transactions on Software Engineering in 2013. He is a Life Fellow of the British Computer Society and a Life Senior Member of the IEEE. He was awarded an MBE by Queen Elizabeth II of the United Kingdom.

List of Contributors

Rui Abreu
Department of Informatics Engineering, Faculty of Engineering, University of Porto, Porto, Portugal

Hira Agrawal
Peraton Labs, Basking Ridge, NJ, USA

Peggy Cellier
INSA, CNRS, IRISA, Universite de Rennes, Rennes, France

Vidroha Debroy
Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA
and
Dottid Inc., Dallas, TX, USA

Mireille Ducassé
INSA, CNRS, IRISA, Universite de Rennes, Rennes, France

Sébastien Ferré
CNRS, IRISA, Universite de Rennes, Rennes, France

Ruizhi Gao
Sonos Inc., Boston, MA, USA

Alex Groce
School of Informatics, Computing, and Cyber Systems, Northern Arizona University, Flagstaff, AZ, USA

Robert Hirschfeld
Department of Computer Science, Universität Potsdam, Brandenburg, Germany

Birgit Hofer
Institute of Software Technology, Graz University of Technology, Graz, Austria

Linghuan Hu
Google Inc., Mountain View, CA, USA

Hua Jie Lee
School of Computing, Macquarie University, Sydney, NSW, Australia

Dongcheng Li
Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA

Yihao Li
School of Information and Electrical Engineering, Ludong University, Yantai, China

David Lo
School of Computing and Information Systems, Singapore Management University, Singapore

Wolfgang Mayer
Advanced Computing Research Centre, University of South Australia, Adelaide, SA, Australia

Lee Naish
School of Computing and Information Systems, The University of Melbourne, Melbourne, Australia

Michael Perscheid
Department of Computer Science, Universität Potsdam, Brandenburg, Germany

Olivier Ridoux
CNRS, IRISA, Universite de Rennes, Rennes, France

Markus Stumptner
Advanced Computing Research Centre, University of South Australia, Adelaide, SA, Australia

T.H. Tse
Department of Computer Science, The University of Hong Kong, Pokfulam, Hong Kong

W. Eric Wong
Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA

Franz Wotawa
Institute of Software Technology, Graz University of Technology, Graz, Austria

Xin Xia
Software Engineering Application Technology Lab, Huawei, China

Xiaoyuan Xie
School of Computer Science, Wuhan University, Wuhan, China

Xiangyu Zhang
Department of Computer Science, Purdue University, West Lafayette, IN, USA

Zhenyu Zhang
Institute of Software, Chinese Academy of Sciences, Beijing, China

1 Software Fault Localization: an Overview of Research, Techniques, and Tools

W. Eric Wong¹, Ruizhi Gao², Yihao Li³, Rui Abreu⁴, Franz Wotawa⁵, and Dongcheng Li¹

¹ Department of Computer Science, University of Texas at Dallas, Richardson, TX, USA

² Sonos Inc., Boston, MA, USA

³ School of Information and Electrical Engineering, Ludong University, Yantai, China

⁴ Department of Informatics Engineering, Faculty of Engineering, University of Porto, Porto, Portugal

⁵ Institute of Software Technology, Graz University of Technology, Graz, Austria

1.1 Introduction

Software fault localization, the act of identifying the locations of faults in a program, is widely recognized to be one of the most tedious, time‐consuming, and expensive – yet equally critical – activities in program debugging. Due to the increasing scale and complexity of software today, manually locating faults when failures occur is rapidly becoming infeasible, and consequently, there is a strong demand for techniques that can guide software developers to the locations of faults in a program with minimal human intervention. This demand in turn has fueled the proposal and development of a broad spectrum of fault localization techniques, each of which aims to streamline the fault localization process and make it more effective by attacking the problem in a unique way. In this book, we categorize and provide a comprehensive overview of such techniques and discuss key issues and concerns that are pertinent to software fault localization.

Software is fundamental to our lives today, and with its ever‐increasing usage and adoption, its influence is practically ubiquitous. At present, software is not just employed in, but is critical to, many security and safety‐critical systems in industries such as medicine, aeronautics, and nuclear energy. Not surprisingly, this trend has been accompanied by a drastic increase in the scale and complexity of software. Unfortunately, this has also resulted in more software bugs, which often lead to execution failures with huge losses [1–3]. For example, on 15 January 1990, a growing number of red warning signals appeared across the 75 screens at the AT&T operation center in Bedminster, NJ, USA, that indicated the status of parts of the AT&T worldwide network. During this incident, only about 50 percent of the calls made through AT&T were connected. It took the AT&T technicians nine hours to identify and fix the issue, which was caused by a misplaced break statement in the code. AT&T lost $60 to $75 million in this incident [4].

Furthermore, the ramifications of software faults in safety‐critical systems are not limited to financial loss; alarmingly, they extend to loss of life [5]. On 20 December 1995, a Boeing 757 departed from Miami, FL, USA, heading to Cali, Colombia. However, it crashed into a 9800‐foot mountain. A total of 159 deaths resulted, leaving only five passengers alive. This event marked the highest death toll of any accident in Colombia at the time. The accident was caused by inconsistencies between the naming conventions of the navigational charts and those of the flight management system. When the crew looked up the waypoint “Rozo”, the chart indicated the letter “R” as its identifier. The flight management system, however, had the city paired with the word “Rozo”. As a result, when the pilot entered the letter “R”, the system did not know whether the desired city was Rozo or Romeo. It automatically picked Romeo, which is a larger city than Rozo, as the next waypoint.

A 2006 report from the National Institute of Standards and Technology (NIST) [6] indicated that software errors are estimated to cost the US economy $59.5 billion annually (0.6 percent of the GDP); the cost has undoubtedly grown since then. Over half the cost of fixing or responding to these bugs is passed on to software users, while software developers and vendors absorb the rest.

Even when faults in software are discovered due to erroneous behavior or some other manifestation of the fault(s),¹ finding and fixing them is an entirely different matter. Fault localization, which focuses on the former, i.e. identifying the locations of faults, has historically been a manual task that has been recognized to be time‐consuming and tedious as well as prohibitively expensive [7], given the size and complexity of large‐scale software systems today. Furthermore, manual fault localization relies heavily on the software developer’s experience, judgment, and intuition to identify and prioritize code that is likely to be faulty. These limitations have led to a surge of interest in developing techniques that can partially or fully automate the localization of faults in software while reducing human input. Though some techniques are similar and some very different (in terms of the type of data consumed, the program components focused on, comparative effectiveness and efficiency, etc.), they each try to attack the problem of fault localization from a unique perspective, and typically offer both advantages and disadvantages relative to one another. With many techniques already in existence and others continually being proposed, as well as with advances being made both from a theoretical and practical perspective, it is important to catalog and overview current state‐of‐the‐art techniques in fault localization in order to offer a comprehensive resource for those already in the area and those interested in making contributions to it.

To survey most of the publications related to software fault localization since the late 1970s, we created for this chapter a publication repository that includes 587 papers published from 1977 to 2020. We also searched for Masters’ and PhD theses closely related to software fault localization, which are listed in Table 1.1.

Table 1.1 A list of recent PhD and Masters’ theses on software fault localization.

| Author | Title | Degree | University | Year |
|---|---|---|---|---|
| Ehud Y. Shapiro [8] | Algorithmic Program Debugging | PhD | Yale University | 1983 |
| Hiralal Agrawal [9] | Towards Automatic Debugging of Computer Programs | PhD | Purdue University | 1991 |
| Hsin Pan [10] | Software debugging with dynamic instrumentation and test‐based knowledge | PhD | Purdue University | 1993 |
| W. Bond Gregory [11] | Logic Programs for Consistency‐based Diagnosis | PhD | Carleton University | 1994 |
| Benjamin Robert Liblit [12] | Cooperative Bug Isolation | PhD | The University of California, Berkeley | 2004 |
| Bernhard Peischl [13] | Automated Source‐Level Debugging of Synthesizeable VHDL Designs | PhD | Graz University of Technology | 2004 |
| Haifeng He [14] | Automated Debugging using Path‐based Weakest Preconditions | Master | University of Arizona | 2004 |
| Alex David Groce [15] | Error Explanation and Fault Localization with Distance Metrics | PhD | Carnegie Mellon University | 2005 |
| Emmanuel Renieris [16] | A Research Framework for Software‐Fault Localization Tools | PhD | Brown University | 2005 |
| Daniel Köb [17] | Extended Modeling for Automatic Fault Localization in Object‐Oriented Software | PhD | Graz University of Technology | 2005 |
| David Hovemeyer [18] | Simple and Effective Static Analysis to Find Bugs | PhD | University of Maryland | 2005 |
| Peifeng Hu [19] | Automated Fault Localization: a Statistical Predicate Analysis Approach | PhD | The University of Hong Kong | 2006 |
| Xiangyu Zhang [20] | Fault Localization via Precise Dynamic Slicing | PhD | The University of Arizona | 2006 |
| Rafi Vayani [21] | Improving Automatic Software Fault Localization | Master | Delft University of Technology | 2007 |
| Ramana Rao Kompella [22] | Fault Localization in Backbone Networks | PhD | University of California, San Diego | 2007 |
| Andreas Griesmayer [23] | Debugging Software: from Verification to Repair | PhD | Graz University of Technology | 2007 |
| Tao Wang [24] | Post‐Mortem Dynamic Analysis For Software Debugging | PhD | Fudan University | 2007 |
| Sriraman Tallam [25] | Fault Location and Avoidance in Long‐Running Multithreaded Applications | PhD | The University of Arizona | 2007 |
| Ophelia C. Chesley [26] | CRISP‐A fault localization Tool for Java Programs | Master | Rutgers, The State University of New Jersey | 2007 |
| Shan Lu [27] | Understanding, Detecting and Exposing Concurrency Bugs | PhD | University of Illinois at Urbana‐Champaign | 2008 |
| Naveed Riaz [28] | Automated Source‐Level Debugging of Synthesizable Verilog Designs | PhD | Graz University of Technology | 2008 |
| James Arthur Jones [29] | Semi‐Automatic Fault Localization | PhD | Georgia Institute of Technology | 2008 |
| Zhenyu Zhang [30] | Software Debugging through Dynamic Analysis of Program Structures | PhD | The University of Hong Kong | 2009 |
| Rui Abreu [31] | Spectrum‐based Fault Localization in Embedded Software | PhD | Delft University of Technology | 2009 |
| Dennis Jefferey [32] | Dynamic State Alteration Techniques for Automatically Locating Software Errors | PhD | University of California Riverside | 2009 |
| Xinming Wang [33] | Automatic Localization of Code Omission Faults | PhD | The Hong Kong University of Science and Technology | 2010 |
| Fabrizio Pastore [34] | Automatic Diagnosis of Software Functional Faults by Means of Inferred Behavioral Models | PhD | University of Milan Bicocca | 2010 |
| Mihai Nica [35] | On the Use of Constraints in Automated Program Debugging – From Foundations to Empirical Results | PhD | Graz University of Technology | 2010 |
| Zachary P. Fry [36] | Fault Localization Using Textual Similarities | Master | The University of Virginia | 2011 |
| Hua Jie Lee [37] | Software Debugging Using Program Spectra | PhD | The University of Melbourne | 2011 |
| Vidroha Debroy [38] | Towards the Automation of Program Debugging | PhD | The University of Texas at Dallas | 2011 |
| Alberto Gonzalez Sanchez [39] | Cost Optimizations in Runtime Testing and Diagnosis | PhD | Delft University of Technology | 2011 |
| Jared David DeMott [40] | Enhancing Automated Fault Discovery and Analysis | PhD | Michigan State University | 2012 |
| Xin Zhang [41] | Secure and Efficient Network Fault Localization | PhD | Carnegie Mellon University | 2012 |
| Xiaoyuan Xie [42] | On the Analysis of Spectrum‐based Fault Localization | PhD | Swinburne University of Technology | 2012 |
| Alexandre Perez [43] | Dynamic Code Coverage with Progressive Detail Levels | Master | University of Porto | 2012 |
| Raul Santelices [44] | Change‐effects Analysis for Effective Testing and Validation of Evolving Software | PhD | Georgia Institute of Technology | 2012 |
| George. K. Baah [45] | Statistical Causal Analysis for Fault Localization | PhD | Georgia Institute of Technology | 2012 |
| Swarup K. Sahoo [46] | A Novel Invariants‐based Approach for Automated Software Fault Localization | PhD | University of Illinois at Urbana‐Champaign | 2012 |
| Birgit Hofer [47] | From Fault Localization of Programs written in 3rd level Language to Spreadsheets | PhD | Graz University of Technology | 2013 |
| Aritra Bandyopadhyay [48] | Mitigating the Effect of Coincidental Correctness in Spectrum‐based Fault Localization | PhD | Colorado State University | 2013 |
| Shounak Roychowdhury [49] | A Mixed Approach to Spectrum‐based Fault Localization Using Information Theoretic Foundations | PhD | The University of Texas at Austin | 2013 |
| Shaimaa Ali [50] | Localizing State‐Dependent Faults Using Associated Sequence Mining | PhD | The University of Western Ontario | 2013 |
| Christian Kuhnert [51] | Data‐driven Methods for Fault Localization in Process Technology | PhD | Karlsruhe Institute of Technology | 2013 |
| Dawei Qi [52] | Semantic Analyses to Detect and Localize Software Regression Errors | PhD | Tsinghua University | 2013 |
| William N. Sumner [53] | Automated Failure Explanation Through Execution Comparison | PhD | Purdue University | 2013 |
| Mark A. Hays [54] | A Fault‐based Model of Fault Localization Techniques | PhD | University of Kentucky | 2014 |
| Sang Min Park [55] | Effective Fault Localization Techniques for Concurrent Software | PhD | Georgia Institute of Technology | 2014 |
| Gang Shu [56] | Statistical Estimation of Software Reliability and Failure‐causing Effect | PhD | Case Western Reserve University | 2014 |
| Lucia [57] | Ranking‐based Approaches for Localizing Faults | PhD | Singapore Management University | 2014 |
| Seok‐Hyeon Moon [58] | Effective Software Fault Localization using Dynamic Program Behaviors | Master | Korea Advanced Institute of Science and Technology | 2014 |
| Yepang Liu [59] | Automated Analysis of Energy Efficiency and Performance for Mobile Applications | PhD | The Hong Kong University of Science and Technology | 2014 |
| Cuiting Chen [60] | Automated Fault Localization for Service‐Oriented Software Systems | PhD | Delft University of Technology | 2015 |
| Matthias Rohr [61] | Workload‐sensitive Timing Behavior Analysis for Fault Localization in Software Systems | PhD | Kiel University | 2015 |
| Ozkan Bayraktar [62] | Ela: an Automated Statistical Fault Localization Technique | PhD | The Middle East Technical University | 2015 |
| Azim Tonzirul [63] | Fault Discovery, Localization, and Recovery in Smartphone Apps | PhD | University of California Riverside | 2016 |
| Laleh Gholamosseinghandehari [64] | Fault Localization based on Combinatorial Testing | PhD | The University of Texas at Arlington | 2016 |
| Ruizhi Gao [65] | Advanced Software Fault Localization for Programs with Multiple Bugs | PhD | The University of Texas at Dallas | 2017 |
| Shih‐Feng Sun [66] | Statistical Fault Localization and Causal Interactions | PhD | Case Western Reserve University | 2017 |
| Rongxin Wu [67] | Automated Techniques for Diagnosing Crashing Bugs | PhD | The Hong Kong University of Science and Technology | 2017 |
| Arjun Roy [68] | Simplifying datacenter fault detection and localization | PhD | University of California San Diego | 2018 |
| Yun Guo [69] | Towards Automatically Localizing and Repairing SQL Faults | PhD | George Mason University | 2018 |
| Nasir Safdari [70] | Learning to Rank Relevant Files for Bug Reports Using Domain knowledge, Replication and Extension of a Learning‐to‐Rank Approach | Master | Rochester Institute of Technology | 2018 |
| Dai Ting [71] | A Hybrid Approach to Cloud System Performance Bug Detection, Diagnosis and Fix | PhD | North Carolina State University | 2019 |
| George Thompson [72] | Towards Automated Fault Localization for Prolog | Master | North Carolina A&T State University | 2020 |
| Xia Li [73] | An Integrated Approach for Automated Software Debugging via Machine Learning and Big Code Mining | PhD | The University of Texas at Dallas | 2020 |
| Muhammad Ali Gulzar [74] | Automated Testing and Debugging for Big Data Analytics | PhD | University of California, Los Angeles | 2020 |
| Mihir Mathur [75] | Leveraging Distributed Tracing and Container Cloning for Replay Debugging of Microservices | Master | University of California, Los Angeles | 2020 |

All papers in our repository² are sorted by year, and the result is displayed in Figure 1.1. As shown in the figure, the number of publications grew rapidly after 2001, indicating that more and more researchers began to devote themselves to the area of software fault localization over the last two decades.

Figure 1.1 Papers on software fault localization from 1977 to 2020.

Figure 1.2 Publications on software fault localization in top venues from 2001 to 2020.

Also based on our repository, Figure 1.2 gives the number of publications related to software fault localization that have appeared in leading journals and conferences focused on software engineering (IEEE Transactions on Software Engineering, ACM Transactions on Software Engineering and Methodology, International Conference on Software Engineering, ACM International Symposium on Foundations of Software Engineering, and ACM International Conference on Automated Software Engineering) from 2001 to 2019. This trend again supports the claim that software fault localization is not just an important but also a popular research topic, one that has been discussed heavily in top‐quality software engineering journals and conferences over the last two decades.

There is thus a rich collection of literature on various techniques that aim to facilitate fault localization and make it more effective. Despite the fact that these techniques share similar goals, they can be quite different from one another and often stem from ideas that originate from several different disciplines. While we aim to comprehensively cover as many fault localization techniques as possible, no article, regardless of breadth or depth, can cover all of them. In this book, our primary focus is on the techniques for locating Bohrbugs [76]. Those for diagnosing Mandelbugs [76] such as performance bugs, memory leaks, software bloats, and security vulnerabilities are not included in the scope. Also, due to space limitations, we group techniques into appropriate categories for collective discussion with an emphasis on the most important features and leave other details of these techniques to their respectively published papers. This is especially the case for techniques targeting a specific application domain, such as fault localization for concurrency bugs and spreadsheets. For these, we provide a review that helps readers with general understanding.

The following terms appear repeatedly throughout this chapter, and thus for convenience, we provide definitions for them here per the taxonomy provided in [77]:

A failure is when a service deviates from its correct behavior.

An error is a condition in a system that may lead to a failure.

A fault is the underlying cause of an error, also known as a bug.
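The distinction between the three terms can be illustrated with a minimal sketch. The `average` function and its inputs below are hypothetical and are not taken from the book; they only show that a fault can be executed without an error arising, and that a failure is what an observer finally sees.

```python
def average(values):
    """Return the arithmetic mean of a non-empty list (hypothetical example)."""
    total = 0
    for v in values[1:]:   # FAULT: the slice skips the first element
        total += v         # ERROR (when values[0] != 0): total is now wrong
    return total / len(values)


def is_failure(values, expected):
    """A failure is an observable deviation from the correct result."""
    return average(values) != expected


# The faulty line runs in both calls, but only the second call fails.
print(is_failure([0, 4, 8], 4.0))  # skipped element is 0, so the state stays correct
print(is_failure([3, 4, 8], 5.0))  # wrong total propagates to the output
```

In the first call the fault is executed but the program state remains correct, so no failure is observed; in the second the erroneous total reaches the output.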

In this book, we group fault localization techniques into appropriate categories (including traditional, slicing‐based, spectrum‐based, statistics‐based, machine learning‐based, data mining‐based, information‐retrieval‐based, model‐based, spreadsheet‐based, and emerging techniques) for collective discussion with an emphasis on the most important features. We introduce the popular subject programs that have been used in different case studies and discuss how these programs have evolved through the years. We also describe the evaluation metrics used to assess the effectiveness of fault localization techniques and discuss fault localization tools and theoretical studies. Moreover, we explore some critical aspects of software fault localization, including (i) fault localization for programs with multiple bugs, (ii) inputs, outputs, and impact of test cases, (iii) coincidental correctness, (iv) faults introduced by missing code, (v) combination of multiple fault localization techniques, (vi) ties within fault localization rankings, and (vii) fault localization for concurrency bugs. Each chapter is introduced briefly below.

This book begins by introducing traditional software fault localization techniques in Chapter 2, including program logging, assertions, and breakpoints. Examples will also be provided to clearly explain these techniques.

Program slicing is a technique to abstract a program into a reduced form by deleting irrelevant parts such that the resulting slice will still behave in the same way as the original program with respect to certain specifications. Chapter 3 introduces slicing‐based fault localization techniques, which can be classified into three major categories: static slicing, dynamic slicing, and execution slicing‐based techniques. Examples will be given to illustrate the differences among these categories. Techniques based on other slicing such as dual slicing, thin slicing, and relevant slicing are also included.

Program spectrum‐based techniques are presented in Chapter 4. A program spectrum details the execution information of a program from certain perspectives, such as execution information for conditional branches or loop‐free intra‐procedural paths. It can be used to track program behavior. A list of different kinds of program spectra will be provided. Also discussed are issues and concerns related to program spectrum‐based techniques.
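As a rough illustration of what a program spectrum records, the following sketch collects a simple statement‐hit spectrum for one function. The helper `statement_spectrum` and the toy function `sign` are hypothetical names introduced here, and using Python's `sys.settrace` is only one possible way to gather such data; it is not the instrumentation used by any particular technique discussed in this book.

```python
import sys


def statement_spectrum(func, *args):
    """Return the line offsets (relative to the `def` line) executed by func(*args)."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace any other function
        if event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # always restore the previous trace hook
    return executed


def sign(x):
    if x < 0:        # offset 1
        return -1    # offset 2
    return 1         # offset 3
```

Running `statement_spectrum(sign, -5)` and `statement_spectrum(sign, 5)` yields two different sets of offsets, one per test input. Comparing such per-test records between successful and failed executions is the raw material that spectrum‐based techniques work with.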

Software fault localization techniques based on well‐defined statistical analyses (e.g. parametric and nonparametric hypothesis testing, causal‐inference analysis, and cross tabulation analysis) are described in Chapter 5.

Machine learning is the study of computer algorithms that improve through experience. These techniques are adaptive and robust and can produce models based on data, with limited human interaction. Such properties have led to their employment in many disciplines, including bioinformatics, natural language processing, cryptography, and computer vision. In the context of software fault localization, the problem at hand can be identified as trying to learn or deduce the location of a fault based on input data such as statement coverage and the execution result (success or failure) of each test case. Chapter 6 covers fault localization techniques based on machine learning.

Along the lines of machine learning, data mining also seeks to produce a model using pertinent information extracted from data. Data mining can uncover hidden patterns in samples of data that may not be discovered by manual analysis alone, especially due to the sheer volume of information. Efficient data mining techniques transcend such problems and do so in reasonable amounts of time with high degrees of accuracy. The software fault localization problem can be abstracted to a data mining problem – for example, we wish to identify the pattern of statement execution that leads to a failure. Data mining‐based techniques are reviewed and analyzed in Chapter 7.

Chapter 8 introduces information retrieval (IR)‐based fault localization techniques. IR‐based fault localization is the problem of identifying buggy source code files given a textual description of a bug. This problem is important since many bugs are reported through bug tracking systems like Bugzilla and Jira, and the number of bug reports is often too large for developers to handle manually. This necessitates an automated tool that can help developers identify relevant files given a bug report. Due to the textual nature of bug reports, IR techniques are often employed to solve this problem. Many IR‐based fault localization techniques have been proposed in the literature.

Program models can be used for software fault localization. The first part of Chapter 9 discusses techniques based on different program models such as dependency‐based models, abstraction‐based models, and value‐based models. The second part emphasizes model checking‐based techniques.

Spreadsheets are one of the most popular types of end‐user software and have been used in many sectors, especially in business. Chapter 10 discusses how techniques using value‐based or dependency‐based models can effectively locate bugs in cells with erroneous formulae and avoid incorrect computation.

Instead of being evaluated empirically, the effectiveness of software fault localization techniques can also be analyzed from theoretical perspectives. Chapter 11 discusses theoretical studies on software fault localization.

Many of the software fault localization techniques assume that there is only one bug in the program under study. This assumption may not be realistic in practice. Mixed failed test cases associated with different causative bugs may reduce the fault localization effectiveness. In Chapter 12, we present fault localization techniques for programs with multiple bugs.

Finally, Chapter 13 presents emerging aspects of software fault localization, including how to apply the scientific method to fault localization, how to locate faults when the oracle is not available, how to automatically predict fault localization effectiveness, and how to integrate fault localization into automatic test generation tools.

The remainder of this chapter is organized as follows: we begin by describing traditional and intuitive fault localization techniques in Section 1.2, moving on to more advanced and complex techniques in Section 1.3. In Section 1.4, we list some of the popular subject programs that have been used in different case studies and discuss how these programs have evolved through the years. Different evaluation metrics to assess the effectiveness of fault localization techniques are described in Section 1.5, followed by a discussion of fault localization tools in Section 1.6. Finally, critical aspects and conclusions are presented in Section 1.7 and Section 1.8, respectively.

1.2 Traditional Fault Localization Techniques

This section describes traditional and intuitive fault localization techniques, including program logging, assertions, breakpoints, and profiling.

1.2.1 Program Logging

Statements (such as print) used to produce program logging are commonly inserted into the code in an ad‐hoc fashion to monitor variable values and other program state information [78]. When abnormal program behavior is detected, developers examine the program log in terms of saved log files or printed run‐time information to diagnose the underlying cause of failure.
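A minimal sketch of this practice, assuming Python's standard `logging` module in place of bare print statements, is shown below. The function `average` and the logger name `average_demo` are hypothetical and serve only to illustrate where logging statements are typically placed.

```python
import logging

# Send DEBUG and higher messages to the console; a file handler could be used instead.
logging.basicConfig(level=logging.DEBUG, format="%(levelname)s %(name)s: %(message)s")
logger = logging.getLogger("average_demo")  # hypothetical logger name


def average(values):
    """Return the arithmetic mean of values, logging intermediate program state."""
    logger.debug("average() called with values=%r", values)
    if not values:
        logger.warning("empty input; returning 0.0")
        return 0.0
    total = sum(values)
    logger.debug("total=%r count=%d", total, len(values))
    result = total / len(values)
    logger.debug("result=%r", result)
    return result
```

When a caller later observes an unexpected result, the logged values of `total` and `count` indicate whether the fault lies in this function or in the data passed to it.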

1.2.2 Assertions

Assertions are constraints added to a program that have to be true during the correct operation of a program. Developers specify these assertions in the program code as conditional statements that terminate execution if they evaluate to false. Thus, they can be used to detect erroneous program behavior at runtime. More details of using assertions for program debugging can be found in [79, 80].
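The following sketch illustrates the idea with Python's `assert` statement. The function `binary_search` and its sortedness precondition are our own illustrative choices and are not taken from [79, 80].

```python
def binary_search(items, target):
    """Return the index of target in the sorted list items, or -1 if it is absent."""
    # Precondition: the algorithm is only correct on sorted input.
    assert all(items[i] <= items[i + 1] for i in range(len(items) - 1)), \
        "items must be sorted in ascending order"

    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        # Invariant: mid always stays inside the current search window.
        assert lo <= mid <= hi
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1
```

Calling `binary_search([3, 1, 2], 1)` raises an `AssertionError` at the precondition, which points to the caller rather than to the search loop. Note that Python removes `assert` statements when run with the `-O` option, so such checks act as a debugging aid rather than as input validation.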

1.2.3 Breakpoints

Breakpoints are used to pause the program when execution reaches a specified point and allow the user to examine the current state. After a breakpoint is triggered, the user can modify the value of variables or continue the execution to observe the progression of a bug. Data breakpoints can be configured to trigger when the value changes for a specified expression, such as a combination of variable values. Conditional breakpoints pause execution only upon the satisfaction of a predicate specified by the user. Early studies (e.g. [81, 82]) use this approach to help developers locate bugs while a program is executed under the control of a symbolic debugger. The same approach is also adopted by more advanced debugging tools such as GNU GDB [83] and Microsoft Visual Studio Debugger [84].
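As a small illustration, a conditional breakpoint can be approximated in Python source code by guarding the built‐in `breakpoint()` call with a predicate. The function `running_total` and the predicate `total < 0` are hypothetical; dedicated debuggers such as those cited above let the user attach the condition interactively instead of editing the code.

```python
def running_total(values):
    """Sum values, pausing in the debugger whenever the running total turns negative."""
    total = 0
    for index, value in enumerate(values):
        total += value
        if total < 0:       # predicate of the conditional breakpoint
            breakpoint()    # by default, this enters pdb so the state can be examined
    return total
```

At the debugger prompt, the user can print `index`, `value`, and `total`, change a variable, and then continue execution. The built‐in `breakpoint()` delegates to `sys.breakpointhook`, so the pause can be disabled or redirected without changing the function.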

1.2.4 Profiling

Profiling is the runtime analysis of metrics such as execution speed and memory usage, which is typically aimed at program optimization. However, it can also be leveraged for debugging activities, such as the following:

Detecting unexpected execution frequencies of different functions (e.g. [85]).

Identifying memory leaks or code that performs unexpectedly poorly (e.g. [86]).

Examining the side effects of lazy evaluation (e.g. [87]).

Tools that use profiling for program debugging include GNU’s gprof [88] and the Eclipse plugin TPTP [89].
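The first use listed above, detecting unexpected execution frequencies, can be sketched with Python's standard `cProfile` and `pstats` modules (not the tools cited above). The functions `normalize`, `process`, and `call_counts` are hypothetical, and the redundant call in `process` is a seeded defect for illustration only.

```python
import cProfile
import pstats


def normalize(x):
    return x / 10.0


def process(values):
    # Intended behavior: normalize() is called once per value.
    # Seeded defect: it is called twice per value.
    return [normalize(v) + 0 * normalize(v) for v in values]


def call_counts(func, *args):
    """Profile func(*args) and return a mapping from function name to number of calls."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    stats = pstats.Stats(profiler)
    # Each entry maps (file, line, name) to (primitive calls, total calls, ...).
    return {name: entry[1] for (_file, _line, name), entry in stats.stats.items()}
```

For three input values, `call_counts(process, [1, 2, 3])` reports six calls to `normalize` rather than the expected three. The output is still correct, so only the execution frequency reveals the anomaly.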

1.3 Advanced Fault Localization Techniques

With the massive size and scale of software systems today, traditional fault localization techniques are not effective in isolating the root causes of failures. As a result, many advanced fault localization techniques have surfaced recently using the idea of causality [90, 91], which is related to philosophical theories with an objective to characterize the relationship between events/causes (program bugs in our case) and a phenomenon/effect (execution failures in our case). There are different causality models [91] such as counterfactual‐based, probabilistic‐ or statistical‐based, and causal calculus models. Among these, probabilistic causality models are the most widely used in fault localization to identify suspicious code that is responsible for execution failures.

In this chapter, we classify fault localization techniques into nine categories, including slicing‐based, spectrum‐based, statistics‐based, machine learning‐based, data mining‐based, IR‐based, model‐based, spreadsheet‐based techniques, and additional emerging techniques. Many studies that evaluate the effectiveness of specific fault localization techniques have been reported [92–124]. However, none of them offer a comprehensive discussion on all these techniques.

1.3.1 Slicing‐Based Techniques

Program slicing is a technique to abstract a program into a reduced form by deleting irrelevant parts such that the resulting slice will still behave the same as the original program with respect to certain specifications. Hundreds of papers on this topic have been published [125–127] since Weiser first proposed static slicing in 1979 [128].

One of the important applications of static slicing [129] is to reduce the search domain while programmers locate bugs in their programs. This is based on the idea that if a test case fails due to an incorrect variable value at a statement, then the defect should be found in the static slice associated with that variable‐statement pair, allowing us to confine our search to the slice rather than looking at the entire program. Lyle and Weiser extend the above approach by constructing a program dice (as the set difference of two groups of static slices) to further reduce the search domain for possible locations of a fault [130]. Although static slice‐based techniques have been experimentally evaluated and confirmed to be useful in fault localization [109], one problem is that handling pointer variables can make data‐flow analysis inefficient because large sets of data facts that are introduced by dereferences of pointer variables need to be stored. Equivalence analysis, which identifies equivalence relationships among the various memory locations accessed by a procedure, is used to improve the efficiency of data‐flow analyses in the presence of pointer variables [131]. Two equivalent memory locations share identical sets of data facts in a procedure. As a result, data‐flow analysis only needs to compute information for a representative memory location, and data‐flow information for other equivalent locations can be obtained from the representative location. Static slicing is also applied for fault localization in binary executables [132] and type‐checkers [133].
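The slice and dice ideas described above can be sketched on a ten‐statement toy program. Everything in the example is an assumption made for illustration: the toy program, the hand‐written dependence table `DEPS` (a real slicer would derive it by data‐flow and control‐flow analysis), and the simplified reading of a dice as the slice of an incorrect output minus the slice of a correct output.

```python
# Toy program (statement numbers on the left):
#  1: n = read()          6:     total = total + i
#  2: total = 0           7:     prod = prod * i
#  3: prod = 1            8:     i = i + 1
#  4: i = 1               9: print(total)
#  5: while i <= n:      10: print(prod)

# Hand-written data and control dependences: statement -> statements it depends on.
DEPS = {
    1: [], 2: [], 3: [], 4: [],
    5: [1, 4, 8],
    6: [2, 4, 5, 6, 8],
    7: [3, 4, 5, 7, 8],
    8: [4, 5, 8],
    9: [2, 6],
    10: [3, 7],
}


def backward_slice(deps, criterion):
    """Return all statements that may affect the statement given as slicing criterion."""
    in_slice = set()
    worklist = [criterion]
    while worklist:
        stmt = worklist.pop()
        if stmt not in in_slice:
            in_slice.add(stmt)
            worklist.extend(deps[stmt])
    return in_slice


def dice(deps, incorrect, correct):
    """Statements in the slice of the incorrect output but not in that of the correct one."""
    return backward_slice(deps, incorrect) - backward_slice(deps, correct)
```

If `prod` is printed incorrectly at statement 10 while `total` at statement 9 is correct, the slice for statement 10 contains seven statements, whereas `dice(DEPS, 10, 9)` narrows the search to statements 3, 7, and 10. The dice rests on the assumption that statements contributing to a correct output are themselves correct, which need not hold in general.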

A disadvantage of static slicing is that the slice for a given variable at a given statement contains all the executable statements that could possibly affect the value of this variable at the statement. As a result, it might generate a dice with certain statements that should not be included. This is because we cannot predict some run‐time values via a static analysis. To deal with the imprecision of static slicing, Zhang and Santelices [134] propose PRIOSLICE to refine the results reported by static slicing.