Distributed systems employed in critical infrastructures must fulfill dependability, timeliness, and performance specifications. Since these systems most often operate in an unpredictable environment, their design and maintenance require quantitative evaluation of deterministic and probabilistic timed models. This need gave birth to an abundant literature devoted to formal modeling languages combined with analytical and simulative solution techniques.
The aim of the book is to provide an overview of techniques and methodologies dealing with such specific issues in the context of distributed systems and covering aspects such as performance evaluation, reliability/availability, energy efficiency, scalability, and sustainability. Specifically, techniques for checking and verifying if and how a distributed system satisfies the requirements, as well as how to properly evaluate non-functional aspects, or how to optimize the overall behavior of the system, are all discussed in the book. The scope has been selected to provide a thorough coverage of issues, models, and techniques relating to validation, evaluation and optimization of distributed systems. The key objective of this book is to help bridge the gaps between modeling theory and practice in distributed systems through specific examples.
Page count: 646
Year of publication: 2015
Contents
Cover
Half Title page
Title page
Copyright page
Preface
Part I: Verification
Chapter 1: Modeling and Verification of Distributed Systems Using Markov Decision Processes
1.1 Introduction
1.2 Markov Decision Processes
1.3 Markov Decision Well-Formed Net formalism
1.4 Case study: Peer-to-Peer Botnets
1.5 Conclusion
Acknowledgments
Appendix A Well-Formed Net Formalism
References
Chapter 2: Quantitative Analysis of Distributed Systems in Stoklaim: A Tutorial
2.1 Introduction
2.2 STOKLAIM: Stochastic KLAIM
2.3 STOKLAIM Operational Semantics
2.4 MoSL: Mobile Stochastic Logic
2.5 jSAM: Java Stochastic Model-Checker
2.6 Leader Election in STOKLAIM
2.7 Concluding Remarks
References
Chapter 3: Stochastic Path Properties of Distributed Systems: The CSLTA Approach
3.1 Introduction
3.2 The Reference Formalisms for System Definition
3.3 The Formalism for Path Property Definition: CSLTA
3.4 CSLTA at Work: A Fault-Tolerant Node
3.5 Literature Comparison
3.6 Summary and Final Remarks
References
Part II: Evaluation
Chapter 4: Failure Propagation in Load-Sharing Complex Systems
4.1 Introduction
4.2 Building Blocks
4.3 Sand Box for Distributed Failures
4.4 Summary
References
Chapter 5: Approximating Distributions and Transient Probabilities by Matrix Exponential Distributions and Functions
5.1 Introduction
5.2 Phase Type and Matrix Exponential Distributions
5.3 Bernstein Polynomials and Expolynomials
5.4 Application of BEs to Distribution Fitting
5.5 Application of BEs to Transient Probabilities
5.6 Conclusions
References
Chapter 6: Worst-Case Analysis of Tandem Queueing Systems Using Network Calculus
6.1 Introduction
6.2 Basic Network Calculus Modeling: Per-Flow Scheduling
6.3 Advanced Network Calculus Modeling: Aggregate Multiplexing
6.4 Tandem Systems Traversed by Several Flows
6.5 Mathematical Programming Approach
6.6 Related Work
6.7 Numerical Results
6.8 Conclusions
References
Chapter 7: Cloud Evaluation: Benchmarking and Monitoring
7.1 Introduction
7.2 Benchmarking
7.3 Benchmarking with mOSAIC
7.4 Monitoring
7.5 Cloud Monitoring in mOSAIC’s Cloud Agency
7.6 Conclusions
References
Chapter 8: Multiformalism and Multisolution Strategies for Systems Performance Evaluation
8.1 Introduction
8.2 Multiformalism and Multisolution
8.3 Choosing the Right Strategy
8.4 Learning by the Experience
8.5 Conclusions and Perspectives
References
Part III: Optimization and Sustainability
Chapter 9: Quantitative Assessment of Distributed Networks Through Hybrid Stochastic Modeling
9.1 Introduction
9.2 Modeling of Complex Systems
9.3 Performance Evaluation of KNXnet/IP Networks Flow Control Mechanism
9.4 LCII: On-Line Risk Estimation of a Power-Telco Network
9.5 Conclusion
Acknowledgements
References
Chapter 10: Design of IT Infrastructures of Data Centers: An Approach Based on Business and Technical Metrics
10.1 Introduction
10.2 Fundamental Concepts
10.3 Business-Oriented Models
10.4 Data Center Infrastructure Models
10.5 Methodology
10.6 Case Study - Data Center Design
10.7 Conclusion
References
Chapter 11: Software Rejuvenation and its Application in Distributed Systems
11.1 Introduction
11.2 Software Rejuvenation Scheduling Classification
11.3 Software Rejuvenation Granularity Classification
11.4 Methods, Policies and Metrics of Software Rejuvenation
11.5 Software Rejuvenation in Distributed Systems
11.6 Summary
References
Chapter 12: Machine Learning Based Dynamic Reconfiguration of Distributed Data Management Systems
12.1 Introduction
12.2 Methodologies
12.3 Brief Overview of Neural Networks
12.4 System Architecture and Performance Prediction Scheme
12.5 Experimentation
12.6 Conclusions
References
Chapter 13: Going Green with the Networked Cloud: Methodologies and Assessment
13.1 Introduction
13.2 Modeling of Data Centre Power Consumption
13.3 Energy Efficiency in the Cloud
13.4 Performance Analysis Methodologies and Tools
13.5 Case Study: Performance Evaluation of Energy Aware Resource Allocation in the Cloud
13.6 Summary
References
Index
Quantitative Assessments of Distributed Systems
Scrivener Publishing 100 Cummings Center, Suite 541J Beverly, MA 01915-6106
Performability Engineering Series. Series Editors: Krishna B. Misra and John Andrews
Scope: The true performance of a product, system, or service must be judged over the entire life cycle of activities connected with design, manufacture, use and disposal, in relation to the economics of maximizing dependability and minimizing its impact on the environment. The concept of performability allows us to take a holistic assessment of performance and provides an aggregate attribute that reflects the entire engineering effort of a product, system, or service designer in achieving dependability and sustainability. Performance should not just be indicative of achieving quality, reliability, maintainability and safety for a product, system, or service, but of achieving sustainability as well. The conventional perspective of dependability ignores the environmental impact considerations that accompany the development of products, systems, and services. However, any industrial activity in creating a product, system, or service is always associated with certain environmental impacts that follow at each phase of development. These considerations have become all the more necessary in the 21st century as world resources continue to become scarce and the cost of materials and energy keeps rising. It is not difficult to visualize that by employing the strategy of dematerialization, minimum energy and minimum waste, while maximizing the yield and developing economically viable and safe processes (clean production and clean technologies), we will create minimal adverse effect on the environment during production and disposal at the end of life. This is basically the goal of performability engineering.
It may be observed that the above-mentioned performance attributes are interrelated and should not be considered in isolation for optimization of performance. Each book in the series should endeavor to include most, if not all, of the attributes of this web of interrelationship and have the objective to help create optimal and sustainable products, systems, and services.
Publishers at Scrivener: Martin Scrivener and Phillip Carmical
Copyright © 2015 by Scrivener Publishing LLC. All rights reserved.
Co-published by John Wiley & Sons, Inc. Hoboken, New Jersey, and Scrivener Publishing LLC, Salem, Massachusetts. Published simultaneously in Canada.
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center, Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201) 748-6008, or online at http://www.wiley.com/go/permission.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in preparing this book, they make no representations or warranties with respect to the accuracy or completeness of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a particular purpose. No warranty may be created or extended by sales representatives or written sales materials. The advice and strategies contained herein may not be suitable for your situation. You should consult with a professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our Customer Care Department within the United States at (800) 762-2974, outside the United States at (317) 572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic formats. For more information about Wiley products, visit our web site at www.wiley.com.
For more information about Scrivener products please visit www.scrivenerpublishing.com.
Library of Congress Cataloging-in-Publication Data:
ISBN 978-1-118-59521-3
Preface
Modern technology has to implement and provide services and systems able to meet ever-increasing quality standards while minimizing costs. A way to pursue such a goal is through distributed systems, implementing multiple and complex operations to manage the user demand, thereby ensuring adequate quality levels. A distributed system consists of a collection of interconnected (autonomous) entities, subsystems or systems, properly managed and coordinated to achieve a common goal, so that it is perceived as a whole, single, integrated facility.
Distributed systems are usually a melting pot of heterogeneous technologies and processes (computing, networking, manufacturing, marketing, mechanical, economical, biological, etc.) involving complex interactions (dependencies, influences, interferences, etc.). In order to achieve an adequate standard level, not only basic functionalities have to be provided through adequate mechanisms, but also advanced ones implementing specific quality-driven policies. That way, both functional and non-functional aspects and properties become key issues to address during the whole system/product/process lifecycle at design time and run time, as well as at maintenance/testing stages, which call for adequate methodologies and techniques for their evaluation.
Indeed, distributed systems, in particular those which are a part of critical infrastructures, have to meet tight dependability, timeliness, and performance requirements and specifications. Since these systems most often operate in an unpredictable environment, their design and maintenance require quantitative evaluation of deterministic and probabilistic timed models. Specifically required are techniques for checking and verifying if and how a distributed system satisfies the requirements (verification), as well as properly evaluating its nonfunctional aspects (evaluation) or optimizing the overall behavior of the system (optimization). Verification is a process of system quality management by which a product, service or system is checked, inspected and/or tested to verify that the requirements are satisfied. It is mainly applied at early design stages to check the system properties through specific logic statements. Evaluation refers to the act of evaluating the system’s nonfunctional properties such as performance, reliability, and availability. Optimization is instead related to the identification and selection of the best configuration available for the distributed system according to some given (usually multiple) parameters in order to meet high-level requirements such as overall costs and sustainability.
The boundaries among verification, evaluation, and optimization techniques and methodologies are smooth, i.e., often verification techniques include evaluation and/or optimization ones and vice versa. In particular, evaluation and optimization often overlap, but a difference between them could consist of the number of properties considered: the former usually investigates a single nonfunctional aspect of the system, while optimization problems usually evaluate the system taking into account multiple, complex, and/or composed properties such as dependability, performability, and sustainability, often also including costs. Anyway, all of them rely on models to provide their useful insights. All such considerations and needs have given birth to an abundance of literature devoted to formal modeling languages combined with analytical and simulation solution techniques. The aim of this book is to provide an overview of techniques and methodologies dealing with such specific issues in the context of distributed systems and to cover aspects such as correctness, validity, performance, reliability, availability, energy efficiency, and sustainability.
Following this path, the book has been organized in three parts dealing with verification (Part 1), evaluation (Part 2), and optimization and sustainability (Part 3) problems and issues, providing and discussing related models and techniques for investigating nonfunctional properties of distributed systems. The chapters have been selected to provide a good, although not exhaustive, coverage of issues, models and techniques related to validation, evaluation and optimization of distributed systems, in the hope that this will be useful in guiding students, researchers, and practitioners when approaching the quantitative assessment of distributed systems. Indeed, a key objective of this book is to help bridge the gaps between modeling theory and practice in the context of distributed systems through specific examples.
Specifically, included in Part 1 are three contributions on verification and model-checking models and techniques for distributed systems. Chapter 1 by Marco Beccuti, Giuliana Franceschinis, and Jeremy Sproston addresses the problem of decision making on distributed systems through a high-level probabilistic model checking formalism based on Markovian models, applied to a case study on a peer-to-peer illegal botnet. Chapter 2 by Rocco De Nicola, Diego Latella, and Michele Loreti focuses on the verification of nonfunctional properties in distributed systems through statistical model-checking techniques implemented in the StoKlaim tool, and adopted in the analysis of three election algorithms. Chapter 3 by Elvio G. Amparore and Susanna Donatelli proposes a stochastic model-checking technique to investigate stochastic path properties of distributed systems, which has been applied to the evaluation of a flexible manufacturing system.
Part 2 focuses on the evaluation of nonfunctional properties of distributed systems and is composed of five chapters. Chapter 4 by Vitali Volovoi and Shahnewaz Siddique deals with reliability and failure propagation issues through two different strength/load interaction models adopted in the modeling of scale-free phenomena and self-organized criticality. Chapter 5 by Andras Horvath, Marco Paolieri, and Enrico Vicario addresses the problem of fitting statistical data through matrix exponential distributions, proposing a new approach based on Bernstein expolynomials applied to the representation of some well-known distributions and to the evaluation of a whole distributed system example. Chapter 6 by Anne Bouillard and Giovanni Stea is related to the performance evaluation of tandem queueing systems through network calculus, proposing a solution technique based on integer programming that is applied to a tandem scenario network. Chapter 7 by Massimo Ficco, Massimiliano Rak, Salvatore Venticinque, Luca Tasquier, and Giuseppe Aversano deals with benchmarking and monitoring techniques of different metrics in Cloud computing, comparing several available solutions. Chapter 8 by Enrico Barbierato, Marco Gribaudo, and Mauro Iacono proposes multi-formalism approaches for evaluating complex phenomena and multiple quantities in distributed systems, providing several examples in computing contexts such as service-oriented architecture, distributed software, and Big Data.
Part 3 deals with optimization of distributed systems considering multiple metrics, proposing different techniques in five chapters. Chapter 9 by Salvatore Cavalieri, Ferdinando Chiacchio, Gabriele Manno, and Peter Popov deals with performability and dependability evaluation of networks through Stochastic Activity Networks and Adaptive Transition Systems used in the evaluation of two case studies on telecommunication and power grid contexts. Chapter 10 by Almir P. Guimaraes, Paulo Maciel, and Rivalino Matias Jr. focuses on the design of IT infrastructure, proposing a quasi-optimal design strategy for data centers implementing a trade-off among technical and business aspects based on Petri nets and reliability block diagrams. It has been adapted to different data center configurations, comparing them through several performance/dependability-oriented and business-oriented metrics. Chapter 11 by Javier Alonso and Kishor S. Trivedi deals with software degradation due to aging phenomena, and also discusses several software rejuvenation techniques through examples on distributed computing systems. Chapter 12 by Diego Rughetti, Pierangelo Di Sanzo, Francesco Quaglia, and Bruno Ciciani proposes machine learning techniques for dealing with data management in distributed infrastructures, considering both quality of service requirements and costs, which are then applied to a real case study on the Amazon Elastic Cloud Computing infrastructure. Chapter 13 by Aris Leivadeas, Chrysa Papagianni, and Symeon Papavassiliou focuses on energy efficiency, sustainability, performance, and costs of networked Cloud computing, proposing a specific framework and simulation technique for the analysis of related infrastructures, which are then applied to a datacenter evaluation.
The chapters have been written by more than 40 leading experts in distributed systems, modeling formalisms, and evaluation techniques, from both academia and industry. We wish to thank all of them for their contributions and cooperation. Special thanks go to the Scrivener staff, and in particular to Martin Scrivener, who patiently supported us, and also to Krishna B. Misra and John Andrews for their valuable advice. We hope that practitioners will find this book useful when looking for solutions to practical problems, and that researchers can consider it as a first-aid reference when dealing with distributed systems from a quantitative perspective.
Dario Bruneo and Salvatore Distefano Messina, Italy, January 2015
MARCO BECCUTI¹, GIULIANA FRANCESCHINIS² AND JEREMY SPROSTON¹
¹ Dipartimento di Informatica, Università di Torino, Italy. {beccuti,sproston}@di.unito.it
² DiSIT, Istituto di Informatica, Università del Piemonte Orientale
The Markov Decision Process (MDP) formalism is a well-known mathematical formalism to study systems with unknown scheduling mechanisms or with transitions whose next-state probability distribution is not known with precision. Analysis methods for MDPs are based generally on the identification of the strategies that maximize (or minimize) a target function based on the MDP’s rewards (or costs). Alternatively, formal languages can be defined to express quantitative properties that we want to be ensured by an MDP, including those which extend classical temporal logics with probabilistic operators.
The MDP formalism is low level: to facilitate the representation of complex real-life distributed systems higher-level languages have been proposed. In this chapter we consider Markov Decision Well-formed Nets (MDWN), which are probabilistic extensions of Petri nets that allow one to describe complex nondeterministic (probabilistic) behavior as a composition of simpler nondeterministic (probabilistic) steps, and which inherit the efficient analysis algorithms originally devised for well-formed Petri nets. The features of the formalism and the type of properties that can be studied are illustrated by an example of a peer-to-peer illegal botnet.
Keywords. Markov decision processes, modeling and verification.
The mathematical formalism of Markov Decision Processes (MDPs) was introduced in the 1950s by Bellman and Howard [17, 7] in the context of operations research and dynamic programming, and has been used in a wide area of disciplines including economics, manufacturing, robotics, automated control and communication systems. An MDP can be regarded as a Markov chain extended with nondeterministic choice over actions, and is typically equipped with rewards (or costs) associated with transitions from state to state.
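As a minimal sketch of this definition (not the chapter's own notation), an MDP can be stored as a table that maps each state to its enabled actions, and each action to a probability distribution over successor states with an associated reward. The two-state node model, the state and action names, and all numeric values below are hypothetical and chosen only for illustration.

```python
import random
from dataclasses import dataclass

# state -> action -> list of (probability, next state, reward on that transition)
Transitions = dict[str, dict[str, list[tuple[float, str, float]]]]


@dataclass
class MDP:
    transitions: Transitions

    def actions(self, state: str) -> list[str]:
        """Actions enabled in `state`: the nondeterministic choice."""
        return list(self.transitions[state])

    def validate(self) -> None:
        """Check that every action induces a probability distribution."""
        for state, by_action in self.transitions.items():
            for action, outcomes in by_action.items():
                total = sum(p for p, _, _ in outcomes)
                if abs(total - 1.0) > 1e-9:
                    raise ValueError(f"{state}/{action}: probabilities sum to {total}")

    def step(self, state: str, action: str, rng: random.Random) -> tuple[str, float]:
        """Once the action is fixed, resolve the probabilistic choice."""
        outcomes = self.transitions[state][action]
        weights = [p for p, _, _ in outcomes]
        _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
        return next_state, reward


# Hypothetical node that is either up or down (illustrative numbers only).
node = MDP(
    transitions={
        "up": {
            "run": [(0.9, "up", 1.0), (0.1, "down", 0.0)],
            "maintain": [(1.0, "up", 0.5)],
        },
        "down": {
            "repair": [(0.6, "up", 0.0), (0.4, "down", 0.0)],
        },
    }
)
node.validate()
```

Fixing one action per state (for example, always `run` in `up`) removes the nondeterminism and leaves an ordinary Markov chain with rewards, which is the sense in which an MDP extends a Markov chain.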
A key notion for MDPs is that of strategy, which defines the choice of action to be taken after any possible time step of the MDP. Analysis methods for MDPs are based on the identification of the strategies which maximize (or minimize) a target function either based on the MDP’s rewards (or costs), or based on properties satisfied by the MDP’s execution paths. For example, in a distributed system, there may be different recovery and preventive maintenance policies (modeled by different actions in the MDP); we can model the system using an MDP in order to identify the optimal strategy with respect to reliability, e.g., the optimal recovery and preventive maintenance policy that maximizes system availability. Reward-based performance indices rely on standard methods for MDPs, whereas path-based properties rely on probabilistic model checking methods [8, 3].
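The recovery and maintenance example can be sketched with value iteration, one of the standard methods for reward-based MDP analysis. The sketch assumes a discounted-reward criterion with a reward of 1 for each step in which the node delivers service, as a simple stand-in for availability; the three-state model, its probabilities, and the discount factor are hypothetical and do not come from the chapter.

```python
# state -> action -> (immediate reward, [(probability, next state), ...])
Model = dict[str, dict[str, tuple[float, list[tuple[float, str]]]]]

# Hypothetical node model: reward 1 means "service delivered in this step".
NODE: Model = {
    "up": {
        "run": (1.0, [(0.9, "up"), (0.1, "degraded")]),
    },
    "degraded": {
        "run": (1.0, [(0.5, "degraded"), (0.5, "down")]),
        "maintain": (0.0, [(1.0, "up")]),  # preventive maintenance: one step offline
    },
    "down": {
        "repair": (0.0, [(0.2, "up"), (0.8, "down")]),
    },
}


def q_value(model: Model, values: dict[str, float], state: str,
            action: str, gamma: float) -> float:
    """Expected discounted reward of taking `action` in `state`."""
    reward, outcomes = model[state][action]
    return reward + gamma * sum(p * values[nxt] for p, nxt in outcomes)


def value_iteration(model: Model, gamma: float = 0.95,
                    tol: float = 1e-10) -> tuple[dict[str, float], dict[str, str]]:
    """Return optimal state values and a memoryless strategy achieving them."""
    values = {s: 0.0 for s in model}
    while True:
        new_values = {
            s: max(q_value(model, values, s, a, gamma) for a in model[s])
            for s in model
        }
        delta = max(abs(new_values[s] - values[s]) for s in model)
        values = new_values
        if delta < tol:
            break
    # Extract the strategy: in each state pick an action with maximal value.
    strategy = {
        s: max(model[s], key=lambda a: q_value(model, values, s, a, gamma))
        for s in model
    }
    return values, strategy


values, strategy = value_iteration(NODE)
```

With these assumed numbers the computed strategy selects `maintain` in the `degraded` state: giving up one step of service is worth more than risking a long outage. Path-based properties, such as the maximum probability of reaching a failure state, require probabilistic model checking rather than this reward-based computation.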
It is important to observe that the formalism of MDPs is low level, and it could be difficult to represent directly at this level a complex real-life distributed system. To cope with this problem, a number of higher-level formalisms have been proposed in the literature (e.g., stochastic transition systems [13], dynamic decision networks [14], probabilistic extensions of reactive modules [1], Markov decision Petri nets and Markov decision well-formed nets [5], etc.).
In this chapter we introduce the MDP formalism in the context of distributed systems and discuss how to express and compute (quantitative) properties which should be ensured by an MDP model (Sec. 1.2). Markov decision well-formed nets (MDWNs) are presented highlighting how they can be a good choice to model multi-component distributed systems (Sec. 1.3) such as an illegal botnet example. Standard MDP analysis and probabilistic model checking techniques are used to compute a number of performance indices on the illegal botnet example (Sec. 1.4).
An application example: peer-to-peer botnet. The application example presented in this chapter is inspired by the peer-to-peer illegal botnet model presented in [23]. Illegal botnets are networks of compromised machines under the remote control of an attacker who is able to use the computing power of these machines for different malicious purposes (e.g., e-mail spam, distributed denial-of-service attacks, spyware, scareware, etc.). Typically, infection begins by exploiting web browser vulnerabilities or by using specific malware (a Trojan horse) to install malicious code on a target machine. The injected malicious code then begins its bootstrap process and attempts to join the botnet. When a machine is connected to the botnet it is called a bot, and can be used for a malicious purpose (we say that it becomes a working bot) or specifically to infect new machines (it becomes a propagation bot). This choice is a crucial aspect for the success of the malicious activity, meaning that the trade-off between the number of working bots and the number of propagation bots should be carefully investigated. To reduce the probability of being detected, the working and propagation bots are inactive most of the time. A machine can only be recovered if anti-malware software discovers the infection, or if the computer is physically disconnected from the network.
