Distributed Systems: Design and Algorithms (E-Book)

Description

In today's digital world, distributed systems are increasingly present in a wide variety of environments, ranging from public software applications to critical systems.

Distributed Systems introduces the underlying concepts, the associated design techniques and the related security issues.

Distributed Systems: Design and Algorithms is dedicated to engineers, students, and anyone familiar with algorithms and programming who wants to know more about distributed systems.

These systems are characterized by: several components with one or more threads, possibly running on different processors; asynchronous communications with possible additional assumptions (reliability, order preserving, etc.); local views for every component and no shared data between components. This title presents distributed systems from the point of view of their design and their main principles: the main algorithms are described and placed in their application context, i.e. consistency management and the way they are used in distributed file systems.




Contents

Foreword

Chapter 1. Introduction

First Part. Large Scale Peer-to-Peer Distributed Systems

Chapter 2. Introduction to Large-Scale Peer-to-Peer Distributed Systems

2.1. “Large-Scale” distributed systems?

2.2. Consequences of “large-scale”

2.3. Some large-scale distributed systems

2.4. Architectures of large-scale distributed systems

2.5. Objective of Part 1

2.6. Bibliography

Chapter 3. Design Principles of Large-Scale Distributed Systems

3.1. Introduction to peer-to-peer systems

3.2. The peer-to-peer paradigms

3.3. Services on structured overlays

3.4. Building trust in P2P systems

3.5. Conclusion

3.6. Bibliography

Chapter 4. Peer-to-Peer Storage

4.1. Introduction

4.2. BitTorrent

4.3. Gnutella

4.4. Conclusion

4.5. Bibliography

Chapter 5. Large-Scale Peer-to-Peer Game Applications

5.1. Introduction

5.2. Large-scale game applications: model and specific requirements

5.3. Overview of peer-to-peer overlays for large-scale game applications

5.4. Overlays for FPS games

5.5. Overlays for online life-simulation games

5.6. Conclusion

5.7. Bibliography

Second Part. Distributed, Embedded and Real-Time Systems

Chapter 6. Introduction to Distributed Embedded and Real-time Systems

6.1. Distributed real-time embedded systems

6.2. Safety critical systems as examples of DRE systems

6.3. Design process of DRE systems

6.4. Objectives of Part 2

6.5. Bibliography

Chapter 7. Scheduling in Distributed Real-Time Systems

7.1. Introduction

7.2. Generalities about real-time systems

7.3. Temporal correctness

7.4. WCRT of the tasks

7.5. WCRT of the messages

7.6. Case study

7.7. Conclusion

7.8. Bibliography

Chapter 8. Software Engineering for Adaptive Embedded Systems

8.1. Introduction

8.2. Adaptation, an additional complexity factor

8.3. Theoretical aspects of adaptation management

8.4. Technical solutions for the design of adaptive embedded systems

8.5. An example of an adaptive system from the robotics domain

8.6. Applying MDE techniques to the design of the robotic use-case

8.7. Exploitation of the models

8.8. Conclusion

8.9. Bibliography

Chapter 9. The Design of Aerospace Systems

9.1. Introduction

9.2. Flight software typical architecture

9.3. Traditional development methods and their limits

9.4. Modeling a software system using TASTE: philosophy

9.5. Common solutions

9.6. What TASTE specifically proposes

9.7. Modeling process and tools

9.8. Technology

9.9. Model transformations

9.10. The TASTE run-time

9.11. Illustrating our process by designing heterogeneous systems

9.12. First user feedback and TASTE future

9.13. Conclusion

9.14. Bibliography

Third Part. Security in Distributed Systems

Chapter 10. Introduction to Security Issues in Distributed Systems

10.1. Problem

10.2. Secure data exchange

10.3. Security in specific distributed systems

10.4. Outline of Part III

10.5. Bibliography

Chapter 11. Practical Security in Distributed Systems

11.1. Introduction

11.2. Confidentiality

11.3. Authentication

11.4. Availability and fault tolerance

11.5. Ensuring resource security

11.6. Result checking in distributed computations

11.7. Conclusion

11.8. Bibliography

Chapter 12. Enforcing Security with Cryptography

12.1. Introduction

12.2. Cryptography: from a general perspective

12.3. Symmetric encryption schemes

12.4. Prime numbers and public key cryptography

12.5. Conclusion

12.6. Bibliography

List of Authors

Index

First published 2011 in Great Britain and the United States by ISTE Ltd and John Wiley & Sons, Inc.

Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms and licenses issued by the CLA. Enquiries concerning reproduction outside these terms should be sent to the publishers at the undermentioned address:

ISTE Ltd, 27-37 St George’s Road, London SW19 4EU, UK

John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, USA

www.iste.co.uk

www.wiley.com

© ISTE Ltd 2011

The rights of Serge Haddad, Fabrice Kordon, Laurent Pautet and Laure Petrucci to be identified as the authors of this work have been asserted by them in accordance with the Copyright, Designs and Patents Act 1988.

Library of Congress Cataloging-in-Publication Data

Distributed systems: design and algorithms / edited by Serge Haddad … [et al.].

p. cm.

Includes bibliographical references and index.

ISBN 978-1-84821-250-3

1. Electronic data processing--Distributed processing. 2. Peer-to-peer architecture (Computer networks) 3. Computer algorithms. 4. Embedded computer systems. 5. Real-time data processing. I. Haddad, Serge.

QA76.9.D5D6144 2011

004’.33--dc22

2011012243

British Library Cataloguing-in-Publication Data

A CIP record for this book is available from the British Library

ISBN 978-1-84821-250-3

Foreword

It is hard to imagine today a single computation that does not rely on at least one distributed system, directly or indirectly. It could be a distributed file system, a distributed database, a content distribution network, a peer-to-peer game, a remote malware detection service, a sensor network, or any other distributed computation. Distributed systems have become the equivalent of economic globalization in the world of computing. Adopted for economic reasons, powered by highly efficient and ubiquitous networking, distributed systems define the default architecture for almost every computing infrastructure in use today.

Over the last two decades, distributed systems have taken many shapes and forms. Clusters of computers were among the earliest generations of distributed systems, whose goal was to provide a cost-effective alternative to highly expensive parallel machines. File servers were first to evolve from the cluster-based distributed system model to serve an increasing hunger for storage. The World Wide Web introduced the web server and, with it, the client-server distributed system model, on which millions of other Internet services have been built. Peer-to-peer systems appeared as an “anti-globalization movement”, in fact an anti-corporate globalization movement that fought against the monopoly of the service provider in the client-server model. Cloud computing turned distributed systems into a utility that offers computing and storage as services over the Internet. One of the emerging and least expected beneficiaries of cloud computing will be the mobile world of smart phones and personal devices, whose resource limitation can be solved through computation offloading. At the other end, wireless networking has initiated the use of distributed systems in sensor networks and embedded devices. Finally, online social networking is providing a novel use for distributed systems.

With this multitude of realizations, distributed systems have generated a rich set of research problems and goals. Performance was the first one. However, although the performance of distributed systems has increased, there has been a resultant increase in the programming burden. For a decade, research in distributed systems had tried to reconcile performance and programmability by making the distribution of computation transparent to the programmer through software distributed shared memory. In the end, things have not become simpler as achieving performance under distributed shared memory comes with a non-negligible semantic cost caused by the relaxed memory consistency models.

With the shift of distributed systems towards file systems and Internet-based services, research changed focus from performance to fault tolerance and availability. More recently, the ubiquity of distributed system architecture has resulted in an increased research interest in manageability aspects. Concerns of sustainability resulted in energy-aware distributed servers, which essentially proposed dynamic reconfiguration for energy saving without performance loss. In the mobile arena, wireless networking introduced the important issues of location-awareness, ad-hoc networking, and distributed data collection and processing. Finally, as computation and storage are increasingly offloaded to the cloud, issues of security and privacy have recently gained momentum.

This book is a journey into three domains of this vast landscape of distributed systems: large-scale peer-to-peer systems, embedded and real-time systems, and security in distributed systems. The authors have recognized expertise in all three areas, and, more importantly, the experience of building real distributed systems. This book reflects the expertise of its authors by balancing algorithms and fundamental concepts with concrete examples.

Peer-to-peer systems have generated a certain fascination amongst researchers. I see at least two reasons for this. First, peer-to-peer systems come from the position of the challenger who wants to take away the crown from the long-reigning client-server model. Essentially, the challenge is whether it is possible for a democratic society of systems to function efficiently without leadership. I am not sure whether history has ever proven that this is possible, but the peer-to-peer systems researchers have shown it to be possible. They employed efficient peer-to-peer data structures called distributed hash tables (DHT) to achieve scalable data retrieval when peers come and go, fail or misbehave.
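To make the DHT idea concrete, here is a minimal sketch of a consistent-hashing lookup in C. The identifier space, node identifiers, and keys are invented purely for illustration; real DHTs add routing tables, replication, and failure handling on top of this basic successor rule.

/* Toy DHT lookup: a key is stored on the first node whose identifier
 * follows the key's position on the ring (all values are made up). */
#include <stdio.h>

static const int nodes[] = {5, 17, 29, 40, 58};   /* sorted ids on a ring 0..63 */
static const int NNODES = sizeof nodes / sizeof nodes[0];

/* successor(): the node responsible for a key hashed to id */
static int successor(int id) {
    for (int i = 0; i < NNODES; i++)
        if (nodes[i] >= id)
            return nodes[i];
    return nodes[0];                              /* wrap around the ring */
}

int main(void) {
    int keys[] = {3, 22, 41, 60};
    for (int i = 0; i < 4; i++)
        printf("key %2d -> node %d\n", keys[i], successor(keys[i]));
    return 0;
}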

Tribal instinct might also be responsible for our interest in peer-to-peer systems: we are more likely to seek help from our peers whenever possible than from outsiders. This may explain the popularity of peer-to-peer applications, such as Gnutella, BitTorrent, and the peer-to-peer games discussed in the book, some of them (Gnutella) developed even before researchers showed how to design peer-to-peer systems efficiently.

However, take heed: occasionally, peer-to-peer systems can be an illusion. Popular social networks today may look like peer-to-peer systems to the user but, in reality, their implementation is heavily centralized. Recent concerns about data ownership and privacy have triggered an appetite for building truly peer-to-peer online social networks. It is better to understand how peer-to-peer systems work than to be fooled again.

The distributed embedded and real-time systems, which make up the middle part of the book, take distributed systems out of computing labs or centers and into the real, uncontrollable world. Whether embedded in cars, buildings, or our own bodies, embedded systems must function without continuous operator assistance, adapting their functionality to the changing demands of the physical systems they assist or control. Physical systems may also incorporate highly inter-connected embedded computers in order to become cyber-physical systems. Computer scientists have always been good at designing systems for themselves: languages, operating systems, and network protocols. However, embedded systems are about others. They represent a prerequisite in implementing Mark Weiser’s vision of pervasive computing, according to which computers will not just become ubiquitous, but also invisible.

Embedded computing often demands real-time guarantees, a requirement that has been shown to be challenging for any kind of computing, not just for distributed systems. This part of the book covers distributed real-time systems, how to build adaptive embedded systems from a software engineering perspective, and concludes with an interesting real-world example of software design for an aerospace system using the modeling tool they developed. After reading this book, whenever you fly, I am sure you will hope that the engineer who designed the plane’s software has read it too.

Finally, the last part of the book covers security in distributed systems. Distributed systems inherently require security. Whether they are clients and servers or just peers, these parties, as in real life, rarely trust each other. The authors present key aspects of grid systems’ security and dependability such as confidentiality, authentication, availability, and integrity. With the increasing popularity of cloud computing, security and privacy issues will be an even greater concern. Virtual machine environments are shown not to be sufficiently trustworthy as long as they are in the hands of the cloud providers. Users are likely to ask for stronger assurances, which may come from using the Trusted Platform Module (TPM) support, presented in this book, as well as from intelligent auditing techniques. The book’s last section is about cryptography, the mystical part of computer science, which we always rely on when it comes to protecting the confidentiality of our communications.

Who should read the book? The authors recommend it for engineers and masters students. I am inclined to agree with them that this book is certainly not for the inexperienced. It requires some background knowledge, but also the maturity to read further from the recommended bibliography in order to fully internalize the material. If you finish the book and want to read more, stay tuned; this is just the first book: more is coming.

Professor Liviu Iftode
Rutgers University

Chapter 1

Introduction1

Problematics

Most do not know it but, in 1946, ENIAC, one of the first known computers (after the Z3 in 1941 and Colossus in 1943), was already a parallel machine [WIK 11a]. The very basic programming mechanisms1 of this computer hid this capability, which was never really exploited.

Then, for years, programming remained sequential. In the 1960s, interest in parallel programming increased again. Two approaches were then explored:

– supercomputers;

– multi-processor computers.

Parallel programming on supercomputers

In the 1960s, the notion of the supercomputer emerged: a brute-force2 machine able to perform complex scientific calculations such as meteorological predictions.

The technology of the time made such computers extremely costly. As an example, the Cray-1 (1976), which constituted a major achievement in this domain, cost US$ 8.8 million [CRA 11]. It weighed 5 tons and performed 166 megaFLOPS (FLOPS stands for FLoating-point Operations Per Second). It required a dedicated electrical power supply and had a very complex cooling system.

Parallelism in such computers was mainly based on the “vector calculus” principle: apply a given operator to a whole vector instead of a single scalar. This was achieved thanks to pipelining in processors, which enabled a form of “pseudo-parallelism”: several data items are processed simultaneously, the same operation being applied to each of them at different stages of the pipeline.
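The sketch below illustrates this style of computation in C (the values are arbitrary and purely illustrative): one operator is applied across a whole vector. On a vector machine the successive additions would stream through the pipeline; on an ordinary scalar processor a compiler may map the same loop to SIMD instructions.

/* Illustrative sketch only: one operator applied to a whole vector,
 * the kind of operation a vector unit pipelines (element i enters the
 * pipeline while element i-1 is still being processed). */
#include <stdio.h>

#define N 8

int main(void) {
    double a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    double b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    double c[N];

    for (int i = 0; i < N; i++)   /* c = a + b, element by element */
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%g ", c[i]);
    printf("\n");
    return 0;
}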

The Cray Y-MP, in 1988, was based on this principle but also comprised parallel configurations of four to eight vector units. Two types of parallelism then coexisted: vector-based and multi-processor-based.

FORTRAN was the traditional programming language for numerical applications. It was rapidly extended to enable parallel programming. Other languages were enriched with new instructions or associated libraries to handle parallelism as well. However, to get the full benefit of such computers, a deep understanding of their architecture was required. Consequently, when a computer was replaced by a new one implementing a more efficient architecture, programs had to be thoroughly rewritten to exploit the new capabilities. This additional cost could only be supported by large institutions such as the army, aircraft manufacturers, state research agencies, etc.

Parallel programming on multi-processor machines

During the second half of the 1990s, when network usage was growing, some started to build clusters of computers. It was at first a way to build “supercomputers for the poor” by assembling PC motherboards connected via a dedicated local network. The time of the first configurations involving 64 to 128 nodes is now gone: current clusters gather thousands of machines via high-speed networks such as optical fiber links. The Jaguar machine3 is made of 18,688 processors, each one a hex-core (that makes 112,128 cores in total) [WIK 11b]. Its peak computing capacity is 1.75 PetaFLOPS (1 PetaFLOPS = 10^15 FLOPS).
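As a back-of-the-envelope check of these figures (using only the numbers quoted above, not any additional hardware data), the total core count and a rough per-core rate can be derived as follows:

/* Arithmetic on the figures quoted in the text; the per-core rate is only
 * an average derived from them, not a hardware specification. */
#include <stdio.h>

int main(void) {
    long processors = 18688;
    long cores_per_processor = 6;          /* "hex-core" */
    long cores = processors * cores_per_processor;
    double aggregate_flops = 1.75e15;      /* 1.75 PetaFLOPS */

    printf("total cores: %ld\n", cores);                   /* 112128 */
    printf("average per core: %.1f GFLOPS\n",
           aggregate_flops / cores / 1e9);                 /* about 15.6 */
    return 0;
}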

The low cost of clusters dramatically increased the affordability of supercomputers, and thus almost all of the fastest machines in the world are now of this type. Most companies selling “traditional supercomputers” have reduced their activity.

Another aspect made this new generation of supercomputers popular: their programs are much easier to write and to reuse than those of the older machines. This is because the programming paradigm is that of a distributed application, which is not architecture dependent. Hence, “classical” distributed programming techniques can be used and preserved when transferring a program from one machine to another.
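To make this concrete, here is a minimal message-passing sketch in C. The text names no particular library, so MPI is assumed here as one common choice; the point is that the same source runs unchanged on a laptop or on a cluster node, because nothing in it depends on the underlying architecture.

/* Minimal MPI sketch (assumed library, not named in the text): process 0
 * sends one integer to process 1. Run with at least two processes, e.g.
 * mpicc hello.c && mpirun -np 2 ./a.out */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 2) {
        if (rank == 0) fprintf(stderr, "needs at least 2 processes\n");
    } else if (rank == 0) {
        int token = 42;
        MPI_Send(&token, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int token;
        MPI_Recv(&token, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d (world size %d)\n", token, size);
    }

    MPI_Finalize();
    return 0;
}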

In fact, the Internet itself can be seen as a gigantic supercomputer, as applications like SETI@home [UCB 11] demonstrate. Thus, some experiments involve tens of thousands of machines over the Internet. The main difference from clusters is that the nodes are not connected via a high-speed network. There is also a need to establish trust between nodes.

Thus, the problem of distributed computing is now mainly a software problem. However, distributed applications belong to a difficult class of problems.

Objectives of this book

This book is aimed at engineers, masters students, and anyone familiar with algorithms and programming who wants to know more about distributed systems.

We deliberately chose, in this first book, to group the presentation of distributed systems around their design and their main principles. To do so, we present the main algorithms and place them in their application context (i.e. consistency management and the way they are used in distributed file systems).

Description of chapters

The first part is dedicated to large-scale peer-to-peer distributed systems. This is currently a very active area with new improvements (especially those induced by mobility and the numerous small devices we use daily):

– Chapter 3 presents the main principles of large-scale distributed peer-to-peer systems. It details the main algorithms used to communicate and ensure trust in such systems.

– Chapter 4 deals with peer-to-peer storage, an application domain which has already accumulated several years of experience. Some well-known protocols and tools, such as BitTorrent and Gnutella, are detailed.

– Chapter 5 presents another hot application domain for such systems: gaming. Once again, the principles adapted to this class of applications are put into a practical perspective.

The second part is dedicated to distributed real-time embedded systems, a domain that has always been very active. The topic of distributed systems is now gaining importance in the design of the next generation of real-time systems.

– Chapter 7 presents the holistic analysis, a well-known method used to compute the schedulability of distributed real-time systems. This chapter provides some background knowledge of scheduling analysis in the distributed real-time systems area.

– Chapter 8 deals with the design of adaptive real-time embedded systems. This second contribution builds on some results from schedulability theory; it also details some of the fundamental insights behind the design process of adaptive real-time embedded systems.

– Chapter 9 presents an innovative approach to designing the new generation of space systems. This approach is supported by a toolset, which is required to automate the process wherever possible.

The third part is devoted to security issues in distributed systems. This is a critical area that is of the utmost importance in such systems where trust among communicating entities and confidentiality of data are key issues:

– Chapter 11 presents the main characteristics of grid computing, with a focus on security. The security properties that have to be guaranteed are detailed, and how they are achieved in practice is presented through several case studies.

– Chapter 12 tackles the issue of data confidentiality using cryptography. It describes the core techniques which use symmetric key and public key protocols, and details their main characteristics.

The MeFoSyLoMa community

MeFoSyLoMa (Méthodes Formelles pour les Systèmes Logiciels et Matériels4) is an association gathering several world-renowned research teams from various laboratories in the Paris area [MEF 11]. It is composed of people from LIP65 (P. & M. Curie University), LIPN6 (University of Paris 13), LSV7 (École Normale Supérieure de Cachan), LTCI8 (Telecom ParisTech), CÉDRIC9 (CNAM), IBISC10 (University of Évry-Val-d’Essonne), and LACL11 (University of Paris 12). Its members, approximately 80 researchers and PhD students, all share a common interest in the construction of distributed systems and promote a software development cycle based on modeling, (formal) analysis, and model-based implementation. This community was founded in 2005 and is federated by regular seminars from well-known researchers (inside and outside the community) as well as by common research activities and the organization of events in their domains, such as conferences, workshops, or book writing.

The editors of this book, as well as most authors, are from this community.

Bibliography

[CRA 11] CRAY-RESEARCH, “Cray history”, http://www.cray.com/about_cray/history.html, 2011.

[MEF 11] MEFOSYLOMA, “MeFoSyLoMa, home-page”, www.mefosyloma.fr, 2011.

[TOP 11] TOP500.ORG, “BlueGene/L”, http://www.top500.org/lists/2010/06, 2011.

[UCB 11] UCB, “SETI@home”, http://setiathome.berkeley.edu/, 2011.

[WIK 11a] WIKIPEDIA, “ENIAC”, http://en.wikipedia.org/wiki/ENIAC, 2011.

[WIK 11b] WIKIPEDIA, “Jaguar (computer)”, http://en.wikipedia.org/wiki/Jaguar_(computer), 2011.

1 Introduction written by Serge HADDAD, Fabrice KORDON, Laurent PAUTET and Laure PETRUCCI.

1. From documents and pictures, it is clear that ENIAC looked more like an old-fashioned telephone exchange than a modern computer. It was programmed by wiring cables and setting switches.

2. As a comparison, a modern laptop is far more efficient than the Cray Y-MP from 1988, the most powerful computer at that time.

3. This was the fastest computer recorded in June 2010 [TOP 11].

4. This acronym stands for Formal Methods for Software and Hardware Systems (in French).

5. Laboratoire d’Informatique de Paris 6.

6. Laboratoire d’Informatique de Paris Nord.

7. Laboratoire de Spécification et de Vérification.

8. Laboratoire Traitement et Communication de l’Information.

9. Centre d’Études et de Recherche en Informatique du CNAM.

10. Informatique, Biologie Intégrative et Systèmes Complexes.

11. Laboratoire d’Algorithmique, Complexité et Logique.

First Part

Large Scale Peer-to-Peer Distributed Systems

Chapter 2

Introduction to Large-Scale Peer-to-Peer Distributed Systems1

2.1. Large-Scale distributed systems?

For several years now, the term “large-scale” has been applied to distributed systems. It indicates that they involve a very large number of computers, usually spread worldwide.

As a typical example, we are all regular users of a large-scale distributed system: the Internet. The Internet thus provides a rough idea of the way such systems work. How programs contact and interact with other programs (such as the relationship between a web browser and a web server) can be easily imagined. Moreover, it is possible to sense that the Internet is a worldwide parallel machine of which each connected computer is a part. The structure of such distributed systems can thus be comprehended by anyone regularly using the Internet.

However, only a few really understand the implications of large scale for such distributed systems. Creating a program to be executed on a few (or several dozen) nodes is completely different from the same exercise for a few thousand nodes or more. This is a problem as ubiquitous systems1 become more and more present in our day-to-day life.

2.2. Consequences of large-scale

What are the consequences of having large-scale distributed systems? They stem simply from the number of components in the systems involved (to illustrate the concepts, we consider here that all components are programs, even if the human factor is important too). The following are some of the problems encountered:

– communication;

– fault tolerance;

– decision making.

2.2.1. Communication in large-scale distributed systems

For distributed systems, every component can be connected to any other. However, such connectivity cannot be maintained when the system grows. It is obvious that the management of tens of thousands of connections cannot be handled by a single program.
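A quick calculation shows the scale of the issue. The sketch below, in C, assumes a hypothetical system of 10,000 components: a fully connected system would need about 50 million links, and each component alone would have to maintain roughly 10,000 connections.

/* Back-of-the-envelope check of the scaling claim above; n = 10,000 is an
 * assumed system size, chosen only for illustration. */
#include <stdio.h>

int main(void) {
    long n = 10000;                    /* assumed number of components */
    long per_node = n - 1;             /* connections one component keeps */
    long full_mesh = n * (n - 1) / 2;  /* links in a complete graph */
    printf("per component: %ld connections, whole system: %ld links\n",
           per_node, full_mesh);
    return 0;
}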

Point-to-point communication is therefore no longer suitable; new mechanisms are required, such as broadcast or multicast. It should be noted that such mechanisms already exist (for example, they are used on the Internet when a new router is connected to the global network).
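For illustration, the sketch below sends a single UDP datagram to an IP multicast group, so that one send operation reaches every node that has joined the group. The group address 239.0.0.1 and port 5000 are invented for the example, and IP multicast is only one of many possible broadcast/multicast mechanisms.

/* Hypothetical example: one datagram sent to an IP multicast group
 * (239.0.0.1:5000 is made up for illustration). POSIX sockets are assumed. */
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void) {
    int s = socket(AF_INET, SOCK_DGRAM, 0);
    if (s < 0) { perror("socket"); return 1; }

    struct sockaddr_in group;
    memset(&group, 0, sizeof group);
    group.sin_family = AF_INET;
    group.sin_addr.s_addr = inet_addr("239.0.0.1");  /* example group */
    group.sin_port = htons(5000);                    /* example port  */

    const char *msg = "hello, group";
    if (sendto(s, msg, strlen(msg), 0,
               (struct sockaddr *)&group, sizeof group) < 0)
        perror("sendto");

    close(s);
    return 0;
}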

Read on in the full edition!
