Publish / Subscribe Systems - Sasu Tarkoma - E-Book

Publish / Subscribe Systems E-Book

Sasu Tarkoma

Description

This book offers a unified treatment of the problems solved by publish/subscribe and of how to design and implement the solutions.

In this book, the author provides an insight into the publish/subscribe technology including the design, implementation, and evaluation of new systems based on the technology.  The book also addresses the basic design patterns and solutions, and discusses their application in practical application scenarios. Furthermore, the author examines current standards and industry best practices as well as recent research proposals in the area. Finally, necessary content matching, filtering, and aggregation algorithms and data structures are extensively covered as well as the mechanisms needed for realizing distributed publish/subscribe across the Internet.

Key Features:

  • Addresses the basic design patterns and solutions
  • Covers applications and example cases, including combining Publish/Subscribe with cloud, Twitter, Facebook, mobile push (app store), Service Oriented Architecture (SOA), Internet of Things, and multiplayer games
  • Examines current standards and industry best practices as well as recent research proposals in the area
  • Covers content matching, filtering, and aggregation algorithms and data structures as well as the mechanisms needed for realizing distributed publish/subscribe across the Internet

Publish/Subscribe Systems will be an invaluable guide for graduate/postgraduate students, researchers, specialists in the IT industry, distributed systems, and enterprise computing, and software engineers and programmers working in social computing and mobile computing. Undergraduate students will also find this book of interest.

Page count: 639

Year of publication: 2012



Table of Contents

Wiley Series in Communications Networking & Distributed Systems

Title Page

Copyright

About the Author

Notes on Contributors

Preface

Chapter 1: Introduction

1.1 Overview

1.2 Components of a Pub/Sub System

1.3 A Pub/Sub Service Model

1.4 Distributed Pub/Sub

1.5 Interfaces and Operations

1.6 Pub/Sub Semantics for Targeted Delivery

1.7 Communication Techniques

1.8 Environments

1.9 History

1.10 Application Areas

1.11 Structure of the Book

References

Chapter 2: Networking and Messaging

2.1 Networking

2.2 Multicast

2.3 Reverse Path Forwarding and Routing

2.4 Causality and Clocks

2.5 Message Passing and RPC/RMI

2.6 Web Services

2.7 Session Initiation Protocol (SIP)

2.8 Summary

References

Chapter 3: Overlay Networks and Distributed Hash Tables

3.1 Overview

3.2 Usage

3.3 Consistent Hashing

3.4 Geometries

3.5 DHTs

3.6 Gossip Systems

3.7 Summary

References

Chapter 4: Principles and Patterns

4.1 Introduction

4.2 General Pub/Sub Model

4.3 Architectural Patterns

4.4 Design Patterns

4.5 Design Patterns for Pub/Sub

4.6 Event Notifier Pattern

4.7 Enterprise Integration Patterns

4.8 Summary

References

Chapter 5: Standards and Products

5.1 CORBA Event Service

5.2 CORBA Notification Service and Channel Management

5.3 OMG Data Distribution Service (DDS)

5.4 SIP Event Framework

5.5 Java Delegation Event Model

5.6 Java Distributed Event Model

5.7 Java Message Service (JMS)

5.8 TibCo Rendezvous

5.9 COM+ and .NET

5.10 Websphere MQ

5.11 Advanced Message Queuing Protocol (AMQP)

5.12 MQ Telemetry Transport (MQTT)

5.13 Summary

References

Chapter 6: Web Technology

6.1 REST

6.2 AJAX

6.3 RSS and Atom

6.4 SOAP

6.5 XMPP

6.6 Constrained Application Protocol (CoAP)

6.7 W3C DOM Events

6.8 WS-Eventing and WS-Notification

6.9 Summary

References

Chapter 7: Distributed Publish/Subscribe

7.1 Overview

7.2 Filtering Content

7.3 Routing Function

7.4 Topic-Based Routing

7.5 Filter-Based Routing

7.6 Content-Based Routing

7.7 Rendezvous-Based Routing

7.8 Routing Invariants

7.9 Summary

References

Chapter 8: Matching Content Against Constraints

8.1 Overview

8.2 Matching Techniques

8.3 Filter Preliminaries

8.4 The Counting Algorithm

8.5 Matching with Posets

8.6 Tree Matcher

8.7 XFilter and YFilter

8.8 Bloom Filters

8.9 Summary

References

Chapter 9: Research Solutions

9.1 Gryphon

9.2 The Cambridge Event Architecture (CEA)

9.3 Scalable Internet Event Notification Architecture (SIENA)

9.4 Elvin

9.5 JEDI

9.6 PADRES

9.7 REDS

9.8 GREEN

9.9 Rebeca

9.10 XSIENA and StreamMine

9.11 Fuego Event Service

9.12 STEAM

9.13 ECho and JECho

9.14 DHT-Based Systems

9.15 Summary

References

Chapter 10: IR-Style Document Dissemination in DHTs

10.1 Introduction

10.2 Data Model and Problem Statement

10.3 STAIRS: Threshold-Based Document Filtering in DHTs

10.4 Recent Progress and Discussion

10.5 Summary

References

Chapter 11: Advanced Topics

11.1 Security

11.2 Composite Subscriptions

11.3 Filter Merging

11.4 Load Balancing

11.5 Content-Based Channelization

11.6 Reconfiguration

11.7 Mobility Support

11.8 Congestion Control

11.9 Evaluation of Pub/Sub Systems

11.10 Summary

References

Chapter 12: Applications

12.1 Cloud Computing

12.2 SOA and XML Brokering

12.3 Facebook Services

12.4 PubSubHubbub

12.5 Complex Event Processing (CEP)

12.6 Online Advertisement

12.7 Online Multiplayer Games

12.8 Apple Push Notification Service (APNS)

12.9 Internet of Things

12.10 Summary

References

Chapter 13: Clean-Slate Datacentric Pub/Sub Networking

13.1 Datacentric Communication Model

13.2 CCN

13.3 PSIRP/PURSUIT

13.4 Internet Interdomain Structure

13.5 Summary

References

Chapter 14: Conclusions

Index

Wiley Series in Communications Networking & Distributed Systems

Series Editors:

David Hutchison,

Lancaster University, Lancaster, UK

Serge Fdida,

Université Pierre et Marie Curie, Paris, France

Joe Sventek,

University of Glasgow, Glasgow, UK

The ‘Wiley Series in Communications Networking & Distributed Systems’ is a series of expert-level, technically detailed books covering cutting-edge research, and brand new developments as well as tutorial-style treatments in networking, middleware and software technologies for communications and distributed systems. The books will provide timely and reliable information about the state-of-the-art to researchers, advanced students and development engineers in the Telecommunications and the Computing sectors.

Other titles in the series:

Wright: Voice over Packet Networks 0-471-49516-6 (February 2001)

Jepsen: Java for Telecommunications 0-471-49826-2 (July 2001)

Sutton: Secure Communications 0-471-49904-8 (December 2001)

Stajano: Security for Ubiquitous Computing 0-470-84493-0 (February 2002)

Martin-Flatin: Web-Based Management of IP Networks and Systems 0-471-48702-3 (September 2002)

Berman, Fox, Hey: Grid Computing. Making the Global Infrastructure a Reality 0-470-85319-0 (March 2003)

Turner, Magill, Marples: Service Provision. Technologies for Next Generation Communications 0-470-85066-3 (April 2004)

Welzl: Network Congestion Control: Managing Internet Traffic 0-470-02528-X (July 2005)

Raz, Juhola, Serrat-Fernandez, Galis: Fast and Efficient Context-Aware Services 0-470-01668-X (April 2006)

Heckmann: The Competitive Internet Service Provider 0-470-01293-5 (April 2006)

Dressler: Self-Organization in Sensor and Actor Networks 0-470-02820-3 (November 2007)

Berndt: Towards 4G Technologies: Services with Initiative 0-470-01031-2 (March 2008)

Jacquenet, Bourdon, Boucadair: Service Automation and Dynamic Provisioning Techniques in IP/MPLS Environments 0-470-01829-1 (March 2008)

Gurtov: Host Identity Protocol (HIP): Towards the Secure Mobile Internet 0-470-99790-7 (June 2008)

Boucadair: Inter-Asterisk Exchange (IAX): Deployment Scenarios in SIP-enabled Networks 0-470-77072-4 (January 2009)

Fitzek: Mobile Peer to Peer (P2P): A Tutorial Guide 0-470-69992-2 (June 2009)

Shelby: 6LoWPAN: The Wireless Embedded Internet 0-470-74799-4 (November 2009)

Stavdas: Core and Metro Networks 0-470-51274-1 (February 2010)

Gómez Herrero, van der Ven: Network Mergers and Migrations: Junos® Design and Implementation 0-470-74237-2 (March 2010)

Jacobsson, Niemegeers, Heemstra de Groot: Personal Networks: Wireless Networking for Personal Devices 0-470-68173-X (June 2010)

Minei, Lucek: MPLS-Enabled Applications: Emerging Developments and New Technologies, Third Edition 0-470-66545-9 (December 2011)

Barreiros: QOS-Enabled Networks 0-470-68697-9 (December 2011)

This edition first published 2012

© 2012 John Wiley & Sons Ltd

Registered office

John Wiley & Sons Ltd, The Atrium, Southern Gate, Chichester, West Sussex, PO19 8SQ, United Kingdom

For details of our global editorial offices, for customer services and for information about how to apply for permission to reuse the copyright material in this book please see our website at www.wiley.com.

The right of the author to be identified as the author of this work has been asserted in accordance with the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording or otherwise, except as permitted by the UK Copyright, Designs and Patents Act 1988, without the prior permission of the publisher.

Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be available in electronic books.

Designations used by companies to distinguish their products are often claimed as trademarks. All brand names and product names used in this book are trade names, service marks, trademarks or registered trademarks of their respective owners. The publisher is not associated with any product or vendor mentioned in this book. This publication is designed to provide accurate and authoritative information in regard to the subject matter covered. It is sold on the understanding that the publisher is not engaged in rendering professional services. If professional advice or other expert assistance is required, the services of a competent professional should be sought.

Library of Congress Cataloging-in-Publication Data

Tarkoma, Sasu.

Publish/subscribe systems : design and principles / Sasu Tarkoma.

p. cm.

Includes bibliographical references and index.

ISBN 978-1-119-95154-4 (pbk.)

1. Push technology (Computer networks) I. Title.

TK5105.887.T37 2012

006.7'876–dc23

2012010711

A catalogue record for this book is available from the British Library.

Paper ISBN: 9781119951544

About the Author

Sasu Tarkoma received his MSc and PhD degrees in Computer Science from the University of Helsinki, Department of Computer Science. He is a full professor at the University of Helsinki, Department of Computer Science, and Head of the networking and services specialization line. He has managed and participated in national and international research projects at the University of Helsinki, Helsinki University of Technology, and Helsinki Institute for Information Technology (HIIT). He has worked in the IT industry as a consultant and chief system architect as well as principal researcher and laboratory expert at Nokia Research Center. He has over 100 publications and several patents in the area of distributed systems and mobile computing, and has also authored several books on distributed systems.

Notes on Contributors

Dr. Weixiong Rao contributed to the topic-based pub/sub part of Chapter 7, and contributed Chapter 10. Dr. Rao is a post-doctoral researcher at the University of Helsinki.

Mr. Kari Visala contributed Chapter 13. He is a PhD student at Aalto University and a researcher at Helsinki Institute for Information Technology.

Ms. Nelli Tarkoma produced most of the diagrams used in this book. She is a professional graphic artist and illustrator.

This work was supported by the Academy of Finland, grant numbers 255932, 139144, 135230.

Preface

The book offers a unified presentation of the publish/subscribe technology including the design, implementation, and evaluation of new systems based on the technology. Publish/subscribe is a frequently used paradigm for connecting information suppliers and consumers across time and space. The paradigm is extensively applied in modern distributed services, and it has a profound role in current and forthcoming enterprise, cloud, and mobile solutions. The book covers the basic design patterns and solutions, and discusses their application in practical scenarios. The book examines current standards and industry best practices as well as recent research proposals in the area. The necessary content matching, filtering, and aggregation algorithms and data structures are extensively covered, as are the mechanisms needed for realizing distributed publish/subscribe across the Internet.

Chapter 1

Introduction

Publish/subscribe (pub/sub) technology encompasses a wide range of solutions that aim to solve a vital problem: timely information dissemination and event delivery from publishers to subscribers [1, 2]. In this chapter, we give an overview of pub/sub systems, examine their history, and motivate the contents and structure of this book.

1.1 Overview

The pub/sub paradigm is very useful in describing and monitoring the world around us. Any person meets a constant barrage of events in his waking hours. Most of these events are irrelevant and they should not be allowed to consume the decision maker's resources of awareness, watchfulness, processing and deciding upon actions. Some events are useful to notice and then there are others which are important, even critically important and create the need to muster all the tools and resources to hand. The ability to be aware of a rich stream of events with minimal exertion and to immediately detect critical events for further processing is central to any successful person or organization. The task of efficient event awareness is formidable.

There are a couple of mitigating factors, though. Typically we might know something about the probable sources of interesting events, although we are not actually interested in knowing who sends the notification of an event. We might also know in advance something about the type of interesting events and can use this knowledge to preselect sources and to recognize which events are critical. Thus we are interested in event streams of certain types and sources. One can say that we want to subscribe only to a subset of event streams that is enriched for our purposes.

For digital communication purposes this can be interpreted like this: we need a useful communication paradigm, a pub/sub, also called event notification, service that enables the communication components to dynamically detect and isolate particular events. Simultaneously the pub/sub service must allow introduction of new kinds of events. The participating components are generally unaware of each other, that is, an event may be sourceless from the viewpoint of the receiver.

The pub/sub information dissemination and event delivery problem can be stated as follows: How to deliver information from its publishers to interested and active subscribers in an efficient and timely manner? Information is delivered in the form of asynchronous events, which are first detected, and then delivered by publishers to active subscribers in the form of notification messages.

The problem is vital, because many applications require timely data dissemination. Examples include stock market data updates, online advertising, asynchronous events in a graphical user interface (GUI), purchase and delivery tracking, digital news delivery, online games, Web feeds (RSS), and signalling in many embedded and industrial systems. Indeed, pub/sub is a general enabler for many different kinds of applications, and it is especially useful in connecting distributed components together, forming a basis for loosely coupled systems.

This problem is also challenging, because the information delivery and processing environments can be diverse and a single technological solution cannot address all these environments and the scenario-specific requirements. Thus many different pub/sub systems have been developed. Research-oriented systems have demonstrated algorithms, structures, and optimizations for pub/sub technology applied in a particular operating environment. Industry standards have defined the conventions, interfaces, and Application Programming Interfaces (APIs) for creating interoperable pub/sub-based products and solutions. Thus academic research and industry standardization address two different but partially overlapping facets of the information dissemination problem.

Pub/sub and event-based systems are very different from database systems, because they enable data dissemination from publishers to subscribers in the present and future. This contrasts with the traditional database model, in which queries are performed on existing data that is available in a database. The notions of database query and subscription are similar, but a query concerns the past whereas a subscription concerns the future, relative to the time it is issued. Data tuples stored in a database and the published event, or notification, are also similar, but differ in that the event is forwarded from the publisher to the subscriber and is not stored by the pub/sub system other than for queuing purposes.
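The difference between a query and a subscription can be made concrete with a minimal sketch. The class names `Table` and `Channel` and the dictionary-based event format are assumptions made for this illustration only; they are not taken from any particular system.

```python
from typing import Callable, Dict, List, Tuple

Event = Dict[str, object]
Predicate = Callable[[Event], bool]


class Table:
    """Database-style store (hypothetical): a query sees only rows already stored."""

    def __init__(self) -> None:
        self.rows: List[Event] = []

    def insert(self, row: Event) -> None:
        self.rows.append(row)

    def query(self, predicate: Predicate) -> List[Event]:
        # Evaluated once, against the data that exists right now (the past).
        return [row for row in self.rows if predicate(row)]


class Channel:
    """Pub/sub-style channel (hypothetical): a subscription sees only later events."""

    def __init__(self) -> None:
        self.subscriptions: List[Tuple[Predicate, Callable[[Event], None]]] = []

    def subscribe(self, predicate: Predicate,
                  callback: Callable[[Event], None]) -> None:
        # Stored and evaluated against every event published from now on.
        self.subscriptions.append((predicate, callback))

    def publish(self, event: Event) -> None:
        # The event is forwarded to matching subscribers and then forgotten.
        for predicate, callback in self.subscriptions:
            if predicate(event):
                callback(event)
```

An event published before a subscription is issued is never seen by that subscriber, whereas a row inserted before a query is exactly what the query returns.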

Pub/sub is a broad technology domain and consists of many solutions for different environments. Experiences in building pub/sub solutions and implementing them suggest that no single solution is able to meet the demands of the differing application environments and their requirements. This is evident in the number of pub/sub related standards, implementations, protocols, and algorithms. Yet, the goal of connecting diverse communicating entities through a substrate that supports asynchronous one-to-many communication is shared by these solutions.

Pub/sub is a potential candidate to become a key enabler for Web and mobile applications. On the Web, pub/sub enables the asynchronous communication of various Web components, such as web pages and web sites. Figure 1.1 presents a vision for content dissemination on the Internet that has inspired Google's Pubsubhubbub system.1 In this vision, anyone can become a content publisher and aggregator. Open interfaces and protocols allow the integration of various content sources. Some publishers and sites become large and others remain small and topical.

Figure 1.1 A vision of a self-organizing content dissemination system.

Popular alert services, such as Google Alerts2 and Microsoft Live Alerts3, allow end users to input keywords and receive relevant dynamic Web content. They are examples of centralized pub/sub solutions for the Web. Their implementation details are not available, but it is believed that alert services are still based on batch processing through search engines. The search engines need to crawl and index live content. Except for a small number of frequently crawled selected sites, the crawling period is typically on the order of a week or tens of days. Thus, they offer a limited form of pub/sub. The next step would be a more decentralized, scalable, and real-time service with support for expressive content matching. Unfortunately, expressive matching semantics and scalability conflict with each other, making the design, implementation, and deployment of such a global pub/sub service challenging.

Architecture and protocol design should support self-organization and preferential attachment to content sources as well as efficient and timely content dissemination from content publishers through the intermediaries to the content subscribers. These mechanisms, techniques, and algorithms are the key focus of this book. We will address the different facets of the information dissemination problem, and present a collection of frequently employed pub/sub solutions as well as guidelines on how to apply them in practice.

1.2 Components of a Pub/Sub System

Before going deeper into the topic, we first define the central terms and components, and the overall structure of a pub/sub system.

1.2.1 Basic System

The main entities in a pub/sub system are the publishers and subscribers of content. A publisher detects an event and then publishes the event in the form of a notification. A notification encapsulates information pertaining to the observed event. The notification can also be called the event message.

There are many terms for the entities in pub/sub or event systems; for example, the terms subscriber, consumer, and event sink are synonymous. Similarly, publisher, producer, supplier, and event source are synonymous. As mentioned above, the notification or event message denotes that an observed event has happened.

An event represents any discrete state transition that has occurred and is signalled from one entity to a number of other entities. For example, a successful login to a service, the firing of detection or monitoring hardware and the detection of a missile in a tactical system are all events.

Events may be categorized by their attributes, such as the physical property to which they relate. For instance, spatial events and temporal events note physical activity. Moreover, an event may be a combination of these, for example an event that contains both temporal and spatial information. Events can be organized into taxonomies based on their type and complexity. More complex events, called composite or compound events, can be built out of more specific simple events. Composite events are important in many applications. For example, a composite event may be fired

in a hospital, when the reading of a sensor attached to a patient exceeds a given threshold and a new drug has been administered in a given time interval;

in a location tracking service, where a set of users are in the same room or near the same location at the same time; or

in an office building, where a motion detector fires and there has been a certain interval of time after the last security round.
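The hospital case above can be sketched as a small detector that combines two simple events into a composite one. The event kinds (`drug_administered`, `sensor_reading`, `drug_reaction`), the class name, and the use of seconds for timestamps are assumptions chosen for this illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Event:
    kind: str          # hypothetical event kinds, see the lead-in
    timestamp: float   # seconds
    value: float = 0.0


class DrugReactionDetector:
    """Fires a composite event when a sensor reading exceeds a threshold
    within `window` seconds after a drug has been administered."""

    def __init__(self, threshold: float, window: float) -> None:
        self.threshold = threshold
        self.window = window
        self.last_drug_time: Optional[float] = None

    def on_event(self, event: Event) -> Optional[Event]:
        if event.kind == "drug_administered":
            # Remember the simple event; it does not fire anything by itself.
            self.last_drug_time = event.timestamp
            return None
        if event.kind == "sensor_reading" and event.value > self.threshold:
            if (self.last_drug_time is not None
                    and 0 <= event.timestamp - self.last_drug_time <= self.window):
                # Both conditions hold: emit the composite event.
                return Event("drug_reaction", event.timestamp, event.value)
        return None
```

The detector keeps only the state it needs (the time of the last administration), which is typical of simple composite event detection; richer event algebras require more elaborate state.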

After the notification has been published, it is the duty of the pub/sub system to deliver the message to interested recipients, the subscribers. A subscriber is an entity that has expressed prior interest in a set of events that meet certain requirements that the subscriber has set. The actual delivery depends on the pub/sub solution being used; for example, it could be based on the following:

The message is broadcast on the network and devices on the same network will see the message. The pub/sub system running on a device can then process the message and deliver it to the subscriber if it is active on the device.

The message is delivered via network supported multicast, in which a specific network primitive is used for delivering the message from one publisher to many subscribers.

The message is sent directly by the publisher to subscribers that have informed the publisher that they are interested in receiving a notification. The publisher then utilizes a one-to-one message delivery protocol on top of the communication primitives offered by the network, typically the TCP/IP protocol stack.

The message is first sent to a broker server and then delivered by the broker to active subscribers. In this case, the subscribers have expressed their interest in receiving notifications with the broker.

The message is delivered through a network of brokers. The scalability of a pub/sub system can be increased by deploying a network of pub/sub brokers.

The first two cases are based on communication primitives provided by the underlying network, namely broadcast and multicast. Typically these primitives are not usable with Internet applications, because they are supported only within specific regions of the Internet and thus cannot be used to deliver messages in the global environment. The third case is very typical and extensively used when the number of subscribers is known to be small. This strategy does not scale when the number of subscribers increases. The fourth and fifth cases introduce the concept of a broker, also called a pub/sub router, that mediates events and provides a routing and matching engine for the publishers and subscribers. This is a commonly used solution for the distributed environment. A well-known technique for deploying pub/sub systems is to create them as overlay networks that operate on top of the current Internet routing system [3].
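The fourth case, a single broker that mediates between publishers and subscribers, can be sketched in a few lines. This is a minimal in-process illustration that assumes topic names as the matching criterion; the class and method names are hypothetical, and a real broker would communicate over the network.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, object], None]


class Broker:
    """Centralized broker sketch: subscribers register interest with the
    broker, and publishers never learn who the subscribers are."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subscribers[topic].append(callback)

    def unsubscribe(self, topic: str, callback: Callback) -> None:
        if callback in self._subscribers[topic]:
            self._subscribers[topic].remove(callback)

    def publish(self, topic: str, payload: object) -> int:
        # Copy the list so that callbacks may unsubscribe during delivery.
        receivers = list(self._subscribers.get(topic, []))
        for callback in receivers:
            callback(topic, payload)
        return len(receivers)  # number of subscribers notified
```

Compared with the third case, the publisher holds no subscriber list: it calls `publish` once, and the cost of fan-out moves to the broker.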

1.2.2 Distribution and Overlay Networks

A pub/sub system may be centralized or distributed in nature. The notification processing and delivery responsibility may be provided by different entities:

publishers;

a centralized broker;

a set of brokers in a routing configuration typically realized as an overlay network.

Event and notification processing can be easily implemented in publishers and with a centralized broker; however, as observed above, these approaches do not scale well when there are many entities and events in the system. Scalability can be improved by implementing the pub/sub system with a set of brokers as an overlay construct on top of the network layer.

An application layer overlay network is implemented on top of the network layer and it provides services such as resource lookup, overlay multicast, and distributed storage. An overlay network typically provides useful features such as easy deployment of new distributed functions, resilience to network failures, and fault-tolerance [3]. An overlay-routing algorithm is based on the underlying packet-routing primitives. A pub/sub overlay system is implemented as a network of application layer brokers or routers that communicate by using the lower layer primitives, typically TCP/IP.

Figure 1.2 illustrates a pub/sub overlay network. The two important parts of a distributed pub/sub network are the broker topology and how routing state is established and maintained by the brokers. Routing state is established by propagating the interests of the subscribers towards the publishers of that information. In essence, the routing state stored by a broker must enable it to forward event messages either to other brokers or to subscribers that have previously subscribed to the notifications.

Figure 1.2 Example of a pub/sub overlay network.
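The following sketch illustrates how routing state might be established in a broker overlay. It assumes an acyclic broker topology, topic-based subscriptions, and subscriptions that are propagated to every broker; these are simplifying assumptions made for the example, and the names `OverlayBroker`, `routes`, and `handled` are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Set


class OverlayBroker:
    """Broker in an acyclic overlay (sketch). Routing state maps a topic
    to the neighbouring brokers that lead towards a subscriber."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.neighbors: List["OverlayBroker"] = []
        self.routes: Dict[str, Set[str]] = {}
        self.local: Dict[str, List[Callable[[object], None]]] = {}
        self.handled = 0  # number of event messages this broker processed

    def connect(self, other: "OverlayBroker") -> None:
        self.neighbors.append(other)
        other.neighbors.append(self)

    def subscribe(self, topic: str, callback: Callable[[object], None]) -> None:
        self.local.setdefault(topic, []).append(callback)
        self._propagate(topic, came_from=None)

    def _propagate(self, topic: str, came_from: Optional["OverlayBroker"]) -> None:
        # Send the interest to every neighbour except the one it came from.
        for neighbor in self.neighbors:
            if neighbor is not came_from:
                neighbor._learn(topic, sender=self)

    def _learn(self, topic: str, sender: "OverlayBroker") -> None:
        known = self.routes.setdefault(topic, set())
        if sender.name not in known:
            known.add(sender.name)
            self._propagate(topic, came_from=sender)

    def publish(self, topic: str, payload: object,
                came_from: Optional["OverlayBroker"] = None) -> None:
        self.handled += 1
        for callback in self.local.get(topic, []):
            callback(payload)
        # Forward only along links that lead towards a subscriber.
        for neighbor in self.neighbors:
            if neighbor is not came_from and neighbor.name in self.routes.get(topic, set()):
                neighbor.publish(topic, payload, came_from=self)
```

With brokers A, B, C, and D, where B is connected to the other three and the only subscriber is attached to C, an event published at A travels A, B, C and never reaches D, because B holds no routing entry for the topic on its link to D.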

In this book, we will investigate the above ways of realizing the notification as well as solutions for achieving high performance, expressiveness, availability, fault resilience, and security.

1.2.3 Agreements

The pub/sub system is used to facilitate the delivery of the messages; however, the meaning of the event is application and domain specific. In order to build a pub/sub system with many entities the following agreements need to be considered:

Agreement on the notification message format and syntax. For example, many systems utilize a typed-tuple-based format or XML. This agreement may include additional details such as those pertaining to timestamps and content security.

Agreement on the message protocol that is used to transfer the event between two entities. This can include many parameters, for example security and reliability.

Agreement on the notification filtering semantics. This specifies what elements of the message can be used to make a notification decision. For example, a notification is forwarded based on the publisher, observation time, and type of the event.

Agreement on the visibility of the published event. It may be necessary to restrict the delivery and processing of the event in the operating environment.

Agreement on the application and domain specific interpretation of the event. This agreement is outside the scope of a pub/sub system.

Thus many implicit or explicit agreements are needed to design and implement a pub/sub system for an environment that consists of many entities.
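As an illustration of the first and third agreements, the sketch below fixes a simple notification format and filtering semantics based on the publisher, the observation time, and the event type. The field names and the `Filter` class are assumptions made for this example; actual systems define their own formats.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class Notification:
    """Agreed message format (hypothetical): fixed header fields plus
    application-specific typed attributes."""
    publisher: str
    event_type: str
    observed_at: float  # timestamp in seconds
    attributes: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Filter:
    """Agreed filtering semantics (hypothetical): only header fields may be
    used for the notification decision; None means 'do not care'."""
    publisher: Optional[str] = None
    event_type: Optional[str] = None
    not_before: Optional[float] = None

    def matches(self, n: Notification) -> bool:
        if self.publisher is not None and n.publisher != self.publisher:
            return False
        if self.event_type is not None and n.event_type != self.event_type:
            return False
        if self.not_before is not None and n.observed_at < self.not_before:
            return False
        return True
```

Note that the filter deliberately ignores `attributes`: their interpretation belongs to the application and domain agreement, which lies outside the pub/sub system.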

1.2.4 The Event Loop

The event loop is a key construct in event-based applications and a frequently used approach for implementing applications that react to various events. For example, Microsoft Windows programs are based on events. The main thread of the application contains the event loop, which waits for new events to process. The event loop can use a blocking function call for receiving messages or a nonblocking peek message function. Typically, when a message is received, it is processed and delivered to callbacks for further processing.

The event loop is a crucial part of an application that needs to react to events in a timely manner, for example GUI events. The event loop naturally combines with a distributed pub/sub system and it is a key construct for implementing pub/sub engines. A simple pub/sub engine can be implemented as an event loop that reacts to incoming subscription and publishing requests.
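A simple pub/sub engine of this kind might look as follows. This is a minimal sketch that assumes requests arrive as `(kind, topic, argument)` tuples on a standard-library queue and that a sentinel object stops the loop; the request format and the names `run_engine` and `STOP` are assumptions for the example.

```python
import queue
from collections import defaultdict

STOP = object()  # sentinel that terminates the loop (assumption of this sketch)


def run_engine(requests: queue.Queue) -> None:
    """Event loop that reacts to subscribe, unsubscribe, and publish requests."""
    subscriptions = defaultdict(list)  # topic -> list of callbacks
    while True:
        request = requests.get()  # blocking call: waits for the next request
        if request is STOP:
            break
        kind, topic, arg = request
        if kind == "subscribe":
            subscriptions[topic].append(arg)
        elif kind == "unsubscribe":
            if arg in subscriptions[topic]:
                subscriptions[topic].remove(arg)
        elif kind == "publish":
            # Deliver the event to the callbacks registered for the topic.
            for callback in list(subscriptions[topic]):
                callback(arg)
```

Because all requests are handled by one loop in arrival order, no locking of the subscription table is needed; a nonblocking variant would use `requests.get_nowait()` and handle `queue.Empty`.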

1.2.5 Basic Properties

Pub/sub technology has evolved since its inception in the late 1980s into a promising technology for connecting software components with decoupling in space, time, and synchronization [4]. These three properties summarize the salient features of the technology. We will examine each of the three properties presented in Figure 1.3 in detail in this section.

Figure 1.3 Decoupling properties in pub/sub. (a) Space decoupling; (b) Time decoupling; (c) Synchronization decoupling.

Space decoupling is illustrated by subfigure (a), in which the event notification service decouples the publisher and the subscribers. The event message is transferred to the event service, and then it is transferred to the subscribers. Thus memory space is not shared by the entities. Subfigure (b) presents an example of time decoupling. The setting is the same as for the space decoupling case with the exception of message buffering at the service side. Time decoupling is achieved by storing the message in a message buffer at the event notification service for eventual delivery to subscribers. Synchronization decoupling is illustrated by subfigure (c), which emphasizes the temporal aspect. The publish and notify phases of event delivery are decoupled and they do not require synchronization. The message is first delivered to the event notification service and then to the subscriber.

Figure 1.4 summarizes the decoupling properties of well-known communication techniques. As observed before, the communication techniques are not orthogonal but rather they are combined in order to implement more sophisticated systems. Message passing, Remote Procedure Calls (RPC) and Remote Method Invocation (RMI), and asynchronous RPC/RMI do not offer decoupling in space and time. They can offer decoupling in synchronization. Tuple spaces offer decoupling in space and time through the shared space; however, the reader of the tuple space is blocked and thus tuple spaces do not offer decoupling in synchronization [5]. Message queuing, on the other hand, offers decoupling in all three properties and it is a building block for the more sophisticated pub/sub systems.

Figure 1.4 Summary of decoupling properties.

Pub/sub is based on message queuing and message-oriented middleware. Message queuing is a communication method that employs message passing between a sender and a receiver with the help of a sender-side message queue. A message being sent is first stored in the local message queue. After the delivery has been made, the message can be removed from the queue. If the message cannot be delivered or the message is incorrectly received, the message can be resent.

Queuing is a basic solution in achieving reliability in data communications. Queuing also supports disconnections during which the message cannot be sent. Message queuing is thus the basic ingredient for achieving decoupled communications.
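
The store-then-send behaviour described above can be sketched as follows. This is a minimal illustration rather than a real middleware API: the class name SenderQueue, the transport callable, and its boolean acknowledgement are assumptions made for the example.

```python
from collections import deque


class SenderQueue:
    """Sender-side message queue (hypothetical sketch)."""

    def __init__(self, transport):
        # transport(message) returns True if the receiver acknowledged it.
        self.transport = transport
        self.pending = deque()

    def send(self, message):
        # A message is always stored in the local queue first.
        self.pending.append(message)
        self.flush()

    def flush(self):
        # Deliver queued messages in order; stop at the first failure so
        # the message stays queued and can be resent later.
        while self.pending:
            if not self.transport(self.pending[0]):
                break
            self.pending.popleft()  # removed only after successful delivery


delivered = []
connected = False


def transport(message):
    # Simulated link: delivery fails while the sender is disconnected.
    if connected:
        delivered.append(message)
    return connected


queue = SenderQueue(transport)
queue.send("m1")   # disconnected: m1 stays in the queue
queue.send("m2")
connected = True
queue.flush()      # reconnected: queued messages are sent in order
```

Because a message is removed only after the transport reports success, a disconnection simply leaves messages in the queue until the next flush.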

One distinction between message queuing systems and pub/sub is that message queuing systems typically offer one-to-one communications and require that the receivers are explicitly defined. Pub/sub, on the other hand, supports one-to-many and many-to-many communications, and the subscribers can be defined implicitly by the event message being delivered and the a priori subscriptions that the subscribers have set.

The key properties of pub/sub systems are: decoupling in space, time and synchronization, many-to-many communications, and information filtering.

1.3 A Pub/Sub Service Model

Figure 1.5 illustrates a generic pub/sub service design. In the figure, the pub/sub service is a logically centralized service that provides the necessary functions and interfaces for supporting notification delivery from publishers to subscribers. The pub/sub service consists of the following key components:

A notification engine that builds and maintains an index structure of the subscriptions, and uses this index to forward notifications to subscribers. The engine offers the necessary interfaces for subscribers and publishers that allow them to subscribe, unsubscribe, and publish content.

A subscription manager that accepts subscriptions from the engine and maintains them. The two mandatory operations are inserting and removing a subscription.

A subscription storage that stores subscriptions and data related to the subscriptions.

An event storage that stores published events so that they can be retrieved later.

A notification consumer that is an intermediary component in the notification process. A consumer receives notifications from the engine and then forwards those to the proper subscriber. The consumer can buffer, compress, and process notifications before the final delivery.

Figure 1.5 Components of a pub/sub system.

A publisher observes a situation and when an event of interest is observed, a notification is created and sent to the notification engine using its publication interface. The notification is then matched with the subscription index maintained by the engine with the help of the subscription manager. The notification is given by the engine to the notification consumers of subscribers that have expressed interest in the notification. In other words, the notification matches with the subscribers' subscriptions. The notification is then prepared by each consumer for delivery to the associated subscriber.

This model of a pub/sub service separates the management of the subscriptions, the matching process with the notification engine, and the final delivery to the subscribers. This separation allows, for example, changing of the notification consumer without changing the engine.
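
The interplay of these components can be sketched in a few lines of Python. This is only an illustrative model under simplifying assumptions: the class and method names are invented for the example, filters are plain Python predicates, the "index" is a linear scan rather than a real index structure, and event storage is omitted.

```python
class Consumer:
    """Notification consumer: buffers notifications for one subscriber."""

    def __init__(self, subscriber):
        self.subscriber = subscriber
        self.buffer = []

    def deliver(self, notification):
        self.buffer.append(notification)


class SubscriptionManager:
    """Maintains subscriptions; insert and remove are the mandatory operations."""

    def __init__(self):
        self.storage = {}  # subscription storage: subscriber -> list of filters

    def insert(self, subscriber, flt):
        self.storage.setdefault(subscriber, []).append(flt)

    def remove(self, subscriber, flt):
        filters = self.storage.get(subscriber, [])
        if flt in filters:
            filters.remove(flt)

    def match(self, notification):
        # Linear scan standing in for a real subscription index.
        return [s for s, filters in self.storage.items()
                if any(f(notification) for f in filters)]


class NotificationEngine:
    """Offers subscribe, unsubscribe, and publish to clients."""

    def __init__(self):
        self.manager = SubscriptionManager()
        self.consumers = {}

    def subscribe(self, subscriber, flt):
        self.manager.insert(subscriber, flt)
        self.consumers.setdefault(subscriber, Consumer(subscriber))

    def unsubscribe(self, subscriber, flt):
        self.manager.remove(subscriber, flt)

    def publish(self, notification):
        # Hand the notification to the consumer of every matching subscriber.
        for subscriber in self.manager.match(notification):
            self.consumers[subscriber].deliver(notification)


def is_alarm(notification):
    return notification["type"] == "alarm"


engine = NotificationEngine()
engine.subscribe("alice", is_alarm)
engine.publish({"type": "alarm", "room": 12})
engine.publish({"type": "status", "room": 12})
```

Replacing Consumer with a class that compresses or batches notifications would not require any change to NotificationEngine, which is the separation the model aims for.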

The design of Figure 1.5 is logically centralized and it hides the distribution of the components. It is necessary to distribute and replicate the components in order to achieve scalability and reliability in a distributed environment.

1.4 Distributed Pub/Sub

As mentioned in this chapter, direct notification of subscribers by a publisher is not scalable. Therefore it is vital to develop techniques for distributing the notification process. To this end, a number of pub/sub network designs have been developed.

An event broker or router is a component of a pub/sub network that forwards notification messages across multiple hops. An example pub/sub network is presented in Figure 1.2 that shows the layered design. The pub/sub network offers the notification API to subscribers and publishers and utilizes the network API, typically the Sockets API, to disseminate the notification message and take it from the source router to the destination router and subnetwork. The network level routers are responsible for taking the message end-to-end across the Internet. Such overlay designs have favourable characteristics in terms of deployability and flexibility; however, the resulting high level routing may not be efficient in terms of the network level topology.

An event router typically has local clients and neighbouring routers. The algorithms and protocols for local clients and neighbouring routers are different. Both cases require a routing table for storing information about message destinations. A pub/sub routing table is an index structure that contains the active subscriptions and typically supports add, remove, and match operations.

The design and configuration of pub/sub networks has become an active area of research and development. We will focus on various strategies for implementing pub/sub networks.

The simplest form of notification in the distributed environment is called flooding. With flooding each pub/sub broker simply sends the message to all neighbours except the one that sent the message. Thus the message is introduced at every broker; however, the price of the technique is its inaccuracy. Ideally, we want to prevent the forwarding of a message to a broker that we know does not have subscribers for the message. Moreover, excess and uncontrolled messaging may lead to congestion that in turn may cause notification messages to be dropped.
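
Flooding can be sketched as follows. The broker names and the adjacency map are hypothetical, and the set of already visited brokers is an addition for the example (it keeps the loop finite if the topology contains cycles); the text itself only specifies forwarding to all neighbours except the sender.

```python
from collections import deque


def flood(neighbours, origin):
    """Return the (sender, receiver) transmissions caused by flooding."""
    transmissions = []
    seen = {origin}
    # Each entry: (broker, broker from which it received the message).
    queue = deque([(origin, None)])
    while queue:
        broker, received_from = queue.popleft()
        for neighbour in neighbours[broker]:
            if neighbour == received_from:
                continue  # never send the message back to its sender
            transmissions.append((broker, neighbour))
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, broker))
    return transmissions


# A small acyclic broker network: A - B, B - C, B - D.
neighbours = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
sent = flood(neighbours, "A")
```

Note that the number of transmissions does not depend on where the subscribers are: every link carries the message even if no broker behind it has a matching subscription.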

In order to avoid unnecessary message deliveries, we introduce the notion of filtering into the pub/sub network. Filtering involves an interest registration service that accepts filter information from the subscribers. The subscribers can thus specify in more detail what kind of data they desire. The pub/sub network then distributes this filtering information in such a way that minimizes the overhead in notification message delivery. The process of optimizing a pub/sub network is not simple, because the filtering information also introduces overhead into the network. For example, filtering information may need to be updated, and there is propagation delay in setting up and maintaining the routing tables of pub/sub brokers. Later in this book we will consider various techniques in optimizing these networks.

Accuracy is a key requirement for a pub/sub network. The accuracy of event delivery can be expressed with the number of false positives and false negatives.
A false positive is a message that is sent to a subscriber that does not match the subscriber's active interests. Similarly, a false negative is a message that was not sent to a subscriber, but should have been because it matches the subscriber's active interests.
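
For a single notification, the two error types can be computed from two sets of subscribers: those that received the message and those whose active interests it matches. The function and variable names below are assumptions for illustration.

```python
def delivery_errors(delivered_to, matching):
    """Return (false_positives, false_negatives) for one notification."""
    delivered_to, matching = set(delivered_to), set(matching)
    false_positives = delivered_to - matching   # received but not interested
    false_negatives = matching - delivered_to   # interested but not reached
    return false_positives, false_negatives


fp, fn = delivery_errors(delivered_to={"s1", "s2"}, matching={"s2", "s3"})
```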

Various filtering languages and filter matching algorithms have been developed. Filtering involves the specification of filters that are organized into a filtering data structure. A filter selects a subset of notifications based on the filtering language. Thus a filter is a constraint on the notification message and it can be applied in the context of the notification type, structure, header, and content.

Filtering allows the subscribers to specify their interest beforehand and thus reduce the number of uninteresting event messages that they will receive. A filter or a set of filters that describes the desired content is included with the subscription message that is used by brokers to configure routing tables. Many filtering languages have been developed, specified, and proposed. For example, the filtering language used by Java Message Service (JMS) is based on Structured Query Language (SQL) [6].
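
A filter viewed as a constraint on the notification can be sketched as a conjunction of attribute constraints. The representation below (a list of attribute, operator, value triples) is an assumption chosen for brevity; it is not the syntax of JMS selectors or of any other specific filtering language.

```python
import operator

# Supported comparison operators (an illustrative subset).
OPS = {"=": operator.eq, "<": operator.lt, ">": operator.gt}


def matches(filter_, notification):
    """True if the notification satisfies every constraint of the filter."""
    for attribute, op, value in filter_:
        if attribute not in notification:
            return False
        if not OPS[op](notification[attribute], value):
            return False
    return True


stock_filter = [("symbol", "=", "ACME"), ("price", ">", 100)]
hit = matches(stock_filter, {"symbol": "ACME", "price": 120})
miss = matches(stock_filter, {"symbol": "ACME", "price": 90})
```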

Filtering is a core functionality for realizing event-based systems and accurate content delivery. Filtering is performed before delivering a notification to a client or neighbouring router to ensure that the notification matches an active subscription from the client or neighbour. Filtering is therefore essential in maintaining accurate event notification delivery.
Filtering increases the efficiency of the pub/sub network by avoiding the forwarding of notifications to brokers that have no active subscriptions for them. Filters and their properties are useful for many different operations, such as matching, optimizing routing, load balancing, and access control. To give some examples, a firewall is a filtering router and an auditing gateway is a router that records traffic that matches a given set of filters.

1.5 Interfaces and Operations

Table 1.1 presents the pub/sub operations used by many event systems [7]. The operations are requested by a client of the system, denoted by X. There are many ways to define the interests of the subscriber. In our generic API, we denote the general interests by F. In expressive content-based routing, F is typically defined with a Boolean function that selects a subspace of the content space in which the notifications are defined. Notifications are points in this space. There are also less expressive semantics for subscribing to content, such as type-based subscriptions. We will return to these notions shortly.

Table 1.1 Infrastructure interface operations

Operation | Description | Semantics
--- | --- | ---
Sub(X,F) | X subscribes content defined by F | Sub/Adv
Pub(X,n) | X publishes notification n | Sub/Adv
Notify(X,n) | X is notified about notification n | Sub/Adv
Unsub(X,F) | X unsubscribes content defined by F | Sub/Adv
Adv(X,C) | X advertises content C | Adv
Unadv(X,C) | X unadvertises content C | Adv
Fetch(X,P) | X fetches messages that satisfy the given constraints P | Sub/Adv

As presented by the table, the key operations pertain to publishing, subscribing, unsubscribing, and fetching content. It should be noted that the subscribe and unsubscribe operations are idempotent, which means that executing the same operation repeatedly leaves the system in the same state as executing it once. The publish operation, however, is not idempotent: each repetition causes another publication to be delivered.
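
The difference can be seen in a toy broker in which subscriptions are kept in a set and publications are appended to a log. The class and attribute names are assumptions for illustration.

```python
class Broker:
    """Toy broker illustrating idempotent and non-idempotent operations."""

    def __init__(self):
        self.subscriptions = set()  # (client, filter name) pairs
        self.published = []         # every publication, in order

    def sub(self, client, filter_name):
        # Adding an element that is already in the set changes nothing.
        self.subscriptions.add((client, filter_name))

    def unsub(self, client, filter_name):
        # Removing a missing element is a no-op.
        self.subscriptions.discard((client, filter_name))

    def pub(self, client, notification):
        # Every call produces a new publication.
        self.published.append((client, notification))


broker = Broker()
broker.sub("X", "F")
broker.sub("X", "F")      # repeated subscribe: state unchanged
broker.pub("Y", "n")
broker.pub("Y", "n")      # repeated publish: two publications
```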

In a large-scale pub/sub system, the API typically supports leases that determine the validity time period for each subscription and advertisement. Leases are useful in removing obsolete state from the pub/sub network, and they are instrumental in ensuring the eventual stability of the network. The unsubscription and unadvertisement operations are not necessary if leases are supported by the API; however, they may still be useful in terminating a lease before it expires.
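
A lease table can be sketched as a map from subscriptions to expiry times. The names below are hypothetical, and the current time is passed in explicitly so that the behaviour is easy to follow; a real system would read a clock.

```python
class LeaseTable:
    """Subscriptions with leases (hypothetical sketch)."""

    def __init__(self):
        self.expiry = {}  # (client, filter name) -> absolute expiry time

    def subscribe(self, client, filter_name, now, duration):
        # Subscribing again simply renews the lease.
        self.expiry[(client, filter_name)] = now + duration

    def unsubscribe(self, client, filter_name):
        # Terminates the lease before it expires.
        self.expiry.pop((client, filter_name), None)

    def active(self, now):
        return {key for key, t in self.expiry.items() if t > now}

    def purge(self, now):
        # Remove obsolete state whose lease has run out.
        expired = [key for key, t in self.expiry.items() if t <= now]
        for key in expired:
            del self.expiry[key]
        return expired


table = LeaseTable()
table.subscribe("X", "F", now=0, duration=10)
table.subscribe("Y", "G", now=0, duration=10)
table.subscribe("Y", "G", now=8, duration=10)   # renewal: valid until 18
```

A client that disappears without unsubscribing leaves no permanent state behind: its entry is dropped by the next purge after the lease runs out.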

There are two different kinds of operational semantics for a pub/sub system:

Subscription-driven: Subscriptions are propagated by the pub/sub network and the routing tables are based on filters specified in the subscription messages.

Advertisement-driven: Publishers first advertise content with advertisement messages that are propagated by the pub/sub network. The pub/sub network then connects subscriptions with matching advertisements to activate content delivery across the network.

The table presents the API operations for these two filtering semantics. The advertisement semantics introduces the operations for advertising and unadvertising content. Moreover, the API operations are typically extended with security and quality-of-service properties as well as more expressive notification retrieval strategies. Key extensions pertaining to event retrieval, the fetch operation illustrated by Figure 1.6, include:

Fetch operation for retrieving a specific number of messages.

Nondestructive fetch, which leaves the retrieved messages at the server's queue. This is useful when multiple instances of the same software are retrieving messages.

Fetch operation with query operation that allows specific event messages to be fetched from the queue. This operation is frequently supported by pub/sub standards, for example JMS.

Fetch the latest event message in the message queue. This is useful when starting an application or recovering from application failure.

Figure 1.6 Example of the expressive fetch operation.
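
The fetch variants listed above can be combined into one operation with optional parameters. The following sketch uses invented names and treats the query as a Python predicate; it does not reproduce the fetch API of JMS or of any other standard.

```python
class EventQueue:
    """Server-side event queue (hypothetical sketch of the fetch variants)."""

    def __init__(self):
        self.events = []

    def put(self, event):
        self.events.append(event)

    def fetch(self, count=None, query=None, destructive=True):
        # Indices of events that satisfy the query, oldest first.
        indices = [i for i, e in enumerate(self.events)
                   if query is None or query(e)]
        if count is not None:
            indices = indices[:count]
        selected = [self.events[i] for i in indices]
        if destructive:
            chosen = set(indices)
            self.events = [e for i, e in enumerate(self.events)
                           if i not in chosen]
        return selected

    def fetch_latest(self):
        # Assumed to be nondestructive in this sketch.
        return self.events[-1] if self.events else None


queue = EventQueue()
for price in [10, 25, 7, 40]:
    queue.put({"price": price})

first_two = queue.fetch(count=2)                       # removes two events
expensive = queue.fetch(query=lambda e: e["price"] > 20,
                        destructive=False)             # leaves the queue intact
latest = queue.fetch_latest()
```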

In the following section we will investigate the different filtering semantics for targeted information delivery.

1.6 Pub/Sub Semantics for Targeted Delivery

As mentioned above, there needs to be agreement on how notification messages are delivered from publishers to subscribers. There are many possible semantics for selecting notifications that need to be delivered for a given set of subscribers. In this section, we will briefly examine key semantics for targeted notification delivery.

Depending on the expressiveness of the filtering language, a field, header, or the whole content of the notification message may be filterable. In content-based routing the whole content of the event message is filterable.

Figure 1.7 illustrates the four key types of message routing semantics: content-based, header-based, topic-based, and type-based. As mentioned above, content-based routing allows the evaluation of filters on the whole event message. Header-based routing is more limited and only allows the evaluation of elements included in the header of the message. Topic-based routing only allows the evaluation of a specific topic field in the message. Topic-based systems are similar to channel-based systems, and the topic name can be seen to be the same as the channel name. Typically, topic-based systems require that the topic of an event message exactly matches the requested topic name, and thus they are not very expressive. Finally, type-based systems allow the selection of event messages based on their designated type in a type hierarchy. We can take a type hierarchy pertaining to buildings as an example: the root of the hierarchy is the building name, the second level consists of floors, and the third level of the offices. By subscribing to a floor, the subscriber receives all events related to that specific floor in the named building.

Figure 1.7 Examples of message targeting systems.
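
The building example can be made concrete by representing a type as a path in the hierarchy and treating a subscription as a prefix of that path, in contrast to the exact match used by topic-based systems. The building, floor, and office names are hypothetical, and prefix matching is only one simple way to model a type hierarchy.

```python
def topic_matches(subscribed_topic, event_topic):
    # Topic-based: the topic name must match exactly.
    return subscribed_topic == event_topic


def type_matches(subscribed_type, event_type):
    # Type-based: the event type must lie at or below the subscribed
    # type in the hierarchy (building, floor, office).
    return event_type[:len(subscribed_type)] == subscribed_type


floor_subscription = ("Exactum", "floor2")
event_same_floor = ("Exactum", "floor2", "D240")
event_other_floor = ("Exactum", "floor3", "C310")
```

With these definitions, the floor subscription selects events from every office on that floor, whereas an exact topic match would require one subscription per office.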

The different routing semantics are characterized by their selectivity. Type-based systems make the forwarding decision based on a predefined set of message types. In topic-based and channel-based pub/sub, the subscribers are defined by a queue name or a channel name. The notifications are sent to a named queue or channel, from which the subscriber then extracts the messages. An important limitation is that the queue or channel name has to be agreed beforehand. Subject-based systems make the routing decision based on a single header field in the notification. Header-based systems use a special header part of the notification in order to forward the message. Content-based systems are the most expressive and use the whole content of the message in making the forwarding decision. Content-based pub/sub is flexible because it does not require that topic or channel names are assigned beforehand.

Various pub/sub delivery semantics can be implemented with a content-based communication scheme, making it very expressive. Header-based routing is more limited, but it has a performance advantage over content-based routing, because only the header of a message is evaluated when making a forwarding decision.

Expressiveness and scalability are important characteristics of an event system [8]. Expressiveness pertains to how well the interests of the subscribers are captured by the pub/sub service. Scalability involves federation, state, and the number of subscribers, publishers, and brokers that can be supported, as well as how much notification traffic the system can handle.

Other requirements for a pub/sub network include simplicity, manageability, implementability, and support for rapid deployment. Moreover, the system needs to be extensible and interoperable. Other nonfunctional requirements include: timely delivery of notifications (bounded delivery time), support for Quality of Service (QoS), high availability and fault-tolerance.

Event order is an important nonfunctional requirement and many applications require support for either causal order or total order. Causality determines the relationship of two events A and B. In order to be able to determine causality in the distributed system a logical clock mechanism is needed. The two well-known solutions are the Lamport clocks and vector clocks. We will examine these clocks in more detail in Chapter 2.
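
As a brief preview, a Lamport clock can be sketched in a few lines. The class below follows the standard rules (increment on a local event or send, take the maximum plus one on receive); the names are chosen for the example.

```python
class LamportClock:
    """Logical clock that orders causally related events."""

    def __init__(self):
        self.time = 0

    def tick(self):
        # Local event: advance the clock.
        self.time += 1
        return self.time

    def send(self):
        # Sending is an event; the timestamp travels with the message.
        return self.tick()

    def receive(self, timestamp):
        # Move past both the local time and the sender's timestamp.
        self.time = max(self.time, timestamp) + 1
        return self.time


a, b = LamportClock(), LamportClock()
t1 = a.send()          # A publishes
t2 = b.receive(t1)     # B is notified
t3 = b.send()          # B reacts with a new event
t4 = a.receive(t3)     # A is notified
```

If event A causally precedes event B, the timestamp of A is smaller than that of B; the converse does not hold, which is the limitation that vector clocks address.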

1.7 Communication Techniques

Event systems are widely used, because asynchronous messaging provides a flexible alternative to RPC [4, 9]. RPC is typically synchronous and one-to-one, whereas pub/sub is asynchronous and many-to-many. Limitations of synchronous RPC calls include:

Tight coupling of client and server lifetimes. The server must be available to process a request. If a request fails the client receives an exception.

Synchronous communication. A client must wait until the server finishes processing and returns the results. The client must be connected for the duration of the invocation.

Point-to-point communication. Invocation is typically targeted at a single object on a particular server.

On the other hand, RPC is a building block for distributed pub/sub systems. Many pub/sub implementations use RPC operations to implement the API operations presented in Table 1.1.

Event delivery between two processes can be realized in many ways depending on the requirements and the operating environment. The two key differing environments are the local and remote communication context. In local event delivery, techniques such as shared resources and local procedure calls or message passing can be used. Remote event delivery is typically implemented with message queuing or RPC.

RPC offers reliability and at-most-once semantics whereas message queuing systems have differing message delivery options. The key reliability semantics are:

Exactly-once: The highest reliability guarantee in which the message is sent to the remote node exactly once. The message is delivered even if the server crashes or the network fails. This reliability level is not possible to achieve in the typical distributed environment.

At-least-once: This reliability level guarantees that a message is sent to the remote node, but duplicates are allowed. Duplicates may happen due to network failures and server crashes. The semantics are appropriate for idempotent operations.

At-most-once: This reliability level guarantees that the message is sent to the node once, if at all. It does not guarantee that the message is delivered. A message can disappear due to network problems or server crashes.

Typically commercially used message queue systems support either at-least-once or at-most-once. The semantics are implemented with sender and receiver side message buffering, sequence numbers, and timers for detecting lost messages and other problems.
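
The role of sequence numbers can be illustrated with an at-least-once sender whose acknowledgements may be lost. Everything in the sketch is an assumption for the example: the names, the in-process "network", and the lost_acks parameter that simulates how many acknowledgements disappear before one gets through.

```python
class Receiver:
    """Receiver that uses sequence numbers to suppress duplicates."""

    def __init__(self):
        self.seen = set()
        self.processed = []

    def on_message(self, seq, payload):
        if seq not in self.seen:
            self.seen.add(seq)
            self.processed.append(payload)
        return seq  # acknowledgement carries the sequence number


def send_at_least_once(receiver, seq, payload, lost_acks):
    """Resend until an acknowledgement arrives; return the attempt count."""
    attempts = 0
    while True:
        attempts += 1
        ack = receiver.on_message(seq, payload)
        if ack == seq and attempts > lost_acks:
            return attempts
        # Otherwise the acknowledgement was lost: the sender's timer
        # expires and the message is retransmitted.


receiver = Receiver()
attempts = send_at_least_once(receiver, seq=1, payload="order-42", lost_acks=2)
```

The message crosses the network three times, but the receiver processes it once because the duplicates carry a sequence number it has already seen.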

Figure 1.8 illustrates the options when implementing pub/sub systems. The key difference between message queuing and RPC is that messaging is asynchronous whereas traditional RPC is synchronous although there are also asynchronous RPC features. Alternative techniques are tuple spaces and distributed shared memory. We will later in Chapter 2 consider Java RMI as one example of an RPC system.

Figure 1.8 Communication techniques for event delivery.

Distributed shared memory can be realized in many ways based on the memory abstraction. A page based abstraction organizes the shared memory into pages of fixed size. The object based abstraction organizes the shared memory as an abstract space for storing shareable objects. A tuple space, on the other hand, is based on the tuple abstraction. A coherence protocol is needed to maintain memory coherence in a distributed shared memory implementation. Memory update and invalidation techniques include update-on-write, update-on-read, invalidate-on-write, invalidate-on-read. Typically these systems follow the weak consistency model, in which synchronizations must be atomic and have a consistent order.

Figure 1.9 illustrates the layered nature of the communications environment. Each layer provides functions for the higher layers and abstracts details of the underlying layers. The organization of protocols into a stack structure offers separation of concerns; however, it makes it difficult to optimize system behaviour across layers. As shown by the figure, each layer adds its own header and details to the packets and messages being processed. In a similar fashion, when receiving a packet, each layer processes its own information and gives the data to a higher layer. Pub/sub systems can be implemented on multiple levels of the stack, starting from the link layer towards the application layer. Most pub/sub systems are implemented on top of TCP/IP and they are offered as middleware services or libraries. A pub/sub system can itself be viewed as a layered system, in which the higher level functions of distributed routing and forwarding are based on lower layer message queuing primitives.

Figure 1.9 Protocol stack with middleware.

1.8 Environments

The pub/sub paradigm can be applied in many different contexts and environments. Early examples include GUIs, control plane signalling in industrial systems, and topic-based document dissemination. The paradigm is fundamental to today's graphical and network applications. Most programmers apply the paradigm in the context of a single server or device; however, distributed pub/sub is vital for many applications that require the timely and efficient dissemination of data from one or more sources to many subscribers.

The operating environments for pub/sub can be examined from differing viewpoints, for example based on the underlying communications environment and the application type. In the following we summarize key environments for pub/sub technology:

Local: Event loop, GUI, in-device information delivery.

Wireless and ad hoc: Event delivery in wireless networks in which nodes can move. Publishers and subscribers typically run on resource-constrained devices, for example mobile phones.

Sensor: Event delivery in sensor networks from a number of source sensors to sinks that then deliver the events for further processing.

Embedded and industrial: Event delivery in an embedded or industrial setting, for example within a car or an airplane or a factory.

Regional: Event delivery within an organization or a region.

Internet-wide: Event delivery in the wide-area network across organizational boundaries.

In this book we will focus especially on distributed pub/sub systems for the last three categories; however, we also consider the mobile and wireless domain.

Small and wireless devices have limited capabilities compared to desktop systems: their memory, performance, battery life, and connectivity are limited and constrained. The requirements of mobile computing need to be taken into account when designing an event framework that integrates with mobile devices.

From the small device point of view, message queuing is a frequently used communication method because it supports disconnected operation. When a client is disconnected, messages are inserted into a queue, and when a client reconnects the messages are sent. The distinction between popular message-queue-based middleware and notification systems is that message-queue-based approaches are a form of directed communication, where the producers explicitly define the recipients. The recipients may be defined by the queue name or a channel name, and the messages are inserted into a named queue, from which the recipient extracts messages. Notification-based systems extend this model by adding an entity, the event service or event dispatcher, that brokers notifications between producers of information and subscribers of information. This undirected communication supported by the notification model is based on message passing and retains the benefits of message queuing. In undirected communication the publisher does not necessarily know which parties receive the notification.

The pub/sub paradigm and technology can be seen to be a unifying technology that combines the different environments and domains through event delivery. Indeed, pub/sub has been proposed as the new basis for an internetworking architecture; however, there are still many unsolved challenges in applying the paradigm on the global Internet scale. We consider these solutions in Chapter 13.

1.9 History

As mentioned above, pub/sub can be applied in many different environments for solving the information dissemination problem. The early applications include the filtering and delivery of Usenet postings as well as being the glue of many GUIs. More recent applications include Internet technologies such as RSS and XMPP as well as standards such as JMS, CORBA Notification Service [10], and OMG DDS.

In this section, we will examine the history of pub/sub systems from three viewpoints, namely the research highlights, standardization, and Internet technology. The last category illustrates the importance and applicability of pub/sub for Internet-based applications. Figure 1.10 gives a timeline of the evolution of pub/sub technology for the three categories. In the following, we briefly examine the key developments. We will return to many of these later in the book.

Figure 1.10 Timeline of pub/sub solutions.

1.9.1 Research Systems

The history of pub/sub has its roots in the requirement to process asynchronous events. The notion of the event loop is very old; however, the patterns employed today to realize distributed pub/sub are considerably newer. The first systems were based on the shared memory abstraction. The memory represented rendezvous space for senders and receivers. Processors communicated by posting messages to the shared memory.

The shared memory is very similar to the blackboard pattern frequently used in creating artificial intelligence systems. The blackboard pattern was proposed in 1985 for solving complex problems with the help of a shared memory abstraction [11, 12]. The LINDA tuple space model is also from this period [5]. LINDA is a coordination and synchronization technique based on the tuple abstraction. LINDA supports communication through a shared memory region called the tuple space. Processes generate tuples and store them in the shared space. Other processes can then monitor the tuple space and read tuples.

Another early example of interprocess communication is the UNIX signal notification system that was implemented in 1986. UNIX processes use signals to notify each other. A process has a unique numeric process identifier and a group of processes have a numeric group identifier. A signal can be directed to a specific process or a group of processes.

The frequently used Model-View-Controller (MVC) design pattern was developed in the Smalltalk community in 1988. MVC facilitates the communication between the model, view, and control components [13]. The MVC pattern separates concerns over application state (model), user interface (view), and the control aspect. MVC requires that a component is able to subscribe to the state of another component. This subpattern of MVC developed into the widely used observer pattern. We examine these and other patterns in more detail in Chapter 4.

An early pub/sub service was proposed in the seminal ISIS system in 1987 [14]. This ISIS subsystem was responsible for disseminating news items from publishers to subscribers. The ISIS news service allowed processes to subscribe to system-wide news announcements. The subscriber specified a subject and then received postings under that subject. The ISIS architecture also featured filters that were used on client systems to process incoming messages. The subscribe operation of the news subsystem was implemented with one RPC per posting and the actual posting delivery with one asynchronous multicast operation (with causal or total ordering).

The key contributions of the ISIS system for pub/sub were:

Reliable atomic multicast communications primitives.

Causal and total ordering of multicast messages.

Developing the pub/sub system based on the RPC and multicast primitives.

Another early example of a pub/sub system is the Information Bus proposed in 1993 [15]. This model consists of service objects and data objects. Service objects can have local data objects, and they send and receive them through the data-centric information bus. Each data object is labelled with a subject string. Subjects are hierarchically structured. The Information Bus supports both pub/sub and request/reply APIs. With the pub/sub model, the system decouples the components, and subscribers do not need to know the identities of the publishers. The Information Bus model calls this kind of communication subject-based addressing. The system has built-in support for dynamic discovery of participants. This is implemented with two publications: first a query to prospective participants listening to a specific subject, and then a response publication indicating presence. The system can be extended with adapters that convert information from the data objects to application-specific formats.

The ISIS system and Information Bus did not take the content of the messages into account. In the news applications implemented with these systems, the news items were delivered based on the subject that was a configuration parameter. The SIFT information dissemination system is an early example of an alert service [16] that takes the content of the disseminated documents into account. The system proposed the inverted index for matching documents to subscriptions. Figure 1.11 illustrates this system and its key components.

Figure 1.11 Overview of the SIFT system.

The system accepts document queries and stores them in a subscription database. Similarly, documents are parsed and an inverted index is stored in the document index. A filtering engine is then responsible for matching documents to the queries with the help of the document index.
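
The idea of matching through an inverted index can be sketched as follows. This is a strong simplification and not SIFT's actual algorithm: queries are assumed to be lists of keywords that must all occur in a document, and documents are split on whitespace only.

```python
from collections import defaultdict


def build_index(documents):
    """Inverted index: word -> set of ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index[word].add(doc_id)
    return index


def match(index, subscriptions):
    """Map each subscriber to the documents containing all its keywords."""
    result = {}
    for subscriber, keywords in subscriptions.items():
        postings = [index.get(k.lower(), set()) for k in keywords]
        result[subscriber] = set.intersection(*postings) if postings else set()
    return result


documents = {
    1: "publish subscribe systems",
    2: "distributed hash tables",
    3: "distributed publish subscribe",
}
subscriptions = {"alice": ["distributed", "publish"], "bob": ["hash"]}
matches = match(build_index(documents), subscriptions)
```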

The SIFT system did not consider the distributed environment in more detail. IBM's Gryphon project developed a distributed pub/sub system consisting of a network of brokers [17, 18]. The Gryphon system was developed by the Distributed Messaging Systems group at the IBM T.J. Watson Research Center. Gryphon is a Java-based pub/sub message broker intended to distribute data in real time over a large public network. Gryphon uses content-based routing algorithms developed at the research center. The clients of Gryphon use an implementation of the JMS API to send and receive messages. The Gryphon project was started in 1997 to develop the next generation of web applications, and the first deployments were made in 1999. Gryphon is designed to be scalable, and it was used to deliver information about the Tennis Australian Open to 50,000 concurrently connected clients. Gryphon has also been deployed over the Internet for other real-time sports score distribution, for example the Tennis US Open and the Ryder Cup, and for monitoring and statistics reporting at the Sydney Olympics.

Gryphon supports both topic-based and content-based publish-subscribe, relies on adopted standards such as TCP/IP and HTTP, and supports recovery from server failures and security. In Gryphon, the flow of streams of events is described using an information flow graph (IFG), which specifies the selective delivery of events, the transformation of events, and the creation of derived events as a function of states computed from event histories.

Elvin is another early example of a content-based routing system with expressive semantics [19]. Elvin uses a client-server architecture in notification delivery. Clients establish sessions with Elvin servers and subscribe and publish notifications.

Scalable Internet Event Notification Service (SIENA) is an Internet-scale event notification service developed at the University of Colorado. SIENA balances expressiveness with scalability and explores content-based routing in a wide-area network. The basic pub/sub mechanism is extended with advertisements that are used to optimize the routing of subscriptions [8]. Several network topologies are supported in the architecture, including hierarchical, acyclic peer-to-peer, and general peer-to-peer topologies. Servers only know about their neighbours, which helps minimize routing table management overhead. Servers employ a server-server protocol to communicate with their peers and a client-server protocol to communicate with the clients that subscribe to notifications. It is also possible to create hybrid network topologies.

SIENA introduced covering relations between filters to prevent unnecessary signalling. The SIENA system used the notion of covering for three different comparisons:

matching a notification against a filter;

covering relation between two subscription filters;

and overlapping between an advertisement filter and a subscription filter.
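The three comparisons can be illustrated with a minimal Python sketch. It assumes a deliberately simplified filter model that is not SIENA's actual filter language: a filter is a conjunction of closed numeric ranges, one per attribute, and advertisements are treated the same way as subscriptions. The names `Range`, `matches`, `covers`, and `overlaps` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """Closed, non-empty numeric interval [low, high] on one attribute."""
    low: float
    high: float


# Assumption: a filter is a conjunction, attribute name -> allowed Range.
Filter = dict


def matches(notification: dict, f: Filter) -> bool:
    """A notification matches if it satisfies every constraint in the filter."""
    return all(
        name in notification and r.low <= notification[name] <= r.high
        for name, r in f.items()
    )


def covers(f1: Filter, f2: Filter) -> bool:
    """f1 covers f2 if every notification that matches f2 also matches f1."""
    return all(
        name in f2 and r.low <= f2[name].low and f2[name].high <= r.high
        for name, r in f1.items()
    )


def overlaps(advertisement: Filter, subscription: Filter) -> bool:
    """Two filters overlap if some notification could match both of them."""
    shared = advertisement.keys() & subscription.keys()
    return all(
        advertisement[name].low <= subscription[name].high
        and subscription[name].low <= advertisement[name].high
        for name in shared
    )


broad = {"price": Range(0, 100)}
narrow = {"price": Range(10, 20), "volume": Range(1, 5)}
print(covers(broad, narrow), covers(narrow, broad))
```

Under these assumptions, a broker that has already forwarded `broad` to a neighbour does not need to forward `narrow` as well, because every notification of interest to `narrow` is already requested by `broad`. This is the kind of unnecessary signalling that covering relations are meant to avoid.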

Covering relations have been used in many later event systems, such as Scribe [20], Rebeca [21], and Hermes [22, 23]. Scribe and Hermes are examples of Distributed Hash Table (DHT)-based pub/sub systems. Scribe is a topic-based system and Hermes supports both topic-based and content-based communication. Scribe and Hermes choose a rendezvous point for each topic or event type in the overlay network topology, and then build and maintain multicast trees rooted at this rendezvous point. We will later return to DHT structures and DHT-based pub/sub systems.

DHTs are a class of decentralized distributed algorithms. They provide a hashtable API and implement the hashtable functionality in a wide-area environment in which nodes can join and leave the network. A DHT maintains (key, value) pairs and allows a client to retrieve a value corresponding to the given key.
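As a rough illustration of the hashtable API, the following Python sketch places node names on a hash ring and assigns each key to the first node at or after the key's hash. It is a single-process toy under stated assumptions, not the algorithm of any particular DHT: the class name `ToyDHT`, the use of SHA-1, and the successor rule are all choices made for this example, and there is no networking, replication, or handling of node departure.

```python
import bisect
import hashlib


def _hash(text: str) -> int:
    """Map a string to a position on the ring (SHA-1 is an arbitrary choice)."""
    return int.from_bytes(hashlib.sha1(text.encode("utf-8")).digest(), "big")


class ToyDHT:
    """In-memory stand-in for a DHT: each key is owned by exactly one node."""

    def __init__(self, node_names):
        self._ring = sorted((_hash(name), name) for name in node_names)
        self._store = {name: {} for name in node_names}

    def responsible_node(self, key: str) -> str:
        """Return the first node at or after the key's hash, wrapping around."""
        ids = [node_id for node_id, _ in self._ring]
        index = bisect.bisect_left(ids, _hash(key)) % len(self._ring)
        return self._ring[index][1]

    def put(self, key: str, value) -> None:
        self._store[self.responsible_node(key)][key] = value

    def get(self, key: str):
        return self._store[self.responsible_node(key)].get(key)

    def join(self, name: str) -> None:
        """Add a node and move over the keys it is now responsible for."""
        bisect.insort(self._ring, (_hash(name), name))
        self._store[name] = {}
        for other, items in self._store.items():
            if other == name:
                continue
            for key in [k for k in items if self.responsible_node(k) == name]:
                self._store[name][key] = items.pop(key)


dht = ToyDHT(["node-a", "node-b", "node-c"])
dht.put("topic/sports", "subscriber list for sports")
rendezvous = dht.responsible_node("topic/sports")
dht.join("node-d")
print(dht.get("topic/sports"))
```

The same lookup gives a natural way to pick a rendezvous point: hashing a topic name or event type yields a single responsible node on which publishers and subscribers can agree without further coordination.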

The combined broadcast and content-based (CBCB) routing scheme extends the SIENA routing protocols by combining higher-level routing using covering relations and lower-level broadcast delivery [24]. The protocol prunes the broadcast distribution paths using higher-level information exchanged by routers.

Java Event-based Distributed Infrastructure (JEDI) [25] is a distributed event system developed at Politecnico di Milano. In JEDI the distributed architecture consists of a set of dispatching servers (DS) that are connected in a tree structure. Each DS is located on a node of the tree and all nodes except the root node are connected to one parent DS. Each node has zero or more descendants.

Gryphon, Elvin, SIENA, and JEDI paved the way for the next generation of content-based pub/sub systems developed as overlay networks over the Internet. More recent developments have also considered the introduction of pub/sub primitives into the protocol stack design.

SIENA pioneered the notion of content-based networking, in which content demand defines subnetworks and where information is sent. The notion of datacentric networking is similar and has been pioneered by projects such as TRIAD [26] and DONA [27]. These new forms of networking are motivated by the observation that the current Internet architecture has been designed around a host-based model that dates from the 1970s. The aim is to allow the network to adapt to the network usage patterns and improve performance with targeted information delivery and caching.

For example, the Publish-Subscribe Internet Routing Paradigm (PSIRP) system [28] and the Content Centric Networking (CCN) architecture [29] are based on receiver-driven designs. The motivation is that Internet hosts are interested in receiving the proper content rather than in who is supplying the content.

1.9.2 Standards

The standards timeline includes systems such as CORBA Event Service, Microsoft's DCOM, CORBA Notification Service, JMS, SIP, and DDS. In this section we briefly examine these developments.

The CORBA Event Service specification defined a communication model that allowed an object to accept registrations and send events to a number of receiver objects. The Event Service supplements the standard CORBA client-server communication model and is part of the CORBAServices that provide system level services for object-based systems. The CORBA Notification Service [10] extends the functionality and interfaces of the older Event Service [30] specification. The Event Service specification defines the event channel object that provides interfaces for interest registration and event notification. One of the most significant additions to the Notification Service is event filtering.

The Distributed Component Object Model (DCOM) was the Microsoft alternative to CORBA technology. DCOM facilitates communication between distributed software components. DCOM extends the COM model and provides the communication layer for the COM+ application infrastructure. Today DCOM has been replaced by the Microsoft .NET Framework.

Standard COM and OLE supported asynchronous communication and the passing of events using callbacks; however, these approaches had their problems. Standard COM publishers and subscribers were tightly coupled: the subscriber had to know the mechanism for connecting to the publisher (the interfaces exposed by the container). This approach does not work very well beyond a single desktop. The change introduced by the COM+ Event Service was the addition of an event service in the middle of the communication. The event service keeps track of which subscribers want to receive the calls and mediates the calls.
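The decoupling achieved by placing an event service in the middle can be sketched in a few lines of Python. This is a generic illustration of the mediation idea only; it does not reproduce the COM+ Event Service interfaces, and the names `EventService`, `subscribe`, `unsubscribe`, and `publish` are hypothetical.

```python
from collections import defaultdict


class EventService:
    """Mediator: publishers and subscribers only know the service, not each other."""

    def __init__(self):
        # event name -> list of handler callables registered by subscribers
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._subscribers[event_name].append(handler)

    def unsubscribe(self, event_name, handler):
        self._subscribers[event_name].remove(handler)

    def publish(self, event_name, payload):
        """Deliver the payload to every current subscriber; return the count."""
        handlers = list(self._subscribers.get(event_name, []))
        for handler in handlers:
            handler(payload)
        return len(handlers)


service = EventService()
received = []
service.subscribe("stock.update", received.append)

# The publisher calls the service and never references a subscriber directly.
delivered = service.publish("stock.update", {"symbol": "IBM", "price": 120.5})
ignored = service.publish("weather.update", {"city": "Helsinki"})
print(delivered, ignored, received)
```

In this arrangement the publisher needs no knowledge of how many subscribers exist or how to connect to them, which is the coupling problem the paragraph above attributes to standard COM.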

JMS defines a generic and standard API for the implementation of message-oriented middleware. The JMS API is an integral part of the Java Enterprise Edition (Java EE). JMS is an interface and the specification does not provide any concrete implementation of a messaging engine. The fact that JMS does not define the messaging engine or the message transport gives rise to many possible implementations and ways to configure JMS.

The Session Initiation Protocol (SIP) [31] is a text-based, application-layer protocol that can be used to set up, maintain, and terminate calls between two or more end points. SIP is designed to be independent of the underlying transport layer. SIP has been designed for call control tasks and thus the driving application has been telephony and multiparty communications. SIP has been standardized by the IETF and adopted widely in the telecommunications industry. SIP was accepted as a 3GPP signalling protocol in November 2000.

The Data Distribution Service for Real-Time Systems (DDS) OMG specification defines an API for datacentric pub/sub communication for distributed real-time systems [32]. DDS is a middleware service that provides a global data space that is accessible to all interested applications.

1.9.3 Internet Technology

We briefly consider developments on the Internet technology timeline, focusing on Web technologies for building pub/sub systems. One of the earliest examples of a loosely coupled message dissemination system is Usenet, which was created in 1980. Usenet thus precedes many other message dissemination systems. Later developments include W3C's SOAP protocol, the Really Simple Syndication (RSS) specifications, the Extensible Messaging and Presence Protocol (XMPP), the Representational State Transfer (REST) model, HTML5 from W3C, and the PubSubHubbub protocol. There are also many other systems that have been proposed and deployed.