BIRMINGHAM - MUMBAI
Copyright © 2017 Packt Publishing
All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.
Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.
Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.
First published: August 2017
Production reference: 1170817
ISBN 978-1-78728-398-5
www.packtpub.com
Authors
Manish Kumar
Chanchal Singh
Copy Editor
Manisha Sinha
Reviewer
Anshul Joshi
Project Coordinator
Manthan Patel
Commissioning Editor
Amey Varangaonkar
Proofreader
Safis Editing
Acquisition Editor
Tushar Gupta
Indexer
Tejal Daruwale Soni
Content Development Editor
Tejas Limkar
Graphics
Tania Dutta
Technical Editor
Dinesh Chaudhary
Production Coordinator
Deepika Naik
Manish Kumar is a Technical Architect at DataMetica Solution Pvt. Ltd. He has approximately 11 years' experience in data management, working as a Data Architect and Product Architect. He has extensive experience in building effective ETL pipelines, implementing security over Hadoop, and providing the best possible solutions to data science problems. Before joining the world of big data, he worked as a Tech Lead for Sears Holding, India. He is a regular speaker on big data concepts such as Hadoop and Hadoop security at various events. Manish has a Bachelor's degree in Information Technology.
Chanchal Singh is a Software Engineer at DataMetica Solution Pvt. Ltd. He has over three years' experience in product development and architecture design, working as a Product Developer, Data Engineer, and Team Lead. He has a lot of experience with different technologies such as Hadoop, Spark, Storm, Kafka, Hive, Pig, Flume, Java, Spring, and many more. He believes in sharing knowledge and motivating others for innovation. He is the co-organizer of the Big Data Meetup - Pune Chapter.
He has been recognized for putting innovative ideas into organizations. He has a Bachelor's degree in Information Technology from the University of Mumbai and a Master's degree in Computer Application from Amity University. He was also part of the Entrepreneur Cell in IIT Mumbai.
Anshul Joshi is a Data Scientist with experience in recommendation systems, predictive modeling, neural networks, and high performance computing. His research interests are deep learning, artificial intelligence, computational physics, and biology.
Most of the time, he can be caught exploring GitHub or trying anything new that he can get his hands on. He blogs on https://anshuljoshi.com/.
For support files and downloads related to your book, please visit www.PacktPub.com.
Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and, as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.
At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.
https://www.packtpub.com/mapt
Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.
Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser
Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787283984.
If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!
Preface
What this book covers
What you need for this book
Who this book is for
Conventions
Reader feedback
Customer support
Downloading the example code
Downloading the color images of this book
Errata
Piracy
Questions
Introduction to Messaging Systems
Understanding the principles of messaging systems
Understanding messaging systems
Peeking into a point-to-point messaging system
Publish-subscribe messaging system
Advanced Message Queuing Protocol
Using messaging systems in big data streaming applications
Summary
Introducing Kafka the Distributed Messaging Platform
Kafka origins
Kafka's architecture
Message topics
Message partitions
Replication and replicated logs
Message producers
Message consumers
Role of Zookeeper
Summary
Deep Dive into Kafka Producers
Kafka producer internals
Kafka Producer APIs
Producer object and ProducerRecord object
Custom partition
Additional producer configuration
Java Kafka producer example
Common messaging publishing patterns
Best practices
Summary
Deep Dive into Kafka Consumers
Kafka consumer internals
Understanding the responsibilities of Kafka consumers
Kafka consumer APIs
Consumer configuration
Subscription and polling
Committing and polling
Additional configuration
Java Kafka consumer
Scala Kafka consumer
Rebalance listeners
Common message consuming patterns
Best practices
Summary
Building Spark Streaming Applications with Kafka
Introduction to Spark
Spark architecture
Pillars of Spark
The Spark ecosystem
Spark Streaming
Receiver-based integration
Disadvantages of receiver-based approach
Java example for receiver-based integration
Scala example for receiver-based integration
Direct approach
Java example for direct approach
Scala example for direct approach
Use case log processing - fraud IP detection
Maven
Producer
Property reader
Producer code
Fraud IP lookup
Expose hive table
Streaming code
Summary
Building Storm Applications with Kafka
Introduction to Apache Storm
Storm cluster architecture
The concept of a Storm application
Introduction to Apache Heron
Heron architecture
Heron topology architecture
Integrating Apache Kafka with Apache Storm - Java
Example
Integrating Apache Kafka with Apache Storm - Scala
Use case – log processing in Storm, Kafka, Hive
Producer
Producer code
Fraud IP lookup
Storm application
Running the project
Summary
Using Kafka with Confluent Platform
Introduction to Confluent Platform
Deep dive into Confluent architecture
Understanding Kafka Connect and Kafka Stream
Kafka Streams
Playing with Avro using Schema Registry
Moving Kafka data to HDFS
Camus
Running Camus
Gobblin
Gobblin architecture
Kafka Connect
Flume
Summary
Building ETL Pipelines Using Kafka
Considerations for using Kafka in ETL pipelines
Introducing Kafka Connect
Deep dive into Kafka Connect
Introductory examples of using Kafka Connect
Kafka Connect common use cases
Summary
Building Streaming Applications Using Kafka Streams
Introduction to Kafka Streams
Using Kafka in Stream processing
Kafka Stream - lightweight Stream processing library
Kafka Stream architecture
Integrated framework advantages
Understanding tables and Streams together
Maven dependency
Kafka Stream word count
KTable
Use case example of Kafka Streams
Maven dependency of Kafka Streams
Property reader
IP record producer
IP lookup service
Fraud detection application
Summary
Kafka Cluster Deployment
Kafka cluster internals
Role of Zookeeper
Replication
Metadata request processing
Producer request processing
Consumer request processing
Capacity planning
Capacity planning goals
Replication factor
Memory
Hard drives
Network
CPU
Single cluster deployment
Multicluster deployment
Decommissioning brokers
Data migration
Summary
Using Kafka in Big Data Applications
Managing high volumes in Kafka
Appropriate hardware choices
Producer read and consumer write choices
Kafka message delivery semantics
At least once delivery
At most once delivery
Exactly once delivery
Big data and Kafka common usage patterns
Kafka and data governance
Alerting and monitoring
Useful Kafka metrics
Producer metrics
Broker metrics
Consumer metrics
Summary
Securing Kafka
An overview of securing Kafka
Wire encryption using SSL
Steps to enable SSL in Kafka
Configuring SSL for Kafka Broker
Configuring SSL for Kafka clients
Kerberos SASL for authentication
Steps to enable SASL/GSSAPI in Kafka
Configuring SASL for Kafka broker
Configuring SASL for Kafka client - producer and consumer
Understanding ACL and authorization
Common ACL operations
List ACLs
Understanding Zookeeper authentication
Apache Ranger for authorization
Adding Kafka Service to Ranger
Adding policies
Best practices
Summary
Streaming Application Design Considerations
Latency and throughput
Data and state persistence
Data sources
External data lookups
Data formats
Data serialization
Level of parallelism
Out-of-order events
Message processing semantics
Summary
Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records and process them in a fault-tolerant way as they occur.
This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications and tackles some common challenges such as how to use Kafka efficiently to handle high data volumes with ease. The book first takes you through the different types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka such as capacity planning and security.
By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka and to design efficient streaming data applications with it.
Chapter 1, Introduction to Messaging Systems, introduces the concepts of messaging systems. It covers an overview of messaging systems and their enterprise needs. It further emphasizes the different ways of using messaging systems, such as point-to-point or publish/subscribe. It introduces AMQP as well.
Chapter 2, Introducing Kafka - The Distributed Messaging Platform, introduces distributed messaging platforms such as Kafka. It covers the Kafka architecture and touches upon its internal components. It further explores the role and importance of each Kafka component and how they contribute towards the low latency, reliability, and scalability of Kafka messaging systems.
Chapter 3, Deep Dive into Kafka Producers, is about how to publish messages to Kafka Systems. This further covers Kafka Producer APIs and their usage. It showcases examples of using Kafka Producer APIs with Java and Scala programming languages. It takes a deep dive into Producer message flows and some common patterns for producing messages to Kafka Topics. It walks through some performance optimization techniques for Kafka Producers.
Chapter 4, Deep Dive into Kafka Consumers, is about how to consume messages from Kafka Systems. This also covers Kafka Consumer APIs and their usage. It showcases examples of using Kafka Consumer APIs with the Java and Scala programming languages. It takes a deep dive into Consumer message flows and some common patterns for consuming messages from Kafka Topics. It walks through some performance optimization techniques for Kafka Consumers.
Chapter 5, Building Spark Streaming Applications with Kafka, is about how to integrate Kafka with the popular distributed processing engine, Apache Spark. This also provides a brief overview of Apache Spark, the different approaches for integrating Kafka with Spark, and their advantages and disadvantages. It showcases examples in Java as well as in Scala with use cases.
Chapter 6, Building Storm Applications with Kafka, is about how to integrate Kafka with the popular real-time processing engine Apache Storm. This also covers a brief overview of Apache Storm and Apache Heron. It showcases examples of different approaches of event processing using Apache Storm and Kafka, including guaranteed event processing.
Chapter 7, Using Kafka with Confluent Platform, is about the emerging streaming platform Confluent that enables you to use Kafka effectively with many other added functionalities. It showcases many examples for the topics covered in the chapter.
Chapter 8, Building ETL Pipelines Using Kafka, introduces Kafka Connect, a common component used for building ETL pipelines involving Kafka. It emphasizes how to use Kafka Connect in ETL pipelines and discusses some in-depth technical concepts surrounding it.
Chapter 9, Building Streaming Applications Using Kafka Streams, is about how to build streaming applications using Kafka Stream, which is an integral part of the Kafka 0.10 release. This also covers building fast, reliable streaming applications using Kafka Stream, with examples.
Chapter 10, Kafka Cluster Deployment, focuses on Kafka cluster deployment on enterprise-grade production systems. It covers Kafka clusters in depth: how to do capacity planning, how to manage single/multi-cluster deployments, and so on. It also covers how to manage Kafka in multi-tenant environments. It further walks you through the various steps involved in Kafka data migrations.
Chapter 11, Using Kafka in Big Data Applications, walks through some of the aspects of using Kafka in big data applications. This covers how to manage high volumes in Kafka, how to ensure guaranteed message delivery, the best ways to handle failures without any data loss, and some governance principles that can be applied while using Kafka in big data pipelines.
Chapter 12, Securing Kafka, is about securing your Kafka cluster. It covers authentication and authorization mechanisms along with examples.
Chapter 13, Streaming Applications Design Considerations, is about different design considerations for building a streaming application. It walks you through aspects such as parallelism, memory tuning, and so on. It provides comprehensive coverage of the different paradigms for designing a streaming application.
You will need the following software to work with the examples in this book:
If you want to learn how to use Apache Kafka and the various tools in the Kafka ecosystem in the easiest possible manner, this book is for you. Some programming experience with Java is required to get the most out of this book.
In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.
Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The next lines of code read the link and assign it to the BeautifulSoup function."
A block of code is set as follows:
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
Any command-line input or output is written as follows:
sudo su - hdfs -c "hdfs dfs -chmod 777 /tmp/hive"
sudo chmod 777 /tmp/hive
New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter."
Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.
To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.
If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.
Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.
You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.
You can download the code files by following these steps:
Log in or register to our website using your e-mail address and password.
Hover the mouse pointer on the SUPPORT tab at the top.
Click on Code Downloads & Errata.
Enter the name of the book in the Search box.
Select the book for which you're looking to download the code files.
Choose from the drop-down menu where you purchased this book from.
Click on Code Download.
Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:
WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux
The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Building-Data-Streaming-Applications-with-Apache-Kafka. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!
We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/BuildingDataStreamingApplicationswithApacheKafka_ColorImages.pdf.
Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books-maybe a mistake in the text or the code-we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.
To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.
Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.
Please contact us at [email protected] with a link to the suspected pirated material.
We appreciate your help in protecting our authors and our ability to bring you valuable content.
If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.
People have different styles of learning. This chapter will give you the necessary context to help you achieve a better understanding of the book.
The goal of any Enterprise Integration is to establish unification between separate applications to achieve a consolidated set of functionalities.
These discrete applications are built using different programming languages and platforms. To achieve any unified functionality, these applications need to share information among themselves. This information exchange happens over a network in small packets using different protocols and utilities.
So let us say that you are adding a new campaign component to an existing e-commerce application that needs to interact with a different application to calculate loyalty points. In this case, you will be integrating your e-commerce application with a different application using enterprise integration strategies.
This chapter will help you understand messaging systems, one of the common ways of establishing enterprise integration. It will walk you through various types of messaging system and their uses. At the end of this chapter, you will be able to distinguish between different messaging models available today and understand different design considerations for enterprise application integration.
We will be covering the following topics in this chapter:
Principles of designing a good messaging system
How a messaging system works
A point-to-point messaging system
A publish-subscribe messaging system
The AMQP messaging protocol
Finally, the use of messaging systems in designing streaming applications
Continuing our focus on messaging systems, you may have seen applications where one application uses data that gets processed by other external applications or applications consuming data from one or more data sources. In such scenarios, messaging systems can be used as an integration channel for information exchange between different applications. If you haven't built such an application yet, then don't worry about it. We will build it in upcoming chapters.
In any application integration system design, there are a few important principles that should be kept in mind, such as loose coupling, common interface definitions, latency, and reliability. Let's look into some of these one by one:
Loose coupling between applications ensures minimal dependencies on each other. This ensures that any changes in one application do not affect other applications. Tightly coupled applications are coded as per the predefined specifications of other applications; any change in specification would break or change the functionality of other dependent applications.
Common interface definitions ensure a common agreed-upon data format for exchange between applications. This not only helps in establishing message exchange standards among applications but also ensures that some of the best practices of information exchange can be enforced easily. For example, you can choose to use the Avro data format to exchange messages. This can be defined as your common interface standard for information exchange. Avro is a good choice for message exchanges as it serializes data in a compact binary format and supports schema evolution.
Latency is the time taken by messages to traverse between the sender and receiver. Most applications want to achieve low latency as a critical requirement. Even in an asynchronous mode of communication, high latency is not desirable, as a significant delay in receiving messages could cause significant loss to an organization.
Reliability ensures that temporary unavailability of applications does not affect dependent applications that need to exchange information. In general, when the source application sends a message to the remote application, the remote application may be running slowly or may not be running at all due to some failure. Reliable, asynchronous message communication ensures that the source application continues its work and feels confident that the remote application will resume its task later.
As mentioned earlier, application integration is key for any enterprise to achieve a comprehensive set of functionalities spanning multiple discrete applications. To achieve this, applications need to share information in a timely manner. A messaging system is one of the most commonly used mechanisms for information exchange in applications.
The other mechanisms used to share information could be remote procedure calls (RPC), file share, shared databases, and web service invocation. While choosing your application integration mechanism, it is important that you keep in mind the guiding principles discussed earlier. For example, in the case of shared databases, changes done by one application could directly affect other applications that are using the same database tables. Both of the applications are tightly coupled. You may want to avoid that in cases where you have additional rules to be applied before accepting the changes in the other application. Likewise, you have to think about all such guiding principles before finalizing ways of integrating your applications.
As depicted in the following figure, message-based application integration involves discrete enterprise applications connecting to a common messaging system and either sending or receiving data to it. A messaging system acts as an integration component between multiple applications. Such an integration invokes different application behaviors based on application information exchanges. It also adheres to some of the design principles mentioned earlier.
Enterprises have started adopting microservice architecture, and the main advantage of doing so is to make applications loosely coupled with each other. Applications communicate with each other asynchronously, and this makes communication more reliable as both applications need not be running simultaneously. A messaging system helps in transferring data from one application to the other. It allows applications to think of what they need to share as data rather than how it needs to be shared. You can share small packets of data or data streams with other applications using messaging in a timely and real-time fashion. This fits the need of low latency real-time application integration.
For a start, you should understand some of the basic concepts of any messaging system. Understanding these concepts is beneficial to you as it will help you understand different messaging technologies such as Kafka. The following are some of the basic messaging concepts:
Message queues: You will sometimes find queues referred to as channels as well. In a simple way, they are connectors between sending and receiving applications. Their core function is to receive message packets from the source application and send them to the receiver application in a timely and reliable manner.
Messages (data packets): A message is an atomic data packet that gets transmitted over a network to a message queue. The sender application breaks data into smaller data packets and wraps it as a message with protocol and header information. It then sends it to the message queue. In a similar fashion, a receiver application receives a message and extracts the data from the message wrapper to further process it.
Sender (producer): Sender or producer applications are the sources of the data that needs to be sent to a certain destination. They establish connections to message queue endpoints and send data in smaller message packets adhering to common interface standards. Depending on the type of messaging system in use, sender applications can decide to send data one by one or in a batch.
Receiver (consumer): Receiver or consumer applications are the receivers of messages sent by the sender application. They either pull data from message queues or receive data from message queues through a persistent connection. On receiving messages, they extract data from those message packets and use it for further processing.
Data transmission protocols: Data transmission protocols determine the rules that govern message exchanges between applications. Different queuing systems use different data transmission protocols, depending on the technical implementation of the messaging endpoints. Kafka uses a binary protocol over TCP: the client initiates a socket connection with Kafka and then writes messages, reading back the acknowledgment message. Some examples of such data transmission protocols are AMQP (Advanced Message Queuing Protocol), STOMP (Streaming Text Oriented Messaging Protocol), MQTT (Message Queuing Telemetry Transport), and HTTP (Hypertext Transfer Protocol).
Transfer mode: The transfer mode in a messaging system can be understood as the manner in which data is transferred from the source application to the receiver application. Examples of transfer modes are synchronous, asynchronous, and batch modes.
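The concepts above can be sketched with a minimal in-memory model built on Java's BlockingQueue. This is purely illustrative — a real messaging system adds protocols, persistence, and network transport — and all class and method names here are invented for the sketch:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class MessagingConceptsDemo {
    // A message: an atomic data packet carrying header information and a payload.
    record Message(String header, String payload) {}

    // One sender, one queue, one receiver — the smallest possible pipeline.
    static String sendAndReceive(String payload) throws InterruptedException {
        // The message queue: the connector between the two applications.
        BlockingQueue<Message> queue = new ArrayBlockingQueue<>(10);

        // Sender (producer): wraps the data as a message and enqueues it.
        Thread sender = new Thread(() -> {
            try {
                queue.put(new Message("content-type=text", payload));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sender.start();

        // Receiver (consumer): blocks until a message arrives, then unwraps it.
        Message received = queue.take();
        sender.join();
        return received.payload();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(sendAndReceive("order-created")); // prints "order-created"
    }
}
```

Note that the sender and receiver never reference each other directly — only the queue — which is exactly the loose coupling the design principles earlier call for.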
This section focuses on the point-to-point (PTP) messaging model. In a PTP messaging model, message producers are called senders and consumers are called receivers. They exchange messages by means of a destination called a queue. Senders produce messages to a queue and receivers consume messages from this queue. What distinguishes point-to-point messaging is that a message can be consumed by only one consumer.
Point-to-point messaging is generally used when a single message will be received by only one message consumer. There may be multiple consumers listening on the queue for the same message but only one of the consumers will receive it. Note that there can be multiple producers as well. They will be sending messages to the queue but it will be received by only one receiver.
Typically, in the PTP model, a receiver requests a message that a sender sends to the queue, rather than subscribing to a channel and receiving all messages sent on a particular queue.
You can think of queues supporting PTP messaging models as FIFO queues. In such queues, messages are sorted in the order in which they were received, and as they are consumed, they are removed from the head of the queue. Queues such as Kafka maintain message offsets. Instead of deleting the messages, they increment the offsets for the receiver. Offset-based models provide better support for replaying messages.
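The difference between a delete-on-read FIFO queue and an offset-based log like Kafka's can be sketched with a toy in-memory class. This is an illustration of the idea only — the class and method names are invented and bear no relation to the real Kafka API:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy offset-based log: messages are appended and never removed on read.
// Each receiver tracks its own offset, so messages can be replayed at will.
public class OffsetLog {
    private final List<String> log = new ArrayList<>();
    private final Map<String, Integer> offsets = new HashMap<>();

    public void append(String message) {
        log.add(message);
    }

    // Reading advances the receiver's offset instead of deleting the message.
    public String poll(String receiver) {
        int offset = offsets.getOrDefault(receiver, 0);
        if (offset >= log.size()) {
            return null; // nothing new for this receiver
        }
        offsets.put(receiver, offset + 1);
        return log.get(offset);
    }

    // Replay support: rewind the offset to re-consume from the beginning.
    public void rewind(String receiver) {
        offsets.put(receiver, 0);
    }
}
```

With a delete-on-read queue, a consumed message is gone; here, calling rewind and polling again returns the first message once more, which is why offset-based models make replay cheap.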
The following figure shows an example model of PTP. Suppose there are two senders, S1 and S2, who send a message to a queue, Q1. On the other side, there are two receivers, R1 and R2, who receive a message from Q1. In this case, R1 will consume the message from S2 and R2 will consume the message from S1:
You can deduce the following important points about a PTP messaging system from the preceding figure:
More than one sender can produce and send messages to a queue. Senders can share a connection or use different connections, but they can all access the same queue.
More than one receiver can consume messages from a queue, but each message can be consumed by only one receiver. Thus, Message 1, Message 2, and Message 3 are consumed by different receivers. (This is a message queue extension.)
Receivers can share a connection or use different connections, but they can all access the same queue. (This is a message queue extension.)
Senders and receivers have no timing dependencies; the receiver can consume a message whether or not it was running when the sender produced and sent the message.
Messages are placed in a queue in the order they are produced, but the order in which they are consumed depends on factors such as message expiration date, message priority, whether a selector is used in consuming messages, and the relative message processing rate of the consumers.
Senders and receivers can be added and deleted dynamically at runtime, thus allowing the messaging system to expand or contract as needed.
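The points above can be sketched with a plain Java BlockingQueue standing in for the destination. Two receiver threads share one queue, and each message is taken by exactly one of them; which receiver gets which message is not fixed. This is an illustrative simulation, not an API of any real messaging system:

```java
import java.util.Set;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class PtpDemo {
    public static Set<String> run() throws InterruptedException {
        BlockingQueue<String> queue = new LinkedBlockingQueue<>();
        // Senders S1 and S2 sharing the same queue Q1
        queue.put("Message 1");
        queue.put("Message 2");
        queue.put("Message 3");

        Set<String> consumed = ConcurrentHashMap.newKeySet();
        // Receivers R1 and R2: poll() hands each message to exactly one thread
        Runnable receiver = () -> {
            String msg;
            while ((msg = queue.poll()) != null) {
                consumed.add(msg);
            }
        };
        Thread r1 = new Thread(receiver), r2 = new Thread(receiver);
        r1.start(); r2.start();
        r1.join(); r2.join();
        return consumed; // all three messages, each delivered once
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run());
    }
}
```

Note that the receivers have no timing dependency on the senders here: the messages were enqueued before either receiver thread started, mirroring the decoupling described above.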
The PTP messaging model can be further categorized into two types:
Fire-and-forget model
Request/reply model
In fire-and-forget processing, the producer sends a message to a centralized queue and does not wait for any acknowledgment. It can be used in a scenario where you want to trigger an action or send a signal to the receiver that does not require a response. For example, you may want to use this method to send a message to a logging system, to alert a system to generate a report, or to trigger an action in some other system. The following figure represents a fire-and-forget PTP messaging model:
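A fire-and-forget send reduces to a non-blocking enqueue: the producer returns immediately, and whether or when a receiver processes the message is not its concern. The sketch below uses an in-memory queue as a stand-in for the broker; the class and method names are hypothetical:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class FireAndForgetDemo {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();

    // Fire-and-forget: enqueue and return at once; no acknowledgment is awaited.
    public void send(String message) {
        queue.offer(message);
    }

    // Messages sit in the queue until some receiver eventually drains them.
    public int pending() { return queue.size(); }

    public static void main(String[] args) {
        FireAndForgetDemo producer = new FireAndForgetDemo();
        producer.send("generate-report"); // returns immediately
        producer.send("log-event");
        System.out.println(producer.pending());
    }
}
```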
With an asynchronous request/reply PTP model, the message sender sends a message on one queue and then does a blocking wait on a reply queue, waiting for the response from the receiver. The request/reply model provides a high degree of decoupling between the sender and receiver, allowing the message producer and consumer components to be implemented in heterogeneous languages or platforms. The following figure represents a request/reply PTP messaging model:
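The two-queue structure of request/reply can be sketched as follows: the sender puts a request on one queue, then blocks on a separate reply queue until the receiver responds. The receiver thread here is a stand-in for a service that could just as well be a separate process on another platform; all names are illustrative:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class RequestReplyDemo {
    public static String requestReply(String request) throws InterruptedException {
        BlockingQueue<String> requestQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> replyQueue = new LinkedBlockingQueue<>();

        // Receiver: consume from the request queue, answer on the reply queue.
        Thread receiver = new Thread(() -> {
            try {
                String req = requestQueue.take();
                replyQueue.put("reply-to:" + req);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        receiver.start();

        requestQueue.put(request); // send the request...
        return replyQueue.take();  // ...then do a blocking wait for the reply
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(requestReply("get-status"));
    }
}
```

The sender never calls the receiver directly; both sides know only the queue names, which is what allows the two ends to be swapped out independently.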
Before concluding this section, it is important for you to understand where you can use the PTP model of messaging. It is used when you want one receiver to process any given message once and only once. This is perhaps the most critical difference: only one consumer will process a given message.
Another use case for point-to-point messaging is when you need synchronous communication between components that are written in different technology platforms or programming languages. For example, you may have an application written in a language, say PHP, which may want to communicate with a Twitter application written in Java to process tweets for analysis. In this scenario, a point-to-point messaging system helps provide interoperability between these cross-platform applications.
In this section, we will take a look at a different messaging model called the publish/subscribe (Pub/Sub) messaging model.
In this type of model, a subscriber registers its interest in a particular topic or event and is subsequently notified about the event asynchronously. Subscribers can express their interest in an event, or a pattern of events, and are then notified of any event generated by a publisher that matches their registered interest. These events are generated by publishers. It differs from the PTP messaging model in that a topic can have multiple receivers, and every receiver receives a copy of each message. In other words, a message is broadcast to all receivers without them having to poll the topic, whereas in the PTP model, the receiver polls the queue for new messages.
A Pub/Sub messaging model is used when you need to broadcast an event or message to many message consumers. Unlike the PTP messaging model, all message consumers (called subscribers) listening on the topic will receive the message.
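The broadcast behaviour that distinguishes Pub/Sub from PTP can be sketched by giving each subscriber its own delivery queue and copying every published message into all of them. This is a minimal illustration, not how any particular broker implements topics, and all names are hypothetical:

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

public class PubSubDemo {
    // One delivery queue per subscriber; publishing fans the message out to all of them.
    private final Map<String, BlockingQueue<String>> subscribers = new ConcurrentHashMap<>();

    public void subscribe(String name) {
        subscribers.put(name, new LinkedBlockingQueue<>());
    }

    public void publish(String message) {
        // Broadcast: every registered subscriber receives its own copy.
        subscribers.values().forEach(q -> q.offer(message));
    }

    public String receive(String name) throws InterruptedException {
        return subscribers.get(name).take();
    }

    public static void main(String[] args) throws InterruptedException {
        PubSubDemo topic = new PubSubDemo();
        topic.subscribe("s1");
        topic.subscribe("s2");
        topic.publish("event-1");
        System.out.println(topic.receive("s1") + " " + topic.receive("s2"));
    }
}
```

Contrast this with the PTP sketch earlier: there, one shared queue meant each message went to exactly one receiver; here, per-subscriber queues mean every subscriber sees every message.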