Building Data Streaming Applications with Apache Kafka
Designing and deploying enterprise messaging queues
Manish Kumar
Chanchal Singh

BIRMINGHAM - MUMBAI

Building Data Streaming Applications with Apache Kafka

Copyright © 2017 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: August 2017

Production reference: 1170817

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78728-398-5

www.packtpub.com

Credits

Authors

Manish Kumar

Chanchal Singh

Copy Editor

Manisha Sinha

Reviewer

Anshul Joshi

Project Coordinator

Manthan Patel

Commissioning Editor

Amey Varangaonkar

Proofreader

Safis Editing

Acquisition Editor

Tushar Gupta

Indexer

Tejal Daruwale Soni

Content Development Editor

Tejas Limkar

Graphics

Tania Dutta

Technical Editor

Dinesh Chaudhary

Production Coordinator

Deepika Naik

About the Authors

Manish Kumar is a Technical Architect at DataMetica Solution Pvt. Ltd. He has approximately 11 years' experience in data management, working as a Data Architect and Product Architect. He has extensive experience in building effective ETL pipelines, implementing security over Hadoop, and providing the best possible solutions to Data Science problems. Before joining the world of big data, he worked as a Tech Lead for Sears Holding, India. He is a regular speaker on big data concepts such as Hadoop and Hadoop Security at various events. Manish has a Bachelor's degree in Information Technology.

I would like to thank my parents, Dr. N.K. Singh and Mrs. Rambha Singh, for their support and blessings; my wife, Mrs. Swati Singh, for successfully keeping me healthy and happy; and my adorable son, Master Lakshya Singh, for teaching me how to enjoy the small things in life. I would like to extend my gratitude to Mr. Prashant Jaiswal, whose mentorship and friendship will remain gems of my life, and Chanchal Singh, my esteemed friend, for standing by me in times of trouble and happiness. This note will be incomplete if I do not mention Mr. Anand Deshpande, Mr. Parashuram Bastawade, Mr. Niraj Kumar, Mr. Rajiv Gupta, and Dr. Phil Shelley for giving me exciting career opportunities and showing trust in me, no matter how adverse the situation was.

Chanchal Singh is a Software Engineer at DataMetica Solution Pvt. Ltd. He has over three years' experience in product development and architecture design, working as a Product Developer, Data Engineer, and Team Lead. He has extensive experience with different technologies such as Hadoop, Spark, Storm, Kafka, Hive, Pig, Flume, Java, Spring, and many more. He believes in sharing knowledge and motivating others for innovation. He is the co-organizer of the Big Data Meetup - Pune Chapter.

He has been recognized for putting innovative ideas into organizations. He has a Bachelor's degree in Information Technology from the University of Mumbai and a Master's degree in Computer Application from Amity University. He was also part of the Entrepreneur Cell in IIT Mumbai.

I would like to thank my parents, Mr. Parasnath Singh and Mrs. Usha Singh, for showering their blessings on me and their loving support. I am eternally grateful to my love, Ms. Jyoti, for being with me in every situation and encouraging me. I would also like to express my gratitude to all the mentors I've had over the years. Special thanks to Mr. Abhijeet Shingate, who helped me as a mentor and guided me in the right direction during the initial phase of my career. I am highly indebted to Mr. Manish Kumar, without whom writing this book would have been challenging, for always enlightening me and sharing his knowledge with me. I would like to extend my sincere thanks by mentioning a few great personalities: Mr. Rajiv Gupta, Mr. Niraj Kumar, Mr. Parashuram Bastawade, and Dr. Phil Shelley for giving me ample opportunities to explore solutions for real customer problems and believing in me.

About the Reviewer

Anshul Joshi is a Data Scientist with experience in recommendation systems, predictive modeling, neural networks, and high performance computing. His research interests are deep learning, artificial intelligence, computational physics, and biology.

Most of the time, he can be caught exploring GitHub or trying anything new that he can get his hands on. He blogs on https://anshuljoshi.com/.

www.PacktPub.com

For support files and downloads related to your book, please visit www.PacktPub.com.

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com, and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www.packtpub.com/mapt

Get the most in-demand software skills with Mapt. Mapt gives you full access to all Packt books and video courses, as well as industry-leading tools to help you plan your personal development and advance your career.

Why subscribe?

Fully searchable across every book published by Packt

Copy and paste, print, and bookmark content

On demand and accessible via a web browser

Customer Feedback

Thanks for purchasing this Packt book. At Packt, quality is at the heart of our editorial process. To help us improve, please leave us an honest review on this book's Amazon page at https://www.amazon.com/dp/1787283984.

If you'd like to join our team of regular reviewers, you can e-mail us at [email protected]. We award our regular reviewers with free eBooks and videos in exchange for their valuable feedback. Help us be relentless in improving our products!

Table of Contents

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

Introduction to Messaging Systems

Understanding the principles of messaging systems

Understanding messaging systems

Peeking into a point-to-point messaging system

Publish-subscribe messaging system

Advanced Message Queuing Protocol

Using messaging systems in big data streaming applications

Summary

Introducing Kafka - The Distributed Messaging Platform

Kafka origins

Kafka's architecture

Message topics

Message partitions

Replication and replicated logs

Message producers

Message consumers

Role of Zookeeper

Summary

Deep Dive into Kafka Producers

Kafka producer internals

Kafka Producer APIs

Producer object and ProducerRecord object

Custom partition

Additional producer configuration

Java Kafka producer example

Common messaging publishing patterns

Best practices

Summary

Deep Dive into Kafka Consumers

Kafka consumer internals

Understanding the responsibilities of Kafka consumers

Kafka consumer APIs

Consumer configuration

Subscription and polling

Committing and polling

Additional configuration

Java Kafka consumer

Scala Kafka consumer

Rebalance listeners

Common message consuming patterns

Best practices

Summary

Building Spark Streaming Applications with Kafka

Introduction to Spark 

Spark architecture

Pillars of Spark

The Spark ecosystem

Spark Streaming 

Receiver-based integration

Disadvantages of receiver-based approach

Java example for receiver-based integration

Scala example for receiver-based integration

Direct approach

Java example for direct approach

Scala example for direct approach

Use case log processing - fraud IP detection

Maven

Producer 

Property reader

Producer code 

Fraud IP lookup

Expose Hive table

Streaming code

Summary

Building Storm Applications with Kafka

Introduction to Apache Storm

Storm cluster architecture

The concept of a Storm application

Introduction to Apache Heron

Heron architecture 

Heron topology architecture

Integrating Apache Kafka with Apache Storm - Java

Example

Integrating Apache Kafka with Apache Storm - Scala

Use case – log processing in Storm, Kafka, Hive

Producer

Producer code 

Fraud IP lookup

Storm application

Running the project

Summary

Using Kafka with Confluent Platform

Introduction to Confluent Platform

Deep diving into Confluent architecture

Understanding Kafka Connect and Kafka Streams

Kafka Streams

Playing with Avro using Schema Registry

Moving Kafka data to HDFS

Camus 

Running Camus

Gobblin

Gobblin architecture

Kafka Connect

Flume

Summary

Building ETL Pipelines Using Kafka

Considerations for using Kafka in ETL pipelines

Introducing Kafka Connect

Deep dive into Kafka Connect

Introductory examples of using Kafka Connect

Kafka Connect common use cases

Summary 

Building Streaming Applications Using Kafka Streams

Introduction to Kafka Streams

Using Kafka in stream processing

Kafka Streams - a lightweight stream processing library

Kafka Streams architecture

Integrated framework advantages

Understanding tables and streams together

Maven dependency

Kafka Streams word count

KTable

Use case example of Kafka Streams

Maven dependency of Kafka Streams

Property reader

IP record producer

IP lookup service

Fraud detection application

Summary

Kafka Cluster Deployment

Kafka cluster internals

Role of Zookeeper

Replication

Metadata request processing

Producer request processing

Consumer request processing

Capacity planning

Capacity planning goals

Replication factor

Memory

Hard drives

Network

CPU

Single cluster deployment

Multicluster deployment

Decommissioning brokers

Data migration

Summary

Using Kafka in Big Data Applications

Managing high volumes in Kafka

Appropriate hardware choices 

Producer read and consumer write choices

Kafka message delivery semantics

At least once delivery 

At most once delivery 

Exactly once delivery 

Big data and Kafka common usage patterns

Kafka and data governance

Alerting and monitoring

Useful Kafka metrics

Producer metrics

Broker metrics

Consumer metrics

Summary

Securing Kafka

An overview of securing Kafka

Wire encryption using SSL

Steps to enable SSL in Kafka

Configuring SSL for Kafka Broker

Configuring SSL for Kafka clients

Kerberos SASL for authentication

Steps to enable SASL/GSSAPI in Kafka

Configuring SASL for Kafka broker

Configuring SASL for Kafka client - producer and consumer

Understanding ACL and authorization

Common ACL operations

List ACLs

Understanding Zookeeper authentication

Apache Ranger for authorization

Adding Kafka Service to Ranger

Adding policies 

Best practices

Summary

Streaming Application Design Considerations

Latency and throughput

Data and state persistence

Data sources

External data lookups

Data formats

Data serialization

Level of parallelism

Out-of-order events

Message processing semantics

Summary

Preface

Apache Kafka is a popular distributed streaming platform that acts as a messaging queue or an enterprise messaging system. It lets you publish and subscribe to a stream of records and process them in a fault-tolerant way as they occur.

This book is a comprehensive guide to designing and architecting enterprise-grade streaming applications using Apache Kafka and other big data tools. It includes best practices for building such applications and tackles some common challenges, such as how to use Kafka efficiently and handle high data volumes with ease. This book first takes you through understanding the types of messaging systems and then provides a thorough introduction to Apache Kafka and its internal details. The second part of the book takes you through designing streaming applications using various frameworks and tools such as Apache Spark, Apache Storm, and more. Once you grasp the basics, we will take you through more advanced concepts in Apache Kafka, such as capacity planning and security.

By the end of this book, you will have all the information you need to be comfortable with using Apache Kafka and to design efficient streaming data applications with it.

What this book covers

Chapter 1, Introduction to Messaging Systems, introduces the concepts of messaging systems. It covers an overview of messaging systems and their enterprise needs. It further emphasizes the different ways of using messaging systems, such as point-to-point and publish/subscribe. It introduces AMQP as well.

Chapter 2, Introducing Kafka - The Distributed Messaging Platform, introduces distributed messaging platforms such as Kafka. It covers the Kafka architecture and touches upon its internal components. It further explores the role and importance of each Kafka component and how they contribute to the low latency, reliability, and scalability of Kafka messaging systems.

Chapter 3, Deep Dive into Kafka Producers, is about how to publish messages to Kafka Systems. This further covers Kafka Producer APIs and their usage. It showcases examples of using Kafka Producer APIs with Java and Scala programming languages. It takes a deep dive into Producer message flows and some common patterns for producing messages to Kafka Topics. It walks through some performance optimization techniques for Kafka Producers.

Chapter 4, Deep Dive into Kafka Consumers, is about how to consume messages from Kafka Systems. This also covers Kafka Consumer APIs and their usage. It showcases examples of using Kafka Consumer APIs with the Java and Scala programming languages. It takes a deep dive into Consumer message flows and some common patterns for consuming messages from Kafka Topics. It walks through some performance optimization techniques for Kafka Consumers.

Chapter 5, Building Spark Streaming Applications with Kafka, is about how to integrate Kafka with the popular distributed processing engine, Apache Spark. This also provides a brief overview of Apache Spark, the different approaches for integrating Kafka with Spark, and their advantages and disadvantages. It showcases examples in Java as well as in Scala, with use cases.

Chapter 6, Building Storm Applications with Kafka, is about how to integrate Kafka with the popular real-time processing engine Apache Storm. This also covers a brief overview of Apache Storm and Apache Heron. It showcases examples of different approaches of event processing using Apache Storm and Kafka, including guaranteed event processing.

Chapter 7, Using Kafka with Confluent Platform, is about the emerging streaming platform Confluent that enables you to use Kafka effectively with many other added functionalities. It showcases many examples for the topics covered in the chapter.

Chapter 8, Building ETL Pipelines Using Kafka, introduces Kafka Connect, a common component for building ETL pipelines involving Kafka. It emphasizes how to use Kafka Connect in ETL pipelines and discusses some in-depth technical concepts surrounding it.

Chapter 9, Building Streaming Applications Using Kafka Streams, is about how to build streaming applications using Kafka Streams, which is an integral part of the Kafka 0.10 release. This also covers building fast, reliable streaming applications using Kafka Streams, with examples.

Chapter 10, Kafka Cluster Deployment, focuses on Kafka cluster deployment on enterprise-grade production systems. It covers Kafka clusters in depth, including how to do capacity planning, how to manage single/multi-cluster deployments, and so on. It also covers how to manage Kafka in multi-tenant environments. It further walks you through the various steps involved in Kafka data migrations.

Chapter 11, Using Kafka in Big Data Applications, walks through some of the aspects of using Kafka in big data applications. This covers how to manage high volumes in Kafka, how to ensure guaranteed message delivery, the best ways to handle failures without any data loss, and some governance principles that can be applied while using Kafka in big data pipelines.

Chapter 12, Securing Kafka, is about securing your Kafka cluster. It covers authentication and authorization mechanisms along with examples.

Chapter 13, Streaming Application Design Considerations, is about different design considerations for building a streaming application. It walks you through aspects such as parallelism, memory tuning, and so on. It provides comprehensive coverage of the different paradigms for designing a streaming application.

What you need for this book

You will need the following software to work with the examples in this book:

Apache Kafka

Apache Hadoop

Apache Spark

Apache Storm

Who this book is for

If you want to learn how to use Apache Kafka and the various tools in the Kafka ecosystem in the easiest possible manner, this book is for you. Some programming experience with Java is required to get the most out of this book.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: "The next lines of code read the link and assign it to the BeautifulSoup function."

A block of code is set as follows:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;

Any command-line input or output is written as follows:

sudo su - hdfs -c "hdfs dfs -chmod 777 /tmp/hive"

sudo chmod 777 /tmp/hive

New terms and important words are shown in bold. Words that you see on the screen, for example, in menus or dialog boxes, appear in the text like this: "In order to download new modules, we will go to Files | Settings | Project Name | Project Interpreter."

Warnings or important notes appear in a box like this.
Tips and tricks appear like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book-what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail [email protected], and mention the book's title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you're looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows

Zipeg / iZip / UnRarX for Mac

7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Building-Data-Streaming-Applications-with-Apache-Kafka. We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/BuildingDataStreamingApplicationswithApacheKafka_ColorImages.pdf.

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books-maybe a mistake in the text or the code-we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at [email protected] with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at [email protected], and we will do our best to address the problem.

Introduction to Messaging Systems

People have different styles of learning. This chapter will give you the necessary context to help you achieve a better understanding of the book.

The goal of any Enterprise Integration is to establish unification between separate applications to achieve a consolidated set of functionalities.

These discrete applications are built using different programming languages and platforms. To achieve any unified functionality, these applications need to share information among themselves. This information exchange happens over a network in small packets using different protocols and utilities.

So let us say that you are adding a new campaign component to an existing e-commerce application that needs to interact with a different application to calculate loyalty points. In this case, you will be integrating your e-commerce application with a different application using enterprise integration strategies.

This chapter will help you understand messaging systems, one of the common ways of establishing enterprise integration. It will walk you through various types of messaging system and their uses. At the end of this chapter, you will be able to distinguish between different messaging models available today and understand different design considerations for enterprise application integration.

We will be covering the following topics in this chapter:

Principles of designing a good messaging system

How a messaging system works

A point-to-point messaging system

A publish-subscribe messaging system

The AMQP messaging protocol

How messaging systems are used in designing big data streaming applications

Understanding the principles of messaging systems

Continuing our focus on messaging systems, you may have seen applications where one application uses data that gets processed by other external applications, or applications that consume data from one or more data sources. In such scenarios, messaging systems can be used as an integration channel for information exchange between different applications. If you haven't built such an application yet, don't worry; we will build one in upcoming chapters.

In any application integration system design, there are a few important principles that should be kept in mind, such as loose coupling, common interface definitions, latency, and reliability. Let's look into some of these one by one:

Loose coupling between applications ensures minimal dependencies on each other. This ensures that any changes in one application do not affect other applications. Tightly coupled applications are coded as per predefined specifications of other applications; any change in specification would break or change the functionality of other dependent applications.

Common interface definitions ensure a common agreed-upon data format for exchange between applications. This not only helps in establishing message exchange standards among applications but also ensures that some of the best practices of information exchange can be enforced easily. For example, you can choose to use the Avro data format to exchange messages. This can be defined as your common interface standard for information exchange. Avro is a good choice for message exchanges as it serializes data in a compact binary format and supports schema evolution (see the sketch after this list).

Latency is the time taken by messages to traverse between the sender and receiver. Most applications want to achieve low latency as a critical requirement. Even in an asynchronous mode of communication, high latency is not desirable, as a significant delay in receiving messages could cause significant loss to any organization.

Reliability ensures that temporary unavailability of applications does not affect dependent applications that need to exchange information. In general, when the source application sends a message to the remote application, the remote application may be running slowly or may not be running at all due to some failure. Reliable, asynchronous message communication ensures that the source application continues its work and feels confident that the remote application will resume its task later.
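To make the Avro point concrete, the following is a minimal sketch of serializing a record against a shared schema using the Avro Java API. The Order schema and its fields are hypothetical, chosen only to illustrate what a common interface definition might look like:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import org.apache.avro.Schema;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;
import org.apache.avro.io.BinaryEncoder;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;

public class AvroInterfaceExample {
    // A hypothetical schema agreed upon by all integrated applications.
    private static final String SCHEMA_JSON =
        "{\"type\":\"record\",\"name\":\"Order\",\"fields\":["
      + "{\"name\":\"orderId\",\"type\":\"string\"},"
      + "{\"name\":\"amount\",\"type\":\"double\"}]}";

    public static byte[] serializeOrder(String orderId, double amount) throws IOException {
        Schema schema = new Schema.Parser().parse(SCHEMA_JSON);
        GenericRecord order = new GenericData.Record(schema);
        order.put("orderId", orderId);
        order.put("amount", amount);

        // Compact binary encoding: this is what travels between applications.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
        DatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);
        writer.write(order, encoder);
        encoder.flush();
        return out.toByteArray();
    }
}

Any application holding the same schema can deserialize these bytes, which is exactly what makes the schema a common interface between otherwise independent systems.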

Understanding messaging systems

As mentioned earlier, application integration is key for any enterprise to achieve a comprehensive set of functionalities spanning multiple discrete applications. To achieve this, applications need to share information in a timely manner. A messaging system is one of the most commonly used mechanisms for information exchange in applications.

The other mechanisms used to share information could be remote procedure calls (RPC), file share, shared databases, and web service invocation. While choosing your application integration mechanism, it is important that you keep in mind the guiding principles discussed earlier. For example, in the case of shared databases, changes done by one application could directly affect other applications that are using the same database tables. Both of the applications are tightly coupled. You may want to avoid that in cases where you have additional rules to be applied before accepting the changes in the other application. Likewise, you have to think about all such guiding principles before finalizing ways of integrating your applications.

As depicted in the following figure, message-based application integration involves discrete enterprise applications connecting to a common messaging system and either sending or receiving data to it. A messaging system acts as an integration component between multiple applications. Such an integration invokes different application behaviors based on application information exchanges. It also adheres to some of the design principles mentioned earlier.

A graphical display of how messaging systems are linked to applications

Enterprises have started adopting microservice architecture, and the main advantage of doing so is to make applications loosely coupled with each other. Applications communicate with each other asynchronously, which makes communication more reliable as both applications need not be running simultaneously. A messaging system helps in transferring data from one application to the other. It allows applications to think of what they need to share as data rather than how it needs to be shared. You can share small packets of data or data streams with other applications using messaging in a timely and real-time fashion. This fits the need for low-latency, real-time application integration.

For a start, you should understand some of the basic concepts of any messaging system. Understanding these concepts is beneficial to you as it will help you understand different messaging technologies such as Kafka. The following are some of the basic messaging concepts:

Message queues: You will sometimes find queues referred to as channels as well. In a simple way, they are connectors between sending and receiving applications. Their core function is to receive message packets from the source application and send them to the receiver application in a timely and reliable manner.

Messages (data packets): A message is an atomic data packet that gets transmitted over a network to a message queue. The sender application breaks data into smaller data packets and wraps them as messages with protocol and header information. It then sends them to the message queue. In a similar fashion, a receiver application receives a message and extracts the data from the message wrapper to process it further.

Sender (producer): Sender or producer applications are the sources of data that needs to be sent to a certain destination. They establish connections to message queue endpoints and send data in smaller message packets, adhering to common interface standards. Depending on the type of messaging system in use, sender applications can decide to send data one by one or in a batch.

Receiver (consumer): Receiver or consumer applications are the receivers of messages sent by the sender application. They either pull data from message queues or receive data from message queues through a persistent connection. On receiving messages, they extract data from those message packets and use it for further processing.

Data transmission protocols: Data transmission protocols determine the rules that govern message exchanges between applications. Different queuing systems use different data transmission protocols, depending on the technical implementation of the messaging endpoints. Kafka uses a binary protocol over TCP: the client initiates a socket connection with Kafka queues and then writes messages, along with reading back the acknowledgment message (a minimal client sketch follows this list). Some examples of such data transmission protocols are AMQP (Advanced Message Queuing Protocol), STOMP (Streaming Text Oriented Messaging Protocol), MQTT (Message Queuing Telemetry Transport), and HTTP (Hypertext Transfer Protocol).

Transfer mode: The transfer mode in a messaging system can be understood as the manner in which data is transferred from the source application to the receiver application. Examples of transfer modes are synchronous, asynchronous, and batch modes.
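To ground these concepts, here is a minimal sketch of a sender and a receiver using Kafka's Java client, which later chapters cover in depth. The broker address localhost:9092, the topic my-topic, and the group ID demo-group are illustrative assumptions, not values from this book's code bundle:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class BasicMessagingSketch {
    public static void main(String[] args) {
        // Sender (producer): connects to the queue endpoint and sends a message.
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            producer.send(new ProducerRecord<>("my-topic", "key-1", "hello"));
        }

        // Receiver (consumer): pulls messages from the queue.
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "demo-group");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            ConsumerRecords<String, String> records = consumer.poll(1000);
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}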

Peeking into a point-to-point messaging system

This section focuses on the point-to-point (PTP) messaging model. In a PTP messaging model, message producers are called senders and consumers are called receivers. They exchange messages by means of a destination called a queue. Senders produce messages to a queue and receivers consume messages from this queue. What distinguishes point-to-point messaging is that a message can be consumed by only one consumer.

Point-to-point messaging is generally used when a single message will be received by only one message consumer. There may be multiple consumers listening on the queue for the same message but only one of the consumers will receive it. Note that there can be multiple producers as well. They will be sending messages to the queue but it will be received by only one receiver.

A PTP model is based on the concept of sending a message to a named destination. This named destination is the message queue's endpoint that is listening to incoming messages over a port.

Typically, in the PTP model, a receiver requests a message that a sender sends to the queue, rather than subscribing to a channel and receiving all messages sent on a particular queue.

You can think of queues supporting PTP messaging models as FIFO queues. In such queues, messages are sorted in the order in which they were received, and as they are consumed, they are removed from the head of the queue. Queues such as Kafka maintain message offsets. Instead of deleting the messages, they increment the offsets for the receiver. Offset-based models provide better support for replaying messages.
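With Kafka's Java consumer, for example, replaying comes down to rewinding the offset with seek(). The following sketch assumes a broker at localhost:9092 and a single-partition topic my-topic, both hypothetical:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

public class ReplaySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", "replay-group");
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            TopicPartition partition = new TopicPartition("my-topic", 0);
            consumer.assign(Collections.singletonList(partition));
            // Messages are not deleted on consumption; rewinding the offset replays them.
            consumer.seek(partition, 0L);
            System.out.println(consumer.poll(1000).count() + " messages replayed");
        }
    }
}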

The following figure shows an example model of PTP. Suppose there are two senders, S1 and S2, who send a message to a queue, Q1. On the other side, there are two receivers, R1 and R2, who receive a message from Q1. In this case, R1 will consume the message from S2 and R2 will consume the message from S1:

A graphical representation of how a point-to-point messaging model works

You can deduce the following important points about a PTP messaging system from the preceding figure:

More than one sender can produce and send messages to a queue. Senders can share a connection or use different connections, but they can all access the same queue.

More than one receiver can consume messages from a queue, but each message can be consumed by only one receiver. Thus, Message 1, Message 2, and Message 3 are consumed by different receivers. (This is a message queue extension.)

Receivers can share a connection or use different connections, but they can all access the same queue. (This is a message queue extension.)

Senders and receivers have no timing dependencies; the receiver can consume a message whether or not it was running when the sender produced and sent the message.

Messages are placed in a queue in the order they are produced, but the order in which they are consumed depends on factors such as message expiration date, message priority, whether a selector is used in consuming messages, and the relative message processing rate of the consumers.

Senders and receivers can be added and deleted dynamically at runtime, thus allowing the messaging system to expand or contract as needed.

The PTP messaging model can be further categorized into two types:

Fire-and-forget model

Request/reply model

In fire-and-forget processing, the producer sends a message to a centralized queue and does not wait for any acknowledgment immediately. It can be used in a scenario where you want to trigger an action or send a signal to the receiver to trigger some action that does not require a response. For example, you may want to use this method to send a message to a logging system, to alert a system to generate a report, or to trigger an action in some other system. The following figure represents a fire-and-forget PTP messaging model:

Fire-and-forget message model
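In Kafka terms, fire-and-forget roughly corresponds to producing with acks set to 0 and ignoring the returned future. A minimal sketch, with the broker address and topic name assumed for illustration:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class FireAndForgetSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("acks", "0");                           // fire-and-forget: wait for no broker acknowledgment
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Send and move on; the returned Future is deliberately ignored.
            producer.send(new ProducerRecord<>("report-triggers", "generate-daily-report"));
        }
    }
}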

With an asynchronous request/reply PTP model, the message sender sends a message on one queue and then does a blocking wait on a reply queue, waiting for the response from the receiver. The request/reply model provides a high degree of decoupling between the sender and receiver, allowing the message producer and consumer components to be written in heterogeneous languages or run on heterogeneous platforms. The following figure represents a request/reply PTP messaging model:

Request/reply message model
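Kafka has no built-in request/reply primitive, so one common way to sketch this model is with two topics and a correlation ID carried in the message key. The topic names requests and replies, the broker address, and the blocking loop below are illustrative assumptions:

import java.util.Collections;
import java.util.Properties;
import java.util.UUID;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class RequestReplySketch {
    public static void main(String[] args) {
        String correlationId = UUID.randomUUID().toString();

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "localhost:9092");
        consumerProps.put("group.id", "requester-" + correlationId);
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
             KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
            // Send the request, keyed by a correlation ID that the replier must echo back.
            producer.send(new ProducerRecord<>("requests", correlationId, "compute-loyalty-points"));

            // Blocking wait on the reply queue for the message with our correlation ID.
            consumer.subscribe(Collections.singletonList("replies"));
            boolean answered = false;
            while (!answered) {
                for (ConsumerRecord<String, String> record : consumer.poll(1000)) {
                    if (correlationId.equals(record.key())) {
                        System.out.println("Reply: " + record.value());
                        answered = true;
                    }
                }
            }
        }
    }
}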

Before concluding this section, it is important for you to understand where you can use the PTP model of messaging. It is used when you want one receiver to process any given message once and only once. This is perhaps the most critical difference: only one consumer will process a given message.

Another use case for point-to-point messaging is when you need synchronous communication between components that are built on different technology platforms or written in different programming languages. For example, you may have an application written in a language, say PHP, which may want to communicate with a Twitter application written in Java to process tweets for analysis. In this scenario, a point-to-point messaging system helps provide interoperability between these cross-platform applications.

Publish-subscribe messaging system

In this section, we will take a look at a different messaging model called the publish/subscribe (Pub/Sub) messaging model.

In this type of model, a subscriber registers its interest in a particular topic or event and is subsequently notified about the event asynchronously. Subscribers have the ability to express their interest in an event, or a pattern of events, and are subsequently notified of any event generated by a publisher that matches their registered interest. These events are generated by publishers. It differs from the PTP messaging model in that a topic can have multiple receivers, and every receiver receives a copy of each message. In other words, a message is broadcast to all receivers without them having to poll the topic, whereas in the PTP model the receiver polls the queue for new messages.

A Pub/Sub messaging model is used when you need to broadcast an event or message to many message consumers. Unlike the PTP messaging model, all message consumers (called subscribers) listening on the topic will receive the message.
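In Kafka, this broadcast behavior maps to consumer groups: every group that subscribes to a topic receives its own copy of each message. The following sketch is illustrative only; the broker address, topic name, and group names are assumptions:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PubSubSketch {
    // Each distinct group.id acts as an independent subscriber:
    // both the "billing" and "analytics" groups receive every message on the topic.
    static KafkaConsumer<String, String> subscriber(String groupId) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker
        props.put("group.id", groupId);
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.subscribe(Collections.singletonList("events")); // assumed topic
        return consumer;
    }

    public static void main(String[] args) {
        KafkaConsumer<String, String> billing = subscriber("billing");
        KafkaConsumer<String, String> analytics = subscriber("analytics");
        // Both subscribers now receive a copy of each published message.
        System.out.println(billing.poll(1000).count() + " / " + analytics.poll(1000).count());
        billing.close();
        analytics.close();
    }
}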