Description

Apache Kafka is a great open source platform for handling your real-time data pipeline to ensure high-speed filtering and pattern matching on the fly. In this book, you will learn how to use Apache Kafka for efficient processing of distributed applications and will get familiar with solving everyday problems in fast data and processing pipelines.
This book focuses on programming rather than the configuration management of Kafka clusters or DevOps. It starts off with the installation and setting up of the development environment, before quickly moving on to performing fundamental messaging operations such as validation and enrichment.
Here you will learn about message composition with the pure Kafka API and Kafka Streams. You will look into the transformation of messages in different formats, such as text, binary, XML, JSON, and AVRO. Next, you will learn how to expose the schemas contained in Kafka with the Schema Registry. You will then learn how to work with all relevant connectors with Kafka Connect. While working with Kafka Streams, you will perform various interesting operations on streams, such as windowing, joins, and aggregations. Finally, through KSQL, you will learn how to retrieve, insert, modify, and delete data streams, and how to manipulate watermarks and windows.




Apache Kafka Quick Start Guide
Leverage Apache Kafka 2.0 to simplify real-time data processing for distributed applications
Raúl Estrada
BIRMINGHAM - MUMBAI

Apache Kafka Quick Start Guide

Copyright © 2018 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the author, nor Packt Publishing or its dealers and distributors, will be held liable for any damages caused or alleged to have been caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

Commissioning Editor: Amey Varangaonkar
Acquisition Editor: Siddharth Mandal
Content Development Editor: Smit Carvalho
Technical Editor: Niral Almeida
Copy Editor: Safis Editing
Project Coordinator: Pragati Shukla
Proofreader: Safis Editing
Indexer: Mariammal Chettiyar
Graphics: Jason Monteiro
Production Coordinator: Deepika Naik

First published: December 2018

Production reference: 1261218

Published by Packt Publishing Ltd.
Livery Place
35 Livery Street
Birmingham
B3 2PB, UK.

ISBN 978-1-78899-782-9

www.packtpub.com

This book is dedicated to my mom, who convinced me to write it, for her sacrifices and for exemplifying the power of self-confidence.
– Raúl Estrada
mapt.io

Mapt is an online digital library that gives you full access to over 5,000 books and videos, as well as industry-leading tools to help you plan your personal development and advance your career. For more information, please visit our website.

Why subscribe?

Spend less time learning and more time coding with practical eBooks and Videos from over 4,000 industry professionals

Improve your learning with Skill Plans built especially for you

Get a free eBook or video every month

Mapt is fully searchable

Copy and paste, print, and bookmark content

Packt.com

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.packt.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at [email protected] for more details.

At www.packt.com, you can also read a collection of free technical articles, sign up for a range of free newsletters, and receive exclusive discounts and offers on Packt books and eBooks.

Contributors

About the author

Raúl Estrada has been a programmer since 1996 and a Java developer since 2001. He loves all topics related to computer science. With more than 15 years of experience in high-availability and enterprise software, he has been designing and implementing architectures since 2003. His specialization is in systems integration, and he mainly participates in projects related to the financial sector. He has been an enterprise architect for BEA Systems and Oracle Inc., but he also enjoys web, mobile, and game programming. Raúl is a supporter of free software and enjoys experimenting with new technologies, frameworks, languages, and methods.

Raúl is the author of other Packt Publishing titles, such as Fast Data Processing Systems with SMACK and Apache Kafka Cookbook.

I want to say thanks to the technical reviewer, Isaac Ruiz; without his effort and patience, it would not have been possible to write this book.
I also thank Siddharth Mandal, the acquisition editor, who believed in this project from the beginning.
And finally, I want to thank all the heroes who contribute to open source projects, specifically with Apache Kafka.

About the reviewer

Isaac Ruiz Guerra has been a Java programmer since 2001 and an IT consultant since 2003. Isaac specializes in systems integration and has participated in projects related to the financial sector. Isaac has worked mainly on the backend side, using languages such as Java, Python, and Elixir. For more than 10 years, he has worked with different application servers in the Java world, including JBoss, GlassFish, and WLS. Isaac is currently interested in topics such as microservices, cloud native, and serverless. He is a regular lecturer, mainly at conferences related to the JVM. Isaac is interested in the formation of interdisciplinary and high-performance teams.

Packt is searching for authors like you

If you're interested in becoming an author for Packt, please visit authors.packtpub.com and apply today. We have worked with thousands of developers and tech professionals, just like you, to help them share their insight with the global tech community. You can make a general application, apply for a specific hot topic that we are recruiting an author for, or submit your own idea.

Table of Contents

Title page

Copyright and Credits

Apache Kafka Quick Start Guide

Dedication

About Packt

Why subscribe?

Packt.com

Contributors

About the author

About the reviewer

Packt is searching for authors like you

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Conventions used

Get in touch

Reviews

Configuring Kafka

Kafka in a nutshell

Kafka installation

Kafka installation on Linux

Kafka installation on macOS

Confluent Platform installation

Running Kafka

Running Confluent Platform

Running Kafka brokers

Running Kafka topics

A command-line message producer

A command-line message consumer

Using kafkacat

Summary

Message Validation

Enterprise service bus in a nutshell

Event modeling

Setting up the project

Reading from Kafka

Writing to Kafka

Running the processing engine

Coding a validator in Java

Running the validation

Summary

Message Enrichment

Extracting the geographic location

Enriching the messages

Extracting the currency price

Enriching with currency price

Running the engine

Extracting the weather data

Summary

Serialization

Kioto, a Kafka IoT company

Project setup

The constants

HealthCheck message

Java PlainProducer

Running the PlainProducer

Java plain consumer

Java PlainProcessor

Running the PlainProcessor

Custom serializer

Java CustomProducer

Running the CustomProducer

Custom deserializer

Java custom consumer

Java custom processor

Running the custom processor

Summary

Schema Registry

Avro in a nutshell

Defining the schema

Starting the Schema Registry

Using the Schema Registry

Registering a new version of a schema under a -value subject

Registering a new version of a schema under a -key subject

Registering an existing schema into a new subject

Listing all subjects

Fetching a schema by its global unique ID

Listing all schema versions registered under the healthchecks-value subject

Fetching version 1 of the schema registered under the healthchecks-value subject

Deleting version 1 of the schema registered under the healthchecks-value subject

Deleting the most recently registered schema under the healthchecks-value subject

Deleting all the schema versions registered under the healthchecks-value subject

Checking whether a schema is already registered under the healthchecks-key subject

Testing schema compatibility against the latest schema under the healthchecks-value subject

Getting the top-level compatibility configuration

Globally updating the compatibility requirements

Updating the compatibility requirements under the healthchecks-value subject

Java AvroProducer

Running the AvroProducer

Java AvroConsumer

Java AvroProcessor

Running the AvroProcessor

Summary

Kafka Streams

Kafka Streams in a nutshell

Project setup

Java PlainStreamsProcessor

Running the PlainStreamsProcessor

Scaling out with Kafka Streams

Java CustomStreamsProcessor

Running the CustomStreamsProcessor

Java AvroStreamsProcessor

Running the AvroStreamsProcessor

Late event processing

Basic scenario

Late event generation

Running the EventProducer

Kafka Streams processor

Running the Streams processor

Stream processor analysis

Summary

KSQL

KSQL in a nutshell

Running KSQL

Using the KSQL CLI

Processing data with KSQL

Writing to a topic

Summary

Kafka Connect

Kafka Connect in a nutshell

Project setup

Spark Streaming processor

Reading Kafka from Spark

Data conversion

Data processing

Writing to Kafka from Spark

Running the SparkProcessor

Summary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preface

Since 2011, Kafka has been exploding in terms of growth. More than one-third of Fortune 500 companies use Apache Kafka. These companies include travel companies, banks, insurance companies, and telecom companies.

Uber, Twitter, Netflix, Spotify, Blizzard, LinkedIn, and PayPal process their messages with Apache Kafka every day.

Today, Apache Kafka is used to collect data, do real-time data analysis, and perform real-time data streaming. Kafka is also used to feed events to Complex Event Processing (CEP) architectures, is deployed in microservice architectures, and is implemented in Internet of Things (IoT) systems.

In the realm of streaming, there are several competitors to Kafka Streams, including Apache Spark, Apache Flink, Akka Streams, Apache Pulsar, and Apache Beam. They are all in competition to perform better than Kafka. However, Apache Kafka has one key advantage over them all: its ease of use. Kafka is easy to implement and maintain, and its learning curve is not very steep.

This book is a practical quick start guide. It is focused on showing practical examples and does not get involved in theoretical explanations or discussions of Kafka's architecture. This book is a compendium of hands-on recipes, solutions to everyday problems faced by those implementing Apache Kafka.

Who this book is for

This book is for data engineers, software developers, and data architects looking for a quick hands-on Kafka guide.

This guide is about programming; it is an introduction for those with no previous knowledge about Apache Kafka.

All the examples are written in Java 8; experience with Java 8 is the only requirement for following this guide.

What this book covers

Chapter 1, Configuring Kafka, explains the basics for getting started with Apache Kafka. It discusses how to install, configure, and run Kafka. It also discusses how to perform basic operations with Kafka brokers and topics.

Chapter 2, Message Validation, explores how to program data validation for your enterprise service bus, covering how to filter messages from an input stream.

Chapter 3, Message Enrichment, looks at message enrichment, another important task for an enterprise service bus. Message enrichment is the process of incorporating additional information into the messages of your stream.

Chapter 4, Serialization, talks about how to build serializers and deserializers for writing, reading, or converting messages in binary, raw string, JSON, or AVRO formats.

Chapter 5, Schema Registry, covers how to validate, serialize, deserialize, and keep a history of versions of messages using the Kafka Schema Registry.

Chapter 6, Kafka Streams, explains how to obtain information about a group of messages – in other words, a message stream – and how to derive additional information, such as aggregations and compositions of messages, using Kafka Streams.

Chapter 7, KSQL, talks about how to manipulate event streams without a single line of code using SQL over Kafka Streams.

Chapter 8, Kafka Connect, talks about other fast data processing tools and how to make a data processing pipeline with them in conjunction with Apache Kafka. Tools such as Apache Spark and Apache Beam are covered in this chapter.

To get the most out of this book

The reader should have some experience of programming with Java 8.

The minimum configuration required for executing the recipes in this book is an Intel Core i3 processor, 4 GB of RAM, and 128 GB of disk space. Linux or macOS is recommended, as Windows is not fully supported.

Download the example code files

You can download the example code files for this book from your account at www.packt.com. If you purchased this book elsewhere, you can visit www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

1. Log in or register at www.packt.com.

2. Select the SUPPORT tab.

3. Click on Code Downloads & Errata.

4. Enter the name of the book in the Search box and follow the onscreen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR/7-Zip for Windows

Zipeg/iZip/UnRarX for Mac

7-Zip/PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Apache-Kafka-Quick-Start-Guide. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. Here is an example: "The --topic parameter sets the name of the topic; in this case, amazingTopic."

A block of code is set as follows:

{ "event": "CUSTOMER_CONSULTS_ETHPRICE", "customer": { "id": "14862768", "name": "Snowden, Edward", "ipAddress": "95.31.18.111" }, "currency": { "name": "ethereum", "price": "RUB" }, "timestamp": "2018-09-28T09:09:09Z"}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

dependencies {
    compile group: 'org.apache.kafka', name: 'kafka_2.12', version: '2.0.0'
    compile group: 'com.maxmind.geoip', name: 'geoip-api', version: '1.3.1'
    compile group: 'com.fasterxml.jackson.core', name: 'jackson-core', version: '2.9.7'
}

Any command-line input or output is written as follows:

> <confluent-path>/bin/kafka-topics.sh --list --zookeeper localhost:2181

Bold: Indicates a new term, an important word, or words that you see onscreen. For example, words in menus or dialog boxes appear in the text like this. Here is an example: "To differentiate among them, the events on t1 have one stripe, the events on t2 have two stripes, and the events on t3 have three stripes."

Warnings or important notes appear like this.
Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at [email protected].

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at [email protected] with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit authors.packtpub.com.

Reviews

Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.

Configuring Kafka

This chapter describes what Kafka is and the concepts related to this technology: brokers, topics, producers, and consumers. It also talks about how to build a simple producer and consumer from the command line, as well as how to install Confluent Platform. The information in this chapter is fundamental to the following chapters.

In this chapter, we will cover the following topics:

Kafka in a nutshell

Installing Kafka (Linux and macOS)

Installing the Confluent Platform

Running Kafka

Running Confluent Platform

Running Kafka brokers

Running Kafka topics

A command-line message producer

A command-line message consumer

Using kafkacat

Kafka in a nutshell

Apache Kafka is an open source streaming platform. If you are reading this book, perhaps you already know that Kafka scales very well horizontally without compromising speed or efficiency.

The Kafka core is written in Scala, and Kafka Streams and KSQL are written in Java. A Kafka server can run on several operating systems: Unix, Linux, macOS, and even Windows. As it usually runs in production on Linux servers, the examples in this book are designed to run on Linux environments. The examples in this book also assume a bash environment.

This chapter explains how to install, configure, and run Kafka. As this is a Quick Start Guide, it does not cover Kafka's theoretical details. At the moment, it is appropriate to mention these three points:

Kafka is a service bus: To connect heterogeneous applications, we need to implement a message publication mechanism to send and receive messages among them. A message router is known as a message broker. Kafka is a message broker, a solution for routing messages among clients in a quick way.

Kafka architecture has two directives: The first is to not block the producers (in order to deal with back pressure). The second is to isolate producers and consumers. The producers should not know who their consumers are; hence, Kafka follows the dumb broker and smart clients model.

Kafka is a real-time messaging system: Moreover, Kafka is a software solution with a publish-subscribe model: open source, distributed, partitioned, replicated, and commit-log-based.

There are some concepts and nomenclature in Apache Kafka:

Cluster: This is a set of Kafka brokers.

Zookeeper: This is a cluster coordinator—a tool with different services that are part of the Apache ecosystem.

Broker: This is a Kafka server, also the Kafka server process itself.

Topic: This is a queue (that has log partitions); a broker can run several topics.

Offset: This is an identifier for each message.

Partition: This is an immutable and ordered sequence of records continually appended to a structured commit log.

Producer: This is the program that publishes data to topics.

Consumer: This is the program that processes data from the topics.

Retention period: This is the time to keep messages available for consumption.
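To see how these pieces fit together in code, here is a minimal sketch using the Kafka Java client; it assumes a broker listening on localhost:9092 and reuses the illustrative topic name amazingTopic, so the connection details and names are examples rather than part of this book's projects:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class QuickTour {
  public static void main(String[] args) {
    // Producer: publishes a record to a topic; the broker appends it to one of the topic's partitions
    Properties producerProps = new Properties();
    producerProps.put("bootstrap.servers", "localhost:9092");
    producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    try (KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
      producer.send(new ProducerRecord<>("amazingTopic", "key1", "hello kafka"));
    }

    // Consumer: subscribes to the topic and reads records; each record's offset identifies it within its partition
    Properties consumerProps = new Properties();
    consumerProps.put("bootstrap.servers", "localhost:9092");
    consumerProps.put("group.id", "quickTourGroup");
    consumerProps.put("auto.offset.reset", "earliest");
    consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps)) {
      consumer.subscribe(Collections.singletonList("amazingTopic"));
      ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
      records.forEach(r ->
          System.out.printf("partition=%d offset=%d value=%s%n", r.partition(), r.offset(), r.value()));
    }
  }
}

The group.id ties the consumer to a consumer group, and auto.offset.reset set to earliest simply makes this sketch read the topic from the beginning.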

In Kafka, there are three types of clusters:

Single node–single broker

Single node–multiple broker

Multiple node–multiple broker
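As a rough illustration of the second layout, a single node–multiple broker cluster is usually built by copying server.properties once per broker and giving each copy its own id, port, and log directory; the file names and values below are assumed examples, not taken from this book:

# config/server-1.properties (assumed example values)
broker.id=1
listeners=PLAINTEXT://localhost:9093
log.dirs=/tmp/kafka-logs-1
zookeeper.connect=localhost:2181

# config/server-2.properties
broker.id=2
listeners=PLAINTEXT://localhost:9094
log.dirs=/tmp/kafka-logs-2
zookeeper.connect=localhost:2181

Each broker is then started separately, for example with bin/kafka-server-start.sh config/server-1.properties.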

In Kafka, there are three (and just three) ways to deliver messages:

Never redelivered: The messages may be lost because, once delivered, they are not sent again.

May be redelivered: The messages are never lost because, if a message is not received, it can be sent again.

Delivered once: The message is delivered exactly once. This is the most difficult form of delivery; since the message is only sent once and never redelivered, it implies that there is zero loss of any message.
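On the producer side, these delivery guarantees map to configuration. The following Java sketch shows typical property combinations; the values are illustrative, and end-to-end exactly-once delivery also involves transactions, which are beyond this quick overview:

import java.util.Properties;

public class DeliverySemanticsConfig {
  public static void main(String[] args) {
    // Roughly at-most-once: fire-and-forget, no acknowledgment and no retries
    Properties atMostOnce = new Properties();
    atMostOnce.put("acks", "0");
    atMostOnce.put("retries", "0");

    // At-least-once: wait for acknowledgment from all in-sync replicas and retry on failure (duplicates are possible)
    Properties atLeastOnce = new Properties();
    atLeastOnce.put("acks", "all");
    atLeastOnce.put("retries", "3");

    // Exactly-once on the producer side: idempotent writes, so retries do not create duplicates
    Properties exactlyOnce = new Properties();
    exactlyOnce.put("enable.idempotence", "true");
    exactlyOnce.put("acks", "all");

    // Any of these Properties objects would then be passed to new KafkaProducer<>(...)
  }
}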

The message log can be compacted in two ways:

Coarse-grained: Log compacted by time

Fine-grained: Log compacted by message
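Both behaviors are configured per topic. As a sketch, using the same <confluent-path> placeholder and local ZooKeeper as the commands in this chapter, and with illustrative topic names, time-based retention and key-based compaction could be set like this:

> <confluent-path>/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic timeRetainedTopic --config retention.ms=86400000

> <confluent-path>/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic compactedTopic --config cleanup.policy=compact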

Kafka installation

There are three ways to install a Kafka environment:

Downloading the executable files

Using brew (in macOS) or yum (in Linux)

Installing Confluent Platform

For all three ways, the first step is to install Java; we need Java 8. Download and install the latest JDK 8 from Oracle's website:

http://www.oracle.com/technetwork/java/javase/downloads/index.html

At the time of writing, the latest Java 8 JDK version is 8u191.

For Linux users, follow these steps:

1. Change the file mode to executable as follows:

> chmod +x jdk-8u191-linux-x64.rpm

2. Go to the directory in which you want to install Java:

> cd <directory path>

3. Run the rpm installer with the following command:

> rpm -ivh jdk-8u191-linux-x64.rpm

4. Add the JAVA_HOME variable to your environment. The following command writes the JAVA_HOME environment variable to the /etc/profile file:

> echo "export JAVA_HOME=/usr/java/jdk1.8.0_191" >> /etc/profile

5. Validate the Java installation as follows:

> java -version
java version "1.8.0_191"
Java(TM) SE Runtime Environment (build 1.8.0_191-b12)
Java HotSpot(TM) 64-Bit Server VM (build 25.191-b12, mixed mode)

At the time of writing, the latest Scala version is 2.12.6. To install Scala in Linux, perform the following steps:

1. Download the latest Scala binary from http://www.scala-lang.org/download.

2. Extract the downloaded file, scala-2.12.6.tgz, as follows:

> tar xzf scala-2.12.6.tgz

3. Add the SCALA_HOME variable to your environment as follows:

> export SCALA_HOME=/opt/scala

4. Add the Scala bin directory to your PATH environment variable as follows:

> export PATH=$PATH:$SCALA_HOME/bin

5. To validate the Scala installation, do the following:

> scala -version
Scala code runner version 2.12.6 -- Copyright 2002-2018, LAMP/EPFL and Lightbend, Inc.