Description

Today's network is about agility, automation, and continuous improvement. In Kafka Up and Running for Network DevOps, we will be on a journey to learn and set up the hugely popular Apache Kafka data messaging system. Kafka is unique in treating network data as a continuous flow of information that can adapt to ever-changing business requirements. Whether you need a system to aggregate log messages, collect metrics, or something else, Kafka can be the reliable, highly redundant system you want.


We will begin by learning the core concepts of Kafka, followed by detailed steps for setting up a Kafka system in a lab environment. For the production environment, we will take advantage of the various public cloud provider offerings, starting with Amazon Managed Streaming for Apache Kafka (Amazon MSK) to host our Kafka cluster in the AWS cloud. We will also learn about AWS Kinesis, Azure Event Hub, and Google Cloud Pub/Sub. Finally, the book will illustrate several use cases for integrating Kafka with our network, from data enhancement and monitoring to event-driven architecture.


The Network DevOps Series is a series of books aimed at the next generation of network engineers who want to take advantage of the powerful tools and projects in modern software development and the open-source communities.




Kafka Up and Running for Network DevOps

Set Your Network Data in Motion

 

Eric Chou

 

This book is for sale at http://leanpub.com/network-devops-kafka-up-and-running

This version was published on 2021-11-12

*   *   *   *   *

This is a Leanpub book. Leanpub empowers authors and publishers with the Lean Publishing process. Lean Publishing is the act of publishing an in-progress ebook using lightweight tools and many iterations to get reader feedback, pivot until you have the right book and build traction once you do.

*   *   *   *   *

© 2021 Network Automation Nerds, LLC.

ISBN for EPUB version: 978-1-957046-01-3

ISBN for MOBI version: 978-1-957046-02-0

For my family, you are my ‘why’ for everything I do.

I would like to thank the open-source software community. My life would be very different without the many dedicated, talented individuals in the open-source community. Thank you all.

Table of Contents

Introduction

What is Kafka

Why do we need Kafka

Prerequisites for this book

Who this book is for

What this book covers

Download the example code files

Conventions used

Get in touch

Chapter 1. Kafka Introduction

History of Kafka

Kafka Use Cases

Disadvantages of Kafka

Kafka Concepts

Conclusion

Chapter 2. Kafka Installation and Testing

Network Lab Setup

Kafka Installation Overview

Install Java

Download Kafka

Configure Zookeeper

Configure Kafka

Start Zookeeper and Kafka manually

Test the Kafka operations

Configure System Services

Conclusion

Chapter 3. Kafka Concepts and Examples

Producers: Writing Messages

Consumers: Receiving Messages

Offsets in Action

Kafka Topic Administration

Replication

Conclusion

Chapter 4. Hosted Kafka Services

AWS Managed Kafka Service

Amazon MSK Costs

Launch Amazon MSK Cluster

Client Setup

Produce and Consume Data

Conclusion

Chapter 5. Cloud Provider Messaging Services

Amazon Kinesis

Amazon Kinesis Example

Azure Event Hub

Azure Event Hub Example

Google Cloud Pub/Sub

GCP Pub/Sub Python Example

Conclusion

Chapter 6. Network Operations with Kafka

Install Docker

Install Elasticsearch

Install Kibana

Network Data Feed

Network Data Pipeline

Network Log as a Service

Conclusion

Chapter 7. Other Kafka Considerations and Looking Ahead

Hardware Considerations

Kafka Broker and Topic Configurations

Schema Registry

Kafka Stream Processing

Cross-Cluster Data Mirroring

Additional Resources

Conclusion

Appendix A. Installing Lab Instance in Public Cloud


Introduction

Welcome to the world of data!

Unless you have been living under a rock for the last few years, you know that data processing, machine learning, and artificial intelligence are taking over the world. Data exists everywhere around us. We can now check real-time traffic information from online cameras before we even leave the house. We can connect to our thermostats remotely to adjust the house temperature automatically. Better yet, the thermostats can be self-learning so that they adjust the temperature all by themselves. Before our family's weekend movie nights, my kids love to use the WiFi-enabled lights to match the lighting to our mood.

How are these cameras, lights, and thermostats able to take measurements and generate data? It turns out the cost of small sensors and tiny computing units has been coming down steadily since the early days, and they can now be integrated into everyday items. However, the data generated by one or two devices might not be sufficient to yield meaningful results. After all, traffic information on one street might only benefit the tiny fraction of people who travel on that street, but aggregated traffic information on all streets can help everyone. Generally, it is by aggregating dispersed data sets across hundreds of devices that we are able to derive useful information that helps us in our daily lives. This data is constantly flowing between producers and consumers.

Have you ever wondered how this data is exchanged between data producers and consumers? Does each device provide an API (Application Programming Interface) to be queried? Does each have a local database that persists the data? What about data integrity, transmission latency, or scalability?

There are many tools and projects that address these data streaming and exchange issues. One of the most popular open-source tools, widely used by companies large and small alike, is Apache Kafka.

What is Kafka

You might be thinking, “Don’t we already have lots of data storage systems? Why do we need yet another storage system?” You are right, we do have many storage solutions: relational and non-relational databases, cache systems, big data storage clusters, search solutions, and more. But in most data storage cases, the data is entered once, stored in the database, and retrieved later when needed. For example, when I visited my dentist for the first time, they asked for my personal information and entered it into a database so that they could pull up my record on future visits. This is very different from the traffic sensor data example that we discussed.

What sets Kafka apart is that it was built from the ground up to treat data as continuous flows of information that are constantly being produced, enhanced, manipulated, and consumed. Instead of focusing on holding data at rest, as databases, key-value stores, search indexes, and caches do, Kafka is architected as a system that allows data to be a continually evolving stream of information.

According to the Apache Kafka project page:

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Companies known for handling large amounts of data, such as Airbnb, Datadog, Etsy, and many others across different industries, use Kafka to build their data pipelines. These pipelines connect a variety of services that both produce and consume data in a continuous fashion.

Figure Intro. 1: Powered by Apache Kafka (https://kafka.apache.org/powered-by)

Don’t worry if you have not heard of Kafka before or are not sure how this tool can help us as network DevOps engineers. We will go a lot deeper into Kafka in this book.

Why do we need Kafka

As a general overview, there are many use cases for Kafka in network engineering:

- We can use Kafka to stream data, such as logs and NetFlow records, once and have it consumed by multiple receivers (see the producer sketch after this list). Kafka takes care of ordering the messages, acknowledging receipt to producers, confirming delivery to consumers, and balancing the data between different recipients.
- We can separate data into logical partitions called Topics within a single Kafka cluster. This allows subscribers to receive only the data they are interested in, so a log receiver does not need to receive flow data.
- Kafka allows for an event-driven architecture, where actions are triggered by particular types of events. For example, a log receiver can page an on-call engineer if it notices that a BGP neighbor of a core device went down.
- Kafka allows us to build a centralized pipeline for network data processing instead of having dispersed teams process bits and pieces of data separately.
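To make the first use case concrete, below is a minimal sketch of a producer publishing one syslog-style message to a topic. It uses the third-party kafka-python client, one of several Python clients for Kafka; the broker address localhost:9092, the topic name network-logs, and the message contents are assumptions for illustration only.

from kafka import KafkaProducer

# Connect to a single-broker lab cluster (the address is an assumption).
producer = KafkaProducer(bootstrap_servers="localhost:9092")

# Publish one syslog-style message to a hypothetical 'network-logs' topic.
# Kafka handles ordering, acknowledgment, and fan-out to all subscribers.
producer.send("network-logs", b"%BGP-5-ADJCHANGE: neighbor 10.0.0.2 Down")

# Block until the buffered message is actually delivered to the broker.
producer.flush()

Once the message is in the topic, any number of receivers can read it independently, without the producer ever needing to know who they are.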

These are just some of the use cases for Kafka. By the end of this book, I am sure we will be able to find many more creative use cases.

Prerequisites for this book

Basic knowledge of the Linux command line is required to make the most out of this book. We will use command-line tools such as cd to change directories, ls to list directory contents, and pwd to show where in the directory tree we are currently operating.

We will be using Python 3 as the programming language in this book. Python is a popular language amongst network engineers, with a large ecosystem of tools and libraries. We will use Python to create Kafka producers and consumers and to interface with public cloud providers. However, I do not believe you need to be an expert in Python 3 to understand the scripts in this book. If you need a refresher on Python, a good place to start is the official Python Tutorial.

Who this book is for

This book is ideal for IT professionals and engineers who want to take advantage of Kafka’s distributed, fault-tolerant streaming data platform. It can also be used by management to gain a general understanding of Kafka and how it fits into the broader IT infrastructure.

What this book covers

Chapter 1. Kafka Introduction, In this chapter, we will cover the general concepts of Kafka: the core architecture, components, and tools. We will look at the idea behind Kafka, how it was built, and how its components help maintain data streams at scale.

Chapter 2. Kafka Installation and Testing, In this chapter, we will install Zookeeper and Kafka on a single virtual machine and configure both components. We will also prepare our network lab for use in future examples. After installation, we will work through a few producer-consumer examples using the Kafka command-line tools.

Chapter 3. Kafka Concepts and Examples, In this chapter, we will provide examples of Kafka usage for producers and consumers. The producers will write messages to a Topic, with consumers receiving the messages. We will look at examples of offsets, commits, and acknowledgments for data in the topics.

Chapter 4. Hosted Kafka Services, When we want to move Kafka from our lab setup into production, we can use the hosted Kafka services provided by various cloud providers, such as Amazon AWS or Confluent Cloud. In this chapter, we will provide a step-by-step guide to launching a Kafka cluster using Amazon Managed Streaming for Apache Kafka (Amazon MSK).

Chapter 5. Cloud Provider Messaging Services, If we are not ready for a managed Kafka cluster, the top public cloud providers, Amazon AWS, Microsoft Azure, and Google Cloud, offer their own message streaming services with various degrees of Kafka compatibility. In this chapter, we will look at examples of AWS Kinesis, Azure Event Hub, and Google Cloud Pub/Sub.

Chapter 6. Network Operations with Kafka, In this chapter, we will explore examples of Kafka in network engineering. We will look at data feeds, data enhancement, and Kafka Connect, which lets us reuse code provided by the community. We will look at the File and Elasticsearch Kafka Connect plugins.

Chapter 7. Other Kafka Considerations and Looking Ahead, In this chapter, we will discuss other Kafka considerations, such as hardware requirements, Broker and Topic configurations, the Schema Registry, and more. The chapter also provides additional resources for readers who want to explore Kafka further.

Download the example code files

The code examples used in this book can be downloaded from GitHub at https://github.com/ericchou1/network-devops-kafka-up-and-running.

Conventions used

There are a number of text conventions used in this book to help organize the flow. Information in bold and italics is used to indicate important or special terms.

Code blocks are shown below:

print('hello world')

Command-line input or output will be shown as follows:

$ touch my_script.py

$ ls /
bin   cdrom  etc   lib    lib64   lost+found  mnt  proc  run   snap  swapfile  tmp  var
boot  dev    home  lib32  libx32  media       opt  root  sbin  srv   sys       usr

$ python
Python 3.8.10 (default, Jun 2 2021, 10:49:15)
[GCC 9.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print('hello world')
hello world
>>> exit()

Warning, tips, and information will be specified in their own special block:

This is a tip section. It will include useful tips and tricks in relation to the topic discussed at hand.

This is an information section. It will provide additional information to help you explore the topic further.

This is a warning blurb. Please pay special attention to these sections when they appear, as they contain important warnings.

Get in touch

Feedback from our readers is always welcome and appreciated. Please consider leaving a review on the various platforms; reviews can really help others discover the book.

All feedback can be submitted to [email protected].

Chapter 1. Kafka Introduction

As mentioned in the introduction section, Apache Kafka is a high-throughput, low-latency platform for handling real-time data feeds.

At first glance, ‘low-latency, high-throughput for real-time data feeds’ might not seem like much. After all, every open-source project and commercial vendor (and their brother) can claim to be low-latency and high-throughput. But once you consider the type of companies using Kafka in their products and services, such as Uber, Netflix, and LinkedIn, you quickly realize how significant that claim is. When we click the like button on a LinkedIn post, it needs to appear on the post right away. That is low latency. If we consider how many Netflix movies are streaming every second, that is high throughput. Of course, the customers of these companies expect all of these operations to take place in real time.

According to Netflix’s Kafka Inside Keystone Pipeline post, “700 billion messages are ingested on an average day” by their 400+ Kafka brokers. Did they say they process 700 billion messages a day in real time? Let’s also consider Uber’s use case, Real-Time Exactly-Once Ad Event Processing, for the two-sided UberEats marketplace. There, the messages need to be fast and reliable, but Uber also needs to ensure that events are processed only once, with no overcounting or undercounting. The events need to be processed exactly once across all the consumers, full stop.

Kafka is excellent at achieving its goals for these demanding projects. But how did this fantastic tool come about? First, let’s look into the history of Kafka.

History of Kafka

Kafka was originally developed at LinkedIn by Jay Kreps, Neha Narkhede, and Jun Rao (Wikipedia). As the story goes, Jay Kreps named the project after the author Franz Kafka because he liked Kafka’s work, and because Apache Kafka is “a system optimized for writing.”

The project was released as open source through the Apache Software Foundation in early 2011 and went from incubation to a top-level Apache project on October 23, 2012. It is written in Java and Scala and has significant community backing.

The three original developers left LinkedIn and founded the company Confluent in 2014. The company aims to Set Data in Motion, with (surprise!) Kafka at the center of that idea. As a result, many of the Kafka-related projects, documentation, products, and initiatives are actively developed and sponsored by Confluent.

Kafka Use Cases

At the center of the Kafka architecture is the idea of event streaming. Software systems drive our world. These systems are interconnected, always on, and automated. Kafka provides a centralized middle ground for these systems to exchange information, or events, in the form of topics (or categories). Producer systems send events to a particular topic, while consumer systems receive these events via subscription.

We will use the terms events and messages interchangeably in this book to refer to the data being exchanged by producers, consumers, and Kafka.
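As a brief sketch of the subscription side of this model, again using the third-party kafka-python client with an assumed lab broker at localhost:9092 and the hypothetical network-logs topic from earlier, a consumer system might receive events like this:

from kafka import KafkaConsumer

# Subscribe to the hypothetical 'network-logs' topic on the lab broker.
consumer = KafkaConsumer(
    "network-logs",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",  # start from the oldest retained event
)

# The consumer object is an iterator that blocks while waiting for events.
for event in consumer:
    print(event.topic, event.offset, event.value)

Note that the consumer never talks to the producer directly; the topic is the only point of contact between the two sides.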

In the words of the Kafka documentation, event streaming is analogous to the central nervous system of the human body, connecting the different parts of the body.

In terms of network engineering, we can, in my opinion, use Kafka event streaming in a few different scenarios:

- We can use Kafka to process transactions in real time, such as tracking device provisioning from warehouse shipment to fully functional in a data center.
- We can use Kafka to implement an event-driven architecture. Kafka can be used to track and analyze changes in network events, such as BGP neighbor relationships or interface flapping.
- We can use Kafka to capture and analyze IoT and wireless sensor data continuously. This can be done in a distributed fashion, with Kafka servers across different regions.
- We can use Kafka to connect, store, and make available data produced by a single source to multiple destinations. An example would be storing a single set of network SNMP data in a Kafka topic that multiple monitoring systems can consume (a sketch of this pattern follows at the end of this section). This allows us to poll each network device only once, reducing CPU and network overhead.

If we combine the above use cases, Kafka allows us to:

- Continuously capture events
- Connect different parts of the system
- React immediately to a change in system state
- Minimize the impact on the network devices
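To illustrate the “poll once, consume many times” idea from the last scenario above, here is a hedged sketch in which two independent consumer groups read the same topic, each receiving its own full copy of every event. The topic name snmp-data, the group names, and the broker address are hypothetical, and the client is again the third-party kafka-python library.

from kafka import KafkaConsumer

def build_consumer(group_id):
    # Each distinct group_id receives its own full copy of every event,
    # so several monitoring systems can share one SNMP poll of the devices.
    return KafkaConsumer(
        "snmp-data",                         # hypothetical topic name
        bootstrap_servers="localhost:9092",  # assumed lab broker
        group_id=group_id,
        auto_offset_reset="earliest",
    )

for group in ("dashboard-system", "alerting-system"):
    consumer = build_consumer(group)
    records = consumer.poll(timeout_ms=1000)  # fetch whatever is available
    for messages in records.values():
        for message in messages:
            print(group, message.value)
    consumer.close()

Because the groups track their offsets independently, adding a third monitoring system is just a matter of choosing a new group_id; the network devices themselves are never polled again.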

We will look at some of the disadvantages of Kafka in the next section.

Disadvantages of Kafka

If Kafka is so great, why doesn’t everybody use it? Of course, no system is perfect. Like many, if not all, system design approaches, the design of Kafka is a story of tradeoffs. What are some of the disadvantages of Kafka? Let’s take a look at a few of them:

Kafka clusters can be complex and hard to set up, and managing a Kafka cluster can have a high learning curve.