Getting Started With Apache Kafka

Table of Contents

Introduction

Why Kafka

Real World Examples

Quick Setup

Reference

Introduction

This article will help readers understand what Kafka is, why it is used, and how it works. Apache Kafka is an open-source stream-processing software platform developed by the Apache Software Foundation and written in Scala and Java. Designed around the fundamental premise of enabling high-throughput, fault-tolerant, publish-subscribe messaging, Kafka has become an essential tool for handling real-time data feeds.

Main Concepts and Terminology

[Image: Kafka cluster architecture diagram]

The diagram above shows the Kafka cluster architecture. Its elements can be explained in the following way:

  Broker: a Kafka server that stores event data and serves client requests. A cluster consists of one or more brokers.

  Topic: a named stream of events to which data is written. Topics are split into partitions.

  Partition: an ordered, append-only log of events within a topic. Each event in a partition is assigned a sequential ID called an offset.

  Producer: a client application that publishes (writes) events to Kafka topics.

  Consumer: a client application that subscribes to (reads and processes) events from Kafka topics.

  ZooKeeper: the service that coordinates brokers and stores cluster metadata. (Newer Kafka releases can instead run in KRaft mode, which removes the ZooKeeper dependency.)

[Image: basic Kafka components and their interactions]

The diagram above illustrates the basic components and their interactions within a Kafka system: producers write events to topics hosted on brokers, and consumers read those events at their own pace, with ZooKeeper coordinating the cluster.
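
To connect this terminology to practice, here is a minimal sketch using the topic management tool that ships with Kafka (see the setup steps below before running it); the topic name "demo-events" and the counts are arbitrary examples:

# Create a topic spread across 3 partitions
# (replication factor 1 is enough for a single-broker cluster)
$ bin/kafka-topics.sh --create --topic demo-events --partitions 3 --replication-factor 1 --bootstrap-server localhost:9092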

Why Kafka

Apache Kafka is a powerful tool in data processing and streaming, favored for its ability to handle high volumes of data with high throughput and low latency. Its distributed architecture ensures scalability and fault tolerance, making it reliable for critical applications. Kafka facilitates real-time data processing with its stream processing capabilities and is versatile in handling various data formats. Additionally, its integration with a wide range of systems and strong community support make it a go-to choice for complex data architectures in various applications.

Real World Examples

This section lists only a few common use cases with real-world examples. For more Apache Kafka use cases, see the Use Cases page in the official documentation.

  1. Real-Time Data Processing:

Uber uses Kafka to process real-time data from its large numbers of drivers and riders. This helps in tracking trips, optimizing routes, and managing supply and demand dynamically.

  2. Log Aggregation:

    LinkedIn uses Kafka for log aggregation. It helps in collecting and processing logs from various services for monitoring, troubleshooting, and performance analysis.

  3. Stream Processing:

    Netflix uses Kafka Streams for real-time stream processing to provide personalized viewing recommendations and to analyze viewing patterns.

  4. Messaging:

    Cisco uses Kafka as a message broker in their networking systems for efficiently processing network telemetry data and enabling asynchronous communication between different services.

Quick Setup

Step 1: Download Kafka

Download the latest Kafka release and extract it:

$ tar -xzf kafka_2.13-3.6.0.tgz
$ cd kafka_2.13-3.6.0
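
If you do not already have the archive, you can fetch it from the Apache mirrors first; the exact version number may differ, and older releases are served from the Apache archive rather than this URL:

# Download the Kafka 3.6.0 release archive (adjust the version as needed)
$ wget https://downloads.apache.org/kafka/3.6.0/kafka_2.13-3.6.0.tgz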

Step 2: Set Up the Environment

NOTE: Your local environment must have Java 8+ installed.
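
You can verify the installed Java version with:

# Should report version 1.8 or later
$ java -version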

Run the following commands to start all services in the correct order:

# Start the ZooKeeper service
$ bin/zookeeper-server-start.sh config/zookeeper.properties

Open another terminal session and run:

# Start the Kafka broker service
$ bin/kafka-server-start.sh config/server.properties

Once all services have successfully launched, you will have a basic Kafka environment running and ready to use.
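
As an optional sanity check, you can confirm that the broker is reachable by querying it with one of the bundled CLI tools:

# A successful response listing supported API versions confirms the broker is up
$ bin/kafka-broker-api-versions.sh --bootstrap-server localhost:9092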

Step 3: Create a Topic

Before you can write your first data, you must create a topic. Open another terminal session and run:

$ bin/kafka-topics.sh --create --topic quickstart-events --bootstrap-server localhost:9092

All of Kafka’s command line tools have additional options: run the kafka-topics.sh command without any arguments to display usage information.
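
For example, you can list the topics in the cluster or inspect the partition layout of the topic you just created:

# List all topics in the cluster
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092

# Show the partition count, leader, and replicas of the new topic
$ bin/kafka-topics.sh --describe --topic quickstart-events --bootstrap-server localhost:9092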

Step 4: Write Data into the Topic

Run the console producer client to write a few events into your topic. By default, each line you enter will result in a separate event being written to the topic.

$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092
This is my first event
This is my second event

You can stop the producer client with Ctrl-C at any time.
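
Each event can also carry a key, which determines the partition the event is written to. As an optional variation, the console producer can parse a key from each input line; the ":" separator here is an arbitrary choice:

# Treat the text before ":" on each line as the event key and the rest as the value
$ bin/kafka-console-producer.sh --topic quickstart-events --bootstrap-server localhost:9092 --property parse.key=true --property key.separator=: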

Step 5: Read the Data

Open another terminal session and run the console consumer client to read the events you just created:

$ bin/kafka-console-consumer.sh --topic quickstart-events --from-beginning --bootstrap-server localhost:9092
This is my first event
This is my second event

You can stop the consumer client with Ctrl-C at any time.
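
In practice, consumers usually read as part of a consumer group so that the topic's partitions can be shared among several instances. As a small sketch, you can give the console consumer a group ID (the name "quickstart-group" is an arbitrary example) and then list the known groups:

# Consume as part of the consumer group "quickstart-group"
$ bin/kafka-console-consumer.sh --topic quickstart-events --group quickstart-group --bootstrap-server localhost:9092

# List all consumer groups known to the broker
$ bin/kafka-consumer-groups.sh --list --bootstrap-server localhost:9092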

Step 6: Terminate the Kafka Environment

Stop the producer and consumer clients with Ctrl-C, if you haven’t done so already.

Stop the Kafka broker with Ctrl-C.

Lastly, stop the ZooKeeper server with Ctrl-C.

If you also want to delete any data from your local Kafka environment, including any events you have created along the way, run the following command:

$ rm -rf /tmp/kafka-logs /tmp/zookeeper /tmp/kraft-combined-logs

Reference

For more detailed information, check Apache Kafka's official website and its official documentation.