Jek’s Kafka Get Started Notes

CHOO Jek Bao
Jul 1, 2022

Terminology

Message → Batch

A message is a unit of data, stored as a key-value pair.

A batch is a collection of messages produced to the same topic and partition.
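To make the message/batch relationship concrete, here is a minimal sketch that groups messages into fixed-size batches. The names make_batches and batch_size are hypothetical and not part of any Kafka client API; real producers decide batch boundaries by size in bytes and by time, so this only illustrates the idea.

```python
from typing import List, Optional, Tuple

# A message is modelled here as a (key, value) pair; the key may be None.
Message = Tuple[Optional[bytes], bytes]


def make_batches(messages: List[Message], batch_size: int) -> List[List[Message]]:
    """Split messages into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [messages[i:i + batch_size] for i in range(0, len(messages), batch_size)]


messages = [(b"user-1", b"abc"), (None, b"xyz"), (b"user-2", b"def")]
batches = make_batches(messages, batch_size=2)
```

Sending a batch costs one network round trip instead of one per message, which is the reason batching exists.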

Topic → Partition

A topic is a category or feed name to which records/messages are published. It is comparable to a database table.

A topic is broken up into partitions. Each partition is an ordered, append-only commit log.
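A rough sketch of how a message could be assigned to a partition, assuming key-based hashing. Kafka's default partitioner uses a murmur2 hash; CRC32 is swapped in here only because it ships with Python, so the partition numbers will not match a real cluster. The function name choose_partition and the counter argument are hypothetical, and rotating keyless messages is a simplification.

```python
import zlib
from typing import Optional


def choose_partition(key: Optional[bytes], num_partitions: int, counter: int = 0) -> int:
    """Pick a partition index in the range [0, num_partitions)."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if key is None:
        # No key: rotate through the partitions (simplified).
        return counter % num_partitions
    # Same key, same hash, same partition, so per-key ordering is preserved.
    return zlib.crc32(key) % num_partitions


first = choose_partition(b"user-1", num_partitions=3)
second = choose_partition(b"user-1", num_partitions=3)
keyless = choose_partition(None, num_partitions=3, counter=4)
```

The point of hashing the key is that all messages with the same key land in the same partition and are therefore read back in the order they were written.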

Producer → Consumer

A producer is an application that publishes messages to a topic. Producers push messages.

A consumer is an application that subscribes to a topic and consumes its messages. Consumers pull messages.
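The push/pull model can be sketched with an in-memory stand-in for a single partition. The PartitionLog class below is hypothetical and does not talk to Kafka; it only shows that the producer appends, while the consumer keeps its own position and asks for records from that offset onward.

```python
from typing import List, Tuple


class PartitionLog:
    """In-memory stand-in for one partition: an append-only list."""

    def __init__(self) -> None:
        self._records: List[bytes] = []

    def append(self, value: bytes) -> int:
        """Producer side: push a record and return the offset it was given."""
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> List[Tuple[int, bytes]]:
        """Consumer side: pull up to max_records starting at offset."""
        if offset < 0:
            raise ValueError("offset must not be negative")
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, value) for i, value in enumerate(chunk)]


log = PartitionLog()
log.append(b"abc")
log.append(b"xyz")

position = 0                 # the consumer tracks its own position
fetched = log.read(position)
position += len(fetched)     # advance past what was consumed
```

Because the consumer owns its position, reading does not remove anything from the log, and another consumer can read the same records independently.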

Broker → Cluster

A broker is a single Kafka server, so the two terms are used interchangeably.

A cluster is a set of brokers working together. In this setup, the brokers are coordinated by one or more Zookeeper servers.

Zookeeper

Zookeeper manages the Kafka cluster: it keeps cluster metadata such as which brokers are alive.

Monitoring

  • Producer rate
  • Consumer rate
  • Current lag: the difference between the last message produced and the last message processed by the consumer
  • Offset information: shows how fast the queue is growing
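As a worked example of the lag metric above, here is a minimal sketch that computes lag per partition from two offset maps. The function name consumer_lag and the sample numbers are made up for illustration, and treating a partition with no committed offset as offset 0 is an assumption of this sketch.

```python
from typing import Dict


def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log end offset minus the consumer's committed offset."""
    return {
        partition: max(end - committed_offsets.get(partition, 0), 0)
        for partition, end in log_end_offsets.items()
    }


end_offsets = {0: 120, 1: 95, 2: 40}   # next offset to be written, per partition
committed = {0: 100, 1: 95}            # partition 2 has no committed offset yet
lag = consumer_lag(end_offsets, committed)
total_lag = sum(lag.values())
```

A lag that keeps growing over time means the consumer is falling behind the producer rate.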

To access Kafka metrics, we can use the Java Management Extensions (JMX) interface. The options are:

  1. Use a collection agent, which may be a separate process that runs on the system and connects to the JMX interface. Examples of collection agents include jmxtrans.
  2. Use a JMX agent that runs directly in the Kafka process and exposes metrics over an HTTP connection. Examples of JMX agents are Jolokia and MX4J.
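For option 2, a rough sketch of reading one metric over HTTP, assuming a Jolokia agent is attached to the broker. The base URL (http://localhost:8778/jolokia), the chosen MBean and attribute, and the function names are all assumptions for illustration; check them against your own agent setup. Only the URL is built here; fetch_metric is defined but not called.

```python
import json
import urllib.request

# Assumed values: adjust to match how the Jolokia agent is configured.
JOLOKIA_BASE = "http://localhost:8778/jolokia"
MESSAGES_IN = "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec"


def jolokia_read_url(base_url: str, mbean: str, attribute: str) -> str:
    """Build a Jolokia 'read' URL for one attribute of one MBean."""
    return f"{base_url.rstrip('/')}/read/{mbean}/{attribute}"


def fetch_metric(url: str, timeout: float = 5.0):
    """Fetch the URL and return the 'value' field of the JSON response."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)["value"]


url = jolokia_read_url(JOLOKIA_BASE, MESSAGES_IN, "OneMinuteRate")
```

With a broker and agent running, fetch_metric(url) would return the one-minute message-in rate under these assumptions.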

Metrics

Broker-level Metrics

Topic-level Metrics

Partition-level Metrics

Install Apache Kafka

$ sudo apt update
$ sudo apt install openjdk-11-jdk
$ java -version

Install Java and verify Java is installed.

$ curl <download url of kafka...tgz> -o kafka.tgz

Download the distribution file and save it as kafka.tgz. The download URL can be found on the Apache Kafka downloads page.

$ mkdir kafka
$ cd kafka
$ tar -xvzf ~/Downloads/kafka.tgz --strip-components 1

Make a directory, go into it, and extract the downloaded archive there, stripping the first path level because it is just the version folder. This assumes kafka.tgz was saved in ~/Downloads.

Start Apache Zookeeper server

$ bin/zookeeper-server-start.sh config/zookeeper.properties

Start Zookeeper server on localhost:2181 (default).

Start Apache Kafka server

$ bin/kafka-server-start.sh config/server.properties

Start Kafka server running on localhost:9092 (default).

Create Kafka Topic

$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic testtopic --partitions 3 --replication-factor 1

Create a Kafka topic with:

  • --create
  • --bootstrap-server [host:port]: address of a Kafka broker, localhost:9092 with default settings. Older Kafka versions take --zookeeper [host:port] instead, localhost:2181 with default settings.
  • --topic [string]
  • --partitions [number]
  • --replication-factor [number]
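If the topic needs to be created from a script, the flags listed above can be assembled programmatically. This is a minimal sketch: the helper name create_topic_command is hypothetical, and it only builds and prints the command rather than running it.

```python
import shlex
from typing import List


def create_topic_command(topic: str, partitions: int, replication_factor: int,
                         bootstrap_server: str = "localhost:9092") -> List[str]:
    """Build the kafka-topics.sh argument list for creating a topic."""
    return [
        "bin/kafka-topics.sh", "--create",
        "--bootstrap-server", bootstrap_server,
        "--topic", topic,
        "--partitions", str(partitions),
        "--replication-factor", str(replication_factor),
    ]


cmd = create_topic_command("testtopic", partitions=3, replication_factor=1)
command_line = shlex.join(cmd)
print(command_line)
```

To actually run it, the list could be passed to subprocess.run(cmd, check=True) from the Kafka installation directory, with the broker already started.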

A decent reference example.

$ bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic testtopic

Describe the topic after successful creation to confirm.

$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092

List all topics.

Push messages using Kafka Console Producer

$ bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic testtopic
> abc
> xyz
>

Connect to the Kafka Console Producer to begin pushing messages, e.g. abc, xyz, etc.

Pull messages using Kafka Console Consumer

$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testtopic

Connect to the Kafka Console Consumer to pull messages from the end of the log, i.e. only messages produced after the consumer starts.

$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testtopic --from-beginning

Connect to Kafka Console Consumer to pull messages from the beginning.
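The difference between the two consumer commands comes down to where a new consumer starts reading. Here is a minimal sketch with a plain list standing in for one partition; the function name starting_offset is hypothetical.

```python
from typing import List


def starting_offset(log_start_offset: int, log_end_offset: int,
                    from_beginning: bool) -> int:
    """Return the offset a new consumer would start reading from."""
    return log_start_offset if from_beginning else log_end_offset


records: List[str] = ["abc", "xyz"]      # already in the partition
start, end = 0, len(records)

seen_from_beginning = records[starting_offset(start, end, from_beginning=True):]
seen_from_end = records[starting_offset(start, end, from_beginning=False):]
```

Starting from the end, the consumer sees nothing until a producer pushes a new message.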

Note: Kafka does not store messages forever. Messages are deleted after a configured amount of time, or when the size of the log exceeds the configured maximum. The default log retention period is 7 days (168 hours).
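A simplified sketch of time-based retention, assuming each record carries a timestamp in seconds. The function name apply_retention and the sample timestamps are hypothetical. Kafka itself deletes whole log segments rather than individual messages, so this only illustrates the cutoff rule.

```python
from typing import List, Tuple

RETENTION_HOURS = 168  # the 7-day default mentioned above


def apply_retention(records: List[Tuple[float, bytes]], now: float,
                    retention_hours: float = RETENTION_HOURS) -> List[Tuple[float, bytes]]:
    """Keep only records whose timestamp falls within the retention window."""
    cutoff = now - retention_hours * 3600
    return [(ts, value) for ts, value in records if ts >= cutoff]


now = 1_000_000.0
records = [
    (now - 8 * 24 * 3600, b"old"),   # 8 days old: outside the window
    (now - 3600, b"recent"),         # 1 hour old: kept
]
kept = apply_retention(records, now)
```

This is why a consumer started with --from-beginning may not see every message ever produced.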

Message Structure

  • Timestamp
  • Offset number (unique within a partition)
  • Key (optional)
  • Value (sequence of bytes)
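The four fields above map naturally onto a small record type. This is an illustrative sketch only; the class name Record and its field names are hypothetical and do not mirror any particular client library.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    timestamp_ms: int             # milliseconds since the Unix epoch
    offset: int                   # position within its partition
    value: bytes                  # payload as a sequence of bytes
    key: Optional[bytes] = None   # optional key


record = Record(timestamp_ms=1656633600000, offset=0, value=b"abc")
keyed = Record(timestamp_ms=1656633600001, offset=1, value=b"xyz", key=b"user-1")
```

Keeping the value as raw bytes reflects that Kafka does not interpret the payload; serialization is left to the producer and consumer.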

Start Zookeeper Server and Kafka Server on macOS (installed via Homebrew)

$ /usr/local/bin/zkServer start
$ /usr/local/bin/kafka-server-start /usr/local/etc/kafka/server.properties
$ zookeeper-shell localhost:2181 ls /brokers/ids

Start the Zookeeper server. Next, start the Kafka server. Finally, check that the broker IDs are available.

  • Binaries and scripts will be in /usr/local/bin.
  • Kafka configurations will be in /usr/local/etc/kafka.
  • ZooKeeper configurations will be in /usr/local/etc/zookeeper.
  • The log.dirs configuration (the location for Kafka data) will be set to /usr/local/var/lib/kafka-logs.

From ref.

Stop Zookeeper Server on macOS (installed via Homebrew)

$ /usr/local/bin/zkServer stop

Stop the Zookeeper server.

Java Example

Node.js Example

Python Example
