Jek’s Kafka Get Started Notes
Terminology
Message → Batch
A message is a unit of data, stored as a key-value pair.
A batch is a collection of messages.
Topic → Partition
A topic is a category or feed name to which records (messages) are published. It is comparable to a database table.
A partition is one of the ordered commit logs that a topic is broken up into.
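To make the topic/partition relationship concrete, here is a toy in-memory model in Python. It is not how Kafka is implemented: the class name ToyTopic, the round-robin handling of keyless messages, and the CRC32-based key hashing are all assumptions chosen for illustration (Kafka's default Java partitioner hashes keys with murmur2). The only point is that each partition is an append-only, ordered log with its own offsets.

```python
import zlib
from typing import List, Optional, Tuple


class ToyTopic:
    """Toy model: a topic is a list of partitions, each an append-only log."""

    def __init__(self, name: str, num_partitions: int) -> None:
        self.name = name
        self.partitions: List[List[bytes]] = [[] for _ in range(num_partitions)]
        self._next = 0  # round-robin cursor for messages without a key

    def append(self, value: bytes, key: Optional[bytes] = None) -> Tuple[int, int]:
        """Append a message and return (partition, offset)."""
        if key is None:
            partition = self._next % len(self.partitions)
            self._next += 1
        else:
            # Same key -> same partition, so ordering per key is preserved.
            partition = zlib.crc32(key) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1  # offset = position within that log


topic = ToyTopic("testtopic", num_partitions=3)
p1, o1 = topic.append(b"abc", key=b"user-1")
p2, o2 = topic.append(b"xyz", key=b"user-1")
print(p1 == p2, o1, o2)  # prints: True 0 1
```

Ordering is only guaranteed inside a single partition, which is why messages that must stay in order usually share a key.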
Producer → Consumer
A producer is an application that publishes messages to a topic. Producers push messages.
A consumer is an application that subscribes to a topic and consumes its messages. Consumers pull messages.
Broker → Cluster
A broker is a single Kafka server; hence, a broker is also known as a Kafka server.
A cluster is a set of brokers working together; in this setup the cluster is coordinated by Zookeeper servers.
Zookeeper
Zookeeper manages the Kafka cluster: it keeps track of the brokers and stores cluster metadata.
Monitoring
- Producer rate
- Consumer rate
- Current lag: the difference between the last message produced and the last message processed by the consumer
- Offset information: shows how the queues are growing
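The lag definition above can be sketched in a few lines of Python. This is a minimal illustration, assuming we already have, per partition, the log-end offset (the offset the next produced message will get) and the consumer's committed offset. The function name consumer_lag and the sample numbers are hypothetical.

```python
from typing import Dict


def consumer_lag(log_end: Dict[int, int], committed: Dict[int, int]) -> Dict[int, int]:
    """Return per-partition lag = log-end offset - committed offset."""
    lag = {}
    for partition, end_offset in log_end.items():
        # A partition with no committed offset yet is treated as fully unread.
        done = committed.get(partition, 0)
        lag[partition] = max(end_offset - done, 0)
    return lag


# Hypothetical offsets for a 3-partition topic.
log_end = {0: 120, 1: 95, 2: 40}
committed = {0: 100, 1: 95}
lag = consumer_lag(log_end, committed)
print(lag, "total:", sum(lag.values()))  # prints: {0: 20, 1: 0, 2: 40} total: 60
```

A total lag that keeps growing over time suggests the consumers are not keeping up with the producers.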
To access Kafka metrics, we can use the Java Management Extensions (JMX) interface. The options are:
- Use a collection agent, i.e. a separate process that runs on the system and connects to the JMX interface. Examples of collection agents include jmxtrans.
- Use a JMX agent that runs directly in the Kafka process and exposes metrics over an HTTP connection. Examples of JMX agents are Jolokia and MX4J.
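As a rough sketch of the second option, the Python below builds a Jolokia "read" URL for one MBean attribute and, if an agent is running, fetches its value over HTTP. The base URL (Jolokia's usual JVM-agent default of port 8778), the MBean name, and the helper names jolokia_read_url and read_metric are assumptions for illustration; check them against your own broker setup.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen


def jolokia_read_url(base_url: str, mbean: str, attribute: str) -> str:
    """Build a Jolokia 'read' URL for one MBean attribute."""
    # Keep ':', '=' and ',' readable; they are part of the MBean name.
    return f"{base_url.rstrip('/')}/read/{quote(mbean, safe=':=,')}/{quote(attribute)}"


def read_metric(base_url: str, mbean: str, attribute: str, timeout: float = 5.0):
    """Fetch one attribute value over HTTP (needs a running Jolokia agent)."""
    with urlopen(jolokia_read_url(base_url, mbean, attribute), timeout=timeout) as resp:
        return json.load(resp)["value"]


# Assumed values: the agent URL and MBean name depend on your setup.
BASE = "http://localhost:8778/jolokia"
MBEAN = "kafka.server:type=BrokerTopicMetrics,name=MessagesInPerSec"
print(jolokia_read_url(BASE, MBEAN, "OneMinuteRate"))
```

Calling read_metric(BASE, MBEAN, "OneMinuteRate") would return the broker's incoming message rate, provided the Jolokia agent is attached to the Kafka process.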
Metrics
Broker-level Metrics
…
Topic-level Metrics
…
Partition-level Metrics
…
Install Apache Kafka
$ sudo apt update
$ sudo apt install openjdk-11-jdk
$ java -version
Install Java and verify Java is installed.
$ curl <download url of kafka...tgz> -o kafka.tgz
Download the distribution file and save it as kafka.tgz. The download URL can be found on the Apache Kafka downloads page.
$ mkdir kafka
$ cd kafka
$ tar -xvzf ~/Downloads/kafka.tgz --strip 1
Make a directory, change into it, and extract the downloaded archive there, stripping the first path level, because the first level is just the versioned folder. This assumes kafka.tgz was saved in ~/Downloads.
Start Apache Zookeeper server
$ bin/zookeeper-server-start.sh config/zookeeper.properties
Start Zookeeper server on localhost:2181 (default).
Start Apache Kafka server
$ bin/kafka-server-start.sh config/server.properties
Start Kafka server running on localhost:9092 (default).
Create Kafka Topic
$ bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --topic testtopic --partitions 3 --replication-factor 1
Create a Kafka topic with:
- --create
- --bootstrap-server [host:port] (i.e. the Kafka broker, localhost:9092 with default settings). Older Kafka versions take --zookeeper [host:port] instead (i.e. the Zookeeper server, localhost:2181 with default settings).
- --topic [string]
- --partitions [number]
- --replication-factor [number]
A decent reference example.
$ bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic testtopic
Describe the topic after successful creation to confirm.
$ bin/kafka-topics.sh --list --bootstrap-server localhost:9092
List all topics.
Push messages using Kafka Console Producer
$ bin/kafka-console-producer.sh --bootstrap-server localhost:9092 --topic testtopic
> abc
> xyz
>
Connect to the Kafka console producer and start pushing messages, e.g. abc, xyz, etc.
Pull messages using Kafka Console Consumer
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testtopic
Connect to the Kafka console consumer to pull messages from the end of the log, i.e. only messages produced after the consumer starts.
$ bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic testtopic --from-beginning
Connect to Kafka Console Consumer to pull messages from the beginning.
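The difference between the two consumer commands comes down to the starting position in the log. The Python below is a toy illustration with a plain list standing in for one partition; the poll helper is hypothetical and is not part of any Kafka client.

```python
from typing import List, Tuple


def poll(log: List[str], position: int, max_records: int = 10) -> Tuple[List[str], int]:
    """Pull up to max_records starting at position; return (records, new_position)."""
    records = log[position:position + max_records]
    return records, position + len(records)


log = ["abc", "xyz"]              # messages already in the partition

from_beginning = 0                # like --from-beginning: start at the first offset
from_end = len(log)               # default: start after the last existing message

print(poll(log, from_beginning))  # prints: (['abc', 'xyz'], 2)
print(poll(log, from_end))        # prints: ([], 2)

log.append("new")                 # a producer pushes one more message
print(poll(log, from_end))        # prints: (['new'], 3)
```

The consumer drives the loop by asking for records from its current position, which is the pull model described in the terminology section.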
Note: Kafka does not store messages forever. Messages are deleted after a configured amount of time, or when the size of the log exceeds the configured maximum size. The default log retention period is 7 days (168 hours).
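A minimal sketch of time-based retention, assuming a log represented as (timestamp in milliseconds, value) pairs. This is a per-message simplification: Kafka itself removes whole log segments rather than individual messages, and the function name apply_time_retention is hypothetical.

```python
from typing import List, Tuple

HOUR_MS = 60 * 60 * 1000


def apply_time_retention(
    log: List[Tuple[int, str]], now_ms: int, retention_hours: int = 168
) -> List[Tuple[int, str]]:
    """Keep only (timestamp_ms, value) entries inside the retention window."""
    cutoff_ms = now_ms - retention_hours * HOUR_MS
    return [(ts, value) for ts, value in log if ts >= cutoff_ms]


DAY_MS = 24 * HOUR_MS
log = [(1 * DAY_MS, "old"), (5 * DAY_MS, "recent"), (9 * DAY_MS, "newest")]
kept = apply_time_retention(log, now_ms=10 * DAY_MS)
print([value for _, value in kept])  # prints: ['recent', 'newest']
```

With the 168-hour default, the message from day 1 falls outside the window on day 10, so a consumer started with --from-beginning would no longer see it.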
Message Structure
- Timestamp
- Offset number (unique within a partition)
- Key (optional)
- Value (sequence of bytes)
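The four fields above map naturally onto a small record type. The Python dataclass below is only an illustration of that structure; the class name Record and its field names are assumptions, not Kafka's wire format or any client library's type.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    timestamp_ms: int            # message timestamp, in milliseconds
    offset: int                  # position of the message within its partition
    value: bytes                 # payload: an opaque sequence of bytes
    key: Optional[bytes] = None  # optional key


record = Record(timestamp_ms=1_700_000_000_000, offset=0, value="abc".encode("utf-8"))
print(record.key is None, record.value.decode("utf-8"))  # prints: True abc
```

Because the value is just bytes, producers and consumers have to agree on how to encode and decode it (UTF-8 text in this example).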
Start Zookeeper Server and Kafka Server in macOS (installed via Homebrew)
$ /usr/local/bin/zkServer start
$ /usr/local/bin/kafka-server-start /usr/local/etc/kafka/server.properties
$ zookeeper-shell localhost:2181 ls /brokers/ids
Start the Zookeeper server. Next, start the Kafka server. Finally, check that the broker IDs are available.
- Binaries and scripts will be in /usr/local/bin.
- Kafka configurations will be in /usr/local/etc/kafka.
- ZooKeeper configurations will be in /usr/local/etc/zookeeper.
- The log.dirs configuration (the location for Kafka data) will be set to /usr/local/var/lib/kafka-logs.
From ref.
Stop Zookeeper Server in macOS (installed via Homebrew)
$ /usr/local/bin/zkServer stop
Stop zookeeper server.
Java Example
…
Node.js Example
…
Python Example
…