Kafka Basics and ZooKeeper vs KRaft
Let’s walk through the basics: what Kafka is, how its core pieces fit together, and what this whole "ZooKeeper vs KRaft" thing means. No deep dives into distributed systems theory required. Just enough to give you a solid foundation, especially if you’re a developer still getting your feet wet with backend architecture.
Linas Kapočius
Solutions Architect at Corgineering.com
Apache Kafka is a distributed event streaming platform. Which is a fancy way of saying: it helps systems send, receive, and store streams of data in real time.
Imagine your app is a kitchen. Orders (data) are coming in nonstop. You want to keep track of them, process them efficiently, and make sure nothing gets lost. Kafka is like the conveyor belt system that moves those orders from the front counter to the kitchen and beyond, while also logging everything that happened for later review.
You’ll typically hear about:
- Producers – These are the apps or services sending data into Kafka. (Like the cashier punching in orders.)
- Topics – Think of these like channels or categories. A producer sends data into a topic.
- Brokers – These are the Kafka servers doing the heavy lifting. They store the data and handle requests.
- Consumers – These are the apps or services reading data from Kafka. (Like the kitchen staff picking up new orders.)
All of this makes Kafka really good at decoupling systems. Instead of one service calling another directly, they communicate through Kafka. That way, services can scale independently, go down without chaos, or evolve without breaking everything.
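To make that decoupling concrete, here's a minimal sketch using the kafka-python client (any Kafka client library would do; the `orders` topic and the `localhost:9092` broker address are assumptions for this example). Notice that the producer and consumer only know about the topic, never about each other:

```python
import json

from kafka import KafkaConsumer, KafkaProducer

# Producer side: the "cashier" pushes an order into the "orders" topic.
# In a real system this would live in its own service.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)
producer.send("orders", {"order_id": 42, "item": "espresso"})
producer.flush()  # make sure the message actually leaves the client buffer

# Consumer side: the "kitchen" picks up orders whenever it's ready.
consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    group_id="kitchen",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)
for message in consumer:
    print(f"cooking order {message.value['order_id']}: {message.value['item']}")
```

If the kitchen service goes down for a few minutes, the orders simply wait in the topic until it comes back; the cashier never notices.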
What’s So Special About Kafka?
A few things make Kafka stand out:
- It can handle a ton of data, really fast.
- It persists data for a configurable retention period, so consumers can re-read it if needed (there's a small sketch of this below).
- It’s designed to be distributed, which means it runs across multiple servers and scales horizontally.
- You can stream data to multiple sinks (consumers) and from multiple sources (producers).
This makes it popular for everything from logging systems to event-driven microservices, data pipelines, user tracking, and real-time analytics.
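The retention point is worth seeing in code. In this hedged sketch (again assuming kafka-python and the `orders` topic from above), a brand-new consumer group replays everything Kafka still has on disk, because messages stick around for the topic's retention period whether or not someone has already read them:

```python
from kafka import KafkaConsumer

# A fresh consumer group starts with no saved position, so with
# auto_offset_reset="earliest" it reads from the oldest retained message.
replay = KafkaConsumer(
    "orders",                       # hypothetical topic from the earlier sketch
    bootstrap_servers="localhost:9092",
    group_id="analytics",           # a second, independent consumer group
    auto_offset_reset="earliest",   # start at the beginning of the retained log
    consumer_timeout_ms=5000,       # stop iterating once no new messages arrive
)
for message in replay:
    print(message.offset, message.value)
```

The original "kitchen" group is unaffected: each consumer group tracks its own position in the topic.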
Okay, So Where Does ZooKeeper Fit In?
Historically, Kafka relied on Apache ZooKeeper to keep track of metadata: which brokers exist, who's the cluster controller, how partitions are assigned, and so on. (ZooKeeper support was removed entirely in Kafka 4.0.0.)
Think of ZooKeeper as the metadata brain. Kafka asks it stuff like, “Who’s the leader of this topic partition?” or “What’s the current cluster state?” It works, but it means you’re running an entirely separate system just to manage your main system.
Which brings us to...
KRaft: Kafka Without ZooKeeper
KRaft (short for Kafka Raft) is Kafka’s new built-in consensus system. It does everything ZooKeeper used to do, but within Kafka itself. That means fewer moving parts and a more streamlined setup.
Instead of depending on ZooKeeper, Kafka uses its own internal mechanism (based on the Raft consensus algorithm) to handle cluster metadata. It’s more modern, scalable, and easier to manage—especially for newcomers.
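If you were configuring a broker by hand rather than through Docker, the same idea shows up directly in `server.properties`. This is only an illustrative sketch of a single-node, combined broker-and-controller setup; the ports and paths are example values:

```properties
# One node plays both roles: broker (data) and controller (metadata, via KRaft)
process.roles=broker,controller
node.id=1
controller.quorum.voters=1@localhost:9093

# One listener for clients, one for controller traffic
listeners=PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
advertised.listeners=PLAINTEXT://localhost:9092
controller.listener.names=CONTROLLER

# Both the message log and the cluster metadata log live here
log.dirs=/tmp/kraft-combined-logs
```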
| Feature | ZooKeeper | KRaft (Kafka + Raft) |
| --- | --- | --- |
| External service | Yes (ZooKeeper) | No, all inside Kafka |
| Setup complexity | Higher | Lower |
| Performance | Good | Even better |
| Official future | Being phased out | The future of Kafka |
What I like most about this setup is that you can now define Kafka as a single service in your Docker Compose file.
```yaml
services:
  kafka:
    image: apache/kafka:4.0.0
    container_name: kafka
    ports:
      - "9092:9092"
    restart: always
    environment:
      KAFKA_NODE_ID: 1
      # One node plays both roles: broker (data) and controller (metadata, via KRaft)
      KAFKA_PROCESS_ROLES: broker,controller
      KAFKA_LISTENERS: PLAINTEXT://0.0.0.0:9092,CONTROLLER://0.0.0.0:9093
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://localhost:9092
      KAFKA_CONTROLLER_LISTENER_NAMES: CONTROLLER
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: CONTROLLER:PLAINTEXT,PLAINTEXT:PLAINTEXT
      # The controller quorum is just this one node
      KAFKA_CONTROLLER_QUORUM_VOTERS: 1@localhost:9093
      # Replication factors of 1 are fine for a single-broker dev setup
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_REPLICATION_FACTOR: 1
      KAFKA_TRANSACTION_STATE_LOG_MIN_ISR: 1
      KAFKA_GROUP_INITIAL_REBALANCE_DELAY_MS: 0
      KAFKA_NUM_PARTITIONS: 3
    volumes:
      - ./kafka-data:/tmp/kraft-combined-logs
```
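Once the container is up (docker compose up -d), a quick way to check that the broker is reachable is to create and list a topic from a small script. This sketch assumes the kafka-python admin client and an example topic name, `orders`:

```python
from kafka.admin import KafkaAdminClient, NewTopic

# Connect to the single-node broker exposed by the compose file above.
admin = KafkaAdminClient(bootstrap_servers="localhost:9092")

# Create a test topic; replication factor must be 1 on a one-broker cluster.
admin.create_topics([NewTopic(name="orders", num_partitions=3, replication_factor=1)])

print(admin.list_topics())  # should now include "orders"
admin.close()
```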
Final Thoughts: Kafka’s Not That Scary
Kafka can feel intimidating at first, especially when you’re getting hit with terms like “replication factor” or “consumer offset” before you even understand what a broker is.
But once you see it as a way to pass messages between systems—safely, reliably, and at scale—it starts to click. And if you're just getting into it, you’re in a good spot. The tooling is better than it used to be, the community is strong, and the move from ZooKeeper to KRaft is making things simpler.
Whether you're experimenting locally or deploying Kafka in a real system, just remember: nobody was born knowing this stuff. It’s okay to learn one piece at a time.
This article is part of our Data Engineering series. Check out our other articles.