
Kafka Explained: The Easy Way


Well, Kafka is always a hefty topic to understand—but the core idea behind it isn’t that complex. In fact, once you strip away the jargon, it’s surprisingly intuitive.

Kafka acts as a broker for messages, which is why it’s classified as a distributed event streaming platform (often loosely called a message broker, though it’s more powerful than traditional ones).

Idea

The core idea behind Kafka was simple:

Prevent servers from collapsing under massive load by decoupling data producers and consumers.

Instead of services directly talking to each other (tight coupling), Kafka introduces a middle layer that stores and distributes events efficiently.

It was engineered by LinkedIn in 2010 to handle high-throughput, real-time data ingestion and event processing—because existing systems couldn’t handle their scale.

It was developed by Jay Kreps, Jun Rao, and Neha Narkhede, open-sourced in 2011, and became a top-level Apache Software Foundation project in 2012 as Apache Kafka.

Kafka eventually became the de facto standard for high-throughput, low-latency event streaming—powering logs, analytics pipelines, microservices communication, and real-time systems globally.

But Why Was Kafka Needed?

Before Kafka, systems looked like this:

  • Service A → directly calls Service B

  • Service B → directly calls Service C

Problems:

  • Tight coupling

  • System crashes under load

  • No buffering mechanism

  • Hard to scale

Kafka solves this by acting as a central pipeline:

Producers → Kafka → Consumers

Now:

  • Producers just send data

  • Consumers read when ready

  • System becomes asynchronous and resilient
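The decoupling above can be sketched as a toy in-memory broker (the class and method names here are illustrative, not Kafka's API): producers fire and forget, and a consumer drains the buffer whenever it comes online.

```python
from collections import deque

class MiniBroker:
    """Toy middle layer: producers append, consumers read when ready."""
    def __init__(self):
        self.buffer = deque()

    def send(self, message):
        # Producer side: hand the event off and move on (no direct call to a consumer)
        self.buffer.append(message)

    def poll(self):
        # Consumer side: pull the next event, or None if caught up
        return self.buffer.popleft() if self.buffer else None

broker = MiniBroker()
broker.send("order-created")
broker.send("order-paid")

# The consumer comes online later and drains the buffer at its own pace
events = []
while (msg := broker.poll()) is not None:
    events.append(msg)
print(events)  # ['order-created', 'order-paid']
```

The key point: the producer never blocks on (or even knows about) the consumer, so a slow consumer no longer crashes the producer under load.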

Core Principles of Kafka

When Kafka was designed, it relied on a few foundational assumptions:

1. High Throughput Over Low Latency

Kafka is optimised to handle millions of messages per second efficiently.

2. Sequential Disk Writes

Instead of random writes, Kafka appends messages to logs:

  • Faster disk I/O

  • Better performance

3. Immutable Logs

Once written, messages:

  • Are not modified

  • Are only appended and read
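A minimal sketch of that append-only, immutable log (class names are hypothetical; real Kafka persists segments to disk, which is omitted here): writes only ever append, and reads address records by offset without mutating anything.

```python
class AppendOnlyLog:
    """Messages are appended, never modified; readers address them by offset."""
    def __init__(self):
        self._entries = []

    def append(self, message) -> int:
        self._entries.append(message)
        return len(self._entries) - 1   # offset of the newly written record

    def read(self, offset: int):
        return self._entries[offset]    # reads never change the log

log = AppendOnlyLog()
o1 = log.append("user-signed-up")
o2 = log.append("user-upgraded")
print(o1, o2, log.read(0))  # 0 1 user-signed-up
```

Because records are only appended at the tail, disk writes stay sequential—which is exactly where the I/O speedup in the previous principle comes from.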

4. Pull-Based Consumption

Consumers:

  • Pull data at their own pace

  • Avoid overload

5. Partition-Based Scaling

Topics are divided into partitions:

  • Enables parallel processing

  • Horizontal scalability
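A sketch of how a keyed message maps to a partition. Kafka's default partitioner hashes the key (murmur2 in the Java client); CRC32 is used here as a stand-in, and the partition count is an arbitrary example value.

```python
import zlib

NUM_PARTITIONS = 4  # example value; a real topic's partition count is chosen at creation

def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Same key -> same partition, so per-key ordering is preserved."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions

# All events for one user land in the same partition; different keys spread out.
assert partition_for("user-42") == partition_for("user-42")
```

This is also why ordering in Kafka is guaranteed *per partition*, not per topic: two messages with different keys may land in different partitions and be processed in parallel.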

Kafka Core Components

1. Producer

  • Sends messages to Kafka

  • Example: a backend service sending user activity

2. Broker

  • Kafka server that stores data

  • Handles read/write requests

3. Topic

  • Logical category of messages

  • Example: user-signups, payments

4. Partition

  • Subdivision of a topic

  • Enables scaling and parallelism

5. Consumer

  • Reads messages from Kafka

6. Consumer Group

  • Multiple consumers working together

  • Each partition is consumed by only one consumer in a group
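That "one consumer per partition" rule can be sketched as a simple round-robin assignment (real Kafka ships several assignment strategies—range, round-robin, sticky—this models only the basic idea):

```python
def assign_partitions(partitions, consumers):
    """Round-robin: each partition goes to exactly one consumer in the group."""
    assignment = {c: [] for c in consumers}
    for i, p in enumerate(partitions):
        assignment[consumers[i % len(consumers)]].append(p)
    return assignment

result = assign_partitions([0, 1, 2, 3, 4, 5], ["c1", "c2", "c3"])
print(result)  # {'c1': [0, 3], 'c2': [1, 4], 'c3': [2, 5]}
```

Notice the corollary: with 6 partitions, a 7th consumer in the group would sit idle—partition count caps a group's parallelism.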

How Kafka Works (Simple Flow)

  1. Producer sends message to a topic

  2. Kafka stores it in a partition (append-only log)

  3. Consumer reads from that partition using an offset

  4. Message stays in Kafka for a configured retention period
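The four steps above can be condensed into a toy single-partition topic (class and method names are illustrative; retention-based deletion is noted but not modeled). Each consumer group tracks its own offset, so the same message can be read independently by many groups:

```python
class MiniTopic:
    """Single-partition topic: append-only log plus per-group read offsets."""
    def __init__(self):
        self.log = []       # records stay until retention expires (time/size-based in real Kafka)
        self.offsets = {}   # consumer group -> next offset to read

    def produce(self, msg) -> int:
        self.log.append(msg)            # step 2: stored in an append-only log
        return len(self.log) - 1

    def consume(self, group: str):
        pos = self.offsets.get(group, 0)
        if pos >= len(self.log):
            return None                 # caught up; pull again later
        self.offsets[group] = pos + 1   # step 3: the consumer advances its offset
        return self.log[pos]

t = MiniTopic()
t.produce("click")      # step 1: producer sends to the topic
t.produce("purchase")
first = t.consume("analytics")   # 'click'
second = t.consume("analytics")  # 'purchase'
other = t.consume("fraud")       # 'click' — another group reads independently
```

Because consuming a message only moves a group's offset and never deletes the record, adding a new downstream system is as simple as starting a new consumer group.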

Why Kafka Is Powerful

  • Handles real-time data streams

  • Fault-tolerant and distributed

  • Scales horizontally

  • Decouples services

  • Works as a backbone for event-driven architecture

Real-World Use Cases

  • Logging systems

  • Real-time analytics

  • Event-driven microservices

  • Fraud detection systems

  • Streaming pipelines (ETL)

The Problem: Kafka at Extreme Scale

Kafka works incredibly well—but 15 years later, its original assumptions are being pushed to the limit at LinkedIn-scale systems.

We’re talking about:

  • Trillions of daily messages

  • Multi-region deployments

  • Terabytes of metadata

  • Highly dynamic workloads needing auto-rebalancing

At this scale, new challenges emerge:

1. Metadata Bottlenecks

Kafka relies heavily on cluster metadata:

  • Becomes large and complex

  • Hard to manage efficiently

2. Rebalancing Issues

When consumers join/leave:

  • Kafka needs to rebalance partitions

  • Causes latency spikes

3. Operational Complexity

Running Kafka clusters at massive scale:

  • Requires heavy tuning

  • Complex infrastructure management

4. Multi-Region Limitations

Cross-region replication:

  • Adds latency

  • Hard to maintain consistency

LinkedIn’s Shift: NorthGuard

To address these challenges, LinkedIn is rethinking event streaming systems entirely with a new system called NorthGuard.

What is NorthGuard?

NorthGuard is LinkedIn’s next-generation event streaming architecture designed to:

  • Handle extreme scale more efficiently

  • Reduce operational overhead

  • Improve elasticity and rebalancing

  • Better support multi-region systems

What’s Changing?

1. Dynamic Scaling

Kafka assumes relatively stable workloads.

NorthGuard:

  • Adapts dynamically

  • Handles fluctuating traffic seamlessly

2. Improved Metadata Management

Instead of massive centralized metadata:

  • More efficient distribution

  • Better scalability

3. Faster Rebalancing

Kafka rebalancing is expensive.

NorthGuard:

  • Aims for near-zero disruption

  • Faster partition movement

4. Cloud-Native Thinking

Kafka was designed pre-cloud era.

NorthGuard:

  • Built with modern distributed systems + cloud infra in mind

Key Takeaway

Kafka is still incredibly powerful and widely used—but:

At extreme scale, even great systems need evolution.

Kafka solved:

  • Decoupling

  • High-throughput streaming

  • Fault tolerance

NorthGuard is solving:

  • Elastic scaling

  • Massive metadata

  • Global distribution challenges
