What Problem Does Kafka Solve?

April 06, 2026

Imagine you're building a ride-sharing app (like Uber).

Without Kafka:

User requests ride → goes to matching service
Matching service calls payment service
Payment service calls notification service
Notification service calls analytics service
If any service is slow or down, the entire chain breaks

This is tightly coupled. One failure cascades.

With Kafka:

User requests ride → publishes "ride_requested" event to Kafka
Matching service listens and responds
Payment service listens independently
Notification service listens independently
Analytics service listens independently
If one service is down, events wait in Kafka. No cascade failure.

That's Kafka. It decouples systems, handles massive scale, and, when replication and acknowledgments are configured for it, keeps messages from being lost.

This post covers:

What Kafka is (and isn't)
How it works internally
Producers & consumers (with code)
Core concepts (topics, partitions, consumer groups)
Why Netflix, LinkedIn, Uber built on Kafka
How startups should think about it

Let's dive in.

Part 1: What is Kafka? (And What It Isn't)

Simple Definition

Kafka is a distributed, append-only log.

You write events to it. Multiple applications read from it. Events persist for a configurable time.

┌─────────────────────────────────────────────┐
│                KAFKA CLUSTER                │
│  ┌──────────────────────────────────────┐   │
│  │ Topic: "user_events"                 │   │
│  │ [event1, event2, event3, event4 ...] │   │
│  └──────────────────────────────────────┘   │
│                                             │
│  ┌──────────────────────────────────────┐   │
│  │ Topic: "payments"                    │   │
│  │ [payment1, payment2, payment3 ...]   │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
        ↑                          ↓
    Producers                  Consumers
     (Write)                    (Read)
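
To make the definition concrete, here is a toy, single-process model of one topic. It is only a sketch: `AppendOnlyLog` and its methods are hypothetical names, not part of any Kafka client, and a real cluster adds partitions, replication, and disk storage on top of the same idea.

```python
class AppendOnlyLog:
    """Toy in-memory stand-in for one Kafka topic with a single partition."""

    def __init__(self):
        self._events = []

    def append(self, event):
        """Add an event at the end and return its offset (position)."""
        self._events.append(event)
        return len(self._events) - 1

    def read(self, offset, max_events=10):
        """Return up to max_events events starting at offset; nothing is removed."""
        return self._events[offset:offset + max_events]


log = AppendOnlyLog()
for name in ["event1", "event2", "event3"]:
    log.append(name)

# Two independent readers each choose their own position in the same log.
analytics_seen = log.read(0)
billing_seen = log.read(1)  # starts at offset 1, so it skips event1

print(analytics_seen)  # ['event1', 'event2', 'event3']
print(billing_seen)    # ['event2', 'event3']
```

Reading never deletes anything, which is why several applications can consume the same events without knowing about each other.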

What Kafka IS:

✅ Event streaming platform: Publish events, consume elsewhere
✅ Distributed: Runs across multiple machines
✅ Fast: Handles millions of messages/second
✅ Durable: Events are persisted (won't lose data)
✅ Decoupled: Producers don't know consumers exist
✅ Replayable: Can re-read old events
✅ Scalable: Add partitions and brokers for more capacity

What Kafka ISN'T:

❌ Not a database: No query language, not for random access
❌ Not a message queue (mostly): RabbitMQ is for job queues, Kafka is for events
❌ Not real-time (technically): It's fast, but not zero-latency
❌ Not a replacement for everything: Use the right tool for the job

Kafka vs Other Tools

Tool	Purpose	Use Case
Kafka	Event streaming	Uber detecting surge, Netflix tracking views
RabbitMQ	Task queue	Processing background jobs
Redis Streams	In-memory events	Lightweight, single-machine
AWS SQS	Managed queue	Simple AWS-native needs
Google Pub/Sub	Managed events	GCP-native workloads
Database CDC	Change capture	Syncing data between systems

Part 2: Why Does Kafka Exist? (The History)

The LinkedIn Problem (2010)

LinkedIn was growing fast. Their data infrastructure was a mess:

┌─────────────┐      ┌─────────────┐      ┌──────────────┐
│   Website   │─────▶│  Analytics  │─────▶│  Data Lake   │
└─────────────┘      └─────────────┘      └──────────────┘
       │
       ├─────────────▶ Search Index
       │
       ├─────────────▶ Recommendations
       │
       └─────────────▶ Notifications

Every system had its own data pipeline. Expensive. Fragile. Duplicated logic.

Solution: Build a central "event log" that all systems subscribe to.

Jay Kreps, Neha Narkhede, and Jun Rao created Kafka (named after Franz Kafka, the writer).

Released open source in 2011. Changed the industry.

Why It Worked

Simplicity: Just an append-only log
Scalability: Could handle LinkedIn's 1 billion+ events/day
Durability: Events persisted to disk
Decoupling: Producers and consumers independent
Replayability: Could re-process events

Now used by Netflix, Uber, Stripe, Twitter, Airbnb, Dropbox, etc.

Part 3: Kafka Architecture (How It Works)

The Mental Model

Think of Kafka like a distributed newspaper:

Topics = Sections (Sports, Tech, Politics)
Partitions = Each section split into separate stacks, so several readers can work in parallel (the copies are replicas, covered below)
Messages = Individual articles
Brokers = Printing presses
Producers = Journalists (write articles)
Consumers = Readers (read articles)

Core Components

1. Brokers

Kafka brokers are servers that store and serve messages.

┌─────────────────────────────────────────────┐
│                KAFKA CLUSTER                │
│                                             │
│  ┌──────────────┐  ┌──────────────┐         │
│  │   Broker 1   │  │   Broker 2   │         │
│  │ (Partition 0)│  │ (Partition 1)│  ...    │
│  │ (Partition 2)│  │ (Partition 3)│         │
│  └──────────────┘  └──────────────┘         │
│                                             │
└─────────────────────────────────────────────┘

A small production cluster typically runs at least 3 brokers, so each partition can keep copies on separate machines.

2. Topics

A topic is a stream of events. Like a channel.

Topic: "user_events"

[user_signup, user_login, user_logout, user_logout, user_signup, ...]

3. Partitions

Each topic is split into partitions for parallelism.

Topic: "user_events"

├── Partition 0: [event1, event2, event3]
├── Partition 1: [event4, event5, event6]
└── Partition 2: [event7, event8, event9]

Why partition?

Multiple readers can read from different partitions in parallel
More throughput
Distributed storage

How does Kafka decide which partition?

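By key (if provided): Kafka hashes the key and takes the result modulo the number of partitions.

Key: "user_123" → Always goes to the same partition

Key: "user_456" → Always goes to its own fixed partition (which may or may not be the one user_123 uses)

Without a key, the producer spreads messages across partitions.

This ensures all events for one user are stored in order, as long as the partition count doesn't change. Ordering is guaranteed only within a partition, so events from different users can be read out of order relative to each other.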
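
Key-based routing fits in a few lines. This is a sketch under one loud assumption: `zlib.crc32` stands in for the hash function (Kafka's default partitioner uses murmur2), and `choose_partition` is a hypothetical helper, not a client API.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition: same key and partition count -> same partition."""
    # Assumption: CRC32 stands in for the hash; Kafka's default uses murmur2.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


partitions = {p: [] for p in range(3)}
events = [("user_123", "login"), ("user_456", "login"), ("user_123", "logout")]
for key, action in events:
    partitions[choose_partition(key, 3)].append((key, action))

# All user_123 events sit in one partition, in the order they were sent.
p = choose_partition("user_123", 3)
user_123_events = [e for e in partitions[p] if e[0] == "user_123"]
print(user_123_events)  # [('user_123', 'login'), ('user_123', 'logout')]
```

Because the partition count is part of the formula, adding partitions later can move a key to a different partition.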

4. Replicas

Each partition is replicated across brokers for durability.

Partition 0: [event1, event2, event3]

├── Leader (Broker 1) - receives writes, serves reads
├── Replica (Broker 2) - copy
└── Replica (Broker 3) - copy

If Broker 1 goes down, an in-sync replica such as Broker 2 takes over as leader. Writes that had already been replicated to it are not lost.

5. Consumer Groups

Multiple consumers can work together as a group.

Consumer Group: "analytics_team"

├── Consumer 1 reads Partition 0
├── Consumer 2 reads Partition 1
└── Consumer 3 reads Partition 2

Each partition is read by ONE consumer in a group (but multiple groups can read same partition).
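
The one-owner-per-partition rule can be sketched as a simple round-robin split. Kafka ships several assignment strategies; `assign_partitions` below is a hypothetical simplification, not the real assignor code.

```python
def assign_partitions(partitions, consumers):
    """Spread partitions over consumers round-robin; each partition gets one owner."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(partitions):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


three = assign_partitions([0, 1, 2], ["consumer-1", "consumer-2", "consumer-3"])
two = assign_partitions([0, 1, 2], ["consumer-1", "consumer-2"])
four = assign_partitions([0, 1, 2], ["consumer-1", "consumer-2", "consumer-3", "consumer-4"])

print(three)  # {'consumer-1': [0], 'consumer-2': [1], 'consumer-3': [2]}
print(two)    # {'consumer-1': [0, 2], 'consumer-2': [1]}
print(four["consumer-4"])  # [] (more consumers than partitions leaves one idle)
```

The last case is the practical limit: a group can't use more consumers than the topic has partitions.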

Part 4: Internal Mechanics (How Kafka Really Works)

How a Message Gets Written

Producer Code:

  producer.send("user_events", key="user_123", value={"action": "login"})
     ↓
1. Partitioner decides which partition (based on key)
     ↓
2. Producer batches messages per partition (for efficiency)
     ↓
3. Batch goes to the partition's leader (here Partition 0 on Broker 1)
     ↓
4. Leader appends to its log (append-only)
     ↓
5. Followers on Broker 2 and Broker 3 fetch the new messages from the leader
     ↓
6. Producer gets acknowledgment (with acks=all, only after the in-sync replicas have the message; with acks=1, right after the leader's write)
     ↓
7. Message becomes available to consumers once the in-sync replicas have it
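
The client-side part of this flow (pick a partition, then batch) can be modeled without a broker. Everything here is an assumption for illustration: `ToyProducer` is a hypothetical class, CRC32 stands in for Kafka's hash, and the batch limit counts records, whereas the real `batch.size` setting is in bytes and `linger.ms` adds a time-based flush.

```python
import zlib
from collections import defaultdict


class ToyProducer:
    """Hypothetical model of the client-side steps: pick a partition, then batch."""

    def __init__(self, num_partitions, batch_size):
        self.num_partitions = num_partitions
        self.batch_size = batch_size          # records per batch (toy unit)
        self.pending = defaultdict(list)      # partition -> records not yet sent
        self.sent_batches = []                # (partition, records) handed to the leader

    def send(self, key, value):
        # Choose the partition from the key, then queue the record for it.
        partition = zlib.crc32(key.encode("utf-8")) % self.num_partitions
        self.pending[partition].append((key, value))
        if len(self.pending[partition]) >= self.batch_size:
            self.flush(partition)
        return partition

    def flush(self, partition):
        # Hand the whole batch over in one go instead of one request per record.
        batch = self.pending.pop(partition, [])
        if batch:
            self.sent_batches.append((partition, batch))


producer = ToyProducer(num_partitions=3, batch_size=2)
producer.send("user_123", {"action": "login"})
print(len(producer.sent_batches))  # 0: one record is still waiting in its batch
producer.send("user_123", {"action": "logout"})
print(len(producer.sent_batches))  # 1: the batch filled up and was sent
```

Batching trades a little latency for far fewer network round trips, which is a large part of where the throughput comes from.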

Replication: The Key to Durability

Kafka uses log replication for redundancy:

Broker 1 (Leader)    Broker 2 (Replica)    Broker 3 (Replica)
[msg1, msg2, msg3]   [msg1, msg2, msg3]    [msg1, msg2, msg3]
      ↑                    ↓                      ↓
  Receives writes    Replicates from       Replicates from
                     Leader                Leader

In-Sync Replicas (ISR):

Replicas that are caught up with the leader
By default, only ISR replicas can become the new leader
A write that every ISR member has copied survives a leader failure
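
One consequence of the ISR is the high watermark: consumers can only read messages that every in-sync replica already has. A minimal sketch, with hypothetical broker names and offsets:

```python
def high_watermark(log_end_offsets, isr):
    """Smallest log end offset among in-sync replicas.

    A log end offset is the offset the replica will write next, so
    consumers may read every offset strictly below the returned value.
    """
    return min(log_end_offsets[broker] for broker in isr)


# broker-3 has only copied 3 of the leader's 5 messages so far.
log_end_offsets = {"broker-1": 5, "broker-2": 5, "broker-3": 3}

hw_all = high_watermark(log_end_offsets, isr=["broker-1", "broker-2", "broker-3"])
hw_shrunk = high_watermark(log_end_offsets, isr=["broker-1", "broker-2"])

print(hw_all)     # 3: offsets 0-2 are readable while broker-3 is in the ISR
print(hw_shrunk)  # 5: once broker-3 drops out of the ISR, offsets 0-4 are readable
```

A slow replica that falls too far behind is removed from the ISR, so it can't hold consumers back forever.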

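Replication Factor:

Usually 3 (leader + 2 replicas)
If the leader dies, one of the in-sync replicas becomes the new leader
Tolerance: With 3x replication and min.insync.replicas=2, acks=all writes keep working with 1 broker down; the data itself can survive 2 broker failures if the remaining replica was in sync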
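
The tolerance arithmetic, written out. `broker_failures_tolerated` is a hypothetical helper; `min.insync.replicas` is the real setting it models, and the "data survives" figure assumes the surviving replica was fully in sync when the others failed.

```python
def broker_failures_tolerated(replication_factor, min_insync_replicas):
    """How many brokers holding a partition can fail before each guarantee breaks."""
    return {
        # At least one copy must remain for the data to exist at all.
        "data_survives": replication_factor - 1,
        # acks=all writes need min_insync_replicas copies to stay available.
        "acks_all_writes_continue": replication_factor - min_insync_replicas,
    }


typical = broker_failures_tolerated(replication_factor=3, min_insync_replicas=2)
print(typical)  # {'data_survives': 2, 'acks_all_writes_continue': 1}
```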

Leader Election

When a broker dies, how does a new leader get elected?

ZooKeeper (or the KRaft controller in newer versions) tracks broker health
Detects leader is down
Elects new leader from ISR
Updates metadata
Consumers/producers get new leader info
Traffic redirects to new leader

All automatic. Typically happens within seconds.
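
The election step itself is small. This sketch simplifies what the controller does: `elect_leader` is a hypothetical function, and it assumes the default behavior where only in-sync replicas are eligible.

```python
def elect_leader(replicas, isr, live_brokers):
    """Pick the first replica in assignment order that is alive and in sync."""
    for broker in replicas:
        if broker in isr and broker in live_brokers:
            return broker
    return None  # no eligible replica: the partition stays offline


replicas = [1, 2, 3]  # Broker 1 was the leader and has just died
after_failure = elect_leader(replicas, isr={1, 2, 3}, live_brokers={2, 3})
lagging_skipped = elect_leader(replicas, isr={1, 3}, live_brokers={2, 3})
nobody_eligible = elect_leader(replicas, isr={1}, live_brokers={2, 3})

print(after_failure)    # 2
print(lagging_skipped)  # 3 (Broker 2 is alive but not in sync)
print(nobody_eligible)  # None
```

The last case is why the ISR matters: promoting a replica that was behind would silently drop the messages it never copied.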

Offset: How Consumers Track Position

Consumers need to know which messages they've read.

Offset: Position in the partition.

Partition 0: [msg0, msg1, msg2, msg3, msg4]
              0     1     2     3     4      ← Offset

Consumer starts reading:
- Offset 0: reads msg0
- Offset 1: reads msg1
- ...
- Offset 4: reads msg4

Consumer stores: "Next offset to read is 5" (the committed offset is one past the last message processed)
If the consumer crashes and restarts: it starts from offset 5 (resumes where it left off)

This is stored in an internal topic called __consumer_offsets.
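
Commit-and-resume can be simulated with a list and a dict. This is a toy: `ToyConsumer` is a hypothetical class, and the `offsets` dict stands in for the `__consumer_offsets` topic.

```python
class ToyConsumer:
    """Hypothetical single-partition consumer that resumes from a committed offset."""

    def __init__(self, log, committed_offsets, group):
        self.log = log
        self.committed = committed_offsets   # stand-in for __consumer_offsets
        self.group = group
        # Resume from the group's committed offset, or from the start if none.
        self.position = committed_offsets.get(group, 0)

    def poll(self):
        """Return the next message, or None when caught up."""
        if self.position >= len(self.log):
            return None
        message = self.log[self.position]
        self.position += 1
        return message

    def commit(self):
        # The committed value is the NEXT offset to read, not the last one read.
        self.committed[self.group] = self.position


partition = ["msg0", "msg1", "msg2", "msg3", "msg4"]
offsets = {}

first = ToyConsumer(partition, offsets, "analytics_team")
read_before_crash = [first.poll() for _ in range(3)]  # msg0, msg1, msg2
first.commit()                                         # stores 3

restarted = ToyConsumer(partition, offsets, "analytics_team")
resumed_with = restarted.poll()
print(resumed_with)  # msg3
```

If the consumer had crashed before calling `commit()`, the restart would re-read msg0 to msg2, which is why processing is usually designed to tolerate duplicates.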

Kafka vs Alternatives

Feature	Kafka	RabbitMQ	AWS SQS	Redis
Throughput	Very High	High	High	Very High
Durability	Excellent	Good	Excellent	Good
Ordering	Per-partition	FIFO (queues)	FIFO (FIFO queues only)	Per-stream (Redis Streams)
Replayability	Yes	No (classic queues)	No	Limited
Distributed	Yes	Optional (clustering)	Managed	Optional (Redis Cluster)
Complexity	High	Low	Low	Low
Cost	Infrastructure	Infrastructure	Pay per request	Infrastructure

Choose Kafka if:

High throughput (1000+ events/sec)
Need event replay
Multiple subscribers needed
Distributed required

Choose RabbitMQ if:

Job queuing
Complex routing
Simple, single-server ok
