What Problem Does Kafka Solve?
Imagine you're building a ride-sharing app (like Uber).
Without Kafka:
- User requests ride → goes to matching service
- Matching service calls payment service
- Payment service calls notification service
- Notification service calls analytics service
- If any service is slow or down, the entire chain breaks
This is tightly coupled. One failure cascades.
With Kafka:
- User requests ride → publishes "ride_requested" event to Kafka
- Matching service listens and responds
- Payment service listens independently
- Notification service listens independently
- Analytics service listens independently
- If one service is down, events wait in Kafka. No cascade failure.
That's Kafka. It decouples systems, handles massive scale, and, when replication and acknowledgments are configured properly, keeps messages from being lost.
This post covers:
- What Kafka is (and isn't)
- How it works internally
- Producers & consumers (with code)
- Core concepts (topics, partitions, consumer groups)
- Why Netflix, LinkedIn, Uber built on Kafka
- How startups should think about it
Let's dive in.
Part 1: What is Kafka? (And What It Isn't)
Simple Definition
Kafka is a distributed, append-only log.
You write events to it. Multiple applications read from it. Events persist for a configurable time.
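To make the "append-only log" idea concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not Kafka's storage engine: the `AppendOnlyLog` class and its method names are made up for illustration. It shows the two properties from the definition: writers only ever append, and readers pick their own starting position without removing anything.

```python
class AppendOnlyLog:
    """Toy in-memory stand-in for one Kafka partition (hypothetical, for illustration)."""

    def __init__(self):
        self._records = []

    def append(self, event):
        """Append an event to the end and return its position in the log."""
        self._records.append(event)
        return len(self._records) - 1

    def read(self, start, max_records=10):
        """Return up to max_records events from `start`; reading never deletes."""
        return self._records[start:start + max_records]


log = AppendOnlyLog()
for event in ["signup", "login", "logout"]:
    log.append(event)

# Two independent readers start wherever they like; neither affects the other.
analytics_view = log.read(0)
billing_view = log.read(1)
```

Because reading does not consume anything, a new application can be added later and still read the log from the beginning, which is what makes replay possible.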
┌─────────────────────────────────────────────┐
│                KAFKA CLUSTER                │
│  ┌──────────────────────────────────────┐   │
│  │ Topic: "user_events"                 │   │
│  │ [event1, event2, event3, event4 ...] │   │
│  └──────────────────────────────────────┘   │
│                                             │
│  ┌──────────────────────────────────────┐   │
│  │ Topic: "payments"                    │   │
│  │ [payment1, payment2, payment3 ...]   │   │
│  └──────────────────────────────────────┘   │
└─────────────────────────────────────────────┘
        ↑                        ↓
    Producers                Consumers
     (Write)                  (Read)
What Kafka IS:
✅ Event streaming platform: Publish events, consume elsewhere
✅ Distributed: Runs across multiple machines
✅ Fast: Handles millions of messages/second
✅ Durable: Events are persisted (won't lose data)
✅ Decoupled: Producers don't know consumers exist
✅ Replayable: Can re-read old events
✅ Scalable: Add partitions and brokers for more capacity
What Kafka ISN'T:
❌ Not a database: No query language, not for random access
❌ Not a message queue (mostly): RabbitMQ is for job queues, Kafka is for events
❌ Not real-time (technically): It's fast, but not zero-latency
❌ Not a replacement for everything: Use the right tool for the job
Kafka vs Other Tools
| Tool | Purpose | Use Case |
|---|---|---|
| Kafka | Event streaming | Uber detecting surge, Netflix tracking views |
| RabbitMQ | Task queue | Processing background jobs |
| Redis Streams | In-memory events | Lightweight, single-machine |
| AWS SQS | Managed queue | Simple AWS-native needs |
| Google Pub/Sub | Managed events | GCP-native workloads |
| Database CDC | Change capture | Syncing data between systems |
Part 2: Why Does Kafka Exist? (The History)
The LinkedIn Problem (2010)
LinkedIn was growing fast. Their data infrastructure was a mess:
┌─────────────┐      ┌─────────────┐      ┌──────────────┐
│   Website   │─────▶│  Analytics  │─────▶│  Data Lake   │
└─────────────┘      └─────────────┘      └──────────────┘
       │
       ├─────────────▶ Search Index
       │
       ├─────────────▶ Recommendations
       │
       └─────────────▶ Notifications
Every system had its own data pipeline. Expensive. Fragile. Duplicated logic.
Solution: Build a central "event log" that all systems subscribe to.
Jay Kreps, Neha Narkhede, and Jun Rao created Kafka (named after Franz Kafka, the writer).
Released open source in 2011. Changed the industry.
Why It Worked
- Simplicity: Just an append-only log
- Scalability: Could handle LinkedIn's 1 billion+ events/day
- Durability: Events persisted to disk
- Decoupling: Producers and consumers independent
- Replayability: Could re-process events
Now used by Netflix, Uber, Stripe, Twitter, Airbnb, Dropbox, etc.
Part 3: Kafka Architecture (How It Works)
The Mental Model
Think of Kafka like a distributed newspaper:
- Topics = Sections (Sports, Tech, Politics)
- Partitions = Slices of a section that can be printed and read in parallel (not copies; the copies are replicas)
- Messages = Individual articles
- Brokers = Printing presses
- Producers = Journalists (write articles)
- Consumers = Readers (read articles)
Core Components
1. Brokers
Kafka brokers are servers that store and serve messages.
┌─────────────────────────────────────────────┐
│                KAFKA CLUSTER                │
│                                             │
│  ┌──────────────┐  ┌──────────────┐         │
│  │   Broker 1   │  │   Broker 2   │         │
│  │ (Partition 0)│  │ (Partition 1)│  ...    │
│  │ (Partition 2)│  │ (Partition 3)│         │
│  └──────────────┘  └──────────────┘         │
│                                             │
└─────────────────────────────────────────────┘
A production Kafka cluster usually runs at least 3 brokers for redundancy; large deployments run many more.
2. Topics
A topic is a stream of events. Like a channel.
Topic: "user_events"
[user_signup, user_login, user_logout, user_logout, user_signup, ...]
3. Partitions
Each topic is split into partitions for parallelism.
Topic: "user_events"
├── Partition 0: [event1, event2, event3]
├── Partition 1: [event4, event5, event6]
└── Partition 2: [event7, event8, event9]
Why partition?
- Multiple readers can read from different partitions in parallel
- More throughput
- Distributed storage
How does Kafka decide which partition?
By key (if provided):
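Key: "user_123" → Always goes to the same partition
Key: "user_456" → Always goes to its own partition (possibly a different one)
This ensures all events for one user are ordered (but events across users can be out of order). If no key is provided, the producer spreads messages across partitions instead.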
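Key-based partitioning boils down to "hash the key, take it modulo the partition count". The sketch below is a simplified stand-in: Kafka's Java client hashes the key bytes with murmur2, while this example uses `zlib.crc32` purely because it ships with Python. The function name `choose_partition` is hypothetical.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition. crc32 is a stand-in for Kafka's own hash function."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3
events = [("user_123", "login"), ("user_456", "login"), ("user_123", "logout")]

# Route each event to a partition based only on its key.
partitions = {p: [] for p in range(NUM_PARTITIONS)}
for key, action in events:
    partitions[choose_partition(key, NUM_PARTITIONS)].append((key, action))

# Both user_123 events land in one partition, in the order they were sent.
user_123_partition = choose_partition("user_123", NUM_PARTITIONS)
```

One consequence of the modulo step: changing the number of partitions changes where existing keys land, so per-key ordering only holds while the partition count stays fixed.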
4. Replicas
Each partition is replicated across brokers for durability.
Partition 0: [event1, event2, event3]
├── Leader (Broker 1) - receives writes, serves reads
├── Replica (Broker 2) - copy
└── Replica (Broker 3) - copy
If Broker 1 goes down, an in-sync replica such as Broker 2 takes over as leader. Messages that were already replicated to it are not lost.
5. Consumer Groups
Multiple consumers can work together as a group.
Consumer Group: "analytics_team"
├── Consumer 1 reads Partition 0
├── Consumer 2 reads Partition 1
└── Consumer 3 reads Partition 2
Each partition is read by ONE consumer in a group (but multiple groups can read same partition).
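The "one consumer per partition within a group" rule can be sketched with a simple round-robin assignment. Kafka ships several real assignment strategies (range, round-robin, sticky); the function below is a simplified illustration with a made-up name, not any of those implementations.

```python
def assign_partitions(partitions, consumers):
    """Round-robin assignment: every partition gets exactly one owner in the group."""
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Three partitions shared by groups of different sizes.
three = assign_partitions([0, 1, 2], ["consumer-1", "consumer-2", "consumer-3"])
two = assign_partitions([0, 1, 2], ["consumer-1", "consumer-2"])
four = assign_partitions([0, 1, 2], ["consumer-1", "consumer-2", "consumer-3", "consumer-4"])
```

With two consumers, one of them picks up two partitions. With four, the fourth consumer sits idle, which is why adding consumers beyond the partition count does not add throughput.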
Part 4: Internal Mechanics (How Kafka Really Works)
How a Message Gets Written
Producer Code:
producer.send("user_events", key="user_123", value={"action": "login"})
    ↓
1. Producer batches messages (for efficiency)
    ↓
2. Partitioner decides which partition (based on key)
    ↓
3. Message goes to Partition 0, Broker 1 (leader)
    ↓
4. Leader appends the message to its log (append-only)
    ↓
5. Broker 2 and Broker 3 (followers) fetch the new message from the leader
    ↓
6. Producer gets acknowledgment (when this happens depends on the acks setting: leader only, or all in-sync replicas)
    ↓
7. Message becomes available to consumers once the in-sync replicas have it
Replication: The Key to Durability
Kafka uses log replication for redundancy:
Broker 1 (Leader)      Broker 2 (Replica)     Broker 3 (Replica)
[msg1, msg2, msg3]     [msg1, msg2, msg3]     [msg1, msg2, msg3]
        ↑                      ↓                      ↓
 Receives writes        Replicates from        Replicates from
                            Leader                 Leader
In-Sync Replicas (ISR):
- Replicas that are caught up with the leader
- By default, only ISR replicas can become the new leader
- Acknowledged messages survive a leader failure (when producers use acks=all)
Replication Factor:
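- Usually 3 (leader + 2 replicas)
- If the leader dies, one of the 2 replicas becomes the new leader
- Tolerance: with 3x replication the data survives losing 2 brokers; with the common min.insync.replicas=2 setting and acks=all, writes keep working after losing 1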
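Here is a toy model of what "caught up" means and why it matters. It is a sketch under simplifying assumptions: real followers pull data with fetch requests, and ISR membership is decided by how long a follower has lagged (the `replica.lag.time.max.ms` broker setting), not by a direct length comparison. The `Replica` class and helper names are invented for this example. The "high watermark" is the number of records present on every in-sync replica; consumers only see records below it.

```python
class Replica:
    """One copy of a partition's log on a given broker (toy model)."""

    def __init__(self, broker_id):
        self.broker_id = broker_id
        self.log = []


def replicate(leader, follower):
    """Follower copies whatever it is still missing from the leader's log."""
    follower.log.extend(leader.log[len(follower.log):])


def high_watermark(in_sync_replicas):
    """Count of records that every in-sync replica already has."""
    return min(len(replica.log) for replica in in_sync_replicas)


leader, follower_2, follower_3 = Replica(1), Replica(2), Replica(3)
leader.log.extend(["msg1", "msg2", "msg3"])

replicate(leader, follower_2)          # Broker 2 is fully caught up
follower_3.log.extend(leader.log[:2])  # Broker 3 has fetched only two records so far

hw_before = high_watermark([leader, follower_2, follower_3])
replicate(leader, follower_3)          # Broker 3 catches up
hw_after = high_watermark([leader, follower_2, follower_3])
```

Before Broker 3 catches up, only the first two messages count as fully replicated; afterwards all three do. That gap is the window in which a message exists on the leader but is not yet safe against a leader failure.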
Leader Election
When a broker dies, how does a new leader get elected?
- ZooKeeper (or KRaft in newer versions) tracks broker health
- Detects leader is down
- Elects new leader from ISR
- Updates metadata
- Consumers/producers get new leader info
- Traffic redirects to new leader
All automatic. Happens in seconds.
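The election step itself can be sketched in a few lines. This is a simplified illustration of the rule "pick a live replica from the ISR", not the controller's actual code; the function name and arguments are hypothetical.

```python
def elect_leader(replica_brokers, isr, live_brokers):
    """Return the first replica (in assignment order) that is in sync and alive."""
    for broker in replica_brokers:
        if broker in isr and broker in live_brokers:
            return broker
    return None  # no eligible replica: the partition stays offline


replicas = [1, 2, 3]  # brokers holding a copy of this partition
isr = {1, 2}          # Broker 3 has fallen behind

# Broker 1 (the old leader) dies: Broker 2 is alive and in sync, so it wins.
new_leader = elect_leader(replicas, isr, live_brokers={2, 3})

# Brokers 1 and 2 both die: Broker 3 is alive but not in sync, so nobody is elected.
no_leader = elect_leader(replicas, isr, live_brokers={3})
```

The second case shows the trade-off behind the ISR rule: the partition becomes unavailable rather than promoting a replica that is missing data.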
Offset: How Consumers Track Position
Consumers need to know which messages they've read.
Offset: Position in the partition.
Partition 0: [msg0, msg1, msg2, msg3, msg4]
               0     1     2     3     4    ← Offset
Consumer starts reading:
- Offset 0: reads msg0
- Offset 1: reads msg1
- ...
- Offset 4: reads msg4
Consumer commits offset 5: "the next message I need is at offset 5"
If consumer crashes and restarts: start from offset 5 (resume where it left off)
This is stored in a special topic called __consumer_offsets.
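A small sketch of commit-and-resume, using a plain dictionary as a stand-in for `__consumer_offsets`. The `ToyConsumer` class is hypothetical. It follows Kafka's convention that the committed offset is the position of the next message to read, so a restarted consumer does not re-read the last message it finished.

```python
class ToyConsumer:
    """Reads one partition and remembers the next offset to read (toy model)."""

    def __init__(self, partition, committed_offsets, group_id):
        self.partition = partition
        self.committed = committed_offsets  # stand-in for __consumer_offsets
        self.group_id = group_id
        # Resume from the group's committed offset, or from 0 if there is none.
        self.position = committed_offsets.get(group_id, 0)

    def poll(self):
        """Return the next message, or None when caught up with the log."""
        if self.position >= len(self.partition):
            return None
        message = self.partition[self.position]
        self.position += 1
        return message

    def commit(self):
        """Persist the next offset to read for this consumer group."""
        self.committed[self.group_id] = self.position


partition = ["msg0", "msg1", "msg2", "msg3", "msg4"]
offsets = {}

first = ToyConsumer(partition, offsets, "analytics_team")
seen_before_crash = [first.poll() for _ in range(3)]
first.commit()  # stores 3: the next message to read

# "Crash": throw the first consumer away and start a fresh one in the same group.
restarted = ToyConsumer(partition, offsets, "analytics_team")
seen_after_restart = [restarted.poll(), restarted.poll()]
```

If the consumer had crashed after reading but before committing, it would re-read those messages on restart. That is the usual at-least-once behaviour, and the reason consumer logic is often written to tolerate duplicates.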
Kafka vs Alternatives
| Feature | Kafka | RabbitMQ | AWS SQS | Redis |
|---|---|---|---|---|
| Throughput | Very High | High | High | Very High |
| Durability | Excellent | Good | Excellent | Good |
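| Ordering | Per-partition | FIFO (queues) | FIFO queues only (standard queues are best-effort) | Per-stream (Redis Streams) |
| Replayability | Yes | No | No | Limited |
| Distributed | Yes | Optional (clustering) | Managed | Optional (Redis Cluster) |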
| Complexity | High | Low | Low | Low |
| Cost | Infrastructure | Simpler | Pay per msg | Infrastructure |
Choose Kafka if:
- High throughput (1000+ events/sec)
- Need event replay
- Multiple subscribers needed
- Distributed required
Choose RabbitMQ if:
- Job queuing
- Complex routing
- Simple, single-server ok