How Apache Kafka Works: Distributed Streaming at Scale

IdealResume Team · July 2, 2025 · 10 min read

Kafka's Role in Modern Architecture

Apache Kafka is the de facto standard for real-time data streaming. LinkedIn, where Kafka was created, has reported processing over 7 trillion messages per day through it. Companies like Netflix, Uber, and Airbnb rely on it for critical data pipelines.

Core Concepts

1. Topics

  • Named feeds of messages
  • Similar to database tables
  • Partitioned for scalability
  • Replicated for durability

2. Partitions

  • Topics split into partitions
  • Each partition is ordered
  • Maximum consumer parallelism = number of partitions
  • Messages distributed by key

3. Producers

  • Publish messages to topics
  • Choose partition (key-based or round-robin)
  • Receive write acknowledgments from the partition leader (see the sketch below)
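
To make this concrete, here is a minimal Java producer sketch. The broker address, topic name, and key are illustrative placeholders, not details from any particular deployment:

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class OrderProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Records with the same key hash to the same partition, preserving per-key order.
            producer.send(new ProducerRecord<>("orders", "customer-42", "order created"),
                (metadata, exception) -> {
                    if (exception != null) {
                        exception.printStackTrace();
                    } else {
                        System.out.printf("partition=%d offset=%d%n",
                            metadata.partition(), metadata.offset());
                    }
                });
        } // close() flushes any buffered records
    }
}
```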

4. Consumers

  • Subscribe to topics
  • Read from partitions
  • Track position (offset)
  • Consumer groups for scaling (sketch below)
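
A matching consumer sketch, again with placeholder broker, group, and topic names. Every consumer sharing the group.id below splits the topic's partitions with its peers:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class OrderConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "order-processors"); // group members share partitions
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false"); // commit manually after processing

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("orders"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("partition=%d offset=%d value=%s%n",
                        record.partition(), record.offset(), record.value());
                }
                consumer.commitSync(); // advance the group's stored offsets
            }
        }
    }
}
```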

5. Brokers

  • Kafka servers
  • Store and serve data
  • Handle replication
  • Coordinate consumers

Architecture Deep Dive

Cluster Layout:

  • Multiple brokers (typically 3+)
  • Each broker handles subset of partitions
  • ZooKeeper (older clusters) or KRaft for coordination
  • No single point of failure

Partition Distribution:

  • Partitions spread across brokers
  • One broker = leader for partition
  • Other brokers = followers (replicas)
  • Automatic leader election on failure

Write Path

Producer Flow:

  1. Producer chooses partition (by key hash or round-robin)
  2. Request sent to partition leader
  3. Leader writes to local log
  4. Followers replicate
  5. Leader acknowledges when enough replicas confirm

Acknowledgment Modes:

  • acks=0: producer doesn't wait (fastest, least safe)
  • acks=1: leader's local write suffices (balanced)
  • acks=all: all in-sync replicas must confirm (safest, slowest; config sketch below)
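
These modes map directly onto producer configuration. A hedged fragment extending the Properties from the producer sketch above; enabling idempotence is an extra safeguard not mentioned in the list, and it requires acks=all:

```java
// Extending the earlier producer Properties (values are illustrative):
props.put(ProducerConfig.ACKS_CONFIG, "all");                // wait for all in-sync replicas
props.put(ProducerConfig.ENABLE_IDEMPOTENCE_CONFIG, "true"); // broker dedupes retried batches
props.put(ProducerConfig.RETRIES_CONFIG, Integer.toString(Integer.MAX_VALUE));
```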

Log Storage

Append-Only Log:

  • Messages appended sequentially
  • Never modified after write
  • Position = offset (monotonic)

Log Segments:

  • Log split into segment files
  • Active segment receives writes
  • Old segments immutable
  • Retention-based deletion

Index Files:

  • Offset index: offset → position
  • Time index: timestamp → offset
  • Enables efficient seeking (toy model below)
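
The real index is a memory-mapped binary file, but its behavior can be modeled in a few lines. A toy sketch, not Kafka's on-disk format: keep a sparse offset-to-byte-position map, take the floor entry for a requested offset, then scan forward:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a sparse offset index: Kafka writes an index entry roughly every
// 4 KB of log, finds the closest preceding entry for a requested offset, and
// scans forward from that byte position.
public class OffsetIndexSketch {
    private final TreeMap<Long, Long> offsetToPosition = new TreeMap<>();

    void addEntry(long offset, long bytePosition) {
        offsetToPosition.put(offset, bytePosition);
    }

    // Byte position to start scanning from for a target offset.
    long lookup(long targetOffset) {
        Map.Entry<Long, Long> floor = offsetToPosition.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue();
    }

    public static void main(String[] args) {
        OffsetIndexSketch index = new OffsetIndexSketch();
        index.addEntry(0L, 0L);
        index.addEntry(100L, 4096L);
        index.addEntry(200L, 8192L);
        System.out.println(index.lookup(150L)); // prints 4096: scan forward from there
    }
}
```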

Read Path

Consumer Flow:

  1. Consumer requests messages from offset
  2. Broker serves from page cache or disk
  3. Consumer processes messages
  4. Consumer commits offset

Sequential Reads:

  • Most reads are sequential
  • OS page cache very effective
  • Zero-copy optimization (sendfile)

Consumer Groups:

  • Multiple consumers share workload
  • Each partition assigned to one consumer
  • Rebalancing on consumer changes

Replication

Leader-Follower Model:

  • Each partition has one leader
  • Leader handles all reads/writes
  • Followers replicate from leader
  • ISR (In-Sync Replicas) tracks healthy replicas

Durability:

  • Replication factor (RF) typically 3
  • Tolerates up to RF-1 broker failures
  • min.insync.replicas enforces write guarantees (example below)
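
Both knobs are set at topic creation. A sketch using Kafka's Admin API; the topic name, partition count, and bootstrap address are placeholders:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateDurableTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // 6 partitions, replication factor 3, and at least 2 in-sync replicas
            // required before an acks=all write is acknowledged.
            NewTopic topic = new NewTopic("orders", 6, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2"));
            admin.createTopics(List.of(topic)).all().get();
        }
    }
}
```

With RF=3 and min.insync.replicas=2, an acks=all write succeeds only once two replicas have it, so one broker can fail without losing acknowledged data.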

Failure Handling:

  • Leader fails → a follower from the ISR is promoted
  • Automatic, transparent to clients
  • Unacknowledged messages may be lost (depends on acks)

Kafka's Secret Sauce

1. Sequential I/O

  • All writes are appends
  • All reads are sequential
  • Optimizes for disk throughput
  • Orders of magnitude faster than random I/O on spinning disks

2. Zero-Copy

  • Data transferred directly from page cache to network
  • Bypasses application layer
  • Uses the sendfile system call (illustrated below)
  • Reduces CPU overhead
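
In the JVM this is exposed as FileChannel.transferTo, which maps to sendfile on Linux. An illustration of the idea, not Kafka's actual broker code:

```java
import java.io.FileInputStream;
import java.nio.channels.FileChannel;
import java.nio.channels.SocketChannel;

// Illustration only: transferTo moves bytes from the page cache to the socket
// without copying them through a user-space buffer.
public class ZeroCopySketch {
    static void serve(String logSegmentPath, SocketChannel socket) throws Exception {
        try (FileChannel file = new FileInputStream(logSegmentPath).getChannel()) {
            long position = 0;
            long remaining = file.size();
            while (remaining > 0) {
                long sent = file.transferTo(position, remaining, socket); // zero-copy send
                position += sent;
                remaining -= sent;
            }
        }
    }
}
```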

3. Batching

  • Producers batch messages
  • Consumers fetch batches
  • Reduces network round trips
  • Amortizes overhead

4. Compression

  • Batch-level compression
  • gzip, snappy, lz4, zstd
  • Reduces network and storage usage (config sketch below)
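
Both levers live in producer configuration. Another hedged fragment extending the earlier producer Properties; the values are illustrative starting points, not recommendations:

```java
// Extending the earlier producer Properties (values are illustrative):
props.put(ProducerConfig.LINGER_MS_CONFIG, "20");          // wait up to 20 ms to fill a batch
props.put(ProducerConfig.BATCH_SIZE_CONFIG, "65536");      // up to 64 KB per partition batch
props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "lz4");  // compress each batch as a unit
```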

Consumer Group Mechanics

Partition Assignment:

  • Partitions assigned to consumers
  • One consumer per partition (max)
  • More consumers than partitions = some idle

Offset Management:

  • Consumers track their position
  • Offsets stored in Kafka (__consumer_offsets)
  • Commit periodically or on each message
  • Enables replay from any position (sketch below)
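
Because offsets are just positions in the log, replay is a seek. A hypothetical helper reusing the consumer type from the earlier sketch; note that assign() and subscribe() are mutually exclusive on a single consumer instance:

```java
import java.util.List;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.TopicPartition;

// Hypothetical replay helper: jump a consumer to a known offset and reprocess.
// Topic, partition, and offset are placeholders.
static void replayFrom(KafkaConsumer<String, String> consumer, long offset) {
    TopicPartition partition = new TopicPartition("orders", 0);
    consumer.assign(List.of(partition)); // manual assignment, outside the group protocol
    consumer.seek(partition, offset);    // the next poll() starts reading here
}
```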

Rebalancing:

  • Triggered on consumer join/leave
  • Partitions redistributed
  • Brief pause in processing
  • Cooperative rebalancing minimizes disruption (config below)
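
Cooperative rebalancing is opt-in via the assignment strategy. A one-line addition to the consumer Properties from the earlier sketch:

```java
// Partitions are revoked incrementally, so consumers keep the partitions they
// already own while only the moving ones pause.
props.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        org.apache.kafka.clients.consumer.CooperativeStickyAssignor.class.getName());
```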

Stream Processing

Kafka Streams:

  • Library for stream processing
  • Stateful operations (aggregations, joins)
  • Exactly-once semantics
  • Scales with Kafka

Common Operations:

  • Filter: select messages by criteria
  • Map: transform messages
  • Aggregate: group and combine
  • Join: correlate streams
  • Windowing: time-based grouping (see the example below)
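
A small Kafka Streams sketch combining several of these operations, assuming a 3.x client; the topic names and application id are placeholders:

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.TimeWindows;

public class ClickCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "click-counter"); // placeholder app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> clicks = builder.stream("clicks"); // placeholder topic

        clicks
            .filter((user, page) -> page != null)                             // filter
            .groupByKey()                                                     // group per user key
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(5))) // 5-minute windows
            .count()                                                          // aggregate
            .toStream()
            .foreach((windowedUser, count) ->
                System.out.printf("%s clicked %d times in %s%n",
                    windowedUser.key(), count, windowedUser.window()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

The count() here is backed by a local state store that Kafka Streams replicates through an internal changelog topic, which is what makes the aggregation fault-tolerant.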

Use Cases

1. Event Sourcing

  • Events as source of truth
  • Replay to rebuild state
  • Audit trail built-in

2. Change Data Capture (CDC)

  • Database changes to Kafka
  • Sync across systems
  • Tools: Debezium, Maxwell

3. Log Aggregation

  • Centralize logs from many sources
  • Real-time processing
  • Long-term storage

4. Metrics Pipeline

  • Application metrics to Kafka
  • Real-time dashboards
  • Alerting systems

Scaling Considerations

Horizontal Scaling:

  • Add partitions for more parallelism
  • Add brokers for more capacity
  • Add consumers for more processing

Limitations:

  • Partition count affects memory
  • Too many partitions = slower leader election
  • Key-based ordering limited to a single partition (sizing sketch below)
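
One common way to pick a partition count is to size from measured per-partition throughput. A back-of-the-envelope sketch with illustrative numbers, not benchmarks:

```java
// partitions >= max(target / producer-per-partition, target / consumer-per-partition)
long targetMBps = 300;           // desired topic throughput
long producerMBps = 30;          // measured producer throughput into one partition
long consumerMBps = 20;          // measured consumer throughput out of one partition
long partitions = Math.max(
        (targetMBps + producerMBps - 1) / producerMBps,   // ceil(300/30) = 10
        (targetMBps + consumerMBps - 1) / consumerMBps);  // ceil(300/20) = 15
System.out.println(partitions);  // 15, before adding headroom for growth
```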

Operational Concerns

Monitoring:

  • Under-replicated partitions
  • Consumer lag per group and partition (measured below)
  • Broker CPU/disk utilization
  • Request latency
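
Consumer lag (log end offset minus committed offset) can be computed with the Admin API. A sketch with a placeholder group id and broker address:

```java
import java.util.Map;
import java.util.Properties;
import java.util.stream.Collectors;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ListOffsetsResult;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Committed offsets for the group (group id is a placeholder).
            Map<TopicPartition, OffsetAndMetadata> committed =
                admin.listConsumerGroupOffsets("order-processors")
                     .partitionsToOffsetAndMetadata().get();

            // Log end offsets for the same partitions.
            Map<TopicPartition, OffsetSpec> latestSpec = committed.keySet().stream()
                .collect(Collectors.toMap(tp -> tp, tp -> OffsetSpec.latest()));
            Map<TopicPartition, ListOffsetsResult.ListOffsetsResultInfo> latest =
                admin.listOffsets(latestSpec).all().get();

            // Lag = log end offset - committed offset, per partition.
            committed.forEach((tp, offset) ->
                System.out.printf("%s lag=%d%n",
                    tp, latest.get(tp).offset() - offset.offset()));
        }
    }
}
```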

Common Issues:

  • Consumer lag building up
  • Rebalancing storms
  • Disk space exhaustion
  • Network saturation

Interview Application

When discussing event streaming:

Key Concepts:

  • Partitioning and ordering
  • Replication and durability
  • Consumer groups
  • Exactly-once semantics

Design Questions:

  • When to use Kafka?
  • How to choose partition count?
  • How to handle failures?
  • Stream processing patterns?

Trade-offs:

  • Throughput vs latency (batching)
  • Durability vs performance (acks)
  • Ordering vs parallelism (partitions)

Kafka's design choices (sequential I/O, replication, consumer groups) make it the backbone of modern data architecture.
