Design Spotify: A System Design Interview Conversation
The Interview Scenario
You're in a 45-minute system design interview. The interviewer asks: "Design Spotify."
---
Interviewer: "Design Spotify - a music streaming service. You have 45 minutes."
Candidate: "Great, I'd love to tackle this. Let me start by clarifying requirements to make sure we're aligned on scope."
---
Phase 1: Requirements Clarification (5 minutes)
Candidate: "I have several questions:
- **Core features** - Are we focusing on music playback, playlists, search, or recommendations?
- **User types** - Free tier with ads vs premium?
- **Offline mode** - Should users be able to download music?
- **Social features** - Following friends, collaborative playlists?
- **Scale** - How many users are we targeting?"
Interviewer: "Let's design:
- Music streaming and playback
- Playlist creation and management
- Search functionality
- Basic recommendations
- Assume 500 million users, 200 million daily active
- Both free (with ads) and premium tiers
- Mobile and web clients"
Candidate: "Perfect. For non-functional requirements:
- **Low latency** - songs should start within 200ms
- **High availability** - 99.99% uptime
- **Seamless playback** - no buffering during songs
- **Offline sync** - for premium users
- **Personalization** - recommendations should improve over time
Sound right?"
Interviewer: "Yes, let's proceed."
---
Phase 2: Back-of-Envelope Calculations (5 minutes)
Candidate: "Let me estimate the scale:
Storage:
- Assume 100 million songs in catalog
- Average song: 3 minutes at 320 kbps = ~7MB (high quality)
- Store multiple qualities (64, 128, 256, 320 kbps)
- Total per song: ~15MB average across qualities
- Total catalog: 100M × 15MB = **1.5 PB**
- Plus metadata, user data, playlists: add 20% = **~1.8 PB total**
Bandwidth:
- 200M DAU, average 1 hour listening/day
- 20 songs/day × 7MB = 140MB per user per day
- Daily bandwidth: 200M × 140MB = **28 PB/day**
Concurrent streams:
- Peak hours: 20% of DAU online simultaneously
- 200M × 20% = **40 million concurrent streams**
QPS for metadata:
- Each song play = 1 metadata lookup
- 200M users × 20 songs = 4 billion lookups/day
- ~46,000 QPS average, ~100,000 QPS peak"
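A quick sanity check of those estimates in Python, using the same rounded inputs (a sketch, not capacity planning):
```
# Sanity check of the estimates above, using decimal units (1 PB = 1e9 MB).
CATALOG_SONGS = 100_000_000
MB_PER_SONG_ALL_QUALITIES = 15     # ~7 MB at 320 kbps plus the lower tiers
DAU = 200_000_000
SONGS_PER_USER_PER_DAY = 20        # ~1 hour of 3-minute tracks
MB_PER_PLAY = 7                    # high-quality stream
PEAK_ONLINE_FRACTION = 0.20

catalog_pb = CATALOG_SONGS * MB_PER_SONG_ALL_QUALITIES / 1e9
daily_egress_pb = DAU * SONGS_PER_USER_PER_DAY * MB_PER_PLAY / 1e9
avg_metadata_qps = DAU * SONGS_PER_USER_PER_DAY / 86_400
concurrent_streams = DAU * PEAK_ONLINE_FRACTION

print(f"Catalog audio:      ~{catalog_pb:.1f} PB")            # ~1.5 PB
print(f"Daily egress:       ~{daily_egress_pb:.0f} PB/day")   # ~28 PB/day
print(f"Avg metadata QPS:   ~{avg_metadata_qps:,.0f}")        # ~46,000
print(f"Concurrent streams: ~{concurrent_streams/1e6:.0f}M")  # ~40M
```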
Interviewer: "How do these numbers affect your design?"
Candidate: "Key insights:
- **CDN is critical** - 28 PB/day needs edge caching
- **Predictable storage** - 1.5 PB is large but manageable with object storage
- **High concurrent connections** - Need efficient connection handling
- **Metadata caching** - ~100K peak QPS calls for a heavy caching layer"
---
Phase 3: High-Level Design (10 minutes)
Candidate: "Here's the high-level architecture:"

Interviewer: "Walk me through what happens when a user presses play."
Candidate: "Here's the playback flow:
- **Client requests song** - Sends song_id to Playback Service
- **Auth check** - Verify user has access (premium or free tier)
- **Get audio URL** - Fetch CDN URL for requested quality
- **Metadata fetch** - Get song details from Redis cache (or DB on miss)
- **Client receives manifest** - Contains URLs for audio segments
- **Streaming begins** - Client fetches segments from CDN
- **Buffering ahead** - Client pre-buffers next 30 seconds
- **Track listening event** - Async event sent for analytics/royalties
Key optimizations:
- Predictive pre-fetching of next song in queue
- Quality adaptation based on network conditions
- Gapless playback - load next song before current ends"
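A minimal sketch of that play path in Python; the service, cache, and CDN interfaces here (`cache.get`, `cdn.sign_url`, `events.publish`, and so on) are illustrative placeholders rather than a real API:
```
import time
from dataclasses import dataclass

@dataclass
class PlaybackManifest:
    track_id: str
    bitrate_kbps: int
    segment_urls: list          # CDN URLs for the 5-10 s audio segments
    first_segment_seconds: int  # small first segment for fast start

def handle_play(user, track_id, cache, metadata_db, cdn, events):
    # 1. Auth / entitlement check: tier decides the maximum bitrate.
    if not user.is_authenticated:
        raise PermissionError("login required")
    bitrate = 320 if user.is_premium else 128

    # 2. Song metadata from the Redis-style cache, DB on a miss.
    meta = cache.get(track_id)
    if meta is None:
        meta = metadata_db.get_track(track_id)
        cache.set(track_id, meta, ttl=3600)

    # 3. Short-lived signed CDN URLs for each segment at the chosen quality.
    segment_urls = [
        cdn.sign_url(f"{track_id}/{bitrate}/{i}.ogg", expires_in=600)
        for i in range(meta["segment_count"])
    ]

    # 4. Fire-and-forget listening event for analytics and royalties.
    events.publish("listening-events",
                   {"user_id": user.id, "track_id": track_id,
                    "ts": int(time.time() * 1000)})

    return PlaybackManifest(track_id, bitrate, segment_urls,
                            first_segment_seconds=2)
```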
Interviewer: "Why did you separate Playback Service from Playlist Service?"
Candidate: "Separation of concerns and scaling:
- **Different scaling patterns** - Playback is read-heavy with a fairly constant load; playlists see heavier writes, concentrated around peak creation times
- **Different data stores** - Playback needs fast key-value lookups; Playlists need flexible queries
- **Failure isolation** - If playlist service goes down, users can still play their current queue
- **Team ownership** - Different teams can own and deploy independently"
---
Phase 4: Deep Dive - Audio Streaming (10 minutes)
Interviewer: "Let's go deeper on audio delivery. How do you ensure smooth playback?"
Candidate: "Audio streaming has unique challenges compared to video:"
Audio File Preparation:
```
       Original Audio (FLAC/WAV)
                   │
                   ▼
┌─────────────────────────────────────┐
│         Transcoding Pipeline        │
│   • 64 kbps  (low quality/mobile)   │
│   • 128 kbps (normal)               │
│   • 256 kbps (high quality)         │
│   • 320 kbps (premium)              │
│   • FLAC (lossless - premium only)  │
└─────────────────────────────────────┘
                   │
                   ▼
        Ogg Vorbis / AAC Format
       (Segmented for streaming)
```
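One way such a pipeline could be driven (a Python sketch shelling out to ffmpeg; the bitrate ladder matches the diagram above, while the exact flags and file layout are assumptions):
```
import subprocess
from pathlib import Path

BITRATES_KBPS = [64, 128, 256, 320]   # ladder from the diagram above

def transcode_and_segment(master: Path, out_dir: Path, kbps: int) -> None:
    """Encode one quality tier to Ogg Vorbis and split it into ~10 s segments."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run([
        "ffmpeg", "-i", str(master),
        "-c:a", "libvorbis", "-b:a", f"{kbps}k",
        "-f", "segment", "-segment_time", "10",
        str(out_dir / "seg_%03d.ogg"),
    ], check=True)

for kbps in BITRATES_KBPS:
    transcode_and_segment(Path("masters/track.flac"),
                          Path(f"transcoded/track/{kbps}"), kbps)
```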
Adaptive Streaming:
- Songs divided into 5-10 second segments
- Client monitors download speed
- Dynamically switches quality mid-song if needed
- Buffer threshold: switch down if buffer < 10 seconds
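A minimal sketch of that buffer-driven switch; the 10-second step-down threshold comes from the list above, while the step-up threshold and headroom factor are illustrative:
```
LADDER_KBPS = [64, 128, 256, 320]

def next_bitrate(current_kbps: int, buffer_seconds: float,
                 measured_kbps: float) -> int:
    """Pick the bitrate for the next segment based on buffer health."""
    i = LADDER_KBPS.index(current_kbps)
    if buffer_seconds < 10 and i > 0:
        return LADDER_KBPS[i - 1]              # buffer low: step down
    if (buffer_seconds > 25 and i < len(LADDER_KBPS) - 1
            and measured_kbps > 1.5 * LADDER_KBPS[i + 1]):
        return LADDER_KBPS[i + 1]              # healthy buffer + headroom: step up
    return current_kbps
```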
Interviewer: "How do you achieve 200ms playback start?"
Candidate: "Several optimizations:
- **Pre-fetch on hover** - When user hovers over a song, start loading first segment
- **Predictive loading** - Queue next song while current plays
- **Edge caching** - Popular songs cached within 50ms of users
- **Small first segment** - First segment is 2 seconds (fast to download)
- **Connection keep-alive** - Maintain persistent connections to CDN
- **DNS pre-resolution** - Resolve CDN domains on app startup
Latency breakdown:
- DNS: 0ms (pre-resolved)
- Connection + request: 50ms (persistent connection, so no fresh TCP/TLS handshake)
- First segment download: 100ms (from edge, 2-second segment)
- Audio decode: 20ms
- Total: ~170ms to first sound"
Interviewer: "How do you handle offline mode?"
Candidate: "Offline is premium-only:
- **Download manager** - Background service downloads songs
- **Encrypted storage** - Songs stored with device-specific key
- **License validation** - Check license expiry periodically (every 30 days requires online check)
- **Sync service** - When back online, sync listening history
- **Storage management** - Auto-remove least-played downloaded songs when space low
DRM approach:
- Encrypt files with user-specific key
- Key tied to device ID + user ID
- Prevents sharing downloaded files between accounts"
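A minimal sketch of binding downloads to a (user, device) pair with the `cryptography` package; the key-derivation scheme here is an illustrative assumption, not Spotify's actual DRM:
```
import base64
from cryptography.fernet import Fernet
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.kdf.hkdf import HKDF

def derive_download_key(master_secret: bytes, user_id: str, device_id: str) -> bytes:
    """Derive a per-(user, device) key, so copied files are useless elsewhere."""
    hkdf = HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                info=f"{user_id}:{device_id}".encode())
    return base64.urlsafe_b64encode(hkdf.derive(master_secret))

def encrypt_for_offline(audio: bytes, key: bytes) -> bytes:
    return Fernet(key).encrypt(audio)

def decrypt_for_playback(blob: bytes, key: bytes) -> bytes:
    return Fernet(key).decrypt(blob)   # raises if the key (user/device) differs
```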
---
Phase 5: Playlist and Recommendations (8 minutes)
Interviewer: "Tell me about playlist storage and recommendations."
Candidate: "Playlists have interesting requirements:"
Playlist Data Model:
```
// Cassandra schema - chosen for write scalability
CREATE TABLE playlists (
    user_id     UUID,
    playlist_id UUID,
    name        TEXT,
    created_at  TIMESTAMP,
    PRIMARY KEY (user_id, playlist_id)
);

CREATE TABLE playlist_tracks (
    playlist_id UUID,
    position    INT,
    track_id    UUID,
    added_at    TIMESTAMP,
    PRIMARY KEY (playlist_id, position)
);
```
Why Cassandra:
- Handles millions of playlists with ease
- Fast writes for playlist updates
- Partition by user_id for data locality
- Scales horizontally
Collaborative playlists:
- Use CRDT (Conflict-free Replicated Data Types)
- Each client can add/remove independently
- Conflicts auto-resolve (duplicate adds ignored, removes win)
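A minimal sketch of those semantics as a two-phase set per playlist (a simplification: production CRDTs also track ordering and re-adds):
```
from dataclasses import dataclass, field

@dataclass
class PlaylistTrackSet:
    """Two-phase set: duplicate adds are no-ops, removes win on conflict."""
    added: set = field(default_factory=set)     # track_ids ever added
    removed: set = field(default_factory=set)   # tombstones

    def add(self, track_id: str) -> None:
        self.added.add(track_id)

    def remove(self, track_id: str) -> None:
        self.removed.add(track_id)

    def merge(self, other: "PlaylistTrackSet") -> None:
        # Union on both sides: commutative, associative, idempotent,
        # so replicas converge no matter the order updates arrive in.
        self.added |= other.added
        self.removed |= other.removed

    def tracks(self) -> set:
        return self.added - self.removed
```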
Interviewer: "How does the recommendation system work?"
Candidate: "Spotify's recommendations use multiple signals:
Data Collection:
```
┌──────────────────┐
│ Listening Events │ ─▶ Kafka ─▶ Event Processing
└──────────────────┘
          │
          ▼
   User Behavior:
   • Songs played
   • Songs skipped
   • Playlist additions
   • Search queries
   • Time of day patterns
```
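A minimal sketch of publishing one such event into the pipeline (using the `kafka-python` client; the topic name and event fields are illustrative):
```
import json
import time
from kafka import KafkaProducer   # kafka-python

producer = KafkaProducer(
    bootstrap_servers="kafka:9092",
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

def publish_listening_event(user_id: str, track_id: str,
                            ms_played: int, skipped: bool) -> None:
    # Fire-and-forget: playback never blocks on analytics.
    producer.send("listening-events", {
        "user_id": user_id,
        "track_id": track_id,
        "ms_played": ms_played,
        "skipped": skipped,
        "ts": int(time.time() * 1000),
    })
```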
Recommendation Models:
- **Collaborative Filtering** - "Users like you also listen to..."
- **Content-based** - Audio features analysis (tempo, key, energy)
- **NLP on metadata** - Genre, mood, lyrics analysis
- **Sequential** - What songs follow each other in playlists
Serving Recommendations:
- Pre-compute daily for each user
- Store in feature store (Redis/DynamoDB)
- Real-time adjustments based on current session
- Blend multiple models with learned weights
Interview tip: I'd implement a simpler version first (collaborative filtering with item-item similarity) and add complexity incrementally."
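A minimal sketch of that "simple first version" - item-item collaborative filtering over a user x song play-count matrix (NumPy, toy data):
```
import numpy as np

def item_similarity(plays: np.ndarray) -> np.ndarray:
    """Cosine similarity between songs, based on who played them."""
    norms = np.linalg.norm(plays, axis=0, keepdims=True)
    norms[norms == 0] = 1.0                     # avoid divide-by-zero
    normalized = plays / norms
    return normalized.T @ normalized            # (songs x songs)

def recommend(user_plays: np.ndarray, sim: np.ndarray, k: int = 10) -> np.ndarray:
    scores = sim @ user_plays                   # score songs by similarity to history
    scores[user_plays > 0] = -np.inf            # drop songs already played
    return np.argsort(scores)[::-1][:k]         # top-k song indices

# Toy example: 4 users x 5 songs of play counts.
plays = np.array([[3, 0, 1, 0, 0],
                  [2, 1, 0, 0, 0],
                  [0, 0, 4, 2, 0],
                  [0, 0, 0, 1, 5]], dtype=float)
sim = item_similarity(plays)
print(recommend(plays[0], sim, k=2))
```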
---
Phase 6: Trade-offs Discussion (5 minutes)
Interviewer: "What are the key trade-offs?"
Candidate: "Several important decisions:
1. Audio Format:
- **Chose:** Ogg Vorbis / AAC
- **Alternative:** MP3
- **Trade-off:** Better compression vs wider compatibility
- **Why:** Ogg gives 20% better quality at same bitrate; we control the client
2. Playlist Database:
- **Chose:** Cassandra
- **Alternative:** PostgreSQL
- **Trade-off:** Eventual consistency vs strong consistency
- **Why:** Playlist operations can tolerate slight delays; need write scale
3. Pre-computed vs Real-time Recommendations:
- **Chose:** Pre-computed with real-time adjustments
- **Alternative:** Fully real-time
- **Trade-off:** Freshness vs latency
- **Why:** Complex ML models take seconds to run; pre-compute overnight, adjust in real-time
4. CDN Strategy:
- **Chose:** Cache only popular songs at edge
- **Alternative:** Cache everything
- **Trade-off:** Hit rate vs storage costs
- **Why:** 80% of plays are 20% of songs; optimize for the common case"
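A rough illustration of the "cache only popular songs" choice: a capacity-capped LRU at the edge naturally keeps the hot slice of the catalog (real CDNs use more sophisticated admission and eviction policies; this is illustrative only):
```
from collections import OrderedDict

class EdgeCache:
    """Capacity-capped LRU over audio segments (illustrative only)."""
    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.store = OrderedDict()   # segment_key -> bytes

    def get(self, segment_key: str):
        if segment_key in self.store:
            self.store.move_to_end(segment_key)   # mark as recently used
            return self.store[segment_key]
        return None                               # miss: fetch from origin

    def put(self, segment_key: str, data: bytes) -> None:
        if segment_key in self.store:
            self.used -= len(self.store.pop(segment_key))
        self.store[segment_key] = data
        self.used += len(data)
        while self.used > self.capacity:          # evict least recently used
            _, evicted = self.store.popitem(last=False)
            self.used -= len(evicted)
```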
Interviewer: "What would you improve with more time?"
Candidate: "Three areas:
- **Social features** - Friend activity feed, listening parties
- **Audio quality** - Lossless streaming for premium
- **Podcasts** - Different requirements (longer files, chapters, variable bitrate)"
---
Key Interview Takeaways
- **Audio vs Video** - Audio has simpler segmentation but stricter latency needs
- **Predictive loading** - Pre-fetch is crucial for instant playback
- **Offline requires DRM** - Important for premium differentiation
- **Recommendations** - Start simple (collaborative filtering) before going complex
- **Cassandra for playlists** - High write throughput, partition by user