Design YouTube: A System Design Interview Conversation
The Interview Scenario
You're in a 45-minute system design interview at a top tech company. The interviewer asks: "Design YouTube."
---
Interviewer: "Let's design YouTube. You have 45 minutes. Where would you like to start?"
Candidate: "Before diving into the design, I'd like to ask a few clarifying questions to make sure I understand the scope and requirements. Is that okay?"
Interviewer: "Absolutely, go ahead."
---
Phase 1: Requirements Clarification (5 minutes)
Candidate: "Great. Let me understand what we're building:
- **Core features** - Are we focusing on video upload, video playback, or both?
- **Scale** - What's our target user base? Millions or billions?
- **Geographic distribution** - Is this a global service?
- **Video types** - Are we supporting live streaming, or just pre-recorded videos?
- **Quality** - Should we support multiple resolutions like 480p, 720p, 1080p, 4K?"
Interviewer: "Good questions. Let's focus on:
- Video upload and playback (no live streaming for now)
- Global scale - assume 2 billion monthly active users
- Support multiple video qualities
- Users should be able to search for, like, and comment on videos"
Candidate: "Perfect. And for non-functional requirements, I'm assuming:
- **High availability** - the service should be up 99.99% of the time
- **Low latency** - videos should start playing within 200ms
- **Eventual consistency** is acceptable for likes and view counts
- **Durability** - uploaded videos should never be lost
Does that align with your expectations?"
Interviewer: "Yes, that's a good set of requirements. Please continue."
---
Phase 2: Back-of-Envelope Calculations (5 minutes)
Candidate: "Let me do some quick math to understand the scale we're dealing with."
Storage Calculation:
- 2 billion MAU; assume 10% of users upload content = 200 million creators
- Average 1 video per month per creator = 200 million videos/month
- Average video size (compressed, multiple resolutions) = 1GB
- Monthly new storage = 200 million × 1GB = **200 PB/month**
- We'll need distributed object storage like S3
Bandwidth Calculation:
- 2 billion users, assume 30% DAU = 600 million daily users
- Average 5 videos per day per user = 3 billion video views/day
- Average video watch = 5 minutes at 5 Mbps = 187.5 MB per view
- Daily egress = 3 billion × 187.5 MB = **~560 PB/day**
- This is massive - we definitely need a CDN
QPS Calculation:
- Video views: 3 billion/day = ~35,000 views/second
- Uploads: 200 million/month = ~77 uploads/second
- Read-heavy system with ~450:1 read-to-write ratio
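These figures are easy to sanity-check with a few lines of arithmetic. The sketch below simply replays the assumptions above (creator ratio, video size, watch time, bitrate) and prints the derived numbers; the variable names are illustrative, not part of the interview.
```python
# Back-of-envelope sanity check for the estimates above.
MAU = 2_000_000_000
CREATOR_RATIO = 0.10            # 10% of users upload content
VIDEOS_PER_CREATOR_MONTH = 1
AVG_VIDEO_SIZE_GB = 1           # all resolutions, compressed

DAU_RATIO = 0.30
VIEWS_PER_USER_DAY = 5
WATCH_MINUTES = 5
BITRATE_MBPS = 5

creators = MAU * CREATOR_RATIO
monthly_storage_pb = creators * VIDEOS_PER_CREATOR_MONTH * AVG_VIDEO_SIZE_GB / 1e6
print(f"New storage: {monthly_storage_pb:.0f} PB/month")            # ~200 PB/month

daily_views = MAU * DAU_RATIO * VIEWS_PER_USER_DAY
mb_per_view = WATCH_MINUTES * 60 * BITRATE_MBPS / 8                 # Mbps -> MB/s
daily_egress_pb = daily_views * mb_per_view / 1e9
print(f"Egress: {daily_egress_pb:.0f} PB/day")                      # ~560 PB/day

print(f"View QPS: {daily_views / 86_400:,.0f}")                     # ~35,000/s
print(f"Upload QPS: {creators * VIDEOS_PER_CREATOR_MONTH / (30 * 86_400):.0f}")  # ~77/s
```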
Interviewer: "Good estimates. How do these numbers influence your design?"
Candidate: "Three key takeaways:
- **CDN is essential** - ~560 PB/day of egress can't come from origin servers
- **Object storage** - We need distributed storage, not traditional databases for video files
- **Read optimization** - 450:1 ratio means we should heavily cache metadata"
---
Phase 3: High-Level Design (10 minutes)
Candidate: "Let me sketch out the major components:"

Interviewer: "Walk me through the video upload flow."
Candidate: "Sure. Here's the upload process:
- **Client initiates upload** - Requests a pre-signed URL from Upload Service
- **Direct upload to S3** - Client uploads directly to object storage (bypasses our servers)
- **Upload complete notification** - Webhook triggers transcoding job
- **Transcoding queue** - Job added to message queue (SQS/Kafka)
- **Transcoding workers** - Convert video to multiple resolutions (480p, 720p, 1080p, 4K)
- **Store transcoded files** - All versions saved to S3
- **Update metadata** - Write video metadata to MySQL
- **CDN invalidation** - Push to CDN edge locations
- **Notify user** - Video is ready to view"
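To make the pre-signed URL request and the transcode enqueue concrete, here is a minimal sketch using boto3; the bucket name, queue URL, and job payload are illustrative assumptions, not prescribed by the design.
```python
import json
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

RAW_BUCKET = "yt-raw-uploads"  # hypothetical bucket name
TRANSCODE_QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/transcode-jobs"  # placeholder

def create_upload_url(video_id: str, expires_s: int = 3600) -> str:
    """Hand the client a pre-signed PUT URL so the upload bypasses our servers."""
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": RAW_BUCKET, "Key": f"raw/{video_id}.mp4"},
        ExpiresIn=expires_s,
    )

def on_upload_complete(video_id: str) -> None:
    """Upload-complete webhook: enqueue a transcoding job for the workers."""
    sqs.send_message(
        QueueUrl=TRANSCODE_QUEUE_URL,
        MessageBody=json.dumps({
            "video_id": video_id,
            "source_key": f"raw/{video_id}.mp4",
            "renditions": ["480p", "720p", "1080p", "2160p"],  # 2160p = 4K
        }),
    )
```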
Interviewer: "Why did you choose direct upload to S3?"
Candidate: "Great question. Three reasons:
- **Scalability** - S3 handles unlimited concurrent uploads without us scaling servers
- **Cost** - No bandwidth costs through our infrastructure
- **Reliability** - S3's multipart upload handles large files and network issues
The trade-off is slightly more complexity with pre-signed URLs, but it's worth it at this scale."
---
Phase 4: Deep Dive - Video Delivery (10 minutes)
Interviewer: "Let's go deeper on video playback. How do we achieve that 200ms start time?"
Candidate: "Video delivery is critical. Here's my approach:"
CDN Strategy:
- Partner with multiple CDN providers (Akamai, Cloudflare, Fastly)
- Deploy edge nodes in 100+ geographic locations
- Cache popular videos at edge (80% of views are 20% of videos)
- Use DNS-based routing to direct users to nearest edge
Adaptive Bitrate Streaming:
- Implement HLS (HTTP Live Streaming) or DASH
- Video split into 2-4 second segments
- Client dynamically switches quality based on bandwidth
- Start with lower quality for faster initial load, then upgrade
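The client-side switching logic doesn't need to be elaborate. Below is one possible heuristic, assuming a small rendition table and a 0.8 safety factor; production players (e.g., hls.js or dash.js) use more sophisticated throughput estimators.
```python
# Illustrative client-side rendition picker for adaptive bitrate streaming.
# Renditions are (label, required bandwidth in Mbps); the values are assumed.
RENDITIONS = [("2160p", 20.0), ("1080p", 8.0), ("720p", 5.0), ("480p", 2.5)]

def pick_rendition(measured_mbps: float, safety: float = 0.8) -> str:
    """Pick the highest quality whose bitrate fits within a fraction of the
    throughput measured while downloading the previous segment."""
    budget = measured_mbps * safety
    for label, required in RENDITIONS:        # ordered best -> worst
        if required <= budget:
            return label
    return RENDITIONS[-1][0]                  # fall back to the lowest quality

# Start conservatively for a fast first frame, then upgrade as segments arrive.
print(pick_rendition(3.0))    # -> 480p
print(pick_rendition(12.0))   # -> 1080p
```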
Interviewer: "How do you decide what to cache at the edge?"
Candidate: "I'd use a tiered caching strategy:
Tier 1 - Edge (CDN):
- Hot content: videos with >10K views in last 24 hours
- First few segments of all videos (for fast start)
- Approximately 5% of total content = 95% of traffic
Tier 2 - Regional:
- Warm content: videos with 1K-10K recent views
- Full videos for regional popular content
Tier 3 - Origin:
- Cold content: rarely accessed videos
- All master copies for durability
We'd track view velocity and use ML to predict what will trend."
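Before any ML, a first-pass version of that tiering decision could be as simple as the sketch below; the thresholds mirror the ones quoted above, and the function and its inputs are hypothetical.
```python
from enum import Enum

class CacheTier(Enum):
    EDGE = 1      # CDN edge nodes
    REGIONAL = 2  # regional caches
    ORIGIN = 3    # origin object storage only

def assign_tier(views_last_24h: int) -> CacheTier:
    """Map recent view velocity to a cache tier using the thresholds above."""
    if views_last_24h > 10_000:
        return CacheTier.EDGE
    if views_last_24h >= 1_000:
        return CacheTier.REGIONAL
    return CacheTier.ORIGIN

assert assign_tier(50_000) is CacheTier.EDGE
assert assign_tier(3_000) is CacheTier.REGIONAL
assert assign_tier(12) is CacheTier.ORIGIN
```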
Interviewer: "What happens on a cache miss?"
Candidate: "On cache miss:
- Edge returns 302 redirect to regional cache (or origin)
- Regional cache fetches from origin if needed
- Video streams to user while simultaneously caching
- Background job warms the edge cache if views spike
We can also use cache warming - when we detect a video going viral (sudden spike in views), we proactively push it to more edge locations."
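Here is a toy version of that edge-miss path; the in-memory cache, regional URL scheme, and `schedule_edge_fill` helper are stand-ins for illustration only.
```python
# Illustrative edge-node handler for a video segment request.
edge_cache: dict[str, bytes] = {}            # stand-in for the edge cache

REGIONAL_BASE = "https://us-east.cache.example.com"   # hypothetical regional cache

def handle_segment_request(segment_key: str):
    """Serve from the edge if possible; otherwise redirect to the regional tier
    and warm the edge asynchronously so playback isn't blocked on the fill."""
    if segment_key in edge_cache:
        return 200, edge_cache[segment_key]   # edge hit
    schedule_edge_fill(segment_key)           # background warm-up of this edge
    # Regional cache falls back to origin if it also misses.
    return 302, f"{REGIONAL_BASE}/{segment_key}"

def schedule_edge_fill(segment_key: str) -> None:
    # Placeholder: in production this would enqueue an async fetch from the
    # regional tier into the edge cache.
    pass
```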
---
Phase 5: Database Design (8 minutes)
Interviewer: "Tell me about your database choices and schema."
Candidate: "For the metadata layer, I'd use MySQL with Vitess for sharding:"
Why MySQL + Vitess:
- Proven at YouTube's actual scale
- ACID compliance for critical data
- Vitess handles sharding, connection pooling, query routing
- Strong consistency where needed
Key Tables and Sharding:
```sql
-- Sharded by video_id
CREATE TABLE videos (
  video_id    BIGINT PRIMARY KEY,
  user_id     BIGINT NOT NULL,
  title       VARCHAR(200) NOT NULL,
  description TEXT,
  status      ENUM('processing', 'active', 'deleted') NOT NULL,
  created_at  TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
  INDEX idx_user (user_id)
);

-- Sharded by user_id
CREATE TABLE users (
  user_id          BIGINT PRIMARY KEY,
  username         VARCHAR(50) NOT NULL,
  email            VARCHAR(100) NOT NULL,
  subscriber_count BIGINT NOT NULL DEFAULT 0
);

-- Sharded by video_id
CREATE TABLE video_stats (
  video_id      BIGINT PRIMARY KEY,
  view_count    BIGINT NOT NULL DEFAULT 0,
  like_count    BIGINT NOT NULL DEFAULT 0,
  comment_count BIGINT NOT NULL DEFAULT 0
);
```
Interviewer: "How do you handle view counts at 35K views per second?"
Candidate: "View counts are interesting because they need to handle high write throughput but can tolerate eventual consistency.
My approach:
- **Write to Redis first** - Increment counter in Redis (100K+ ops/sec per node)
- **Batch persist to MySQL** - Every 30 seconds, flush Redis counters to database
- **Read from Redis** - Display counts always come from cache
- **Approximate counts** - For very popular videos, show '1.2M views' not exact count
Why this works:
- Users don't notice if view count is 30 seconds stale
- Redis handles the write burst
- MySQL takes one batched write per video per flush window instead of one write per view
- A Redis failure loses at most a few seconds of counts (AOF persistence), and failed flushes go to a retry queue"
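A stripped-down version of that counter path using redis-py might look like the following; the key scheme, flush interval, and `persist_to_mysql` helper are assumptions for illustration.
```python
import time
import redis

r = redis.Redis(host="localhost", port=6379)

def record_view(video_id: int) -> None:
    """Hot path: one Redis increment per view, no database touch."""
    r.incr(f"views:{video_id}")

def flush_counters(interval_s: int = 30) -> None:
    """Background job: every `interval_s` seconds, move accumulated deltas
    from Redis into MySQL in a single batch."""
    while True:
        time.sleep(interval_s)
        for key in r.scan_iter(match="views:*"):
            video_id = int(key.decode().split(":")[1])
            delta = int(r.getset(key, 0) or 0)   # read-and-reset atomically
            if delta:
                persist_to_mysql(video_id, delta)

def persist_to_mysql(video_id: int, delta: int) -> None:
    # Placeholder for: UPDATE video_stats SET view_count = view_count + %s
    #                  WHERE video_id = %s
    pass
```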
---
Phase 6: Trade-offs and Alternatives (5 minutes)
Interviewer: "What are the main trade-offs in your design?"
Candidate: "Good question. Let me highlight the key decisions:
1. SQL vs NoSQL for metadata:
- **Chose:** MySQL with Vitess
- **Alternative:** Cassandra or DynamoDB
- **Trade-off:** SQL gives us joins and transactions at the cost of more complex sharding
- **Why SQL:** Video metadata has relationships (users, comments, playlists) that benefit from joins
2. Single CDN vs Multi-CDN:
- **Chose:** Multi-CDN
- **Trade-off:** More complexity in routing logic vs better reliability and cost optimization
- **Why Multi:** Avoid vendor lock-in, geographic coverage, negotiate better rates
3. Sync vs Async transcoding:
- **Chose:** Async with queue
- **Trade-off:** Delay before video available vs simpler sync processing
- **Why Async:** Transcoding takes 2-10x the video's length; we can't block the user that long
4. Strong vs Eventual consistency for view counts:
- **Chose:** Eventual consistency
- **Trade-off:** Slightly stale counts vs massive write scalability
- **Why Eventual:** Users accept approximate view counts; strong consistency at 35K/sec would require complex distributed consensus"
Interviewer: "If you had to scale this 10x, what would break first?"
Candidate: "Great question for identifying bottlenecks:
- **Transcoding pipeline** - Would need to 10x worker fleet, consider spot instances
- **Database write path** - Add more shards, optimize batch sizes
- **CDN costs** - At 10x, we'd need to negotiate volume discounts or build our own edge
The read path scales well with caching, but writes need careful capacity planning."
---
Phase 7: Wrap-up (2 minutes)
Interviewer: "Any final thoughts?"
Candidate: "I'd add two more considerations for production:
Monitoring & Observability:
- Track p99 latencies for video start time
- Monitor CDN cache hit rates (target: 95%+)
- Alert on transcoding queue depth
- Real-time dashboard for error rates by region
Security:
- DRM for premium content
- Rate limiting on uploads to prevent abuse
- Content moderation pipeline (ML-based + human review)
- Signed URLs with expiration for video access
Want me to elaborate on any of these areas?"
Interviewer: "No, that was comprehensive. Good job structuring your approach and explaining trade-offs."
---
Key Takeaways for Your Interview
- **Always clarify requirements** - 5 minutes here saves 10 minutes of wrong design
- **Do back-of-envelope math** - Shows you think about scale
- **Draw before you talk** - Visual diagrams help communicate complex systems
- **Explain trade-offs** - Senior engineers make informed decisions, not perfect ones
- **Stay organized** - Follow a structure: Requirements → Estimates → High-level → Deep dive → Trade-offs