Design YouTube: A System Design Interview Conversation

IdealResume Team · October 4, 2025 · 12 min read

The Interview Scenario

You're in a 45-minute system design interview at a top tech company. The interviewer asks: "Design YouTube."

---

Interviewer: "Let's design YouTube. You have 45 minutes. Where would you like to start?"

Candidate: "Before diving into the design, I'd like to ask a few clarifying questions to make sure I understand the scope and requirements. Is that okay?"

Interviewer: "Absolutely, go ahead."

---

Phase 1: Requirements Clarification (5 minutes)

Candidate: "Great. Let me understand what we're building:

  1. **Core features** - Are we focusing on video upload, video playback, or both?
  2. **Scale** - What's our target user base? Millions or billions?
  3. **Geographic distribution** - Is this a global service?
  4. **Video types** - Are we supporting live streaming, or just pre-recorded videos?
  5. **Quality** - Should we support multiple resolutions like 480p, 720p, 1080p, 4K?"

Interviewer: "Good questions. Let's focus on:

  • Video upload and playback (no live streaming for now)
  • Global scale - assume 2 billion monthly active users
  • Support multiple video qualities
  • Users should be able to search, like, and comment on videos"

Candidate: "Perfect. And for non-functional requirements, I'm assuming:

  • **High availability** - the service should be up 99.99% of the time
  • **Low latency** - videos should start playing within 200ms
  • **Eventual consistency** is acceptable for likes and view counts
  • **Durability** - uploaded videos should never be lost

Does that align with your expectations?"

Interviewer: "Yes, that's a good set of requirements. Please continue."

---

Phase 2: Back-of-Envelope Calculations (5 minutes)

Candidate: "Let me do some quick math to understand the scale we're dealing with."

Storage Calculation:

  • 2 billion MAU, assume 10% upload content = 200 million creators
  • Average 1 video per month per creator = 200 million videos/month
  • Average video size (compressed, multiple resolutions) = 1GB
  • Monthly new storage = 200 million × 1GB = **200 PB/month**
  • We'll need distributed object storage like S3

Bandwidth Calculation:

  • 2 billion users, assume 30% DAU = 600 million daily users
  • Average 5 videos per day per user = 3 billion video views/day
  • Average video watch = 5 minutes at 5 Mbps = 187.5 MB per view
  • Daily egress = 3 billion × 187.5 MB = **~560 PB/day**
  • This is massive - we definitely need a CDN

QPS Calculation:

  • Video views: 3 billion/day = ~35,000 views/second
  • Uploads: 200 million/month = ~77 uploads/second
  • Read-heavy system with ~450:1 read-to-write ratio (arithmetic sketched below)
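
To keep the arithmetic honest, here are the same estimates as a quick script - the inputs are the assumptions above, not measured data:

```python
# Back-of-envelope estimates; all inputs are assumptions from the discussion.
MAU = 2_000_000_000                      # monthly active users
creators = int(MAU * 0.10)               # assume 10% of users upload content
videos_per_month = creators * 1          # 1 video per creator per month
avg_video_size_gb = 1                    # compressed, all resolutions combined
storage_pb_per_month = videos_per_month * avg_video_size_gb / 1_000_000
print(f"New storage: ~{storage_pb_per_month:.0f} PB/month")          # ~200

DAU = int(MAU * 0.30)                    # assume 30% of MAU are daily actives
views_per_day = DAU * 5                  # 5 videos per user per day
mb_per_view = 5 * 60 * 5 / 8             # 5 minutes at 5 Mbps = 187.5 MB
egress_pb_per_day = views_per_day * mb_per_view / 1_000_000_000
print(f"Egress: ~{egress_pb_per_day:.0f} PB/day")                    # ~560

views_qps = views_per_day / 86_400
uploads_qps = videos_per_month / (30 * 86_400)
print(f"Views/s: ~{views_qps:,.0f}, uploads/s: ~{uploads_qps:.0f}, "
      f"read:write ~{views_qps / uploads_qps:.0f}:1")                # ~35K, ~77, ~450:1
```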

Interviewer: "Good estimates. How do these numbers influence your design?"

Candidate: "Three key takeaways:

  1. **CDN is essential** - ~560 PB/day egress can't come from origin servers
  2. **Object storage** - We need distributed storage, not traditional databases for video files
  3. **Read optimization** - 450:1 ratio means we should heavily cache metadata"

---

Phase 3: High-Level Design (10 minutes)

Candidate: "Let me sketch out the major components:"

![YouTube High Level Architecture](/images/blog/youtube-architecture.svg)

Interviewer: "Walk me through the video upload flow."

Candidate: "Sure. Here's the upload process:

  1. **Client initiates upload** - Requests a pre-signed URL from the Upload Service (sketched below)
  2. **Direct upload to S3** - Client uploads directly to object storage (bypasses our servers)
  3. **Upload complete notification** - Webhook triggers transcoding job
  4. **Transcoding queue** - Job added to message queue (SQS/Kafka)
  5. **Transcoding workers** - Convert video to multiple resolutions (480p, 720p, 1080p, 4K)
  6. **Store transcoded files** - All versions saved to S3
  7. **Update metadata** - Write video metadata to MySQL
  6. **CDN priming** - Push the new renditions to CDN edge locations
  9. **Notify user** - Video is ready to view"
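
As a minimal sketch of step 1 - assuming S3 via boto3, with a hypothetical bucket name and key layout - the Upload Service hands the client a short-lived pre-signed PUT URL so the raw bytes never pass through our application servers:

```python
# Sketch only: "video-uploads" and the key layout are assumed names.
import uuid
import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id: int) -> dict:
    """Issue a short-lived pre-signed URL for a direct-to-S3 upload."""
    video_id = uuid.uuid4().hex
    key = f"raw/{user_id}/{video_id}.mp4"
    url = s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": "video-uploads", "Key": key},
        ExpiresIn=900,  # URL valid for 15 minutes
    )
    return {"video_id": video_id, "upload_url": url}
```

In practice, large files would go through S3's multipart upload flow (pre-signed URLs per part) rather than a single PUT, but the shape of the idea is the same.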

Interviewer: "Why did you choose direct upload to S3?"

Candidate: "Great question. Three reasons:

  1. **Scalability** - S3 handles unlimited concurrent uploads without us scaling servers
  2. **Cost** - No bandwidth costs through our infrastructure
  3. **Reliability** - S3's multipart upload handles large files and network issues

The trade-off is slightly more complexity with pre-signed URLs, but it's worth it at this scale."

---

Phase 4: Deep Dive - Video Delivery (10 minutes)

Interviewer: "Let's go deeper on video playback. How do we achieve that 200ms start time?"

Candidate: "Video delivery is critical. Here's my approach:"

CDN Strategy:

  • Partner with multiple CDN providers (Akamai, Cloudflare, Fastly)
  • Deploy edge nodes in 100+ geographic locations
  • Cache popular videos at the edge (roughly 80% of views come from 20% of videos)
  • Use DNS-based routing to direct users to nearest edge

Adaptive Bitrate Streaming:

  • Implement HLS (HTTP Live Streaming) or DASH
  • Video split into 2-4 second segments
  • Client dynamically switches quality based on bandwidth
  • Start with lower quality for faster initial load, then upgrade (selection logic sketched below)
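
A toy version of that quality-switching decision - the rendition ladder and the 80% safety margin are illustrative, and real HLS/DASH players also factor in buffer occupancy:

```python
# Hypothetical rendition ladder: (label, bitrate in kbps)
RENDITIONS = [("480p", 1_000), ("720p", 2_500), ("1080p", 5_000), ("4K", 15_000)]

def pick_rendition(measured_kbps: float, safety: float = 0.8) -> str:
    """Return the best rendition whose bitrate fits within the measured bandwidth."""
    budget = measured_kbps * safety
    best = RENDITIONS[0][0]              # default to lowest quality for a fast start
    for label, kbps in RENDITIONS:
        if kbps <= budget:
            best = label
    return best

print(pick_rendition(800))      # -> 480p  (slow connection)
print(pick_rendition(6_500))    # -> 1080p (5,000 <= 6,500 * 0.8 = 5,200)
```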

Interviewer: "How do you decide what to cache at the edge?"

Candidate: "I'd use a tiered caching strategy:

Tier 1 - Edge (CDN):

  • Hot content: videos with >10K views in last 24 hours
  • First few segments of all videos (for fast start)
  • Approximately 5% of total content = 95% of traffic

Tier 2 - Regional:

  • Warm content: videos with 1K-10K recent views
  • Full videos for regional popular content

Tier 3 - Origin:

  • Cold content: rarely accessed videos
  • All master copies for durability

We'd track view velocity and use ML to predict what will trend."
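
The tiering rule itself can be simple; this sketch just encodes the thresholds mentioned above, with the ML-based trend prediction sitting on top of it:

```python
def cache_tier(views_last_24h: int) -> str:
    """Map a video's recent view velocity to a cache tier (thresholds from above)."""
    if views_last_24h > 10_000:
        return "edge"        # Tier 1: hot content, pushed to CDN edge nodes
    if views_last_24h >= 1_000:
        return "regional"    # Tier 2: warm content, regional caches
    return "origin"          # Tier 3: cold content, origin/object storage only
```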

Interviewer: "What happens on a cache miss?"

Candidate: "On cache miss:

  1. Edge returns 302 redirect to regional cache (or origin)
  2. Regional cache fetches from origin if needed
  3. Video streams to user while simultaneously caching
  4. Background job warms the edge cache if views spike

We can also use cache warming - when we detect a video going viral (sudden spike in views), we proactively push it to more edge locations."
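
Roughly, the edge logic for a segment request could look like the sketch below; the function names and the redirect-versus-proxy choice are illustrative:

```python
def schedule_cache_fill(segment_key: str) -> None:
    """Placeholder: enqueue an async job that pulls the segment into this edge node."""
    pass

def handle_segment_request(segment_key: str, edge_cache: dict, regional_url: str):
    """Serve from the edge if cached; otherwise redirect and warm the cache."""
    data = edge_cache.get(segment_key)
    if data is not None:
        return 200, data                           # cache hit: stream from the edge
    # Cache miss: send the viewer to the regional tier (or origin) and warm
    # this edge in the background so the next viewer gets a hit.
    schedule_cache_fill(segment_key)
    return 302, f"{regional_url}/{segment_key}"
```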

---

Phase 5: Database Design (8 minutes)

Interviewer: "Tell me about your database choices and schema."

Candidate: "For the metadata layer, I'd use MySQL with Vitess for sharding:"

Why MySQL + Vitess:

  • Proven at YouTube's actual scale
  • ACID compliance for critical data
  • Vitess handles sharding, connection pooling, query routing
  • Strong consistency where needed

Key Tables and Sharding:

```sql
-- Sharded by video_id
CREATE TABLE videos (
  video_id    BIGINT PRIMARY KEY,
  user_id     BIGINT,
  title       VARCHAR(200),
  description TEXT,
  status      ENUM('processing', 'active', 'deleted'),
  created_at  TIMESTAMP,
  INDEX (user_id)
);

-- Sharded by user_id
CREATE TABLE users (
  user_id          BIGINT PRIMARY KEY,
  username         VARCHAR(50),
  email            VARCHAR(100),
  subscriber_count BIGINT
);

-- Sharded by video_id
CREATE TABLE video_stats (
  video_id      BIGINT PRIMARY KEY,
  view_count    BIGINT,
  like_count    BIGINT,
  comment_count BIGINT
);
```

Interviewer: "How do you handle view counts at 35K views per second?"

Candidate: "View counts are interesting because they need to handle high write throughput but can tolerate eventual consistency.

My approach:

  1. **Write to Redis first** - Increment a counter in Redis (100K+ ops/sec per node; sketched below)
  2. **Batch persist to MySQL** - Every 30 seconds, flush Redis counters to database
  3. **Read from Redis** - Display counts always come from cache
  4. **Approximate counts** - For very popular videos, show '1.2M views' not exact count

Why this works:

  • Users don't notice if view count is 30 seconds stale
  • Redis handles the write burst
  • MySQL sees 1/30th the write load
  • We don't lose counts if Redis fails (Redis AOF persistence plus a retry queue)"
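
A minimal sketch of that counter path, assuming redis-py and any SQL driver with an execute method; the key names and flush cadence are illustrative:

```python
import redis

r = redis.Redis()  # assumes a local Redis for the sketch

def record_view(video_id: int) -> None:
    """Hot path: one O(1) Redis increment per view."""
    r.incr(f"views:{video_id}")

def flush_counts(db) -> None:
    """Run every ~30 seconds: move accumulated deltas into MySQL in one batch."""
    for key in r.scan_iter(match="views:*"):
        video_id = int(key.decode().split(":")[1])
        delta = int(r.getset(key, 0))   # read and reset the counter atomically
        if delta:
            db.execute(
                "UPDATE video_stats SET view_count = view_count + %s WHERE video_id = %s",
                (delta, video_id),
            )

def format_count(n: int) -> str:
    """Display-side rounding: '1.2M' rather than an exact number."""
    return f"{n / 1e6:.1f}M" if n >= 1_000_000 else f"{n:,}"
```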

---

Phase 6: Trade-offs and Alternatives (5 minutes)

Interviewer: "What are the main trade-offs in your design?"

Candidate: "Good question. Let me highlight the key decisions:

1. SQL vs NoSQL for metadata:

  • **Chose:** MySQL with Vitess
  • **Alternative:** Cassandra or DynamoDB
  • **Trade-off:** SQL gives us joins and transactions at cost of more complex sharding
  • **Why SQL:** Video metadata has relationships (users, comments, playlists) that benefit from joins

2. Single CDN vs Multi-CDN:

  • **Chose:** Multi-CDN
  • **Trade-off:** More complexity in routing logic vs better reliability and cost optimization
  • **Why Multi:** Avoid vendor lock-in, geographic coverage, negotiate better rates

3. Sync vs Async transcoding:

  • **Chose:** Async with queue
  • **Trade-off:** Delay before video available vs simpler sync processing
  • **Why Async:** Transcoding takes 2-10x the video length; we can't block the user that long

4. Strong vs Eventual consistency for view counts:

  • **Chose:** Eventual consistency
  • **Trade-off:** Slightly stale counts vs massive write scalability
  • **Why Eventual:** Users accept approximate view counts; strong consistency at 35K/sec would require complex distributed consensus"

Interviewer: "If you had to scale this 10x, what would break first?"

Candidate: "Great question for identifying bottlenecks:

  1. **Transcoding pipeline** - Would need to scale the worker fleet 10x; consider spot instances
  2. **Database write path** - Add more shards, optimize batch sizes
  3. **CDN costs** - At 10x, we'd need to negotiate volume discounts or build our own edge

The read path scales well with caching, but writes need careful capacity planning."

---

Phase 7: Wrap-up (2 minutes)

Interviewer: "Any final thoughts?"

Candidate: "I'd add two more considerations for production:

Monitoring & Observability:

  • Track p99 latencies for video start time
  • Monitor CDN cache hit rates (target: 95%+)
  • Alert on transcoding queue depth
  • Real-time dashboard for error rates by region

Security:

  • DRM for premium content
  • Rate limiting on uploads to prevent abuse
  • Content moderation pipeline (ML-based + human review)
  • Signed URLs with expiration for video access

Want me to elaborate on any of these areas?"

Interviewer: "No, that was comprehensive. Good job structuring your approach and explaining trade-offs."

---

Key Takeaways for Your Interview

  1. **Always clarify requirements** - 5 minutes here saves 10 minutes of wrong design
  2. **Do back-of-envelope math** - Shows you think about scale
  3. **Draw before you talk** - Visual diagrams help communicate complex systems
  4. **Explain trade-offs** - Senior engineers make informed decisions, not perfect ones
  5. **Stay organized** - Follow a structure: Requirements → Estimates → High-level → Deep dive → Trade-offs
