How URL Shorteners Work: System Design Deep Dive

IdealResume Team | July 26, 2025 | 7 min read

The URL Shortener Problem

URL shorteners like bit.ly, TinyURL, and t.co convert long URLs into short links. While seemingly simple, designing one that handles billions of URLs and redirects reveals interesting system design challenges.

Requirements Analysis

Functional Requirements:

  • Shorten long URL to short URL
  • Redirect short URL to original
  • Custom aliases (optional)
  • Expiration (optional)
  • Analytics (optional)

Non-Functional Requirements:

  • Low latency redirection (< 100ms)
  • High availability (99.99%)
  • Scalability (billions of URLs)
  • Not guessable (security)

Capacity Estimation:

  • 100M URLs created per month
  • 10:1 read to write ratio
  • 1B redirects per month
  • 5 years of data: 6B URLs
  • Storage: ~500GB for URLs, assuming roughly 85 bytes per record (closer to 3TB if records average 500 bytes)
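
The figures above follow from simple arithmetic. Here is a rough sketch of that calculation, assuming a 30-day month and an average record size of about 85 bytes. The record size is a hypothetical figure chosen to match the ~500GB estimate; records with long URLs and indexes may be several times larger.

```python
# Back-of-envelope capacity estimate for the numbers listed above.
URLS_PER_MONTH = 100_000_000         # from the estimate above
READ_WRITE_RATIO = 10                # 10 redirects per created URL
YEARS = 5
AVG_RECORD_BYTES = 85                # assumption, not a measured value
SECONDS_PER_MONTH = 30 * 24 * 3600   # assumption: 30-day month

redirects_per_month = URLS_PER_MONTH * READ_WRITE_RATIO
total_urls = URLS_PER_MONTH * 12 * YEARS
storage_gb = total_urls * AVG_RECORD_BYTES / 1e9
writes_per_second = URLS_PER_MONTH / SECONDS_PER_MONTH
reads_per_second = redirects_per_month / SECONDS_PER_MONTH

print(f"redirects per month: {redirects_per_month:,}")
print(f"URLs after {YEARS} years: {total_urls:,}")
print(f"storage: {storage_gb:.0f} GB")
print(f"average writes/s: {writes_per_second:.1f}")
print(f"average reads/s: {reads_per_second:.1f}")
```

On average this works out to roughly 39 writes and 386 redirects per second. Peak traffic is usually well above the average, so capacity is typically planned with headroom.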

URL Encoding Approaches

Approach 1: Hash-based

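  • Compute an MD5 or SHA-256 hash of the original URL
  • Take the first N characters of the encoded hash
  • Problem: truncated hashes can collide, so every insert needs a uniqueness check, and the scheme does not cover custom aliases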

Approach 2: Counter-based

  • Increment counter for each URL
  • Convert to base62 (a-zA-Z0-9)
  • Problem: Codes are predictable (sequential codes can be enumerated), and a single counter is a single point of failure

Approach 3: Pre-generated Keys

  • Generate keys in advance
  • Assign from pool on request
  • Avoids collision, scalable
  • Recommended approach
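
To make the collision problem in Approach 1 concrete, here is a minimal sketch of hash-based code generation with a retry on collision. The function names, the `#attempt` salt, and the in-memory `store` dictionary are assumptions made for illustration; a real service would rely on a unique constraint in its database instead.

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits

def hash_code(long_url, length=7, attempt=0):
    """Derive a short code from SHA-256 of the URL; `attempt` salts retries."""
    digest = hashlib.sha256(f"{long_url}#{attempt}".encode("utf-8")).digest()
    number = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        number, remainder = divmod(number, len(ALPHABET))
        chars.append(ALPHABET[remainder])
    return "".join(chars)

def shorten(long_url, store, max_attempts=10):
    """Store the URL, retrying with a new salt when the code is already taken."""
    for attempt in range(max_attempts):
        code = hash_code(long_url, attempt=attempt)
        # setdefault keeps an existing mapping, so a different URL is never overwritten.
        if store.setdefault(code, long_url) == long_url:
            return code
    raise RuntimeError("too many collisions")

store = {}
code = shorten("https://example.com/some/long/path", store)
print(code, "->", store[code])
```

Shortening the same URL twice returns the same code in this sketch. Whether that is desirable depends on the product, for example when each user should get their own link and analytics.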

Base62 Encoding

Converting numbers to short strings:

  • Characters: a-z (26) + A-Z (26) + 0-9 (10) = 62
  • 6 characters: 62^6 = 56.8 billion combinations
  • 7 characters: 62^7 = 3.5 trillion combinations
  • 6-7 characters sufficient for most use cases

Example:

  • Number: 12345678
  • Base62: "ZXP0" (using the alphabet order a-z, A-Z, 0-9)
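
The conversion can be sketched in a few lines. This is a minimal illustration that assumes the alphabet order listed above (a-z, then A-Z, then 0-9); a different order produces different strings for the same number.

```python
import string

# Alphabet order follows the list above: a-z, A-Z, 0-9.
ALPHABET = string.ascii_lowercase + string.ascii_uppercase + string.digits
BASE = len(ALPHABET)  # 62

def encode(number):
    """Convert a non-negative integer to a base62 string."""
    if number < 0:
        raise ValueError("number must be non-negative")
    if number == 0:
        return ALPHABET[0]
    chars = []
    while number > 0:
        number, remainder = divmod(number, BASE)
        chars.append(ALPHABET[remainder])
    # Remainders come out least significant digit first, so reverse them.
    return "".join(reversed(chars))

def decode(code):
    """Convert a base62 string back to its integer."""
    number = 0
    for char in code:
        number = number * BASE + ALPHABET.index(char)
    return number

print(encode(12345678))  # ZXP0
print(decode("ZXP0"))    # 12345678
```

The number 12345678 needs only four characters because 62^4 is about 14.8 million.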

System Architecture

Components:

1. API Service

  • Create short URLs (POST /shorten)
  • Redirect (GET /{shortCode})
  • Rate limiting
  • Authentication

2. Key Generation Service

  • Pre-generates unique keys
  • Distributes to API servers
  • Handles exhaustion
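
A key generation service can be sketched as a pool that hands out batches of unused keys. The `KeyPool` class and its parameters are hypothetical, and the sketch keeps everything in memory; a real service would persist issued keys so that restarts do not reissue them.

```python
import secrets
import string
from collections import deque

ALPHABET = string.ascii_letters + string.digits

class KeyPool:
    """In-memory stand-in for a key generation service (hypothetical)."""

    def __init__(self, key_length=7, batch_size=1000):
        self.key_length = key_length
        self.batch_size = batch_size
        self._generated = set()      # every key ever generated
        self._available = deque()    # generated but not yet handed out

    def _refill(self):
        # Generate random keys, skipping any that were generated before.
        while len(self._available) < self.batch_size:
            key = "".join(secrets.choice(ALPHABET) for _ in range(self.key_length))
            if key not in self._generated:
                self._generated.add(key)
                self._available.append(key)

    def take_batch(self, count):
        """Hand a batch of unused keys to one API server."""
        keys = []
        for _ in range(count):
            if not self._available:
                self._refill()
            keys.append(self._available.popleft())
        return keys

pool = KeyPool(batch_size=100)
print(pool.take_batch(5))
```

Handing keys out in batches means an API server only contacts the key service occasionally. The cost is that keys held by a server that crashes are lost, which is usually acceptable given the size of the key space.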

3. Database

  • Stores URL mappings
  • shortCode → {longUrl, createdAt, expiresAt, userId}
  • Needs high read throughput

4. Cache Layer

  • Hot URLs cached
  • LRU eviction
  • 80% hit rate typical

5. Analytics Service

  • Click tracking
  • Geographic data
  • Referrer information

Database Design

Schema:

  • id (primary key)
  • short_code (unique index)
  • long_url
  • created_at
  • expires_at
  • user_id (optional)
  • click_count (denormalized)
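
As one possible rendering of this schema, the sketch below creates the table in an in-memory SQLite database. The column types, the `urls` table name, and the helper functions are assumptions; SQLite is used only because it ships with Python, not because it suits this scale.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE urls (
        id          INTEGER PRIMARY KEY,
        short_code  TEXT NOT NULL UNIQUE,
        long_url    TEXT NOT NULL,
        created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
        expires_at  TEXT,
        user_id     INTEGER,
        click_count INTEGER NOT NULL DEFAULT 0
    )
    """
)

def insert_url(short_code, long_url, expires_at=None, user_id=None):
    """Insert a mapping; the UNIQUE constraint rejects duplicate codes."""
    with conn:
        conn.execute(
            "INSERT INTO urls (short_code, long_url, expires_at, user_id) "
            "VALUES (?, ?, ?, ?)",
            (short_code, long_url, expires_at, user_id),
        )

def lookup(short_code):
    """Return the long URL, or None if the code is unknown or expired."""
    row = conn.execute(
        "SELECT long_url FROM urls WHERE short_code = ? "
        "AND (expires_at IS NULL OR expires_at > CURRENT_TIMESTAMP)",
        (short_code,),
    ).fetchone()
    return row[0] if row else None

insert_url("abc123", "https://example.com/some/long/path")
print(lookup("abc123"))
```

Timestamps are stored as UTC text in the `YYYY-MM-DD HH:MM:SS` form that SQLite's `CURRENT_TIMESTAMP` produces, so the expiry comparison is a plain string comparison.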

Database Choice:

  • NoSQL (Cassandra, DynamoDB) for scale
  • Or sharded MySQL/PostgreSQL
  • Key-value store for simple lookups

Sharding Strategy:

  • Shard by short_code hash
  • Even distribution
  • Hashing avoids hot partitions for storage; a single viral link can still be a hot key, which the cache layer absorbs
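
Shard selection by hash can be sketched as follows. The `shard_for` function and the shard count are illustrative assumptions. A stable hash such as SHA-256 is used because Python's built-in `hash()` for strings varies between processes.

```python
import hashlib
from collections import Counter

def shard_for(short_code, num_shards):
    """Map a short code to a shard index using a stable hash."""
    digest = hashlib.sha256(short_code.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Rough check that codes spread evenly across 8 shards.
distribution = Counter(shard_for(f"code{i}", 8) for i in range(8000))
print(sorted(distribution.items()))
```

Plain modulo sharding remaps most keys when the number of shards changes. Consistent hashing is a common alternative when shards are added or removed regularly.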

Handling Redirects

Redirect Flow:

  1. Request: GET /abc123
  2. Check cache for abc123
  3. If miss, query database
  4. Return 301/302 redirect
  5. Update analytics asynchronously
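
The redirect flow above can be sketched as a single handler. This is a simplified illustration: the cache and database are plain dictionaries, the analytics queue is a list, and the `handle_redirect` name is an assumption.

```python
from http import HTTPStatus

def handle_redirect(short_code, cache, database, analytics_queue):
    """Resolve a short code and return (status, location)."""
    long_url = cache.get(short_code)           # step 2: check the cache
    if long_url is None:
        long_url = database.get(short_code)    # step 3: fall back to the database
        if long_url is None:
            return HTTPStatus.NOT_FOUND, None
        cache[short_code] = long_url           # populate the cache for next time
    analytics_queue.append(short_code)         # step 5: processed later, off the request path
    return HTTPStatus.FOUND, long_url          # step 4: 302 redirect

database = {"abc123": "https://example.com/some/long/path"}
cache, analytics_queue = {}, []
print(handle_redirect("abc123", cache, database, analytics_queue))
```

Returning `HTTPStatus.FOUND` (302) is one option; swapping in `HTTPStatus.MOVED_PERMANENTLY` gives a 301.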

301 vs 302 Redirect:

  • 301 (Permanent): Browser caches, better for SEO
  • 302 (Temporary): Always hits server, better for analytics
  • 302 is a common choice when click analytics matter; services that return 301 accept losing some repeat-click data

Caching Strategy

Cache Layer:

  • Redis or Memcached
  • Key: short_code
  • Value: long_url
  • TTL: Based on popularity
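
In production this role is usually filled by Redis or Memcached, but the behavior can be sketched in-process. The `LRUCache` class below is a hypothetical illustration combining LRU eviction with a TTL. It uses one fixed TTL for every entry, which simplifies the popularity-based TTL described above.

```python
import time
from collections import OrderedDict

class LRUCache:
    """Small LRU cache with a fixed TTL per entry (illustrative only)."""

    def __init__(self, capacity, ttl_seconds, clock=time.monotonic):
        self.capacity = capacity
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._entries = OrderedDict()  # short_code -> (long_url, expires_at)

    def get(self, short_code):
        entry = self._entries.get(short_code)
        if entry is None:
            return None
        long_url, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[short_code]       # expired: treat as a miss
            return None
        self._entries.move_to_end(short_code)   # mark as most recently used
        return long_url

    def put(self, short_code, long_url):
        expires_at = self._clock() + self.ttl_seconds
        self._entries[short_code] = (long_url, expires_at)
        self._entries.move_to_end(short_code)
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)   # evict the least recently used

cache = LRUCache(capacity=2, ttl_seconds=60)
cache.put("abc123", "https://example.com/a")
print(cache.get("abc123"))
```

The `clock` parameter exists so that expiry can be tested without sleeping.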

Cache Warming:

  • Pre-load popular URLs
  • Analyze access patterns
  • Background refresh

Cache Hit Ratio:

  • Power law distribution (few URLs get most traffic)
  • 80%+ hit rate achievable
  • Significantly reduces DB load
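
The effect of a power law on hit rate can be estimated with a short calculation. The sketch below assumes a Zipf distribution in which the URL at popularity rank r receives traffic proportional to 1/r. Real traffic will differ, so the result is an illustration rather than a prediction.

```python
def zipf_hit_rate(total_urls, cached_urls):
    """Share of requests served from cache when the top `cached_urls` are cached.

    Assumes the URL at rank r gets traffic proportional to 1/r.
    """
    all_weight = sum(1.0 / rank for rank in range(1, total_urls + 1))
    cached_weight = sum(1.0 / rank for rank in range(1, cached_urls + 1))
    return cached_weight / all_weight

rate = zipf_hit_rate(total_urls=1_000_000, cached_urls=100_000)
print(f"hit rate when caching the top 10%: {rate:.2f}")
```

Under this assumption, caching the most popular 10% of one million URLs serves about 84% of requests, which is consistent with the 80%+ figure above.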

High Availability

Redundancy:

  • Multiple API server instances
  • Database replication
  • Cache cluster

Failover:

  • Load balancer health checks
  • Automatic failover
  • Multi-region deployment

Consistency:

  • Strong consistency for writes
  • Eventual consistency acceptable for analytics

Security Considerations

URL Validation:

  • Check for malicious URLs
  • Blocklist known bad domains
  • Rate limit creation
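
A first line of URL validation can be sketched with the standard library. The blocklist entries, the length limit, and the `is_acceptable_url` name are hypothetical. A real service would typically also consult an external reputation or safe-browsing feed.

```python
from urllib.parse import urlsplit

BLOCKED_DOMAINS = {"malware.example", "phishing.example"}  # hypothetical entries
MAX_URL_LENGTH = 2048                                      # assumption

def is_acceptable_url(url):
    """Accept only http(s) URLs with a host that is not on the blocklist."""
    if len(url) > MAX_URL_LENGTH:
        return False
    try:
        parts = urlsplit(url)
        host = (parts.hostname or "").lower()
    except ValueError:          # raised for malformed hosts such as bad IPv6 literals
        return False
    if parts.scheme not in ("http", "https") or not host:
        return False
    # Reject a blocked domain and any of its subdomains.
    return not any(
        host == domain or host.endswith("." + domain)
        for domain in BLOCKED_DOMAINS
    )

print(is_acceptable_url("https://example.com/page"))
print(is_acceptable_url("javascript:alert(1)"))
```

Restricting schemes to http and https rejects `javascript:` and `data:` URLs, which would otherwise turn the shortener into a delivery mechanism for script payloads.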

Spam Prevention:

  • CAPTCHA for anonymous users
  • Rate limiting
  • Abuse detection
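
Rate limiting is often implemented with a token bucket. The sketch below is a minimal single-process version; the `TokenBucket` class and its parameters are assumptions, and a distributed deployment would keep the counters in a shared store instead.

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_second`."""

    def __init__(self, rate_per_second, capacity, clock=time.monotonic):
        self.rate = rate_per_second
        self.capacity = capacity
        self._clock = clock
        self._tokens = float(capacity)
        self._last = clock()

    def allow(self):
        """Return True and consume a token if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill for the time that has passed, without exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + elapsed * self.rate)
        if self._tokens >= 1:
            self._tokens -= 1
            return True
        return False

# One bucket per client, for example keyed by API key or IP address.
bucket = TokenBucket(rate_per_second=1, capacity=3)
print([bucket.allow() for _ in range(4)])
```

A client starts with a full bucket, so the first three requests pass immediately and further requests are admitted at one per second.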

Privacy:

  • Don't leak URL patterns
  • Secure analytics data
  • GDPR compliance

Analytics Implementation

Click Tracking:

  • Log each redirect
  • Async processing (Kafka → analytics DB)
  • Aggregation for dashboard

Data Collected:

  • Timestamp
  • IP address (for geo)
  • User agent
  • Referrer
  • Short code
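
The click event and its aggregation can be sketched as below. The `ClickEvent` fields mirror the list above. The in-process `Queue` is only a stand-in for a Kafka topic, and the function names are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import datetime, timezone
from queue import Queue

@dataclass(frozen=True)
class ClickEvent:
    short_code: str
    timestamp: datetime
    ip_address: str    # used later for geo lookup
    user_agent: str
    referrer: str

events = Queue()  # stand-in for a Kafka topic

def record_click(short_code, ip_address, user_agent, referrer=""):
    """Called on the redirect path; only enqueues, so it stays cheap."""
    event = ClickEvent(
        short_code, datetime.now(timezone.utc), ip_address, user_agent, referrer
    )
    events.put(event)

def aggregate_clicks():
    """Run by a background worker: drain the queue and count clicks per code."""
    counts = Counter()
    while not events.empty():
        counts[events.get().short_code] += 1
    return counts

record_click("abc123", "203.0.113.7", "Mozilla/5.0", "https://news.example/")
record_click("abc123", "198.51.100.4", "curl/8.0")
print(aggregate_clicks())
```

Because IP addresses are personal data under GDPR, a real pipeline might truncate or hash them before storage.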

Scaling Considerations

Read Scaling:

  • Add cache nodes
  • Add read replicas
  • CDN for global reach

Write Scaling:

  • Shard database
  • Batch key generation
  • Async analytics

Interview Tips

Common Questions:

  • How to generate unique short codes?
  • How to handle high read traffic?
  • How to prevent collisions?
  • How to track analytics?

Key Trade-offs:

  • Counter vs random keys (simple but predictable vs unguessable but needing collision checks)
  • 301 vs 302 (SEO vs analytics)
  • Strong vs eventual consistency

Extensions:

  • Custom aliases
  • Expiring URLs
  • Private URLs
  • API rate limiting

The URL shortener is a great interview problem because it's simple enough to discuss thoroughly but has depth in scaling and optimization.
