How URL Shorteners Work: System Design Deep Dive
The URL Shortener Problem
URL shorteners like bit.ly, TinyURL, and t.co convert long URLs into short links. While seemingly simple, designing one that handles billions of URLs and redirects reveals interesting system design challenges.
Requirements Analysis
Functional Requirements:
- Shorten long URL to short URL
- Redirect short URL to original
- Custom aliases (optional)
- Expiration (optional)
- Analytics (optional)
Non-Functional Requirements:
- Low latency redirection (< 100ms)
- High availability (99.99%)
- Scalability (billions of URLs)
- Not guessable (security)
Capacity Estimation:
- 100M URLs created per month
- 10:1 read to write ratio
- 1B redirects per month
- 5 years of data: 6B URLs
- Storage: ~500GB assumes ~80 bytes per record; at ~500 bytes per record (long URL plus metadata), 6B URLs need ~3TB
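The estimate above can be checked with a few lines of arithmetic. This is a minimal sketch: the 30-day month and the 500-byte record size are assumptions, not figures from the design.

```python
# Back-of-envelope capacity estimate from the figures above.
SECONDS_PER_MONTH = 30 * 24 * 3600  # assumes a 30-day month (~2.59M seconds)

urls_per_month = 100_000_000
read_write_ratio = 10
years = 5
BYTES_PER_RECORD = 500  # assumed average size: long URL plus metadata

redirects_per_month = urls_per_month * read_write_ratio
total_urls = urls_per_month * 12 * years
write_qps = urls_per_month / SECONDS_PER_MONTH
read_qps = redirects_per_month / SECONDS_PER_MONTH
storage_tb = total_urls * BYTES_PER_RECORD / 1e12

print(f"redirects/month: {redirects_per_month:,}")
print(f"total URLs over {years} years: {total_urls:,}")
print(f"average writes/s: {write_qps:.0f}, average reads/s: {read_qps:.0f}")
print(f"storage: {storage_tb:.1f} TB")
```

These are averages over the month; real traffic is bursty, so capacity should be planned for peaks rather than these figures.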
URL Encoding Approaches
Approach 1: Hash-based
- MD5/SHA256 hash of original URL
- Take first N characters
- Problem: Collisions, not ideal for custom aliases
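A minimal sketch of the hash-based approach, assuming SHA-256 and a Python dict standing in for the database. Instead of slicing hex characters, it takes N base62 digits of the digest, and it resolves collisions by re-hashing with an incrementing salt. The function names and the salting scheme are illustrative, not a standard.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

def hash_code(long_url: str, length: int = 7, salt: int = 0) -> str:
    """Derive `length` base62 characters from a SHA-256 digest of the URL plus a salt."""
    digest = hashlib.sha256(f"{long_url}#{salt}".encode()).digest()
    n = int.from_bytes(digest, "big")
    chars = []
    for _ in range(length):
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(chars)

def shorten(long_url: str, store: dict[str, str], length: int = 7) -> str:
    """Store the URL under a hash-derived code, re-salting on collision with another URL."""
    salt = 0
    while True:  # a production version would cap retries or grow `length`
        code = hash_code(long_url, length, salt)
        existing = store.get(code)
        if existing is None:
            store[code] = long_url
            return code
        if existing == long_url:
            return code  # same URL always maps to the same code (deduplication)
        salt += 1  # collision with a different URL: try the next salt
```

Because the code depends only on the URL, two users shortening the same URL get the same code. That deduplicates storage, but it rules out per-user links and per-user analytics.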
Approach 2: Counter-based
- Increment counter for each URL
- Convert to base62 (a-zA-Z0-9)
- Problem: Predictable, single point of failure
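One way to soften the single point of failure is to lease blocks of IDs rather than single IDs. In the sketch below, `RangeAllocator` and `ApiServerCounter` are hypothetical names; each API server contacts the central counter once per block, so a short counter outage does not immediately stop writes. IDs inside a block are still sequential, so the predictability problem remains.

```python
import threading

class RangeAllocator:
    """Hypothetical central counter that hands out blocks of IDs."""

    def __init__(self, block_size: int = 1000):
        self._next = 0
        self._block_size = block_size
        self._lock = threading.Lock()

    def lease(self) -> range:
        with self._lock:  # the only coordinated step, once per block
            start = self._next
            self._next += self._block_size
        return range(start, start + self._block_size)

class ApiServerCounter:
    """Issues IDs locally from its current block (single-threaded for simplicity)."""

    def __init__(self, allocator: RangeAllocator):
        self._allocator = allocator
        self._ids = iter(())  # empty until the first lease

    def next_id(self) -> int:
        try:
            return next(self._ids)
        except StopIteration:
            self._ids = iter(self._allocator.lease())  # block exhausted: lease a new one
            return next(self._ids)
```

Each ID would then be base62-encoded to form the short code. IDs left unused when a server restarts mid-block are simply skipped, which is harmless.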
Approach 3: Pre-generated Keys
- Generate keys in advance
- Assign from pool on request
- Avoids collision, scalable
- Recommended approach
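A minimal in-memory sketch of pre-generated keys, assuming 7-character random keys from the a-zA-Z0-9 alphabet. `KeyPool` stands in for a key generation service and `ApiServer` for a server that caches a batch locally; both names are hypothetical. Uniqueness is checked once, when keys are generated, so the request path never checks for collisions.

```python
import secrets

ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

class KeyPool:
    """In-memory stand-in for a key generation service."""

    def __init__(self, key_length: int = 7):
        self._key_length = key_length
        self._issued: set[str] = set()  # every key ever handed out; never reused

    def _random_key(self) -> str:
        # secrets gives unpredictable keys, which also serves the "not guessable" goal.
        return "".join(secrets.choice(ALPHABET) for _ in range(self._key_length))

    def generate(self, count: int) -> list[str]:
        """Return `count` keys that have never been issued before."""
        batch = []
        while len(batch) < count:
            key = self._random_key()
            if key not in self._issued:
                self._issued.add(key)
                batch.append(key)
        return batch

class ApiServer:
    """Assigns keys from a locally cached batch, refilling from the pool when empty."""

    def __init__(self, pool: KeyPool, batch_size: int = 100):
        self._pool = pool
        self._batch_size = batch_size
        self._keys: list[str] = []

    def assign_key(self) -> str:
        if not self._keys:
            self._keys = self._pool.generate(self._batch_size)
        return self._keys.pop()
```

In this sketch, keys held in a server's local batch are lost if that server crashes. With billions of possible keys, wasting a few batches is usually an acceptable price for a collision-free write path.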
Base62 Encoding
Converting numbers to short strings:
- Characters: a-z (26) + A-Z (26) + 0-9 (10) = 62
- 6 characters: 62^6 = 56.8 billion combinations
- 7 characters: 62^7 = 3.5 trillion combinations
- 6-7 characters sufficient for most use cases
Example:
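- Number: 12345678
- Base62 (alphabet a-zA-Z0-9, where a = 0): "ZXP0" (4 characters; a different alphabet order gives a different string)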
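A straightforward base62 encoder and decoder, assuming the a-zA-Z0-9 ordering listed above with a = 0 (other ordering conventions produce different strings for the same number). Under this ordering, 12345678 encodes to "ZXP0".

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"  # a-z, A-Z, 0-9
BASE = len(ALPHABET)  # 62

def encode(n: int) -> str:
    """Convert a non-negative integer to base62, most significant digit first."""
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n:
        n, rem = divmod(n, BASE)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

def decode(s: str) -> int:
    """Inverse of encode()."""
    n = 0
    for ch in s:
        n = n * BASE + ALPHABET.index(ch)
    return n

print(encode(12345678))  # ZXP0
print(decode("ZXP0"))    # 12345678
```

The output gains one character each time the number passes a power of 62, which is why 6 to 7 characters are enough for billions of URLs.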
System Architecture
Components:
1. API Service
- Create short URLs (POST /shorten)
- Redirect (GET /{shortCode})
- Rate limiting
- Authentication
2. Key Generation Service
- Pre-generates unique keys
- Distributes to API servers
- Handles exhaustion
3. Database
- Stores URL mappings
- shortCode → {longUrl, createdAt, expiresAt, userId}
- Needs high read throughput
4. Cache Layer
- Hot URLs cached
- LRU eviction
- 80% hit rate typical
5. Analytics Service
- Click tracking
- Geographic data
- Referrer information
Database Design
Schema:
- id (primary key)
- short_code (unique index)
- long_url
- created_at
- expires_at
- user_id (optional)
- click_count (denormalized)
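The schema above might look like this in SQL. SQLite, through Python's built-in `sqlite3` module, is used only so the example runs anywhere; the column types are assumptions, and a production deployment would use a sharded relational database or a NoSQL store instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE urls (
    id          INTEGER PRIMARY KEY,
    short_code  TEXT NOT NULL UNIQUE,           -- UNIQUE creates the index redirects use
    long_url    TEXT NOT NULL,
    created_at  TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    expires_at  TEXT,                           -- NULL means the link never expires
    user_id     INTEGER,                        -- NULL for anonymous links
    click_count INTEGER NOT NULL DEFAULT 0      -- denormalized counter
);
""")

conn.execute("INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
             ("ZXP0", "https://example.com/some/long/path"))
row = conn.execute("SELECT long_url FROM urls WHERE short_code = ?", ("ZXP0",)).fetchone()
print(row[0])

# A second row with the same short_code violates the UNIQUE constraint.
try:
    conn.execute("INSERT INTO urls (short_code, long_url) VALUES (?, ?)",
                 ("ZXP0", "https://example.org/other"))
except sqlite3.IntegrityError as exc:
    print("rejected:", exc)
```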
Database Choice:
- NoSQL (Cassandra, DynamoDB) for scale
- Or sharded MySQL/PostgreSQL
- Key-value store for simple lookups
Sharding Strategy:
- Shard by short_code hash
- Even distribution
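- No hot partitions from sequential keys, though a single viral short code still lands on one shard (the cache absorbs most of that load)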
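Shard selection by short-code hash can be sketched as follows, assuming 16 shards and simple modulo placement. It uses MD5 only for its even distribution, not for security, and avoids Python's built-in `hash()`, whose output for strings changes between processes.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 16  # hypothetical shard count

def shard_for(short_code: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a short code to a shard using a stable hash."""
    digest = hashlib.md5(short_code.encode()).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

# Check the spread over 100,000 synthetic codes.
counts = Counter(shard_for(f"code{i}") for i in range(100_000))
print(min(counts.values()), max(counts.values()))
```

With plain modulo placement, changing the shard count remaps most keys; consistent hashing is the usual way to limit that movement when shards are added.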
Handling Redirects
Redirect Flow:
- Request: GET /abc123
- Check cache for abc123
- If miss, query database (respond 404 if the short code is unknown or expired)
- Return 301/302 redirect
- Update analytics asynchronously
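The redirect flow above, sketched with in-memory stand-ins: a dict for the cache, a dict for the database, and `queue.Queue` in place of a message broker. The function names and the 404 response for unknown or expired codes are assumptions for illustration.

```python
import queue
from datetime import datetime, timezone
from typing import Optional

# Hypothetical stand-ins: a dict for Redis/Memcached, a dict for the URL table,
# and a queue for the analytics topic.
cache: dict[str, dict] = {}
database: dict[str, dict] = {
    "abc123": {"long_url": "https://example.com/article", "expires_at": None},
}
click_events = queue.Queue()

def resolve(short_code: str) -> Optional[str]:
    """Cache first, then database; returns None for unknown or expired codes."""
    entry = cache.get(short_code)
    if entry is None:
        entry = database.get(short_code)
        if entry is None:
            return None
        cache[short_code] = entry  # populate the cache on a miss
    expires_at = entry["expires_at"]
    if expires_at is not None and expires_at <= datetime.now(timezone.utc):
        return None
    return entry["long_url"]

def handle_redirect(short_code: str, permanent: bool = False) -> tuple[int, dict[str, str]]:
    """Return (HTTP status, headers) for GET /{short_code}."""
    long_url = resolve(short_code)
    if long_url is None:
        return 404, {}
    # Record the click asynchronously instead of writing analytics inline.
    click_events.put({"short_code": short_code, "ts": datetime.now(timezone.utc)})
    # 301 lets browsers cache the redirect; 302 sends every click back to the server.
    return (301 if permanent else 302), {"Location": long_url}
```

Keeping the analytics write off the request path keeps redirect latency close to a single cache lookup.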
301 vs 302 Redirect:
- 301 (Permanent): Browser caches, better for SEO
- 302 (Temporary): Always hits server, better for analytics
- Shorteners that need to count every click generally prefer 302
Caching Strategy
Cache Layer:
- Redis or Memcached
- Key: short_code
- Value: long_url
- TTL: Based on popularity
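A minimal sketch of an LRU cache whose TTL depends on popularity. `LRUCache` is a hypothetical class, and the rule used here (TTL grows linearly with hit count, up to a cap) is only one possible policy; in practice Redis or Memcached would handle eviction and expiry.

```python
import time
from collections import OrderedDict
from typing import Callable, Optional

class LRUCache:
    """LRU cache where each read extends the entry's TTL in proportion to its hit count."""

    def __init__(self, capacity: int, base_ttl: float = 60.0, max_ttl: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.base_ttl = base_ttl
        self.max_ttl = max_ttl
        self.clock = clock
        self._data = OrderedDict()  # key -> (value, expiry time, hit count)

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expiry, hits = entry
        now = self.clock()
        if now >= expiry:
            del self._data[key]  # expired: treat as a miss
            return None
        hits += 1
        ttl = min(self.base_ttl * (1 + hits), self.max_ttl)  # popular keys live longer
        self._data[key] = (value, now + ttl, hits)
        self._data.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: str, value: str) -> None:
        self._data[key] = (value, self.clock() + self.base_ttl, 0)
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used entry
```

Passing a fake `clock` makes the expiry behavior easy to test without waiting.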
Cache Warming:
- Pre-load popular URLs
- Analyze access patterns
- Background refresh
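Cache warming from access patterns can be as simple as counting recent requests and pre-loading the most frequent codes. This sketch assumes an access log given as a list of short codes and a `lookup` function standing in for a database read; both are hypothetical.

```python
from collections import Counter
from typing import Callable, Optional

def warm_cache(access_log: list[str], cache: dict[str, str],
               lookup: Callable[[str], Optional[str]], top_k: int = 1000) -> list[str]:
    """Pre-load the top_k most requested short codes from a recent access log."""
    hot_codes = [code for code, _ in Counter(access_log).most_common(top_k)]
    for code in hot_codes:
        long_url = lookup(code)
        if long_url is not None:  # skip codes deleted since they were logged
            cache[code] = long_url
    return hot_codes
```

Running this periodically from a background job is one way to implement the background refresh.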
Cache Hit Ratio:
- Power law distribution (few URLs get most traffic)
- 80%+ hit rate achievable
- Significantly reduces DB load
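To see why a power-law distribution makes high hit rates achievable, assume requests follow a Zipf distribution, where the k-th most popular URL receives traffic proportional to 1/k^s. The share of requests served by caching the top URLs is then a ratio of partial sums. The exponent s = 1 and the one-million-URL catalog are assumptions; real traffic should be measured.

```python
def zipf_hit_rate(num_urls: int, cached: int, s: float = 1.0) -> float:
    """Share of requests served by caching the `cached` most popular of `num_urls` URLs,
    assuming the k-th most popular URL gets traffic proportional to 1 / k**s (Zipf)."""
    weights = [1.0 / k**s for k in range(1, num_urls + 1)]
    return sum(weights[:cached]) / sum(weights)

NUM_URLS = 1_000_000  # hypothetical catalog size
for pct in (1, 5, 20):
    rate = zipf_hit_rate(NUM_URLS, NUM_URLS * pct // 100)
    print(f"cache top {pct}% of URLs -> hit rate {rate:.0%}")
```

Under these assumptions, caching a small fraction of URLs already serves a large majority of redirects.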
High Availability
Redundancy:
- Multiple API server instances
- Database replication
- Cache cluster
Failover:
- Load balancer health checks
- Automatic failover
- Multi-region deployment
Consistency:
- Strong consistency for writes
- Eventual consistency acceptable for analytics
Security Considerations
URL Validation:
- Check for malicious URLs
- Blocklist known bad domains
- Rate limit creation
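A minimal validation sketch using the standard library's `urllib.parse.urlsplit`. The blocklist contents and the length limit are hypothetical; a real service would likely consult a maintained threat feed rather than a hard-coded set.

```python
from urllib.parse import urlsplit

BLOCKED_DOMAINS = {"malware.example", "phish.example"}  # hypothetical blocklist
MAX_URL_LENGTH = 2048  # hypothetical limit

def validate_url(long_url: str) -> tuple[bool, str]:
    """Return (is_valid, reason) for a URL submitted for shortening."""
    if len(long_url) > MAX_URL_LENGTH:
        return False, "too long"
    parts = urlsplit(long_url)
    if parts.scheme not in ("http", "https"):
        return False, "unsupported scheme"  # rejects javascript:, data:, ftp:, etc.
    host = (parts.hostname or "").lower()
    if not host:
        return False, "missing host"
    # Block a listed domain and all of its subdomains.
    labels = host.split(".")
    for i in range(len(labels)):
        if ".".join(labels[i:]) in BLOCKED_DOMAINS:
            return False, "blocked domain"
    return True, "ok"
```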
Spam Prevention:
- CAPTCHA for anonymous users
- Rate limiting
- Abuse detection
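Rate limiting is often implemented with a token bucket, which allows short bursts while capping the sustained rate. The sketch keeps one bucket per client in memory; the limits (5 links per minute, bursts of 10) are illustrative values, and a multi-server deployment would keep the counters in a shared store such as Redis.

```python
import time
from typing import Callable

class TokenBucket:
    """Allows bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

buckets: dict[str, TokenBucket] = {}

def allow_shorten(client_id: str) -> bool:
    """One bucket per client (API key or IP): 5 links per minute, bursts of 10 (illustrative)."""
    if client_id not in buckets:
        buckets[client_id] = TokenBucket(rate=5 / 60, capacity=10)
    return buckets[client_id].allow()
```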
Privacy:
- Don't leak URL patterns (e.g., sequential codes that let anyone enumerate other users' links)
- Secure analytics data
- GDPR compliance
Analytics Implementation
Click Tracking:
- Log each redirect
- Async processing (Kafka → analytics DB)
- Aggregation for dashboard
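The aggregation step can be sketched as a consumer that drains click events and counts them per short code and UTC day. A `queue.Queue` stands in for the Kafka topic and the event fields are assumptions; a real pipeline would write these counts to the analytics database.

```python
import queue
from collections import Counter
from datetime import datetime, timezone

def aggregate_clicks(events: "queue.Queue[dict]", max_batch: int = 1000) -> Counter:
    """Drain up to max_batch click events and count clicks per (short_code, UTC day)."""
    counts: Counter = Counter()
    for _ in range(max_batch):
        try:
            event = events.get_nowait()
        except queue.Empty:
            break
        day = event["ts"].astimezone(timezone.utc).date().isoformat()
        counts[(event["short_code"], day)] += 1
    return counts

events = queue.Queue()
events.put({"short_code": "abc123", "ts": datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc)})
events.put({"short_code": "abc123", "ts": datetime(2024, 5, 1, 23, 0, tzinfo=timezone.utc)})
events.put({"short_code": "xyz789", "ts": datetime(2024, 5, 2, 1, 0, tzinfo=timezone.utc)})
daily = aggregate_clicks(events)
print(daily[("abc123", "2024-05-01")])  # 2
```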
Data Collected:
- Timestamp
- IP address (for geo)
- User agent
- Referrer
- Short code
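A sketch of a click record with the fields listed above. Here the IP is truncated before storage, which keeps enough of it for coarse geolocation while supporting the privacy goals discussed earlier; the /24 and /48 prefixes are illustrative choices, and the field and function names are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass(frozen=True)
class ClickEvent:
    timestamp: datetime
    ip_prefix: str            # truncated IP, kept for coarse geolocation
    user_agent: str
    referrer: Optional[str]   # None when the browser sends no Referer header
    short_code: str

def anonymize_ip(ip: str) -> str:
    """Zero the host bits: keep /24 for IPv4 and /48 for IPv6 (illustrative choices)."""
    addr = ipaddress.ip_address(ip)
    prefix = 24 if addr.version == 4 else 48
    return str(ipaddress.ip_network(f"{addr}/{prefix}", strict=False).network_address)

def make_event(short_code: str, ip: str, user_agent: str,
               referrer: Optional[str] = None) -> ClickEvent:
    return ClickEvent(
        timestamp=datetime.now(timezone.utc),
        ip_prefix=anonymize_ip(ip),
        user_agent=user_agent,
        referrer=referrer,
        short_code=short_code,
    )
```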
Scaling Considerations
Read Scaling:
- Add cache nodes
- Add read replicas
- CDN for global reach
Write Scaling:
- Shard database
- Batch key generation
- Async analytics
Interview Tips
Common Questions:
- How to generate unique short codes?
- How to handle high read traffic?
- How to prevent collisions?
- How to track analytics?
Key Trade-offs:
- Counter vs random (counters are simple but predictable; random keys are unguessable but need collision handling)
- 301 vs 302 (SEO vs analytics)
- Strong vs eventual consistency
Extensions:
- Custom aliases
- Expiring URLs
- Private URLs
- API rate limiting
The URL shortener is a great interview problem because it's simple enough to discuss thoroughly but has depth in scaling and optimization.