Step 1: Understanding the Problem and Design Scope
Key clarification questions and assumptions:
- Example: A long URL (e.g.,
https://www.example.com/long-url
) is transformed into a short alias (e.g.,https://tinyurl.com/y7keocwj
). Clicking the alias redirects to the original URL. - Traffic Volume: Approximately 100 million URLs generated per day.
- Shortened URL Length: As short as possible.
- Allowed Characters: Numbers (0-9) and characters (a-z, A-Z).
- Deletion/Updates: For simplicity, shortened URLs cannot be deleted or updated.
Basic Use Cases:
- URL Shortening: Input a long URL, receive a shorter URL.
- URL Redirecting: Input a short URL, redirect to the original long URL.
- Non-functional requirements: High availability, scalability, and fault tolerance.
Back-of-the-Envelope Estimation:
- Write Operations: 100 million URLs/day ≈ 1160 writes/second.
- Read Operations: Assuming a 10:1 read-to-write ratio, ≈ 11,600 reads/second.
- Total Records (10 years): 365 billion records.
- Average URL Length: 100 bytes.
- Storage Requirement (10 years): 36.5 TB.
Step 2: High-Level Design
API Endpoints (REST-style):
- URL Shortening:
POST api/v1/data/shorten
- Request:
{longUrl: longURLString}
- Returns:
shortURL
- URL Redirecting:
GET api/v1/shortUrl
- Returns:
longURL
for HTTP redirection.
URL Redirecting Flow: When a short URL is accessed, the server receives the request and redirects to the original long URL. This involves a 301 redirect (permanent, cached by browser) or 302 redirect (temporary, subsequent requests hit the service). 301 reduces server load, while 302 is better for analytics.
(Figure 1: Illustration of URL redirecting process) (Figure 2: Detailed communication between client and server for redirection)
URL Shortening Flow:
Assumes the short URL format: www.tinyurl.com/{hashValue}
. A hash function fx
maps a long URL to a unique hashValue
.
(Figure 3: Long URL to Hash Value mapping)
Step 3: Design Deep Dive
Data Model:
Instead of an in-memory hash table, a relational database is preferred to store <shortURL, longURL>
mappings.
(Figure 4: Simplified database table design with id, shortURL, longURL columns)
Hash Function:
The hashValue
consists of 62 possible characters (0-9, a-z, A-Z). To support 365 billion URLs, the length of the hashValue
needs to be 7 characters (since 62^7 ≈ 3.5 trillion, which is sufficient).
Two approaches for Hash Functions:
-
Hash + Collision Resolution:
- Use standard hash functions (CRC32, MD5, SHA-1). These produce longer hashes.
- Take the first 7 characters, but this can cause collisions.
- Collision resolution: Recursively append a predefined string until no collision is found. Bloom filters can optimize collision checking.
(Figure 5: Collision resolution process)
-
Base 62 Conversion (Preferred Approach):
- Convert a unique numerical ID (from a unique ID generator) into a base 62 representation.
- Example: 11157 (base 10) converts to “2TX” (base 62).
(Figure 6: Base 62 conversion process example)
Comparison of Approaches:
Feature | Hash + Collision Resolution | Base 62 Conversion |
---|---|---|
Short URL Length | Fixed | Not fixed, depends on ID |
Unique ID Generator | Not needed | Depends on it |
Collision | Possible, needs resolution | Not possible (ID is unique) |
Next Short URL | Not predictable | Predictable (if ID increments) |
URL Shortening Deep Dive (using Base 62 Conversion):
(Figure 7: URL shortening flow)
- Input:
longURL
. - Check Database: See if
longURL
already exists. - Return Existing: If found, return the existing
shortURL
. - Generate New: If new, generate a unique
ID
(e.g.,2009215674938
) from a distributed unique ID generator. - Convert ID: Convert the
ID
toshortURL
using base 62 conversion (e.g., “zn9edcu”). - Save to DB: Store
ID
,shortURL
, andlongURL
in the database.
URL Redirecting Deep Dive:
(Figure 8: Detailed URL redirecting design with caching)
- User Clicks: User clicks a
shortURL
. - Load Balancer: Forwards request to web servers.
- Cache Check: If
shortURL
is in cache, returnlongURL
directly. - Database Fetch: If not in cache, fetch
longURL
from the database. Handle invalidshortURL
s. - Return LongURL: Return the
longURL
to the user.
Step 4: Wrap Up (Additional Considerations)
- Rate Limiter: Prevent malicious users from overwhelming the service with requests.
- Web Server Scaling: Web tier is stateless, easy to scale by adding/removing servers.
- Database Scaling: Use replication and sharding.
- Analytics: Integrate solutions to track click rates and other metrics.
- Availability, Consistency, Reliability: Core principles for large systems.