Design a URL Shortening Service

Compare model answers for this System Design benchmark and review scores, judging comments, and related examples.

X f L

Contents

Task Overview

Benchmark Genres

System Design

Task Creator Model The task creator is randomly selected from top task-generation models of supported providers.

Anthropic Claude Opus 4.6

Answering Models In this benchmark, models from the same provider as the task creator are excluded from answering.

Answer A Google Gemini 2.5 Flash-Lite

Answer B OpenAI GPT-5.2

Judge Models Judging uses exactly 3 judge models, excluding the answering models. At least 1 judge is selected from flagship models, lightweight models are not selected as judges, and the 3 judges come from 3 distinct providers.

OpenAI GPT-5.4 Anthropic Claude Opus 4.6 Google Gemini 2.5 Pro

Task Prompt

Show more ▼

Design a URL shortening service (similar to bit.ly or tinyurl.com) that must handle the following constraints: 1. The service must support 100 million new URL shortenings per month. 2. The average read-to-write ratio is 100:1 (i.e., shortened URLs are accessed far more often than they are created). 3. Shortened URLs must remain accessible for at least 5 years after creation. 4. The system must achieve 99.9% uptime availability. 5. Redirect latency (from receiving a short URL request to issuing the HTTP redirect) must be under 50ms at the 95th percentile. In your design, address all of the following: A. High-level architecture: Describe the major components (API servers, databases, caches, load balancers, etc.) and how they interact. Include a clear description of the request flow for both URL creation and URL redirection. B. Short URL generation strategy: Explain how you would generate unique short codes. Discuss the trade-offs between different approaches (e.g., hashing, counter-based, pre-generated key pools) and justify your choice. C. Data storage: Choose a database technology and schema. Estimate the storage requirements over 5 years given the constraints. Explain why your chosen database is appropriate. D. Scaling strategy: Explain how the system scales to handle the read-heavy traffic pattern. Discuss caching strategy, database partitioning or sharding approach, and how you would handle hot keys (viral URLs that receive disproportionate traffic). E. Reliability and fault tolerance: Describe how the system maintains 99.9% availability. Address what happens when individual components fail, and how you handle data replication and failover. F. Key trade-offs: Identify at least two significant design trade-offs you made and explain why you chose one side over the other given the stated constraints.

Judging Policy

Show more ▼

A strong answer should present a coherent end-to-end architecture that clearly addresses all six required sections (A through F). Evaluate based on the following criteria: 1. Completeness: All six sections should be substantively addressed. Missing or superficial treatment of any section is a weakness. 2. Quantitative reasoning: The answer should include back-of-the-envelope calculations where appropriate, such as estimating QPS for reads and writes, total storage over 5 years, and cache sizing. Numbers should be internally consistent with the stated constraints. 3. Architecture soundness: The proposed components should work together logically. The request flow should be clearly described and make engineering sense. The choice of technologies should be appropriate for the workload characteristics. 4. Short URL generation: The answer should compare at least two approaches, identify concrete trade-offs (collision risk, coordination overhead, predictability), and provide a justified recommendation. 5. Scaling and caching: Given the 100:1 read-to-write ratio, the answer should propose a robust caching layer and explain cache invalidation or expiration strategy. The sharding approach should be clearly explained with a partitioning key. Hot key mitigation should be specifically addressed. 6. Reliability: The answer should go beyond simply saying "use replication" and explain specific failover mechanisms, redundancy across availability zones or regions, and how the 99.9% SLA is achieved in practice. 7. Trade-off reasoning: The identified trade-offs should be genuine engineering decisions (not trivial), and the reasoning should demonstrate understanding of why the constraints push toward one choice over another. 8. Clarity and organization: The answer should be well-structured and easy to follow, with clear separation between the required sections.

Answers

Answer A: Google Gemini 2.5 Flash-Lite

Show Full Answer ▼

## System Design: URL Shortening Service ### A. High-Level Architecture The system will consist of the following major components: 1. **Load Balancers (LBs):** Distribute incoming traffic (both API requests for shortening and redirection requests) across multiple API servers. We'll use multiple layers of LBs, starting with a global LB (e.g., AWS Route 53) for DNS-based routing and then regional LBs (e.g., AWS ELB/ALB) for distributing traffic within a region. 2. **API Servers (Web Servers):** Handle incoming requests for both creating short URLs (POST /shorten) and redirecting short URLs (GET /{short_code}). These servers will be stateless. 3. **URL Shortening Service (Business Logic):** A microservice responsible for generating short codes, interacting with the database, and potentially caching. 4. **Database:** Stores the mapping between short codes and original URLs. We'll discuss the choice in detail in section C. 5. **Cache:** Stores frequently accessed short URL mappings to reduce database load and latency for redirects. Redis or Memcached will be used. 6. **Message Queue (Optional but recommended for scale):** For asynchronous processing of tasks like updating analytics or handling potential retries, though not strictly necessary for core functionality. **Request Flow:** * **URL Creation (POST /shorten):** 1. Client sends a POST request with the original URL to the API Gateway/Load Balancer. 2. The LB forwards the request to an available API Server. 3. The API Server calls the URL Shortening Service. 4. The URL Shortening Service generates a unique short code (see section B). 5. The service stores the mapping (short_code -> original_url) in the Database. 6. The service might also write the mapping to the Cache. 7. The API Server returns the shortened URL (e.g., `https://short.url/{short_code}`) to the client. * **URL Redirection (GET /{short_code}):** 1. User enters a shortened URL in their browser. 2. The request hits the Load Balancer. 3. The LB forwards the request to an available API Server. 4. The API Server (or a dedicated Redirect Service) first checks the Cache for the `short_code`. 5. **Cache Hit:** If found, the original URL is retrieved from the cache, and the API Server issues an HTTP 301 (Permanent Redirect) or 302 (Temporary Redirect) response to the client with the original URL. 6. **Cache Miss:** If not found in the cache, the API Server queries the Database for the `short_code`. 7. The Database returns the original URL. 8. The API Server adds the mapping to the Cache for future requests. 9. The API Server issues the HTTP redirect response. ### B. Short URL Generation Strategy We need a strategy that generates unique, short, and preferably somewhat random-looking codes. The primary goal is uniqueness and efficiency. * **Hashing (e.g., MD5, SHA-1):** * *Pros:* Deterministic, can generate unique codes. Can be truncated. * *Cons:* Collisions are possible (though rare with good hashing and truncation). Codes might not be uniformly distributed, leading to hot spots. Not easily human-readable or memorable. * **Counter-Based (e.g., Auto-incrementing ID):** * *Pros:* Guarantees uniqueness. Simple to implement. * *Cons:* Centralized counter can become a bottleneck. Sequential IDs are predictable and can reveal usage patterns. Requires a distributed counter system for high availability and scale. * **Pre-generated Key Pools:** * *Pros:* Distributes the generation load. Can pre-generate a large pool of unique codes. * *Cons:* Requires managing the pool, ensuring no overlap, and potentially pre-generating a massive number of keys. * **Base-62/Base-64 Encoding of a Counter:** * *Pros:* Generates short, unique codes by encoding a monotonically increasing counter (e.g., a 64-bit integer) into a base-62 (0-9, a-z, A-Z) or base-64 alphabet. This is efficient and produces relatively short strings. Guarantees uniqueness if the counter is managed properly. * *Cons:* Requires a distributed counter service to avoid bottlenecks and ensure uniqueness across multiple API servers. **Chosen Strategy: Base-62 Encoding of a Distributed Counter** This approach offers the best balance of uniqueness, shortness, and scalability. We'll use a distributed ID generation service (like Twitter's Snowflake or Apache ZooKeeper) to generate unique, sequential 64-bit IDs. Each ID will then be encoded into a base-62 string. This provides short, unique codes. The sequential nature of IDs can be managed to avoid predictable patterns by using a random component in the ID generation or by shuffling the mapping later. ### C. Data Storage **Database Technology:** A NoSQL database like **Cassandra** or **ScyllaDB** (a high-performance Cassandra-compatible database) is a strong candidate due to its distributed nature, high availability, and excellent write performance, which is crucial for URL creation. Alternatively, a sharded relational database (like PostgreSQL with Citus or MySQL with Vitess) could work, but NoSQL often simplifies scaling for this type of key-value lookup. Let's assume **Cassandra** for its linear scalability and fault tolerance. **Schema:** We'll use a simple schema in a keyspace optimized for reads: ```cql CREATE TABLE url_mappings ( short_code text PRIMARY KEY, original_url text, created_at timestamp ); ``` * `short_code`: The unique short code generated (e.g., `aBcDeF`). This is the primary key. * `original_url`: The long URL to redirect to. * `created_at`: Timestamp for when the URL was shortened. **Storage Requirements Estimation (over 5 years):** * **New URLs per month:** 100 million * **Total URLs over 5 years:** 100 million/month * 12 months/year * 5 years = 6 billion URLs * **Average short code length:** Base-62 encoding of a 64-bit integer results in ~11 characters. Let's assume 15 bytes for the `short_code` (including overhead). * **Average original URL length:** Let's assume 2000 bytes (URLs can be very long). * **Overhead:** Assume 10% overhead for database storage (replication, indexes, etc.). **Total Storage:** (6 billion URLs) * (15 bytes/short_code + 2000 bytes/original_url) * 1.10 (overhead) = 6 * 10^9 * 2015 bytes * 1.10 ≈ 13.3 * 10^12 bytes ≈ 13.3 Terabytes (TB) This is a manageable amount for a distributed NoSQL database like Cassandra, which can scale horizontally by adding more nodes. **Why Cassandra?** * **Scalability:** Scales linearly by adding more nodes, handling massive amounts of data and traffic. * **Availability:** Designed for high availability with no single point of failure. Data is replicated across multiple nodes and data centers. * **Write Performance:** Excellent write throughput, crucial for the URL creation part. * **Read Performance:** Can be optimized for fast key-value lookups, which is what redirection requires. ### D. Scaling Strategy **1. Caching Strategy:** * **Level 1 Cache (Distributed Cache):** Use Redis or Memcached cluster. Store `short_code -> original_url` mappings. This will serve the vast majority of read requests (99% of traffic, given the 100:1 read-to-write ratio). * **Cache Invalidation/Expiration:** For simplicity and given the 5-year lifespan, we can use a Time-To-Live (TTL) slightly longer than the expected access pattern, or simply rely on the fact that once a short URL is created, its mapping rarely changes. A common strategy is to cache indefinitely until the application restarts or a manual invalidation is triggered (though this is less common for URL shorteners). * **Cache Population:** When a redirect request misses the cache, the original URL is fetched from the database, and then written into the cache before returning the redirect. **2. Database Partitioning/Sharding:** * **Cassandra:** Cassandra handles partitioning automatically based on the primary key (`short_code`). Data is distributed across nodes using a consistent hashing algorithm. This inherently shards the data. * **Relational DB (if chosen):** Shard the database based on the `short_code` (e.g., using a hash of the `short_code` to determine the shard). This distributes the load and data across multiple database instances. **3. Handling Hot Keys (Viral URLs):** * **Caching:** The primary defense against hot keys is aggressive caching. A viral URL will be heavily cached in Redis/Memcached, meaning subsequent requests will be served directly from memory, bypassing the database entirely. * **Database Read Replicas (if using RDBMS):** If the database becomes a bottleneck even with caching, read replicas can be employed, though Cassandra's distributed nature already handles this well. * **Content Delivery Network (CDN):** For extremely high traffic, the redirect responses themselves could potentially be cached at the edge using a CDN, though this adds complexity and might not be suitable for all redirect types (e.g., 302). However, the API servers themselves could be placed behind a CDN for faster global access. * **Dedicated Read Path:** For extreme cases, a separate read-optimized data store or cache layer could be considered, but the primary cache should handle most hot key scenarios. ### E. Reliability and Fault Tolerance **1. High Availability (99.9% Uptime):** * **Redundancy:** Deploy multiple instances of each component (API servers, cache nodes, database nodes) across multiple Availability Zones (AZs) or even regions. * **Load Balancers:** LBs will automatically route traffic away from unhealthy instances. * **Stateless API Servers:** API servers are stateless, so any server can handle any request. If one fails, others take over seamlessly. * **Database Replication:** Cassandra provides built-in data replication. We'll configure a replication factor (e.g., 3) across different nodes and racks/AZs. If a node fails, replicas on other nodes serve the data. * **Cache Replication/Clustering:** Redis Sentinel or Redis Cluster can provide high availability for the cache. **2. Component Failure Handling:** * **API Server Failure:** LBs detect failure and stop sending traffic to the unhealthy instance. Other instances continue to serve requests. * **Cache Node Failure:** If using Redis Cluster or Sentinel, the system automatically promotes a replica or reconfigures the cluster. If a single node fails, some cached data might be temporarily unavailable, but the system will fall back to the database. * **Database Node Failure:** Cassandra automatically handles node failures. Data is available from replicas on other nodes. If a whole AZ fails, data is still available from nodes in other AZs (if multi-AZ deployment). **3. Data Replication and Failover:** * **Database:** Cassandra's replication strategy ensures data durability and availability. We'll use a replication factor of 3, with replicas spread across different physical locations (racks/AZs). If a node fails, reads and writes can still be served by other replicas. * **Cache:** Redis Sentinel or Cluster provides automatic failover for cache nodes. * **Application Level:** Implement retry mechanisms for transient network issues when communicating between services. ### F. Key Trade-offs 1. **Consistency vs. Availability (CAP Theorem):** * **Choice:** We prioritize **Availability** and **Partition Tolerance** over strong Consistency for the redirect service. For URL creation, we need strong consistency to ensure unique short codes. For redirection, if a database replica is temporarily unavailable, we'd rather serve a stale entry from cache or return an error than block requests. * **Justification:** The core function of a URL shortener is to redirect users quickly and reliably. Occasional stale cache entries or temporary unavailability of a specific short URL mapping (which would be rare with proper replication) is more acceptable than widespread service outages. For URL creation, a distributed ID generator with strong consistency guarantees is used to ensure uniqueness. 2. **Latency vs. Cost/Complexity (Caching Strategy):** * **Choice:** We chose an aggressive, multi-layered caching strategy (in-memory cache like Redis/Memcached) to achieve the sub-50ms latency requirement for redirects. * **Justification:** The 100:1 read-to-write ratio and the strict latency requirement (99.9% uptime, <50ms latency at 95th percentile) make caching essential. Serving most requests directly from RAM significantly reduces database load and response times. While caching adds complexity (cache invalidation, consistency concerns, additional infrastructure cost), it's a necessary trade-off to meet the performance and availability goals. Without it, the database would be overwhelmed by read traffic, leading to higher latency and potential outages.

Result

Winning Votes

0 / 3

Average Score

Judge Models Anthropic Claude Opus 4.6

Total Score

Overall Comments

Answer A provides a solid, well-structured design that covers all six required sections. It includes reasonable back-of-envelope calculations for storage, a clear request flow, and discusses multiple short URL generation approaches. However, it has several weaknesses: the QPS calculations are missing (it never computes reads/writes per second), the storage estimate uses an unusually high 2000-byte average URL length, the hot key mitigation is somewhat superficial (mostly just "use caching"), the reliability section lacks specifics about multi-region active-active deployment, and the trade-offs section, while valid, is somewhat generic (CAP theorem discussion and caching vs. cost). The caching strategy lacks details like negative caching, request coalescing, or in-process L1 caching. The 301 vs 302 redirect choice is mentioned but not analyzed.

View Score Details ▼

Architecture Quality

Weight 30%

Answer A presents a reasonable architecture with standard components (LB, API servers, cache, DB, optional message queue) and clear request flows. However, it lacks edge/CDN considerations, doesn't mention multi-region active-active, and the architecture is somewhat generic without distinguishing between L1/L2 caching or addressing async pipelines in detail. The request flow is clear but misses validation, rate limiting, and normalization steps.

Completeness

Weight 20%

Answer A addresses all six sections but with varying depth. QPS calculations are entirely missing (never computes reads or writes per second). Storage estimation is present but uses an unrealistically high 2000-byte average URL. Hot key handling is superficial. Cache invalidation strategy is vague. The trade-offs section covers only two trade-offs as required but they are somewhat generic.

Trade-off Reasoning

Weight 20%

Answer A identifies two trade-offs: CAP theorem (consistency vs availability) and latency vs cost/complexity for caching. The CAP discussion is valid but somewhat textbook and generic. The caching trade-off is real but doesn't go deep into specific tensions. Neither trade-off is particularly novel or demonstrates deep insight into the specific constraints of this system.

Scalability & Reliability

Weight 20%

Answer A discusses caching, Cassandra's built-in partitioning, and basic hot key handling (mostly 'use caching'). Reliability section mentions multi-AZ deployment, replication factor 3, and Redis Sentinel/Cluster but lacks specifics on multi-region failover, thundering herd protection, circuit breakers, or deployment safety. No mention of SLIs/SLOs or how 99.9% is actually measured and maintained. Hot key mitigation is weak beyond caching.

Clarity

Weight 10%

Answer A is well-organized with clear section headers matching the required sections A-F. The request flows are easy to follow with numbered steps. The writing is clear and accessible. However, some sections could benefit from more concrete details rather than general statements. The use of markdown formatting aids readability.

Judge Models Google Gemini 2.5 Pro

Total Score

Overall Comments

Answer A provides a solid and complete design that addresses all the required sections. The architecture is logical, using standard components like load balancers, stateless servers, a cache, and a NoSQL database. The URL generation strategy is well-reasoned, and the reliability measures are appropriate. However, some aspects are less detailed than they could be. The storage estimation uses a questionable assumption for average URL length, leading to an inflated result. The caching strategy is also somewhat simplistic, lacking detail on handling updates or malicious links.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is sound and uses appropriate standard components. The request flows are clear. However, it lacks the depth of a production-grade system, such as considerations for edge computing, multi-region deployment, or asynchronous processing for non-critical tasks.

Completeness

Weight 20%

The answer addresses all six required sections substantively. It meets all the prompt's requirements without any omissions.

Trade-off Reasoning

Weight 20%

The answer identifies two valid and important trade-offs (Consistency vs. Availability, Latency vs. Cost) and provides clear, logical justifications for the choices made.

Scalability & Reliability

Weight 20%

The answer discusses standard scaling and reliability techniques like caching and database replication. However, the caching strategy is overly simplistic (suggesting indefinite caching), and the discussion lacks the specific, advanced techniques needed for a system at this scale.

Clarity

Weight 10%

The answer is very well-structured and clearly written, with distinct sections that make it easy to follow and evaluate.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer A covers all required sections and presents a generally workable architecture with load balancers, stateless API servers, cache, and a distributed database. Its strongest points are the end-to-end request flows and a reasonable comparison of code-generation options. However, it is light on quantitative rigor: it does not estimate read/write QPS well, its storage estimate is internally weak because the assumed average URL length is unrealistically high while replication is folded into vague overhead, and it gives little concrete sizing for cache or peak traffic. Reliability discussion is mostly generic replication and failover language without enough detail on multi-region behavior, consistency settings, or how the SLA is actually met under failures. Hot-key handling is also somewhat shallow.

View Score Details ▼

Architecture Quality

Weight 30%

The core components and request flows are coherent and mostly sensible, with stateless APIs, cache, and a distributed database. However, the design stays high level, the role separation between API servers and URL service is a bit redundant, and some decisions such as optional queue/CDN are not tightly integrated into the critical path requirements.

Completeness

Weight 20%

All required sections are present, but several are handled somewhat superficially. Quantitative treatment is limited, hot-key handling lacks depth, and reliability/trade-offs are discussed in broad terms rather than with concrete mechanisms.

Trade-off Reasoning

Weight 20%

It identifies real trade-offs, especially availability versus consistency and latency versus caching complexity. Still, the reasoning is somewhat generic and does not deeply connect the chosen trade-offs to the specific workload and multi-region implications.

Scalability & Reliability

Weight 20%

The answer recognizes caching, automatic partitioning in Cassandra, replication, and multi-AZ deployment, but it lacks concrete throughput planning, cache hit assumptions, miss protection, and specific failover/consistency behavior. Reliability mechanisms are mostly standard statements rather than an actionable strategy for 99.9% uptime.

Clarity

Weight 10%

The structure is easy to follow and the sectioning maps well to the prompt. Some passages are repetitive and a few points are vague, but overall readability is solid.

Answer B: OpenAI GPT-5.2

Show Full Answer ▼

A) High-level architecture Components 1) Edge - DNS + CDN/Anycast edge (optional but strongly recommended for <50ms p95 globally) - WAF + DDoS protection - Global load balancers (GSLB) routing to nearest region 2) Regional stack (at least 2 regions, active-active) - L7 load balancer (ALB/Envoy) - Stateless API service (URL Shortener) - Create API (write path) - Redirect API (read path) - Cache tier - L1 in-process cache (tiny, per instance, very hot keys) - L2 distributed cache (Redis/Memcached cluster) - Persistent storage - Primary KV store (DynamoDB/Cassandra/ScyllaDB-style) for short_code -> long_url mapping - Optional relational store (Postgres/MySQL) for analytics, user accounts, custom aliases (not in critical redirect path) - Async pipeline (non-critical) - Kafka/Kinesis/PubSub for click events, logs, abuse signals - Stream processors / OLAP sink (ClickHouse/BigQuery/S3) Request flow: URL creation (write) 1) Client calls POST /api/shorten with long_url (+ optional custom alias, TTL policy, user auth). 2) Request hits GSLB -> regional LB -> API server. 3) API validates URL (format, allow/deny lists, safe browsing), normalizes it, enforces rate limits. 4) Short code generation service allocates a unique code (details in section B). 5) API writes mapping to KV store (short_code as partition key). - Write includes created_at, expire_at (>=5 years), long_url, owner_id, flags. - Use quorum write (or strongly consistent write depending on store). 6) API populates L2 cache with short_code -> long_url (write-through) and returns short URL. 7) Emit asynchronous events (created, user, spam scoring) to stream; not required for success response. Request flow: URL redirection (read) 1) Browser requests GET /{code}. 2) Edge routes to nearest region; optional CDN can cache 301/302 responses if policy allows. 3) API server parses code. 4) Lookup order: a) L1 in-process cache b) L2 Redis/Memcached c) KV store (if cache miss) 5) If found and not expired/blocked: return HTTP 301/302 redirect with Location: long_url. - Cache fill on miss (set in L2; optionally L1). 6) Emit click event asynchronously (fire-and-forget). 7) If not found: 404 (optionally fallback to origin region if using regional isolation). Latency targets - L2 cache hit should dominate (sub-millisecond to a few ms within region). With edge routing + high cache hit rate, <50ms p95 is feasible. B) Short URL generation strategy Requirements - 100M new shortenings/month ~ 3.3M/day ~ ~38.6 writes/sec average (peaks higher). - Must generate unique, short, URL-safe codes with low collision risk and no coordination bottleneck. Options 1) Hash(long_url) - Pros: deterministic; dedup possible. - Cons: collisions require handling; reveals patterns; hard to support multiple identical long URLs for different users; changing normalization affects stability. 2) Central counter (autoincrement) + base62 encoding - Pros: compact; no collisions; simple. - Cons: single logical bottleneck unless sharded; sequential codes are guessable (security/abuse). 3) Distributed ID (Snowflake-like: time + worker + sequence) then base62 - Pros: no central coordinator; scalable; unique; bounded size. - Cons: slightly longer codes; time-based ordering leaks some info. 4) Pre-generated key pool - Pros: fastest allocation; decouples creation from ID generation. - Cons: requires managing pool, storage, and avoiding reuse; operational complexity. Chosen approach - Use a Snowflake-style 64-bit unique ID generated by each API node (or dedicated ID service) and encode to base62/base64url. - 64 bits in base62 yields up to 11 characters worst-case; typically 10–11 chars, acceptable. - If shorter codes are required, use 56–60 bits (still plenty) with careful epoch/time window and worker bits. - To reduce predictability, optionally apply a reversible permutation (e.g., Feistel network) to the 64-bit ID before base62 encoding; keeps uniqueness but makes codes non-sequential. Why this choice - Eliminates single counter bottleneck. - No collision handling needed. - Works well with multi-region active-active. - Predictability mitigated via permutation. Custom aliases - If user requests custom code, validate availability with a conditional write (put-if-absent) in KV store; if conflict, reject. C) Data storage Database choice - Primary store: a highly available, horizontally scalable KV database (DynamoDB or Cassandra/ScyllaDB). - Reasoning: - Access pattern is key-based reads by short_code. - Read-heavy at high QPS; KV stores excel at low-latency point lookups. - Built-in replication, partitioning, and operational maturity. Schema (single table) Table: url_mapping - Partition key: short_code (string) - Attributes: - long_url (string) - created_at (timestamp) - expire_at (timestamp, must be >= created_at + 5 years; can be null for “never”, but requirement says at least 5 years) - owner_id (string, optional) - status (active/disabled/malicious) - meta (optional: title, campaign, etc.) Indexes (optional) - GSI/secondary index on owner_id + created_at for listing user links (not on redirect path). Storage estimation (5 years) - New URLs: 100M/month * 12 * 5 = 6 billion mappings. - Per record size rough estimate: - short_code: 10–11 bytes (store as ~16 bytes with overhead) - long_url: assume average 100 bytes (could be 50–200; use 120 including overhead) - timestamps/status/owner_id: ~40 bytes average (owner_id optional) - DB overhead (keys, replication metadata, compression): varies; assume ~100 bytes overhead - Total logical per item ~250 bytes (conservative) - Raw logical storage: 6e9 * 250B = 1.5 TB. - With replication factor 3 (common in Cassandra/Scylla): ~4.5 TB. - Add indexes + headroom: plan ~6–10 TB across the cluster over 5 years. Notes - Long URLs can be compressed (dictionary/deflate) at application layer to reduce size; but only if it doesn’t add latency. Many KV stores also compress SSTables. D) Scaling strategy Traffic estimation - Writes: ~38.6/s average; assume peak 10x => ~400/s. - Reads: 100:1 => average ~3,860 reads/s; peak potentially much higher (e.g., 50k–200k/s depending on usage spikes). Scaling reads 1) Caching - L1 in-process cache: tiny (e.g., 10k–100k entries) with short TTL (30–120s) to absorb micro-hotspots and reduce Redis QPS. - L2 Redis/Memcached: - Cache value: long_url + status + expire_at. - TTL strategy: - If links are immutable (common), set long TTL (hours to days). - Still respect expire_at by setting TTL = min(configured_max_ttl, expire_at - now). - Use cache-aside with negative caching: - Cache “not found” for a short TTL (e.g., 30–60s) to reduce DB hits from scanners. - Aim for >95–99% cache hit rate for redirects. 2) CDN caching (optional but powerful) - For 301 redirects that are stable, CDN can cache the redirect response for popular codes. - Must consider ability to disable links quickly; mitigate by: - Short CDN TTL (e.g., 60–300s) + purge on disable - Or keep redirect logic at origin but cache mapping at edge KV (advanced). Database partitioning/sharding - Partition by short_code (consistent hashing / token ring). - Even distribution expected because codes are pseudo-random (post-permutation). - Add nodes to scale linearly. Handling hot keys (viral URLs) - Multi-layer approach: 1) Edge/CDN caches the redirect response for hottest keys with short TTL. 2) L1 + L2 caches reduce origin DB load. 3) Redis cluster scaling: shard by key; enable client-side hashing. 4) Request coalescing / single-flight on cache miss: if many requests miss simultaneously for same key, one request populates cache while others wait briefly. 5) For extreme cases, allow “hot key replication” in cache (store same key in multiple shards) or use a dedicated “hot cache” tier. Write scaling - Writes are modest; ensure conditional writes for uniqueness and idempotency. - Use rate limiting and abuse prevention to avoid write amplification from bots. E) Reliability and fault tolerance (99.9% uptime) Multi-region - Active-active in at least 2 regions. - GSLB health checks route traffic away from unhealthy region. Stateless services - API servers are stateless; scale horizontally; deploy across multiple AZs. Cache tier - Redis/Memcached deployed as a sharded cluster with replicas per shard (or Redis Cluster with replicas). - If cache fails: system falls back to DB; increased latency but still functional. - Protect DB with circuit breakers and adaptive rate limiting if cache outage causes thundering herd. Database replication and failover - DynamoDB option: - Use Global Tables for multi-region active-active replication. - Configure on-demand or provisioned capacity with autoscaling. - Cassandra/Scylla option: - Multi-AZ replication within region (RF=3). - Multi-region replication via NetworkTopologyStrategy; tune consistency (e.g., LOCAL_QUORUM for reads/writes) to keep latency low. - Failure modes: - Single node failure: replica reads/writes continue. - AZ failure: remaining AZs serve traffic (RF across AZs). - Region failure: GSLB shifts to healthy region; data available via replicated store. Data durability - Regular backups/snapshots to object storage (S3/GCS) + point-in-time recovery. - Disaster recovery drills and restore testing. Deployment safety - Rolling deployments + canary releases. - Feature flags. - Automated rollback on error budget burn. Operational controls - SLIs: redirect p95 latency, cache hit rate, DB p95, error rate, replication lag. - SLO: 99.9% availability; design for graceful degradation (e.g., temporarily disable analytics writes). F) Key trade-offs 1) Shortness of code vs distributed generation - Trade-off: A centralized counter can yield very short codes (often 6–8 chars early on), but adds coordination and is harder in multi-region. - Choice: distributed Snowflake-like IDs + base62 (10–11 chars) to ensure scalability, uniqueness, and multi-region compatibility while meeting latency/availability requirements. 2) Strong consistency vs low latency/high availability - Trade-off: Strongly consistent global reads increase latency and reduce availability during partitions. - Choice: use local-quorum/region-local consistency for redirect lookups (fast) with multi-region replication. For create operations, ensure uniqueness with conditional writes and accept eventual cross-region propagation; redirects remain correct because the authoritative mapping exists in at least one region and is quickly replicated. 3) Aggressive caching vs rapid disable/abuse response - Trade-off: Long cache TTLs improve latency and reduce DB load, but make it harder to immediately disable malicious links. - Choice: moderate TTLs + purge mechanisms, plus status checks in cached value and short TTL for CDN; allow “kill switch” by caching disabled status with high priority. Summary - Use edge routing + heavy caching to keep redirects under 50ms p95. - Use distributed unique ID generation (Snowflake + permutation) for scalable code generation. - Store mappings in a replicated KV store with sharding by short_code and multi-region failover. - Maintain 99.9% uptime via multi-AZ + multi-region, stateless services, replicated storage, and graceful degradation paths.

Result

#1 | Winner

Winning Votes

3 / 3

Average Score

Judge Models Anthropic Claude Opus 4.6

Total Score

Overall Comments

Answer B delivers a comprehensive, deeply technical design that excels across all six sections. It includes detailed QPS calculations (38.6 writes/s average, ~3,860 reads/s), a realistic storage estimate with reasonable per-record sizing (250 bytes vs A's 2015 bytes), and a multi-layered caching strategy (L1 in-process + L2 distributed + optional CDN). The hot key mitigation is thorough with request coalescing/single-flight, hot key replication, and CDN caching. The reliability section covers multi-region active-active, circuit breakers, thundering herd protection, deployment safety (canary releases, feature flags), and operational SLIs/SLOs. It identifies three genuine trade-offs including the nuanced caching vs. rapid disable tension. The Feistel network suggestion for code unpredictability shows deep engineering knowledge. Minor weakness: the answer is dense and could benefit from slightly more visual separation, but overall clarity is strong.

View Score Details ▼

Architecture Quality

Weight 30%

Answer B provides a thorough, production-grade architecture with edge layer (DNS, CDN, Anycast, WAF/DDoS), multi-region active-active design, L1 in-process + L2 distributed caching, async pipeline for analytics/abuse, and clear separation of read/write paths. Request flows include validation, normalization, rate limiting, and async event emission. The architecture demonstrates real-world operational awareness with components like circuit breakers and request coalescing.

Completeness

Weight 20%

Answer B substantively addresses all six sections with strong depth. It includes QPS calculations (38.6 writes/s, ~3,860 reads/s, peak estimates), realistic storage estimates (250 bytes/record), detailed cache sizing and TTL strategies, negative caching, custom alias handling, operational SLIs/SLOs, deployment safety, and three well-articulated trade-offs. The only minor gap is that cache sizing in terms of memory is not explicitly calculated.

Trade-off Reasoning

Weight 20%

Answer B identifies three trade-offs, all genuine and well-reasoned: (1) code shortness vs distributed generation, directly tied to the multi-region requirement; (2) strong consistency vs low latency, with specific mention of local-quorum reads; (3) aggressive caching vs rapid disable/abuse response, a nuanced operational concern that shows real-world experience. Each trade-off is clearly connected to the stated constraints and includes specific mitigation strategies.

Scalability & Reliability

Weight 20%

Answer B provides a comprehensive scaling and reliability strategy: multi-layer caching (L1+L2+CDN), request coalescing/single-flight for hot keys, hot key replication in cache, negative caching, circuit breakers for thundering herd, multi-region active-active with GSLB health checks, detailed failover scenarios (node/AZ/region), deployment safety (canary, feature flags, automated rollback), and explicit SLIs/SLOs. The hot key mitigation strategy is particularly thorough with five specific techniques.

Clarity

Weight 10%

Answer B is well-organized with clear sections and subsections. It packs significantly more information but remains readable through good use of hierarchical structure, numbered lists, and inline annotations. The summary section at the end is a nice touch. The density of information occasionally makes it harder to scan quickly, but the logical flow is maintained throughout.

Judge Models Google Gemini 2.5 Pro

Total Score

Overall Comments

Answer B presents an exceptionally detailed and robust system design that demonstrates a deep understanding of building scalable, resilient services. The architecture is more sophisticated, incorporating multi-region active-active deployment, multi-level caching (L1/L2), and an asynchronous pipeline from the outset. The quantitative reasoning is more realistic and thorough. The answer includes many advanced and practical considerations, such as using permutations to make IDs non-sequential, negative caching, request coalescing for hot keys, and circuit breakers. The trade-offs identified are insightful and reflect real-world engineering challenges.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is outstanding. It presents a mature, multi-layered design from the edge (CDN, GSLB) inwards, incorporating a multi-region active-active setup, two-level caching, and an async pipeline. This level of detail demonstrates a professional understanding of building high-performance systems.

Completeness

Weight 20%

100

The answer is exceptionally complete. It not only addresses all six required sections in great detail but also includes additional practical considerations like custom aliases, SLIs/SLOs, and deployment safety, which enrich the design significantly.

Trade-off Reasoning

Weight 20%

The answer identifies three excellent trade-offs. In addition to the standard ones, it includes the highly practical trade-off between aggressive caching and the ability to rapidly disable malicious links, demonstrating a deep understanding of the operational realities of such a service.

Scalability & Reliability

Weight 20%

The scalability and reliability design is excellent. It proposes a sophisticated multi-level caching strategy, negative caching, request coalescing for hot keys, and specific database replication strategies for multi-region active-active deployment. It also includes crucial details like circuit breakers.

Clarity

Weight 10%

The answer is extremely clear and well-organized. It uses formatting and bullet points effectively to present a large amount of dense, technical information in a highly digestible way. The final summary is a helpful addition.

Judge Models OpenAI GPT-5.4

Total Score

Overall Comments

Answer B is well-structured, directly aligned to sections A-F, and shows stronger systems thinking throughout. It provides clearer quantitative reasoning for write and read rates, a more realistic storage estimate with replication accounted for explicitly, and a stronger scaling plan with L1/L2 cache, negative caching, request coalescing, CDN trade-offs, and hot-key mitigation. The reliability section is more concrete, including active-active multi-region, AZ-aware replication, failover behavior, and graceful degradation. It also presents a more nuanced short-code strategy by combining Snowflake-style IDs with optional reversible permutation to mitigate predictability. Overall it is more complete, precise, and operationally credible.

View Score Details ▼

Architecture Quality

Weight 30%

The architecture is stronger and more production-ready, with edge routing, active-active regional stacks, explicit L1/L2 caching, a KV store for the critical path, and async analytics separated cleanly. Request flows are detailed and logically consistent with the latency and availability goals.

Completeness

Weight 20%

All six sections are substantively addressed with clear coverage of architecture, ID generation, storage, scaling, reliability, and trade-offs. The answer also adds practical considerations like custom aliases, abuse controls, negative caching, and operational monitoring without losing focus.

Trade-off Reasoning

Weight 20%

The trade-offs are concrete and well chosen: short code length versus distributed generation, consistency versus latency/availability, and cache aggressiveness versus fast disablement. The justifications are tied directly to the stated constraints and operational realities.

Scalability & Reliability

Weight 20%

This is a major strength: it estimates average and peak traffic, proposes layered caching with TTL strategy and negative caching, explains partitioning and hot-key handling, and covers cache failure, DB fallback, circuit breakers, multi-AZ replication, and multi-region failover in practical terms. The SLA discussion is much more credible.

Clarity

Weight 10%

The answer is very well organized, with crisp sectioning, bullet points, and clear request flows. It is easy to scan while still remaining specific and technically dense.

Comparison Summary

Final rank order is determined by judge-wise rank aggregation (average rank + Borda tie-break). Average score is shown for reference.

Judges: 3

Google Gemini 2.5 Flash-Lite

Winning Votes

0 / 3

Average Score

View this answer

Winner OpenAI GPT-5.2

Winning Votes

3 / 3

Average Score

View this answer

View head-to-head record for this model pair

Judging Results

Judge Models OpenAI GPT-5.4

Gemini 2.5 Flash-Lite 68

GPT-5.2 Winner 87

Why This Side Won

Answer B wins because it performs better on the heavily weighted architecture and scalability/reliability criteria while also being more complete and quantitatively grounded. Its design addresses the read-heavy workload with a robust multi-layer caching approach, gives concrete traffic and storage estimates, explains hot-key mitigation in detail, and specifies practical multi-region failover and replication strategies for meeting the availability target. Answer A is serviceable, but it is more generic and less rigorous in the areas most important to this task.

Judge Models Google Gemini 2.5 Pro

Gemini 2.5 Flash-Lite 79

GPT-5.2 Winner 96

Why This Side Won

Answer B is the clear winner due to its superior depth, detail, and practical considerations. While Answer A provides a correct and complete design, Answer B's proposed architecture is significantly more robust, scalable, and production-ready. Key differentiators include B's multi-level caching strategy, more realistic quantitative estimates, a more sophisticated URL generation method that mitigates predictability, and a more nuanced discussion of reliability and trade-offs, such as considering the tension between caching and abuse response. Answer B consistently demonstrates a higher level of expertise across all criteria.

Judge Models Anthropic Claude Opus 4.6

Gemini 2.5 Flash-Lite 61

GPT-5.2 Winner 81

Why This Side Won

Answer B wins because it demonstrates significantly deeper engineering depth across all criteria. It provides complete QPS calculations, a multi-layered caching strategy with specific techniques (negative caching, request coalescing, single-flight), more realistic storage estimates, comprehensive reliability mechanisms (circuit breakers, thundering herd protection, deployment safety, SLIs/SLOs), and three well-reasoned trade-offs including the nuanced caching-vs-abuse-response tension. The architecture is more complete with edge routing, L1/L2 caching, and async pipelines. On the most heavily weighted criterion (Architecture Quality at 30%), Answer B is substantially stronger with its multi-region active-active design, edge considerations, and operational maturity details.

Design a URL Shortening Service

Task Overview

Task Prompt

Answers

Answer A: Google Gemini 2.5 Flash-Lite

Answer B: OpenAI GPT-5.2

Comparison Summary

Judging Results

Related Tasks

Explain Database Indexing to a Junior Developer

Persuade a Skeptical School Board to Adopt a Later Start Time for High Schools

Explain the Paradox of the Ship of Theseus in Philosophy of Identity

Implement a Lock-Free Concurrent LRU Cache

Evaluating Transportation Options for a Mid-Size City

Responding to a Friend Who Just Lost Their Job

Innovative Uses for Retired Electric Vehicle Batteries

Write a Humorous Office Email Chain About a Missing Stapler

Related Links