Design OpenTable

OpenTable is a comprehensive restaurant reservation platform that connects diners with restaurants, managing millions of reservations daily across thousands of restaurants worldwide. The system handles real-time table availability, complex reservation logistics, waitlist management, and restaurant yield optimization.

Designing OpenTable presents unique challenges including preventing double-booking through strong consistency guarantees, real-time availability computation with high read throughput, intelligent waitlist management with accurate wait time estimation, and sophisticated table assignment algorithms that maximize restaurant revenue while maintaining service quality.

Step 1: Understand the Problem and Establish Design Scope

Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.

Functional Requirements

Core Requirements:

  1. Diners should be able to search for restaurants with filters like cuisine, price, location, and ratings.
  2. Diners should be able to check real-time table availability for specific times and party sizes.
  3. Diners should be able to create, modify, and cancel reservations.
  4. Restaurants should be able to manage table inventory, operating hours, and reservation policies.
  5. The system should support waitlist management with estimated wait times and notifications.

Below the Line (Out of Scope):

  • Diners should be able to leave reviews and ratings for restaurants.
  • The system should support loyalty programs with points and tier benefits.
  • Restaurants should receive customer insights and analytics.
  • The system should handle recurring reservations for regular customers.
  • Integration with restaurant POS systems for automatic table status updates.

Non-Functional Requirements

Core Requirements:

  • The system should provide search results in under 500ms for excellent user experience.
  • The system must guarantee strong consistency for reservation creation to prevent double-booking under any circumstances.
  • The system should handle 100K concurrent users during peak dinner hours without degradation.
  • The system should support 10K reservations per minute globally during peak times.
  • The system should maintain 99.99% uptime for the reservation flow.

Below the Line (Out of Scope):

  • The system should ensure the security and privacy of user data, complying with regulations like GDPR.
  • The system should have robust monitoring, logging, and alerting to quickly identify issues.
  • The system should facilitate easy updates and maintenance without significant downtime.
  • The system should ensure payment processing meets PCI DSS compliance standards.

Clarification Questions & Assumptions:

  • Platform: Web and mobile apps (iOS, Android) for both diners and restaurants.
  • Scale: 100K+ restaurants on platform, 50M+ registered users, 500M+ reservations per year.
  • Geographic Coverage: Global, with focus on major cities in North America, Europe, and Asia.
  • Availability Window: Restaurants typically accept reservations 30 days in advance.
  • Table Management: Average restaurant has 20-50 tables with varying capacities (2-8 people).

Capacity Estimation

Storage Requirements:

  • Reservations: 500M/year at 2KB each = 1TB/year for reservation data.
  • Reviews: 10M/year at 5KB each = 50GB/year for review content.
  • Restaurant data: 100K restaurants at 50KB each = 5GB total.
  • User profiles: 50M users at 2KB each = 100GB total.
  • With indexes, replicas, and historical data: approximately 5TB/year total storage.

Bandwidth Requirements:

  • Peak reservation creation: 10K/minute = approximately 167 QPS for writes.
  • With typical 10:1 read-to-write ratio: approximately 1,670 QPS for searches and availability checks.
  • Peak dinner rush traffic: 5,000 QPS across all operations.
  • Average response size 10KB: 5,000 QPS * 10KB = 50MB/s = 400 Mbps egress.

Cache Requirements:

  • Hot restaurant availability data: 10K popular restaurants * 100KB = 1GB.
  • Search result cache: 100K active queries * 50KB = 5GB with reasonable TTL.
  • User sessions: 100K concurrent users * 10KB = 1GB session data.
  • Total hot cache: approximately 10GB for optimal performance.
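
These figures follow from simple arithmetic on the stated assumptions; a quick sanity-check sketch (every constant is one of the assumptions above):

```python
# Back-of-envelope capacity estimation using the assumptions above.

def qps(per_minute: int) -> float:
    """Convert a per-minute rate to queries per second."""
    return per_minute / 60

write_qps = qps(10_000)      # 10K reservations/minute at peak
read_qps = write_qps * 10    # assumed 10:1 read-to-write ratio

# Yearly reservation storage: 500M reservations at ~2KB each.
reservation_tb_per_year = 500e6 * 2e3 / 1e12

# Peak egress: 5,000 QPS at ~10KB average response, in megabits/second.
egress_mbps = 5_000 * 10e3 * 8 / 1e6

print(round(write_qps), round(read_qps), reservation_tb_per_year, egress_mbps)
```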

Step 2: Propose High-Level Design and Get Buy-in

Planning the Approach

Before moving on to designing the system, it’s important to plan your strategy. For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements.

Defining the Core Entities

To satisfy our key functional requirements, we’ll need the following entities:

Restaurant: Represents a restaurant on the platform. Includes basic information (name, address, cuisine type, price range), operating hours for different shifts, contact details, average ratings, review count, and configuration for reservation policies.

Table: Represents individual tables or seating areas in a restaurant. Includes table number, minimum and maximum capacity, table type (standard, booth, outdoor, bar), whether it can be combined with other tables for large parties, and current status (active, inactive, maintenance).

Reservation: An individual reservation from creation through completion. Records the diner and restaurant identities, reservation time, party size, assigned tables, status (pending, confirmed, seated, completed, cancelled, no-show), special requests, deposit information, and all relevant timestamps.

Waitlist Entry: Represents a customer on the waitlist for walk-in seating. Includes party size, estimated wait time, position in queue, status (waiting, notified, seated, cancelled, expired), and notification timestamps.

Review: A post-dining review with ratings and feedback. Includes overall rating, individual ratings for food, service, ambiance, and value, review text, photos, verification status, restaurant response, and helpfulness metrics.

Operating Hours: Defines when restaurants accept reservations. Includes day of week, shift name (breakfast, lunch, dinner, brunch), open and close times, reservation interval (typically 15 minutes), maximum party size, and advance booking parameters.

API Design

Search Restaurants Endpoint: Used by diners to discover restaurants based on various criteria.

GET /restaurants -> Restaurant[]
Query: {
  location: { lat, long, radius },
  cuisine: string[],
  priceRange: number[],
  date: date,
  time: time,
  partySize: number
}

Check Availability Endpoint: Used by diners to see available time slots for a specific restaurant.

GET /restaurants/:restaurantId/availability -> TimeSlot[]
Query: {
  date: date,
  partySize: number
}

Create Reservation Endpoint: Used by diners to book a table after selecting a time slot.

POST /reservations -> Reservation
Body: {
  restaurantId: string,
  reservationTime: datetime,
  partySize: number,
  specialRequests: string
}

Modify Reservation Endpoint: Allows diners to change reservation details or cancel.

PATCH /reservations/:reservationId -> Reservation
Body: {
  action: "modify" | "cancel",
  newTime: datetime (optional),
  newPartySize: number (optional)
}

Add to Waitlist Endpoint: Used by walk-in customers to join the waitlist.

POST /restaurants/:restaurantId/waitlist -> WaitlistEntry
Body: {
  partySize: number,
  phone: string
}

Update Table Inventory Endpoint: Used by restaurant managers to configure their tables.

POST /restaurants/:restaurantId/tables -> Table
Body: {
  tableNumber: string,
  minCapacity: number,
  maxCapacity: number,
  tableType: string
}

High-Level Architecture

Let’s build up the system sequentially, addressing each functional requirement:

1. Diners should be able to search for restaurants with filters

The core components necessary to fulfill restaurant search are:

  • Client Applications: Web and mobile apps (iOS, Android) that provide the user interface for diners. Handle user input, display search results, and manage user sessions.
  • API Gateway: Entry point for all client requests, handling authentication, rate limiting, request routing, and protocol translation. Provides a unified interface regardless of backend microservices.
  • Search Service: Dedicated service for restaurant discovery and filtering. Maintains searchable indexes with restaurant data, handles complex queries with multiple filters, and ranks results based on relevance, ratings, distance, and availability.
  • Elasticsearch Cluster: Specialized search engine optimized for full-text search and filtering. Stores denormalized restaurant data including name, cuisine, location (geo-point), ratings, price range, and pre-computed availability windows.
  • CDN: Content Delivery Network for caching static assets like restaurant photos, reducing latency and backend load.

Search Flow:

  1. Diner enters search criteria in the client app (location, cuisine, price, party size, date/time).
  2. Client sends a GET request to the Search Service via API Gateway.
  3. API Gateway handles authentication and rate limiting before forwarding.
  4. Search Service constructs an Elasticsearch query combining text search, geographic filters, price/cuisine filters, and availability constraints.
  5. Elasticsearch returns ranked results based on relevance score, distance, and ratings.
  6. Search Service enriches results with real-time availability from cache before returning.
  7. Results are returned to client with restaurant details and available time slots.

2. Diners should be able to check real-time table availability

We extend our design to support availability checking:

  • Restaurant Service: Manages restaurant profiles, table inventory, and operating hours. Provides configuration data to other services.
  • Table Management Service: Core service for availability computation. Tracks table states in real-time, computes available time slots based on existing reservations and table turnover times, and handles table combination logic for large parties.
  • Redis Cluster: In-memory data store for caching availability windows. Stores pre-computed availability for the current day with short TTL, enabling fast availability checks without hitting the database.
  • PostgreSQL Database: Primary data store for restaurants, tables, operating hours, and reservations. Configured with appropriate indexes for efficient querying.

Availability Check Flow:

  1. Diner selects a restaurant and specifies party size and desired date.
  2. Client sends a GET request to check availability.
  3. Table Management Service first checks Redis cache for pre-computed availability for that restaurant and date.
  4. If cache hit and data is fresh, return cached availability slots immediately.
  5. If cache miss, query database to get all tables for the restaurant and all existing reservations for that date.
  6. Compute available time slots by iterating through all possible reservation times, checking table capacity against existing reservations plus required turnover time.
  7. Cache the computed availability in Redis with 1-hour TTL.
  8. Return available time slots to client, showing times when sufficient tables are available for the party size.
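
The slot computation in steps 5-6 can be sketched as follows; the table layout, the 90-minute turnover, and the operating window are illustrative assumptions, not production values:

```python
from datetime import datetime, timedelta

TURNOVER = timedelta(minutes=90)  # assumed uniform turnover time

def available_slots(tables, reservations, day, party_size,
                    open_hour=17, close_hour=22, interval_min=15):
    """Return start times with at least one free table that fits the party,
    treating each reservation as occupying its table for the turnover
    window [start, start + TURNOVER)."""
    slots = []
    t = datetime(day.year, day.month, day.day, open_hour)
    close = datetime(day.year, day.month, day.day, close_hour)
    while t < close:
        for table_id, capacity in tables.items():
            if capacity < party_size:
                continue  # table too small for this party
            busy = any(
                r_table == table_id
                and r_start < t + TURNOVER
                and t < r_start + TURNOVER
                for r_table, r_start in reservations
            )
            if not busy:
                slots.append(t)
                break  # one free table is enough for this slot
        t += timedelta(minutes=interval_min)
    return slots
```

In production this runs only on a cache miss; the computed slots are then cached in Redis with a 1-hour TTL as described in step 7.
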

3. Diners should be able to create, modify, and cancel reservations

We add critical components for reservation management:

  • Reservation Service: Core service managing the complete reservation lifecycle. Handles creation with atomic transactions, modification workflows, cancellation policies, and integration with payment systems for deposits.
  • Notification Service: Dispatches confirmations, reminders, and updates via multiple channels. Sends email confirmations immediately, SMS reminders 24 hours before, push notifications for waitlist status, and restaurant notifications for new reservations.
  • Kafka Event Stream: Message broker for asynchronous event processing. Streams reservation events (created, modified, cancelled) to downstream consumers, enables decoupled architecture, and ensures reliable event delivery with replay capability.

Reservation Creation Flow:

  1. Diner selects an available time slot and confirms the reservation request.
  2. Client sends a POST request with restaurant ID, time, party size, and special requests.
  3. Reservation Service initiates a database transaction with SERIALIZABLE isolation level.
  4. Service acquires row-level locks on the specific tables using FOR UPDATE NOWAIT to prevent concurrent bookings.
  5. Service double-checks no conflicting reservations exist for those tables within the time window (reservation time plus/minus turnover time).
  6. If tables are still available, create new reservation record with “confirmed” status and generate unique confirmation code.
  7. Record the reservation in the diner’s booking history.
  8. Invalidate the availability cache for that restaurant and date.
  9. Commit transaction to make reservation permanent.
  10. Publish “reservation.created” event to Kafka for asynchronous processing.
  11. Notification Service consumes event and sends confirmation email to diner and notification to restaurant.
  12. Return reservation details to client with confirmation code.

Modification and Cancellation:

  • For modifications: Check availability for new time/size, apply modification as atomic cancel-and-create operation.
  • For cancellations: Update reservation status, release tables, check cancellation policy for fees, publish cancellation event, notify restaurant of freed capacity.
  • All operations use optimistic locking with version numbers to prevent concurrent modification conflicts.

4. Restaurants should be able to manage table inventory and operating hours

Restaurant management components:

  • Restaurant Management Portal: Dedicated web interface for restaurant owners and managers. Provides tools for configuring tables, setting operating hours and reservation policies, managing reservations and waitlist, and viewing analytics.
  • Table Management Service: Handles table CRUD operations, validates table configurations, manages table combinations and floor plans, and syncs changes to availability computations.

Restaurant Configuration Flow:

  1. Restaurant manager logs into management portal.
  2. Manager adds or updates table information (number, capacity, type, combinability).
  3. Changes sent to Restaurant Service for validation and persistence.
  4. Table Management Service updates table inventory and invalidates cached availability.
  5. Manager configures operating hours per day and shift with reservation intervals, party size limits, and advance booking windows.
  6. Operating hours determine which time slots are available for reservations.

5. The system should support waitlist management with estimated wait times

Waitlist components:

  • Waitlist Service: Manages real-time waitlist queue and position tracking. Implements wait time estimation algorithm using historical data and ML models, handles automatic queue advancement when tables free up, and manages expiry for customers who don’t respond to notifications.
  • ML Service: Machine learning models for operational optimization. Provides wait time predictions based on current occupancy, party size, day/time, and historical patterns. Also supports no-show prediction and demand forecasting.

Waitlist Flow:

  1. Walk-in customer or diner unable to get reservation adds themselves to waitlist via mobile app or restaurant host tablet.
  2. Client sends POST request with party size and contact info.
  3. Waitlist Service creates entry, assigns position in queue based on check-in time.
  4. ML Service estimates wait time based on current restaurant occupancy, average turnover time for similar party sizes, and number of parties ahead in queue.
  5. Background job continuously monitors restaurant state (new reservations completing, tables becoming available).
  6. When suitable capacity becomes available, Waitlist Service notifies next eligible party via SMS and push notification.
  7. Customer has 10 minutes to confirm they’re coming, or entry expires and next party is notified.
  8. When customer arrives, host marks them as seated, removing from waitlist.

Step 3: Design Deep Dive

With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.

Deep Dive 1: How do we prevent double-booking with strong consistency guarantees?

The most critical challenge in OpenTable is preventing double-booking while maintaining high throughput during peak hours. We need strong consistency guarantees combined with optimistic concurrency control.

Problem: Race Conditions in Concurrent Reservations

Without proper synchronization, two customers could simultaneously book the same table:

  1. User A checks availability and sees table 5 is free at 7 PM.
  2. User B checks availability and also sees table 5 is free at 7 PM.
  3. User A submits reservation and begins transaction.
  4. User B submits reservation and begins transaction.
  5. Both transactions read that table 5 is available.
  6. Both transactions insert reservations for table 5.
  7. Result: Double-booking!

Solution: Multi-Layer Consistency Strategy

Level 1: Optimistic Availability Cache

The first line of defense is a Redis cache storing pre-computed availability. This provides fast rejection of obviously unavailable slots without hitting the database:

  • Cache key structure: “availability:restaurantID:date”
  • Cache value: JSON structure with available capacity for each time slot and party size.
  • When user checks availability, we first consult cache.
  • If cache shows no capacity, immediately return “not available” without database query.
  • Cache is invalidated when reservations are created, modified, or cancelled.
  • Cache has 1-hour TTL to handle data staleness from cache invalidation failures.

Level 2: Pessimistic Database Locking

When creating a reservation, we use database-level locking to ensure atomicity:

  • Open transaction with SERIALIZABLE isolation level.
  • Use SELECT FOR UPDATE NOWAIT to acquire row-level locks on specific tables.
  • This blocks other transactions from locking or modifying these rows until commit; plain reads still proceed, but competing bookings cannot acquire the lock.
  • NOWAIT means transaction fails immediately if lock is unavailable rather than waiting.
  • After acquiring locks, double-check no conflicting reservations exist.
  • Consider turnover time: if booking 7 PM for party of 4 with 90-minute turnover, block that table from 7 PM to 8:30 PM.
  • Check both exact time match and overlapping time windows.
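
A runnable sketch of the locked double-check described above. SQLite is used here only so the example is self-contained; Postgres would instead acquire row locks with SELECT ... FOR UPDATE NOWAIT under SERIALIZABLE isolation. The schema and minute-based times are assumptions:

```python
import sqlite3

conn = sqlite3.connect(":memory:", isolation_level=None)  # autocommit; we issue BEGIN ourselves
conn.execute("CREATE TABLE reservations (table_id INTEGER, start_min INTEGER)")

def try_reserve(conn, table_id, start_min, turnover_min=90):
    """Insert a reservation only if no existing one overlaps the
    [start, start + turnover) window on the same table."""
    cur = conn.cursor()
    cur.execute("BEGIN IMMEDIATE")  # write lock; stand-in for row-level locks
    cur.execute(
        """SELECT COUNT(*) FROM reservations
           WHERE table_id = ?
             AND start_min < ? + ?  -- existing starts before our window ends
             AND ? < start_min + ?  -- we start before the existing window ends
        """,
        (table_id, start_min, turnover_min, start_min, turnover_min),
    )
    if cur.fetchone()[0] > 0:
        conn.rollback()  # conflict found while holding the lock: abort
        return False
    cur.execute(
        "INSERT INTO reservations (table_id, start_min) VALUES (?, ?)",
        (table_id, start_min),
    )
    conn.commit()
    return True
```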

Level 3: Optimistic Locking with Versioning

Each reservation has a version number that increments on every update:

  • When modifying reservation, include current version in WHERE clause.
  • If another transaction modified it first, version won’t match and update fails.
  • Application detects failure and retries with fresh data.
  • This prevents lost updates when multiple users modify same reservation.
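
A minimal in-memory illustration of the version check; in SQL this would be an UPDATE ... SET version = version + 1 WHERE id = ... AND version = ... whose affected-row count reveals a conflict:

```python
# In-memory stand-in for version-based optimistic locking.
class Reservation:
    def __init__(self):
        self.version = 1
        self.party_size = 2

def update_party_size(res, expected_version, new_size):
    """Apply the update only if no one else modified the record first."""
    if res.version != expected_version:
        return False  # stale read: caller re-reads and retries
    res.party_size = new_size
    res.version += 1
    return True
```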

Retry Strategy:

  • If lock acquisition fails (another transaction holds lock), use exponential backoff.
  • Retry up to 3 times with 100ms, 200ms, 400ms delays.
  • After 3 failures, return error to user: “This time slot was just booked. Please try another time.”
  • Most lock contention resolves within first retry due to fast transaction commit times.
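
The retry policy above as a sketch; the injectable `sleep` and `attempt_booking` callback are illustrative structure, not a prescribed interface:

```python
import time

def book_with_retry(attempt_booking, max_retries=3, base_delay=0.1, sleep=time.sleep):
    """attempt_booking() returns True on success, False on lock contention."""
    for attempt in range(max_retries):
        if attempt_booking():
            return True
        sleep(base_delay * (2 ** attempt))  # 100ms, 200ms, 400ms backoff
    return False  # surface "This time slot was just booked" to the user
```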

Performance Trade-offs:

  • Strong consistency reduces write throughput by 2-3x compared to eventual consistency.
  • However, correctness is non-negotiable for reservations - double-booking destroys user trust.
  • Read throughput unaffected due to cache layer.
  • Lock contention only occurs when multiple users try to book the same restaurant and time slot concurrently (rare except for extremely popular restaurants).

Deep Dive 2: How do we efficiently compute and cache availability for thousands of restaurants?

Computing availability in real-time for every request is computationally expensive. We need intelligent caching strategies that balance freshness with performance.

Problem: Expensive Availability Computation

For each availability check, we need to:

  • Retrieve all tables for the restaurant (20-50 tables typically).
  • Retrieve all existing reservations for the requested date.
  • Compute turnover time based on party size and shift.
  • Iterate through all possible reservation times (e.g., 5 PM to 10 PM in 15-minute increments = 20 time slots).
  • For each time slot, determine which tables are occupied considering turnover times.
  • Check if remaining capacity can accommodate the party size.
  • Consider table combination logic for large parties.

This computation becomes very expensive at scale:

  • 100K restaurants * multiple checks per second = unsustainable database load.

Solution: Multi-Level Caching with Intelligent Invalidation

Precomputation Strategy:

For popular restaurants (top 10K by traffic), we proactively pre-compute availability:

  • Background job runs every hour to build availability cache for next 24 hours.
  • For each restaurant, compute availability for all time slots and common party sizes (2, 4, 6, 8).
  • Store in Redis with structured key: “availability:restaurantID:date”
  • Value is compact JSON array: each element has time, and available capacity per party size.
  • This eliminates real-time computation for 90% of availability checks.

Incremental Cache Updates:

When reservations are created, modified, or cancelled, we incrementally update the cache rather than invalidating completely:

  • For reservation creation: decrement available capacity for the specific time slot and affected party sizes.
  • For cancellation: increment available capacity.
  • For modification: decrement at new time, increment at old time.
  • Use Redis transactions (MULTI/EXEC) to ensure atomic cache updates.
  • If cache update fails, fallback to full invalidation.
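
An in-memory stand-in for the incremental update; production would apply the same mutations inside a Redis MULTI/EXEC block, and the key/value shape shown is an assumption:

```python
# Simulated availability cache: "availability:{restaurant}:{date}" -> {slot: seats}.
cache = {}

def on_reservation_created(restaurant_id, date, slot, seats):
    key = f"availability:{restaurant_id}:{date}"
    entry = cache.get(key)
    if entry is None:
        return  # nothing cached; next read rebuilds it via cache-aside
    entry[slot] = max(0, entry.get(slot, 0) - seats)  # decrement capacity

def on_reservation_cancelled(restaurant_id, date, slot, seats):
    key = f"availability:{restaurant_id}:{date}"
    entry = cache.get(key)
    if entry is not None:
        entry[slot] = entry.get(slot, 0) + seats  # give capacity back
```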

Cache-Aside Pattern with Read-Through:

For restaurants not in pre-computed cache:

  • On availability check, first try to read from Redis.
  • If cache miss, compute availability by querying database.
  • Store computed result in cache with 1-hour TTL.
  • Subsequent requests hit cache until TTL expires.
  • This lazily builds cache for long-tail restaurants based on actual traffic patterns.

Time-Bucketed Caching:

We use separate cache keys for different time buckets:

  • “availability:restaurantID:date:lunch” for 11 AM - 3 PM.
  • “availability:restaurantID:date:dinner” for 5 PM - 10 PM.
  • This allows targeted invalidation when reservations are created in specific time ranges.
  • Reduces cache invalidation blast radius.

Consistency Guarantees:

  • Cache is advisory, not authoritative.
  • Always double-check availability during reservation creation with database locks.
  • Cache shows “optimistic” availability but database is source of truth.
  • Acceptable for cache to show stale data briefly (shows available when actually booked) since database check will catch it.
  • Not acceptable for cache to hide available slots (shows unavailable when actually available) so we use conservative TTLs.

Deep Dive 3: How do we estimate wait times accurately and manage the waitlist queue?

Providing accurate wait time estimates is critical for customer satisfaction. Underestimating leaves customers waiting longer than promised, eroding trust; grossly overestimating drives them away unnecessarily.

Problem: Wait Time Estimation Complexity

Wait time depends on many factors:

  • Number of parties ahead in queue.
  • Current restaurant occupancy (how many tables are occupied).
  • Average turnover time for current shift and party size.
  • Day of week and time of day (Friday dinner vs. Tuesday lunch have different dynamics).
  • Party size (larger parties take longer and are harder to seat).
  • Seasonal and event-based variations (holidays, local events).

Solution: Machine Learning-Based Prediction

Feature Engineering:

We collect extensive features for ML model:

Temporal features:

  • Hour of day (dinner rush hours have different patterns).
  • Day of week (weekends vs. weekdays).
  • Whether it’s a holiday or special event day.
  • Month and season (outdoor seating demand varies).

Restaurant features:

  • Historical average turnover time for this restaurant, party size, and time slot.
  • Current occupancy rate (percentage of tables occupied).
  • Number of parties ahead in waitlist.
  • Restaurant’s average rating and price point (affects dining duration).

Party features:

  • Party size (larger parties typically take longer).
  • Whether special requests were made (might indicate celebration, longer stay).

Contextual features:

  • Weather conditions (affects outdoor seating and no-show rates).
  • Local events (concerts, sports games increase demand and affect timing).

Model Architecture:

We use gradient boosted trees (XGBoost) for wait time prediction:

  • Training data: historical waitlist entries with actual wait times.
  • Target variable: minutes between check-in and being seated.
  • Model trained weekly on last 6 months of data.
  • Separate models per restaurant category (fast casual vs. fine dining have different patterns).
  • Model outputs predicted wait time in minutes.

Prediction Refinement:

Base ML prediction is refined with business logic:

  • Add a 20% buffer for a conservative estimate (better to slightly overestimate the wait than to underestimate it).
  • Round to nearest 5 minutes for cleaner customer communication.
  • Cap at reasonable maximum (e.g., 2 hours) and suggest reservation instead.
  • If queue is very long (10+ parties), increase estimate more aggressively.
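
These refinement rules compose into a small function; the 20% buffer, 5-minute rounding, and 2-hour cap come from the list above, while the long-queue multiplier is an assumed value:

```python
def refine_wait_estimate(predicted_min, queue_len):
    """Apply buffer / rounding / cap to an ML wait prediction; returns
    minutes, or None when the wait is long enough to suggest a reservation."""
    est = predicted_min * 1.2          # conservative 20% buffer
    if queue_len >= 10:
        est *= 1.25                    # assumed extra padding for very long queues
    est = 5 * round(est / 5)           # round to the nearest 5 minutes
    return None if est > 120 else est  # cap at 2 hours
```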

Real-Time Updates:

Waitlist is dynamic, so we continuously update estimates:

  • Background job runs every 30 seconds to recompute wait times for all active waitlist entries.
  • When a table becomes available and party is seated, everyone else’s position and estimate updates.
  • If wait time changes significantly (more than 10 minutes), send update notification to customer.
  • This manages expectations and reduces perceived wait time.

Notification Strategy:

When table becomes available:

  • Identify next eligible party in queue based on party size and table availability.
  • Send SMS immediately: “Your table is ready at RestaurantName! Please check in within 10 minutes.”
  • Also send push notification if customer has the app.
  • Start 10-minute expiry timer.
  • If customer doesn’t respond within 10 minutes, mark as expired and notify next party.
  • This prevents queue blocking from unresponsive customers.

Queue Advancement Logic:

Not always first-in-first-out due to table size constraints:

  • When a 2-person table frees up, we can only seat parties of 1-2.
  • When a 6-person table frees up, we prefer seating a 5-6 person party over combining two small parties.
  • Algorithm prioritizes longest-waiting party that fits the available table capacity.
  • Prevents small parties from blocking large parties when only large tables are available.
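
The advancement rule ("longest-waiting party that fits") can be sketched as a first-fit scan over the queue in check-in order; the entry shape is an assumption:

```python
def next_party(waitlist, table_capacity):
    """waitlist: list of (entry_id, party_size), earliest check-in first.
    Return the longest-waiting entry that fits the freed table, or None."""
    for entry in waitlist:
        _, size = entry
        if size <= table_capacity:
            return entry
    return None
```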

Deep Dive 4: How do we handle peak load during dinner rush hours?

Peak dinner hours (6-8 PM) see 10x normal traffic. We need to handle this surge without degrading performance or dropping requests.

Problem: Traffic Spikes

  • Normal hours: 500 QPS across all operations.
  • Dinner rush: 5,000+ QPS.
  • Without proper scaling, system would become overloaded: slow responses, timeouts, failed reservations.

Solution: Auto-Scaling with Traffic Patterns

Horizontal Scaling:

All services are stateless and containerized (Kubernetes):

  • Define CPU and memory-based auto-scaling policies.
  • When average CPU exceeds 70%, automatically spawn additional pods.
  • Scale down during off-peak hours to save costs.
  • Pre-scale based on predictable patterns (scale up at 5 PM before dinner rush).
  • This ensures adequate capacity without over-provisioning 24/7.

Database Scaling:

PostgreSQL handles peak load through:

  • Read replicas for scaling read operations (availability checks, search queries).
  • Write operations go to primary database.
  • Connection pooling (PgBouncer) manages thousands of connections efficiently.
  • Each app server maintains connection pool, PgBouncer multiplexes to database.
  • Database sharding by geographic region for global deployments.

Redis Cluster Scaling:

  • Redis cluster with multiple nodes for horizontal scaling.
  • Data partitioned across nodes using consistent hashing.
  • Each node has replicas for high availability.
  • Read operations can hit replicas, write operations go to primary.
  • This provides 100K+ ops/second capacity.

Message Queue Buffering:

During extreme peaks, requests are queued:

  • Kafka can buffer millions of messages with very low latency.
  • If Reservation Service is temporarily overwhelmed, messages wait in queue.
  • As soon as capacity is available, messages are processed.
  • This prevents request dropping and provides backpressure.

Rate Limiting:

Protect system from abuse and ensure fair access:

  • API Gateway implements rate limiting per user and per IP.
  • Tier-based limits: standard users get 100 requests/hour, premium users get higher limits.
  • During extreme surge, apply temporary global rate limiting to shed load gracefully.
  • Return 429 status code with retry-after header.
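
One common way to implement these per-user limits is a token bucket; the algorithm choice is an assumption — the section only specifies the limits and the 429 behavior:

```python
import time

class TokenBucket:
    """Allow `rate_per_sec` sustained requests with bursts up to `burst`."""
    def __init__(self, rate_per_sec, burst, now=time.monotonic):
        self.rate, self.burst, self.now = rate_per_sec, burst, now
        self.tokens, self.last = float(burst), now()

    def allow(self):
        t = self.now()
        # Refill proportionally to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (t - self.last) * self.rate)
        self.last = t
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller returns HTTP 429 with a Retry-After header
```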

Degraded Mode:

If system becomes critically overloaded:

  • Temporarily disable non-critical features (reviews, photos, advanced filters).
  • Focus all capacity on core flow: search, availability, reservations.
  • Show cached search results with stale availability data.
  • Reduce real-time availability computation frequency.
  • Display warning to users about potential delays.

Deep Dive 5: How do we optimize table assignments to maximize restaurant revenue?

Restaurants want to maximize revenue per seat while maintaining service quality. This requires intelligent table assignment and yield management.

Problem: Suboptimal Table Utilization

Without optimization:

  • Seating party of 2 at table for 6 wastes 4 seats.
  • Saving all large tables for potential large parties leaves them empty during slow periods.
  • Not accounting for time-based demand leads to empty tables during less popular hours.

Solution: Dynamic Table Assignment Algorithm

Assignment Scoring:

When assigning tables for a reservation, score each possible assignment:

Capacity utilization score:

  • Prefer tables where party size is 75-100% of table capacity.
  • Heavy penalty for wasting large tables on small parties.
  • Calculate: partySize / tableCapacity, want values close to 1.0.

Future flexibility score:

  • Prefer leaving larger tables available for future reservations.
  • Single tables scored higher than table combinations.
  • Consider upcoming reservations in the next 2 hours.

Table type preference:

  • Booths and premium tables saved for special requests or high-value customers.
  • Outdoor seating offered based on weather and customer preferences.
  • Bar seating offered to smaller parties during peak hours.

Combined scoring formula:

  • Total score = 10 * utilizationScore + 5 * flexibilityScore + typeScore
  • Select table assignment with highest score.
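
The scoring formula can be sketched directly; the weights (10, 5) come from the formula above, while the sub-score shapes are assumptions:

```python
def assignment_score(party_size, table_capacity, is_combination, type_bonus=0.0):
    if party_size > table_capacity:
        return float("-inf")                      # party doesn't fit
    utilization = party_size / table_capacity     # want values close to 1.0
    flexibility = 0.0 if is_combination else 1.0  # single table preferred
    return 10 * utilization + 5 * flexibility + type_bonus

def best_table(party_size, tables):
    """tables: list of (table_id, capacity, is_combination)."""
    scored = [(assignment_score(party_size, cap, comb), tid)
              for tid, cap, comb in tables]
    score, tid = max(scored)
    return tid if score > float("-inf") else None
```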

Table Combination Logic:

For large parties exceeding single table capacity:

  • Identify tables that can be physically combined (adjacent tables, combinable flag set).
  • Group tables by combination groups (tables that can merge together).
  • Try to minimize number of tables combined (2 tables better than 3).
  • Prefer combining similar-sized tables over different sizes.
  • Ensure combined capacity is appropriate (not combining four 4-person tables for party of 6).

Strategic Overbooking:

Restaurants can overbook to account for predicted no-shows:

  • Analyze historical no-show rate by time slot, day of week, and customer tier.
  • Calculate safe overbooking limit: 50% of expected no-shows.
  • Never overbook by more than 10% of total capacity regardless of predictions.
  • If overbooked and everyone shows up, offer priority waitlist placement with compensation.
  • ML model predicts no-show probability for each reservation to identify safe overbooking candidates.
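The two overbooking rules above (accept 50% of expected no-shows, never exceed 10% of capacity) compose into a small function; a production version would take per-reservation no-show probabilities from the ML model rather than a single aggregate rate:

```python
def overbooking_limit(capacity: int, expected_reservations: int,
                      no_show_rate: float) -> int:
    """Extra reservations to accept beyond capacity.
    Rule: 50% of expected no-shows, hard-capped at 10% of total capacity."""
    expected_no_shows = expected_reservations * no_show_rate
    safe = int(expected_no_shows * 0.5)      # conservative half of the prediction
    hard_cap = int(capacity * 0.10)          # absolute ceiling regardless of model
    return min(safe, hard_cap)
```

With 100 seats fully booked and a 15% historical no-show rate, the restaurant accepts 7 extra covers; at a 30% no-show rate the 10% hard cap kicks in.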

Dynamic Reservation Policies:

Adjust policies based on predicted demand:

High demand periods (predicted occupancy over 90%):

  • Require deposits ($25 per person) to reduce no-shows.
  • Higher cancellation fees to discourage last-minute cancellations.
  • Reduced reservation windows (book only 14 days ahead instead of 30).
  • Shorter table turnover allocations (encourage quicker dining).

Low demand periods (predicted occupancy under 40%):

  • Remove deposit requirements to reduce booking friction.
  • Offer discounts or incentives for booking off-peak times.
  • Allow longer reservation windows and more flexible policies.
  • Encourage larger parties with special promotions.
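The policy tiers above can be expressed as a simple mapping from predicted occupancy. The 90%/40% thresholds, $25 deposit, and 14-vs-30-day windows come from the text; the field names, the 60-day low-demand window, and the 90-minute turnover are illustrative assumptions:

```python
def reservation_policy(predicted_occupancy: float) -> dict:
    """Map predicted occupancy to a policy tier."""
    if predicted_occupancy > 0.90:       # high demand: protect revenue
        return {"deposit_per_person": 25, "booking_window_days": 14,
                "cancellation_fee": "high", "turnover_minutes": 90}
    if predicted_occupancy < 0.40:       # low demand: reduce friction
        return {"deposit_per_person": 0, "booking_window_days": 60,
                "cancellation_fee": "none", "off_peak_discount": True}
    return {"deposit_per_person": 0, "booking_window_days": 30,
            "cancellation_fee": "standard"}
```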

Demand Forecasting:

ML model predicts demand for each restaurant by date and time:

Features:

  • Historical reservation volume for this restaurant, day, time, month.
  • Day of week and seasonality patterns.
  • Local events, holidays, weather forecasts.
  • Restaurant’s recent trend (growing vs. declining popularity).

Model outputs:

  • Predicted occupancy rate (percentage of tables that will be reserved).
  • Confidence interval for the prediction.

Used for:

  • Setting dynamic policies (deposits, cancellation fees).
  • Determining overbooking limits.
  • Recommending optimal operating hours to restaurants.
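Before reaching for the ML model, it helps to have a baseline to beat. A historical-average forecast per (day-of-week, hour) bucket is the usual starting point; this stand-in ignores events, weather, and trend features, which is exactly the gap the gradient boosted model closes:

```python
from collections import defaultdict
from statistics import mean

def baseline_forecast(history: list[tuple]) -> dict:
    """Mean occupancy per (day_of_week, hour) bucket from historical records.
    Each record is (day_of_week, hour, occupancy_rate)."""
    buckets = defaultdict(list)
    for day_of_week, hour, occupancy in history:
        buckets[(day_of_week, hour)].append(occupancy)
    return {bucket: mean(rates) for bucket, rates in buckets.items()}
```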

Deep Dive 6: How do we prevent and detect fraudulent reviews?

Review authenticity is critical for platform trust. Fake reviews (either overly positive or negative) damage credibility.

Problem: Review Fraud

Types of fraud:

  • Restaurants posting fake positive reviews for themselves.
  • Competitors posting fake negative reviews.
  • Users reviewing restaurants they never visited.
  • Copy-pasted reviews across multiple restaurants.
  • Paid review services generating bulk fake reviews.

Solution: Multi-Layer Verification and Detection

Verified Diner Badge:

Strongest signal of review authenticity:

  • Only users who completed a reservation can review that restaurant.
  • Review must be submitted within 30 days of dining.
  • One review per reservation (can’t review same visit multiple times).
  • Reviews are linked to reservation ID for verification.
  • Display “Verified Diner” badge on these reviews prominently.

Duplicate Detection:

Identify copy-pasted or template-based reviews:

  • Compute text similarity scores between reviews using cosine similarity on TF-IDF vectors.
  • Flag reviews with high similarity (over 90%) to other reviews by same user or for same restaurant.
  • Detect template patterns: “The [food type] was [adjective] and the [service] was [adjective].”
  • Flag users posting multiple highly similar reviews.
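The TF-IDF cosine-similarity check can be sketched in a few lines of pure Python. This is a toy implementation with naive whitespace tokenization; a production pipeline would use a proper text-processing library and compare against an indexed corpus rather than all pairs:

```python
import math
from collections import Counter

def tfidf_vectors(docs: list[str]) -> list[dict]:
    """Build TF-IDF vectors over a small review corpus."""
    tokenized = [doc.lower().split() for doc in docs]
    df = Counter(term for doc in tokenized for term in set(doc))  # document frequency
    n = len(docs)
    vectors = []
    for doc in tokenized:
        tf = Counter(doc)
        vectors.append({t: (count / len(doc)) * math.log(n / df[t])
                        for t, count in tf.items()})
    return vectors

def cosine(a: dict, b: dict) -> float:
    dot = sum(a[t] * b.get(t, 0.0) for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def flag_duplicates(reviews: list[str], threshold: float = 0.9) -> list[tuple]:
    """Return index pairs of reviews exceeding the similarity threshold."""
    vecs = tfidf_vectors(reviews)
    return [(i, j) for i in range(len(vecs)) for j in range(i + 1, len(vecs))
            if cosine(vecs[i], vecs[j]) > threshold]
```

Two identical copy-pasted reviews in a corpus score a cosine similarity of 1.0 and get flagged, while an unrelated review scores near zero.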

Sentiment Analysis:

Analyze review text for suspicious patterns:

  • Extremely positive reviews (all 5-star ratings, excessive superlatives) are flagged.
  • Reviews focusing only on specific dishes or features may indicate paid promotion.
  • Reviews with generic praise (“great food, great service”) lack authenticity signals.
  • Mismatch between star rating and text sentiment (5 stars but negative text) indicates issues.

User Behavioral Analysis:

Profile user reviewing patterns:

  • New accounts with immediate reviews are suspicious (review farms create accounts just for fake reviews).
  • Accounts with only 5-star or only 1-star reviews lack natural variation.
  • Burst patterns: many reviews in a short time period indicate coordinated campaigns.
  • Geographic patterns: reviewing restaurants across the country without corresponding reservation history.

Network Analysis:

Detect coordinated review rings:

  • Graph analysis connecting users who review same sets of restaurants.
  • Identify clusters of users with highly overlapping review targets.
  • IP address analysis: multiple accounts from same IP posting reviews.
  • Device fingerprinting: multiple accounts from same device.

Machine Learning Model:

Train classifier to predict fake reviews:

Features:

  • Review text features (length, sentiment, vocabulary richness).
  • User features (account age, review count, rating variance, verified diner ratio).
  • Temporal features (time since reservation, review posting time patterns).
  • Network features (connections to suspicious accounts).

Model outputs:

  • Fraud probability score 0-1.
  • Reviews with score over 0.7 flagged for manual moderation.
  • Reviews with score over 0.9 automatically hidden pending review.
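The two thresholds above translate into a straightforward routing step downstream of the classifier; the state names are illustrative assumptions:

```python
def route_review(fraud_score: float) -> str:
    """Route a review based on the classifier's fraud probability (0-1).
    Over 0.9: auto-hide pending review; over 0.7: manual moderation queue."""
    if fraud_score > 0.9:
        return "hidden_pending_review"
    if fraud_score > 0.7:
        return "manual_moderation_queue"
    return "published"
```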

Manual Moderation:

Human moderators review flagged content:

  • Queue of suspicious reviews prioritized by fraud score.
  • Moderators can approve, reject, or request additional information.
  • Repeated violations lead to account suspension.
  • Restaurants can appeal rejected reviews through formal process.

Review Impact Control:

Even with verification, control outlier impact:

  • Time-weighted ratings: recent reviews count more than old reviews.
  • Volume-weighted: restaurants with few reviews display a confidence interval rather than a single average.
  • Outlier detection: single 1-star review among all 5-stars has limited impact.
  • Variance in ratings viewed positively (shows authentic mixed experiences).
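Time-weighting is typically implemented as an exponential decay on review age. A minimal sketch, where the 180-day half-life is an assumed tuning knob rather than a number from the text:

```python
def weighted_rating(reviews: list[tuple], half_life_days: float = 180.0) -> float:
    """Time-weighted average star rating with exponential decay.
    `reviews` is a list of (stars, age_in_days) pairs."""
    num = den = 0.0
    for stars, age_days in reviews:
        weight = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
        num += weight * stars
        den += weight
    return num / den if den else 0.0
```

A fresh 5-star review paired with a two-year-old 1-star review yields a rating close to 5, since the old review's weight has decayed to about 6%.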

Step 4: Wrap Up

In this design, we proposed a comprehensive system architecture for a restaurant reservation platform like OpenTable. If there is extra time at the end of the interview, here are additional points to discuss:

Key Design Decisions

1. Strong Consistency for Reservations:

We chose PostgreSQL with row-level locking and SERIALIZABLE isolation for reservation creation to prevent double-booking. The performance cost (2-3x slower writes) is acceptable given the critical importance of data consistency. A single double-booking incident can cause significant reputational damage and customer loss.

2. Multi-Layer Caching Strategy:

Redis caches daily availability windows with 1-hour TTL, while Elasticsearch maintains searchable indexes with eventual consistency. This balances read performance (under 500ms) with data freshness. Cache invalidation is handled through event-driven architecture to minimize staleness.

3. Sharding Strategy:

Reservations table is sharded by restaurant ID, enabling horizontal scaling while keeping restaurant-specific queries efficient. Cross-shard queries (user reservation history) are denormalized into a separate user-centric table. This allows independent scaling per geographic region.

4. Event-Driven Architecture:

Kafka streams reservation events to downstream services (notifications, analytics, CRM) without blocking critical path. This enables sub-2-second reservation confirmation while doing heavy processing asynchronously. Event sourcing provides complete audit trail.

5. Machine Learning for Operational Optimization:

No-show prediction, wait time estimation, and demand forecasting use gradient boosted tree models retrained weekly. Feature engineering includes temporal patterns, behavioral signals, and contextual information. Models achieve 85%+ accuracy for wait time prediction.

Performance Optimizations

Database Level:

  • Connection pooling with PgBouncer maintains 1000 connections per shard.
  • Read replicas handle 90% of queries (availability checks, searches).
  • Covering indexes on frequently queried columns reduce disk I/O.
  • Partitioning reservations table by date for efficient historical data archival.

Caching Level:

  • Redis cluster with read replicas for geographic distribution.
  • Cache-aside pattern with read-through for automatic cache population.
  • Time-bucketed caching reduces invalidation blast radius.
  • Negative caching prevents repeated queries for unavailable slots.

Search Level:

  • Elasticsearch with 5 primary shards and 2 replicas per index.
  • Separate indexes for active and historical data.
  • Document denormalization to avoid joins.
  • Query result caching for popular searches.

Application Level:

  • Asynchronous processing for non-critical operations (email, analytics).
  • Request batching for notification delivery.
  • Circuit breakers prevent cascading failures.
  • Graceful degradation during partial outages.

Monitoring and Observability

Service Level Objectives:

  • 99.99% availability for reservation creation flow.
  • p99 latency under 2 seconds for reservation confirmation.
  • p95 latency under 500ms for search and availability checks.
  • Zero double-bookings tolerance.

Key Metrics:

  • Reservation success rate and failure reasons.
  • Average time from search to booking completion.
  • Cache hit ratio and invalidation frequency.
  • Database query latency and lock contention.
  • Message queue lag and consumer throughput.

Alerting Strategy:

  • PagerDuty integration for critical errors (double-bookings, payment failures).
  • Automated rollback triggers for deployment issues.
  • Anomaly detection for traffic patterns and error rates.
  • SLA monitoring with error budget tracking.

Observability Tools:

  • Distributed tracing with Jaeger for request flow visualization.
  • Grafana dashboards for real-time system health.
  • Kibana for log analysis and debugging.
  • Prometheus for metrics collection and alerting.

Scalability Path

Current Capacity:

  • 10,000 QPS across all operations.
  • 100,000 restaurants on platform.
  • 500M reservations per year.
  • 100K concurrent users during peak.

2x Growth (Next 12-18 Months):

  • Add 2 more database shards (from 4 to 6 shards).
  • Scale Kubernetes pods from 50 to 100 instances.
  • Increase Redis cluster from 6 to 12 nodes.
  • Add Elasticsearch nodes from 15 to 25.
  • Estimated cost: additional $50K/month in infrastructure.

10x Growth (Next 3-5 Years):

  • Multi-region active-active deployment across 3 geographic regions.
  • Geo-distributed database with CockroachDB for automatic replication.
  • Separate read and write databases with change data capture.
  • Edge caching with regional Redis clusters.
  • Global load balancing with anycast routing.
  • Estimated cost: approximately $500K/month in infrastructure.

Future Enhancements

AI-Powered Features:

  • Natural language reservation booking via ChatGPT integration.
  • Personalized restaurant recommendations based on dining history.
  • Predictive service suggesting reservations before user searches.
  • Image recognition for automated photo moderation.

Enhanced User Experience:

  • AR table preview showing restaurant interior and exact table location.
  • Virtual restaurant tours before booking.
  • Real-time menu updates based on ingredient availability.
  • Dietary restriction filtering with menu analysis.

Social Features:

  • Group booking coordination for parties from multiple users.
  • Split payment with automatic bill division.
  • Social dining discovery based on friend recommendations.
  • Shared dining wishlists and collaborative planning.

Restaurant Operations:

  • Integration with POS systems for automatic table status updates.
  • AI-powered staff scheduling based on predicted demand.
  • Inventory management integrated with reservation volume.
  • Automated dynamic pricing recommendations for yield optimization.

Loyalty and Rewards:

  • Blockchain-based loyalty points with transferability.
  • NFT-based exclusive access tiers for popular restaurants.
  • Partner rewards integration (hotels, entertainment, travel).
  • Gamification with badges and achievements.

Additional Considerations

Security and Privacy:

  • Encryption in transit (TLS 1.3) and at rest (AES-256).
  • PCI DSS compliance for payment processing.
  • GDPR compliance with data portability and right to be forgotten.
  • Regular security audits and penetration testing.

Disaster Recovery:

  • Daily database snapshots with point-in-time recovery.
  • Cross-region replication for critical data.
  • Automated failover with health checks.
  • Regular disaster recovery drills.

Business Continuity:

  • Manual reservation fallback for system outages.
  • Phone-based booking during downtime.
  • Clear communication channels for status updates.
  • Service level agreements with penalties for downtime.

This design provides a production-ready foundation for a restaurant reservation platform handling millions of users while maintaining strict consistency guarantees, excellent performance, and room for future growth and innovation. The architecture balances technical sophistication with operational simplicity, ensuring the system is both powerful and maintainable.