Design Tinder
Tinder is a location-based social search mobile application that facilitates communication between mutually interested users. The platform serves 75+ million users globally with billions of swipes per day, requiring highly scalable infrastructure to handle real-time matching, geospatial queries, and instant messaging.
Designing Tinder presents unique challenges including efficient geospatial indexing, atomic match detection, personalized recommendation algorithms, real-time messaging at scale, and sophisticated safety mechanisms to prevent abuse and ensure user trust.
Step 1: Understand the Problem and Establish Design Scope
Before diving into the design, we need to define what we’re building. A good approach is to separate functional requirements from non-functional requirements, establish scale, and clarify any ambiguities through questions.
Functional Requirements
Core Requirements:
- Users should be able to create profiles with photos, bio, and preferences (age range, distance, gender).
- Users should be able to discover potential matches within a configurable radius based on their location.
- Users should be able to swipe right (like), left (pass), or super like on profiles.
- When two users mutually like each other, a match should be created and both users notified in real-time.
- Matched users should be able to chat with each other via text and media messages.
- Users should receive personalized recommendations based on their preferences, location, and behavior.
Below the Line (Out of Scope):
- Users should be able to see who liked them without matching (premium feature).
- Users should be able to undo their last swipe (rewind feature).
- Users should be able to change their location to anywhere in the world (passport feature).
- Users should be able to boost their profile for increased visibility.
- Users should be able to schedule dates and meet in person.
- Users should be able to verify their photos via selfie matching.
Non-Functional Requirements
Core Requirements:
- The system should handle 75+ million daily active users with low latency (P99 < 200ms for swipes).
- The system should ensure strong consistency for match creation to prevent duplicate matches or race conditions.
- The system should support real-time notifications (< 1 second) when matches occur.
- The system should be highly available (99.95% uptime) with no single point of failure.
- The system should protect user privacy and location data, complying with GDPR and CCPA.
Below the Line (Out of Scope):
- The system should gracefully handle partial failures without losing user data.
- The system should support A/B testing for recommendation algorithms.
- The system should provide comprehensive monitoring, logging, and alerting.
- The system should facilitate easy updates and CI/CD deployment.
Clarification Questions & Assumptions:
- Platform: Mobile apps for iOS and Android, plus web application.
- Scale: 75 million daily active users, 1.6 billion daily swipes, 26 million daily matches, 1.2 billion daily messages.
- Location Updates: Location updates occur when users open the app or manually refresh, not continuous tracking.
- Geographic Coverage: Global deployment with focus on major metropolitan areas.
- Storage: Approximately 1.2 petabytes total (profiles, photos, messages, metadata).
- Recommendation Refresh: Card stacks refreshed every 4 hours or when depleted.
Step 2: Propose High-Level Design and Get Buy-in
Planning the Approach
For a product-style system design like Tinder, we should build the design sequentially, addressing each functional requirement one by one. This ensures we cover all core features before diving into optimizations and deep dives.
Defining the Core Entities
To satisfy our key functional requirements, we need these entities:
User: Represents both profile data and authentication. Includes personal information (name, age, gender, bio), photos (up to 9 images), preferences (age range, distance, gender filter), location coordinates, activity status, and attractiveness score (ELO rating). Users have verification status and premium subscription status.
Swipe: Records every swipe action taken by users. Contains the swiper’s ID, the swipee’s ID, action type (like, pass, super like), and timestamp. This entity is crucial for match detection and prevents duplicate swipes on the same profile.
Match: Represents a mutual like between two users. Created when both users have liked each other. Contains both user IDs, match timestamp, active status, and optionally expiration time. The match serves as the gateway to enabling chat.
Message: Individual chat messages between matched users. Includes the match ID, sender ID, message content, message type (text, image, GIF), media URL if applicable, and timestamps for sent, delivered, and read status. Messages are partitioned by match for efficient retrieval.
Location: Real-time geospatial data for users. Stores latitude, longitude, geohash encoding for efficient proximity searches, and last update timestamp. Location data has privacy controls and is anonymized for safety.
Recommendation: Personalized card stack for each user. Contains pre-computed list of candidate profiles ranked by compatibility score, which considers attractiveness, distance, activity recency, and preference matching. Recommendations are cached and refreshed periodically.
API Design
Create Profile Endpoint: Used by new users to set up their dating profile after authentication.
POST /profile -> User
Body: {
name: string,
age: number,
gender: string,
bio: string,
photos: [file uploads],
preferences: { minAge, maxAge, maxDistance, genderFilter }
}
Update Location Endpoint: Called when user opens the app or manually refreshes to update their current location.
POST /location -> Success/Error
Body: {
lat: number,
long: number
}
Get Recommendations Endpoint: Retrieves the personalized card stack of potential matches for the user.
GET /recommendations?limit=50 -> [User]
Swipe Endpoint: Records a swipe action and checks for potential match.
POST /swipes -> { status: "swiped" | "match", matchId? }
Body: {
swipeeId: string,
action: "like" | "pass" | "super_like"
}
Send Message Endpoint: Sends a message to a matched user.
POST /matches/:matchId/messages -> Message
Body: {
text?: string,
mediaUrl?: string,
type: "text" | "image" | "gif"
}
Get Message History Endpoint: Retrieves chat history for a specific match.
GET /matches/:matchId/messages?limit=50&before={messageId} -> [Message]
Pagination is cursor-based: the client passes the oldest message ID it already has, and the server returns the preceding page of the time-ordered history.
Note: User authentication is handled via JWT tokens in headers, not in request bodies. All endpoints require authentication, and rate limiting is enforced at the API gateway level.
High-Level Architecture
Let’s build the system incrementally, addressing each functional requirement:
1. Users should be able to create profiles with photos, bio, and preferences
The core components for profile management:
- Mobile Client: Native iOS and Android apps that provide the user interface. Handles photo uploads, form validation, and local caching.
- API Gateway: Entry point for all client requests, handling authentication via JWT, rate limiting to prevent abuse, SSL termination, and routing to appropriate microservices.
- Profile Service: Manages all profile-related operations including CRUD operations, photo upload orchestration, preference updates, and profile search. Validates data constraints like age limits and photo count.
- Object Storage (S3): Stores user photos with lifecycle policies. Photos are organized by user ID and indexed. Pre-signed URLs are generated for secure upload and download.
- CDN (CloudFront): Distributes profile images globally for low-latency access. Implements caching strategies with appropriate TTLs to reduce origin load.
- Database (PostgreSQL): Stores structured profile data including user information, preferences, and photo metadata. Uses indexes on frequently queried fields like age and location.
Profile Creation Flow:
- User fills out profile information in the mobile app and selects photos from their device.
- Client sends POST request to /profile with form data. Photos are uploaded separately via pre-signed S3 URLs.
- API Gateway authenticates the request and forwards to Profile Service.
- Profile Service validates input, generates photo URLs, and creates user record in PostgreSQL database.
- Photos are automatically processed (resized, compressed) and distributed to CDN edge locations.
- Service returns the complete User object to the client.
2. Users should be able to discover potential matches within a configurable radius
We introduce location management components:
- Location Service: Dedicated service for handling geospatial operations. Receives location updates from users, stores coordinates with geohash encoding, and performs efficient radius searches to find nearby users. Implements privacy controls to anonymize exact locations.
- Geospatial Data Store (Redis GEO): In-memory database optimized for geospatial queries. Uses sorted sets with geohash-based scoring for O(log N) proximity searches. Data is partitioned by geographic regions for better scalability.
Location Update Flow:
- When user opens the app, the mobile client gets GPS coordinates from device sensors.
- Client sends POST request to /location with latitude and longitude.
- API Gateway forwards to Location Service.
- Location Service encodes coordinates as geohash, stores in Redis GEO using GEOADD command, and associates with user ID.
- Old location data is overwritten or expires based on TTL policies.
Discovery Query: When generating recommendations, the system queries Location Service to find users within the configured radius, returning a list of candidate user IDs for further filtering.
3. Users should be able to swipe on profiles and receive personalized recommendations
We add recommendation and swipe handling:
- Recommendation Service: Generates personalized card stacks using multi-factor scoring algorithm. Considers user preferences, geographic proximity, attractiveness scores (ELO rating), activity recency, and profile completeness. Applies machine learning models to predict compatibility. Pre-computes and caches stacks for active users.
- Swipe Service: Processes swipe actions with idempotency guarantees. Records swipes in database, updates ELO scores asynchronously, enforces daily limits for free users, and triggers match detection logic when a like occurs.
- Cache Layer (Redis): Stores pre-computed recommendation stacks, swipe history for deduplication, and temporary data for match detection. Uses appropriate TTLs to balance freshness and performance.
Recommendation Generation Flow:
- User requests recommendations by swiping through their stack or explicitly refreshing.
- Recommendation Service queries Location Service for nearby users within max distance.
- Service filters candidates by age, gender preferences, and excludes previously swiped profiles.
- Service scores each candidate using weighted formula: attractiveness (30%), compatibility (30%), recency (20%), distance (15%), completeness (5%).
- Top 50 candidates are sorted by score and returned as the card stack, which is cached for future requests.
Swipe Flow:
- User swipes on a profile in the mobile app, which sends POST request to /swipes with swipee ID and action.
- API Gateway routes to Swipe Service.
- Service checks for duplicate swipe using cached swipe history.
- If new, service records swipe in database and removes swipee from user’s cached recommendation stack.
- Daily swipe counter is incremented with Redis INCR, checking against user’s limit.
4. When two users mutually like each other, a match should be created
We extend swipe handling with atomic match detection:
- Match Detection Logic: Embedded in Swipe Service, uses Redis Lua scripts for atomic operations. When processing a like, checks if the swipee has already liked the swiper. If yes, creates match atomically to prevent race conditions. Uses deterministic user ID ordering to prevent duplicate match records.
Match Creation Flow:
- User A swipes right on User B, triggering the swipe endpoint.
- Swipe Service records the like and executes atomic match check.
- Using Redis Lua script, service adds User A to User B’s like set and checks if User B is in User A’s like set.
- If mutual like detected, service creates match record in database with unique constraint on (user1_id, user2_id).
- Match event is published to message queue for notification processing.
5. Matched users should receive real-time notifications
We add notification infrastructure:
- Notification Service: Consumes match events from message queue (Kafka). Sends push notifications via APN (iOS) and FCM (Android). Manages WebSocket connections for real-time in-app notifications. Handles notification preferences and delivery retry logic.
- Message Queue (Kafka): Decouples match creation from notification delivery. Provides durability and replay capabilities. Enables asynchronous processing and horizontal scaling of notification workers.
- WebSocket Server: Maintains persistent connections with active users for instant in-app updates. Sends match notifications, typing indicators, and message delivery receipts. Implements heartbeat mechanism to detect disconnections.
Notification Flow:
- After match is created, Swipe Service publishes match event to Kafka topic.
- Notification Service consumes event and retrieves user details for personalization.
- Service checks if users are online via WebSocket connection registry in Redis.
- For online users, sends instant notification via WebSocket with matched user’s profile.
- For offline users or as backup, sends push notification via FCM/APN.
- Mobile client receives notification and displays match screen with animation.
6. Matched users should be able to chat
We introduce messaging infrastructure:
- Chat Service: Handles real-time messaging between matched users. Validates that both users are part of the match before allowing messages. Manages message persistence, delivery receipts, and read status. Implements rate limiting to prevent spam.
- Message Database (Cassandra): NoSQL database optimized for time-series data. Partitions messages by match ID for locality. Uses time-based UUID for message ordering. Supports efficient pagination for chat history retrieval.
- WebSocket Manager: Routes messages between users in real-time. Maintains connection mappings from user ID to WebSocket connection. Handles connection failures and reconnection logic.
Messaging Flow:
- User types message in chat interface and hits send.
- Client sends POST request to /matches/:matchId/messages over HTTPS.
- API Gateway routes to Chat Service.
- Chat Service validates match membership and creates message object with time-based UUID.
- Message is published to Kafka for asynchronous persistence to Cassandra.
- Recent messages (last 100) are cached in Redis for fast retrieval.
- If recipient is online, message is immediately sent via WebSocket connection.
- If offline, message waits in queue and push notification is sent.
- When recipient opens app, messages are retrieved from cache or Cassandra.
Step 3: Design Deep Dive
With core functionality in place, we focus on non-functional requirements and system optimizations. These deep dives differentiate a basic design from production-ready architecture.
Deep Dive 1: How do we efficiently handle geospatial queries for millions of users?
Challenge: Finding users within X miles of a given location requires comparing coordinates, which is computationally expensive at scale. A naive approach would scan all users and calculate distances, resulting in O(N) complexity and prohibitively slow query times.
Solution: Geohash-Based Indexing with Redis GEO
Geohashes encode 2D coordinates (latitude, longitude) into a single string using base-32 characters. The key property is that geographically close locations share longer common prefixes. For example:
- San Francisco: “9q8yy”
- San Jose: “9q9h”
- Los Angeles: “9q5c”
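The prefix property can be reproduced with a small encoder. Below is a sketch of the standard base-32 geohash algorithm (precision 5 corresponds to cells of roughly 5 km):

```python
# Minimal geohash encoder (standard base-32 alphabet). Bits alternate between
# longitude and latitude, halving the search interval each step, so nearby
# points share leading characters.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True  # encoding starts with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))  # San Francisco -> "9q8yy"
```

Note that San Francisco and San Jose share the "9q" prefix, which is exactly what makes prefix-based proximity grouping work.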
Redis provides built-in GEO commands that leverage sorted sets with geohash scoring:
- GEOADD adds a location with longitude, latitude, and member key
- GEORADIUS finds all members within a specified radius of given coordinates (deprecated since Redis 6.2 in favor of GEOSEARCH)
- GEOSEARCH provides more flexible querying with sorting options
Implementation Strategy:
To handle 75 million users efficiently, we partition the world into regional grids of approximately 100km × 100km. Each region becomes a separate Redis key, reducing the search space significantly.
When a user updates their location:
- Calculate which grid region they belong to based on coordinates
- Remove them from their old region’s sorted set (if changed)
- Add them to the new region’s sorted set using GEOADD
- Store metadata (exact coordinates, timestamp) in a separate hash key
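The grid assignment in the first step can be sketched as follows. The `geo:{row}:{col}` key format and the 1-degree cell size are illustrative choices here, not a confirmed implementation detail (1 degree of latitude is roughly 111 km, approximating the 100 km grid):

```python
import math

def region_key(lat: float, lon: float, cell_deg: float = 1.0) -> str:
    """Map coordinates to a grid-cell Redis key (hypothetical "geo:{row}:{col}" format)."""
    return f"geo:{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"

# A location update would then be roughly:
#   old_key, new_key = region_key(old_lat, old_lon), region_key(lat, lon)
#   if old_key != new_key: ZREM old_key user_id   (remove from old region)
#   GEOADD new_key lon lat user_id                (add to new region)
print(region_key(37.7749, -122.4194))  # -> "geo:37:-123"
```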
When finding nearby users:
- Determine user’s current region from their coordinates
- Query primary region using GEORADIUS with desired radius
- If near region boundaries or insufficient candidates, query adjacent regions
- Redis returns members within radius sorted by distance
- Results are filtered by availability, preferences, and previous swipes
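Redis computes member distances internally, but the same great-circle math is useful when post-filtering candidates pulled from adjacent regions. A standard haversine implementation:

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# San Francisco to San Jose: roughly 67 km
print(haversine_km(37.7749, -122.4194, 37.3382, -121.8863))
```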
Optimization Techniques:
Caching Frequent Searches: Since many users in the same area perform similar queries, we cache results in Redis with 5-minute TTL. Coordinates are quantized before building the cache key (for example, truncated to a geohash prefix, giving keys like "nearby:{geohash}:{radius}") so that queries from nearby users hit the same entry; caching on raw coordinates would almost never repeat. This reduces repeated computations and achieves 85% cache hit rate.
Lazy Location Updates: To reduce write load, location is only updated if the user has moved more than 1km or 30 minutes have elapsed since last update. For stationary users, this dramatically reduces unnecessary writes.
Read Replicas: Location queries far outnumber writes. Redis read replicas handle GEORADIUS queries while master handles GEOADD writes. This separates concerns and allows independent scaling.
Regional Sharding: Users in Asia query Asian regions, US users query US regions. Geographic sharding reduces latency and allows regional scaling based on user density.
Performance Characteristics:
- Proximity search: O(log N + M) where N is users in region, M is results within radius
- P99 latency: < 50ms for typical queries
- Write throughput: ~100k location updates per second per Redis instance
- Memory: ~50 bytes per user location entry
Deep Dive 2: How do we generate personalized recommendations that maximize engagement?
Challenge: With millions of potential matches, how do we rank and recommend profiles that users will actually like? A random approach leads to poor match rates. We need a sophisticated algorithm that considers multiple factors while maintaining performance.
Solution: Multi-Factor Scoring with ELO Rating System
Our recommendation engine combines several signals into a unified score:
Attractiveness Score (ELO Rating):
Borrowed from chess ratings, ELO provides a relative measure of profile attractiveness based on swipe outcomes. Every profile starts at 1000 points. When User A swipes on User B:
- If A likes B: B’s score increases, A’s decreases slightly (B “won” the comparison)
- If A passes on B: A’s score increases, B’s decreases (A “won”)
The magnitude of change depends on the rating difference. If a highly-rated profile is liked by a lower-rated user, the change is small. If the opposite occurs, the change is larger. This creates a dynamic rating that converges to reflect true attractiveness.
ELO updates are processed asynchronously via Kafka to avoid blocking swipe operations. Expected score is calculated using formula: 1 / (1 + 10^((opponent_elo - player_elo) / 400)). Rating change is K-factor (32) multiplied by (actual_result - expected_score).
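The update rule above translates directly to code. This sketch uses the stated K-factor of 32, with an actual_result of 1.0 meaning the profile was liked and 0.0 meaning it was passed:

```python
def expected_score(player_elo: float, opponent_elo: float) -> float:
    """Probability-like expectation that the player 'wins' the comparison."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - player_elo) / 400))

def update_elo(player_elo: float, opponent_elo: float,
               actual_result: float, k: float = 32) -> float:
    """Return the player's new rating after one swipe outcome."""
    return player_elo + k * (actual_result - expected_score(player_elo, opponent_elo))

# Two equally rated profiles: a like moves the winner up by K/2 = 16 points.
print(update_elo(1000, 1000, 1.0))  # -> 1016.0
```

Because the expected score is near 1.0 when a high-rated profile beats a low-rated one, such outcomes barely move the ratings, which is exactly the convergence behavior described above.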
Compatibility Scoring:
We calculate compatibility based on shared interests, education level, occupation, and behavioral patterns. For interests, we use Jaccard similarity: size of intersection divided by size of union. Profile photos are processed through CNN to extract style embeddings, and cosine similarity measures visual compatibility.
Recency Weighting:
Active users are prioritized using exponential decay. Time since last active (in hours) is converted to recency score using: exp(-hours_inactive / 24). This ensures newly active users appear in recommendations quickly.
Distance Factor:
Proximity matters for dating. Distance score is calculated as: 1 - (distance_km / max_distance_preference). Users at 5 miles get higher scores than those at 45 miles, assuming a 50-mile preference.
Profile Completeness:
Profiles with more photos and complete bios are ranked higher: completeness_score = (num_photos / 6) * 0.5 + (has_bio ? 0.5 : 0).
Combined Score:
Final recommendation score = (0.30 × attractiveness) + (0.30 × compatibility) + (0.20 × recency) + (0.15 × distance) + (0.05 × completeness). These weights are tuned via A/B testing to maximize match rates.
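Putting the factors together, here is a sketch of the scoring function using the formulas above. It assumes attractiveness and compatibility arrive pre-normalized to [0, 1]; the photo cap of 6 comes from the completeness formula as stated:

```python
import math

WEIGHTS = {"attractiveness": 0.30, "compatibility": 0.30,
           "recency": 0.20, "distance": 0.15, "completeness": 0.05}

def jaccard(a: set, b: set) -> float:
    """Interest overlap: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recency_score(hours_inactive: float) -> float:
    return math.exp(-hours_inactive / 24)

def distance_score(distance_km: float, max_distance_km: float) -> float:
    return max(0.0, 1.0 - distance_km / max_distance_km)

def completeness_score(num_photos: int, has_bio: bool) -> float:
    return min(num_photos, 6) / 6 * 0.5 + (0.5 if has_bio else 0.0)

def recommendation_score(attractiveness: float, compatibility: float,
                         hours_inactive: float, distance_km: float,
                         max_distance_km: float, num_photos: int,
                         has_bio: bool) -> float:
    factors = {
        "attractiveness": attractiveness,
        "compatibility": compatibility,
        "recency": recency_score(hours_inactive),
        "distance": distance_score(distance_km, max_distance_km),
        "completeness": completeness_score(num_photos, has_bio),
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())
```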
Card Stack Generation:
- Query Location Service for users within max distance (e.g., 50 miles)
- Filter by hard preferences: age range, gender, mutual blocks
- Exclude previously swiped profiles using Bloom filter for efficiency
- Calculate score for each candidate using formula above
- Sort by score descending
- Apply diversity heuristics to avoid showing similar profiles consecutively
- Take top 50 candidates as card stack
- Cache stack in Redis with 4-hour TTL
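The "exclude previously swiped" step relies on a Bloom filter, which answers "definitely not swiped" or "possibly swiped" in constant space. A minimal sketch of the idea (a production system would more likely use a shared implementation such as RedisBloom; sizes and hash counts here are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k bit positions carved from one SHA-256 digest."""
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.num_hashes):
            # 4 bytes per hash from the 32-byte digest (5 * 4 <= 32)
            yield int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False => definitely never added; True => possibly added
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

False positives only cause a profile to be skipped unnecessarily, which is an acceptable trade for never re-showing a swiped profile.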
Performance Optimizations:
Pre-computation is key. For highly active users (logged in within last hour), we proactively generate recommendations during off-peak hours. This moves computation off critical path.
Scoring calculations are parallelized across candidates. If location service returns 500 candidates, we score them concurrently using worker pools.
Machine learning models run on GPU-equipped instances for faster inference. Photo similarity models are particularly compute-intensive.
Stacks are refreshed when depleted or after 4 hours. If user swipes through all 50 profiles, a new batch is generated immediately to maintain engagement.
Deep Dive 3: How do we detect matches atomically and prevent race conditions?
Challenge: When User A swipes right on User B while User B simultaneously swipes right on User A, we need to create exactly one match record. If both swipe operations check for matches independently, race conditions could cause duplicate matches or missed matches.
Solution: Atomic Match Detection with Redis Lua Scripts
Redis Lua scripts execute atomically, providing transactional guarantees without complex distributed locking.
Data Structure:
For each user, maintain a Redis set containing user IDs who have liked them. Key format: “likes:{user_id}”. When User A likes User B, we add A to B’s like set.
Atomic Check-and-Set:
When processing a like from User A to User B, we execute a Lua script that:
- Adds A to B’s like set (SADD likes:B A)
- Checks if B is in A’s like set (SISMEMBER likes:A B)
- Returns 1 if mutual like detected, 0 otherwise
This entire operation is atomic. No other operation can interleave between the add and check.
Match Creation:
If the Lua script returns 1 (mutual like), we create a match record in PostgreSQL. To prevent duplicate matches, we enforce uniqueness:
- Always order user IDs: smaller ID becomes user1_id, larger becomes user2_id
- Unique constraint on (user1_id, user2_id) column pair
- If constraint violation occurs, we know match already exists
Idempotency:
Swipes themselves are idempotent. If User A likes User B twice (perhaps due to client retry), we check for existing swipe record before processing. Unique constraint on (swiper_id, swipee_id) prevents duplicate swipe records.
Failure Handling:
If match creation fails after Redis confirms mutual like, we have options:
- Retry match creation (idempotent due to unique constraint)
- Publish to dead letter queue for manual inspection
- Return success to user anyway (match eventually created by async worker)
Distributed Considerations:
Even with multiple Swipe Service instances, the atomic Lua script guarantees correctness. Redis is single-threaded for command execution, so concurrent scripts execute serially with no race condition.
Performance:
The Lua script executes in microseconds. Combined with match record creation, end-to-end match detection latency is under 10ms at P99. This is fast enough for real-time user experience.
Deep Dive 4: How do we handle real-time messaging at scale with 1.2 billion daily messages?
Challenge: Supporting real-time chat for millions of concurrent users requires low-latency message delivery, efficient storage for message history, and reliable delivery guarantees. Traditional RDBMS struggle with write-heavy messaging workloads.
Solution: WebSocket + Cassandra + Message Queue Architecture
Real-Time Delivery via WebSocket:
HTTP request-response is inefficient for bidirectional real-time communication. WebSocket provides persistent TCP connections with low overhead. Each mobile client establishes a WebSocket connection to the Chat Service upon app launch.
WebSocket Manager maintains a hash map: user_id -> WebSocket connection object. When a message arrives for User B, the manager looks up B’s connection and sends the message immediately. Heartbeat pings every 30 seconds detect dead connections.
Connection state is stored in Redis for multi-instance coordination. When Chat Service instance receives message for User B, it checks Redis to find which server instance holds B’s connection, then forwards via inter-service messaging.
Message Persistence with Cassandra:
Cassandra excels at write-heavy time-series data. We model messages with match_id as partition key and message_id (time-based UUID) as clustering key. This co-locates all messages for a conversation on the same node, enabling efficient retrieval.
Schema design: partition key is match_id, clustering key is message_id (TIMEUUID) with descending order. This allows paginated queries returning most recent messages first.
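Python's uuid1 produces the same kind of time-based UUID as Cassandra's TIMEUUID, so ID generation and newest-first ordering can be sketched directly (the helper names here are illustrative):

```python
import uuid

def new_message_id() -> uuid.UUID:
    # uuid1 embeds a 60-bit, 100-ns-resolution timestamp, the same layout
    # Cassandra orders TIMEUUID clustering keys by.
    return uuid.uuid1()

def sort_newest_first(message_ids: list) -> list:
    """Order message IDs as a DESC TIMEUUID clustering key would."""
    return sorted(message_ids, key=lambda u: u.time, reverse=True)
```

With the clustering key stored in descending order, a page of the most recent messages is a simple range scan from the head of the partition.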
Asynchronous Write Path:
To minimize user-facing latency, message persistence is asynchronous:
- Chat Service receives message via HTTPS POST
- Service validates match membership
- Message is immediately sent via WebSocket to recipient (if online)
- Message is published to Kafka topic for persistence
- Response is returned to sender (delivered status)
- Kafka consumer workers write to Cassandra in background
- Once written, status is updated to “persisted”
This approach decouples user-facing latency from database write latency. Even if Cassandra is slow, users see instant message delivery.
Message Caching:
Recent messages (last 100 per match) are cached in Redis list structure. Key format: “chat:{match_id}:recent”. When users open a chat, we serve from cache with < 5ms latency. Older messages require Cassandra query.
Cache is updated optimistically when sending messages and periodically refreshed from source of truth (Cassandra). TTL of 24 hours keeps memory usage bounded.
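The Redis pattern behind this cache is LPUSH followed by LTRIM 0 99. A pure-Python model of the same capped, newest-first list (the class is a sketch, not the service's actual code):

```python
from collections import deque

class RecentMessageCache:
    """In-memory model of Redis LPUSH + LTRIM on "chat:{match_id}:recent" keys."""
    def __init__(self, max_len: int = 100):
        self.max_len = max_len
        self.caches: dict[str, deque] = {}  # match_id -> messages, newest first

    def push(self, match_id: str, message: str) -> None:
        # LPUSH chat:{match_id}:recent <message>; deque(maxlen=...) plays LTRIM's role
        self.caches.setdefault(match_id, deque(maxlen=self.max_len)).appendleft(message)

    def recent(self, match_id: str, limit: int = 50) -> list:
        # LRANGE chat:{match_id}:recent 0 limit-1
        return list(self.caches.get(match_id, deque()))[:limit]
```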
Delivery Guarantees:
Messages are persisted to Kafka before the sender is acknowledged. If the Chat Service only published to Kafka after responding, a crash between sending via WebSocket and the Kafka publish would silently lose the message from history. We therefore use Kafka producer acknowledgments and return success to the sender only after Kafka confirms receipt.
For offline users, messages queue in Cassandra until they reconnect. Upon reconnection, client requests message history, retrieving any missed messages.
Read Receipts:
When User B reads messages from User A, client sends acknowledgment to Chat Service. Service updates message read_at timestamp in Cassandra and sends “message_read” event via WebSocket to User A. This provides real-time read receipt indication.
Typing Indicators:
Typing indicators are ephemeral and not persisted. When User A types, client sends typing event via WebSocket. Chat Service forwards to User B (if online) without database interaction. This minimizes latency for transient state.
Scalability:
WebSocket Manager can scale horizontally. Each instance handles a subset of connections. Redis serves as coordination layer for routing messages between instances.
Cassandra scales horizontally by adding nodes. Partition key (match_id) distributes data evenly. Replication factor of 3 provides fault tolerance.
Kafka partitions enable parallel message processing. Each partition is consumed by one consumer instance, allowing horizontal scaling of write workers.
Performance Metrics:
- Message delivery latency (sender to recipient): P99 < 100ms
- Message persistence latency: P99 < 500ms
- Chat history query: P99 < 50ms (cache hit), P99 < 200ms (cache miss)
- Throughput: 50k messages per second per Cassandra cluster
Deep Dive 5: How do we ensure system resilience during high-traffic periods and partial failures?
Challenge: Dating apps experience traffic spikes during evenings and weekends. Special events like Valentine’s Day can cause 3-5x normal load. We need graceful degradation rather than cascading failures.
Solution: Rate Limiting, Circuit Breakers, and Asynchronous Processing
API Rate Limiting:
API Gateway enforces rate limits per user to prevent abuse and ensure fair resource allocation. Limits are tiered based on subscription:
- Free users: 100 requests per minute
- Premium users: 500 requests per minute
Rate limiting uses a token bucket algorithm implemented in Redis. Each request atomically refills the user's bucket based on elapsed time and consumes one token; if no tokens remain, the request is rejected with 429 status code.
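A single-process token bucket illustrates the algorithm. In the real system the state lives in Redis and the refill-and-consume step runs atomically (for example, via a Lua script); this sketch models one user's bucket locally:

```python
import time

class TokenBucket:
    """Token bucket: capacity tokens max, refilled continuously over time."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to consume one token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller responds with HTTP 429
```

Unlike a fixed window counter, the bucket smooths bursts: a client can spend saved-up tokens quickly but then settles to the steady refill rate.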
Swipe limits are separate: free users get 100 swipes per day, premium get unlimited. This is enforced at the Swipe Service level using daily counters.
Circuit Breakers:
When downstream services fail, circuit breakers prevent cascading failures. Each service dependency (e.g., Chat Service calling Notification Service) has circuit breaker wrapper:
- Closed state: Requests pass through normally. Failures increment a counter.
- Open state: After N failures, the circuit opens. Requests fail immediately without calling downstream, preventing resource exhaustion.
- Half-open state: After a timeout, the circuit allows one test request. If it succeeds, the circuit closes; if it fails, the circuit reopens.
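The three states map to a small wrapper class. This is a minimal sketch of the pattern (thresholds and timeouts are illustrative, and a production version would also need thread safety):

```python
import time

class CircuitBreaker:
    """Closed -> Open after N failures; Open -> Half-open after a timeout."""
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half_open"  # allow one probe request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            self.state = "closed"
            return result
```

Failing fast while open is the whole point: the caller gets an immediate error (and can fall back) instead of tying up threads on a dead dependency.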
Fallback Strategies:
When services degrade, we provide fallback responses:
- Recommendation Service failure: Return cached recommendations or random nearby profiles
- Location Service failure: Use last known location with staleness warning
- Notification Service failure: Queue notifications for retry, user sees match in app regardless
- Third-party APIs (maps, ML models) failure: Use cached results or simplified algorithms
Asynchronous Processing:
Non-critical operations are asynchronous to avoid blocking user-facing requests:
- ELO score updates published to Kafka, processed by workers
- Email notifications sent via background jobs
- Analytics events batched and written offline
- Photo processing (resize, compress, moderation) happens asynchronously
Load Shedding:
During extreme load, we prioritize critical paths:
- Swipes and match detection are highest priority (core user experience)
- Recommendation generation is medium priority (can use stale cache)
- Analytics and logging are lowest priority (can be sampled or dropped)
Load shedding is implemented via priority queues. When queue depth exceeds a threshold, low-priority tasks are dropped, and their producers retry later with exponential backoff.
Database Connection Pooling:
Database connections are expensive resources. Connection pools maintain reusable connections, preventing connection exhaustion during traffic spikes. Pool size is tuned based on database capacity and service instance count.
Auto-Scaling:
Services run on containerized infrastructure (Kubernetes or ECS) with auto-scaling policies:
- Scale up when CPU > 70% or request latency > 200ms
- Scale down when CPU < 30% for 10 minutes
- Minimum replicas: 3 per service (for redundancy)
- Maximum replicas: 100 per service (cost control)
Health Checks:
Load balancers perform health checks every 5 seconds. Unhealthy instances are removed from rotation automatically. Health check includes database connectivity and service dependencies.
Partial Degradation:
Features are designed to fail independently. If Chat Service is down, swipes and matches still work. If Recommendation Service is down, users can manually search profiles. This prevents total outage from single component failure.
Deep Dive 6: How do we implement safety features like photo verification and content moderation?
Challenge: Dating apps face risks of fake profiles, catfishing, inappropriate content, and harassment. Safety features are critical for user trust and regulatory compliance.
Solution: Multi-Layered Safety Architecture
Photo Verification:
Users can verify their profile by taking a real-time selfie. Verification flow:
- User initiates verification, service generates session with random pose requirement (smile, turn left, neutral)
- User takes selfie following instructions within time limit
- Client uploads selfie to verification service
- Service detects liveness using ML model to ensure it’s not a photo of a photo (checks for screen reflection, digital artifacts)
- Service extracts facial embedding from selfie using face recognition model
- Service compares selfie embedding to embeddings of profile photos using cosine similarity
- If similarity exceeds the 80% threshold, the user is marked as verified
- Verified badge is displayed on profile, increasing trust and engagement
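The embedding comparison at the heart of this flow can be sketched as follows; real embeddings are high-dimensional vectors from a face-recognition model, and the short vectors and threshold constant here are purely illustrative:

```python
import math

VERIFY_THRESHOLD = 0.80  # the "80% threshold" from the flow above

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_verified(selfie_embedding, profile_embeddings):
    """Verified if the selfie matches ANY profile photo above the threshold,
    since a profile usually contains several photos of the same person."""
    return any(cosine_similarity(selfie_embedding, p) >= VERIFY_THRESHOLD
               for p in profile_embeddings)
```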
Verification reduces catfishing and, by commonly cited industry metrics, can lift match rates by roughly 40%.
Automated Content Moderation:
All uploaded photos are screened before approval:
- Photo is analyzed by NSFW detection model (convolutional neural network) to classify explicit content
- Photo is sent to cloud vision API (AWS Rekognition, Google Cloud Vision) for moderation labels
- If NSFW score exceeds 85% or moderation labels include explicit categories, photo is rejected or quarantined
- Flagged photos are queued for human review by moderation team
- Repeat violations result in account suspension
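The decision step of this pipeline might look like the following sketch; the label names, scores, and borderline-review threshold are illustrative, not the exact values any particular vision API returns:

```python
# Illustrative label set; real APIs return provider-specific moderation labels.
EXPLICIT_LABELS = {"Explicit Nudity", "Graphic Violence"}
NSFW_REJECT_SCORE = 0.85   # the rejection threshold from the rules above
NSFW_REVIEW_SCORE = 0.50   # assumed borderline band routed to humans

def moderate_photo(nsfw_score, moderation_labels):
    """Combine the NSFW model score and cloud-API labels into a decision:
    'approved', 'rejected', or 'review' (queued for the moderation team)."""
    if nsfw_score >= NSFW_REJECT_SCORE or EXPLICIT_LABELS & set(moderation_labels):
        return "rejected"
    if nsfw_score >= NSFW_REVIEW_SCORE:
        return "review"
    return "approved"
```

Keeping the thresholds in one place makes them easy to tune as the model or label taxonomy changes.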
Text moderation for bios and messages:
- Text is analyzed using toxicity detection API (Perspective API, OpenAI Moderation)
- Scores are calculated for categories: toxicity, harassment, hate speech, threats, profanity
- If any score exceeds 80%, content is blocked or flagged for review
- For messages, users are warned before sending, with option to rephrase
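A sketch of the text-moderation gate described above; the category names and 0.8 threshold are illustrative, and real APIs return their own category taxonomies:

```python
BLOCK_THRESHOLD = 0.80  # the "exceeds 80%" rule from the list above
CATEGORIES = ("toxicity", "harassment", "hate_speech", "threat", "profanity")

def moderate_text(scores):
    """scores: dict of category -> probability from a toxicity API.
    Returns ('block', worst_category) when any category crosses the
    threshold, else ('allow', None). Returning the offending category
    lets the client show a targeted warning before sending."""
    worst = max(CATEGORIES, key=lambda c: scores.get(c, 0.0))
    if scores.get(worst, 0.0) >= BLOCK_THRESHOLD:
        return ("block", worst)
    return ("allow", None)
```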
Report and Block System:
Users can report profiles for inappropriate behavior:
- Reporter selects reason (harassment, fake profile, inappropriate photos, scam, other)
- Report is recorded with timestamp and details
- Reported user’s violation counter increments in Redis
- If report count exceeds threshold (5 reports from different users), account is auto-suspended pending review
- Reports are queued for moderation team with priority based on severity
- Moderation team reviews evidence (profile, messages, photos) and decides action (warning, suspension, ban)
Blocking is immediate and bidirectional:
- User A blocks User B
- Both users disappear from each other’s recommendations permanently
- Any existing match is dissolved
- Message history is preserved (for legal compliance) but inaccessible
- Block list is stored in Redis for fast filtering during recommendation generation
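Filtering recommendations against block lists reduces to set membership checks in both directions; a sketch using plain dicts and sets in place of Redis sets:

```python
def filter_blocked(candidate_ids, viewer_id, block_sets):
    """block_sets: dict of user_id -> set of user_ids they blocked
    (SMEMBERS per user in the Redis-backed production version).
    A candidate is excluded if EITHER side blocked the other,
    giving the bidirectional behavior described above."""
    viewer_blocks = block_sets.get(viewer_id, set())
    return [c for c in candidate_ids
            if c not in viewer_blocks
            and viewer_id not in block_sets.get(c, set())]
```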
Safety Center:
In-app safety center provides:
- Tips for safe dating (meet in public, tell friend, video chat first)
- Resources for reporting abuse
- Privacy controls (hide profile, control location precision, manage photo visibility)
- Panic button that immediately contacts emergency services with location
Machine Learning for Fraud Detection:
Anomaly detection models identify suspicious behavior:
- Accounts that mass-swipe right (bots or spammers)
- Profiles with stock photos (reverse image search)
- Accounts sending identical messages to many users (spam)
- Unusual activity patterns (logging in from multiple countries simultaneously)
Flagged accounts are automatically shadowbanned (invisible to others) pending investigation.
Compliance and Privacy:
GDPR and CCPA compliance requires:
- Data deletion upon request (right to be forgotten)
- Data export functionality (right to access)
- Consent management for data collection
- Location data anonymization (rounded coordinates shown publicly, exact coordinates used only for matching)
- Encryption at rest and in transit (TLS 1.3, AES-256)
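The coordinate-rounding approach to location anonymization is simple to sketch; the two-decimal precision is an illustrative choice, corresponding to roughly 1.1 km of latitude:

```python
def anonymize_coords(lat, lon, decimals=2):
    """Round coordinates for public display. Two decimal places hides the
    exact location while keeping distance estimates usable; the precise
    coordinates stay server-side for matching only."""
    return (round(lat, decimals), round(lon, decimals))
```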
Step 4: Wrap Up
In this design, we’ve built a comprehensive architecture for a dating application like Tinder. If there’s extra time, here are additional points to discuss:
Additional Features:
- Video profiles: Short video introductions for more authentic representation
- Virtual dates: In-app video calling for safer first interactions
- Events and experiences: Coordinated group meetups and activities
- Advanced matching: Machine learning models that learn user preferences over time
- Icebreakers: AI-generated conversation starters based on profile analysis
- Safety check-ins: Scheduled check-ins during in-person dates
- Background checks: Optional identity verification and criminal record screening
Scaling Considerations:
- Geographic Distribution: Multi-region deployment with data residency compliance. Users matched primarily with others in same region. Cross-region replication for global profiles.
- Database Sharding: User data sharded by user ID using consistent hashing. Message data naturally partitioned by match ID. Location data partitioned by geographic region.
- Caching Layers: Multi-level caching: application-level (user sessions), distributed cache (Redis), CDN (images). Cache warming during off-peak hours for active users.
- Content Delivery: Images served via CDN with edge caching. Aggressive compression (WebP format). Lazy loading and progressive image rendering.
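The consistent-hashing scheme mentioned for user sharding can be sketched with a virtual-node hash ring; shard names, the vnode count, and the use of MD5 are illustrative choices:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps user IDs to database shards. Because keys land on the next
    point clockwise on the ring, adding a shard moves only ~1/N of keys
    instead of rehashing everything."""

    def __init__(self, shards, vnodes=100):
        self._ring = []                        # sorted list of (hash, shard)
        for shard in shards:
            for i in range(vnodes):            # virtual nodes smooth the load
                h = self._hash(f"{shard}#{i}")
                self._ring.append((h, shard))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, user_id):
        h = self._hash(str(user_id))
        # First ring point at or after h, wrapping around at the end.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```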
Advanced Recommendation Algorithms:
- Collaborative filtering: “Users similar to you also liked these profiles”
- Reinforcement learning: Learn from swipe patterns and match outcomes to improve recommendations
- Contextual bandits: Balance exploration (showing diverse profiles) vs exploitation (showing likely matches)
- Lookalike modeling: Find profiles similar to previously matched users
- Time-aware recommendations: Show different profiles based on time of day or day of week
Monitoring and Analytics:
- Key Metrics: Swipes per user per day, match rate (matches / likes), message response rate, time to match, user retention, premium conversion rate
- Real-Time Dashboards: Geographic heatmaps of activity, swipe velocity, match velocity, error rates by service
- A/B Testing Framework: Test recommendation algorithms, UI changes, pricing models, notification strategies
- Anomaly Detection: Spike detection for traffic, latency, error rates. Automated alerting via PagerDuty
Cost Optimization:
- Infrastructure: Use reserved instances for predictable baseline load, spot instances for batch jobs (image processing, analytics)
- Storage: Tiered storage with S3 lifecycle policies. Move old messages and inactive profiles to cheaper storage classes
- Image Optimization: Aggressive compression reduces CDN bandwidth by 60%. Serve appropriately sized images for device resolution
- Database Optimization: Archive old data to cold storage; apply query optimization and proper indexing
Estimated Monthly Cost (75M users):
- Compute (ECS/Kubernetes): $250k
- Database (PostgreSQL, Cassandra): $180k
- Cache (Redis clusters): $50k
- Storage (S3): $75k for 225TB photos
- CDN (CloudFront): $120k for 1PB transfer
- Message Queue (Kafka): $30k
- Monitoring and logging: $20k
- Total: Approximately $725k/month or $0.01 per user per month
Security Considerations:
- End-to-end encryption for messages (optional, user-controlled)
- JWT tokens with short expiration and refresh tokens
- API authentication via OAuth 2.0
- SQL injection prevention via parameterized queries
- XSS prevention via input sanitization and CSP headers
- DDoS protection via CDN and rate limiting
- Regular security audits and penetration testing
Future Enhancements:
- Group dating: Match groups of friends for double dates
- Matchmaker mode: Friends can swipe on behalf of user
- AI dating coach: Personalized tips to improve profile and conversation skills
- Integration with other social platforms: Import interests from Instagram, Spotify
- Subscription tiers: Multiple premium levels with different features
- In-app purchases: Buy individual boosts, super likes, or rewinds without full subscription
Trade-Offs and Design Decisions:
Eventual Consistency for Recommendations:
- Pro: Faster response times, simpler architecture
- Con: Recommendations may be slightly stale
- Decision: Acceptable for non-critical feature, refresh periodically
Redis GEO vs PostGIS:
- Redis: 10x faster, limited storage capacity
- PostGIS: Slower queries, unlimited storage, richer query capabilities
- Decision: Use Redis as primary with PostGIS as backup for complex spatial queries
Cassandra vs MongoDB for Messages:
- Cassandra: Superior write throughput, time-series optimized, proven at scale
- MongoDB: Easier queries, better for ad-hoc analysis, simpler operations
- Decision: Cassandra for production workload, MongoDB for analytics replica
Microservices vs Monolith:
- Microservices: Independent scaling, team autonomy, technology diversity
- Monolith: Simpler deployment, easier local development, fewer operational concerns
- Decision: Microservices for scale, with clear service boundaries
Build vs Buy:
- Third-party mapping API: Buy (Google Maps, Mapbox) - not core competency
- Photo verification: Build - critical for trust and customization
- Payment processing: Buy (Stripe, Braintree) - regulatory complexity
- ML models: Build with open-source frameworks - competitive advantage
Congratulations on completing this comprehensive design! Tinder’s architecture showcases how to handle real-time systems, geospatial data, recommendation algorithms, and safety features at massive scale. The key principles are atomic operations for consistency, asynchronous processing for scalability, caching for performance, and layered safety for trust.
Summary
This comprehensive guide covered the design of a dating application like Tinder, including:
- Core Functionality: Profile management, geospatial discovery, swipe mechanics, match detection, real-time chat, and personalized recommendations.
- Key Challenges: Efficient proximity searches, atomic match detection, recommendation algorithm design, real-time messaging at scale, and comprehensive safety mechanisms.
- Solutions: Redis GEO for geospatial indexing, ELO rating system for attractiveness scoring, Lua scripts for atomic operations, Cassandra for message storage, WebSocket for real-time communication, and multi-layered content moderation.
- Scalability: Horizontal scaling with microservices, multi-region deployment, aggressive caching, asynchronous processing, and database sharding.
The design demonstrates how to build a production-ready dating platform that balances user experience, performance, safety, and operational complexity while serving tens of millions of users globally.