Design Tinder
Tinder is a location-based social search mobile application that facilitates communication between mutually interested users. The platform serves 75+ million users globally with billions of swipes per day, requiring highly scalable infrastructure to handle real-time matching, geospatial queries, and instant messaging.
Designing Tinder presents unique challenges including efficient geospatial indexing, atomic match detection, personalized recommendation algorithms, real-time messaging at scale, and sophisticated safety mechanisms to prevent abuse and ensure user trust.
Step 1: Understand the Problem and Establish Design Scope
Before diving into the design, we need to define what we’re building. A good approach is to separate functional requirements from non-functional requirements, establish scale, and clarify any ambiguities through questions.
Functional Requirements
Core Requirements:
- Users should be able to create profiles with photos, bio, and preferences (age range, distance, gender).
- Users should be able to discover potential matches within a configurable radius based on their location.
- Users should be able to swipe right (like), left (pass), or super like on profiles.
- When two users mutually like each other, a match should be created and both users notified in real-time.
- Matched users should be able to chat with each other via text and media messages.
- Users should receive personalized recommendations based on their preferences, location, and behavior.
Below the Line (Out of Scope):
- Users should be able to see who liked them without matching (premium feature).
- Users should be able to undo their last swipe (rewind feature).
- Users should be able to change their location to anywhere in the world (passport feature).
- Users should be able to boost their profile for increased visibility.
- Users should be able to schedule dates and meet in person.
- Users should be able to verify their photos via selfie matching.
Non-Functional Requirements
Core Requirements:
- The system should handle 75+ million daily active users with low latency (P99 < 200ms for swipes).
- The system should ensure strong consistency for match creation to prevent duplicate matches or race conditions.
- The system should support real-time notifications (< 1 second) when matches occur.
- The system should be highly available (99.95% uptime) with no single point of failure.
- The system should protect user privacy and location data, complying with GDPR and CCPA.
Below the Line (Out of Scope):
- The system should gracefully handle partial failures without losing user data.
- The system should support A/B testing for recommendation algorithms.
- The system should provide comprehensive monitoring, logging, and alerting.
- The system should facilitate easy updates and CI/CD deployment.
Clarification Questions & Assumptions:
- Platform: Mobile apps for iOS and Android, plus web application.
- Scale: 75 million daily active users, 1.6 billion daily swipes, 26 million daily matches, 1.2 billion daily messages.
- Location Updates: Location updates occur when users open the app or manually refresh, not continuous tracking.
- Geographic Coverage: Global deployment with focus on major metropolitan areas.
- Storage: Approximately 1.2 petabytes total (profiles, photos, messages, metadata).
- Recommendation Refresh: Card stacks refreshed every 4 hours or when depleted.
Step 2: Propose High-Level Design and Get Buy-in
Planning the Approach
For a product-style system design like Tinder, we should build the design sequentially, addressing each functional requirement one by one. This ensures we cover all core features before diving into optimizations and deep dives.
Defining the Core Entities
To satisfy our key functional requirements, we need these entities:
User: Represents both profile data and authentication. Includes personal information (name, age, gender, bio), photos (up to 9 images), preferences (age range, distance, gender filter), location coordinates, activity status, and attractiveness score (ELO rating). Users have verification status and premium subscription status.
Swipe: Records every swipe action taken by users. Contains the swiper’s ID, the swipee’s ID, action type (like, pass, super like), and timestamp. This entity is crucial for match detection and prevents duplicate swipes on the same profile.
Match: Represents a mutual like between two users. Created when both users have liked each other. Contains both user IDs, match timestamp, active status, and optionally expiration time. The match serves as the gateway to enabling chat.
Message: Individual chat messages between matched users. Includes the match ID, sender ID, message content, message type (text, image, GIF), media URL if applicable, and timestamps for sent, delivered, and read status. Messages are partitioned by match for efficient retrieval.
Location: Real-time geospatial data for users. Stores latitude, longitude, geohash encoding for efficient proximity searches, and last update timestamp. Location data has privacy controls and is anonymized for safety.
Recommendation: Personalized card stack for each user. Contains pre-computed list of candidate profiles ranked by compatibility score, which considers attractiveness, distance, activity recency, and preference matching. Recommendations are cached and refreshed periodically.
API Design
Create Profile Endpoint: Used by new users to set up their dating profile after authentication.
POST /profile -> User
Body: {
name: string,
age: number,
gender: string,
bio: string,
photos: [file uploads],
preferences: { minAge, maxAge, maxDistance, genderFilter }
}
Update Location Endpoint: Called when user opens the app or manually refreshes to update their current location.
POST /location -> Success/Error
Body: {
lat: number,
long: number
}
Get Recommendations Endpoint: Retrieves the personalized card stack of potential matches for the user.
GET /recommendations?limit=50 -> [User]
Swipe Endpoint: Records a swipe action and checks for potential match.
POST /swipes -> { status: "swiped" | "match", matchId? }
Body: {
swipeeId: string,
action: "like" | "pass" | "super_like"
}
Send Message Endpoint: Sends a message to a matched user.
POST /matches/:matchId/messages -> Message
Body: {
text?: string,
mediaUrl?: string,
type: "text" | "image" | "gif"
}
Get Message History Endpoint: Retrieves chat history for a specific match.
GET /matches/:matchId/messages?limit=50&before={messageId} -> [Message]
Pagination is cursor-based: the client passes the oldest message ID it already has, and the server returns the preceding page of the time-ordered history.
Note: User authentication is handled via JWT tokens in headers, not in request bodies. All endpoints require authentication, and rate limiting is enforced at the API gateway level.
High-Level Architecture
Let’s build the system incrementally, addressing each functional requirement:
1. Users should be able to create profiles with photos, bio, and preferences
The core components for profile management:
- Mobile Client: Native iOS and Android apps that provide the user interface. Handles photo uploads, form validation, and local caching.
- API Gateway: Entry point for all client requests, handling authentication via JWT, rate limiting to prevent abuse, SSL termination, and routing to appropriate microservices.
- Profile Service: Manages all profile-related operations including CRUD operations, photo upload orchestration, preference updates, and profile search. Validates data constraints like age limits and photo count.
- Object Storage (S3): Stores user photos with lifecycle policies. Photos are organized by user ID and indexed. Pre-signed URLs are generated for secure upload and download.
- CDN (CloudFront): Distributes profile images globally for low-latency access. Implements caching strategies with appropriate TTLs to reduce origin load.
- Database (PostgreSQL): Stores structured profile data including user information, preferences, and photo metadata. Uses indexes on frequently queried fields like age and location.
Profile Creation Flow:
- User fills out profile information in the mobile app and selects photos from their device.
- Client sends POST request to /profile with form data. Photos are uploaded separately via pre-signed S3 URLs.
- API Gateway authenticates the request and forwards to Profile Service.
- Profile Service validates input, generates photo URLs, and creates user record in PostgreSQL database.
- Photos are automatically processed (resized, compressed) and distributed to CDN edge locations.
- Service returns the complete User object to the client.
2. Users should be able to discover potential matches within a configurable radius
We introduce location management components:
- Location Service: Dedicated service for handling geospatial operations. Receives location updates from users, stores coordinates with geohash encoding, and performs efficient radius searches to find nearby users. Implements privacy controls to anonymize exact locations.
- Geospatial Data Store (Redis GEO): In-memory database optimized for geospatial queries. Uses sorted sets with geohash-based scoring for O(log N) proximity searches. Data is partitioned by geographic regions for better scalability.
Location Update Flow:
- When user opens the app, the mobile client gets GPS coordinates from device sensors.
- Client sends POST request to /location with latitude and longitude.
- API Gateway forwards to Location Service.
- Location Service encodes coordinates as geohash, stores in Redis GEO using GEOADD command, and associates with user ID.
- Old location data is overwritten or expires based on TTL policies.
Discovery Query: When generating recommendations, the system queries Location Service to find users within the configured radius, returning a list of candidate user IDs for further filtering.
3. Users should be able to swipe on profiles and receive personalized recommendations
We add recommendation and swipe handling:
- Recommendation Service: Generates personalized card stacks using multi-factor scoring algorithm. Considers user preferences, geographic proximity, attractiveness scores (ELO rating), activity recency, and profile completeness. Applies machine learning models to predict compatibility. Pre-computes and caches stacks for active users.
- Swipe Service: Processes swipe actions with idempotency guarantees. Records swipes in database, updates ELO scores asynchronously, enforces daily limits for free users, and triggers match detection logic when a like occurs.
- Cache Layer (Redis): Stores pre-computed recommendation stacks, swipe history for deduplication, and temporary data for match detection. Uses appropriate TTLs to balance freshness and performance.
Recommendation Generation Flow:
- User requests recommendations by swiping through their stack or explicitly refreshing.
- Recommendation Service queries Location Service for nearby users within max distance.
- Service filters candidates by age, gender preferences, and excludes previously swiped profiles.
- Service scores each candidate using weighted formula: attractiveness (30%), compatibility (30%), recency (20%), distance (15%), completeness (5%).
- Top 50 candidates are sorted by score and returned as the card stack, which is cached for future requests.
Swipe Flow:
- User swipes on a profile in the mobile app, which sends POST request to /swipes with swipee ID and action.
- API Gateway routes to Swipe Service.
- Service checks for duplicate swipe using cached swipe history.
- If new, service records swipe in database and removes swipee from user’s cached recommendation stack.
- Daily swipe counter is incremented with Redis INCR, checking against user’s limit.
4. When two users mutually like each other, a match should be created
We extend swipe handling with atomic match detection:
- Match Detection Logic: Embedded in Swipe Service, uses Redis Lua scripts for atomic operations. When processing a like, checks if the swipee has already liked the swiper. If yes, creates match atomically to prevent race conditions. Uses deterministic user ID ordering to prevent duplicate match records.
Match Creation Flow:
- User A swipes right on User B, triggering the swipe endpoint.
- Swipe Service records the like and executes atomic match check.
- Using Redis Lua script, service adds User A to User B’s like set and checks if User B is in User A’s like set.
- If mutual like detected, service creates match record in database with unique constraint on (user1_id, user2_id).
- Match event is published to message queue for notification processing.
5. Matched users should receive real-time notifications
We add notification infrastructure:
- Notification Service: Consumes match events from message queue (Kafka). Sends push notifications via APN (iOS) and FCM (Android). Manages WebSocket connections for real-time in-app notifications. Handles notification preferences and delivery retry logic.
- Message Queue (Kafka): Decouples match creation from notification delivery. Provides durability and replay capabilities. Enables asynchronous processing and horizontal scaling of notification workers.
- WebSocket Server: Maintains persistent connections with active users for instant in-app updates. Sends match notifications, typing indicators, and message delivery receipts. Implements heartbeat mechanism to detect disconnections.
Notification Flow:
- After match is created, Swipe Service publishes match event to Kafka topic.
- Notification Service consumes event and retrieves user details for personalization.
- Service checks if users are online via WebSocket connection registry in Redis.
- For online users, sends instant notification via WebSocket with matched user’s profile.
- For offline users or as backup, sends push notification via FCM/APN.
- Mobile client receives notification and displays match screen with animation.
6. Matched users should be able to chat
We introduce messaging infrastructure:
- Chat Service: Handles real-time messaging between matched users. Validates that both users are part of the match before allowing messages. Manages message persistence, delivery receipts, and read status. Implements rate limiting to prevent spam.
- Message Database (Cassandra): NoSQL database optimized for time-series data. Partitions messages by match ID for locality. Uses time-based UUID for message ordering. Supports efficient pagination for chat history retrieval.
- WebSocket Manager: Routes messages between users in real-time. Maintains connection mappings from user ID to WebSocket connection. Handles connection failures and reconnection logic.
Messaging Flow:
- User types message in chat interface and hits send.
- Client sends POST request to /matches/:matchId/messages over HTTPS.
- API Gateway routes to Chat Service.
- Chat Service validates match membership and creates message object with time-based UUID.
- Message is published to Kafka for asynchronous persistence to Cassandra.
- Recent messages (last 100) are cached in Redis for fast retrieval.
- If recipient is online, message is immediately sent via WebSocket connection.
- If offline, message waits in queue and push notification is sent.
- When recipient opens app, messages are retrieved from cache or Cassandra.
Step 3: Design Deep Dive
With core functionality in place, we focus on non-functional requirements and system optimizations. These deep dives differentiate a basic design from production-ready architecture.
Deep Dive 1: How do we efficiently handle geospatial queries for millions of users?
Challenge: Finding users within X miles of a given location requires comparing coordinates, which is computationally expensive at scale. A naive approach would scan all users and calculate distances, resulting in O(N) complexity and prohibitively slow query times.
Solution: Geohash-Based Indexing with Redis GEO
Geohashes encode 2D coordinates (latitude, longitude) into a single string using base-32 characters. The key property is that geographically close locations share longer common prefixes. For example:
- San Francisco: “9q8yy”
- San Jose: “9q9h”
- Los Angeles: “9q5c”
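The prefix property can be reproduced with a small encoder. Below is a sketch of the standard base-32 geohash algorithm (precision 5 corresponds to cells of roughly 5 km):

```python
# Minimal geohash encoder (standard base-32 alphabet). Bits alternate between
# longitude and latitude, halving the search interval each step, so nearby
# points share leading characters.
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int = 5) -> str:
    lat_range, lon_range = [-90.0, 90.0], [-180.0, 180.0]
    bits, use_lon = [], True  # encoding starts with longitude
    while len(bits) < precision * 5:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits.append(1)
            rng[0] = mid
        else:
            bits.append(0)
            rng[1] = mid
        use_lon = not use_lon
    chars = []
    for i in range(0, len(bits), 5):
        n = 0
        for b in bits[i:i + 5]:
            n = (n << 1) | b
        chars.append(BASE32[n])
    return "".join(chars)

print(geohash_encode(37.7749, -122.4194))  # San Francisco -> "9q8yy"
```

Note that San Francisco and San Jose share the "9q" prefix, which is exactly what makes prefix-based proximity grouping work.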
Redis provides built-in GEO commands that leverage sorted sets with geohash scoring:
- GEOADD adds a location with longitude, latitude, and member key
- GEORADIUS finds all members within a specified radius of given coordinates (deprecated since Redis 6.2 in favor of GEOSEARCH)
- GEOSEARCH provides more flexible querying with sorting options
Implementation Strategy:
To handle 75 million users efficiently, we partition the world into regional grids of approximately 100km × 100km. Each region becomes a separate Redis key, reducing the search space significantly.
When a user updates their location:
- Calculate which grid region they belong to based on coordinates
- Remove them from their old region’s sorted set (if changed)
- Add them to the new region’s sorted set using GEOADD
- Store metadata (exact coordinates, timestamp) in a separate hash key
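The grid assignment in the first step can be sketched as follows. The `geo:{row}:{col}` key format and the 1-degree cell size are illustrative choices here, not a confirmed implementation detail (1 degree of latitude is roughly 111 km, approximating the 100 km grid):

```python
import math

def region_key(lat: float, lon: float, cell_deg: float = 1.0) -> str:
    """Map coordinates to a grid-cell Redis key (hypothetical "geo:{row}:{col}" format)."""
    return f"geo:{math.floor(lat / cell_deg)}:{math.floor(lon / cell_deg)}"

# A location update would then be roughly:
#   old_key, new_key = region_key(old_lat, old_lon), region_key(lat, lon)
#   if old_key != new_key: ZREM old_key user_id   (remove from old region)
#   GEOADD new_key lon lat user_id                (add to new region)
print(region_key(37.7749, -122.4194))  # -> "geo:37:-123"
```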
When finding nearby users:
- Determine user’s current region from their coordinates
- Query primary region using GEORADIUS with desired radius
- If near region boundaries or insufficient candidates, query adjacent regions
- Redis returns members within radius sorted by distance
- Results are filtered by availability, preferences, and previous swipes
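Redis computes member distances internally, but the same great-circle math is useful when post-filtering candidates pulled from adjacent regions. A standard haversine implementation:

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

# San Francisco to San Jose: roughly 67 km
print(haversine_km(37.7749, -122.4194, 37.3382, -121.8863))
```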
Optimization Techniques:
Caching Frequent Searches: Since many users in the same area perform similar queries, we cache results in Redis with 5-minute TTL. Coordinates are quantized before building the cache key (for example, truncated to a geohash prefix, giving keys like "nearby:{geohash}:{radius}") so that queries from nearby users hit the same entry; caching on raw coordinates would almost never repeat. This reduces repeated computations and achieves 85% cache hit rate.
Lazy Location Updates: To reduce write load, location is only updated if the user has moved more than 1km or 30 minutes have elapsed since last update. For stationary users, this dramatically reduces unnecessary writes.
Read Replicas: Location queries far outnumber writes. Redis read replicas handle GEORADIUS queries while master handles GEOADD writes. This separates concerns and allows independent scaling.
Regional Sharding: Users in Asia query Asian regions, US users query US regions. Geographic sharding reduces latency and allows regional scaling based on user density.
Performance Characteristics:
- Proximity search: O(log N + M) where N is users in region, M is results within radius
- P99 latency: < 50ms for typical queries
- Write throughput: ~100k location updates per second per Redis instance
- Memory: ~50 bytes per user location entry
Deep Dive 2: How do we generate personalized recommendations that maximize engagement?
Challenge: With millions of potential matches, how do we rank and recommend profiles that users will actually like? A random approach leads to poor match rates. We need a sophisticated algorithm that considers multiple factors while maintaining performance.
Solution: Multi-Factor Scoring with ELO Rating System
Our recommendation engine combines several signals into a unified score:
Attractiveness Score (ELO Rating):
Borrowed from chess ratings, ELO provides a relative measure of profile attractiveness based on swipe outcomes. Every profile starts at 1000 points. When User A swipes on User B:
- If A likes B: B’s score increases, A’s decreases slightly (B “won” the comparison)
- If A passes on B: A’s score increases, B’s decreases (A “won”)
The magnitude of change depends on the rating difference. If a highly-rated profile is liked by a lower-rated user, the change is small. If the opposite occurs, the change is larger. This creates a dynamic rating that converges to reflect true attractiveness.
ELO updates are processed asynchronously via Kafka to avoid blocking swipe operations. Expected score is calculated using formula: 1 / (1 + 10^((opponent_elo - player_elo) / 400)). Rating change is K-factor (32) multiplied by (actual_result - expected_score).
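The update rule above translates directly to code. This sketch uses the stated K-factor of 32, with an actual_result of 1.0 meaning the profile was liked and 0.0 meaning it was passed:

```python
def expected_score(player_elo: float, opponent_elo: float) -> float:
    """Probability-like expectation that the player 'wins' the comparison."""
    return 1.0 / (1.0 + 10 ** ((opponent_elo - player_elo) / 400))

def update_elo(player_elo: float, opponent_elo: float,
               actual_result: float, k: float = 32) -> float:
    """Return the player's new rating after one swipe outcome."""
    return player_elo + k * (actual_result - expected_score(player_elo, opponent_elo))

# Two equally rated profiles: a like moves the winner up by K/2 = 16 points.
print(update_elo(1000, 1000, 1.0))  # -> 1016.0
```

Because the expected score is near 1.0 when a high-rated profile beats a low-rated one, such outcomes barely move the ratings, which is exactly the convergence behavior described above.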
Compatibility Scoring:
We calculate compatibility based on shared interests, education level, occupation, and behavioral patterns. For interests, we use Jaccard similarity: size of intersection divided by size of union. Profile photos are processed through CNN to extract style embeddings, and cosine similarity measures visual compatibility.
Recency Weighting:
Active users are prioritized using exponential decay. Time since last active (in hours) is converted to recency score using: exp(-hours_inactive / 24). This ensures newly active users appear in recommendations quickly.
Distance Factor:
Proximity matters for dating. Distance score is calculated as: 1 - (distance_km / max_distance_preference). Users at 5 miles get higher scores than those at 45 miles, assuming a 50-mile preference.
Profile Completeness:
Profiles with more photos and complete bios are ranked higher: completeness_score = (num_photos / 6) * 0.5 + (has_bio ? 0.5 : 0).
Combined Score:
Final recommendation score = (0.30 × attractiveness) + (0.30 × compatibility) + (0.20 × recency) + (0.15 × distance) + (0.05 × completeness). These weights are tuned via A/B testing to maximize match rates.
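Putting the factors together, here is a sketch of the scoring function using the formulas above. It assumes attractiveness and compatibility arrive pre-normalized to [0, 1]; the photo cap of 6 comes from the completeness formula as stated:

```python
import math

WEIGHTS = {"attractiveness": 0.30, "compatibility": 0.30,
           "recency": 0.20, "distance": 0.15, "completeness": 0.05}

def jaccard(a: set, b: set) -> float:
    """Interest overlap: |intersection| / |union|."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recency_score(hours_inactive: float) -> float:
    return math.exp(-hours_inactive / 24)

def distance_score(distance_km: float, max_distance_km: float) -> float:
    return max(0.0, 1.0 - distance_km / max_distance_km)

def completeness_score(num_photos: int, has_bio: bool) -> float:
    return min(num_photos, 6) / 6 * 0.5 + (0.5 if has_bio else 0.0)

def recommendation_score(attractiveness: float, compatibility: float,
                         hours_inactive: float, distance_km: float,
                         max_distance_km: float, num_photos: int,
                         has_bio: bool) -> float:
    factors = {
        "attractiveness": attractiveness,
        "compatibility": compatibility,
        "recency": recency_score(hours_inactive),
        "distance": distance_score(distance_km, max_distance_km),
        "completeness": completeness_score(num_photos, has_bio),
    }
    return sum(WEIGHTS[name] * value for name, value in factors.items())
```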
Card Stack Generation:
- Query Location Service for users within max distance (e.g., 50 miles)
- Filter by hard preferences: age range, gender, mutual blocks
- Exclude previously swiped profiles using Bloom filter for efficiency
- Calculate score for each candidate using formula above
- Sort by score descending
- Apply diversity heuristics to avoid showing similar profiles consecutively
- Take top 50 candidates as card stack
- Cache stack in Redis with 4-hour TTL
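The "exclude previously swiped" step relies on a Bloom filter, which answers "definitely not swiped" or "possibly swiped" in constant space. A minimal sketch of the idea (a production system would more likely use a shared implementation such as RedisBloom; sizes and hash counts here are illustrative):

```python
import hashlib

class BloomFilter:
    """Minimal Bloom filter: k bit positions carved from one SHA-256 digest."""
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, item: str):
        digest = hashlib.sha256(item.encode()).digest()
        for i in range(self.num_hashes):
            # 4 bytes per hash from the 32-byte digest (5 * 4 <= 32)
            yield int.from_bytes(digest[i * 4:(i + 1) * 4], "big") % self.size

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False => definitely never added; True => possibly added
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))
```

False positives only cause a profile to be skipped unnecessarily, which is an acceptable trade for never re-showing a swiped profile.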
Performance Optimizations:
Pre-computation is key. For highly active users (logged in within last hour), we proactively generate recommendations during off-peak hours. This moves computation off critical path.
Scoring calculations are parallelized across candidates. If location service returns 500 candidates, we score them concurrently using worker pools.
Machine learning models run on GPU-equipped instances for faster inference. Photo similarity models are particularly compute-intensive.
Stacks are refreshed when depleted or after 4 hours. If user swipes through all 50 profiles, a new batch is generated immediately to maintain engagement.
Deep Dive 3: How do we detect matches atomically and prevent race conditions?
Challenge: When User A swipes right on User B while User B simultaneously swipes right on User A, we need to create exactly one match record. If both swipe operations check for matches independently, race conditions could cause duplicate matches or missed matches.
Solution: Atomic Match Detection with Redis Lua Scripts
Redis Lua scripts execute atomically, providing transactional guarantees without complex distributed locking.
Data Structure:
For each user, maintain a Redis set containing user IDs who have liked them. Key format: “likes:{user_id}”. When User A likes User B, we add A to B’s like set.
Atomic Check-and-Set:
When processing a like from User A to User B, we execute a Lua script that:
- Adds A to B’s like set (SADD likes:B A)
- Checks if B is in A’s like set (SISMEMBER likes:A B)
- Returns 1 if mutual like detected, 0 otherwise
This entire operation is atomic. No other operation can interleave between the add and check.
Match Creation:
If the Lua script returns 1 (mutual like), we create a match record in PostgreSQL. To prevent duplicate matches, we enforce uniqueness:
- Always order user IDs: smaller ID becomes user1_id, larger becomes user2_id
- Unique constraint on (user1_id, user2_id) column pair
- If constraint violation occurs, we know match already exists
Idempotency:
Swipes themselves are idempotent. If User A likes User B twice (perhaps due to client retry), we check for existing swipe record before processing. Unique constraint on (swiper_id, swipee_id) prevents duplicate swipe records.
Failure Handling:
If match creation fails after Redis confirms mutual like, we have options:
- Retry match creation (idempotent due to unique constraint)
- Publish to dead letter queue for manual inspection
- Return success to user anyway (match eventually created by async worker)
Distributed Considerations:
Even with multiple Swipe Service instances, the atomic Lua script guarantees correctness. Redis is single-threaded for command execution, so concurrent scripts execute serially with no race condition.
Performance:
The Lua script executes in microseconds. Combined with match record creation, end-to-end match detection latency is under 10ms at P99. This is fast enough for real-time user experience.
Deep Dive 4: How do we handle real-time messaging at scale with 1.2 billion daily messages?
Challenge: Supporting real-time chat for millions of concurrent users requires low-latency message delivery, efficient storage for message history, and reliable delivery guarantees. Traditional RDBMS struggle with write-heavy messaging workloads.
Solution: WebSocket + Cassandra + Message Queue Architecture
Real-Time Delivery via WebSocket:
HTTP request-response is inefficient for bidirectional real-time communication. WebSocket provides persistent TCP connections with low overhead. Each mobile client establishes a WebSocket connection to the Chat Service upon app launch.
WebSocket Manager maintains a hash map: user_id -> WebSocket connection object. When a message arrives for User B, the manager looks up B’s connection and sends the message immediately. Heartbeat pings every 30 seconds detect dead connections.
Connection state is stored in Redis for multi-instance coordination. When Chat Service instance receives message for User B, it checks Redis to find which server instance holds B’s connection, then forwards via inter-service messaging.
Message Persistence with Cassandra:
Cassandra excels at write-heavy time-series data. We model messages with match_id as partition key and message_id (time-based UUID) as clustering key. This co-locates all messages for a conversation on the same node, enabling efficient retrieval.
Schema design: partition key is match_id, clustering key is message_id (TIMEUUID) with descending order. This allows paginated queries returning most recent messages first.
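Python's uuid1 produces the same kind of time-based UUID as Cassandra's TIMEUUID, so ID generation and newest-first ordering can be sketched directly (the helper names here are illustrative):

```python
import uuid

def new_message_id() -> uuid.UUID:
    # uuid1 embeds a 60-bit, 100-ns-resolution timestamp, the same layout
    # Cassandra orders TIMEUUID clustering keys by.
    return uuid.uuid1()

def sort_newest_first(message_ids: list) -> list:
    """Order message IDs as a DESC TIMEUUID clustering key would."""
    return sorted(message_ids, key=lambda u: u.time, reverse=True)
```

With the clustering key stored in descending order, a page of the most recent messages is a simple range scan from the head of the partition.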
Asynchronous Write Path:
To minimize user-facing latency, message persistence is asynchronous:
- Chat Service receives message via HTTPS POST
- Service validates match membership
- Message is immediately sent via WebSocket to recipient (if online)
- Message is published to Kafka topic for persistence
- Response is returned to sender (delivered status)
- Kafka consumer workers write to Cassandra in background
- Once written, status is updated to “persisted”
This approach decouples user-facing latency from database write latency. Even if Cassandra is slow, users see instant message delivery.
Message Caching:
Recent messages (last 100 per match) are cached in Redis list structure. Key format: “chat:{match_id}:recent”. When users open a chat, we serve from cache with < 5ms latency. Older messages require Cassandra query.
Cache is updated optimistically when sending messages and periodically refreshed from source of truth (Cassandra). TTL of 24 hours keeps memory usage bounded.
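The Redis pattern behind this cache is LPUSH followed by LTRIM 0 99. A pure-Python model of the same capped, newest-first list (the class is a sketch, not the service's actual code):

```python
from collections import deque

class RecentMessageCache:
    """In-memory model of Redis LPUSH + LTRIM on "chat:{match_id}:recent" keys."""
    def __init__(self, max_len: int = 100):
        self.max_len = max_len
        self.caches: dict[str, deque] = {}  # match_id -> messages, newest first

    def push(self, match_id: str, message: str) -> None:
        # LPUSH chat:{match_id}:recent <message>; deque(maxlen=...) plays LTRIM's role
        self.caches.setdefault(match_id, deque(maxlen=self.max_len)).appendleft(message)

    def recent(self, match_id: str, limit: int = 50) -> list:
        # LRANGE chat:{match_id}:recent 0 limit-1
        return list(self.caches.get(match_id, deque()))[:limit]
```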
Delivery Guarantees:
Messages are persisted to Kafka before the sender is acknowledged. If the Chat Service only published to Kafka after responding, a crash between sending via WebSocket and the Kafka publish would silently lose the message from history. We therefore use Kafka producer acknowledgments and return success to the sender only after Kafka confirms receipt.
For offline users, messages queue in Cassandra until they reconnect. Upon reconnection, client requests message history, retrieving any missed messages.
Read Receipts:
When User B reads messages from User A, client sends acknowledgment to Chat Service. Service updates message read_at timestamp in Cassandra and sends “message_read” event via WebSocket to User A. This provides real-time read receipt indication.
Typing Indicators:
Typing indicators are ephemeral and not persisted. When User A types, client sends typing event via WebSocket. Chat Service forwards to User B (if online) without database interaction. This minimizes latency for transient state.
Scalability:
WebSocket Manager can scale horizontally. Each instance handles a subset of connections. Redis serves as coordination layer for routing messages between instances.
Cassandra scales horizontally by adding nodes. Partition key (match_id) distributes data evenly. Replication factor of 3 provides fault tolerance.
Kafka partitions enable parallel message processing. Each partition is consumed by one consumer instance, allowing horizontal scaling of write workers.
Performance Metrics:
- Message delivery latency (sender to recipient): P99 < 100ms
- Message persistence latency: P99 < 500ms
- Chat history query: P99 < 50ms (cache hit), P99 < 200ms (cache miss)
- Throughput: 50k messages per second per Cassandra cluster
Deep Dive 5: How do we ensure system resilience during high-traffic periods and partial failures?
Challenge: Dating apps experience traffic spikes during evenings and weekends. Special events like Valentine’s Day can cause 3-5x normal load. We need graceful degradation rather than cascading failures.
Solution: Rate Limiting, Circuit Breakers, and Asynchronous Processing
API Rate Limiting:
API Gateway enforces rate limits per user to prevent abuse and ensure fair resource allocation. Limits are tiered based on subscription:
- Free users: 100 requests per minute
- Premium users: 500 requests per minute
Rate limiting uses a token bucket algorithm implemented in Redis. Each request atomically refills the user's bucket based on elapsed time and consumes one token; if no tokens remain, the request is rejected with 429 status code.
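A single-process token bucket illustrates the algorithm. In the real system the state lives in Redis and the refill-and-consume step runs atomically (for example, via a Lua script); this sketch models one user's bucket locally:

```python
import time

class TokenBucket:
    """Token bucket: capacity tokens max, refilled continuously over time."""
    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Refill based on elapsed time, then try to consume one token."""
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # caller responds with HTTP 429
```

Unlike a fixed window counter, the bucket smooths bursts: a client can spend saved-up tokens quickly but then settles to the steady refill rate.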
Swipe limits are separate: free users get 100 swipes per day, premium get unlimited. This is enforced at the Swipe Service level using daily counters.
Circuit Breakers:
When downstream services fail, circuit breakers prevent cascading failures. Each service dependency (e.g., Chat Service calling Notification Service) has circuit breaker wrapper:
- Closed state: Requests pass through normally. Failures increment a counter.
- Open state: After N failures, the circuit opens. Requests fail immediately without calling downstream, preventing resource exhaustion.
- Half-open state: After a timeout, the circuit allows one test request. If it succeeds, the circuit closes; if it fails, the circuit reopens.
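The three states map to a small wrapper class. This is a minimal sketch of the pattern (thresholds and timeouts are illustrative, and a production version would also need thread safety):

```python
import time

class CircuitBreaker:
    """Closed -> Open after N failures; Open -> Half-open after a timeout."""
    def __init__(self, failure_threshold: int = 5, reset_timeout: float = 30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half_open"  # allow one probe request through
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        else:
            self.failures = 0
            self.state = "closed"
            return result
```

Failing fast while open is the whole point: the caller gets an immediate error (and can fall back) instead of tying up threads on a dead dependency.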
Fallback Strategies:
When services degrade, we provide fallback responses:
- Recommendation Service failure: Return cached recommendations or random nearby profiles
- Location Service failure: Use last known location with staleness warning
- Notification Service failure: Queue notifications for retry, user sees match in app regardless
- Third-party APIs (maps, ML models) failure: Use cached results or simplified algorithms
Asynchronous Processing:
Non-critical operations are asynchronous to avoid blocking user-facing requests:
- ELO score updates published to Kafka, processed by workers
- Email notifications sent via background jobs
- Analytics events batched and written offline
- Photo processing (resize, compress, moderation) happens asynchronously
Load Shedding:
During extreme load, we prioritize critical paths:
- Swipes and match detection are highest priority (core user experience)
- Recommendation generation is medium priority (can use stale cache)
- Analytics and logging are lowest priority (can be sampled or dropped)
Load shedding is implemented via priority queues. When queue depth exceeds a threshold, low-priority tasks are dropped, and their producers retry later with exponential backoff.
Database Connection Pooling:
Database connections are expensive resources. Connection pools maintain reusable connections, preventing connection exhaustion during traffic spikes. Pool size is tuned based on database capacity and service instance count.
Auto-Scaling:
Services run on containerized infrastructure (Kubernetes or ECS) with auto-scaling policies:
- Scale up when CPU > 70% or request latency > 200ms
- Scale down when CPU < 30% for 10 minutes
- Minimum replicas: 3 per service (for redundancy)
- Maximum replicas: 100 per service (cost control)
Health Checks:
Load balancers perform health checks every 5 seconds. Unhealthy instances are removed from rotation automatically. Health check includes database connectivity and service dependencies.
Partial Degradation:
Features are designed to fail independently. If Chat Service is down, swipes and matches still work. If Recommendation Service is down, users can manually search profiles. This prevents total outage from single component failure.
Deep Dive 6: How do we implement safety features like photo verification and content moderation?
Challenge: Dating apps face risks of fake profiles, catfishing, inappropriate content, and harassment. Safety features are critical for user trust and regulatory compliance.
Solution: Multi-Layered Safety Architecture
Photo Verification:
Users can verify their profile by taking a real-time selfie. Verification flow:
- User initiates verification, service generates session with random pose requirement (smile, turn left, neutral)
- User takes selfie following instructions within time limit
- Client uploads selfie to verification service
- Service detects liveness using ML model to ensure it’s not a photo of a photo (checks for screen reflection, digital artifacts)
- Service extracts facial embedding from selfie using face recognition model
- Service compares selfie embedding to embeddings of profile photos using cosine similarity
- If similarity exceeds the 80% threshold, the user is marked as verified
- Verified badge is displayed on profile, increasing trust and engagement
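The embedding comparison at the heart of this flow can be sketched as follows; real embeddings are high-dimensional vectors from a face-recognition model, and the short vectors and threshold constant here are purely illustrative:

```python
import math

VERIFY_THRESHOLD = 0.80  # the "80% threshold" from the flow above

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def is_verified(selfie_embedding, profile_embeddings):
    """Verified if the selfie matches ANY profile photo above the threshold,
    since a profile usually contains several photos of the same person."""
    return any(cosine_similarity(selfie_embedding, p) >= VERIFY_THRESHOLD
               for p in profile_embeddings)
```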
Verification reduces catfishing and, by commonly cited industry metrics, can lift match rates by roughly 40%.
Automated Content Moderation:
All uploaded photos are screened before approval:
- Photo is analyzed by NSFW detection model (convolutional neural network) to classify explicit content
- Photo is sent to cloud vision API (AWS Rekognition, Google Cloud Vision) for moderation labels
- If NSFW score exceeds 85% or moderation labels include explicit categories, photo is rejected or quarantined
- Flagged photos are queued for human review by moderation team
- Repeat violations result in account suspension
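The decision step of this pipeline might look like the following sketch; the label names, scores, and borderline-review threshold are illustrative, not the exact values any particular vision API returns:

```python
# Illustrative label set; real APIs return provider-specific moderation labels.
EXPLICIT_LABELS = {"Explicit Nudity", "Graphic Violence"}
NSFW_REJECT_SCORE = 0.85   # the rejection threshold from the rules above
NSFW_REVIEW_SCORE = 0.50   # assumed borderline band routed to humans

def moderate_photo(nsfw_score, moderation_labels):
    """Combine the NSFW model score and cloud-API labels into a decision:
    'approved', 'rejected', or 'review' (queued for the moderation team)."""
    if nsfw_score >= NSFW_REJECT_SCORE or EXPLICIT_LABELS & set(moderation_labels):
        return "rejected"
    if nsfw_score >= NSFW_REVIEW_SCORE:
        return "review"
    return "approved"
```

Keeping the thresholds in one place makes them easy to tune as the model or label taxonomy changes.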
Text moderation for bios and messages:
- Text is analyzed using toxicity detection API (Perspective API, OpenAI Moderation)
- Scores are calculated for categories: toxicity, harassment, hate speech, threats, profanity
- If any score exceeds 80%, content is blocked or flagged for review
- For messages, users are warned before sending, with option to rephrase
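A sketch of the text-moderation gate described above; the category names and 0.8 threshold are illustrative, and real APIs return their own category taxonomies:

```python
BLOCK_THRESHOLD = 0.80  # the "exceeds 80%" rule from the list above
CATEGORIES = ("toxicity", "harassment", "hate_speech", "threat", "profanity")

def moderate_text(scores):
    """scores: dict of category -> probability from a toxicity API.
    Returns ('block', worst_category) when any category crosses the
    threshold, else ('allow', None). Returning the offending category
    lets the client show a targeted warning before sending."""
    worst = max(CATEGORIES, key=lambda c: scores.get(c, 0.0))
    if scores.get(worst, 0.0) >= BLOCK_THRESHOLD:
        return ("block", worst)
    return ("allow", None)
```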
Report and Block System:
Users can report profiles for inappropriate behavior:
- Reporter selects reason (harassment, fake profile, inappropriate photos, scam, other)
- Report is recorded with timestamp and details
- Reported user’s violation counter increments in Redis
- If report count exceeds threshold (5 reports from different users), account is auto-suspended pending review
- Reports are queued for moderation team with priority based on severity
- Moderation team reviews evidence (profile, messages, photos) and decides action (warning, suspension, ban)
Blocking is immediate and bidirectional:
- User A blocks User B
- Both users disappear from each other’s recommendations permanently
- Any existing match is dissolved
- Message history is preserved (for legal compliance) but inaccessible
- Block list is stored in Redis for fast filtering during recommendation generation
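Filtering recommendations against block lists reduces to set membership checks in both directions; a sketch using plain dicts and sets in place of Redis sets:

```python
def filter_blocked(candidate_ids, viewer_id, block_sets):
    """block_sets: dict of user_id -> set of user_ids they blocked
    (SMEMBERS per user in the Redis-backed production version).
    A candidate is excluded if EITHER side blocked the other,
    giving the bidirectional behavior described above."""
    viewer_blocks = block_sets.get(viewer_id, set())
    return [c for c in candidate_ids
            if c not in viewer_blocks
            and viewer_id not in block_sets.get(c, set())]
```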
Safety Center:
In-app safety center provides:
- Tips for safe dating (meet in public, tell friend, video chat first)
- Resources for reporting abuse
- Privacy controls (hide profile, control location precision, manage photo visibility)
- Panic button that immediately contacts emergency services with location
Machine Learning for Fraud Detection:
Anomaly detection models identify suspicious behavior:
- Accounts that mass-swipe right (bots or spammers)
- Profiles with stock photos (reverse image search)
- Accounts sending identical messages to many users (spam)
- Unusual activity patterns (logging in from multiple countries simultaneously)
Flagged accounts are automatically shadowbanned (invisible to others) pending investigation.
Compliance and Privacy:
GDPR and CCPA compliance requires:
- Data deletion upon request (right to be forgotten)
- Data export functionality (right to access)
- Consent management for data collection
- Location data anonymization (rounded coordinates shown publicly, exact coordinates used only for matching)
- Encryption at rest and in transit (TLS 1.3, AES-256)
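The coordinate-rounding approach to location anonymization is simple to sketch; the two-decimal precision is an illustrative choice, corresponding to roughly 1.1 km of latitude:

```python
def anonymize_coords(lat, lon, decimals=2):
    """Round coordinates for public display. Two decimal places hides the
    exact location while keeping distance estimates usable; the precise
    coordinates stay server-side for matching only."""
    return (round(lat, decimals), round(lon, decimals))
```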
Step 4: Wrap Up
In this design, we’ve built a comprehensive architecture for a dating application like Tinder. If there’s extra time, here are additional points to discuss:
Additional Features:
- Video profiles: Short video introductions for more authentic representation
- Virtual dates: In-app video calling for safer first interactions
- Events and experiences: Coordinated group meetups and activities
- Advanced matching: Machine learning models that learn user preferences over time
- Icebreakers: AI-generated conversation starters based on profile analysis
- Safety check-ins: Scheduled check-ins during in-person dates
- Background checks: Optional identity verification and criminal record screening
Scaling Considerations:
- Geographic Distribution: Multi-region deployment with data residency compliance. Users matched primarily with others in same region. Cross-region replication for global profiles.
- Database Sharding: User data sharded by user ID using consistent hashing. Message data naturally partitioned by match ID. Location data partitioned by geographic region.
- Caching Layers: Multi-level caching: application-level (user sessions), distributed cache (Redis), CDN (images). Cache warming during off-peak hours for active users.
- Content Delivery: Images served via CDN with edge caching. Aggressive compression (WebP format). Lazy loading and progressive image rendering.
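The consistent-hashing scheme mentioned for user sharding can be sketched with a virtual-node hash ring; shard names, the vnode count, and the use of MD5 are illustrative choices:

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Maps user IDs to database shards. Because keys land on the next
    point clockwise on the ring, adding a shard moves only ~1/N of keys
    instead of rehashing everything."""

    def __init__(self, shards, vnodes=100):
        self._ring = []                        # sorted list of (hash, shard)
        for shard in shards:
            for i in range(vnodes):            # virtual nodes smooth the load
                h = self._hash(f"{shard}#{i}")
                self._ring.append((h, shard))
        self._ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def shard_for(self, user_id):
        h = self._hash(str(user_id))
        # First ring point at or after h, wrapping around at the end.
        idx = bisect.bisect(self._ring, (h, "")) % len(self._ring)
        return self._ring[idx][1]
```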
Advanced Recommendation Algorithms:
- Collaborative filtering: “Users similar to you also liked these profiles”
- Reinforcement learning: Learn from swipe patterns and match outcomes to improve recommendations
- Contextual bandits: Balance exploration (showing diverse profiles) vs exploitation (showing likely matches)
- Lookalike modeling: Find profiles similar to previously matched users
- Time-aware recommendations: Show different profiles based on time of day or day of week
Monitoring and Analytics:
- Key Metrics: Swipes per user per day, match rate (matches / likes), message response rate, time to match, user retention, premium conversion rate
- Real-Time Dashboards: Geographic heatmaps of activity, swipe velocity, match velocity, error rates by service
- A/B Testing Framework: Test recommendation algorithms, UI changes, pricing models, notification strategies
- Anomaly Detection: Spike detection for traffic, latency, error rates. Automated alerting via PagerDuty
Cost Optimization:
- Infrastructure: Use reserved instances for predictable baseline load, spot instances for batch jobs (image processing, analytics)
- Storage: Tiered storage with S3 lifecycle policies. Move old messages and inactive profiles to cheaper storage classes
- Image Optimization: Aggressive compression reduces CDN bandwidth by 60%. Serve appropriately sized images for device resolution
- Database Optimization: Archive old data to cold storage; apply query optimization and proper indexing
Estimated Monthly Cost (75M users):
- Compute (ECS/Kubernetes): $250k
- Database (PostgreSQL, Cassandra): $180k
- Cache (Redis clusters): $50k
- Storage (S3): $75k for 225TB photos
- CDN (CloudFront): $120k for 1PB transfer
- Message Queue (Kafka): $30k
- Monitoring and logging: $20k
- Total: Approximately $725k/month or $0.01 per user per month
Security Considerations:
- End-to-end encryption for messages (optional, user-controlled)
- JWT tokens with short expiration and refresh tokens
- API authentication via OAuth 2.0
- SQL injection prevention via parameterized queries
- XSS prevention via input sanitization and CSP headers
- DDoS protection via CDN and rate limiting
- Regular security audits and penetration testing
Future Enhancements:
- Group dating: Match groups of friends for double dates
- Matchmaker mode: Friends can swipe on behalf of user
- AI dating coach: Personalized tips to improve profile and conversation skills
- Integration with other social platforms: Import interests from Instagram, Spotify
- Subscription tiers: Multiple premium levels with different features
- In-app purchases: Buy individual boosts, super likes, or rewinds without full subscription
Trade-Offs and Design Decisions:
Eventual Consistency for Recommendations:
- Pro: Faster response times, simpler architecture
- Con: Recommendations may be slightly stale
- Decision: Acceptable for non-critical feature, refresh periodically
Redis GEO vs PostGIS:
- Redis: 10x faster, limited storage capacity
- PostGIS: Slower queries, unlimited storage, richer query capabilities
- Decision: Use Redis as primary with PostGIS as backup for complex spatial queries
Cassandra vs MongoDB for Messages:
- Cassandra: Superior write throughput, time-series optimized, proven at scale
- MongoDB: Easier queries, better for ad-hoc analysis, simpler operations
- Decision: Cassandra for production workload, MongoDB for analytics replica
Microservices vs Monolith:
- Microservices: Independent scaling, team autonomy, technology diversity
- Monolith: Simpler deployment, easier local development, fewer operational concerns
- Decision: Microservices for scale, with clear service boundaries
Build vs Buy:
- Third-party mapping API: Buy (Google Maps, Mapbox) - not core competency
- Photo verification: Build - critical for trust and customization
- Payment processing: Buy (Stripe, Braintree) - regulatory complexity
- ML models: Build with open-source frameworks - competitive advantage
Congratulations on completing this comprehensive design! Tinder’s architecture showcases how to handle real-time systems, geospatial data, recommendation algorithms, and safety features at massive scale. The key principles are atomic operations for consistency, asynchronous processing for scalability, caching for performance, and layered safety for trust.
Summary
This comprehensive guide covered the design of a dating application like Tinder, including:
- Core Functionality: Profile management, geospatial discovery, swipe mechanics, match detection, real-time chat, and personalized recommendations.
- Key Challenges: Efficient proximity searches, atomic match detection, recommendation algorithm design, real-time messaging at scale, and comprehensive safety mechanisms.
- Solutions: Redis GEO for geospatial indexing, ELO rating system for attractiveness scoring, Lua scripts for atomic operations, Cassandra for message storage, WebSocket for real-time communication, and multi-layered content moderation.
- Scalability: Horizontal scaling with microservices, multi-region deployment, aggressive caching, asynchronous processing, and database sharding.
The design demonstrates how to build a production-ready dating platform that balances user experience, performance, safety, and operational complexity while serving tens of millions of users globally.