Design Proximity Service
A proximity service is a geospatial system that enables users to discover nearby points of interest (POI) such as restaurants, gas stations, hotels, or other businesses. Systems like Yelp, Google Maps, Uber, and Foursquare rely on efficient proximity search at massive scale. This design covers how to build a production-grade proximity service handling hundreds of millions of places and billions of queries per day.
Step 1: Requirements Clarification
Functional Requirements
Core Features:
- Search for nearby places within a given radius (e.g., 5km, 10km)
- Filter results by category (restaurants, gas stations, hotels, etc.)
- Return detailed place information (name, address, rating, photos)
- Rank results by distance, popularity, ratings, and relevance
- Support real-time updates when places are added, modified, or closed
- Handle both stationary objects (businesses) and moving objects (users)
- Support different search modes: radius search, k-nearest neighbors (kNN)
Optional Features:
- Place recommendations based on user preferences
- Real-time place availability (e.g., wait times, parking spots)
- Direction and navigation integration
- User check-ins and reviews
Non-Functional Requirements
Scale:
- 500 million places globally
- 100 million daily active users (DAU)
- 5 billion search queries per day (~57,870 QPS)
- Peak load: 200,000 QPS
- Each place has ~1KB of data
- Search latency: p99 < 200ms
Availability and Reliability:
- 99.99% availability (52 minutes downtime per year)
- Geo-redundant deployment across multiple regions
- Graceful degradation during partial failures
Data Characteristics:
- Read-heavy: 99% reads, 1% writes
- Place data changes infrequently (hours/days)
- User location changes frequently (seconds/minutes)
- Search patterns are geographically clustered
Storage Estimation:
- Place data: 500M places * 1KB = 500GB
- With metadata, indexes: ~2TB total
- User location cache: 100M users * 100 bytes = 10GB
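The estimates above follow from simple arithmetic; a quick back-of-envelope check (numbers taken directly from the requirements, rounded):

```python
# Back-of-envelope check of the scale estimates above.
PLACES = 500_000_000
PLACE_SIZE_BYTES = 1_000           # ~1KB per place
QUERIES_PER_DAY = 5_000_000_000
USERS = 100_000_000
USER_LOC_BYTES = 100

avg_qps = QUERIES_PER_DAY / 86_400                      # seconds in a day
place_storage_gb = PLACES * PLACE_SIZE_BYTES / 1e9
user_cache_gb = USERS * USER_LOC_BYTES / 1e9

print(f"Average QPS: {avg_qps:,.0f}")                   # ~57,870
print(f"Place data: {place_storage_gb:,.0f} GB")        # 500 GB
print(f"User location cache: {user_cache_gb:,.0f} GB")  # 10 GB
```

Peak load (200K QPS) is roughly 3.5x the average, which is typical for geographically clustered traffic with strong daily peaks.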
Step 2: High-Level Design
System Architecture
┌─────────────┐
│ Clients │ (Mobile, Web)
└──────┬──────┘
│
┌──────▼──────────────────────────────────────┐
│ API Gateway / Load Balancer │
│ (Rate Limiting, Authentication) │
└──────┬──────────────────────────────────────┘
│
┌──────▼───────────────────────────────────────┐
│ Location Service Cluster │
│ (Geohash, Quadtree, Proximity Algorithms) │
└──┬───────────────────────────────────────┬───┘
│ │
┌──▼─────────────────┐ ┌──────────▼──────────┐
│ Search Service │ │ Ranking Service │
│ (Elasticsearch/ │ │ (ML-based scoring) │
│ Redis GEO) │ └──────────┬──────────┘
└──┬─────────────────┘ │
│ │
┌──▼───────────────────────────────────────▼───┐
│ Place Service (CRUD) │
│ (Place metadata management) │
└──┬───────────────────────────────────────────┘
│
┌──▼──────────────────────────────────────────┐
│ Data Layer │
│ ┌────────────┐ ┌──────────┐ ┌─────────┐ │
│ │ PostgreSQL │ │ Redis │ │ S3/CDN │ │
│ │ (Master- │ │ (Cache) │ │(Photos) │ │
│ │ Replica) │ └──────────┘ └─────────┘ │
│ └────────────┘ │
└─────────────────────────────────────────────┘
Core Components
1. API Gateway:
- Authentication and authorization
- Rate limiting per user/API key
- Request routing and load balancing
- SSL termination
2. Location Service:
- Receives user coordinates and search radius
- Translates geographic coordinates to geospatial indexes
- Performs initial candidate selection
- Handles geospatial computations
3. Search Service:
- Executes proximity queries using geospatial indexes
- Filters results by category, hours, ratings
- Returns candidate list to ranking service
- Powered by Redis GEO or Elasticsearch
4. Ranking Service:
- Scores candidates based on multiple factors
- Distance-based scoring (closer = higher)
- Popularity signals (reviews, ratings, check-ins)
- Personalization based on user preferences
- ML models for relevance ranking
5. Place Service:
- CRUD operations for place data
- Manages place metadata (name, category, hours, photos)
- Handles place updates and deletions
- Synchronizes with search indexes
6. Data Layer:
- PostgreSQL: Primary source of truth for place data
- Redis: Geospatial index and cache layer
- Elasticsearch: Full-text search and complex geo queries
- S3/CDN: Static assets (photos, logos)
API Design
GET /v1/search/nearby
Parameters:
- latitude: double (required)
- longitude: double (required)
- radius: int (meters, default: 5000, max: 50000)
- category: string (optional)
- limit: int (default: 20, max: 100)
- offset: int (pagination)
- sort: string (distance, rating, popularity)
Response:
{
"results": [
{
"place_id": "uuid",
"name": "Blue Bottle Coffee",
"category": "cafe",
"location": {"lat": 37.7749, "lon": -122.4194},
"distance": 450, // meters
"rating": 4.5,
"price_level": 2,
"open_now": true,
"photos": ["url1", "url2"]
}
],
"total": 156,
"next_offset": 20
}
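A thin validation layer can enforce the defaults and caps above before a query reaches the search tier. This is a sketch with hypothetical names (`validate_nearby_params` is not part of any framework):

```python
def validate_nearby_params(lat: float, lon: float, radius: int = 5000,
                           limit: int = 20, offset: int = 0) -> dict:
    """Enforce the /v1/search/nearby parameter contract (sketch)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError("latitude must be in [-90, 90]")
    if not -180.0 <= lon <= 180.0:
        raise ValueError("longitude must be in [-180, 180]")
    # Clamp to the documented limits instead of rejecting outright
    radius = min(max(radius, 1), 50_000)   # max 50km
    limit = min(max(limit, 1), 100)        # max 100 results
    offset = max(offset, 0)
    return {"lat": lat, "lon": lon, "radius": radius,
            "limit": limit, "offset": offset}
```

Whether to clamp out-of-range values or return HTTP 400 is a policy choice; clamping keeps old clients working if limits are later tightened.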
Step 3: Deep Dive into Critical Components
3.1 Geospatial Indexing Strategies
The core challenge is efficiently finding nearby places among 500M+ locations. Traditional one-dimensional indexes (e.g., separate B-tree indexes on latitude and longitude) handle 2D spatial queries poorly, so a dedicated geospatial index is needed.
Option 1: Geohash
How it works: Geohash encodes latitude/longitude into a short alphanumeric string. Nearby locations share common prefixes.
San Francisco: 9q8yy (37.7749, -122.4194)
Oakland: 9q9p1 (37.8044, -122.2712)
Properties:
- 4-character geohash: ~39km x 20km cell
- 5-character geohash: ~4.9km x 4.9km cell
- 6-character geohash: ~1.2km x 0.61km cell
- 7-character geohash: ~153m x 153m cell
(cell width and height alternate because each character adds 5 bits, split unevenly between longitude and latitude)
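To make the prefix property concrete, here is a minimal pure-Python encoder implementing the standard geohash base32 algorithm (for illustration; production code would use a library):

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash_encode(lat: float, lon: float, precision: int) -> str:
    """Standard geohash: interleave lon/lat bisection bits, base32-encode."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    bits, bit_count, even = 0, 0, True
    out = []
    while len(out) < precision:
        if even:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits, lon_lo = (bits << 1) | 1, mid
            else:
                bits, lon_hi = bits << 1, mid
        else:     # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits, lat_lo = (bits << 1) | 1, mid
            else:
                bits, lat_hi = bits << 1, mid
        even = not even
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base32 character
            out.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(out)

print(geohash_encode(37.7749, -122.4194, 5))  # 9q8yy (San Francisco)
print(geohash_encode(37.8044, -122.2712, 5))  # 9q9p1 (Oakland)
```

Note that the two cities share only the "9q" prefix even though they are ~13km apart; this discontinuity at cell borders is why radius queries must also check neighboring cells.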
Implementation in Redis:
# Add places to geospatial index
GEOADD places:geo -122.4194 37.7749 "place:123"
GEOADD places:geo -122.2712 37.8044 "place:456"
# Search within radius (GEORADIUS is deprecated since Redis 6.2)
GEORADIUS places:geo -122.4194 37.7749 5 km WITHDIST WITHCOORD COUNT 20
# Equivalent modern command
GEOSEARCH places:geo FROMLONLAT -122.4194 37.7749 BYRADIUS 5 km ASC COUNT 20 WITHCOORD WITHDIST
# Search by existing member
GEORADIUSBYMEMBER places:geo "place:123" 10 km WITHDIST
Advantages:
- Simple to implement
- Fast lookups using prefix matching
- Works with standard databases (index on geohash string)
- Consistent grid size at same precision level
Disadvantages:
- Edge cases: Places just across geohash boundaries might be missed
- Requires checking neighboring geohashes for border queries
- Fixed grid doesn’t adapt to density
Production Implementation:
import geohash  # python-geohash; provides encode() and neighbors()

def find_nearby_geohashes(lat, lon, radius_km):
    """
    Returns the list of geohashes to check for a given radius.
    For a ~5km radius, precision-5 cells (~4.9km) with the center
    cell plus its 8 neighbors cover the search area.
    """
    center_hash = geohash.encode(lat, lon, precision=5)
    return [center_hash] + geohash.neighbors(center_hash)

def search_nearby(lat, lon, radius_km, category=None):
    geohashes = find_nearby_geohashes(lat, lon, radius_km)
    candidates = []
    for gh in geohashes:
        # Query database with geohash prefix; apply category filter only if given
        if category:
            places = db.query(
                "SELECT * FROM places WHERE geohash LIKE ? AND category = ?",
                (gh + '%', category)
            )
        else:
            places = db.query(
                "SELECT * FROM places WHERE geohash LIKE ?",
                (gh + '%',)
            )
        candidates.extend(places)
    # Filter candidates by true great-circle distance
    results = []
    for place in candidates:
        distance = haversine(lat, lon, place.lat, place.lon)
        if distance <= radius_km:
            results.append((place, distance))
    return sorted(results, key=lambda x: x[1])
Option 2: Quadtree
How it works: Quadtree recursively divides 2D space into four quadrants. Dense areas get more subdivisions, sparse areas remain coarse.
Root (World)
├─ NW (North America)
│ ├─ NW (Pacific Northwest)
│ ├─ NE (Northeast US)
│ ├─ SW (Southwest US)
│ └─ SE (Southeast US)
├─ NE (Europe)
├─ SW (South America)
└─ SE (Asia)
Structure:
class QuadTreeNode:
def __init__(self, boundary, capacity=50):
self.boundary = boundary # (min_lat, max_lat, min_lon, max_lon)
self.capacity = capacity
self.places = []
self.divided = False
self.nw = self.ne = self.sw = self.se = None
    def subdivide(self):
        b = self.boundary
        mid_lat = (b.min_lat + b.max_lat) / 2
        mid_lon = (b.min_lon + b.max_lon) / 2
        self.nw = QuadTreeNode(Boundary(mid_lat, b.max_lat, b.min_lon, mid_lon))
        self.ne = QuadTreeNode(Boundary(mid_lat, b.max_lat, mid_lon, b.max_lon))
        self.sw = QuadTreeNode(Boundary(b.min_lat, mid_lat, b.min_lon, mid_lon))
        self.se = QuadTreeNode(Boundary(b.min_lat, mid_lat, mid_lon, b.max_lon))
        self.divided = True

    def _insert_to_child(self, place):
        return (self.nw.insert(place) or self.ne.insert(place) or
                self.sw.insert(place) or self.se.insert(place))

    def insert(self, place):
        if not self.boundary.contains(place.location):
            return False
        if not self.divided and len(self.places) < self.capacity:
            self.places.append(place)
            return True
        if not self.divided:
            self.subdivide()
            # Redistribute existing places into the new children
            for p in self.places:
                self._insert_to_child(p)
            self.places = []
        return self._insert_to_child(place)
def search_radius(self, center, radius):
results = []
if not self.boundary.intersects_circle(center, radius):
return results
# Check leaf nodes
for place in self.places:
if distance(center, place.location) <= radius:
results.append(place)
# Recurse to children
if self.divided:
results.extend(self.nw.search_radius(center, radius))
results.extend(self.ne.search_radius(center, radius))
results.extend(self.sw.search_radius(center, radius))
results.extend(self.se.search_radius(center, radius))
return results
Advantages:
- Adaptive: More subdivisions in dense areas (Manhattan) vs sparse areas (rural)
- Efficient for k-nearest neighbor queries
- No edge case issues like geohash
- Memory-efficient for sparse regions
Disadvantages:
- Complex to implement and maintain
- Expensive to rebalance on updates
- Difficult to distribute across multiple servers
- In-memory structure, hard to persist
Best for: In-memory caching layer, not primary storage.
Option 3: R-tree
How it works: R-tree is similar to B-tree but for multi-dimensional data. Groups nearby objects into hierarchical bounding boxes.
Use with PostgreSQL + PostGIS:
-- Create table with geospatial column
CREATE TABLE places (
id UUID PRIMARY KEY,
name VARCHAR(255),
location GEOGRAPHY(POINT, 4326),
category VARCHAR(50)
);
-- Create spatial index (uses R-tree internally)
CREATE INDEX idx_places_location ON places USING GIST(location);
-- Query nearby places
SELECT
id,
name,
ST_Distance(location, ST_MakePoint(-122.4194, 37.7749)::geography) AS distance
FROM places
WHERE ST_DWithin(
location,
ST_MakePoint(-122.4194, 37.7749)::geography,
5000 -- 5km in meters
)
AND category = 'restaurant'
ORDER BY distance
LIMIT 20;
Advantages:
- Battle-tested with PostgreSQL PostGIS
- Handles complex geospatial queries
- ACID transactions for updates
- Production-grade reliability
Disadvantages:
- Slower than in-memory solutions (Redis GEO)
- Database load increases with query volume
- Scaling requires read replicas and sharding
3.2 Redis GEO for High-Performance Proximity Search
Architecture:
┌──────────────────────────────────────┐
│ Redis GEO Cluster │
│ ┌────────┐ ┌────────┐ ┌────────┐ │
│ │Shard 1 │ │Shard 2 │ │Shard 3 │ │
│ │ US-West│ │ US-East│ │ Europe │ │
│ └────────┘ └────────┘ └────────┘ │
└──────────────────────────────────────┘
Why Redis GEO:
- Sub-millisecond latency
- 100K+ queries per second per instance
- Built-in geospatial commands
- Sorted set implementation using geohash
Production Implementation:
import redis
from typing import List, Dict
class ProximitySearchService:
def __init__(self):
self.redis_client = redis.Redis(
host='redis-cluster.internal',
port=6379,
decode_responses=True,
socket_connect_timeout=2,
socket_timeout=2
)
    def index_place(self, place_id: str, name: str, lat: float, lon: float,
                    category: str, rating: float):
        """
        Index place in Redis GEO by category.
        Key pattern: places:geo:{category}
        """
        key = f"places:geo:{category}"
        self.redis_client.geoadd(key, (lon, lat, place_id))
        # Also add to global index for category-agnostic search
        self.redis_client.geoadd("places:geo:all", (lon, lat, place_id))
        # Store place metadata separately
        self.redis_client.hset(f"place:{place_id}", mapping={
            "name": name,
            "category": category,
            "rating": rating,
            "lat": lat,
            "lon": lon
        })
def search_nearby(self, lat: float, lon: float, radius_m: int,
category: str = None, limit: int = 20) -> List[Dict]:
"""
Search nearby places using Redis GEORADIUS.
"""
key = f"places:geo:{category}" if category else "places:geo:all"
# GEORADIUS with distance and coordinates
results = self.redis_client.georadius(
name=key,
longitude=lon,
latitude=lat,
radius=radius_m,
unit='m',
withdist=True,
withcoord=True,
count=limit,
sort='ASC' # Closest first
)
# Fetch place metadata
places = []
for place_id, distance, coords in results:
metadata = self.redis_client.hgetall(f"place:{place_id}")
places.append({
"place_id": place_id,
"distance": distance,
"location": {"lat": coords[1], "lon": coords[0]},
**metadata
})
return places
def search_knn(self, lat: float, lon: float, k: int = 10,
category: str = None) -> List[Dict]:
"""
Find k-nearest neighbors regardless of distance.
Start with small radius and expand until k results found.
"""
        radius = 1000       # Start with 1km
        max_radius = 50000  # Cap at 50km
        results = []
        while radius <= max_radius:
            results = self.search_nearby(lat, lon, radius, category, limit=k)
            if len(results) >= k:
                return results[:k]
            radius *= 2     # Exponentially expand the search radius
        return results      # Fewer than k places exist within max_radius
Redis GEO Internals:
- Uses sorted set with geohash as score
- Geohash is 52-bit integer (fits in Redis score)
- GEORADIUS queries sorted set by geohash range
- Performance: O(N+log(M)) where N = results, M = total items
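The 52-bit encoding can be sketched as bit interleaving of quantized coordinates. This is illustrative only; Redis's exact quantization details may differ:

```python
def geohash52(lat: float, lon: float) -> int:
    """Interleave 26 quantized longitude bits with 26 latitude bits."""
    # Quantize each coordinate into a 26-bit integer
    lat_q = min(int((lat + 90.0) / 180.0 * (1 << 26)), (1 << 26) - 1)
    lon_q = min(int((lon + 180.0) / 360.0 * (1 << 26)), (1 << 26) - 1)
    code = 0
    for i in range(25, -1, -1):  # MSB first: lon bit, then lat bit
        code = (code << 1) | ((lon_q >> i) & 1)
        code = (code << 1) | ((lat_q >> i) & 1)
    return code  # 52-bit integer, exactly representable as a sorted-set score

sf = geohash52(37.7749, -122.4194)
oak = geohash52(37.8044, -122.2712)
# Nearby points share their high-order bits, so a radius query becomes
# a small number of contiguous range scans over the sorted set.
print(sf >> 40 == oak >> 40)  # True: same high-order bits
```

Because doubles carry a 53-bit mantissa, a 52-bit integer score is stored exactly, which is why Redis caps the geohash at 52 bits.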
Sharding Strategy: Shard by geographic region to keep related data together:
def get_redis_shard(lat: float, lon: float) -> str:
"""Route to appropriate Redis shard based on location."""
if -125 < lon < -65 and 25 < lat < 50:
return "redis-us"
elif -10 < lon < 40 and 35 < lat < 70:
return "redis-eu"
elif 100 < lon < 145 and 20 < lat < 45:
return "redis-asia"
else:
return "redis-global"
3.3 Elasticsearch for Complex Geo Queries
When to use Elasticsearch:
- Need full-text search (“coffee near me”)
- Complex filters (category AND open_now AND rating > 4.0)
- Faceted search (aggregate by category)
- Geospatial bounding box queries
Index Mapping:
{
"mappings": {
"properties": {
"place_id": {"type": "keyword"},
"name": {
"type": "text",
"fields": {
"keyword": {"type": "keyword"}
}
},
"location": {"type": "geo_point"},
"category": {"type": "keyword"},
"rating": {"type": "float"},
"price_level": {"type": "integer"},
"open_now": {"type": "boolean"},
"hours": {
"type": "nested",
"properties": {
"day": {"type": "keyword"},
"open": {"type": "keyword"},
"close": {"type": "keyword"}
}
},
"popularity_score": {"type": "float"}
}
}
}
Geo Query Examples:
# Geo distance query
{
"query": {
"bool": {
"must": {
"match": {"category": "restaurant"}
},
"filter": {
"geo_distance": {
"distance": "5km",
"location": {
"lat": 37.7749,
"lon": -122.4194
}
}
}
}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 37.7749,
"lon": -122.4194
},
"order": "asc",
"unit": "km"
}
}
]
}
# Geo bounding box query
{
"query": {
"bool": {
"filter": {
"geo_bounding_box": {
"location": {
"top_left": {"lat": 37.8, "lon": -122.5},
"bottom_right": {"lat": 37.7, "lon": -122.3}
}
}
}
}
}
}
# Complex query with multiple filters
{
"query": {
"bool": {
"must": [
{"match": {"name": "coffee"}}
],
"filter": [
{"term": {"category": "cafe"}},
{"term": {"open_now": true}},
{"range": {"rating": {"gte": 4.0}}},
{
"geo_distance": {
"distance": "2km",
"location": {"lat": 37.7749, "lon": -122.4194}
}
}
]
}
},
"sort": [
{"_score": "desc"},
{"_geo_distance": {
"location": {"lat": 37.7749, "lon": -122.4194},
"order": "asc"
}}
]
}
Sharding by Region:
# Create index per region for better performance
indices = [
"places-us-west",
"places-us-east",
"places-europe",
"places-asia"
]
def search_places(lat, lon, query_params):
"""Search appropriate regional index."""
index = get_index_by_location(lat, lon)
response = es_client.search(
index=index,
body={
"query": build_geo_query(lat, lon, query_params),
"size": 20
}
)
return response['hits']['hits']
3.4 Ranking Service
Ranking Factors:
- Distance (primary)
- Popularity (reviews, check-ins)
- Rating
- Personalization (user preferences)
- Freshness (newly opened places)
- Business tier (promoted listings)
Scoring Formula:
def calculate_score(place, user_location, user_prefs):
"""
Multi-factor scoring for place ranking.
"""
# Distance score (inverse exponential)
distance_km = haversine(user_location, place.location)
distance_score = math.exp(-distance_km / 5.0) # Decay over 5km
# Rating score (normalized)
rating_score = place.rating / 5.0
# Popularity score (log scale)
popularity_score = math.log10(place.review_count + 1) / 4.0
# Personalization score
category_match = 1.0 if place.category in user_prefs else 0.5
# Weighted combination
total_score = (
0.50 * distance_score +
0.20 * rating_score +
0.15 * popularity_score +
0.15 * category_match
)
return total_score
# Sort by score
ranked_places = sorted(
candidates,
key=lambda p: calculate_score(p, user_loc, user_prefs),
reverse=True
)
ML-Based Ranking:
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
class MLRankingService:
def __init__(self):
self.model = self.load_model()
def extract_features(self, place, user_location, user_prefs):
"""Extract features for ML model."""
distance = haversine(user_location, place.location)
return np.array([
distance,
place.rating,
place.review_count,
place.price_level,
int(place.open_now),
place.popularity_score,
int(place.category in user_prefs),
place.days_since_opened
])
def rank(self, places, user_location, user_prefs):
"""Rank places using ML model."""
features = [
self.extract_features(p, user_location, user_prefs)
for p in places
]
scores = self.model.predict(features)
ranked = sorted(
zip(places, scores),
key=lambda x: x[1],
reverse=True
)
return [place for place, score in ranked]
3.5 Handling Moving Objects (Users)
Challenge: User locations change every few seconds, but constantly reindexing them in the geospatial structure would be wasteful.
Solution: Separate User Location Cache
class UserLocationService:
def __init__(self):
self.redis = redis.Redis()
def update_user_location(self, user_id: str, lat: float, lon: float):
"""
Cache user location with TTL.
No need to index in geospatial structure.
"""
self.redis.setex(
f"user:location:{user_id}",
300, # 5 minute TTL
json.dumps({"lat": lat, "lon": lon, "timestamp": time.time()})
)
def get_user_location(self, user_id: str):
"""Retrieve cached user location."""
data = self.redis.get(f"user:location:{user_id}")
return json.loads(data) if data else None
For ride-sharing (moving objects that need to be searched):
# Update driver location in Redis GEO
def update_driver_location(driver_id, lat, lon):
    # GEOADD overwrites an existing member's position, so no removal is needed
    redis.geoadd("drivers:active", (lon, lat, driver_id))
    # Refresh driver metadata with a TTL; expiry marks the driver as stale
    redis.setex(f"driver:{driver_id}", 60, json.dumps({
        "lat": lat,
        "lon": lon,
        "status": "available"
    }))
# Set expiry on driver metadata
redis.setex(f"driver:{driver_id}", 60, json.dumps({
"lat": lat,
"lon": lon,
"status": "available"
}))
# Cleanup expired drivers
def cleanup_expired_drivers():
"""Remove drivers who haven't updated in 60 seconds."""
all_drivers = redis.zrange("drivers:active", 0, -1)
for driver_id in all_drivers:
if not redis.exists(f"driver:{driver_id}"):
redis.zrem("drivers:active", driver_id)
3.6 Place Data Management and Updates
Write Path:
Place Update → API → Place Service → PostgreSQL (Write)
↓
Update Redis GEO (Async)
↓
Update Elasticsearch (Async)
Implementation:
class PlaceService:
def __init__(self):
self.db = PostgresConnection()
self.redis = RedisConnection()
self.es = ElasticsearchConnection()
self.mq = KafkaProducer()
def create_place(self, place_data):
"""Create new place."""
# 1. Write to PostgreSQL (source of truth)
place_id = self.db.execute("""
INSERT INTO places (name, category, location, rating)
VALUES (%s, %s, ST_Point(%s, %s), %s)
RETURNING id
""", (
place_data['name'],
place_data['category'],
place_data['lon'],
place_data['lat'],
place_data['rating']
))
# 2. Publish to Kafka for async indexing
self.mq.send('place-updates', {
'event': 'create',
'place_id': place_id,
'data': place_data
})
return place_id
def update_place(self, place_id, updates):
"""Update existing place."""
# Update PostgreSQL
self.db.execute("""
UPDATE places
SET name = %s, rating = %s, updated_at = NOW()
WHERE id = %s
""", (updates['name'], updates['rating'], place_id))
# Publish update event
self.mq.send('place-updates', {
'event': 'update',
'place_id': place_id,
'updates': updates
})
# Async indexer consumer
class PlaceIndexer:
def consume_updates(self):
"""Process place updates from Kafka."""
for message in kafka_consumer:
event = message.value
if event['event'] == 'create':
self._index_new_place(event['place_id'], event['data'])
elif event['event'] == 'update':
self._update_indexes(event['place_id'], event['updates'])
elif event['event'] == 'delete':
self._remove_from_indexes(event['place_id'])
def _index_new_place(self, place_id, data):
"""Add to Redis GEO and Elasticsearch."""
# Index in Redis GEO
category = data['category']
redis.geoadd(
f"places:geo:{category}",
(data['lon'], data['lat'], place_id)
)
# Index in Elasticsearch
es.index(
index='places',
id=place_id,
document={
'place_id': place_id,
'name': data['name'],
'category': category,
'location': {'lat': data['lat'], 'lon': data['lon']},
'rating': data['rating']
}
)
3.7 Caching Strategy
Multi-Layer Cache:
L1: Application Cache (In-Memory)
class CacheLayer:
    def __init__(self, max_entries=1000):
        self.local_cache = {}  # In-process dict, bounded below
        self.max_entries = max_entries

    def get_cache_key(self, lat, lon, radius, category):
        """Generate cache key from search params."""
        # Round to 3 decimal places (~100m precision) so that
        # nearby queries share a cache entry
        lat_rounded = round(lat, 3)
        lon_rounded = round(lon, 3)
        return f"{lat_rounded}:{lon_rounded}:{radius}:{category}"

    def search_cached(self, lat, lon, radius, category):
        """Serve frequent searches from the in-process cache."""
        cache_key = self.get_cache_key(lat, lon, radius, category)
        # Check local cache first
        if cache_key in self.local_cache:
            return self.local_cache[cache_key]
        # Execute search
        results = self.search_nearby(lat, lon, radius, category)
        # Cache results, evicting the oldest entry when full
        if len(self.local_cache) >= self.max_entries:
            self.local_cache.pop(next(iter(self.local_cache)))
        self.local_cache[cache_key] = results
        return results
L2: Redis Cache (Distributed)
import geohash  # python-geohash

def search_with_cache(lat, lon, radius, category):
    """Search with Redis cache layer."""
    # Key on a geohash cell so invalidation can target an area by prefix
    cell = geohash.encode(lat, lon, precision=6)
    cache_key = f"search:{cell}:{radius}:{category}"
    # Check cache
    cached = redis.get(cache_key)
    if cached:
        return json.loads(cached)
    # Execute search
    results = proximity_search(lat, lon, radius, category)
    # Cache for 5 minutes
    redis.setex(cache_key, 300, json.dumps(results))
    return results
Cache Invalidation:
def invalidate_place_cache(place_id):
    """Invalidate cached searches when a place is updated."""
    # Get place location
    place = db.get_place(place_id)
    # Invalidate cache entries for the place's geohash cell and its neighbors
    geohashes = get_nearby_geohashes(place.lat, place.lon, radius=10)
    for gh in geohashes:
        # SCAN instead of KEYS to avoid blocking Redis on large keyspaces
        keys = list(redis.scan_iter(match=f"search:{gh}:*"))
        if keys:
            redis.delete(*keys)
3.8 Database Sharding by Region
Sharding Strategy: Partition data by geographic region to improve query performance and enable regional isolation.
# Shard mapping
SHARDS = {
'us-west': {
'bounds': {'min_lat': 32, 'max_lat': 49, 'min_lon': -125, 'max_lon': -100},
'db': 'postgres-us-west.internal'
},
'us-east': {
'bounds': {'min_lat': 25, 'max_lat': 48, 'min_lon': -100, 'max_lon': -65},
'db': 'postgres-us-east.internal'
},
'europe': {
'bounds': {'min_lat': 35, 'max_lat': 70, 'min_lon': -10, 'max_lon': 40},
'db': 'postgres-eu.internal'
}
}
def get_shard_for_location(lat, lon):
"""Route query to appropriate database shard."""
for shard_name, config in SHARDS.items():
bounds = config['bounds']
if (bounds['min_lat'] <= lat <= bounds['max_lat'] and
bounds['min_lon'] <= lon <= bounds['max_lon']):
return config['db']
return 'postgres-global.internal' # Fallback
# Query router
def query_places(lat, lon, radius_km):
"""Route query to correct shard."""
db_conn = get_shard_for_location(lat, lon)
# For cross-boundary queries, query multiple shards
if is_near_boundary(lat, lon, radius_km):
shards = get_affected_shards(lat, lon, radius_km)
results = []
for shard in shards:
results.extend(query_shard(shard, lat, lon, radius_km))
return merge_and_sort(results, lat, lon)
else:
return query_shard(db_conn, lat, lon, radius_km)
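The router above assumes an `is_near_boundary` helper. One possible sketch, reusing the shard bounds (the helper name and the kilometers-per-degree constants are illustrative assumptions):

```python
import math

# Same bounds as the SHARDS table above (abbreviated for self-containment)
SHARDS = {
    'us-west': {'bounds': {'min_lat': 32, 'max_lat': 49,
                           'min_lon': -125, 'max_lon': -100}},
    'us-east': {'bounds': {'min_lat': 25, 'max_lat': 48,
                           'min_lon': -100, 'max_lon': -65}},
}

def is_near_boundary(lat: float, lon: float, radius_km: float,
                     shards: dict = SHARDS) -> bool:
    """True if the search circle may cross a shard's bounding-box edge."""
    d_lat = radius_km / 110.57  # ~km per degree of latitude
    d_lon = radius_km / (111.32 * math.cos(math.radians(lat)))  # shrinks with latitude
    for cfg in shards.values():
        b = cfg['bounds']
        if b['min_lat'] <= lat <= b['max_lat'] and b['min_lon'] <= lon <= b['max_lon']:
            # Inside this shard: near a boundary if the radius reaches any edge
            return (lat - d_lat < b['min_lat'] or lat + d_lat > b['max_lat'] or
                    lon - d_lon < b['min_lon'] or lon + d_lon > b['max_lon'])
    return True  # Outside every shard's box: fan out to be safe

print(is_near_boundary(37.77, -122.42, 5))  # False: well inside us-west
print(is_near_boundary(40.0, -100.2, 50))   # True: 50km reaches the us-east border
```

Erring on the side of returning True only costs an extra shard query; returning False incorrectly silently drops results near a border.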
Step 4: Wrap-Up
Final Architecture Summary
The production-grade proximity service uses a multi-layered approach:
Geospatial Indexing:
- Redis GEO for low-latency proximity queries (primary)
- Elasticsearch for complex filters and full-text search
- PostgreSQL + PostGIS as source of truth
- Geohash for cache keys and routing
Data Flow:
- Writes: PostgreSQL → Kafka → Redis GEO + Elasticsearch
- Reads: Application → L1 Cache → Redis GEO → Ranking → Response
- Complex queries: Elasticsearch with geo filters
Scalability:
- Geographic sharding for PostgreSQL and Elasticsearch
- Redis cluster with regional shards
- Multi-layer caching (application + Redis)
- Async indexing via Kafka
Performance:
- p99 latency < 200ms
- 200K QPS capacity with auto-scaling
- 99.99% availability with multi-region deployment
Key Design Decisions
- Redis GEO as primary search layer - Sub-millisecond latency for simple proximity queries
- PostgreSQL as source of truth - ACID guarantees for place data
- Async indexing - Eventual consistency acceptable for search indexes
- Geographic sharding - Keeps related data together, reduces cross-region queries
- Multi-layer caching - Reduces load on search layer by 80%+
Extensions and Future Work
Real-time place availability:
- WebSocket connection for live updates
- Redis Pub/Sub for place status changes
- Event-driven updates to mobile clients
Machine learning enhancements:
- Personalized ranking using collaborative filtering
- Demand prediction for popular areas
- Anomaly detection for fake reviews
Advanced features:
- Routing and navigation integration
- AR-based place discovery
- Social features (friend check-ins, recommendations)
This design handles 500M+ places, 100M+ DAU, and 200K+ QPS while maintaining sub-200ms p99 latency, making it production-ready at scale.