Design Zillow
Zillow is a comprehensive real estate marketplace platform that aggregates property listings, provides automated home valuations (Zestimate), connects buyers with agents, and offers tools for property search, saved searches, and market analytics. At this scale, we need to handle millions of daily active users searching across 135+ million properties with real-time price updates, complex geospatial queries, and ML-powered recommendations.
Designing Zillow presents unique challenges: geospatial property search, automated machine-learning-based home valuations, MLS data aggregation from hundreds of sources, saved search alerting, and scaling to 135+ million properties with sub-second search latency.
Step 1: Understand the Problem and Establish Design Scope
Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.
Functional Requirements
Core Requirements (Priority 1-4):
- Users should be able to search for properties with filters including price, beds, baths, square footage, property type, and location.
- Users should be able to browse properties on a map with geospatial search capabilities including polygon and radius queries.
- Users should be able to view property detail pages with photos, videos, virtual tours, price history, and Zestimate valuations.
- Homeowners should be able to get automated home valuations (Zestimate) with confidence intervals.
Below the Line (Out of Scope):
- Users should be able to save searches and receive customizable alerts via email, push notifications, or SMS.
- Users should be able to save favorite properties and organize them into collections.
- Users should be able to compare multiple properties side-by-side.
- Users should be able to connect with real estate agents and request showings.
- Agents should be able to create and manage listings with performance analytics.
- Users should be able to view neighborhood insights including school ratings and walkability scores.
Non-Functional Requirements
Core Requirements:
- The system should provide low latency search results with P99 latency under 200ms for text searches and under 300ms for map-based geospatial queries.
- The system should handle high throughput with 500M+ searches per day and 50K+ new listings ingested daily.
- The system should ensure data quality with listing data freshness under 15 minutes from MLS sources and Zestimate median error rate under 2% in major markets.
Below the Line (Out of Scope):
- The system should provide 99.99% uptime for search and browsing capabilities.
- The system should support graceful degradation, serving stale Zestimate values if real-time calculation fails.
- The system should ensure de-duplication of listings from multiple sources with address standardization and geocoding accuracy above 99%.
- The system should support multi-region deployment for disaster recovery.
Clarification Questions & Assumptions:
- Platform: Web and mobile apps (iOS and Android) for both homebuyers and agents.
- Scale: 200M+ monthly active users, 135M+ properties indexed, 10M+ Zestimate calculations per day.
- Geographic Coverage: United States with 800+ Multiple Listing Service (MLS) integrations.
- Update Frequency: MLS feeds update every 15 minutes, Zestimate calculations cached for 24 hours.
- Search Types: Text search, map-based geospatial search, saved search alerts.
Step 2: Propose High-Level Design and Get Buy-in
Planning the Approach
Before moving on to designing the system, it’s important to plan your strategy. For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don’t get lost in the weeds.
Defining the Core Entities
To satisfy our key functional requirements, we’ll need the following entities:
Property: The core entity representing a real estate listing. Contains address information, location coordinates (latitude and longitude), price, bedrooms, bathrooms, square footage, lot size, year built, property type, listing status (active, pending, sold, off-market), photos, and metadata. Properties are indexed from Multiple Listing Services across the country.
Zestimate: An automated home valuation generated by machine learning models. Includes the estimated market value, confidence score, value range (upper and lower bounds), and the number of comparable properties used in the calculation. Zestimates are calculated using 100+ features including property characteristics, location data, market trends, and recent comparable sales.
User: Any registered user of the platform. Includes personal information, authentication credentials, notification preferences, saved properties, browsing history, and user segmentation data for personalization. Users can be homebuyers, sellers, or homeowners checking their property value.
Agent: Real estate professionals registered on the platform. Contains agent profile information, certifications, service areas, performance metrics (transaction volume, average sale price, days to close), review ratings, and lead assignment preferences.
Saved Search: User-defined search criteria with notification preferences. Includes search filters (location, price range, beds, baths, property type), alert frequency (instant, daily, weekly), last run timestamp, and active status. When new properties match the criteria, alerts are sent to users.
MLS Listing: Raw listing data ingested from Multiple Listing Services. Contains MLS-specific identifiers, source information, raw field mappings, and ingestion timestamps. This entity is normalized and de-duplicated before becoming a Property in the main database.
API Design
Property Search Endpoint: Used by clients to search for properties with various filters and sorting options.
POST /search -> SearchResults
Body: {
location: { lat, long, radius } | polygon | address,
priceMin: number,
priceMax: number,
beds: number,
baths: number,
propertyType: string[],
listingStatus: string,
sortBy: string,
page: number
}
Map Search Endpoint: Used by clients to perform geospatial searches within map boundaries or drawn polygons.
POST /search/map -> MapSearchResults
Body: {
bounds: { northEast: {lat, long}, southWest: {lat, long} } | polygon,
zoom: number,
filters: {...}
}
Property Details Endpoint: Retrieves comprehensive details for a specific property including photos, price history, and Zestimate.
GET /properties/:propertyId -> Property
Zestimate Endpoint: Retrieves or calculates the automated home valuation for a property.
GET /properties/:propertyId/zestimate -> Zestimate
Query params: {
forceRefresh: boolean
}
Save Search Endpoint: Allows users to save search criteria with alert preferences.
POST /saved-searches -> SavedSearch
Body: {
searchCriteria: {...},
alertFrequency: "instant" | "daily" | "weekly",
notificationChannels: string[]
}
High-Level Architecture
Let’s build up the system sequentially, addressing each functional requirement:
1. Users should be able to search for properties with filters
The core components necessary to fulfill property search are:
- Client Applications: Web and mobile interfaces where users input search criteria. Available on iOS, Android, and responsive web.
- API Gateway: Entry point for all client requests. Handles authentication, rate limiting, request routing, and metrics collection. Uses Kong or AWS API Gateway.
- Search Service: Manages property search queries using Elasticsearch. Handles full-text search, faceted filtering, sorting, and pagination. Optimized for sub-200ms response times.
- Elasticsearch Cluster: Distributed search engine storing indexed property data. Configured with geo-point and geo-shape indexing for location-based queries. Uses sharding by geographic region with replica sets for high availability.
- Listing Service: Source of truth for property data. Manages CRUD operations for listings, handles MLS feed ingestion, performs de-duplication and address standardization. Uses Change Data Capture (CDC) to keep Elasticsearch synchronized.
- Database (PostgreSQL): Relational database storing property metadata, MLS mappings, listing history, and referential integrity constraints. Partitioned by state for scaling.
Property Search Flow:
- The user enters search criteria (location, price range, beds, baths) into the client app, which sends a POST request to the search endpoint.
- The API gateway receives the request, handles authentication and rate limiting, then forwards to the Search Service.
- The Search Service constructs an Elasticsearch query with boolean filters, range queries, and term matching based on the user’s criteria.
- Elasticsearch executes the query across its distributed shards, returning matching property documents with aggregations for faceted search.
- The Search Service formats the results, adds additional metadata, and returns paginated results to the client through the API Gateway.
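The query construction in step 3 can be sketched as a small translation function. This is a minimal illustration, not Zillow's actual code: the index field names (`price`, `beds`, `status`, `coordinates`, `listedAt`) and the page size of 20 are assumptions.

```python
def build_search_query(criteria: dict) -> dict:
    """Translate user search criteria into an Elasticsearch bool-query body.

    All field names are illustrative; the real index mapping would define
    the actual fields. Filters go in filter context so they are cached
    and contribute no relevance score.
    """
    filters = []
    if "priceMin" in criteria or "priceMax" in criteria:
        price_range = {}
        if "priceMin" in criteria:
            price_range["gte"] = criteria["priceMin"]
        if "priceMax" in criteria:
            price_range["lte"] = criteria["priceMax"]
        filters.append({"range": {"price": price_range}})
    if "beds" in criteria:
        # "3 beds" in real-estate search conventionally means "3 or more".
        filters.append({"range": {"beds": {"gte": criteria["beds"]}}})
    if "listingStatus" in criteria:
        filters.append({"term": {"status": criteria["listingStatus"]}})
    if "location" in criteria:  # {lat, long, radius}-style radius search
        loc = criteria["location"]
        filters.append({
            "geo_distance": {
                "distance": f"{loc['radius']}km",
                "coordinates": {"lat": loc["lat"], "lon": loc["long"]},
            }
        })
    page = criteria.get("page", 1)
    return {
        "query": {"bool": {"filter": filters}},
        "from": (page - 1) * 20,  # offset pagination, 20 results per page
        "size": 20,
        "sort": [{criteria.get("sortBy", "listedAt"): "desc"}],
    }
```

The Search Service would send this body to Elasticsearch and post-process the hits into the paginated response.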
2. Users should be able to browse properties on a map with geospatial search
We extend our search capabilities with geospatial components:
- Geospatial Index (H3): Uber’s H3 hexagonal hierarchical spatial index used for map clustering and heatmaps. Properties are indexed using H3 cells at multiple resolutions, allowing dynamic clustering based on zoom level.
- Map Tile Service: Generates vector tiles for map visualization with property clusters. Uses adaptive resolution based on zoom level to balance performance and detail.
Map Search Flow:
- The user pans or zooms on a map interface, triggering a map search request with geographic bounds or a drawn polygon.
- The API gateway forwards the request to the Search Service with boundary coordinates.
- The Search Service uses Elasticsearch geo-shape queries with polygon or bounding box filters to find properties within the visible area.
- For clustering at lower zoom levels, the service queries properties by H3 cell identifiers, aggregating counts and average prices per hexagon.
- Results are returned with either individual property markers (high zoom) or cluster data (low zoom) to prevent overwhelming the map interface.
Polygon Search Optimization: For user-drawn polygons, we pre-process and simplify the geometry to reduce query complexity. Complex polygons are decomposed into simpler shapes, and an initial bounding box filter narrows the result set before applying the precise polygon intersection test.
3. Users should be able to view property details with photos, price history, and Zestimate
We add services for rich property details:
- Image Service: Manages property photos, videos, and virtual tours. Stores original files in S3 or similar object storage, generates thumbnails in multiple sizes, and serves content through a CDN for fast global delivery.
- Price History Service: Tracks property price changes over time, maintains historical listing data, and generates trend charts for property detail pages.
- Cache Layer (Redis): Caches frequently accessed property details, search results, and Zestimate values to reduce database load and improve response times.
Property Detail Flow:
- The user clicks on a property from search results, sending a GET request to the property details endpoint.
- The API gateway forwards the request to the Listing Service.
- The Listing Service first checks the Redis cache for property data. On cache miss, it queries the PostgreSQL database.
- Concurrently, it fetches the property’s photos from the Image Service and price history from the Price History Service.
- It also retrieves or triggers calculation of the Zestimate from the Zestimate Service.
- All data is aggregated, cached in Redis with appropriate TTL, and returned to the client for rendering.
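The cache-aside pattern in steps 3-6 can be sketched as follows. This is a simplified stand-in: an in-memory dict plays the role of Redis, and the injected `db_fetch` callable plays the role of the PostgreSQL read; the 300-second default TTL is an assumption.

```python
import time

class PropertyCache:
    """Cache-aside sketch for property detail reads.

    On a hit, serve from cache; on a miss (or expired entry), read from
    the backing store and repopulate the cache with a TTL.
    """

    def __init__(self, db_fetch, ttl_seconds: int = 300):
        self._db_fetch = db_fetch       # stand-in for the PostgreSQL query
        self._ttl = ttl_seconds
        self._store = {}                # key -> (expires_at, value)

    def get(self, property_id: str) -> dict:
        entry = self._store.get(property_id)
        if entry and entry[0] > time.time():
            return entry[1]                            # cache hit
        value = self._db_fetch(property_id)            # cache miss: read DB
        self._store[property_id] = (time.time() + self._ttl, value)
        return value
```

With Redis the same logic would use `GET`/`SET` with an `EX` expiry instead of the local dict.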
4. Users should be able to get automated home valuations (Zestimate)
We introduce machine learning infrastructure for home valuations:
- Zestimate Service: Serves ML model predictions for property valuations. Manages model versioning, A/B testing, and result caching.
- Feature Store: Centralized repository for ML features used in Zestimate calculation. Stores pre-computed features in Redis for real-time inference and raw feature data in S3 for batch training.
- ML Pipeline: Offline training pipeline using XGBoost or LightGBM models. Trains on historical sales data, property characteristics, location features, market trends, and comparable sales. Models are retrained weekly with new data.
- Comparable Sales Service: Finds similar recently sold properties within the same area, used both as features for the ML model and displayed to users for transparency.
Zestimate Calculation Flow:
- When a property detail page is requested, the Listing Service makes a call to the Zestimate Service.
- The Zestimate Service checks Redis cache for a recent valuation (TTL of 24 hours).
- On cache miss, it gathers required features: property characteristics from the Listing Service, market data from the Market Data Service, and comparable sales from the Comparable Sales Service.
- Features are assembled into a feature vector and fed to the trained XGBoost model.
- The model returns a point estimate, which is augmented with confidence intervals calculated from model uncertainty and data quality metrics.
- The result includes the Zestimate value, upper and lower bounds, confidence score, and count of comparable properties used.
- The result is cached in Redis and returned to the client.
Step 3: Design Deep Dive
With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.
Deep Dive 1: How do we achieve sub-200ms search latency for complex geospatial queries?
Property search is fundamentally geospatial, requiring efficient indexing and querying of multi-dimensional location data combined with numerous filters.
Problem: Query Complexity
A typical search might include location-based filtering (within a radius or polygon), price range, number of bedrooms, property type, and listing status. Without proper indexing, this could require full table scans or multiple sequential filter operations.
Solution: Elasticsearch with Geospatial Indexing
Elasticsearch provides specialized data structures for geospatial queries:
- Geo-Point Data Type: Stores latitude and longitude coordinates with efficient indexing for distance and bounding box queries.
- Geo-Shape Data Type: Supports complex geometries like polygons, enabling queries like “find all properties within this user-drawn neighborhood boundary.”
- Compound Queries: Boolean queries combine geo-filters with range filters (price), term filters (property type), and full-text search (address, city).
The search query structure uses a boolean must clause containing a geo-shape filter for the location polygon, range filters for price and bedrooms, term filters for listing status, and aggregations for faceted search. Elasticsearch executes this as a single coordinated query across distributed shards.
Geohash and H3 for Map Clustering:
When users zoom out on the map, displaying individual property markers becomes impractical. We use Uber’s H3 hexagonal hierarchical spatial index to cluster properties:
- Properties are indexed with their H3 cell identifier at multiple resolutions (1 through 15).
- At low zoom levels, we query by coarser H3 cells (resolution 5-7), aggregating property counts and average prices per hexagon.
- At high zoom levels, we use finer resolution (resolution 9-11) or switch to individual properties.
- The resolution mapping adapts dynamically based on zoom level, balancing between showing detail and maintaining performance.
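The zoom-to-resolution mapping can be expressed as a simple lookup. The breakpoints below are illustrative assumptions consistent with the resolution ranges above; a production system would tune them against render performance and cluster readability.

```python
def h3_resolution_for_zoom(zoom: int) -> int:
    """Map a web-map zoom level (0-20) to an H3 resolution.

    Coarser hexes at low zoom keep marker counts manageable; past
    street level the map switches to individual property markers.
    """
    if zoom <= 5:
        return 3    # country / multi-state view: very coarse hexes
    if zoom <= 8:
        return 5    # state / metro view
    if zoom <= 11:
        return 7    # city view
    if zoom <= 13:
        return 9    # neighborhood view
    return 11       # street level; beyond this, show individual markers
```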
Polygon Search Optimization:
User-drawn polygons can be complex with many vertices, making geo-shape queries expensive. We optimize by:
- Simplifying polygon geometry using the Douglas-Peucker algorithm with a small tolerance, reducing vertex count while preserving shape.
- Validating polygon topology to ensure it’s a valid geometry (no self-intersections).
- Using a two-stage filter: first apply a bounding box filter to narrow candidates, then apply the precise polygon intersection test.
- Caching popular polygon searches (common neighborhood boundaries) with a cache key derived from the normalized polygon coordinates.
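The Douglas-Peucker simplification from the first bullet can be sketched in pure Python (in practice a geometry library such as Shapely would provide this; the version below is a self-contained illustration on (x, y) tuples):

```python
import math

def _perp_dist(pt, a, b):
    """Perpendicular distance from pt to the line through a and b."""
    (x, y), (x1, y1), (x2, y2) = pt, a, b
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:
        return math.hypot(x - x1, y - y1)
    return abs(dy * x - dx * y + x2 * y1 - y2 * x1) / math.hypot(dx, dy)

def simplify(points, tolerance):
    """Douglas-Peucker: keep the endpoints, recurse on the vertex
    farthest from the chord, and drop everything within tolerance."""
    if len(points) < 3:
        return list(points)
    dmax, idx = 0.0, 0
    for i in range(1, len(points) - 1):
        d = _perp_dist(points[i], points[0], points[-1])
        if d > dmax:
            dmax, idx = d, i
    if dmax <= tolerance:
        return [points[0], points[-1]]      # nothing sticks out: drop interior
    left = simplify(points[: idx + 1], tolerance)
    right = simplify(points[idx:], tolerance)
    return left[:-1] + right                # avoid duplicating the split vertex
```

A user-drawn boundary with hundreds of near-collinear vertices collapses to a handful of points, which keeps the geo-shape query cheap.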
Search Result Caching:
Popular searches like “homes in San Francisco under $1M” are cached in Redis:
- Cache keys are generated by hashing normalized query parameters (sorted JSON representation).
- Cache TTL is set to 5 minutes for active listings, balancing freshness with performance.
- When a new listing is added or updated in an area, we use geospatial invalidation: query Redis for cache entries within a 1km radius using Redis GEO commands, and delete matching keys.
- In practice, this caching can reduce Elasticsearch load by an estimated 40-60% for common queries.
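The cache-key derivation in the first bullet can be sketched with the standard library. Serializing with sorted keys normalizes parameter order, so logically identical searches hash to the same key; the `search:` prefix is an assumed naming convention.

```python
import hashlib
import json

def search_cache_key(params: dict) -> str:
    """Derive a stable Redis cache key from normalized search parameters.

    sort_keys=True and compact separators make the JSON canonical, so
    {"a": 1, "b": 2} and {"b": 2, "a": 1} produce the same digest.
    """
    normalized = json.dumps(params, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
    return f"search:{digest}"
```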
Deep Dive 2: How do we build the Zestimate ML pipeline for accurate home valuations?
Zestimate is Zillow’s signature feature, providing automated home valuations using machine learning. Achieving under 2% median error requires sophisticated feature engineering, model training, and serving infrastructure.
Feature Engineering:
The Zestimate model uses 100+ features grouped into categories:
- Property Characteristics: Square footage, bedrooms, bathrooms, lot size, year built, age, property type, condition indicators (has pool, garage, basement, fireplace), number of stories.
- Derived Features: Price per square foot estimate, bed-to-bath ratio, living space to lot size ratio.
- Location Features: Latitude, longitude, walkability score, transit score, school district ratings, crime statistics, proximity to amenities.
- Market Trends: ZIP code median price, price change over 3 months and 1 year, inventory levels (active listings), average days on market, seasonal indicators (month, quarter, spring/summer flag).
- Comparable Sales Features: Average, median, standard deviation, min, and max prices from comparable sales; comparable square footage and age; count of comparable sales; weighted price per square foot.
Comparable sales are identified by querying sold properties in the same ZIP code, matching property type, within 20% square footage variance, same bedroom count, and sold within the last 18 months. Distance weighting and time decay are applied: closer and more recent sales receive higher weights using inverse distance weighting multiplied by exponential time decay.
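The weighting scheme in the last sentence can be made concrete. This is a hedged sketch: the inverse-distance form and the 90-day half-life are illustrative tuning choices, not figures from Zillow.

```python
import math

def comp_weight(distance_km: float, age_days: float,
                half_life_days: float = 90.0) -> float:
    """Weight for one comparable sale: inverse distance weighting
    multiplied by exponential time decay (assumed half-life of 90 days)."""
    distance_w = 1.0 / (1.0 + distance_km)                       # closer -> larger
    time_w = math.exp(-math.log(2) * age_days / half_life_days)  # newer -> larger
    return distance_w * time_w

def weighted_price_per_sqft(comps) -> float:
    """comps: list of (sale_price, sqft, distance_km, age_days) tuples."""
    num = den = 0.0
    for price, sqft, dist, age in comps:
        w = comp_weight(dist, age)
        num += w * (price / sqft)
        den += w
    return num / den if den else 0.0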
Model Training:
The model uses XGBoost (Extreme Gradient Boosting) with regression objective. Key hyperparameters include:
- Max depth of 8 to capture complex interactions without overfitting.
- Learning rate of 0.05 with 1000 estimators for gradual convergence.
- Subsample and column sampling of 0.8 for regularization.
- Min child weight, gamma, L1, and L2 regularization to prevent overfitting.
- Histogram-based tree method for efficiency.
- Early stopping after 50 rounds without improvement.
Training data includes historical sales transactions with actual sale prices as labels. The dataset is split 80-20 for training and validation. Model evaluation uses Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), and Mean Absolute Percentage Error (MAPE) to ensure accuracy.
Models are trained separately per market (metro area or state) to capture local dynamics. Each market has its own model version, and models are retrained weekly with new sales data.
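The evaluation metrics above can be sketched in plain Python (a stand-in for the usual `sklearn.metrics` equivalents), with MAPE being the one that maps directly onto the "median error under 2%" target:

```python
import math

def regression_metrics(y_true, y_pred) -> dict:
    """MAE, RMSE, and MAPE for a per-market model's validation set.

    MAPE assumes all true prices are positive, which holds for sale prices.
    """
    n = len(y_true)
    abs_errs = [abs(t - p) for t, p in zip(y_true, y_pred)]
    mae = sum(abs_errs) / n
    rmse = math.sqrt(sum(e * e for e in abs_errs) / n)
    mape = sum(e / t for e, t in zip(abs_errs, y_true)) / n
    return {"mae": mae, "rmse": rmse, "mape": mape}
```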
Confidence Interval Calculation:
Zestimate provides a value range (upper and lower bounds) to communicate uncertainty:
- Confidence score starts at 100 and is penalized based on data quality factors.
- Fewer than 10 comparable sales: subtract 10-20 points.
- Old comparables (average age over 6 months): subtract 10-15 points.
- Missing key property features (square footage, bedrooms): subtract 10-30 points.
- The final confidence score is floored at 0 (it cannot go negative).
The value range is initially set at ±10% of the point estimate, then adjusted based on confidence score and model variance. Lower confidence scores result in wider ranges.
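The penalty and range logic above can be sketched as follows. The exact penalty magnitudes are assumptions taken from within the ranges in the text, and the range-widening factor is an illustrative choice.

```python
def confidence_score(n_comps: int, avg_comp_age_months: float,
                     missing_key_features: int) -> int:
    """Start at 100 and apply data-quality penalties; floor at 0.

    Penalty sizes are illustrative midpoints of the ranges above.
    """
    score = 100
    if n_comps < 10:
        score -= 15                              # few comparables (10-20 range)
    if avg_comp_age_months > 6:
        score -= 12                              # stale comparables (10-15 range)
    score -= min(30, 15 * missing_key_features)  # missing sqft/beds (10-30 range)
    return max(score, 0)

def value_range(estimate: float, score: int) -> tuple:
    """±10% base range, widening up to ±20% as confidence drops to 0."""
    spread = 0.10 * (1 + (100 - score) / 100)
    return (estimate * (1 - spread), estimate * (1 + spread))
```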
Model Serving and Caching:
The Zestimate Serving Layer manages model inference:
- Models are loaded into memory for fast prediction (under 100ms).
- Cache layer (Redis) stores calculated Zestimates with 24-hour TTL.
- On cache hit, return immediately. On cache miss, fetch property data, market data, and comparable sales, then run inference.
- Force refresh option allows homeowners to recalculate after property improvements.
- Batch processing overnight recalculates Zestimates for all properties with stale values.
- Model versions are tracked, and A/B testing compares new model performance before rollout.
Deep Dive 3: How do we aggregate and de-duplicate listings from 800+ MLS feeds?
Zillow doesn’t own listing data; it aggregates from hundreds of Multiple Listing Services across the US. Each MLS has different data formats, update frequencies, and field mappings.
MLS Ingestion Pipeline:
The ingestion pipeline consists of several stages:
- Feed Retrieval: Connect to each MLS via their specific protocol (RETS, XML feeds, JSON APIs, FTP). Feeds are polled every 15 minutes to detect new or updated listings.
- Normalization: Each MLS uses different field names and formats. We maintain a field mapping configuration per MLS source that translates their schema to our standard schema. For example, one MLS might use “ListPrice” while another uses “CurrentPrice” – both map to our “price” field.
- Address Standardization: Addresses are cleaned and normalized to a consistent format: uppercase, abbreviated street types (Street to ST, Avenue to AVE, Road to RD), removal of apartment numbers for single-family homes.
- Geocoding: Addresses are sent to a geocoding service (Google Maps API, Mapbox, or internal geocoder) to obtain latitude and longitude coordinates. Geocoding confidence scores are stored, and low-confidence results are flagged for manual review.
- De-duplication Check: Before creating a new property record, we check if it already exists to prevent duplicates from multiple MLS sources.
- Database Update: Create new property or update existing property in PostgreSQL. Generate Change Data Capture (CDC) events.
- Event Publishing: Publish listing update events to Kafka for downstream consumers (Search Service to update Elasticsearch, Zestimate Service to recalculate, Notification Service for saved search alerts).
De-duplication System:
Properties can appear in multiple MLS feeds (e.g., a property in a border area served by two MLSs, or a syndication network). De-duplication prevents showing the same property multiple times:
- Fingerprint Generation: Create a unique identifier by hashing the standardized address and ZIP code using SHA-256.
- Exact Match: Query the database for an existing property with the same fingerprint. If found, it’s a duplicate – update the existing record.
- Fuzzy Match: If no exact match, perform fuzzy matching on address and coordinates. Query for properties within 50 meters using PostGIS spatial queries.
- Address Similarity: Calculate Levenshtein string similarity between addresses. Similarity above 85% combined with close proximity indicates a duplicate.
- Conflict Resolution: When the same property has different data from multiple sources, apply business rules (e.g., prefer the MLS with the most recent update, or the primary MLS for that region).
The fingerprint is stored in the properties table with a unique constraint, ensuring database-level duplicate prevention.
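The standardization and fingerprinting steps can be sketched with the standard library. The abbreviation table is a small illustrative subset (a real pipeline would use the full USPS suffix table), and the `address|zip` key format is an assumption.

```python
import hashlib

# Illustrative subset of street-type abbreviations (USPS defines the full set).
STREET_ABBREV = {"STREET": "ST", "AVENUE": "AVE", "ROAD": "RD"}

def standardize_address(raw: str) -> str:
    """Uppercase the address and abbreviate street types, per the
    Address Standardization stage above."""
    tokens = raw.upper().replace(",", "").split()
    return " ".join(STREET_ABBREV.get(t, t) for t in tokens)

def property_fingerprint(address: str, zip_code: str) -> str:
    """SHA-256 over the standardized address plus ZIP, used as the
    unique de-duplication key in the properties table."""
    key = f"{standardize_address(address)}|{zip_code}"
    return hashlib.sha256(key.encode("utf-8")).hexdigest()
```

Because both feeds are standardized before hashing, “123 Main Street” from one MLS and “123 MAIN ST” from another collide on the same fingerprint, so the second ingest updates the existing record instead of creating a duplicate.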
Data Quality Monitoring:
We track data quality metrics per MLS source:
- Ingestion lag (time between MLS update and our system update).
- Geocoding success rate and confidence distribution.
- De-duplication match rate.
- Field completeness (percentage of listings with all required fields).
- Error rates during parsing and normalization.
Sources with persistent quality issues are flagged for investigation or excluded from the platform.
Deep Dive 4: How do we handle saved searches and send timely alerts to millions of users?
Users can save search criteria and receive notifications when new matching properties appear. With millions of saved searches running continuously, this requires efficient batch processing.
Saved Search Architecture:
- Saved Searches Table: Stores user search criteria as JSON, alert frequency (instant, daily, weekly), last run timestamp, and active status.
- Saved Search Engine: Background service that periodically runs saved searches against new listings.
- Notification Service: Dispatches alerts via multiple channels (email, push notifications, SMS).
- Email Queue (SQS + SES): Asynchronous email delivery using AWS Simple Queue Service and Simple Email Service.
Saved Search Execution Flow:
- Batch Job Trigger: A scheduled job (cron or AWS EventBridge) runs every 15 minutes to check for saved searches due for execution.
- Query Saved Searches: Fetch all active saved searches where the last run timestamp is older than the alert frequency threshold. For example, instant alerts run every 15 minutes, daily alerts run if last run was over 24 hours ago, weekly alerts run if last run was over 7 days ago.
- Find New Matches: For each saved search:
  - Deserialize the saved search criteria JSON.
  - Construct an Elasticsearch query with the user’s filters.
  - Add a time filter to only include properties listed since the last run timestamp.
  - Execute the query and collect matching properties.
- Send Alerts: If new matches are found:
  - Generate an email or push notification with property cards showing photos, prices, and key details.
  - Enqueue the notification to the Email Queue or Push Notification Service.
  - Update the last run timestamp for the saved search.
- Delivery: Email workers consume from the SQS queue and use Amazon SES to send emails. Push notifications are sent via FCM (Firebase Cloud Messaging) for Android and APNs (Apple Push Notification service) for iOS.
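The due-check in the "Query Saved Searches" step reduces to comparing elapsed time against a per-frequency interval; a minimal sketch:

```python
from datetime import datetime, timedelta

# Instant alerts ride the 15-minute batch job; daily and weekly alerts
# fire once their interval has elapsed since the last run.
FREQUENCY_INTERVAL = {
    "instant": timedelta(minutes=15),
    "daily": timedelta(hours=24),
    "weekly": timedelta(days=7),
}

def is_due(last_run: datetime, frequency: str, now: datetime) -> bool:
    """True when the saved search should be executed in this batch."""
    return now - last_run >= FREQUENCY_INTERVAL[frequency]
```

In production this comparison would be pushed into the SQL `WHERE` clause so only due rows are fetched.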
Optimization Strategies:
- Batching: Group saved searches by similar criteria to reduce duplicate Elasticsearch queries. For example, multiple users searching for “3-bedroom homes in Seattle under $800K” can share the same query result.
- Incremental Index: Maintain a separate Elasticsearch index for recent listings (last 24 hours) with fewer shards, allowing faster queries for saved search execution.
- Priority Queue: During high-volume periods, prioritize “instant” alerts over daily/weekly alerts.
- Rate Limiting: Limit notification frequency per user to avoid overwhelming them (e.g., max 5 emails per day across all saved searches).
- Unsubscribe Management: Track user preferences and bounce rates to maintain good email deliverability reputation.
Email Content Generation:
Alert emails include rich HTML content with property cards showing:
- Primary photo with lazy loading.
- Price, address, beds, baths, and square footage.
- “View Details” link to the property page with tracking parameters.
- Map showing property locations.
- Footer with unsubscribe and preference management links.
Emails are personalized with the user’s name and summary of their search criteria.
Deep Dive 5: How do we match buyers with real estate agents efficiently?
Zillow generates revenue by connecting buyers with real estate agents through its Premier Agent program. Effective matching improves conversion rates and user satisfaction.
Agent Matching Algorithm:
The matching system ranks agents based on multiple factors:
- Geographic Expertise: Agents must serve the target ZIP codes or neighborhoods the buyer is interested in. Service areas are stored as arrays of ZIP codes in the agents table.
- Property Type Specialization: Some agents specialize in certain property types (single-family, condos, luxury). Match buyers’ property preferences with agents’ specialty areas.
- Price Range Experience: Calculate the average sale price of the agent’s recent transactions (last 2 years). Agents with experience in the buyer’s target price range receive higher scores.
- Performance Metrics: Consider average rating from reviews, total transaction volume (minimum 5 transactions), average days to close, and response time.
- Availability: Check if the agent is actively accepting leads and hasn’t exceeded their lead capacity.
Scoring Formula:
A weighted scoring system calculates match quality:
- Property Type Match: 30% - Does the agent specialize in the buyer’s target property type?
- Price Range Similarity: 25% - How close is the agent’s average sale price to the buyer’s budget?
- Rating: 20% - Agent’s average rating normalized to 0-20 scale.
- Transaction Volume: 15% - Number of recent transactions (capped at 50 for scoring).
- Responsiveness: 10% - How quickly does the agent typically respond to leads?
Total score ranges from 0-100. Agents are ranked by score, and the top 5 are presented to the buyer.
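The weighted formula can be sketched directly. The component scoring curves (all-or-nothing type match, linear price-gap decay, a 60-minute responsiveness window) and the record field names are assumptions; only the weights come from the text.

```python
def score_agent(agent: dict, buyer: dict) -> float:
    """Weighted match score on a 0-100 scale using the weights above.

    Field names on the agent/buyer records are illustrative.
    """
    # Property type match: 30 points, all-or-nothing.
    type_score = 30.0 if buyer["property_type"] in agent["specialties"] else 0.0
    # Price similarity: 25 points, decaying linearly with the relative gap.
    gap = abs(agent["avg_sale_price"] - buyer["budget"]) / buyer["budget"]
    price_score = 25.0 * max(0.0, 1.0 - gap)
    # Rating: 0-5 stars normalized to 0-20 points.
    rating_score = agent["rating"] / 5.0 * 20.0
    # Volume: capped at 50 transactions, worth up to 15 points.
    volume_score = min(agent["transactions_2y"], 50) / 50.0 * 15.0
    # Responsiveness: full 10 points for immediate response, 0 beyond an hour.
    resp_score = 10.0 * max(0.0, 1.0 - agent["response_minutes"] / 60.0)
    return type_score + price_score + rating_score + volume_score + resp_score
```

Ranking all eligible agents by this score and taking the top 5 yields the list presented to the buyer.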
Lead Assignment:
When a buyer requests agent contact:
- The buyer’s profile and preferences are sent to the Agent Matching Service.
- The matching algorithm scores and ranks all eligible agents.
- The top-ranked agent is assigned the lead and notified immediately.
- If the agent doesn’t respond within a defined window, the lead is offered to the next agent.
- Agent interactions (response time, conversion rate) are tracked and feed back into the scoring algorithm for continuous improvement.
Premier Agent partners pay for placement and lead generation, so fair and effective matching is crucial for both revenue and marketplace health.
Deep Dive 6: How can we scale Elasticsearch to handle 500M+ searches per day?
With 200M+ monthly active users performing multiple searches per session, we need to optimize Elasticsearch for extreme scale.
Sharding Strategy:
Elasticsearch distributes data across shards for parallel processing:
- Geographic Sharding: Partition properties by state or metro area. Major metros (SF, NYC, LA) get dedicated shards due to high query volume. Smaller states can share shards.
- Shard Count: Use approximately 10 primary shards per major metro, with 2 replicas for high availability. This provides 30 total shard copies (10 primary + 20 replicas) per metro.
- Routing: Include a routing key (state or metro ID) in index operations and queries to ensure requests hit only relevant shards, avoiding scatter-gather across all shards.
Hot-Warm Architecture:
Not all properties are equally popular:
- Hot Tier: Recent listings (last 90 days) and active searches are stored on high-performance SSD nodes. These receive the majority of search traffic.
- Warm Tier: Older sold listings and historical data are moved to cheaper HDD-based nodes. They’re still searchable but with higher latency.
- Index Lifecycle Management: Automatically move indexes from hot to warm tier based on age. Eventually archive to S3 for compliance and analytics.
Query Optimization:
- Filter Context vs Query Context: Use filter context for exact matches (property type, listing status) as filters are cached and faster than scored queries.
- Aggregation Optimization: Limit aggregation size and use sampling for large result sets. Cache aggregation results.
- Field Data Types: Use keyword fields for exact matching (property IDs), text fields for full-text search (descriptions), and appropriate numeric types to minimize storage.
- Source Filtering: Only return necessary fields (_source: ["address", "price", "photos"]) to reduce network transfer.
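The filter-context and source-filtering bullets can be combined into one query sketch. Field names are illustrative assumptions; the point is the placement: scored full-text matching goes in `must` (query context), exact matches go in `filter` (filter context, which Elasticsearch caches and does not score).

```python
def optimized_query(text: str, property_type: str, status: str) -> dict:
    """Elasticsearch query body separating query context from filter context."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"match": {"description": text}},    # scored, not cached
                ],
                "filter": [                              # cached, no scoring cost
                    {"term": {"property_type": property_type}},
                    {"term": {"status": status}},
                ],
            }
        },
        # Source filtering: only ship the fields the result card needs.
        "_source": ["address", "price", "photos"],
    }
```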
Caching Layers:
- Request Cache: Elasticsearch’s built-in request cache stores frequently executed queries at the shard level.
- Application Cache: Redis caches popular search results with 5-minute TTL before hitting Elasticsearch.
- CDN Cache: Search result pages are cached at the CDN edge for logged-out users.
Connection Pooling:
Maintain persistent HTTP connection pools to Elasticsearch cluster to avoid TCP handshake overhead. Use load balancers to distribute requests across Elasticsearch nodes.
Monitoring and Alerting:
Track key metrics:
- Query latency (P50, P99, P999).
- Indexing rate and lag.
- Shard allocation and health.
- Cache hit rates.
- Circuit breaker trips.
Set alerts for anomalies like latency spikes, high CPU, or failing nodes.
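As a concrete example of the latency percentiles above, the nearest-rank method computes them from raw samples; real monitoring pipelines typically use histogram sketches (e.g. HDRHistogram) rather than storing every sample:

```python
# Sketch: P50/P99 latency from raw samples via the nearest-rank method.
# Production systems use histogram sketches instead of raw sample lists.

def percentile(samples: list[float], p: float) -> float:
    ordered = sorted(samples)
    rank = max(1, round(p / 100 * len(ordered)))   # nearest-rank position
    return ordered[rank - 1]

latencies_ms = [12, 15, 14, 200, 18, 16, 13, 17, 19, 11]
p50 = percentile(latencies_ms, 50)   # typical request
p99 = percentile(latencies_ms, 99)   # tail request, drives alerting
```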
Step 4: Wrap Up
In this chapter, we proposed a system design for a real estate marketplace platform like Zillow. If there is extra time at the end of the interview, here are additional points to discuss:
Additional Features:
- Price Drop Alerts: Notify users when properties they’re watching have price reductions.
- Virtual Tours and 3D Walkthroughs: Integrate Matterport or similar platforms for immersive property viewing.
- Mortgage Calculator: Help users understand affordability with integrated mortgage and payment calculators.
- Neighborhood Insights: Provide detailed information about schools, crime rates, demographics, and local amenities.
- Open House Scheduling: Allow users to RSVP for open houses and add them to their calendar.
- Comparative Market Analysis: Automated CMA reports for agents and sellers.
- Predictive Analytics: Use ML to predict time to sell, optimal listing price, and best time to list.
Scaling Considerations:
- Database Sharding: Shard PostgreSQL by state or region to distribute load. Use Citus or Vitess for sharding orchestration.
- Read Replicas: Deploy multiple read replicas for the listing database to handle high read traffic. Use connection pooling and load balancing.
- CDN for Images: Store all property photos in S3 and serve through CloudFront CDN with edge caching and image optimization.
- Multi-Region Deployment: Deploy the system in multiple AWS regions (us-east-1, us-west-2) for disaster recovery and reduced latency.
- Microservices Architecture: Decompose into focused microservices (Search, Listing, Zestimate, User, Agent, Notification) for independent scaling and deployment.
Error Handling:
- Circuit Breakers: Implement circuit breakers for third-party dependencies (MLS feeds, geocoding APIs) to prevent cascading failures.
- Graceful Degradation: If Zestimate calculation fails, display the last known value with a staleness indicator. If Elasticsearch is down, fall back to database queries with reduced functionality.
- Retry Logic: Implement exponential backoff with jitter for transient failures in MLS ingestion and API calls.
- Dead Letter Queues: Use DLQs for failed message processing (listing updates, notifications) with monitoring and manual replay capability.
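The retry bullet above can be sketched as exponential backoff with "full jitter"; the base delay, cap, and attempt count here are illustrative assumptions:

```python
import random
import time

# Sketch of exponential backoff with full jitter for transient MLS/API
# failures. Base delay, cap, and attempt limit are illustrative assumptions.

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Return a randomized sleep in seconds for the given retry attempt."""
    exp = min(cap, base * (2 ** attempt))   # 0.5s, 1s, 2s, ... capped at 30s
    return random.uniform(0, exp)           # full jitter spreads retry storms

def call_with_retries(op, max_attempts: int = 5):
    for attempt in range(max_attempts):
        try:
            return op()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise                       # exhausted: surface to DLQ/alerting
            time.sleep(backoff_delay(attempt))
```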
Security Considerations:
- Data Encryption: Encrypt sensitive data at rest (database, S3) using AWS KMS and in transit using TLS 1.3.
- Authentication and Authorization: Use JWT tokens for API authentication, with OAuth 2.0 for third-party integrations. Implement role-based access control (RBAC) for agents and admins.
- Rate Limiting: Protect APIs with rate limiting per user/IP to prevent abuse and scraping. Use distributed rate limiting with Redis.
- Input Validation: Sanitize all user inputs to prevent SQL injection, XSS, and other injection attacks.
- PII Protection: Mask or encrypt personally identifiable information (email, phone) and comply with GDPR, CCPA regulations.
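The per-user/IP rate limiting above is commonly implemented as a token bucket. A local sketch follows; in production the bucket state would live in Redis (e.g. updated atomically via a Lua script) rather than process memory:

```python
import time

# Sketch of a per-user/IP token-bucket limiter. In production the bucket
# state lives in Redis rather than a local object.

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens refilled per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                  # caller should respond with HTTP 429

bucket = TokenBucket(rate=10, capacity=5)    # 10 req/s sustained, bursts of 5
results = [bucket.allow() for _ in range(6)]  # 6th call exceeds the burst
```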
Monitoring and Analytics:
- Key Metrics: Search success rate, search-to-contact conversion, Zestimate accuracy (median error, coverage), listing freshness, MLS ingestion lag.
- A/B Testing: Framework for testing search ranking algorithms, Zestimate models, UI changes, and pricing strategies.
- User Analytics: Track user journeys, popular searches, property view patterns, and drop-off points to optimize conversion funnels.
- Real-Time Dashboards: Operations dashboards for system health, business metrics, and incident response.
- Distributed Tracing: Use OpenTelemetry or AWS X-Ray to trace requests across microservices for debugging and performance optimization.
Future Improvements:
- AI-Powered Recommendations: Use deep learning for personalized property recommendations based on browsing history, saved searches, and user preferences.
- Natural Language Search: Allow users to search with natural language queries like “3-bedroom home near good schools under $800K” using NLP models.
- Computer Vision for Property Features: Automatically detect property features (pool, hardwood floors, updated kitchen) from photos using computer vision.
- Market Forecasting: Predict future property values and market trends using time series forecasting models.
- Automated CMA Generation: Generate comparative market analysis reports automatically for agents and sellers.
- Blockchain for Transaction Records: Explore blockchain for transparent, immutable property transaction history.
Data Model Summary:
The core data model includes:
- Properties Table: Stores property records with address, coordinates, price, bedrooms, bathrooms, square footage, lot size, year built, property type, listing status, dates, MLS information, fingerprint for de-duplication, Zestimate cache, photos, and timestamps. Indexed on location (PostGIS GIST index), price, ZIP code, and listing status.
- Users Table: User profiles with authentication, preferences, saved properties, browsing history, and notification settings.
- Agents Table: Agent profiles with certifications, service areas, performance metrics, ratings, and lead preferences.
- Saved Searches Table: User search criteria, alert frequency, last run timestamp, and active status.
- Zestimates Table: Cached Zestimate calculations with value, confidence, bounds, comparable count, and update timestamp.
- MLS Sources Table: Configuration for each MLS feed including credentials, field mappings, ingestion schedule, and health metrics.
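The de-duplication fingerprint referenced in the Properties table can be sketched as a hash over normalized identifying fields, so the same physical property arriving from two MLS feeds collapses to one record. The field choice and normalization rules here are assumptions:

```python
import hashlib

# Sketch of the de-duplication fingerprint: hash normalized fields that
# identify the same physical property across MLS feeds. The chosen fields
# and normalization rules are assumptions for illustration.

def listing_fingerprint(address: str, zip_code: str, sqft: int, beds: int) -> str:
    normalized = "|".join([
        # Normalize common address variants before hashing.
        address.strip().lower().replace("street", "st").replace("avenue", "ave"),
        zip_code.strip(),
        str(sqft),
        str(beds),
    ])
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

a = listing_fingerprint("123 Main Street", "94105", 1500, 3)
b = listing_fingerprint("123 Main St ", "94105", 1500, 3)  # same property, different feed
# a == b, so the second feed's record is treated as a duplicate.
```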
Capacity Planning:
For 135M properties, 200M MAU, 500M searches per day:
- Elasticsearch Storage: ~135M properties × 5KB avg = ~675GB of indexed data. With 2 replicas, total storage is ~2TB. Add 50% buffer for overhead and growth: ~3TB SSD storage.
- PostgreSQL Storage: ~135M properties × 10KB avg = ~1.35TB primary data. With indexes, replicas, and backups: ~5TB total.
- Image Storage: ~135M properties × 20 photos × 500KB avg = ~1.35PB stored in S3.
- Search Throughput: 500M searches per day = ~5,800 QPS average, ~20,000 QPS peak. With caching reducing Elasticsearch load by 50%: ~10,000 QPS to Elasticsearch.
- Elasticsearch Cluster: ~100 nodes (mix of hot and warm) with 32GB RAM, 8 cores each.
- Application Servers: ~500 instances for Search, Listing, Zestimate services combined.
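The estimates above can be sanity-checked with a few lines of arithmetic, using the same rough averages and cache-hit assumptions as the text:

```python
# Back-of-envelope check of the capacity numbers above (same rough
# per-document averages and 50% cache-hit assumption as the text).

properties = 135_000_000

es_gb = properties * 5 / 1e6                 # 5 KB/doc indexed -> GB
es_total_tb = es_gb * 3 / 1e3                # 1 primary + 2 replicas -> TB
image_pb = properties * 20 * 500 / 1e12      # 20 photos x 500 KB -> PB
avg_qps = 500_000_000 / 86_400               # searches/day -> average QPS
es_qps = 20_000 * 0.5                        # peak QPS after 50% cache hits

print(round(es_gb), round(es_total_tb, 2), round(image_pb, 2), round(avg_qps))
```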
Congratulations on getting this far! Designing Zillow is a complex system design challenge that combines geospatial search, machine learning, data aggregation, and real-time alerting at massive scale. The key is to start with core functionality, layer in optimizations for performance and scale, and maintain data quality throughout the pipeline.
Summary
This comprehensive guide covered the design of a real estate marketplace platform like Zillow, including:
- Core Functionality: Property search with geospatial filtering, map-based browsing, property details with rich media, and automated home valuations (Zestimate).
- Key Challenges: Sub-200ms search latency for complex geospatial queries, accurate ML-based home valuations, aggregating and de-duplicating data from 800+ MLS feeds, and scaling to 135M+ properties.
- Solutions: Elasticsearch with geospatial indexing and H3 clustering, XGBoost-based Zestimate ML pipeline with feature engineering, robust MLS ingestion with fingerprinting de-duplication, saved search alerting with batch processing, and agent matching algorithms.
- Scalability: Elasticsearch sharding by geography, hot-warm architecture, multi-layer caching, database sharding, CDN for images, and microservices architecture.
The design demonstrates how to build a data-intensive platform with search, machine learning, third-party integrations, and real-time notifications at scale, serving hundreds of millions of users with low latency and high accuracy.