Design Spotify

Spotify is a music streaming platform that provides on-demand access to millions of songs, podcasts, and playlists. It allows users to discover, stream, and share music from their smartphones, computers, and other connected devices with personalized recommendations and social features.

Designing Spotify presents unique challenges including handling massive audio streaming at scale, building sophisticated recommendation systems, managing real-time collaborative playlists, calculating accurate royalty payments, and optimizing content delivery across global CDN networks.

Step 1: Understand the Problem and Establish Design Scope

Before diving into the design, it’s crucial to define the functional and non-functional requirements. For user-facing applications like this, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.

Functional Requirements

Core Requirements:

  1. Users should be able to search for and stream music (songs, albums, artists).
  2. Users should be able to create, edit, and manage playlists.
  3. Users should receive personalized music recommendations.
  4. Users should be able to download content for offline listening.

Below the Line (Out of Scope):

  • Users should be able to follow friends and see what they’re listening to.
  • Users should be able to share songs, playlists, and albums.
  • Artists should be able to upload and manage their content.
  • Users should be able to stream and download podcast episodes.
  • Users should be able to see detailed analytics about their listening habits.

Non-Functional Requirements

Core Requirements:

  • The system should prioritize low-latency streaming, with initial buffering completing and playback starting in under 200 ms.
  • The system should support adaptive bitrate streaming to handle varying network conditions.
  • The system should handle 100 million concurrent users during peak hours.
  • The system should ensure 99.99% uptime for the streaming service.

Below the Line (Out of Scope):

  • The system should ensure DRM protection for all audio content.
  • The system should comply with data privacy regulations (GDPR, CCPA).
  • The system should accurately calculate royalty payments for rights holders.
  • The system should provide comprehensive monitoring and alerting.

Clarification Questions & Assumptions:

  • Platform: Mobile apps (iOS/Android), web, desktop, smart speakers, and car systems.
  • Scale: 500 million monthly active users, 80 million songs in catalog, 10 billion streams per day.
  • Audio Quality: Support multiple quality levels from 96 kbps to 320 kbps.
  • Content Delivery: Global distribution with multi-region CDN strategy.
  • Storage: Petabytes of audio storage with billions of user-generated playlists.

Step 2: Propose High-Level Design and Get Buy-in

Planning the Approach

Before moving on to designing the system, it’s important to plan your strategy. For user-facing product-style questions, the plan should be straightforward: build your design up sequentially, going one by one through your functional requirements. This will help you stay focused and ensure you don’t get lost in the weeds.

Defining the Core Entities

To satisfy our key functional requirements, we’ll need the following entities:

User: Any person using the platform to listen to music or podcasts. Includes personal information, subscription tier (free or premium), payment methods, listening preferences, and account settings.

Song: An individual audio track in the catalog. Contains metadata including title, artist, album, duration, genre, release date, audio file references in multiple formats and bitrates, and rights holder information for royalty calculation.

Artist: A musician or band with content on the platform. Contains their profile information, discography, follower count, verified status, and analytics access credentials.

Album: A collection of songs released together. Includes album artwork, release date, track listing, and associated artist information.

Playlist: A user-curated or algorithm-generated collection of songs. Contains metadata like name, description, cover image, visibility settings (public, private, collaborative), creator information, follower count, and the ordered list of tracks.

Recommendation: A personalized suggestion for a user. Can be individual songs, albums, artists, or playlists based on listening history, preferences, and collaborative filtering algorithms.

Stream: A record of a user playing a song. Tracks which user listened to which song, when they listened, how long they listened, the audio quality used, and whether the stream counts toward royalty payments.

API Design

Search Content Endpoint: Used by users to search across songs, albums, artists, playlists, and podcasts.

GET /search?q={query}&type={type}&limit={limit} -> SearchResults

Stream Song Endpoint: Used by clients to get the streaming manifest for a song, containing URLs to audio chunks at different bitrates.

POST /stream/{songId} -> StreamManifest
Body: {
  quality: "low" | "normal" | "high" | "very_high"
}

Create Playlist Endpoint: Used by users to create a new playlist.

POST /playlists -> Playlist
Body: {
  name: string,
  description: string,
  isPublic: boolean,
  isCollaborative: boolean
}

Add Song to Playlist Endpoint: Used to add songs to a playlist.

POST /playlists/{playlistId}/tracks -> Success
Body: {
  songId: string,
  position: number
}

Get Recommendations Endpoint: Used by clients to fetch personalized recommendations for a user.

GET /recommendations?seed={type}:{id}&limit={limit} -> Recommendations

Note: The userId is present in the session cookie or JWT and not in query params. Always consider security implications when designing APIs.

Track Playback Event Endpoint: Used by clients to report playback events for analytics and royalty calculation.

POST /events/playback -> Success
Body: {
  songId: string,
  eventType: "start" | "pause" | "skip" | "complete",
  timestamp: number,
  position: number,
  duration: number
}

High-Level Architecture

Let’s build up the system sequentially, addressing each functional requirement:

1. Users should be able to search for and stream music

The core components necessary to fulfill search and streaming are:

  • Client Applications: Available on multiple platforms (iOS, Android, Web, Desktop). Interface with the system’s backend services for all user interactions.
  • API Gateway: Acts as the entry point for all client requests, routing to appropriate microservices. Handles authentication, rate limiting, and request validation.
  • Search Service: Manages full-text search across all content types. Uses Elasticsearch for indexing songs, albums, artists, playlists, and podcasts with support for auto-complete, fuzzy matching, and relevance ranking.
  • Metadata Service: Manages all song, album, artist, and podcast metadata. Serves information about tracks, album artwork, artist images, genre classifications, and mood tags.
  • Streaming Service: Handles audio streaming requests and coordinates content delivery. Determines the appropriate CDN endpoints for audio chunks based on user location and network conditions.
  • CDN (Content Delivery Network): Distributed network of edge servers that cache and serve audio content. Uses providers like CloudFront and Akamai to ensure low-latency delivery worldwide.
  • Object Storage: Cloud storage (S3 or Google Cloud Storage) containing audio files in multiple formats and bitrates. Audio is encoded in AAC, Ogg Vorbis, and MP3 formats at various quality levels.
  • PostgreSQL Database: Stores structured data including user profiles, song metadata, artist information, and subscription data. Sharded by user_id or content_id for horizontal scaling.
  • Redis Cache: In-memory cache for hot data including session information, recently played songs, trending content, and frequently accessed metadata.

Search Flow:

  1. The user enters a search query in the client app, which sends a GET request to the search endpoint.
  2. The API gateway handles authentication and forwards to the Search Service.
  3. The Search Service queries Elasticsearch, which returns ranked results across all content types.
  4. Results are personalized based on the user’s listening history and preferences.
  5. The Search Service fetches additional metadata from the Metadata Service and returns enriched results.

Streaming Flow:

  1. The user selects a song to play, and the client requests a streaming manifest.
  2. The API gateway authenticates and forwards to the Streaming Service.
  3. The Streaming Service verifies the user’s subscription status and generates a manifest containing URLs to audio chunks at different bitrates.
  4. The client begins downloading chunks from the CDN, starting with the first few chunks for pre-buffering.
  5. The client’s adaptive bitrate controller selects the optimal quality based on network speed and buffer health.
  6. As playback continues, the client reports events (play, pause, skip) to the Analytics Service via Kafka for royalty tracking.

2. Users should be able to create, edit, and manage playlists

We extend our existing design to support playlist management:

  • Playlist Service: Manages CRUD operations for playlists. Handles playlist creation, song addition/removal, reordering, sharing, and following. Supports real-time collaborative editing for shared playlists.
  • WebSocket Server: Enables real-time bidirectional communication for collaborative playlist features. Pushes updates to all connected clients when collaborators make changes.

Playlist Creation Flow:

  1. The user creates a new playlist in the client, which sends a POST request with playlist details.
  2. The API gateway authenticates and forwards to the Playlist Service.
  3. The Playlist Service creates a new entry in the PostgreSQL playlists table with the provided metadata.
  4. For collaborative playlists, the service also initializes a Redis sorted set to track real-time edits.
  5. The Playlist Service returns the created playlist object to the client.

Collaborative Playlist Flow:

  1. When a user adds a song to a collaborative playlist, the client sends a POST request.
  2. The Playlist Service acquires a distributed lock on the playlist using Redis.
  3. It updates the Redis sorted set with the new song and its position.
  4. The change is asynchronously persisted to PostgreSQL.
  5. The service publishes the change event to a Redis pub/sub channel.
  6. The WebSocket Server, subscribed to this channel, pushes the update to all connected collaborators.
  7. Other users see the update in real-time without refreshing.

3. Users should receive personalized music recommendations

We need to introduce sophisticated recommendation components:

  • Recommendation Engine: Generates personalized recommendations using multiple algorithms. Combines collaborative filtering, content-based filtering, and natural language processing to create diverse, engaging recommendations.
  • Machine Learning Service: Trains and serves ML models for recommendations. Processes historical listening data to generate user and song embeddings, similarity scores, and preference predictions.
  • Analytics Service: Processes streaming data from Kafka to track user behavior. Calculates metrics like listen counts, skip rates, playlist additions, and engagement scores that feed into recommendation algorithms.
  • Kafka: Event streaming platform for handling real-time playback events. Provides durable, scalable message queuing for analytics pipelines.
  • Feature Store: Stores precomputed features for users and songs. Includes metrics like user’s favorite genres, average tempo preference, song popularity scores, and acoustic features.
  • Cassandra: NoSQL database optimized for time-series data. Stores listening history, playback events, and user activity logs with high write throughput.

Recommendation Generation Flow:

  1. The client requests recommendations by calling the recommendations endpoint.
  2. The API gateway forwards to the Recommendation Engine.
  3. The engine retrieves the user’s profile and recent listening history from Cassandra and Redis.
  4. It generates recommendations using multiple approaches in parallel:
    • Collaborative filtering finds similar users and suggests their favorite songs.
    • Content-based filtering finds songs with similar audio features to recently played tracks.
    • NLP-based filtering finds artists mentioned in similar contexts based on text analysis.
    • Trending songs are included for diversity.
    • Exploration recommendations introduce new genres for discovery.
  5. All recommendations are combined with weighted scoring, re-ranked based on context (time of day, device, activity), and returned to the client.

Discover Weekly Generation: This algorithmic playlist is generated weekly for each user with 30 unheard songs they’re likely to enjoy:

  1. A batch job identifies users similar to the target user using collaborative filtering on listening history.
  2. It collects songs that similar users enjoyed but the target user hasn’t heard.
  3. Songs are scored based on how many similar users liked them, audio feature similarity, freshness, and diversity.
  4. The top 30 songs are selected with constraints ensuring artist and genre variety.
  5. The playlist is created in the database and appears in the user’s library every Monday.

4. Users should be able to download content for offline listening

We add components to support offline functionality:

  • Download Manager: Handles download requests and queue management. Enforces subscription tier limits (premium only), tracks download counts, and manages storage quotas.
  • Background Workers: Process download queues and encrypt audio files. Download audio from the CDN, apply DRM encryption with user-specific keys, and store encrypted files on the user’s device.
  • DRM License Server: Issues and validates content licenses. Ensures users can only decrypt and play downloaded content while maintaining an active subscription.

Offline Download Flow:

  1. A premium user selects a playlist for offline download in the client.
  2. The client sends a POST request to the download endpoint with the playlist ID.
  3. The Download Manager verifies the user’s premium status and current download count against their quota.
  4. It retrieves all songs in the playlist and enqueues download tasks to a Redis queue.
  5. Background workers pull tasks from the queue, fetch audio files from the CDN, encrypt them with DRM, and signal the client to store the encrypted files locally.
  6. The client stores encrypted files in local storage with metadata for playback.
  7. During offline playback, the client uses its cached DRM license to decrypt the audio and play it.

Sync Logic: When the device reconnects to the network, the sync process ensures offline content stays current:

  1. The client retrieves the list of downloaded playlists from local storage.
  2. For each playlist, it fetches the current song list from the server.
  3. It compares server state with local state to identify songs to download (added to playlist) and songs to delete (removed from playlist).
  4. New songs are queued for background download while removed songs are deleted from local storage.
  5. Metadata updates (song titles, album art) are also synchronized.

Step 3: Design Deep Dive

With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.

Deep Dive 1: How do we achieve low-latency audio streaming with adaptive bitrate?

Audio streaming at scale requires careful optimization to minimize buffering and adapt to network conditions:

Audio Encoding Pipeline:

When artists or labels upload music, the raw audio files go through a multi-stage processing pipeline:

  1. The original high-quality file (WAV or FLAC) is received by the Encoding Service.
  2. The audio is transcoded into multiple formats and bitrates: AAC or Ogg Vorbis at 320 kbps for very high (premium) quality, 256 kbps for high quality, 128 kbps for normal quality, and 96 kbps for data saver mode, plus MP3 renditions for compatibility.
  3. Each encoded file is segmented into 10-second chunks to enable adaptive streaming.
  4. Chunks are uploaded to S3 or Google Cloud Storage with multi-region replication.
  5. The CDN (CloudFront, Akamai) synchronizes these chunks to edge locations worldwide.
  6. Metadata about available formats and chunk URLs is stored in PostgreSQL.

Adaptive Bitrate Selection:

The client implements an adaptive bitrate controller that dynamically selects the optimal audio quality:

  1. The controller maintains a list of available bitrates (96, 128, 160, 256, 320 kbps).
  2. It continuously monitors network speed by measuring download times for recent chunks.
  3. It tracks the buffer health (percentage of audio buffered ahead of playback position).
  4. When buffer health is low (below 20%), it downgrades to a lower bitrate to prevent rebuffering.
  5. When network speed is high and buffer is healthy, it upgrades to higher quality.
  6. User preferences (low, normal, high quality) set the maximum allowed bitrate.
  7. Transitions between bitrates are smoothed to avoid frequent switching that could disrupt the listening experience.
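
The controller logic above can be sketched in a few lines of Python. The bitrate ladder matches the text; the specific buffer thresholds and the one-rung-at-a-time smoothing rule are illustrative assumptions, not Spotify's actual algorithm:

```python
# Minimal sketch of a client-side adaptive bitrate controller.
# Thresholds (20% / 50%) and the 1.5x headroom factor are assumptions.

AVAILABLE_KBPS = [96, 128, 160, 256, 320]

def select_bitrate(current_kbps, measured_kbps, buffer_pct, max_kbps=320):
    """Pick the next chunk's bitrate from network speed and buffer health."""
    # Low buffer: step down to protect against rebuffering.
    if buffer_pct < 20:
        candidates = [b for b in AVAILABLE_KBPS if b < current_kbps]
        return max(candidates) if candidates else AVAILABLE_KBPS[0]
    # Healthy buffer with network headroom: step up one rung at a time
    # to smooth transitions, never above the user's quality cap.
    if buffer_pct > 50 and measured_kbps > current_kbps * 1.5:
        candidates = [b for b in AVAILABLE_KBPS
                      if current_kbps < b <= min(measured_kbps, max_kbps)]
        return min(candidates) if candidates else current_kbps
    return current_kbps
```

Stepping up gradually rather than jumping straight to the measured bandwidth avoids the oscillation the text warns about.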

CDN Strategy:

To minimize latency and bandwidth costs:

  1. Popular songs (top 1% accounting for 80% of streams) are cached at all edge locations globally.
  2. Less popular content is cached on-demand with longer TTLs.
  3. Cache-Control headers are set to max-age of one year since audio files are immutable.
  4. Regional caching prioritizes local music in specific geographic areas.
  5. Origin shielding protects storage from direct traffic, reducing egress costs.
  6. Multi-CDN setup provides redundancy: primary CDN handles most traffic, secondary CDN serves as failover, and direct S3/GCS access is the final fallback.

Client Streaming Logic:

The client implements sophisticated buffering and prefetching:

  1. When a user plays a song, the client requests a streaming manifest containing chunk URLs for all available bitrates.
  2. It immediately begins downloading the first 3 chunks (30 seconds) for pre-buffering.
  3. Once the buffer reaches a threshold, playback begins.
  4. A background process continues downloading chunks ahead of the playback position.
  5. The bitrate controller regularly evaluates network conditions and adjusts quality.
  6. When the user skips to a different position, the client fetches chunks from that position.
  7. All playback events (start, pause, skip, buffer underrun) are logged for analytics.

DRM Implementation:

Digital Rights Management protects content from piracy:

  1. Audio files in storage and on CDN are encrypted using AES-128.
  2. Each client platform uses a different DRM system: Widevine for Android, FairPlay for iOS, PlayReady for Windows.
  3. When requesting a stream, the client also requests a license from the License Server.
  4. The License Server validates the user’s subscription status before issuing a decryption key.
  5. Keys are time-limited and device-specific, preventing unauthorized sharing.
  6. The client decrypts chunks in memory before playback, never storing unencrypted audio.

Deep Dive 2: How do we build a recommendation system that keeps users engaged?

Spotify’s recommendation system is critical to user engagement and retention. It combines multiple techniques:

Collaborative Filtering:

This approach finds patterns in user listening behavior to recommend songs:

  1. User-item interaction data is collected from listening history stored in Cassandra.
  2. A sparse matrix is constructed where rows represent users, columns represent songs, and values represent play counts or listening time.
  3. Matrix factorization techniques (SVD or ALS) decompose this into lower-dimensional user and song embeddings (latent factors).
  4. Each user and song is represented as a vector in a 100-300 dimensional space.
  5. Songs whose vectors have a high dot product with a user's vector are predicted to be songs that user will enjoy.
  6. The algorithm filters out songs the user has already heard.
  7. This approach works well for popular content but struggles with new songs (cold start problem).
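
Once users and songs live in the same latent space, scoring reduces to a dot product plus a "not already heard" filter (steps 4-6 above). A toy illustration, with hand-made 2-dimensional vectors standing in for real ALS/SVD embeddings:

```python
# Toy collaborative-filtering scorer. Real embeddings come from matrix
# factorization on the play-count matrix; these vectors are illustrative.
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def recommend(user_vec, song_vecs, heard, top_k=2):
    """Rank unheard songs by predicted affinity (dot product)."""
    scores = {sid: dot(user_vec, vec)
              for sid, vec in song_vecs.items() if sid not in heard}
    return sorted(scores, key=scores.get, reverse=True)[:top_k]

user = [0.9, 0.1]   # leans heavily toward latent factor 0
songs = {"s1": [1.0, 0.0], "s2": [0.0, 1.0], "s3": [0.7, 0.3]}
print(recommend(user, songs, heard={"s1"}))  # ['s3', 's2']
```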

Content-Based Filtering:

This approach recommends songs with similar audio characteristics:

  1. Audio features are extracted using signal processing and machine learning models.
  2. Features include tempo (BPM), key and mode (major/minor scale), energy level, danceability score, valence (positivity/happiness), acousticness, instrumentalness, loudness, and speechiness.
  3. These features are stored in the Feature Store for fast access.
  4. To find similar songs, the system computes cosine similarity between feature vectors.
  5. Songs with similar audio profiles to recently played tracks are recommended.
  6. This approach works well for new content but can lack diversity.
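
The cosine similarity in step 4 is straightforward to compute over the feature vectors. A minimal sketch, assuming features like tempo, energy, and valence have already been normalized to the 0-1 range:

```python
# Cosine similarity between audio-feature vectors (assumed normalized).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

just_played = [0.8, 0.9, 0.7]                       # e.g. [tempo, energy, valence]
candidates = {"mellow": [0.2, 0.1, 0.4], "upbeat": [0.7, 0.8, 0.6]}
best = max(candidates, key=lambda sid: cosine(just_played, candidates[sid]))
print(best)  # 'upbeat' - the closest audio profile to the track just played
```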

Natural Language Processing:

This approach analyzes text data to understand artist and genre relationships:

  1. Web scrapers collect text from music blogs, reviews, playlist descriptions, and social media.
  2. Artist mentions, genre tags, and contextual descriptions are extracted.
  3. Word embedding models (Word2Vec, BERT) create vector representations of artists and genres.
  4. Artists mentioned in similar contexts are considered similar.
  5. This provides cultural and contextual similarity beyond just audio features.
  6. It helps discover emerging artists and understand genre evolution.

Hybrid Recommendation System:

The production recommendation engine combines all approaches:

  1. Multiple recommendation algorithms run in parallel, each generating a candidate set.
  2. Collaborative filtering contributes 40% weight, finding songs that similar users enjoyed.
  3. Content-based filtering contributes 30% weight, finding songs with similar audio characteristics.
  4. NLP-based filtering contributes 15% weight, finding similar artists and genres.
  5. Trending songs contribute 10% weight for relevance and social proof.
  6. Exploration recommendations contribute 5% weight for diversity and discovery.
  7. All candidates are merged and deduplicated.
  8. A ranking model scores each song based on predicted engagement likelihood.
  9. Contextual re-ranking adjusts scores based on time of day, device, location, and inferred activity.
  10. For example, high-energy songs are boosted for morning or workout contexts, while mellow songs are boosted for evening relaxation.
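
The weighted blend in steps 2-8 can be sketched directly, using the weights stated above. Each algorithm is assumed to return candidates with a 0-1 score; merging by summing weighted scores also handles deduplication, since a song surfaced by several algorithms accumulates contributions:

```python
# Weighted merge of per-algorithm candidate sets (weights from the text).
WEIGHTS = {"collab": 0.40, "content": 0.30, "nlp": 0.15,
           "trending": 0.10, "explore": 0.05}

def blend(candidates_by_source, top_k=3):
    """Merge candidates into one weighted, deduplicated ranking."""
    merged = {}
    for source, scored_songs in candidates_by_source.items():
        w = WEIGHTS[source]
        for song, score in scored_songs.items():
            merged[song] = merged.get(song, 0.0) + w * score
    return sorted(merged, key=merged.get, reverse=True)[:top_k]
```

In production this ranking would then pass through the learned engagement model and the contextual re-ranking step before being returned.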

Model Training Infrastructure:

The ML models require continuous training on fresh data:

  1. Batch processing jobs run weekly on historical listening data using Spark clusters.
  2. Feature engineering pipelines compute user and song features from raw events.
  3. Models are trained using distributed frameworks like TensorFlow or PyTorch.
  4. Trained models are versioned and stored in MLflow or model registries.
  5. A/B testing framework evaluates new models against production models, measuring click-through rate, listening time, and retention.
  6. Winning models are gradually rolled out to all users.
  7. Real-time feature computation updates the Feature Store continuously.
  8. Model serving infrastructure (TensorFlow Serving, Seldon) provides low-latency predictions.

Deep Dive 3: How do we manage real-time collaborative playlists at scale?

Collaborative playlists allow multiple users to edit the same playlist simultaneously, requiring real-time synchronization:

Data Model:

Playlists are stored in two places:

  1. PostgreSQL stores the authoritative state: playlist metadata (name, description, visibility, creator) in a playlists table, and playlist songs (playlist_id, song_id, position, added_by, added_at) in a playlist_songs table.
  2. Redis stores real-time working state: an active collaborative playlist is cached as a sorted set where the score represents song position.

Real-Time Collaboration Flow:

When a user edits a collaborative playlist:

  1. The client sends a request to add, remove, or reorder songs.
  2. The Playlist Service acquires a distributed lock on the playlist using a Redis SET command with the NX and EX flags.
  3. While holding the lock, it reads the current playlist state from Redis.
  4. It applies the requested change to the Redis sorted set (ZADD for additions, ZREM for deletions, ZADD with new score for reordering).
  5. The lock is released after the Redis update completes.
  6. An event describing the change is published to a Redis pub/sub channel.
  7. The change is queued for asynchronous persistence to PostgreSQL.
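
The lock-mutate-release sequence in steps 2-5 can be illustrated with an in-process simulation. In the real design the lock is a Redis SET with NX/EX and the playlist is a Redis sorted set (ZADD/ZREM with position as the score); here a `threading.Lock` and a dict stand in so the sequencing logic is visible and testable:

```python
# In-process stand-in for the Redis lock + sorted-set playlist state.
import threading

class CollaborativePlaylist:
    def __init__(self):
        self._lock = threading.Lock()   # stand-in for the distributed lock
        self._scores = {}               # song_id -> position (sorted-set score)

    def add(self, song_id, position):
        with self._lock:                # acquire, mutate, release
            self._scores[song_id] = position   # ZADD equivalent

    def remove(self, song_id):
        with self._lock:
            self._scores.pop(song_id, None)    # ZREM equivalent

    def tracks(self):
        # ZRANGE equivalent: song IDs ordered by their position score.
        return sorted(self._scores, key=self._scores.get)
```

Unlike this local lock, the Redis lock needs a TTL so a crashed server cannot hold the playlist hostage, which is exactly what the EX flag provides.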

WebSocket Notification:

Connected users receive instant updates:

  1. Each user editing a collaborative playlist maintains a WebSocket connection to the WebSocket Server.
  2. The WebSocket Server subscribes to Redis pub/sub channels for active playlists.
  3. When a change event is published, Redis pushes it to all subscribers.
  4. The WebSocket Server receives the event and pushes it to all connected clients viewing that playlist.
  5. Clients receive the notification and update their UI immediately without polling.
  6. This provides a Google Docs-like collaborative experience, with every collaborator's edits appearing for the others in real time.

Conflict Resolution:

Race conditions are prevented through locking:

  1. The distributed lock ensures only one edit happens at a time per playlist.
  2. If a client’s request arrives while another edit is in progress, it waits briefly then retries.
  3. The lock TTL (5-10 seconds) ensures that if a server crashes, the lock expires automatically.
  4. Last-write-wins semantics apply: the most recent successful edit is considered authoritative.
  5. PostgreSQL persistence happens asynchronously with eventual consistency.

Deep Dive 4: How do we calculate artist royalties accurately at scale?

Royalty calculation is critical for maintaining relationships with artists and labels:

Event Collection:

Every playback event is tracked:

  1. When a user plays a song, the client sends periodic heartbeat events to track playback progress.
  2. Events include user ID, song ID, timestamp, playback position, duration played, and user subscription tier.
  3. Events are sent to the API Gateway and immediately published to a Kafka topic.
  4. Kafka provides durability and ordering guarantees, ensuring no events are lost.
  5. Events are partitioned by song ID for parallel processing.

Stream Validation:

Not all playback counts toward royalties:

  1. A stream is only billable if the user listened for at least 30 seconds (prevents accidental plays or previews).
  2. Free tier streams have different royalty rates than premium streams.
  3. Multiple plays of the same song by the same user in a short time window are deduplicated to prevent gaming.
  4. Bot detection algorithms filter out fraudulent streams.
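
These rules compose into a simple predicate. The 30-second threshold comes from the text; the length of the deduplication window is an illustrative assumption:

```python
# Billable-stream validation. DEDUP_WINDOW_SEC is an assumed value;
# the 30-second minimum is from the rules above.
DEDUP_WINDOW_SEC = 300  # assumed: repeat plays within 5 minutes don't count

def is_billable(seconds_played, last_play_ts, now_ts, is_bot=False):
    """A stream counts toward royalties only if every check passes."""
    if is_bot:
        return False                         # filtered by fraud detection
    if seconds_played < 30:
        return False                         # accidental play or preview
    if last_play_ts is not None and now_ts - last_play_ts < DEDUP_WINDOW_SEC:
        return False                         # duplicate within the window
    return True
```

The free-versus-premium distinction affects the royalty *rate* rather than billability, so it belongs in the rate calculation downstream rather than in this predicate.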

Real-Time Processing:

Apache Flink processes events in real-time:

  1. Flink consumers read from Kafka topics.
  2. Streaming aggregations count plays per song, artist, and album in tumbling windows (1 hour).
  3. Results are pushed to Redis for real-time dashboards showing current listener counts.
  4. Aggregated data is also written to Cassandra for long-term storage.

Batch Processing:

Spark jobs run daily for comprehensive analytics:

  1. All playback events from the previous day are read from Cassandra.
  2. Aggregations compute total streams per song, geographic distribution, device type breakdown, and demographic analysis.
  3. Results are written to PostgreSQL in analytics tables.
  4. These tables power artist dashboards and internal business intelligence tools.

Monthly Royalty Calculation:

At the end of each month:

  1. The total subscription revenue for the month is calculated.
  2. Spotify’s share (typically 30%) is subtracted, leaving 70% for rights holders.
  3. Total billable streams for the month are summed across all songs.
  4. Per-stream rate is calculated by dividing the royalty pool by total streams (typically $0.003 to $0.005).
  5. For each song, the number of streams is multiplied by the per-stream rate to get the song’s total royalty.
  6. Rights holder splits are applied: the recording artist receives a percentage, the record label receives a percentage, songwriters and publishers receive mechanical royalties, and producers may receive points.
  7. Each rights holder’s earnings are aggregated across all their songs.
  8. Payment records are created in the Payment Service for disbursement.

Transparency and Analytics:

Artists have access to detailed analytics:

  1. Artist Portal provides dashboards showing daily listener counts, geographic distribution of fans, demographic breakdowns, playlist additions, and revenue estimates.
  2. Real-time metrics show which songs are trending and gaining momentum.
  3. Historical trends help artists understand their growth trajectory.
  4. Exportable reports support business planning and tour routing decisions.

Deep Dive 5: How do we handle offline downloads while protecting content?

Offline mode is a premium feature that requires careful implementation:

Download Quota Management:

Subscription tiers have different limits:

  1. Free users cannot download content for offline listening.
  2. Premium users can download up to 10,000 songs across up to 5 devices.
  3. The Download Manager tracks download counts in PostgreSQL per user.
  4. When a download request arrives, the manager checks the current count against the quota.
  5. If the quota is exceeded, the request is rejected with an appropriate error message.

Download Queue Processing:

Downloads happen asynchronously:

  1. When a user requests a playlist download, the Download Manager creates download tasks for each song.
  2. Tasks are pushed to a Redis queue (LPUSH to a list).
  3. Background worker processes (running on dedicated instances) pull tasks from the queue (BRPOP with blocking).
  4. Workers download audio files from the CDN in the appropriate quality for the user’s settings.
  5. Files are encrypted using AES-128 with a user-specific and device-specific key.
  6. The encrypted file is returned to the client, which stores it in local device storage.
  7. Metadata about downloaded songs is stored in local SQLite database on the device.

DRM for Offline Content:

Downloaded content is protected:

  1. Each device has a unique device ID generated during first app launch.
  2. The License Server issues a long-lived license (e.g., 30 days) for downloaded content.
  3. The license is stored securely in the device’s keychain (iOS) or keystore (Android).
  4. During offline playback, the client uses the cached license to decrypt and play the audio.
  5. When the license expires, the device must connect to the network to renew it.
  6. If the user’s subscription lapses, the License Server refuses renewal, rendering downloaded content unplayable.

Sync and Conflict Resolution:

When the device comes online:

  1. The client compares local downloaded playlists with server state.
  2. For each playlist, it identifies songs added since last sync (to download) and songs removed since last sync (to delete locally).
  3. New download tasks are queued for additions.
  4. Removed songs are deleted from local storage to free up space.
  5. Metadata updates (e.g., changed song titles, updated album art) are also synchronized.
  6. This ensures offline content stays current with user’s playlists.
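
The diff in steps 1-4 is a pair of set differences between server and local state. A minimal sketch:

```python
# Compute which songs to download and which to delete on reconnect.
def compute_sync_plan(server_song_ids, local_song_ids):
    to_download = sorted(set(server_song_ids) - set(local_song_ids))
    to_delete = sorted(set(local_song_ids) - set(server_song_ids))
    return to_download, to_delete

dl, rm = compute_sync_plan(server_song_ids={"s1", "s2", "s4"},
                           local_song_ids={"s1", "s2", "s3"})
print(dl, rm)  # ['s4'] ['s3']
```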

Storage Management:

Device storage is limited:

  1. The client estimates storage requirements before downloading (song duration multiplied by bitrate).
  2. If insufficient storage is available, the user is prompted to free up space or select fewer songs.
  3. Users can configure download quality (lower quality uses less storage).
  4. The app provides tools to view storage usage and selectively remove downloaded content.
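The estimate from step 1 is simple arithmetic. A sketch with illustrative quality tiers (the bitrates are assumptions, not Spotify's actual encoding ladder):

```python
QUALITY_BITRATE_KBPS = {"low": 96, "normal": 160, "high": 320}  # illustrative

def estimate_download_bytes(durations_s, quality):
    """Approximate size: duration (s) x bitrate (bits/s) / 8 bits per byte.
    Container overhead is ignored, so this slightly underestimates."""
    bits_per_second = QUALITY_BITRATE_KBPS[quality] * 1000
    return sum(int(d * bits_per_second / 8) for d in durations_s)

# A 4-minute song at 320 kbps: 240 * 320_000 / 8 = 9.6 MB
size = estimate_download_bytes([240], "high")
```

This also shows why step 3 matters: dropping from "high" to "normal" halves the footprint of every downloaded song.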

Deep Dive 6: How do we scale search across hundreds of millions of songs and users?

Search is a critical feature that must be fast and relevant:

Elasticsearch Indexing:

All content is indexed for search:

  1. Songs, albums, artists, playlists, and podcasts are indexed in Elasticsearch.
  2. Each document contains searchable fields: name/title with multiple analyzers (standard for full-text, keyword for exact match, edge n-gram for autocomplete), artist names, album names, genres as keyword fields, popularity scores, release dates, and durations.
  3. Indexes are sharded across multiple Elasticsearch nodes for parallelism.
  4. Replicas provide redundancy and increase read capacity.
  5. When new content is added to PostgreSQL, a change data capture (CDC) system (e.g., Debezium) streams updates to Kafka, and a consumer service reads from Kafka and updates Elasticsearch in near real-time.
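The multi-analyzer indexing in step 2 might look like the following mapping, written here as a Python dict. All field names are assumptions for illustration, and `edge_ngram_analyzer` is a custom analyzer that would need to be defined in the index settings.

```python
# Illustrative Elasticsearch mapping for the song index. The title is
# indexed three ways so one document serves full-text search, exact
# matching, and autocomplete.
song_mapping = {
    "mappings": {
        "properties": {
            "title": {
                "type": "text",                   # standard analyzer: full-text
                "fields": {
                    "exact": {"type": "keyword"},  # exact match and sorting
                    "prefix": {                    # autocomplete (Deep Dive 6)
                        "type": "text",
                        "analyzer": "edge_ngram_analyzer",
                    },
                },
            },
            "artist_names": {"type": "text"},
            "album_name": {"type": "text"},
            "genres": {"type": "keyword"},
            "popularity": {"type": "float"},
            "release_date": {"type": "date"},
            "duration_ms": {"type": "integer"},
        }
    }
}
```

Multi-fields like `title.exact` and `title.prefix` avoid storing the document three times; Elasticsearch builds the extra inverted indexes from the same source value.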

Query Processing:

Search queries are optimized for relevance:

  1. User queries are sent to the Search Service.
  2. The service constructs an Elasticsearch multi_match query that searches across multiple fields with different weights (song name gets 3x weight, artist name gets 2x weight, album name gets 1x weight).
  3. Fuzzy matching handles typos and variations.
  4. Results are scored by Elasticsearch’s BM25 algorithm, which considers term frequency, inverse document frequency, and field length.
  5. The Search Service retrieves more results than needed (e.g., 100 instead of 20) to allow for re-ranking.
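Steps 2-5 translate into a query body like the one below, built as a Python dict. Field names match the illustrative mapping conventions used in this chapter; `^3` and `^2` are Elasticsearch's per-field boost syntax.

```python
def build_search_query(text, size=100):
    """multi_match across weighted fields (step 2) with fuzzy matching
    for typos (step 3). Over-fetches (size=100 rather than 20) so the
    re-ranking stage has candidates to work with (step 5)."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                "fields": ["title^3", "artist_names^2", "album_name"],
                "fuzziness": "AUTO",  # edit-distance tolerance scaled by term length
            }
        },
    }

q = build_search_query("bohemian rapsody")  # the typo is absorbed by fuzziness
```

BM25 scoring (step 4) is Elasticsearch's default, so nothing needs to be configured for it; the boosts simply multiply each field's BM25 contribution.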

Personalized Ranking:

Results are personalized per user:

  1. The Search Service retrieves the user’s listening history and preferences from Redis or Cassandra.
  2. It applies a re-ranking function that boosts songs from artists the user frequently listens to, songs in genres the user prefers, and songs the user has previously added to playlists.
  3. Popularity is also factored in: highly popular songs are boosted for ambiguous queries.
  4. The re-ranked results are returned to the client.
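The re-ranking function in steps 2-3 can be sketched as multiplicative boosts on the Elasticsearch score. The boost factors here are illustrative assumptions; in practice they would be tuned offline against click-through data.

```python
def rerank(results, user_profile, is_ambiguous):
    """Boost ES scores with per-user signals (weights are illustrative)."""
    def score(hit):
        s = hit["es_score"]
        if hit["artist_id"] in user_profile["frequent_artists"]:
            s *= 1.5   # frequently played artist (step 2)
        if hit["genre"] in user_profile["preferred_genres"]:
            s *= 1.2   # preferred genre (step 2)
        if hit["song_id"] in user_profile["playlisted_songs"]:
            s *= 1.3   # previously added to a playlist (step 2)
        if is_ambiguous:
            s *= 1.0 + hit["popularity"]  # popularity in [0, 1] (step 3)
        return s
    return sorted(results, key=score, reverse=True)

profile = {"frequent_artists": {"x"}, "preferred_genres": {"rock"},
           "playlisted_songs": set()}
ranked = rerank(
    [{"song_id": "a", "artist_id": "x", "genre": "rock",
      "popularity": 0.9, "es_score": 1.0},
     {"song_id": "b", "artist_id": "y", "genre": "pop",
      "popularity": 0.2, "es_score": 1.1}],
    profile, is_ambiguous=False)
# "a" scores 1.0 * 1.5 * 1.2 = 1.8 and overtakes "b" at 1.1
```

Keeping the re-ranker outside Elasticsearch means the per-user profile never has to be shipped into the search cluster, at the cost of fetching the larger candidate set.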

Autocomplete:

Fast suggestions improve user experience:

  1. As the user types, the client sends prefix queries to the Search Service.
  2. The service uses the edge n-gram analyzer field for efficient prefix matching.
  3. Results are sorted by popularity to show the most likely matches first.
  4. Autocomplete responses are cached in Redis with short TTLs (1 minute) to reduce load.
  5. The client debounces requests to avoid sending a query for every keystroke.
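The short-TTL caching in step 4 can be sketched as follows, using an in-memory dict as a stand-in for Redis (a production version would use `SETEX` with a 60-second expiry); `search_fn` abstracts the edge n-gram prefix query from step 2.

```python
import time

CACHE_TTL_S = 60           # short TTL from step 4
_cache = {}                # prefix -> (expires_at, suggestions); Redis stand-in

def autocomplete(prefix, search_fn, now=None):
    """Serve cached suggestions while fresh; otherwise run the prefix
    query and cache the result for the next minute."""
    now = time.time() if now is None else now
    hit = _cache.get(prefix)
    if hit and now < hit[0]:
        return hit[1]                     # cache hit: no search query issued
    suggestions = search_fn(prefix)       # edge n-gram prefix query (step 2)
    _cache[prefix] = (now + CACHE_TTL_S, suggestions)
    return suggestions

calls = []
def fake_search(prefix):
    calls.append(prefix)
    return [prefix + " rhapsody"]

autocomplete("boh", fake_search, now=0)
autocomplete("boh", fake_search, now=30)  # within TTL: served from cache
```

Because autocomplete results are not personalized in this sketch, one cache entry serves every user typing the same prefix, which is what makes the short TTL so effective for trending queries.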

Scaling Strategy:

To handle massive query volume:

  1. Elasticsearch cluster is horizontally scaled with additional nodes as traffic grows.
  2. Read replicas handle the majority of search traffic.
  3. Hot data (popular songs, trending searches) is cached in Redis to reduce Elasticsearch load.
  4. Query results are cached with TTLs based on query type (longer for static queries, shorter for personalized queries).
  5. Rate limiting prevents abuse and ensures fair resource allocation.

Step 4: Wrap Up

In this chapter, we proposed a system design for a music streaming platform like Spotify. If there is extra time at the end of the interview, here are additional points to discuss:

Additional Features:

  • Podcast support: Extend the platform to stream podcast episodes with resume playback functionality, chapter markers, and variable speed playback.
  • Social features: Add friend following, activity feeds showing what friends are listening to, sharing songs and playlists, and user profiles with statistics.
  • Radio stations: Generate infinite radio stations based on seed songs, artists, or genres using recommendation algorithms.
  • Lyrics integration: Display synchronized lyrics that highlight in real-time during playback.
  • Concerts and events: Show upcoming concerts for artists a user follows, with ticket purchasing integration.
  • Canvas: Short looping video content displayed while a song plays for enhanced visual experience.

Scaling Considerations:

  • Horizontal Scaling: All services are designed to be stateless and can scale horizontally by adding more instances behind load balancers.
  • Database Sharding: PostgreSQL can be sharded by user_id for user data and by content_id for catalog data to distribute load.
  • Caching Layers: Multiple levels of caching reduce database load: client-side caching for recently played songs, Redis caching for hot metadata and session data, and CDN caching for audio files and images.
  • Message Queue Scaling: Kafka topics are partitioned to enable parallel processing of events. Consumer groups assign each partition to exactly one consumer; delivery is at-least-once by default, so consumers are made idempotent (or use Kafka transactions) to tolerate duplicate deliveries.
  • Geographic Distribution: Deploy services in multiple AWS regions to reduce latency for global users and provide disaster recovery capabilities.

Error Handling:

  • Network Failures: Clients implement exponential backoff retry logic for failed requests. Audio chunks are retried with fallback to lower quality if higher quality fails.
  • Service Failures: Circuit breakers prevent cascading failures. If the Recommendation Service is down, the system falls back to showing trending content instead of personalized recommendations.
  • Database Failures: PostgreSQL read replicas provide failover capability. If the primary fails, a replica is promoted. Redis Cluster provides automatic failover for cache.
  • Third-Party API Failures: If a third-party provider (e.g., a lyrics or metadata service) fails, the system falls back to cached data and degrades gracefully, hiding the affected feature rather than blocking playback.

Security Considerations:

  • Encrypt sensitive data in transit using TLS and at rest using AES-256.
  • Implement proper authentication using JWT tokens with short expiration times and refresh token rotation.
  • Use OAuth 2.0 for third-party integrations like social login.
  • Apply rate limiting at the API Gateway level to prevent abuse and DDoS attacks (e.g., 1000 requests per hour per user).
  • Validate and sanitize all user inputs to prevent injection attacks.
  • Implement anomaly detection to identify and block fraudulent streaming for royalty fraud prevention.

Monitoring and Analytics:

  • Track key metrics: stream start latency (p50, p95, p99), buffering rate and rebuffering events, recommendation click-through rate and listening time, search relevance metrics, error rates per service, database query performance, and Kafka consumer lag.
  • Set up alerting for critical thresholds: stream latency exceeding 500ms for 5 minutes, error rate exceeding 1% for any service, database connection pool exhaustion, and Kafka lag exceeding 10,000 messages.
  • Use distributed tracing (Jaeger or Zipkin) to track requests across microservices and identify bottlenecks.
  • Centralized logging (ELK stack or Datadog) aggregates logs from all services for debugging.
  • Real-time dashboards show system health and key business metrics.

Cost Optimization:

  • Storage: Use S3 Intelligent-Tiering to automatically move infrequently accessed audio files to cheaper storage tiers. Apply lifecycle policies to archive very old content to Glacier.
  • Bandwidth: Multi-CDN strategy allows negotiating better rates. Adaptive bitrate streaming reduces bandwidth consumption for users on slow networks. Regional caching minimizes cross-region data transfer costs.
  • Compute: Use auto-scaling to match instance count to demand, scaling down during off-peak hours. Use spot instances for batch processing jobs like recommendation model training. Reserve instances for baseline load to get discounts.
  • Database: Use read replicas instead of scaling up primary instances. Cache aggressively to reduce database queries. Implement connection pooling to reduce connection overhead.

Future Enhancements:

  • AI-generated playlists: Use advanced language models to create playlists based on natural language descriptions like “upbeat songs for a road trip on a sunny day”.
  • Voice control: Deep integration with voice assistants for hands-free control and voice-based search.
  • Social listening: Real-time group listening sessions where friends can listen together remotely with synchronized playback.
  • Enhanced audio: Support for lossless audio (FLAC), spatial audio (Dolby Atmos), and high-resolution audio for audiophiles.
  • Live performances: Stream live concerts and exclusive performances from artists directly through the platform.
  • Artist collaboration tools: Enable artists to collaborate on tracks directly within the platform with version control and commenting.

Congratulations on getting this far! Designing Spotify is a complex system design challenge that covers many distributed systems concepts including content delivery networks, recommendation systems, real-time collaboration, event-driven architectures, and massive-scale data processing. The key is to start with core functionality, then systematically address scalability, reliability, and advanced features.


Summary

This comprehensive guide covered the design of a music streaming platform like Spotify, including:

  1. Core Functionality: Search and streaming, playlist management, personalized recommendations, and offline downloads.
  2. Key Challenges: Low-latency adaptive streaming, sophisticated recommendation algorithms, real-time collaborative editing, accurate royalty calculations, and efficient search at scale.
  3. Solutions: Multi-CDN strategy with adaptive bitrate streaming, hybrid recommendation engine combining collaborative filtering and content-based approaches, WebSocket-based real-time synchronization, event-driven analytics pipeline with Kafka, and Elasticsearch-powered search with personalization.
  4. Scalability: Horizontal scaling of stateless services, database sharding and replication, multi-level caching strategy, message queue partitioning, and geographic distribution.

The design demonstrates how to build a highly available, low-latency streaming platform that handles hundreds of millions of users, billions of daily streams, and petabytes of content while providing personalized experiences and protecting content through DRM.