Design Dropbox

Dropbox is a distributed file storage and synchronization service that allows users to store files in the cloud, sync them across multiple devices, and share them with others. At its core, it’s a complex distributed system that handles billions of files, petabytes of data, and millions of concurrent users while maintaining strong consistency guarantees and sub-second synchronization latency.

In this design, we’ll architect a system capable of serving 500M total users (100M daily active), with an average of 3 devices per user, storing 2 exabytes of data across billions of files. We’ll focus on the technical challenges that make Dropbox unique: efficient file chunking and deduplication, real-time synchronization, conflict resolution, and bandwidth optimization.

Step 1: Understand the Problem and Establish Design Scope

Before diving into the architecture, let’s establish clear functional and non-functional requirements. Understanding the scope helps us make informed decisions about which features to prioritize and which tradeoffs to accept.

Functional Requirements

Core Requirements:

  1. Users should be able to upload files up to 50GB and download them from any device.
  2. Changes to files should be automatically synced across all connected devices in real-time.
  3. Users should be able to share files and folders with other users with granular permissions.
  4. The system should maintain version history for files with a minimum retention of 30 days.
  5. The system should handle conflicts when the same file is modified on multiple devices simultaneously.
  6. Users should be able to access and modify files offline with changes syncing when back online.

Below the Line (Out of Scope):

  • Users should be able to selectively sync specific folders to certain devices.
  • Users should be able to preview common file types without downloading.
  • Real-time collaborative editing like Google Docs.
  • Advanced search with content indexing.
  • Paper or document creation features.
  • Mobile-specific optimizations.

Non-Functional Requirements

Core Requirements:

  • The system should prioritize low sync latency with P99 under 2 seconds from file modification to notification on other devices.
  • The system should ensure 99.999999999% (11 nines) durability with no data loss.
  • The system should achieve 99.95% availability with less than 22 minutes downtime per month.
  • The system should minimize bandwidth usage through efficient deduplication and compression.
  • The system should ensure strong consistency for metadata operations.

Below the Line (Out of Scope):

  • Advanced security features like zero-knowledge encryption.
  • Detailed compliance with specific regulations like GDPR.
  • Mobile-specific bandwidth and battery optimizations.
  • Advanced monitoring, logging, and alerting systems.

Clarification Questions & Assumptions:

  • Scale: 500 million total users with 100 million daily active users.
  • Storage: Average active user stores 20GB (roughly 2 EB of logical data across 100M daily active users), with deduplication reducing physical storage by approximately 50%.
  • Devices: Average of 3 devices per user requiring synchronization.
  • File Operations: System handles approximately 35K file operations per second at peak.
  • File Sizes: Average file size is 2MB with support for files up to 50GB.
  • Bandwidth: Approximately 2 PB of data uploaded daily.

Step 2: Propose High-Level Design and Get Buy-in

Planning the Approach

Our design strategy will proceed sequentially through each functional requirement. This ensures we build a cohesive system that addresses user needs systematically. We’ll start with basic file upload and download, then add synchronization, sharing, versioning, conflict resolution, and finally offline capabilities.

Defining the Core Entities

To satisfy our key functional requirements, we’ll need the following entities:

User: Represents any person using the platform to store and sync files. Contains personal information, storage quota, current storage usage, and authentication credentials.

File: Represents a file or folder in the system. Includes metadata such as name, path, size, content hash, chunk manifest (list of chunks that compose the file), version number, creation and modification timestamps, and ownership information.

Chunk: A portion of a file, typically 4MB in size. Chunks are the fundamental unit of storage and deduplication. Each chunk is identified by its SHA-256 content hash, making identical chunks across different files or users deduplicate automatically.

Version: A historical snapshot of a file at a specific point in time. Contains the chunk manifest for that version, timestamp, and which device created it. Enables users to restore previous versions of files.

Share: Represents a sharing relationship between a file owner and another user. Contains permissions (read, write, comment), the identities of both parties, and when the share was created.

Device: Represents a registered device for a user (desktop, mobile, web). Tracks the device’s last sync timestamp, sync cursor for delta sync, and device-specific metadata.

API Design

Initialize Upload Endpoint: Used by clients to initiate a file upload session. Returns an upload ID that will be used for subsequent chunk uploads.

POST /upload/init -> UploadSession
Body: {
  fileName: string,
  fileSize: number,
  contentHash: string
}

Upload Chunk Endpoint: Used by clients to upload individual file chunks. The service checks if the chunk already exists to enable deduplication.

POST /upload/chunk -> ChunkStatus
Body: {
  uploadId: string,
  chunkHash: string,
  chunkData: binary
}

Commit Upload Endpoint: Finalizes the upload by creating the file metadata record with the complete chunk manifest.

POST /upload/commit -> File
Body: {
  uploadId: string,
  chunkManifest: string[]
}

Get File Metadata Endpoint: Retrieves metadata about a file or folder, including its chunk manifest for downloads.

GET /files/:fileId -> File

List Files Endpoint: Returns a list of files and folders in a directory for a user.

GET /files?path={path} -> File[]

Share File Endpoint: Creates a sharing relationship, granting another user access to a file or folder.

POST /shares -> Share
Body: {
  fileId: string,
  sharedWithEmail: string,
  permission: "read" | "write" | "comment"
}

Get Delta Sync Endpoint: Returns all changes since the last sync cursor, enabling efficient incremental synchronization.

GET /sync/delta?cursor={cursor} -> DeltaResponse

Update File Endpoint: Updates file metadata when a file is modified, including optimistic locking via version numbers.

PATCH /files/:fileId -> File
Body: {
  chunkManifest: string[],
  version: number
}

Note: All authenticated endpoints use session tokens or JWT for authentication, never trusting client-provided user IDs.

High-Level Architecture

Let’s build up the system sequentially, addressing each functional requirement:

1. Users should be able to upload files and download them from any device

The core components necessary for basic file storage are:

  • Client Application: Available on desktop (Windows, macOS, Linux), mobile (iOS, Android), and web. Contains a file watcher to detect local changes, a chunker to break files into chunks, and an upload/download manager for network operations.
  • API Gateway: Entry point for all client requests. Handles SSL termination, authentication, rate limiting, and routing to appropriate backend services.
  • Upload Service: Manages file upload operations. Performs chunk-level deduplication by checking if chunks already exist before storing them. Communicates with block storage to persist unique chunks.
  • Metadata Service: Central service managing file and folder metadata. Stores file hierarchy, ownership, permissions, and chunk manifests. Uses a relational database sharded by user ID.
  • Block Storage Service: Abstraction layer over cloud object storage like Amazon S3 or Google Cloud Storage. Stores file chunks using content-addressed naming based on chunk hashes.

File Upload Flow:

  1. The client application detects a new file (via file watcher) and breaks it into chunks using content-defined chunking.
  2. The client computes SHA-256 hashes for each chunk and initiates an upload session by calling the Initialize Upload endpoint.
  3. The Upload Service returns an upload ID and the client begins uploading chunks.
  4. For each chunk, the Upload Service checks if the chunk hash already exists in storage. If it does, the upload is skipped (deduplication). If not, the chunk data is stored in Block Storage.
  5. Once all chunks are uploaded, the client commits the upload with the complete chunk manifest.
  6. The Metadata Service creates a file record with the manifest, linking the file to its constituent chunks.
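The dedup check in steps 4 to 6 can be sketched as follows. This is a minimal illustration: it splits on fixed 4MB boundaries and uses an in-memory dict as a stand-in for the content-addressed Block Storage index, whereas the real client uses content-defined chunking (covered in the deep dives).

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # fixed-size split, for illustration only

def chunk_hashes(data: bytes) -> list[str]:
    """SHA-256 hash of each 4MB slice of the file."""
    return [hashlib.sha256(data[i:i + CHUNK_SIZE]).hexdigest()
            for i in range(0, len(data), CHUNK_SIZE)]

def upload(data: bytes, block_store: dict[str, bytes]) -> list[str]:
    """Upload only chunks the server doesn't already hold, then return
    the chunk manifest to commit. `block_store` stands in for the
    content-addressed Block Storage index."""
    manifest = chunk_hashes(data)
    for i, h in enumerate(manifest):
        if h not in block_store:           # dedup: skip known chunks
            block_store[h] = data[i * CHUNK_SIZE:(i + 1) * CHUNK_SIZE]
    return manifest
```

Uploading the same content twice stores nothing new; identical chunks within one file are also stored only once.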

File Download Flow:

  1. The client requests file metadata from the Metadata Service, which returns the chunk manifest.
  2. The client downloads chunks in parallel from Block Storage.
  3. Chunks are concatenated in order to reconstruct the original file.
  4. The reconstructed file is written to the local filesystem.

2. Changes to files should be automatically synced across all connected devices in real-time

We introduce new components to enable real-time synchronization:

  • Notification Service: Maintains persistent WebSocket connections with all online devices. When file changes occur, publishes notifications to relevant devices instantly. Falls back to long polling for clients behind restrictive firewalls.
  • Sync Service: Manages synchronization state for each device. Tracks which files each device has and which updates they need. Generates delta sync operations to minimize data transfer.

Synchronization Flow:

  1. A user modifies a file on Device A. The file watcher detects the change immediately.
  2. Device A chunks the modified file and compares the new chunk manifest with the previous one to identify changed chunks.
  3. Device A uploads only the changed chunks to the Upload Service (delta sync).
  4. Device A updates the file metadata with the new chunk manifest and increments the version number.
  5. The Metadata Service publishes a change notification containing the file ID, new version, and chunk manifest.
  6. The Notification Service receives this event and pushes notifications to all other devices registered to that user (Devices B, C, D).
  7. Each device receives the notification, compares the new manifest with their local manifest, downloads missing chunks, and reconstructs the file.
  8. Devices update their local filesystem with the new file contents.

The entire process typically completes in under 2 seconds, providing a seamless synchronization experience.
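The manifest comparison in steps 2 and 7 reduces to a set difference over chunk hashes. A minimal sketch (the returned dict shape is illustrative):

```python
def manifest_delta(local: list[str], remote: list[str]) -> dict[str, list[str]]:
    """Given the device's current chunk manifest and the new remote one,
    compute which chunks to download and which local chunks are obsolete."""
    local_set, remote_set = set(local), set(remote)
    return {
        "download": [h for h in remote if h not in local_set],
        "obsolete": [h for h in local if h not in remote_set],
    }
```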

3. Users should be able to share files and folders with granular permissions

We extend the Metadata Service to support sharing:

  • Permission Model: Defines different access levels: Owner (full control), Editor (read and write), Viewer (read-only), and Commenter (view and comment).
  • Shares Table: Stores sharing relationships between file owners and recipients, including permission levels.

Sharing Flow:

  1. User A wants to share a folder with User B. They enter User B’s email address and select the “editor” permission level.
  2. The client calls the Share File endpoint with the file ID, recipient email, and permission.
  3. The Metadata Service validates that User A owns the file, looks up User B by email, and creates a share record.
  4. The Notification Service sends a notification to User B’s devices informing them of the new shared folder.
  5. User B’s devices create a reference to the shared folder in their local database, and it appears in their file system.
  6. When User B modifies a file in the shared folder, the system checks their permission level (editor, so modification is allowed), updates the file on User A’s shard, and notifies both users’ devices.

Permission inheritance ensures that when a folder is shared, all files and subfolders within it inherit the same permissions, simplifying management.

4. The system should maintain version history with minimum 30-day retention

We add a versioning system:

  • Versions Table: Stores historical snapshots of files, including the chunk manifest, size, timestamp, and which device created the version.

Versioning Flow:

  1. Every time a file is modified, before updating the current file record, the system creates a new entry in the Versions table.
  2. The version entry captures the previous chunk manifest, allowing the file to be restored to that state.
  3. A background cleanup job runs periodically to enforce the retention policy: keep all versions for 30 days, then retain only every 10th version for up to one year, and finally keep only the current version.
  4. When chunks are no longer referenced by any file or version, they can be safely deleted from Block Storage to reclaim space.

Users can browse version history through the client application and restore previous versions if needed.

5. The system should handle conflicts when files are modified on multiple devices simultaneously

We implement conflict detection and resolution:

  • Optimistic Locking: Each file has a monotonically increasing version number. When updating a file, the client must provide the expected version number. The database performs an atomic update only if the version matches.
  • Conflict Detection: If the version number doesn’t match, a conflict is detected, meaning another device modified the file concurrently.

Conflict Resolution Flow:

  1. Device A attempts to update a file, expecting version 41. However, Device B already updated it to version 42.
  2. The Metadata Service detects the version mismatch and returns a conflict error.
  3. The system uses a Last-Write-Wins strategy with conflict copies: The modification with the later timestamp becomes the canonical version.
  4. The earlier modification is saved as a separate file with a name like “file (Device A’s conflicted copy).txt”.
  5. Both files are synced to all devices, and users are notified of the conflict.
  6. Users can manually review and merge the conflicted copies if desired.

This approach ensures no data is lost while providing a clear resolution path. For text files, more sophisticated strategies like three-way merge or operational transformation can be applied, but binary files default to conflict copies.

6. Users should be able to work offline with changes syncing when back online

We add offline capabilities:

  • Local SQLite Database: Each client maintains a local database tracking file metadata, sync status, and pending operations.
  • Pending Operations Queue: When offline, all file operations (create, update, delete, move) are queued locally.

Offline Flow:

  1. While offline, the user modifies several files. The file watcher detects these changes.
  2. The client chunks the files and stores pending upload operations in the local database with status “pending_upload”.
  3. When connectivity returns, the client detects it and establishes a WebSocket connection with the Notification Service.
  4. The client calls the Delta Sync endpoint with its last sync cursor to retrieve all remote changes that occurred while offline.
  5. The client downloads and applies remote changes first to ensure it has the latest state.
  6. Next, the client processes its pending operations queue, uploading local changes.
  7. If any local changes conflict with remote changes (detected via version numbers), the conflict resolution strategy is applied.
  8. Once all operations complete successfully, they’re removed from the queue, and the sync cursor is updated.

This design ensures seamless offline functionality with automatic synchronization on reconnection.
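The pending-operations queue and its replay (steps 6 and 7) can be sketched as follows; the op shape and the stale-version check are illustrative assumptions, not the production data model.

```python
from collections import deque

class OfflineQueue:
    """Record local operations while offline; replay them on reconnect.
    Ops whose base version is stale (the server moved on while we were
    offline) are routed to conflict resolution instead of being replayed
    blindly."""

    def __init__(self):
        self.pending = deque()

    def record(self, file_id: str, base_version: int):
        self.pending.append({"file": file_id, "base_version": base_version})

    def replay(self, remote_versions, apply_fn, conflict_fn):
        while self.pending:
            op = self.pending.popleft()
            if remote_versions.get(op["file"], 0) > op["base_version"]:
                conflict_fn(op)    # remote changed underneath us
            else:
                apply_fn(op)       # safe to upload as-is
```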

Step 3: Design Deep Dive

With the core functional requirements addressed, let’s examine the critical technical challenges that enable the system to operate at scale with high performance.

Deep Dive 1: How do we efficiently chunk files and achieve effective deduplication?

Efficient storage is critical at Dropbox’s scale. Simply storing complete files would be wasteful. When a user modifies a 1GB video by changing metadata, we shouldn’t re-upload the entire file. This is where chunking and deduplication come in.

Why Content-Defined Chunking?

Fixed-size chunking, where files are split at regular intervals (e.g., every 4MB), has a fatal flaw: if you insert even a single byte at the beginning of a file, all subsequent chunks shift and become completely different. This breaks deduplication entirely.

Content-Defined Chunking (CDC) solves this by using a rolling hash to find chunk boundaries based on the actual content, not the position. We use Rabin fingerprinting, a technique that slides a window over the file and looks for specific patterns in the hash value.

How Rabin Fingerprinting Works:

The algorithm maintains a rolling hash window of 48 bytes. As it scans through the file byte by byte, it updates the hash by adding the new byte and removing the oldest one. When the hash value matches a specific pattern (determined by a mask), we declare a chunk boundary.

For example, with a mask of 0x3FFFFF, we get chunk boundaries approximately every 4MB on average. We also enforce minimum (1MB) and maximum (8MB) chunk sizes to prevent pathologically small or large chunks.
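A scaled-down sketch of content-defined chunking with a rolling hash. This uses a toy polynomial hash rather than true Rabin fingerprinting, and the mask/min/max defaults target ~4KB chunks so it runs on small inputs (production targets ~4MB, as described above):

```python
def cdc_boundaries(data: bytes, window: int = 48, mask: int = 0xFFF,
                   min_size: int = 1024, max_size: int = 16384) -> list[int]:
    """Return chunk end offsets for `data`, chosen by a rolling hash.

    A boundary is declared where the hash of the trailing `window` bytes
    matches the mask pattern, subject to min/max chunk sizes. Because the
    hash depends only on local content, inserting bytes early in the file
    shifts boundaries rather than invalidating all later chunks.
    """
    BASE, MOD = 257, (1 << 61) - 1
    pow_out = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    boundaries, start, h = [], 0, 0
    for i, b in enumerate(data):
        if i >= window:                    # slide: drop the oldest byte
            h = (h - data[i - window] * pow_out) % MOD
        h = (h * BASE + b) % MOD           # slide: add the new byte
        size = i - start + 1
        if size >= max_size or (size >= min_size and (h & mask) == 0):
            boundaries.append(i + 1)
            start = i + 1
    if start < len(data):
        boundaries.append(len(data))       # final partial chunk
    return boundaries
```

Prepending a byte to the input shifts the early boundaries by one but leaves most chunk contents identical, which is exactly the property that preserves deduplication.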

The Deduplication Process:

On the client side, when uploading a file, we chunk it using Rabin fingerprinting and compute a SHA-256 hash for each chunk. We then send just the list of chunk hashes to the server.

The Upload Service checks each hash against its chunk index (typically cached in Redis for performance). If a chunk already exists, we skip uploading it and simply increment its reference count. Only new, unique chunks are uploaded to Block Storage.

This achieves powerful deduplication:

  • Cross-user deduplication: If two users have identical files, we store it once.
  • Delta sync: Modifying 1MB in a 1GB file only uploads the changed chunks.
  • Bandwidth savings: Typically 50-70% reduction in upload bandwidth.

Chunk Storage Strategy:

Chunks are stored in S3 with keys based on their content hash, organized into a hierarchy to prevent hot partitions: s3://dropbox-blocks/{region}/{hash[0:2]}/{hash[2:4]}/{hash}. This sharding improves S3 performance by distributing requests across multiple partitions.
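The key layout above is a one-liner (bucket and region naming are illustrative):

```python
def chunk_key(region: str, sha256_hex: str) -> str:
    """Content-addressed key for a chunk. The two-level hash prefix fans
    requests out across many key prefixes, avoiding hot S3 partitions."""
    return (f"dropbox-blocks/{region}/"
            f"{sha256_hex[:2]}/{sha256_hex[2:4]}/{sha256_hex}")
```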

File Reconstruction:

To download a file, we retrieve its chunk manifest from the Metadata Service, fetch all chunks from S3 in parallel (typically 4-8 concurrent downloads), and concatenate them in order to reconstruct the original file.

The beauty of this approach is that it works transparently: from the user’s perspective, they’re uploading and downloading complete files, but under the hood, we’re optimizing every byte transfer.

Deep Dive 2: How do we implement efficient real-time synchronization at scale?

Synchronization is the heart of Dropbox. It must be fast, reliable, and efficient even with millions of concurrent users making changes.

The Sync State Machine:

Each device maintains a state for every file: IN_SYNC (local and remote match), UPLOADING (local changes being pushed), DOWNLOADING (remote changes being pulled), CONFLICT (both modified simultaneously), or OFFLINE (queued changes).

Client-Side Sync Algorithm:

When the file watcher detects a local modification, the client immediately computes a new chunk manifest using content-defined chunking. It compares this with the previous manifest stored locally to identify exactly which chunks changed. Only these changed chunks are uploaded, achieving delta sync.

The client then calls the Metadata Service to update the file record with the new chunk manifest, providing the expected version number for optimistic locking. If successful, the file version increments, and the client updates its local state to IN_SYNC.

Server-Side Notification:

When the Metadata Service receives a file update, it publishes a change event to a Redis Pub/Sub channel. The Notification Service subscribes to these channels and immediately pushes notifications to all relevant devices over their WebSocket connections.

The notification includes the file ID, new version number, and new chunk manifest. Each device compares this manifest with its local one, identifies missing chunks, downloads them in parallel, and reconstructs the file.

WebSocket Architecture:

The Notification Service maintains persistent WebSocket connections with all online devices. Each server instance handles approximately 50,000 concurrent connections. With 100 million DAU and 3 devices each, we need about 6,000 notification servers.

Connection state is stored in Redis (deviceId to serverId mapping). When a device connects, it’s assigned to a server, and that mapping is cached. When the Metadata Service publishes a file change event, it queries Redis to find which servers have interested devices, and the event is routed only to those servers.

Delta Sync for Reconnection:

When a device comes back online after being disconnected, it doesn’t want to download the entire file list. Instead, it provides a sync cursor (a timestamp or version marker) to the Delta Sync endpoint, which returns only files that changed since that cursor.

The response is paginated with a next cursor value, allowing the client to incrementally fetch changes without overwhelming the system or the network connection. This is especially important for mobile devices on slow networks.
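A minimal sketch of cursor-based delta pagination over an append-only change log; the tuple shape and response fields are assumptions for illustration:

```python
def delta_sync(change_log, cursor: int, page_size: int = 100):
    """Cursor-based delta fetch over an append-only change log.

    `change_log` is a list of (seq, file_id, version) tuples in seq order;
    the returned cursor is the last seq delivered, so the client can page
    through changes incrementally.
    """
    page = [c for c in change_log if c[0] > cursor][:page_size]
    next_cursor = page[-1][0] if page else cursor
    return {
        "entries": page,
        "cursor": next_cursor,
        "has_more": any(c[0] > next_cursor for c in change_log),
    }
```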

Optimizations:

To achieve sub-2-second P99 sync latency, we employ several optimizations:

  • Parallel chunk transfers (4-8 concurrent downloads) to saturate bandwidth.
  • TCP connection pooling to avoid slow-start overhead.
  • Compression for compressible file types.
  • Prioritizing small files and frequently accessed files in the sync queue.
  • Batching multiple small file updates into a single notification when appropriate.

Deep Dive 3: How do we resolve conflicts reliably while minimizing user disruption?

Conflicts are inevitable in a distributed system where multiple devices can modify files concurrently. Our conflict resolution strategy must be deterministic, preserve user data, and be understandable.

Optimistic Locking for Conflict Detection:

Each file record in the database has a version number. When updating, we use an atomic database operation: update the file only if the current version matches the expected version. If the update affects zero rows, we know a conflict occurred.

This is implemented with a SQL statement like: UPDATE files SET content_hash = new_hash, version = version + 1 WHERE file_id = 123 AND version = 41. If another device already incremented the version to 42, this update fails, and we detect the conflict.
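The compare-and-set can be demonstrated end to end, here with SQLite standing in for PostgreSQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (file_id INTEGER PRIMARY KEY,"
             " content_hash TEXT, version INTEGER)")
conn.execute("INSERT INTO files VALUES (123, 'old-hash', 41)")

def update_file(expected_version: int, new_hash: str) -> bool:
    """Compare-and-set: the UPDATE matches zero rows if another device
    already bumped the version, which signals a conflict."""
    cur = conn.execute(
        "UPDATE files SET content_hash = ?, version = version + 1"
        " WHERE file_id = 123 AND version = ?",
        (new_hash, expected_version))
    conn.commit()
    return cur.rowcount == 1

# Device B commits first against version 41; Device A's stale write fails.
assert update_file(41, "hash-from-b") is True    # version is now 42
assert update_file(41, "hash-from-a") is False   # conflict detected
```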

Last-Write-Wins with Conflict Copies:

For most files, especially binary files, we use a Last-Write-Wins strategy. We compare the modification timestamps of the conflicting versions. The version with the later timestamp becomes the canonical file.

However, we don’t discard the earlier version. Instead, we create a “conflict copy” with a generated filename like “report (Alice’s MacBook conflicted copy 2025-11-18).pdf”. Both files are synced to all devices, and the user is notified.

This ensures no data is ever lost due to conflicts. Users can review both versions and manually merge them if needed.

Three-Way Merge for Text Files:

For text-based files like source code or documents, we can be more sophisticated. We use a three-way merge algorithm that looks at the common ancestor (the last version both devices had) and both modified versions.

The algorithm identifies which lines were added, removed, or modified by each device. If changes don’t overlap, they can be automatically merged. If the same lines were modified differently, we create a conflict marker in the file that the user must resolve.

Operational Transformation for Real-Time Collaboration:

For advanced collaborative features (not in initial scope but worth mentioning), we can use Operational Transformation (OT). OT transforms concurrent operations to account for each other, allowing multiple users to edit the same document simultaneously without conflicts.

For example, if User A inserts “Hello” at position 0 and User B inserts “World” at position 0, OT transforms User B’s operation to account for User A’s insertion, resulting in deterministic, consistent behavior across all clients.
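The insert-vs-insert case from this example can be sketched in a few lines. This covers character-wise insert transformation only; real OT systems also handle deletes, rich text, and server-side transformation:

```python
def transform_insert(pos_b: int, pos_a: int, text_a: str) -> int:
    """Shift B's insert position right if A's concurrent insert landed at
    or before it (ties broken in A's favor for determinism)."""
    return pos_b + len(text_a) if pos_a <= pos_b else pos_b

doc = "!"
doc = doc[:0] + "Hello" + doc[0:]        # apply A's insert at position 0
pb = transform_insert(0, 0, "Hello")     # B's position 0 becomes 5
doc = doc[:pb] + "World" + doc[pb:]      # apply B's transformed insert
assert doc == "HelloWorld!"              # every replica converges here
```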

Conflict Resolution Flow:

When a conflict is detected:

  1. The server identifies both conflicting versions and their timestamps.
  2. The later version becomes the canonical file (updated in the files table).
  3. The earlier version is created as a new file with a conflict copy name.
  4. Both files are synced to all devices via the Notification Service.
  5. Users receive a notification explaining the conflict and how to resolve it.
  6. Version history is preserved for both versions, allowing restoration if needed.

This approach prioritizes data preservation and user control over fully automatic resolution.

Deep Dive 4: How do we scale metadata storage to handle 500 million users and billions of files?

The Metadata Service is the system’s source of truth for file hierarchy, permissions, and versions. It must be highly available, consistent, and able to handle massive scale.

Database Schema Design:

We use PostgreSQL for metadata storage due to its strong consistency guarantees, ACID transactions, and rich indexing capabilities. Our schema includes:

Users Table: Stores user accounts with email, password hash, storage quota, and current storage usage.

Files Table: The core table containing file metadata. Each record includes file ID, owner user ID, parent folder ID (for hierarchy), file name, full path (denormalized for fast lookups), whether it’s a folder, content hash, chunk manifest stored as JSONB, file size, version number, creation timestamp, modification timestamp, and soft deletion timestamp.

File Versions Table: Historical snapshots of files. Contains version ID, file ID, version number, chunk manifest at that version, size, creation timestamp, and which device created it.

Shares Table: Sharing relationships between file owners and recipients, with permission levels.

Devices Table: Registered devices for each user, tracking last sync timestamp, sync cursor, and activity status.

Sharding Strategy:

With 500 million users, a single database instance can’t handle the load. We shard by user_id: shard_id = hash(user_id) mod NUM_SHARDS. (Plain modulo hashing is the simplest scheme; consistent hashing or a shard-mapping table makes it easier to add shards later without remapping every user.)

This has significant advantages:

  • All of a user’s files reside on the same shard, avoiding distributed joins.
  • Easy horizontal scaling by adding more shards.
  • Simplified backup and maintenance (operate on one shard at a time).
  • Natural isolation between users for security and performance.

The primary challenge is shared folders, which span shards. We handle this by keeping shared folder metadata on the owner’s shard and creating reference records on participants’ shards. Reads may require cross-shard queries, but this is an acceptable tradeoff given the benefits.
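A minimal routing function for the modulo scheme described above. The shard count is illustrative; SHA-256 is used only because Python's built-in hash() is salted per process and would route inconsistently across application servers:

```python
import hashlib

NUM_SHARDS = 64   # illustrative; real deployments provision headroom

def shard_for(user_id: int) -> int:
    """Stable user-to-shard routing: hash the user ID with a hash that is
    identical on every application server, then take it modulo the shard
    count so all of the user's files land on one shard."""
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS
```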

Indexing Strategy:

Critical indexes include:

  • Index on (user_id, parent_id) for fast folder listing.
  • Index on (user_id, path) for path-based lookups.
  • Index on content_hash for deduplication checks.
  • Index on (file_id, version) in the versions table for version history queries.

Caching Layer:

To reduce database load, we cache heavily accessed metadata in Redis:

  • File metadata for recently accessed files (TTL: 5 minutes).
  • User storage quota and usage (TTL: 1 minute).
  • Share permissions (TTL: 10 minutes).
  • Directory listings for frequently browsed folders (TTL: 30 seconds).

Cache invalidation occurs when files are modified, ensuring consistency. We use Redis Pub/Sub to broadcast invalidation events to all application servers.

Version Management and Cleanup:

Storing every version forever would consume unbounded storage. We implement a retention policy:

  • Keep all versions for 30 days (configurable per subscription tier).
  • After 30 days, keep only every 10th version for up to one year.
  • After one year, keep only the current version.

A background job runs daily to identify and delete old versions based on this policy. When a version is deleted, we decrement reference counts on its chunks. When a chunk’s reference count reaches zero (no files or versions reference it), it’s deleted from Block Storage.
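The cleanup job's per-file decision can be sketched as follows. The list shape is an assumption, and "every 10th version" is read here as version_number % 10 == 0:

```python
from datetime import datetime, timedelta

def versions_to_delete(versions, now: datetime) -> list[int]:
    """Apply the tiered retention policy to one file's versions.

    `versions` is a list of (version_number, created_at) in ascending
    order; the last entry is the current version and is always kept.
    """
    current = versions[-1][0]
    keep = set()
    for number, created in versions:
        age = now - created
        if number == current:
            keep.add(number)                       # current version
        elif age <= timedelta(days=30):
            keep.add(number)                       # tier 1: keep everything
        elif age <= timedelta(days=365) and number % 10 == 0:
            keep.add(number)                       # tier 2: every 10th
    return [n for n, _ in versions if n not in keep]
```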

This balances user utility (access to recent version history) with storage efficiency.

Deep Dive 5: How do we implement secure file sharing with fine-grained permissions?

File sharing requires careful permission management, access control enforcement, and security to prevent unauthorized access.

Permission Model:

We support four permission levels:

  • Owner: Full control, can delete, rename, move, and change permissions.
  • Editor: Can read, write, create, and delete files but cannot change permissions or delete the root shared folder.
  • Viewer: Read-only access, can view and download files but cannot modify them.
  • Commenter: Can view files and add comments (for collaboration features).

Permission Inheritance:

Permissions are inherited hierarchically. When a folder is shared, all files and subfolders within it automatically inherit the same permissions unless explicitly overridden. This simplifies management and provides intuitive behavior.

Access Control Enforcement:

Every file operation (read, write, delete, share) goes through a permission checker. The checker:

  1. Determines if the user is the owner (if so, grant all permissions).
  2. Queries the shares table for explicit shares to this user for this file.
  3. If no direct share exists, recursively checks parent folders for inherited permissions.
  4. Compares the user’s permission level with the required permission level using a hierarchy (owner > editor > commenter > viewer).
  5. Returns allowed or denied.

This logic is implemented centrally in the Metadata Service and enforced consistently across all operations.
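The checker above can be sketched as a walk up the folder tree. The `shares` and `parents` shapes are illustrative data structures, not the production schema:

```python
LEVELS = {"viewer": 1, "commenter": 2, "editor": 3, "owner": 4}

def check_access(user_id, file, shares, parents, required: str) -> bool:
    """Resolve a user's effective permission by walking up the folder
    hierarchy until a share record is found. `shares` maps
    (node_id, user_id) -> permission; `parents` maps node_id ->
    parent_id (None at the root)."""
    if file["owner"] == user_id:
        return True                              # owners hold every right
    node = file["id"]
    while node is not None:
        granted = shares.get((node, user_id))
        if granted is not None:                  # nearest share wins
            return LEVELS[granted] >= LEVELS[required]
        node = parents.get(node)                 # inherit from the parent
    return False                                 # no share up the whole tree
```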

Sharing Flow in Detail:

When User A shares a folder with User B:

  1. User A’s client sends a share request with User B’s email and desired permission level.
  2. The Metadata Service validates that User A owns the folder.
  3. The service looks up User B’s user ID by email.
  4. A share record is created in the shares table linking the folder, User A (owner), User B (recipient), and the permission level.
  5. The Notification Service sends a notification to all of User B’s online devices.
  6. User B’s devices receive the notification and create a local reference to the shared folder.
  7. The shared folder appears in User B’s file listing, typically in a special “Shared with me” section.

When User B modifies a file in the shared folder:

  1. User B’s client uploads the changes.
  2. The Metadata Service checks permissions using the permission checker.
  3. Since User B has “editor” permissions, the modification is allowed.
  4. The file metadata is updated on User A’s database shard (since User A owns the file).
  5. The Notification Service notifies both User A’s and User B’s devices of the change.

Security Considerations:

  • All share links include cryptographically secure tokens to prevent guessing.
  • Share permissions can be revoked instantly by the owner, and the change propagates immediately.
  • Audit logs track all sharing and access events for security monitoring.
  • Rate limiting prevents abuse, such as sharing files with thousands of users rapidly.

Shared Folder Consistency:

Since shared folders span database shards (owner’s shard and participants’ shards), maintaining consistency requires careful design. We use eventual consistency for share metadata propagation but strong consistency for file content updates.

When a share is created, the share record on the owner’s shard is the source of truth. Other shards maintain cached references that are periodically refreshed. This tradeoff allows us to avoid distributed transactions while still providing a consistent user experience.

Deep Dive 6: How do we optimize bandwidth usage to minimize transfer times and costs?

Bandwidth is expensive and often the bottleneck for users, especially on mobile networks. We employ multiple strategies to optimize bandwidth usage.

Adaptive Compression:

Not all files benefit from compression. Compressing a JPEG or MP4 file (already compressed formats) wastes CPU cycles without reducing size. Our compression manager intelligently decides whether to compress based on file type and compressibility.

For known compressible types like text, logs, JSON, XML, HTML, and source code, we always compress using zlib with compression level 6 (balanced speed and ratio).

For unknown types, we sample-compress the first 1MB. If compression reduces size by more than 20%, we compress the entire file. Otherwise, we skip compression.

We also adjust compression level based on context:

  • Level 1 (fastest) for mobile uploads on cellular networks to conserve battery.
  • Level 6 (balanced) as the default for desktop.
  • Level 9 (maximum) for cold storage and archival.
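The sample-compress decision above can be sketched with zlib. The 1MB sample size and 20% threshold come from the text; the extension whitelist and function name are illustrative assumptions.

```python
import os
import zlib

# Illustrative list of known-compressible types (text, logs, markup, code).
ALWAYS_COMPRESS = {".txt", ".log", ".json", ".xml", ".html", ".py"}
SAMPLE_SIZE = 1024 * 1024  # sample-compress only the first 1MB
MIN_SAVINGS = 0.20         # require >20% reduction to compress the whole file

def should_compress(extension: str, data: bytes, level: int = 6) -> bool:
    if extension in ALWAYS_COMPRESS:
        return True  # known-compressible types are always compressed
    sample = data[:SAMPLE_SIZE]
    if not sample:
        return False
    savings = 1 - len(zlib.compress(sample, level)) / len(sample)
    return savings > MIN_SAVINGS

print(should_compress(".bin", b"A" * 10_000))       # repetitive -> True
print(should_compress(".bin", os.urandom(10_000)))  # incompressible -> False
```

Already-compressed formats like JPEG behave like the random bytes above: the sample barely shrinks, so the whole-file compression pass is skipped.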

Parallel Chunk Transfers:

Instead of transferring chunks sequentially, we use 4-8 parallel TCP connections. This saturates available bandwidth and reduces the impact of latency, especially on high-latency connections like satellite or long-distance links.

TCP Connection Pooling:

Establishing a new TCP connection for each chunk is inefficient due to TCP slow-start. We maintain a pool of 4-8 persistent connections to the Upload Service and reuse them for multiple chunk uploads. This keeps connections in the high-throughput state.
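A minimal sketch of parallel chunk transfer over a small worker pool, where each worker models one pooled persistent connection. The `upload_chunk` function is a stand-in for a real HTTP client call; chunk size and worker count are illustrative.

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor

def upload_chunk(chunk: bytes) -> str:
    # Stand-in for an HTTP PUT over a pooled connection; returns the
    # content hash the server would acknowledge.
    return hashlib.sha256(chunk).hexdigest()

def upload_file(data: bytes, chunk_size: int = 4 * 1024 * 1024) -> list[str]:
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    # 8 workers model the 4-8 pooled TCP connections; each worker reuses
    # its connection for many chunks, avoiding TCP slow-start per transfer.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(upload_chunk, chunks))

manifest = upload_file(b"x" * (10 * 1024 * 1024))  # 10MB -> 3 chunks
print(len(manifest))
```

The returned list of chunk hashes doubles as the file's chunk manifest, preserving chunk order even though uploads complete out of order.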

Resume Capability:

Since each chunk has a deterministic hash based on its content, uploads and downloads are fully resumable. If a transfer is interrupted (network failure, app crash, device sleep), the client can resume from the last successfully transferred chunk.

The Upload Service accepts chunks idempotently: uploading the same chunk twice is a no-op. This allows clients to retry without worrying about duplicates.
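Resumability then falls out of content hashing plus idempotent uploads, as in this sketch. The server-side set stands in for the real chunk index; names are illustrative.

```python
import hashlib

server_chunks: set[str] = set()  # stand-in for the server's chunk index

def send_chunk(chunk: bytes) -> None:
    digest = hashlib.sha256(chunk).hexdigest()
    # Idempotent: re-uploading an existing chunk is a no-op.
    server_chunks.add(digest)

def resume_upload(chunks: list[bytes]) -> int:
    """Send only chunks the server is missing; return how many were sent."""
    sent = 0
    for chunk in chunks:
        if hashlib.sha256(chunk).hexdigest() not in server_chunks:
            send_chunk(chunk)
            sent += 1
    return sent

chunks = [b"chunk-1", b"chunk-2", b"chunk-3"]
send_chunk(chunks[0])         # this chunk made it before the interruption
print(resume_upload(chunks))  # only the two missing chunks are sent
```

A real client would query the server for the set of already-stored hashes in one batch call rather than checking per chunk.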

Bandwidth Throttling:

Users can configure maximum upload and download speeds to prevent Dropbox from saturating their internet connection and affecting other applications. We implement throttling using a token bucket algorithm, which allows bursts while maintaining an average rate limit.
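A compact token bucket like the one described: tokens refill at `rate` bytes per second up to `capacity`, permitting short bursts while enforcing the configured average rate. Parameter values are illustrative.

```python
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # refill rate, bytes per second
        self.capacity = capacity  # maximum burst size, bytes
        self.tokens = capacity
        self.last = time.monotonic()

    def try_consume(self, nbytes: float) -> bool:
        now = time.monotonic()
        # Refill tokens for the elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return True
        return False  # caller should wait and retry

bucket = TokenBucket(rate=1_000_000, capacity=2_000_000)  # 1 MB/s, 2MB burst
print(bucket.try_consume(1_500_000))  # burst within capacity
print(bucket.try_consume(1_500_000))  # bucket nearly empty, must wait
```

Before each chunk transfer, the client calls `try_consume` with the chunk size; a `False` result means it sleeps briefly and retries, which smooths throughput to the user's configured limit.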

Smart Sync Scheduling:

Not all sync operations are equally urgent. We prioritize:

  • Small files over large files (complete faster, provide quicker feedback).
  • Frequently accessed files over rarely accessed files.
  • User-initiated downloads (explicit user action) over background syncs.
  • Recently modified files over older changes.

Large files like videos are synced in the background with lower priority. If the user needs them immediately, they can trigger a high-priority download manually.

Selective Sync:

Users can choose which folders to sync to each device. For example, a desktop might sync the entire account, while a laptop syncs only work folders, and a phone syncs only photos. This reduces bandwidth usage and storage requirements on individual devices.

Deep Dive 7: How do we handle offline mode seamlessly?

Users expect Dropbox to work offline and automatically sync when connectivity returns. This requires robust local state management and conflict resolution.

Local SQLite Database:

Each client maintains a SQLite database that mirrors relevant metadata from the server. This database includes:

local_files table: Tracks all files synced to this device with their paths, content hashes, chunk manifests, versions, modification timestamps, sync status (synced, pending_upload, pending_download, conflict), and last sync timestamp.

pending_operations table: Queues operations performed while offline, including creates, updates, deletes, and moves. Each operation record includes the file ID, operation type, operation data (JSON with details), creation timestamp, and retry count.
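One possible shape for this client-side schema, expressed with Python's `sqlite3`. Column names and types are illustrative assumptions, not Dropbox's actual schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # a real client uses an on-disk file
conn.executescript("""
CREATE TABLE local_files (
    file_id        TEXT PRIMARY KEY,
    path           TEXT NOT NULL,
    content_hash   TEXT,
    chunk_manifest TEXT,              -- JSON list of chunk hashes
    version        INTEGER NOT NULL,
    modified_at    INTEGER NOT NULL,  -- unix timestamp
    sync_status    TEXT NOT NULL CHECK (sync_status IN
        ('synced', 'pending_upload', 'pending_download', 'conflict')),
    last_synced_at INTEGER
);

CREATE TABLE pending_operations (
    op_id       INTEGER PRIMARY KEY AUTOINCREMENT,
    file_id     TEXT NOT NULL,
    op_type     TEXT NOT NULL,        -- create | update | delete | move
    op_data     TEXT,                 -- JSON payload with details
    created_at  INTEGER NOT NULL,
    retry_count INTEGER NOT NULL DEFAULT 0
);
""")

# Record an offline edit: mark the file dirty and queue the operation.
conn.execute(
    "INSERT INTO local_files VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
    ("f1", "/docs/a.txt", "abc123", "[]", 10, 1700000000,
     "pending_upload", None),
)
conn.execute(
    "INSERT INTO pending_operations (file_id, op_type, op_data, created_at) "
    "VALUES (?, ?, ?, ?)",
    ("f1", "update", '{"hash": "abc123"}', 1700000000),
)
```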

Offline Operation Queueing:

When the device is offline, all file operations are queued in the pending_operations table instead of being sent to the server immediately. The client continues to function normally from the user’s perspective: they can create, modify, and delete files as usual.

The local SQLite database is updated synchronously to reflect these operations, ensuring the local view is consistent. However, the operations aren’t applied to the server until connectivity returns.

Reconnection Flow:

When the device detects connectivity (by successfully pinging the Notification Service or API Gateway), it initiates a multi-step reconnection process:

First, it establishes a WebSocket connection to the Notification Service to receive real-time updates going forward.

Second, it calls the Delta Sync endpoint with its last known sync cursor to retrieve all changes that occurred on the server while it was offline. This includes modifications from other devices and other users for shared files.

Third, it applies these remote changes to the local filesystem and database, downloading necessary chunks. This ensures the local state reflects the latest server state.

Fourth, it processes its pending operations queue. For each queued operation, it attempts to execute it against the server (upload changed files, delete removed files, etc.).

Conflict Detection During Reconnection:

Conflicts arise when the same file was modified both locally (while offline) and remotely (by another device). The version number mismatch immediately reveals the conflict.

When the client attempts to upload a local change with version 41, but the server’s current version is 43 (due to remote changes), the server returns a conflict error.

The client then downloads the remote version (version 43) and applies the conflict resolution strategy: Last-Write-Wins with conflict copies. The remote version becomes the canonical file, and the local version is saved as a conflict copy with a descriptive name.

Both files are retained locally, and the user is notified. The pending operation is marked as resolved and removed from the queue.
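The version check and conflict-copy fallback can be sketched as below. Server state is a plain dict here, and the conflict-copy naming mirrors the example later in this section; all names are illustrative.

```python
class ConflictError(Exception):
    def __init__(self, server_version: int):
        self.server_version = server_version

server = {"document.txt": {"version": 43, "content": b"remote"}}

def upload(path: str, content: bytes, expected_version: int) -> int:
    record = server.setdefault(path, {"version": 0, "content": b""})
    if record["version"] != expected_version:
        # Optimistic lock failed: another device committed first.
        raise ConflictError(record["version"])
    record["version"] += 1
    record["content"] = content
    return record["version"]

def sync_local_change(path, content, local_version, device) -> str:
    try:
        upload(path, content, local_version)
        return path
    except ConflictError:
        # Remote version wins; the local edit survives as a conflict copy.
        stem, _, ext = path.rpartition(".")
        copy = f"{stem} ({device}'s conflicted copy).{ext}"
        upload(copy, content, expected_version=0)
        return copy

result = sync_local_change("document.txt", b"local edit", 41, "Laptop")
print(result)  # -> document (Laptop's conflicted copy).txt
```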

Retry Logic:

If operations fail due to transient errors (network hiccup, server temporarily unavailable), they remain in the pending operations queue with an incremented retry count. The client uses exponential backoff for retries to avoid overwhelming the server.

After a maximum number of retries (e.g., 10), operations are marked as failed, and the user is notified to take manual action.
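This retry policy can be sketched as exponential backoff with a retry cap. The base delay, cap, and jitter factor are illustrative; only the retry limit of 10 comes from the text.

```python
import random

MAX_RETRIES = 10
BASE_DELAY = 1.0   # seconds (illustrative)
MAX_DELAY = 300.0  # cap individual waits at 5 minutes (illustrative)

def backoff_delay(retry_count: int) -> float:
    # 1s, 2s, 4s, ... capped, plus jitter to avoid synchronized retries.
    delay = min(MAX_DELAY, BASE_DELAY * (2 ** retry_count))
    return delay + random.uniform(0, delay * 0.1)

def process_pending(op: dict, execute) -> str:
    """Attempt a queued operation; returns 'done', 'retry', or 'failed'."""
    try:
        execute(op)
        return "done"
    except ConnectionError:  # transient error
        op["retry_count"] += 1
        if op["retry_count"] > MAX_RETRIES:
            return "failed"  # surface to the user for manual action
        op["next_attempt_delay"] = backoff_delay(op["retry_count"])
        return "retry"

op = {"file_id": "f1", "retry_count": 0}
def flaky(_):
    raise ConnectionError("network hiccup")
print(process_pending(op, flaky))  # -> retry
```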

Optimistic UI Updates:

The client provides optimistic UI updates for pending operations. For example, when the user deletes a file while offline, it immediately disappears from the UI, even though the server deletion is queued. This provides a responsive user experience.

If the operation later fails (e.g., due to a conflict or permission change), the client rolls back the UI update and notifies the user.

Offline Conflict Example:

Consider this scenario:

  • User modifies document.txt offline on their laptop.
  • Simultaneously, they modify the same file on their desktop (which is online).
  • The desktop’s change syncs to the server, incrementing the version to 11.
  • The laptop reconnects and attempts to upload its change with expected version 10.
  • The server detects the version mismatch and returns a conflict error.
  • The laptop downloads version 11 (desktop’s change) and saves its local change as “document (Laptop’s conflicted copy).txt”.
  • Both files are retained, and the user can manually merge if needed.

This ensures no data loss while maintaining consistency across devices.

Step 4: Wrap Up

In this comprehensive design, we’ve architected a production-grade file storage and synchronization system capable of handling hundreds of millions of users and exabytes of data.

Key Technical Achievements:

Efficient Storage: Content-defined chunking with Rabin fingerprinting achieves 50-70% deduplication across users, reducing storage requirements from 10 exabytes to 5 exabytes. Combined with cross-user deduplication and intelligent chunk sizing, this minimizes both storage costs and bandwidth usage.

Fast Synchronization: Sub-2-second P99 sync latency is achieved through WebSocket-based real-time notifications, delta sync that transfers only changed chunks, parallel chunk transfers to saturate bandwidth, and optimized network protocols. Users experience near-instantaneous synchronization across devices.

Conflict Resolution: Robust conflict detection using optimistic locking with version numbers ensures strong consistency. Last-Write-Wins with conflict copies preserves all user data while providing deterministic resolution. More sophisticated strategies like three-way merge can be applied to text files for better automatic merging.

Horizontal Scalability: Database sharding by user ID allows horizontal scaling of metadata storage. Stateless microservices (Upload, Metadata, Sync, Notification) scale independently based on load. Block storage leverages cloud object storage (S3, GCS) that scales automatically. Redis caching reduces database load for hot data.

Bandwidth Optimization: Adaptive compression applies compression only when beneficial. Parallel chunk transfers and TCP connection pooling maximize throughput. Resume capability prevents wasted bandwidth on interrupted transfers. Smart sync scheduling prioritizes urgent operations.

Offline Support: Local SQLite database and pending operations queue enable full offline functionality. Delta sync on reconnection efficiently catches up with remote changes. Conflict resolution ensures data consistency despite concurrent offline modifications.

Technology Stack:

  • Storage: Amazon S3 or Google Cloud Storage for block storage, PostgreSQL (sharded by user_id) for metadata, Redis for caching and pub/sub.
  • Compute: Stateless microservices deployed on Kubernetes or similar orchestration platform.
  • Communication: WebSocket for real-time notifications, REST API for file operations, gRPC for internal service communication.
  • Client: Native desktop applications (Electron or native), mobile apps (iOS Swift, Android Kotlin), web application (React or similar).

Scaling Numbers:

  • 500 million total users, 100 million daily active users.
  • 5 exabytes of deduplicated storage (15 exabytes with 3x replication).
  • 35,000 file operations per second at peak.
  • 1.5 PB of metadata storage.
  • 2 PB of data uploaded daily.
  • 99.95% availability with less than 22 minutes downtime per month.
  • 99.999999999% (11 nines) durability through S3’s redundancy.

Bottlenecks and Solutions:

Bottleneck 1: Metadata Service Hotspots

Popular shared folders accessed by millions of users can create hotspots on specific database shards. We mitigate this with read replicas to distribute read load, aggressive caching in Redis with short TTLs for consistency, and CDN caching for public file listings.

Bottleneck 2: Upload Service at Peak Load

During peak hours, millions of concurrent uploads can overwhelm the Upload Service. We horizontally scale upload servers based on load (hundreds of instances during peak), use S3’s multipart upload API for parallelism, implement client-side rate limiting to smooth traffic spikes, and employ autoscaling based on CPU and network metrics.

Bottleneck 3: Notification Service Connection Limits

With 100 million DAU and 3 devices each, we need to maintain 300 million WebSocket connections. Each notification server handles approximately 50,000 connections, requiring 6,000 servers. We store connection state in a Redis cluster (deviceId to serverId mapping), implement graceful failover when servers crash or restart, and use connection draining during deployments to avoid disruption.

Bottleneck 4: Chunk Deduplication at Scale

Checking 500 billion chunks for deduplication requires an efficient index. We use Bloom filters in Redis for quick negative lookups (this chunk definitely doesn’t exist), maintain a distributed chunk index in a Redis cluster sharded by chunk hash, accept eventual consistency for deduplication (duplicate chunks may be stored briefly until cleanup), and implement batch deduplication checks to reduce round trips.
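A toy Bloom filter illustrating the quick negative lookup: a "definitely not stored" answer lets the client skip the distributed chunk index entirely. Sizing here is deliberately small; the production filter would live in Redis (e.g. on a bitmap) and be provisioned for the real chunk count and target false-positive rate.

```python
import hashlib

class BloomFilter:
    def __init__(self, size_bits: int = 1 << 20, num_hashes: int = 5):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8)

    def _positions(self, key: str):
        # Derive k independent bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            h = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(h[:8], "big") % self.size

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means definitely absent; True means "check the chunk index".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))

seen = BloomFilter()
seen.add("chunk-hash-abc")
print(seen.might_contain("chunk-hash-abc"))  # True: consult the real index
print(seen.might_contain("chunk-hash-xyz"))  # False: skip the index lookup
```

Because false positives only cost an extra index lookup (never a wrong answer), the filter can be sized aggressively small relative to the chunk count.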

Further Optimizations:

Machine Learning for Prefetching: Use ML models to predict which files a user will access next and prefetch them. For example, if a user opens a project folder, prefetch recently modified files in that folder. If a user is commuting home, prefetch starred files for offline access.

Intelligent Chunk Sizing: Adapt chunk size based on file type. Large video files use 8MB chunks (fewer chunks reduce metadata overhead). Source code uses 1MB chunks (finer-grained deduplication for frequent small changes).

Edge Caching: Deploy edge servers in major cities to cache popular chunks. Popular files like company handbooks are served from edge locations, reducing S3 egress costs and improving latency for users.

Peer-to-Peer Sync: For users with multiple devices on the same local network, sync directly between devices without going through the cloud. A desktop and laptop on the same LAN can sync at 100 MB/s instead of being limited by the slower internet uplink and downlink speeds.

Predictive Conflict Resolution: Use file type and user context to choose optimal resolution strategies. Binary files like images and videos use Last-Write-Wins. Text files like code and documents use three-way merge. Structured data like JSON uses operational transformation.

Monitoring and Observability:

Key metrics to track system health:

  • Sync Latency: P50, P99, P99.9 time from file modification to notification delivery on other devices.
  • Upload/Download Throughput: Aggregate gigabytes per second across all users.
  • Deduplication Ratio: Percentage of bandwidth and storage saved through deduplication.
  • Conflict Rate: Number of conflicts per million file modifications.
  • Error Rate: Failed uploads, downloads, and API requests per million operations.

Critical alerts:

  • P99 sync latency exceeds 5 seconds: Page on-call engineer.
  • Error rate exceeds 1%: Page on-call engineer.
  • Deduplication ratio drops below 40%: Investigate potential chunking issues.
  • S3 error rate exceeds 0.1%: Check AWS status and consider failover.

Security Considerations:

Encryption at Rest: All chunks stored in S3 are encrypted using AES-256. Each user has a unique encryption key stored in AWS KMS or similar key management service.

Encryption in Transit: All client-server communication uses TLS 1.3. WebSocket connections use WSS (WebSocket Secure) for encrypted real-time notifications.

Zero-Knowledge Encryption: As a premium feature, offer client-side encryption where files are encrypted before upload. The server never sees plaintext, providing maximum privacy. Tradeoff: server-side features like preview and search don’t work.

Access Control: Fine-grained permissions with central enforcement in the Metadata Service. Audit logs record all file access and sharing changes for compliance and security monitoring.

Rate Limiting: Per-user rate limits prevent abuse (e.g., 1000 API requests per minute). Exponential backoff for failed operations prevents client bugs from overloading the system.

Conclusion:

Designing Dropbox requires balancing numerous tradeoffs: consistency versus availability, bandwidth versus latency, deduplication benefits versus computational overhead, and simplicity versus features. Our architecture achieves production-grade reliability through content-defined chunking for efficient storage, delta sync for fast incremental updates, robust conflict resolution for multi-device scenarios, sharded metadata storage for horizontal scalability, and WebSocket notifications for real-time synchronization.

This design scales to billions of files and millions of concurrent users while maintaining sub-second sync latency and 99.95% availability. The key insight is that file synchronization is fundamentally a distributed systems problem requiring careful attention to consistency, concurrency, conflict resolution, and efficient data transfer.

At massive scale, additional optimizations like edge computing, ML-based prefetching, and peer-to-peer sync further enhance performance and reduce costs. However, the core architecture remains sound and incrementally scalable: you can start with a single server and scale horizontally as demand grows, adding components and shards as needed.

The beauty of this design is its modularity: each component (Upload Service, Metadata Service, Sync Service, Notification Service, Block Storage) can be scaled, optimized, and upgraded independently without affecting the others. This separation of concerns enables continuous improvement and adaptation to changing requirements.