Handling Large Blobs
A comprehensive guide to handling large files through presigned URLs, direct uploads, resumable transfers, and CDN distribution...
Managing large files—videos, images, documents, and other binary data—presents unique challenges in distributed systems. While small JSON payloads and form submissions flow naturally through application servers, routing gigabyte-sized files through those same servers creates bottlenecks, wastes resources, and provides a terrible user experience. Modern architectures solve this by separating storage from compute, using blob storage services for the data itself while application servers handle only access control and metadata management. This post explores the patterns that enable efficient, scalable handling of large files without forcing valuable server resources to act as dumb data proxies.
The Fundamental Problem: The naive approach to file handling routes every byte through your application servers. A client uploads a 2GB video by sending it to your API server, which receives the entire file and then forwards it to storage. Downloads work similarly—the storage service sends files to your API server, which forwards them to clients. This proxy pattern works adequately for small files but breaks down catastrophically as file sizes increase.
Consider the resources consumed when proxying a 2GB upload. Your application server must receive 2GB over the network, hold portions of it in memory while processing, then send those same 2GB to storage over another network connection. This ties up server memory, network bandwidth, and CPU cycles for minutes per file. A server that could handle thousands of lightweight API requests per second might struggle with just a handful of concurrent large file uploads. You’re using expensive application infrastructure as a glorified data pipe, adding latency and cost while contributing no value to the transfer.
The problem compounds with downloads. Users requesting large files from blob storage must wait for your servers to fetch files from storage and forward them. Geographic distribution makes this worse—a user in Sydney requesting a file stored in Virginia-based S3 must wait for your Virginia application server to fetch it from nearby storage, then send it across the Pacific. You’re forcing traffic through an inefficient path when cloud providers already offer global infrastructure designed specifically for serving large files efficiently.
Beyond the technical waste, the user experience suffers. Upload failures at 99% completion force users to start over from scratch. No progress visibility exists when uploads bypass your servers. Database state shows files as “pending” while storage might already have the complete file, or the reverse—marking uploads complete when they haven’t actually finished. The simple act of moving bytes has become a distributed systems problem involving eventual consistency, state synchronization, and coordination challenges.
Blob Storage as the Foundation: The first principle of handling large files is storing them separately from your primary database. Databases excel at structured data with complex queries but perform terribly with large binary objects. A 100MB video stored as a database BLOB degrades query performance, explodes backup times, slows replication, and wastes expensive database storage. Object stores like Amazon S3, Google Cloud Storage, or Azure Blob Storage are purpose-built for this workload: unlimited capacity, extreme durability (99.999999999% for S3), automatic replication, and per-object pricing.
As a practical guideline, files exceeding 10MB should nearly always reside in blob storage rather than databases. This threshold depends on your infrastructure, but 10MB is where the pain becomes real. Smaller files—thumbnails, avatars, small documents—can reasonably live in databases as they don’t significantly impact performance. But anything approaching double-digit megabytes belongs in dedicated object storage.
Blob storage solves the storage problem but doesn’t inherently solve the transfer problem. Cloud providers offer the infrastructure, but getting data to and from that infrastructure efficiently requires architectural patterns that bypass your application servers.
Direct Upload Through Presigned URLs: Instead of proxying data through your servers, you grant clients temporary, scoped credentials to interact directly with storage. Your application server’s role shifts from data transfer to access control—validating requests, generating credentials, and stepping aside while clients interact directly with storage.
The mechanism is presigned URLs, which major cloud providers support under various names but with identical concepts. When a client wants to upload a file, your server doesn’t receive the file itself. Instead, it receives a request for upload permission. After validating the user and checking quotas, your server generates a temporary upload URL encoding permission to upload one specific file to one specific location for a limited time, typically 15 minutes to an hour.
Generating presigned URLs happens entirely in your application’s memory with no network call to blob storage required. Your server uses its cloud credentials to create a cryptographic signature that the storage service can verify later. The URL contains several components: the storage endpoint, the object key (path where the file will be stored), expiration time, and a signature created by hashing all these parameters with your secret key. When the client uploads to this URL, the storage service recalculates the hash using its copy of your credentials. If signatures match and the URL hasn’t expired, the upload proceeds.
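To make this concrete, here is a minimal sketch of generating such a URL with boto3, the AWS SDK for Python. The bucket name, key layout, and 15-minute expiry are illustrative assumptions rather than values from any particular system:

import boto3

s3 = boto3.client("s3")

def create_upload_url(user_id: str, filename: str) -> str:
    # The object key is the path where the file will live in the bucket.
    key = f"uploads/{user_id}/{filename}"
    # Signing happens locally with the server's credentials; no request is sent to S3.
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "my-upload-bucket", "Key": key},
        ExpiresIn=900,  # seconds; the URL stops working after 15 minutes
    )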
This signature-based approach enables powerful restrictions. You can encode conditions into the signature that the storage service validates during upload. Content-length-range parameters set minimum and maximum file sizes, preventing someone from uploading 10GB when you expect 10MB. Content-type restrictions ensure that profile picture endpoints only accept images, not videos. These constraints become part of the signature, so URLs generated for 5MB image uploads will reject 500MB video uploads.
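In S3, minimum and maximum size limits are expressed through presigned POST policies, a close sibling of presigned PUT URLs whose conditions the storage service checks at upload time. A sketch with boto3, using assumed limits for a profile-picture endpoint:

import boto3

s3 = boto3.client("s3")

def create_image_upload_form(user_id: str) -> dict:
    # Returns a URL plus form fields the client must POST along with the file.
    return s3.generate_presigned_post(
        Bucket="my-upload-bucket",
        Key=f"avatars/{user_id}.jpg",
        Fields={"Content-Type": "image/jpeg"},
        Conditions=[
            {"Content-Type": "image/jpeg"},                # exact content type required
            ["content-length-range", 1, 5 * 1024 * 1024],  # 1 byte to 5MB
        ],
        ExpiresIn=900,
    )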
From the client’s perspective, uploading is straightforward: issue an HTTP PUT to the presigned URL with the file as the request body. The client uploads directly to storage at full speed without application servers touching the bytes. A Sydney user uploads to Sydney’s regional storage at local network speeds while your Virginia servers handle other requests. Your infrastructure scales independently—upload capacity scales with cloud storage bandwidth, not your application server count.
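Sketched with Python's requests library, the client side looks like this (any HTTP client works the same way; if a Content-Type was baked into the signature, the header sent here must match it):

import requests

def upload_file(presigned_url: str, path: str, content_type: str) -> None:
    with open(path, "rb") as f:
        # Stream the file body straight to blob storage; no application server is involved.
        resp = requests.put(presigned_url, data=f, headers={"Content-Type": content_type})
    resp.raise_for_status()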
Direct Downloads and CDN Distribution: Downloads mirror uploads, either directly from blob storage or through content delivery networks. For direct downloads, you generate signed URLs granting temporary read access to specific files. The mechanics are identical to uploads: your server creates signed URLs, clients download directly from storage, your servers never touch the bytes.
However, direct blob storage downloads miss significant optimization opportunities. CDNs solve geographic latency and reduce origin load through global edge locations that cache content. When you generate a signed CloudFront URL, the first user pulls from origin storage, but subsequent users in that region receive cached copies with single-digit millisecond latency. The difference between 200ms cross-ocean latency and 5ms local cache access multiplied across hundreds of requests creates dramatically better user experiences.
The key distinction is that CDN signatures differ from storage signatures in how they’re validated. Storage signatures (like S3 presigned URLs) are validated by the storage service itself using your cloud credentials. The storage service has your secret key and verifies you generated the signature. CDN signatures (like CloudFront signed URLs) use public-private key cryptography validated by CDN edge servers worldwide. You hold the private key and sign URLs. Edge locations have the corresponding public key and validate signatures locally without calling back to origin servers or storage.
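As an illustration of the CDN side, botocore ships a CloudFrontSigner helper that signs URLs with your private key via the cryptography library; the key ID, key file path, and distribution domain below are placeholders:

from datetime import datetime, timedelta, timezone
from botocore.signers import CloudFrontSigner
from cryptography.hazmat.primitives import hashes, serialization
from cryptography.hazmat.primitives.asymmetric import padding

def rsa_signer(message: bytes) -> bytes:
    # Sign with the private key whose public half is registered with CloudFront.
    with open("cloudfront_private_key.pem", "rb") as f:
        private_key = serialization.load_pem_private_key(f.read(), password=None)
    return private_key.sign(message, padding.PKCS1v15(), hashes.SHA1())

signer = CloudFrontSigner("K2EXAMPLEKEYID", rsa_signer)
signed_url = signer.generate_presigned_url(
    "https://d111111abcdef8.cloudfront.net/videos/movie.mp4",
    date_less_than=datetime.now(timezone.utc) + timedelta(hours=1),
)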
Resumable Uploads for Large Files: Simple direct uploads work well for files under a few hundred megabytes, but larger files require resumability. Consider a 5GB video upload over a 100Mbps connection—that’s roughly seven minutes of transfer time even under ideal conditions. If the connection drops at 99%, forcing users to restart from scratch is unacceptable.
All major cloud providers solve this through chunked upload APIs, though implementation details vary. AWS S3 uses multipart uploads where files are split into parts of 5MB or more (the final part can be smaller), each uploaded to its own presigned URL. Google Cloud Storage uses session-based resumable uploads where you send chunks with Content-Range headers to a single session URI, while Azure block blobs stage individually uploaded blocks that you commit with a final block list. All of these patterns let clients resume from the failed chunk without restarting the entire upload.
The chunked upload flow begins with your server initiating a multipart upload session and generating presigned URLs for each part. For a 5GB file divided into 100MB parts, you’d generate 50 URLs. Clients upload parts in parallel or sequentially, tracking the checksum-like identifier the storage service returns for each part (the ETag, in S3’s case). These identifiers are derived from the uploaded data, letting you verify that what was received matches what was sent and, later, tell the storage service exactly which parts make up the finished file.
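A sketch of the initiation step with boto3; the part size, bucket, and key layout are assumptions, and in practice you would persist the upload ID alongside the file's metadata row:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-upload-bucket"
PART_SIZE = 100 * 1024 * 1024  # 100MB parts (S3 requires at least 5MB, except for the last part)

def start_multipart_upload(key: str, total_size: int):
    upload_id = s3.create_multipart_upload(Bucket=BUCKET, Key=key)["UploadId"]
    num_parts = -(-total_size // PART_SIZE)  # ceiling division
    part_urls = [
        s3.generate_presigned_url(
            "upload_part",
            Params={"Bucket": BUCKET, "Key": key,
                    "UploadId": upload_id, "PartNumber": n},
            ExpiresIn=3600,
        )
        for n in range(1, num_parts + 1)
    ]
    return upload_id, part_urls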
When connections fail mid-upload, clients query the storage API to determine which parts uploaded successfully. The storage service maintains this state using the session identifier (upload ID in S3, resumable upload URL in Google Cloud Storage). If parts 1-60 succeeded but part 61 failed, the client resumes from part 61 without re-uploading successful parts. This is especially critical for mobile users on unreliable connections or massive files taking hours to upload.
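Continuing the sketch above, resuming amounts to asking storage which part numbers it already holds:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-upload-bucket"  # same bucket as the initiation sketch above

def completed_part_numbers(key: str, upload_id: str) -> set:
    # list_parts reports every part the storage service has accepted so far;
    # very large uploads paginate via NextPartNumberMarker (omitted for brevity).
    resp = s3.list_parts(Bucket=BUCKET, Key=key, UploadId=upload_id)
    return {part["PartNumber"] for part in resp.get("Parts", [])}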
Progress tracking emerges naturally from chunked uploads. As each part completes, you know the exact percentage. Simple client-side arithmetic provides accurate progress bars without complex server-side state management. After all parts upload, clients must call a completion endpoint with part numbers and checksums. The storage service assembles parts into the final object. Until completion succeeds, you have parts in storage but no accessible file. Incomplete multipart uploads cost money, so lifecycle rules should automatically clean them up after 24-48 hours.
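Completion and cleanup, again sketched with boto3; the parts list is assumed to carry the PartNumber/ETag pairs the client collected from each part-upload response:

import boto3

s3 = boto3.client("s3")
BUCKET = "my-upload-bucket"

def finish_multipart_upload(key: str, upload_id: str, parts: list) -> None:
    # parts: [{"PartNumber": 1, "ETag": "..."}, ...] as returned for each uploaded part.
    s3.complete_multipart_upload(
        Bucket=BUCKET, Key=key, UploadId=upload_id,
        MultipartUpload={"Parts": sorted(parts, key=lambda p: p["PartNumber"])},
    )

# A one-time lifecycle rule that aborts multipart uploads left incomplete for 2 days,
# so abandoned parts stop accruing storage charges.
s3.put_bucket_lifecycle_configuration(
    Bucket=BUCKET,
    LifecycleConfiguration={"Rules": [{
        "ID": "abort-stale-multipart-uploads",
        "Status": "Enabled",
        "Filter": {"Prefix": ""},
        "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 2},
    }]},
)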
State Synchronization Challenges: Moving servers out of the critical path solves the bottleneck but introduces distributed state management complexity. The standard pattern stores file metadata in your database while actual files live in blob storage. This gives you the best of both worlds: fast metadata queries with unlimited file storage. However, keeping these separate systems synchronized becomes tricky.
Consider a file storage system where your metadata table tracks user files with columns for filename, size, content type, storage location, and status. The status column indicates whether files are ready for download or still uploading. With direct uploads, keeping this synchronized requires careful design because your database and blob storage are separate systems updating at different times.
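As a rough sketch of that metadata record (the field names here are assumptions, not a schema taken from this post):

from dataclasses import dataclass
from datetime import datetime
from enum import Enum

class FileStatus(str, Enum):
    PENDING = "pending"      # presigned URL issued, bytes not yet confirmed in storage
    COMPLETED = "completed"  # storage has the full object
    FAILED = "failed"        # upload never finished or was rejected

@dataclass
class FileRecord:
    id: int
    owner_id: int
    filename: str
    size_bytes: int
    content_type: str
    storage_key: str         # the blob-storage path, e.g. "uploads/user-123/movie.mp4"
    status: FileStatus
    created_at: datetime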
The simplest approach is trusting clients to report completion. After uploading to blob storage, clients call your API saying “upload complete,” and you update database status. This creates several problems: race conditions where databases show “completed” before files exist in storage, orphaned files when clients crash after uploading but before notifying you, malicious clients marking uploads complete without actually uploading, and network failures preventing completion notifications from reaching servers.
Most blob storage services solve this through event notifications. When S3 receives a file, it publishes events through messaging services with details about what was uploaded. Events include the object key—the same storage path you stored in your database when generating the presigned URL. This enables finding the exact database row to update. Now the storage service itself confirms what exists, removing clients from the trust equation.
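For example, an AWS Lambda handler wired to S3 ObjectCreated notifications might look like the sketch below; mark_file_available is a hypothetical data-access helper standing in for whatever database layer you use:

from urllib.parse import unquote_plus

def handle_s3_event(event, context):
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        key = unquote_plus(record["s3"]["object"]["key"])  # keys arrive URL-encoded
        size = record["s3"]["object"]["size"]
        # Hypothetical helper: flips the matching metadata row from "pending" to "completed".
        mark_file_available(storage_key=key, size_bytes=size)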
However, events can fail too—network issues, service outages, or processing errors might delay or lose events. Production systems add reconciliation as a safety net. Periodic background jobs check for files stuck in pending status and verify them against storage. Query the database for records in pending status for over a certain threshold, then query the storage service to see if files actually exist. If they do, update status to completed. If not, mark them as failed and potentially notify users.
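A sketch of such a job; find_pending_files_older_than, mark_file_available, and mark_file_failed are hypothetical helpers over your metadata store:

from datetime import datetime, timedelta, timezone

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client("s3")

def reconcile_pending_uploads() -> None:
    cutoff = datetime.now(timezone.utc) - timedelta(hours=1)
    for record in find_pending_files_older_than(cutoff):  # hypothetical metadata query
        try:
            s3.head_object(Bucket=record.bucket, Key=record.storage_key)
        except ClientError as err:
            if err.response["Error"]["Code"] == "404":
                mark_file_failed(storage_key=record.storage_key)  # file never arrived
                continue
            raise
        mark_file_available(storage_key=record.storage_key)       # the event was missed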
With events as your primary update mechanism and reconciliation catching stragglers, you maintain consistency without sacrificing the performance benefits of direct uploads. The small delay in status updates is a reasonable trade-off for not proxying gigabytes through servers.
Preventing Abuse and Ensuring Security: Presigned URLs grant powerful capabilities, making abuse prevention critical. The most effective protection is quarantine processing. Uploads go into a quarantine bucket first. Run virus scans, content validation, and any other checks before moving files to the public bucket. This prevents malicious content from being immediately accessible even if someone uploads it.
Implement automatic content analysis: image recognition to detect inappropriate content, file type validation ensuring “photos” aren’t executables, size checks preventing storage bombs. Only after checks pass do you move files to final locations and update database status to “available.” This approach is far more robust than real-time abuse detection. Even if someone bypasses rate limiting and uploads malicious content, they cannot use it until your systems approve it.
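One way to express the promotion step, assuming a scan_object hook into your scanner and hypothetical metadata helpers:

import boto3

s3 = boto3.client("s3")
QUARANTINE_BUCKET = "uploads-quarantine"
PUBLIC_BUCKET = "uploads-public"

def promote_from_quarantine(key: str) -> None:
    # scan_object is a hypothetical wrapper around virus scanning / content analysis.
    if not scan_object(bucket=QUARANTINE_BUCKET, key=key):
        s3.delete_object(Bucket=QUARANTINE_BUCKET, Key=key)
        mark_file_rejected(storage_key=key)  # hypothetical metadata update
        return
    # Server-side copy into the public bucket (objects over 5GB need a multipart copy),
    # then remove the quarantined original.
    s3.copy_object(
        Bucket=PUBLIC_BUCKET,
        Key=key,
        CopySource={"Bucket": QUARANTINE_BUCKET, "Key": key},
    )
    s3.delete_object(Bucket=QUARANTINE_BUCKET, Key=key)
    mark_file_available(storage_key=key)     # hypothetical metadata update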
Processing delay naturally throttles abuse—attackers cannot immediately see if uploads worked, making automation harder. Always include file size limits in presigned URL conditions. Without these, someone could upload terabytes on URLs meant for small images, exploding storage costs. Set minimum and maximum sizes based on expected use cases.
Consider rate limiting at multiple levels: per-user daily upload limits, per-endpoint request limits, and total storage quotas. These limits should apply when generating presigned URLs, not just during the actual upload. This prevents attackers from obtaining thousands of valid URLs and using them later to bypass rate limits.
Optimizing Downloads: Direct downloads from blob storage work, but as noted earlier, CDNs close the geography gap by caching content at edge locations worldwide: the first user in a region pulls from origin storage, and every subsequent user there receives a cached copy with single-digit-millisecond latency instead of a cross-ocean round trip.
For large files, CDNs alone don’t fully solve the problem. A 5GB file still takes minutes to download, and connection breaks force restarts. HTTP range requests enable resumable downloads by requesting specific byte ranges; for example, just the first 10MB:
GET /large-file.zip HTTP/1.1
Range: bytes=0-10485759
With range support, clients track completed ranges and request only the missing pieces after reconnection. Modern browsers and download managers handle this automatically if storage and CDN support range requests, which they universally do. Ensure signed URLs don’t restrict the HTTP verbs or headers needed for range requests.
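A minimal resumable-download sketch in Python using the requests library; it appends to whatever partial file already exists on disk:

import os
import requests

def resumable_download(signed_url: str, dest_path: str) -> None:
    # Resume from however many bytes are already on disk.
    offset = os.path.getsize(dest_path) if os.path.exists(dest_path) else 0
    headers = {"Range": f"bytes={offset}-"} if offset else {}
    with requests.get(signed_url, headers=headers, stream=True) as resp:
        resp.raise_for_status()  # expect 206 Partial Content when resuming
        with open(dest_path, "ab") as out:
            for chunk in resp.iter_content(chunk_size=1024 * 1024):
                out.write(chunk)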
For extreme cases like distributing massive datasets or game assets, parallel chunk downloads can help. Split files into parts, download 4-6 chunks simultaneously, then reassemble client-side. This can triple or quadruple speeds by working around per-connection throttling. However, this complexity is rarely worth it since most users are limited by total bandwidth, not per-connection limits.
The pragmatic approach is serving everything through CDNs with appropriate cache headers, ensuring range requests work for large files, and letting CDNs and browsers handle optimization. Only consider exotic approaches like parallel downloads if distributing multi-gigabyte files and users complain about speeds despite good connectivity.
Cloud Provider Variations: Each cloud provider implements these patterns with different terminology and APIs. AWS uses presigned URLs for uploads and multipart upload APIs. Google Cloud Storage uses signed URLs and resumable uploads. Azure uses Shared Access Signature (SAS) tokens and block blobs. Despite naming differences, the underlying concepts are identical: cryptographically signed temporary credentials enabling direct client-to-storage communication without routing through application servers.
For CDN distribution, AWS offers CloudFront with signed URLs and cookies. Google Cloud CDN uses signed URLs. Azure CDN integrates with SAS tokens. Event notifications vary too: S3 events route to Lambda, SNS, or SQS; Google Cloud Storage uses Pub/Sub to Cloud Functions; Azure uses Event Grid. Understanding these provider-specific implementations helps during interviews when discussing concrete architectures, but the patterns are universal.
When to Apply These Patterns: The decision point is simple: files exceeding 10MB should trigger consideration of these patterns. The exact threshold depends on infrastructure, but 10MB is where traditional proxying becomes painful. Anything smaller—JSON payloads, form submissions, small images—should use normal API endpoints. The two-step dance of generating presigned URLs adds latency and complexity with no real benefit for small files.
However, there are scenarios where these patterns don’t apply even for large files. Synchronous validation requirements, where you must reject invalid data before accepting an upload, necessitate proxying; CSV imports that validate headers and data types before confirming, for example, need to see the bytes as they flow. Compliance and data-inspection requirements, such as scanning for credit card numbers or enforcing HIPAA, require certified systems to see all data. And when user experience demands immediate feedback based on file contents—like profile photos appearing instantly with face detection—the asynchronous nature of direct uploads breaks the expected experience.
For the vast majority of large file scenarios—video uploads, photo sharing, file sync, media in messaging—direct upload and download patterns dramatically improve performance, reduce server load, and provide better user experiences. Recognize when you’re in the majority case and confidently apply these patterns rather than defaulting to proxying simply because it’s familiar.
Handling large files efficiently requires understanding that your application servers should orchestrate access rather than proxy data. Generate presigned URLs granting temporary, scoped permissions for clients to interact directly with blob storage. Implement chunked uploads with resumability for large files. Use event notifications and reconciliation to keep metadata synchronized with storage state. Distribute downloads through CDNs for optimal global performance. Apply these patterns thoughtfully based on file size, validation requirements, and compliance needs. Master these approaches and you’ll architect systems that gracefully handle everything from profile photos to multi-gigabyte video uploads without your servers becoming expensive bottlenecks in the data path.