Design Coinbase
Coinbase is a cryptocurrency exchange platform that allows users to buy, sell, and store digital currencies such as Bitcoin, Ethereum, and hundreds of other digital assets. It provides wallet management, real-time trading, order matching, and secure custody services for digital assets.
Designing Coinbase presents unique challenges including high-frequency order matching, real-time price feeds, secure wallet management with hot/cold storage architectures, blockchain transaction processing, regulatory compliance (KYC/AML), and handling extreme trading volumes during market volatility.
Step 1: Understand the Problem and Establish Design Scope
Before diving into the design, it’s crucial to define the functional and non-functional requirements. For a cryptocurrency exchange, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.
Functional Requirements
Core Requirements:
- Users should be able to create accounts and complete KYC (Know Your Customer) verification.
- Users should be able to deposit fiat currency (USD, EUR) and withdraw to bank accounts.
- Users should be able to deposit cryptocurrency from external wallets and withdraw to external addresses.
- Users should be able to place market orders (buy/sell at current market price) and limit orders (buy/sell at specific price).
- Users should be able to view their portfolio balances, transaction history, and real-time price charts.
- The system should match buy and sell orders efficiently and execute trades.
Below the Line (Out of Scope):
- Users should be able to stake cryptocurrencies for rewards.
- Users should be able to participate in cryptocurrency lending/borrowing.
- Users should be able to trade derivatives (futures, options).
- Users should be able to set up recurring purchases.
- Users should be able to use advanced order types such as stop-loss and trailing-stop orders.
Non-Functional Requirements
Core Requirements:
- The system should ensure strong consistency for account balances and order matching to prevent double-spending and race conditions.
- The system should provide low-latency order execution (< 100ms p99) for competitive trading.
- The system should be highly available (99.99% uptime) as downtime during volatile markets can cause significant customer losses.
- The system must be secure with hardware security modules (HSM) for key management and multi-signature wallets for cold storage.
- The system must comply with financial regulations including KYC/AML (Anti-Money Laundering) requirements.
Below the Line (Out of Scope):
- The system should scale to handle 100k+ orders per second during peak volatility.
- The system should support multi-region deployment with disaster recovery.
- The system should provide comprehensive audit logs for regulatory compliance.
- The system should have sophisticated fraud detection and prevention mechanisms.
Clarification Questions & Assumptions:
- Scale: 10 million users, with approximately 500k daily active traders.
- Order Volume: Peak of 50k orders per second during high volatility.
- Cryptocurrencies: Support 100+ different cryptocurrencies initially.
- Trading Pairs: Support fiat-to-crypto and crypto-to-crypto pairs.
- Payment Processing: Integration with banks for ACH transfers, wire transfers, and card payments.
- Blockchain Integration: Direct integration with blockchain networks for deposits/withdrawals.
Step 2: Propose High-Level Design and Get Buy-in
Planning the Approach
Before moving on to designing the system, it’s important to plan your strategy. For a cryptocurrency exchange, we’ll organize our design around core financial primitives: accounts and wallets, order placement and matching, transaction settlement, and security/compliance.
Defining the Core Entities
To satisfy our key functional requirements, we’ll need the following entities:
User: Any registered user on the platform. Includes personal information (name, email, phone), KYC verification status (pending, verified, rejected), account creation timestamp, and security settings (2FA enabled, password hash).
Account: A financial account for a user holding fiat currency. Each user has multiple accounts (USD account, EUR account, etc.). Contains account balance, available balance (total minus pending orders), and transaction history.
Wallet: A cryptocurrency wallet for a user. Each user has a wallet per cryptocurrency (BTC wallet, ETH wallet, etc.). Contains public address, encrypted private key reference (for hot wallets), balance, and transaction history.
Order: A buy or sell order placed by a user. Includes order type (market or limit), side (buy or sell), trading pair (e.g., BTC/USD), quantity, price (for limit orders), status (pending, partial, filled, cancelled), and timestamps.
Trade: A completed trade between two orders. Records both buyer and seller order IDs, executed price, quantity, trading pair, fee charged, and execution timestamp.
Transaction: A blockchain transaction for cryptocurrency deposits or withdrawals. Includes transaction hash, blockchain network, confirmation count, status (pending, confirmed, failed), amount, and addresses (from/to).
Price: Real-time price data for trading pairs. Includes bid price (highest buy order), ask price (lowest sell order), last traded price, 24h volume, and price change percentage.
API Design
Create Account Endpoint: Used by new users to register on the platform.
POST /users/register -> User
Body: email, password, fullName, dateOfBirth
Submit KYC Endpoint: Used to submit identity verification documents.
POST /kyc/submit -> KYCStatus
Body: documentType, documentImage, selfieImage, address
Deposit Fiat Endpoint: Initiates a fiat currency deposit via bank transfer.
POST /accounts/{currency}/deposit -> DepositInstructions
Body: amount, method (ach, wire, card)
Place Order Endpoint: Places a buy or sell order for a trading pair.
POST /orders -> Order
Body: tradingPair, side, type, quantity, price (optional)
Withdraw Crypto Endpoint: Withdraws cryptocurrency to an external wallet address.
POST /wallets/{crypto}/withdraw -> Transaction
Body: toAddress, amount, memo (optional)
Get Portfolio Endpoint: Retrieves user’s complete portfolio with all balances.
GET /portfolio -> Portfolio
Response: accounts, wallets, totalValue
WebSocket Price Feed: Real-time streaming of price updates for trading pairs.
WS /prices/stream
Subscribe: tradingPairs
Message: pair, bid, ask, lastPrice, volume, timestamp
High-Level Architecture
Let’s build up the system sequentially, addressing each functional requirement:
1. Users should be able to create accounts and complete KYC verification
The core components necessary to fulfill user registration and KYC are:
- Web/Mobile Client: User-facing applications for iOS, Android, and web browsers. Interfaces with backend services via REST APIs.
- API Gateway: Entry point for all client requests. Handles cross-cutting concerns like authentication (JWT tokens), rate limiting (to prevent abuse), and request routing.
- User Service: Manages user accounts, authentication, and authorization. Stores user credentials (hashed passwords), profile information, and account settings. Integrates with 2FA providers.
- KYC Service: Handles identity verification workflow. Integrates with third-party KYC providers (e.g., Jumio, Onfido) for document verification, facial recognition, and address validation. Updates user verification status and stores compliance records.
- User Database: Stores user entities with profile data and KYC status. Uses PostgreSQL for ACID transactions and complex queries.
- Document Storage: Stores KYC documents securely. Uses S3 with encryption at rest and strict access controls.
KYC Flow:
- User creates account via POST to the register endpoint. User Service creates user record with “unverified” status.
- User submits identity documents via POST to the KYC submit endpoint.
- KYC Service uploads documents to Document Storage and calls third-party KYC provider API.
- KYC provider performs verification (typically takes minutes to hours) and sends webhook callback.
- KYC Service updates user status to “verified” and enables trading functionality.
- Notification Service sends email confirmation to user.
2. Users should be able to deposit fiat currency and withdraw to bank accounts
We need to handle traditional banking integration:
- Account Service: Manages fiat currency accounts for users. Handles deposits, withdrawals, and balance tracking. Ensures atomic updates to prevent double-spending.
- Payment Service: Integrates with banking partners and payment processors (e.g., Plaid, Stripe). Handles ACH transfers, wire transfers, and card payments. Manages payment reconciliation.
- Account Database: Stores account balances and transaction ledger. Uses PostgreSQL with strict ACID guarantees for financial accuracy.
Fiat Deposit Flow:
- User initiates deposit via POST to the accounts deposit endpoint with amount and method.
- Payment Service generates unique deposit instructions (account number, reference code).
- User transfers money from their bank to Coinbase’s bank account with reference code.
- Banking partner sends webhook when funds arrive (typically 1-5 business days).
- Payment Service verifies reference code and calls Account Service to credit user’s account.
- Account Service updates user’s USD balance atomically in database.
- Notification Service sends email/push notification confirming deposit.
3. Users should be able to deposit cryptocurrency and withdraw to external addresses
We add blockchain integration components:
- Wallet Service: Manages cryptocurrency wallets for users. Generates wallet addresses, tracks balances, and coordinates deposits/withdrawals.
- Blockchain Node Service: Runs full nodes for each supported blockchain (Bitcoin Core, Geth for Ethereum, etc.). Monitors blockchain for incoming transactions and broadcasts outgoing transactions.
- Transaction Monitor: Watches blockchain for deposits to user wallet addresses. Confirms transactions based on required confirmation count (e.g., 6 confirmations for Bitcoin).
- Wallet Database: Stores wallet addresses, balances, and transaction history. Uses PostgreSQL for consistency.
Crypto Deposit Flow:
- User navigates to deposit page and requests deposit address via GET to the wallets deposit-address endpoint.
- Wallet Service generates or retrieves existing BTC address for user and returns it.
- User sends BTC from external wallet to provided address.
- Transaction Monitor detects incoming transaction on blockchain and creates Transaction record with “pending” status.
- As blockchain confirmations accumulate, Transaction Monitor updates confirmation count.
- After required confirmations (e.g., 6 for BTC), Transaction Monitor calls Wallet Service to credit user’s wallet.
- Wallet Service updates user’s BTC balance atomically.
4. Users should be able to place market and limit orders
We introduce the trading engine:
- Order Service: Receives and validates orders from users. Checks available balance, enforces trading limits, and publishes orders to matching engine.
- Order Matching Engine: Core component that matches buy and sell orders based on price-time priority. Maintains order books for each trading pair in memory for high performance.
- Order Database: Persists orders and their status. Uses PostgreSQL for durability.
- Order Book Cache: In-memory representation of order books using Redis Sorted Sets. Enables sub-millisecond order matching.
Order Placement Flow:
- User places limit buy order for 0.1 BTC at $50,000 via POST to the orders endpoint.
- API Gateway authenticates user and forwards to Order Service.
- Order Service validates order (checks user has sufficient USD balance: 0.1 * $50,000 = $5,000).
- Order Service reserves $5,000 from available balance (available = total - reserved).
- Order Service persists order to Order Database and publishes to matching engine via Kafka.
- Matching Engine receives order and attempts to match against existing sell orders in order book.
- If match found, Trade is executed and both parties’ balances are updated. If no match, order is added to order book.
5. Users should be able to view portfolio and real-time price charts
We add market data components:
- Market Data Service: Aggregates and distributes real-time price data, order book snapshots, and trade history. Computes derived metrics like 24h volume and price changes.
- Price Feed Service: Manages WebSocket connections for real-time price streaming to clients. Subscribes to internal trade events and pushes updates to connected clients.
- Time-Series Database: Stores historical price data for charting. Uses InfluxDB or TimescaleDB optimized for time-series queries.
- Portfolio Service: Calculates user’s portfolio value by aggregating all account and wallet balances and multiplying by current prices.
Real-Time Price Feed Flow:
- Client establishes WebSocket connection to the prices stream endpoint and subscribes to BTC/USD.
- Price Feed Service adds client to subscription list for BTC/USD.
- When trades execute on BTC/USD pair, Matching Engine publishes trade event to Kafka.
- Market Data Service consumes trade event, updates latest price, and publishes price update.
- Price Feed Service receives price update and broadcasts to all subscribed WebSocket clients.
- Client receives price update and updates UI in real-time.
6. The system should match orders and execute trades efficiently
The matching engine is the heart of the exchange:
- Matching Engine: High-performance component that matches orders using price-time priority algorithm. Maintains separate order books for each trading pair. Processes orders sequentially to ensure consistency.
- Settlement Service: Handles trade settlement by updating buyer and seller account/wallet balances atomically. Calculates trading fees and credits Coinbase’s fee account.
- Trade Database: Records all executed trades for audit trail and reporting. Uses PostgreSQL for ACID guarantees.
Trade Execution Flow:
- Buy order (0.1 BTC at $50,000) enters matching engine.
- Matching engine checks sell side of BTC/USD order book for matching sell orders at $50,000 or lower.
- Matching engine finds sell order (0.1 BTC at $49,900) and creates Trade record.
- Trade is published to Settlement Service via Kafka.
- Settlement Service updates buyer and seller balances atomically in a database transaction: Buyer receives 0.1 BTC minus 0.2% fee and pays $4,990 USD, while Seller receives $4,990 USD minus 0.2% fee and loses 0.1 BTC.
- Settlement Service marks both orders as “filled” in Order Database.
- Notification Service sends trade confirmation emails/push notifications.
Step 3: Design Deep Dive
With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that separate good designs from great ones.
Deep Dive 1: How do we design the order matching engine for low-latency and high throughput?
The matching engine is the most critical component of any exchange. It must handle thousands of orders per second with minimal latency while ensuring correct order matching.
Problem:
Traditional database-backed systems are too slow for order matching. Even with optimized indexes, querying and updating the database for each order would introduce unacceptable latency (tens of milliseconds).
Solution: In-Memory Order Book with Price-Time Priority
We maintain order books entirely in memory using efficient data structures. We can use Redis Sorted Sets or custom implementations in C++/Rust for maximum performance. Each trading pair has a separate order book with buy side and sell side.
The data structure works as follows. For the buy side (bids), we use a sorted set ordered by price (descending) and time (ascending). The highest price has highest priority, and for the same price, earlier orders execute first (FIFO). For the sell side (asks), we use a sorted set ordered by price (ascending) and time (ascending). The lowest price has highest priority, and for the same price, earlier orders execute first (FIFO).
Redis Implementation:
When adding a buy order to the order book, we use ZADD with a score of (MAX_PRICE - price) * TIME_MULTIPLIER + sequence, where sequence is a monotonically increasing counter kept smaller than the multiplier. This ensures higher prices come first while preserving FIFO within a price level. For sell orders, the score is price * TIME_MULTIPLIER + sequence. Note that a raw Unix timestamp does not work as the time component, since it can outweigh small price differences, and Redis scores are 64-bit doubles with only 53 bits of integer precision, so the multiplier and counter ranges must be chosen to fit. To get the best bid (highest buy price), we use ZRANGE on the bids sorted set to get the first element. To get the best ask (lowest sell price), we use ZRANGE on the asks sorted set.
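A minimal sketch of this scoring scheme in Python. The multiplier and the use of a per-engine sequence number (rather than a raw timestamp) are assumptions for illustration; Redis sorted-set scores are doubles with 53 bits of integer precision, so the ranges must fit:

```python
# Sketch of the bid/ask scoring described above (assumed constants, not
# Coinbase's actual implementation). Lower score = higher priority, so
# ZRANGE index 0 always returns the best order.

MAX_PRICE = 1_000_000          # assumed cap on price, in whole dollars
TIME_SLOTS = 1_000_000         # sequence numbers available per price level

def bid_score(price: int, seq: int) -> int:
    """Higher bid prices sort first; earlier seq wins ties (FIFO)."""
    return (MAX_PRICE - price) * TIME_SLOTS + seq

def ask_score(price: int, seq: int) -> int:
    """Lower ask prices sort first; earlier seq wins ties (FIFO)."""
    return price * TIME_SLOTS + seq

# Best bid is the entry with the lowest score (what ZRANGE bids 0 0 returns):
bids = [(50_000, 2), (50_100, 5), (50_100, 3)]
best_bid = min(bids, key=lambda b: bid_score(*b))
# 50_100 outranks 50_000; between the two 50_100 bids, seq 3 arrived first.
```

In Redis itself, `ZADD bids <score> <order_id>` with these scores gives the same ordering, and `ZRANGE bids 0 0` retrieves the best bid.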
Matching Algorithm:
The order processing function first determines if the order is a market or limit order. For market orders, we execute immediately at the best available price. For limit orders, we use a more complex matching process.
For limit order execution, we loop while the order still has remaining quantity:
- Get the best order from the opposite side of the order book. If no matching orders exist, add the order to the book and exit.
- Check whether the prices cross: for a buy order, the buy price must be greater than or equal to the best sell price. If prices don't cross, add the order to the book and exit.
- If prices cross, execute a trade between the two orders. If the incoming order is completely filled, stop.
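The matching loop described above can be sketched as a simplified in-memory model. `Order` and `OrderBook` here are illustrative types, not the production engine; partial fills execute at the resting order's price:

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    side: str        # "buy" or "sell"
    price: float
    qty: float

@dataclass
class OrderBook:
    bids: list = field(default_factory=list)   # kept sorted best-first
    asks: list = field(default_factory=list)

    def match_limit(self, order: Order) -> list:
        """Price-time priority matching for one limit order (sketch)."""
        book = self.asks if order.side == "buy" else self.bids
        trades = []
        while order.qty > 0 and book:
            best = book[0]
            crosses = (order.price >= best.price) if order.side == "buy" \
                      else (order.price <= best.price)
            if not crosses:
                break
            fill = min(order.qty, best.qty)
            trades.append((best.price, fill))    # execute at resting price
            order.qty -= fill
            best.qty -= fill
            if best.qty == 0:
                book.pop(0)                      # fully consumed
        if order.qty > 0:                        # remainder rests on the book
            rest = self.bids if order.side == "buy" else self.asks
            rest.append(order)                   # stable sort keeps FIFO per price
            rest.sort(key=lambda o: -o.price if order.side == "buy" else o.price)
        return trades

book = OrderBook(asks=[Order("sell", 49_900, 0.1)])
trades = book.match_limit(Order("buy", 50_000, 0.1))   # crosses the spread
```

Because Python's sort is stable, orders appended later stay behind earlier orders at the same price, preserving time priority.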
High Throughput Optimizations:
We use several strategies to maximize throughput. First, each trading pair has a dedicated single-threaded matching engine to avoid lock contention. This provides deterministic ordering and eliminates race conditions. Second, we process orders in small batches (e.g., 100 orders) to improve throughput while maintaining low latency. Third, we use event sourcing where all order events are written to Kafka for durability. The matching engine can be reconstructed from the event log if it crashes.
This approach provides excellent performance characteristics with order matching latency under 1ms at p99, throughput of 50k+ orders per second per trading pair, and order book snapshot generation in under 5ms.
Deep Dive 2: How do we ensure strong consistency for account balances and prevent double-spending?
Financial systems require absolute accuracy. A single bug that allows double-spending or incorrect balances can lead to catastrophic financial losses.
Problem:
Multiple concurrent operations (order placement, trade settlement, deposits, withdrawals) can access the same account balance, leading to race conditions and inconsistent state.
Solution: Transactional Database with Pessimistic Locking
We choose PostgreSQL as our database because it provides ACID transactions, pessimistic locking, and serializable isolation level needed for financial accuracy.
The account balance schema includes a table for accounts with columns for user_id, currency, balance (must be non-negative), available_balance (must be non-negative), reserved_balance (must be non-negative), version number, and updated timestamp. We enforce a constraint that balance equals available_balance plus reserved_balance.
We also maintain a ledger_entries table that records every balance change. Each entry includes an ID, user_id, currency, amount (can be positive or negative), type (deposit, withdrawal, trade_buy, trade_sell, fee), reference_id (links to order or transaction), balance_after, and created timestamp.
Double-Entry Accounting:
Every transaction affects at least two accounts. For example, when User A buys 1 BTC for $50,000 from User B, we have multiple ledger entries: User A’s USD account is debited $50,000, User B’s USD account is credited $50,000, User B’s BTC wallet is debited 1 BTC, User A’s BTC wallet is credited 1 BTC, User A pays a $100 fee (debit), and Coinbase’s fee account receives $100 (credit).
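The double-entry invariant, that every currency's entries net to zero, can be checked mechanically. A sketch using the example trade's entries (account names and the tuple format are illustrative):

```python
from collections import defaultdict

# Ledger entries for the example trade above: positive = credit,
# negative = debit. Amounts match the $50,000 trade with a $100 fee.
entries = [
    ("user_a",        "USD", -50_000, "trade_buy"),
    ("user_b",        "USD", +50_000, "trade_sell"),
    ("user_b",        "BTC", -1.0,    "trade_sell"),
    ("user_a",        "BTC", +1.0,    "trade_buy"),
    ("user_a",        "USD", -100,    "fee"),
    ("coinbase_fees", "USD", +100,    "fee"),
]

def verify_balanced(entries) -> bool:
    """Double-entry invariant: entries for each currency sum to zero."""
    totals = defaultdict(float)
    for _account, currency, amount, _type in entries:
        totals[currency] += amount
    return all(abs(total) < 1e-9 for total in totals.values())
```

A reconciliation job can run this check over each trade's entries and over the whole ledger; any nonzero total indicates a bug or tampering.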
Optimistic Locking with Version Numbers:
To reserve balance for an order, we use an UPDATE statement that modifies available_balance and reserved_balance only if the version number matches the expected value. The query decreases available_balance by the order amount, increases reserved_balance by the same amount, increments the version number, and updates the timestamp. The WHERE clause checks that the user_id and currency match, that available_balance is sufficient, and that the version matches the expected version. If no rows are updated, either there was insufficient balance or a concurrent modification occurred.
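A minimal sketch of this compare-and-swap update, shown against SQLite for portability (the production system would use PostgreSQL; the columns are abbreviated from the schema above):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE accounts (
    user_id TEXT, currency TEXT,
    available_balance REAL, reserved_balance REAL, version INTEGER)""")
db.execute("INSERT INTO accounts VALUES ('u1', 'USD', 10000, 0, 1)")

def reserve_balance(db, user_id, currency, amount, expected_version) -> bool:
    """Optimistic-locking reservation: succeeds only if the balance is
    sufficient AND no concurrent writer bumped the version."""
    cur = db.execute(
        """UPDATE accounts
           SET available_balance = available_balance - ?,
               reserved_balance  = reserved_balance + ?,
               version = version + 1
           WHERE user_id = ? AND currency = ?
             AND available_balance >= ? AND version = ?""",
        (amount, amount, user_id, currency, amount, expected_version))
    return cur.rowcount == 1   # 0 rows: insufficient funds or stale version

ok = reserve_balance(db, "u1", "USD", 5000, expected_version=1)
stale = reserve_balance(db, "u1", "USD", 5000, expected_version=1)  # version is now 2
```

The caller who gets `False` re-reads the row and retries with the fresh version, or rejects the order if the balance is genuinely insufficient.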
Pessimistic Locking for Trade Settlement:
For trade settlement, we use a transaction with SERIALIZABLE isolation level. We first lock all involved accounts using SELECT FOR UPDATE. This locks both the buyer and seller accounts for both currencies (USD and BTC in this example). Then we perform all balance updates. We also insert all corresponding ledger entries for the double-entry accounting system. Finally, we commit the transaction. All changes either succeed together or fail together, ensuring consistency.
Reconciliation Service:
We run periodic jobs to verify account balances match ledger entries. This service detects and alerts on any discrepancies. We use checksums and cryptographic signatures for an audit trail.
Deep Dive 3: How do we architect hot and cold wallet storage for security?
Cryptocurrency exchanges are prime targets for hackers. Proper wallet architecture is essential to protect user funds.
Problem:
Storing all user funds in hot wallets (connected to internet) exposes them to hacking risks. But storing all funds in cold wallets (offline) makes withdrawals slow and operationally complex.
Solution: Multi-Tiered Hot/Cold Wallet Architecture
We implement three wallet tiers. Hot wallets are online wallets for immediate withdrawals and hold 5-10% of total funds. Warm wallets are semi-online with manual approval for large transfers and hold 20-30% of funds. Cold wallets are offline storage in hardware security modules and hold 60-75% of funds.
Hot Wallet Architecture:
Hot wallets run on secure servers with encrypted private keys. Keys are stored in Hardware Security Modules (HSM) like AWS CloudHSM or Thales Luna. Each cryptocurrency has multiple hot wallet addresses for security diversification. Automated withdrawals are allowed up to certain limits (e.g., $10k per user per day).
For example, a Bitcoin hot wallet setup might include Hot_Wallet_1 with 10 BTC for withdrawals 0-50 BTC per day, Hot_Wallet_2 with 10 BTC as backup, and Hot_Wallet_3 with 5 BTC for smaller transactions. Each wallet has its private key stored in HSM, multi-factor authentication for access, IP whitelist for signing servers, and rate limiting on transactions.
Cold Wallet Architecture:
Cold wallets use multi-signature wallets requiring 3-of-5 signatures to move funds. Private keys are stored on hardware wallets (Ledger, Trezor) in geographically distributed safe deposit boxes. Air-gapped signing is used where transaction details are transferred via QR codes, signed offline, and then broadcast. Quarterly audits with proof of reserves are conducted.
For example, a Bitcoin cold wallet setup uses a 3-of-5 multisig configuration where Key 1 is in a hardware wallet in NYC safe deposit box (Executive 1), Key 2 is in SF (Executive 2), Key 3 is in London (Executive 3), Key 4 is in Singapore (Executive 4), and Key 5 is with an external custodian (Backup). To move funds, any 3 of the 5 key holders must sign the transaction.
Automated Hot Wallet Replenishment:
A monitoring function checks hot wallet balances for each cryptocurrency. For each crypto, if the hot balance falls below a minimum threshold (e.g., 10 BTC), the system calculates the required amount to reach target balance and creates a warm-to-hot transfer request. This request requires manual approval from 2 executives. After approval, the transaction is signed and broadcast, then wallet balances are updated.
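The replenishment check described above might look like the following (the thresholds and request shape are assumptions for illustration; signing and approval workflows are elided):

```python
# Sketch of the hot-wallet replenishment monitor (assumed thresholds,
# not Coinbase's actual values).
MIN_HOT = {"BTC": 10, "ETH": 200}       # trigger replenishment below this
TARGET_HOT = {"BTC": 25, "ETH": 500}    # refill up to this balance

def check_replenishment(hot_balances: dict) -> list:
    """Return warm->hot transfer requests needing 2-executive approval."""
    requests = []
    for crypto, balance in hot_balances.items():
        if balance < MIN_HOT[crypto]:
            requests.append({
                "crypto": crypto,
                "amount": TARGET_HOT[crypto] - balance,  # top up to target
                "source": "warm_wallet",
                "approvals_required": 2,
            })
    return requests
```

A scheduler would run this periodically; each emitted request enters the manual-approval queue before any transaction is signed.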
Withdrawal Flow:
When a user requests withdrawal of 0.5 BTC, the Wallet Service checks withdrawal limits and fraud scores. If the amount is below the daily limit and the risk score is below the threshold, the system automatically signs and broadcasts from hot wallet. If the amount exceeds the daily limit or has high risk, it’s flagged for manual review. The security team investigates and if approved, signs from warm wallet. The transaction is broadcast to the blockchain and the system monitors confirmations, updating the user balance after the threshold is met.
Security Measures:
We implement multiple security layers. Multi-signature requires multiple approvals for all large transfers. Time locks mandate a 24-hour delay for withdrawals above threshold. Address whitelisting allows users to whitelist withdrawal addresses with a 48-hour activation delay. Anomaly detection uses ML models to detect suspicious withdrawal patterns. Insurance protects cold storage funds against theft and loss.
Deep Dive 4: How do we process blockchain transactions and handle confirmations?
Blockchain transactions require monitoring multiple networks, handling different confirmation requirements, and dealing with reorganizations.
Problem:
Each blockchain has different characteristics: confirmation times, finality guarantees, fee mechanisms, and edge cases (orphaned blocks, chain reorganizations).
Solution: Blockchain-Specific Transaction Processors
We run full nodes for each blockchain network to avoid third-party dependencies. We have dedicated transaction processor services per blockchain and use an event-driven architecture with transaction state machines.
Transaction State Machine:
The transaction lifecycle includes several states: INITIATED (user requested withdrawal), PENDING (transaction broadcast to network), CONFIRMING (transaction included in block, accumulating confirmations), CONFIRMED (required confirmations reached), FAILED (transaction failed or rejected), and REORGED (block containing transaction was reorganized out).
State transitions work as follows: INITIATED transitions to PENDING when transaction is signed and broadcast. PENDING transitions to CONFIRMING when transaction is included in first block. CONFIRMING transitions to CONFIRMED when required confirmations are reached. CONFIRMING can transition to REORGED if a chain reorganization is detected. REORGED transitions back to PENDING when the transaction is re-broadcast. PENDING transitions to FAILED if the transaction is rejected by the network.
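The transitions above can be encoded as a small table so that an illegal transition fails loudly instead of silently corrupting state. A sketch:

```python
# The transaction state machine described above, encoded as data.
TRANSITIONS = {
    "INITIATED":  {"PENDING"},
    "PENDING":    {"CONFIRMING", "FAILED"},
    "CONFIRMING": {"CONFIRMED", "REORGED"},
    "REORGED":    {"PENDING"},      # re-broadcast after a reorg
    "CONFIRMED":  set(),            # terminal
    "FAILED":     set(),            # terminal
}

class Transaction:
    def __init__(self):
        self.state = "INITIATED"

    def transition(self, new_state: str) -> None:
        """Move to new_state, rejecting anything not in the table."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
```

Keeping the legal transitions in one table makes the reorg path (CONFIRMING → REORGED → PENDING) explicit and easy to audit.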
Bitcoin Transaction Processing:
We monitor the Bitcoin blockchain for new blocks. For each transaction in a block, we check if it’s a deposit transaction or a withdrawal transaction. For deposits, we identify the destination address and amount, find the user by their deposit address, and create a transaction record with 1 confirmation and CONFIRMING status. We subscribe to confirmation updates for that transaction. As confirmations accumulate, we update the transaction record. Bitcoin requires 6 confirmations for deposits. When confirmations reach 6 and status is CONFIRMING, we change status to CONFIRMED, credit the user’s wallet, and send a notification.
Ethereum Transaction Processing:
Ethereum has different considerations including faster blocks (12 seconds vs 10 minutes), smart contract interactions, gas fees, and uncle blocks. When processing an Ethereum block, we check each transaction. If the transaction is to our deposit contract address, we handle it as an ERC20 deposit. If the transaction is to one of our user deposit addresses, we handle it as an ETH deposit.
For ERC20 deposits, we decode smart contract event logs. We look for Transfer events by checking the event signature. We decode the from address, to address, and amount from the log data. If the to address is one of our user deposit addresses, we process the deposit. Ethereum requires 12 confirmations for deposits.
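Decoding a Transfer log can be sketched as follows. The `topic0` constant is the well-known keccak256 hash of `Transfer(address,address,uint256)`; the log shape follows the Ethereum JSON-RPC format, and the deposit-address set is illustrative:

```python
# keccak256("Transfer(address,address,uint256)") -- the standard ERC20
# Transfer event signature.
TRANSFER_TOPIC = "0xddf252ad1be2c89b69c2b068fc378daa952ba7f163c4a11628f55a4df523b3ef"

def decode_transfer(log: dict):
    """Return (from, to, amount) for an ERC20 Transfer log, else None."""
    if log["topics"][0] != TRANSFER_TOPIC:
        return None
    # Indexed address topics are 32 bytes; the address is the last 20 bytes.
    from_addr = "0x" + log["topics"][1][-40:]
    to_addr   = "0x" + log["topics"][2][-40:]
    amount    = int(log["data"], 16)            # uint256 token amount
    return from_addr, to_addr, amount

def is_user_deposit(log: dict, deposit_addresses: set) -> bool:
    decoded = decode_transfer(log)
    return decoded is not None and decoded[1] in deposit_addresses
```

Real token amounts must also be scaled by the token's `decimals` value before crediting a wallet, which is omitted here.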
Handling Chain Reorganizations:
When a chain reorganization occurs, we receive notifications about orphaned blocks and new blocks. We find transactions that were in the old chain but not in the new chain. For each orphaned transaction, if it was a deposit that we already credited (status CONFIRMED), we reverse the credit and change status to REORGED. We then check if the transaction appears in the new chain. If yes, we update confirmations and change status to CONFIRMING. If no, the transaction may be dropped so we change status to PENDING and alert the security team.
For withdrawal transactions that don’t appear in the new chain, we re-broadcast the transaction to ensure it gets included.
Fee Management:
We implement dynamic fee estimation by querying the blockchain for recommended fees based on network congestion. For Bitcoin, we use Replace-By-Fee (RBF) to increase the fee if a transaction is stuck. For Ethereum, we use EIP-1559 with base fee plus priority fee mechanism for predictable fees.
Optimization: UTXO Consolidation for Bitcoin:
Bitcoin uses a UTXO model where each deposit creates a new UTXO. Over time, hot wallets accumulate many small UTXOs, which increases withdrawal transaction size and fees. During low network activity (when fees are low), we consolidate UTXOs. If we have more than 100 UTXOs and network fees are below threshold, we create a consolidation transaction that sends all UTXOs back to ourselves in a single transaction. This reduces future transaction costs.
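The consolidation policy above can be sketched as a simple check. The thresholds and the vsize estimate are rough assumptions for illustration, not real fee math:

```python
# Sketch of the UTXO consolidation policy (assumed thresholds).
UTXO_COUNT_THRESHOLD = 100
FEE_RATE_THRESHOLD = 5      # sat/vB, assumed "low congestion" cutoff

def maybe_consolidate(utxos: list, fee_rate: int):
    """Combine many small UTXOs into one output when fees are cheap."""
    if len(utxos) <= UTXO_COUNT_THRESHOLD or fee_rate >= FEE_RATE_THRESHOLD:
        return None
    total = sum(u["value"] for u in utxos)   # values in satoshis
    # Rough size estimate: ~148 vB per P2PKH input, ~34 vB for one
    # output, ~10 vB of transaction overhead.
    est_vsize = 148 * len(utxos) + 34 + 10
    fee = est_vsize * fee_rate
    return {"inputs": utxos, "output_value": total - fee, "fee": fee}
```

Spending one large UTXO later costs a single input's worth of fees, instead of paying ~148 vB per small UTXO at whatever the (possibly high) future fee rate is.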
Deep Dive 5: How do we implement KYC/AML compliance and fraud detection?
Regulatory compliance is mandatory for cryptocurrency exchanges. Failure to comply can result in fines, legal action, and loss of banking relationships.
Problem:
We must verify user identities (KYC), detect suspicious activities (AML), and report certain transactions to regulators (SAR - Suspicious Activity Reports).
Solution: Multi-Layered Compliance System
KYC Verification Pipeline:
We implement tiered verification levels. Level 1 (Basic Verification) allows trading up to $1k/day and requires email verification, phone number verification, and basic personal info (name, DOB, address). Level 2 (Full Verification) allows trading up to $50k/day and requires government-issued ID (passport, driver’s license), facial recognition (liveness check), address verification (utility bill), and background check. Level 3 (Enhanced Verification) allows unlimited trading and requires source of funds documentation, employment verification, and enhanced due diligence.
KYC Integration:
When performing KYC, we submit user documents to a third-party KYC provider. We send the document type, document image, selfie image, full name, and date of birth. If the response status is APPROVED, we update user verification to LEVEL_2 and enable trading features. If the status is REQUIRES_REVIEW, we flag for manual review with the reason provided. Otherwise, we reject the verification with the reason.
AML Transaction Monitoring:
We monitor transactions for suspicious patterns. For each transaction, we calculate a risk score and maintain a list of flags. We check for structuring (multiple transactions just below reporting threshold), rapid movement (deposit, trade, immediate withdrawal), high-risk jurisdiction involvement, and unusual patterns. If the risk score exceeds 70, we block the transaction and alert the compliance team. If the risk score is between 50 and 70, we flag for review. We log all AML checks for audit trail.
Structuring detection identifies multiple transactions at 90-99% of the $10k reporting threshold within a 7-day period. If we find 3 or more such transactions, we flag it as structuring.
Rapid movement detection identifies cases where a user deposits funds, trades, and immediately withdraws a similar amount within 24 hours. This pattern suggests money laundering.
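The structuring rule above reduces to a sliding-window count. A sketch, with the threshold and window per the text and the transaction tuple format assumed:

```python
from datetime import datetime, timedelta

REPORT_THRESHOLD = 10_000       # USD reporting threshold
WINDOW = timedelta(days=7)

def detect_structuring(transactions: list) -> bool:
    """Flag 3+ transactions at 90-99% of the threshold within 7 days.
    Each transaction is a (timestamp, amount_usd) tuple (sketch)."""
    suspects = sorted(t for t in transactions
                      if 0.90 * REPORT_THRESHOLD <= t[1] < REPORT_THRESHOLD)
    # Slide a 7-day window anchored at each suspect transaction.
    for i in range(len(suspects)):
        count = sum(1 for ts, _ in suspects[i:]
                    if ts - suspects[i][0] <= WINDOW)
        if count >= 3:
            return True
    return False
```

In production this would run per user over a rolling history, and a hit would feed into the risk score described above rather than blocking outright.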
Travel Rule Compliance:
For transfers exceeding $3,000, exchanges must share sender/recipient information. When processing a withdrawal above this threshold, we check if the destination is another regulated exchange. If so, we share user information including sender name, address, date of birth, account number, amount, and cryptocurrency with the destination exchange.
Sanctions Screening:
We screen users against multiple sanctions lists including OFAC SDN (US Treasury Special Designated Nationals), UN Sanctions, and EU Sanctions. If a user’s name or address matches any list, we block their account, alert the compliance team, and report to regulators.
Deep Dive 6: How do we provide real-time price feeds and handle high-frequency updates?
Real-time price data is crucial for traders to make informed decisions. The system must distribute price updates to thousands of connected clients with minimal latency.
Problem:
Broadcasting every single trade to all connected clients is inefficient. For high-volume trading pairs, thousands of trades per second would overwhelm WebSocket connections.
Solution: Aggregation with WebSocket Broadcasting
The architecture includes three main components. The Matching Engine publishes every trade to Kafka. The Market Data Aggregator consumes trades and computes aggregated metrics. The WebSocket Gateway maintains connections to clients and broadcasts updates.
Market Data Aggregation:
We aggregate trades into 100ms windows. For each window, we record the trading pair, start and end times, first and last prices, high and low prices, total volume, and trade count. This reduces the number of messages while maintaining sufficient granularity for most use cases.
We also compute OHLCV (Open, High, Low, Close, Volume) candles for various intervals (1m, 5m, 15m, 1h, 1d). For each trade, we determine which candle it belongs to based on timestamp and interval. If the candle doesn’t exist yet, we initialize it with the trade’s open price. Otherwise, we update the candle by adjusting high (maximum of current high and trade price), low (minimum of current low and trade price), close (latest trade price), and volume (cumulative).
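A minimal sketch of this candle update, assuming each trade carries a pair, a Unix timestamp, a price, and a quantity (field names are assumptions):

```python
INTERVALS = {"1m": 60, "5m": 300, "15m": 900, "1h": 3600, "1d": 86400}

def update_candle(candles, trade, interval="1m"):
    """Fold one trade into the OHLCV candle for its time bucket."""
    seconds = INTERVALS[interval]
    bucket = trade["timestamp"] // seconds * seconds   # candle start time
    key = (trade["pair"], interval, bucket)
    candle = candles.get(key)
    if candle is None:
        # First trade in this window: it sets the open (and every other field).
        candles[key] = {
            "open": trade["price"], "high": trade["price"],
            "low": trade["price"], "close": trade["price"],
            "volume": trade["quantity"],
        }
    else:
        candle["high"] = max(candle["high"], trade["price"])
        candle["low"] = min(candle["low"], trade["price"])
        candle["close"] = trade["price"]      # latest trade price
        candle["volume"] += trade["quantity"]  # cumulative volume
    return candles[key]
```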
WebSocket Broadcasting:
The Price Feed Service maintains a subscription model. It keeps a map of trading pairs to sets of client IDs, and a map of client IDs to WebSocket connections. When a client subscribes to trading pairs, we add their client ID to the subscription set for each pair. When unsubscribing, we remove them from the sets.
To broadcast a price update, we get all clients subscribed to the trading pair. We create a JSON message with type, pair, and data. For each subscribed client, we send the message through their WebSocket connection. If sending fails (client disconnected), we remove the client from our tracking.
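A sketch of this subscription model and broadcast path. The connection object is assumed to expose a `send()` method; class and field names are illustrative:

```python
import json
from collections import defaultdict

class PriceFeedService:
    """Subscription bookkeeping for the WebSocket gateway (sketch)."""

    def __init__(self):
        self.subscribers = defaultdict(set)  # trading pair -> set of client ids
        self.connections = {}                # client id -> WebSocket connection

    def subscribe(self, client_id, connection, pairs):
        self.connections[client_id] = connection
        for pair in pairs:
            self.subscribers[pair].add(client_id)

    def unsubscribe(self, client_id, pairs):
        for pair in pairs:
            self.subscribers[pair].discard(client_id)

    def broadcast(self, pair, data):
        message = json.dumps({"type": "price_update", "pair": pair, "data": data})
        # Copy the set: we may mutate it while iterating on send failures.
        for client_id in list(self.subscribers[pair]):
            try:
                self.connections[client_id].send(message)
            except Exception:
                # Client disconnected: drop it from our tracking.
                self.subscribers[pair].discard(client_id)
                self.connections.pop(client_id, None)
```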
Redis Pub/Sub for Horizontal Scaling:
Multiple WebSocket servers can scale horizontally using Redis Pub/Sub to broadcast updates across servers. The Market Data Service (publisher) publishes price updates to Redis channels like “price_updates:BTC_USD”. WebSocket Servers (subscribers) subscribe to these channels. When a message arrives, they parse the channel to extract the trading pair, decode the price data, and broadcast to their local clients.
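On the subscriber side, a WebSocket server might handle an incoming Pub/Sub message like this. The message dict matches the shape redis-py delivers from a pubsub listener; `local_broadcast` is an assumed local helper that pushes to this server's own clients:

```python
import json

def handle_pubsub_message(message, local_broadcast):
    """Dispatch one redis-py pub/sub message to locally connected clients.

    `message` is a dict with "type", "channel", and "data" (bytes), as
    redis-py yields them. Returns the trading pair handled, or None.
    """
    if message["type"] != "message":
        return None  # ignore subscribe/unsubscribe confirmations
    channel = message["channel"].decode()
    # Channel convention: "price_updates:<PAIR>", e.g. "price_updates:BTC_USD"
    _, pair = channel.split(":", 1)
    data = json.loads(message["data"])
    local_broadcast(pair, data)
    return pair
```

The publisher side is symmetric: the Market Data Service calls `redis.publish("price_updates:BTC_USD", json.dumps(update))`, and every WebSocket server receives the message regardless of which server its clients are attached to.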
Order Book Snapshots:
Clients need initial order book state when connecting. To provide snapshots, we get the top N bids and asks from Redis (default 20). For bids, we use ZREVRANGE to get highest prices. For asks, we use ZRANGE to get lowest prices. We return a snapshot object with the pair, arrays of bids and asks (each with price and quantity), and timestamp.
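A sketch of snapshot assembly. The Redis key layout here (a sorted set of price levels per side, scored by price, plus a hash mapping each level to its aggregate quantity) is an assumption for illustration:

```python
import time

def order_book_snapshot(r, pair, depth=20):
    """Build a top-N order book snapshot from Redis.

    `r` is a redis-py-style client. Bids come highest-first (ZREVRANGE),
    asks lowest-first (ZRANGE); both return (member, score) pairs where
    the score is the price.
    """
    bids = r.zrevrange(f"book:{pair}:bids", 0, depth - 1, withscores=True)
    asks = r.zrange(f"book:{pair}:asks", 0, depth - 1, withscores=True)

    def levels(side, entries):
        qty_key = f"book:{pair}:{side}:qty"  # hash: price level -> quantity
        return [
            {"price": price, "quantity": float(r.hget(qty_key, member))}
            for member, price in entries
        ]

    return {
        "pair": pair,
        "bids": levels("bids", bids),
        "asks": levels("asks", asks),
        "timestamp": time.time(),
    }
```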
The WebSocket flow works as follows: Client connects and subscribes to a trading pair. Server sends initial order book snapshot. Server streams incremental updates (trades, order changes). Client maintains local order book by applying updates.
Rate Limiting for Price Feeds:
To prevent abuse, we limit connections per user (maximum 5 WebSocket connections) and subscriptions per connection (maximum 50). When handling a new connection, we check the user’s current connection count. If at or over the limit, we reject with “Too many connections”. When handling a subscription request, we check current subscriptions for that connection. If adding the new subscriptions would exceed the limit, we send an error “Subscription limit exceeded”.
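The limit checks can be sketched as simple counters. The limits come from the text; class and field names are illustrative, and persistence and concurrency are omitted:

```python
MAX_CONNECTIONS_PER_USER = 5
MAX_SUBSCRIPTIONS_PER_CONNECTION = 50

class FeedLimiter:
    """In-memory rate limits for price feed connections (sketch)."""

    def __init__(self):
        self.connections_per_user = {}  # user_id -> open connection count
        self.subs_per_connection = {}   # connection_id -> subscription count

    def allow_connection(self, user_id):
        count = self.connections_per_user.get(user_id, 0)
        if count >= MAX_CONNECTIONS_PER_USER:
            return False, "Too many connections"
        self.connections_per_user[user_id] = count + 1
        return True, None

    def allow_subscriptions(self, connection_id, new_pairs):
        current = self.subs_per_connection.get(connection_id, 0)
        if current + len(new_pairs) > MAX_SUBSCRIPTIONS_PER_CONNECTION:
            return False, "Subscription limit exceeded"
        self.subs_per_connection[connection_id] = current + len(new_pairs)
        return True, None
```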
Deep Dive 7: How do we scale the system to handle extreme trading volume during market volatility?
During major market events (e.g., Bitcoin price surges), trading volume can spike 10-100x normal levels.
Problem:
Normal load is 5k orders/second. Peak load during volatility is 50k orders/second. The system must scale elastically without degrading performance.
Solution: Multi-Layer Scalability Strategy
Horizontal Scaling of Services:
All services are stateless and can scale horizontally. Order Service scales from 10 instances (normal) to 50 instances (peak). Settlement Service scales from 5 to 25 instances. Wallet Service scales from 5 to 20 instances. API Gateway uses auto-scaling based on request rate.
We use Kubernetes for auto-scaling with HorizontalPodAutoscaler configurations. We define minimum and maximum replicas for each service and set target CPU utilization (typically 70%). Kubernetes automatically adjusts the number of pods based on observed metrics.
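A representative HorizontalPodAutoscaler manifest for the Order Service might look like the following (the service and deployment names are assumptions; the replica bounds and CPU target come from the numbers above):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: order-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: order-service
  minReplicas: 10        # normal load
  maxReplicas: 50        # peak load during volatility
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```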
Database Scaling:
For PostgreSQL, we scale read operations with read replicas, increasing from 3 replicas (normal) to 10 replicas (peak). We use connection pooling with PgBouncer configured for 1000 max connections. We partition orders and trades tables by date (e.g., orders_2024_01, orders_2024_02) to improve query performance and maintenance.
For Redis, we use Redis Cluster with 6 nodes (3 masters, 3 replicas). Each trading pair’s order book is on a separate shard. Each node has 64GB of memory.
Kafka Scaling:
Our Kafka configuration includes 12 brokers across 3 availability zones. The trade events topic has 20 partitions, as does the order events topic. We use a replication factor of 3 for durability. Each partition can handle approximately 10k messages/sec, giving each topic a total capacity of 20 partitions * 10k = 200k messages/sec.
Matching Engine Scaling:
Each trading pair has a dedicated matching engine. We scale by distributing trading pairs across separate instances. For example, Instance_1 handles BTC/USD, ETH/USD, and BNB/USD. Instance_2 handles SOL/USD, ADA/USD, and DOT/USD. For very high-volume pairs, we dedicate an entire instance (e.g., Instance_4 handles only BTC/USD during extreme volatility).
Matching engines are single-threaded for consistency, but multiple instances provide horizontal scalability.
Circuit Breakers:
To prevent cascading failures, we implement circuit breakers. Before placing an order, we check whether the circuit breaker for the order service is open; if it is, we return “Service temporarily unavailable”. Otherwise we process the order and record the outcome (success or failure) with the circuit breaker. If the failure rate exceeds a threshold (e.g., 50% over the last 10 requests), the breaker opens for a fixed duration (e.g., 60 seconds). After that duration, it enters a half-open state and lets a probe request through to test whether the service has recovered.
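A minimal circuit breaker along these lines, with the thresholds from the text; state handling is simplified (no thread safety, one probe at a time):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker: closed -> open -> half-open (sketch)."""

    def __init__(self, failure_threshold=0.5, window=10, open_seconds=60):
        self.failure_threshold = failure_threshold
        self.window = window
        self.open_seconds = open_seconds
        self.results = []     # outcomes of the last `window` calls, True = success
        self.opened_at = None  # None means the breaker is closed

    def allow(self):
        if self.opened_at is None:
            return True
        # Half-open: after the cool-down, let a probe request through.
        return time.time() - self.opened_at >= self.open_seconds

    def record(self, success):
        if success:
            if self.opened_at is not None:
                self.opened_at = None  # probe succeeded: close the breaker
                self.results = []
            else:
                self.results = (self.results + [True])[-self.window:]
            return
        if self.opened_at is not None:
            self.opened_at = time.time()  # probe failed: restart the cool-down
            return
        self.results = (self.results + [False])[-self.window:]
        if len(self.results) >= self.window:
            failure_rate = self.results.count(False) / len(self.results)
            if failure_rate > self.failure_threshold:
                self.opened_at = time.time()  # trip the breaker
                self.results = []
```

The order path then becomes: if `allow()` is false, return “Service temporarily unavailable”; otherwise process the order and call `record(success)`.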
Load Shedding:
During extreme load, we prioritize critical operations. We monitor current system load; once it exceeds 90%, we shed load by priority: order requests from basic-tier users receive 429 (Too Many Requests), and portfolio requests are served from cache rather than computed in real time.
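A sketch of the shedding decision; the request shape, tier names, and cache interface are illustrative assumptions:

```python
def handle_request(request, system_load, cache):
    """Priority-based load shedding once system load exceeds 90% (sketch)."""
    if system_load <= 0.90:
        return {"status": 200, "source": "live"}
    # Over 90%: shed low-priority traffic first.
    if request["type"] == "place_order" and request["user_tier"] == "basic":
        return {"status": 429, "error": "Too Many Requests"}
    if request["type"] == "portfolio":
        # Serve a possibly stale cached view instead of a live calculation.
        return {"status": 200, "source": "cache", "data": cache.get(request["user_id"])}
    return {"status": 200, "source": "live"}
```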
Step 4: Wrap Up
In this chapter, we proposed a system design for a cryptocurrency exchange platform like Coinbase. If there is extra time at the end of the interview, here are additional points to discuss:
Additional Features:
- Staking: Allow users to stake cryptocurrencies (ETH, ADA, etc.) to earn rewards.
- Lending/Borrowing: Enable users to lend crypto for interest or borrow against their holdings.
- Derivatives Trading: Support futures, options, and perpetual swaps for advanced traders.
- Fiat On-Ramp/Off-Ramp: Direct integration with banks and payment providers for seamless deposits/withdrawals.
- Recurring Purchases: Dollar-cost averaging with automated recurring buys.
- Tax Reporting: Generate tax documents (Form 8949, Schedule D) for users.
Scaling Considerations:
- Geographic Distribution: Deploy in multiple regions (US, Europe, Asia) for low latency.
- Database Sharding: Shard by user ID or geographic region for massive scale.
- Event Sourcing: Store all state changes as immutable events for audit trail and debugging.
- CQRS Pattern: Separate read and write paths for optimized performance.
Monitoring and Observability:
- Key Metrics:
- Order placement latency (p50, p99)
- Order matching throughput
- Trade execution rate
- API error rates
- Blockchain confirmation times
- Wallet balance reconciliation status
- Hot wallet balance levels
- Alerting:
- Hot wallet balance below threshold
- Unusual withdrawal patterns
- High order rejection rate
- Database replication lag
- Failed blockchain transactions
- Distributed Tracing: Use Jaeger or Datadog to trace requests across microservices.
Disaster Recovery:
- Database Backups:
- Continuous WAL archiving to S3
- Daily full backups
- Point-in-time recovery capability
- Backup restoration testing quarterly
- Cold Wallet Recovery:
- Documented procedures for key recovery
- Geographically distributed key storage
- Shamir’s Secret Sharing for critical keys
- Incident Response:
- Runbooks for common failure scenarios
- War room procedures for major incidents
- Post-mortem process for learning
Security Hardening:
- Application Security:
- Input validation and sanitization
- Protection against SQL injection, XSS, CSRF
- API rate limiting per user and IP
- DDoS protection with Cloudflare
- Network Security:
- VPC isolation for internal services
- Bastion hosts for SSH access
- Security groups restricting traffic
- WAF (Web Application Firewall)
- Operational Security:
- Principle of least privilege for access control
- Regular security audits and penetration testing
- Employee background checks
- Mandatory security training
- Incident Detection:
- SIEM (Security Information and Event Management)
- Anomaly detection with machine learning
- 24/7 security operations center
Regulatory Compliance:
- Licensing: Register as Money Services Business (MSB) with FinCEN, obtain state licenses.
- Reporting: File SARs (Suspicious Activity Reports) and CTRs (Currency Transaction Reports).
- Data Retention: Retain KYC documents and transaction records for 5-7 years per regulations.
- Privacy: Comply with GDPR (Europe), CCPA (California), and other privacy laws.
- Audit Trail: Immutable logs of all system actions for regulatory audits.
Testing Strategy:
- Unit Tests: 80%+ code coverage for critical financial logic.
- Integration Tests: Test interactions between services.
- Load Tests: Simulate peak trading volumes (100k orders/sec).
- Chaos Engineering: Randomly inject failures to test resilience (Netflix Chaos Monkey).
- Security Tests: Regular penetration testing and vulnerability scanning.
Cost Optimization:
- Database: Use read replicas for queries, keep writes on primary.
- Kafka: Retain messages for 7 days, archive to S3 for long-term storage.
- Blockchain Nodes: Run own nodes to avoid third-party API costs.
- Caching: Aggressive caching to reduce database load.
- Reserved Instances: Purchase 1-3 year reserved instances for predictable workloads.
Future Improvements:
- Machine Learning:
- Price prediction models
- Fraud detection with deep learning
- Optimal order routing
- Personalized trading recommendations
- DeFi Integration: Connect to decentralized exchanges (Uniswap, PancakeSwap) for better liquidity.
- NFT Marketplace: Support NFT trading and storage.
- Institutional Services: Prime brokerage, OTC trading desk, custody services.
- Global Expansion: Support more fiat currencies and local payment methods.
Congratulations on getting this far! Designing a cryptocurrency exchange like Coinbase is one of the most complex system design challenges, combining financial systems, distributed systems, blockchain technology, security, and regulatory compliance. The key is to start with core trading functionality, ensure strong consistency for financial accuracy, implement robust security with hot/cold wallet architecture, and scale horizontally to handle extreme volatility.
Summary
This comprehensive guide covered the design of a cryptocurrency exchange platform like Coinbase, including:
- Core Functionality: User accounts with KYC, fiat deposits/withdrawals, crypto deposits/withdrawals, order placement (market/limit), portfolio management, and order matching.
- Order Matching Engine: In-memory order books with Redis Sorted Sets, price-time priority algorithm, single-threaded execution per trading pair for consistency, and event sourcing for durability.
- Financial Consistency: PostgreSQL with ACID transactions, pessimistic locking for trade settlement, double-entry accounting, optimistic locking with version numbers, and ledger-based reconciliation.
- Wallet Security: Multi-tiered hot/cold wallet architecture, Hardware Security Modules (HSM) for key storage, multi-signature cold wallets with geographic distribution, automated hot wallet replenishment, and comprehensive security measures.
- Blockchain Integration: Full node operation for each blockchain, transaction state machines, confirmation monitoring, chain reorganization handling, dynamic fee estimation, and UTXO consolidation.
- Compliance: KYC verification pipeline with tiered limits, AML transaction monitoring with risk scoring, Travel Rule compliance for large transfers, sanctions screening, and audit trail maintenance.
- Real-Time Data: WebSocket price feeds with Redis Pub/Sub, market data aggregation (OHLCV candles), order book snapshots, and horizontal scaling of WebSocket servers.
- Scalability: Horizontal scaling of stateless services, database read replicas and partitioning, Kafka for durable message queues, Redis Cluster for order books, circuit breakers and load shedding, and auto-scaling with Kubernetes.
The design demonstrates how to build a production-grade financial platform with the security, consistency, and performance required for handling billions of dollars in daily trading volume while maintaining regulatory compliance and protecting user assets.