Design PayPal
PayPal is a global digital payment platform that enables users to send and receive money electronically. It serves as an intermediary between buyers and sellers, facilitating secure financial transactions without directly sharing sensitive banking information. PayPal supports both peer-to-peer payments and merchant transactions across 200+ countries and 25+ currencies.
Designing PayPal presents unique challenges including distributed transaction processing with ACID guarantees, real-time fraud detection, multi-currency conversion, regulatory compliance, and maintaining strong consistency for financial operations while processing thousands of transactions per second.
Step 1: Understand the Problem and Establish Design Scope
Before diving into the design, it’s crucial to define the functional and non-functional requirements. For financial platforms like PayPal, functional requirements are the “Users should be able to…” statements, whereas non-functional requirements define system qualities via “The system should…” statements.
Functional Requirements
Core Requirements (Priority 1-4):
- Users should be able to send money to other users using email, phone, or PayPal ID (P2P payments).
- Users should be able to manage their wallet balance, link bank accounts and cards, and withdraw funds.
- Merchants should be able to integrate PayPal checkout and receive payments from customers.
- The system should process refunds (full and partial) with automatic reversal of fees.
Below the Line (Out of Scope):
- Users should be able to open and resolve disputes and chargebacks.
- Merchants should be able to set up recurring subscription billing.
- Users should be able to send invoices and track payment requests.
- The system should support multi-currency conversion with real-time exchange rates.
- The system should provide merchant APIs, SDKs, and webhook delivery.
Non-Functional Requirements
Core Requirements:
- The system should guarantee ACID properties for all financial transactions (no double charges, no lost money).
- The system should maintain strong consistency for account balances across distributed databases.
- The system should detect and prevent fraud in real-time with less than 100ms latency overhead.
- The system should maintain 99.99% uptime (at most about 52.6 minutes of downtime per year).
Below the Line (Out of Scope):
- The system should comply with PCI-DSS Level 1 standards for handling card data.
- The system should comply with financial regulations (KYC, AML, GDPR, PSD2).
- The system should process each payment exactly once, with no duplicate transactions.
- The system should encrypt all sensitive data end-to-end.
Clarification Questions & Assumptions:
- Scale: 430 million active accounts globally, processing 20 billion transactions annually.
- Peak Load: 3,000 transactions per second during holiday shopping periods.
- Transaction Volume: $1.36 trillion in annual payment volume, average transaction $68.
- Performance Targets: P95 checkout latency under 500ms, fraud detection under 100ms, payment authorization under 200ms.
- Settlement: T+1 settlement (next business day) to merchants and financial institutions.
- Fraud Rate: Must maintain fraud rate below 0.32% of transaction volume.
Step 2: Propose High-Level Design and Get Buy-in
Planning the Approach
For financial platform designs, the approach should focus on building up the payment flow sequentially while ensuring data consistency and security at every step. We’ll start with basic payment processing, then add fraud detection, multi-currency support, and merchant integration.
Defining the Core Entities
To satisfy our key functional requirements, we’ll need the following entities:
User Account: Represents both consumers and merchants on the platform. Contains personal information, verification status (KYC level), transaction limits, linked payment instruments (tokenized), and security settings.
Wallet: Stores the current balance for each user in each supported currency. Acts as the primary account for holding funds before withdrawal to a bank account.
Payment Instrument: Represents linked bank accounts, debit cards, or credit cards. Stores only tokenized references to actual account details for PCI compliance, never raw card numbers.
Transaction: Records each payment from initiation to completion. Includes sender and recipient identities, amounts, currency, fees, current status, and timestamps for each state transition.
Ledger Entry: Immutable record of each financial movement using double-entry accounting. Every transaction creates multiple ledger entries that must sum to zero, ensuring conservation of money.
API Design
Send Money Endpoint: Used by users to initiate peer-to-peer payments to another user.
POST /payments -> Payment
Body: {
  recipient: email | phone | paypalId,
  amount: number,
  currency: string,
  idempotencyKey: string
}
Checkout Order Endpoint: Used by merchants to create a payment order for customer checkout.
POST /checkout/orders -> Order
Body: {
  amount: { currency, value },
  merchantId: string,
  items: array
}
Capture Payment Endpoint: Used by merchants to capture an authorized payment after the customer approves.
POST /checkout/orders/:orderId/capture -> Payment
Link Payment Method Endpoint: Allows users to link bank accounts or cards to their PayPal account.
POST /payment-methods -> PaymentMethod
Body: {
  type: "bank" | "card",
  accountDetails: encrypted
}
Withdraw Funds Endpoint: Allows users to transfer funds from their PayPal wallet to their bank account.
POST /withdrawals -> Withdrawal
Body: {
  amount: number,
  currency: string,
  bankAccountId: string
}
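Note: All financial endpoints require strong authentication (2FA) and accept an idempotency key to prevent duplicate processing. The key is shown only in the /payments body above, but the same mechanism applies to order creation, capture, and withdrawals.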
High-Level Architecture
Let’s build up the system sequentially, addressing each functional requirement:
1. Users should be able to send money to other users (P2P payments)
The core components necessary to fulfill peer-to-peer payments are:
- Client Applications: Mobile apps (iOS/Android) and web application where users initiate payment requests.
- API Gateway: Entry point managing authentication, authorization, rate limiting, and TLS termination. Routes requests to appropriate microservices.
- Payment Service: Orchestrates the entire payment flow from initiation to completion. Coordinates with fraud detection, ledger service, and external payment networks.
- Account Service: Manages user accounts, wallet balances, and linked payment instruments. Enforces transaction limits based on KYC verification level.
- Ledger Service: Core financial engine implementing double-entry accounting. Ensures ACID properties across distributed databases and maintains immutable transaction logs.
- Database: PostgreSQL for transactional data (accounts, payment instruments), sharded by user_id. Cassandra for event sourcing (immutable ledger entries).
P2P Payment Flow:
- Alice wants to send $100 to Bob. She enters Bob’s email in the mobile app and submits the payment.
- The API Gateway authenticates Alice’s session and forwards the request to the Payment Service.
- The Payment Service validates that Alice has sufficient balance and the transaction doesn’t exceed her limits by checking with the Account Service.
- The Payment Service begins a distributed transaction with the Ledger Service to prepare the transfer.
- The Ledger Service validates the transaction and creates pending ledger entries: debit Alice $100, credit Bob $100.
- Once prepared, the Ledger Service returns a “prepared” status indicating readiness to commit.
- The Payment Service instructs the Ledger Service to commit the transaction.
- The Ledger Service atomically updates the ledger entries to “completed” status and updates both account balances.
- The system publishes a PaymentCompleted event to Kafka for downstream processing (notifications, webhooks, analytics).
- Both Alice and Bob receive notifications about the completed transaction.
2. Users should be able to manage wallet balances and link payment methods
We extend our design with additional capabilities:
- Tokenization Vault: Separate, PCI-compliant service that stores encrypted card numbers and bank account details. Returns tokens that can be safely stored in the main database.
- Redis Cache: Stores frequently accessed data like account balances and session information for fast retrieval.
Link Payment Method Flow:
- User enters their card or bank account details in the client app.
- The details are sent directly to the Tokenization Vault via a secure, isolated connection (never touching application servers).
- The Tokenization Vault validates the payment method (Luhn algorithm for cards, micro-deposit verification for banks).
- It returns a token representing the payment method.
- The Account Service stores only the token, last 4 digits, brand, and expiry date in the database.
- When processing payments, the token is exchanged for actual payment details only within the secure vault environment.
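The Luhn check mentioned in the flow above can be sketched as follows. This is a minimal illustration, not PayPal's implementation; the function name `luhn_valid` is hypothetical, and the check only catches mistyped numbers, it does not prove that a card exists.

```python
def luhn_valid(card_number: str) -> bool:
    """Return True if the card number passes the Luhn checksum."""
    # Allow the spaces and hyphens users commonly type.
    cleaned = card_number.replace(" ", "").replace("-", "")
    if len(cleaned) < 2 or not cleaned.isdigit():
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

For example, the widely used test number 4111 1111 1111 1111 passes, while changing its last digit makes it fail.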
3. Merchants should be able to integrate PayPal checkout
We introduce new components for merchant support:
- Merchant Service: Handles merchant onboarding, underwriting, API credential management, and webhook delivery.
- Fraud Detection Service: Real-time machine learning models combined with rule-based engines to score every transaction for fraud risk.
- Settlement Service: Batches merchant transactions and initiates transfers to merchant bank accounts on a T+1 schedule.
Merchant Checkout Flow:
- A customer clicks “Pay with PayPal” button on the merchant’s website.
- The merchant’s frontend SDK initiates communication with the merchant’s backend server.
- The merchant server calls PayPal’s API to create a checkout order with the purchase amount and item details.
- PayPal returns an order ID, and the customer is redirected to PayPal’s login/approval page.
- The customer logs into PayPal and reviews the payment details, then approves the transaction.
- PayPal redirects the customer back to the merchant site with the order ID.
- The merchant server calls PayPal’s API to capture the payment using the order ID.
- PayPal’s Payment Service processes the transaction, running it through fraud detection.
- If approved, the Ledger Service transfers funds from the customer’s account to the merchant’s account (minus fees).
- PayPal returns a success response with transaction ID to the merchant.
- The merchant fulfills the order and PayPal sends a webhook notification confirming the transaction.
4. The system should process refunds with automatic fee reversal
Refunds are handled as reverse transactions:
- Refund Flow: When a merchant initiates a refund, the system creates new ledger entries that reverse the original transaction. If the original payment included a PayPal fee, that fee is also reversed and credited back to the merchant. The Ledger Service ensures the refund is linked to the original transaction for audit purposes.
Step 3: Design Deep Dive
With the core functional requirements met, it’s time to dig into the non-functional requirements via deep dives. These are the critical areas that ensure financial accuracy, security, and compliance.
Deep Dive 1: How do we guarantee ACID properties for distributed transactions?
Payment processing must guarantee atomicity (all steps succeed or none do), consistency (balances are always correct), isolation (concurrent transactions don’t interfere), and durability (completed transactions are permanent).
The Challenge:
Financial transactions often span multiple database shards. Alice on Shard 0 paying Bob on Shard 3 requires coordinated updates to both databases. If one update succeeds and the other fails, we violate atomicity and could lose money.
Solution: Two-Phase Commit Protocol
We implement a distributed transaction coordinator that manages the two-phase commit process:
Phase 1 - Prepare: The coordinator contacts all participating shards and asks them to prepare the transaction. Each shard validates that it can perform its part of the transaction (sufficient balance, no locks, within limits) and creates pending ledger entries. The shard locks the required resources and responds with “prepared” status. If any shard cannot prepare, the entire transaction is aborted.
Phase 2 - Commit: Once all shards respond with “prepared,” the coordinator instructs them to commit. Each shard atomically updates its pending entries to “completed,” updates account balances, and writes to the immutable transaction log. If the coordinator crashes between phases, it can recover and either retry commits or abort based on the transaction log.
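The two phases can be illustrated with an in-memory sketch. The `Shard` class and `transfer` function are hypothetical names, balances are held in integer cents, and the durable transaction log that a real coordinator needs for crash recovery is deliberately omitted.

```python
from dataclasses import dataclass, field


@dataclass
class Shard:
    """In-memory stand-in for one database shard (balances in cents)."""
    balances: dict
    pending: dict = field(default_factory=dict)  # txn_id -> [(account, delta)]

    def prepare(self, txn_id, account, delta):
        # Funds already reserved by other prepared debits on this account.
        reserved = sum(d for moves in self.pending.values()
                       for a, d in moves if a == account and d < 0)
        if self.balances.get(account, 0) + reserved + delta < 0:
            return False  # vote "abort": insufficient funds
        self.pending.setdefault(txn_id, []).append((account, delta))
        return True  # vote "prepared"

    def commit(self, txn_id):
        # Apply every pending movement recorded for this transaction.
        for account, delta in self.pending.pop(txn_id, []):
            self.balances[account] = self.balances.get(account, 0) + delta

    def abort(self, txn_id):
        self.pending.pop(txn_id, None)


def transfer(txn_id, src_shard, src, dst_shard, dst, amount):
    """Coordinator: returns True if the transfer committed on all shards."""
    steps = [(src_shard, src, -amount), (dst_shard, dst, amount)]
    prepared = []
    # Phase 1: every participant must vote "prepared".
    for shard, account, delta in steps:
        if not shard.prepare(txn_id, account, delta):
            for participant in prepared:
                participant.abort(txn_id)
            return False
        prepared.append(shard)
    # Phase 2: all participants voted yes, so commit everywhere.
    for shard in prepared:
        shard.commit(txn_id)
    return True
```

A transfer that fails in the prepare phase leaves both balances untouched, which is the atomicity property the protocol is meant to provide.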
Idempotency Handling:
Every payment request includes a client-generated idempotency key. The Payment Service checks a cache (Redis) for an existing transaction with this key. If found, it returns the cached result instead of processing again. This prevents duplicate payments if the client retries after a network timeout.
The idempotency key is cached for 24 hours after the transaction completes. The cache entry stores the complete transaction result including status, transaction ID, and any error messages.
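A minimal sketch of the idempotency check, assuming an in-memory dictionary in place of Redis. The class name `IdempotencyCache` and its `run_once` method are hypothetical. A production version would also need an atomic set-if-absent (for example Redis `SET` with the `NX` and `EX` options) so that two concurrent requests with the same key cannot both be processed.

```python
import time


class IdempotencyCache:
    """Remembers the result of each idempotency key for a fixed TTL."""

    def __init__(self, ttl_seconds=24 * 60 * 60, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so tests can control time
        self._entries = {}   # key -> (expires_at, result)

    def run_once(self, key, handler):
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and cached[0] > now:
            return cached[1]  # retry: return the stored result
        result = handler()    # first request (or expired key): process it
        self._entries[key] = (now + self._ttl, result)
        return result
```

A client that retries after a timeout with the same key gets the stored result back, and the handler that moves money runs only once.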
Database Sharding Strategy:
The database is sharded by user_id using consistent hashing. Each shard is a PostgreSQL instance with read replicas, and the deployment starts with 10 shards. A plain user_id modulo 10 scheme would also spread load evenly, but it remaps most users whenever the shard count changes. Consistent hashing moves only a small fraction of users when a shard is added, which is what allows horizontal scaling.
Cross-shard transactions use the two-phase commit coordinator. Single-shard transactions (uncommon for payments but common for balance queries) execute locally without coordination overhead.
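Shard routing with consistent hashing might look like the sketch below. The `HashRing` class, the shard names, and the choice of 100 virtual nodes per shard are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring mapping user IDs to shard names."""

    def __init__(self, shards, vnodes=100):
        # Each shard owns several points ("virtual nodes") on the ring
        # so that load spreads evenly.
        self._ring = sorted(
            (self._hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def shard_for(self, user_id):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._points, self._hash(user_id))
        return self._ring[index % len(self._ring)][1]
```

When an eleventh shard joins the ring, the only users that change shards are those that move to the new shard, which keeps data migration small.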
Deep Dive 2: How do we implement double-entry accounting in a distributed system?
Double-entry accounting is a fundamental principle where every transaction creates at least two entries: one or more debits and one or more credits of equal total value. This ensures the sum of all ledger entries always equals zero, providing a self-auditing system.
Ledger Entry Structure:
Each entry in the ledger table represents a single movement of money. It contains the transaction ID, account ID, amount (positive for credit, negative for debit), currency, entry type (payment, refund, fee, withdrawal), status (pending, completed, failed), timestamp, and metadata.
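The ledger entries table is append-only. Database rules prevent deletions and any change to an entry’s amount or accounts; the only permitted change is the one-way status transition from pending to completed or failed. Once an entry reaches a final status, it can never be modified, ensuring a complete audit trail.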
Transaction Example:
When Alice sends $100 to Bob with a $2.90 PayPal fee, the system creates three ledger entries with the same transaction ID:
- Entry 1: Alice’s account debited $100 (amount: -100.00, type: payment_sent)
- Entry 2: Bob’s account credited $97.10 (amount: +97.10, type: payment_received)
- Entry 3: PayPal revenue account credited $2.90 (amount: +2.90, type: fee)
The sum of all amounts equals zero: -100.00 + 97.10 + 2.90 = 0. This invariant must hold for every transaction.
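The example above can be expressed as a small sketch that builds the three entries and enforces the zero-sum invariant. The names `LedgerEntry`, `build_payment_entries`, `assert_balanced`, and the `paypal_revenue` account ID are hypothetical; `Decimal` is used so that amounts such as 2.90 are represented exactly.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)  # frozen: entries cannot be mutated once created
class LedgerEntry:
    transaction_id: str
    account_id: str
    amount: Decimal  # positive for credit, negative for debit
    entry_type: str


def assert_balanced(entries):
    """Raise ValueError unless the entries of a transaction sum to zero."""
    total = sum((entry.amount for entry in entries), Decimal("0"))
    if total != 0:
        raise ValueError(f"unbalanced transaction, off by {total}")


def build_payment_entries(txn_id, sender, recipient, amount, fee):
    entries = [
        LedgerEntry(txn_id, sender, -amount, "payment_sent"),
        LedgerEntry(txn_id, recipient, amount - fee, "payment_received"),
        LedgerEntry(txn_id, "paypal_revenue", fee, "fee"),
    ]
    assert_balanced(entries)
    return entries
```

Calling `build_payment_entries("t1", "alice", "bob", Decimal("100.00"), Decimal("2.90"))` yields amounts of -100.00, 97.10, and 2.90.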
Balance Calculation:
An account’s balance is computed by summing all completed ledger entries for that account. While this is the source of truth, querying the database every time would be slow.
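For performance, balances are cached in Redis with a write-through pattern. When a ledger entry is committed, the cached balance is adjusted with an atomic Redis increment. Storing balances as integer minor units (for example cents) and using INCRBY avoids the floating-point rounding that INCRBYFLOAT can introduce for monetary values. If the cache misses, the balance is recomputed from the database and populated.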
Reconciliation Process:
A daily batch job validates the system’s financial integrity by checking that the sum of all ledger entries equals zero, each transaction has balanced debits and credits, cached balances match computed balances from the database, and external settlement amounts match internal records.
Any discrepancies trigger alerts for the operations team to investigate. This catches potential bugs, data corruption, or fraud attempts.
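Two of these checks, balanced transactions and cached balances matching the ledger, can be sketched as below. The tuple layout, the `reconcile` function name, and the use of integer cents are assumptions; the external settlement comparison is not covered here.

```python
from collections import defaultdict


def reconcile(entries, cached_balances):
    """Return a list of discrepancy descriptions (empty if all is well).

    entries: iterable of (txn_id, account_id, amount_cents, status).
    cached_balances: dict of account_id -> balance in cents.
    """
    per_txn = defaultdict(int)
    per_account = defaultdict(int)
    for txn_id, account_id, amount, status in entries:
        if status != "completed":
            continue  # only completed entries count toward balances
        per_txn[txn_id] += amount
        per_account[account_id] += amount

    problems = []
    for txn_id, total in per_txn.items():
        if total != 0:
            problems.append(f"transaction {txn_id} is off by {total} cents")
    for account_id, balance in per_account.items():
        cached = cached_balances.get(account_id)
        if cached != balance:
            problems.append(
                f"account {account_id}: cache {cached}, ledger {balance}")
    return problems
```

An empty result means every completed transaction balances and every cached balance matches the value recomputed from the ledger.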
Deep Dive 3: How do we handle multi-currency conversion at scale?
PayPal supports 25+ currencies with real-time conversion when users send money across currencies.
Exchange Rate Management:
The FX Service maintains exchange rates by aggregating data from multiple providers (Bloomberg, Reuters, internal trading desks). Using multiple sources provides redundancy and helps detect anomalies.
Every 5 minutes, the service fetches rates from all providers and calculates the median rate for each currency pair. This protects against outliers or faulty data from a single provider. PayPal then applies a spread (typically 3.5% markup) to the median rate, which becomes the customer-facing exchange rate.
Rates are cached in Redis with a 5-minute TTL. When a currency conversion is requested, the service first checks the cache. On cache miss, it fetches from providers, calculates the rate, applies the spread, caches it, and returns it.
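The median-plus-spread calculation can be sketched as follows. The function name `customer_rate` is hypothetical, and applying the spread as a reduction of the rate (so the recipient gets less of the target currency) is an assumption chosen to match the USD to EUR direction.

```python
from decimal import Decimal
from statistics import median

SPREAD = Decimal("0.035")  # 3.5% markup from the text


def customer_rate(provider_rates, spread=SPREAD):
    """Median of the provider quotes, reduced by the spread."""
    mid_rate = median(provider_rates)  # robust to a single faulty quote
    return mid_rate * (1 - spread)
```

With quotes of 0.91, 0.92, and a faulty 9.2, the median is still 0.92, so the outlier has no effect on the customer-facing rate.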
Cross-Currency Payment Flow:
When Alice with a USD account sends $100 to Bob with a EUR account, the system first fetches the current USD/EUR rate (for example, 1 USD = 0.92 EUR). After applying PayPal’s spread, the effective rate becomes 0.888.
Bob receives 100 × 0.888 = €88.80. The system creates ledger entries: debit Alice $100 USD, credit Bob €88.80 EUR, and credit PayPal’s FX revenue account the difference in USD equivalent.
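The arithmetic in this flow can be checked with a short sketch. The function name `convert` is hypothetical. Rounding the customer rate to three decimals mirrors the 0.888 figure in the text, and valuing PayPal's FX revenue at the mid-market rate is one possible reading of "the difference in USD equivalent".

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def convert(amount_usd, mid_rate, spread):
    """Return (customer_rate, amount_eur, fx_revenue_usd)."""
    customer_rate = (mid_rate * (1 - spread)).quantize(
        Decimal("0.001"), rounding=ROUND_HALF_UP)
    amount_eur = (amount_usd * customer_rate).quantize(
        CENT, rounding=ROUND_HALF_UP)
    # Value of what the recipient got, converted back at the mid rate.
    fx_revenue_usd = (amount_usd - amount_eur / mid_rate).quantize(
        CENT, rounding=ROUND_HALF_UP)
    return customer_rate, amount_eur, fx_revenue_usd
```

For $100 at a mid rate of 0.92 and a 3.5% spread, this gives a customer rate of 0.888, a credit of €88.80 to Bob, and FX revenue of $3.48 under the stated assumption.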
Currency Hedging:
To avoid constant conversion costs and foreign exchange risk, PayPal maintains float balances in multiple currencies. When there’s an imbalance (too much EUR, not enough USD), the treasury team uses hedging strategies:
- Forward contracts lock in exchange rates for future dates, allowing predictable conversion costs.
- Currency swaps with partner banks exchange one currency for another at agreed rates.
- Dynamic routing pays EUR merchants from the EUR float instead of converting from USD.
Deep Dive 4: How do merchants integrate PayPal checkout with proper security?
Merchants integrate PayPal using client-side JavaScript SDKs and server-side REST APIs. The integration must be secure to prevent tampering with payment amounts.
Checkout Integration Pattern:
The merchant never directly specifies the payment amount in the client-side code. Instead, the flow works as follows:
The customer clicks “Pay with PayPal” on the merchant site. The merchant’s client-side SDK triggers a callback function that sends a request to the merchant’s own backend server (not PayPal) to create an order. The merchant’s server validates the cart items and calculates the correct total amount server-side, then calls PayPal’s API to create an order with that amount.
PayPal returns an order ID to the merchant’s server, which passes it back to the client. The client redirects the user to PayPal’s approval page using the order ID. The customer logs into PayPal, reviews the order details, and approves the payment.
PayPal redirects the customer back to the merchant with the order ID. The merchant’s server (not the client) captures the payment by calling PayPal’s API with the order ID. PayPal validates that the order was approved and processes the payment, returning a transaction ID on success.
This pattern ensures that the payment amount is always determined by the merchant’s server, not the client browser where it could be manipulated.
Webhook Delivery for Async Events:
PayPal sends webhooks to merchant servers for important events like payment completion, refunds, or disputes. The Webhook Service uses a Kafka queue for reliable delivery.
When an event occurs, the service creates a signed payload using HMAC-SHA256 with the merchant’s secret key. This allows merchants to verify the webhook authenticity. The event is published to a Kafka topic with the merchant ID as the partition key, ensuring ordering of events per merchant.
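Signing and verification with HMAC-SHA256 can be sketched using Python's standard library. The function names and the JSON serialization choices are assumptions; the key point is that the merchant recomputes the signature over the exact bytes received and compares in constant time.

```python
import hashlib
import hmac
import json


def sign_payload(secret: bytes, payload: dict):
    """Sender side: serialize the event and compute its signature."""
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True).encode()
    signature = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body, signature


def verify_payload(secret: bytes, body: bytes, signature: str) -> bool:
    """Merchant side: recompute the signature over the raw request body."""
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(expected, signature)
```

A merchant would reject any webhook for which `verify_payload` returns False, since either the body was altered or the sender does not hold the shared secret.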
Consumer workers pull events from Kafka and attempt to deliver them via HTTP POST to the merchant’s configured webhook URL. If delivery fails (network error, 5xx response), the system retries with exponential backoff: 1 minute, 2 minutes, 4 minutes, up to a maximum of 10 attempts over several hours.
After maximum retries, failed webhooks are stored in a dead letter queue for manual review. Merchants can also query PayPal’s API to retrieve missed events.
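The retry schedule can be worked out with a short sketch. It assumes that the 10 attempts include the initial delivery, so there are 9 waits between attempts; the function name is hypothetical.

```python
def retry_delays_minutes(max_attempts=10, base_minutes=1):
    """Waits between consecutive delivery attempts, doubling each time."""
    return [base_minutes * 2 ** i for i in range(max_attempts - 1)]
```

Under that assumption the waits are 1, 2, 4, and so on up to 256 minutes, for a total of 511 minutes (about 8.5 hours) before the event goes to the dead letter queue.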
One Touch Checkout:
For returning customers, PayPal offers One Touch checkout that eliminates the redirect to the login page. On the first transaction, the customer approves “Stay logged in” which sets a secure, HTTP-only cookie with a refresh token.
On subsequent purchases, when the merchant’s client detects the PayPal cookie, it calls PayPal’s API to validate the token. If valid, PayPal auto-approves the payment without requiring the customer to log in again. This reduces checkout time from 30 seconds to under 2 seconds, significantly improving conversion rates.
Deep Dive 5: How do we detect and prevent fraud in real-time?
Real-time fraud detection is critical for minimizing chargebacks and protecting users. The system must score every transaction in under 100ms to avoid adding noticeable latency.
Feature Engineering:
The Fraud Service extracts over 200 features from each transaction. These features span multiple categories:
Transaction features include amount, currency, whether it’s cross-border, time of day, and day of week. Fraudsters often operate at unusual times or send money internationally.
Account features cover account age, total transaction history, average transaction size, number of linked cards, email and phone verification status, and KYC level. New accounts with little history are riskier.
Behavioral features examine time since last transaction, transactions in the last 24 hours and last hour, whether the recipient is new, and whether the amount is unusual compared to the user’s history. Sudden changes in behavior often indicate account takeover.
Device features capture device ID, IP address, user agent, location country, and VPN detection. Multiple accounts sharing a device or transactions from VPN/proxy servers are suspicious.
Network features include the recipient’s fraud score, number of disputes against the recipient, and the merchant’s overall fraud rate. If Bob receives payments from many fraud victims, transactions to Bob become riskier.
Machine Learning Model:
The system uses a Gradient Boosted Decision Tree model (XGBoost) trained on over 100 million labeled transactions. The model outputs a fraud probability score between 0 and 1.
Training data comes from two sources: transactions later disputed or charged back are labeled as fraud, while transactions older than 90 days with no disputes are labeled as legitimate. This creates ground truth labels for supervised learning.
The dataset has severe class imbalance with only 0.3% fraud and 99.7% legitimate transactions. To address this, the training pipeline uses SMOTE (Synthetic Minority Over-sampling Technique) to balance the classes before training.
Models are retrained weekly with the latest data to adapt to evolving fraud patterns. Before deployment, new models are evaluated on a holdout test set. Only models that improve precision, recall, and AUC metrics compared to the current production model are deployed.
Rule-Based Engine:
The ML model is complemented by a deterministic rule engine that checks for known fraud patterns. Rules have a condition function and an associated risk score.
Example rules include: “High-value transaction from new account” triggers if amount exceeds $1000 and account age is less than 7 days (score: 0.7). “Multiple failed payment methods” triggers if there have been more than 3 failed payment attempts (score: 0.9). “Velocity check” triggers if there are more than 10 transactions in 1 hour (score: 0.6).
Each rule that fires adds its score to a total rule score, which is capped at 1.0 so that it stays on the same scale as the ML score. Multiple triggered rules indicate higher risk.
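The rule engine can be sketched as a list of condition functions with scores. The field names in the transaction dictionary are hypothetical, and capping the total at 1.0 is an assumption made here so that the result is comparable to a probability.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Rule:
    name: str
    condition: Callable[[dict], bool]
    score: float


RULES = [
    Rule("high_value_new_account",
         lambda t: t["amount"] > 1000 and t["account_age_days"] < 7, 0.7),
    Rule("multiple_failed_payment_methods",
         lambda t: t["failed_attempts"] > 3, 0.9),
    Rule("velocity_check",
         lambda t: t["txns_last_hour"] > 10, 0.6),
]


def rule_score(txn: dict, rules=RULES) -> float:
    """Sum the scores of all rules that fire, capped at 1.0 (assumption)."""
    total = sum(rule.score for rule in rules if rule.condition(txn))
    return min(total, 1.0)
```

A $1,500 payment from a two-day-old account with no other signals scores 0.7, while a transaction that trips all three rules is capped at 1.0.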
Decision Logic:
For each transaction, the system runs both the ML model and rule engine in parallel. The ML model prediction has an 80ms timeout - if it doesn’t respond in time, a default medium-risk score of 0.5 is used. This ensures fraud detection never blocks payment processing indefinitely.
The final fraud score is a weighted average: 70% ML score + 30% rule score. This combines the adaptability of machine learning with the reliability of explicit rules.
Based on the final score, the system makes a decision. Scores below 0.3 are approved immediately. Scores between 0.3 and 0.7 trigger a challenge flow requiring additional verification like 2FA or 3D Secure. Scores above 0.7 result in the transaction being blocked.
All decisions and features are logged to a data warehouse for model retraining and analysis.
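The timeout fallback and the weighted decision can be sketched as below. The function names are hypothetical, the thread pool is just one way to enforce a deadline, and treating the boundaries 0.3 and 0.7 as part of the challenge band is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


def score_with_timeout(model_fn, txn, timeout_s=0.080, default=0.5):
    """Run the model, falling back to a medium-risk score on timeout."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(model_fn, txn)
    try:
        return future.result(timeout=timeout_s)
    except FutureTimeout:
        return default
    finally:
        pool.shutdown(wait=False)  # do not block on a slow model call


def decide(ml_score: float, rule_score: float):
    """Combine the scores 70/30 and map the result to an action."""
    final = 0.7 * ml_score + 0.3 * rule_score
    if final < 0.3:
        return final, "approve"
    if final <= 0.7:
        return final, "challenge"
    return final, "block"
```

Note that when the model times out and no rules fire, the final score is 0.7 × 0.5 = 0.35, so the transaction is challenged rather than approved. A long-lived shared pool would be preferable to creating one per call in a real service.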
3D Secure Challenge Flow:
For medium-risk transactions, the system triggers 3D Secure authentication. The customer is redirected to their card issuer’s authentication page where they enter an OTP sent to their phone or use the issuer’s mobile app for biometric authentication.
The issuer confirms the customer’s identity and returns an authentication token to PayPal. With this token, PayPal completes the payment with liability shift - meaning the card issuer is now responsible for fraud, not PayPal. This significantly reduces chargeback risk.
Deep Dive 6: How do we handle settlement and reconciliation?
Settlement is the process of transferring funds from PayPal to merchants and users who withdraw money.
Settlement Timeline:
On T+0 (transaction day), when a customer pays a merchant $100, funds are held in PayPal’s account. The merchant sees a “pending” balance in their dashboard but cannot withdraw yet.
On T+1 (next business day), PayPal’s Settlement Service runs a batch job at 2 AM UTC. It calculates each merchant’s net settlement by summing all completed payments, subtracting refunds and fees. If the net amount is below a minimum threshold (like $10), it’s rolled over to the next day.
For merchants above the threshold, the service initiates an ACH transfer to the merchant’s linked bank account. The merchant’s balance status changes from “pending” to “in_transit.”
On T+2 to T+3, the ACH transfer completes (ACH takes 1-2 business days). The merchant receives the funds in their bank account, and the balance in PayPal shows as “available.”
Batching Logic:
The Settlement Service queries the Ledger Service to find all merchants with pending settlements from the previous day. For each merchant, it calculates the net settlement amount by summing payments minus refunds minus fees.
The service then initiates bulk ACH transfers to the merchants’ bank accounts. Using bulk transfers is more cost-effective than individual transfers. Each transfer includes a description with the settlement date for merchant reconciliation.
The Ledger Service records each settlement with the merchant ID, amount, ACH transaction ID, status (“initiated”), and settlement date. When the ACH transfer completes days later, a webhook from the bank updates the status to “completed.”
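The net settlement calculation with a minimum payout threshold can be sketched as follows. The tuple layout, the `net_settlements` function name, and the use of integer cents are assumptions; the $10 threshold is the example figure from the text.

```python
from collections import defaultdict

MIN_PAYOUT_CENTS = 1_000  # $10 example threshold


def net_settlements(entries, carried_over=None):
    """Split merchants into payouts and rollovers.

    entries: iterable of (merchant_id, kind, amount_cents) where kind is
    "payment", "refund", or "fee" and amount_cents is non-negative.
    carried_over: dict of merchant_id -> cents rolled over from earlier days.
    """
    net = defaultdict(int, carried_over or {})
    for merchant_id, kind, amount in entries:
        if kind == "payment":
            net[merchant_id] += amount
        elif kind in ("refund", "fee"):
            net[merchant_id] -= amount
        else:
            raise ValueError(f"unknown entry kind: {kind}")
    payouts = {m: a for m, a in net.items() if a >= MIN_PAYOUT_CENTS}
    rollover = {m: a for m, a in net.items() if a < MIN_PAYOUT_CENTS}
    return payouts, rollover
```

Merchants in `payouts` would be included in the day's bulk ACH batch, while amounts in `rollover` are passed back in as `carried_over` on the next run.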
Reconciliation:
Daily reconciliation ensures internal ledger balances match external bank balances. The Reconciliation Service runs every morning after settlement processing.
It calculates the expected balance by taking the internal ledger balance, subtracting in-flight settlements that haven’t completed yet, and comparing this to the actual bank balance retrieved via bank APIs.
If there’s a discrepancy greater than one cent (allowing for rounding errors), the system alerts the operations team and creates a ticket for investigation. Common causes include failed ACH transfers, bank fees not recorded internally, or rare database inconsistencies.
Transaction-level reconciliation compares internal transaction records with bank transaction records to find any missing or extra transactions. This catches edge cases like double-processed settlements or failed reversals.
Reserve Accounts:
For high-risk merchants (new, high-chargeback industries like adult content or virtual goods), PayPal implements rolling reserves. The system holds 30% of each day’s sales for 90 days to cover potential chargebacks.
When processing settlement for these merchants, the Settlement Service calculates the reserve amount and creates a reserve ledger entry with a hold period. Only 70% of the funds are settled immediately. After 90 days, if no chargebacks occurred, the reserved funds are released to the merchant.
This protects PayPal from merchants who receive payments, immediately withdraw all funds, and then disappear when chargebacks arrive weeks later.
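The reserve split can be sketched as a small calculation. The function name is hypothetical, amounts are in integer cents, and rounding the reserve down to a whole cent is an assumption.

```python
from datetime import date, timedelta

RESERVE_PERCENT = 30
HOLD_DAYS = 90


def split_settlement(net_cents: int, settlement_date: date):
    """Return (payout_cents, reserve_cents, release_date)."""
    reserve = net_cents * RESERVE_PERCENT // 100  # rounds down to a cent
    payout = net_cents - reserve
    release_on = settlement_date + timedelta(days=HOLD_DAYS)
    return payout, reserve, release_on
```

For a $1,000 settlement on 1 January 2024, $700 is paid out, $300 is held, and the hold is released on 31 March 2024 if no chargebacks consume it.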
Deep Dive 7: How do we comply with PCI-DSS for handling card data?
PCI-DSS (Payment Card Industry Data Security Standard) is a mandatory security standard for organizations that handle credit card data. PayPal must comply with Level 1, the highest level, due to transaction volume.
Tokenization Architecture:
The core principle is to never store raw card numbers in the application database. Instead, PayPal uses tokenization.
When a user adds a card, the card data is sent to a separate Tokenization Vault over an isolated network connection, bypassing the application servers. The vault first runs the Luhn checksum to reject mistyped card numbers.
The Tokenization Vault is a PCI-compliant service that encrypts the card data using AES-256 encryption with keys stored in Hardware Security Modules (HSMs). It returns a token - a random identifier like “tok_a1b2c3d4” that has no relationship to the original card number.
The application database stores only the token, last 4 digits (for display to users), card brand (Visa, Mastercard), and expiry date. If an attacker compromises the application database, they cannot steal card numbers.
When processing a payment, the Payment Service sends the token back to the Tokenization Vault. The vault decrypts the card data and returns it to the Payment Processor. This decryption happens only in the secure vault environment, never in application code.
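The token exchange can be sketched with an in-memory vault. The `TokenVault` class is hypothetical and, to stay dependency-free, it stores the card number in plain memory; a real vault would encrypt it with AES-256 under HSM-managed keys as described above.

```python
import secrets


class TokenVault:
    """Toy vault: maps random tokens to card numbers (PANs)."""

    def __init__(self):
        # Assumption for the sketch: plain dict instead of encrypted storage.
        self._cards = {}

    def tokenize(self, pan: str) -> dict:
        # The token is random, so it reveals nothing about the card number.
        token = "tok_" + secrets.token_hex(8)
        self._cards[token] = pan
        # Only these fields are safe to store in the application database.
        return {"token": token, "last4": pan[-4:]}

    def detokenize(self, token: str) -> str:
        # Called only from inside the Cardholder Data Environment.
        return self._cards[token]
```

The record returned by `tokenize` is what the application database keeps; the full number never leaves the vault except through `detokenize`.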
Network Segmentation:
PayPal’s infrastructure is divided into security zones:
The Public Zone contains API gateways and load balancers accessible from the internet. No cardholder data ever exists here.
The Application Zone contains Payment Services, Fraud Detection, and other business logic. These services only work with tokens and encrypted data.
The Cardholder Data Environment (CDE) is an isolated network containing only the Tokenization Vault and Payment Processor. Access is strictly controlled with firewall rules, and all access is logged.
This segmentation ensures that even if the Application Zone is breached, attackers cannot reach raw card data.
Encryption Standards:
All data in transit uses TLS 1.3 with strong cipher suites. All data at rest uses AES-256 encryption. Encryption keys are stored in HSMs which provide tamper-resistant key storage. Keys are rotated quarterly, and old keys are securely destroyed once the data they protect has been re-encrypted under the new key.
Access Control:
The principle of least privilege is enforced - employees only have access to systems required for their job. All employees must use multi-factor authentication. Access to the CDE requires additional approval and is logged to a separate audit system that employees cannot modify.
Security Scanning:
PayPal undergoes quarterly vulnerability scans by PCI-approved scanning vendors. Annual penetration testing is performed by external security firms who attempt to breach the system. Continuous monitoring tools watch for suspicious activity like unusual data access patterns or unauthorized access attempts.
Any findings from scans or tests are prioritized and remediated within defined timeframes (critical issues within 30 days, high within 90 days).
Step 4: Wrap Up
In this design, we proposed a comprehensive payment platform similar to PayPal. If there is extra time at the end of the interview, here are additional points to discuss:
Key Design Decisions:
Distributed Ledger with Double-Entry Accounting: Using immutable ledger entries with double-entry accounting guarantees financial consistency and provides a complete audit trail. Every transaction’s debits and credits must sum to zero, making the system self-auditing. Sharding by user_id enables horizontal scaling while maintaining transaction integrity.
Two-Phase Commit for ACID Transactions: Distributed transactions across database shards require careful coordination. The two-phase commit protocol ensures atomicity - either all updates succeed or none do. Combined with idempotency keys, this prevents duplicate payments and lost money, even during failures.
Real-Time Fraud Detection: A multi-layered approach combines machine learning models with deterministic rules. The ML model adapts to new fraud patterns through continuous retraining, while rules catch known attack vectors. Processing must complete in under 100ms to avoid noticeable latency, achieved through model optimization and caching frequently accessed features.
Asynchronous Settlement: T+1 settlement (next business day) balances merchant needs with operational efficiency. Batching settlements reduces transaction costs and ACH fees. Rolling reserves for high-risk merchants protect against chargeback losses that arrive weeks after the original transaction.
PCI-DSS Compliance: Tokenization eliminates the need to store raw card data in application databases, drastically reducing PCI compliance scope. Network segmentation isolates the Cardholder Data Environment, and regular security audits ensure ongoing compliance.
Scaling Considerations:
Database Scaling: PostgreSQL is sharded by user_id starting with 10 shards, expandable to 100+ shards as the user base grows. Read replicas distribute query load for operations like balance checks and transaction history. Old transactions are archived to cold storage (S3) after 7 years to reduce database size and costs.
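One simple way to route a user to a shard is a stable hash of the user_id taken modulo the shard count, sketched below. The function name shard_for_user and the choice of SHA-256 are assumptions for illustration; the design above does not specify the routing function.

```python
import hashlib


def shard_for_user(user_id: str, num_shards: int = 10) -> int:
    """Map a user_id to a shard index in [0, num_shards).

    A cryptographic hash is used instead of Python's built-in hash(),
    which is randomized per process and would route inconsistently.
    """
    if num_shards <= 0:
        raise ValueError("num_shards must be positive")
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# The same user always lands on the same shard for a given shard count.
shard = shard_for_user("user_12345")
```

Plain modulo hashing remaps most users when the shard count changes, so growing from 10 to 100+ shards would require a data migration. A lookup directory or consistent hashing are common ways to limit that movement.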
Service Scaling: All services are stateless microservices that can scale horizontally. Auto-scaling policies monitor CPU and memory metrics, adding instances during peak load (holiday shopping) and removing them during off-peak hours. Each service has independent scaling characteristics - Fraud Service needs more compute, Payment Service needs more I/O.
Geographic Distribution: Services are deployed across multiple regions (US East, US West, Europe, Asia) to reduce latency for global users. Users are routed to the nearest region using geographic load balancing. Cross-region database replication provides disaster recovery, with automatic failover if a region goes down.
Caching Strategy: Redis clusters provide distributed caching with high availability. Account balances use a write-through caching pattern: every balance update is written to both the database and the cache. Foreign exchange rates are cached with a 5-minute TTL, which bounds how stale a quoted rate can be while avoiding a rate lookup on every request. Fraud model features for recent transactions are cached to speed up repeat checks.
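The two caching patterns described above can be sketched with plain dictionaries standing in for PostgreSQL and Redis. WriteThroughBalances and TTLCache are hypothetical names, and the injectable clock exists only to make expiry easy to test.

```python
import time


class WriteThroughBalances:
    """Write-through: every update goes to the database and the cache."""

    def __init__(self, database, cache):
        self.database = database  # dict standing in for PostgreSQL
        self.cache = cache        # dict standing in for Redis

    def set_balance(self, account_id, balance_cents):
        self.database[account_id] = balance_cents  # source of truth first
        self.cache[account_id] = balance_cents     # then the cache

    def get_balance(self, account_id):
        if account_id in self.cache:
            return self.cache[account_id]
        balance = self.database[account_id]  # cache miss: read the database
        self.cache[account_id] = balance
        return balance


class TTLCache:
    """Entries expire ttl_seconds after they are written."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._items = {}  # key -> (value, expires_at)

    def put(self, key, value):
        self._items[key] = (value, self._clock() + self.ttl_seconds)

    def get(self, key):
        item = self._items.get(key)
        if item is None:
            return None
        value, expires_at = item
        if self._clock() >= expires_at:
            del self._items[key]  # expired: force a fresh lookup
            return None
        return value


# FX rates with the 5-minute TTL from the text.
fx_rates = TTLCache(ttl_seconds=300)
```

The sketch ignores the failure case where the database write succeeds but the cache write does not. For money movement, the ledger should remain the source of truth and the cached balance should be treated as a read optimization.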
Rate Limiting: Per-user rate limits prevent abuse (like trying thousands of small transactions to find stolen card numbers that work). Per-merchant rate limits prevent runaway billing bugs. API rate limits for partner integrations ensure fair resource sharing.
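A token bucket is one common way to implement such limits; the text does not say which algorithm is used, so the following is an assumed choice. TokenBucket and PerKeyLimiter are hypothetical names, and the capacity and refill values are placeholders.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start with a full bucket
        self._last = clock()

    def allow(self, cost=1):
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Add tokens earned since the last call, capped at capacity.
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False


class PerKeyLimiter:
    """One bucket per key (user ID, merchant ID, or partner API key)."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self._args = (capacity, refill_per_second, clock)
        self._buckets = {}

    def allow(self, key):
        bucket = self._buckets.get(key)
        if bucket is None:
            bucket = self._buckets[key] = TokenBucket(*self._args)
        return bucket.allow()


# Placeholder limits: bursts of 5 payments, refilling 1 per second.
user_limiter = PerKeyLimiter(capacity=5, refill_per_second=1)
```

With several service instances, the bucket state would need to live in a shared store such as Redis rather than in process memory.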
Monitoring and Alerts:
Key Metrics: Transaction success rate must exceed 99.9% - anything lower indicates a serious issue. P95 payment latency must stay below 500ms to maintain good user experience. Fraud detection accuracy balances precision (not blocking legitimate transactions) and recall (catching actual fraud). Chargeback rate must stay below 0.5% or card networks may impose fines. Settlement success rate must exceed 99.99% since failed settlements require expensive manual intervention.
Alerting Rules: Alerts trigger when transaction success rate drops below 99% (indicating potential outage), fraud service latency exceeds 100ms (may indicate model issues or database problems), database replication lag exceeds 10 seconds (risks serving stale data), reconciliation detects mismatches (potential data corruption), or chargeback rate spikes above 2x baseline (possible fraud ring or data breach).
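The alerting thresholds above can be written as a small rule evaluator. The metric names and the evaluate_alerts function are assumptions for illustration; a real deployment would express these rules in its monitoring system's own rule language.

```python
def evaluate_alerts(metrics):
    """Return the names of alerts that fire for one metrics snapshot.

    Expected keys (all hypothetical names):
      transaction_success_rate   fraction in [0, 1]
      fraud_latency_ms           fraud service latency in milliseconds
      replication_lag_seconds    database replication lag
      reconciliation_mismatches  count of mismatched records
      chargeback_rate            current chargeback rate
      chargeback_baseline        normal chargeback rate
    """
    alerts = []
    if metrics["transaction_success_rate"] < 0.99:
        alerts.append("transaction_success_rate_low")
    if metrics["fraud_latency_ms"] > 100:
        alerts.append("fraud_latency_high")
    if metrics["replication_lag_seconds"] > 10:
        alerts.append("replication_lag_high")
    if metrics["reconciliation_mismatches"] > 0:
        alerts.append("reconciliation_mismatch")
    if metrics["chargeback_rate"] > 2 * metrics["chargeback_baseline"]:
        alerts.append("chargeback_spike")
    return alerts


# Hypothetical snapshot: healthy except for slow fraud checks.
snapshot = {
    "transaction_success_rate": 0.9995,
    "fraud_latency_ms": 140,
    "replication_lag_seconds": 2,
    "reconciliation_mismatches": 0,
    "chargeback_rate": 0.002,
    "chargeback_baseline": 0.002,
}
fired = evaluate_alerts(snapshot)
```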
Trade-offs:
Strong Consistency vs. Availability: For financial transactions, the system prioritizes strong consistency over availability. During a database failover, payment processing may be unavailable for 30-60 seconds. This is acceptable because processing a transaction incorrectly (duplicate charge, lost money) is far worse than a brief outage. Multi-region deployment and fast failover mechanisms minimize downtime.
Real-Time Fraud Detection vs. Latency: Running fraud detection on every transaction adds up to 100ms of latency. This is acceptable because the cost of fraud (chargebacks, reputation damage, potential loss of card network relationships) far exceeds the cost of slightly slower checkout. The system optimizes ML inference and caches features to minimize this overhead.
T+1 Settlement vs. Instant Payout: Next-day settlement is operationally efficient and reduces ACH costs through batching. However, some merchants need cash flow immediately. PayPal offers instant payout as a premium feature with a 1% fee, allowing merchants to get funds in minutes via debit card transfers. This lets different merchant segments choose their preferred trade-off between cost and speed.
Future Enhancements:
Blockchain Settlement: Using stablecoins (USDC, USDT) for cross-border settlement could reduce costs and enable instant transfers. Smart contracts could automate escrow and dispute resolution, reducing manual intervention. However, regulatory uncertainty around cryptocurrencies requires careful legal consideration.
Buy Now, Pay Later (BNPL): Integrating with PayPal Credit to offer installment payments at checkout can increase conversion rates and average order values for merchants. Underwriting decisions would be based on transaction history and credit scores, with machine learning models predicting default risk.
Cryptocurrency Support: Allowing users to buy, sell, and hold cryptocurrencies (Bitcoin, Ethereum) in their PayPal wallet, with automatic conversion to fiat for merchant payments. This requires additional licensing, custody solutions, and risk management for crypto price volatility.
Open Banking Integration: Using services like Plaid or Tink to verify bank accounts instantly instead of micro-deposits. Enabling instant account-to-account transfers through Open Banking APIs could be faster and cheaper than ACH, especially for same-bank transfers.
AI-Powered Customer Support: Chatbots powered by large language models could handle common queries about transaction status, how to send money, or refund requests. For disputes, machine learning could automatically analyze evidence and resolve 70%+ of cases without human review, dramatically reducing support costs.
Enhanced Machine Learning: Current fraud models are transaction-level. Graph neural networks could analyze the network structure of money flow, identifying fraud rings where multiple accounts work together. This “follow the money” approach can catch sophisticated fraud that individual transaction analysis misses.
This design is intended to handle billions of dollars in daily transactions with strong consistency guarantees, sub-second latency, and real-time fraud detection. The architecture scales horizontally across multiple geographic regions, targets 99.99% uptime through redundancy and failover mechanisms, and is built to comply with strict financial regulations including PCI-DSS, KYC, and AML requirements. The goal is a seamless user experience for 430 million global users while ensuring every cent is accurately tracked and every transaction is secure.