Iris Coleman
Mar 23, 2026 14:21
Paxos reveals how it partitioned a 21TB crypto ledger table with zero downtime, achieving a 371ms cutover while crypto markets ran 24/7.
Paxos, the stablecoin infrastructure firm behind PYUSD and USDP, has published technical details on how its engineering team partitioned a 21-terabyte ledger database table without taking systems offline, a feat that required just 371 milliseconds for the final cutover while crypto markets continued trading around the clock.
The company's ledger table had grown to roughly two-thirds of Aurora's per-table size limit, giving engineers about a year before writes would start failing. For a firm processing stablecoin transactions that can't tolerate data loss or delays, traditional migration approaches involving extended maintenance windows weren't viable.
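The headroom arithmetic behind that deadline can be sketched in a few lines. This is a rough illustration only: it assumes the limit in question is 32 TB (the PostgreSQL per-table maximum at the default 8 kB block size), and the monthly growth rate is a made-up figure, not one reported by Paxos.

```python
# Assumption: per-table limit of 32 TB (PostgreSQL default 8 kB block size).
TABLE_LIMIT_TB = 32.0


def fraction_used(size_tb, limit_tb=TABLE_LIMIT_TB):
    """Share of the per-table limit already consumed."""
    return size_tb / limit_tb


def months_of_headroom(size_tb, growth_tb_per_month, limit_tb=TABLE_LIMIT_TB):
    """Months until the table hits the limit, assuming constant growth."""
    if growth_tb_per_month <= 0:
        raise ValueError("growth must be positive")
    return (limit_tb - size_tb) / growth_tb_per_month


# 21 TB is the table size from the article; 0.9 TB/month is hypothetical.
print(f"{fraction_used(21.0):.0%} of the limit used")
print(f"{months_of_headroom(21.0, 0.9):.1f} months of headroom")
```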
The Technical Approach
Rather than copying billions of rows into a new structure, which is the standard partitioning playbook, Paxos built the partitioned architecture around the existing table. The original table became a “history” partition while new time-range partitions caught incoming data. To external systems, nothing changed; Postgres handled routing internally.
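A minimal sketch of what such a layout could look like, written as a small Python helper that emits Postgres DDL. The table names (`ledger`, `ledger_partitioned`), the partition column `created_at`, and the monthly granularity are all hypothetical; the article does not give Paxos's actual schema or partition interval.

```python
from datetime import date


def month_bounds(first, count):
    """Return count + 1 consecutive first-of-month dates starting at `first`."""
    bounds, year, month = [], first.year, first.month
    for _ in range(count + 1):
        bounds.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return bounds


def partition_ddl(parent, template, key, first, count):
    """Emit DDL for a range-partitioned parent plus `count` monthly partitions.

    Identifiers are interpolated as-is, so they must come from trusted input.
    """
    stmts = [
        f"CREATE TABLE {parent} (LIKE {template} INCLUDING DEFAULTS) "
        f"PARTITION BY RANGE ({key});"
    ]
    bounds = month_bounds(first, count)
    for lo, hi in zip(bounds, bounds[1:]):
        name = f"{parent}_{lo:%Y_%m}"
        # Range bounds are inclusive at FROM and exclusive at TO.
        stmts.append(
            f"CREATE TABLE {name} PARTITION OF {parent} "
            f"FOR VALUES FROM ('{lo}') TO ('{hi}');"
        )
    return stmts


for stmt in partition_ddl("ledger_partitioned", "ledger", "created_at",
                          date(2026, 4, 1), 3):
    print(stmt)
```

Once the existing table is attached as the partition covering everything before the first new bound, inserts into the parent are routed by Postgres according to the partition key, which is why callers would not need to change.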
The catch? Postgres must verify that every row satisfies the partition constraint before attaching a table, which means a full table scan. On a 21TB table, that's not quick.
Paxos split this into two phases. First, they added the constraint as NOT VALID, a fast operation that skips verification. Then they ran VALIDATE CONSTRAINT separately, allowing reads and writes to continue during the scan.
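The two-phase sequence could look roughly like the statements generated below. This is a sketch under assumptions, not Paxos's actual migration script: the table, column, and constraint names are hypothetical, and the final ATTACH PARTITION step relies on the documented Postgres behavior that the attach-time scan can be skipped when a valid CHECK constraint already implies the partition bound.

```python
def two_phase_attach(parent, history, key, cutoff):
    """Emit SQL that attaches `history` as the partition below `cutoff`.

    Identifiers and the cutoff literal are interpolated as-is (trusted input).
    """
    constraint = f"{history}_{key}_bound"  # hypothetical constraint name
    # A range partition also requires a non-null key, so the CHECK states
    # both conditions in order to imply the partition bound.
    check = f"{key} IS NOT NULL AND {key} < '{cutoff}'"
    return [
        # Phase 1: register the constraint without scanning existing rows.
        f"ALTER TABLE {history} ADD CONSTRAINT {constraint} "
        f"CHECK ({check}) NOT VALID;",
        # Phase 2: the long scan, which does not block reads or writes.
        f"ALTER TABLE {history} VALIDATE CONSTRAINT {constraint};",
        # Cutover: with the constraint validated, the attach can skip its scan.
        f"ALTER TABLE {parent} ATTACH PARTITION {history} "
        f"FOR VALUES FROM (MINVALUE) TO ('{cutoff}');",
    ]


for stmt in two_phase_attach("ledger_partitioned", "ledger",
                             "created_at", "2026-04-01"):
    print(stmt)
```

Splitting the work this way moves the expensive verification out of the step that needs the strongest lock, which is consistent with the sub-second cutover the article reports.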
9 Hours of Pain
The validation scan took just over nine hours. During that time, the lock prevented autovacuum from cleaning up dead tuples, causing tail latency to climb steadily. The first attempt failed when write spikes exceeded timeout thresholds.
On the second attempt, Paxos coordinated with market makers to pause trading activity briefly and relaxed timeout thresholds. P50 latency stayed relatively flat, but P95/P99 degraded significantly as dead tuples accumulated.
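The gap between a flat median and a degrading tail is easy to see with a small percentile calculation. The latency samples below are synthetic and purely illustrative; they are not Paxos's measurements.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the value at rank ceil(p% of n) when sorted."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in milliseconds (illustrative only).
baseline = [5] * 100              # every request is fast
degraded = [5] * 90 + [200] * 10  # one request in ten hits bloated pages

for name, samples in (("baseline", baseline), ("degraded", degraded)):
    p50, p95, p99 = (percentile(samples, p) for p in (50, 95, 99))
    print(f"{name}: P50={p50}ms P95={p95}ms P99={p99}ms")
```

Because the median only looks at the middle of the distribution, a slowdown that affects a minority of requests leaves P50 untouched while P95 and P99 move sharply.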
“In hindsight, this was the most operationally demanding part of the migration: not the cutover, but the scan that made the cutover possible,” the engineering team wrote.
Why This Matters for Crypto Infrastructure
The migration highlights a growing challenge for crypto infrastructure providers. Unlike traditional finance with scheduled maintenance windows, crypto markets run continuously. Database tables backing ledger systems can't simply go offline for rebuilding.
Paxos also tackled a subtle partitioning gotcha: uniqueness constraints across partitions. In Postgres, a unique constraint on a partitioned table must include the partition key, so a constraint on the idempotency key alone cannot be enforced across all partitions. Without a global check, two inserts with identical idempotency keys could land in different partitions and both succeed, a disaster for a financial ledger, which could then apply the same transaction twice.
The solution involved idempotency checks outside the balance-update lock, adding less than 5 milliseconds of latency in internal testing.
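The failure mode, and one possible guard against it, can be reproduced with Python's built-in `sqlite3`. SQLite has no partitioning, so two ordinary tables stand in for two partitions, each with only a local unique index. The `idempotency_keys` registry and all table names are hypothetical: the article does not describe how Paxos implemented its check, and this sketch does not model the balance-update lock.

```python
import sqlite3

PARTITIONS = ("ledger_history", "ledger_2026_04")  # hypothetical names


def make_db():
    db = sqlite3.connect(":memory:")
    for part in PARTITIONS:
        # Uniqueness here is local to each table, like a per-partition index.
        db.execute(f"CREATE TABLE {part} (idem_key TEXT UNIQUE, amount INTEGER)")
    # Hypothetical global registry of keys that were already applied.
    db.execute("CREATE TABLE idempotency_keys (idem_key TEXT PRIMARY KEY)")
    return db


def naive_insert(db, partition, idem_key, amount):
    """Insert relying only on the partition-local unique index."""
    with db:
        db.execute(f"INSERT INTO {partition} VALUES (?, ?)", (idem_key, amount))


def guarded_insert(db, partition, idem_key, amount):
    """Claim the key globally first; return False if it was already applied."""
    try:
        with db:  # claim and ledger row commit or roll back together
            db.execute("INSERT INTO idempotency_keys VALUES (?)", (idem_key,))
            db.execute(f"INSERT INTO {partition} VALUES (?, ?)",
                       (idem_key, amount))
    except sqlite3.IntegrityError:
        return False
    return True


def row_count(db):
    return sum(db.execute(f"SELECT COUNT(*) FROM {p}").fetchone()[0]
               for p in PARTITIONS)
```

With `naive_insert`, the same key written to both stand-in partitions is accepted twice. With `guarded_insert`, the second attempt is rejected by the registry's primary key before it reaches the ledger tables.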
Testing at Production Scale
Staging environments couldn't replicate the problem. Paxos used Aurora production cloning to create a full-sized test database, then built a “reverse history” SQL generator that replayed real transaction patterns backward to avoid triggering overdraft failures.
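One plausible reading of why replaying backward avoids overdrafts: a clone starts from the final balances, and undoing the most recent transfer always lands on a state the ledger actually passed through, whereas replaying the same transfers forward from final balances can try to debit accounts that have since been drained. The sketch below illustrates that idea with in-memory balances. It is an interpretation, not Paxos's generator, which emits SQL; the account names and amounts are invented.

```python
def apply_transfer(balances, src, dst, amount):
    """Move `amount` from src to dst, refusing to overdraw src."""
    if balances[src] < amount:
        raise ValueError(f"overdraft on {src}")
    balances[src] -= amount
    balances[dst] += amount


def reverse_history(transfers):
    """Yield transfers newest-first with source and destination swapped."""
    for src, dst, amount in reversed(transfers):
        yield dst, src, amount


# Invented history: a pays b, then b pays c. Balances at clone time are final.
history = [("a", "b", 50), ("b", "c", 50)]
final_balances = {"a": 0, "b": 0, "c": 50}

replayed = dict(final_balances)
for transfer in reverse_history(history):
    apply_transfer(replayed, *transfer)
print(replayed)
```

Replaying `history` forward against `final_balances` fails on the first transfer because account `a` is already empty, while the reversed sequence walks the balances back to their starting point without ever going negative.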
The approach reflects a broader industry movement toward zero-downtime database migrations. Similar techniques have been documented for tables ranging from 1.5TB to multi-terabyte scale, often using views to abstract transitions from applications.
For Paxos, the partitioned structure now enables one-liner archiving via DETACH PARTITION and removes the looming size ceiling as a constraint. The company says it's starting a series of posts on engineering challenges behind its infrastructure, a signal that stablecoin operators are increasingly competing on technical credibility alongside regulatory compliance.
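For reference, the archiving one-liner has roughly the following shape. The table and partition names are hypothetical, and the helper simply formats the statement. `CONCURRENTLY` is a real option in PostgreSQL 14 and later, but it cannot run inside a transaction block and is not allowed when the parent has a default partition, so whether it applies to Paxos's setup is an open assumption.

```python
def detach_partition_sql(parent, partition, concurrently=False):
    """Format an ALTER TABLE ... DETACH PARTITION statement.

    Identifiers are interpolated as-is, so they must come from trusted input.
    """
    suffix = " CONCURRENTLY" if concurrently else ""
    return f"ALTER TABLE {parent} DETACH PARTITION {partition}{suffix};"


print(detach_partition_sql("ledger", "ledger_2026_04"))
print(detach_partition_sql("ledger", "ledger_2026_04", concurrently=True))
```

After detaching, the partition remains an ordinary standalone table that can be dumped, moved to cheaper storage, or dropped without touching the live ledger.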
Image source: Shutterstock

