Solana's Read Layer, Rebuilt: Inside Quicknode's In-Memory Cache

Quicknode rebuilt Solana's read layer in memory. getProgramAccounts, getLargestAccounts, and more run 10x to 1000x faster than stock Agave.

Solana developers hit roadblocks when using methods that return large amounts of data or require scanning the full account set. getProgramAccounts gets throttled or disabled. getLargestAccounts never returns. The root cause is structural. Agave RPC nodes are write-optimized systems, and throttling expensive read methods is a necessity, not a policy choice. Applications end up compensating with retry logic, caching layers, and polling intervals that exist purely to work around their blockchain provider’s infrastructure limitations.
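
As a rough sketch of that workaround pattern (not any particular team's code; the endpoint URL below is a placeholder), this is the kind of retry-and-backoff wrapper applications end up shipping around getProgramAccounts:

```typescript
import { Connection, PublicKey } from "@solana/web3.js";

// Placeholder endpoint -- substitute your own provider URL.
const connection = new Connection("https://example-rpc-endpoint.invalid");
const programId = new PublicKey("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA"); // SPL Token

// Retry with exponential backoff when the provider throttles or times out the call.
async function getProgramAccountsWithRetry(maxAttempts = 5) {
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await connection.getProgramAccounts(programId);
    } catch (err) {
      await new Promise((resolve) => setTimeout(resolve, 500 * 2 ** attempt));
    }
  }
  throw new Error("getProgramAccounts failed after retries");
}
```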

While the broader ecosystem is only now beginning to tackle this problem, Quicknode has already built the read layer that Solana's ecosystem actually needs: two years of iterative work, a purpose-built in-memory architecture, and results that are 10x to over 1000x faster than Agave's defaults.

If you're a Quicknode customer, you don’t need to opt in to these benefits. Your endpoint delivers them already.

The Structural Problem with Solana's Read Layer

Agave RPC nodes are built to process transactions, reach consensus, and advance the ledger. The account state they maintain is a byproduct of that process, not a read-optimized store. There is no query planner, no native secondary indexes, and no separation between read and write I/O.

Hardware scaling does not resolve this. Read queries and consensus work compete for the same CPU on the same machine, so more hardware means paying validator-class costs (768 GB RAM, fast NVMe, high-bandwidth networking) without eliminating the contention.

Adding more RPC nodes compounds the problem: additional nodes add load to Solana's gossip and Turbine networks, slowing state propagation for the validators that actually need it. Scaling the read layer through the validator architecture actively degrades the write layer.

Data pagination introduces a consistency risk. When new blocks land between pages, the account state underlying the response changes mid-read. Applications iterating through large result sets receive data that shifted before they finished reading it.

How Quicknode Built a Better Read Layer

Quicknode recognized this problem early, separating read workloads from the validator and serving them from purpose-built infrastructure. The architecture serving traffic today is the third iteration of those efforts. Each one moved closer to the data, stripping away abstraction layers that added latency without adding value, and the work is ongoing.

Iteration 1: Relational Databases

The first iteration followed the conventional approach: pipe account state and ledger data into a relational database, build indexes, and query against it. This immediately solved the resource contention problem. Validators could focus on consensus while a separate system handled reads. But relational databases carry overhead that matters at Solana's scale: query planning, row locking, write-ahead logging, autovacuum, and the inherent latency of disk-backed storage. For a blockchain producing blocks every 400 milliseconds, every microsecond of read latency compounds across millions of daily requests. As Solana traffic scaled, disk I/O became the next limit to clear.

Iteration 2: Specialized High-Performance Databases

The next iteration started with the schema itself. The team simplified and denormalized the data model to match Solana's actual access patterns, dropping the relational shape that no longer paid for itself. That reshape unlocked a move to a high-performance NoSQL database built for write-heavy ingestion and low-latency point reads, without the query planning, row locking, and write-ahead overhead that came with the relational approach.

Both throughput and latency improved noticeably, and the operational headaches that came with running a relational system at scale (autovacuum stalls, lock contention spikes, WAL bloat) disappeared.

Iteration 3: Fully In-Memory Architecture

The motivation for the next move was different. Disk-backed systems, no matter how well-tuned, share a hard ceiling: every read eventually touches a storage medium that is orders of magnitude slower than RAM. To push past it, the read path itself had to leave the disk behind.

The current architecture eliminates disk from the read path entirely. AccountsDB state lives in memory, indexed by hand-crafted data structures purpose-built for Solana's specific query patterns. There is no query planner, no B-tree traversal, no page cache miss: just direct memory access to pre-indexed data.

The shift to a fully in-memory architecture was a deliberate decision to optimize for a single objective: raw read performance.

The LedgerDB cache (which stores the historical record of every slot's blocks, transactions, and metadata) follows the same design philosophy with a different storage strategy. Solana's ledger is too large to fit entirely in memory, so it uses a tiered design: tip-of-chain data stays hot in memory, and historical data is served from a high-performance distributed database tuned for that workload.

How It Works: Quicknode's In-Memory Cache

Quicknode's Solana Cache is a single self-contained binary that serves every tip-of-chain method with no external dependencies. Only historical ledger queries reach into an external database.

The AccountsDB cache (which stores the current state of every account) is fully in-memory, and the design choices that make it fast are not incidental. Four mechanisms work together to eliminate every avoidable source of latency on the read and write paths.

Ahead of the Chain

Data freshness is a performance property, and it begins before any read request arrives. The cache ingests directly from a shred-stream data source, receiving account updates 200-400 ms earlier than they would surface on a typical Agave RPC node that waits for fully formed blocks. Those updates land through a low-latency write path (in-memory writes with a near lock-free design) so the head start is preserved end-to-end rather than absorbed by ingestion overhead.

The combined effect is measurable at the tip: under production traffic, the cache's latest observable slot runs consistently 1-2 slots ahead of a comparably configured Agave RPC node. For workloads where the question is 'what is the state right now?' (trading systems, MEV, liquidation engines, real-time dashboards), this is the difference between acting on live state and acting on data that is already stale.
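
This is easy to sanity-check yourself. A minimal sketch, assuming you have a Quicknode endpoint and a stock Agave RPC URL to compare (both URLs below are placeholders): query the latest processed slot from each and look at the gap.

```typescript
import { Connection } from "@solana/web3.js";

// Placeholder URLs -- substitute your Quicknode endpoint and a stock Agave RPC node.
const cache = new Connection("https://example-quicknode-endpoint.invalid");
const agave = new Connection("https://example-agave-node.invalid");

async function compareTipSlots() {
  const [cacheSlot, agaveSlot] = await Promise.all([
    cache.getSlot("processed"),
    agave.getSlot("processed"),
  ]);
  console.log(`cache: ${cacheSlot}, agave: ${agaveSlot}, lead: ${cacheSlot - agaveSlot} slots`);
}

compareTipSlots();
```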

Custom Data Structures

The indexes that power account lookups, token queries, and program account filtering are built from low-level data structures chosen by benchmark: not the most general choice, but the one that measured fastest for the access patterns the system actually sees. Shared state uses minimal locking wherever possible, keeping concurrent readers out of each other's way.

Tuned for the Hardware

The system goes deep on hardware utilization by pinning ingestion, indexing, and query serving to dedicated CPU cores to minimize context-switch overhead, and selecting memory allocators per-component based on their specific allocation profiles to reduce fragmentation under load.

Pre-Computation

Where possible, the system pre-computes and maintains derived views as account state is ingested: supply aggregates, sorted account lists, filtered program caches. This shifts CPU cost from the latency-sensitive read path to the throughput-oriented write path. Queries never trigger computation. They read results that are already there. The 17ms getSupply response in the benchmarks is this mechanism in practice: the aggregate already exists when the query arrives.
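
A conceptual sketch of that write-path/read-path trade (illustrative only, not Quicknode's implementation; the types and names are hypothetical): the aggregate is updated as each account change is ingested, so a read is a single field access instead of a scan.

```typescript
// Hypothetical types and names, for illustration only.
type AccountUpdate = { pubkey: string; lamports: number };

class SupplyAggregate {
  private total = 0n;
  private balances = new Map<string, bigint>();

  // Write path: apply each ingested update and keep the running total current.
  apply(update: AccountUpdate): void {
    const previous = this.balances.get(update.pubkey) ?? 0n;
    const next = BigInt(update.lamports);
    this.total += next - previous;
    this.balances.set(update.pubkey, next);
  }

  // Read path: no scan, no computation -- the answer already exists.
  supply(): bigint {
    return this.total;
  }
}
```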

getProgramAccounts: The Hardest Problem

getProgramAccounts performs a simple-sounding task: return all accounts owned by a given program that match a set of filters. Indexing it well is the hard part. The program could be any of thousands of deployed programs, each with its own account schema.

The filters are typically memcmp operations at specific byte offsets, offsets that only make sense if the program's internal data layout is known. A memcmp at offset 32 means one thing in SPL Token and something entirely different in a DEX program.
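
For reference, this is roughly what such a filter looks like from the client side (the endpoint URL and owner address below are placeholders): in SPL Token's account layout, offset 32 holds the owner field, so this memcmp returns every token account owned by a given wallet.

```typescript
import { Connection, PublicKey } from "@solana/web3.js";

const connection = new Connection("https://example-rpc-endpoint.invalid"); // placeholder
const SPL_TOKEN = new PublicKey("TokenkegQfeZyiNwAJbNbGKPFXCWuBvf9Ss623VQ5DA");
const owner = new PublicKey("11111111111111111111111111111111"); // placeholder wallet

async function tokenAccountsByOwner() {
  return connection.getProgramAccounts(SPL_TOKEN, {
    filters: [
      { dataSize: 165 },                                    // SPL Token account size in bytes
      { memcmp: { offset: 32, bytes: owner.toBase58() } },  // owner field at offset 32
    ],
  });
}
```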

There is no universal indexing strategy that works for all of them. Indexing getProgramAccounts efficiently requires understanding what each program's accounts look like. Quicknode's indexer addresses this with two strategies working together:

Natively Indexed Programs

For a small set of high-traffic, well-understood programs, Quicknode builds and maintains dedicated native indexes continuously as accounts update. These are the programs that dominate Solana traffic: SPL Token (legacy), SPL Token-2022, and the Stake program.

The account layouts are known, the filter patterns clients use are known, and Quicknode designed the indexes to exploit those specifics directly. These indexes resolve queries in sub-millisecond time regardless of how many millions of accounts the program owns, and they cover the vast majority of getProgramAccounts traffic because the most popular queries are concentrated in a small number of programs.

Dynamically Indexed Programs

For everything else (every custom DeFi protocol, every program deployed last week, every niche use case), the indexer uses a dynamic strategy. The first time a caller requests a filter combination for a given program, the cache automatically creates a new index on the fly. That first request runs unoptimized: the indexer has to scan the program's accounts and build the index before it can respond, so the caller absorbs the one-time cost of warming the cache. And even that cold path beats Agave.

Every subsequent request for the same filter is served from a live, incrementally-maintained cache that stays current as accounts change onchain. Updates flow through the same streaming pipeline that powers the natively indexed programs, so the cached view never drifts from reality.

From that point on, latency is effectively constant regardless of payload size or program popularity. A filter returning ten accounts and a filter returning a million accounts respond in the same order of magnitude, because the work is already done. Responses are streamed directly from pre-assembled memory, not recomputed per request.

New programs are supported the same day they are deployed. There is no schema integration, no manual configuration, and no waiting for an engineering team to add support. The first client to query a new program kicks off the indexing, and every client after that benefits from it.
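
From the client's perspective, the cold-then-warm behavior looks something like the sketch below (endpoint URL, program ID, and filter are placeholders): the first call absorbs the indexing cost, the second is served from the live cache.

```typescript
import { Connection, PublicKey } from "@solana/web3.js";

const connection = new Connection("https://example-quicknode-endpoint.invalid"); // placeholder
const customProgram = new PublicKey("4Nd1mBQtrMJVYVfKf2PJy9NZUZdTAsp7D4xWLs4gDB4T"); // placeholder

async function timeQuery(label: string) {
  const start = Date.now();
  await connection.getProgramAccounts(customProgram, {
    filters: [{ memcmp: { offset: 0, bytes: "3Mc6vR" } }], // placeholder filter
  });
  console.log(`${label}: ${Date.now() - start} ms`);
}

async function main() {
  await timeQuery("cold (index built on first request)");
  await timeQuery("warm (served from the live in-memory cache)");
}

main();
```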

Native indexes deliver maximum performance for the bulk of real-world traffic. The dynamic indexer ensures no program or filter pattern is unsupported.

p50

| Program | Accounts | Quicknode Cache (warm) TTFB | Agave (direct) TTFB | Quicknode vs Agave |
| --- | --- | --- | --- | --- |
| Jupiter v6 | 60 | 0.2 ms | 2.0 ms | 10x faster |
| Marinade Finance | 1K | 2.5 ms | 12 ms | 4.8x faster |
| Tensor | 30K | 0.7 ms | 213 ms | ~304x faster |
| Orca Whirlpool | 91K | ~2.0 ms | 3,515 ms | ~1,760x faster |
| Phoenix DEX | 109K | ~2.6 ms | 783 ms | ~301x faster |
| Drift Protocol | 273K | ~6.4 ms | 3,469 ms | ~542x faster |
| SPL Name Service | 341K | ~8.7 ms | 6,764 ms | ~777x faster |
| Pump.fun | 507K | ~15 ms | 27,958 ms | ~1,864x faster |
| Raydium V4 | 705K | ~18.5 ms | 10,224 ms | ~553x faster |

p99

| Program | Accounts | Quicknode Cache (warm) TTFB | Agave (direct) TTFB | Quicknode vs Agave |
| --- | --- | --- | --- | --- |
| Jupiter v6 | 60 | 0.3 ms | 6 ms | 20x faster |
| Marinade Finance | 1K | 4 ms | 15 ms | 3.8x faster |
| Tensor | 30K | 1.2 ms | 234 ms | ~195x faster |
| Orca Whirlpool | 91K | ~2.3 ms | 3,668 ms | ~1,595x faster |
| Phoenix DEX | 109K | ~2.8 ms | 845 ms | ~302x faster |
| Drift Protocol | 273K | ~7.0 ms | 3,720 ms | ~531x faster |
| SPL Name Service | 341K | ~8.9 ms | 7,571 ms | ~851x faster |
| Pump.fun | 507K | ~16 ms | 29,718 ms | ~1,857x faster |
| Raydium V4 | 705K | ~20 ms | 11,120 ms | ~556x faster |

getProgramAccounts is the hardest problem on the AccountsDB surface, which is why it gets its own optimization strategy. Everything else (point lookups, aggregates, token queries, block and slot metadata) is served from the same in-memory state with the same design principles: pre-computed where possible, lock-free on the read path, and resolved in a single lookup rather than a scan.

What This Delivers in Production

The result is that every AccountsDB method, not just the headline ones, runs at in-memory speed under concurrent load. No supported method degrades as traffic scales or as the account set grows.

The numbers below come from benchmarking against a stock Agave validator on the same network. The performance difference is architectural, not a product of better hardware.

For simple point lookups (getAccountInfo, getBalance, getTokenAccountBalance), Quicknode's indexer matches Agave at 2-4 ms. The hot path for these methods is a single indexed read, so response time is dominated by network round-trip, not by work that scales with the account set. Performance diverges on methods that require aggregates, sorted views, or secondary indexes the validator does not maintain.

| Method | Quicknode Cache p50 | Agave p50 | Delta |
| --- | --- | --- | --- |
| getTokenLargestAccounts (USDC) | 4 ms | 97,535 ms | 24,384x faster |
| getTokenLargestAccounts (BONK) | 4 ms | 8,463 ms | 2,116x faster |
| getLargestAccounts | 7 ms | >300,000 ms (timeout) | >42,857x faster |
| getSupply | 17 ms | 6,231 ms | 367x faster |
| getTokenAccountsByDelegate | 4 ms | unsupported (error) | Agave cannot serve this |
| getProgramAccounts: Token-2022 by owner | 8 ms | >15,000 ms | ~1,875x faster |
| getProgramAccounts: Orca Whirlpool (91K accts, warm) | 167 ms | 3,515 ms | 21x faster |
| getProgramAccounts: Pump.fun (507K accts, warm) | 795 ms | 27,958 ms | 35x faster |

Three results stand apart:

getTokenLargestAccounts for USDC: Agave takes nearly 100 seconds. Every request forces the validator to walk the full SPL Token program, filter by mint, sort, and return the top 20 with no cached result and no sorted index. Quicknode's indexer maintains a per-mint sorted set that updates incrementally as balances change.

getLargestAccounts does not complete on Agave within five minutes. The validator recomputes a sorted aggregate over the entire account set from scratch on every request. Quicknode's indexer maintains it as a running index.

getTokenAccountsByDelegate is unsupported on Agave. Quicknode's indexer serves it in single-digit milliseconds.
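
A minimal sketch of exercising those three methods against a single endpoint (the URL and delegate address below are placeholders; the mint is the well-known mainnet USDC address). getTokenAccountsByDelegate is issued here as a raw JSON-RPC call:

```typescript
import { Connection, PublicKey } from "@solana/web3.js";

const url = "https://example-quicknode-endpoint.invalid"; // placeholder
const connection = new Connection(url);
const usdcMint = new PublicKey("EPjFWdd5AufqSSqeM2qN1xzybapC8G4wEGGkZwyTDt1v"); // USDC mint
const delegate = new PublicKey("11111111111111111111111111111111"); // placeholder delegate

async function run() {
  // Top 20 USDC token accounts by balance.
  const largestUsdc = await connection.getTokenLargestAccounts(usdcMint);

  // Largest accounts across the whole account set.
  const largest = await connection.getLargestAccounts();

  // Token accounts by delegate, via raw JSON-RPC.
  const byDelegate = await fetch(url, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      jsonrpc: "2.0",
      id: 1,
      method: "getTokenAccountsByDelegate",
      params: [delegate.toBase58(), { mint: usdcMint.toBase58() }, { encoding: "jsonParsed" }],
    }),
  }).then((res) => res.json());

  console.log(largestUsdc.value.length, largest.value.length, byDelegate);
}

run();
```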

The latency numbers show individual request speed. The concurrency picture is different and equally important. When a single Agave getTokenLargestAccounts query takes 97 seconds, that request ties up a serving slot and stalls every query behind it: one slow request cascades into latency spikes for all concurrent callers.

Quicknode's indexer does not have that failure mode. Every query is served from a pre-built in-memory index, so no single request runs long enough to starve concurrent traffic. Under sustained load, the AccountsDB cache serves tens of thousands of requests per second while keeping p50 latency under 10 ms, with throughput bounded by hardware capacity rather than account set size or query complexity.

Not every method is sub-10 ms. Cold requests to programs outside the natively indexed set pay an initialization cost of 11-40 seconds on the first call while the subscription is established and the filtered cache populates. Every subsequent request with the same filter is served from the warm in-memory cache.

Stake program queries at scale illustrate the middle range. Marinade's 81K stake accounts resolve in about a second on Quicknode's indexer versus 6.7 seconds on Agave. The index lookup stays fast, but assembling and serializing a response at that volume involves real CPU work.

Responses at the scale of Raydium V4 (~705K accounts, ~869 MB of JSON) hit a different ceiling: time-to-first-byte is 1.5 seconds, and full delivery takes longer still, bound by the physics of moving 869 MB over a wire.

The performance gains are dramatic across the board: methods that require scans, aggregates, or sorted views run 10x to over 1000x faster than on a stock Agave validator. In single-method load tests, a single instance sustains 350,000 getTransaction requests per second at under 40 ms latency, and 120,000 getAccountInfo requests per second at under 20 ms p50.

Combined, the AccountsDB and LedgerDB caches cover ~95% of production RPC traffic and ~90% of the Solana JSON-RPC specification.

This Is Not the Finish Line

What exists today is the result of two years of iteration. It is not the end state. Quicknode's investment in Solana's read layer is continuous. The architecture will keep getting faster, method coverage will keep expanding, and the work of closing the gap between what developers need and what the infrastructure delivers does not stop.

The Bottom Line

Solana's read and write workloads are fundamentally different problems. They deserve fundamentally different architectures. Quicknode built the read layer that Solana's ecosystem actually needs, from scratch, over two years, without compromising on performance. Continuous improvements to this infrastructure are not something to opt into; they are what every Quicknode Solana endpoint already delivers.

Create a Solana endpoint on Quicknode and start building on a read layer that has been purpose-built and continuously optimized for exactly this workload.


About Quicknode

Founded in 2017, Quicknode provides world-class blockchain infrastructure to developers and enterprises. With 99.99% uptime, support for 80+ blockchains, and performance trusted by industry leaders, we empower builders to deploy and scale next-generation applications across Web3.

Start building today at Quicknode.com.