Architecting a Scalable Blockchain Explorer for Substrate Chains

The Starting Point: Why Not a Monolith?

The naive approach to a blockchain explorer is a single service: connect to the node, parse blocks, store them, serve them. That works for a proof of concept. It breaks in production.

A blockchain explorer has three workloads with completely different characteristics:

Historical queries

Read-heavy, latency-sensitive, scales with user traffic. A spike in explorer visitors should not slow down block ingestion.

Block ingestion

Write-heavy, sequential, time-critical. New blocks arrive every few seconds — missing one means a gap in your data.

Real-time broadcasting

Connection-heavy, stateful. Maintaining thousands of WebSocket connections is a different scaling problem from serving HTTP or writing to a database.

Putting all three in one process means they compete for resources, they fail together, and you can only scale all of them at once even if only one is under pressure.

The Three Services

System Overview

┌─────────────────┐                   Substrate Node (WS)
│  Block Worker   │◀───────────────── subscribeNewHeads()
│  (Write path)   │──▶ PostgreSQL (blocks, extrinsics, events)
└────────┬────────┘──▶ Internal Event Bus (new_block)
         │
┌────────▼────────┐
│   Broadcaster   │──▶ WebSocket clients (Frontend)
└─────────────────┘

┌─────────────────┐
│   RPC Server    │◀──▶ HTTP REST (Frontend queries)
│  (Read path)    │──▶ PostgreSQL (read only)
└─────────────────┘

The critical rule: no service crosses into another service's responsibility. The Block Worker never serves HTTP. The RPC Server never connects to the chain. The Broadcaster never writes to the database.

Service 1 — The RPC Server (Read Layer)

The RPC Server exposes a REST API for the frontend — block lists, block detail, extrinsics by account, event history — all served from PostgreSQL. No chain connection at any point.

Frontend traffic hits only the RPC Server. You can horizontally scale it behind a load balancer, add read replicas, or CDN-cache endpoints without touching the ingestion pipeline.

Scalability lever

Run 10 RPC Server instances behind a load balancer (an AWS ALB, for instance). The ingestion process is unaffected: it writes to the same database and never knows the RPC Server exists.

GET /blocks?page=1&limit=20
GET /blocks/:numberOrHash
GET /blocks/:hash/extrinsics
GET /accounts/:address/extrinsics
GET /events?section=balances&method=Transfer
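
Two of these routes imply a small parsing step before any SQL runs: `:numberOrHash` has to be classified, and `page`/`limit` have to become a bounded LIMIT/OFFSET. A minimal sketch follows. The helper names, the default of 20, and the cap of 100 are assumptions for illustration, and it assumes block hashes are 32-byte hex strings with a `0x` prefix.

```typescript
// Hypothetical read-path helpers; names, defaults, and limits are illustrative.
type BlockRef =
  | { kind: "number"; value: number }
  | { kind: "hash"; value: string };

// Distinguish /blocks/4985 from /blocks/0xabc...; returns null for anything else.
function parseBlockRef(param: string): BlockRef | null {
  if (/^0x[0-9a-fA-F]{64}$/.test(param)) {
    return { kind: "hash", value: param.toLowerCase() };
  }
  if (/^\d+$/.test(param)) {
    const value = Number(param);
    return Number.isSafeInteger(value) ? { kind: "number", value } : null;
  }
  return null;
}

// Parse an untrusted query value into an integer >= 1, or use the fallback.
function toPositiveInt(raw: string | undefined, fallback: number): number {
  const n = Math.floor(Number(raw));
  return Number.isSafeInteger(n) && n >= 1 ? n : fallback;
}

// Turn ?page=1&limit=20 into LIMIT/OFFSET values.
function toPagination(
  page: string | undefined,
  limit: string | undefined,
  maxLimit = 100, // assumed cap so one request cannot scan the whole table
): { limit: number; offset: number } {
  const p = toPositiveInt(page, 1);
  const l = Math.min(maxLimit, toPositiveInt(limit, 20));
  return { limit: l, offset: (p - 1) * l };
}
```

A handler would return a 400 when `parseBlockRef` yields null, and otherwise pick the lookup column from `kind`.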

Service 2 — The Block Worker (Write Layer)

The Worker is the only component that connects to the Substrate node. Its sole responsibility: ensure every block is ingested accurately and in order.

The ingestion loop

The Worker subscribes to new block headers via the Substrate WebSocket RPC, fetches the full block, decodes extrinsics and events using the chain's runtime metadata, writes to PostgreSQL, then publishes a new_block event to the internal event bus. One database write, one event published — that's the entire contract.
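
The contract described above can be sketched as a single function. Everything here is a hypothetical stand-in rather than a real library API: `ChainClient`, `BlockStore`, and `DecodedBlock` are invented for illustration, and a real Worker would back them with the node RPC client and PostgreSQL.

```typescript
// All interfaces are hypothetical stand-ins, not a real library API.
interface DecodedBlock {
  number: number;
  hash: string;
  extrinsics: unknown[];
  events: unknown[];
}

interface ChainClient {
  // Fetch the full block for a height and decode it with runtime metadata.
  fetchBlock(blockNumber: number): Promise<DecodedBlock>;
}

interface BlockStore {
  // Persist block, extrinsics, and events (assumed to be one transaction).
  saveBlock(block: DecodedBlock): Promise<void>;
}

interface EventBus {
  publish(event: string, payload: unknown): void;
}

// One iteration of the loop: fetch, persist, then announce.
// Publishing only after the write succeeds means subscribers never hear
// about a block the read API cannot serve yet.
async function ingestBlock(
  blockNumber: number,
  chain: ChainClient,
  store: BlockStore,
  bus: EventBus,
): Promise<void> {
  const block = await chain.fetchBlock(blockNumber);
  await store.saveBlock(block);
  bus.publish("new_block", { number: block.number, hash: block.hash });
}
```

If the write throws, the function rejects before `publish` is reached, so a failed block is never announced.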

The cold-start gap problem

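Every block worker must solve this: what happened while it was down? On startup, a backfill routine compares the latest DB block number against the current chain head and processes any gap sequentially before attaching the live subscription. Because the head keeps advancing during a long backfill, the routine should re-check the head before going live (or the live handler should fill any gap it sees); otherwise blocks produced mid-backfill are skipped.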

// On startup — always
1. Get latest block number in DB           → e.g. #4985
2. Get current chain head                  → e.g. #5002
3. Process blocks #4986 → #5002 in order   → backfill gap
4. Subscribe to new heads                  → live from #5003
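
The startup steps above can be sketched as follows. `BackfillDeps` and its three functions are hypothetical wrappers around the database and the node connection, and starting from block 0 on an empty database is an assumption. The loop re-reads the chain head after each pass, since new blocks keep arriving while a long gap is processed.

```typescript
// Hypothetical dependencies; a real Worker would wrap the DB and node RPC.
interface BackfillDeps {
  getLatestStoredNumber(): Promise<number | null>; // null when the DB is empty
  getChainHead(): Promise<number>;
  ingestBlock(blockNumber: number): Promise<void>;
}

// Ingest every block between the last stored one and the chain head, in order.
// Returns the number of the last block known to be stored.
async function backfill(deps: BackfillDeps, genesis = 0): Promise<number> {
  const latest = await deps.getLatestStoredNumber();
  let next = latest === null ? genesis : latest + 1;
  let head = await deps.getChainHead();
  while (next <= head) {
    for (; next <= head; next++) {
      await deps.ingestBlock(next); // strictly sequential, one block at a time
    }
    // The head may have advanced during this pass; check again before exiting.
    head = await deps.getChainHead();
  }
  return next - 1;
}
```

One way to close the remaining window between the final head check and the first live notification is for the live handler to ingest from the returned number plus one up to each announced head, rather than assuming heads arrive consecutively.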

Design decision

The Worker is the one place in this system where you deliberately avoid horizontal scaling. Sequential consistency of the blockchain is a hard constraint — embrace it rather than fight it.

Service 3 — The Event Broadcaster (Real-Time Layer)

The Broadcaster takes new_block events from the internal bus and pushes them to every connected frontend client over WebSocket. No database connection, no state beyond active socket connections.
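
The fan-out itself is small. Here is a sketch under the assumption that each connection can be reduced to a `send(data)` method. `ClientSocket` and the `Broadcaster` class shape are hypothetical, and a real WebSocket server library would be adapted to them.

```typescript
// Minimal socket shape, assumed for illustration only.
interface ClientSocket {
  send(data: string): void;
}

class Broadcaster {
  // The only state held: the set of active connections.
  private readonly clients = new Set<ClientSocket>();

  addClient(socket: ClientSocket): void {
    this.clients.add(socket);
  }

  removeClient(socket: ClientSocket): void {
    this.clients.delete(socket);
  }

  get size(): number {
    return this.clients.size;
  }

  // Serialize once, then send to every client. A socket whose send() throws
  // is dropped so one broken connection cannot stall the others.
  // Returns the number of clients the message was handed to.
  broadcast(event: string, payload: unknown): number {
    const message = JSON.stringify({ event, payload });
    let delivered = 0;
    this.clients.forEach((socket) => {
      try {
        socket.send(message);
        delivered++;
      } catch {
        this.clients.delete(socket);
      }
    });
    return delivered;
  }
}
```

Wiring it up is one line on the bus side: subscribe to `new_block` and call `broadcast("new_block", payload)` in the handler.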

Scaling horizontally with Redis Pub/Sub

When a single Broadcaster isn't enough, run multiple instances. Move the event bus from in-process EventEmitter to Redis Pub/Sub — the Worker publishes to a Redis channel, every Broadcaster instance subscribes and fans out to its own clients.

Multi-node Broadcaster scaling

Block Worker ──PUBLISH──▶ Redis Channel: new_block
                            ├──SUBSCRIBE──▶ Broadcaster 1 ──▶ clients 1–5000
                            ├──SUBSCRIBE──▶ Broadcaster 2 ──▶ clients 5001–10000
                            └──SUBSCRIBE──▶ Broadcaster N ──▶ clients ...

Scalability lever

Moving from EventEmitter to Redis Pub/Sub is a one-file change — the interface is identical. The Worker still calls eventBus.publish(), each Broadcaster still calls eventBus.subscribe().
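
One detail the swap has to handle is that Redis channels carry strings, so payloads need a serialization step. The sketch below shows the Redis-backed side of that one-file change. It is written against a hypothetical minimal `PubSubClient` shape rather than any specific Redis library's API, so a real client would be wrapped to fit it.

```typescript
// Hypothetical minimal client shape; a real Redis library would be adapted to it.
interface PubSubClient {
  publish(channel: string, message: string): void;
  subscribe(channel: string, listener: (message: string) => void): void;
}

interface EventBus {
  publish(event: string, payload: unknown): void;
  subscribe(event: string, handler: (payload: unknown) => void): void;
}

class RedisEventBus implements EventBus {
  private readonly publisher: PubSubClient;
  private readonly subscriber: PubSubClient;

  // Two clients are taken because a Redis connection in subscriber mode
  // is generally limited to subscription commands.
  constructor(publisher: PubSubClient, subscriber: PubSubClient) {
    this.publisher = publisher;
    this.subscriber = subscriber;
  }

  publish(event: string, payload: unknown): void {
    // The event name doubles as the channel name, e.g. "new_block".
    this.publisher.publish(event, JSON.stringify(payload));
  }

  subscribe(event: string, handler: (payload: unknown) => void): void {
    this.subscriber.subscribe(event, (message) => {
      handler(JSON.parse(message));
    });
  }
}
```

Because the class satisfies the same `EventBus` shape, the Worker and Broadcaster call sites stay as they are; only the construction of the bus changes.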

The Internal Event Bus — The Glue

The event bus is the only coupling between the Worker and the Broadcaster. Keeping it behind an interface is what makes the entire system swappable.

interface EventBus {
  publish(event: string, payload: unknown): void;
  subscribe(event: string, handler: (payload: unknown) => void): void;
}
// Implementation 1: EventEmitter (single node; Worker and Broadcaster must share a process)
// Implementation 2: Redis Pub/Sub (multi-node)
// Both satisfy the same interface — zero changes to Worker or Broadcaster
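
For reference, the single-node implementation can be a thin wrapper over Node's built-in `EventEmitter`. This is a sketch, and the class name `InProcessEventBus` is an assumption.

```typescript
import { EventEmitter } from "node:events";

interface EventBus {
  publish(event: string, payload: unknown): void;
  subscribe(event: string, handler: (payload: unknown) => void): void;
}

// In-process bus: publisher and subscribers must live in the same process.
class InProcessEventBus implements EventBus {
  private readonly emitter = new EventEmitter();

  publish(event: string, payload: unknown): void {
    // emit() calls every registered listener synchronously, in order.
    this.emitter.emit(event, payload);
  }

  subscribe(event: string, handler: (payload: unknown) => void): void {
    this.emitter.on(event, handler);
  }
}
```

Unlike a Redis-backed bus, payloads are passed by reference with no serialization, so code that works here can still break after the swap if a payload is not JSON-safe.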

Why This Architecture Is Highly Scalable

Each service scales on its own axis

RPC Server scales horizontally for read traffic. Broadcaster scales horizontally via Redis for WebSocket connections. Block Worker scales vertically for decode throughput. None affect each other.

Failures are isolated

If the Broadcaster crashes, block ingestion continues and historical queries still work. If the Worker restarts, it backfills the gap on startup — nothing is lost.

Read and write paths never compete

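Explorer queries never run inside the ingestion process, so a traffic spike cannot starve the ingestion loop of CPU or connections. Both paths still share one PostgreSQL instance through separate connection pools; pointing the RPC Server at read replicas removes that last point of contention.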

Independent deployments

Updating the REST API, adding a WebSocket event, or optimising the block parser are all single-service deployments. No coordination needed.

What I Would Do Differently

Start with Redis from day one

Even on a single node, using Redis Pub/Sub from the start means you never have to migrate. The overhead is one extra local network hop per block, which is negligible next to a block time of several seconds.

Add a dead-letter queue for failed blocks

A block that consistently fails to parse will block the worker. A dead-letter queue that parks failed blocks and lets the worker continue makes the ingestion loop resilient to malformed chain data.
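
A rough sketch of that idea: retry a block a bounded number of times, then park it and move on. The function name, the retry count of 3, and the in-memory array standing in for the queue are all assumptions. A real system would persist parked blocks so they survive a restart.

```typescript
// Hypothetical record for a parked block.
interface DeadLetter {
  blockNumber: number;
  error: string;
  attempts: number;
}

// Try to ingest a block; after maxAttempts failures, park it instead of
// stalling the loop. Returns true if ingested, false if dead-lettered.
async function ingestWithDeadLetter(
  blockNumber: number,
  ingest: (blockNumber: number) => Promise<void>,
  deadLetters: DeadLetter[],
  maxAttempts = 3,
): Promise<boolean> {
  let lastError = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await ingest(blockNumber);
      return true;
    } catch (err) {
      lastError = err instanceof Error ? err.message : String(err);
    }
  }
  // Record why the block failed so it can be inspected and replayed later.
  deadLetters.push({ blockNumber, error: lastError, attempts: maxAttempts });
  return false;
}
```

The trade-off is that a parked block leaves a deliberate gap in the stored chain, so the read API and the backfill logic need to tolerate missing heights until the block is replayed.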

Closing Thoughts

The three-service split was the right call, and the main reason was operational: being able to restart, scale, or redeploy any one service without coordination overhead made the system dramatically easier to operate in production.

The pattern — a write worker, a read API, and a real-time broadcaster decoupled via an event bus — is not blockchain-specific. IoT sensor data, financial feeds, live sports data — the architecture maps directly.