Scaling Node.js: From 100 to 1 Million Requests
The Single-Threaded Nature of Node
Node.js is built on the V8 Engine and uses a non-blocking, event-driven I/O model. This makes it incredibly fast for I/O-bound tasks (like reading from a database), but susceptible to bottlenecks on CPU-bound tasks (like image processing).
But here is the catch: Node.js runs your JavaScript on a single thread. If you deploy a standard Node app on an 8-core server, roughly 87% of your CPU capacity sits idle.
Phase 1: Vertical Scaling (Clustering)
In Node.js, we can create child processes using the cluster module. This allows us to spawn a "Worker" for every CPU core available. They share the same server port but process requests independently.
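A minimal sketch of manual clustering with the built-in cluster module (port 3000 is an arbitrary choice here):

```javascript
import cluster from 'node:cluster';
import http from 'node:http';
import { availableParallelism } from 'node:os';

const PORT = 3000;

if (cluster.isPrimary) {
  // One worker per CPU core; the primary only manages workers.
  const cores = availableParallelism();
  for (let i = 0; i < cores; i++) cluster.fork();

  // Restart any worker that dies unexpectedly.
  cluster.on('exit', (worker, code) => {
    if (code !== 0 && !worker.exitedAfterDisconnect) cluster.fork();
  });
} else {
  // Every worker listens on the same port; the primary distributes
  // incoming connections among them (round-robin on most platforms).
  http.createServer((req, res) => {
    res.end(`handled by worker ${process.pid}`);
  }).listen(PORT);
}
```

Note that this whole file runs in every worker too; the `isPrimary` check is what splits the roles.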
Using PM2 for Zero-Downtime Reloads
Instead of writing manual cluster code, use PM2, the production process manager for Node.js.
```shell
# Start the application with maximum instances (one per core)
pm2 start app.js -i max

# Reload without downtime (requests are handed off to new workers)
pm2 reload app
```

Result: up to 8x throughput on an 8-core machine, assuming the workload parallelizes across workers.
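If you prefer declarative config over CLI flags, the same setup fits in a PM2 ecosystem file (the app name, script path, and memory limit below are placeholders):

```javascript
// ecosystem.config.js
module.exports = {
  apps: [{
    name: 'api',
    script: './app.js',
    instances: 'max',              // one worker per core
    exec_mode: 'cluster',          // share the port between workers
    max_memory_restart: '512M',    // recycle a worker that leaks memory
  }],
};
```

Start it with `pm2 start ecosystem.config.js`; reloads and restarts then pick up the same settings.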
Phase 2: Horizontal Scaling (Load Balancing)
Eventually, one machine is not enough. You need to scale horizontally—adding more servers.
The Load Balancer (Nginx / HAProxy)
You place a Load Balancer in front of your server farm. It distributes traffic using algorithms like Round Robin or Least Connections.
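A minimal Nginx configuration for this setup might look like the following sketch (the upstream IPs and port are placeholders):

```nginx
upstream node_backend {
    least_conn;                # or omit for the default round robin
    server 10.0.0.11:3000;
    server 10.0.0.12:3000;
    server 10.0.0.13:3000;
}

server {
    listen 80;

    location / {
        proxy_pass http://node_backend;
        proxy_http_version 1.1;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```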
The Problem: Sticky Sessions
If you store user sessions in RAM (e.g., express-session), you have a problem.
- User logs in on Server A. Session is saved in Server A's RAM.
- Next request hits Server B. Server B knows nothing about the session. User is logged out.
Solution: Use a centralized session store like Redis. Every server validates the session ID against Redis, so any server can handle any request.
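Wiring this up with express-session and connect-redis might look like the sketch below. It assumes those packages plus the `redis` client are installed and Redis is reachable locally; note that the `RedisStore` import shape differs between connect-redis major versions (v7's default export is shown):

```javascript
import express from 'express';
import session from 'express-session';
import RedisStore from 'connect-redis';
import { createClient } from 'redis';

// Defaults to redis://localhost:6379; point at your shared Redis in production.
const redisClient = createClient();
await redisClient.connect();

const app = express();
app.use(session({
  store: new RedisStore({ client: redisClient }),
  secret: 'replace-me',          // placeholder — use a real secret from config
  resave: false,
  saveUninitialized: false,
}));
```

With the store centralized, the load balancer no longer needs sticky sessions at all.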
Phase 3: Optimizing the Event Loop
Scaling infrastructure is useless if your code blocks the event loop.
The Don'ts
- Don't use sync functions: `fs.readFileSync` and `crypto.pbkdf2Sync` halt the entire server for every connected client while they run.
- Don't parse JSON blobs larger than ~10 MB in the request path: `JSON.parse` is synchronous and CPU-intensive. Use a streaming parser such as `JSONStream`.
The Dos
- Offloading: Move heavy computation to Worker Threads or a separate microservice (maybe written in Go or Rust).
- gzip/Brotli: Compress responses to reduce network latency.
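The offloading bullet above can be sketched in a single file with Node's built-in worker_threads (the 50-million-iteration loop is just a stand-in for real CPU work):

```javascript
import { Worker, isMainThread, parentPort, workerData } from 'node:worker_threads';

if (isMainThread) {
  // Hand the CPU-heavy loop to a worker thread; the main thread's
  // event loop stays free to serve requests.
  const worker = new Worker(new URL(import.meta.url), {
    workerData: { n: 50_000_000 },
  });
  worker.on('message', (sum) => console.log('sum:', sum));
} else {
  // Worker side: same file, different branch.
  let sum = 0;
  for (let i = 0; i < workerData.n; i++) sum += i;
  parentPort.postMessage(sum);
}
```

For sustained load, you would keep a small pool of long-lived workers rather than spawning one per task.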
Enabling compression in Express is a single middleware call:

```javascript
import compression from 'compression';

app.use(compression());
```

Phase 4: Database Scaling
Your app servers are stateless and scalable, but your Database is now the bottleneck.
- Read Replicas: Direct all read (`GET`) traffic to read replicas, and send only writes (`POST`/`PUT`) to the primary DB.
- Connection Pooling: Node.js can open connections faster than your database can absorb them. Use a connection pool (e.g., via the Postgres or MongoDB driver) so the DB isn't overwhelmed.
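With the node-postgres (`pg`) driver, a pooled setup might look like this sketch (the host, pool size, and `users` table are hypothetical):

```javascript
import pg from 'pg';

// A shared pool caps concurrent connections so a traffic spike
// can't exhaust the database's connection limit.
const pool = new pg.Pool({
  host: 'db.internal',          // hypothetical primary DB host
  max: 20,                      // upper bound on open connections
  idleTimeoutMillis: 30_000,    // recycle connections idle for 30s
});

export async function getUser(id) {
  // pool.query checks out a client, runs the query, and returns the client.
  const { rows } = await pool.query('SELECT * FROM users WHERE id = $1', [id]);
  return rows[0];
}
```

The pool is created once at startup and shared by every request handler; never create a new client per request.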
Summary
Scaling is a journey, not a switch.
- Start with PM2 (utilize all cores).
- Move to Redis for state.
- Deploy behind Nginx.
- Optimize query patterns and event loop blocking.
- Orchestrate with Kubernetes when you outgrow a handful of hand-managed servers.