The Journey of Making 404fuzz Blazingly Fast ⚡
When I started building 404fuzz, I had one goal: make it fast. Really fast. But I quickly learned that speed in Node.js isn't just about throwing more requests at a server. It's about understanding how Node.js actually works.
Let me take you on the journey from my first "obvious" solution to a fuzzer that achieves 121 RPS on modest hardware.
Chapter 1: The Promise.all() Trap 🪤
My First Thought
"Easy! I'll just load all my wordlist paths and fire them all at once with Promise.all()!"
// My first naive approach
const wordlist = ['admin', 'backup', 'config', ...]; // 10,000 paths
const promises = wordlist.map(path => fetch(`${target}/${path}`));
await Promise.all(promises); // Fire everything!
The Brutal Reality
This crashed everything. My laptop froze. The target server probably hated me. What went wrong?
Here's what I learned: Promise.all() is NOT parallel execution.
Understanding Node.js: Concurrent, Not Parallel 🔄
Let me explain with a diagram:
┌──────────────────────────────────────────────────┐
│              Node.js Single Thread               │
├──────────────────────────────────────────────────┤
│                                                  │
│  Your Code (Asynchronous)                        │
│        ↓                                         │
│  Event Demultiplexer (Receives all events)       │
│        ↓                                         │
│  Event Queue [Event1, Event2, Event3, ...]       │
│        ↓                                         │
│  Event Loop (while(queue.length > 0))            │
│    ├─ Takes Event1                               │
│    ├─ Executes Callback                          │
│    ├─ Returns immediately (non-blocking!)        │
│    └─ Takes Event2...                            │
│                                                  │
└──────────────────────────────────────────────────┘
Key Insight: Node.js is concurrent (non-blocking), not parallel (multiple things at once).
When you do Promise.all() with 10,000 requests:
- ❌ You don't get 10,000 parallel threads
- ✅ You DO get 10,000 open connections
- ✅ You DO consume massive memory
- ✅ You DO overwhelm both your system and the target
Result: System crash, memory exhaustion, or you become an accidental DDoS attacker.
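You can watch this happen at a tiny scale (a minimal demo, no network needed). Every task starts the moment it's created; Promise.all() only waits, it never throttles:

// All five tasks log "started" before any of them finishes.
// They begin eagerly at creation; Promise.all() just waits.
const tasks = Array.from({ length: 5 }, (_, i) =>
  new Promise(resolve => {
    console.log(`task ${i} started`);
    setTimeout(() => resolve(i), 100);
  })
);
await Promise.all(tasks); // done after ~100ms total, not 5 x 100ms

Swap 5 toy tasks for 10,000 real fetches and you can see why things fall over.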
Chapter 2: The Queue Model - Controlled Chaos 🎯
The Better Approach
I needed bounded concurrency - control how many requests run at once, queue the rest.
┌──────────────────────────────────────────┐
│         Wordlist (10,000 paths)          │
└─────────────┬────────────────────────────┘
              ↓
┌──────────────────────────────────────────┐
│              Request Queue               │
│   [req1, req2, req3, req4, req5, ...]    │
└─────────────┬────────────────────────────┘
              ↓
┌──────────────────────────────────────────┐
│       Concurrency Limit (e.g., 50)       │
│                                          │
│   [Active1] [Active2] ... [Active50]     │
│       ↓         ↓             ↓          │
│    Response  Response      Response      │
│       ↓         ↓             ↓          │
│         Next from queue (req51)          │
└──────────────────────────────────────────┘
The Implementation
class RequestQueue {
  constructor(concurrency = 50) {
    this.concurrency = concurrency;
    this.running = 0;
    this.queue = [];
  }

  async add(task) {
    // If we're at the limit, park ourselves in the queue until a
    // finishing task wakes us up. A while-loop (not an if) re-checks
    // the limit, in case another caller grabbed the freed slot first.
    while (this.running >= this.concurrency) {
      await new Promise(resolve => this.queue.push(resolve));
    }
    this.running++;
    try {
      return await task();
    } finally {
      this.running--;
      // Release the next queued task, if any
      if (this.queue.length > 0) {
        const resolve = this.queue.shift();
        resolve();
      }
    }
  }
}
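Driving the queue looks something like this (a sketch, reusing the wordlist and target from Chapter 1). Promise.all() is safe again here because the queue, not Promise.all(), decides how many requests actually run at once:

const queue = new RequestQueue(50);

// Still 10,000 paths, but never more than 50 in flight at a time.
const results = await Promise.all(
  wordlist.map(path => queue.add(() => fetch(`${target}/${path}`)))
);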
The Results
- ✅ Memory stays stable
- ✅ Target server doesn't die
- ✅ Predictable resource usage
- ✅ You can tune it with the -t flag (concurrency level)
But I wanted MORE speed. Time for the next level.
Chapter 3: Multi-Core Power - The Cluster Model 💪
The Problem
Node.js runs all your JavaScript on a single thread. My i5 has 8 logical cores, so one process uses only 12.5% of my CPU!
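You can check what you're working with; note that os.cpus() counts logical cores, so a hyper-threaded 4-core chip reports 8:

import os from 'node:os';

// Logical cores visible to Node.js
console.log(os.cpus().length); // 8 on my machine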
The Solution: Node.js Cluster Module
┌───────────────────────────────────────────────────┐
│             Primary Process (Master)              │
│                                                   │
│  - Loads wordlist                                 │
│  - Splits work among workers                      │
│  - Collects results                               │
└───────────┬─────────────────────────┬─────────────┘
            │                         │
     ┌──────┴─────┐            ┌──────┴─────┐
     │            │            │            │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │ │ Worker 4 │
│          │ │          │ │          │ │          │
│  Queue   │ │  Queue   │ │  Queue   │ │  Queue   │
│  Model   │ │  Model   │ │  Model   │ │  Model   │
│ (-t 10)  │ │ (-t 10)  │ │ (-t 10)  │ │ (-t 10)  │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
     ↓            ↓            ↓            ↓
   Target       Target       Target       Target
The Implementation
import cluster from 'node:cluster';

// Primary process
if (cluster.isPrimary) {
  const numWorkers = getCoreCount(options.cores); // -c flag
  const workload = splitWordlist(wordlist, numWorkers);
  for (let i = 0; i < numWorkers; i++) {
    const worker = cluster.fork();
    worker.send({ paths: workload[i], concurrency: options.threads });
  }
}

// Worker process
if (cluster.isWorker) {
  process.on('message', async ({ paths, concurrency }) => {
    const queue = new RequestQueue(concurrency);
    // Push every path into the queue up front and let the queue cap
    // what's in flight. Awaiting add() one path at a time would
    // serialize the requests and defeat the -t limit.
    await Promise.all(paths.map(path => queue.add(() => fuzzPath(path))));
  });
}
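For completeness, here's one way splitWordlist() could look (a hypothetical sketch, not necessarily 404fuzz's actual implementation). Dealing paths out round-robin gives every worker an even share:

// Hypothetical splitWordlist(): deals paths out round-robin
// so each of the numWorkers chunks ends up roughly equal.
function splitWordlist(wordlist, numWorkers) {
  const chunks = Array.from({ length: numWorkers }, () => []);
  wordlist.forEach((path, i) => chunks[i % numWorkers].push(path));
  return chunks;
}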
Chapter 4: The Sweet Spot - Balancing Act ⚖️
Here's where it gets interesting: more workers ≠ more speed.
The Complexity
You're now balancing TWO variables:
- Clusters (-c): Number of worker processes
- Concurrency (-t): Requests per worker
What I Discovered
Configuration    RPS     Why?
──────────────────────────────────────────────────
-c 8 -t 2        ~65     Too much IPC overhead
-c 4 -t 5        ~95     Better balance
-c 2 -t 10       ~121    SWEET SPOT! ⭐
-c 1 -t 20       ~85     Bottlenecked by single process
-c all -t 20     ~70     IPC kills performance
The Pattern: Fewer workers + higher concurrency = faster!
Why?
Fewer Workers (e.g., -c 2):
┌────────────┐
│ Worker 1   │────┐
│ -t 10      │    │    Less communication
│ (10 reqs)  │    ├──> overhead between
└────────────┘    │    processes
                  │
┌────────────┐    │
│ Worker 2   │────┘
│ -t 10      │
│ (10 reqs)  │
└────────────┘
More Workers (e.g., -c 8):
┌────────┐┌────────┐┌────────┐┌────────┐
│Worker 1││Worker 2││Worker 3││Worker 4│
│  -t 2  ││  -t 2  ││  -t 2  ││  -t 2  │
└───┬────┘└───┬────┘└───┬────┘└───┬────┘
    │         │         │         │
    └─────────┴────┬────┴─────────┘
                   ↓
        High IPC (Inter-Process
        Communication) overhead!
Chapter 5: Putting It All Together 🎯
The Final Architecture
┌─────────────────────────────────────────────────┐
│             404fuzz Primary Process             │
│                                                 │
│  1. Load wordlist                               │
│  2. Parse target & options                      │
│  3. Calculate optimal worker count (-c flag)    │
│  4. Split wordlist into chunks                  │
│  5. Spawn workers                               │
└────────────┬────────────────────────────────────┘
             │
       ┌─────┴──────────┐
       │                │
┌──────────────┐ ┌──────────────┐
│   Worker 1   │ │   Worker 2   │
│              │ │              │
│  ┌────────┐  │ │  ┌────────┐  │
│  │ Queue  │  │ │  │ Queue  │  │
│  │ Model  │  │ │  │ Model  │  │
│  │ (-t N) │  │ │  │ (-t N) │  │
│  └───┬────┘  │ │  └───┬────┘  │
│      ↓       │ │      ↓       │
│  [10 reqs]   │ │  [10 reqs]   │
└──────┬───────┘ └──────┬───────┘
       ↓                ↓
     Target           Target
Usage Examples
# Fast & balanced (recommended)
404fuzz https://target.com -w wordlist.txt -c 2 -t 10
# Maximum concurrency, fewer workers
404fuzz https://target.com -w wordlist.txt -c half -t 20
# Use all cores (not always faster!)
404fuzz https://target.com -w wordlist.txt -c all -t 5
# Single core for testing
404fuzz https://target.com -w wordlist.txt -c 1 -t 50
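For reference, here's roughly how those -c values could map to a worker count (a hypothetical sketch; the real flag parsing in 404fuzz may differ):

import os from 'node:os';

// Hypothetical getCoreCount(): supports -c all | half | <number>
function getCoreCount(cores) {
  const total = os.cpus().length;
  if (cores === 'all') return total;
  if (cores === 'half') return Math.max(1, Math.floor(total / 2));
  return Math.min(total, parseInt(cores, 10) || 1);
}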
The Results 📊
Hardware: Dell 7290, i5 8th Gen, 8GB RAM, 256GB SSD
Performance:
- Peak RPS: 121 requests/second
- Memory usage: Stable (~200-300MB)
- CPU usage: Efficient (50-60% on 2 cores)
Comparison:
Approach                     RPS     Memory    Crashed?
───────────────────────────────────────────────────────
Promise.all() (naive)        N/A     >2GB      YES 💥
Queue only (single core)     ~45     ~150MB    No
Queue + Cluster (optimal)    ~121    ~250MB    No ✅
Key Takeaways 🔑
- Node.js is concurrent, not parallel - Understanding the event loop is crucial
- Unbounded concurrency is dangerous - Always implement a queue with limits
- More workers ≠ better performance - IPC overhead is real
- Sweet spot exists - Fewer workers + higher concurrency often wins
- Experimentation is key - Every system is different, so test your configs (see the sketch below)!
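Measuring is simple: time a run and divide (a rough sketch, reusing the queue and fuzzPath from the worker code above):

// Rough RPS measurement for one worker's share of the wordlist
const start = Date.now();
await Promise.all(paths.map(path => queue.add(() => fuzzPath(path))));
const seconds = (Date.now() - start) / 1000;
console.log(`${(paths.length / seconds).toFixed(1)} RPS`);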
Try 404fuzz Yourself
# Clone the repository
git clone https://github.com/toklas495/404fuzz.git
cd 404fuzz
# Install dependencies
npm install
# Build and link globally
npm run build
# Verify installation
404fuzz
# Start fuzzing with recommended settings
404fuzz https://target.com/FUZZ -w /path/to/wordlist.txt -c 2 -t 10
What's Next?
Now that we've achieved speed, the next step is adding intelligence - making 404fuzz learn from responses, adapt its strategies, and discover paths smarter, not just faster.
But that's a story for another blog post. 😉
Built with ❤️ and lots of trial & error. If this helped you understand Node.js concurrency better, drop a ⭐ on the repo!