Nimesh Thakur

I Crashed My Server with Promise.all() - Here's How I Built a 121 RPS Fuzzer Instead

The Journey of Making 404fuzz Blazingly Fast ⚑

When I started building 404fuzz, I had one goal: make it fast. Really fast. But I quickly learned that speed in Node.js isn't just about throwing more requests at a server. It's about understanding how Node.js actually works.

Let me take you on the journey from my first "obvious" solution to a fuzzer that achieves 121 RPS on modest hardware.


Chapter 1: The Promise.all() Trap πŸͺ€

My First Thought

"Easy! I'll just load all my wordlist paths and fire them all at once with Promise.all()!"

// My first naive approach
const wordlist = ['admin', 'backup', 'config', ...]; // 10,000 paths
const promises = wordlist.map(path => fetch(`${target}/${path}`)); // every fetch starts NOW
await Promise.all(promises); // this only waits; all 10,000 are already in flight

The Brutal Reality

This crashed everything. My laptop froze. The target server probably hated me. What went wrong?

Here's what I learned: Promise.all() is NOT parallel execution. It doesn't run anything itself - it just waits on promises that are already in flight.


Understanding Node.js: Concurrent, Not Parallel πŸ”„

Let me explain with a diagram:

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚          Node.js Single Thread                  β”‚
β”œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€
β”‚                                                 β”‚
β”‚  Your Code (Asynchronous)                       β”‚
β”‚         ↓                                       β”‚
β”‚  Event Demultiplexer (Receives all events)      β”‚
β”‚         ↓                                       β”‚
β”‚  Event Queue [Event1, Event2, Event3, ...]      β”‚
β”‚         ↓                                       β”‚
β”‚  Event Loop (while(queue.length > 0))           β”‚
β”‚    β”œβ”€ Takes Event1                              β”‚
β”‚    β”œβ”€ Executes Callback                         β”‚
β”‚    β”œβ”€ Returns immediately (non-blocking!)       β”‚
β”‚    └─ Takes Event2...                           β”‚
β”‚                                                 β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

Key Insight: Node.js is concurrent (non-blocking), not parallel (multiple things at once).
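
You can watch this on your own machine: two timers are "concurrent", but one synchronous loop freezes both, because there's only one thread. A minimal sketch:

// Two timers scheduled "simultaneously"...
setTimeout(() => console.log('timer A'), 0);
setTimeout(() => console.log('timer B'), 0);

// ...but a synchronous busy-loop blocks the one and only thread,
// so neither callback can fire until it finishes.
const start = Date.now();
while (Date.now() - start < 1000) {} // 1 second of blocking work

console.log('sync done'); // always prints BEFORE timer A and timer B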

When you do Promise.all() with 10,000 requests:

  • ❌ You don't get 10,000 parallel threads
  • ❌ You DO get 10,000 open connections
  • ❌ You DO consume massive memory
  • ❌ You DO overwhelm both your system and the target

Result: system crash, memory exhaustion, or you become an accidental DoS attacker.


Chapter 2: The Queue Model - Controlled Chaos 🎯

The Better Approach

I needed bounded concurrency - control how many requests run at once, queue the rest.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚      Wordlist (10,000 paths)           β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         Request Queue                  β”‚
β”‚  [req1, req2, req3, req4, req5, ...]   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚    Concurrency Limit (e.g., 50)       β”‚
β”‚                                        β”‚
β”‚  [Active1] [Active2] ... [Active50]   β”‚
β”‚      ↓          ↓            ↓         β”‚
β”‚   Response  Response     Response      β”‚
β”‚      ↓          ↓            ↓         β”‚
β”‚  Next from queue (req51)               β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

The Implementation

class RequestQueue {
  constructor(concurrency = 50) {
    this.concurrency = concurrency;
    this.running = 0;
    this.queue = [];
  }

  async add(task) {
    // If we're at the limit, wait in line. A while loop (not an if)
    // re-checks the limit after waking, in case another caller got in first.
    while (this.running >= this.concurrency) {
      await new Promise(resolve => this.queue.push(resolve));
    }

    this.running++;
    try {
      return await task();
    } finally {
      this.running--;
      // Release next queued task
      if (this.queue.length > 0) {
        const resolve = this.queue.shift();
        resolve();
      }
    }
  }
}
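And here's how the fuzzer drives it. A sketch, where fuzzPath is a stand-in for 404fuzz's real request logic. Note that Promise.all() is safe again here: the queue guarantees at most 50 requests are in flight, so the rest are just lightweight pending promises, not open sockets.

const queue = new RequestQueue(50);

// Hypothetical request function: report anything that isn't a 404
async function fuzzPath(path) {
  const res = await fetch(`${target}/${path}`);
  if (res.status !== 404) console.log(`${res.status}  /${path}`);
}

// All 10,000 adds are queued instantly, but only 50 requests run at once
await Promise.all(wordlist.map(path => queue.add(() => fuzzPath(path))));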

The Results

  • βœ… Memory stays stable
  • βœ… Target server doesn't die
  • βœ… Predictable resource usage
  • βœ… You can tune it with the -t flag (concurrency level)

But I wanted MORE speed. Time for the next level.


Chapter 3: Multi-Core Power - The Cluster Model πŸ’ͺ

The Problem

Node.js runs your JavaScript on a single thread. My i5 processor has 8 cores, so I was using at most 12.5% of my CPU!

The Solution: Node.js Cluster Module

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚            Primary Process (Master)             β”‚
β”‚                                                 β”‚
β”‚  - Loads wordlist                               β”‚
β”‚  - Splits work among workers                    β”‚
β”‚  - Collects results                             β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚                       β”‚
      β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
      ↓             ↓         ↓             ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚ Worker 1 β”‚  β”‚ Worker 2 β”‚  β”‚ Worker 3 β”‚  β”‚ Worker 4 β”‚
β”‚          β”‚  β”‚          β”‚  β”‚          β”‚  β”‚          β”‚
β”‚  Queue   β”‚  β”‚  Queue   β”‚  β”‚  Queue   β”‚  β”‚  Queue   β”‚
β”‚  Model   β”‚  β”‚  Model   β”‚  β”‚  Model   β”‚  β”‚  Model   β”‚
β”‚  (-t 10) β”‚  β”‚  (-t 10) β”‚  β”‚  (-t 10) β”‚  β”‚  (-t 10) β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
     ↓             ↓              ↓             ↓
   Target       Target         Target        Target

The Implementation

const cluster = require('node:cluster');

// Primary process: split the wordlist and hand each worker its share
if (cluster.isPrimary) {
  const numWorkers = getCoreCount(options.cores); // -c flag: 'all', 'half', or a number
  const workload = splitWordlist(wordlist, numWorkers);

  for (let i = 0; i < numWorkers; i++) {
    const worker = cluster.fork();
    worker.send({ paths: workload[i], concurrency: options.threads });
  }
}

// Worker process
if (cluster.isWorker) {
  process.on('message', async ({ paths, concurrency }) => {
    const queue = new RequestQueue(concurrency);
    // Don't await each add() in a loop - that would run one request at a time.
    // Queue everything and let RequestQueue enforce the concurrency limit.
    await Promise.all(paths.map(path => queue.add(() => fuzzPath(path))));
  });
}
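The snippet above leans on two helpers. Here's one plausible way to write them - the real 404fuzz versions may differ:

const os = require('node:os');

// Map the -c flag ('all', 'half', or a number) to a worker count
function getCoreCount(cores) {
  const total = os.cpus().length;
  if (cores === 'all') return total;
  if (cores === 'half') return Math.max(1, Math.floor(total / 2));
  return Math.min(total, parseInt(cores, 10) || 1);
}

// Deal paths round-robin into one roughly equal chunk per worker
function splitWordlist(wordlist, numWorkers) {
  const chunks = Array.from({ length: numWorkers }, () => []);
  wordlist.forEach((path, i) => chunks[i % numWorkers].push(path));
  return chunks;
}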

Chapter 4: The Sweet Spot - Balancing Act βš–οΈ

Here's where it gets interesting: more workers β‰  more speed.

The Complexity

You're now balancing TWO variables:

  • Clusters (-c): Number of worker processes
  • Concurrency (-t): Requests per worker

What I Discovered

Configuration     RPS     Why?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
-c 8   -t 2       ~65     Too much IPC overhead
-c 4   -t 5       ~95     Better balance
-c 2   -t 10      ~121    SWEET SPOT! ⭐
-c 1   -t 20      ~85     Bottlenecked by single process
-c all -t 20      ~70     IPC kills performance

The Pattern: Fewer workers + higher concurrency = faster!

Why?

Fewer Workers (e.g., -c 2):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚   Worker 1   │───┐
β”‚   -t 10      β”‚   β”‚  Less communication
β”‚   (10 reqs)  β”‚   β”œβ”€> overhead between
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚  processes
                   β”‚
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   Worker 2   β”‚β”€β”€β”€β”˜
β”‚   -t 10      β”‚
β”‚   (10 reqs)  β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜

More Workers (e.g., -c 8):
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚Worker 1β”‚β”‚Worker 2β”‚β”‚Worker 3β”‚β”‚Worker 4β”‚
β”‚ -t 2   β”‚β”‚ -t 2   β”‚β”‚ -t 2   β”‚β”‚ -t 2   β”‚
β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜
    β”‚         β”‚         β”‚         β”‚
    β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
         High IPC (Inter-Process
         Communication) overhead!
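The arithmetic makes the trade-off concrete: total in-flight requests = workers Γ— per-worker concurrency. So -c 2 -t 10 keeps 20 requests in the air, while -c 8 -t 2 keeps only 16, yet runs four times as many processes. Fuzzing is I/O-bound - the sockets do the waiting, not the CPU - so a couple of workers with a higher -t handle the same load with far less IPC and scheduling overhead.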

Chapter 5: Putting It All Together 🎯

The Final Architecture

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚         404fuzz Primary Process                β”‚
β”‚                                                β”‚
β”‚  1. Load wordlist                              β”‚
β”‚  2. Parse target & options                     β”‚
β”‚  3. Calculate optimal worker count (-c flag)   β”‚
β”‚  4. Split wordlist into chunks                 β”‚
β”‚  5. Spawn workers                              β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
             β”‚
      β”Œβ”€β”€β”€β”€β”€β”€β”΄β”€β”€β”€β”€β”€β”€β”
      ↓             ↓
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚  Worker 1   β”‚  β”‚  Worker 2   β”‚
β”‚             β”‚  β”‚             β”‚
β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚  β”‚  β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β” β”‚
β”‚  β”‚ Queue  β”‚ β”‚  β”‚  β”‚ Queue  β”‚ β”‚
β”‚  β”‚ Model  β”‚ β”‚  β”‚  β”‚ Model  β”‚ β”‚
β”‚  β”‚ (-t N) β”‚ β”‚  β”‚  β”‚ (-t N) β”‚ β”‚
β”‚  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β”‚  β”‚  β””β”€β”€β”€β”¬β”€β”€β”€β”€β”˜ β”‚
β”‚      ↓      β”‚  β”‚      ↓      β”‚
β”‚   [10 reqs] β”‚  β”‚   [10 reqs] β”‚
β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜  β””β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”˜
       ↓                ↓
     Target          Target
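One detail the diagram glosses over is how results flow back up. Here's a sketch of how the primary could collect them, assuming a simple { type, ... } message shape (the real 404fuzz protocol may differ):

// In the primary, after forking the workers:
let finished = 0;
const hits = [];

for (const id of Object.keys(cluster.workers)) {
  cluster.workers[id].on('message', (msg) => {
    if (msg.type === 'hit') hits.push(msg); // a discovered path
    if (msg.type === 'done' && ++finished === numWorkers) {
      console.table(hits); // final report once every worker is done
    }
  });
}

// In each worker, after its chunk completes:
process.send({ type: 'done' });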

Usage Examples

# Fast & balanced (recommended)
404fuzz  https://target.com -w wordlist.txt -c 2 -t 10

# Maximum concurrency, fewer workers
404fuzz  https://target.com -w wordlist.txt -c half -t 20

# Use all cores (not always faster!)
404fuzz  https://target.com -w wordlist.txt -c all -t 5

# Single core for testing
404fuzz  https://target.com -w wordlist.txt -c 1 -t 50

The Results πŸ“Š

Hardware: Dell 7290, i5 8th Gen, 8GB RAM, 256GB SSD

Performance:

  • Peak RPS: 121 requests/second
  • Memory usage: Stable (~200-300MB)
  • CPU usage: Efficient (50-60% on 2 cores)

Comparison:

Approach                    RPS    Memory   Crashed?
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Promise.all() (naive)       N/A    >2GB     YES πŸ’₯
Queue only (single core)    ~45    ~150MB   No
Queue + Cluster (optimal)   ~121   ~250MB   No βœ…

Key Takeaways πŸŽ“

  1. Node.js is concurrent, not parallel - Understanding the event loop is crucial
  2. Unbounded concurrency is dangerous - Always implement a queue with limits
  3. More workers β‰  better performance - IPC overhead is real
  4. Sweet spot exists - Fewer workers + higher concurrency often wins
  5. Experimentation is key - Every system is different; test your configs! (See the timing sketch below.)
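
For that last takeaway, the measurement can be as simple as wall-clock time over request count. A minimal, hypothetical harness (run is whatever kicks off a full fuzz for one config):

// Time a complete run and report requests per second
async function benchmark(run, totalRequests) {
  const start = Date.now();
  await run();
  const seconds = (Date.now() - start) / 1000;
  console.log(`${totalRequests} requests in ${seconds.toFixed(1)}s = ${Math.round(totalRequests / seconds)} RPS`);
}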

Try 404fuzz Yourself

# Clone the repository
git clone https://github.com/toklas495/404fuzz.git
cd 404fuzz

# Install dependencies
npm install

# Build and link globally
npm run build

# Verify installation
404fuzz

# Start fuzzing with recommended settings
404fuzz https://target.com/FUZZ -w /path/to/wordlist.txt -c 2 -t 10

⭐ GitHub Repository


What's Next?

Now that we've achieved speed, the next step is adding intelligence - making 404fuzz learn from responses, adapt its strategies, and discover paths smarter, not just faster.

But that's a story for another blog post. πŸ˜‰


Built with ❀️ and lots of trial & error. If this helped you understand Node.js concurrency better, drop a ⭐ on the repo!
