The Journey of Making 404fuzz Blazingly Fast ⚡
When I started building 404fuzz, I had one goal: make it fast. Really fast. But I quickly learned that speed in Node.js isn't just about throwing more requests at a server. It's about understanding how Node.js actually works.
Let me take you on the journey from my first "obvious" solution to a fuzzer that achieves 121 RPS on modest hardware.
Chapter 1: The Promise.all() Trap 🪤
My First Thought
"Easy! I'll just load all my wordlist paths and fire them all at once with Promise.all()!"
// My first naive approach
const wordlist = ['admin', 'backup', 'config', ...]; // 10,000 paths
const promises = wordlist.map(path => fetch(`${target}/${path}`));
await Promise.all(promises); // Fire everything!
The Brutal Reality
This crashed everything. My laptop froze. The target server probably hated me. What went wrong?
Here's what I learned: Promise.all() is NOT parallel execution.
Understanding Node.js: Concurrent, Not Parallel 🔄
Let me explain with a diagram:
┌──────────────────────────────────────────────────┐
│              Node.js Single Thread               │
├──────────────────────────────────────────────────┤
│                                                  │
│  Your Code (Asynchronous)                        │
│        ↓                                         │
│  Event Demultiplexer (Receives all events)       │
│        ↓                                         │
│  Event Queue [Event1, Event2, Event3, ...]       │
│        ↓                                         │
│  Event Loop (while(queue.length > 0))            │
│    ├─ Takes Event1                               │
│    ├─ Executes Callback                          │
│    ├─ Returns immediately (non-blocking!)        │
│    └─ Takes Event2...                            │
│                                                  │
└──────────────────────────────────────────────────┘
Key Insight: Node.js is concurrent (non-blocking), not parallel (multiple things at once).
When you do Promise.all() with 10,000 requests:
- ❌ You don't get 10,000 parallel threads
- ✅ You DO get 10,000 open connections
- ✅ You DO consume massive memory
- ✅ You DO overwhelm both your system and the target
Result: System crash, memory exhaustion, or you become an accidental DDoS attacker.
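You can watch this happen at a tiny scale (a minimal demo, no network needed). Every task starts the moment it's created; Promise.all() only waits, it never throttles:

// All five tasks log "started" before any of them finishes.
// They begin eagerly at creation; Promise.all() just waits.
const tasks = Array.from({ length: 5 }, (_, i) =>
  new Promise(resolve => {
    console.log(`task ${i} started`);
    setTimeout(() => resolve(i), 100);
  })
);
await Promise.all(tasks); // done after ~100ms total, not 5 x 100ms

Swap 5 toy tasks for 10,000 real fetches and you can see why things fall over.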
Chapter 2: The Queue Model - Controlled Chaos 🎯
The Better Approach
I needed bounded concurrency - control how many requests run at once, queue the rest.
┌──────────────────────────────────────────┐
│         Wordlist (10,000 paths)          │
└─────────────┬────────────────────────────┘
              ↓
┌──────────────────────────────────────────┐
│              Request Queue               │
│   [req1, req2, req3, req4, req5, ...]    │
└─────────────┬────────────────────────────┘
              ↓
┌──────────────────────────────────────────┐
│       Concurrency Limit (e.g., 50)       │
│                                          │
│   [Active1] [Active2] ... [Active50]     │
│       ↓         ↓             ↓          │
│    Response  Response      Response      │
│       ↓         ↓             ↓          │
│         Next from queue (req51)          │
└──────────────────────────────────────────┘
The Implementation
class RequestQueue {
  constructor(concurrency = 50) {
    this.concurrency = concurrency;
    this.running = 0;
    this.queue = [];
  }

  async add(task) {
    // If we're at the limit, park ourselves in the queue until a
    // finishing task wakes us up. A while-loop (not an if) re-checks
    // the limit, in case another caller grabbed the freed slot first.
    while (this.running >= this.concurrency) {
      await new Promise(resolve => this.queue.push(resolve));
    }
    this.running++;
    try {
      return await task();
    } finally {
      this.running--;
      // Release the next queued task, if any
      if (this.queue.length > 0) {
        const resolve = this.queue.shift();
        resolve();
      }
    }
  }
}
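Driving the queue looks something like this (a sketch, reusing the wordlist and target from Chapter 1). Promise.all() is safe again here because the queue, not Promise.all(), decides how many requests actually run at once:

const queue = new RequestQueue(50);

// Still 10,000 paths, but never more than 50 in flight at a time.
const results = await Promise.all(
  wordlist.map(path => queue.add(() => fetch(`${target}/${path}`)))
);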
The Results
- ✅ Memory stays stable
- ✅ Target server doesn't die
- ✅ Predictable resource usage
- ✅ You can tune it with the -t flag (concurrency level)
But I wanted MORE speed. Time for the next level.
Chapter 3: Multi-Core Power - The Cluster Model 💪
The Problem
Node.js runs all your JavaScript on a single thread. My i5 has 8 logical cores, so one process uses only 12.5% of my CPU!
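You can check what you're working with; note that os.cpus() counts logical cores, so a hyper-threaded 4-core chip reports 8:

import os from 'node:os';

// Logical cores visible to Node.js
console.log(os.cpus().length); // 8 on my machine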
The Solution: Node.js Cluster Module
┌───────────────────────────────────────────────────┐
│             Primary Process (Master)              │
│                                                   │
│  - Loads wordlist                                 │
│  - Splits work among workers                      │
│  - Collects results                               │
└───────────┬─────────────────────────┬─────────────┘
            │                         │
     ┌──────┴─────┐            ┌──────┴─────┐
     │            │            │            │
┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐
│ Worker 1 │ │ Worker 2 │ │ Worker 3 │ │ Worker 4 │
│          │ │          │ │          │ │          │
│  Queue   │ │  Queue   │ │  Queue   │ │  Queue   │
│  Model   │ │  Model   │ │  Model   │ │  Model   │
│ (-t 10)  │ │ (-t 10)  │ │ (-t 10)  │ │ (-t 10)  │
└──────────┘ └──────────┘ └──────────┘ └──────────┘
     ↓            ↓            ↓            ↓
   Target       Target       Target       Target
The Implementation
import cluster from 'node:cluster';

// Primary process
if (cluster.isPrimary) {
  const numWorkers = getCoreCount(options.cores); // -c flag
  const workload = splitWordlist(wordlist, numWorkers);
  for (let i = 0; i < numWorkers; i++) {
    const worker = cluster.fork();
    worker.send({ paths: workload[i], concurrency: options.threads });
  }
}

// Worker process
if (cluster.isWorker) {
  process.on('message', async ({ paths, concurrency }) => {
    const queue = new RequestQueue(concurrency);
    // Push every path into the queue up front and let the queue cap
    // what's in flight. Awaiting add() one path at a time would
    // serialize the requests and defeat the -t limit.
    await Promise.all(paths.map(path => queue.add(() => fuzzPath(path))));
  });
}
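For completeness, here's one way splitWordlist() could look (a hypothetical sketch, not necessarily 404fuzz's actual implementation). Dealing paths out round-robin gives every worker an even share:

// Hypothetical splitWordlist(): deals paths out round-robin
// so each of the numWorkers chunks ends up roughly equal.
function splitWordlist(wordlist, numWorkers) {
  const chunks = Array.from({ length: numWorkers }, () => []);
  wordlist.forEach((path, i) => chunks[i % numWorkers].push(path));
  return chunks;
}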
Chapter 4: The Sweet Spot - Balancing Act ⚖️
Here's where it gets interesting: more workers ≠ more speed.
The Complexity
You're now balancing TWO variables:
- Clusters (-c): Number of worker processes
- Concurrency (-t): Requests per worker
What I Discovered
Configuration    RPS     Why?
──────────────────────────────────────────────────
-c 8 -t 2        ~65     Too much IPC overhead
-c 4 -t 5        ~95     Better balance
-c 2 -t 10       ~121    SWEET SPOT! ⭐
-c 1 -t 20       ~85     Bottlenecked by single process
-c all -t 20     ~70     IPC kills performance
The Pattern: Fewer workers + higher concurrency = faster!
Why?
Fewer Workers (e.g., -c 2):
┌────────────┐
│ Worker 1   │────┐
│ -t 10      │    │    Less communication
│ (10 reqs)  │    ├──> overhead between
└────────────┘    │    processes
                  │
┌────────────┐    │
│ Worker 2   │────┘
│ -t 10      │
│ (10 reqs)  │
└────────────┘
More Workers (e.g., -c 8):
┌────────┐┌────────┐┌────────┐┌────────┐
│Worker 1││Worker 2││Worker 3││Worker 4│
│  -t 2  ││  -t 2  ││  -t 2  ││  -t 2  │
└───┬────┘└───┬────┘└───┬────┘└───┬────┘
    │         │         │         │
    └─────────┴────┬────┴─────────┘
                   ↓
        High IPC (Inter-Process
        Communication) overhead!
Chapter 5: Putting It All Together 🎯
The Final Architecture
┌─────────────────────────────────────────────────┐
│             404fuzz Primary Process             │
│                                                 │
│  1. Load wordlist                               │
│  2. Parse target & options                      │
│  3. Calculate optimal worker count (-c flag)    │
│  4. Split wordlist into chunks                  │
│  5. Spawn workers                               │
└────────────┬────────────────────────────────────┘
             │
       ┌─────┴──────────┐
       │                │
┌──────────────┐ ┌──────────────┐
│   Worker 1   │ │   Worker 2   │
│              │ │              │
│  ┌────────┐  │ │  ┌────────┐  │
│  │ Queue  │  │ │  │ Queue  │  │
│  │ Model  │  │ │  │ Model  │  │
│  │ (-t N) │  │ │  │ (-t N) │  │
│  └───┬────┘  │ │  └───┬────┘  │
│      ↓       │ │      ↓       │
│  [10 reqs]   │ │  [10 reqs]   │
└──────┬───────┘ └──────┬───────┘
       ↓                ↓
     Target           Target
Usage Examples
# Fast & balanced (recommended)
404fuzz https://target.com -w wordlist.txt -c 2 -t 10
# Maximum concurrency, fewer workers
404fuzz https://target.com -w wordlist.txt -c half -t 20
# Use all cores (not always faster!)
404fuzz https://target.com -w wordlist.txt -c all -t 5
# Single core for testing
404fuzz https://target.com -w wordlist.txt -c 1 -t 50
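For reference, here's roughly how those -c values could map to a worker count (a hypothetical sketch; the real flag parsing in 404fuzz may differ):

import os from 'node:os';

// Hypothetical getCoreCount(): supports -c all | half | <number>
function getCoreCount(cores) {
  const total = os.cpus().length;
  if (cores === 'all') return total;
  if (cores === 'half') return Math.max(1, Math.floor(total / 2));
  return Math.min(total, parseInt(cores, 10) || 1);
}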
The Results 📊
Hardware: Dell 7290, i5 8th Gen, 8GB RAM, 256GB SSD
Performance:
- Peak RPS: 121 requests/second
- Memory usage: Stable (~200-300MB)
- CPU usage: Efficient (50-60% on 2 cores)
Comparison:
Approach                     RPS     Memory    Crashed?
───────────────────────────────────────────────────────
Promise.all() (naive)        N/A     >2GB      YES 💥
Queue only (single core)     ~45     ~150MB    No
Queue + Cluster (optimal)    ~121    ~250MB    No ✅
Key Takeaways 🔑
- Node.js is concurrent, not parallel - Understanding the event loop is crucial
- Unbounded concurrency is dangerous - Always implement a queue with limits
- More workers ≠ better performance - IPC overhead is real
- Sweet spot exists - Fewer workers + higher concurrency often wins
- Experimentation is key - Every system is different, so test your configs (see the sketch below)!
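Measuring is simple: time a run and divide (a rough sketch, reusing the queue and fuzzPath from the worker code above):

// Rough RPS measurement for one worker's share of the wordlist
const start = Date.now();
await Promise.all(paths.map(path => queue.add(() => fuzzPath(path))));
const seconds = (Date.now() - start) / 1000;
console.log(`${(paths.length / seconds).toFixed(1)} RPS`);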
Try 404fuzz Yourself
# Clone the repository
git clone https://github.com/toklas495/404fuzz.git
cd 404fuzz
# Install dependencies
npm install
# Build and link globally
npm run build
# Verify installation
404fuzz
# Start fuzzing with recommended settings
404fuzz https://target.com/FUZZ -w /path/to/wordlist.txt -c 2 -t 10
What's Next?
Now that we've achieved speed, the next step is adding intelligence - making 404fuzz learn from responses, adapt its strategies, and discover paths smarter, not just faster.
But that's a story for another blog post. 😉
Built with ❤️ and lots of trial & error. If this helped you understand Node.js concurrency better, drop a ⭐ on the repo!