Marco

Posted on • Originally published at marc0.dev

I Tested the New Qwen3-Coder-Next Locally—Here's Why 3B Active Params Changes Everything

An 80B model that only uses 3B parameters per token just scored 70.6% on SWE-Bench Verified. That beats DeepSeek-V3.2 (671B params) while being 10 points behind Claude Opus 4.5—on hardware you can actually afford.

Alibaba dropped Qwen3-Coder-Next yesterday (Feb 3, 2026). This isn't just another open-source model release. It's the first time a locally runnable model closes the gap with frontier proprietary systems to roughly ten points on real-world coding benchmarks.

The Numbers That Matter

Model              Params              SWE-Bench Verified
Claude Opus 4.5    Unknown (closed)    80.9%
GLM-4.7            358B                74.2%
Qwen3-Coder-Next   80B (3B active)     70.6%
DeepSeek-V3.2      671B                70.2%

Read that again: 3B active params beating 671B.

The MoE (Mixture of Experts) architecture activates only what's needed per token. You get the reasoning depth of an 80B model with the inference speed of a 3B model.
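
To make "3B active" concrete, here's a toy top-k routing sketch in Python (made-up dimensions and random weights, not Qwen3's actual architecture; only the routing idea carries over):

import numpy as np

rng = np.random.default_rng(0)
d_model, n_experts, top_k = 64, 8, 2   # toy sizes for illustration
router_w = rng.standard_normal((d_model, n_experts))
experts = [rng.standard_normal((d_model, d_model)) for _ in range(n_experts)]

def moe_layer(token):
    logits = token @ router_w              # router score for each expert
    chosen = np.argsort(logits)[-top_k:]   # pick the top-k experts for this token
    weights = np.exp(logits[chosen])
    weights /= weights.sum()               # softmax over the chosen experts only
    # Only top_k expert matmuls actually run; the other experts' weights sit idle,
    # which is why "active params" is a small slice of total params.
    return sum(w * (token @ experts[i]) for w, i in zip(weights, chosen))

out = moe_layer(rng.standard_normal(d_model))
print(out.shape)   # (64,) -- same output shape, roughly top_k/n_experts of the compute

Scale that idea up to 80B total parameters with a 3B active slice per token and you get the speed/quality trade the benchmark table above reflects.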

The Strategic Angle: AI Infrastructure Independence

The "what if the music stops" fear is real. Anthropic burns $2B/year. OpenAI needs $10B+ annually. Currently, these companies are subsidizing our workflows with VC money. One day, the pricing goes 10x, or the API disappears behind an enterprise wall.

And now? Insurance exists. It costs the price of a high-end Mac Mini ($2,000). One-time payment. Yours forever.

With Qwen3 reaching 70% on SWE-Bench locally, "Local AI" is no longer about running garbage 7B models that struggle with for-loops. It's legitimately competitive with frontier models for real engineering work.

Your new backup plan:

  • High-End: Keep using Opus/Sonnet for the hardest architectural reasoning (it's still ~10 points ahead)
  • Insurance: Know that if the cloud infrastructure shifts, you have the hardware and the weights to keep shipping
  • Privacy: For sensitive code, it already makes sense to run local

This is the moment local AI stopped being a hobby project and became a legitimate professional tool. That's not just tech news—that's career security.

Why This Is Different

Most coding models learn by predicting the next token. Read-only education.

Qwen3-Coder-Next was trained on 800,000 verifiable tasks from real GitHub PRs:

  1. Model tries to fix a bug
  2. Runs tests in Docker
  3. Gets pass/fail feedback
  4. Learns from actual execution

It learned to fix bugs and pass tests—not just write code.
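
Here's the shape of that loop as a deliberately tiny Python sketch. The buggy function, the two candidate "patches", and the reward counting are invented for illustration; the real pipeline applies model-generated patches to real repos and runs their test suites in Docker:

import random

def test_suite(add):
    """Stand-in for running a repo's tests in a sandbox: returns pass/fail."""
    return add(2, 3) == 5 and add(-1, 1) == 0

# Two candidate "patches" a model might propose for a buggy add().
candidates = {
    "patch_a": lambda a, b: a - b,   # plausible-looking, still wrong
    "patch_b": lambda a, b: a + b,   # actually fixes the bug
}

# The training signal is binary: did the tests pass after the patch?
reward = {name: 0 for name in candidates}
for _ in range(20):
    name = random.choice(list(candidates))
    reward[name] += int(test_suite(candidates[name]))

print(reward)   # only the patch that makes the tests pass accumulates reward

That pass/fail signal is what "learns from actual execution" means in practice: reward for patches that survive the tests, not for text that merely looks like a fix.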

Run It Locally in 5 Minutes

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh

# Pull the model (~46GB)
ollama pull qwen3-coder-next

# Configure for Claude Code
export ANTHROPIC_BASE_URL="http://localhost:11434"
export ANTHROPIC_AUTH_TOKEN="ollama"

# Run
claude --model qwen3-coder-next

Hardware needed: Mac Mini M4 Pro 64GB ($2,000) or 48GB+ VRAM.
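
Once the pull finishes, it's worth a quick sanity check before wiring it into Claude Code. A minimal Python call against Ollama's local /api/generate endpoint (the model tag assumes the pull above succeeded):

import json, urllib.request

payload = {
    "model": "qwen3-coder-next",
    "prompt": "Write a Python function that reverses a string without slicing.",
    "stream": False,            # return one JSON object instead of a stream
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read())["response"])   # the model's completion text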

The Economics

  • Cloud API (Opus 4.5): $70-150/month heavy usage
  • Local (Qwen3-Coder-Next): $2,000 once + $5/month electricity
  • Break-even: roughly 14-31 months at those rates ($2,000 ÷ the $65-145/month you stop paying the API), sooner if your usage runs heavier

After that, frontier-level coding AI for basically free.
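
Back-of-the-envelope, using the numbers above (rough estimates; your API bill and power costs will vary):

hardware = 2000          # one-time Mac Mini M4 Pro cost, in dollars
electricity = 5          # dollars/month to keep it running
for api_cost in (70, 150):                 # low and high end of "heavy usage"
    months = hardware / (api_cost - electricity)
    print(f"${api_cost}/mo API -> break-even in ~{months:.0f} months")
# ~31 months at $70/mo, ~14 months at $150/mo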


Full technical breakdown, OpenClaw integration guide, and setup details in the original article:

👉 Read the full article on marc0.dev


What's your local AI setup? Running any models on your own hardware? Drop a comment—curious what's working for people.

Top comments (3)

david duymelinck

Mac Mini M4 Pro

I started laughing when I read that, because it put the image in my head of everyone that is using AI walking around with a mac mini and a screen instead of a laptop.

Ben Santora • Edited

"Local AI" is no longer about running garbage 7B models that struggle with for-loops.

Hey, are you calling my Llama-3.1 8B garbage? I love that little model. I don't have the cash for an NVIDIA GPU and would never buy a PC with a closed ecosystem.

Running my SLMs bare-metal on Linux Debian on my 2021 HP ENVY 17m-ch0xxx:
• CPU: 11th Gen Intel Core i7-1165G7 (4 cores, 8 threads), up to 4.7 GHz
• RAM: 12 GiB

Compiled llama.cpp with flags to utilize AVX-512 which helps a lot in CPU-only mode.

PEACEBINFLOW

This is the part that really changes the conversation for me: 3B active params beating 600B-class models on SWE-Bench isn’t just a benchmark flex, it’s an infrastructure shift.

The MoE angle matters way more than people realize. It’s basically saying: you don’t need constant brute force to reason well — you need the right experts waking up at the right time. That’s a very different future than “throw more GPUs at it.”

Also 100% agree on the “insurance” framing. This feels like the first time local AI isn’t a compromise anymore. Not “good enough for toy projects,” but actually viable for real engineering work if the cloud rug ever gets pulled or repriced into oblivion.

The training approach is underrated too. Learning from real PRs + tests + execution feedback explains why the gap closed so fast. That’s not next-token cosplay — that’s learning how code survives contact with reality.

I’m still using cloud models for heavy architectural thinking, but having something like this locally feels like owning your tools again instead of renting your brain. Curious how far you think this goes — do you see local MoE models eventually eating into the architectural reasoning tier too, or will that stay cloud-dominated for a while?