Our dev environment bill was killing us. Every day was costing around $28, and no matter what we did, that number wouldn't budge. We'd optimize here, scale down there, but nothing seemed to stick. Then one afternoon, I actually looked at the bill line by line instead of just checking the dashboard.
That's when I found the real problem.
The Two Leaks We Discovered
The ECS Cluster Nobody Used
We had spun up a full ECS cluster for development workloads. Full cluster. For dev. The reasoning at the time was probably something like "production-grade infrastructure everywhere," which sounds good until you realize you're paying production prices for development work.
The reality: most of these workloads didn't need that level of orchestration. We were running simple background jobs and web services that spent most of their time idle. The cluster was burning money just sitting there waiting for occasional traffic.
The Jump Boxes That Sat Empty
We had Windows jump hosts for database access. Two of them. Running 24/7. They were idle probably 90% of the time, but there they were on the bill every single day.
On top of that, we were paying for NAT gateways so these instances could reach the internet from their private subnets. NAT gateways bill by the hour whether any traffic flows through them or not, so that cost never went away either.
What We Actually Changed
We didn't reinvent the wheel. We just right-sized our architecture to match what we actually needed.
Moving to Lightsail
We migrated the non-critical workloads off ECS and onto Amazon Lightsail. This sounds like a bigger change than it was. In practice, we containerized the same code, pointed it at Lightsail's container service, and we were done.
The difference was immediate. Lightsail is simpler, cheaper, and honestly still more than enough for what we were running anyway. The workloads didn't care about ECS's orchestration features. They just needed to run.
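To give a sense of how small the change is, here's a rough boto3 sketch of the shape of it. The service name, power tier, image, and port are placeholders, not our actual configuration:

```python
# Rough sketch: create a small Lightsail container service and deploy one
# container to it. All names and values below are placeholders.
import boto3

lightsail = boto3.client("lightsail", region_name="us-east-1")

# "nano" is the cheapest power tier; scale=1 means a single node.
lightsail.create_container_service(
    serviceName="dev-worker",
    power="nano",
    scale=1,
)

# In practice you'd wait for the service to reach the READY state (and push
# your own image) before deploying; this just shows the moving parts.
lightsail.create_container_service_deployment(
    serviceName="dev-worker",
    containers={
        "app": {
            "image": "nginx:latest",   # placeholder image
            "ports": {"80": "HTTP"},
        }
    },
    publicEndpoint={
        "containerName": "app",
        "containerPort": 80,
    },
)
```

That's roughly the whole deployment story: no task definitions, no capacity providers, no load balancer to wire up.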
Our daily compute cost dropped sharply after this move.
Replacing Jump Boxes With SSH Tunneling
The jump boxes were there for one reason: secure access to databases. But a $150-300/month Windows instance is a sledgehammer solution for something that SSH tunneling solves better and cheaper.
We set up a bastion host using a basic EC2 instance (the smallest one you can get), configured SSH tunneling, and removed the expensive jump boxes. Developers could tunnel securely to the database without maintaining dedicated gateway infrastructure.
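The tunnel itself is nothing exotic. Here's a minimal sketch of what a developer runs, wrapped in Python for consistency with the other examples; the bastion address, user, database endpoint, and port are placeholders:

```python
# Minimal sketch: forward a local port through the bastion to the dev database.
# Hostnames, user, and ports are placeholders, not our real endpoints.
import subprocess

BASTION = "ec2-user@bastion.dev.example.com"   # placeholder bastion address
DB_HOST = "dev-db.internal.example.com"        # placeholder database endpoint

subprocess.run(
    [
        "ssh",
        "-N",                          # no remote command, just port forwarding
        "-L", f"5432:{DB_HOST}:5432",  # local 5432 -> database 5432 via bastion
        BASTION,
    ],
    check=True,
)
```

With the tunnel open, database clients on the developer's machine connect to localhost:5432 as if the database were local.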
Cost went from paying for always-on jump hosts to paying for a tiny bastion that barely gets touched.
The Numbers
Before: $28/day
After: under $10/day
That projects to about $540/month in savings, or roughly 65% less.
Did we lose any functionality? No. The dev environment works the same. Deployments work the same. The only difference is we're no longer paying for infrastructure that was overbuilt for what developers actually do.
Why This Matters
There's this pattern in cloud architecture where "enterprise-grade" becomes the default. It makes sense for production—you want resilience, you want redundancy, you want things to survive failures gracefully.
But development environments aren't production. They're where you build and test. They need to be reliable enough that your team isn't blocked, but they don't need to cost like they're running Netflix.
The biggest optimization win isn't usually some clever trick. It's looking at what you're actually using versus what you're actually paying for, then doing something about the gap.
In our case, the gap was huge.
If You're In a Similar Spot
Take 30 minutes this week and break down your non-production AWS bill by service. Look for the same patterns we found—full-featured services handling simple workloads, infrastructure running idle most of the time, features you provisioned "just in case" but never use.
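If you'd rather script that than click through the console, a minimal sketch using boto3 and Cost Explorer looks like this. It assumes Cost Explorer is enabled on the account, and the dates are placeholders:

```python
# Rough sketch: group last month's spend by service so the big line items
# stand out. Dates are placeholders; adjust to the period you care about.
import boto3

ce = boto3.client("ce")  # Cost Explorer

resp = ce.get_cost_and_usage(
    TimePeriod={"Start": "2024-05-01", "End": "2024-06-01"},
    Granularity="MONTHLY",
    Metrics=["UnblendedCost"],
    GroupBy=[{"Type": "DIMENSION", "Key": "SERVICE"}],
)

for group in resp["ResultsByTime"][0]["Groups"]:
    service = group["Keys"][0]
    amount = float(group["Metrics"]["UnblendedCost"]["Amount"])
    if amount >= 1:  # skip the pennies
        print(f"{service}: ${amount:,.2f}")
```

If you tag resources by environment, you can also pass a Filter to get_cost_and_usage to limit the breakdown to your non-production tags.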
You might find the same kinds of leaks. And fixing them is usually simpler than you'd think.