"99.9% uptime guaranteed!" sounds great until you do the math. That's 8.76 hours of downtime per year—or 43.8 minutes per month. For a payment API like Stripe, that could mean thousands of failed transactions.
Most developers glance at SLA numbers without understanding what they actually mean. Then downtime hits, revenue tanks, and they realize the fine print matters.
Here's everything you need to know about API SLAs—and how to avoid getting burned.
What is an API SLA?
SLA = Service Level Agreement
It's a contract between an API provider and you (the customer) that defines:
- Uptime guarantees (99%, 99.9%, 99.99%)
- Performance targets (response time, throughput)
- Support response times (how fast they help when things break)
- Compensation (what you get when they fail to deliver)
Key point: An SLA is a promise, not a reality. It's what the provider aims for, not what you're guaranteed to experience.
The Truth About Uptime Percentages
Common SLA Tiers
| Uptime % | Downtime/Year | Downtime/Month | Downtime/Week | Real Impact |
|---|---|---|---|---|
| 90% | 36.5 days | 3 days | 16.8 hours | Unacceptable for production |
| 95% | 18.25 days | 1.5 days | 8.4 hours | Budget tier, risky |
| 99% | 3.65 days | 7.3 hours | 1.68 hours | Entry-level SaaS |
| 99.9% | 8.76 hours | 43.8 min | 10.1 min | Industry standard |
| 99.95% | 4.38 hours | 21.9 min | 5 min | High-quality APIs |
| 99.99% | 52.6 min | 4.38 min | 1.01 min | Enterprise grade |
| 99.999% | 5.26 min | 26 sec | 6 sec | "Five nines" (rare, expensive) |
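Every row in the table comes from the same arithmetic: allowed downtime equals the length of the period times (1 - uptime). Here is a minimal JavaScript sketch of that calculation. It assumes the 730-hour average month (365 × 24 ÷ 12) that the table uses, and `downtimeBudgetMinutes` is just an illustrative name.

```javascript
// Period lengths in hours, using the table's conventions.
const HOURS_PER_YEAR = 365 * 24; // 8,760
const PERIOD_HOURS = {
  year: HOURS_PER_YEAR,
  month: HOURS_PER_YEAR / 12, // 730 (average month)
  week: 7 * 24, // 168
};

// Downtime (in minutes) that an uptime percentage allows in one period.
function downtimeBudgetMinutes(uptimePercent, period) {
  const hours = PERIOD_HOURS[period];
  if (hours === undefined) throw new Error(`Unknown period: ${period}`);
  const downtimeFraction = 1 - uptimePercent / 100;
  return hours * 60 * downtimeFraction;
}

// Print a few rows of the table
for (const pct of [99, 99.9, 99.95, 99.99, 99.999]) {
  const perMonth = downtimeBudgetMinutes(pct, "month").toFixed(1);
  const perWeek = downtimeBudgetMinutes(pct, "week").toFixed(1);
  console.log(`${pct}%: ${perMonth} min/month, ${perWeek} min/week`);
}
```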
What "99.9% Uptime" Actually Means
Scenario: Your payment API has a 99.9% SLA.
You think: "Great, only 10 minutes of downtime per week!"
Reality:
- 43.8 minutes/month can happen anytime (Murphy's Law: during peak hours)
- If you process $10,000/hour, that's $7,300 in lost revenue
- Users don't care about your SLA—they just know your checkout is broken
- Some providers exclude "scheduled maintenance" from the calculation entirely (read the fine print!)
The math:
99.9% uptime = 0.1% downtime
0.1% of 730 hours/month = 43.8 minutes
43.8 minutes (0.73 hours) × $10,000/hour = $7,300 potential loss
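The same arithmetic turns into a rough revenue-at-risk estimate. This sketch assumes revenue is spread evenly across the hours of the month, which real traffic rarely is. The optional `peakMultiplier` parameter is an illustrative addition (not from any provider's terms) for outages that land during busy hours.

```javascript
const HOURS_PER_MONTH = 730; // average month: 365 × 24 ÷ 12

// Revenue at risk if the SLA's whole monthly downtime budget is used up.
// Assumes revenue is spread evenly across the month; pass peakMultiplier > 1
// to model outages that hit during busier-than-average hours.
function monthlyRevenueAtRisk(uptimePercent, revenuePerHour, peakMultiplier = 1) {
  const downtimeHours = HOURS_PER_MONTH * (1 - uptimePercent / 100);
  return downtimeHours * revenuePerHour * peakMultiplier;
}

console.log(monthlyRevenueAtRisk(99.9, 10_000).toFixed(0)); // logs: 7300
```

If your outages tend to cluster at peak traffic, a multiplier of 2 or 3 gives a more honest worst case than the flat average.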
Bottom line: Even "excellent" SLAs allow significant downtime.
How API Providers Calculate Uptime
Method 1: Simple Availability
Formula: (Total time - Downtime) / Total time
Example:
- Month: 730 hours
- Downtime: 1 hour
- Uptime: (730 - 1) / 730 = 99.86%
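In code, the simple formula needs one detail the example glosses over: if two incidents overlap (say, a database failure during a bad deploy), the shared minutes should only be counted once. A minimal sketch, assuming outages are recorded as `[startMinute, endMinute)` pairs within the month; the function name is hypothetical.

```javascript
// Time-based availability: (total time - downtime) / total time.
// Outage windows are [startMinute, endMinute) offsets within the period.
// Overlapping windows are merged so the same minutes aren't counted twice.
function timeBasedAvailability(periodMinutes, outages) {
  const sorted = [...outages].sort((a, b) => a[0] - b[0]);
  let downtime = 0;
  let currentStart = null;
  let currentEnd = null;
  for (const [start, end] of sorted) {
    if (currentEnd === null || start > currentEnd) {
      // Gap before this window: close out the previous merged window
      if (currentEnd !== null) downtime += currentEnd - currentStart;
      currentStart = start;
      currentEnd = end;
    } else {
      // Overlaps (or touches) the current window: extend it
      currentEnd = Math.max(currentEnd, end);
    }
  }
  if (currentEnd !== null) downtime += currentEnd - currentStart;
  return (periodMinutes - downtime) / periodMinutes;
}

const MONTH_MINUTES = 730 * 60;
// One 60-minute outage, as in the example above
console.log((timeBasedAvailability(MONTH_MINUTES, [[1000, 1060]]) * 100).toFixed(2)); // logs: 99.86
```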
Sounds simple, but...
Tricky parts:
- What counts as "down"?
  - Some providers only count total outages (API returns nothing)
  - Slow responses (5 seconds instead of 100ms) might not count
  - Partial outages (50% error rate) might be "up" by their definition
- How and when is downtime measured?
  - Only from successful requests? (That ignores the failed ones)
  - Only during peak hours? (That hides overnight issues)
  - With "scheduled maintenance" excluded?
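To see how much the definition of "down" matters, this sketch scores the same hypothetical per-minute samples three ways. The sample shape, the 5% error-rate threshold, and the 2-second latency cutoff are all assumptions chosen for illustration, not terms from any real SLA.

```javascript
// Each sample summarizes one minute of traffic (hypothetical data shape):
//   { errorRate: 0..1, p95LatencyMs: number }
const definitions = {
  // Down only when every request fails (total outage)
  totalOutageOnly: (s) => s.errorRate >= 1,
  // Down when more than 5% of requests fail
  errorRateOver5Pct: (s) => s.errorRate > 0.05,
  // Down when errors exceed 5% OR responses are too slow to use (> 2 s)
  errorsOrSlow: (s) => s.errorRate > 0.05 || s.p95LatencyMs > 2000,
};

// Fraction of minutes that were "up" under a given definition of down
function availability(samples, isDown) {
  const downMinutes = samples.filter(isDown).length;
  return (samples.length - downMinutes) / samples.length;
}

// A toy hour: 50 healthy minutes, 5 minutes at 50% errors, 5 minutes of 5-second responses
const samples = [
  ...Array(50).fill({ errorRate: 0, p95LatencyMs: 120 }),
  ...Array(5).fill({ errorRate: 0.5, p95LatencyMs: 300 }),
  ...Array(5).fill({ errorRate: 0, p95LatencyMs: 5000 }),
];

for (const [name, isDown] of Object.entries(definitions)) {
  console.log(name, (availability(samples, isDown) * 100).toFixed(1) + "%");
}
// logs: totalOutageOnly 100.0%
//       errorRateOver5Pct 91.7%
//       errorsOrSlow 83.3%
```

Same hour, same users, and the reported uptime ranges from a perfect 100% to 83.3% depending only on the definition.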
Method 2: Success Rate
Formula: Successful requests / Total requests
Example:
- 1 million requests
- 999,000 succeeded
- Uptime: 999,000 / 1,000,000 = 99.9%
A better metric, because it reflects user experience, not just whether the API is responding.
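A request-based version is short once you decide what a failure is. This sketch assumes 5xx responses and timeouts count as failures while 4xx responses are the caller's fault; providers draw that line differently, so treat it as an assumption to check against your contract. The request-log shape is hypothetical.

```javascript
// Request-based availability: successful requests / total requests.
// Assumption: 5xx responses and timeouts are failures; 4xx responses are
// client errors and are not held against the provider.
function successRate(requests) {
  if (requests.length === 0) return 1; // no traffic, nothing failed
  const failed = requests.filter(
    (r) => r.timedOut === true || (r.status !== undefined && r.status >= 500)
  ).length;
  return (requests.length - failed) / requests.length;
}

// 999 OK responses plus one 503: the 999,000 / 1,000,000 example at smaller scale
const log = [...Array(999).fill({ status: 200 }), { status: 503 }];
console.log((successRate(log) * 100).toFixed(1) + "%"); // logs: 99.9%
```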
SLA Fine Print: What They Don't Tell You
Exclusions (What Doesn't Count)
Most SLAs exclude:
1. Scheduled Maintenance
"We may take the service offline for up to 4 hours/month for planned maintenance with 24-hour notice."
Translation: That 99.9% SLA just became roughly 99.35% in practice.
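The maintenance math is worth making explicit: add the SLA's own downtime budget to the excluded maintenance window and compare the total with the month. A rough sketch, assuming a 730-hour month and ignoring the fact that some SLAs compute their percentage over the non-maintenance hours only; the function name is hypothetical.

```javascript
const HOURS_PER_MONTH = 730;

// Worst-case availability users can see when the SLA's downtime budget AND
// the excluded maintenance allowance are both used in the same month.
// Approximation: treats the SLA percentage as applying to the full month.
function worstCaseAvailability(slaPercent, maintenanceHoursPerMonth) {
  const slaDowntimeHours = HOURS_PER_MONTH * (1 - slaPercent / 100);
  const totalDowntimeHours = slaDowntimeHours + maintenanceHoursPerMonth;
  return 100 * (1 - totalDowntimeHours / HOURS_PER_MONTH);
}

console.log(worstCaseAvailability(99.9, 4).toFixed(2)); // logs: 99.35
```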
2. Your Fault
"Downtime caused by customer misuse, including rate limit violations or invalid API calls, is excluded."
Translation: If you hit their API too hard and it throttles you, that's on you.
3. Force Majeure (Acts of God)
"Downtime due to natural disasters, wars, pandemics, or other events beyond our control is excluded."
Translation: If AWS has a regional outage, your API provider may argue it isn't liable.
4. Third-Party Services
"We are not responsible for outages in dependencies (DNS providers, CDN networks, etc.)."
Translation: The provider's API can count as "up" even when it's unusable for you because of DNS, CDN, or other network problems.
Credits vs. Refunds
Most SLAs offer credits, not refunds:
Example (Typical SLA):
- 99.9% promised, 99% delivered → 10% credit
- 99.9% promised, 95% delivered → 25% credit
- 99.9% promised, 90% delivered → 50% credit
Say you pay $1,000/month and they're down for 7 hours:
- Lost revenue: $20,000 (your payments were offline)
- Credit: $100 (10% of your monthly bill)
The math doesn't work out. SLA credits barely compensate for actual business impact.
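Plugging the example into a credit calculator shows how quickly the numbers diverge. The tiers below are the hypothetical ones from the example list, read as thresholds (below 99.9% earns 10%, below 99% earns 25%, below 95% earns 50%); real contracts set their own.

```javascript
// Hypothetical credit schedule based on the example tiers; check your contract.
// Ordered from most to least severe so the first match wins.
const CREDIT_TIERS = [
  { below: 95, creditPct: 50 },
  { below: 99, creditPct: 25 },
  { below: 99.9, creditPct: 10 },
];
const HOURS_PER_MONTH = 730;

function slaCredit(monthlyBill, downtimeHours) {
  const availability = 100 * (1 - downtimeHours / HOURS_PER_MONTH);
  const tier = CREDIT_TIERS.find((t) => availability < t.below);
  const creditPct = tier ? tier.creditPct : 0; // SLA met: no credit
  return { availability, credit: (monthlyBill * creditPct) / 100 };
}

const { availability, credit } = slaCredit(1000, 7);
console.log(availability.toFixed(2) + "%", "$" + credit); // logs: 99.04% $100
```

Seven hours down still leaves availability above 99%, so the $1,000 customer gets $100 back against $20,000 in lost payments.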
Real API SLA Examples
Stripe
Uptime SLA: 99.99% (52 minutes/year)
Fine print:
- Scheduled maintenance excluded (up to 4 hours/quarter)
- Only counts "platform unavailability" (not slow responses)
- Credits: 10-100% depending on severity
- Must claim within 30 days
Reality: Stripe is extremely reliable, but when it does go down (March 2019, 4 hours), a huge share of online checkouts go down with it.
OpenAI
Uptime SLA: None for standard tier
GPT-4 API: Best effort only (no formal SLA)
Enterprise tier: Custom SLAs negotiated
Translation: If the OpenAI API goes down, you're SOL unless you're paying enterprise rates.
AWS
Uptime SLA: 99.99% (EC2, region-level), 99.9% (S3 Standard)
Fine print:
- Measured per region (not globally)
- Excludes "service-specific" issues
- Credits: 10-100% depending on severity
Reality: AWS is rock-solid, but regional outages happen (US-East-1 in 2021 took down half the internet).
How to Protect Yourself
1. Don't Rely on a Single API
Multi-provider strategy:
Payments:
- Primary: Stripe
- Backup: PayPal
- Failover: Auto-switch on error
AI:
- Primary: OpenAI GPT-4
- Backup: Anthropic Claude
- Fallback: Cached responses
Email:
- Primary: SendGrid
- Backup: Resend
- Failover: AWS SES
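A failover chain like the ones listed above can be expressed as a loop over providers in priority order. In this sketch each provider is a hypothetical `{ name, send }` object; in practice `send` would wrap the real SendGrid, Resend, or SES client, whose actual method names differ.

```javascript
// Tries each provider in priority order and returns the first success.
async function withFailover(providers, request) {
  const errors = [];
  for (const provider of providers) {
    try {
      return { provider: provider.name, result: await provider.send(request) };
    } catch (error) {
      // Record the failure and move on to the next provider
      errors.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed -> ${errors.join("; ")}`);
}

// Usage with stub providers standing in for real email clients
const emailProviders = [
  { name: "sendgrid", send: async () => { throw new Error("503 Service Unavailable"); } },
  { name: "resend", send: async (msg) => `queued ${msg.to}` },
  { name: "ses", send: async (msg) => `sent ${msg.to}` },
];

withFailover(emailProviders, { to: "user@example.com" }).then(console.log);
// logs: { provider: 'resend', result: 'queued user@example.com' }
```

One caution for payments: retrying on a second provider can double-charge a customer if the first request actually went through. Fail over only on errors that clearly mean the request never landed, and use the provider's idempotency features (Stripe, for example, supports idempotency keys) when retrying against the same provider.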
2. Monitor Uptime Yourself
Don't trust the provider's status page.
Use third-party monitoring:
- API Status Check - Real-time monitoring for 100+ APIs
- Datadog - Full infrastructure monitoring
- Pingdom - Uptime tracking
Why? Providers define "up" differently than you do. Monitor from your users' perspective.
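A synthetic check that measures what your users would see can be small. This is a sketch, not a replacement for a monitoring service: it assumes Node 18+ (global `fetch` and `AbortSignal.timeout`), and the `slowMs` threshold that marks a response as "degraded" is an illustrative default. The `fetchImpl` parameter exists so the check can be exercised with a stub.

```javascript
// Synthetic check: call an endpoint the way a user would and classify the result.
// Requires Node 18+ for the global fetch and AbortSignal.timeout.
async function probe(url, { timeoutMs = 5000, slowMs = 1000, fetchImpl = fetch } = {}) {
  const started = Date.now();
  try {
    const res = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    const latencyMs = Date.now() - started;
    // Server errors mean the request failed, whatever the status page says
    if (res.status >= 500) return { state: "down", status: res.status, latencyMs };
    // Answering, but too slowly to feel "up" to a user
    if (latencyMs > slowMs) return { state: "degraded", status: res.status, latencyMs };
    return { state: "up", status: res.status, latencyMs };
  } catch (error) {
    // DNS failures, connection resets and timeouts all count as down
    return { state: "down", error: error.message, latencyMs: Date.now() - started };
  }
}

// Usage with a stub that answers 503, standing in for a real health endpoint
probe("https://api.example.com/health", {
  fetchImpl: async () => ({ status: 503 }),
}).then((result) => console.log(result.state)); // logs: down
```

Run it on a schedule from the regions your users are in, and alert on the "degraded" state as well as "down".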
3. Build in Graceful Degradation
When APIs fail, don't break your entire product.
Strategies:
- Cache responses (show stale data during outages)
- Queue requests (process when API comes back)
- Show friendly errors ("Payment system temporarily unavailable, try PayPal")
Example:
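```javascript
// Assumes `stripe` and `paypal` are your own wrappers around each SDK, each
// exposing a hypothetical charge(order) method (not the real SDK method names).
async function processPayment(order) {
  try {
    return await stripe.charge(order);
  } catch (error) {
    // Declines and bad requests (4xx statusCode, if set) aren't outages: don't retry
    if (error.statusCode && error.statusCode < 500) throw error;
    // Stripe down or unreachable? Try PayPal
    return await paypal.charge(order);
  }
}
```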
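The first strategy in the list, serving stale cached data, can be sketched as a wrapper that remembers the last good response. This version keeps the cache in an in-memory `Map` purely for illustration (a multi-server app would more likely use a shared cache), and `getWithStaleFallback` is a hypothetical helper name.

```javascript
// Last good response per key (in-memory, illustration only)
const cache = new Map();

// Serve fresh data when the API answers; fall back to the last good
// response, flagged as stale, when it doesn't.
async function getWithStaleFallback(key, fetchFresh) {
  try {
    const value = await fetchFresh();
    cache.set(key, { value, storedAt: Date.now() });
    return { value, stale: false };
  } catch (error) {
    const cached = cache.get(key);
    if (cached) {
      return { value: cached.value, stale: true, ageMs: Date.now() - cached.storedAt };
    }
    // Nothing cached: surface the failure so the UI can show a friendly error
    throw error;
  }
}

async function demo() {
  await getWithStaleFallback("rates", async () => ({ usd: 1 }));
  const result = await getWithStaleFallback("rates", async () => {
    throw new Error("API down");
  });
  console.log(result.stale, result.value); // logs: true { usd: 1 }
}
demo();
```

The `stale` flag lets the UI say "prices as of 10 minutes ago" instead of pretending the data is live.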
The Bottom Line
99.9% uptime sounds good until you do the math:
- 43 minutes/month = potential revenue loss
- SLA credits rarely cover actual damages
- Fine print excludes most real-world scenarios
How to protect yourself:
- Diversify: Use multiple providers for critical APIs
- Monitor: Don't trust their status page
- Degrade gracefully: Build fallbacks into your product
- Negotiate: If you're paying serious money, get better terms
Remember: An SLA is a minimum bar, not a promise of perfection. Even the best APIs go down. Your job is to make sure your product survives when they do.
Originally published at API Status Check