Shib™ 🚀

Posted on Feb 6 • Originally published at apistatuscheck.com

Understanding API SLAs: What 99.9% Uptime Really Means

#api #sla #uptime #devops

"99.9% uptime guaranteed!" sounds great until you do the math. That's 8.76 hours of downtime per year—or 43.8 minutes per month. For a payment API like Stripe, that could mean thousands of failed transactions.

Most developers glance at SLA numbers without understanding what they actually mean. Then downtime hits, revenue tanks, and they realize the fine print matters.

Here's everything you need to know about API SLAs—and how to avoid getting burned.

What is an API SLA?

SLA = Service Level Agreement

It's a contract between an API provider and you (the customer) that defines:

Uptime guarantees (99%, 99.9%, 99.99%)
Performance targets (response time, throughput)
Support response times (how fast they help when things break)
Compensation (what you get when they fail to deliver)

Key point: An SLA is a contractual commitment, not a description of reality. It tells you what the provider owes you when it misses the target, not what uptime you'll actually experience.

The Truth About Uptime Percentages

Common SLA Tiers

Uptime %	Downtime/Year	Downtime/Month	Downtime/Week	Real Impact
90%	36.5 days	3 days	16.8 hours	Unacceptable for production
95%	18.25 days	1.5 days	8.4 hours	Budget tier, risky
99%	3.65 days	7.3 hours	1.68 hours	Entry-level SaaS
99.9%	8.76 hours	43.8 min	10.1 min	Industry standard
99.95%	4.38 hours	21.9 min	5 min	High-quality APIs
99.99%	52.6 min	4.38 min	1.01 min	Enterprise grade
99.999%	5.26 min	26 sec	6 sec	"Five nines" (rare, expensive)
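
The table values follow directly from the uptime percentage. Here's a minimal sketch of that arithmetic, assuming a 365-day year and a 730-hour average month (the figure used in the math later in this post). The name `downtimeBudget` is just illustrative.

```javascript
// Hours in each period. 730 h/month is the average month (8,760 / 12).
const HOURS = { year: 8760, month: 730, week: 168 };

// Returns the downtime an uptime percentage allows, in minutes per period.
function downtimeBudget(uptimePercent) {
  const downFraction = (100 - uptimePercent) / 100;
  const budget = {};
  for (const [period, hours] of Object.entries(HOURS)) {
    budget[period] = downFraction * hours * 60;
  }
  return budget;
}

for (const tier of [99, 99.9, 99.99, 99.999]) {
  const { year, month, week } = downtimeBudget(tier);
  console.log(
    `${tier}%: ${year.toFixed(1)} min/year, ${month.toFixed(1)} min/month, ${week.toFixed(1)} min/week`
  );
}
```

Swap in your provider's number to see the allowance it actually buys you.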

What "99.9% Uptime" Actually Means

Scenario: Your payment API has a 99.9% SLA.

You think: "Great, only 10 minutes of downtime per week!"

Reality:

43.8 minutes/month can land all at once, at any time (Murphy's Law: during peak hours)
If you process $10,000/hour, that's about $7,300 in lost revenue
Users don't care about your SLA—they just know your checkout is broken
Some providers count "scheduled maintenance" separately (read the fine print!)

The math:

99.9% uptime = 0.1% downtime
0.1% of 730 hours/month = 43.8 minutes
43.8 minutes × $10,000/hour = $7,300 potential loss
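
If you want to plug in your own numbers, the same math fits in a few lines. This is a rough sketch, not a forecast: it assumes revenue is spread evenly across the month and that the provider uses its whole downtime allowance. `revenueAtRisk` is a made-up helper name.

```javascript
// Revenue exposed if the provider uses its full downtime allowance in a period.
// hoursInPeriod defaults to 730, the average month used in this article.
function revenueAtRisk(uptimePercent, revenuePerHour, hoursInPeriod = 730) {
  const downtimeHours = ((100 - uptimePercent) / 100) * hoursInPeriod;
  return downtimeHours * revenuePerHour;
}

console.log(Math.round(revenueAtRisk(99.9, 10000)));  // 7300
console.log(Math.round(revenueAtRisk(99.99, 10000))); // 730
```

An outage during peak hours costs more than this flat average suggests, so treat the result as a floor.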

Bottom line: Even "excellent" SLAs allow significant downtime.

How API Providers Calculate Uptime

Method 1: Simple Availability

Formula: (Total time - Downtime) / Total time

Example:

Month: 730 hours
Downtime: 1 hour
Uptime: (730 - 1) / 730 = 99.86%

Sounds simple, but...

Tricky parts:

What counts as "down"?
- Some providers only count total outages (API returns nothing)
- Slow responses (5 seconds instead of 100ms) might not count
- Partial outages (50% error rate) might be "up" by their definition
How is downtime measured?
- Based only on successful requests? (Ignores failed ones)
- Only during peak hours? (Hides overnight issues)
- With "scheduled maintenance" excluded?
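
To see how much the definition of "down" matters, here's a sketch that scores the same probe data two ways. The probe shape (`ok`, `latencyMs`) and the 1,000 ms threshold are assumptions for illustration, not any provider's actual method.

```javascript
// Each probe is one health check taken at a fixed interval.
// ok: the request returned a success response; latencyMs: how long it took.
function availability(probes, isDown) {
  const downCount = probes.filter(isDown).length;
  return ((probes.length - downCount) / probes.length) * 100;
}

// Lenient definition: only hard failures count as downtime.
const hardFailuresOnly = (probe) => !probe.ok;

// Strict definition: a response slower than the threshold also counts.
const failuresOrSlow = (maxLatencyMs) => (probe) =>
  !probe.ok || probe.latencyMs > maxLatencyMs;

// 1,000 probes: 990 healthy, 8 slow (5 s), 2 failed.
const probes = [
  ...Array.from({ length: 990 }, () => ({ ok: true, latencyMs: 100 })),
  ...Array.from({ length: 8 }, () => ({ ok: true, latencyMs: 5000 })),
  ...Array.from({ length: 2 }, () => ({ ok: false, latencyMs: 0 })),
];

console.log(availability(probes, hardFailuresOnly).toFixed(1));     // 99.8
console.log(availability(probes, failuresOrSlow(1000)).toFixed(1)); // 99.0
```

Same data, nearly a full percentage point apart. Which definition the SLA uses decides whether you're owed anything.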

Method 2: Success Rate

Formula: Successful requests / Total requests

Example:

1 million requests
999,000 succeeded
Uptime: 999,000 / 1,000,000 = 99.9%

This is a better metric because it reflects what users experience, not just whether the API is responding.
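
A quick sketch of the success-rate calculation over a list of HTTP status codes. One assumption to flag: it counts only 5xx responses as failures and treats 4xx as the caller's problem. Providers draw that line differently, so check how yours does it.

```javascript
// Success rate over a list of HTTP status codes.
// Assumption: only 5xx responses count against the provider; 4xx responses
// are treated as the caller's errors and not counted as failures.
function successRate(statusCodes) {
  if (statusCodes.length === 0) return 100; // no traffic, nothing failed
  const failures = statusCodes.filter((code) => code >= 500).length;
  return ((statusCodes.length - failures) / statusCodes.length) * 100;
}

// The example above: 1,000,000 requests, 1,000 of which failed.
const codes = new Array(1000000).fill(200);
codes.fill(503, 0, 1000);
console.log(successRate(codes).toFixed(1)); // 99.9
```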

SLA Fine Print: What They Don't Tell You

Exclusions (What Doesn't Count)

Most SLAs exclude:

1. Scheduled Maintenance

"We may take the service offline for up to 4 hours/month for planned maintenance with 24-hour notice."

Translation: If they use the full 4 hours, that 99.9% SLA becomes roughly 99.35% in practice.
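
Here's a sketch of where that number comes from: add the excluded maintenance hours on top of the downtime the SLA already allows. It assumes a 730-hour month and that the provider uses every excluded hour, which is the worst case. `effectiveUptime` is an illustrative name.

```javascript
// Worst-case uptime once excluded maintenance is added to the SLA's allowance.
function effectiveUptime(slaPercent, excludedHours, hoursInPeriod = 730) {
  const allowedDownHours = ((100 - slaPercent) / 100) * hoursInPeriod;
  const totalDownHours = allowedDownHours + excludedHours;
  return (1 - totalDownHours / hoursInPeriod) * 100;
}

console.log(effectiveUptime(99.9, 4).toFixed(2)); // 99.35
```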

2. Your Fault

"Downtime caused by customer misuse, including rate limit violations or invalid API calls, is excluded."

Translation: If you hit their API too hard and it throttles you, that's on you.

3. Force Majeure (Acts of God)

"Downtime due to natural disasters, wars, pandemics, or other events beyond our control is excluded."

Translation: If AWS has a regional outage, your API provider isn't liable.

4. Third-Party Services

"We are not responsible for outages in dependencies (DNS providers, CDN networks, etc.)."

Translation: Your API might be "up" even if it's unusable due to network issues.

Credits vs. Refunds

Most SLAs offer credits, not refunds:

Example (Typical SLA):

99.9% promised, 99% delivered → 10% credit
99.9% promised, 95% delivered → 25% credit
99.9% promised, 90% delivered → 50% credit

You pay $1,000/month and they're down for 7 hours that month (about 99.04% uptime, which lands in the 10% credit tier):

Lost revenue: $20,000 (your payments were offline)
Credit: $100 (10% of your monthly bill)

The math doesn't work out. SLA credits barely compensate for actual business impact.
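
To run this comparison on your own bill, here's a sketch using the sample tiers above. It reads them as bands (99% up to the promised 99.9% earns 10%, 95% up to 99% earns 25%, anything lower earns 50%), which is my assumption about how the example is meant. Real SLAs spell out their own schedule.

```javascript
// Credit bands based on the sample tiers in this section (an interpretation,
// not any specific provider's schedule).
function creditPercent(deliveredUptime, promisedUptime = 99.9) {
  if (deliveredUptime >= promisedUptime) return 0;
  if (deliveredUptime >= 99) return 10;
  if (deliveredUptime >= 95) return 25;
  return 50; // assumption: 50% is the cap for anything below 95%
}

// Credit in dollars for a month with the given hours of downtime.
function slaCredit(monthlyBill, downtimeHours, hoursInMonth = 730) {
  const delivered = (1 - downtimeHours / hoursInMonth) * 100;
  return (monthlyBill * creditPercent(delivered)) / 100;
}

const credit = slaCredit(1000, 7);
const lostRevenue = 20000; // the figure from the example above
console.log(credit); // 100
console.log(((credit / lostRevenue) * 100).toFixed(1) + "% of losses covered"); // 0.5% of losses covered
```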

Real API SLA Examples

Stripe

Uptime SLA: 99.99% (52 minutes/year)

Fine print:

Scheduled maintenance excluded (up to 4 hours/quarter)
Only counts "platform unavailability" (not slow responses)
Credits: 10-100% depending on severity
Must claim within 30 days

Reality: Stripe is extremely reliable, but when it does go down (July 2019, roughly two hours of degraded API service), a large share of online checkouts fail with it.

OpenAI

Uptime SLA: None for standard tier

GPT-4 API: "We'll try our best" (no formal SLA)

Enterprise tier: Custom SLAs negotiated

Translation: If the OpenAI API goes down, you're SOL unless you're paying enterprise rates.

AWS

Uptime SLA: 99.99% (EC2, region-level), 99.9% (S3 Standard)

Fine print:

Measured per region (not globally)
Excludes "service-specific" issues
Credits: 10-100% depending on severity

Reality: AWS is rock-solid, but regional outages happen (the us-east-1 outage in December 2021 took many major sites and services offline).

How to Protect Yourself

1. Don't Rely on a Single API

Multi-provider strategy:

Payments:

Primary: Stripe
Backup: PayPal
Failover: Auto-switch on error

AI:

Primary: OpenAI GPT-4
Backup: Anthropic Claude
Fallback: Cached responses

Email:

Primary: SendGrid
Backup: Resend
Failover: AWS SES
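
The primary/backup/failover pattern can be sketched as a loop over providers. The `{ name, send }` objects here are placeholder wrappers you'd write around each vendor's SDK, not real SDK calls, and the stubs below only simulate an outage.

```javascript
// Try each provider in order and return the first success.
// Each provider is { name, send }, where send(request) returns a promise.
async function sendWithFailover(providers, request) {
  const errors = [];
  for (const provider of providers) {
    try {
      const result = await provider.send(request);
      return { provider: provider.name, result };
    } catch (error) {
      // Remember why this provider failed, then move on to the next one.
      errors.push(`${provider.name}: ${error.message}`);
    }
  }
  throw new Error(`All providers failed (${errors.join("; ")})`);
}

// Stub providers standing in for a primary, a backup, and a last resort.
const emailProviders = [
  { name: "primary", send: async () => { throw new Error("503 from provider"); } },
  { name: "backup", send: async (msg) => `sent "${msg.subject}"` },
  { name: "last-resort", send: async (msg) => `sent "${msg.subject}"` },
];

sendWithFailover(emailProviders, { subject: "Receipt" })
  .then((outcome) => console.log(outcome.provider)); // backup
```

This works well for email. For payments, be more careful: a request that timed out may still have gone through, so confirm the primary's outcome before charging through the backup.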

2. Monitor Uptime Yourself

Don't trust the provider's status page.

Use third-party monitoring:

API Status Check - Real-time monitoring for 100+ APIs
Datadog - Full infrastructure monitoring
Pingdom - Uptime tracking

Why? Providers define "up" differently than you do. Monitor from your users' perspective.
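
If you want a bare-bones check of your own alongside those tools, a probe can be this small. It's a sketch that assumes Node 18+ (or a browser) for the global `fetch`; the URL in the comment is hypothetical, and treating only 5xx and network errors as "down" is my choice, not a standard.

```javascript
// Probe an endpoint and classify the result from the user's point of view.
// fetchFn defaults to the global fetch; it is a parameter so the probe can
// be tested without a network.
async function probe(url, { timeoutMs = 3000, fetchFn = fetch } = {}) {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  const startedAt = Date.now();
  try {
    const response = await fetchFn(url, { signal: controller.signal });
    const latencyMs = Date.now() - startedAt;
    // 5xx means the provider failed; anything else means it answered.
    return { up: response.status < 500, status: response.status, latencyMs };
  } catch (error) {
    // Network error or timeout: the user sees a failure either way.
    return { up: false, status: null, latencyMs: Date.now() - startedAt };
  } finally {
    clearTimeout(timer);
  }
}

// Example (hypothetical URL): probe once a minute and log the result.
// setInterval(async () => console.log(await probe("https://api.example.com/health")), 60000);
```

Log the results somewhere you control. Your own numbers are what you'll need when you file an SLA claim.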

3. Build in Graceful Degradation

When APIs fail, don't break your entire product.

Strategies:

Cache responses (show stale data during outages)
Queue requests (process when API comes back)
Show friendly errors ("Payment system temporarily unavailable, try PayPal")

Example:

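// `stripe` and `paypal` stand in for your own payment client wrappers, and
// `order` and `error.status` are placeholders, not the real SDK shapes.
async function processPayment(order) {
  try {
    return await stripe.charge(order);
  } catch (error) {
    // Fail over only on outage-type errors (network failure or 5xx).
    // A declined card would be declined by the backup provider too.
    const isOutage = !error.status || error.status >= 500;
    if (!isOutage) throw error;
    return await paypal.charge(order);
  }
}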
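
The "cache responses" strategy can be sketched the same way: keep the last good response and serve it, flagged as stale, when the API fails. This is an in-memory illustration with made-up names (`withStaleFallback`, `fetchFresh`). A real app would likely want an expiry and a shared cache.

```javascript
// Wrap an async fetcher so that failures fall back to the last good value.
// `fetchFresh` is any function returning a promise, e.g. a call to an API.
function withStaleFallback(fetchFresh) {
  let lastGood = null; // { value, fetchedAt }
  return async function get() {
    try {
      const value = await fetchFresh();
      lastGood = { value, fetchedAt: Date.now() };
      return { value, stale: false };
    } catch (error) {
      if (lastGood === null) throw error; // nothing cached yet
      return { value: lastGood.value, stale: true, fetchedAt: lastGood.fetchedAt };
    }
  };
}

// Demo with a fetcher that succeeds once, then fails.
let calls = 0;
const getData = withStaleFallback(async () => {
  calls += 1;
  if (calls > 1) throw new Error("API down");
  return "fresh data";
});

getData()
  .then(() => getData())
  .then((second) => console.log(second.stale)); // true
```

The `stale` flag lets the UI say "prices may be out of date" instead of showing an error page.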

The Bottom Line

99.9% uptime sounds good until you do the math:

43.8 minutes/month = potential revenue loss
SLA credits rarely cover actual damages
Fine print excludes most real-world scenarios

How to protect yourself:

Diversify: Use multiple providers for critical APIs
Monitor: Don't trust their status page
Degrade gracefully: Build fallbacks into your product
Negotiate: If you're paying serious money, get better terms

Remember: An SLA is a minimum bar, not a promise of perfection. Even the best APIs go down. Your job is to make sure your product survives when they do.
