METHOD · JUN · 04 · 2026

Graceful Degradation: What Your AI System Should Do When a Dependency Goes Down

Most AI workflows are built as if every external dependency had 100% uptime. None does. Here is how to map your dependencies, assign realistic failure probabilities, and implement fallbacks that keep pipelines running when something breaks.

A single AI workflow can touch five or six external services before it produces output. An LLM API. A vector database. A CRM lookup. A web scraper. A third-party enrichment provider. Each one has its own uptime record — and none of them is 100%.

When one goes down, a system with no degradation plan stops completely. A 10-minute outage becomes a full pipeline halt. Queues back up. Downstream steps time out. Someone files a ticket.

The fix is not better SLAs from your vendors. It is designing for failure from the start.

The Dependency Map Exercise

Before you can degrade gracefully, you need to know what you depend on.

List every external call your workflow makes. Be exhaustive. Include:

LLM inference endpoints (OpenAI, Anthropic, self-hosted)
Vector stores and retrieval layers
CRM reads and writes
Enrichment APIs (LinkedIn data, firmographic providers)
Web fetch or scraping services
Internal microservices owned by another team
Auth and token validation endpoints

For each dependency, assign a realistic monthly uptime figure. Use the vendor's published status page history, not their marketing SLA. A 99.9% target allows only about two hours of downtime in 90 days, so a service that advertises 99.9% uptime but has had three hour-long incidents in the last 90 days is not a 99.9% service in practice.
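One way to turn status page history into an uptime figure is sketched below. The function name and the incident durations are hypothetical, chosen only to illustrate the calculation; real status pages report incidents in different formats.

```python
def observed_uptime(incident_minutes: list[float], window_days: int = 90) -> float:
    """Fraction of the window the service was up, given outage durations in minutes."""
    window_minutes = window_days * 24 * 60
    return 1.0 - sum(incident_minutes) / window_minutes


# Hypothetical history: three incidents of 60 minutes each in the last 90 days.
uptime = observed_uptime([60, 60, 60], window_days=90)
print(f"{uptime:.4%}")  # 99.8611%, below an advertised 99.9%
```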

Then do the math. If your workflow needs all five dependencies to be up, each has 99.5% uptime, and their failures are independent, the combined availability of the full chain is 0.995 multiplied by itself five times, or roughly 97.5%. That is about 18 hours of potential downtime in a 30-day month, not from negligence, just from arithmetic.
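The arithmetic can be checked with a few lines of Python. This is a minimal sketch that assumes the dependencies fail independently and that the workflow needs all of them at once; the function names are illustrative.

```python
import math


def chain_availability(uptimes: list[float]) -> float:
    """Availability of a chain that needs every dependency up at the same time."""
    return math.prod(uptimes)


def expected_downtime_hours(availability: float, hours_in_month: float = 720) -> float:
    """Expected unavailable hours in a 30-day month."""
    return (1 - availability) * hours_in_month


availability = chain_availability([0.995] * 5)
print(f"{availability:.2%}")                            # 97.52%
print(f"{expected_downtime_hours(availability):.1f}")   # 17.8
```

Swapping in the per-dependency figures from your own map shows quickly which single dependency drags the chain down the most.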

This exercise usually produces two reactions. First, surprise at how many external calls exist. Second, clarity about which dependencies are the highest risk.

Three Degradation Modes

Not every failure warrants the same response. There are three modes. Choosing the right one depends on the dependency's role in the workflow.

Full Fallback (Cached Result)

Use this when the dependency provides data that changes slowly and where a slightly stale answer is better than no answer.

Example: a firmographic enrichment call that returns company size and industry. If the enrichment API is down, serve the last cached result with a timestamp. A company's employee count from 48 hours ago is almost always good enough to continue processing.

Requirements: a cache layer with a defined TTL, a staleness threshold your team has agreed on, and a flag in the output indicating the result came from cache.
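A minimal sketch of those requirements follows. It assumes an in-memory dict in place of a real cache layer, leaves out TTL eviction, and uses a 48-hour staleness threshold taken from the example above. The class, the field names, and the injected `fetch` callable are all hypothetical.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Enrichment:
    data: dict
    fetched_at: float   # epoch seconds of the live fetch that produced the data
    from_cache: bool    # flag telling consumers the result is not live


class CachedEnrichmentClient:
    def __init__(self, fetch: Callable[[str], dict],
                 max_staleness_s: float = 48 * 3600,
                 clock: Callable[[], float] = time.time):
        self._fetch = fetch
        self._max_staleness_s = max_staleness_s
        self._clock = clock
        self._cache: dict[str, tuple[dict, float]] = {}

    def get(self, company_id: str) -> Optional[Enrichment]:
        try:
            data = self._fetch(company_id)
        except Exception:  # a real client would catch specific network errors
            return self._from_cache(company_id)
        now = self._clock()
        self._cache[company_id] = (data, now)
        return Enrichment(data, fetched_at=now, from_cache=False)

    def _from_cache(self, company_id: str) -> Optional[Enrichment]:
        cached = self._cache.get(company_id)
        if cached is None:
            return None  # nothing cached: the caller must choose another mode
        data, fetched_at = cached
        if self._clock() - fetched_at > self._max_staleness_s:
            return None  # older than the agreed staleness threshold
        return Enrichment(data, fetched_at=fetched_at, from_cache=True)
```

Returning `None` when the cache is empty or too stale keeps the decision about what happens next outside the cache code.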

Partial Fallback (Reduced Output with a Flag)

Use this when the dependency contributes to output quality but is not required for the output to exist.

Example: a workflow that generates a prospect summary using both CRM history and live web data. If the web fetch fails, produce the summary from CRM data alone. Mark the output as enrichment_incomplete: true so downstream consumers know to treat it with lower confidence.

This keeps the pipeline moving. It also creates a clear audit trail. When the dependency recovers, you can reprocess flagged records.
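The partial fallback and the later reprocessing step might look like the sketch below. The two injected callables stand in for the CRM lookup and the web fetch and are hypothetical, as is the shape of the output record; only the `enrichment_incomplete` flag comes from the example above.

```python
from typing import Callable


def build_prospect_summary(prospect_id: str,
                           get_crm_history: Callable[[str], list[str]],
                           fetch_web_data: Callable[[str], list[str]]) -> dict:
    """Build summary inputs from CRM data, adding web data when it is available."""
    crm_notes = get_crm_history(prospect_id)  # required: errors here propagate
    try:
        web_notes = fetch_web_data(prospect_id)
        enrichment_incomplete = False
    except Exception:  # a real client would catch specific network errors
        web_notes = []
        enrichment_incomplete = True
    return {
        "prospect_id": prospect_id,
        "summary_inputs": crm_notes + web_notes,
        "enrichment_incomplete": enrichment_incomplete,
    }


def records_to_reprocess(records: list[dict]) -> list[str]:
    """IDs of records produced while the optional dependency was down."""
    return [r["prospect_id"] for r in records if r["enrichment_incomplete"]]
```

The try/except sits inline here only to keep the example short. In a production pipeline it would move into a wrapper around the web fetch.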

Graceful Rejection (Fail Loudly and Fast)

Use this when the dependency is load-bearing and no partial result is better than a wrong result.

Example: a compliance check that must run before a message is sent. If the compliance API is unreachable, do not send the message. Do not guess. Reject the job immediately, log the reason, and route it to a retry queue with a backoff schedule.

The key word is fast. A graceful rejection that times out after 30 seconds is not graceful. Set aggressive timeouts — 2 to 5 seconds for most API calls — and reject early. This protects the rest of your pipeline from cascading delays.
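A rough sketch of fail-fast rejection with a backoff schedule is shown below. It assumes the compliance client accepts a `timeout` argument and raises `TimeoutError` or `ConnectionError` when unreachable; that interface, the `Rejection` type, and the 3-second default are illustrative choices, not a specific vendor's API.

```python
import logging
from dataclasses import dataclass
from typing import Callable, Optional

logger = logging.getLogger("compliance_gate")


@dataclass
class Rejection:
    job_id: str
    reason: str
    retry_in_s: Optional[float]  # None means do not retry


def backoff_delay(attempt: int, base_s: float = 5.0, cap_s: float = 300.0) -> float:
    """Exponential backoff: 5s, 10s, 20s, ... capped at cap_s."""
    return min(cap_s, base_s * 2 ** attempt)


def send_if_compliant(job_id: str, message: str,
                      check_compliance: Callable[..., bool],
                      send: Callable[[str], None],
                      attempt: int = 0,
                      timeout_s: float = 3.0) -> Optional[Rejection]:
    """Send only after a successful compliance check; never guess."""
    try:
        approved = check_compliance(message, timeout=timeout_s)
    except (TimeoutError, ConnectionError) as exc:
        reason = f"compliance check unavailable: {exc!r}"
        logger.warning("job %s rejected: %s", job_id, reason)
        return Rejection(job_id, reason, retry_in_s=backoff_delay(attempt))
    if not approved:
        return Rejection(job_id, "failed compliance check", retry_in_s=None)
    send(message)
    return None
```

The caller is expected to place any `Rejection` with a non-`None` `retry_in_s` on its retry queue and to increment `attempt` on the next try.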

The Separation Pattern

The most common implementation mistake is embedding fallback logic directly inside the main workflow function. It looks like this: a long conditional block that checks for errors, tries a cache, checks again, and eventually returns something. After two engineers touch it, nobody is confident what it actually does.

The separation pattern fixes this: treat each dependency as a wrapped client, not a raw call.

The wrapper owns three things:

The live call
The fallback behavior for that specific dependency
The logging and flagging contract

The main workflow logic calls the wrapper and receives either a result or a structured failure object. It does not need to know how the result was obtained: any from-cache or incomplete flag travels with the result, and the workflow passes it through. It does not contain any retry logic. It just processes what it receives.

This keeps degradation code isolated and testable. You can unit-test each wrapper's fallback independently. You can swap a caching strategy without touching workflow logic. When a new dependency is added, the pattern is already established.

A secondary benefit: when an incident occurs, the wrapper's logs tell you exactly which dependency failed, at what time, and which fallback mode activated. Debugging a production outage takes minutes instead of hours.
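The wrapper idea can be sketched in a few dozen lines. All names here (`DependencyClient`, `Success`, `Failure`, `process_record`) are hypothetical, and the fallback is reduced to a single optional callable to keep the example short.

```python
import logging
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union

logger = logging.getLogger("dependencies")


@dataclass
class Success:
    value: Any
    degraded: bool = False  # True when the value came from the fallback


@dataclass
class Failure:
    dependency: str
    error: str


class DependencyClient:
    """Owns the live call, the fallback, and the logging for one dependency."""

    def __init__(self, name: str, live_call: Callable[..., Any],
                 fallback: Optional[Callable[..., Any]] = None):
        self._name = name
        self._live_call = live_call
        self._fallback = fallback

    def call(self, *args: Any) -> Union[Success, Failure]:
        try:
            return Success(self._live_call(*args))
        except Exception as exc:  # a real wrapper would catch specific errors
            logger.warning("%s failed: %r", self._name, exc)
            if self._fallback is None:
                return Failure(self._name, repr(exc))
            logger.info("%s served fallback", self._name)
            return Success(self._fallback(*args), degraded=True)


def process_record(record_id: str, enrichment: DependencyClient) -> dict:
    """Main workflow: no retries, no cache checks, just handle the outcome."""
    outcome = enrichment.call(record_id)
    if isinstance(outcome, Failure):
        return {"id": record_id, "status": "rejected", "reason": outcome.error}
    return {"id": record_id, "status": "ok",
            "data": outcome.value, "degraded": outcome.degraded}
```

Because the live call and the fallback are injected, each wrapper can be unit-tested with a callable that raises on purpose, without touching `process_record`.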

What This Looks Like in Practice

A workflow that processes 200 prospect records per hour will encounter dependency failures regularly, not occasionally. A five-dependency chain at 99.5% uptime each averages about 36 minutes of unavailability per day, so a moderately complex pipeline should expect at least one partial failure event on most days.

Systems built with degradation modes handle these events without human intervention. The pipeline continues. Flagged records get reprocessed when dependencies recover. Operators see a dashboard metric, not a support ticket.

Systems without degradation modes stop, alert, and wait for someone to restart them.

The difference is not engineering heroics. It is a dependency map, three defined modes, and a separation pattern applied consistently.
