June 9, 2026 · 9 min read
Claude Fable 5 API: Migration Guide for Agent Builders
A working Claude Fable 5 API migration guide: the model swap, the 400 errors to fix, classifier fallback to Opus 4.8, and effort tuning to control cost.

Swap claude-opus-4-8 for claude-fable-5 in a production agent and one of three things happens: it works, it returns a 400, or it returns a perfectly healthy HTTP 200 that contains a refusal instead of an answer. I migrated one of our client-facing agents this morning, a few hours after Anthropic shipped Fable 5, and hit all three. This guide covers the whole path: the swap, the parameters that now error, the classifier fallback you have to wire up yourself, and the effort settings that decide whether the 2x price hike actually costs you 2x.
Quick context before the code. Fable 5 is the first Mythos-class model Anthropic has released for general use, announced June 9 at $10 per million input tokens and $50 per million output, double Opus 4.8. The benchmark gap is real: 80.3% on SWE-Bench Pro against Opus 4.8's 69.2%, and Stripe reportedly ran a 50-million-line Ruby migration through it in a single day.
It has a 1M-token context window and 128K max output tokens. None of that matters if your requests don't go through, so let's start there.
The Model Swap (and the 400 You'll Hit)
If you're coming from Opus 4.7 or 4.8, the request surface is almost identical. Almost. Fable 5 keeps every restriction those models introduced (temperature, top_p, and top_k are gone; budget_tokens is gone; last-turn assistant prefills are gone) and adds one of its own: an explicit thinking: { type: "disabled" } now returns a 400. On Opus 4.8 that was a legal way to skip reasoning. On Fable 5 you omit the thinking parameter entirely instead.
Here's the swap with the corrections applied:
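```typescript
import Anthropic from "@anthropic-ai/sdk";

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

const response = await client.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  thinking: { type: "adaptive" }, // no budget_tokens, and never { type: "disabled" }
  output_config: { effort: "high" }, // high is the default; set explicitly for clarity
  // No temperature, top_p, or top_k: each one now returns a 400.
  messages: [{ role: "user", content: "Summarize the migration plan." }],
});
```

The mistakes I see most when teams migrate older code, and what each one does on Fable 5: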
- thinking: { type: "enabled", budget_tokens: 8000 } returns a 400. Use { type: "adaptive" } and control depth with output_config.effort.
- thinking: { type: "disabled" } returns a 400. Omit the parameter. This is the only new breaking change versus Opus 4.8.
- temperature: 0.7 (or top_p, top_k) returns a 400. Delete it and steer with prompting.
- A trailing assistant message used as a prefill returns a 400. Use structured outputs (output_config.format) instead.
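The four fixes above are mechanical enough to apply in one place before a request leaves your code. Here is a minimal sketch of a hypothetical pre-flight function, toFable5Request, under a few stated assumptions: message content is plain strings, the request shape is a simplified local type rather than the SDK's own, and a trailing assistant prefill is treated as an error, because turning it into output_config.format depends on what the prefill was doing.

```typescript
// Hypothetical pre-flight rewrite for legacy request bodies headed to Fable 5.
// Simplified local types (assumption): content is plain text, not SDK blocks.

type Role = "user" | "assistant";

interface Message {
  role: Role;
  content: string;
}

type Thinking =
  | { type: "enabled"; budget_tokens: number }
  | { type: "disabled" }
  | { type: "adaptive" };

interface RequestBody {
  model: string;
  max_tokens: number;
  messages: Message[];
  temperature?: number;
  top_p?: number;
  top_k?: number;
  thinking?: Thinking;
  output_config?: { effort?: string };
}

function toFable5Request(req: RequestBody): RequestBody {
  // Pull out the fields Fable 5 rejects; only `rest` is carried forward.
  const { temperature, top_p, top_k, thinking, ...rest } = req;
  const out: RequestBody = { ...rest, model: "claude-fable-5" };

  // budget_tokens is gone: any request that wanted thinking becomes adaptive,
  // and depth is steered with output_config.effort instead.
  if (thinking && thinking.type !== "disabled") {
    out.thinking = { type: "adaptive" };
  }
  // thinking: { type: "disabled" } is dropped entirely (omitting it is the fix).

  // A trailing assistant turn is a prefill; rewriting it needs human judgment.
  const last = req.messages[req.messages.length - 1];
  if (last && last.role === "assistant") {
    throw new Error("Trailing assistant prefill: move it to output_config.format");
  }
  return out;
}
```

Throwing on the prefill case is a deliberate choice: a silent rewrite would hide the one change that actually needs a person to look at the prompt.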
One non-obvious detail worth knowing: Fable 5's minimum cacheable prefix is 2,048 tokens, half of Opus 4.8's 4,096. If you have a mid-sized system prompt that silently failed to cache on Opus, it may start caching on Fable 5. That softens the price gap more than you'd expect, since cache reads bill at roughly a tenth of base input.
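To see what the lower threshold can be worth, here is a back-of-the-envelope sketch. It assumes Fable 5's $10 per million input tokens, cache reads at 10% of that, and a system prompt re-sent on every turn; it ignores the cache-write surcharge and cache expiry, so treat the output as a rough estimate. The helper names are hypothetical.

```typescript
// Rough input cost of a system prompt re-sent every turn, cached vs. uncached.
// Assumptions: $10/MTok base input, cache reads at ~10% of base,
// no cache-write surcharge, and the cache never expires mid-session.

const FABLE5_INPUT_PER_MTOK = 10;
const CACHE_READ_FRACTION = 0.1;

// Minimum cacheable prefix sizes quoted in this article.
const MIN_CACHEABLE_PREFIX: Record<string, number> = {
  "claude-fable-5": 2048,
  "claude-opus-4-8": 4096,
};

function isCacheable(model: string, prefixTokens: number): boolean {
  const min = MIN_CACHEABLE_PREFIX[model];
  return min !== undefined && prefixTokens >= min;
}

function systemPromptCostUSD(prefixTokens: number, turns: number, cached: boolean): number {
  const perToken = FABLE5_INPUT_PER_MTOK / 1_000_000;
  if (!cached || turns === 0) return prefixTokens * turns * perToken;
  // First turn pays full price to populate the cache; later turns read from it.
  const firstTurn = prefixTokens * perToken;
  const laterTurns = prefixTokens * (turns - 1) * perToken * CACHE_READ_FRACTION;
  return firstTurn + laterTurns;
}
```

With a 3,000-token system prompt over 100 turns, isCacheable reports true for claude-fable-5 and false for claude-opus-4-8, and systemPromptCostUSD drops from about $3.00 uncached to about $0.33 cached. In the request itself, the prefix is marked for caching with a cache_control: { type: "ephemeral" } entry on the system prompt block.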
Handle the Classifier Fallback Yourself
This is the part the launch coverage glossed over, and it's the part that will bite you in production. Fable 5 runs safety classifiers over every request, covering offensive cybersecurity, biology and chemistry, and attempts to extract its reasoning for distillation. In the Claude apps, a flagged request silently falls back to Opus 4.8 and the user still gets an answer. On the raw Messages API, the default is a block. Your agent gets an HTTP 200 with stop_reason: "refusal" and no useful content.
Anthropic says the classifiers fire in under 5% of sessions, and our agents do legitimate work, so I almost skipped this step. Don't. We build agents that touch dependency audits and CVE feeds for clients, and "under 5%" includes exactly that kind of adjacent-to-security traffic. The classifiers are deliberately tuned conservative at launch.
The fix is the server-side fallback beta. You declare a fallback chain on the request, and when a classifier fires, the API re-routes to Opus 4.8 on Anthropic's side and returns a normal response:
```typescript
// `client` is the Anthropic instance from the first example; `userMessage` is your input.
const response = await client.beta.messages.create({
  model: "claude-fable-5",
  max_tokens: 16000,
  thinking: { type: "adaptive" },
  betas: ["server-side-fallback-2026-06-01"], // opt in to the fallback beta
  fallbacks: [{ model: "claude-opus-4-8" }], // re-routed here if a classifier fires
  messages: [{ role: "user", content: userMessage }],
});
```

When a fallback happens, the response's content array starts with a fallback block recording the hop, and usage.iterations itemizes which model actually served the turn. Log that. If you track cost or quality per model, attributing fallback turns to Fable 5 will quietly corrupt your numbers:
```typescript
// `logger` stands in for your application's logger.
const hop = response.content.find((block) => block.type === "fallback");
if (hop) {
  logger.info("classifier_fallback", {
    from: "claude-fable-5",
    servedBy: response.model, // "claude-opus-4-8"
    usage: response.usage, // carries usage.iterations, the per-model breakdown
  });
}
```

Two operational notes from the official fallback guide. First, the fallbacks parameter is per-request, not account-level, so audit every code path that builds a request: retry handlers, regeneration buttons, subagent spawns. A fallback you only configured on the happy path is a refusal waiting in your retry logic. Second, the billing on blocks is friendlier than you'd guess. Input tokens on a blocked request bill at $0, and the fallback attempt's input bills at cache-read rates, about 10% of base. A classifier trip costs you latency, not double tokens.
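One way to make the per-request rule hard to forget is to route every code path through a single builder. The sketch below is hypothetical: buildFableRequest and sendWithRetry are illustrative names, the request type is a local simplification, and the beta string and fallbacks field are copied from the example above rather than checked against SDK types. send is whatever function actually calls the API.

```typescript
// Hypothetical single entry point for Fable 5 requests, so retries,
// regenerations, and subagent spawns all carry the same fallback chain.

interface ChatMessage {
  role: "user" | "assistant";
  content: string;
}

interface FableRequest {
  model: "claude-fable-5";
  max_tokens: number;
  thinking: { type: "adaptive" };
  betas: string[];
  fallbacks: { model: string }[];
  messages: ChatMessage[];
}

const FALLBACK_BETA = "server-side-fallback-2026-06-01";

function buildFableRequest(messages: ChatMessage[], maxTokens = 16000): FableRequest {
  return {
    model: "claude-fable-5",
    max_tokens: maxTokens,
    thinking: { type: "adaptive" },
    betas: [FALLBACK_BETA],
    fallbacks: [{ model: "claude-opus-4-8" }],
    messages,
  };
}

// Example call site: a retry wrapper that rebuilds the request on every attempt
// instead of hand-assembling a body that might omit `fallbacks`.
async function sendWithRetry<T>(
  send: (req: FableRequest) => Promise<T>,
  messages: ChatMessage[],
  attempts = 3,
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await send(buildFableRequest(messages));
    } catch (err) {
      lastError = err; // try again with a freshly built request
    }
  }
  throw lastError;
}
```

A retry handler, a regenerate button, and a subagent spawner would all go through buildFableRequest (or sendWithRetry), so a fallback chain configured once reaches every path. In practice send might be (req) => client.beta.messages.create(req), possibly with a cast if your SDK version's types don't include fallbacks.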
Server-side fallback works on the native Claude API and Claude Platform on AWS only. On Bedrock, Vertex AI, or the Batches API you must catch stop_reason: "refusal" and re-send to Opus 4.8 yourself.
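On those platforms the fallback lives in your code. A minimal sketch, assuming a hypothetical send(model) function that re-sends the same conversation to the given model and resolves with a response exposing model and stop_reason. Bedrock and Vertex AI use their own model identifiers, so substitute those for the IDs below.

```typescript
// Client-side refusal fallback for platforms without server-side fallback.
// `send` is a hypothetical stand-in for your platform's call; only `model`
// and `stop_reason` are inspected.

interface ModelResponse {
  model: string;
  stop_reason: string | null;
}

const PRIMARY_MODEL = "claude-fable-5";
const FALLBACK_MODEL = "claude-opus-4-8";

async function withRefusalFallback<R extends ModelResponse>(
  send: (model: string) => Promise<R>,
): Promise<{ response: R; fellBack: boolean }> {
  const first = await send(PRIMARY_MODEL);
  // A classifier block arrives as a normal 200 with stop_reason "refusal".
  if (first.stop_reason !== "refusal") {
    return { response: first, fellBack: false };
  }
  // Re-send the same conversation to Opus 4.8 and record that we did.
  const second = await send(FALLBACK_MODEL);
  return { response: second, fellBack: true };
}
```

Returning fellBack alongside the response keeps per-model attribution honest: log it the same way you would log a server-side fallback block.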
Tune Effort Before You Look at the Bill
At $10/$50 per million tokens, the reflex is to assume Fable 5 doubles your spend. In practice the multiplier depends almost entirely on output_config.effort, and Fable 5 gives you five levels: low, medium, high, xhigh, and max. The default is high.
The pattern emerging from early testing, and from Anthropic's own launch material, is that Fable 5 finishes agentic work in fewer turns. Independent benchmarks clocked it finishing spreadsheet task suites 25 to 30% faster than Opus 4.8, in fewer turns. Fewer turns means fewer input-token re-reads of your conversation history, which is where long agent sessions actually burn money. On our internal document-processing agent, a five-hour Opus 4.8 run became a roughly three-hour Fable 5 run at high effort, and the total bill came out around 1.4x, not 2x.
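A toy model makes the fewer-turns argument concrete. Assume, as a simplification, that each turn adds the same number of tokens to the history and every request re-sends the whole history with no caching; cumulative input then grows with the square of the turn count. The function names are hypothetical.

```typescript
// Toy model (assumption): every turn appends `tokensPerTurn` to the history,
// and each request re-sends the entire history as input. No caching.
function cumulativeInputTokens(turns: number, tokensPerTurn: number): number {
  // Turn t re-reads t * tokensPerTurn tokens, so the sum is k * n(n + 1) / 2.
  return (tokensPerTurn * turns * (turns + 1)) / 2;
}

// Input spend of a candidate run relative to a baseline run, given how many
// turns each needs and how much more the candidate model charges per token.
function inputCostRatio(
  baselineTurns: number,
  candidateTurns: number,
  priceMultiplier: number,
): number {
  const tokensPerTurn = 1; // cancels out in the ratio
  return (
    (priceMultiplier * cumulativeInputTokens(candidateTurns, tokensPerTurn)) /
    cumulativeInputTokens(baselineTurns, tokensPerTurn)
  );
}
```

Under these assumptions, inputCostRatio(100, 60, 2) comes out around 0.72: at twice the per-token price, a run that needs 40% fewer turns re-reads so much less history that its input spend falls below the baseline. Output tokens, caching, and uneven turn sizes all move the real figure, so treat this as intuition rather than a forecast.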
My starting points, which you should treat as starting points and not gospel:
Give the model headroom at the top levels. At xhigh or max, set max_tokens to 64000 and stream the response; otherwise you'll truncate mid-task or hit SDK timeouts. And if you're running multi-hour autonomous loops, the task budget beta (output_config.task_budget) lets you hand the model a token allowance it can see and self-moderate against, which is a better lever than killing runs from the outside. This kind of cost-shaping is most of what we do in AI integration engagements lately; the model is rarely the hard part, the economics are.
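The headroom rule can live in one small helper. This is a hypothetical sketch: settingsForEffort encodes the xhigh/max rule above, while the 16000 max_tokens for lower levels simply mirrors the earlier examples and is an assumption, not guidance. task_budget is left out because its exact shape isn't covered here.

```typescript
// Hypothetical mapping from effort level to request settings.
type Effort = "low" | "medium" | "high" | "xhigh" | "max";

interface EffortSettings {
  max_tokens: number;
  stream: boolean;
  output_config: { effort: Effort };
}

function settingsForEffort(effort: Effort = "high"): EffortSettings {
  // At the top two levels, leave room for long outputs and stream them
  // so the request doesn't truncate or hit a client-side timeout.
  const topLevel = effort === "xhigh" || effort === "max";
  return {
    max_tokens: topLevel ? 64000 : 16000,
    stream: topLevel,
    output_config: { effort },
  };
}
```

Spread the result into the request body, e.g. client.messages.create({ model, messages, ...settingsForEffort("xhigh") }); with stream: true the TypeScript SDK returns an event stream to iterate rather than a single message.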
Should You Migrate at All?
A checklist before you ship the swap, based on what actually changed:
- Your agent runs long autonomous sessions. Migrate. This is the use case Fable 5 was built for, and the fewer-turns effect partially offsets the price.
- Your workload is short interactive turns. Probably stay on Opus 4.8. It keeps most of the capability at half the price, and the latency profile is similar.
- You're on a zero-retention agreement. Read the fine print first. Anthropic mandates 30-day traffic retention for all Mythos-class models, overriding prior zero-retention terms. Data isn't used for training and is deleted after the window, but for government and regulated-industry clients this is a procurement conversation, not a config change.
- You touch security, bio, or chem domains legitimately. Wire the fallback before you migrate, then watch your fallback rate for a week. If a meaningful share of your traffic routes to Opus 4.8 anyway, you're paying Fable prices for Opus answers.
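For that week of watching, a small counter is enough. A minimal sketch, assuming you record response.model for each turn (the field the fallback-logging example above reads as the serving model); FallbackRateTracker is a hypothetical name.

```typescript
// Hypothetical per-turn tracker for how often Fable 5 hands off to another model.
interface ServedTurn {
  model: string; // response.model: the model that actually served the turn
}

class FallbackRateTracker {
  private total = 0;
  private byModel = new Map<string, number>();

  record(turn: ServedTurn): void {
    this.total += 1;
    this.byModel.set(turn.model, (this.byModel.get(turn.model) ?? 0) + 1);
  }

  // Share of recorded turns that the primary model did not serve itself.
  fallbackRate(primary = "claude-fable-5"): number {
    if (this.total === 0) return 0;
    const served = this.byModel.get(primary) ?? 0;
    return (this.total - served) / this.total;
  }
}
```

Note that this counts turns, while Anthropic's under-5% figure is per session, so compare the trend over the week rather than the raw numbers.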
Fable 5 is included free on Pro, Max, Team, and Enterprise subscription plans until June 22, 2026. After that it moves to usage credits. If you want to evaluate it on real workloads, this two-week window is the cheap time to do it.
The agent I migrated this morning is still running as I write this, six hours into a document-reconciliation job that used to need a babysitter. The swap itself took ten minutes. The fallback wiring, effort tuning, and retention review took the rest of the morning, and that's the honest shape of this migration: the model ID is the easy part. Start with one agent, wire the fallback properly, watch usage.iterations for a week, and let the numbers tell you whether the rest of your fleet follows.