reasoning.effort knob onto Claude’s native
thinking and effort controls, which differ by model generation.
Configure
config.yaml:
Anthropic’s
/v1/messages requires max_tokens on every request. GoModel
injects ANTHROPIC_DEFAULT_MAX_TOKENS (default 4096) when a caller omits
it, keeping the OpenAI-compatible surface lenient.Reasoning effort mapping
GoModel accepts the OpenAI-shaped"reasoning": {"effort": "..."} object as
well as the Chat Completions string form "reasoning_effort": "..." (a
non-empty reasoning.effort wins when both are present; an empty object falls
back to the string form) and translates them to Claude’s native controls. The
five accepted levels are low, medium, high, xhigh, and max; values
are matched case-insensitively and any other value is downgraded to low and
logged. The translation
depends on whether the model supports adaptive thinking.
| Model generation | Thinking config | Effort destination |
|---|---|---|
Adaptive — claude-fable-5, claude-opus-4-8, claude-opus-4-7, claude-opus-4-6, claude-sonnet-4-6 (and dated snapshots of each) | thinking: {type: "adaptive"} | output_config.effort (passed through) |
Legacy (everything else, e.g. claude-opus-4-5, claude-3-5-sonnet) | thinking: {type: "enabled", budget_tokens: N} | mapped to a token budget |
Adaptive routing is an explicit allowlist, not a version comparison. New
model IDs are treated as legacy until added to the list. For pre-4.7 models
the legacy fallback keeps working via
budget_tokens; models from Opus 4.7
onward reject budget_tokens outright, so a new adaptive-only model ID
fails with an upstream 400 until it is added to the allowlist.max_tokens is
bumped above the budget when needed. xhigh and max are adaptive-only levels,
so on legacy models they are capped at the high budget rather than inflating
max_tokens past what those models can emit:
| Effort | Budget tokens |
|---|---|
low | 5000 |
medium | 10000 |
high / xhigh / max | 20000 |
Omit
reasoning to leave thinking off. On the adaptive models above, GoModel
only sets thinking: {type: "adaptive"} when you pass reasoning.effort;
without it those models do not engage extended thinking. Effort is a separate
control that governs overall token spend (text and tool calls) whether or not
thinking is engaged, and Anthropic defaults it to high when unset. It is a
behavioral signal for depth and verbosity, not a hard budget — actual usage
varies per request and is bounded by max_tokens.Effort levels are model-gated upstream:
xhigh is only available on Fable 5
and Opus 4.8/4.7, and max on Fable 5, Opus 4.8/4.7/4.6, and Sonnet 4.6.
GoModel forwards the level you send; Anthropic rejects it with a 400 if the
target model does not support it. Manual budget_tokens thinking is rejected
on Fable 5 and Opus 4.7/4.8, which is why GoModel uses adaptive thinking for
those models.temperature = 1. GoModel
drops any other temperature value (and logs it) rather than failing the request.
Native passthrough
To send Claude-native request fields that have no OpenAI-compatible equivalent (for example inline mid-tasksystem entries in the messages array), use the
passthrough route /p/anthropic/messages, which forwards the body verbatim.