/v1/audio/speech and /v1/audio/transcriptions
endpoints into that dialect for you.
Configure
config.yaml:
Thinking mode
MiMo models think by default, which increases token usage. Disable it per request with MiMo’sthinking parameter — GoModel forwards it unchanged:
reasoning_content field in the message history exactly as you received it.
GoModel preserves it in both directions.
1M context
Append[1m] to a model ID (for example mimo-v2.5-pro[1m]) to enable
1M-token context. These variants are usually not returned by MiMo’s /models
listing, so add them to the configured model list to make them routable:
Text-to-speech and transcription
MiMo has no native/audio/* endpoints — TTS (mimo-v2.5-tts,
mimo-v2.5-tts-voicedesign, mimo-v2.5-tts-voiceclone) and ASR
(mimo-v2.5-asr) run through chat completions. GoModel exposes both ways:
- Standard audio endpoints —
/v1/audio/speechand/v1/audio/transcriptionsare translated automatically. Speech supportsresponse_formatwav(default) andpcm;instructionsbecome the MiMo style prompt andvoiceselects a preset voice. Transcription supportsjson(default) andtextresponse formats, withlanguagepassed through to MiMo’sasr_optionsandtemperatureforwarded to the chat request. - MiMo’s chat dialect — send chat completions directly: synthesis text in
an
assistantmessage with a top-levelaudio: {format, voice}parameter, or aninput_audiocontent part whosedatais a base64data:URI for transcription. GoModel forwards these shapes untouched.
Not supported by Xiaomi MiMo
All of these returninvalid_request_error rather than silently dropping the
option:
- Embeddings.
- Speech
response_formatvalues other thanwav/pcmand non-defaultspeed(useinstructionsto adjust pace). - Transcription
verbose_json/srt/vttformats,prompt, andtimestamp_granularities(MiMo returns plain transcript text only).
MiMo-V2-Flash and V2-TTS requests auto-route to the V2.5 models (at V2.5
pricing) from June 18, 2026.