GoModel talks to vLLM through its OpenAI-compatible HTTP server. Hugging Face model IDs with slashes (e.g.
meta-llama/Llama-3.1-8B-Instruct) work, because GoModel
splits provider-qualified selectors on the first slash only.
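A minimal Python sketch of the first-slash rule follows. The helper name split_selector is hypothetical; GoModel's own parser is not shown on this page and may treat edge cases (such as a selector with no slash) differently.

```python
def split_selector(selector: str) -> tuple[str, str]:
    """Split a provider-qualified selector such as
    "vllm/meta-llama/Llama-3.1-8B-Instruct" into (provider, model).

    Hypothetical helper: only the FIRST slash separates the provider, so any
    slashes inside a Hugging Face model ID stay part of the model name.
    """
    provider, sep, model = selector.partition("/")  # partition splits once, at the first "/"
    if not sep or not provider or not model:
        raise ValueError(f"not a provider-qualified selector: {selector!r}")
    return provider, model


if __name__ == "__main__":
    print(split_selector("vllm/meta-llama/Llama-3.1-8B-Instruct"))
    # prints: ('vllm', 'meta-llama/Llama-3.1-8B-Instruct')
```

Splitting on every slash would produce three parts and break the model ID, which is why the rule stops after the first separator.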
Start vLLM first.
Configure
These examples assume GoModel runs in Docker and vLLM is on the host at
localhost:8000, so the base URL points at host.docker.internal. If both run in the same
Docker network, use the vLLM service name instead. If GoModel runs directly on the host,
use http://localhost:8000/v1.
Run GoModel
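When starting GoModel, VLLM_BASE_URL has to point at wherever vLLM is reachable from GoModel's side. As a hedged sketch, the hypothetical helper vllm_base_url maps the three layouts from the Configure notes to a base URL. The layout labels and the default service name "vllm" are illustrative assumptions; 8000 is vLLM's default port.

```python
def vllm_base_url(layout: str, service_name: str = "vllm", port: int = 8000) -> str:
    """Return a VLLM_BASE_URL value for a deployment layout (hypothetical labels)."""
    hosts = {
        # GoModel in Docker, vLLM on the host machine
        "docker-to-host": "host.docker.internal",
        # GoModel and vLLM as containers on the same Docker network
        "same-network": service_name,
        # GoModel running directly on the host
        "host": "localhost",
    }
    if layout not in hosts:
        raise ValueError(f"unknown layout {layout!r}; expected one of {sorted(hosts)}")
    # The base URL includes /v1, matching the host example in the Configure notes.
    return f"http://{hosts[layout]}:{port}/v1"


if __name__ == "__main__":
    for layout in ("docker-to-host", "same-network", "host"):
        print(f"{layout}: VLLM_BASE_URL={vllm_base_url(layout)}")
```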
Verify
GET /v1/models returns vLLM’s model IDs prefixed with the provider name, e.g.
vllm/meta-llama/Llama-3.1-8B-Instruct.
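To check the prefixing programmatically, a sketch like the following fetches the model list and keeps one provider's entries. It assumes GoModel answers GET /v1/models with the usual OpenAI list shape ({"data": [{"id": ...}]}); the GoModel address and the optional bearer key are placeholders for your own deployment, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Optional


def vllm_model_ids(models_response: dict, provider: str = "vllm") -> list[str]:
    """Keep IDs that start with "<provider>/".

    Matching on the trailing slash keeps "vllm" from also matching
    "vllm-test/..." entries registered by a second instance.
    """
    prefix = provider + "/"
    return [m["id"] for m in models_response.get("data", []) if m.get("id", "").startswith(prefix)]


def fetch_models(gomodel_url: str, api_key: Optional[str] = None) -> dict:
    """GET /v1/models from GoModel and decode the JSON body."""
    req = urllib.request.Request(gomodel_url.rstrip("/") + "/v1/models")
    if api_key:  # only if your GoModel deployment requires a key (assumption)
        req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Offline sample in the assumed response shape; use fetch_models(...) for a live check.
    sample = {"object": "list", "data": [
        {"id": "vllm/meta-llama/Llama-3.1-8B-Instruct"},
        {"id": "vllm-test/meta-llama/Llama-3.1-8B-Instruct"},
    ]}
    print(vllm_model_ids(sample))
```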
Multiple vLLM instances
Use suffixed environment variables to register more than one instance without YAML. VLLM_BASE_URL registers vllm; VLLM_TEST_BASE_URL registers vllm-test
(the suffix is lowercased and underscores become hyphens).
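The naming convention can be sketched as a pure function. This illustrates the documented rule (the text between VLLM_ and _BASE_URL, lowercased, with underscores turned into hyphens); it is not GoModel's source, and provider_name and registered_vllm_providers are hypothetical names.

```python
from collections.abc import Mapping

PREFIX = "VLLM_"
MARKER = "_BASE_URL"


def provider_name(env_var: str) -> str:
    """Map VLLM_BASE_URL -> "vllm" and VLLM_<SUFFIX>_BASE_URL -> "vllm-<suffix>"."""
    if env_var == "VLLM_BASE_URL":
        return "vllm"
    if not (env_var.startswith(PREFIX) and env_var.endswith(MARKER)):
        raise ValueError(f"not a vLLM base-URL variable: {env_var!r}")
    suffix = env_var[len(PREFIX):-len(MARKER)]
    if not suffix.strip("_"):
        raise ValueError(f"empty suffix in {env_var!r}")
    # Documented rule: lowercase the suffix, underscores become hyphens.
    return "vllm-" + suffix.lower().replace("_", "-")


def registered_vllm_providers(environ: Mapping[str, str]) -> dict[str, str]:
    """Collect provider name -> base URL for every matching variable."""
    providers = {}
    for key, value in environ.items():
        try:
            providers[provider_name(key)] = value
        except ValueError:
            continue  # unrelated variables such as VLLM_API_KEY
    return providers


if __name__ == "__main__":
    # Illustrative environment; the second port is an arbitrary example value.
    env = {
        "VLLM_BASE_URL": "http://host.docker.internal:8000/v1",
        "VLLM_TEST_BASE_URL": "http://host.docker.internal:8001/v1",
        "VLLM_API_KEY": "example-key",
    }
    print(registered_vllm_providers(env))
```

The derived name is the prefix a client puts in front of the model ID, e.g. vllm-test/meta-llama/Llama-3.1-8B-Instruct.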
vLLM passthrough
Passthrough is enabled by default. Use it for vLLM-specific endpoints such as /tokenize, /detokenize, /pooling, and /rerank.
GoModel attaches VLLM_API_KEY to upstream requests when it is configured.
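A passthrough call can be sketched with the standard library. Assumptions to verify against your GoModel and vLLM versions: GoModel is reachable at the address you pass in (http://localhost:8080 is only a placeholder), /p/vllm/tokenize forwards to vLLM's /tokenize, and the JSON body uses the fields of vLLM's tokenize request (model, prompt). Because passthrough is assumed not to translate the body, the sketch sends the upstream model ID without the vllm/ prefix. The function names are hypothetical.

```python
import json
import urllib.request


def tokenize_request(gomodel_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build (but do not send) a POST to GoModel's vLLM passthrough /tokenize route."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        gomodel_url.rstrip("/") + "/p/vllm/tokenize",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(req: urllib.request.Request) -> dict:
    """Send the request and decode the JSON reply relayed from vLLM."""
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__":
    req = tokenize_request("http://localhost:8080", "meta-llama/Llama-3.1-8B-Instruct", "Hello, vLLM!")
    print(req.get_method(), req.full_url)
    # print(send(req))  # uncomment with GoModel and vLLM running
```

The client never sends VLLM_API_KEY itself; GoModel handles that upstream hop when the key is configured.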
Passthrough routes are scoped by provider type, at
/p/vllm/.... To target one
named instance in a multi-vLLM setup, use translated /v1/... requests with
provider-qualified model IDs (e.g. vllm-test/meta-llama/Llama-3.1-8B-Instruct).
Not yet integrated
- Native vLLM batch APIs.
- OpenAI-compatible files lifecycle.
- Responses lifecycle utility endpoints.