GoModel talks to vLLM through its OpenAI-compatible HTTP server. Hugging Face model IDs that contain slashes (e.g. meta-llama/Llama-3.1-8B-Instruct) work because GoModel splits provider-qualified selectors on the first slash only, so everything after the provider name is treated as the model ID. Start vLLM first:
vllm serve meta-llama/Llama-3.1-8B-Instruct
# add --api-key token-abc123 if you want vLLM to require bearer auth
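
The first-slash rule mentioned above can be illustrated with a short Python sketch. This is not GoModel's actual code; the function name split_selector is hypothetical and only shows why a model ID with its own slashes survives the split intact.

```python
def split_selector(selector: str) -> tuple[str, str]:
    """Split a provider-qualified selector on the first slash only.

    Hypothetical helper for illustration; GoModel's real implementation
    may differ in naming and error handling.
    """
    provider, sep, model = selector.partition("/")
    if not sep or not provider or not model:
        raise ValueError(f"not a provider-qualified selector: {selector!r}")
    return provider, model


# The Hugging Face ID keeps its own slash because only the first one splits.
print(split_selector("vllm/meta-llama/Llama-3.1-8B-Instruct"))
# -> ('vllm', 'meta-llama/Llama-3.1-8B-Instruct')
```

Splitting on the first slash rather than the last is what lets the provider name stay unambiguous while the remainder is handed to vLLM unchanged.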

Configure

VLLM_BASE_URL=http://host.docker.internal:8000/v1   # include /v1
# VLLM_API_KEY=token-abc123                         # only if vLLM was started with --api-key
GOMODEL_MASTER_KEY=change-me
These examples assume GoModel runs in Docker and vLLM runs on the host at localhost:8000, which is why the base URL uses host.docker.internal. If both run in the same Docker network, use the vLLM service name as the host instead. If GoModel runs directly on the host, use http://localhost:8000/v1.

Run GoModel

docker run --rm -p 8080:8080 --env-file .env enterpilot/gomodel

Verify

curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vllm/meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Reply with exactly ok."}]
  }'
GET /v1/models returns vLLM’s model IDs prefixed by provider name, e.g. vllm/meta-llama/Llama-3.1-8B-Instruct.
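
The same verification request can be built from Python using only the standard library. The sketch below mirrors the curl call above; the helper name build_chat_request is hypothetical, and the response handling in the comments assumes the usual OpenAI-style chat completion shape.

```python
import json
import urllib.request


def build_chat_request(gateway_url, master_key, model, prompt):
    """Build (but do not send) a chat completion request for GoModel.

    Hypothetical helper; the URL, key, and model mirror the curl example.
    """
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=gateway_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {master_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_chat_request(
    "http://localhost:8080",
    "change-me",
    "vllm/meta-llama/Llama-3.1-8B-Instruct",
    "Reply with exactly ok.",
)

# Sending requires GoModel and vLLM to be running, so it is left commented out.
# Assumption: the response follows the OpenAI chat completion layout.
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Note that the model field carries the provider prefix (vllm/...), matching the IDs that GET /v1/models returns.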

Multiple vLLM instances

To register more than one vLLM instance without YAML, use suffixed environment variables:
VLLM_BASE_URL=http://host.docker.internal:8000/v1
VLLM_TEST_BASE_URL=http://host.docker.internal:8001/v1
VLLM_BASE_URL registers vllm. VLLM_TEST_BASE_URL registers vllm-test (suffix is lowercased, underscores become hyphens).
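
The naming rule above can be sketched in Python as follows. The function provider_name is hypothetical and only models the rule as stated (suffix lowercased, underscores turned into hyphens); the multi-word case VLLM_GPU_A_BASE_URL is an extrapolation from that rule, not something the source confirms.

```python
def provider_name(env_var, base="VLLM", tail="_BASE_URL"):
    """Map an env var name to the provider name it would register.

    Illustrative only; returns None for names that do not fit the pattern.
    """
    if not (env_var.startswith(base) and env_var.endswith(tail)):
        return None
    # Text between the base prefix and the tail: "" or something like "_TEST".
    suffix = env_var[len(base):len(env_var) - len(tail)]
    if suffix == "":
        return base.lower()
    if not suffix.startswith("_") or len(suffix) == 1:
        return None
    # Lowercase the suffix and turn underscores into hyphens.
    return base.lower() + "-" + suffix[1:].lower().replace("_", "-")


print(provider_name("VLLM_BASE_URL"))        # -> vllm
print(provider_name("VLLM_TEST_BASE_URL"))   # -> vllm-test
print(provider_name("VLLM_GPU_A_BASE_URL"))  # -> vllm-gpu-a (extrapolated)
```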

vLLM passthrough

Passthrough is enabled by default. Use it for vLLM-specific endpoints such as /tokenize, /detokenize, /pooling, and /rerank:
curl -s http://localhost:8080/p/vllm/tokenize \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "Hello"}'
GoModel strips client auth headers before forwarding and applies VLLM_API_KEY to upstream requests when configured.
Passthrough routes are provider-type scoped at /p/vllm/.... To target one named instance in a multi-vLLM setup, use translated /v1/... requests with provider-qualified model IDs (e.g. vllm-test/meta-llama/Llama-3.1-8B-Instruct).
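
The header handling described above can be pictured with the following Python sketch. It is not GoModel's implementation: the function upstream_headers is hypothetical, and it assumes that "client auth headers" means the Authorization header and that the upstream key is sent as a bearer token.

```python
def upstream_headers(client_headers, vllm_api_key=None):
    """Return the headers a passthrough request might carry upstream.

    Assumption: only Authorization counts as a client auth header here.
    """
    # Drop the client's own credentials (the GoModel master key).
    forwarded = {
        name: value
        for name, value in client_headers.items()
        if name.lower() != "authorization"
    }
    # Apply the upstream key only when one is configured.
    if vllm_api_key:
        forwarded["Authorization"] = f"Bearer {vllm_api_key}"
    return forwarded


incoming = {
    "Authorization": "Bearer change-me",
    "Content-Type": "application/json",
}
print(upstream_headers(incoming, vllm_api_key="token-abc123"))
# -> {'Content-Type': 'application/json', 'Authorization': 'Bearer token-abc123'}
```

The point of the separation is that the GoModel master key never reaches vLLM, and the vLLM key never has to be shared with clients.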

Not yet integrated

  • Native vLLM batch APIs.
  • OpenAI-compatible files lifecycle.
  • Responses lifecycle utility endpoints.