vLLM - GoModel

GoModel talks to vLLM through its OpenAI-compatible HTTP server. Hugging Face model IDs that contain slashes (e.g. meta-llama/Llama-3.1-8B-Instruct) work because GoModel splits provider-qualified selectors on the first slash only. Start vLLM first:

vllm serve meta-llama/Llama-3.1-8B-Instruct
# add --api-key token-abc123 if you want vLLM to require bearer auth

Configure

VLLM_BASE_URL=http://host.docker.internal:8000/v1   # include /v1
# VLLM_API_KEY=token-abc123                         # only if vLLM was started with --api-key
GOMODEL_MASTER_KEY=change-me

These examples assume GoModel runs in Docker and vLLM runs on the host at localhost:8000, which is why the base URL uses host.docker.internal. If both run in the same Docker network, use the vLLM service name instead. If GoModel runs directly on the host, use http://localhost:8000/v1.
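As a rough sketch of the two alternatives just described, the .env entry might look like the following. The service name vllm is an assumption for illustration; substitute whatever name your Docker network or Compose file gives the vLLM container.

```shell
# GoModel and vLLM on the same Docker network.
# "vllm" is a hypothetical service name, not something GoModel requires.
VLLM_BASE_URL=http://vllm:8000/v1

# GoModel running directly on the host instead:
# VLLM_BASE_URL=http://localhost:8000/v1
```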

Run GoModel

# Docker, reading the variables above from .env
docker run --rm -p 8080:8080 --env-file .env enterpilot/gomodel

# Docker, passing the variables inline
docker run --rm -p 8080:8080 \
  -e GOMODEL_MASTER_KEY="change-me" \
  -e VLLM_BASE_URL="http://host.docker.internal:8000/v1" \
  enterpilot/gomodel

# From a source checkout
make build
./bin/gomodel

Verify

curl -s http://localhost:8080/v1/chat/completions \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "vllm/meta-llama/Llama-3.1-8B-Instruct",
    "messages": [{"role": "user", "content": "Reply with exactly ok."}]
  }'

GET /v1/models returns vLLM’s model IDs prefixed by provider name, e.g. vllm/meta-llama/Llama-3.1-8B-Instruct.
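The first-slash split is what lets a provider-qualified ID like this carry a Hugging Face model name that itself contains a slash. A minimal shell illustration of that rule follows; it only mirrors the behaviour described on this page and is not GoModel's code.

```shell
# Split a provider-qualified selector on the first slash only.
selector="vllm/meta-llama/Llama-3.1-8B-Instruct"

provider="${selector%%/*}"   # drop the longest "/..." suffix: text before the first slash
model="${selector#*/}"       # drop the shortest ".../" prefix: text after the first slash

echo "provider=$provider"
echo "model=$model"
```

The provider part is the name GoModel registered (vllm here), and the remainder is the model ID as vLLM knows it.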

Multiple vLLM instances

To register more than one vLLM instance without YAML, use suffixed environment variables:

VLLM_BASE_URL=http://host.docker.internal:8000/v1
VLLM_TEST_BASE_URL=http://host.docker.internal:8001/v1

VLLM_BASE_URL registers vllm. VLLM_TEST_BASE_URL registers vllm-test (suffix is lowercased, underscores become hyphens).
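The naming rule can be illustrated in shell. This is only a sketch of the rule as stated here, not GoModel's actual implementation, and the function name provider_name is hypothetical.

```shell
# Hypothetical helper: derive the provider name from a VLLM_*_BASE_URL variable name.
provider_name() {
  middle="${1#VLLM}"            # "_TEST_BASE_URL", or "_BASE_URL" for the default
  middle="${middle%_BASE_URL}"  # "_TEST", or "" for the default
  suffix="${middle#_}"          # "TEST", or ""
  if [ -z "$suffix" ]; then
    echo "vllm"
  else
    # Lowercase the suffix, then turn underscores into hyphens.
    printf 'vllm-%s\n' "$(printf '%s' "$suffix" | tr '[:upper:]' '[:lower:]' | tr '_' '-')"
  fi
}

provider_name VLLM_BASE_URL
provider_name VLLM_TEST_BASE_URL
```

The resulting name is the prefix to use in provider-qualified model IDs for that instance.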

vLLM passthrough

Passthrough is enabled by default. Use it for vLLM-specific endpoints such as /tokenize, /detokenize, /pooling, /rerank:

curl -s http://localhost:8080/p/vllm/tokenize \
  -H "Authorization: Bearer change-me" \
  -H "Content-Type: application/json" \
  -d '{"model": "meta-llama/Llama-3.1-8B-Instruct", "prompt": "Hello"}'

GoModel strips client auth headers before forwarding and applies VLLM_API_KEY to upstream requests when configured.

Passthrough routes are provider-type scoped at /p/vllm/.... To target one named instance in a multi-vLLM setup, use translated /v1/... requests with provider-qualified model IDs (e.g. vllm-test/meta-llama/Llama-3.1-8B-Instruct).
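A request aimed at one named instance might be built as sketched below. The helper chat_body is hypothetical, and the example assumes a vllm-test provider is registered and serves the same model; adjust both to your setup.

```shell
# Hypothetical helper: build a chat-completions body for one named vLLM instance.
#   $1 = provider name as registered by GoModel (e.g. vllm-test)
#   $2 = model ID as served by that vLLM instance
chat_body() {
  printf '{"model": "%s/%s", "messages": [{"role": "user", "content": "Reply with exactly ok."}]}\n' "$1" "$2"
}

# Usage (needs a running GoModel with a vllm-test provider):
# chat_body vllm-test meta-llama/Llama-3.1-8B-Instruct |
#   curl -s http://localhost:8080/v1/chat/completions \
#     -H "Authorization: Bearer change-me" \
#     -H "Content-Type: application/json" \
#     -d @-
```

Only the prefix before the first slash changes between instances; the rest of the request is the same as in the Verify step.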

Not yet integrated

Native vLLM batch APIs.
OpenAI-compatible files lifecycle.
Responses lifecycle utility endpoints.
