Cache - GoModel

Overview

GoModel ships with two response-cache layers for non-streaming requests on:

/v1/chat/completions
/v1/responses
/v1/messages
/v1/embeddings

Exact-match cache returns byte-identical responses with:

X-Cache: HIT (exact)

Semantic cache uses embeddings plus vector search so meaning-equivalent prompts can reuse a stored response:

X-Cache: HIT (semantic)

Enable the exact cache

Point response caching at Redis:

cache:
  response:
    simple:
      redis:
        url: redis://localhost:6379
        ttl: 3600

You can also configure it with environment variables:

REDIS_URL
REDIS_KEY_RESPONSES
REDIS_TTL_RESPONSES

Enable semantic caching

Add a semantic block with an embedder provider and a vector store:

cache:
  response:
    semantic:
      enabled: true
      embedder:
        provider: openai
      vector_store:
        type: qdrant
        qdrant:
          url: http://localhost:6333
          collection: gomodel_semantic

Supported vector stores:

qdrant
pgvector
pinecone
weaviate

Both cache layers run after workflow and guardrail patching, so they operate on the final request sent upstream. Use Cache-Control: no-cache or Cache-Control: no-store to bypass caching per request. For the full semantic-cache design and storage options, see ADR-0006.

What the exact cache keys on

The exact cache hashes:

the request path
the resolved workflow context used for execution specifically execution mode, provider type, and resolved model
the final request body

This means guardrails and workflows affect cache keys when they change the resolved workflow or the final body sent through execution.

`user_path` behavior

For the exact cache, user_path is not added to the cache key by itself. That is intentional. If two requests end up with the same path, resolved workflow, and final request body, they can share the same exact-cache entry even when they originate from different user_path values.

If you need tenant or path-specific cache behavior, use a scoped workflow or otherwise make the final request differ for that scope. user_path alone is not an exact-cache partition key.

Common patterns:

disable cache in a scoped workflow
use different scoped workflows for different user_path values
include scope-specific context so the final request body differs

Cache analytics

When response caching and usage tracking are enabled, the admin API exposes a cached-only overview at:

/admin/cache/overview

Cached usage entries are also visible in the regular usage log and summary endpoints.

​Overview

​Enable the exact cache

​Enable semantic caching

​What the exact cache keys on

​user_path behavior

​Cache analytics

Overview

Enable the exact cache

Enable semantic caching

What the exact cache keys on

`user_path` behavior

Cache analytics