Skip to main content

Overview

GoModel ships with two response-cache layers for non-streaming requests on:
  • /v1/chat/completions
  • /v1/responses
  • /v1/embeddings
Exact-match cache returns byte-identical responses with:
X-Cache: HIT (exact)
Semantic cache uses embeddings plus vector search so meaning-equivalent prompts can reuse a stored response:
X-Cache: HIT (semantic)

Enable the exact cache

Point response caching at Redis:
cache:
  response:
    simple:
      redis:
        url: redis://localhost:6379
        ttl: 3600
You can also configure it with environment variables:
  • REDIS_URL
  • REDIS_KEY_RESPONSES
  • REDIS_TTL_RESPONSES

Enable semantic caching

Add a semantic block with an embedder provider and a vector store:
cache:
  response:
    semantic:
      enabled: true
      embedder:
        provider: openai
      vector_store:
        type: qdrant
        qdrant:
          url: http://localhost:6333
          collection: gomodel_semantic
Supported vector stores:
  • qdrant
  • pgvector
  • pinecone
  • weaviate
Both cache layers run after workflow and guardrail patching, so they operate on the final request sent upstream. Use Cache-Control: no-cache or Cache-Control: no-store to bypass caching per request. For the full semantic-cache design and storage options, see ADR-0006.

What the exact cache keys on

The exact cache hashes:
  • the request path
  • the resolved workflow context used for execution specifically execution mode, provider type, and resolved model
  • the final request body
This means guardrails and workflows affect cache keys when they change the resolved workflow or the final body sent through execution.

user_path behavior

For the exact cache, user_path is not added to the cache key by itself. That is intentional. If two requests end up with the same path, resolved workflow, and final request body, they can share the same exact-cache entry even when they originate from different user_path values.
If you need tenant or path-specific cache behavior, use a scoped workflow or otherwise make the final request differ for that scope. user_path alone is not an exact-cache partition key.
Common patterns:
  • disable cache in a scoped workflow
  • use different scoped workflows for different user_path values
  • include scope-specific context so the final request body differs

Cache analytics

When response caching and usage tracking are enabled, the admin API exposes a cached-only overview at:
/admin/api/v1/cache/overview
Cached usage entries are also visible in the regular usage log and summary endpoints.