Overview
GoModel ships with two response-cache layers for non-streaming requests on:/v1/chat/completions/v1/responses/v1/embeddings
Enable the exact cache
Point response caching at Redis:REDIS_URLREDIS_KEY_RESPONSESREDIS_TTL_RESPONSES
Enable semantic caching
Add asemantic block with an embedder provider and a vector store:
qdrantpgvectorpineconeweaviate
Cache-Control: no-cache or
Cache-Control: no-store to bypass caching per request.
For the full semantic-cache design and storage options, see
ADR-0006.
What the exact cache keys on
The exact cache hashes:- the request path
- the resolved workflow context used for execution specifically execution mode, provider type, and resolved model
- the final request body
user_path behavior
For the exact cache, user_path is not added to the cache key by itself.
That is intentional. If two requests end up with the same path, resolved
workflow, and final request body, they can share the same exact-cache
entry even when they originate from different user_path values.
If you need tenant or path-specific cache behavior, use a scoped workflow or
otherwise make the final request differ for that scope.
user_path alone is
not an exact-cache partition key.- disable cache in a scoped workflow
- use different scoped workflows for different
user_pathvalues - include scope-specific context so the final request body differs