Essays · 8 min read
Cheap hits, confident wrong answers
Prefix caching is a fact; semantic caching is a bet. One is free and lossless, the other can return a confident, well-formatted, wrong answer with an HTTP 200. Both are true in the same architecture diagram.
llm-inference · semantic-caching · finops · prefix-caching