Cache (LLM Cache)
Storing previous AI responses for reuse. Saves costs and speeds up repeated queries.
Why it matters
Many AI queries are repetitive. Caching means you pay for the first answer, then serve identical questions for free — cutting costs by 40-80% in typical business deployments.
In practice
Our LLM routing uses content-hash caching: identical queries return cached results instantly. FAQ matching handles ~40% of chat queries for free before any LLM is called.
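The idea behind content-hash caching can be sketched in a few lines: hash a normalized form of the query and use that hash as the cache key, so identical (or trivially different) questions map to the same stored answer. This is a minimal illustration, not the actual routing code; the class and method names are hypothetical.

```python
import hashlib

class ContentHashCache:
    """Minimal sketch of content-hash caching: the normalized query
    text is hashed, so identical queries hit the same cache entry."""

    def __init__(self):
        self._store = {}  # hash -> cached response

    def _key(self, query: str) -> str:
        # Normalize so whitespace/case differences still hit the cache.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query: str):
        # Returns the cached response, or None on a cache miss.
        return self._store.get(self._key(query))

    def put(self, query: str, response: str) -> None:
        self._store[self._key(query)] = response

cache = ContentHashCache()
cache.put("What are your opening hours?", "We are open 9-5, Mon-Fri.")
# An identical query (modulo case/whitespace) is served from cache,
# with no LLM call and no cost:
hit = cache.get("what are your  OPENING hours?")
```

Real deployments typically add an expiry time (TTL) and a shared store such as Redis so the cache survives restarts and is shared across servers.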
Related terms
Cost Tracking
Monitoring every AI call: model used, tokens in/out, cost, cache status. Essential for profitable operations.
Ollama
An open-source tool for running AI models locally on your own hardware. Free, private, and fast.
Fallback
An alternative approach used when the primary method fails (e.g., if Ollama is unavailable, the Claude API takes over).
Token Budget
A limit on how many tokens an agent may use for a task.
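The terms above fit together in a single routing flow: check the cache first, try the free local model next, and fall back to a paid cloud API only when needed. The sketch below is illustrative; `route_query`, `SimpleCache`, and the stub model functions are hypothetical names, not a real API.

```python
class SimpleCache:
    """Tiny in-memory cache used for the demo below."""
    def __init__(self):
        self._store = {}
    def get(self, query):
        return self._store.get(query)
    def put(self, query, answer):
        self._store[query] = answer

def route_query(query, cache, local_llm, cloud_llm):
    """Hypothetical routing sketch: cache first, local model next,
    cloud API as fallback. Returns (answer, source)."""
    cached = cache.get(query)
    if cached is not None:
        return cached, "cache"      # free: no model called
    try:
        answer = local_llm(query)   # e.g. a local Ollama model
        source = "local"
    except Exception:
        answer = cloud_llm(query)   # fallback: e.g. the Claude API
        source = "fallback"
    cache.put(query, answer)        # future identical queries are free
    return answer, source

# Demo with stub models: the local model is down, so the cloud
# fallback answers; the second identical query hits the cache.
def broken_local(query):
    raise RuntimeError("local model unavailable")

def cloud(query):
    return "answer from cloud API"

cache = SimpleCache()
first = route_query("What is caching?", cache, broken_local, cloud)
second = route_query("What is caching?", cache, broken_local, cloud)
```

A production version would also record model, tokens, cost, and cache status for each call (Cost Tracking) and enforce a per-task Token Budget before dispatching to any model.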