Cache (LLM Cache)
Storing previous AI responses for reuse. Saves costs and speeds up repeated queries.
Why it matters
Many AI queries are repetitive. Caching means you pay for the first answer, then serve identical questions for free — cutting costs by 40-80% in typical business deployments.
In practice
Our LLM routing uses content-hash caching: identical queries return cached results instantly. FAQ matching handles ~40% of chat queries for free before any LLM is called.
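The idea behind content-hash caching can be sketched in a few lines: hash a normalized form of the query and use that hash as the cache key, so identical (or trivially different) questions map to the same stored answer. This is a minimal illustration, not the actual routing code; the class and method names are hypothetical.

```python
import hashlib

class ContentHashCache:
    """Minimal sketch of content-hash caching: the normalized query
    text is hashed, so identical queries hit the same cache entry."""

    def __init__(self):
        self._store = {}  # hash -> cached response

    def _key(self, query: str) -> str:
        # Normalize so whitespace/case differences still hit the cache.
        normalized = " ".join(query.lower().split())
        return hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    def get(self, query: str):
        # Returns the cached response, or None on a cache miss.
        return self._store.get(self._key(query))

    def put(self, query: str, response: str) -> None:
        self._store[self._key(query)] = response

cache = ContentHashCache()
cache.put("What are your opening hours?", "We are open 9-5, Mon-Fri.")
# An identical query (modulo case/whitespace) is served from cache,
# with no LLM call and no cost:
hit = cache.get("what are your  OPENING hours?")
```

Real deployments typically add an expiry time (TTL) and a shared store such as Redis so the cache survives restarts and is shared across servers.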
Related terms
Cost Tracking
Monitoring every AI call: model used, tokens in/out, cost, cache status. Essential for profitable operations.
Ollama
An open-source tool for running AI models locally on your own hardware. Free, private, and fast.
Fallback
An alternative approach used when the primary method fails (e.g., if Ollama is unavailable, the Claude API takes over).
Token Budget
A limit on how many tokens an agent may use for a task.
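The terms above fit together in a single routing flow: check the cache first, try the free local model next, and fall back to a paid cloud API only when needed. The sketch below is illustrative; `route_query`, `SimpleCache`, and the stub model functions are hypothetical names, not a real API.

```python
class SimpleCache:
    """Tiny in-memory cache used for the demo below."""
    def __init__(self):
        self._store = {}
    def get(self, query):
        return self._store.get(query)
    def put(self, query, answer):
        self._store[query] = answer

def route_query(query, cache, local_llm, cloud_llm):
    """Hypothetical routing sketch: cache first, local model next,
    cloud API as fallback. Returns (answer, source)."""
    cached = cache.get(query)
    if cached is not None:
        return cached, "cache"      # free: no model called
    try:
        answer = local_llm(query)   # e.g. a local Ollama model
        source = "local"
    except Exception:
        answer = cloud_llm(query)   # fallback: e.g. the Claude API
        source = "fallback"
    cache.put(query, answer)        # future identical queries are free
    return answer, source

# Demo with stub models: the local model is down, so the cloud
# fallback answers; the second identical query hits the cache.
def broken_local(query):
    raise RuntimeError("local model unavailable")

def cloud(query):
    return "answer from cloud API"

cache = SimpleCache()
first = route_query("What is caching?", cache, broken_local, cloud)
second = route_query("What is caching?", cache, broken_local, cloud)
```

A production version would also record model, tokens, cost, and cache status for each call (Cost Tracking) and enforce a per-task Token Budget before dispatching to any model.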