Inference

The process of an AI model generating a response or prediction from input data.

Why it matters

Inference is where the operating cost accrues: every query consumes compute. Understanding it lets you optimize: batch queries, cache repeated results, and route each request to the cheapest model that can handle it.

In practice

We route inference strategically: FAQ matching avoids it entirely, Ollama handles simple tasks locally (free), and the Claude API is reserved for complex reasoning.
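The tiered routing above can be sketched as a simple dispatcher. This is a minimal illustration, not the actual implementation: the FAQ store, the complexity heuristic, and the backend placeholders are all assumptions introduced here.

```python
# Sketch of tiered inference routing: cheapest tier that can answer wins.
# FAQ table, is_complex heuristic, and backend stubs are hypothetical.

FAQ = {
    "what are your hours?": "We are open 9-5, Monday to Friday.",
}

def is_complex(query: str) -> bool:
    # Placeholder heuristic: treat long queries as needing deeper reasoning.
    return len(query.split()) > 20

def route(query: str) -> str:
    key = query.strip().lower()
    if key in FAQ:
        # Tier 1: exact FAQ hit, no model inference at all.
        return FAQ[key]
    if not is_complex(query):
        # Tier 2: simple query, send to a local model (e.g. via Ollama).
        return f"[local model] {query}"
    # Tier 3: complex reasoning, reserved for the paid API.
    return f"[claude api] {query}"
```

In practice the FAQ lookup might use fuzzy or embedding-based matching rather than exact keys, and the complexity check might be a small classifier; the ordering of the tiers is the point.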
