Latency
The time it takes for an agent to respond or act.
Why it matters
Agent tasks trade instant response for thorough, autonomous work. The right latency depends on the use case.
In practice
Our chat widget responds in under 2s for FAQ matches, 3-5s for local Ollama responses, and 5-10s when the Claude API is needed.
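The tiered routing above can be sketched roughly as follows. This is a minimal illustration, not the widget's actual code: match_faq, ask_ollama, and ask_claude are hypothetical stand-ins (here simple stubs) for the real FAQ matcher, local model call, and Claude API call.

```python
# Hypothetical tier functions -- stubs standing in for the real
# FAQ matcher, local Ollama call, and Claude API call.
def match_faq(query):
    faqs = {"what are your hours?": "We're open 9-5, Mon-Fri."}
    return faqs.get(query.lower())   # fastest tier: simple lookup

def ask_ollama(query):
    return None   # pretend the local model has no confident answer

def ask_claude(query):
    return f"(Claude) Detailed answer to: {query}"

def respond(query):
    """Try the fastest tier first, falling through to slower ones."""
    for tier in (match_faq, ask_ollama, ask_claude):
        answer = tier(query)
        if answer:
            return answer

print(respond("What are your hours?"))  # served by the FAQ tier
print(respond("Explain latency"))       # falls through to Claude
```

The ordering is the point: each tier is slower but more capable than the last, so typical queries never pay the worst-case latency.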
Related terms
Inference
The process of an AI model generating a response or prediction from input data.
Cache (LLM Cache)
Storing previous AI responses for reuse. Saves costs and speeds up repeated queries.
Ollama
A tool for running AI models locally: free, private, and fast.
Fallback
An alternative approach when the primary method fails (e.g., Ollama fails, Claude API takes over).
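The cache and fallback terms above can be combined in one small sketch. The helpers are hypothetical stand-ins, not real clients; here the Ollama call always fails so the Claude fallback path is exercised, and the result is cached for reuse.

```python
# Minimal sketch of cache + fallback (hypothetical helpers).
cache = {}  # query -> previously generated answer

def ask_ollama(query):
    raise ConnectionError("Ollama server unreachable")

def ask_claude(query):
    return f"Claude answer to: {query}"

def answer(query):
    if query in cache:                   # cache hit: no inference at all
        return cache[query]
    try:
        result = ask_ollama(query)       # primary: local, free, private
    except ConnectionError:
        result = ask_claude(query)       # fallback: Claude API takes over
    cache[query] = result                # store for repeated queries
    return result

answer("What is latency?")               # first call: falls back to Claude
print(answer("What is latency?"))        # second call: served from cache
```

A real implementation would also bound the cache size and expire stale entries, but the shape is the same: check cache, try the cheap path, fall back, store.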