Latency
The time it takes for an agent to respond or act.
Why it matters
Agent tasks trade instant response for thorough, autonomous work. The right latency depends on the use case.
In practice
Our chat widget responds in under 2s for FAQ matches, 3-5s for local Ollama responses, and 5-10s when the Claude API is needed.
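The tiered routing above can be sketched roughly as follows. This is a minimal illustration, not the widget's actual code: match_faq, ask_ollama, and ask_claude are hypothetical stand-ins (here simple stubs) for the real FAQ matcher, local model call, and Claude API call.

```python
# Hypothetical tier functions -- stubs standing in for the real
# FAQ matcher, local Ollama call, and Claude API call.
def match_faq(query):
    faqs = {"what are your hours?": "We're open 9-5, Mon-Fri."}
    return faqs.get(query.lower())   # fastest tier: simple lookup

def ask_ollama(query):
    return None   # pretend the local model has no confident answer

def ask_claude(query):
    return f"(Claude) Detailed answer to: {query}"

def respond(query):
    """Try the fastest tier first, falling through to slower ones."""
    for tier in (match_faq, ask_ollama, ask_claude):
        answer = tier(query)
        if answer:
            return answer

print(respond("What are your hours?"))  # served by the FAQ tier
print(respond("Explain latency"))       # falls through to Claude
```

The ordering is the point: each tier is slower but more capable than the last, so typical queries never pay the worst-case latency.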
Related terms
Inference
The process of an AI model generating a response or prediction from input data.
Cache (LLM Cache)
Storing previous AI responses for reuse. Saves costs and speeds up repeated queries.
Ollama
A tool for running AI models locally: free, private, and fast.
Fallback
An alternative approach when the primary method fails (e.g., Ollama fails, Claude API takes over).
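The cache and fallback terms above can be combined in one small sketch. The helpers are hypothetical stand-ins, not real clients; here the Ollama call always fails so the Claude fallback path is exercised, and the result is cached for reuse.

```python
# Minimal sketch of cache + fallback (hypothetical helpers).
cache = {}  # query -> previously generated answer

def ask_ollama(query):
    raise ConnectionError("Ollama server unreachable")

def ask_claude(query):
    return f"Claude answer to: {query}"

def answer(query):
    if query in cache:                   # cache hit: no inference at all
        return cache[query]
    try:
        result = ask_ollama(query)       # primary: local, free, private
    except ConnectionError:
        result = ask_claude(query)       # fallback: Claude API takes over
    cache[query] = result                # store for repeated queries
    return result

answer("What is latency?")               # first call: falls back to Claude
print(answer("What is latency?"))        # second call: served from cache
```

A real implementation would also bound the cache size and expire stale entries, but the shape is the same: check cache, try the cheap path, fall back, store.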