Knowledge Distillation

Compressing a larger model's behavior into a smaller model to reduce cost and latency.

Why it matters

Large models are capable but expensive to serve. Knowledge distillation trains a smaller "student" model to mimic a larger "teacher" model, retaining most of the teacher's capability at a fraction of the inference cost.
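As a concrete illustration, the classic recipe blends two losses: a KL-divergence term that pushes the student toward the teacher's temperature-softened output distribution, and a standard cross-entropy term on the true label. This is a minimal sketch in pure Python; the temperature and alpha values are illustrative defaults, not tuned settings.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; a higher temperature yields a softer distribution."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Blend a soft-target KL loss (teacher) with hard-label cross-entropy.

    The T^2 factor keeps the soft-loss gradient on the same scale as the
    hard loss when the temperature changes.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student): zero when the student matches the teacher exactly
    soft = sum(pt * (math.log(pt) - math.log(ps))
               for pt, ps in zip(p_teacher, p_student))
    # Ordinary cross-entropy on the ground-truth label (temperature = 1)
    hard = -math.log(softmax(student_logits)[true_label])
    return alpha * (temperature ** 2) * soft + (1 - alpha) * hard
```

In practice the student is trained by backpropagating this combined loss; the sketch only shows how the two terms are computed and weighted.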

In practice

Our Ollama-first approach is similar in spirit: use smaller local models for routine tasks and only escalate to Claude when needed.
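That escalation pattern can be sketched as a simple router. The function name, model identifiers, and the word-count heuristic below are all illustrative assumptions, not a real API.

```python
# Illustrative router: model names and the "routine" heuristic are
# placeholders, not a production policy.
ROUTINE_MAX_WORDS = 500  # assumed threshold for routine-sized prompts

def choose_model(prompt: str, needs_deep_reasoning: bool) -> str:
    """Send short, routine prompts to a small local model; escalate otherwise."""
    if needs_deep_reasoning or len(prompt.split()) > ROUTINE_MAX_WORDS:
        return "claude"        # escalate to the large hosted model
    return "local-small"       # e.g. a distilled model served via Ollama
```

The design choice mirrors distillation's economics: the cheap model handles the common case, and the expensive model is reserved for inputs where its extra capability pays for itself.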
