Quantization

Reducing a model's numerical precision (for example, storing weights as 8-bit or 4-bit integers instead of 16- or 32-bit floats) to decrease size, memory use, and inference cost.

Why it matters

Quantization makes large models runnable on smaller hardware: a 70B-parameter model needs roughly 140 GB of memory at 16-bit precision, but only about 35 GB at 4-bit, bringing it within reach of consumer GPUs.
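A minimal sketch of the core idea, using simple symmetric 8-bit quantization (the function names are illustrative, not from any particular library): floats are scaled into small integers for storage, and scaled back at inference time, trading a little precision for a 4x size reduction versus 32-bit floats.

```python
def quantize_int8(weights):
    """Symmetric quantization: map floats onto integers in [-127, 127].

    The scale is chosen so the largest-magnitude weight maps to +/-127.
    """
    scale = max(abs(w) for w in weights) / 127.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Recover approximate float values from the stored integer codes."""
    return [v * scale for v in q]

weights = [0.5, -1.2, 0.03, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
# Each restored value differs from the original by at most half the scale step.
```

Real schemes (e.g. the GGUF quantization formats Ollama uses) are more elaborate, quantizing in small blocks with per-block scales, but the trade-off is the same: fewer bits per weight in exchange for a bounded rounding error.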

In practice

The llama3.2 model served via Ollama is quantized, which lets it fit on our Hetzner server without a dedicated GPU.

Related terms

Back to glossary