Quantization reduces the precision of a model's weights (e.g., from 16-bit to 4-bit) to lower memory usage and increase inference speed, often with minimal loss in accuracy.
An essential technique for running large models on memory-constrained hardware: at 4-bit precision, a 70B-parameter model's weights shrink from roughly 140 GB (16-bit) to roughly 35 GB, small enough to split across a pair of 24 GB consumer GPUs.
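As a minimal sketch of the core idea, the snippet below implements symmetric absmax quantization in NumPy. The function names and per-tensor scale are illustrative choices, not any particular library's API; production schemes (e.g., GPTQ, AWQ) typically quantize per-group and calibrate against activations to reduce error.

```python
import numpy as np

def quantize_absmax(weights: np.ndarray, bits: int = 4):
    """Symmetric absmax quantization: map floats to signed integer codes.

    Illustrative sketch: one scale for the whole tensor, no calibration.
    """
    qmax = 2 ** (bits - 1) - 1                    # e.g. 7 for 4-bit, 127 for 8-bit
    scale = np.abs(weights).max() / qmax          # largest weight maps to qmax
    q = np.round(weights / scale)                 # nearest integer code
    q = np.clip(q, -qmax, qmax).astype(np.int8)   # stay inside the signed range
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float weights from integer codes."""
    return q.astype(np.float32) * scale

# Round-trip a random weight matrix and measure the quantization error.
w = np.random.randn(4, 4).astype(np.float32)
q, scale = quantize_absmax(w, bits=4)
w_hat = dequantize(q, scale)
print("max abs error:", np.abs(w - w_hat).max())
```

The error printed at the end is the "minimal loss in accuracy" trade-off in miniature: each weight moves by at most half a quantization step (scale / 2), and narrower bit widths mean larger steps.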