EXL2

What is EXL2?

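EXL2 is the quantization format used by the ExLlamaV2 inference library. It is optimized for inference speed on modern Nvidia GPUs and supports mixed precision: different parts of a model can be quantized at different bit widths, so the average number of bits per weight can be tuned (between roughly 2 and 8) to fit the model into the available VRAM.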
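
To make the "fit a model into available VRAM" idea concrete, here is a rough back-of-the-envelope sketch in Python. It is not part of any EXL2 tooling: the function names and the `reserve_gib` headroom (for the context cache, activations and runtime overhead) are hypothetical, and the estimate ignores per-layer overhead, so treat the result as an upper bound rather than a guarantee.

```python
def weights_size_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 1024**3


def max_bits_per_weight(n_params: float, vram_gib: float,
                        reserve_gib: float = 2.0) -> float:
    """Rough upper bound on the average bits per weight that fits in VRAM.

    reserve_gib is an assumed headroom for cache and runtime overhead;
    the right value depends on context length and the model.
    """
    usable_bytes = (vram_gib - reserve_gib) * 1024**3
    if usable_bytes <= 0:
        raise ValueError("reserve_gib must be smaller than vram_gib")
    bpw = usable_bytes * 8 / n_params
    # Anything above 8 bits per weight is capped here, on the assumption
    # that 8 is the top of the useful range. A result below about 2
    # suggests the model will not fit on this card at all.
    return min(bpw, 8.0)


if __name__ == "__main__":
    # Hypothetical case: a 70-billion-parameter model on a 24 GiB card.
    bpw = max_bits_per_weight(70e9, vram_gib=24, reserve_gib=2)
    print(f"average bits per weight budget: {bpw:.2f}")
    print(f"weights at that bitrate: {weights_size_gib(70e9, bpw):.1f} GiB")
```

With these assumed numbers the budget comes out a little under 3 bits per weight, which is the kind of fractional target a mixed-precision format can hit and a fixed 4-bit format cannot.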

Where did the term "EXL2" come from?

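The name is short for ExLlamaV2, the inference library that introduced the format. It was designed for high-performance inference by enthusiasts running models on their own GPUs.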

How is "EXL2" used today?

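Today EXL2 is used mainly to run local language models entirely in GPU memory on Nvidia cards. It is often cited as one of the fastest formats for that setup, typically outperforming GGUF in raw generation speed.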

Related Terms