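EXL2 is a quantization format optimized primarily for inference speed on modern Nvidia GPUs. It supports mixed precision: different parts of a model can be quantized at different bit widths, so the average number of bits per weight can be tuned to fit the model into the available VRAM.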
It is designed for high-performance enthusiast inference and is typically the fastest format on Nvidia cards, outperforming GGUF in raw speed.
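
The idea of fitting a model into VRAM by choosing an average bit width can be illustrated with some back-of-the-envelope arithmetic. The sketch below is not part of any EXL2 tooling: the function names, the `reserve_gib` parameter (headroom set aside for cache, activations, and runtime overhead), and the example numbers are all assumptions made for illustration, and it counts weights only.

```python
GIB = 1024 ** 3  # bytes per GiB


def weights_gib(n_params: float, bpw: float) -> float:
    """Approximate size in GiB of weights stored at an average of `bpw` bits each."""
    return n_params * bpw / 8 / GIB


def max_average_bpw(vram_gib: float, n_params: float, reserve_gib: float = 2.0) -> float:
    """Largest average bits per weight whose weights fit in VRAM.

    `reserve_gib` is a hypothetical allowance for everything that is not
    weights (cache, activations, runtime overhead); tune it for your setup.
    """
    usable_bytes = (vram_gib - reserve_gib) * GIB
    if usable_bytes <= 0:
        raise ValueError("reserve_gib must be smaller than vram_gib")
    return usable_bytes * 8 / n_params


if __name__ == "__main__":
    # Assumed example: a 70-billion-parameter model on a 24 GiB card,
    # keeping 4 GiB free for non-weight memory.
    bpw = max_average_bpw(vram_gib=24, n_params=70e9, reserve_gib=4)
    print(f"target average: {bpw:.2f} bits per weight")
    print(f"weights at that target: {weights_gib(70e9, bpw):.1f} GiB")
```

In practice the real memory footprint depends on context length and runtime settings, so a result like this is best treated as an upper bound to round down from, not as an exact target.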