An attention algorithm that stores keys and values in fixed-size blocks scattered across non-contiguous memory (like OS virtual memory pages), rather than requiring one contiguous buffer per sequence. This drastically reduces memory waste and fragmentation.
Core innovation of vLLM.
Enables serving many concurrent users on limited VRAM.
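The paging idea can be sketched with a block table mapping each sequence's logical KV blocks to physical blocks in a shared pool. This is a minimal illustration, not vLLM's actual API; all names (`BlockAllocator`, `Sequence`, `block_size`) are made up for the example.

```python
class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool,
    analogous to page frames in an OS."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def allocate(self):
        return self.free.pop()

    def release(self, block_id):
        self.free.append(block_id)


class Sequence:
    """Keeps its KV cache as a block table: logical block index ->
    physical block id. Physical blocks need not be contiguous."""
    def __init__(self, allocator, block_size=16):
        self.allocator = allocator
        self.block_size = block_size
        self.block_table = []
        self.num_tokens = 0

    def append_token(self):
        # A new physical block is allocated only when the last one is full,
        # so at most block_size - 1 slots are ever wasted per sequence.
        if self.num_tokens % self.block_size == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def free(self):
        # Return all blocks to the pool when the sequence finishes.
        for b in self.block_table:
            self.allocator.release(b)
        self.block_table.clear()
        self.num_tokens = 0
```

Because blocks are allocated on demand from a shared pool, many sequences can interleave their growth without reserving worst-case contiguous buffers up front, which is where the VRAM savings come from.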