PagedAttention

What is PagedAttention?

An attention algorithm, inspired by virtual memory and paging in operating systems, that stores a sequence's logically contiguous keys and values in non-contiguous memory: the KV cache is partitioned into fixed-size blocks that are allocated on demand, and a per-sequence block table maps logical token positions to physical blocks. This drastically reduces memory waste from fragmentation and over-reservation.
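The bookkeeping can be sketched with a toy block table. This is an illustrative sketch, not vLLM's actual API; the block size, pool, and class names are assumptions chosen for clarity:

```python
# Toy sketch of PagedAttention-style KV-cache bookkeeping (illustrative,
# not vLLM's real implementation).

BLOCK_SIZE = 4  # tokens per KV block (assumed for the example)

class BlockPool:
    """Fixed pool of physical KV blocks, handed out on demand."""
    def __init__(self, num_blocks):
        self.free = list(range(num_blocks))

    def alloc(self):
        return self.free.pop()

class Sequence:
    """Maps a sequence's logical token positions to physical blocks."""
    def __init__(self, pool):
        self.pool = pool
        self.block_table = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self):
        # Allocate a new physical block only when the last one is full,
        # so waste is bounded by one partially filled block per sequence.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.pool.alloc())
        self.num_tokens += 1

    def physical_slot(self, pos):
        # Where token `pos`'s key/value actually live in memory.
        block = self.block_table[pos // BLOCK_SIZE]
        return block, pos % BLOCK_SIZE

pool = BlockPool(num_blocks=8)
seq = Sequence(pool)
for _ in range(6):           # generate 6 tokens
    seq.append_token()
print(seq.block_table)       # two blocks in use; need not be adjacent
print(seq.physical_slot(5))  # second block, offset 1
```

Logical positions stay contiguous from the model's point of view, while the physical blocks backing them can come from anywhere in the pool.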

Where did the term "PagedAttention" come from?

Coined in the vLLM paper (Kwon et al., 2023), where it is the core innovation behind the serving engine's memory management.

How is "PagedAttention" used today?

Enables serving many concurrent users on limited VRAM: blocks are allocated on demand and can be shared across sequences, so far more requests fit in memory than with contiguous per-sequence allocation.

Related Terms