vLLM is a high-throughput, open-source library for serving large language models. Its key innovation is PagedAttention, an algorithm that manages the key-value (KV) cache in fixed-size blocks, analogous to how virtual memory pages work in operating systems. This reduces memory fragmentation and enables much larger effective batch sizes, yielding significantly higher throughput and lower latency.
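The virtual-memory analogy can be made concrete with a small sketch. This is purely illustrative, not vLLM's actual implementation: the KV cache is divided into fixed-size physical blocks, and each sequence keeps a block table mapping its logical token positions to physical blocks, so memory is allocated on demand rather than pre-reserved for a maximum sequence length. All class and variable names here are hypothetical.

```python
# Illustrative sketch of PagedAttention-style block allocation
# (hypothetical code, not vLLM internals).

BLOCK_SIZE = 16  # tokens stored per physical KV-cache block


class BlockAllocator:
    """Pool of free physical blocks in the KV cache."""

    def __init__(self, num_blocks: int):
        self.free_blocks = list(range(num_blocks))

    def allocate(self) -> int:
        return self.free_blocks.pop()

    def free(self, block: int) -> None:
        self.free_blocks.append(block)


class Sequence:
    """Tracks the physical blocks backing one request's KV cache."""

    def __init__(self, allocator: BlockAllocator):
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block -> physical block
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is claimed only when the last one fills up,
        # so no memory is reserved for tokens that don't exist yet.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1


allocator = BlockAllocator(num_blocks=64)
seq = Sequence(allocator)
for _ in range(40):  # generate 40 tokens
    seq.append_token()
print(len(seq.block_table))  # 40 tokens occupy ceil(40/16) = 3 blocks
```

Because blocks are allocated lazily and freed when a request finishes, many concurrent sequences can share one GPU's cache with little waste, which is where the throughput gains come from.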
Developed at UC Berkeley's Sky Computing Lab in 2023.
A de facto standard for self-hosting open-weight models such as Llama 3 and Mistral.