A high-throughput, memory-efficient open-source library for LLM inference and serving. It uses PagedAttention to manage KV-cache memory in non-contiguous fixed-size blocks, reducing fragmentation and enabling near-full GPU memory utilization.
UC Berkeley (2023).
The gold standard for self-hosting open-source LLMs.
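The core PagedAttention idea can be illustrated with a toy sketch (this is not vLLM's actual code; all names and the block size are hypothetical): each sequence's KV cache is split into fixed-size blocks, and a per-sequence block table maps logical token positions to physical blocks drawn from a shared free pool, much like virtual-memory paging in an OS.

```python
BLOCK_SIZE = 4  # tokens per KV-cache block (hypothetical value)

class PagedKVCache:
    """Toy model of paged KV-cache bookkeeping (illustrative only)."""

    def __init__(self, num_physical_blocks: int):
        self.free_blocks = list(range(num_physical_blocks))
        self.block_tables: dict[int, list[int]] = {}  # seq_id -> physical block ids

    def append_token(self, seq_id: int, num_tokens_so_far: int) -> None:
        """Allocate a new physical block only when a sequence crosses a block boundary."""
        table = self.block_tables.setdefault(seq_id, [])
        if num_tokens_so_far % BLOCK_SIZE == 0:  # all current blocks are full
            table.append(self.free_blocks.pop())

    def physical_location(self, seq_id: int, token_pos: int) -> tuple[int, int]:
        """Translate a logical token position to (physical block, offset)."""
        table = self.block_tables[seq_id]
        return table[token_pos // BLOCK_SIZE], token_pos % BLOCK_SIZE

    def free_sequence(self, seq_id: int) -> None:
        """Return a finished sequence's blocks to the pool for reuse."""
        self.free_blocks.extend(self.block_tables.pop(seq_id))

cache = PagedKVCache(num_physical_blocks=8)
for t in range(6):  # one sequence generates 6 tokens -> needs 2 blocks
    cache.append_token(0, t)
block, offset = cache.physical_location(0, 5)  # token 5 sits in the 2nd block
```

Because blocks need not be contiguous in GPU memory, memory freed by one finished sequence can be immediately reused by another, which is the source of vLLM's throughput gains under concurrent requests.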