KV Cache is an inference-time optimization for LLMs in which the key and value projections computed for previous tokens are kept in memory, so they do not need to be recomputed at every decoding step.
Because decoding is autoregressive, the keys and values of already-processed tokens never change; caching them lets each step compute projections only for the newest token, speeding up transformer decoding.
A critical optimization for reducing latency and compute costs in production LLM serving.
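A minimal sketch of the idea, assuming a single attention head with toy random projection weights and made-up dimensions; the `KVCache` class and `decode_step` helper are illustrative names, not part of any particular library. Each decoding step projects only the new token and reuses the cached keys and values for everything before it.

```python
import numpy as np

d_model = 16  # hidden size (assumed for the sketch)
rng = np.random.default_rng(0)

# Fixed random projections stand in for learned Q/K/V weight matrices.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Stores keys and values for all tokens processed so far."""
    def __init__(self):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(x_new, cache):
    """Attention for one new token: project only the new token,
    then attend over the cached keys/values of all previous tokens."""
    q = x_new @ W_q                           # (1, d_model)
    k = x_new @ W_k
    v = x_new @ W_v
    cache.append(k, v)                        # cache grows by one row per step
    scores = q @ cache.keys.T / np.sqrt(d_model)
    weights = softmax(scores, axis=-1)
    return weights @ cache.values             # attention output for the new token

# Autoregressive loop: K/V are computed once per token, never recomputed.
cache = KVCache()
for t in range(5):
    x_t = rng.standard_normal((1, d_model))   # stand-in for the new token's embedding
    out = decode_step(x_t, cache)
    print(f"step {t}: {cache.keys.shape[0]} cached keys, output shape {out.shape}")
```

Without the cache, each step would recompute keys and values for the entire prefix; with it, per-step attention cost grows only with the number of cached entries read, at the price of memory that grows linearly with sequence length.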