KV Cache is an inference-time optimization for LLMs in which the key and value projections computed for previous tokens are kept in memory, so they do not need to be recomputed at every decoding step.
Because decoding is autoregressive, the keys and values of already-processed tokens never change; caching them lets each step compute projections only for the newest token, speeding up transformer decoding.
A critical optimization for reducing latency and compute costs in production LLM serving.
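A minimal sketch of the idea, assuming a single attention head with toy random projection weights and made-up dimensions; the `KVCache` class and `decode_step` helper are illustrative names, not part of any particular library. Each decoding step projects only the new token and reuses the cached keys and values for everything before it.

```python
import numpy as np

d_model = 16  # hidden size (assumed for the sketch)
rng = np.random.default_rng(0)

# Fixed random projections stand in for learned Q/K/V weight matrices.
W_q = rng.standard_normal((d_model, d_model))
W_k = rng.standard_normal((d_model, d_model))
W_v = rng.standard_normal((d_model, d_model))

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class KVCache:
    """Stores keys and values for all tokens processed so far."""
    def __init__(self):
        self.keys = np.empty((0, d_model))
        self.values = np.empty((0, d_model))

    def append(self, k, v):
        self.keys = np.vstack([self.keys, k])
        self.values = np.vstack([self.values, v])

def decode_step(x_new, cache):
    """Attention for one new token: project only the new token,
    then attend over the cached keys/values of all previous tokens."""
    q = x_new @ W_q                           # (1, d_model)
    k = x_new @ W_k
    v = x_new @ W_v
    cache.append(k, v)                        # cache grows by one row per step
    scores = q @ cache.keys.T / np.sqrt(d_model)
    weights = softmax(scores, axis=-1)
    return weights @ cache.values             # attention output for the new token

# Autoregressive loop: K/V are computed once per token, never recomputed.
cache = KVCache()
for t in range(5):
    x_t = rng.standard_normal((1, d_model))   # stand-in for the new token's embedding
    out = decode_step(x_t, cache)
    print(f"step {t}: {cache.keys.shape[0]} cached keys, output shape {out.shape}")
```

Without the cache, each step would recompute keys and values for the entire prefix; with it, per-step attention cost grows only with the number of cached entries read, at the price of memory that grows linearly with sequence length.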