Continuous Batching

What is Continuous Batching?

A serving technique in which the scheduler admits new requests into the running batch as soon as earlier ones finish, rather than waiting for the entire batch to complete. Scheduling decisions are made at every decoding iteration, which is why it is also known as "iteration-level scheduling".
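The difference from static batching can be sketched with a toy simulation. This is a minimal illustration, not how any particular inference server implements scheduling: the function names are hypothetical, each request is reduced to the number of tokens it still has to generate (assumed to be at least 1), and one "step" stands for one decoding iteration over the whole batch.

```python
from collections import deque


def static_batching_steps(lengths, batch_size):
    """Decode steps when each batch runs until its longest request finishes."""
    steps = 0
    for i in range(0, len(lengths), batch_size):
        # The whole batch is held until its slowest member is done.
        steps += max(lengths[i:i + batch_size])
    return steps


def continuous_batching_steps(lengths, batch_size):
    """Decode steps when freed slots are refilled at every iteration."""
    waiting = deque(lengths)  # requests not yet admitted, in arrival order
    running = []              # tokens still to generate, per active request
    steps = 0
    while waiting or running:
        # Iteration-level scheduling: fill any free slot before this step.
        while waiting and len(running) < batch_size:
            running.append(waiting.popleft())
        # One decoding iteration: every active request emits one token.
        running = [remaining - 1 for remaining in running]
        # Finished requests leave the batch and free their slots.
        running = [remaining for remaining in running if remaining > 0]
        steps += 1
    return steps


# Hypothetical workload: output lengths (in tokens) of six requests.
lengths = [2, 8, 3, 8, 2, 2]
print("static:    ", static_batching_steps(lengths, batch_size=2))
print("continuous:", continuous_batching_steps(lengths, batch_size=2))
```

With this workload and two slots, static batching needs 18 steps while continuous batching needs 13, because short requests no longer hold a slot idle while a long request in the same batch finishes.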

Where did the term "Continuous Batching" come from?

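The technique was introduced as "iteration-level scheduling" in the Orca serving system (OSDI 2022). The name "continuous batching" came into common use afterwards through open-source LLM inference servers, where the technique is now considered essential.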

How is "Continuous Batching" used today?

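LLM inference servers use it to keep GPU utilization high for chat applications, where response lengths vary widely and a fixed batch would leave slots idle once short requests finish.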

Related Terms