Test-Time Compute refers to spending extra computation at inference, generating hidden reasoning or 'thinking' tokens before producing a final answer. Analogous to human 'System 2' thinking (slow and deliberate), it lets models tackle complex problems by exploring multiple solution paths and verifying intermediate steps, trading increased latency and cost for higher accuracy.
Popularized by OpenAI's o1 model and DeepSeek R1 (2024-2025).
Emerging as a new scaling law for reasoning capabilities, shifting focus from pre-training scale to inference-time compute.
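The "exploring multiple paths" idea above can be sketched as self-consistency sampling: draw many stochastic reasoning traces and take a majority vote over their final answers, so accuracy rises as inference compute grows. This is a minimal toy sketch; the `sample_answer` function is a hypothetical stand-in for one noisy model rollout, not a real model call.

```python
import random
from collections import Counter

def sample_answer(question: str, rng: random.Random) -> int:
    # Hypothetical stand-in for one stochastic reasoning trace from a model.
    # Here it "solves" 17 * 24 correctly 70% of the time, else is off by a bit.
    true_answer = 17 * 24
    if rng.random() < 0.7:
        return true_answer
    return true_answer + rng.choice([-10, -1, 1, 10])

def self_consistency(question: str, n: int, seed: int = 0) -> int:
    """Spend more test-time compute: sample n traces, majority-vote the answers."""
    rng = random.Random(seed)
    votes = Counter(sample_answer(question, rng) for _ in range(n))
    return votes.most_common(1)[0][0]

# A single sample may be wrong; a large vote is almost always right,
# illustrating the latency-for-accuracy trade described above.
print(self_consistency("17 * 24 = ?", n=1))
print(self_consistency("17 * 24 = ?", n=51))
```

Because each extra sample costs another full inference pass, the vote size `n` is exactly the knob that the inference-time scaling trend turns.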