Reward Model

What is a Reward Model?

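A reward model is a separate model trained to predict human preferences between candidate outputs (e.g., 'Response A is better than Response B'). During the reinforcement learning phase of RLHF (Reinforcement Learning from Human Feedback), it serves as the 'judge' or scorekeeper: it assigns a score to each output of the model being trained, and that score is the reward the training process tries to maximize.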
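
As a rough illustration of what "trained to predict human preferences" can mean, the sketch below uses a Bradley-Terry style pairwise objective, which is one common choice and not necessarily the method any particular system uses. The function names and the example scores are hypothetical; in practice the scores would come from a neural network rather than being passed in as plain numbers.

```python
import math


def preference_probability(reward_chosen: float, reward_rejected: float) -> float:
    """Probability that the 'chosen' response is preferred, under the
    assumed Bradley-Terry model: sigmoid(reward_chosen - reward_rejected)."""
    margin = reward_chosen - reward_rejected
    # Branch on the sign of the margin so math.exp never overflows.
    if margin >= 0:
        return 1.0 / (1.0 + math.exp(-margin))
    exp_margin = math.exp(margin)
    return exp_margin / (1.0 + exp_margin)


def pairwise_loss(reward_chosen: float, reward_rejected: float) -> float:
    """Negative log-likelihood of the human label: -log(sigmoid(margin)).
    Small when the preferred response scores higher, large otherwise."""
    margin = reward_chosen - reward_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical scores for one labeled comparison (humans preferred response A).
score_a, score_b = 1.5, 0.2
print(preference_probability(score_a, score_b))
print(pairwise_loss(score_a, score_b))
```

Minimizing this loss over many labeled comparisons pushes the model to score preferred responses higher than rejected ones, which is what lets it act as a scorekeeper later.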

Where did the term "Reward Model" come from?

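The term comes from reinforcement learning, where a reward signal tells an agent how good its actions are. In RLHF that signal is not hand-written but learned from human preference data, so the learned scorer is called a reward model. It is a crucial component of RLHF.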

How is "Reward Model" used today?

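The reward model is often the bottleneck of high-quality model alignment: the model being trained can only be as well aligned as the reward signal it is optimized against, so errors or blind spots in the reward model carry over into the final model.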

Related Terms

RLHF (Reinforcement Learning from Human Feedback)
PPO (Proximal Policy Optimization)