LLM-as-a-Judge is an evaluation paradigm in which a strong model (e.g., GPT-4) grades the outputs of another, typically weaker, model, acting as a proxy for human evaluation.
Scalable and far cheaper than human annotation, but prone to self-preference bias: a judge tends to rate outputs that resemble its own style more favorably.
A de facto standard for evaluating RAG pipelines and chatbots, where reference answers are often unavailable.
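
A minimal sketch of the pattern, assuming an OpenAI-compatible chat API; the rubric, 1-5 scale, and model name are illustrative choices, not part of any fixed specification:

```python
# LLM-as-a-Judge sketch: a strong model grades another model's answer.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an impartial evaluator. Rate the ANSWER to the
QUESTION on a 1-5 scale for factual accuracy and helpfulness.
Respond with only the integer score.

QUESTION: {question}
ANSWER: {answer}"""


def judge(question: str, answer: str, model: str = "gpt-4o") -> int:
    """Ask the judge model for a single integer score."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }
        ],
        temperature=0,  # deterministic grading reduces score variance
    )
    return int(response.choices[0].message.content.strip())


if __name__ == "__main__":
    score = judge("What is the capital of Australia?", "Sydney.")
    print(score)  # a wrong answer like this should receive a low score
```

Setting temperature to 0 and constraining the output to a bare integer are common choices here: they make scores easier to parse and more repeatable across runs, though neither removes the underlying judge biases.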