LLM-as-a-Judge is an evaluation paradigm in which a strong model (e.g., GPT-4) grades the outputs of another, typically weaker, model, acting as a proxy for human evaluation.
Scalable and far cheaper than human annotation, but prone to self-preference bias: a judge tends to rate outputs that resemble its own style more favorably.
A de facto standard for evaluating RAG pipelines and chatbots, where reference answers are often unavailable.
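
A minimal sketch of the pattern, assuming an OpenAI-compatible chat API; the rubric, 1-5 scale, and model name are illustrative choices, not part of any fixed specification:

```python
# LLM-as-a-Judge sketch: a strong model grades another model's answer.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

JUDGE_PROMPT = """You are an impartial evaluator. Rate the ANSWER to the
QUESTION on a 1-5 scale for factual accuracy and helpfulness.
Respond with only the integer score.

QUESTION: {question}
ANSWER: {answer}"""


def judge(question: str, answer: str, model: str = "gpt-4o") -> int:
    """Ask the judge model for a single integer score."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": JUDGE_PROMPT.format(question=question, answer=answer),
            }
        ],
        temperature=0,  # deterministic grading reduces score variance
    )
    return int(response.choices[0].message.content.strip())


if __name__ == "__main__":
    score = judge("What is the capital of Australia?", "Sydney.")
    print(score)  # a wrong answer like this should receive a low score
```

Setting temperature to 0 and constraining the output to a bare integer are common choices here: they make scores easier to parse and more repeatable across runs, though neither removes the underlying judge biases.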