LLM-as-a-Judge

LLM-as-a-Judge, also called LLM-based evaluation, is a conceptual framework in natural language processing (NLP) that uses large language models (LLMs) as evaluators to assess the outputs or performance of other language-based systems. Instead of relying solely on human annotators, the approach leverages the general language capabilities of advanced language models to serve as automated judges.

LLM-as-a-Judge may be more cost-effective than human annotation and can be integrated into automated evaluation pipelines. Unlike traditional automatic evaluation metrics such as ROUGE and BLEU, which rely on transparent, rule-based comparisons of surface-level n-grams, LLM-as-a-Judge relies on the opaque internal reasoning of large language models. LLM-based evaluations likely incorporate deeper semantic understanding, but at the cost of interpretability. Beyond interpretability, LLM evaluators may have other issues.[1] For instance, when an LLM evaluates an output that it generated itself, the evaluation may be distorted, an effect termed "LLM narcissism".[1][2]
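The contrast with surface-level n-gram metrics can be illustrated with a minimal sketch. The function below computes clipped n-gram precision, a building block of BLEU; it is a simplification for illustration only (no brevity penalty, no averaging over n-gram orders), and the function names and example sentences are assumptions rather than part of any standard library.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision of a candidate against one reference.

    Simplified illustration only: whitespace tokenization, lowercasing,
    and a single reference. This is not a full BLEU or ROUGE score.
    """
    cand_counts = Counter(ngrams(candidate.lower().split(), n))
    ref_counts = Counter(ngrams(reference.lower().split(), n))
    total = sum(cand_counts.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs
    # in the reference ("clipping").
    overlap = sum(min(count, ref_counts[gram])
                  for gram, count in cand_counts.items())
    return overlap / total


reference = "the cat sat on the mat"
paraphrase = "a feline rested upon the rug"

exact_score = ngram_precision(reference, reference, n=1)
paraphrase_unigram = ngram_precision(paraphrase, reference, n=1)
paraphrase_bigram = ngram_precision(paraphrase, reference, n=2)
```

In this sketch the paraphrase shares only the word "the" with the reference, so it scores poorly despite having a similar meaning. The computation is fully transparent, but it is blind to semantics, which is the gap that LLM-based evaluation aims to address.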

Typically, a more powerful LLM is employed to evaluate the outputs of smaller or less capable language models; for example, GPT-4 may be used to assess the performance of a 13-billion-parameter LLaMA model.[3]
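A judging step of this kind might be sketched as follows. The prompt wording, the 1 to 10 scale, the `judge` callable, and all function and model names are hypothetical choices made for illustration; they are not taken from the cited works or from any particular API. The `judge` argument stands in for a call to the stronger model, and a stub is used here so that the example runs without network access.

```python
import re
from typing import Callable, Optional

# Hypothetical prompt template; real judge prompts vary widely.
PROMPT_TEMPLATE = (
    "You are an impartial judge. Rate the answer to the question below "
    "on a scale from 1 to 10.\n"
    "Question: {question}\n"
    "Answer: {answer}\n"
    "Reply with the line 'Rating: <number>'."
)


def build_prompt(question: str, answer: str) -> str:
    """Fill the judge prompt with the item to be evaluated."""
    return PROMPT_TEMPLATE.format(question=question, answer=answer)


def parse_rating(reply: str) -> Optional[int]:
    """Extract an integer rating from the judge's reply.

    Returns None if no rating is found or it falls outside 1..10.
    """
    match = re.search(r"Rating:\s*(\d+)", reply)
    if match is None:
        return None
    rating = int(match.group(1))
    return rating if 1 <= rating <= 10 else None


def judge_answer(question: str, answer: str,
                 judge: Callable[[str], str],
                 judge_name: str, generator_name: str) -> dict:
    """Score one answer and flag the case where a model judges itself."""
    reply = judge(build_prompt(question, answer))
    return {
        "rating": parse_rating(reply),
        # Self-evaluation may yield distorted scores, so it is flagged.
        "self_evaluation": judge_name == generator_name,
    }


def stub_judge(prompt: str) -> str:
    """Stand-in for a real LLM call; always returns the same rating."""
    return "Rating: 7"


result = judge_answer("What is 2 + 2?", "4", stub_judge,
                      judge_name="large-judge-model",
                      generator_name="small-model")
```

Passing the judge as a plain callable keeps the sketch independent of any specific model provider, and the `self_evaluation` flag makes it easy to exclude or separately analyze cases in which the judge and the generator are the same model.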

References

  1. Laura Dietz; Oleg Zendel; Peter Bailey; et al. (18 July 2025), Principles and Guidelines for the Use of LLM Judges, pp. 218–229, doi:10.1145/3731120.3744588, ISBN 979-8-4007-1861-8, Wikidata Q135734265
  2. Yiqi Liu; Nafise Moosavi; Chenghua Lin (2024), LLMs as Narcissistic Evaluators: When Ego Inflates Evaluation Scores, pp. 12688–12701, doi:10.18653/V1/2024.FINDINGS-ACL.753, Wikidata Q135734850
  3. Lianmin Zheng; Wei-Lin Chiang; Ying Sheng; et al. (9 June 2023), Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, arXiv:2306.05685, doi:10.48550/ARXIV.2306.05685, Wikidata Q123527686

