Topic 1 Question 96
A company has fine-tuned a large language model (LLM) to answer questions for a help desk. The company wants to determine if the fine-tuning has enhanced the model's accuracy.
Which metric should the company use for the evaluation?
A. Precision
B. Time to first token
C. F1 score
D. Word error rate
User votes
Comments (5)
- Selected answer: C
F1 score is a metric that combines precision and recall to evaluate the balance between correctly identified outputs and missed or irrelevant outputs. It is particularly useful for tasks like question answering, where both accuracy and completeness are critical. In this help desk scenario, the F1 score helps assess whether the model consistently provides correct and relevant answers to user queries, reflecting the effectiveness of fine-tuning.
👍 1 ap6491 2024/12/27
- Selected answer: C
The F1 score provides a balanced evaluation of the model's ability to give both relevant and accurate answers, making it the most suitable metric for assessing the fine-tuned model’s performance in answering help desk questions.
👍 1 aws_Tamilan 2024/12/27
- Selected answer: C
The correct answer is C. F1 score combines precision and recall, making it ideal for question-answering evaluation.
👍 1 may2021_r 2024/12/28
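
The comments above describe F1 as a combination of precision and recall. As a rough illustration of how that could be applied to question answering, here is a minimal sketch that assumes a token-overlap F1 (in the style commonly used for extractive QA benchmarks). The question does not say how the company would compute the score, so the function names `token_f1` and `mean_f1` and the sample answers are hypothetical.

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between one model answer and one reference answer."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)  # share of predicted tokens that are correct
    recall = overlap / len(ref_tokens)      # share of reference tokens that were recovered
    return 2 * precision * recall / (precision + recall)  # harmonic mean


def mean_f1(predictions: list[str], references: list[str]) -> float:
    """Average token-level F1 over a non-empty evaluation set."""
    scores = [token_f1(p, r) for p, r in zip(predictions, references)]
    return sum(scores) / len(scores)


# Hypothetical help desk example: compare the base and fine-tuned model
# on the same reference answer.
references = ["reset your password from the account settings page"]
base_answers = ["please contact support"]
tuned_answers = ["reset your password from the settings page"]

print("base  F1:", mean_f1(base_answers, references))
print("tuned F1:", mean_f1(tuned_answers, references))
```

Running the same evaluation set through the model before and after fine-tuning and comparing the two averages is one way to check whether accuracy improved. Token overlap is only a proxy for correctness, so it may undervalue answers that are right but worded differently.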