Topic 1 Question 106
A company is introducing a mobile app that helps users learn foreign languages. The app makes text more coherent by calling a large language model (LLM). The company collected a diverse dataset of text and supplemented the dataset with examples of more readable versions. The company wants the LLM output to resemble the provided examples.
Which metric should the company use to assess whether the LLM meets these requirements?
A. Value of the loss function
B. Semantic robustness
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score
D. Latency of the text generation
User votes
Comments (4)
- Selected answer: C
The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score is widely used to measure the similarity between generated text and a set of reference texts. Since the company wants the LLM's output to resemble the provided readable examples, ROUGE is the most appropriate metric.
ROUGE compares the LLM-generated text with the human-provided reference texts by measuring n-gram overlap, reported as precision, recall, and F1 scores, which makes it well suited to assessing how closely the output matches the readable reference examples.
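For illustration, here is a minimal pure-Python sketch of ROUGE-1 (unigram overlap). The function name rouge1_scores and the sample sentences are hypothetical, chosen only to show how precision, recall, and F1 fall out of the overlap counts:

```python
from collections import Counter

def rouge1_scores(candidate: str, reference: str) -> dict:
    """Compute ROUGE-1 precision, recall, and F1 from unigram overlap."""
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Overlap: for each word, count the smaller of its two frequencies.
    overlap = sum((cand_counts & ref_counts).values())
    precision = overlap / max(sum(cand_counts.values()), 1)
    recall = overlap / max(sum(ref_counts.values()), 1)
    f1 = (2 * precision * recall / (precision + recall)) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}

# Hypothetical example: LLM output vs. a human-written readable reference.
generated = "the app rewrites sentences so they are simpler to read"
reference = "the app rewrites sentences to make them simpler to read"
print(rouge1_scores(generated, reference))  # -> all three around 0.70
```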
👍 2 · Jessiii · 2025/02/11
- Selected answer: C
The most suitable metric to assess whether the LLM output resembles the provided examples of more readable text is:
C. Recall-Oriented Understudy for Gisting Evaluation (ROUGE) score
The ROUGE score is commonly used for evaluating the quality of text summarization and machine-generated text by comparing it to a set of reference texts. It measures how well the generated text matches the provided examples in terms of content and coherence. Specifically, ROUGE scores focus on the overlap of n-grams, word sequences, and word pairs between the generated text and the reference texts, making it ideal for this use case.
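As a sketch of how this is typically computed in practice, the snippet below uses Google's open-source rouge-score package (pip install rouge-score); the example strings are hypothetical, and the metric set ("rouge1", "rouge2", "rougeL") is just one common choice:

```python
from rouge_score import rouge_scorer

# Score the model output against the human-provided readable reference.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

reference = "the app rewrites sentences to make them simpler to read"  # human example
generated = "the app rewrites sentences so they are simpler to read"   # LLM output

scores = scorer.score(reference, generated)  # signature: score(target, prediction)
for name, s in scores.items():
    print(f"{name}: precision={s.precision:.2f} recall={s.recall:.2f} f1={s.fmeasure:.2f}")
```

A higher F1 across these variants indicates the generated text shares more content with the reference examples, which is exactly the property the company wants to verify.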
👍 1 · 26b8fe1 · 2024/12/26
- Selected answer: C
Since the company wants the LLM output to resemble the provided examples in terms of coherence and readability, ROUGE score is the best metric for this evaluation.
👍 1 · aws_Tamilan · 2024/12/27