Topic 1 Question 85
You work for a large social network service provider whose users post articles and discuss news. Millions of comments are posted online each day, and more than 200 human moderators constantly review comments and flag those that are inappropriate. Your team is building an ML model to help human moderators check content on the platform. The model scores each comment and flags suspicious comments to be reviewed by a human. Which metric(s) should you use to monitor the model’s performance?
A. Number of messages flagged by the model per minute
B. Number of messages flagged by the model per minute confirmed as being inappropriate by humans
C. Precision and recall estimates based on a random sample of 0.1% of raw messages each minute sent to a human for review
D. Precision and recall estimates based on a sample of messages flagged by the model as potentially inappropriate each minute
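The sampling schemes in options C and D can be compared with a small simulation. This is a minimal sketch, not the exam's reference answer. The `Message` record, the helper names (`precision_recall`, `sample_raw`, `sample_flagged`), and the synthetic counts are all hypothetical. The sketch assumes a message's true label becomes known only after a human reviews it.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    flagged: bool        # the model's decision
    inappropriate: bool  # label a human reviewer would assign (hypothetical)


def precision_recall(reviewed):
    """Return (precision, recall) over human-reviewed messages; None if undefined."""
    tp = sum(m.flagged and m.inappropriate for m in reviewed)
    fp = sum(m.flagged and not m.inappropriate for m in reviewed)
    fn = sum(not m.flagged and m.inappropriate for m in reviewed)
    precision = tp / (tp + fp) if tp + fp else None
    recall = tp / (tp + fn) if tp + fn else None
    return precision, recall


def sample_raw(messages, rate, rng):
    """Option C style: each raw message goes to review with probability `rate`."""
    return [m for m in messages if rng.random() < rate]


def sample_flagged(messages, k, rng):
    """Option D style: review up to k messages drawn only from flagged ones."""
    flagged = [m for m in messages if m.flagged]
    return rng.sample(flagged, min(k, len(flagged)))


# Hypothetical one-minute stream of 1,000 messages: 40 flagged (30 truly
# inappropriate), plus 20 inappropriate messages the model missed.
stream = (
    [Message(True, True)] * 30
    + [Message(True, False)] * 10
    + [Message(False, True)] * 20
    + [Message(False, False)] * 940
)

rng = random.Random(0)
print(precision_recall(stream))                           # (0.75, 0.6) on the full stream
print(precision_recall(sample_raw(stream, 0.1, rng)))     # noisy estimates (or None)
print(precision_recall(sample_flagged(stream, 20, rng)))  # recall comes out as 1.0
```

A sample drawn only from flagged messages never contains a message the model missed. It therefore has no false negatives to count, so the recall it produces is trivially 1.0. Such a sample supports a precision estimate, but recall needs reviewed messages from the unflagged side as well.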
Comments (14)
- Selected answer: D
👍 9 hiromi 2022/12/19
- Selected answer: B
I think B is the complete answer because it says "confirmed as being inappropriate by humans." With that information, we have a human-checked sample of messages, and we can use it for the following:
- message flagged and confirmed as inappropriate as true positive,
- message flagged and confirmed as not inappropriate as false positive,
- message not flagged and confirmed as inappropriate as false negative.
Indirectly, we can also use these counts to calculate metrics like precision and recall.
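Let TP, FP, and FN denote true positives (flagged and inappropriate), false positives (flagged but not inappropriate), and false negatives (not flagged but inappropriate). The standard definitions are then as follows. Recall depends on FN, which can only be counted if some messages the model did not flag are also reviewed by a human.

```latex
\text{precision} = \frac{TP}{TP + FP}, \qquad \text{recall} = \frac{TP}{TP + FN}
```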
So I think that if we have human-verified information, the number of flagged messages carries enough information to say whether the model is good or bad.
Answer D is vaguer to me because it doesn't mention using messages "confirmed as being inappropriate by humans."
👍 4 guilhermebutzke 2023/02/23
- Selected answer: B
B is the only way to go!
👍 2 ares81 2022/12/14