Topic 1 Question 137
You deployed an ML model into production a year ago. Every month, you collect all raw requests that were sent to your model prediction service during the previous month. You send a subset of these requests to a human labeling service to evaluate your model’s performance. After a year, you notice that your model's performance sometimes degrades significantly after a month, while other times it takes several months to notice any decrease in performance. The labeling service is costly, but you also need to avoid large performance degradations. You want to determine how often you should retrain your model to maintain a high level of performance while minimizing cost. What should you do?
A. Train an anomaly detection model on the training dataset, and run all incoming requests through this model. If an anomaly is detected, send the most recent serving data to the labeling service.
B. Identify temporal patterns in your model’s performance over the previous year. Based on these patterns, create a schedule for sending serving data to the labeling service for the next year.
C. Compare the cost of the labeling service with the lost revenue due to model performance degradation over the past year. If the lost revenue is greater than the cost of the labeling service, increase the frequency of model retraining; otherwise, decrease the model retraining frequency.
D. Run training-serving skew detection batch jobs every few days to compare the aggregate statistics of the features in the training dataset with recent serving data. If skew is detected, send the most recent serving data to the labeling service.
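The skew check described in the last option above can be sketched roughly as follows. This is only an illustration under stated assumptions: the question says "aggregate statistics" without naming any, so the Jensen-Shannon divergence (numeric features) and L-infinity distance (categorical features) used here are one possible choice, and the function names, the 0.1 thresholds, and the sample data are all hypothetical.

```python
import math
from collections import Counter


def l_infinity_distance(train_values, serving_values):
    """Largest absolute difference in category frequency between two samples."""
    train_freq = Counter(train_values)
    serving_freq = Counter(serving_values)
    categories = set(train_freq) | set(serving_freq)
    return max(
        abs(train_freq[c] / len(train_values) - serving_freq[c] / len(serving_values))
        for c in categories
    )


def histogram(values, lo, hi, bins):
    """Normalized histogram over [lo, hi]; out-of-range values go to the edge bins."""
    counts = [0] * bins
    width = (hi - lo) / bins or 1.0  # avoid a zero bin width for constant features
    for v in values:
        idx = int((v - lo) / width)
        counts[min(max(idx, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]


def js_divergence(p, q):
    """Jensen-Shannon divergence in base 2, so the result lies in [0, 1]."""
    def kl(a, b):
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    m = [(x + y) / 2 for x, y in zip(p, q)]
    return (kl(p, m) + kl(q, m)) / 2


def detect_skew(train, serving, numeric_threshold=0.1,
                categorical_threshold=0.1, bins=10):
    """Return {feature: distance} for features whose distance exceeds its threshold.

    Thresholds are illustrative assumptions and would need tuning per feature.
    """
    skewed = {}
    for name, train_values in train.items():
        serving_values = serving[name]
        if isinstance(train_values[0], str):
            distance = l_infinity_distance(train_values, serving_values)
            threshold = categorical_threshold
        else:
            lo, hi = min(train_values), max(train_values)
            distance = js_divergence(
                histogram(train_values, lo, hi, bins),
                histogram(serving_values, lo, hi, bins),
            )
            threshold = numeric_threshold
        if distance > threshold:
            skewed[name] = distance
    return skewed


# Hypothetical samples: the serving data has drifted on both features.
train_sample = {
    "age": [20, 25, 30, 35, 40, 45, 50, 55],
    "country": ["US", "US", "DE", "JP"],
}
serving_sample = {
    "age": [60, 65, 70, 75, 80, 85, 90, 95],
    "country": ["BR", "BR", "BR", "US"],
}
skewed = detect_skew(train_sample, serving_sample)
send_to_labeling = bool(skewed)  # only pay for labels when skew is detected
```

In a batch job run every few days, a non-empty result would be the trigger for sending the most recent serving data to the labeling service; an empty result means no labeling cost is incurred for that period.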
Comments (7)
- Selected answer: D
👍 3 hiromi 2022/12/22
- Selected answer: D
Option D is the best approach to determine how often to retrain the model while minimizing cost. Running training-serving skew detection batch jobs every few days to compare the aggregate statistics of the features in the training dataset with recent serving data is an inexpensive way to detect when the input distribution has shifted, which is when the model's performance is most likely to have degraded. If skew is detected, the most recent serving data should be sent to the labeling service to evaluate the model's performance. This approach is more cost-effective than sending a subset of requests to the labeling service every month because it only sends data when there is a high probability that the model's performance has degraded. As a result, the model can be retrained at the right time, and the cost of the labeling service is minimized.
👍 3 TNT87 2023/03/07
- Selected answer: B
"After a year, you notice that your model's performance sometimes degrades significantly after a month, while other times it takes several months to notice any decrease in performance." Hence, I vote B.
👍 2 mil_spyro 2022/12/13