Topic 1 Question 36
You are building a model to predict daily temperatures. You split the data randomly and then transformed the training and test datasets. Temperature data for model training is uploaded hourly. During testing, your model performed with 97% accuracy; however, after deploying to production, the model's accuracy dropped to 66%. How can you make your production model more accurate?
A. Normalize the data for the training and test datasets as two separate steps.
B. Split the training and test data based on time rather than a random split to avoid leakage.
C. Add more data to your test set to ensure that you have a fair distribution and sample for testing.
D. Apply data transformations before splitting, and cross-validate to make sure that the transformations are applied to both the training and test sets.
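Option B can be sketched as a chronological split. This is a minimal sketch that assumes the hourly readings arrive as (timestamp, temperature) pairs. The function name `time_based_split`, the 20% test fraction, and the synthetic readings are hypothetical and only for illustration.

```python
from datetime import datetime, timedelta

def time_based_split(records, test_fraction=0.2):
    """Split (timestamp, value) records chronologically.

    Everything before the cutoff goes to training and everything after it
    goes to testing, so the model is never trained on readings that come
    later than the ones it is evaluated on.
    """
    if not 0 < test_fraction < 1:
        raise ValueError("test_fraction must be between 0 and 1")
    ordered = sorted(records, key=lambda r: r[0])  # oldest reading first
    cutoff = len(ordered) - round(len(ordered) * test_fraction)
    return ordered[:cutoff], ordered[cutoff:]

# Hypothetical hourly readings: (timestamp, temperature in degrees C).
start = datetime(2021, 1, 1)
readings = [(start + timedelta(hours=h), 10.0 + (h % 24) * 0.5) for h in range(240)]

train, test = time_based_split(readings, test_fraction=0.2)
print(len(train), len(test))                               # 192 48
print(max(t for t, _ in train) < min(t for t, _ in test))  # True
```

Because the records are sorted before the cut, the split stays chronological even if the input arrives out of order.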
Comments (11)
B. If you are doing time-series prediction, you can't borrow information from the future to predict the future. If you do, you artificially inflate your accuracy.
👍 29 maartenalexander 2021/06/22
B. D doesn't improve anything at all. Split and Transform is no different than Transform and Split if the transform logic is the same.
👍 3 Danny2021 2021/09/08 - Selected answer: B
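The point that split-then-transform equals transform-then-split holds only for stateless transforms (for example, converting Fahrenheit to Celsius row by row). It does not hold for transforms that learn statistics from the data, such as normalization: fitting them before the split lets test rows influence the training features, which is one more reason option D does not help. A minimal sketch with hypothetical temperature values and a hypothetical helper `fit_standardizer`:

```python
from statistics import mean, pstdev

def fit_standardizer(values):
    """Learn the mean and population standard deviation from `values` only."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        raise ValueError("cannot standardize constant data")

    def transform(xs):
        # Apply the statistics learned above to any list of values.
        return [(x - mu) / sigma for x in xs]

    return transform

# Hypothetical temperatures, already in chronological order.
temps = [10.0, 11.0, 12.0, 13.0, 14.0, 20.0, 21.0, 22.0]
train, test = temps[:5], temps[5:]

# Split first, then fit on training data only: test statistics stay unseen.
fitted_on_train = fit_standardizer(train)
# Fit on everything, then split: mean and std now include the test rows.
fitted_on_all = fit_standardizer(temps)

print(fitted_on_train(test))
print(fitted_on_all(test))
```

The two printed lists differ because `fitted_on_all` centers on the mean of all eight values, including the three test rows, so the features it produces were shaped by the test data itself.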
If you do a random split on a time series, you risk that the training data will contain information about the target (the definition of leakage), but similar data won't be available when the model is used for prediction. Leakage causes the model to look accurate until you start making actual predictions with it.
👍 3 giaZ 2022/03/08
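To make the leakage concrete for hourly data: after a random split, most test hours have the very next hour in the training set, so the model is trained on readings from after the moment it is evaluated on. The sketch below assumes a hypothetical hourly index from 0 to 999 and a hypothetical helper `future_neighbor_rate`; it compares a seeded random 80/20 split with a chronological one.

```python
import random

def future_neighbor_rate(train_hours, test_hours):
    """Fraction of test hours whose following hour is in the training set.

    A model trained on that following hour has already seen a reading from
    after the moment it is asked to predict.
    """
    train = set(train_hours)
    return sum((h + 1) in train for h in test_hours) / len(test_hours)

hours = list(range(1000))  # hypothetical hourly index, oldest first

# Random 80/20 split (seeded so the run is reproducible).
shuffled = hours[:]
random.Random(0).shuffle(shuffled)
rand_train, rand_test = shuffled[:800], shuffled[800:]

# Chronological 80/20 split.
time_train, time_test = hours[:800], hours[800:]

print(future_neighbor_rate(rand_train, rand_test))
print(future_neighbor_rate(time_train, time_test))  # 0.0
```

The random split gives a high rate, since most test hours have a future neighbor in training, while the chronological split gives 0.0 because every training hour precedes every test hour.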