Topic 1 Question 128
You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After you trained the neural network and evaluated your model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?
A. Increase the share of the test sample in the train-test split.
B. Try to collect more data and increase the size of your dataset.
C. Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.
D. Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of the vocabularies or n-grams used.
Comments (17)
This is a case of underfitting, not overfitting (with overfitting, the model would have extremely low training error but high test error), so we need to make the model more complex. The answer is D.
👍 53 Callumr 2020/06/20
Should be D
👍 18 [Removed] 2020/03/22
Selected answer: D
D: A is incorrect since the test sample is large enough. B is incorrect since the dataset is already quite large, and more data typically helps with overfitting, not underfitting. C is incorrect since regularization helps to avoid overfitting, and this is a clear case of underfitting. D is correct since increasing model complexity generally helps with underfitting.
👍 6 MaxNRG 2022/01/09
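
To make the reasoning for D concrete, here is a minimal sketch of how added model capacity reduces underfitting. Everything in it is an assumption for illustration: the synthetic sine data, the 90/10 split on 2,000 points, and the use of polynomial degree as a stand-in for the capacity of a neural network. It does not reproduce the exact train/test ratio from the question; it only shows that a model that is too simple has high error on both sets, and that a more expressive one lowers both.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root-mean-squared error between targets and predictions."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical toy data: a nonlinear target with a little noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3.0, 3.0, size=2000)
y = np.sin(x) + rng.normal(scale=0.1, size=x.shape)

# The data is already randomly generated, so a plain 90/10 cut is a random split.
split = int(0.9 * len(x))
x_train, x_test = x[:split], x[split:]
y_train, y_test = y[:split], y[split:]


def fit_and_score(degree):
    """Fit a polynomial of the given degree; return (train RMSE, test RMSE)."""
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = rmse(y_train, np.polyval(coeffs, x_train))
    test_err = rmse(y_test, np.polyval(coeffs, x_test))
    return train_err, test_err


# Degree 1 is too simple for a sine curve (underfits); degree 7 has more capacity.
results = {degree: fit_and_score(degree) for degree in (1, 7)}
for degree, (train_err, test_err) in results.items():
    print(f"degree={degree}: train RMSE={train_err:.3f}, test RMSE={test_err:.3f}")
```

With the low-capacity model, training error itself stays high, which is the signature of underfitting; regularization or more data would not fix that, while added capacity does.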