Topic 1 Question 128

Professional Data Engineer

You work on a regression problem in a natural language processing domain, and you have 100M labeled examples in your dataset. You have randomly shuffled your data and split your dataset into train and test samples (in a 90/10 ratio). After you trained the neural network and evaluated your model on a test set, you discover that the root-mean-squared error (RMSE) of your model is twice as high on the train set as on the test set. How should you improve the performance of your model?
- Increase the share of the test sample in the train-test split.
- Try to collect more data and increase the size of your dataset.
- Try out regularization techniques (e.g., dropout or batch normalization) to avoid overfitting.
- Increase the complexity of your model by, e.g., introducing an additional layer or increasing the size of the vocabularies or n-grams used.
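Choosing among these options comes down to comparing training error with test error. Below is a minimal sketch, assuming plain Python lists of targets and predictions. `rmse` and `diagnose` are hypothetical helpers, and the `target_rmse` and `gap_ratio` thresholds are illustrative assumptions rather than standard values:

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root-mean-squared error between targets and predictions."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def diagnose(train_rmse: float, test_rmse: float, target_rmse: float,
             gap_ratio: float = 1.1) -> str:
    """Rough bias/variance heuristic; the thresholds are illustrative assumptions.

    target_rmse: hypothetical error level you would accept on the task.
    gap_ratio:   how much worse test error may be before calling it overfitting.
    """
    if test_rmse > gap_ratio * train_rmse:
        # Fits the training data much better than unseen data: high variance.
        return "overfitting"
    if train_rmse > target_rmse:
        # Cannot even fit the data it was trained on: high bias.
        return "underfitting"
    return "ok"
```

With numbers shaped like the question (training RMSE twice the test RMSE), `diagnose(train_rmse=2.0, test_rmse=1.0, target_rmse=0.5)` returns `"underfitting"`. The model does not fit even its own training data, so there is no train/test gap for regularization to close.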
Comments (17)
- This is a case of underfitting, not overfitting (with overfitting, the model has a very low training error but a high test error), so we need to make the model more complex. The answer is D.
  
  👍 53
  Callumr, 2020/06/20
- should be D
  
  👍 18
  [Removed], 2020/03/22
- Selected answer: D
  D: A is incorrect since the test sample is already large enough. B is incorrect since the dataset is already quite large, and more data typically helps with overfitting, not with underfitting. C is incorrect since regularization helps to avoid overfitting, and this is clearly a case of underfitting. D is correct since increasing model complexity generally helps when you have an underfitting problem.
  
  👍 6
  MaxNRG, 2022/01/09
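The reasoning behind answer D can be illustrated with a toy analogy outside NLP: a model with too little capacity cannot fit its own training data, and adding capacity lowers training error. The sketch below is not the neural network from the question. It fits polynomials of degree 1 and 3 to noise-free cubic data using hypothetical helpers `fit_polynomial`, `predict`, and `train_rmse` (plain-Python least squares via the normal equations, chosen only to keep the example dependency-free):

```python
import math
from typing import List, Sequence


def fit_polynomial(xs: Sequence[float], ys: Sequence[float], degree: int) -> List[float]:
    """Least-squares polynomial fit via the normal equations (X^T X) w = X^T y.

    Returns coefficients w such that y ~ w[0] + w[1]*x + ... + w[degree]*x**degree.
    Adequate for this toy example; not numerically robust for high degrees.
    """
    n = degree + 1
    # Build the augmented normal-equation matrix [X^T X | X^T y].
    a = []
    for i in range(n):
        row = [sum(x ** (i + j) for x in xs) for j in range(n)]
        row.append(sum(y * x ** i for x, y in zip(xs, ys)))
        a.append(row)
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    w = [0.0] * n
    for i in reversed(range(n)):
        w[i] = (a[i][n] - sum(a[i][j] * w[j] for j in range(i + 1, n))) / a[i][i]
    return w


def predict(w: Sequence[float], x: float) -> float:
    """Evaluate the polynomial with coefficients w at x."""
    return sum(c * x ** k for k, c in enumerate(w))


def train_rmse(xs: Sequence[float], ys: Sequence[float], w: Sequence[float]) -> float:
    """RMSE of the fitted polynomial on the data it was trained on."""
    return math.sqrt(sum((y - predict(w, x)) ** 2 for x, y in zip(xs, ys)) / len(xs))


# Toy data with a nonlinear (cubic) relationship; noise-free, purely illustrative.
xs = [i / 10 for i in range(-20, 21)]
ys = [x ** 3 - 2 * x for x in xs]

for degree in (1, 3):
    w = fit_polynomial(xs, ys, degree)
    print(f"degree={degree} train RMSE={train_rmse(xs, ys, w):.4f}")
```

The degree-1 fit leaves a large training RMSE, the signature of underfitting. Raising the degree, loosely analogous to adding a layer or enlarging vocabularies or n-grams, lets the model drive training error toward zero.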