Topic 1 Question 2

Professional Data Engineer

You are building a model to make clothing recommendations. You know a user's fashion preference is likely to change over time, so you build a data pipeline to stream new data back to the model as it becomes available. How should you use this data to train the model?
- A. Continuously retrain the model on just the new data.
- B. Continuously retrain the model on a combination of existing data and the new data.
- C. Train on the existing data while using the new data as your test set.
- D. Train on the new data while using the existing data as your test set.
Comments (17)
- I think it should be B, because we have to use a combination of old and new data for testing as well as for training.
  
  👍 34
  serg3d 2020/05/29
- B, as we need to train the model with the new data so that it keeps learning, and the combined data is also used for testing.
  
  👍 11
  jagadamba 2020/06/28
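
To illustrate what the comments above describe for option B, here is a minimal sketch that merges existing and new records and then draws both the training set and the test set from the merged pool. The function name `combine_and_split`, the 20% test fraction, and the fixed seed are assumptions made for illustration; they do not come from the question or from any particular library.

```python
import random


def combine_and_split(existing, new, test_fraction=0.2, seed=0):
    """Merge existing and new records, shuffle, and split into (train, test).

    Hypothetical helper: test_fraction and seed are illustrative defaults.
    """
    combined = list(existing) + list(new)
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(combined)
    n_test = int(len(combined) * test_fraction)
    # The first n_test shuffled records form the test set, the rest the training set.
    return combined[n_test:], combined[:n_test]


# Toy records: (user, item) pairs. Real data would arrive from the streaming pipeline.
existing_data = [("u1", "jeans"), ("u2", "coat"), ("u3", "scarf"), ("u4", "boots")]
new_data = [("u1", "shorts"), ("u2", "sandals")]

train_set, test_set = combine_and_split(existing_data, new_data)
print(len(train_set), len(test_set))
```

Each time new data arrives, the split would be recomputed on the enlarged pool and the model retrained on the training portion.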
- I would go with B (but A is also possible if the market is fast changing).

  From https://datascience.stackexchange.com/questions/12761/should-a-model-be-re-trained-if-new-observations-are-available:
  "Suppose that your model attempts to predict customers' behavior, e.g. how likely is a customer to purchase your product given an offer tailored for him. Clearly, the market changes over time, customers' preferences change, and your competitors adjust. You should adjust as well, so you need to retrain periodically. In such a case I would recommend to add new data, but also omit old data that is not relevant anymore. If the market is fast changing, you should even consider retraining periodically based on new data only."

  From https://docs.aws.amazon.com/machine-learning/latest/dg/retraining-models-on-new-data.html:
  "It is a good practice to continuously monitor the incoming data and retrain your model on newer data if you find that the data distribution has deviated significantly from the original training data distribution."
  
  👍 4
  MaxNRG 2021/11/07
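
The two ideas quoted in the comment above (omit old data that is no longer relevant, and retrain when the incoming distribution has drifted) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `training_window` and `total_variation`, the 90-day window, and the 0.3 drift threshold are all hypothetical, and total variation distance is just one simple way to compare two label distributions.

```python
from collections import Counter
from datetime import date, timedelta


def training_window(records, today, max_age_days):
    """Keep only (day, label) records no older than max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r[0] >= cutoff]


def total_variation(old_labels, new_labels):
    """Total variation distance between two empirical label distributions.

    Returns 0.0 for identical distributions and 1.0 for disjoint ones.
    """
    if not old_labels or not new_labels:
        raise ValueError("both label lists must be non-empty")
    old_counts, new_counts = Counter(old_labels), Counter(new_labels)
    labels = set(old_counts) | set(new_counts)
    return 0.5 * sum(
        abs(old_counts[l] / len(old_labels) - new_counts[l] / len(new_labels))
        for l in labels
    )


today = date(2021, 11, 7)
existing = [(date(2021, 1, 5), "coat"), (date(2021, 9, 1), "jeans"), (date(2021, 10, 2), "jeans")]
incoming = [(date(2021, 11, 6), "shorts"), (date(2021, 11, 7), "jeans")]

# Combine old and new data, then drop records outside the (assumed) 90-day window.
window = training_window(existing + incoming, today, max_age_days=90)

# Compare label distributions of old vs. new data; 0.3 is an arbitrary threshold.
drift = total_variation([r[1] for r in existing], [r[1] for r in incoming])
if drift > 0.3:
    print("distribution shifted; retrain on", len(window), "records")
```

A longer window behaves more like option B, while a very short window approaches option A (new data only).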