Topic 1 Question 130

AWS Certified Machine Learning - Specialty

A data scientist must build a custom recommendation model in Amazon SageMaker for an online retail company. Due to the nature of the company's products, customers buy only 4-5 products every 5-10 years. So, the company relies on a steady stream of new customers. When a new customer signs up, the company collects data on the customer's preferences. Below is a sample of the data available to the data scientist. How should the data scientist split the dataset into a training and test set for this use case?
- A. Shuffle all interaction data. Split off the last 10% of the interaction data for the test set.
- B. Identify the most recent 10% of interactions for each user. Split off these interactions for the test set.
- C. Identify the 10% of users with the least interaction data. Split off all interaction data from these users for the test set.
- D. Randomly select 10% of the users. Split off all interaction data from these users for the test set.
Comments (8)
- I would select B, straight from this AWS example: https://aws.amazon.com/blogs/machine-learning/building-a-customized-recommender-system-in-amazon-sagemaker/
  
  👍 22
  joep21 2021/09/19
- I think the answer is D because customers buy only 4-5 products every 5-10 years, so it doesn't make sense to take 10% of the interactions for each user as a test set.
  
  👍 5
  NicZ1111 2021/11/01
- Selected answer: B
  I think this is a data leakage problem, so B is the correct answer: https://www.datarobot.com/wiki/target-leakage/
  
  👍 3
  gggsrs 2022/01/07
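
To make the two splits debated in the comments concrete, here is a minimal sketch in plain Python. It is not taken from the AWS blog post; the `(user_id, item_id, timestamp)` record layout, the sample rows, and both function names are assumptions made for illustration. With only a handful of interactions per user, 10% rounds to less than one row, so the sketch holds out at least one interaction per user and leaves single-interaction users entirely in the training set (also an assumption).

```python
import random
from collections import defaultdict

# Hypothetical interaction records: (user_id, item_id, timestamp).
interactions = [
    ("u1", "sofa", 1), ("u1", "lamp", 2), ("u1", "rug", 3), ("u1", "desk", 4),
    ("u2", "bed", 1), ("u2", "chair", 5), ("u2", "shelf", 6),
    ("u3", "table", 2), ("u3", "mirror", 3), ("u3", "stool", 7), ("u3", "bench", 9),
]


def split_latest_per_user(rows, test_fraction=0.1):
    """Hold out each user's most recent interactions (the split in option B)."""
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train, test = [], []
    for user_rows in by_user.values():
        user_rows.sort(key=lambda r: r[2])  # oldest first
        if len(user_rows) < 2:
            # Assumption: a user with one interaction stays in training.
            train.extend(user_rows)
            continue
        # At least one held-out row, but never the user's whole history.
        n_test = max(1, round(len(user_rows) * test_fraction))
        n_test = min(n_test, len(user_rows) - 1)
        train.extend(user_rows[:-n_test])
        test.extend(user_rows[-n_test:])
    return train, test


def split_by_user(rows, test_fraction=0.1, seed=0):
    """Hold out every interaction of a random subset of users (option D)."""
    users = sorted({row[0] for row in rows})
    rng = random.Random(seed)
    n_test = max(1, round(len(users) * test_fraction))
    test_users = set(rng.sample(users, n_test))
    train = [r for r in rows if r[0] not in test_users]
    test = [r for r in rows if r[0] in test_users]
    return train, test


if __name__ == "__main__":
    train_b, test_b = split_latest_per_user(interactions)
    train_d, test_d = split_by_user(interactions)
    print("per-user latest holdout:", len(train_b), "train /", len(test_b), "test")
    print("user-level holdout:", len(train_d), "train /", len(test_d), "test")
```

The first split keeps every test user's earlier history in the training set, so it measures how well the model predicts a known user's next purchases. The second split evaluates on users whose interactions the model never saw during training.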