Topic 1 Question 152

Professional Machine Learning Engineer

You are developing an ML model using a dataset with categorical input variables. You have randomly split half of the data into training and test sets. After applying one-hot encoding on the categorical variables in the training set, you discover that one categorical variable is missing from the test set. What should you do?
- A. Use sparse representation in the test set.
- B. Randomly redistribute the data, with 70% for the training set and 30% for the test set.
- C. Apply one-hot encoding on the categorical variables in the test data.
- D. Collect more data representing all categories.
Comments (8)
- C. Apply one-hot encoding on the categorical variables in the test data.
  
  One-hot encoding represents each unique value of a categorical variable as a separate binary column, so the training and test datasets must end up with the same set of binary columns. Because one category is missing from the test set, the recommended approach is to apply the same one-hot encoding to the categorical variables in the test data, so that both datasets share the same columns.
  
  👍 2
  TNT87 2023/03/07
- Selected answer: A
  Since one categorical variable is missing from the test set, C would result in a different number of columns in the training and test sets.
  
  👍 2
  formazioneQI 2023/04/18
- Selected answer: C
  By using a sparse representation, you will be losing the information contained in the missing categorical variable. This could lead to the model making incorrect predictions on the test set.
  
  👍 2
  Gudwin 2023/04/27
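
The column-count concern raised in the comments can be illustrated with a minimal sketch. It assumes a toy `colors` feature and two hypothetical helpers, `fit_vocabulary` and `one_hot`, which are not from the question or from any library. It only shows that encoding each split independently gives different widths, while reusing the categories learned from the training split keeps the columns aligned; it does not settle which answer the exam expects.

```python
def fit_vocabulary(values):
    """Return the sorted list of distinct categories seen in `values` (hypothetical helper)."""
    return sorted(set(values))


def one_hot(values, vocabulary):
    """Encode each value as a 0/1 row with one column per vocabulary entry.

    A value that is not in the vocabulary becomes an all-zero row.
    """
    index = {category: i for i, category in enumerate(vocabulary)}
    rows = []
    for value in values:
        row = [0] * len(vocabulary)
        if value in index:
            row[index[value]] = 1
        rows.append(row)
    return rows


# Toy data (assumed for illustration): "blue" never appears in the test split.
train_colors = ["red", "green", "blue", "green"]
test_colors = ["red", "green"]

# Encoding the test split on its own produces fewer columns than the training split.
independent_test = one_hot(test_colors, fit_vocabulary(test_colors))

# Reusing the vocabulary fitted on the training split keeps the columns aligned.
vocabulary = fit_vocabulary(train_colors)
train_encoded = one_hot(train_colors, vocabulary)
test_encoded = one_hot(test_colors, vocabulary)

print(len(independent_test[0]), len(train_encoded[0]), len(test_encoded[0]))
```

In this sketch the test rows simply carry a zero in the column for the category they never contain, so a model trained on the training columns can consume them unchanged.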