Topic 1 Question 127
While performing exploratory data analysis on a dataset, you find that an important categorical feature has 5% null values. You want to minimize the bias that could result from the missing values. How should you handle the missing values?
A. Remove the rows with missing values, and upsample your dataset by 5%.
B. Replace the missing values with the feature’s mean.
C. Replace the missing values with a placeholder category indicating a missing value.
D. Move the rows with missing values to your validation dataset.
Comments (8)
- Selected answer: C
For a categorical feature, replacing the missing values with a placeholder category that indicates a missing value (option C) is the most appropriate way to minimize the bias that the missing values could introduce. The algorithm can then treat "missing" as its own category, so no assumption is made about what the missing values would have been.
Option A, removing the rows with missing values and upsampling the dataset by 5%, discards valuable data and can itself introduce bias. Upsampling does not restore the removed rows, so some classes can end up overrepresented and others underrepresented.
Option B, replacing the missing values with the feature's mean, is not appropriate because a categorical feature has no meaningful average value.
Option D, moving the rows with missing values to the validation dataset, is not a good solution. The validation dataset would no longer be representative of the training data, which can bias the evaluation and hide overfitting.
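To make option C concrete, here is a minimal, dependency-free sketch of placeholder imputation followed by one-hot encoding. The `colors` data, the `MISSING` label, and the helper names `fill_missing` and `one_hot` are all made up for illustration; in practice you would probably do the same thing with your data library's own fill and encoding functions.

```python
from collections import Counter

# Hypothetical placeholder label; any value that cannot collide
# with a real category would work.
MISSING = "MISSING"


def fill_missing(values, placeholder=MISSING):
    """Replace None entries with an explicit placeholder category."""
    return [placeholder if v is None else v for v in values]


def one_hot(values):
    """One-hot encode category labels; returns (categories, rows)."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows


# Toy categorical feature with two null entries (illustrative data only).
colors = ["red", "blue", None, "red", None, "green"]

filled = fill_missing(colors)        # no rows are dropped
counts = Counter(filled)             # "MISSING" is counted like any other category
categories, rows = one_hot(filled)   # "MISSING" gets its own indicator column
```

Every row is kept, and the model sees an extra indicator column for the missing values instead of a guessed category, so it can learn whether missingness itself carries signal.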
👍 3 shankalman717 2023/02/24 C looks correct. We should replace the values with a placeholder.
👍 2 hargur 2022/12/23 If you want to minimize the bias, why not use the mean?
👍 2 jdeix 2023/01/25