Topic 1 Question 72
A Data Scientist is building a model to predict customer churn using a dataset of 100 continuous numerical features. The Marketing team has not provided any insight about which features are relevant for churn prediction. The Marketing team wants to interpret the model and see the direct impact of relevant features on the model outcome. While training a logistic regression model, the Data Scientist observes that there is a wide gap between the training and validation set accuracy. Which methods can the Data Scientist use to improve the model performance and satisfy the Marketing team's needs? (Select two.)
A. Add L1 regularization to the classifier
B. Add features to the dataset
C. Perform recursive feature elimination
D. Perform t-distributed stochastic neighbor embedding (t-SNE)
E. Perform linear discriminant analysis
Comments (9)
AC - correct answer
👍 13 bluer1 2022/04/30 - Selected answer: AC
Overfitting: add regularization and remove features.
👍 4 NeverMinda 2022/06/07
AC - Key: a logistic regression model is nonlinear in terms of odds and probability, but it is linear in terms of log odds. Key: a large gap between training and validation accuracy = overfitting => 5 techniques to prevent overfitting:
1. Simplify the model
2. Use early stopping
3. Use data augmentation
4. Use regularization
5. Use dropout
A - yes, to avoid overfitting (although I think it is talking about a regressor)
Not B - adding features will lead to more overfitting
C - feature elimination prevents overfitting
Not D - t-SNE is a nonlinear dimensionality reduction technique
Not E - LDA projects the data onto linear combinations of features, so the direct impact of individual features is lost. Linear discriminant analysis (LDA), also called normal discriminant analysis (NDA) or discriminant function analysis, is a generalization of Fisher's linear discriminant, a method used in statistics and other fields to find a linear combination of features that characterizes or separates two or more classes of objects or events.
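To make options A and C concrete, here is a minimal scikit-learn sketch, not the exam's reference solution. Everything in it is an assumption for illustration: the data is synthetic (`make_classification` standing in for the churn dataset), and the values `C=0.1`, `n_features_to_select=10`, and `step=5` are arbitrary choices, not recommendations. It compares the train/validation accuracy gap of a nearly unregularized logistic regression against an L1-penalized one and one wrapped in recursive feature elimination (RFE).

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn data (assumption): 100 continuous
# features, of which only 8 actually carry signal.
X, y = make_classification(
    n_samples=400, n_features=100, n_informative=8, n_redundant=0,
    random_state=0,
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)


def accuracy_gap(model):
    """Fit on the training split and return train accuracy minus validation accuracy."""
    model.fit(X_train, y_train)
    return model.score(X_train, y_train) - model.score(X_val, y_val)


# Baseline: a very large C makes the default L2 penalty negligible.
baseline = make_pipeline(
    StandardScaler(), LogisticRegression(C=1e6, max_iter=5000)
)

# Option A: the L1 penalty drives the coefficients of weak features to exactly zero.
l1_model = make_pipeline(
    StandardScaler(),
    LogisticRegression(penalty="l1", solver="liblinear", C=0.1),
)

# Option C: RFE repeatedly refits the model and drops the 5 features with the
# smallest absolute coefficients until 10 remain.
rfe_model = make_pipeline(
    StandardScaler(),
    RFE(LogisticRegression(max_iter=5000), n_features_to_select=10, step=5),
)

gaps = {
    "baseline": accuracy_gap(baseline),
    "L1": accuracy_gap(l1_model),
    "RFE": accuracy_gap(rfe_model),
}
for name, gap in gaps.items():
    print(f"{name}: train - validation accuracy = {gap:.3f}")

# Both approaches keep the original features, so the surviving coefficients
# can still be read as per-feature effects on the log odds.
n_nonzero_l1 = int((l1_model[-1].coef_ != 0).sum())
n_kept_rfe = int(rfe_model[-1].support_.sum())
print(f"L1 keeps {n_nonzero_l1} nonzero coefficients; RFE keeps {n_kept_rfe} features")
```

Unlike t-SNE or LDA, both models still expose one coefficient per original feature, which is what lets the Marketing team see the direct impact of each retained feature.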
👍 4 wisoxe8356 2022/12/04