Topic 1 Question 48
You started working on a classification problem with time series data and achieved an area under the receiver operating characteristic curve (AUC ROC) value of 99% for training data after just a few experiments. You haven't explored using any sophisticated algorithms or spent any time on hyperparameter tuning. What should your next step be to identify and fix the problem?
A. Address the model overfitting by using a less complex algorithm.
B. Address data leakage by applying nested cross-validation during model training.
C. Address data leakage by removing features highly correlated with the target value.
D. Address the model overfitting by tuning the hyperparameters to reduce the AUC ROC value.
Comments (8)
Paul_Dirac · 2021/06/26 · 👍 22
Ans: B (Ref: https://towardsdatascience.com/time-series-nested-cross-validation-76adba623eb9). Regarding option C: high correlation alone doesn't mean leakage. The question may suggest target leakage, and the defining characteristic of that leakage is data that only becomes available after the target is known (https://www.kaggle.com/dansbecker/data-leakage).
John_Pongthorn · 2023/03/05 · 👍 4 · Selected answer: C
C is the correct choice. This is a data-leakage issue in the training data. The question is drawn from https://cloud.google.com/automl-tables/docs/train#analyze: "If a column's Correlation with Target value is high, make sure that is expected, and not an indication of target leakage."
To explain it in my own words: sometimes a feature in the training data is unintentionally computed from the target value, which makes the two highly correlated. For instance, suppose you predict a stock price using a moving average, MACD, and RSI, despite the fact that all three features are calculated from the price, i.e. the target itself.
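The leaky-feature scenario described above can be sketched numerically. The following is a minimal pure-Python illustration (all names and data are hypothetical, not from the question): a moving average derived from the target correlates almost perfectly with it, while a genuinely independent feature does not.

```python
import random

def pearson(xs, ys):
    # Plain Pearson correlation coefficient, no external dependencies.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

random.seed(0)

# Simulate a price series as a random walk (this is the prediction target).
price = [100.0]
for _ in range(499):
    price.append(price[-1] + random.gauss(0, 1))

# Leaky feature: a 5-step moving average computed from the target itself.
ma5 = [sum(price[max(0, i - 4): i + 1]) / len(price[max(0, i - 4): i + 1])
       for i in range(len(price))]

# Independent feature: noise unrelated to the target.
noise = [random.gauss(0, 1) for _ in range(len(price))]

print(f"corr(ma5, price)   = {pearson(ma5, price):.3f}")   # near 1.0: leakage red flag
print(f"corr(noise, price) = {pearson(noise, price):.3f}")  # near 0.0
```

A feature-vs-target correlation this close to 1.0 is exactly the "make sure that is expected" signal the AutoML Tables documentation warns about.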
David_ml · 2022/05/10 · 👍 3 · Selected answer: B
Quite tricky, but through elimination the correct answer is B. Model overfitting doesn't apply here, as we can't tell whether a model is overfitting by looking at training-data results alone.
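Option B's time-series cross-validation depends on splits that respect temporal order. A minimal sketch of a rolling-origin (forward-chaining) splitter, using hypothetical names, could look like this:

```python
def rolling_origin_splits(n_samples, n_splits):
    # Yield (train_indices, test_indices) pairs in which the training
    # window always precedes the test window, so no future data leaks
    # into training (unlike shuffled k-fold cross-validation).
    fold = n_samples // (n_splits + 1)
    for k in range(1, n_splits + 1):
        train = list(range(0, k * fold))
        test = list(range(k * fold, min((k + 1) * fold, n_samples)))
        yield train, test

for train, test in rolling_origin_splits(12, 3):
    print(len(train), len(test), max(train) < min(test))
```

This is the same idea implemented by scikit-learn's `TimeSeriesSplit`; nested cross-validation wraps such a splitter in an outer evaluation loop and an inner hyperparameter-tuning loop.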