Topic 1 Question 287
(Choose two.) A data engineer is evaluating customer data in Amazon SageMaker Data Wrangler. The data engineer will use the customer data to create a new model to predict customer behavior.
The data engineer needs to improve model performance by checking the dataset for multicollinearity.
Which steps can the data engineer take to accomplish this with the LEAST operational effort?
A. Use SageMaker Data Wrangler to refit and transform the dataset by applying one-hot encoding to category-based variables.
B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.
C. Use the SageMaker Data Wrangler Quick Model visualization to quickly evaluate the dataset and to produce importance scores for each feature.
D. Use the SageMaker Data Wrangler Min Max Scaler transform to normalize the data.
E. Use SageMaker Data Wrangler diagnostic visualization. Use least absolute shrinkage and selection operator (LASSO) to plot coefficient values from a LASSO model that is trained on the dataset.
Comments (3)
B and E. Explanation:
Option B: Principal components analysis (PCA) and singular value decomposition (SVD) can be used to identify multicollinearity in a dataset. The singular values measure how much variance each principal component captures; one or more singular values close to zero (equivalently, a large condition number) mean that some combination of features is nearly linearly dependent. By visualizing the singular values, the data engineer can assess how much multicollinearity is present in the features.
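The singular-value check that Option B describes can be sketched in pure Python. This is a minimal illustration, not Data Wrangler's implementation: it assumes numeric features held as plain lists, and the helper names and sample data are hypothetical. It standardizes the features, finds the eigenvalues of their correlation matrix with Jacobi rotations (their square roots are the singular values of the standardized data divided by sqrt(n)), and reports the condition number, the ratio of the largest to the smallest singular value.

```python
import math

def standardize(columns):
    """Center each feature column and scale it to unit (population) variance."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        result.append([(v - mean) / std for v in col])
    return result

def symmetric_eigenvalues(a, tol=1e-14, max_sweeps=100):
    """Eigenvalues of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(a[p][q]) < 1e-18:
                    continue
                # Rotation chosen so that the (p, q) entry becomes zero.
                theta = (a[q][q] - a[p][p]) / (2 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1))
                c = 1 / math.sqrt(t * t + 1)
                s = t * c
                for k in range(n):  # A <- A J (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def singular_values(columns):
    """Singular values of the standardized feature matrix Z / sqrt(n), descending."""
    z = standardize(columns)
    n = len(z[0])
    # Z^T Z / n is the feature correlation matrix; its eigenvalues are the
    # squared singular values of Z / sqrt(n).
    corr = [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]
    return sorted((math.sqrt(max(e, 0.0)) for e in symmetric_eigenvalues(corr)), reverse=True)

def condition_number(columns):
    """Largest singular value divided by the smallest one."""
    sv = singular_values(columns)
    return sv[0] / sv[-1] if sv[-1] > 0 else math.inf

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]  # hypothetical: roughly 2 * x1 plus noise
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
print(singular_values([x1, x2, x3]))   # the smallest value is close to 0
print(condition_number([x1, x2, x3]))  # large: x1 and x2 are nearly collinear
print(condition_number([x1, x3]))      # much smaller: no near-linear dependence
```

Standardizing first matters: without it, a feature measured in large units would dominate the singular values regardless of how correlated the features are.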
Option E: LASSO (least absolute shrinkage and selection operator) is a regularization technique whose L1 penalty shrinks coefficients toward zero and sets some of them exactly to zero, which highlights the most important features. When several features are highly correlated, LASSO tends to keep one of them and zero out the others. By plotting the coefficient values from a LASSO model, the data engineer can identify which variables contribute most to the model and which are redundant. This helps identify and mitigate multicollinearity.
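The LASSO check that Option E describes can be sketched with a small cyclic coordinate-descent solver, one standard way to fit LASSO (not necessarily what Data Wrangler uses internally). This minimal pure-Python version assumes standardized features, a centered target, and no intercept; the function names, the penalty `alpha`, and the sample data are hypothetical. Two features are nearly collinear, and the target depends only on the first one.

```python
import math

def standardize(columns):
    """Center each feature column and scale it to unit (population) variance."""
    result = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        result.append([(v - mean) / std for v in col])
    return result

def soft_threshold(z, alpha):
    """Shrink z toward zero by alpha; values inside [-alpha, alpha] become 0."""
    if z > alpha:
        return z - alpha
    if z < -alpha:
        return z + alpha
    return 0.0

def lasso_coefficients(columns, y, alpha, max_iters=1000, tol=1e-10):
    """Minimize (1/2n)||y - Z b||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    Z is the standardized feature matrix and y is centered, so no intercept is fit.
    """
    z = standardize(columns)
    n = len(y)
    y_mean = sum(y) / n
    resid = [v - y_mean for v in y]  # residual y - Z b, starting from b = 0
    beta = [0.0] * len(z)
    for _ in range(max_iters):
        max_change = 0.0
        for j, col in enumerate(z):
            # Correlation of feature j with the partial residual (its own term added back).
            rho = sum(c * r for c, r in zip(col, resid)) / n + beta[j]
            new = soft_threshold(rho, alpha)
            delta = new - beta[j]
            if delta != 0.0:
                resid = [r - delta * c for r, c in zip(resid, col)]
            beta[j] = new
            max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return beta

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]  # hypothetical: roughly 2 * x1 plus noise
x3 = [3, 1, 4, 1, 5, 9, 2, 6]
y = [3 * v for v in x1]  # hypothetical target driven only by x1
print(lasso_coefficients([x1, x2, x3], y, alpha=0.1))  # x2 and x3 come out as 0.0
```

Even though x2 is almost perfectly correlated with the target, its coefficient is zero because x1 already explains the same variation. With real data, which member of a correlated group survives can depend on the data and on the penalty strength.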
xiaoeason 2023/12/15 - Selected Answer: BD
B. Use SageMaker Data Wrangler diagnostic visualization. Use principal components analysis (PCA) and singular value decomposition (SVD) to calculate singular values.
PCA and SVD help identify multicollinearity by analyzing the correlation structure of the variables. High condition numbers or small singular values may indicate multicollinearity issues.
D. Use the SageMaker Data Wrangler Min Max Scaler transform to normalize the data.
Normalizing the data using techniques like Min-Max scaling can mitigate the impact of multicollinearity. Normalization helps in bringing the features to a similar scale, reducing the sensitivity to differences in magnitudes.
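One way to test what min-max scaling does to multicollinearity is to compare feature correlations before and after scaling. Min-max scaling applies a positive affine map to each column separately, and Pearson correlation is unchanged by such maps, so the correlation matrix (and any diagnostic computed from it, such as the singular values of the standardized data) stays the same. A minimal pure-Python check on hypothetical data; the helper names are illustrative:

```python
import math

def min_max_scale(col):
    """Rescale a column linearly to the range [0, 1]."""
    lo, hi = min(col), max(col)
    return [(v - lo) / (hi - lo) for v in col]

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

x1 = [1, 2, 3, 4, 5, 6, 7, 8]
x2 = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.9, 16.1]  # hypothetical: roughly 2 * x1 plus noise
before = pearson(x1, x2)
after = pearson(min_max_scale(x1), min_max_scale(x2))
print(before, after)  # the two correlations agree up to floating-point rounding
```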
aquanaveen 2023/12/17 - Selected Answer: BE
PCA and SVD calculate singular values, which indicate how much of the overall variance each principal component captures. Very small singular values indicate that some features are nearly linear combinations of others, that is, multicollinear.
LASSO regularization shrinks the coefficients of redundant, highly correlated features toward zero (often exactly to zero), so the relative sizes of the coefficients highlight potential multicollinearity.
taustin2 2023/12/19