Topic 1 Question 13
Case study - An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data. Before the ML engineer trains the model, the ML engineer must resolve the issue of the imbalanced data. Which solution will meet this requirement with the LEAST operational effort?
A. Use Amazon Athena to identify patterns that contribute to the imbalance. Adjust the dataset accordingly.
B. Use Amazon SageMaker Studio Classic built-in algorithms to process the imbalanced dataset.
C. Use AWS Glue DataBrew built-in features to oversample the minority class.
D. Use the Amazon SageMaker Data Wrangler balance data operation to oversample the minority class.
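To make "oversample the minority class" concrete, here is a minimal, stdlib-only sketch of random oversampling. It is not Data Wrangler's implementation. The function name `random_oversample` and the toy fraud data are hypothetical. The idea is to duplicate randomly chosen rows of each smaller class until every class matches the size of the largest class.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Hypothetical helper: duplicate randomly chosen rows of each smaller
    class until every class has as many rows as the largest class."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows that belong to this class.
        members = [r for r, y in zip(rows, labels) if y == cls]
        # Sample with replacement to fill the gap up to the majority size.
        for _ in range(target - n):
            out_rows.append(rng.choice(members))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data (hypothetical): one feature (transaction amount), label 1 = fraud.
X = [[120.0], [35.5], [8.2], [64.0], [15.0], [230.0], [980.0], [1500.0]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

X_bal, y_bal = random_oversample(X, y)
print(Counter(y_bal))
```

After balancing, both classes have six rows, and every added row is a copy of an existing fraud row. Oversampling is normally applied only to the training split, so duplicated rows do not leak into the evaluation data.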
Comments (2)
- Selected answer: D 👍 4 GiorgioGss 2024/11/27
- Selected answer: D
Both Glue DataBrew and Data Wrangler allow no-code/low-code data preparation for ML (i.e. low operational effort). However, Data Wrangler provides a built-in transform for balancing a dataset (random oversampling, random undersampling, and SMOTE): https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-balance-data. DataBrew doesn't provide a built-in recipe step for balancing a dataset. In fact, it offers a smaller set of data science recipe steps, limited to binarization, bucketization, categorical mapping, one-hot encoding, scaling, skewness, and tokenization: https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.data-science.html
👍 1 ninomfr64 2024/12/31
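The comment lists SMOTE as one of the Data Wrangler balancing options. Here is a minimal stdlib-only sketch of the standard SMOTE idea, assuming numeric features. It is not Data Wrangler's code. The function name `smote` and the sample points are hypothetical. Instead of copying rows, it creates synthetic minority points on the line segment between a minority sample and one of its k nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Hypothetical helper: create n_synthetic new minority-class points by
    interpolating between a random minority point and one of its k nearest
    minority-class neighbours (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of `base` among the other minority points.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        nb = rng.choice(neighbours)
        gap = rng.random()  # random position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, nb)])
    return synthetic


# Hypothetical minority-class (fraud) points with two numeric features.
fraud = [[1.0, 1.0], [2.0, 2.0], [1.5, 3.0], [3.0, 1.0]]
new_points = smote(fraud, n_synthetic=10, k=2)
print(len(new_points))
```

Because each synthetic point is interpolated between two real minority points, it always falls inside the region those points span. This gives the model new, slightly varied fraud examples rather than exact duplicates.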