Topic 1 Question 12
Case study - An ML engineer is developing a fraud detection model on AWS. The training dataset includes transaction logs, customer profiles, and tables from an on-premises MySQL database. The transaction logs and customer profiles are stored in Amazon S3. The dataset has a class imbalance that affects the learning of the model's algorithm. Additionally, many of the features have interdependencies. The algorithm is not capturing all the desired underlying patterns in the data. The training dataset includes categorical data and numerical data. The ML engineer must prepare the training dataset to maximize the accuracy of the model. Which action will meet this requirement with the LEAST operational overhead?
A. Use AWS Glue to transform the categorical data into numerical data.
B. Use AWS Glue to transform the numerical data into categorical data.
C. Use Amazon SageMaker Data Wrangler to transform the categorical data into numerical data.
D. Use Amazon SageMaker Data Wrangler to transform the numerical data into categorical data.
User votes
Comments (3)
- Selected answer: C
👍 3 GiorgioGss 2024/11/27
- Selected answer: C
Data Wrangler can encode categorical data, that is, create a numerical representation for categories. Categorical encoding converts categorical data in string format into arrays of integers. Data Wrangler supports ordinal encoding and one-hot encoding, as well as the more advanced similarity encoding. https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-cat-encode
AWS Glue DataBrew also has data science recipe steps for one-hot encoding and categorical mapping. https://docs.aws.amazon.com/databrew/latest/dg/recipe-actions.data-science.html
However, Data Wrangler is more user-friendly, with visual and natural-language interfaces, which means less operational overhead.
👍 1 Pofmagic 2024/12/27
- Selected answer: C
The categorical data must be transformed into numerical data because ML models work with numbers, so the answer is either A or C. Data Wrangler provides a built-in transformation to encode categorical data (https://docs.aws.amazon.com/sagemaker/latest/dg/data-wrangler-transform.html#data-wrangler-transform-cat-encode), while Glue doesn't provide a managed transformation for encoding data (https://docs.aws.amazon.com/glue/latest/dg/edit-jobs-transforms.html).
👍 1 ninomfr64 2024/12/29
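
For readers unfamiliar with the encodings named in the comments, here is a minimal plain-Python sketch of what ordinal and one-hot encoding produce. It is not how Data Wrangler or Glue implements them. The function names and the `payment_channel` column are hypothetical, chosen only for illustration, and the categories are assumed to be ordered alphabetically.

```python
from typing import Dict, List


def build_index(values: List[str]) -> Dict[str, int]:
    """Assign each distinct category an integer, in alphabetical order (an assumption)."""
    return {category: i for i, category in enumerate(sorted(set(values)))}


def ordinal_encode(values: List[str]) -> List[int]:
    """Replace each category string with its integer index."""
    index = build_index(values)
    return [index[v] for v in values]


def one_hot_encode(values: List[str]) -> List[List[int]]:
    """Replace each category string with a 0/1 vector that has a single 1."""
    index = build_index(values)
    rows = []
    for v in values:
        row = [0] * len(index)
        row[index[v]] = 1
        rows.append(row)
    return rows


# Hypothetical categorical feature from a transaction log.
payment_channel = ["card", "wire", "card", "cash"]

print(ordinal_encode(payment_channel))  # [0, 2, 0, 1]
print(one_hot_encode(payment_channel))  # [[1, 0, 0], [0, 0, 1], [1, 0, 0], [0, 1, 0]]
```

Ordinal encoding imposes an order on the categories that may not be meaningful, so one-hot encoding is often preferred for unordered categories, at the cost of one extra column per category.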