Topic 1 Question 20
A company has a large, unstructured dataset. The dataset includes many duplicate records across several key attributes. Which solution on AWS will detect duplicates in the dataset with the LEAST code development?
A. Use Amazon Mechanical Turk jobs to detect duplicates.
B. Use Amazon QuickSight ML Insights to build a custom deduplication model.
C. Use Amazon SageMaker Data Wrangler to pre-process and detect duplicates.
D. Use the AWS Glue FindMatches transform to detect duplicates.
Comments (5)
- Selected Answer: D
  https://aws.amazon.com/about-aws/whats-new/2021/11/aws-glue-findmatches-new-data-existing-dataset/ "allows you to identify duplicate or matching records in your dataset"
  👍 5
- GiorgioGss 2024/11/27 - Selected Answer: D
  AWS Glue FindMatches is specifically designed to identify duplicate or matching records in datasets without requiring labeled training data. It uses machine learning to find fuzzy matches and allows customization to fine-tune the matching process, making it ideal for this scenario.
  👍 5
- Saransundar 2024/12/04 - Selected Answer: D
  The AWS Glue FindMatches transform is the most appropriate solution because it is specifically designed to detect duplicates, requires minimal development effort, and scales efficiently for large datasets.
  👍 3
- feelgoodfactor 2024/12/16
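For contrast with the low-code FindMatches approach, here is a minimal pandas sketch (column names and data are hypothetical) of the exact-match deduplication you would otherwise write by hand. Note that hand-written exact matching misses near-duplicates, which is what FindMatches' ML-based fuzzy matching is designed to catch:

```python
import pandas as pd

# Hypothetical dataset with duplicate records across key attributes.
df = pd.DataFrame({
    "name":  ["Ann Lee", "Ann Lee", "Bob Ray", "ann lee"],
    "email": ["ann@x.com", "ann@x.com", "bob@y.com", "ann@x.com"],
})

# Hand-coded exact-match dedup: flags only byte-identical key tuples.
exact_dupes = df[df.duplicated(subset=["name", "email"], keep=False)]
print(len(exact_dupes))  # prints 2 -- the lowercase "ann lee" variant is missed

# AWS Glue FindMatches applies ML fuzzy matching, so variants such as
# "ann lee" can also be linked, without writing matching logic like this.
```

This illustrates why D requires the least code development: even this trivial exact-match baseline is custom code, and it still fails on non-identical duplicates.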