Topic 1 Question 141
A retail company wants to combine its customer orders with the product description data from its product catalog. The structure and format of the records in each dataset are different. A data analyst tried to use a spreadsheet to combine the datasets, but the effort resulted in duplicate records and records that were not properly combined. The company needs a solution that it can use to combine similar records from the two datasets and remove any duplicates. Which solution will meet these requirements?
A. Use an AWS Lambda function to process the data. Use two arrays to compare equal strings in the fields from the two datasets and remove any duplicates.
B. Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog. Call the AWS Glue SearchTables API operation to perform a fuzzy-matching search on the two datasets, and cleanse the data accordingly.
C. Create AWS Glue crawlers for reading and populating the AWS Glue Data Catalog. Use the FindMatches transform to cleanse the data.
D. Create an AWS Lake Formation custom transform. Run a transformation for matching products from the Lake Formation console to cleanse the data automatically.
Explanation
User votes
Comments (5)
- Selected answer: C
C; Glue can use the FindMatches transform to find duplicates.
👍 19 spaceexplorer 2022/04/30
- Selected answer: C
It is C as described in the tutorial - https://docs.aws.amazon.com/glue/latest/dg/machine-learning-transform-tutorial.html
Lake Formation can also invoke the FindMatches algorithm (because it manages data ingestion through Glue), but we don't have a data lake in this example. No one would build a whole data lake - a process that takes days - only to find some matching records.
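For anyone who wants to see roughly what option C looks like outside the console, here is a minimal sketch of the request for the Glue `CreateMLTransform` operation with the `FIND_MATCHES` transform type. The transform name, database, table, role ARN, and key column below are all hypothetical placeholders, and the sketch assumes the crawled data has already been combined into a single Data Catalog table. It only builds the request; the actual boto3 call is left commented out because it needs AWS credentials.

```python
def build_find_matches_request(name, database, table, role_arn, primary_key):
    """Build keyword arguments for the Glue CreateMLTransform operation."""
    return {
        "Name": name,
        "Role": role_arn,
        # Catalog table populated by the Glue crawler.
        "InputRecordTables": [{"DatabaseName": database, "TableName": table}],
        "Parameters": {
            "TransformType": "FIND_MATCHES",
            "FindMatchesParameters": {
                # Column that uniquely identifies each input record.
                "PrimaryKeyColumnName": primary_key,
                # 0.0 favors recall, 1.0 favors precision.
                "PrecisionRecallTradeoff": 0.5,
                "EnforceProvidedLabels": False,
            },
        },
    }


# All values below are hypothetical placeholders, not taken from the question.
request = build_find_matches_request(
    name="orders-catalog-matching",
    database="retail_db",
    table="orders_and_catalog",
    role_arn="arn:aws:iam::123456789012:role/ExampleGlueRole",
    primary_key="record_id",
)

# With credentials configured, the request could be submitted like this:
# import boto3
# glue = boto3.client("glue")
# response = glue.create_ml_transform(**request)
```

After the transform is created, it still has to be taught with labeled examples and then run in a Glue job, as the linked tutorial walks through.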
👍 4 uninit 2023/01/28
D is correct
👍 2 [Removed] 2022/06/13
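To illustrate why the exact string comparison in option A tends to leave duplicates behind while a fuzzy approach does not, here is a small sketch. It is not the FindMatches algorithm (which is a trained ML transform); it swaps in Python's standard-library `difflib.SequenceMatcher` as a stand-in, and the product names and the 0.85 threshold are made-up assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a 0..1 similarity score after light normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def dedupe(records, threshold=0.85):
    """Keep the first record of each group of near-identical strings."""
    kept = []
    for rec in records:
        if not any(similarity(rec, k) >= threshold for k in kept):
            kept.append(rec)
    return kept


# Hypothetical product names with casing, spacing, and spelling differences.
records = [
    "Acme Wireless Mouse",
    "acme wireless mouse ",
    "ACME Wireles Mouse",
    "Acme USB Keyboard",
]

exact_unique = set(records)      # exact comparison treats all four as distinct
fuzzy_unique = dedupe(records)   # fuzzy comparison collapses the three mouse entries

print(len(exact_unique), fuzzy_unique)
```

A fixed similarity threshold like this is brittle on real data, which is part of why a transform trained on labeled examples is the better fit for the scenario in the question.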