Topic 1 Question 123
A company reads data from customer databases that run on Amazon RDS. The databases contain many inconsistent fields. For example, a customer record field that iPnamed place_id in one database is named location_id in another database. The company needs to link customer records across different databases, even when customer record fields do not match.
Which solution will meet these requirements with the LEAST operational overhead?
Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use the FindMatches transform to find duplicate records in the data.
Create an AWS Glue crawler to craw the databases. Use the FindMatches transform to find duplicate records in the data. Evaluate and tune the transform by evaluating the performance and results.
Create an AWS Glue crawler to craw the databases. Use Amazon SageMaker to construct Apache Spark ML pipelines to find duplicate records in the data.
Create a provisioned Amazon EMR cluster to process and analyze data in the databases. Connect to the Apache Zeppelin notebook. Use an Apache Spark ML model to find duplicate records in the data. Evaluate and tune the model by evaluating the performance and results.
ユーザの投票
コメント(2)
- 正解だと思う選択肢: B
Answer is B
👍 4komorebi2024/08/09 - 正解だと思う選択肢: B
AWS Glue Crawler: Automatically discovers the schema and structure of data in the RDS databases, saving significant manual effort. Creates a unified data catalog that can be queried or transformed.
👍 1HagarTheHorrible2024/12/18
シャッフルモード