Examtopics

AWS Certified Data Engineer - Associate


Topic 1 Question 148
An investment company needs to manage and extract insights from a volume of semi-structured data that grows continuously.

A data engineer needs to deduplicate the semi-structured data, remove records that are duplicates, and remove common misspellings of duplicates.

Which solution will meet these requirements with the LEAST operational overhead?
- A. Use the FindMatches feature of AWS Glue to remove duplicate records.
- B. Use non-window functions in Amazon Athena to remove duplicate records.
- C. Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.
- D. Use the global tables feature of Amazon DynamoDB to prevent duplicate data.
Comments (2)
- Selected answer: A
  A - The other options are dumb and hardly make sense
  
  👍 2
  Fawk 2024/09/18
- Selected answer: A
  A: Yes, because AWS Glue FindMatches uses machine learning to deduplicate data, matching records even when they contain spelling errors, with minimal operational overhead. B: No, Athena requires writing queries by hand, and exact-match deduplication does not handle spelling variations well. C: No, Neptune ML is designed for graph analytics, not for deduplicating semi-structured data. D: No, DynamoDB global tables are used for multi-Region replication, not for removing duplicates.
  
  👍 1
  italiancloud2025 2025/02/18
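To see why exact-match deduplication (the kind of thing a `SELECT DISTINCT` in option B gives you) misses the "common misspellings" requirement, here is a toy sketch in plain Python. It is only an illustration: it uses `difflib.SequenceMatcher` string similarity with an arbitrary 0.9 threshold (an assumption), and the sample records are made up. FindMatches does not work this way; it learns a matching model from records you label, but the contrast between exact and fuzzy matching is the same.

```python
from difflib import SequenceMatcher

# Hypothetical sample records: id 2 is an exact duplicate of id 1,
# id 3 is the same person with a misspelled first name.
records = [
    {"id": 1, "name": "Jonathan Smith", "city": "Seattle"},
    {"id": 2, "name": "Jonathan Smith", "city": "Seattle"},
    {"id": 3, "name": "Jonathon Smith", "city": "Seattle"},
    {"id": 4, "name": "Maria Garcia", "city": "Austin"},
]


def key(r):
    """Normalized comparison key (case-insensitive name and city)."""
    return (r["name"].lower(), r["city"].lower())


def exact_dedup(rows):
    """Keep the first row for each identical key, similar to SELECT DISTINCT."""
    seen = set()
    out = []
    for r in rows:
        k = key(r)
        if k not in seen:
            seen.add(k)
            out.append(r)
    return out


def similarity(a, b):
    """Similarity ratio in [0, 1] between the joined keys of two records."""
    return SequenceMatcher(None, " ".join(key(a)), " ".join(key(b))).ratio()


def fuzzy_dedup(rows, threshold=0.9):
    """Drop a row if it is at least `threshold` similar to any row already kept."""
    out = []
    for r in rows:
        if all(similarity(r, kept) < threshold for kept in out):
            out.append(r)
    return out


print([r["id"] for r in exact_dedup(records)])  # [1, 3, 4]
print([r["id"] for r in fuzzy_dedup(records)])  # [1, 4]
```

The exact pass keeps the misspelled "Jonathon" row, while the fuzzy pass folds it into record 1. A hand-tuned threshold like this is fragile on real data, which is part of why a managed, trainable transform such as FindMatches carries less operational overhead.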
