Topic 1 Question 148
An investment company needs to manage and extract insights from a continuously growing volume of semi-structured data.
A data engineer needs to deduplicate the semi-structured data by removing duplicate records, including duplicates that differ only by common misspellings.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use the FindMatches feature of AWS Glue to remove duplicate records.
B. Use non-Windows functions in Amazon Athena to remove duplicate records.
C. Use Amazon Neptune ML and an Apache Gremlin script to remove duplicate records.
D. Use the global tables feature of Amazon DynamoDB to prevent duplicate data.
Comments (2)
- Selected answer: A
A - The other options are dumb and hardly make sense
👍 2 Fawk 2024/09/18
- Selected answer: A
A: Yes, because AWS Glue FindMatches uses machine learning to deduplicate data and correct spelling errors with minimal operational overhead. B: No, using Athena requires writing queries manually, and it does not handle spelling variations well. C: No, Neptune ML is oriented toward graph analytics, not toward deduplicating semi-structured data. D: No, DynamoDB global tables are used for replication, not for removing duplicates.
👍 1 italiancloud2025 2025/02/18
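To illustrate the point about spelling variations, here is a minimal toy sketch of why exact-match deduplication (the kind of thing a `SELECT DISTINCT` query does) keeps a misspelled duplicate, while a fuzzy comparison catches it. The records, the 0.85 threshold, and the use of Python's `difflib.SequenceMatcher` are all illustrative assumptions. This is not how FindMatches works internally; FindMatches learns a matching model from labeled examples instead of relying on a fixed string-similarity cutoff.

```python
from difflib import SequenceMatcher

# Hypothetical records: ids 1, 2, and 4 refer to the same person; id 2 is misspelled.
records = [
    {"id": 1, "name": "John Smith", "city": "Seattle"},
    {"id": 2, "name": "Jon Smith", "city": "Seattle"},
    {"id": 3, "name": "Maria Garcia", "city": "Austin"},
    {"id": 4, "name": "John Smith", "city": "Seattle"},
]


def exact_dedup(rows):
    """Keep the first row for each exact (name, city) pair, similar to SELECT DISTINCT."""
    seen = set()
    kept = []
    for row in rows:
        key = (row["name"], row["city"])
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


def similarity(a, b):
    """Case-insensitive character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_dedup(rows, threshold=0.85):
    """Drop a row if its name is close enough to an already-kept row in the same city."""
    kept = []
    for row in rows:
        is_dup = any(
            row["city"] == k["city"] and similarity(row["name"], k["name"]) >= threshold
            for k in kept
        )
        if not is_dup:
            kept.append(row)
    return kept


print([r["id"] for r in exact_dedup(records)])  # → [1, 2, 3]
print([r["id"] for r in fuzzy_dedup(records)])  # → [1, 3]
```

The exact version keeps "Jon Smith" as a separate record. The fuzzy version treats it as a duplicate of "John Smith" because the similarity ratio is about 0.95. A hand-tuned threshold like this is exactly the kind of manual work a managed, ML-based feature aims to remove.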