Topic 1 Question 25
A data engineer needs to join data from multiple sources to perform a one-time analysis job. The data is stored in Amazon DynamoDB, Amazon RDS, Amazon Redshift, and Amazon S3. Which solution will meet this requirement MOST cost-effectively?
A. Use an Amazon EMR provisioned cluster to read from all sources. Use Apache Spark to join the data and perform the analysis.
B. Copy the data from DynamoDB, Amazon RDS, and Amazon Redshift into Amazon S3. Run Amazon Athena queries directly on the S3 files.
C. Use Amazon Athena Federated Query to join the data from all data sources.
D. Use Redshift Spectrum to query data from DynamoDB, Amazon RDS, and Amazon S3 directly from Redshift.
Comments (6)
- Selected answer: C
I would go for C because Federated Query is designed for exactly this purpose. Besides, we don't need to copy or duplicate the data into S3. That said, because Athena is most optimized for S3, this could be considered a tricky question: there are other trade-offs to consider, such as data governance, which in my opinion is easier when the data is centralized in S3.
👍 7 lucas_rfsb 2024/09/30
- Selected answer: C
You can query these sources by using Federated Query, which is a native feature of Athena. The other options would likely increase cost and operational overhead, because they use more than one service to achieve the same result.
https://docs.aws.amazon.com/athena/latest/ug/connectors-available.html
👍 4 [Removed] 2024/07/20
- Selected answer: C
Serverless processing: Athena is a serverless query service, meaning you only pay for the queries you run. This eliminates the need to provision and manage compute resources, as you would with an EMR cluster, which makes it ideal for one-time jobs.
Federated Query capability: Athena Federated Query lets you query data directly from sources such as DynamoDB, RDS, Redshift, and S3 without physically moving the data. This eliminates data movement costs and simplifies the analysis process.
Reduced cost for large datasets: Compared to copying data to S3, which can be expensive for large datasets, Athena Federated Query avoids unnecessary data movement and reduces overall costs.
👍 4 pypelyncar 2024/12/08
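To make option C more concrete, here is a minimal Python sketch of a single Athena statement that joins all four sources, submitted through the boto3 Athena client. It assumes the DynamoDB, RDS, and Redshift connectors have already been deployed (each one runs as an AWS Lambda function, so Lambda charges apply in addition to Athena's query charges) and registered as Athena data sources. The catalog names `dynamodb_catalog`, `rds_catalog`, and `redshift_catalog`, as well as every schema, table, and column name and the S3 output location, are hypothetical placeholders. The S3 data is assumed to be exposed through the default `AwsDataCatalog` (AWS Glue Data Catalog).

```python
import time


def qualified(catalog: str, schema: str, table: str) -> str:
    """Build a double-quoted catalog.schema.table reference for Athena SQL."""
    return ".".join(f'"{part}"' for part in (catalog, schema, table))


def build_join_query() -> str:
    """Return one SQL statement that joins tables from four catalogs.

    All catalog, schema, table, and column names are hypothetical placeholders.
    """
    orders = qualified("dynamodb_catalog", "default", "orders")     # DynamoDB connector
    customers = qualified("rds_catalog", "sales", "customers")      # RDS connector
    segments = qualified("redshift_catalog", "public", "segments")  # Redshift connector
    clicks = qualified("AwsDataCatalog", "weblogs", "clicks")       # S3 data via Glue catalog
    return (
        "SELECT c.customer_id, c.region, o.order_id, o.amount, s.segment, k.page_url "
        f"FROM {orders} o "
        f"JOIN {customers} c ON o.customer_id = c.customer_id "
        f"JOIN {segments} s ON s.customer_id = c.customer_id "
        f"JOIN {clicks} k ON k.order_id = o.order_id"
    )


def run_athena_query(sql: str, output_location: str, workgroup: str = "primary",
                     poll_seconds: float = 2.0) -> list:
    """Submit the query with boto3, wait for it to finish, and return the result rows."""
    import boto3  # imported lazily so the SQL builder works without boto3 installed

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        WorkGroup=workgroup,
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        reason = status["QueryExecution"]["Status"].get("StateChangeReason", "unknown")
        raise RuntimeError(f"Athena query {state}: {reason}")

    # Returns only the first page of results; use the get_query_results paginator for more.
    return athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
```

A call such as `run_athena_query(build_join_query(), "s3://your-results-bucket/athena/")` (bucket name hypothetical) would run the whole join as one pay-per-query job, with no cluster to provision and no copy step. This is the cost argument the comments above make for C over A and B.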