Topic 1 Question 35
A company uses Amazon S3 to store semi-structured data in a transactional data lake. Some of the data files are small, but other data files are tens of terabytes. A data engineer must perform a change data capture (CDC) operation to identify changed data from the data source. The data source sends a full snapshot as a JSON file every day, and the data engineer must ingest only the changed data into the data lake.

Which solution will capture the changed data MOST cost-effectively?
A. Create an AWS Lambda function to identify the changes between the previous data and the current data. Configure the Lambda function to ingest the changes into the data lake.
B. Ingest the data into Amazon RDS for MySQL. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.
C. Use an open source data lake format to merge the data source with the S3 data lake to insert the new data and update the existing data.
D. Ingest the data into an Amazon Aurora MySQL DB instance that runs Aurora Serverless. Use AWS Database Migration Service (AWS DMS) to write the changed data to the data lake.
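For context on option C: "open source data lake format" refers to table formats such as Apache Iceberg, Apache Hudi, or Delta Lake, which support MERGE (upsert) semantics directly on S3 data. Below is a minimal PySpark sketch of that daily merge, assuming Delta Lake; the bucket paths, table location, and the order_id key column are hypothetical placeholders, and Iceberg or Hudi would look similar.

```python
from pyspark.sql import SparkSession
from delta.tables import DeltaTable

spark = (
    SparkSession.builder.appName("daily-cdc-merge")
    # Register the Delta Lake extensions on the Spark session.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read the daily full snapshot (path and key column are hypothetical).
snapshot = spark.read.json("s3://my-bucket/incoming/snapshot-2024-03-11.json")

# The existing Delta table in the S3 data lake (hypothetical location).
target = DeltaTable.forPath(spark, "s3://my-bucket/lake/orders")

# MERGE: update rows whose key already exists, insert the rest.
(
    target.alias("t")
    .merge(snapshot.alias("s"), "t.order_id = s.order_id")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute()
)
```

Because the merge rewrites only the affected data files and can run on an existing Spark engine (for example AWS Glue or Amazon EMR), it avoids both hand-written diff code and a second database, which is the cost argument for option C.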
Comments (7)
GiorgioGss · 2024/03/11 · 👍 6
Selected answer: C
This is a tricky one. Although option A seems like the best choice since it uses an AWS service, I believe using Delta/Iceberg APIs would be easier than writing custom code on Lambda.
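To make that point concrete, here is a rough sketch of the diff logic a Lambda function (option A) would have to implement by hand; the snapshot file names and the order_id key are hypothetical. Holding two full snapshots in memory like this breaks down well before the tens-of-terabytes scale in the question, given Lambda's 10 GB memory ceiling and 15-minute runtime limit.

```python
import json

# Hypothetical helper: index one daily snapshot (JSON Lines) by primary key.
def load_by_key(path, key="order_id"):
    with open(path) as f:
        return {rec[key]: rec for rec in map(json.loads, f)}

prev = load_by_key("snapshot-2024-03-10.json")
curr = load_by_key("snapshot-2024-03-11.json")

# Classify changes the hard way: a full comparison of both snapshots.
inserts = [rec for k, rec in curr.items() if k not in prev]
updates = [rec for k, rec in curr.items() if k in prev and rec != prev[k]]
deletes = [prev[k] for k in prev.keys() - curr.keys()]
```

A lake-format MERGE performs the same classification inside the engine, so none of this bookkeeping has to be written or maintained.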
[Removed] · 2024/01/20 · 👍 4
Relative to cost, here are docs for the reason for option C:
https://docs.aws.amazon.com/AmazonS3/latest/dev/Welcome.html
https://aws.amazon.com/blogs/big-data/
https://docs.aws.amazon.com/glue/latest/dg/welcome.html
https://docs.aws.amazon.com/emr/
Here are docs for reasons the others are not correct:
https://aws.amazon.com/lambda/pricing/
https://aws.amazon.com/rds/pricing/
https://aws.amazon.com/dms/pricing/
certplan · 2024/03/21 · 👍 2