Topic 1 Question 103
A company has an AWS Glue extract, transform, and load (ETL) job that runs every day at the same time. The job processes XML data that is in an Amazon S3 bucket. New data is added to the S3 bucket every day. A solutions architect notices that AWS Glue is processing all the data during each run. What should the solutions architect do to prevent AWS Glue from reprocessing old data?
A. Edit the job to use job bookmarks.
B. Edit the job to delete data after the data is processed.
C. Edit the job by setting the NumberOfWorkers field to 1.
D. Use a FindMatches machine learning (ML) transform.
Comments (11)
Selected Answer: A
This is the purpose of bookmarks: "AWS Glue tracks data that has already been processed during a previous run of an ETL job by persisting state information from the job run. This persisted state information is called a job bookmark. Job bookmarks help AWS Glue maintain state information and prevent the reprocessing of old data." https://docs.aws.amazon.com/glue/latest/dg/monitor-continuations.html
👍 28 · 123jhl0 · 2022/10/18

Selected Answer: A
👍 3 · LeGloupier · 2022/10/18
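For illustration, a minimal sketch of how the bookmark option could be enabled on an existing job with boto3 (the job name, role ARN, and script location are hypothetical placeholders, not values from the question):

```python
import boto3

glue = boto3.client("glue")

# Job bookmarks are turned on through the special job argument
# --job-bookmark-option; all names, ARNs, and paths here are placeholders.
glue.update_job(
    JobName="daily-xml-etl",
    JobUpdate={
        "Role": "arn:aws:iam::123456789012:role/GlueServiceRole",
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": "s3://example-bucket/scripts/daily_xml_etl.py",
        },
        "DefaultArguments": {
            "--job-bookmark-option": "job-bookmark-enable",
        },
    },
)
```

The same option can also be set in the console's job details (Job bookmark: Enable) or passed as a default argument when the job is created.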
Selected Answer: A
Option A. Edit the job to use job bookmarks.
Job bookmarks in AWS Glue let the ETL job track which data has already been processed and skip it on later runs, so only new data is processed. This prevents AWS Glue from reprocessing old data and can improve job performance. To use job bookmarks, the solutions architect can edit the job and set the job bookmark option to enabled; the job then persists its processing state after each run and skips previously processed data on subsequent runs (a minimal script sketch follows this comment).
👍 3 · Buruguduystunstugudunstuy · 2022/12/27
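As a rough sketch of what the job script itself needs for bookmarks to take effect, assuming a PySpark Glue script (the bucket paths, rowTag, and transformation_ctx names are illustrative placeholders): the job must call Job.init()/Job.commit() and pass a transformation_ctx on its sources so that Glue can persist per-run state.

```python
import sys
from awsglue.utils import getResolvedOptions
from awsglue.context import GlueContext
from awsglue.job import Job
from pyspark.context import SparkContext

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glueContext = GlueContext(sc)
job = Job(glueContext)
job.init(args["JOB_NAME"], args)  # loads the bookmark state for this job

# Read the daily XML drops; transformation_ctx is what the bookmark
# uses to remember which S3 objects were already processed.
# Paths and rowTag are placeholders.
source = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://example-bucket/xml-input/"]},
    format="xml",
    format_options={"rowTag": "record"},
    transformation_ctx="source",
)

# Write out only the newly read data (Parquet chosen arbitrarily here).
glueContext.write_dynamic_frame.from_options(
    frame=source,
    connection_type="s3",
    connection_options={"path": "s3://example-bucket/output/"},
    format="parquet",
    transformation_ctx="sink",
)

job.commit()  # persists the bookmark so the next run skips this data
```

If job.commit() is never called or the source has no transformation_ctx, the bookmark state is not saved and the next run reads everything again.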