Topic 1 Question 157
Your team develops services that run on Google Cloud. You want to process messages sent to a Pub/Sub topic and then store them. Each message must be processed exactly once to avoid duplication of data and any data conflicts. You need the cheapest and simplest solution. What should you do?
A. Process the messages with a Dataproc job, and write the output to storage.
B. Process the messages with a Dataflow streaming pipeline using Apache Beam's PubSubIO package, and write the output to storage.
C. Process the messages with a Cloud Function, and write the results to a BigQuery location where you can run a job to deduplicate the data.
D. Retrieve the messages with a Dataflow streaming pipeline, store them in Cloud Bigtable, and use another Dataflow streaming pipeline to deduplicate messages.
Comments (3)
wrakky (2023/01/04) 👍 2
Answer is B.
https://cloud.google.com/blog/products/data-analytics/handling-duplicate-data-in-streaming-pipeline-using-pubsub-dataflow "...because Pub/Sub provides each message with a unique message_id, Dataflow uses it to deduplicate messages by default if you use the built-in Apache Beam PubSubIO"

zellck (2022/12/16) 👍 1
Selected answer: B

TNT87 (2022/12/24) 👍 1
Selected answer: B
Shuffle mode
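For reference, a minimal sketch of option B (not an official solution; the project, subscription, and bucket names below are placeholders): a Dataflow streaming pipeline that reads messages with Beam's built-in PubsubIO, which lets Dataflow deduplicate on each message's message_id as the quoted blog post describes, and writes windowed text files to Cloud Storage.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.io.TextIO;
import org.apache.beam.sdk.io.gcp.pubsub.PubsubIO;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.options.StreamingOptions;
import org.apache.beam.sdk.transforms.windowing.FixedWindows;
import org.apache.beam.sdk.transforms.windowing.Window;
import org.joda.time.Duration;

public class PubSubToGcs {
  public static void main(String[] args) {
    StreamingOptions options =
        PipelineOptionsFactory.fromArgs(args).withValidation().as(StreamingOptions.class);
    options.setStreaming(true);

    Pipeline p = Pipeline.create(options);

    p
        // Read from Pub/Sub with the built-in PubsubIO source. Pub/Sub assigns each
        // message a unique message_id, which Dataflow uses to deduplicate
        // redeliveries by default, giving effectively exactly-once processing.
        .apply("ReadFromPubSub",
            PubsubIO.readStrings().fromSubscription(
                "projects/my-project/subscriptions/my-subscription")) // placeholder
        // Streaming file writes need a window; 5 minutes is an arbitrary example.
        .apply("Window", Window.into(FixedWindows.of(Duration.standardMinutes(5))))
        // Write each window's messages to Cloud Storage as text files.
        .apply("WriteToGcs",
            TextIO.write()
                .to("gs://my-bucket/messages/") // placeholder bucket
                .withWindowedWrites()
                .withNumShards(1));

    p.run();
  }
}
```

Running this with --runner=DataflowRunner (plus the usual project, region, and temp-location options) deploys it as a streaming Dataflow job; no separate deduplication step or extra storage system is needed, which is why B is both cheaper and simpler than C or D.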