Topic 1 Question 62
Your company receives both batch- and stream-based event data. You want to process the data using Google Cloud Dataflow over a predictable time period. However, you realize that in some instances data can arrive late or out of order. How should you design your Cloud Dataflow pipeline to handle data that is late or out of order?
A. Set a single global window to capture all the data.
B. Set sliding windows to capture all the lagged data.
C. Use watermarks and timestamps to capture the lagged data.
D. Ensure every data source type (stream or batch) has a timestamp, and use the timestamps to define the logic for lagged data.
Community vote
Comments (17)
Answer: C

Description: A watermark is a threshold that indicates when Dataflow expects all of the data in a window to have arrived. If new data arrives with a timestamp that is in the window but older than the watermark, the data is considered late data.
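The watermark logic described above can be illustrated with a minimal plain-Python sketch (this is not the Beam/Dataflow API; the window bounds, `ALLOWED_LATENESS` value, and `classify` helper are hypothetical, chosen only to show how a watermark separates on-time, late, and dropped elements for one fixed window):

```python
# Minimal sketch (plain Python, not the Beam API) of how a watermark
# classifies arriving elements for a single fixed window.

WINDOW_END = 100        # window covers event times [0, 100)
ALLOWED_LATENESS = 20   # late data accepted until watermark passes 120

def classify(event_time, watermark):
    """Classify one element's event timestamp relative to the watermark."""
    if event_time >= WINDOW_END:
        return "next-window"        # belongs to a later window
    if watermark < WINDOW_END:
        return "on-time"            # window has not closed yet
    if watermark < WINDOW_END + ALLOWED_LATENESS:
        return "late"               # arrives after the watermark: late pane
    return "dropped"                # beyond allowed lateness, discarded

# An element with event time 95 arriving after the watermark reached 110
# is late but still within allowed lateness:
print(classify(95, 110))   # -> late
print(classify(95, 90))    # -> on-time
print(classify(95, 130))   # -> dropped
```

In Beam/Dataflow terms, the "late" case is what triggers a late firing of the window when allowed lateness is configured, and the "dropped" case is data discarded once the watermark passes the end of the window plus the allowed lateness.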
[Removed] (2020/03/27, 39 upvotes): Answer: C
[Removed] (2020/03/21, 17 upvotes): Answer should be C. Sliding windows are meant for calculating running averages, not for lagging data. A watermark is best for this purpose.
safiyu (2021/08/13, 7 upvotes)