Topic 1 Question 173
You are designing a pipeline that publishes application events to a Pub/Sub topic. Although message ordering is not important, you need to be able to aggregate events across disjoint hourly intervals before loading the results to BigQuery for analysis. What technology should you use to process and load this data to BigQuery while ensuring that it will scale with large volumes of events?
A. Create a Cloud Function, triggered by Pub/Sub, that performs the necessary data processing every time a new message is published to the topic.
B. Schedule a Cloud Function to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
C. Schedule a batch Dataflow job to run hourly, pulling all available messages from the Pub/Sub topic and performing the necessary aggregations.
D. Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.
Comments (6)
Atnafu (2022/12/16), 👍 7:
D
TUMBLE => fixed windows. HOP => sliding windows. SESSION => session windows.
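In the Beam SDK, those SQL window names correspond to the following window functions. A quick illustrative mapping; the window sizes here are arbitrary:

```python
from apache_beam.transforms.window import FixedWindows, Sessions, SlidingWindows

tumble = FixedWindows(size=3600)              # TUMBLE: disjoint 1-hour windows
hop = SlidingWindows(size=3600, period=300)   # HOP: 1-hour windows starting every 5 minutes
session = Sessions(gap_size=600)              # SESSION: window closes after 10 idle minutes
```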
musumusu (2023/02/24), 👍 3:
Why not C? If the data only needs to be processed hourly, why can't we use batch processing rather than streaming with a 1-hour fixed window?

AWSandeep (2022/09/02), 👍 2:
Selected Answer: D
D. Create a streaming Dataflow job that reads continually from the Pub/Sub topic and performs the necessary aggregations using tumbling windows.
A tumbling window represents a consistent, disjoint time interval in the data stream.
Reference: https://cloud.google.com/dataflow/docs/concepts/streaming-pipelines#tumbling-windows
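To make option D concrete, here is a minimal sketch of such a pipeline using the Apache Beam Python SDK with streaming enabled. The topic path, table name, and JSON message shape (an event_type field) are assumptions for illustration, not part of the question:

```python
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions
from apache_beam.transforms.window import FixedWindows

# Streaming mode is required for an unbounded Pub/Sub source.
options = PipelineOptions(streaming=True)

with beam.Pipeline(options=options) as p:
    (
        p
        | "ReadEvents" >> beam.io.ReadFromPubSub(
            topic="projects/my-project/topics/app-events")  # hypothetical topic
        | "ParseJson" >> beam.Map(json.loads)
        | "KeyByEventType" >> beam.Map(lambda e: (e["event_type"], 1))
        # Beam calls tumbling windows "fixed" windows; 3600 s = 1 hour.
        | "HourlyTumblingWindows" >> beam.WindowInto(FixedWindows(3600))
        | "CountPerType" >> beam.CombinePerKey(sum)
        | "ToTableRow" >> beam.Map(
            lambda kv: {"event_type": kv[0], "event_count": kv[1]})
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
            "my-project:analytics.hourly_event_counts",  # hypothetical table
            schema="event_type:STRING,event_count:INTEGER",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
        )
    )
```

Run on Dataflow, this pipeline autoscales with the event volume, and each closed one-hour window emits its aggregates to BigQuery as it completes.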