Topic 1 Question 263
You maintain ETL pipelines. You notice that a streaming pipeline running on Dataflow is taking a long time to process incoming data, which causes output delays. You also notice that Dataflow automatically optimized the pipeline graph and fused it into a single step. You want to identify where the potential bottleneck is occurring. What should you do?
A. Insert a Reshuffle operation after each processing step, and monitor the execution details in the Dataflow console.
B. Insert output sinks after each key processing step, and observe the writing throughput of each block.
C. Log debug information in each ParDo function, and analyze the logs at execution time.
D. Verify that the Dataflow service accounts have appropriate permissions to write the processed data to the output sinks.
Comments (2)
- Selected answer: A
A. Insert a Reshuffle operation after each processing step, and monitor the execution details in the Dataflow console.
👍 2 · scaenruy · 2024/01/03

- Selected answer: A
- The Reshuffle operation is used in Dataflow pipelines to break fusion and redistribute elements, which can sometimes help improve parallelization and identify bottlenecks.
- By inserting Reshuffle after each processing step and observing the pipeline's performance in the Dataflow console, you can potentially identify stages that are disproportionately slow or stalled.
- This can help pinpoint the step where the bottleneck is occurring (see the sketch below).
👍 2 · raaad · 2024/01/05
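For reference, here is a minimal sketch (Apache Beam Python SDK) of what option A describes: inserting `beam.Reshuffle()` between steps breaks Dataflow's fusion optimization, so each step appears as a separate stage in the job's execution details. The Pub/Sub topic, BigQuery table, and step logic below are illustrative assumptions, not part of the original question.

```python
import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholder resources -- replace with real ones before running.
TOPIC = "projects/my-project/topics/my-topic"
TABLE = "my-project:my_dataset.events"

with beam.Pipeline(options=PipelineOptions(streaming=True)) as p:
    (
        p
        | "ReadFromPubSub" >> beam.io.ReadFromPubSub(topic=TOPIC)
        | "ParseEvents" >> beam.Map(lambda msg: {"value": msg.decode("utf-8")})
        # Reshuffle breaks fusion, so ParseEvents and EnrichEvents appear as
        # separate stages with their own metrics in the Dataflow console.
        | "BreakFusion1" >> beam.Reshuffle()
        | "EnrichEvents" >> beam.Map(lambda row: {"value": row["value"].upper()})
        | "BreakFusion2" >> beam.Reshuffle()
        | "WriteToBigQuery" >> beam.io.WriteToBigQuery(TABLE, schema="value:STRING")
    )
```

With the fused graph split up, the per-stage wall time and throughput shown in the Dataflow console's execution details point to the slow step. Note that Reshuffle itself adds shuffle overhead, so it would typically be removed once the bottleneck has been identified.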