Topic 1 Question 254
You are running a Dataflow streaming pipeline, with Streaming Engine and Horizontal Autoscaling enabled. You have set the maximum number of workers to 1000. The input of your pipeline is Pub/Sub messages with notifications from Cloud Storage. One of the pipeline transforms reads CSV files and emits an element for every CSV line. The job performance is low, the pipeline is using only 10 workers, and you notice that the autoscaler is not spinning up additional workers. What should you do to improve performance?
A. Enable Vertical Autoscaling to let the pipeline use larger workers.
B. Change the pipeline code, and introduce a Reshuffle step to prevent fusion.
C. Update the job to increase the maximum number of workers.
D. Use Dataflow Prime, and enable Right Fitting to increase the worker resources.
User votes
Comments (3)
- Selected answer: B
- Fusion optimization in Dataflow can lead to steps being "fused" together, which can sometimes hinder parallelization.
- Introducing a Reshuffle step can prevent fusion and force the distribution of work across more workers.
- This can be an effective way to improve parallelism and potentially trigger the autoscaler to increase the number of workers.
👍 4 raaad 2024/01/04
- Selected answer: D
D. Use Dataflow Prime, and enable Right Fitting to increase the worker resources.
👍 1 scaenruy 2024/01/03
- Selected answer: B
The problem is low performance and the pipeline not using all available workers properly. See https://cloud.google.com/dataflow/docs/pipeline-lifecycle#fusion_optimization
👍 1 GCP001 2024/01/08
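
For readers who want to see what option B might look like in code, here is a minimal sketch using the Apache Beam Python SDK. It is an illustration under assumptions, not the pipeline from the question: the step names, the `process_row` placeholder, and the `subscription` argument are hypothetical, and it assumes the Cloud Storage notifications use the JSON payload format with `bucket` and `name` fields. The point it shows is the placement of `beam.Reshuffle()` directly after the high fan-out step (one file in, many rows out), so that the per-row work is not fused with the file read and can be redistributed across workers.

```python
import csv
import io
import json


def object_path(notification: bytes) -> str:
    """Build a gs:// path from a Cloud Storage notification payload (JSON assumed)."""
    meta = json.loads(notification.decode("utf-8"))
    return f"gs://{meta['bucket']}/{meta['name']}"


def csv_rows(text: str):
    """Yield one list of fields per CSV line."""
    yield from csv.reader(io.StringIO(text))


def read_rows(path: str):
    """High fan-out step: one file path in, many rows out.

    Reads the whole file into memory for simplicity.
    """
    # Imported lazily so the helpers above work without Beam installed.
    from apache_beam.io.filesystems import FileSystems

    with FileSystems.open(path) as f:
        yield from csv_rows(f.read().decode("utf-8"))


def process_row(row):
    """Hypothetical placeholder for the expensive per-row work."""
    return row


def build(pipeline, subscription: str):
    """Attach the transforms to an existing streaming pipeline object."""
    import apache_beam as beam

    return (
        pipeline
        | "ReadNotifications" >> beam.io.ReadFromPubSub(subscription=subscription)
        | "ToPath" >> beam.Map(object_path)
        | "ReadCsvRows" >> beam.FlatMap(read_rows)
        # Without this step, ReadCsvRows and ProcessRow may be fused, so every
        # row of a file is processed where that file was read.
        | "BreakFusion" >> beam.Reshuffle()
        | "ProcessRow" >> beam.Map(process_row)
    )
```

The Reshuffle step adds a shuffle, which has its own cost, so it generally pays off only when the downstream per-row work is the bottleneck, as the question describes.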