Topic 1 Question 136

Professional Data Engineer

You are running a pipeline in Dataflow that receives messages from a Pub/Sub topic and writes the results to a BigQuery dataset in the EU. Currently, your pipeline is located in europe-west4 and has a maximum of 3 workers, instance type n1-standard-1. You notice that during peak periods, your pipeline is struggling to process records in a timely fashion, when all 3 workers are at maximum CPU utilization. Which two actions can you take to increase performance of your pipeline?

Select 2
- Increase the number of max workers
- Use a larger instance type for your Dataflow workers
- Change the zone of your Dataflow pipeline to run in us-central1
- Create a temporary table in Bigtable that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Bigtable to BigQuery
- Create a temporary table in Cloud Spanner that will act as a buffer for new data. Create a new step in your pipeline to write to this table first, and then create a new pipeline to write from Cloud Spanner to BigQuery
Comments (17)
- A & B. The n1-standard-1 instance is a low-end configuration, so a larger machine type is needed; B should definitely be one of the answers. Increasing the max workers increases parallelism, so the pipeline can process records faster, especially when a larger, multi-core instance type is chosen. That makes A the other good step.
  
  👍 47
  jvg637 2020/03/18
- A & B.
  
  With autoscaling enabled, the Dataflow service does not allow user control of the exact number of worker instances allocated to your job. You can still cap the number of workers by specifying the --max_num_workers option when you run your pipeline. In this question the cap is 3, so we can raise that cap.
  
  For batch jobs, the default machine type is n1-standard-1. For streaming jobs, the default machine type for Streaming Engine-enabled jobs is n1-standard-2 and the default machine type for non-Streaming Engine jobs is n1-standard-4. When using the default machine types, the Dataflow service can therefore allocate up to 4000 cores per job. If you need more cores for your job, you can select a larger machine type.
  
  👍 13
  sumanshu 2021/07/04
- A and E are correct
  
  👍 4
  haroldbenites 2020/08/21
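
To make the A & B reasoning concrete, here is a minimal sketch of how raising the worker cap and choosing a larger machine type changes the vCPU ceiling of the job. It uses only the Python standard library. The helper names (`dataflow_flags`, `max_vcpus`) and the proposed values (10 workers, n1-standard-4) are assumptions picked for illustration, not values from the question; the flag names are the ones I believe the Beam Python SDK accepts for Dataflow jobs.

```python
# Illustrative sketch: helper names and the "proposed" values are assumptions.
# vCPU counts for the n1-standard family (n1-standard-N has N vCPUs).
N1_STANDARD_VCPUS = {"n1-standard-1": 1, "n1-standard-2": 2, "n1-standard-4": 4}


def dataflow_flags(region, max_workers, machine_type):
    """Build pipeline flags for a streaming Dataflow job (Beam Python SDK style)."""
    return [
        "--runner=DataflowRunner",
        "--streaming",
        f"--region={region}",
        f"--max_num_workers={max_workers}",        # option A: raise the cap
        f"--worker_machine_type={machine_type}",   # option B: larger workers
    ]


def max_vcpus(max_workers, machine_type):
    """Upper bound on vCPUs the autoscaler can allocate under the cap."""
    return max_workers * N1_STANDARD_VCPUS[machine_type]


# Setup from the question: 3 workers of type n1-standard-1.
current = max_vcpus(3, "n1-standard-1")
# Hypothetical new setup (values chosen only for illustration).
proposed = max_vcpus(10, "n1-standard-4")
flags = dataflow_flags("europe-west4", 10, "n1-standard-4")

print(current, proposed)
print(" ".join(flags))
```

With the Beam SDK installed, a list like `flags` could be handed to `PipelineOptions(flags)` from `apache_beam.options.pipeline_options`. The region stays europe-west4 in this sketch, since moving the job to another region would not relieve CPU-bound workers.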