Topic 1 Question 61
Your analytics team wants to build a simple statistical model to determine which customers are most likely to work with your company again, based on a few different metrics. They want to run the model on Apache Spark, using data housed in Google Cloud Storage, and you have recommended using Google Cloud Dataproc to execute this job. Testing has shown that this workload can run in approximately 30 minutes on a 15-node cluster, outputting the results into Google BigQuery. The plan is to run this workload weekly. How should you optimize the cluster for cost?
A. Migrate the workload to Google Cloud Dataflow
B. Use pre-emptible virtual machines (VMs) for the cluster
C. Use a higher-memory node so that the job runs faster
D. Use SSDs on the worker nodes so that the job can run faster
Comments (17)
B. (Hadoop/Spark jobs run on Dataproc, and preemptible VMs cost roughly 80% less; see the configuration sketch after the comments.)
👍 43 · jvg637 · 2020/03/16 — I think the answer should be B:
https://cloud.google.com/dataproc/docs/concepts/compute/preemptible-vms
👍 16 · rickywck · 2020/03/16 — Selected answer: B
"this workload can run in approximately 30 minutes on a 15-node cluster," so you need performance for only 30 mins -> preemptible VMs
https://cloud.google.com/dataproc/docs/concepts/compute/preemptible-vms
👍 4 · medeis_jar · 2022/01/04
Shuffle mode
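
Following the preemptible-worker recommendation above, here is a minimal sketch of creating such a cluster with the google-cloud-dataproc Python client. The project ID, region, cluster name, machine types, and the 2-primary/13-secondary node split are illustrative assumptions, not values given in the question.

# Minimal sketch (assumed values): a Dataproc cluster for the weekly ~30-minute
# Spark job, with most workers configured as preemptible secondary workers.
from google.cloud import dataproc_v1

project_id = "my-project"   # assumption: replace with your project ID
region = "us-central1"      # assumption: any supported region works

client = dataproc_v1.ClusterControllerClient(
    client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
)

cluster = {
    "project_id": project_id,
    "cluster_name": "weekly-spark-model",  # hypothetical name
    "config": {
        "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-4"},
        # A small set of non-preemptible primary workers keeps the cluster stable.
        "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-4"},
        # The bulk of the 15 worker nodes run as preemptible secondary workers to cut cost.
        "secondary_worker_config": {
            "num_instances": 13,
            "preemptibility": "PREEMPTIBLE",
        },
    },
}

operation = client.create_cluster(
    request={"project_id": project_id, "region": region, "cluster": cluster}
)
print(operation.result().cluster_name)

Keeping a couple of non-preemptible primary workers alongside the preemptible secondary workers is a common pattern, since secondary workers can be reclaimed at any time and do not store HDFS data.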