Topic 1 Question 17
Your company is migrating their 30-node Apache Hadoop cluster to the cloud. They want to re-use Hadoop jobs they have already created and minimize the management of the cluster as much as possible. They also want to be able to persist data beyond the life of the cluster. What should you do?
A. Create a Google Cloud Dataflow job to process the data.
B. Create a Google Cloud Dataproc cluster that uses persistent disks for HDFS.
C. Create a Hadoop cluster on Google Compute Engine that uses persistent disks.
D. Create a Cloud Dataproc cluster that uses the Google Cloud Storage connector.
E. Create a Hadoop cluster on Google Compute Engine that uses Local SSD disks.
Comments (17)
Answer: D
Description: Dataproc is the managed service for migrating Hadoop and Spark jobs to GCP. Using Dataproc with the Cloud Storage connector, jobs read and write data in Cloud Storage, so the data persists after the cluster is deleted. HDFS on persistent disks is only worth considering when a job is heavily I/O-intensive.
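For illustration, a minimal sketch of what "re-using an existing Hadoop job on Dataproc" can look like: a standard MapReduce WordCount driver whose only change is that the input and output paths use the gs:// scheme, which the Cloud Storage connector (preinstalled on Dataproc clusters) resolves like any other Hadoop filesystem. The bucket and paths below are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.StringTokenizer;

// Standard Hadoop WordCount; the only Dataproc-specific detail is that the
// input/output paths use gs://, handled by the Cloud Storage connector.
public class WordCountOnGcs {

  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count on GCS");
    job.setJarByClass(WordCountOnGcs.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    // Hypothetical bucket/paths: data lives in Cloud Storage rather than
    // cluster-local HDFS, so it survives cluster deletion.
    FileInputFormat.addInputPath(job, new Path("gs://example-bucket/input/"));
    FileOutputFormat.setOutputPath(job, new Path("gs://example-bucket/output/"));

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
```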
👍 52 · [Removed] · 2020/03/26
Correct: D
👍 16 · [Removed] · 2020/03/16
D is correct because it uses managed services and allows the data to persist on GCS beyond the life of the cluster. A is not correct because the goal is to re-use the existing Hadoop jobs, and MapReduce or Spark jobs cannot simply be moved to Dataflow. B is not correct because the goal is to persist data beyond the life of the ephemeral clusters; if HDFS is the primary attached storage, it disappears at the end of the cluster's life. C is not correct because the goal is to use managed services as much as possible, and a self-managed Hadoop cluster on Compute Engine is the opposite. E is not correct for the same reason as C.
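To illustrate the persistence point in the comment above: once the ephemeral Dataproc cluster is deleted, the output written to Cloud Storage is still readable with the ordinary Cloud Storage client library. This is a sketch only; the bucket name and prefix are hypothetical.

```java
import com.google.cloud.storage.Blob;
import com.google.cloud.storage.Storage;
import com.google.cloud.storage.StorageOptions;

// Lists job output that remains in Cloud Storage after the Dataproc
// cluster that produced it has been deleted. Bucket/prefix are examples.
public class ReadPersistedOutput {
  public static void main(String[] args) {
    Storage storage = StorageOptions.getDefaultInstance().getService();
    for (Blob blob : storage.list("example-bucket",
        Storage.BlobListOption.prefix("output/")).iterateAll()) {
      System.out.printf("%s (%d bytes)%n", blob.getName(), blob.getSize());
    }
  }
}
```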
👍 10 · MaxNRG · 2021/11/09