Topic 1 Question 146
(Select two.)

You want to migrate an on-premises Hadoop system to Cloud Dataproc. Hive is the primary tool in use, and the data format is Optimized Row Columnar (ORC). All ORC files have been successfully copied to a Cloud Storage bucket. You need to replicate some data to the cluster's local Hadoop Distributed File System (HDFS) to maximize performance. What are two ways to start using Hive in Cloud Dataproc?
A. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to HDFS. Mount the Hive tables locally.
B. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to any node of the Dataproc cluster. Mount the Hive tables locally.
C. Run the gsutil utility to transfer all ORC files from the Cloud Storage bucket to the master node of the Dataproc cluster. Then run the Hadoop utility to copy them to HDFS. Mount the Hive tables from HDFS.
D. Leverage the Cloud Storage connector for Hadoop to mount the ORC files as external Hive tables. Replicate the external Hive tables to native ones.
E. Load the ORC files into BigQuery. Leverage the BigQuery connector for Hadoop to mount the BigQuery tables as external Hive tables. Replicate the external Hive tables to native ones.
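For illustration, the two approaches most commenters favor (staging through the master node, and going through the Cloud Storage connector) can be sketched as shell commands. This is a minimal sketch, not a tested procedure: the bucket name `gs://my-bucket`, cluster name `my-cluster`, and the `orders` table/paths are all hypothetical placeholders.

```shell
# Approach 1 (option C): stage the ORC files on the master node with
# gsutil, then copy them into HDFS with the Hadoop utility.
# (my-cluster, my-bucket, and all paths are placeholder names.)
gcloud compute ssh my-cluster-m --command '
  gsutil -m cp -r gs://my-bucket/warehouse/orders /tmp/orders
  hdfs dfs -mkdir -p /user/hive/warehouse/orders
  hdfs dfs -put /tmp/orders/* /user/hive/warehouse/orders/
'

# Approach 2 (option D): the Cloud Storage connector is preinstalled on
# Dataproc, so Hive can read gs:// paths directly. Expose the ORC files
# as an external table, then replicate into a native (HDFS-backed)
# table for local read performance. Run on the master node:
hive -e "
  CREATE EXTERNAL TABLE orders_ext
  STORED AS ORC
  LOCATION 'gs://my-bucket/warehouse/orders/';

  CREATE TABLE orders
  STORED AS ORC
  AS SELECT * FROM orders_ext;
"
```

Both sketches assume the default Hive warehouse layout; adjust paths and table DDL to the actual schema.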
Comments (17)
Should be B C
👍 17

[Removed] · 2020/03/21
Answer is C and D, 100%. I know it says to transfer all the files, but with the options provided, C is the best choice. Explanation: A and B cannot be true, since gsutil copies the data to the master node, and only from the master node is it then copied to HDFS. C works. D works and is recommended by Google. E would work, but the question says to maximize performance, so it is not the answer: the BigQuery Hadoop connector stages all the BQ data in GCS as temporary files and then processes it into HDFS. Since the data is already in GCS, we do not need to load it into BigQuery only to have a connector unload it back to GCS and then process it.
👍 13

Sid19 · 2021/12/14
D and E are external tables, which will have performance issues and will not fulfill the requirement, so D and E are eliminated. Of A, B, and C, A is eliminated because we cannot copy data directly from the GCS bucket to HDFS through the utility. So only B and C are the correct answers.
👍 6

apnu · 2020/12/31