Topic 1 Question 275

Professional Data Engineer

Topic 1 Question 275
You created an analytics environment on Google Cloud so that your data scientist team can explore data without impacting the on-premises Apache Hadoop solution. The data in the on-premises Hadoop Distributed File System (HDFS) cluster is in Optimized Row Columnar (ORC) formatted files with multiple columns of Hive partitioning. The data scientist team needs to be able to explore the data in a similar way as they used the on-premises HDFS cluster with SQL on the Hive query engine. You need to choose the most cost-effective storage and processing solution. What should you do?
- Import the ORC files to Bigtable tables for the data scientist team.
- Import the ORC files to BigQuery tables for the data scientist team.
- Copy the ORC files on Cloud Storage, then deploy a Dataproc cluster for the data scientist team.
- Copy the ORC files on Cloud Storage, then create external BigQuery tables for the data scientist team.
ユーザの投票
コメント(2)
- 正解だと思う選択肢: D
  This approach leverages BigQuery's powerful analytics capabilities without the overhead of data transformation or maintaining a separate cluster, while also allowing your team to use SQL for data exploration, similar to their experience with the on-premises Hadoop/Hive environment.
  
  👍 2
  Smakyel792024/01/07
- 正解だと思う選択肢: D
  It leverages the strengths of BigQuery for SQL-based exploration while avoiding additional costs and complexity associated with data transformation or migration.
  
  The data remains in ORC format in Cloud Storage, and BigQuery's external tables feature allows direct querying of this data.
  👍 2
  raaad2024/01/09
シャッフルモード

ユーザの投票

コメント(2)