Topic 1 Question 124

Professional Data Engineer

You are designing a cloud-native historical data processing system to meet the following conditions:
- The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Dataproc, BigQuery, and Compute Engine.
- A batch pipeline moves daily data.
- Performance is not a factor in the solution.
- The solution design should maximize availability.

How should you design data storage for this solution?
- A. Create a Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.
- B. Store the data in BigQuery. Access the data using the BigQuery Connector on Dataproc and Compute Engine.
- C. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Dataproc, BigQuery, and Compute Engine.
- D. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Dataproc, BigQuery, and Compute Engine.
User votes
Comments (6)
- Selected answer: D
  D of course
  
  👍 2
  devaid 2022/10/16
- Selected answer: D
  Problem: how should the data be stored? Considerations: high availability matters; performance does not.
  
  A → avoid HDFS
  
  C → multi-regional beats regional in terms of availability
  
  B could be the answer, but we're dealing with PDF documents, so we need blob storage (Cloud Storage). If we only had CSV or Avro, B might be the answer.
  
  👍 2
  jkhong 2022/10/17
- Selected answer: D
  D is the answer.
  
  👍 2
  zellck 2022/12/03
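
The elimination reasoning in the comments above can be sketched as a small filter-and-rank step. This is only a rough illustration, not an official scoring method: the `StorageOption` class, its field names, and the numeric availability ranks are all hypothetical, and reading "avoid HDFS" as "the data should not depend on a cluster's lifetime" is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    """Hypothetical summary of one answer choice (illustration only)."""
    label: str
    name: str
    stores_any_file_format: bool   # can hold PDFs as well as CSV/Avro
    independent_of_cluster: bool   # data outlives any single compute cluster
    availability_rank: int         # assumed relative ordering; higher is better


OPTIONS = [
    StorageOption("A", "HDFS on an HA Dataproc cluster", True, False, 1),
    StorageOption("B", "BigQuery", False, True, 2),
    StorageOption("C", "Regional Cloud Storage bucket", True, True, 2),
    StorageOption("D", "Multi-regional Cloud Storage bucket", True, True, 3),
]


def choose(options):
    """Drop options that fail a hard requirement, then maximize availability."""
    candidates = [
        o for o in options
        if o.stores_any_file_format and o.independent_of_cluster
    ]
    return max(candidates, key=lambda o: o.availability_rank)


best = choose(OPTIONS)
print(best.label, best.name)  # prints "D Multi-regional Cloud Storage bucket"
```

Performance is deliberately absent from the model because the question states it is not a factor; only the format and availability requirements drive the choice.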