Topic 1 Question 124
You are designing a cloud-native historical data processing system to meet the following conditions:
✑ The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools, including Dataproc, BigQuery, and Compute Engine.
✑ A batch pipeline moves daily data.
✑ Performance is not a factor in the solution.
✑ The solution design should maximize availability.
How should you design data storage for this solution?
A. Create a Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.
B. Store the data in BigQuery. Access the data using the BigQuery Connector on Dataproc and Compute Engine.
C. Store the data in a regional Cloud Storage bucket. Access the bucket directly using Dataproc, BigQuery, and Compute Engine.
D. Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Dataproc, BigQuery, and Compute Engine.
Comments (6)
- Selected answer: D
D, of course.
👍 2 devaid 2022/10/16
- Selected answer: D
Problem: how should the data be stored? Considerations: high availability; performance is not an issue.
A → avoid HDFS.
C → multi-regional beats regional in terms of availability.
B could be the answer, but we're dealing with PDF documents, so we need blob storage (Cloud Storage). If we only had CSV or Avro, B might be the answer.
👍 2 jkhong 2022/10/17
- Selected answer: D
D is the answer.
👍 2 zellck 2022/12/03
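
The elimination reasoning in the comments can be sketched as a small Python filter. This is only an illustration: the `StorageOption` class, its boolean fields, and the `availability_rank` numbers are hypothetical names and values made up for this sketch to mirror the thread's argument. They are not figures from Google Cloud documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageOption:
    label: str
    description: str
    # Assumption from the thread: PDFs need blob storage, so a data
    # warehouse is treated as unable to hold all three formats.
    stores_arbitrary_files: bool
    # Assumption: HDFS data lives on the cluster's own disks, so it is
    # only reachable while that cluster exists.
    independent_of_cluster: bool
    # Illustrative ordering only (higher = more available); not an SLA value.
    availability_rank: int


OPTIONS = [
    StorageOption("A", "HDFS on a high-availability Dataproc cluster", True, False, 1),
    StorageOption("B", "BigQuery", False, True, 2),
    StorageOption("C", "Regional Cloud Storage bucket", True, True, 2),
    StorageOption("D", "Multi-regional Cloud Storage bucket", True, True, 3),
]


def viable(options):
    """Keep options that can hold CSV, Avro, and PDF and do not depend on a cluster."""
    return [o for o in options if o.stores_arbitrary_files and o.independent_of_cluster]


def pick(options):
    """Among viable options, choose the one with the highest availability rank."""
    return max(viable(options), key=lambda o: o.availability_rank)


if __name__ == "__main__":
    best = pick(OPTIONS)
    print(best.label, best.description)
```

Performance does not appear as a field because the question states it is not a factor. Only the file-format requirement and availability drive the choice between the remaining options, C and D.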