Examtopics

Professional Data Engineer
  • Topic 1 Question 124

    You are designing a cloud-native historical data processing system to meet the following conditions:
    ✑ The data being analyzed is in CSV, Avro, and PDF formats and will be accessed by multiple analysis tools including Dataproc, BigQuery, and Compute Engine.
    ✑ A batch pipeline moves daily data.
    ✑ Performance is not a factor in the solution.
    ✑ The solution design should maximize availability.

    How should you design data storage for this solution?

    • Create a Dataproc cluster with high availability. Store the data in HDFS, and perform analysis as needed.

    • Store the data in BigQuery. Access the data using the BigQuery Connector on Dataproc and Compute Engine.

    • Store the data in a regional Cloud Storage bucket. Access the bucket directly using Dataproc, BigQuery, and Compute Engine.

    • Store the data in a multi-regional Cloud Storage bucket. Access the data directly using Dataproc, BigQuery, and Compute Engine.
