Topic 1 Question 24
Your organization has a petabyte of application logs stored as Parquet files in Cloud Storage. You need to quickly perform a one-time SQL-based analysis of the files and join them to data that already resides in BigQuery. What should you do?
A. Create a Dataproc cluster, and write a PySpark job to join the data from BigQuery to the files in Cloud Storage.
B. Launch a Cloud Data Fusion environment, use plugins to connect to BigQuery and Cloud Storage, and use the SQL join operation to analyze the data.
C. Create external tables over the files in Cloud Storage, and perform SQL joins to tables in BigQuery to analyze the data.
D. Use the bq load command to load the Parquet files into BigQuery, and perform SQL joins to analyze the data.
Comments (1)
- Answer I believe is correct: C
C is the most efficient option for a quick, one-time SQL analysis of petabyte-scale Parquet files in Cloud Storage joined with data in BigQuery. External tables let BigQuery query the files in place with standard SQL and join them directly to native tables, avoiding the time and cost of loading a petabyte before you can run a single query. A (Dataproc/Spark) and B (Cloud Data Fusion) both require provisioning extra infrastructure and authoring a job or pipeline, which is overkill for a one-off analysis. D (bq load) would work, but ingesting a petabyte just to run one set of queries is slow and unnecessary. See the sketch below for what option C looks like in practice.
👍 1 · n2183712847 · 2025/02/27
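For reference, here is a minimal sketch of option C using the google-cloud-bigquery Python client. All names here are hypothetical placeholders, not from the question: the project `my-project`, dataset `analytics`, bucket `my-bucket`, existing native table `analytics.user_events`, and the `user_id` join key. The same external table can also be created with `CREATE EXTERNAL TABLE` DDL or in the console.

```python
from google.cloud import bigquery

client = bigquery.Client(project="my-project")  # hypothetical project ID

# External table definition: BigQuery reads the Parquet files in place
# in Cloud Storage and infers the schema from the Parquet metadata,
# so no data is loaded into BigQuery storage.
external_config = bigquery.ExternalConfig("PARQUET")
external_config.source_uris = ["gs://my-bucket/app-logs/*.parquet"]  # hypothetical path

table = bigquery.Table("my-project.analytics.app_logs_ext")
table.external_data_configuration = external_config
client.create_table(table, exists_ok=True)

# Standard SQL join between the external table and a native BigQuery
# table (analytics.user_events and user_id are assumed for illustration).
query = """
    SELECT e.user_id, COUNT(*) AS log_lines
    FROM `my-project.analytics.app_logs_ext` AS l
    JOIN `my-project.analytics.user_events` AS e
      ON l.user_id = e.user_id
    GROUP BY e.user_id
"""
for row in client.query(query).result():
    print(row.user_id, row.log_lines)
```

Because the external table is just metadata over the Cloud Storage files, there is no ingestion step to wait for, which is exactly why this approach suits a one-time analysis.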