Topic 1 Question 4
You want to process and load a daily sales CSV file stored in Cloud Storage into BigQuery for downstream reporting. You need to quickly build a scalable data pipeline that transforms the data while providing insights into data quality issues. What should you do?
A. Create a batch pipeline in Cloud Data Fusion by using a Cloud Storage source and a BigQuery sink.
B. Load the CSV file as a table in BigQuery, and use scheduled queries to run SQL transformation scripts.
C. Load the CSV file as a table in BigQuery. Create a batch pipeline in Cloud Data Fusion by using a BigQuery source and sink.
D. Create a batch pipeline in Dataflow by using the Cloud Storage CSV file to BigQuery batch template.
Comments (3)
- Selected answer: A
Cloud Data Fusion enables us to build a scalable data pipeline from Cloud Storage to BigQuery. In addition, the service provides end-to-end data lineage for root cause and impact analysis.
👍 4 trashbox 2025/01/22
- Selected answer: D
There should be more detail in the question, since both Dataflow and Data Fusion can be used. Data Fusion is more suitable if you don't want to write code and would rather let Google do the work. If the daily processing of the CSV is more complex, Dataflow is the best approach because it provides both built-in templates and custom template creation.
👍 2 jatinbhatia2055 2025/02/23
- Selected answer: A
The best option is A. Cloud Data Fusion pipeline (Cloud Storage to BigQuery). Option A is best because Cloud Data Fusion is visual and fast for pipeline building, scalable, handles transformations visually, and provides data quality insights within the pipeline. Option B (BigQuery load + SQL) is incorrect because scheduled queries are less of a pipeline and offer fewer built-in data quality features. Option C (BigQuery load + Data Fusion BQ to BQ) is incorrect because it's inefficient and redundant to load to BigQuery before Data Fusion. Option D (Dataflow template) is incorrect because while scalable, Data Fusion is often quicker to build visually for simpler pipelines. Therefore, Option A, Cloud Data Fusion, is the best balance of speed, scalability, and data quality for this task.
👍 2 n2183712847 2025/03/05
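To make "insights into data quality issues" concrete, here is a minimal sketch in plain Python of the kind of row-level checks a pipeline might run on a daily sales file before loading it. It is not Cloud Data Fusion's or Dataflow's API, and the column names (order_id, sale_date, amount), the date format, and the sample rows are all assumptions, since the question does not give a schema.

```python
import csv
import io
from collections import Counter
from datetime import datetime

# Hypothetical schema: the question does not specify the CSV columns.
REQUIRED_COLUMNS = ("order_id", "sale_date", "amount")


def check_row(row):
    """Return a list of issue labels for one CSV row (empty if the row is clean)."""
    problems = []
    for column in REQUIRED_COLUMNS:
        if not (row.get(column) or "").strip():
            problems.append(f"missing_{column}")

    sale_date = (row.get("sale_date") or "").strip()
    if sale_date:
        try:
            # Assumed date format: ISO-style YYYY-MM-DD.
            datetime.strptime(sale_date, "%Y-%m-%d")
        except ValueError:
            problems.append("bad_sale_date")

    amount = (row.get("amount") or "").strip()
    if amount:
        try:
            if float(amount) < 0:
                problems.append("negative_amount")
        except ValueError:
            problems.append("bad_amount")
    return problems


def profile_sales_csv(text):
    """Split rows into clean rows and a per-issue count of rejected rows."""
    clean_rows, issue_counts = [], Counter()
    for row in csv.DictReader(io.StringIO(text)):
        problems = check_row(row)
        if problems:
            issue_counts.update(problems)
        else:
            clean_rows.append(row)
    return clean_rows, issue_counts


# Made-up sample data for illustration only.
SAMPLE = """order_id,sale_date,amount
1001,2025-01-22,19.99
1002,2025-13-01,5.00
,2025-01-22,7.50
1004,2025-01-22,-3
1005,2025-01-22,abc
"""

if __name__ == "__main__":
    clean, issues = profile_sales_csv(SAMPLE)
    print(f"{len(clean)} clean row(s)")
    for label, count in sorted(issues.items()):
        print(f"{label}: {count}")
```

In a managed service, the clean rows would go to the BigQuery sink and the issue counts would feed whatever error reporting the pipeline tool provides. The sketch only shows the shape of the check, not how any particular service implements it.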