Topic 1 Question 190
You are loading CSV files from Cloud Storage to BigQuery. The files have known data quality issues, including mismatched data types, such as STRINGs and INT64s in the same column, and inconsistent formatting of values such as phone numbers or addresses. You need to create the data pipeline to maintain data quality and perform the required cleansing and transformation. What should you do?
A. Use Data Fusion to transform the data before loading it into BigQuery.
B. Use Data Fusion to convert the CSV files to a self-describing data format, such as AVRO, before loading the data to BigQuery.
C. Load the CSV files into a staging table with the desired schema, perform the transformations with SQL, and then write the results to the final destination table.
D. Create a table with the desired schema, load the CSV files into the table, and perform the transformations in place using SQL.
Comments (13)
- Selected answer: A
A is the answer.
https://cloud.google.com/data-fusion/docs/concepts/overview Cloud Data Fusion is a fully managed, cloud-native, enterprise data integration service for quickly building and managing data pipelines.
The Cloud Data Fusion web UI lets you build scalable data integration solutions to clean, prepare, blend, transfer, and transform data, without having to manage the infrastructure.
👍 4 zellck 2022/11/28
- Selected answer: A
I'm kinda inclined towards C as SQL seems a powerful option to treat this kind of use case.
Also, I didn't get how the transformations mentioned on this page will help to clean the data (https://cloud.google.com/data-fusion/docs/concepts/transformation-pushdown#supported_transformations)
But I guess this kind of thing can be done in Data Fusion using the Wrangler plugin. Also, the question talks about a pipeline, so A is my final choice.
👍 3 saurabhsingh4k 2022/12/18
The correct answer is C.
👍 2 samirzubair 2022/11/23
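
For anyone unsure what "cleansing" means in practice here, below is a minimal Python sketch of the two rules the question mentions: coercing a column that mixes STRING and INT64 values, and normalizing phone numbers. It is tool-agnostic and only illustrates the logic. The column names `customer_id` and `phone` and the 10-digit phone rule are assumptions made for illustration, not part of the question. A Wrangler recipe in Data Fusion or a SQL step using something like BigQuery's `SAFE_CAST` would express the same rules in its own syntax.

```python
import re
from typing import Optional


def safe_int(value: str) -> Optional[int]:
    """Return an int when the text is a plain integer, otherwise None.

    Similar in spirit to a "safe cast": bad values become NULL instead of
    failing the whole load.
    """
    text = value.strip()
    if re.fullmatch(r"[+-]?[0-9]+", text):
        return int(text)
    return None


def normalize_phone(value: str) -> Optional[str]:
    """Strip punctuation and return a 10-digit string, or None if invalid.

    Assumption: numbers are 10 digits, optionally prefixed with country code 1.
    """
    digits = re.sub(r"[^0-9]", "", value)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    return digits if len(digits) == 10 else None


def clean_row(row: dict) -> dict:
    """Apply both rules to one CSV row (hypothetical column names)."""
    return {
        "customer_id": safe_int(row["customer_id"]),
        "phone": normalize_phone(row["phone"]),
    }
```

Rows that fail a rule come back with `None` in that field, so they can be routed to an error output or reviewed later rather than silently loaded.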