Topic 1 Question 28
Your company is performing data preprocessing for a learning algorithm in Google Cloud Dataflow. Numerous data logs are being generated during this step, and the team wants to analyze them. Due to the dynamic nature of the campaign, the data is growing exponentially every hour. The data scientists have written the following code to read the data for a new key feature in the logs.
You want to improve the performance of this data read. What should you do?

A. Specify the TableReference object in the code.
B. Use the .fromQuery operation to read specific fields from the table.
C. Use both the Google BigQuery TableSchema and TableFieldSchema classes.
D. Call a transform that returns TableRow objects, where each element in the PCollection represents a single row in the table.
User votes
Comments (11)
Selected answer: B
BigQueryIO.read.fromQuery() executes a query and then reads the results received after the query execution. This function is therefore more time-consuming, since it requires that a query be executed first (which incurs the corresponding economic and computational costs).
👍 10 · arthur2385 · 2022/09/02
Since we want to analyze data from a new ML feature (column), we only need to read values from that column. By doing a fromQuery(SELECT featureColumn FROM table) we optimize both cost and performance, since we are not scanning all columns.
https://cloud.google.com/bigquery/docs/best-practices-costs#avoid_select_
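The column-only read described above can be sketched with the Beam Python SDK. This is a minimal illustration, not the exam's original code: the table name `project.dataset.logs` and the column name `feature_col` are hypothetical placeholders.

```python
# Sketch: read only the new feature column via a query instead of
# scanning the whole table. Table and column names are hypothetical.
feature_column = "feature_col"        # the new key feature to analyze
table = "project.dataset.logs"        # the growing log table

# Selecting a single column avoids the cost of a full SELECT *.
query = f"SELECT {feature_column} FROM `{table}`"
print(query)

# In the Dataflow pipeline this query would feed BigQueryIO's
# query-based read (not executed here, since it needs GCP credentials):
#
# import apache_beam as beam
# from apache_beam.io.gcp.bigquery import ReadFromBigQuery
#
# with beam.Pipeline() as p:
#     rows = p | "ReadFeature" >> ReadFromBigQuery(
#         query=query, use_standard_sql=True)
```

Because BigQuery bills by bytes scanned per referenced column, narrowing the query to the one new column keeps both the query cost and the volume of data shuffled into the pipeline proportional to that column alone.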
👍 6 · maxdataengineer · 2022/10/08
Selected answer: B
Reading only relevant columns.
👍 5 · gcm7 · 2022/10/12
Shuffle mode.