Topic 1 Question 166

Professional Data Engineer

A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created with ingest-date partitioning. Over time, the query processing time has increased. You need to implement a change that would improve query performance in BigQuery. What should you do?
- Implement clustering in BigQuery on the ingest date column.
- Implement clustering in BigQuery on the package-tracking ID column.
- Tier older data onto Cloud Storage files and create a BigQuery table using Cloud Storage as an external data source.
- Re-create the table using data partitioning on the package delivery date.
Comments (6)
- Selected answer: B
  B, because the table was already created with ingest-date partitioning.
  
  👍 2
  pluiedust 2022/09/07
- D is not correct because the data arrives in real time, so the ingest date is the same as the delivery date.
  
  👍 2
  John_Pongthorn 2022/09/11
- Selected answer: B
  B is the answer.
  
  https://cloud.google.com/bigquery/docs/clustered-tables Clustered tables in BigQuery are tables that have a user-defined column sort order using clustered columns. Clustered tables can improve query performance and reduce query costs.
  
  In BigQuery, a clustered column is a user-defined table property that sorts storage blocks based on the values in the clustered columns. The storage blocks are adaptively sized based on the size of the table. A clustered table maintains the sort properties in the context of each operation that modifies it. Queries that filter or aggregate by the clustered columns only scan the relevant blocks based on the clustered columns instead of the entire table or table partition.
  
  👍 2
  zellck 2022/11/30
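
To make the block-pruning argument in the comments concrete, here is a toy Python model of one date partition. It is only a sketch under stated assumptions, not BigQuery's actual storage format: the fixed block size, the per-block min/max metadata, the `PKG…` tracking IDs, and the helper names `build_blocks` and `blocks_scanned` are all hypothetical.

```python
def build_blocks(tracking_ids, block_size):
    """Split rows into fixed-size blocks and record each block's (min, max) tracking ID."""
    blocks = []
    for start in range(0, len(tracking_ids), block_size):
        chunk = tracking_ids[start:start + block_size]
        blocks.append((min(chunk), max(chunk)))
    return blocks


def blocks_scanned(blocks, wanted_id):
    """Count the blocks whose [min, max] range could contain wanted_id."""
    return sum(1 for lo, hi in blocks if lo <= wanted_id <= hi)


# Hypothetical partition: 1,000 tracking events for 100 packages,
# arriving interleaved, as they might from a real-time stream.
arrival_order = [f"PKG{n % 100:03d}" for n in range(1000)]

# Without clustering, rows stay in arrival order, so every block spans all IDs.
unclustered = build_blocks(arrival_order, block_size=100)

# With clustering on the tracking ID, rows are sorted by that column first.
clustered = build_blocks(sorted(arrival_order), block_size=100)

print(blocks_scanned(unclustered, "PKG042"))
print(blocks_scanned(clustered, "PKG042"))
```

In this toy setup, a filter on a single tracking ID has to consider every block of the unclustered partition but only one block of the clustered one, which is the effect option B relies on.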