Topic 1 Question 209

Professional Data Engineer

A shipping company has live package-tracking data that is sent to an Apache Kafka stream in real time. This is then loaded into BigQuery. Analysts in your company want to query the tracking data in BigQuery to analyze geospatial trends in the lifecycle of a package. The table was originally created with ingest-date partitioning. Over time, the query processing time has increased. You need to copy all the data to a new clustered table. What should you do?
- A. Re-create the table using data partitioning on the package delivery date.
- B. Implement clustering in BigQuery on the package-tracking ID column.
- C. Implement clustering in BigQuery on the ingest date column.
- D. Tier older data onto Cloud Storage files and create a BigQuery table using Cloud Storage as an external data source.
Comments (4)
- Selected answer: B
  Query Focus: Analysts are interested in geospatial trends within individual package lifecycles. Clustering by package-tracking ID physically co-locates related data, significantly improving query performance for these analyses.
  
  Addressing Slow Queries: Clustering addresses the query slowdown issue by optimizing data organization for the specific query patterns.
  
  Partitioning vs. Clustering:
  
  Partitioning: Divides data into segments based on a column's values, primarily for managing large datasets and optimizing query costs.
  Clustering: Organizes data within partitions for faster querying based on specific columns.
  
  👍 3
  e70ea9e 2023/12/30
- Selected answer: B
  Answer is B
  
  👍 2
  raaad 2024/01/02
- Selected answer: B
  This looks like Question #166
  
  Option B, implementing clustering in BigQuery on the package-tracking ID column, seems the most appropriate. It directly addresses the query slowdown issue by reorganizing the data in a way that aligns with the analysts' query patterns, leading to more efficient and faster query execution.
  
  👍 2
  MaxNRG 2024/01/07
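
The approach the commenters favor, copying the data into a new table clustered on the package-tracking ID, could be sketched roughly as follows. This is only an illustration: the table names and the `tracking_id` column are hypothetical (the question does not give the schema), and the helper `build_clustered_copy_ddl` is made up for this example. It just assembles a BigQuery `CREATE TABLE ... CLUSTER BY ... AS SELECT` statement as a string; running it against BigQuery is left out.

```python
# Minimal sketch: build the DDL for copying a table into a new clustered table.
# All table and column names below are assumptions for illustration only.

MAX_CLUSTER_COLUMNS = 4  # BigQuery allows at most four clustering columns


def build_clustered_copy_ddl(source_table, dest_table, cluster_columns):
    """Return a CREATE TABLE ... CLUSTER BY ... AS SELECT statement."""
    if not cluster_columns:
        raise ValueError("at least one clustering column is required")
    if len(cluster_columns) > MAX_CLUSTER_COLUMNS:
        raise ValueError("BigQuery supports at most four clustering columns")
    cluster_clause = ", ".join(cluster_columns)
    return (
        f"CREATE TABLE `{dest_table}`\n"
        f"CLUSTER BY {cluster_clause}\n"
        f"AS SELECT * FROM `{source_table}`"
    )


# Hypothetical names; substitute the real project, dataset, and column.
ddl = build_clustered_copy_ddl(
    "my_project.shipping.tracking_events",
    "my_project.shipping.tracking_events_clustered",
    ["tracking_id"],
)
print(ddl)
```

As written, the new table is clustered but not partitioned; if partitioning is still wanted, a `PARTITION BY` clause on a suitable date or timestamp column would need to be added before `CLUSTER BY`.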