Topic 1 Question 111
You have historical data covering the last three years in BigQuery and a data pipeline that delivers new data to BigQuery daily. You have noticed that when the Data Science team runs a query filtered on a date column and limited to 30-90 days of data, the query scans the entire table. You also noticed that your bill is increasing more quickly than you expected. You want to resolve the issue as cost-effectively as possible while maintaining the ability to conduct SQL queries. What should you do?
A. Re-create the tables using DDL. Partition the tables by a column containing a TIMESTAMP or DATE type.
B. Recommend that the Data Science team export the table to a CSV file on Cloud Storage and use Cloud Datalab to explore the data by reading the files directly.
C. Modify your pipeline to maintain the last 30-90 days of data in one table and the longer history in a different table to minimize full table scans over the entire history.
D. Write an Apache Beam pipeline that creates a BigQuery table per day. Recommend that the Data Science team use wildcards on the table name suffixes to select the data they need.
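Option A can be sketched with BigQuery DDL along these lines (the project, dataset, table, and column names here are illustrative, not from the question):

```sql
-- Re-create the table partitioned on the DATE derived from a TIMESTAMP column,
-- copying the existing rows in the same statement.
CREATE TABLE `my-project.analytics.events_partitioned`
PARTITION BY DATE(event_timestamp)
OPTIONS (require_partition_filter = TRUE)  -- force queries to prune partitions
AS
SELECT * FROM `my-project.analytics.events`;

-- A filtered query now scans only the matching partitions, not the whole table.
SELECT COUNT(*)
FROM `my-project.analytics.events_partitioned`
WHERE DATE(event_timestamp) >= DATE_SUB(CURRENT_DATE(), INTERVAL 90 DAY);
```

Because BigQuery bills on bytes scanned, pruning to 30-90 days of partitions directly reduces cost while the data stays queryable with standard SQL.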
Comments (17)
- should be A (👍 33)
- [Removed], 2020/03/22: Answer: A. Partitioning is the solution for reducing cost and time. (👍 18)
- [Removed], 2020/03/27: I will go with Option A. (👍 5)
- arghya13, 2020/11/18