Topic 1 Question 175
Your company is implementing a data warehouse using BigQuery, and you have been tasked with designing the data model. You have moved your on-premises sales data warehouse, which uses a star schema, to BigQuery, but you notice performance issues when querying data from the past 30 days. Based on Google's recommended practices, what should you do to speed up the query without increasing storage costs?
A. Denormalize the data.
B. Shard the data by customer ID.
C. Materialize the dimensional data in views.
D. Partition the data by transaction date.
Comments (11)
Selected answer: D
D is the right answer because it does not increase storage costs. A is not correct because denormalization typically increases the amount of storage needed.
👍 6 · waiebdi · 2023/02/12
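A minimal sketch of what option D looks like in BigQuery Standard SQL (the dataset, table, and column names here are hypothetical, not from the question):

    -- Hypothetical sales fact table, partitioned by transaction date.
    -- Partitioning changes only the physical layout of the same rows,
    -- so it adds no storage cost.
    CREATE TABLE sales_dw.fact_sales (
      transaction_id   STRING,
      customer_id      STRING,
      transaction_date DATE,
      amount           NUMERIC
    )
    PARTITION BY transaction_date;

    -- Filtering on the partitioning column lets BigQuery prune to the
    -- ~30 daily partitions instead of scanning the whole table.
    SELECT customer_id, SUM(amount) AS total
    FROM sales_dw.fact_sales
    WHERE transaction_date >= DATE_SUB(CURRENT_DATE(), INTERVAL 30 DAY)
    GROUP BY customer_id;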
Selected answer: D
A sneaky question. [D] Yes: since the data is queried with date criteria, partitioning by transaction date will surely speed it up at no extra cost. [A] Maybe: a star schema is already a denormalized model, but as user Reall01 pointed out, using nested and repeated fields can be considered a further denormalization. If the model does not have frequently changing dimensions, this kind of denormalization will improve performance. According to https://cloud.google.com/bigquery/docs/loading-data#loading_denormalized_nested_and_repeated_data : "In some circumstances, denormalizing your data and using nested and repeated fields doesn't result in increased performance. Avoid denormalization in these use cases:
- You have a star schema with frequently changing dimensions"
I guess the person who added this question had [D] in mind as the correct answer. If the question writer had considered all of the above, they would have stated clearly whether the schema has frequently changing dimensions.
👍 4 · NicolasN · 2022/11/06
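For contrast with option A, a hedged sketch (hypothetical names again) of the nested-and-repeated-field denormalization discussed above. Note that embedding dimension attributes this way typically increases storage, which is why it loses to D under the question's no-extra-cost constraint:

    -- Hypothetical denormalized schema: the customer dimension is
    -- embedded as a STRUCT and order lines as a repeated (ARRAY) field,
    -- removing the need for star-schema joins at query time.
    CREATE TABLE sales_dw.fact_sales_denorm (
      transaction_id   STRING,
      transaction_date DATE,
      customer         STRUCT<id STRING, name STRING, region STRING>,
      line_items       ARRAY<STRUCT<sku STRING, quantity INT64, amount NUMERIC>>
    )
    PARTITION BY transaction_date;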
D. Partition the data by transaction date.
A star schema is already denormalized.
👍 3 · AWSandeep · 2022/09/02