Topic 1 Question 161
You need to choose a database to store time series CPU and memory usage for millions of computers. You need to store this data in one-second interval samples. Analysts will be performing real-time, ad hoc analytics against the database. You want to avoid being charged for every query executed and ensure that the schema design will allow for future growth of the dataset. Which database and data model should you choose?
A. Create a table in BigQuery, and append the new samples for CPU and memory to the table.
B. Create a wide table in BigQuery, create a column for the sample value at each second, and update the row with the interval for each second.
C. Create a narrow table in Bigtable with a row key that combines the Compute Engine computer identifier with the sample time at each second.
D. Create a wide table in Bigtable with a row key that combines the computer identifier with the sample time at each minute, and combine the values for each second as column data.
Comments (17)
Answer: C
A tall and narrow table has a small number of events per row (possibly just one), whereas a short and wide table has a large number of events per row. Tall and narrow tables are best suited for time-series data.
For time series, you should generally use tall and narrow tables, for two reasons: storing one event per row makes it easier to run queries against your data, and storing many events per row makes it more likely that the total row size will exceed the recommended maximum (see "Rows can be big but are not infinite").
https://cloud.google.com/bigtable/docs/schema-design-time-series#patterns_for_row_key_design
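As a rough illustration of the tall-and-narrow pattern behind option C, the sketch below writes one sample per row with the Cloud Bigtable Python client, using a row key of the form machine_id#timestamp. The project, instance, table, and column-family names are assumptions made up for this example, not part of the question.

```python
# Sketch of the tall-and-narrow time-series pattern in Bigtable (option C).
# Assumes a table "cpu-memory-metrics" with a column family "stats" already exists;
# the project and instance IDs below are placeholders.
import datetime

from google.cloud import bigtable

client = bigtable.Client(project="my-project")
instance = client.instance("my-instance")
table = instance.table("cpu-memory-metrics")


def write_sample(machine_id: str, sample_time: datetime.datetime,
                 cpu_pct: float, mem_pct: float) -> None:
    """Write one sample as its own row: row key = machine_id#timestamp."""
    # A second-resolution timestamp in the row key keeps each row small
    # (tall and narrow) and keeps all samples for one machine contiguous,
    # so analysts can range-scan a machine's recent history efficiently.
    row_key = f"{machine_id}#{sample_time.strftime('%Y%m%d%H%M%S')}".encode()
    row = table.direct_row(row_key)
    row.set_cell("stats", "cpu", str(cpu_pct), timestamp=sample_time)
    row.set_cell("stats", "mem", str(mem_pct), timestamp=sample_time)
    row.commit()


write_sample("machine-0001", datetime.datetime.now(datetime.timezone.utc), 37.5, 62.1)
```

Putting the machine identifier before the timestamp also spreads writes across many row-key prefixes, which avoids the hotspotting you would get from a key that starts with the timestamp alone.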
👍 29 · psu · 2020/04/30: C is the correct answer.
👍 19 · madhu1171 · 2020/03/15: Should be A. The question did not talk about latency; to avoid per-query cost there is the BigQuery cache, and for a flexible schema BigQuery supports nested and repeated fields.