Topic 1 Question 32

Professional Data Engineer

Your company is running their first dynamic campaign, serving different offers by analyzing real-time data during the holiday season. The data scientists are collecting terabytes of data that rapidly grows every hour during their 30-day campaign. They are using Google Cloud Dataflow to preprocess the data and collect the feature (signals) data that is needed for the machine learning model in Google Cloud Bigtable. The team is observing suboptimal performance with reads and writes of their initial load of 10 TB of data. They want to improve this performance while minimizing cost. What should they do?
- Redefine the schema by evenly distributing reads and writes across the row space of the table.
- The performance issue should be resolved over time as the size of the Bigtable cluster is increased.
- Redesign the schema to use a single row key to identify values that need to be updated frequently in the cluster.
- Redesign the schema to use row keys based on numeric IDs that increase sequentially per user viewing the offers.
Comments (17)
- I hate it when I read the question, then I think oh easy and I KNOW the answer, then I look at the choices and the answer I thought of is just not there at all... and I realize I absolutely have no idea :'D
  
  👍 45
  IsaB2020/09/14
- Correct A
  
  👍 22
  [Removed]2020/03/20
- A, as the schema needs to be redesigned to distribute reads and writes evenly across each table. See the GCP documentation, Bigtable Performance: https://cloud.google.com/bigtable/docs/performance
  From that page: "The table's schema is not designed correctly. To get good performance from Cloud Bigtable, it's essential to design a schema that makes it possible to distribute reads and writes evenly across each table. See Designing Your Schema for more information." https://cloud.google.com/bigtable/docs/schema-design
  Option B is wrong, as increasing the size of the cluster would increase the cost.
  Option C is wrong, as a single row key for frequently updated identifiers reduces performance.
  Option D is wrong, as sequential IDs would degrade performance. A safer approach is to use a reversed version of the user's numeric ID, which spreads traffic more evenly across all of the nodes for your Cloud Bigtable table.
  
  👍 11
  MaxNRG2021/11/14
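To make the reversed-ID point above concrete, here is a minimal sketch in plain Python (no Bigtable client involved). It compares zero-padded sequential keys with their reversed form and counts how many keys start with each character. The helper names (`sequential_key`, `reversed_key`, `leading_char_spread`), the 10-digit padding, and the use of the first character as a stand-in for "which part of the sorted row space a key lands in" are all assumptions for illustration, not anything from the question or the Bigtable API.

```python
from collections import Counter


def sequential_key(user_id: int, width: int = 10) -> str:
    """Zero-padded numeric ID used directly as a row key (hypothetical helper)."""
    return str(user_id).zfill(width)


def reversed_key(user_id: int, width: int = 10) -> str:
    """Same padded ID with its digits reversed (hypothetical helper)."""
    return str(user_id).zfill(width)[::-1]


def leading_char_spread(keys) -> Counter:
    """Count keys per first character, a rough proxy for where
    keys fall in a lexicographically sorted row space."""
    return Counter(key[0] for key in keys)


# 100 consecutive user IDs, as would be produced by a sequential counter.
user_ids = range(1000, 1100)

seq_spread = leading_char_spread(sequential_key(i) for i in user_ids)
rev_spread = leading_char_spread(reversed_key(i) for i in user_ids)

print("sequential:", dict(seq_spread))
print("reversed:  ", dict(rev_spread))
```

In this toy run every sequential key shares the same leading character, so consecutive writes pile into one narrow key range, while the reversed keys split evenly across ten leading digits. Real tablet boundaries are decided by Bigtable itself, so treat this only as an intuition for why option D hotspots and option A does not.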