Topic 1 Question 184
You are building a report-only data warehouse where the data is streamed into BigQuery via the streaming API. Following Google's best practices, you have both a staging and a production table for the data. How should you design your data loading to ensure that there is only one master dataset without affecting performance on either the ingestion or reporting pieces?
A. Have a staging table that is an append-only model, and then update the production table every three hours with the changes written to staging.
B. Have a staging table that is an append-only model, and then update the production table every ninety minutes with the changes written to staging.
C. Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every three hours.
D. Have a staging table that moves the staged data over to the production table and deletes the contents of the staging table every thirty minutes.
User votes
Comments (10)
Selected answer: C
[C] I found the correct answer based on a real case, where Google's Solutions Architect team decided to move an internal process to use BigQuery. The related doc is here: https://cloud.google.com/blog/products/data-analytics/moving-a-publishing-workflow-to-bigquery-for-new-data-insights
👍 12 · NicolasN · 2022/11/06

Vote B - "Some recently streamed rows might not be available for table copy typically for a few minutes. In rare cases, this can take up to 90 minutes." https://cloud.google.com/bigquery/docs/streaming-data-into-bigquery#dataavailability
👍 10 · nwk · 2022/09/04

Selected answer: C
C is the answer.
https://cloud.google.com/blog/products/data-analytics/moving-a-publishing-workflow-to-bigquery-for-new-data-insights
"Following common extract, transform, load (ETL) best practices, we used a staging table and a separate production table so that we could load data into the staging table without impacting users of the data. The design we created based on ETL best practices called for first deleting all the records from the staging table, loading the staging table, and then replacing the production table with the contents.
When using the streaming API, the BigQuery streaming buffer remains active for about 30 to 60 minutes or more after use, which means that you can't delete or change data during that time. Since we used the streaming API, we scheduled the load every three hours to balance getting data into BigQuery quickly and being able to subsequently delete the data from the staging table during the load process."
👍 6 · zellck · 2022/11/29
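For illustration, here is a minimal sketch of the load pattern option C describes: a job, run every three hours, that appends the staging table's contents to the production table and then clears staging once the rows have left the streaming buffer. The project, dataset, and table names, and the use of the Python BigQuery client, are assumptions for the example, not part of the question or the referenced blog post.

```python
# Hypothetical sketch of the staging -> production promotion in option C.
# Table names and the three-hour scheduling mechanism are assumptions.
from google.cloud import bigquery

client = bigquery.Client()

STAGING = "my_project.reporting.events_staging"   # assumed staging table
PRODUCTION = "my_project.reporting.events"        # assumed production table


def promote_staging_to_production() -> None:
    """Append staged rows to production, then clear the staging table.

    Intended to run on a three-hour schedule (e.g. via Cloud Scheduler)
    so that staged rows have left the streaming buffer before deletion.
    """
    # Append everything currently in staging to the production table.
    client.query(
        f"INSERT INTO `{PRODUCTION}` SELECT * FROM `{STAGING}`"
    ).result()

    # Clear staging for the next window. DML like this fails for rows
    # still in the streaming buffer, which is why the job runs every
    # three hours rather than continuously.
    client.query(f"TRUNCATE TABLE `{STAGING}`").result()


if __name__ == "__main__":
    promote_staging_to_production()
```

The three-hour cadence is the key design choice: per the comments above, rows can sit in the streaming buffer for up to 90 minutes, and data cannot be deleted or changed while it is there, so a longer interval keeps the staging cleanup reliable while still refreshing the single production (master) table regularly.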