Topic 1 Question 33
You have a demand forecasting pipeline in production that uses Dataflow to preprocess raw data prior to model training and prediction. During preprocessing, you employ Z-score normalization on data stored in BigQuery and write it back to BigQuery. New training data is added every week. You want to make the process more efficient by minimizing computation time and manual intervention. What should you do?
A. Normalize the data using Google Kubernetes Engine.
B. Translate the normalization algorithm into SQL for use with BigQuery.
C. Use the normalizer_fn argument in TensorFlow's Feature Column API.
D. Normalize the data with Apache Spark using the Dataproc connector for BigQuery.
Comments (9)
maartenalexander · 2021/06/22 · 👍 18
B, I think. BigQuery definitely minimizes computation time for normalization, and it would also minimize manual intervention. To normalize the data in Dataflow, you would have to pass the mean and standard deviation values in as a side input, which is more work than a simple SQL query.

alashin · 2021/07/05 · 👍 3
B. I agree with B as well.

Mohamed_Mossad · 2022/06/13 · 👍 3
Selected answer: B
B is the most efficient, since you no longer have to load, process, and save the data elsewhere; you just write some SQL in BigQuery and voilà. :D
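For reference, here is a minimal sketch of what option B could look like in BigQuery SQL, using analytic (window) functions to compute the Z-score over the whole table. The dataset, table, and column names (sales.weekly_demand, demand, demand_normalized) are hypothetical placeholders, not from the question:

```sql
-- Z-score normalization done entirely inside BigQuery (option B).
-- Hypothetical source table: sales.weekly_demand with a numeric `demand` column.
CREATE OR REPLACE TABLE sales.weekly_demand_normalized AS
SELECT
  * EXCEPT (demand),
  -- (x - mean) / stddev, with mean and stddev computed over the full table
  SAFE_DIVIDE(
    demand - AVG(demand) OVER (),
    STDDEV(demand) OVER ()
  ) AS demand_normalized
FROM sales.weekly_demand;
```

A query like this can be run as a BigQuery scheduled query, so each week's new training data is normalized in place, with no Dataflow job and no manual steps.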