Topic 1 Question 164

Professional Data Engineer

You are working on a linear regression model on BigQuery ML to predict a customer's likelihood of purchasing your company's products. Your model uses a city name variable as a key predictive component. In order to train and serve the model, your data must be organized in columns. You want to prepare your data using the least amount of coding while maintaining the predictable variables. What should you do?
- A. Create a new view with BigQuery that does not include a column with city information.
- B. Use SQL in BigQuery to transform the state column using a one-hot encoding method, and make each city a column with binary values.
- C. Use TensorFlow to create a categorical variable with a vocabulary list. Create the vocabulary file and upload that as part of your model to BigQuery ML.
- D. Use Cloud Data Fusion to assign each city to a region that is labeled as 1, 2, 3, 4, or 5, and then use that number to represent the city in the model.
User votes
Comments (17)
- Selected answer: D
  If we are being rigorous, as we should be for a professional exam, I think option B is incorrect because it one-hot encodes the "state" column. If it said the "city" column, I would go for B. Since it does not, and I do not accept that an official question would contain an error like this, I would go for D.
  
  👍 6
  cajica 2023/02/05
- Selected answer: B
  The Cloud Data Fusion method will add unnecessary weight to categories with higher-valued labels, which will skew the model. The best practice for encoding nominal categorical data is to one-hot encode it into binary values. That is conveniently done in BigQuery:
  
  https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-auto-preprocessing#one_hot_encoding
  
  👍 4
  ovokpus 2022/11/23
- Selected answer: D
  D uses the least amount of coding, even if the resulting model is not good. B encodes the "state", not the "city".
  
  👍 4
  juliobs 2023/03/22
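
For readers unsure what the one-hot encoding in option B actually produces, here is a minimal Python sketch. It is not BigQuery SQL and not how BigQuery ML implements it; the helper `one_hot_encode` and the sample city names are hypothetical and only illustrate the idea of "each city a column with binary values".

```python
def one_hot_encode(values):
    """Return (categories, rows) with one binary column per distinct value.

    Hypothetical helper for illustration only; it is not a BigQuery ML API.
    """
    categories = sorted(set(values))                 # one column per distinct city
    index = {c: i for i, c in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[index[v]] = 1                            # exactly one 1 per row
        rows.append(row)
    return categories, rows


# Assumed sample data, not taken from the question.
cities = ["Tokyo", "Osaka", "Tokyo", "Nagoya"]
categories, encoded = one_hot_encode(cities)

print(categories)
for city, row in zip(cities, encoded):
    print(city, row)
```

Because every city gets its own 0/1 column, no city is treated as numerically "larger" than another. This is the concern raised about option D, where labels 1 to 5 would impose an order on the regions.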