Topic 1 Question 148
You are training an ML model using data stored in BigQuery that contains several values that are considered Personally Identifiable Information (PII). You need to reduce the sensitivity of the dataset before training your model. Every column is critical to your model. How should you proceed?
A. Using Dataflow, ingest the columns with sensitive data from BigQuery, and then randomize the values in each sensitive column.
B. Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow with the DLP API to encrypt sensitive values with Format Preserving Encryption.
C. Use the Cloud Data Loss Prevention (DLP) API to scan for sensitive data, and use Dataflow to replace all sensitive data by using the encryption algorithm AES-256 with a salt.
D. Before training, use BigQuery to select only the columns that do not contain sensitive data. Create an authorized view of the data so that sensitive values cannot be accessed by unauthorized individuals.
User votes
Comments (6)
- Suggested answer: B
Format Preserving Encryption uses a de-identify configuration in which you can specify the wrapped_key parameter (the encrypted, or "wrapped", AES-256 key to use). The answer is B in my view. Ref: https://cloud.google.com/dlp/docs/samples/dlp-deidentify-fpe
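The wrapped_key setup this comment refers to can be sketched as a DLP de-identify configuration. This is a minimal illustration only, assuming the JSON field names used by the DLP REST API's DeidentifyConfig; the KMS key name and wrapped key below are placeholders, not real resources.

```python
def build_fpe_deidentify_config(kms_key_name: str, wrapped_key_b64: str) -> dict:
    """Return a DeidentifyConfig-shaped dict that replaces sensitive
    values with format-preserving ciphertext instead of redacting them,
    so every column stays usable for model training."""
    return {
        "deidentifyConfig": {
            "infoTypeTransformations": {
                "transformations": [
                    {
                        "primitiveTransformation": {
                            "cryptoReplaceFfxFpeConfig": {
                                "cryptoKey": {
                                    "kmsWrapped": {
                                        # AES-256 key, encrypted ("wrapped") by Cloud KMS
                                        "wrappedKey": wrapped_key_b64,
                                        "cryptoKeyName": kms_key_name,
                                    }
                                },
                                # Keep digits as digits so downstream schemas still parse
                                "commonAlphabet": "NUMERIC",
                            }
                        }
                    }
                ]
            }
        }
    }

# Placeholder resource names for illustration only.
config = build_fpe_deidentify_config(
    "projects/my-project/locations/global/keyRings/kr/cryptoKeys/ck",
    "BASE64_WRAPPED_KEY",
)
```

Because the ciphertext preserves the format of the original value (e.g. a 9-digit ID stays a 9-digit string), the de-identified dataset can be fed to training without schema changes, which is why B fits the "every column is critical" constraint.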
👍 3 · Scipione_ · 2023/02/16
- Suggested answer: D
This approach would let you keep the non-sensitive columns while reducing the sensitivity of the dataset by removing the personally identifiable information (PII) before training the model. By creating an authorized view of the data, you ensure that sensitive values cannot be accessed by unauthorized individuals.
https://cloud.google.com/bigquery/docs/data-governance#data_loss_prevention
👍 2 · TNT87 · 2023/02/16
- Suggested answer: C
👍 1 · imamapri · 2023/02/03