Topic 1 Question 6
You work for a healthcare company that has a large on-premises data system containing patient records with personally identifiable information (PII) such as names, addresses, and medical diagnoses. You need a standardized managed solution that de-identifies PII across all your data feeds prior to ingestion to Google Cloud. What should you do?
A. Use Cloud Run functions to create a serverless data cleaning pipeline. Store the cleaned data in BigQuery.
B. Use Cloud Data Fusion to transform the data. Store the cleaned data in BigQuery.
C. Load the data into BigQuery, and inspect the data by using SQL queries. Use Dataflow to transform the data and remove any errors.
D. Use Apache Beam to read the data and perform the necessary cleaning and transformation operations. Store the cleaned data in BigQuery.
User votes
Comments (2)
- Selected answer: B
Cloud Data Fusion can be used with Sensitive Data Protection to de-identify feeds. However, Dataflow (Google's managed service for running Apache Beam pipelines) is the more general approach. I am only selecting Data Fusion over Beam because it is a named Google service. Had they said Dataflow, I would have gone there instead.
👍 1 rich_maverick 2025/02/26
- Selected answer: B
The best option is B, Cloud Data Fusion, because it is a managed, visual, standardized data integration service well suited to building de-identification pipelines. Option A (Cloud Run functions) is incorrect because it requires more coding and is less inherently standardized for pipelines. Option C (load to BigQuery first) is incorrect because it violates the requirement to de-identify before ingestion, creating a security risk. Option D (Apache Beam/Dataflow) is incorrect because, while powerful, it is more code-centric and less of a pre-built managed solution than Data Fusion.
👍 1 n2183712847 2025/03/05
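For readers wondering what the de-identification step mentioned in the comments might look like underneath, here is a minimal sketch of a request body for Sensitive Data Protection's content de-identify method. It only builds the request as a plain dictionary and makes no API call. The helper name `build_deidentify_request`, the project ID `my-project`, the sample text, and the choice of infoTypes are assumptions for illustration; medical diagnoses would likely need additional or custom infoTypes not shown here.

```python
def build_deidentify_request(project_id: str, text: str) -> dict:
    """Build a de-identify request body (hypothetical helper, no API call)."""
    # Assumed infoTypes for illustration: names and street addresses only.
    info_types = [{"name": "PERSON_NAME"}, {"name": "STREET_ADDRESS"}]
    return {
        "parent": f"projects/{project_id}/locations/global",
        # What to look for in the input text.
        "inspect_config": {"info_types": info_types},
        # How to transform each finding: replace it with its infoType name.
        "deidentify_config": {
            "info_type_transformations": {
                "transformations": [
                    {
                        "primitive_transformation": {
                            "replace_with_info_type_config": {}
                        }
                    }
                ]
            }
        },
        # The content to de-identify.
        "item": {"value": text},
    }


# "my-project" and the sample text are placeholders, not real values.
request = build_deidentify_request("my-project", "Jane Doe, 12 Main St")
```

Assuming the `google-cloud-dlp` package is installed and credentials are configured, a dictionary of this shape could be passed to `DlpServiceClient().deidentify_content(request=request)`. In a Data Fusion pipeline the equivalent settings would be configured in the pipeline UI rather than written by hand.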