Topic 1 Question 287
You are tasked with building an MLOps pipeline to retrain tree-based models in production. The pipeline will include components related to data ingestion, data processing, model training, model evaluation, and model deployment. Your organization primarily uses PySpark-based workloads for data preprocessing. You want to minimize infrastructure management effort. How should you set up the pipeline?
A. Set up a TensorFlow Extended (TFX) pipeline on Vertex AI Pipelines to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
B. Set up Vertex AI Pipelines to orchestrate the MLOps pipeline. Use the predefined Dataproc component for the PySpark-based workloads.
C. Set up Kubeflow Pipelines on Google Kubernetes Engine to orchestrate the MLOps pipeline. Write a custom component for the PySpark-based workloads on Dataproc.
D. Set up Cloud Composer to orchestrate the MLOps pipeline. Use Dataproc workflow templates for the PySpark-based workloads in Cloud Composer.
Comments (4)
- Selected Answer: B
B) Best option: it offers the highest ease of use, integrates with the existing PySpark workloads (via Dataproc), and requires minimal infrastructure management, because Vertex AI Pipelines is fully managed, minimizing infrastructure effort, and is natively integrated with Dataproc for PySpark (while Composer is not); the predefined Dataproc component for PySpark workloads reduces effort and the chance of errors; and it is suitable for tree-based models (the other options are too, but with more effort). See the sketch after the comments.
👍 2
carolctech, 2024/10/26 - Selected Answer: B
This is the most suitable approach.
👍 2
AB_C, 2024/11/27 - Selected Answer: B
A - Rejected because it requires writing a custom component for the PySpark-based workloads. C - Kubeflow Pipelines on GKE is not a managed service, and the question asks to minimize infrastructure management effort. D - Cloud Composer requires managing its own environment, which also conflicts with minimizing infrastructure management effort.
👍 1
Omi_04040, 2024/12/09
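For reference, a minimal sketch of what option B could look like, using the Kubeflow Pipelines SDK and the predefined Dataproc component (DataprocPySparkBatchOp) from the google-cloud-pipeline-components package. The project, region, bucket, file URIs, and the training step are placeholders, and parameter names should be verified against the current version of the library.

# Sketch of option B: Vertex AI Pipelines orchestrating a PySpark
# preprocessing step via the predefined Dataproc component.
# All resource names and URIs below are placeholders.
from kfp import compiler, dsl
from google_cloud_pipeline_components.v1.dataproc import DataprocPySparkBatchOp

PROJECT = "my-project"                           # placeholder
REGION = "us-central1"                           # placeholder
PIPELINE_ROOT = "gs://my-bucket/pipeline-root"   # placeholder


@dsl.component(base_image="python:3.10")
def train_tree_model(processed_data_uri: str) -> str:
    """Placeholder training step for a tree-based model (e.g. XGBoost)."""
    # Real training and evaluation logic would go here.
    return f"trained on {processed_data_uri}"


@dsl.pipeline(name="tree-model-retraining", pipeline_root=PIPELINE_ROOT)
def retraining_pipeline(raw_data_uri: str):
    # Predefined component: submits the PySpark job as a Dataproc batch,
    # so no cluster needs to be created or managed by the team.
    preprocess = DataprocPySparkBatchOp(
        project=PROJECT,
        location=REGION,
        main_python_file_uri="gs://my-bucket/jobs/preprocess.py",  # placeholder
        args=[raw_data_uri],
    )
    # Downstream training step runs after preprocessing completes.
    train_tree_model(processed_data_uri="gs://my-bucket/processed/").after(preprocess)


if __name__ == "__main__":
    compiler.Compiler().compile(retraining_pipeline, "retraining_pipeline.json")

Because the pipeline itself runs on Vertex AI Pipelines (fully managed) and the PySpark step runs as a Dataproc batch job, neither the orchestrator nor a Spark cluster has to be provisioned or maintained, which is the "minimize infrastructure management effort" requirement in the question.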