Topic 1 Question 177
You have created a Vertex AI pipeline that includes two steps. The first step preprocesses 10 TB of data, completes in about 1 hour, and saves the result in a Cloud Storage bucket. The second step uses the processed data to train a model. You need to update the model’s code to allow you to test different algorithms. You want to reduce pipeline execution time and cost while also minimizing pipeline changes. What should you do?
A. Add a pipeline parameter and an additional pipeline step. Depending on the parameter value, the pipeline step conducts or skips data preprocessing, and starts model training.
B. Create another pipeline without the preprocessing step, and hardcode the preprocessed Cloud Storage file location for model training.
C. Configure a machine with more CPU and RAM from the compute-optimized machine family for the data preprocessing step.
D. Enable caching for the pipeline job, and disable caching for the model training step.
Community vote
Comments (2)
- Selected answer: A
The pipeline already generates the preprocessed dataset and stores it; there's no need to preprocess again for another model.
👍 1 — pikachu007, 2024/01/11
- Selected answer: D
Not A. Adding a pipeline parameter and new pipeline steps does not minimise pipeline changes.
Not C. The idea is not to re-run the preprocessing step at all.
Not B. Creating a whole new pipeline implies a significant investment of effort.
I opt for D: enabling caching only for the preprocessing step (although the option says “pipeline job”, I think that is a typo). Quoting the Vertex AI docs: “If there is a matching execution in Vertex ML Metadata, the outputs of that execution are used and the step is skipped. This helps to reduce costs by skipping computations that were completed in a previous pipeline run.” https://cloud.google.com/vertex-ai/docs/pipelines/configure-caching
👍 1 — b1a8fae, 2024/01/11
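As a rough illustration of why option D works, here is a minimal Python sketch of execution caching keyed on step name and inputs. This is a toy model, not the Vertex AI API: in a real KFP v2 pipeline the cache lives in Vertex ML Metadata, and per-step caching is toggled with `task.set_caching_options(False)`. All function and variable names below are illustrative.

```python
# Toy model of pipeline execution caching: a step whose name and inputs
# match a previous execution is skipped and its cached output reused.
cache = {}  # (step_name, frozen_inputs) -> output

def run_step(name, inputs, fn, enable_caching=True):
    """Run a pipeline step, reusing a cached output on a matching execution."""
    key = (name, tuple(sorted(inputs.items())))
    if enable_caching and key in cache:
        return cache[key]  # step skipped: output reused from the "metadata store"
    output = fn(**inputs)
    cache[key] = output
    return output

calls = []  # record which steps actually executed

def preprocess(raw_path):
    calls.append("preprocess")
    return raw_path + "/processed"  # stands in for the 1-hour, 10 TB job

def train(data_path, algorithm):
    calls.append("train")
    return f"model({algorithm}) trained on {data_path}"

# First run: both steps execute.
data = run_step("preprocess", {"raw_path": "gs://bucket/raw"}, preprocess)
run_step("train", {"data_path": data, "algorithm": "xgboost"}, train,
         enable_caching=False)  # caching disabled: training always re-runs

# Second run with a new algorithm: preprocessing is served from the cache.
data = run_step("preprocess", {"raw_path": "gs://bucket/raw"}, preprocess)
run_step("train", {"data_path": data, "algorithm": "linear"}, train,
         enable_caching=False)

print(calls)  # preprocess executed once, train executed twice
```

This mirrors the scenario in the question: the expensive preprocessing step runs once, and each experiment with a new algorithm only pays for training.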