Topic 1 Question 209
You are training a custom language model for your company using a large dataset. You plan to use the Reduction Server strategy on Vertex AI. You need to configure the worker pools of the distributed training job. What should you do?
A. Configure the machines of the first two worker pools to have GPUs, and to use a container image where your training code runs. Configure the third worker pool to have GPUs, and use the reductionserver container image.
B. Configure the machines of the first two worker pools to have GPUs and to use a container image where your training code runs. Configure the third worker pool to use the reductionserver container image without accelerators, and choose a machine type that prioritizes bandwidth.
C. Configure the machines of the first two worker pools to have TPUs and to use a container image where your training code runs. Configure the third worker pool to use the reductionserver container image without accelerators, and choose a machine type that prioritizes bandwidth.
D. Configure the machines of the first two worker pools to have TPUs, and to use a container image where your training code runs. Configure the third worker pool to have TPUs, and use the reductionserver container image.
Comments (2)
- Selected answer: B
Worker Pools 1 and 2: These pools are responsible for the actual model training tasks. They require GPUs (or TPUs, if applicable to your model) to accelerate model computations. They run the container image containing your training code. Worker Pool 3: This pool is dedicated to the reduction server. It doesn't require accelerators (GPUs or TPUs) for gradient aggregation. Prioritize machines with high network bandwidth to optimize gradient exchange. Use the specific reductionserver
👍 1 · pikachu007 · 2024/01/12
- Selected answer: B
Bandwidth is important for the reduction server.
👍 1 · winston9 · 2024/01/13
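The three-pool layout described in the comments above can be sketched as plain Python dictionaries in the shape Vertex AI uses for worker pool specs. This is only an illustration under stated assumptions: the training image URI, the GPU machine type and accelerator, the CPU machine type for the reducers, and the replica counts are all placeholder choices, not values given in the question. The reductionserver image URI is the one published in the Vertex AI documentation.

```python
# Reduction Server image published for Vertex AI custom training.
REDUCTION_SERVER_IMAGE = (
    "us-docker.pkg.dev/vertex-ai-restricted/training/reductionserver:latest"
)
# Hypothetical placeholder: replace with your own training image.
TRAINING_IMAGE = "us-docker.pkg.dev/my-project/my-repo/trainer:latest"


def build_worker_pool_specs(worker_replicas=3, reducer_replicas=4):
    """Return three worker pool specs: chief, workers, reduction servers."""
    # Assumed GPU machine shape; any supported GPU configuration could be used.
    gpu_machine = {
        "machine_type": "a2-highgpu-1g",
        "accelerator_type": "NVIDIA_TESLA_A100",
        "accelerator_count": 1,
    }
    return [
        # Pool 0: a single chief replica with a GPU, running the training code.
        {
            "machine_spec": dict(gpu_machine),
            "replica_count": 1,
            "container_spec": {"image_uri": TRAINING_IMAGE},
        },
        # Pool 1: additional GPU workers, running the same training code.
        {
            "machine_spec": dict(gpu_machine),
            "replica_count": worker_replicas,
            "container_spec": {"image_uri": TRAINING_IMAGE},
        },
        # Pool 2: reduction servers. No accelerators; the CPU machine type
        # here is an assumed choice made with network bandwidth in mind.
        {
            "machine_spec": {"machine_type": "n1-highcpu-16"},
            "replica_count": reducer_replicas,
            "container_spec": {"image_uri": REDUCTION_SERVER_IMAGE},
        },
    ]


specs = build_worker_pool_specs()
for index, spec in enumerate(specs):
    print(index, spec["machine_spec"]["machine_type"], spec["replica_count"])
```

A list like this could then be passed as the `worker_pool_specs` argument when creating a custom job with the Vertex AI SDK. The point to notice is that only the third pool omits the accelerator fields and uses the reductionserver image, which matches option B.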