Topic 1 Question 238
You have deployed a scikit-learn model to a Vertex AI endpoint using a custom model server. You enabled autoscaling; however, the deployed model fails to scale beyond one replica, which led to dropped requests. You notice that CPU utilization remains low even during periods of high load. What should you do?
A. Attach a GPU to the prediction nodes
B. Increase the number of workers in your model server
C. Schedule scaling of the nodes to match expected demand
D. Increase the minReplicaCount in your DeployedModel configuration
User votes
Comments (7)
- Selected answer: A
"We generally recommend starting with one worker or thread per core. If you notice that CPU utilization is low, especially under high load, or your model is not scaling up because CPU utilization is low, then increase the number of workers." https://cloud.google.com/vertex-ai/docs/general/deployment
👍 7 · sonicclasps · 2024/01/31
- Selected answer: B
Low CPU utilization: despite high load, low CPU utilization indicates that available resources are underutilized, which points to a bottleneck within the model server itself rather than in overall compute capacity. Worker concurrency: increasing the number of workers in the model server lets it handle more concurrent requests, making effective use of the available CPU and removing the bottleneck.
👍 3 · pikachu007 · 2024/01/12
- Selected answer: B
I went B
👍 2 · Carlose2108 · 2024/02/26
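The guidance quoted above ("start with one worker or thread per core, then increase if CPU utilization stays low") can be sketched as a small helper for sizing a custom model server. This is a minimal illustration, not code from the question or the Vertex AI docs; `recommended_workers` and `per_core` are hypothetical names for the sketch.

```python
import multiprocessing


def recommended_workers(cores=None, per_core=1):
    """Worker count for a custom prediction server.

    Starting point (per the quoted Vertex AI guidance): one worker
    per core (per_core=1). If CPU utilization remains low under high
    load, raise per_core so the server can handle more concurrent
    requests and actually saturate the CPU.
    """
    if cores is None:
        cores = multiprocessing.cpu_count()
    return cores * per_core


# Example: a 4-core prediction node, doubled after observing low
# CPU utilization under load.
baseline = recommended_workers(cores=4)            # 4 workers
increased = recommended_workers(cores=4, per_core=2)  # 8 workers
```

In a custom container the result would typically feed a server flag such as gunicorn's `--workers`, e.g. `gunicorn --workers 8 --bind 0.0.0.0:8080 app:app`.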