Topic 1 Question 277
You work for a large bank that serves customers through an application hosted in Google Cloud that is running in the US and Singapore. You have developed a PyTorch model to classify transactions as potentially fraudulent or not. The model is a three-layer perceptron that uses both numerical and categorical features as input, and hashing happens within the model.
You deployed the model to the us-central1 region on n1-highcpu-16 machines, and predictions are served in real time. The model's current median response latency is 40 ms. You want to reduce latency, especially in Singapore, where some customers are experiencing the longest delays. What should you do?
A. Attach an NVIDIA T4 GPU to the machines being used for online inference.
B. Change the machines being used for online inference to n1-highcpu-32.
C. Deploy the model to Vertex AI private endpoints in the us-central1 and asia-southeast1 regions, and allow the application to choose the appropriate endpoint.
D. Create another Vertex AI endpoint in the asia-southeast1 region, and allow the application to choose the appropriate endpoint.
Comments (10)
- Selected Answer: C
  The bottleneck is network latency.
  A: Incorrect. A GPU might improve compute performance, but it is an expensive solution and unnecessary if the bottleneck is network latency.
  B: Incorrect. A larger machine might offer a slight improvement, but the primary issue is the geographical distance between users and the model.
  C: Correct. This approach leverages the geographical proximity of the endpoints to the users, reducing latency for customers in Singapore without neglecting customers in the US. Using Vertex AI private endpoints also ensures secure and efficient communication between the application and the model.
  D: Incorrect. It does not make the best use of the existing infrastructure in the us-central1 region, and managing multiple endpoints introduces additional complexity.
  👍 8

- guilhermebutzke (2024/02/19) - Selected Answer: D
  With an endpoint in the asia-southeast1 region (Singapore), the data does not have to travel as far, significantly reducing round-trip time. Allowing the application to choose the endpoint based on the user's location ensures that requests are handled by the nearest available server, optimizing response times for users in different regions.
  👍 2

- tavva_prudhvi (2024/03/30) - Selected Answer: C
  Deploying the model to a Vertex AI private endpoint in the Singapore region brings the model closer to users in that region, significantly reducing network latency compared with accessing the model hosted in us-central1. Allowing the application to choose the appropriate endpoint based on user location ensures users reach the geographically closest model replica, optimizing latency. Why not D: a separate endpoint in Singapore would also allow regional deployment, but it would not automatically route users to the closest endpoint; you would still need additional routing logic in the application, increasing complexity.
  👍 2

- omermahgoub (2024/04/13)
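Both leading answers hinge on the same mechanism: the application picks the regional endpoint closest to the user. A minimal sketch of that client-side routing, assuming hypothetical endpoint resource names and a hypothetical country-to-region map (the real call would then go through the Vertex AI SDK, e.g. `aiplatform.Endpoint(...).predict(...)`):

```python
# Sketch of application-side routing to the nearest Vertex AI endpoint.
# The region map and endpoint resource names below are hypothetical
# placeholders, not values from the question.

NEAREST_VERTEX_REGION = {
    "US": "us-central1",       # US customers -> existing US deployment
    "SG": "asia-southeast1",   # Singapore customers -> new regional deployment
}

ENDPOINT_RESOURCE_NAMES = {
    "us-central1": "projects/my-project/locations/us-central1/endpoints/111",
    "asia-southeast1": "projects/my-project/locations/asia-southeast1/endpoints/222",
}

def choose_endpoint(user_country: str) -> str:
    """Return the endpoint resource name nearest the user; default to US."""
    region = NEAREST_VERTEX_REGION.get(user_country, "us-central1")
    return ENDPOINT_RESOURCE_NAMES[region]
```

Singapore traffic then resolves to the asia-southeast1 endpoint while US traffic keeps using us-central1, which is the latency reduction the C and D answers describe.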