Topic 1 Question 145
You have trained a DNN regressor with TensorFlow to predict housing prices using a set of predictive features. Your default precision is tf.float64, and you use a standard TensorFlow estimator:
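The original code snippet is not shown; a minimal sketch of such a canned estimator (feature names and layer sizes here are hypothetical, not from the original question) might look like:

```python
import tensorflow as tf

# Hypothetical feature columns; the question's actual features are not shown.
feature_columns = [
    tf.feature_column.numeric_column("sq_footage", dtype=tf.float64),
    tf.feature_column.numeric_column("num_rooms", dtype=tf.float64),
]

# A standard canned DNN regression estimator, as the question describes.
model = tf.estimator.DNNRegressor(
    feature_columns=feature_columns,
    hidden_units=[128, 64],
)
```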
Your model performs well, but just before deploying it to production you discover that your current serving latency is 10 ms at the 90th percentile, serving on CPUs. Your production requirement is a model latency of 8 ms at the 90th percentile. You are willing to accept a small decrease in predictive performance in order to meet the latency requirement, so your plan is to improve latency while evaluating how much the model's prediction quality decreases. What should you try first to quickly lower the serving latency?
A. Switch from CPU to GPU serving.
B. Apply quantization to your SavedModel by reducing the floating-point precision to tf.float16.
C. Increase the dropout rate to 0.8 and retrain your model.
D. Increase the dropout rate to 0.8 in _PREDICT mode by adjusting the TensorFlow Serving parameters.
Comments (8)
- Selected answer: B (👍 4, imamapri, 2023/02/03)
- Selected answer: A
  A makes sense too. (👍 3, TNT87, 2023/03/07)
- Selected answer: A
  For tf.float16 (Option B), we would have to be on TFLite: https://discuss.tensorflow.org/t/convert-tensorflow-saved-model-from-float32-to-float16/12130 and resp. https://www.tensorflow.org/lite/performance/post_training_quantization#float16_quantization (plus "By default, a float16 quantized model will 'dequantize' the weights values to float32 when run on the CPU. (Note that the GPU delegate will not perform this dequantization, since it can operate on float16 data.)") (👍 2, M25, 2023/05/10)
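For reference, a minimal sketch of the float16 post-training quantization flow described in the comment above, using the TFLite converter (the SavedModel path is hypothetical):

```python
import tensorflow as tf

# Load the exported SavedModel (path is hypothetical).
converter = tf.lite.TFLiteConverter.from_saved_model("export/housing_model")

# Enable default optimizations and restrict weight types to float16.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()

# Write the quantized model to disk for serving with a TFLite runtime.
with open("housing_model_fp16.tflite", "wb") as f:
    f.write(tflite_model)
```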