Examtopics

Professional Machine Learning Engineer
  • Topic 1 Question 131

    You are an ML engineer at a mobile gaming company. A data scientist on your team recently trained a TensorFlow model, and you are responsible for deploying this model into a mobile application. You discover that the inference latency of the current model doesn’t meet production requirements. You need to reduce the inference time by 50%, and you are willing to accept a small decrease in model accuracy in order to reach the latency requirement. Without training a new model, which model optimization technique for reducing latency should you try first?

    • Weight pruning

    • Dynamic range quantization

    • Model distillation

    • Dimensionality reduction
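    For reference, dynamic range quantization is applied after training: the stored float weights are mapped to 8-bit integers using a scale factor, which shrinks the model and usually speeds up inference at a small cost in accuracy. The sketch below illustrates only that weight-mapping idea in plain Python. The helper names `quantize_weights` and `dequantize_weights` and the sample weights are hypothetical, and the single symmetric scale is a simplifying assumption rather than the exact scheme any particular runtime uses.

```python
def quantize_weights(weights):
    """Map float weights to int8-range integers with one symmetric scale (assumed scheme)."""
    max_abs = max(abs(w) for w in weights)
    # Guard against an all-zero tensor so we never divide by zero.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric int8 range.
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize_weights(quantized, scale):
    """Recover approximate float weights from the integer representation."""
    return [q * scale for q in quantized]


# Hypothetical weights standing in for one layer of a trained model.
weights = [0.5, -1.27, 0.003, 1.0]
quantized, scale = quantize_weights(weights)
restored = dequantize_weights(quantized, scale)

# The rounding error per weight is bounded by half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

    In TensorFlow Lite, this kind of post-training quantization is typically requested by setting `converter.optimizations = [tf.lite.Optimize.DEFAULT]` on a `tf.lite.TFLiteConverter` before calling `convert()`, with no retraining step involved.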

