ExamTopics

Professional Machine Learning Engineer
  • Topic 1 Question 212

    You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?

    • Implement 8 workers of a2-megagpu-16g machines by using tf.distribute.MultiWorkerMirroredStrategy.

    • Implement a TPU Pod slice with --accelerator-type=v4-128 by using tf.distribute.TPUStrategy.

    • Implement 16 workers of c2d-highcpu-32 machines by using tf.distribute.MirroredStrategy.

    • Implement 16 workers of a2-highgpu-8g machines by using tf.distribute.MultiWorkerMirroredStrategy.
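
    For reference, the tf.distribute strategies named in these options all follow the same pattern: create the strategy first, then build and compile the model inside its scope. The sketch below is a minimal, hypothetical illustration of the multi-worker GPU variant (tf.distribute.MultiWorkerMirroredStrategy); the batch size, vocabulary size, and model layers are placeholders and are not part of the exam scenario.

        import tensorflow as tf

        # Minimal sketch: data-parallel training across several GPU workers.
        # Each worker reads its role from the TF_CONFIG environment variable,
        # which Vertex AI custom training sets automatically on multi-worker jobs.
        strategy = tf.distribute.MultiWorkerMirroredStrategy()

        GLOBAL_BATCH_SIZE = 2048  # hypothetical "large batch size" from the scenario
        VOCAB_SIZE = 32000        # hypothetical vocabulary size

        def make_dataset():
            # Placeholder data; a real pre-training job would stream tokenized
            # text shards from Cloud Storage.
            tokens = tf.random.uniform(
                (GLOBAL_BATCH_SIZE * 8, 128), maxval=VOCAB_SIZE, dtype=tf.int32)
            labels = tf.random.uniform(
                (GLOBAL_BATCH_SIZE * 8,), maxval=VOCAB_SIZE, dtype=tf.int32)
            return (tf.data.Dataset.from_tensor_slices((tokens, labels))
                    .batch(GLOBAL_BATCH_SIZE)
                    .repeat())

        with strategy.scope():
            # The model (including any custom TensorFlow ops used in training)
            # is built inside the strategy scope so its variables are mirrored
            # and gradients are all-reduced across workers.
            model = tf.keras.Sequential([
                tf.keras.layers.Embedding(VOCAB_SIZE, 512),
                tf.keras.layers.GlobalAveragePooling1D(),
                tf.keras.layers.Dense(VOCAB_SIZE),
            ])
            model.compile(
                optimizer=tf.keras.optimizers.Adam(1e-4),
                loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

        model.fit(make_dataset(), steps_per_epoch=100, epochs=1)

    The tf.distribute.TPUStrategy option is wired up similarly, except the strategy is constructed from a tf.distribute.cluster_resolver.TPUClusterResolver and every op in the graph must be compilable by XLA.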

