Topic 1 Question 212
You are pre-training a large language model on Google Cloud. This model includes custom TensorFlow operations in the training loop. Model training will use a large batch size, and you expect training to take several weeks. You need to configure a training architecture that minimizes both training time and compute costs. What should you do?
A. Implement 8 workers of a2-megagpu-16g machines by using tf.distribute.MultiWorkerMirroredStrategy.
B. Implement a TPU Pod slice with --accelerator-type=v4-128 by using tf.distribute.TPUStrategy.
C. Implement 16 workers of c2d-highcpu-32 machines by using tf.distribute.MirroredStrategy.
D. Implement 16 workers of a2-highgpu-8g machines by using tf.distribute.MultiWorkerMirroredStrategy.
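For context on option B, the accelerator type is specified when the TPU resource is created. A sketch of the gcloud invocation, where the resource name, zone, and runtime version are illustrative assumptions (only the --accelerator-type flag comes from the option):

```shell
# Hypothetical creation of a v4-128 TPU Pod slice; the name, zone, and
# software version below are assumptions, not values from the question.
gcloud compute tpus tpu-vm create llm-pretrain-tpu \
  --zone=us-central2-b \
  --accelerator-type=v4-128 \
  --version=tpu-vm-tf-2.13.0
```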
User votes
Comments (1)
- Suggested answer: B
TPU advantages:
- Highly specialized: TPUs (Tensor Processing Units) are custom-designed hardware accelerators optimized for machine learning workloads, particularly the large batch sizes and matrix-heavy computations common in large language models.
- Exceptional performance: TPUs can significantly outperform CPUs and GPUs in speed and efficiency for these kinds of tasks.
- Cost-effective: while TPUs may have a higher hourly cost, their performance often leads to lower overall costs through faster training times and reduced resource usage.

TPU Pod slice:
- Scalability: a TPU Pod slice distributes training across multiple TPU v4 chips for even greater performance and scalability.
- Custom operations: tf.distribute.TPUStrategy ensures compatibility with custom TensorFlow operations.
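The comment's recommendation corresponds to a short TPUStrategy setup. A minimal sketch, assuming a Cloud TPU VM (where `tpu="local"` resolves the attached TPU); the fallback branch is an addition so the snippet also runs on machines without a TPU:

```python
import tensorflow as tf

# Sketch of the TPUStrategy setup suggested for option B. The tpu="local"
# target assumes a Cloud TPU VM; the fallback to the default strategy is
# an assumption added so the sketch runs on non-TPU machines too.
try:
    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="local")
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    strategy = tf.distribute.TPUStrategy(resolver)
except Exception:
    strategy = tf.distribute.get_strategy()  # default strategy off-TPU

# Variables must be created inside the strategy scope so they are
# replicated across the TPU cores of the Pod slice.
with strategy.scope():
    model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
    model.compile(optimizer="sgd", loss="mse")

print("replicas in sync:", strategy.num_replicas_in_sync)
```

On a real v4 Pod slice, `num_replicas_in_sync` reports the number of TPU cores participating in synchronous training, which is what makes the large batch size in the question practical.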
— pikachu007 (2024/01/12), 👍 1