Topic 1 Question 203
An ecommerce company wants to train a large image classification model with 10,000 classes. The company runs multiple model training iterations and needs to minimize operational overhead and cost. The company also needs to avoid loss of work and model retraining.
Which solution will meet these requirements?
A. Create the training jobs as AWS Batch jobs that use Amazon EC2 Spot Instances in a managed compute environment.
B. Use Amazon EC2 Spot Instances to run the training jobs. Use a Spot Instance interruption notice to save a snapshot of the model to Amazon S3 before an instance is terminated.
C. Use AWS Lambda to run the training jobs. Save model weights to Amazon S3.
D. Use managed spot training in Amazon SageMaker. Launch the training jobs with checkpointing enabled.
Comments (5)
- Selected answer: D
It has to be D. With Spot training, we can reduce the cost and save the model weights with checkpointing enabled.
👍 6 Amit11011996 2022/11/28
- Selected answer: D
https://docs.aws.amazon.com/sagemaker/latest/dg/model-managed-spot-training.html
Managed spot training can reduce the cost of training models by up to 90% compared with on-demand instances. SageMaker manages the Spot interruptions on your behalf.
"Spot instances can be interrupted, causing jobs to take longer to start or finish. You can configure your managed spot training job to use checkpoints. SageMaker copies checkpoint data from a local path to Amazon S3. When the job is restarted, SageMaker copies the data from Amazon S3 back into the local path. The training job can then resume from the last checkpoint instead of restarting."
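The quoted checkpoint behavior only avoids retraining if the training script itself writes checkpoints to the local path and looks for them on startup. SageMaker handles the copy to and from Amazon S3, but it does not resume the training loop for you. Below is a minimal sketch of that resume logic using only the standard library. It assumes SageMaker's documented default local path `/opt/ml/checkpoints`. The helper names (`latest_checkpoint`, `save_checkpoint`, `train`) and the `epoch-N.json` file format are hypothetical. A real image classifier would save framework-specific weight and optimizer files instead of a JSON placeholder.

```python
import json
import os
import re
import tempfile

# Documented default for the checkpoint local path in SageMaker; SageMaker syncs
# this directory to the configured S3 URI and restores it when a spot job restarts.
DEFAULT_CHECKPOINT_DIR = "/opt/ml/checkpoints"

# Hypothetical naming scheme used only in this sketch.
CKPT_PATTERN = re.compile(r"^epoch-(\d+)\.json$")


def latest_checkpoint(ckpt_dir):
    """Return (epoch, path) of the newest checkpoint, or (None, None) if none exist."""
    if not os.path.isdir(ckpt_dir):
        return None, None
    best_epoch, best_path = None, None
    for name in os.listdir(ckpt_dir):
        match = CKPT_PATTERN.match(name)
        if match:
            epoch = int(match.group(1))
            if best_epoch is None or epoch > best_epoch:
                best_epoch, best_path = epoch, os.path.join(ckpt_dir, name)
    return best_epoch, best_path


def save_checkpoint(ckpt_dir, epoch, state):
    """Write to a temp file, then rename, so an interruption mid-write cannot corrupt the newest checkpoint."""
    os.makedirs(ckpt_dir, exist_ok=True)
    final_path = os.path.join(ckpt_dir, f"epoch-{epoch}.json")
    tmp_path = final_path + ".tmp"
    with open(tmp_path, "w") as f:
        json.dump({"epoch": epoch, "state": state}, f)
    os.replace(tmp_path, final_path)
    return final_path


def train(num_epochs, ckpt_dir=DEFAULT_CHECKPOINT_DIR, stop_after=None):
    """Run or resume a toy training loop; return the epochs executed in this run.

    stop_after simulates a Spot interruption by halting after that many epochs.
    """
    last_epoch, path = latest_checkpoint(ckpt_dir)
    if path is not None:
        with open(path) as f:
            state = json.load(f)["state"]
        start = last_epoch + 1  # resume after the last completed epoch
    else:
        state = {"weights": 0.0}  # placeholder for real model/optimizer state
        start = 0

    ran = []
    for epoch in range(start, num_epochs):
        if stop_after is not None and len(ran) >= stop_after:
            break  # simulated interruption: no further progress in this run
        state["weights"] += 1.0  # stand-in for one epoch of real training
        save_checkpoint(ckpt_dir, epoch, state)
        ran.append(epoch)
    return ran


if __name__ == "__main__":
    demo_dir = tempfile.mkdtemp()  # local stand-in for /opt/ml/checkpoints
    print(train(5, demo_dir, stop_after=2))  # prints [0, 1]: first run is "interrupted"
    print(train(5, demo_dir))  # prints [2, 3, 4]: restarted run resumes at epoch 2
```

Checkpointing once per epoch is a simple choice. With very long epochs, such as a 10,000-class model on a large dataset, checkpointing every N steps limits how much work an interruption can discard.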
👍 5 Peeking 2022/12/10
- Selected answer: D
Managed spot training in Amazon SageMaker provides a cost-effective way to run large machine learning workloads.
With managed spot training, the training jobs are executed using Amazon EC2 Spot instances, which can significantly reduce the cost of training.
Additionally, by launching training jobs with checkpointing enabled, the work done up to the last checkpoint is saved to Amazon S3. This ensures that the training job can be resumed from the last checkpoint in case of instance failure or termination. This minimizes the risk of data loss and avoids the need for retraining the model from scratch. Using Amazon SageMaker also reduces the operational overhead required to set up and manage the training environment.
👍 3 AjoseO 2023/02/19
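On the job side, answer D maps onto three fields of SageMaker's `CreateTrainingJob` request:

- `EnableManagedSpotTraining`
- `CheckpointConfig`, with `S3Uri` and `LocalPath`
- `StoppingCondition`, where `MaxWaitTimeInSeconds` must be at least `MaxRuntimeInSeconds`

The SageMaker Python SDK exposes the same settings as `Estimator` arguments (`use_spot_instances`, `max_run`, `max_wait`, `checkpoint_s3_uri`, `checkpoint_local_path`). The sketch below only assembles that portion of the request as a plain dict. The function name `spot_training_fields`, the bucket, and the time limits are hypothetical. The other required fields (algorithm, role, resources, output config) are omitted, so the result is not a complete request on its own.

```python
def spot_training_fields(checkpoint_s3_uri, max_runtime_s, max_wait_s,
                         local_path="/opt/ml/checkpoints"):
    """Build the spot/checkpoint portion of a CreateTrainingJob request (partial, not complete)."""
    if not checkpoint_s3_uri.startswith("s3://"):
        raise ValueError("checkpoint_s3_uri must be an s3:// URI")
    if max_wait_s < max_runtime_s:
        # The wait limit covers training time plus time spent waiting for Spot capacity.
        raise ValueError("MaxWaitTimeInSeconds must be >= MaxRuntimeInSeconds")
    return {
        "EnableManagedSpotTraining": True,
        "CheckpointConfig": {"S3Uri": checkpoint_s3_uri, "LocalPath": local_path},
        "StoppingCondition": {
            "MaxRuntimeInSeconds": max_runtime_s,
            "MaxWaitTimeInSeconds": max_wait_s,
        },
    }


if __name__ == "__main__":
    # Hypothetical bucket and limits: up to 24 h of training within a 48 h window.
    fields = spot_training_fields("s3://example-bucket/checkpoints/job-1", 24 * 3600, 48 * 3600)
    print(fields["StoppingCondition"])
```

These fields would be merged into a full request and passed to a boto3 SageMaker client's `create_training_job`. Keeping the checkpoint S3 URI stable across restarts is what lets a restarted job find its earlier checkpoints.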