Topic 1 Question 113
A Machine Learning Specialist is designing a scalable data storage solution for Amazon SageMaker. There is an existing TensorFlow-based model implemented as a train.py script that relies on static training data that is currently stored as TFRecords. Which method of providing training data to Amazon SageMaker would meet the business requirements with the LEAST development overhead?
A. Use Amazon SageMaker script mode and use train.py unchanged. Point the Amazon SageMaker training invocation to the local path of the data without reformatting the training data.
B. Use Amazon SageMaker script mode and use train.py unchanged. Put the TFRecord data into an Amazon S3 bucket. Point the Amazon SageMaker training invocation to the S3 bucket without reformatting the training data.
C. Rewrite the train.py script to add a section that converts TFRecords to protobuf and ingests the protobuf data instead of TFRecords.
D. Prepare the data in the format accepted by Amazon SageMaker. Use AWS Glue or AWS Lambda to reformat and store the data in an Amazon S3 bucket.
Comments (9)
I would select B. Based on the following AWS documentation it appears this is the right approach: https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html https://github.com/aws-samples/amazon-sagemaker-script-mode/blob/master/tf-horovod-inference-pipeline/train.py
👍 20 joep21 2021/10/11
B is my answer. Reading data:
filenames = ["s3://bucketname/path/to/file1.tfrecord", "s3://bucketname/path/to/file2.tfrecord"]
dataset = tf.data.TFRecordDataset(filenames)
👍 12 SophieSu 2021/10/29
Unfortunately, you can't use the script unchanged; a few things need to be added:
- Make sure your script can handle --model_dir as an additional command line argument. If you did not specify a location when you created the TensorFlow estimator, an S3 location under the default training job bucket is used. Distributed training with parameter servers requires you to use the tf.estimator.train_and_evaluate API and to provide an S3 location as the model directory during training.
- Load input data from the input channels. The input channels are defined when fit is called.
https://sagemaker.readthedocs.io/en/stable/frameworks/tensorflow/using_tf.html
Because of these prerequisites, answers A and B are easily disqualified. There is no need to change the training data format, so option C is a red herring.
The answer is D.
Not the most obvious answer
👍 4 cnethers 2021/10/16
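For reference, the two script additions listed in the comment above (accepting --model_dir and loading data from an input channel) might look roughly like the sketch below. This is only an illustration, not the train.py from the question: it assumes the channel passed to fit is named "training" (so SageMaker exposes it as the SM_CHANNEL_TRAINING environment variable), and the --train argument name and the list_tfrecords helper are hypothetical names chosen for the example.

```python
import argparse
import glob
import os


def parse_args(argv=None):
    """Parse the arguments a script-mode training job passes to train.py."""
    parser = argparse.ArgumentParser()
    # The TensorFlow estimator passes --model_dir on the command line.
    parser.add_argument("--model_dir", type=str, default=None)
    # Assumption: the channel given to fit() is named "training", so its
    # local directory is available in SM_CHANNEL_TRAINING.
    parser.add_argument(
        "--train", type=str, default=os.environ.get("SM_CHANNEL_TRAINING")
    )
    # Ignore any other hyperparameters this sketch does not declare.
    args, _unknown = parser.parse_known_args(argv)
    return args


def list_tfrecords(channel_dir):
    """Hypothetical helper: sorted .tfrecord files in a channel directory."""
    return sorted(glob.glob(os.path.join(channel_dir, "*.tfrecord")))


if __name__ == "__main__":
    args = parse_args()
    print("model_dir:", args.model_dir)
    if args.train:
        # The resulting list could be handed to tf.data.TFRecordDataset.
        print("TFRecord files:", list_tfrecords(args.train))
```

If that is all that has to change, the additions are a few lines of argument parsing and the TFRecord data itself stays in its current format.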