Topic 1 Question 169
A company is building a new version of a recommendation engine. Machine learning (ML) specialists need to keep adding new data from users to improve personalized recommendations. The ML specialists gather data from the users' interactions on the platform and from sources such as external websites and social media. The pipeline cleans, transforms, enriches, and compresses terabytes of data daily, and this data is stored in Amazon S3. A set of Python scripts was coded to do the job and is stored in a large Amazon EC2 instance. The whole process takes more than 20 hours to finish, with each script taking at least an hour. The company wants to move the scripts out of Amazon EC2 into a more managed solution that will eliminate the need to maintain servers. Which approach will address all of these requirements with the LEAST development effort?
A. Load the data into an Amazon Redshift cluster. Execute the pipeline by using SQL. Store the results in Amazon S3.
B. Load the data into Amazon DynamoDB. Convert the scripts to an AWS Lambda function. Execute the pipeline by triggering Lambda executions. Store the results in Amazon S3.
C. Create an AWS Glue job. Convert the scripts to PySpark. Execute the pipeline. Store the results in Amazon S3.
D. Create a set of individual AWS Lambda functions to execute each of the scripts. Build a step function by using the AWS Step Functions Data Science SDK. Store the results in Amazon S3.
Comments (5)
- Selected answer: C
C. Lambda has a hard execution time limit of 15 minutes, which is not enough here because each script takes at least an hour.
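To make the timeout argument concrete, here is a minimal sketch of the arithmetic. The script names are hypothetical (the question does not name them), and the one-hour runtimes are the lower bound stated in the question.

```python
# AWS Lambda's maximum configurable timeout is 900 seconds (15 minutes).
LAMBDA_MAX_TIMEOUT_SECONDS = 15 * 60


def fits_in_lambda(runtime_seconds: float) -> bool:
    """Return True if a run of this length fits in a single Lambda invocation."""
    return runtime_seconds <= LAMBDA_MAX_TIMEOUT_SECONDS


# Hypothetical script names; the question only says each script
# takes at least an hour, so 3600 seconds is a lower bound.
script_runtimes_seconds = {
    "clean.py": 60 * 60,
    "transform.py": 60 * 60,
    "enrich.py": 60 * 60,
    "compress.py": 60 * 60,
}

# Collect every script that would exceed the Lambda limit.
too_long = [
    name
    for name, seconds in script_runtimes_seconds.items()
    if not fits_in_lambda(seconds)
]
print(too_long)
```

Under these assumptions every script exceeds the limit, so the Lambda-based options would need each script split into much smaller steps, which is extra development effort.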
👍 13 spaceexplorer 2022/04/30
- Selected answer: C
The data pipeline involves cleaning, transforming, enriching, and compressing terabytes of data and storing the data in Amazon S3.
AWS Glue is a serverless ETL service that makes it easy to move data between data stores. A Glue job can run PySpark scripts to perform ETL tasks. Because there are no servers to provision or manage, Glue meets the company's requirement to stop maintaining servers.
Therefore, AWS Glue would address all of the company's requirements with the least development effort.
👍 3 AjoseO 2023/02/19
- Selected answer: C
Converting Python scripts to PySpark takes less coding effort than rewriting them in SQL, which is also somewhat limited in the types of transformations it can express.
The Lambda-based options are a dead end for the reason already given (the 15-minute timeout).
👍 2 ovokpus 2022/06/26