Topic 1 Question 258
A company has an application that places hundreds of .csv files into an Amazon S3 bucket every hour. The files are 1 GB in size. Each time a file is uploaded, the company needs to convert the file to Apache Parquet format and place the output file into an S3 bucket.
Which solution will meet these requirements with the LEAST operational overhead?
A. Create an AWS Lambda function to download the .csv files, convert the files to Parquet format, and place the output files in an S3 bucket. Invoke the Lambda function for each S3 PUT event.
B. Create an Apache Spark job to read the .csv files, convert the files to Parquet format, and place the output files in an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the Spark job.
C. Create an AWS Glue table and an AWS Glue crawler for the S3 bucket where the application places the .csv files. Schedule an AWS Lambda function to periodically use Amazon Athena to query the AWS Glue table, convert the query results into Parquet format, and place the output files into an S3 bucket.
D. Create an AWS Glue extract, transform, and load (ETL) job to convert the .csv files to Parquet format and place the output files into an S3 bucket. Create an AWS Lambda function for each S3 PUT event to invoke the ETL job.
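For context on what options A and D are debating, a Lambda handler for this pattern could be sketched roughly as follows. This is a hedged sketch, not a definitive implementation: the event parsing uses only the standard library, the bucket and environment-variable names are hypothetical, and the conversion step itself (shown only in comments) assumes a library such as AWS SDK for pandas (awswrangler) would be bundled with the function.

```python
import os
import urllib.parse

# Hypothetical destination bucket name -- an assumption, not from the question.
OUTPUT_BUCKET = os.environ.get("OUTPUT_BUCKET", "my-parquet-output")


def parquet_key(csv_key: str) -> str:
    """Map an input .csv object key to the corresponding .parquet output key."""
    base, _ext = os.path.splitext(csv_key)
    return base + ".parquet"


def handler(event, context):
    """Entry point invoked by the S3 PUT event notification (option A's approach)."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = urllib.parse.unquote_plus(record["s3"]["object"]["key"])
        out_key = parquet_key(key)
        # The actual conversion would use a Parquet-capable library, e.g.
        # AWS SDK for pandas (awswrangler), roughly:
        #   import awswrangler as wr
        #   df = wr.s3.read_csv(f"s3://{bucket}/{key}")
        #   wr.s3.to_parquet(df, f"s3://{OUTPUT_BUCKET}/{out_key}")
        print(f"convert s3://{bucket}/{key} -> s3://{OUTPUT_BUCKET}/{out_key}")
```

Note that with 1 GB input files, the function's memory setting and ephemeral storage would need to be sized generously, which is exactly the concern raised in the comments below the question.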
User votes
Comments (11)
- Suggested answer: D
No, D should be correct.
"LEAST operational overhead" => you should use a fully managed service like Glue instead of managing the conversion yourself, as in answer A.
👍 10

Parsons 2023/01/14:
Here A is the correct answer. The reason is the least operational overhead: A => S3 - Lambda - S3, while D => S3 - Lambda - Glue - S3.
Also, Glue cannot convert on the fly automatically; you need to write some code there. If you write the same code in Lambda, it will do the same conversion and push the file to S3.
Lambda memory is configurable from 128 MB up to 10 GB, so it can handle this easily.
We also need to consider cost: Glue costs more. I hope many in this forum realize these differences.
👍 3

aws4myself 2023/01/25:
A is unlikely to work, as Lambda may struggle with the 1 GB file size: "< 64 MB, beyond which lambda is likely to hit memory caps" (see https://stackoverflow.com/questions/41504095/creating-a-parquet-file-on-aws-lambda-function)
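The memory concern raised here can be reduced by streaming the CSV rather than loading the whole 1 GB object at once. Below is a minimal standard-library sketch of chunked row processing; the chunk size is an arbitrary assumption, and the Parquet write itself (which would need a library such as pyarrow, one row group per chunk) is not shown.

```python
import csv
import io


def process_csv_in_chunks(stream, chunk_rows=50_000):
    """Yield (header, rows) pairs so at most chunk_rows rows are in memory.

    Reading row-by-row from a streaming source (e.g. an S3 object body)
    keeps peak memory far below the total file size; each yielded chunk
    could then be written out as one Parquet row group.
    """
    reader = csv.reader(stream)
    header = next(reader)
    chunk = []
    for row in reader:
        chunk.append(row)
        if len(chunk) >= chunk_rows:
            yield header, chunk
            chunk = []
    if chunk:
        yield header, chunk


# Demo on an in-memory CSV standing in for a streamed S3 object body.
sample = io.StringIO("a,b\n1,2\n3,4\n5,6\n")
chunks = list(process_csv_in_chunks(sample, chunk_rows=2))
```

Whether this fits within a given Lambda memory setting still depends on row width and the Parquet writer's own buffering, so the comment's caution is not fully eliminated, only mitigated.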
👍 2

JayBee65 2023/01/24