Topic 1 Question 87
A company has an Amazon S3 bucket that contains 1 TB of files from different sources. The S3 bucket contains the following file types in the same S3 folder: CSV, JSON, XLSX, and Apache Parquet.
An ML engineer must implement a solution that uses AWS Glue DataBrew to process the data. The ML engineer also must store the final output in Amazon S3 so that AWS Glue can consume the output in the future.
Which solution will meet these requirements?
A. Use DataBrew to process the existing S3 folder. Store the output in Apache Parquet format.
B. Use DataBrew to process the existing S3 folder. Store the output in AWS Glue Parquet format.
C. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in Apache Parquet format.
D. Separate the data into a different folder for each file type. Use DataBrew to process each folder individually. Store the output in AWS Glue Parquet format.
User posts
Comments (1)
- Option the commenter believes is correct: C
AWS Glue DataBrew can process various file formats (CSV, JSON, XLSX, and Parquet). Because DataBrew can handle datasets with multiple file formats, there is no need to separate the files into different folders by type. Apache Parquet is an optimal format for AWS Glue: Parquet is a columnar format, which is well suited to AWS Glue and efficient for later analysis and ML model training. "AWS Glue Parquet format" does not exist as a distinct format: options B and D mention it, which is incorrect, because Parquet is a standard data format and is not exclusive to AWS Glue. Conclusion: option A is the best solution because it allows DataBrew to process all files in the existing S3 folder and store the output in Apache Parquet format, which is efficient and compatible with AWS Glue.
2ryuhei (2025/03/01)
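As a minimal sketch of the recommended option, the job setup could look like the boto3 call below. All identifiers (bucket name, role ARN, job, dataset, and recipe names) are placeholders, not values from the question; the `Outputs` entry is the part that selects Apache Parquet so that Glue can consume the result later.

```python
# Placeholder identifiers -- substitute real resource names and ARNs.
BUCKET = "my-ml-data-bucket"
ROLE_ARN = "arn:aws:iam::123456789012:role/DataBrewJobRole"

# Output configuration: standard Apache Parquet, written back to S3
# so that AWS Glue (crawlers / ETL jobs) can consume it later.
output_config = {
    "Location": {"Bucket": BUCKET, "Key": "processed/"},
    "Format": "PARQUET",
}

def create_job(client=None):
    """Create the DataBrew recipe job (requires AWS credentials at call time)."""
    if client is None:
        import boto3  # imported lazily so the module loads without AWS access
        client = boto3.client("databrew")
    return client.create_recipe_job(
        Name="process-mixed-sources",            # hypothetical job name
        DatasetName="mixed-sources-dataset",     # dataset over the existing S3 folder
        RecipeReference={"Name": "clean-recipe", "RecipeVersion": "1.0"},
        RoleArn=ROLE_ARN,
        Outputs=[output_config],
    )
```

Note that the one DataBrew dataset points at the existing folder as-is (option A); no per-format folders are created.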