Topic 1 Question 212
A data scientist has 20 TB of data in CSV format in an Amazon S3 bucket. The data scientist needs to convert the data to Apache Parquet format.
How can the data scientist convert the file format with the LEAST amount of effort?
A. Use an AWS Glue crawler to convert the file format.
B. Write a script to convert the file format. Run the script as an AWS Glue job.
C. Write a script to convert the file format. Run the script on an Amazon EMR cluster.
D. Write a script to convert the file format. Run the script in an Amazon SageMaker notebook.
Comments (3)
In a SageMaker notebook you'd have to write Python code yourself, but the question asks for the least effort, so I choose option B. https://blog.searce.com/convert-csv-json-files-to-apache-parquet-using-aws-glue-a760d177b45f
👍 2 · drcok87 · 2023/02/10
Selected answer: B
AWS Glue is a fully-managed ETL service that makes it easy to move data between data stores. AWS Glue can be used to automate the conversion of CSV files to Parquet format with minimal effort. AWS Glue supports reading data from CSV files, transforming the data, and writing the transformed data to Parquet files.
Option A is incorrect because an AWS Glue crawler is used to infer the schema of data stored in S3 and create AWS Glue Data Catalog tables; a crawler catalogs data but does not convert it.
Option C is incorrect because while Amazon EMR can be used to process large amounts of data and perform data conversions, it requires more operational effort than AWS Glue.
Option D is incorrect because Amazon SageMaker is a machine learning service, and while it can be used for data processing, it is not the best option for simple data format conversion tasks.
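The Glue-job approach described above can be sketched as a short PySpark job script. This is a minimal, untested illustration of the pattern (it only runs inside the AWS Glue runtime, which provides the `awsglue` library); the bucket paths are placeholders, not values from the question.

```python
import sys
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

args = getResolvedOptions(sys.argv, ["JOB_NAME"])
glueContext = GlueContext(SparkContext())
job = Job(glueContext)
job.init(args["JOB_NAME"], args)

# Read the CSV files directly from S3 (path is a placeholder)
dyf = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    connection_options={"paths": ["s3://my-bucket/csv-data/"]},
    format="csv",
    format_options={"withHeader": True},
)

# Write the same records back to S3 in Parquet format
glueContext.write_dynamic_frame.from_options(
    frame=dyf,
    connection_type="s3",
    connection_options={"path": "s3://my-bucket/parquet-data/"},
    format="parquet",
)

job.commit()
```

Glue can also auto-generate a script very much like this from its visual job editor, which is why option B involves the least effort: there is no cluster to size or manage, unlike the EMR option.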
👍 2 · AjoseO · 2023/02/19
Selected answer: B
Option B. A - a Glue crawler creates Glue Data Catalog tables from S3 buckets; those tables can then be queried with Athena, but the crawler itself does not convert data. C, D - not serverless and not generally used for ETL.
👍 2 · GiyeonShin · 2023/02/21