Topic 1 Question 3
A Mobile Network Operator is building an analytics platform to analyze and optimize a company's operations using Amazon Athena and Amazon S3. The source systems send data in .CSV format in real time. The Data Engineering team wants to transform the data to the Apache Parquet format before storing it on Amazon S3. Which solution takes the LEAST effort to implement?
A. Ingest .CSV data using Apache Kafka Streams on Amazon EC2 instances and use Kafka Connect S3 to serialize data as Parquet.
B. Ingest .CSV data from Amazon Kinesis Data Streams and use AWS Glue to convert data into Parquet.
C. Ingest .CSV data using Apache Spark Structured Streaming in an Amazon EMR cluster and use Apache Spark to convert data into Parquet.
D. Ingest .CSV data from Amazon Kinesis Data Streams and use Amazon Kinesis Data Firehose to convert data into Parquet.
User votes
Comments (17)
- 👍 24 DonaldCMLIN 2021/09/22
D is wrong, as Kinesis Data Firehose can convert from JSON to Parquet, but here we have CSV. B is correct, and here is another proof link: https://medium.com/searce/convert-csv-json-files-to-apache-parquet-using-aws-glue-a760d177b45f
👍 17 vetal 2021/09/24
A: cannot be the answer, as the Apache Kafka S3 connector cannot write in Parquet.
B: seems like a good answer, but if the question is old, then at that time Glue did not have compatibility with Kinesis Data Streams.
C: cannot be the answer, as Spark Structured Streaming needs to read data from somewhere, such as Kafka topics; we cannot publish data directly to it. The other three options each mention Apache Kafka or Kinesis, so this one would have mentioned a source too if it were the correct answer.
D: seems like a good answer, but it needs a Lambda function to convert CSV to JSON, and then Firehose's built-in ability to convert JSON to Parquet before storing to S3.
Now the final analysis :) The comments on this question started some time in December 2019/January 2020, and the Glue capability to consume real-time Kinesis data streams was announced much later, on April 27, 2020, which means this answer (B) must have been incorrect before that date. Hence the likely choice is D, even though the Firehose Lambda step is not mentioned in it specifically. Any counter comments?
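For anyone wondering what the "Lambda to convert CSV to JSON" step in option D might look like, here is a minimal sketch of a Firehose data-transformation handler in Python. It assumes each incoming record holds exactly one CSV line, and the column names in `COLUMNS` are hypothetical, since the question does not give a schema. It is an illustration of the idea, not a tested production function.

```python
import base64
import csv
import io
import json

# Hypothetical column names; the question does not specify the CSV schema.
COLUMNS = ["subscriber_id", "cell_id", "bytes_used"]


def lambda_handler(event, context):
    """Turn each one-line CSV record into a JSON object for Firehose."""
    output = []
    for record in event["records"]:
        try:
            # Firehose hands the payload to Lambda base64-encoded.
            line = base64.b64decode(record["data"]).decode("utf-8")
            row = next(csv.reader(io.StringIO(line)))
            if len(row) != len(COLUMNS):
                raise ValueError("unexpected column count")
            # All values stay strings here; type casting is left out.
            payload = json.dumps(dict(zip(COLUMNS, row))) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(payload.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, StopIteration, csv.Error):
            # Return unparseable records unchanged and mark them as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

With JSON coming out of the function, Firehose's record format conversion can then write Parquet to S3 using a table schema defined in the AWS Glue Data Catalog.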
👍 6 harmanbirstudy 2021/10/30