Topic 1 Question 36

AWS Certified Data Engineer - Associate

A data engineer runs Amazon Athena queries on data that is in an Amazon S3 bucket. The Athena queries use AWS Glue Data Catalog as a metadata table. The data engineer notices that the Athena query plans are experiencing a performance bottleneck. The data engineer determines that the cause of the performance bottleneck is the large number of partitions that are in the S3 bucket. The data engineer must resolve the performance bottleneck and reduce Athena query planning time. Which solutions will meet these requirements?

Choose two.
- A. Create an AWS Glue partition index. Enable partition filtering.
- B. Bucket the data based on a column that the data have in common in a WHERE clause of the user query.
- C. Use Athena partition projection based on the S3 bucket prefix.
- D. Transform the data that is in the S3 bucket to Apache Parquet format.
- E. Use the Amazon EMR S3DistCP utility to combine smaller objects in the S3 bucket into larger objects.
Comments (14)
- Selected answer: AC
  https://aws.amazon.com/blogs/big-data/top-10-performance-tuning-tips-for-amazon-athena/
  
  Optimizing Partition Processing using partition projection
  
  Processing partition information can be a bottleneck for Athena queries when you have a very large number of partitions and aren’t using AWS Glue partition indexing. You can use partition projection in Athena to speed up query processing of highly partitioned tables and automate partition management. Partition projection helps minimize this overhead by allowing you to query partitions by calculating partition information rather than retrieving it from a metastore. It eliminates the need to add partitions’ metadata to the AWS Glue table.
  
  👍 7
  rralucard_ 2024/07/31
- Selected answer: AC
  Keyword: Athena query planning time
  
  See explanation in the link: https://www.myexamcollection.com/Data-Engineer-Associate-vce-questions.htm
  
  B and D improve the performance of the analytical queries themselves (execution), not the performance of query planning.
  
  👍 3
  fceb2c1 2024/09/23
- Selected answer: AD
  A. Create an AWS Glue partition index. Enable partition filtering.
  Targeted optimization: Partition indexes within the Glue Data Catalog help Athena efficiently identify the relevant partitions, significantly reducing query planning time. Partition filtering further refines the search during query execution.
  
  D. Transform the data that is in the S3 bucket to Apache Parquet format.
  Efficient columnar format: Parquet's columnar storage and built-in metadata often allow Athena to skip over large portions of data irrelevant to the query, leading to faster query planning and execution.
  
  👍 3
  Christina666 2024/10/12
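
For anyone who wants to see what options A and C look like in practice, here is a minimal Python sketch, not a definitive setup. It only builds the Athena DDL strings and the request payload for the AWS Glue `CreatePartitionIndex` API, without calling AWS. The table `logs`, database `analytics`, partition column `dt`, index name `dt_index`, and bucket `amzn-s3-demo-bucket` are all hypothetical, and the date format and S3 layout are assumptions that would have to match the real prefixes.

```python
from typing import Dict, List


def projection_properties(column: str, first_day: str, location_template: str) -> Dict[str, str]:
    """Athena table properties for date-based partition projection on one column."""
    return {
        "projection.enabled": "true",
        f"projection.{column}.type": "date",
        # Assumed layout: one prefix per day, written as yyyy/MM/dd.
        f"projection.{column}.format": "yyyy/MM/dd",
        # NOW is resolved by Athena at query time, so the range keeps growing.
        f"projection.{column}.range": f"{first_day},NOW",
        f"projection.{column}.interval": "1",
        f"projection.{column}.interval.unit": "DAYS",
        "storage.location.template": location_template,
    }


def alter_table_ddl(table: str, properties: Dict[str, str]) -> str:
    """Render an ALTER TABLE ... SET TBLPROPERTIES statement for Athena."""
    props = ",\n  ".join(f"'{key}' = '{value}'" for key, value in properties.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES (\n  {props}\n)"


def partition_index_request(database: str, table: str, keys: List[str], index_name: str) -> dict:
    """Keyword arguments in the shape expected by Glue's CreatePartitionIndex."""
    return {
        "DatabaseName": database,
        "TableName": table,
        "PartitionIndex": {"Keys": list(keys), "IndexName": index_name},
    }


# Option C: partition locations are computed from the S3 prefix template.
projection_ddl = alter_table_ddl(
    "logs",
    projection_properties("dt", "2024/01/01", "s3://amzn-s3-demo-bucket/logs/${dt}/"),
)

# Option A: a Glue partition index plus partition filtering on the table.
filtering_ddl = alter_table_ddl("logs", {"partition_filtering.enabled": "true"})
index_request = partition_index_request("analytics", "logs", ["dt"], "dt_index")

print(projection_ddl)
print(filtering_ddl)
# With boto3 installed and credentials configured, the index could be created with:
# boto3.client("glue").create_partition_index(**index_request)
```

The two DDL statements would be run in Athena. Keep in mind that when partition projection is enabled on a table, Athena computes the partitions from the table properties rather than reading them from the Glue Data Catalog, so on any single table you would normally rely on one mechanism or the other.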