Topic 1 Question 31
A company is building an analytics solution. The solution uses Amazon S3 for data lake storage and Amazon Redshift for a data warehouse. The company wants to use Amazon Redshift Spectrum to query the data that is in Amazon S3. Which actions will provide the FASTEST queries? (Choose TWO.)
A. Use gzip compression to compress individual files to sizes that are between 1 GB and 5 GB.
B. Use a columnar storage file format.
C. Partition the data based on the most common query predicates.
D. Split the data into files that are less than 10 KB.
E. Use file formats that are not splittable.
Comments (7)
- Selected Answer: BC
B. Use a columnar storage file format: This is an excellent approach. Columnar storage formats like Parquet and ORC are highly recommended for use with Redshift Spectrum. They store data in columns, which allows Spectrum to scan only the needed columns for a query, significantly improving query performance and reducing the amount of data scanned.
C. Partition the data based on the most common query predicates: Partitioning data in S3 based on commonly used query predicates (like date, region, etc.) allows Redshift Spectrum to skip large portions of data that are irrelevant to a particular query. This can lead to substantial performance improvements, especially for large datasets.
👍 5 · rralucard_ · 2024/02/03

- Selected Answer: BC
👍 5 · GiorgioGss · 2024/03/11
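To make options B and C concrete, here is a minimal sketch (not part of the original thread) of writing data to S3 as Parquet partitioned by common predicate columns. The bucket name, table layout, and column values are illustrative assumptions, and it presumes pyarrow with S3 filesystem support is installed.

```python
# Sketch: write row data to S3 as partitioned Parquet.
# Bucket name, columns, and values are illustrative assumptions.
import pyarrow as pa
import pyarrow.dataset as ds

table = pa.table({
    "sale_id": [1, 2, 3, 4],
    "amount": [10.0, 20.0, 30.0, 40.0],
    "sale_date": ["2024-01-01", "2024-01-01", "2024-01-02", "2024-01-02"],
    "region": ["us-east", "eu-west", "us-east", "eu-west"],
})

# Parquet (option B) lets Spectrum read only the columns a query touches;
# partitioning on common predicates (option C) lets it prune whole S3
# prefixes instead of scanning every file.
ds.write_dataset(
    table,
    "s3://example-data-lake/sales/",  # assumed bucket
    format="parquet",
    partitioning=["sale_date", "region"],
    partitioning_flavor="hive",  # key=value prefixes, e.g. sale_date=2024-01-01/
)
```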
- Selected Answer: BC
Redshift Spectrum is optimized for querying data stored in columnar formats like Parquet or ORC. These formats store each data column separately, allowing Redshift Spectrum to scan only the relevant columns for a specific query, significantly improving performance compared to row-oriented formats. Partitioning organizes data files in S3 based on specific column values (e.g., date, region). When your queries filter or join data based on these partitioning columns (common query predicates), Redshift Spectrum can quickly locate the relevant data files, minimizing the amount of data scanned and accelerating query execution.
👍 3 · pypelyncar · 2024/06/09
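As a companion sketch, the partitioned Parquet data could then be registered as a Spectrum external table. All identifiers (cluster, database, user, schema, bucket) are assumptions, and it presumes an external schema named spectrum_schema has already been created and mapped to the data catalog.

```python
# Sketch: register the partitioned Parquet data as a Spectrum external
# table via the boto3 Redshift Data API. All identifiers are illustrative.
import boto3

client = boto3.client("redshift-data")

ddl = """
CREATE EXTERNAL TABLE spectrum_schema.sales (
    sale_id BIGINT,
    amount  DOUBLE PRECISION
)
PARTITIONED BY (sale_date VARCHAR, region VARCHAR)
STORED AS PARQUET
LOCATION 's3://example-data-lake/sales/';
"""

client.execute_statement(
    ClusterIdentifier="example-cluster",  # assumed cluster name
    Database="dev",
    DbUser="admin",
    Sql=ddl,
)
# Note: each partition must still be registered, e.g. with
# ALTER TABLE ... ADD PARTITION or an AWS Glue crawler, before
# Spectrum can prune and query it.
```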