Topic 1 Question 135
A data engineer is using an AWS Glue crawler to catalog data that is in an Amazon S3 bucket. The S3 bucket contains both .csv and .json files. The data engineer configured the crawler to exclude the .json files from the catalog.
When the data engineer runs queries in Amazon Athena, the queries also process the excluded .json files. The data engineer wants to resolve this issue. The data engineer needs a solution that will not affect access requirements for the .csv files in the source S3 bucket.
Which solution will meet this requirement with the SHORTEST query times?
A. Adjust the AWS Glue crawler settings to ensure that the AWS Glue crawler also excludes .json files.
B. Use the Athena console to ensure the Athena queries also exclude the .json files.
C. Relocate the .json files to a different path within the S3 bucket.
D. Use S3 bucket policies to block access to the .json files.
Comments (3)
- Selected answer: C
Athena does not recognize exclude patterns that you specify for an AWS Glue crawler. For example, if you have an Amazon S3 bucket that contains both .csv and .json files and you exclude the .json files from the crawler, Athena queries both groups of files. To avoid this, place the files that you want to exclude in a different location. https://docs.aws.amazon.com/athena/latest/ug/troubleshooting-athena.html
👍 8 teo2157 2024/08/12
Athena will scan both types of files.
Although it may be feasible to adjust the Athena query to exclude the .json files, the SHORTEST query times come from relocating the .json files to a different path.
👍 1 BenLearningDE 2024/09/12
If the AWS Glue crawler is configured to exclude .json files, then the AWS Glue Data Catalog will not have any metadata related to those .json files. In this case, the Athena table that uses the Glue Data Catalog would not be aware of the .json files at all, and Athena queries would only process the files that are included in the Glue catalog (e.g., .csv files).
👍 1 AdityaB 2024/10/14
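The documentation quoted in the comments recommends placing the excluded files in a different location. A minimal sketch of that relocation is shown here, assuming a boto3 S3 client is passed in; the function names, bucket name, and prefixes are hypothetical and only illustrate the idea. The destination prefix has to sit outside the table location, otherwise Athena would still scan the moved files.

```python
from typing import Dict, Iterable


def plan_json_moves(keys: Iterable[str], source_prefix: str, dest_prefix: str) -> Dict[str, str]:
    """Map each .json key under source_prefix to a new key under dest_prefix."""
    # Moving files to a sub-path of the table location would not help,
    # because Athena reads everything under that location.
    if dest_prefix.startswith(source_prefix):
        raise ValueError("dest_prefix must be outside source_prefix")
    moves = {}
    for key in keys:
        if key.startswith(source_prefix) and key.lower().endswith(".json"):
            relative = key[len(source_prefix):]
            moves[key] = dest_prefix + relative
    return moves


def relocate_json(s3_client, bucket: str, source_prefix: str, dest_prefix: str) -> Dict[str, str]:
    """Copy each .json object to dest_prefix, then delete the original."""
    # List every object under the table location, page by page.
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = [
        obj["Key"]
        for page in paginator.paginate(Bucket=bucket, Prefix=source_prefix)
        for obj in page.get("Contents", [])
    ]
    moves = plan_json_moves(keys, source_prefix, dest_prefix)
    for old_key, new_key in moves.items():
        # copy_object handles objects up to 5 GB; larger objects would
        # need a multipart copy, which this sketch does not cover.
        s3_client.copy_object(
            Bucket=bucket,
            Key=new_key,
            CopySource={"Bucket": bucket, "Key": old_key},
        )
        s3_client.delete_object(Bucket=bucket, Key=old_key)
    return moves
```

With real credentials this might be called as `relocate_json(boto3.client("s3"), "example-bucket", "data/", "excluded-json/")`, where the bucket and prefixes are placeholders. The .csv files stay where they are, so their access requirements are unaffected.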