Topic 1 Question 26
A company is planning to use a provisioned Amazon EMR cluster that runs Apache Spark jobs to perform big data analysis. The company requires high reliability. A big data team must follow best practices for running cost-optimized and long-running workloads on Amazon EMR. The team must find a solution that will maintain the company's current level of performance. Which combination of resources will meet these requirements MOST cost-effectively? (Choose two.)
A. Use Hadoop Distributed File System (HDFS) as a persistent data store.
B. Use Amazon S3 as a persistent data store.
C. Use x86-based instances for core nodes and task nodes.
D. Use Graviton instances for core nodes and task nodes.
E. Use Spot Instances for all primary nodes.
Comments (6)
Selected Answer: BD
HDFS is not recommended for persistent storage because once a cluster is terminated, all HDFS data is lost. Also, long-running workloads can fill the disk space quickly. Thus, S3 is the best option since it's highly available, durable, and scalable.
AWS Graviton-based instances cost up to 20% less than comparable x86-based Amazon EC2 instances: https://aws.amazon.com/ec2/graviton/
👍 7
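To make the B and D combination concrete, here is a minimal boto3 sketch of how such a long-running cluster could be provisioned with Graviton (arm64) instances and S3 used for persistent output and logs. The cluster name, region, bucket names, instance counts, and IAM role names are illustrative assumptions, not values from the question.

```python
# Minimal sketch: EMR cluster on Graviton instances with S3 as the durable store.
# All names, sizes, and roles below are hypothetical placeholders.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

response = emr.run_job_flow(
    Name="spark-analytics-cluster",
    ReleaseLabel="emr-6.15.0",            # EMR 6.x releases include a Graviton-optimized Spark runtime
    Applications=[{"Name": "Spark"}],
    LogUri="s3://example-emr-logs/",      # logs persisted to S3, not to cluster-local HDFS
    Instances={
        "InstanceGroups": [
            {   # primary (master) node kept On-Demand for reliability, never Spot
                "Name": "Primary",
                "InstanceRole": "MASTER",
                "Market": "ON_DEMAND",
                "InstanceType": "m6g.xlarge",   # Graviton2
                "InstanceCount": 1,
            },
            {   # core nodes also on Graviton; they hold HDFS/shuffle data, so On-Demand
                "Name": "Core",
                "InstanceRole": "CORE",
                "Market": "ON_DEMAND",
                "InstanceType": "m6g.xlarge",
                "InstanceCount": 2,
            },
        ],
        "KeepJobFlowAliveWhenNoSteps": True,   # long-running cluster
        "TerminationProtected": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print(response["JobFlowId"])
```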
[Removed] 2024/07/20
Selected Answer: BD
B and D.
👍 3
GiorgioGss 2024/09/10
Selected Answer: BD
S3, no question. Graviton => Cost-effectiveness: Graviton instances are ARM-based instances designed for cloud workloads. They offer significant cost savings compared to x86-based instances while delivering comparable or better performance for many Apache Spark workloads. Performance: Graviton instances are optimized for Spark workloads and can deliver the same level of performance as x86-based instances in many cases. Additionally, EMR offers performance-optimized versions of Spark built for Graviton instances.
👍 3
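The point about keeping persistent data on S3 rather than HDFS can be illustrated with a short PySpark job of the kind that would run as a step on the cluster above. The bucket paths and column names (event_date, event_type) are illustrative assumptions.

```python
# Minimal sketch of a Spark job on EMR that reads from and writes to S3 (EMRFS),
# so results survive cluster termination or resizing. Paths/columns are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("s3-persistent-store-example").getOrCreate()

# Read source data directly from S3; on EMR the s3:// scheme is handled by EMRFS.
events = spark.read.parquet("s3://example-data-lake/raw/events/")

# Example aggregation representative of a big data analysis workload.
daily_counts = events.groupBy("event_date", "event_type").count()

# Persist results back to S3 instead of cluster-local HDFS.
daily_counts.write.mode("overwrite").parquet("s3://example-data-lake/curated/daily_counts/")

spark.stop()
```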
pypelyncar 2024/12/08