Topic 1 Question 94
A retail company stores transactions, store locations, and customer information tables in four reserved ra3.4xlarge Amazon Redshift cluster nodes. All three tables use even table distribution.
The company updates the store location table only once or twice every few years.
A data engineer notices that Redshift queues are slowing down because the whole store location table is constantly being broadcast to all four compute nodes for most queries. The data engineer wants to speed up the query performance by minimizing the broadcasting of the store location table.
Which solution will meet these requirements in the MOST cost-effective way?
A. Change the distribution style of the store location table from EVEN distribution to ALL distribution.
B. Change the distribution style of the store location table to KEY distribution based on the column that has the highest dimension.
C. Add a join column named store_id into the sort key for all the tables.
D. Upgrade the Redshift reserved node to a larger instance size in the same instance family.
Comments (4)
- Selected answer: A
Changing the distribution style of the store location table to ALL distribution (A) is the most cost-effective solution. It directly addresses the issue of broadcasting by ensuring the entire table is available on each node, significantly improving join performance without incurring substantial additional costs.
👍 4 PGGuy 2024/06/21
- Selected answer: A
Using ALL distribution means the table is replicated to all nodes, eliminating the need for broadcasting during queries. Since the store location table is updated infrequently, this will significantly speed up queries without incurring frequent update costs.
👍 2 tgv 2024/06/15
- Selected answer: A
The most cost-effective solution to speed up the query performance by minimizing the broadcasting of the store location table would be:
A. Change the distribution style of the store location table from EVEN distribution to ALL distribution.
In Amazon Redshift, the ALL distribution style replicates the entire table to all nodes in the cluster, which eliminates the need to redistribute the data when executing a query. This can significantly improve query performance. Given that the store location table is updated only once or twice every few years, the overhead of maintaining the replicated data would be minimal. This makes it a cost-effective solution for improving the query performance.
👍 2 bakarys 2024/07/02
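To make the reasoning in the comments above concrete, here is a rough Python sketch of why ALL distribution removes the broadcast step. It is a toy model, not how Redshift works internally: the round-robin placement, the 100 hypothetical store IDs, and the helper names (`distribute_even`, `distribute_all`, `rows_moved_for_join`) are all assumptions made for illustration. It counts how many store rows each node would need to receive before it could join against the full store location table locally.

```python
NUM_NODES = 4  # the question's cluster has four compute nodes


def distribute_even(rows, num_nodes=NUM_NODES):
    """Round-robin rows across nodes: a simplified model of EVEN distribution."""
    nodes = [[] for _ in range(num_nodes)]
    for i, row in enumerate(rows):
        nodes[i % num_nodes].append(row)
    return nodes


def distribute_all(rows, num_nodes=NUM_NODES):
    """Full copy on every node: a simplified model of ALL distribution."""
    return [list(rows) for _ in range(num_nodes)]


def rows_moved_for_join(dim_nodes, all_dim_rows):
    """Count the dimension rows each node lacks and would have to receive."""
    full = set(all_dim_rows)
    return sum(len(full - set(local)) for local in dim_nodes)


# Hypothetical store IDs; the real table contents are not given in the question.
stores = [f"store_{i}" for i in range(100)]

even_moved = rows_moved_for_join(distribute_even(stores), stores)
all_moved = rows_moved_for_join(distribute_all(stores), stores)
all_storage = sum(len(node) for node in distribute_all(stores))

print(even_moved)   # 300: each of 4 nodes is missing 75 of the 100 rows
print(all_moved)    # 0: every node already holds the whole table
print(all_storage)  # 400: the trade-off is one full copy per node
```

The trade-off is extra storage and slower writes on the replicated table, which matters little for a small table that changes once or twice every few years. In Redshift itself the change would be a single statement along the lines of `ALTER TABLE store_location ALTER DISTSTYLE ALL;`, where `store_location` is an assumed table name.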