Topic 1 Question 118
A banking company uses an application to collect large volumes of transactional data. The company uses Amazon Kinesis Data Streams for real-time analytics. The company’s application uses the PutRecord action to send data to Kinesis Data Streams.
A data engineer has observed network outages during certain times of day. The data engineer wants to configure exactly-once delivery for the entire processing pipeline.
Which solution will meet this requirement?
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
B. Update the checkpoint configuration of the Amazon Managed Service for Apache Flink (previously known as Amazon Kinesis Data Analytics) data collection application to avoid duplicate processing of events.
C. Design the data source so events are not ingested into Kinesis Data Streams multiple times.
D. Stop using Kinesis Data Streams. Use Amazon EMR instead. Use Apache Flink and Apache Spark Streaming in Amazon EMR.
Comments (3)
- Selected Answer: A
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
This approach ensures that even if a record is sent more than once due to network outages or other issues, it will be processed only once, because the unique ID can be used to identify and remove duplicates. This is a common pattern for achieving exactly-once processing semantics in distributed systems.

The other options do not guarantee exactly-once delivery across the entire pipeline. Option B is partially correct, but it only avoids duplicate processing within Amazon Managed Service for Apache Flink, not across the entire pipeline. Option C is not always feasible, because network issues and other factors can lead to events being ingested into Kinesis Data Streams multiple times. Option D involves changing the entire technology stack, which is not necessary to achieve the desired outcome and could introduce additional complexity and cost.

👍 3
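[Editor's note] To make the pattern in this answer concrete, here is a minimal producer-side sketch in Python with boto3. The stream name transactions-stream and the event_id field are hypothetical; the point is that the unique ID is attached at the source, before the first send, so any PutRecord retry carries the same ID.

```python
import json
import uuid

import boto3

kinesis = boto3.client("kinesis")


def put_transaction(record: dict, stream_name: str = "transactions-stream") -> None:
    # Assign the unique ID once, at the source. A retry of the same
    # record reuses the existing ID instead of generating a new one,
    # which is what lets downstream consumers recognize duplicates.
    # (event_id and transactions-stream are hypothetical names.)
    record.setdefault("event_id", str(uuid.uuid4()))
    kinesis.put_record(
        StreamName=stream_name,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=record["event_id"],
    )
```

Using the event_id as the partition key also keeps any duplicates of a record on the same shard (absent resharding), which can simplify shard-local deduplication.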
bakarys, 2024/07/02 - Selected Answer: A
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
Explanation:
Exactly-once delivery: Ensuring exactly-once delivery is a challenge in distributed systems, especially in the presence of network outages and retries. By embedding a unique ID in each record at the source, you can track and identify duplicate records during processing. This approach allows you to implement idempotent processing, where duplicate records are detected and discarded, ensuring that each record is processed exactly once.
De-duplication logic: Implementing de-duplication logic based on unique IDs ensures that even if the same record is ingested multiple times due to retries or network issues, it will be processed only once by the downstream applications.

👍 2
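[Editor's note] A consumer-side sketch of that de-duplication logic, assuming the event_id field from the producer example and a hypothetical DynamoDB table named processed-event-ids with event_id as its partition key. A conditional write serves as the idempotency check:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")


def process_once(record: dict, table_name: str = "processed-event-ids") -> bool:
    """Return True if the record was processed, False if it was a duplicate."""
    try:
        # The conditional PutItem succeeds only for an unseen event_id,
        # so exactly one copy of each record passes this gate.
        dynamodb.put_item(
            TableName=table_name,
            Item={"event_id": {"S": record["event_id"]}},
            ConditionExpression="attribute_not_exists(event_id)",
        )
    except ClientError as err:
        if err.response["Error"]["Code"] == "ConditionalCheckFailedException":
            return False  # already seen: discard the duplicate
        raise
    # ... apply the actual business logic for the record here ...
    return True
```

One common refinement is a TTL attribute on the table so the de-duplication store does not grow without bound.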
Ja13, 2024/07/08 - Selected Answer: A
A. Design the application so it can remove duplicates during processing by embedding a unique ID in each record at the source.
👍 1
PashoQ, 2024/09/17