Examtopics

Professional Cloud DevOps Engineer
  • Topic 1 Question 133

    You encounter a large number of outages in the production systems you support. You receive alerts for all the outages; the alerts are caused by unhealthy systems that are automatically restarted within a minute. You want to set up a process that prevents staff burnout while following Site Reliability Engineering (SRE) practices. What should you do?

    • Eliminate alerts that are not actionable

    • Redefine the related SLO so that the error budget is not exhausted

    • Distribute the alerts to engineers in different time zones

    • Create an incident report for each of the alerts

