Event. Alert. Incident. These terms are bandied about, often interchangeably, in IT operations management. Broadly speaking, they all refer to situations where something is potentially amiss and needs to be investigated and resolved. Each of these three words does, however, have a distinct definition. Because they are used in scenarios where clear communication and timeliness are critical, it’s important to understand the differences and use them appropriately.
Events
An event is simply an occurrence or a change in the operation of a system, network, process, or workflow. Events can be planned or unplanned, and can happen automatically or as a result of human intervention. Think of an event as “raw data” about something that happened, without qualifying whether it is good, bad, or neutral.
Alerts
An alert is a qualified event that is deemed “bad” and requires attention, for example from DevOps, a system admin, or the SRE team. If an event is a flag, then an alert is a red flag. Because the typical enterprise experiences thousands of events each day, the resulting alerts should be deduplicated to eliminate redundancy, which cuts down on the noise, and enriched with informative details, which helps speed time to resolution.
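As a rough illustration of deduplication and enrichment, here is a minimal Python sketch. The `Alert` fields, the `(source, check)` deduplication key, and the `INVENTORY` lookup are all hypothetical choices made for this example; they do not describe how any particular monitoring product works.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str   # host or service that raised the alert
    check: str    # name of the failing check
    message: str
    count: int = 1                                # raw alerts folded into this one
    details: dict = field(default_factory=dict)   # enrichment data


# Hypothetical inventory used for enrichment; a real system might query a CMDB.
INVENTORY = {
    "web-01": {"owner": "sre-team", "tier": "production"},
}


def deduplicate(alerts):
    """Fold alerts sharing (source, check) into one alert with a count."""
    merged = {}
    for alert in alerts:
        key = (alert.source, alert.check)
        if key in merged:
            merged[key].count += 1
        else:
            merged[key] = Alert(alert.source, alert.check, alert.message)
    return list(merged.values())


def enrich(alert):
    """Attach owner and tier details when the source is in the inventory."""
    alert.details.update(INVENTORY.get(alert.source, {}))
    return alert


# Three identical low-memory alerts and one unrelated alert.
raw = [
    Alert("web-01", "low_memory", "Free memory below threshold"),
    Alert("web-01", "low_memory", "Free memory below threshold"),
    Alert("web-01", "low_memory", "Free memory below threshold"),
    Alert("db-01", "disk_latency", "Disk latency above threshold"),
]
alerts = [enrich(a) for a in deduplicate(raw)]
```

In this sketch the four raw alerts collapse into two. The `web-01` alert carries a count of three plus owner and tier details, so a responder sees one informative alert instead of three identical ones.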
Incidents
Once it has been determined that an alert negatively affects the organization in some way and that it requires immediate attention (perhaps involving a cross-functional team of responders, constant stakeholder communication, and an after-the-fact postmortem), an incident is created. Not all alerts become incidents, particularly if they are less severe and can be worked by a single responder. Often, alert reviews and incident collaboration and problem-solving happen in an IT situation room (also sometimes called a war room).
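The escalation decision described above can be sketched as a simple rule. This is only an illustration: the `Alert` fields and the `should_open_incident` function are hypothetical names, and real escalation policies usually weigh more factors, such as severity levels and how many responders are needed.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    summary: str
    affects_organization: bool       # does it negatively affect the business?
    needs_immediate_attention: bool  # can it wait for normal triage?


def should_open_incident(alert):
    """Escalate only when the alert is both impactful and urgent."""
    return alert.affects_organization and alert.needs_immediate_attention


# Two hypothetical alerts: one transient, one affecting the business.
transient_spike = Alert("Flash array byte rate out of range", False, False)
checkout_failure = Alert("Checkout errors and low memory", True, True)
```

Under this rule, `checkout_failure` would be escalated to an incident, while `transient_spike` would stay an alert that a single responder can review and close.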
The differences between events, alerts, and incidents in action
Here are two examples that illustrate the progression from event to alert to incident.
Storage example
- Event: The average read or write byte rate of a flash array exceeds the expected range.
- Alert: Because this could represent a potential problem that needs to be fixed, an alert is raised. A system admin analyzes why the byte rate is out of range, determines it was simply a transient occurrence, and concludes that no remediation is required.
- Incident: No incident is created.
E-commerce example
- Event: A shopper gets an error on the checkout page, which generates an event. The internal system hits a low-memory threshold and generates another event.
- Alert: These two events are correlated and, because this is clearly a problem, an alert is raised. A system admin or SRE team analyzes the related events and determines that the system is running out of memory.
- Incident: Because business is impacted, potentially affecting revenue, this needs immediate attention, so an incident is created to identify and implement a resolution.
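The correlation step in the e-commerce example could look something like the following Python sketch. It assumes, purely for illustration, that events are correlated when they come from the same service within a five-minute window; the `Event` fields, the service names, and the window size are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Event:
    timestamp: datetime
    service: str   # hypothetical label for the system that emitted the event
    kind: str      # e.g. "checkout_error" or "low_memory"


def correlate(events, window=timedelta(minutes=5)):
    """Group events from the same service that occur within `window`
    of the first event in the group."""
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        for group in groups:
            first = group[0]
            same_service = event.service == first.service
            close_in_time = event.timestamp - first.timestamp <= window
            if same_service and close_in_time:
                group.append(event)
                break
        else:
            # No existing group matched, so this event starts a new one.
            groups.append([event])
    return groups


events = [
    Event(datetime(2024, 1, 1, 12, 0), "checkout", "checkout_error"),
    Event(datetime(2024, 1, 1, 12, 1), "search", "config_change"),
    Event(datetime(2024, 1, 1, 12, 2), "checkout", "low_memory"),
]
groups = correlate(events)
```

Here the checkout error and the low-memory event land in the same group, which is the kind of signal that would justify raising a single alert for a responder to analyze. The unrelated `search` event stays in its own group.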
Manage events, alerts, and incidents with CloudMonitor
Virtana’s CloudMonitor, which powers the Virtana Platform, provides a single view of hybrid and multi-cloud infrastructure so you can get the most out of your multi-vendor resources. Out-of-the-box and custom policies enable you to effectively navigate events, alerts, and incidents. Our ServiceNow integration allows for two-way communication between ServiceNow and CloudMonitor, which means you’ll see the latest incident information no matter which system it’s created or updated in. Contact us to learn more or to book a demo.