Introduction
Nuisance alarms, defined as alarms that annunciate excessively or unnecessarily, or that do not return to normal after the correct response is taken, pose a significant threat to facility safety and efficiency. Their constant barrage negatively affects operators / dispatchers, leading to several problems:
- Alarm Fatigue: Repeated exposure to false alarms desensitizes personnel, making it difficult to distinguish between critical and non-critical events.
- Heightened Stress: The constant pressure to respond to alarms, even if many are false, creates a stressful work environment.
- Desensitization: Over time, personnel may begin to ignore alarms altogether, potentially missing critical warnings.
- Impact to System Performance: Use of excessive resources (memory, bandwidth) may compromise automation system performance.
- Undermining Response Protocols: Inconsistent response processes necessitate additional training and lead to unreliable results.
The ultimate consequence of these issues can be disastrous:
- Unplanned Downtime: Ignoring critical alarms can lead to equipment failures and unplanned outages of revenue-generating spaces.
- Facility Incidents: In the worst-case scenario, missed alarms can escalate into safety incidents with property damage, loss of product / research, or even injuries / death.
The culprits behind nuisance alarms include poor system design and configuration, malfunctioning hardware, and alarms that don't provide valuable information for facility operations. Reducing the number and impact of nuisance alarms is crucial for minimizing operational risk and maximizing operator / dispatcher performance.
Purpose
This article provides a methodology for identifying nuisance alarms via Alarm Triage™ and fixing them. Alarm Triage reports should be scheduled to run and distribute automatically (e.g., weekly) so that they become part of a regular review process.
Actions to fix nuisance alarms include making changes at the source (e.g., Building Automation System (BAS) alarm configuration), modifying sequence of operations, and fixing hardware issues.
What is a Facility Alarm (or what should it be)?
The most important principle of alarm management is that an alarm should annunciate only when it meets specific criteria (derived from the definition of a facility alarm below). These criteria ensure the alarms are valid, meaningful, and actionable. The process of reviewing alarms against these criteria to identify nuisance alarms is called alarm rationalization.
Facility alarm: a facility abnormal space condition, performance deviation, or equipment malfunction…
=> The condition is unexpected and not a result of normal operations
which requires a timely operator action (e.g., write a CMMS work order, adjust a BAS temperature setpoint)...
=> A human response is required to address the issue
to prevent a consequence (e.g., loss of lab research or shutdown of an OR).
=> There is a consequence if no action is taken
Definition based on ANSI/ISA-18.2-2016.
Types of Nuisance Alarms
Although there are different types of nuisance alarms, they share the characteristic of violating one or more of the alarm criteria above.
Figure 1. Types of Nuisance Alarms
Chattering alarm - An alarm that repeatedly transitions between the alarm state and the not active state in a short period of time. [SOURCE: ISA-18.2-2016]
Fleeting alarm - An alarm that transitions between an active alarm state and a not active state in a short period of time without rapidly repeating. [SOURCE: ISA-18.2-2016]
Redundant alarm - An alarm that consistently occurs within a short period of other alarms, indicating multiple annunciations for the same event. The goal is one event, one alarm.
Stale alarm - An alarm that remains in the alarm state / annunciated for an extended period of time (e.g., 24 hours). [SOURCE: ISA-18.2-2016]
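The definitions above can be turned into a simple classifier for offline analysis. The following is a minimal sketch, not the logic Alarm Triage uses: the thresholds (3 activations within 1 minute for chattering, under 5 minutes for fleeting, 24 hours for stale) and the function name classify are assumptions chosen for illustration.

```python
from datetime import datetime, timedelta

# Assumed thresholds for illustration; tune them to the site's alarm philosophy.
FLEETING_MAX = timedelta(minutes=5)    # "short period" for a single occurrence
CHATTER_WINDOW = timedelta(minutes=1)  # window used to detect rapid repeats
CHATTER_COUNT = 3                      # activations inside the window
STALE_MIN = timedelta(hours=24)        # "extended period" in the alarm state


def classify(occurrences, now):
    """Label each (start, end) occurrence of ONE alarm.

    end is None while the alarm is still active. Labels are returned in
    start-time order: "chattering", "fleeting", "stale", or "normal".
    """
    occurrences = sorted(occurrences, key=lambda o: o[0])
    starts = [start for start, _ in occurrences]

    # Mark every run of CHATTER_COUNT activations that fits in the window.
    chattering = [False] * len(occurrences)
    for i in range(len(starts) - CHATTER_COUNT + 1):
        j = i + CHATTER_COUNT - 1
        if starts[j] - starts[i] <= CHATTER_WINDOW:
            for k in range(i, j + 1):
                chattering[k] = True

    labels = []
    for (start, end), is_chatter in zip(occurrences, chattering):
        duration = (end if end is not None else now) - start
        if is_chatter:
            labels.append("chattering")
        elif duration >= STALE_MIN:
            labels.append("stale")
        elif end is not None and duration < FLEETING_MAX:
            labels.append("fleeting")
        else:
            labels.append("normal")
    return labels
```

Redundant alarms are not covered here because they require comparing one alarm against others rather than looking at a single alarm's history.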
Identifying and Fixing Nuisance Alarms
Nuisance alarms identified within Alarm Triage should be analyzed to understand symptoms and root cause. Determination of root cause guides the implementation of potential solutions, as outlined below.
| Issue / Scenario | Potential Solutions | Display for Identifying the Issue |
| Chattering Alarms | Proper deadband, on-delay, off-delay settings, alarm limit review | Bad Actors Report |
| Fleeting Alarms | Proper deadband, on-delay settings | Bad Actors Report |
| (Other) Frequently Occurring Alarms | Proper deadband, on-delay, off-delay settings, alarm limit review | Bad Actors Report |
| Stale Alarms | Rationalization, logic-based or state-based alarming | Stale Alarms |
| Redundant Alarms | Rationalization, state-based suppression | |
| Nuisance Alarms (General) | Rationalization, improved PID loop tuning, repair or maintenance of faulty hardware | Nuisance Alarms |
Table 1. Alarm System Issue Remediation List
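Redundant alarms can also be screened for offline from an exported alarm history. The sketch below is one possible approach, not an Alarm Triage feature: it flags a "follower" alarm when nearly all of its activations occur shortly after some other "leader" alarm. The function name, the 60-second window, the 90% ratio, and the minimum count are all assumptions.

```python
from collections import defaultdict


def redundancy_candidates(events, window_s=60, min_ratio=0.9, min_count=3):
    """Find alarms that consistently activate shortly after another alarm.

    events: list of (timestamp_seconds, alarm_id) activations.
    Returns sorted (leader, follower, ratio) tuples, where ratio is the
    fraction of follower activations preceded by the leader within window_s.
    """
    events = sorted(events)
    totals = defaultdict(int)     # alarm_id -> number of activations
    followed = defaultdict(int)   # (leader, follower) -> co-occurrence count

    for i, (t, alarm_id) in enumerate(events):
        totals[alarm_id] += 1
        # Walk backwards over earlier activations that fall inside the window.
        leaders = set()
        j = i - 1
        while j >= 0 and t - events[j][0] <= window_s:
            if events[j][1] != alarm_id:
                leaders.add(events[j][1])
            j -= 1
        for leader in leaders:
            followed[(leader, alarm_id)] += 1

    candidates = []
    for (leader, follower), n in followed.items():
        ratio = n / totals[follower]
        if totals[follower] >= min_count and ratio >= min_ratio:
            candidates.append((leader, follower, ratio))
    return sorted(candidates)
```

A flagged pair is only a candidate; rationalization is still needed to decide which of the two alarms (if either) should be kept or suppressed.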
Using the Bad Actors Report
The Bad Actors Report can be used to identify chattering alarms, fleeting alarms, and other frequently occurring alarms.
- Select Analytics Icon.
- Select Alarms / Bad Actors Report.
Figure 2. Identification of Bad Actors (Most Frequently Occurring Alarms) in Alarm Triage
The Bad Actors Report shows the most frequently occurring alarms over a time period. The alarm shown in Row 3 (A44170) is the third most frequently occurring alarm. It has annunciated 2090 times (Instance Count), staying active for an average of 1m 10s at a time (Time in Alarm-Avg). This single alarm condition created 13% of the total alarm load (% of Total Alarms).
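To make the report columns concrete, the sketch below shows how Instance Count, Time in Alarm-Avg, and % of Total Alarms could be derived from a list of alarm occurrences. It illustrates the arithmetic only; the input layout and the function name bad_actors are assumptions, and how Alarm Triage computes these values internally is not described here.

```python
from collections import defaultdict


def bad_actors(occurrences):
    """Summarize occurrences into Bad Actors style rows.

    occurrences: iterable of (alarm_id, seconds_in_alarm) tuples.
    Returns a list of dicts sorted by instance count, highest first.
    """
    counts = defaultdict(int)
    total_seconds = defaultdict(float)
    for alarm_id, seconds in occurrences:
        counts[alarm_id] += 1
        total_seconds[alarm_id] += seconds

    total = sum(counts.values())
    rows = [
        {
            "alarm_id": alarm_id,
            "instance_count": n,
            "avg_seconds_in_alarm": total_seconds[alarm_id] / n,
            "pct_of_total": 100.0 * n / total,
        }
        for alarm_id, n in counts.items()
    ]
    rows.sort(key=lambda row: row["instance_count"], reverse=True)
    return rows
```

A short average time in alarm combined with a high instance count is the pattern that points toward chattering or fleeting behavior.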
- To further analyze the performance of a specific alarm and learn how it might be fixed, click on the Alarm ID to launch the Alarm Details view.
Figure 3. Analyzing Time in Alarm to Evaluate Chattering / Fleeting Behavior
- This view shows the number of alarm occurrences per day and the total number of occurrences for the week classified as Chattering or Fleeting behavior.
- Click on View Alarm History to assess how long each alarm occurrence is active before it clears.
- Click on the column heading Time in Alarm to list the alarm occurrences in order from shortest to longest. This can help evaluate how setting an on-delay in the BAS would reduce the number of occurrences.
Figure 4. Ordering / Reviewing Time in Alarm Occurrences from Smallest to Largest
Using the Stale Alarms Report
The Stale Alarms report identifies stale alarms (active for > 24 hours).
- Select Explorer Icon.
- Select Grid Icon to display data in table format.
- Click on Time in Alarm column to sort from highest to lowest.
Figure 5. Stale Alarm List in Alarm Triage
Using the Nuisance Alarms Report
- Select Explorer Icon.
- Select Nuisance Alarms View.
- Scroll / Filter / Sort to select the alarm of interest. By default the view presents the alarms with the most nuisance alarm activity at the top (with the number of nuisance activations shown).
- Click on the View Alarm History link.
- To display instances of chattering, fleeting, or nuisance behavior, select Columns and add “Chattering”, “Fleeting”, and “Nuisance” to the display.
Figure 6. Adding Chattering, Fleeting, and Nuisance Behavior to Report
- Filter on instances of chattering, fleeting, or nuisance behavior by setting the filter as appropriate:
- Chattering = “True”
- Fleeting = “True”
- Nuisance = “True”
Figure 7. Identification of Chattering Alarms in Alarm Triage
Fixing Nuisance Alarm Behavior
Chattering Alarms, Fleeting Alarms, and Other Frequently Occurring Alarms
A fleeting alarm transitions between the alarm state and the normal state in a short period of time (e.g., less than 5 minutes) but does not immediately repeat. If it repeats, it is called a chattering alarm. In both cases, the alarm durations are too short for the alarm clearing to have been due to the operator’s action.
A Bad Actors report will identify frequently occurring alarms (including chattering and fleeting alarms), which distract the operator and make it difficult to focus on addressing legitimate issues. Common root causes for these behaviors include:
- Noisy instrumentation signals
- Alarm limits too close to operating conditions
- Poor PID controller tuning
- Sticking valves and dampers
Evaluate Alarm Validity and Meaning (Rationalization):
- Apply the relevant steps of alarm rationalization as shown below to determine if the alarm is valid and meaningful.
Figure 8. Simplified Alarm Rationalization Workflow for Determining if an Alarm is Needed
- If NOT valid and meaningful: Flag the alarm for removal (disable in the BAS).
Alarm Filtering Techniques:
- Analog Alarms: Apply / adjust Alarm Deadband in the source system (e.g., BAS) first. This creates a buffer zone around the setpoint where the alarm won't trigger from minor fluctuations.
- If the deadband doesn't resolve the issue, use On/Off Delay. This introduces a time delay before the alarm triggers, allowing temporary signal spikes to settle.
- Discrete Alarms: Apply On/Off Delays in the source system (e.g., BAS). These delays can help filter out short-duration glitches on discrete (on/off) signals.
- Note: Off-Delay is effective only for chattering alarms; it is NOT effective for fleeting alarms.
- Note: These filtering features might not be available for all types of alarms (e.g., system alarms).
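The effect of these filters can be explored with a small simulation before changing the BAS. The following is a minimal sketch of a high-limit alarm with deadband, on-delay, and off-delay; the class and function names are hypothetical, and actual BAS products may implement these settings differently.

```python
class HighAlarm:
    """High-limit analog alarm with deadband, on-delay, and off-delay (seconds)."""

    def __init__(self, limit, deadband=0.0, on_delay=0.0, off_delay=0.0):
        self.limit = limit
        self.deadband = deadband
        self.on_delay = on_delay
        self.off_delay = off_delay
        self.active = False          # annunciated state
        self._pending_since = None   # when the state-change condition first held

    def update(self, t, value):
        """Feed one sample (time t in seconds, value); return the annunciated state."""
        if not self.active:
            wants_change = value > self.limit                   # activate above the limit
            delay = self.on_delay
        else:
            wants_change = value < self.limit - self.deadband   # clear below limit minus deadband
            delay = self.off_delay

        if not wants_change:
            self._pending_since = None   # condition broke; restart the delay timer
        else:
            if self._pending_since is None:
                self._pending_since = t
            if t - self._pending_since >= delay:
                self.active = not self.active
                self._pending_since = None
        return self.active


def count_activations(alarm, samples):
    """Count transitions into the alarm state over (t, value) samples."""
    count = 0
    was_active = alarm.active
    for t, value in samples:
        now_active = alarm.update(t, value)
        if now_active and not was_active:
            count += 1
        was_active = now_active
    return count
```

For a signal that alternates between 79.5 and 80.5 every 10 seconds around a limit of 80, the unfiltered alarm activates on every high sample, a deadband of 2 or an off-delay of 30 seconds collapses the chatter into a single activation, and an on-delay of 30 seconds prevents annunciation entirely. For a single short spike, only the on-delay removes the activation, which is consistent with the note that off-delay does not help with fleeting alarms.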
Alarm Setpoint Review:
- Chattering due to Tight Setpoint: Analyze the alarm's Setpoint (Limit). If it's too close to the normal operating range, small fluctuations can trigger the alarm repeatedly (chattering).
- Historical Trend Analysis: Review historical data (trends) to define the normal operating envelope for the process variable.
- Setpoint Adjustment: If the alarm is valid, adjust the setpoint based on the normal operating range and the severity of potential consequences if the limit is breached.
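The trend review above can be approximated numerically. The sketch below estimates a normal operating envelope from historical values using percentiles and checks whether a high limit sits too close to it. The 1st/99th percentile choice, the margin, and the function names are assumptions; the final setpoint must still reflect the severity of the consequence, which this calculation does not capture.

```python
import statistics


def operating_envelope(values, lower_pct=1, upper_pct=99):
    """Return (low, high) percentile bounds of the historical values."""
    # quantiles(n=100) returns the 99 cut points for percentiles 1 through 99.
    cuts = statistics.quantiles(values, n=100, method="inclusive")
    return cuts[lower_pct - 1], cuts[upper_pct - 1]


def high_limit_too_tight(high_limit, values, min_margin):
    """True if the high limit is within min_margin of the envelope's upper bound."""
    _, envelope_high = operating_envelope(values)
    return (high_limit - envelope_high) < min_margin
```

If the check returns True for a valid alarm, the limit is a candidate for adjustment, subject to the consequence threshold.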
Control Loop Optimization:
- Chattering due to Control Issues: Examine historical data to see if the chattering is caused by a control loop problem, such as poor tuning of a Proportional-Integral-Derivative (PID) controller.
- Control Loop Correction: If control issues are identified, adjust the PID controller settings to achieve smoother control and reduce alarm chattering.
Alarm Latching (Optional):
- Equipment/Sensor Diagnostics: For alarms related to equipment or sensor diagnostics that don't exhibit clear failure modes (e.g., gradual degradation), consider alarm latching. This keeps the alarm active until manually reset, even if the issue resolves itself momentarily. This helps identify intermittent problems that might otherwise go unnoticed.
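Latching behavior can be sketched as a small state holder. This is an illustration of the concept rather than any specific BAS implementation; the class name and the rule that a reset only succeeds once the underlying condition has cleared are assumptions.

```python
class LatchedAlarm:
    """Alarm that stays annunciated until an operator resets it."""

    def __init__(self):
        self.condition = False  # live state of the underlying diagnostic
        self.latched = False    # what the operator sees

    def update(self, condition):
        """Record the live condition; latch on if it is (or becomes) true."""
        self.condition = condition
        if condition:
            self.latched = True
        return self.latched

    def reset(self):
        """Manual reset; assumed to succeed only when the condition has cleared."""
        if not self.condition:
            self.latched = False
        return self.latched
```

With this behavior, a fault that appears for a moment and then clears remains visible until someone acknowledges and resets it.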
Dynamic Alarming for Mode Changes:
- Determine if the process or equipment has distinct operating modes (e.g., summer/winter, occupied/unoccupied).
- If different alarm limits are needed for different modes, implement dynamic alarming in the BAS. This allows the system to automatically adjust the alarm limit based on operating state.
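Dynamic alarming amounts to looking up the alarm limits for the current operating mode before evaluating the value. The sketch below uses hypothetical modes and temperature limits; real values come out of rationalization, and the configuration mechanism depends on the BAS.

```python
# Hypothetical limits per operating mode (degrees F).
LIMITS_BY_MODE = {
    "occupied": {"low": 68.0, "high": 76.0},
    "unoccupied": {"low": 60.0, "high": 85.0},
}


def evaluate(mode, value, limits_by_mode=LIMITS_BY_MODE):
    """Return "high", "low", or None using the limits for the current mode."""
    limits = limits_by_mode[mode]
    if value > limits["high"]:
        return "high"
    if value < limits["low"]:
        return "low"
    return None
```

Here a space temperature of 80 raises a high alarm while occupied but is within limits while unoccupied, so the same reading does not produce a nuisance alarm overnight.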
Offline Analysis of Alarm Duration to Estimate On-Delay
- To determine what value of on-delay might mitigate the chattering behavior, click on the link to Download the alarm history in .csv or .xls format.
- Open the file in Excel and manipulate the data to create a pivot table / Pareto chart that shows the percent of alarms that have been active for up to a given period of time.
With the Time in Alarm data structured in a Pareto chart, the percent reduction in alarm occurrences can be estimated for various on-delay values. For example, the data shows that setting an alarm on-delay of five minutes in the BAS could be expected to eliminate approximately 90% of the alarm occurrences, because roughly 90% of the occurrences clear in less than five minutes (more than 50% are active for only two minutes).
Figure 9. Estimating BAS Alarm On-Delay based on Pareto Analysis of Time in Alarm
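The same estimate can be scripted instead of built in Excel. The sketch below assumes the downloaded history has been reduced to a CSV with a duration column in seconds; the column name time_in_alarm_s and both function names are hypothetical, and the actual export layout may differ. An occurrence is counted as eliminated when it clears before the on-delay expires.

```python
import csv
import io


def load_durations(csv_text, column="time_in_alarm_s"):
    """Read Time in Alarm values (seconds) from CSV text; column name is assumed."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def reduction_by_on_delay(durations_s, candidate_delays_s):
    """Estimate the percent of occurrences each candidate on-delay would remove."""
    total = len(durations_s)
    if total == 0:
        raise ValueError("no alarm occurrences to analyze")
    result = {}
    for delay in candidate_delays_s:
        # Occurrences shorter than the delay would never annunciate.
        eliminated = sum(1 for d in durations_s if d < delay)
        result[delay] = 100.0 * eliminated / total
    return result
```

Comparing several candidate delays side by side helps choose the shortest on-delay that removes most nuisance occurrences without delaying notification of legitimate events more than the consequence allows.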
Stale Alarms
Stale alarms, those that remain active for more than 24 hours, create a double threat. First, they clutter BAS alarm displays, making it difficult for operators to become aware of new events. Second, because these alarms can't annunciate again, they essentially become disabled, creating a potential safety risk.
Common causes for stale alarms in Alarm Triage include:
- Misconfigured Alarms: This could be due to improper alarm definition, limits that are too strict/lax, or settings not considering the equipment's state.
- Invalid Alarms: Some alarms are not relevant for the current state and should be suppressed.
- Faulty Hardware: Hardware malfunctions trigger alarms that persist until repair or replacement.
- Communication Breakdown: Sometimes Alarm Triage fails to receive the "Return to Normal" signal from the BAS, keeping the alarm active even though the alarm has cleared.
Evaluate Alarm Validity:
- Review Setpoint (Limit):
  - Check if the limit is too close to the normal operating range using historical data.
  - If so, adjust the limit based on the normal operating range and the severity of the issue the alarm represents (consequence threshold).
- Assess State-Dependent Limits:
  - Determine if the equipment/space has different operating modes requiring distinct alarm limits.
  - If applicable, implement "Dynamic Alarming" to automatically adjust limits based on the detected state (e.g., Normal, Not in Use, In Use).
- Identify Invalid Alarm States:
  - Check if there are situations when the alarm shouldn't trigger and should be suppressed automatically.
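State-based suppression can be expressed as a rule that blocks annunciation while the equipment is in a state where the alarm is not valid. The sketch below is illustrative only; the alarm name, the equipment states, and the function name are hypothetical, and the actual logic would be implemented in the BAS.

```python
# Hypothetical mapping: alarm name -> equipment states in which it is not valid.
SUPPRESS_WHEN = {
    "AHU-1 Low Supply Air Flow": {"off", "maintenance"},
}


def should_annunciate(alarm_name, condition_active, equipment_state, rules=SUPPRESS_WHEN):
    """Return True only if the condition is active and the state does not suppress it."""
    if not condition_active:
        return False
    return equipment_state not in rules.get(alarm_name, set())
```

With a rule like this, a low-flow alarm on a fan that is intentionally off never annunciates, so it cannot become stale while the unit is shut down.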
Address Hardware Issues:
- If faulty hardware is causing the alarm, mark it as out-of-service in the BAS and Snooze the alarm in Alarm Triage.
Resolve Communication Errors:
- If the alarm shows active in Alarm Triage but has returned to normal (RTN) in the BAS, contact Virtual Facility for further assistance.
- You may also need to manually clear the alarm in Alarm Triage (specific instructions will depend on the system).