Buzz...Buzz...Buzz...Buzz... Alex is jolted awake at 3 am by his phone, which will not stop sending alerts. “What the heck?” he grumbles. Blinking to attention, he sees over 100 alarm events pouring into his inbox in rapid fire. They arrive so quickly that they come in batches, pausing briefly as the application tries to process the bombardment of new requests. Buried somewhere in this pile is the one event that needs attention. Overloaded with alerts, the application freezes, and any attempt to search the queue is futile. As Alex calls in to the operator on duty, he grumbles, “This can’t happen anymore.”
Like water floods, alarm floods are not fun. As we shared in a previous blog post, an alarm flood occurs when a specific event causes the system to generate an unusually large number of alarms. An alarm flood can result from a cascade effect triggered by a single point of failure. When one occurs, operators become detectives trying to determine the event that began the flood.
At Casne Engineering, we focus on configuring systems to detect the most important alarms within an alarm flood first. We use advanced algorithms to group the less important alarms into a secondary email, keeping the critical alarms in front of the operators.
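As a rough illustration of the idea of separating critical alarms from the rest, here is a minimal Python sketch. It is not Casne's actual algorithm: the `Alarm` fields, the numeric priority scale, and the `critical_max` cutoff are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alarm:
    device: str
    message: str
    priority: int  # hypothetical scale: 1 = critical, larger = less important


def split_flood(alarms, critical_max=1):
    """Split a flood into (critical, secondary) lists, preserving arrival order."""
    critical = [a for a in alarms if a.priority <= critical_max]
    secondary = [a for a in alarms if a.priority > critical_max]
    return critical, secondary


# Made-up example flood: one critical alarm buried among lower-priority ones.
flood = [
    Alarm("PUMP-01", "Low suction pressure", 3),
    Alarm("MCC-02", "Breaker tripped", 1),
    Alarm("FAN-07", "Low flow", 4),
]
critical, secondary = split_flood(flood)
```

In this sketch the `critical` list would feed the primary alert and the `secondary` list would be bundled into the follow-up email.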
Removing the Alarm Headache
Our 7 Step Solution to Alarm Management supports customers as they move from reactive alarm systems to predictive ones. Here are some of the ways we have helped our customers:
Loss of Communications
A client was experiencing network connectivity issues, which caused over 300 individual devices to go into alarm for “Loss of Communications.” The stream of alarms flooding the operator inbox seemed never to end, and message after message left the email system unusable.
Casne solved the alarm issue by implementing advanced alarming algorithms that determine whether a connectivity issue affects a specific device or the entire network. Once the problem was identified, operators received a single traceable, actionable email alert.
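One simple way to picture the device-versus-network decision is a threshold on the share of devices that have lost communications. The Python sketch below is illustrative only: the function name, the 50% `network_fraction` threshold, and the device tags are assumptions, not details of the client's system.

```python
def classify_comm_loss(lost_devices, all_devices, network_fraction=0.5):
    """Collapse many 'Loss of Communications' alarms into one summary alert.

    Returns None when no known device has lost communications.
    """
    lost = set(lost_devices) & set(all_devices)
    if not lost:
        return None
    # Assumed rule: if a large share of devices dropped at once, blame the network.
    fraction = len(lost) / len(all_devices)
    if fraction >= network_fraction:
        scope = "network"
        subject = (
            f"Network-wide loss of communications "
            f"({len(lost)} of {len(all_devices)} devices)"
        )
    else:
        scope = "device"
        subject = "Loss of communications: " + ", ".join(sorted(lost))
    return {"scope": scope, "subject": subject, "devices": sorted(lost)}


# Hypothetical site with 400 devices, 300 of which lose communications together.
all_devices = [f"RTU-{i:03d}" for i in range(400)]
network_alert = classify_comm_loss(all_devices[:300], all_devices)
single_alert = classify_comm_loss(["RTU-007"], all_devices)
```

Either way, the operator gets one email with the affected devices listed, rather than one email per device.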
Identifying the Important Alarms, Without the Hunt
Another client was having difficulty identifying significant critical events. Our engineering team performed a detailed benchmark assessment, which showed that an average of 20 alarms were triggered during each critical event. These alarms came not only from the triggering device but from all of the devices downstream as well. Alarms would arrive in no particular order, and operators would rush off to a location far from the initiating device, only to come back to the event queue and see other alarms that could be the cause, wasting valuable time.
By reviewing the alarm process with the help of an Alarm Philosophy, we were able to define the alarm paths clearly and implement “Cascade Detection,” which follows alarm cascade paths to identify the true cause of the cascade and alert operators to it. The rest of the alarms were detoured. Depending on their severity, they were shown in the Alarms alert, delayed until the initial alarm was acknowledged, or muted until conditions returned to normal. This dynamic alarm process decreased the time to act, which in turn reduced the time to restore the system to production.
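The cascade-following idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the production "Cascade Detection" logic: it assumes each device has at most one upstream device (recorded in a hypothetical `upstream` map), and the mapping from severity labels to the show, delay, and mute actions is invented for the example.

```python
def find_root_causes(active, upstream):
    """Return the active alarms that have no active alarm anywhere upstream.

    active:   set of device names currently in alarm
    upstream: dict mapping each device to the device that feeds it (None at the top)
    """
    roots = set()
    for device in active:
        parent = upstream.get(device)
        seen = {device}  # guards against accidental loops in the path map
        is_root = True
        while parent is not None and parent not in seen:
            if parent in active:
                is_root = False  # something upstream is also in alarm
                break
            seen.add(parent)
            parent = upstream.get(parent)
        if is_root:
            roots.add(device)
    return roots


def route_alarms(active, upstream, severity):
    """Assign each active alarm an action: alert, show, delay, or mute."""
    roots = find_root_causes(active, upstream)
    routing = {}
    for device in sorted(active):
        level = severity.get(device, "low")
        if device in roots:
            routing[device] = "alert"   # true cause: notify operators now
        elif level == "high":
            routing[device] = "show"    # listed in the alert for context
        elif level == "medium":
            routing[device] = "delay"   # held until the root alarm is acknowledged
        else:
            routing[device] = "mute"    # suppressed until conditions are normal
    return routing


# Hypothetical cascade path: feeder -> transformer -> motor control center -> pumps.
upstream = {
    "FEEDER-A": None,
    "XFMR-1": "FEEDER-A",
    "MCC-1": "XFMR-1",
    "PUMP-1": "MCC-1",
    "PUMP-2": "MCC-1",
}
active = {"XFMR-1", "MCC-1", "PUMP-1", "PUMP-2"}
severity = {"MCC-1": "high", "PUMP-1": "medium", "PUMP-2": "low"}
routing = route_alarms(active, upstream, severity)
```

Here the transformer is the only alarmed device with nothing in alarm above it, so it is the one that triggers the alert, while the downstream alarms are shown, delayed, or muted.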
Remove the Alarm Stress
Nobody likes waking up at 3 am to a flood of emails. The stress isn’t worth it. Partnering with Casne Engineering to move your alarm system from reactive to proactive will bring you peace of mind and a good night’s sleep. Casne Engineering provides all the necessary solutions for your operational technology needs. We bring over 40 years of success in professional engineering and technology integration services for major utilities, process industries, and critical facilities. Our team of capable engineers and technologists develops and supports engineered solutions using best-of-breed products and technologies. Contact us here to discuss your operational technology needs.