Flap detection
From OpenNMS


Overview

Some irregularities are not important errors on their own, such as an interface going down and coming back up once. When they occur repeatedly within a given time span, however, they are important enough to warrant a closer look.

This automation counts specific events within a predefined time span and generates a new event if they occur too often.

There is a known problem with this approach: once the limit is reached, the automation generates another event on every run until the number of monitored events within the specified time span falls below the configured threshold.

Configuration of automation

Automations are configured in $OPENNMS_HOME/etc/vacuumd-configuration.xml.

This example generates a new event, "FlappingInterface", if an interfaceDown event recurs 10 or more times within one hour.
First, define the automation and the interval at which it runs. The interval is given in milliseconds (the 3000000 used here is 50 minutes). Don't make it too small, to avoid excessive database load.

<automations>
 ...
 <automation name="FlappingInterface"
             interval="3000000" active="true"
             trigger-name="triggerFlappingInterface"
             action-name="actionFlappingInterface"
             action-event="FlappingInterface" /> 
 ...
</automations>

Second, define the trigger. It specifies which event is monitored, the time span over which the events are counted, and the number of events within that time span that is considered critical. Here the time span is one hour and the critical number of events is 10.

<triggers>
 ...
 <trigger name="triggerFlappingInterface" operator="&gt;="  row-count="1">
  <statement>SELECT node.nodelabel AS _nodelabel,
                    e.count AS _count,
                    e.nodeid AS _nodeid,
                    e.eventuei AS _eventuei,
                    e.ipaddr AS _ipaddr
             FROM ( SELECT nodeid,eventuei,ipaddr,count(nodeid) AS count
                        FROM events
                            WHERE eventuei='uei.opennms.org/nodes/interfaceDown'
                            AND eventsource='OpenNMS.Poller.DefaultPollContext'
                            AND (eventcreatetime > now() - interval '1 hour')
                        GROUP BY nodeid,eventuei,ipaddr )
             AS e
             LEFT OUTER JOIN node ON (e.nodeid=node.nodeid) WHERE e.count >= 10;
  </statement> 
 </trigger>
 ...
</triggers>

You don't really need an action here, but the definition is required, so define one with an empty statement.

<actions>
 ...
 <action name="actionFlappingInterface">
  <statement /> 
 </action>
 ...
</actions>

Here we create a new event that indicates a recurring problem.

<action-events>
 ...
 <action-event name="FlappingInterface" for-each-result="true">
  <assignment type="field" name="uei" value="uei.opennms.org/vacuumd/FlappingInterface" /> 
  <assignment type="field" name="nodeid" value="${_nodeid}" /> 
  <assignment type="field" name="ipaddr" value="${_ipaddr}" /> 
 </action-event>
 ...
</action-events>
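
If the new event should also carry the number of occurrences, a value returned by the trigger can be passed on as an event parameter. The following is only a sketch, assuming the "parameter" assignment type works here as it does in the stock vacuumd-configuration.xml. The parameter name "count" is a hypothetical choice for illustration and is not part of the original example.

```xml
<action-events>
 <!-- Sketch only: the action-event from above, extended with one parameter. -->
 <action-event name="FlappingInterface" for-each-result="true">
  <assignment type="field" name="uei" value="uei.opennms.org/vacuumd/FlappingInterface" />
  <assignment type="field" name="nodeid" value="${_nodeid}" />
  <assignment type="field" name="ipaddr" value="${_ipaddr}" />
  <!-- _count is the column alias defined in the trigger statement. -->
  <!-- The parameter name "count" is an illustrative assumption. -->
  <assignment type="parameter" name="count" value="${_count}" />
 </action-event>
</action-events>
```

With such a parameter in place, the event definition could refer to it with the %parm[count]% token, for example in its logmsg.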

Configuration of the new event

Create your own event definition file, such as $OPENNMS_HOME/etc/events/myCompanyEvents.xml, containing the new event.

Please also take a look at [1].

<events>
...
   <event>
       <uei>uei.opennms.org/vacuumd/FlappingInterface</uei>
       <event-label>Event defined for myCompany: Interface is flapping</event-label>
       <descr>&lt;p&gt;InterfaceDown for node: %nodelabel% interface: %ipaddr% recurred multiple times.&lt;/p&gt;</descr>
       <logmsg dest="logndisplay">&lt;p&gt;InterfaceDown event recurring for node: %nodelabel% interface: %ipaddr%.&lt;/p&gt;</logmsg>
       <severity>Minor</severity>
   </event>
 ...
</events>

Don't forget to reference this file in $OPENNMS_HOME/etc/eventconf.xml, for example:

...
<event-file>events/myCompanyEvents.xml</event-file>
...