Configuration Details
Alarms are derived from OpenNMS events whose definitions contain an <alarm-data> element with a reduction-key attribute. The event definitions live in the XML files that are included via <event-file> directives in the eventconf.xml file and are formatted as shown here:
Example
<event>
<mask>
<maskelement>
<mename>id</mename>
<mevalue>.1.3.6.1.4.1.3955.2.2.1</mevalue>
</maskelement>
<maskelement>
<mename>generic</mename>
<mevalue>6</mevalue>
</maskelement>
<maskelement>
<mename>specific</mename>
<mevalue>1</mevalue>
</maskelement>
</mask>
<uei>uei.opennms.org/vendor/Linksys/traps/linksysConnTrap</uei>
<event-label>Linksys Connection Trap</event-label>
<descr><p>This trap signifies that a TCP/UDP connection has been made. </p>
</descr>
<logmsg dest="logndisplay"><p>Linksys Event: %parm[#1]%.</p></logmsg>
<severity>Normal</severity>
<alarm-data reduction-key="%uei%:%dpname%:%nodeid%" alarm-type="1" auto-clean="true"/>
</event>
Getting Started
Any event that has a reduction key assigned results in an alarm that is persisted to the alarms table, unless the event's logmsg element has dest set to "donotpersist":
<logmsg dest="donotpersist" />
Alarm Data
The alarm-data element contains the following attributes:
- reduction-key (replaceable text)
- alarm-type (positive integer)
- auto-clean (boolean)
Reduction Keys
The following sample reduction-key attribute tells the alarm writer in OpenNMS to expand the UEI and the node ID from the event and store the resulting string in the reductionKey column. (Note: the alarm-data element carrying a reduction-key attribute is the format introduced in version 1.3.1. The previous format was a separate element: "<reductionKey>".)
<alarm-data reduction-key="%uei%:%nodeid%" alarm-type="1" auto-clean="false" />
The alarm writer now uses that key to determine if a previous event with the same reductionKey has already been converted to an alarm and inserted into the alarms table. If it has, it only updates the lastEventTime, lastEventID, and the counter columns. Otherwise, it inserts a new alarm.
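As a hedged illustration of how the key resolves, the sample attribute above would expand as follows for the Linksys trap defined earlier. The node ID of 42 is an assumed value chosen only for this walk-through:
<!-- Assumption: the trap arrives for a node whose ID is 42 (illustrative value) -->
<alarm-data reduction-key="%uei%:%nodeid%" alarm-type="1" auto-clean="false" />
<!-- %uei% expands to uei.opennms.org/vendor/Linksys/traps/linksysConnTrap -->
<!-- %nodeid% expands to 42 -->
<!-- Stored reductionKey: uei.opennms.org/vendor/Linksys/traps/linksysConnTrap:42 -->
A second trap of the same kind from node 42 expands to the same key, so it only updates the existing alarm and its counter. The same trap from a different node expands to a different key and therefore creates a new alarm.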
Alarm Types
The alarm-type attribute assists with automations, correlation, and other integrations such as the OSS/J implementation. The current model defines alarm-type "1" as a problem and alarm-type "2" as a resolution event. Feel free to use this field at your own discretion (positive integers only); note, however, that the default integrations (e.g. the cosmicClear automation) rely on these values.
Auto Cleaning
As an event is processed into the alarm model, the auto-clean attribute is used to remove the historical events: all previous events matching the reduction key of the current event are removed from the database. This is best used with events that do not have outages, notifications, etc. associated with them, mainly events that are only tallied so that the count can later trigger an action in an automation.
Mailing List Discussion
Jeff Gehlbach to opennms-users on Wed, 3 Sep 2008 16:56:21 -0400 (EDT) Message-Id: <6F6A91A9-16B6-421B-B773-68EEB99C72F1@opennms.org>
Setting auto-clean="true" in an event's alarm-data annotation causes all old events for an alarm to be deleted from the database when a new event comes in that is reduced under an existing alarm. The use case for this attribute is events that tend to be numerous and for which the alarm counter is as useful as the individual events, such as SNMP authenFailure traps. So normally one sets this only on trouble (alarm-type="1") events. In fact, since Normal-severity resolution (alarm-type="2") alarms are deleted periodically by an automation, setting auto-clean="true" on a resolution event makes no real sense.
So there's what cleaning is about. Clearing is about automatically setting the severity of an alarm to "Cleared" when a resolution (alarm-type="2") is received whose clear-key matches the reduction-key of the trouble (alarm-type="1") alarm. I think you've got a pretty good grasp on clearing, with a couple of caveats:
- The "clear-uei" attribute is deprecated and should not be used in new definitions
- Clearing is performed by an automation, so its effect is not instantaneous.
Jeff Gehlbach to opennms-discuss on 2008-10-15 14:28:51 Message-Id: <AF1E72F2-72FB-457B-93B7-49908D8E261D@opennms.org>
When you acknowledge an alarm, you're saying "I acknowledge that this problem is genuine and I am taking ownership of its resolution." Acknowledged alarms never go away, this is by design. There is a default automation that deletes unacknowledged alarms whose severity is "Cleared", so if you want an alarm to go away, it should be cleared and unacknowledged.
Alarms are designed to be self-clearing when possible. For event types that come in "trouble" and "resolution" flavors, we try to define the alarms so that the "resolution" event clears an alarm that was created by the corresponding "trouble" event. Not every event comes in these nice pairs, though, so sometimes it's necessary to clear an alarm manually as you have done.
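The trouble/resolution pairing described in these messages might be sketched as the following pair of definitions. The UEIs under uei.example.org are hypothetical, the other event elements (mask, event-label, descr, logmsg) are omitted for brevity, and clear-key is the attribute named in the first message. Treat this as an illustrative fragment under those assumptions, not a shipped configuration:
<!-- Hypothetical trouble event: creates or updates the alarm -->
<event>
<uei>uei.example.org/custom/linkTrouble</uei>
<severity>Major</severity>
<alarm-data reduction-key="%uei%:%nodeid%" alarm-type="1" auto-clean="false"/>
</event>
<!-- Hypothetical resolution event: its clear-key spells out the trouble alarm's reduction key -->
<event>
<uei>uei.example.org/custom/linkResolved</uei>
<severity>Normal</severity>
<alarm-data reduction-key="%uei%:%nodeid%" alarm-type="2" clear-key="uei.example.org/custom/linkTrouble:%nodeid%" auto-clean="false"/>
</event>
For a given node, the clear-key of the resolution event expands to the same string as the reduction key of the trouble alarm, which is what lets the two be matched. Because clearing is performed by an automation, the trouble alarm changes to "Cleared" on a later automation run, not at the moment the resolution event arrives.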
Benefits
The main benefits of this feature are:
- You can view only those events that actually represent problems
- You can immediately see how many of each of these have been received
- You can view almost all events (now as alarms) in one place
Another new feature added to support Alarms is called Automations. Even though Automations provides more potential for your network management processes than just handling alarms, this is a good time to get started with them and see how alarm management can be dramatically enhanced. With the use of Automations, OpenNMS can now provide Event/Alarm correlation, escalation, and better Event/Alarm management.






