NetSNMP logmatch usage
From OpenNMS
Using NetSNMP logmatch
Background
My company had been having issues with RADIUS authentications failing. Being a typical OpenNMS user, I am a data collection freak, so I decided to find a way to quantify how often this was happening and maybe find a pattern in it. We use Juniper Networks Steel-Belted Radius (formerly Funk Software's).
NetSNMP
The system runs SuSE Linux with the NetSNMP agent installed. All of the logs are text files. Every night at midnight a new one is created, named after the date. A small cron job maintains a symlink called current.log that points to the current day's file, so snmpd can always read the same path. I added the following to the bottom of my snmpd.conf -
logmatch radunwilling /var/log/sbr/current.log 600 DSA is unwilling to perform
logmatch radldapsuccess /var/log/sbr/current.log 600 LDAPAUTH: Authentication attempt.*Success
logmatch radldapfail /var/log/sbr/current.log 600 LDAPAUTH: Authentication attempt.*Failure
logmatch radtlssuccess /var/log/sbr/current.log 600 EAP-TLS authentication succeeded
logmatch radtlsfail /var/log/sbr/current.log 600 EAP-TLS authentication failed
sysObjectID .1.3.6.1.4.1.1411
For some values, logmatch resets the counter each time it is read. I set the cycle time (how often, in seconds, the agent rereads the log) to 600 in case a poll is missed. I also use a custom sysObjectID so that this host gets its own systemDef in datacollection-config.xml.
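You can check what each of these patterns will catch before putting it in snmpd.conf by running it over a copy of the log. The sketch below is a rough stand-in for the agent. It assumes the agent does an unanchored regex search on every line and counts each matching line once per logmatch entry. The helper names (`count_matches`, `count_file`, `LOGMATCH_PATTERNS`) and the sample log lines are made up for illustration; the samples are not real Steel-Belted Radius output. These particular patterns use only literal text and `.*`, so Python's `re` should treat them the same way the agent's regex library would.

```python
import re
from collections import Counter

# name -> regex, copied from the logmatch lines in snmpd.conf above.
LOGMATCH_PATTERNS = {
    "radunwilling": r"DSA is unwilling to perform",
    "radldapsuccess": r"LDAPAUTH: Authentication attempt.*Success",
    "radldapfail": r"LDAPAUTH: Authentication attempt.*Failure",
    "radtlssuccess": r"EAP-TLS authentication succeeded",
    "radtlsfail": r"EAP-TLS authentication failed",
}


def count_matches(lines, patterns=LOGMATCH_PATTERNS):
    """Count, per logmatch name, the lines that contain a match anywhere."""
    compiled = {name: re.compile(rx) for name, rx in patterns.items()}
    counts = Counter({name: 0 for name in patterns})
    for line in lines:
        for name, rx in compiled.items():
            if rx.search(line):  # unanchored search, like grep
                counts[name] += 1
    return counts


def count_file(path):
    """Run the patterns over a log file, e.g. a copy of current.log."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return count_matches(fh)


# Hypothetical log lines, for illustration only.
sample = [
    "LDAPAUTH: Authentication attempt for user alice: Success",
    "LDAPAUTH: Authentication attempt for user bob: Failure",
    "EAP-TLS authentication succeeded for host laptop-17",
    "LDAP error: DSA is unwilling to perform",
]
print(dict(count_matches(sample)))
# {'radunwilling': 1, 'radldapsuccess': 1, 'radldapfail': 1, 'radtlssuccess': 1, 'radtlsfail': 0}
```

Running `count_file` against a day's log gives a quick idea of the volume each graph should show.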
datacollection-config.xml
Using the standard instance ID method
<group name="juniper-sbr" ifType="ignore">
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="1" alias="DSAunwilling" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="2" alias="LDAPsuccess" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="3" alias="LDAPfailure" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="4" alias="TLSsuccess" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="5" alias="TLSfailure" type="Counter32" />
</group>
<systemDef name="Juniper SBR">
<sysoidMask>.1.3.6.1.4.1.1411</sysoidMask>
<collect>
<includeGroup>mib2-host-resources-system</includeGroup>
<includeGroup>mib2-host-resources-memory</includeGroup>
<includeGroup>net-snmp-disk</includeGroup>
<includeGroup>ucd-loadavg</includeGroup>
<includeGroup>ucd-memory</includeGroup>
<includeGroup>ucd-sysstat</includeGroup>
<includeGroup>ucd-diskio</includeGroup>
<includeGroup>juniper-sbr</includeGroup>
</collect>
</systemDef>
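In the group above, `instance` is the row index of logMatchTable (.1.3.6.1.4.1.2021.16.2.1), and column 5 is logMatchGlobalCounter in UCD-SNMP-MIB. That means instance="2" should be the second logmatch line (radldapsuccess). The sketch below derives that mapping from the snmpd.conf text. It assumes the agent numbers rows 1, 2, 3, ... in the order the logmatch lines appear. `logmatch_instances` is a hypothetical helper, and you should confirm its result with an snmpwalk of .1.3.6.1.4.1.2021.16.2.1.2 (logMatchName) against the agent.

```python
# logMatchGlobalCounter: column 5 of logMatchTable in UCD-SNMP-MIB.
LOGMATCH_GLOBAL_COUNTER = ".1.3.6.1.4.1.2021.16.2.1.5"

SNMPD_CONF = """\
logmatch radunwilling /var/log/sbr/current.log 600 DSA is unwilling to perform
logmatch radldapsuccess /var/log/sbr/current.log 600 LDAPAUTH: Authentication attempt.*Success
logmatch radldapfail /var/log/sbr/current.log 600 LDAPAUTH: Authentication attempt.*Failure
logmatch radtlssuccess /var/log/sbr/current.log 600 EAP-TLS authentication succeeded
logmatch radtlsfail /var/log/sbr/current.log 600 EAP-TLS authentication failed
sysObjectID .1.3.6.1.4.1.1411
"""


def logmatch_instances(conf_text):
    """Map logmatch name -> (assumed row index, full counter OID).

    Assumes rows are numbered from 1 in snmpd.conf order. Commented-out
    lines and other directives (such as sysObjectID) are skipped.
    """
    mapping = {}
    index = 0
    for line in conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "logmatch":
            index += 1
            mapping[fields[1]] = (index, f"{LOGMATCH_GLOBAL_COUNTER}.{index}")
    return mapping


for name, (index, oid) in logmatch_instances(SNMPD_CONF).items():
    print(index, name, oid)
```

If a logmatch line is added or moved in snmpd.conf, the instance numbers in the group would shift with it under this assumption.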
Using the "Collecting SNMP data from tables with arbitrary indexes" method
Here are the bits I added to my datacollection-config.xml -
<resourceType name="logMatchIndex" label="NetSnmp Log Match"
resourceLabel="${logMatchName} RegEx - ${logMatchRegEx}">
<persistenceSelectorStrategy class="org.opennms.netmgt.collectd.PersistAllSelectorStrategy"/>
<storageStrategy class="org.opennms.netmgt.dao.support.IndexStorageStrategy"/>
</resourceType>
<group name="juniper-sbr" ifType="all">
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.2" instance="logMatchIndex" alias="logMatchName" type="string" />
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.4" instance="logMatchIndex" alias="logMatchRegEx" type="string" />
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="logMatchIndex" alias="logMatchGlobCnt" type="Counter32" />
<mibObj oid=".1.3.6.1.4.1.2021.16.2.1.6" instance="logMatchIndex" alias="logMatchGlobInt" type="Integer" />
</group>
<systemDef name="Juniper SBR">
<sysoidMask>.1.3.6.1.4.1.1411</sysoidMask>
<collect>
<includeGroup>mib2-host-resources-system</includeGroup>
<includeGroup>mib2-host-resources-memory</includeGroup>
<includeGroup>net-snmp-disk</includeGroup>
<includeGroup>ucd-loadavg</includeGroup>
<includeGroup>ucd-memory</includeGroup>
<includeGroup>ucd-sysstat</includeGroup>
<includeGroup>ucd-diskio</includeGroup>
<includeGroup>juniper-sbr</includeGroup>
</collect>
</systemDef>
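With this method, every row of logMatchTable becomes its own logMatchIndex resource. The `resourceLabel` template fills in the two string attributes collected for that row. The sketch below only imitates that substitution so you can preview the labels. Python's `string.Template` happens to use the same `${name}` placeholder syntax, but OpenNMS does its own substitution. The `rows` data and the `resource_labels` helper are hypothetical.

```python
from string import Template

# Same text as the resourceLabel attribute of the logMatchIndex resourceType.
RESOURCE_LABEL = Template("${logMatchName} RegEx - ${logMatchRegEx}")


def resource_labels(rows):
    """Return {row index: label}, ordered by index.

    rows maps a logMatchTable row index to the string attributes
    collected for it (logMatchName and logMatchRegEx).
    """
    return {
        index: RESOURCE_LABEL.substitute(attrs)
        for index, attrs in sorted(rows.items())
    }


# Hypothetical result of walking the table on the SBR host.
rows = {
    2: {"logMatchName": "radldapsuccess",
        "logMatchRegEx": "LDAPAUTH: Authentication attempt.*Success"},
    1: {"logMatchName": "radunwilling",
        "logMatchRegEx": "DSA is unwilling to perform"},
}
for index, label in resource_labels(rows).items():
    print(index, label)
```

Because each row is a separate resource, each one gets its own set of graphs.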
snmp-graph.properties
Using the standard instance ID method
I added the following to the reports line -
juniper-sbr.ldap, juniper-sbr.tls, \
and the two reports -
report.juniper-sbr.ldap.name=SBR LDAP per minute
report.juniper-sbr.ldap.columns=LDAPsuccess, LDAPfailure
report.juniper-sbr.ldap.type=nodeSnmp
report.juniper-sbr.ldap.command=--title="LDAP per minute" \
DEF:val1={rrd1}:LDAPsuccess:AVERAGE \
DEF:val2={rrd2}:LDAPfailure:AVERAGE \
CDEF:successpermin=val1,60,* \
CDEF:failpermin=val2,60,* \
LINE2:successpermin#0000ff:"Success " \
GPRINT:successpermin:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:successpermin:MIN:"Min \\: %8.2lf %s" \
GPRINT:successpermin:MAX:"Max \\: %8.2lf %s\\n" \
LINE2:failpermin#ff0000:"Failure " \
GPRINT:failpermin:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:failpermin:MIN:"Min \\: %8.2lf %s" \
GPRINT:failpermin:MAX:"Max \\: %8.2lf %s\\n"
report.juniper-sbr.tls.name=SBR TLS per minute
report.juniper-sbr.tls.columns=TLSsuccess, TLSfailure
report.juniper-sbr.tls.type=nodeSnmp
report.juniper-sbr.tls.command=--title="TLS per minute" \
DEF:val1={rrd1}:TLSsuccess:AVERAGE \
DEF:val2={rrd2}:TLSfailure:AVERAGE \
CDEF:successpermin=val1,60,* \
CDEF:failpermin=val2,60,* \
LINE2:successpermin#0000ff:"Success " \
GPRINT:successpermin:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:successpermin:MIN:"Min \\: %8.2lf %s" \
GPRINT:successpermin:MAX:"Max \\: %8.2lf %s\\n" \
LINE2:failpermin#ff0000:"Failure " \
GPRINT:failpermin:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:failpermin:MIN:"Min \\: %8.2lf %s" \
GPRINT:failpermin:MAX:"Max \\: %8.2lf %s\\n"
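The CDEF lines are written in RRDtool's reverse Polish notation. `val1,60,*` pushes val1 and 60 onto a stack and multiplies them. The DEF returns the per-second rate RRDtool stored for the counter, so multiplying by 60 gives matches per minute. The minimal evaluator below covers only the four basic operators and is meant to show how these expressions read. RRDtool's CDEF supports many more operators, and `eval_cdef` is a hypothetical name.

```python
import operator

# Only the four arithmetic operators; RRDtool's RPN has many more.
_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
}


def eval_cdef(expr, variables):
    """Evaluate a simple RPN expression such as "val1,60,*"."""
    stack = []
    for token in expr.split(","):
        if token in _OPS:
            right = stack.pop()  # last value pushed is the right operand
            left = stack.pop()
            stack.append(_OPS[token](left, right))
        elif token in variables:
            stack.append(float(variables[token]))
        else:
            stack.append(float(token))
    if len(stack) != 1:
        raise ValueError(f"malformed RPN expression: {expr!r}")
    return stack[0]


# A DEF average of 0.25 matches per second becomes 15 per minute.
print(eval_cdef("val1,60,*", {"val1": 0.25}))  # 15.0
```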
Using the "Collecting SNMP data from tables with arbitrary indexes" method
I added the following to the reports line -
juniper-sbr.logmatch, juniper-sbr.logmatchint, \
and the two reports -
report.juniper-sbr.logmatch.name=SBR log matches per minute
report.juniper-sbr.logmatch.columns=logMatchGlobCnt
report.juniper-sbr.logmatch.type=logMatchIndex
report.juniper-sbr.logmatch.command=--title="Matches per minute" \
DEF:val1={rrd1}:logMatchGlobCnt:AVERAGE \
CDEF:valpermin=val1,60,* \
LINE2:valpermin#0000ff:"Rate " \
GPRINT:valpermin:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:valpermin:MIN:"Min \\: %8.2lf %s" \
GPRINT:valpermin:MAX:"Max \\: %8.2lf %s\\n"
report.juniper-sbr.logmatchint.name=SBR total log matches
report.juniper-sbr.logmatchint.columns=logMatchGlobInt
report.juniper-sbr.logmatchint.type=logMatchIndex
report.juniper-sbr.logmatchint.command=--title="Total Match counts" \
--units-exponent 0 \
DEF:val1={rrd1}:logMatchGlobInt:AVERAGE \
LINE2:val1#0000ff:"Counts " \
GPRINT:val1:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:val1:MIN:"Min \\: %8.2lf %s" \
GPRINT:val1:MAX:"Max \\: %8.2lf %s\\n"
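The two reports treat their data differently. logMatchGlobCnt is a Counter32, which (as I understand OpenNMS's defaults) is stored in an RRD COUNTER data source. RRDtool keeps the per-second rate between polls, and the CDEF then scales it to per minute. logMatchGlobInt is an Integer, stored as a GAUGE, so the logmatchint graph shows the running total itself. The sketch below computes both views from the same hypothetical polls. Like RRDtool, it treats a drop in a counter as a 32-bit wrap. An agent restart that resets the count would therefore likely show up as a one-off spike on the rate graph. The function names are illustrative only.

```python
COUNTER32_MODULUS = 2**32


def counter_rate_per_minute(prev, curr, seconds):
    """Per-minute rate between two Counter32 polls taken `seconds` apart."""
    delta = curr - prev
    if delta < 0:
        # Treat a decrease as a 32-bit wrap, as RRDtool's COUNTER type does.
        delta += COUNTER32_MODULUS
    return delta * 60 / seconds


def both_views(polls):
    """polls: [(unix_time, value), ...] for one logmatch row.

    Returns (per-minute rates, as in the logmatch report,
             raw values, as in the logmatchint report).
    """
    rates = [
        counter_rate_per_minute(v0, v1, t1 - t0)
        for (t0, v0), (t1, v1) in zip(polls, polls[1:])
    ]
    totals = [value for _, value in polls]
    return rates, totals


# Hypothetical 5-minute polls of one row.
polls = [(0, 100), (300, 130), (600, 190)]
print(both_views(polls))  # ([6.0, 12.0], [100, 130, 190])
```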
I converted the rate to per minute to make it easier to understand. The other report plots the raw match count; I'm not sure it will ever be as useful as the rate. Here is a sample of the juniper-sbr.logmatch graph -
Here is another example. In this case the high number of failures let me diagnose the problem, and you can also see the recovery -
With the arbitrary index, each logmatch row is a separate resource, so each graph carries a single line. With static OIDs in data collection (the standard instance ID method above), I can put multiple lines on one graph, for example failure and success on the same image.
thresholds.xml
I'm not sure how to approach thresholds yet. With the arbitrary index, it seems I would need different values for different rows of the table, since a success count and a failure count need very different limits. I think I will need to switch to static OIDs so that I can set a threshold specifically on the value I want to monitor.
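The difficulty is that one limit rarely suits every row. The per-row idea can be sketched with a hypothetical table of limits keyed by logmatch name; the numbers are placeholders, not recommendations. This is not OpenNMS's thresholding code, only an illustration of what per-value thresholds would need to express. In OpenNMS itself they would be defined in thresholds.xml.

```python
# Hypothetical per-row limits in matches per minute; placeholder numbers only.
HIGH_PER_MINUTE = {
    "radunwilling": 1.0,
    "radldapfail": 5.0,
    "radtlsfail": 5.0,
}


def breaches(rates_per_minute, limits=HIGH_PER_MINUTE):
    """Names whose current rate exceeds their own limit, sorted.

    Rows without a limit (such as the success rows) are never flagged.
    """
    return sorted(
        name
        for name, rate in rates_per_minute.items()
        if name in limits and rate > limits[name]
    )


current = {
    "radldapsuccess": 400.0,  # high, but normal for a success row
    "radldapfail": 12.0,
    "radtlsfail": 0.5,
    "radunwilling": 0.0,
}
print(breaches(current))  # ['radldapfail']
```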