NetSNMP logmatch usage

From OpenNMS


Using NetSNMP logmatch

Background

My company had been having problems with RADIUS authentications failing. Like a typical OpenNMS user, I am a data collection freak, so I decided to find a way to quantify how often this was happening and perhaps find a pattern in it. We use Juniper Networks Steel-Belted RADIUS (SBR, formerly Funk's).


NetSNMP

The RADIUS server runs SuSE Linux with the NetSNMP agent installed. All of the SBR logs are plain text files, and every night at midnight a new file is started with a name based on the date. A small cron job keeps a symlink named current.log pointing at the current file, so the agent always reads the active log. I added the following to the bottom of my snmpd.conf -

logmatch radunwilling /var/log/sbr/current.log 600 DSA is unwilling to perform
logmatch radldapsuccess /var/log/sbr/current.log 600 LDAPAUTH: Authentication attempt.*Success
logmatch radldapfail /var/log/sbr/current.log 600 LDAPAUTH: Authentication attempt.*Failure
logmatch radtlssuccess /var/log/sbr/current.log 600 EAP-TLS authentication succeeded
logmatch radtlsfail /var/log/sbr/current.log 600 EAP-TLS authentication failed
sysObjectID .1.3.6.1.4.1.1411
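Before reloading snmpd it can help to check which log lines each pattern will actually count. The Python below is a minimal sketch under stated assumptions: it uses Python's re module instead of the POSIX regex functions the agent uses (equivalent for these simple patterns), it assumes logmatch looks for the pattern anywhere in a line and counts each line at most once per entry, and the sample log lines are made up, not real SBR output.

```python
import re

# Name -> regex, copied from the logmatch lines above.
LOGMATCH_PATTERNS = {
    "radunwilling": r"DSA is unwilling to perform",
    "radldapsuccess": r"LDAPAUTH: Authentication attempt.*Success",
    "radldapfail": r"LDAPAUTH: Authentication attempt.*Failure",
    "radtlssuccess": r"EAP-TLS authentication succeeded",
    "radtlsfail": r"EAP-TLS authentication failed",
}

# Hypothetical sample lines; real SBR log lines will look different.
SAMPLE_LINES = [
    "09:00:01 LDAPAUTH: Authentication attempt for user alice: Success",
    "09:00:05 LDAPAUTH: Authentication attempt for user bob: Failure",
    "09:00:09 EAP-TLS authentication failed",
    "09:00:12 LDAP error: DSA is unwilling to perform",
]


def count_matches(lines):
    """Count, per logmatch name, how many lines match its pattern.

    re.search (not re.match) lets the pattern match anywhere in the line,
    and each line adds at most 1 to each entry.
    """
    compiled = {name: re.compile(rx) for name, rx in LOGMATCH_PATTERNS.items()}
    counts = {name: 0 for name in compiled}
    for line in lines:
        for name, rx in compiled.items():
            if rx.search(line):
                counts[name] += 1
    return counts


if __name__ == "__main__":
    for name, n in count_matches(SAMPLE_LINES).items():
        print(f"{name}: {n}")
```

Running it against a copy of a real log file (read with open(...).readlines()) gives a rough idea of what the counters should show.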

Some of the logmatch counters are reset every time they are read, so I set the cycletime to 600 seconds (twice the default five-minute OpenNMS collection interval) in case a poll is missed. I also set a custom sysObjectID so that I can give this host its own systemDef in datacollection-config.xml.
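The difference between the counter kinds can be illustrated with a tiny model. This is a hypothetical, simplified sketch, not Net-SNMP's implementation, and the class and method names are made up: a global counter only ever grows, while a reset-on-read counter returns the matches since the previous read and starts again from zero. It shows one consequence of reset-on-read: any other SNMP client reading the value (for example a manual snmpwalk) takes those matches away from the next OpenNMS poll, whereas the global counter collected below (.1.3.6.1.4.1.2021.16.2.1.5) is unaffected.

```python
from dataclasses import dataclass


@dataclass
class LogMatchCounters:
    """Hypothetical, simplified model of one logmatch entry's counters."""

    global_count: int = 0  # only ever grows, like the global counter
    since_read: int = 0    # cleared on every read, like the reset-on-read counters

    def record(self, matches: int) -> None:
        """The agent found `matches` new matching lines in the log."""
        self.global_count += matches
        self.since_read += matches

    def read_since_last(self) -> int:
        """Return the matches since the previous read, then reset to zero."""
        value, self.since_read = self.since_read, 0
        return value


if __name__ == "__main__":
    entry = LogMatchCounters()
    entry.record(3)
    print(entry.read_since_last())  # OpenNMS poll: 3
    entry.record(2)
    print(entry.read_since_last())  # manual snmpwalk takes these 2
    entry.record(4)
    print(entry.read_since_last())  # next OpenNMS poll sees only 4
    print(entry.global_count)       # global counter still has all 9
```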

datacollection-config.xml

Using the standard instance ID method

      <group name="juniper-sbr" ifType="ignore">
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="1" alias="DSAunwilling" type="Counter32" />
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="2" alias="LDAPsuccess" type="Counter32" />
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="3" alias="LDAPfailure" type="Counter32" />
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="4" alias="TLSsuccess" type="Counter32" />
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="5" alias="TLSfailure" type="Counter32" />
      </group>
      <systemDef name="Juniper SBR">
        <sysoidMask>.1.3.6.1.4.1.1411</sysoidMask>
        <collect>
          <includeGroup>mib2-host-resources-system</includeGroup>
          <includeGroup>mib2-host-resources-memory</includeGroup>
          <includeGroup>net-snmp-disk</includeGroup>
          <includeGroup>ucd-loadavg</includeGroup>
          <includeGroup>ucd-memory</includeGroup>
          <includeGroup>ucd-sysstat</includeGroup>
          <includeGroup>ucd-diskio</includeGroup>
          <includeGroup>juniper-sbr</includeGroup>
        </collect>
      </systemDef>
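The standard method relies on two things: the agent numbering logmatch entries from 1 in the order they appear in snmpd.conf (which is what instance="1" to instance="5" above assume), and the systemDef's sysoidMask matching the custom sysObjectID. The sketch below spells out both, assuming the mask is compared as a plain string prefix of the sysObjectID; the helper names are hypothetical.

```python
# Column OID used by every mibObj in the juniper-sbr group above.
LOGMATCH_GLOBAL_COUNTER = ".1.3.6.1.4.1.2021.16.2.1.5"

# logmatch names in snmpd.conf order; entry N gets instance N.
LOGMATCH_ORDER = ["radunwilling", "radldapsuccess", "radldapfail",
                  "radtlssuccess", "radtlsfail"]


def instance_oids(column_oid, names):
    """Map each logmatch name to the full OID polled for it."""
    return {name: f"{column_oid}.{i}" for i, name in enumerate(names, start=1)}


def sysoid_matches(sysoid, mask):
    """Assumed sysoidMask behaviour: a plain prefix comparison."""
    return sysoid.startswith(mask)


if __name__ == "__main__":
    for name, oid in instance_oids(LOGMATCH_GLOBAL_COUNTER, LOGMATCH_ORDER).items():
        print(name, oid)
    print(sysoid_matches(".1.3.6.1.4.1.1411", ".1.3.6.1.4.1.1411"))
```

If the mask really is a prefix check, a mask without a trailing dot also matches other enterprises such as .1.3.6.1.4.1.14110; that is harmless here, but adding or removing a logmatch line in the middle of snmpd.conf shifts every later instance number and needs a matching change in the group.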


Using the Collecting_SNMP_data_from_tables_with_arbitrary_indexes method

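Here are the bits I added to my datacollection-config.xml for this method. They replace the group and systemDef above, since both versions use the same names -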

    <resourceType name="logMatchIndex" label="NetSnmp Log Match"
                  resourceLabel="${logMatchName} RegEx - ${logMatchRegEx}">
      <persistenceSelectorStrategy class="org.opennms.netmgt.collectd.PersistAllSelectorStrategy"/>
      <storageStrategy class="org.opennms.netmgt.dao.support.IndexStorageStrategy"/>
    </resourceType>
      <group name="juniper-sbr" ifType="all">
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.2" instance="logMatchIndex" alias="logMatchName" type="string" />
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.4" instance="logMatchIndex" alias="logMatchRegEx" type="string" />
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.5" instance="logMatchIndex" alias="logMatchGlobCnt" type="Counter32" />
        <mibObj oid=".1.3.6.1.4.1.2021.16.2.1.6" instance="logMatchIndex" alias="logMatchGlobInt" type="Integer" />
      </group>
      <systemDef name="Juniper SBR">
        <sysoidMask>.1.3.6.1.4.1.1411</sysoidMask>
        <collect>
          <includeGroup>mib2-host-resources-system</includeGroup>
          <includeGroup>mib2-host-resources-memory</includeGroup>
          <includeGroup>net-snmp-disk</includeGroup>
          <includeGroup>ucd-loadavg</includeGroup>
          <includeGroup>ucd-memory</includeGroup>
          <includeGroup>ucd-sysstat</includeGroup>
          <includeGroup>ucd-diskio</includeGroup>
          <includeGroup>juniper-sbr</includeGroup>
        </collect>
      </systemDef>
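The resourceLabel is filled in per index entry from the two string attributes (logMatchName and logMatchRegEx) collected in the same group. Python's string.Template happens to use the same ${name} placeholder syntax, so the sketch below shows the label to expect for one entry; it is only an illustration, OpenNMS does its own substitution, and the function name is hypothetical.

```python
from string import Template

# resourceLabel copied from the resourceType definition above.
RESOURCE_LABEL = "${logMatchName} RegEx - ${logMatchRegEx}"


def resource_label(string_attrs):
    """Expand the label; unknown placeholders are left untouched."""
    return Template(RESOURCE_LABEL).safe_substitute(string_attrs)


if __name__ == "__main__":
    print(resource_label({
        "logMatchName": "radtlsfail",
        "logMatchRegEx": "EAP-TLS authentication failed",
    }))
```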

snmp-graph.properties

Using the standard instance ID method

I added the two report names to the reports line -

juniper-sbr.ldap, juniper-sbr.tls, \

and the two reports -

report.juniper-sbr.ldap.name=SBR LDAP per minute
report.juniper-sbr.ldap.columns=LDAPsuccess, LDAPfailure
report.juniper-sbr.ldap.type=nodeSnmp
report.juniper-sbr.ldap.command=--title="LDAP per minute" \
DEF:val1={rrd1}:LDAPsuccess:AVERAGE \
DEF:val2={rrd2}:LDAPfailure:AVERAGE \
CDEF:successpermin=val1,60,* \
CDEF:failpermin=val2,60,* \
LINE2:successpermin#0000Ff:"Success " \
GPRINT:successpermin:AVERAGE:" Avg  \\: %8.2lf %s" \
GPRINT:successpermin:MIN:"Min  \\: %8.2lf %s" \
GPRINT:successpermin:MAX:"Max  \\: %8.2lf %s\\n" \
LINE2:failpermin#ff0000:"Failure " \
GPRINT:failpermin:AVERAGE:" Avg  \\: %8.2lf %s" \
GPRINT:failpermin:MIN:"Min  \\: %8.2lf %s" \
GPRINT:failpermin:MAX:"Max  \\: %8.2lf %s\\n"

report.juniper-sbr.tls.name=SBR TLS per minute
report.juniper-sbr.tls.columns=TLSsuccess, TLSfailure
report.juniper-sbr.tls.type=nodeSnmp
report.juniper-sbr.tls.command=--title="TLS per minute" \
DEF:val1={rrd1}:TLSsuccess:AVERAGE \
DEF:val2={rrd2}:TLSfailure:AVERAGE \
CDEF:successpermin=val1,60,* \
CDEF:failpermin=val2,60,* \
LINE2:successpermin#0000Ff:"Success " \
GPRINT:successpermin:AVERAGE:" Avg  \\: %8.2lf %s" \
GPRINT:successpermin:MIN:"Min  \\: %8.2lf %s" \
GPRINT:successpermin:MAX:"Max  \\: %8.2lf %s\\n" \
LINE2:failpermin#ff0000:"Failure " \
GPRINT:failpermin:AVERAGE:" Avg  \\: %8.2lf %s" \
GPRINT:failpermin:MIN:"Min  \\: %8.2lf %s" \
GPRINT:failpermin:MAX:"Max  \\: %8.2lf %s\\n"
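The CDEF lines are RRDtool RPN: CDEF:successpermin=val1,60,* pushes val1, pushes 60 and multiplies them. Because a Counter32 data source stores a per-second rate, the result is per minute. The toy evaluator below handles only + - * / and is a sketch for reading these expressions, nowhere near RRDtool's full RPN language; the function name is hypothetical.

```python
import operator

# The four arithmetic operators this toy evaluator understands.
OPS = {"+": operator.add, "-": operator.sub,
       "*": operator.mul, "/": operator.truediv}


def eval_cdef(expr, variables):
    """Evaluate a simple RPN expression such as 'val1,60,*'."""
    stack = []
    for token in expr.split(","):
        if token in OPS:
            if len(stack) < 2:
                raise ValueError(f"not enough operands for {token!r}")
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[token](left, right))
        elif token in variables:
            stack.append(variables[token])
        else:
            stack.append(float(token))  # numeric constant
    if len(stack) != 1:
        raise ValueError(f"malformed RPN expression: {expr!r}")
    return stack[0]


if __name__ == "__main__":
    # 0.25 successes per second is 15 per minute.
    print(eval_cdef("val1,60,*", {"val1": 0.25}))
```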

Using the Collecting_SNMP_data_from_tables_with_arbitrary_indexes method

I added the two report names to the reports line -

juniper-sbr.logmatch, juniper-sbr.logmatchint, \

and the two reports -

report.juniper-sbr.logmatch.name=SBR log matches per minute
report.juniper-sbr.logmatch.columns=logMatchGlobCnt
report.juniper-sbr.logmatch.type=logMatchIndex
report.juniper-sbr.logmatch.command=--title="Matches per minute" \
DEF:val1={rrd1}:logMatchGlobCnt:AVERAGE \
CDEF:valpermin=val1,60,* \
LINE2:valpermin#0000ff:"Rate " \
GPRINT:valpermin:AVERAGE:" Avg  \\: %8.2lf %s" \
GPRINT:valpermin:MIN:"Min  \\: %8.2lf %s" \
GPRINT:valpermin:MAX:"Max  \\: %8.2lf %s\\n"

report.juniper-sbr.logmatchint.name=SBR total log matches
report.juniper-sbr.logmatchint.columns=logMatchGlobInt
report.juniper-sbr.logmatchint.type=logMatchIndex
report.juniper-sbr.logmatchint.command=--title="Total Match counts" \
 --units-exponent 0  \
DEF:val1={rrd1}:logMatchGlobInt:AVERAGE \
LINE2:val1#0000ff:"Counts " \
GPRINT:val1:AVERAGE:" Avg  \\: %8.2lf %s" \
GPRINT:val1:MIN:"Min  \\: %8.2lf %s" \
GPRINT:val1:MAX:"Max  \\: %8.2lf %s\\n"

I changed the rate to per minute to make it easier to understand: the Counter32 is stored as a per-second rate, so the CDEF multiplies it by 60. The other report is a straight count; I'm not sure it will ever be as useful as the rate. Here is a sample of the juniper-sbr.logmatch graph - Image:Logmatch.png
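The per-minute figure can be checked by hand from two raw readings of logMatchGlobCnt. The sketch below is a simplification of what RRDtool does for a COUNTER data source (it ignores RRDtool's step normalisation and 64-bit fallback): take the difference between two samples, add 2**32 if the Counter32 wrapped, divide by the seconds between samples, then multiply by 60 as the CDEF does. The function name is hypothetical.

```python
COUNTER32_MODULUS = 2 ** 32


def per_minute(prev, curr, seconds):
    """Per-minute rate between two Counter32 samples taken `seconds` apart."""
    delta = curr - prev
    if delta < 0:
        # The 32-bit counter wrapped past 2**32 - 1 between the samples.
        delta += COUNTER32_MODULUS
    return delta * 60 / seconds


if __name__ == "__main__":
    # 30 new matches over a 5-minute poll is 6 per minute.
    print(per_minute(100, 130, 300))
```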

Here is another example. In this case the high number of failures let me diagnose the problem, and you can also see the recovery - Image:Tlsfail.png

If I move away from the arbitrary index and use static OIDs in data collection (the standard instance ID method), I can put multiple lines on one graph, for example failure and success on the same image.

thresholds.xml

I'm not sure how to approach this yet. With the arbitrary index, it seems I might need different threshold values for different index entries (one per logmatch pattern). I think I will need to change to static OIDs, so that I can set a threshold on exactly the value I want to monitor.
