User talk:Jeffg
From OpenNMS

Useful Thoughts

Possible workaround for Net-SNMP < 5.4.1 amd64 agent bug

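The following patch, against SNMP4J 1.8.2, seems to work around this highly annoying and destructive agent bug. When the system property org.snmp4j.smi.bugCompat.netSnmpIpAddress8Bytes is set to true, an 8-byte IpAddress value is truncated to its first four bytes instead of being rejected.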

diff -ur snmp4j-1.8.2.orig/src/org/snmp4j/smi/IpAddress.java snmp4j-1.8.2/src/org/snmp4j/smi/IpAddress.java
--- snmp4j-1.8.2.orig/src/org/snmp4j/smi/IpAddress.java 2007-03-10 21:59:12.000000000 -0500
+++ snmp4j-1.8.2/src/org/snmp4j/smi/IpAddress.java      2007-08-26 19:07:26.000000000 -0400
@@ -150,9 +150,17 @@
                             type.getValue());
     }
     if (value.length != 4) {
-      throw new IOException("IpAddress encoding error, wrong length: " +
+         if ( (value.length == 8) && "true".equals(System.getProperty("org.snmp4j.smi.bugCompat.netSnmpIpAddress8Bytes")) ) {
+               byte[] tempValue = { 0,0,0,0 };
+               for (int i = 0; i < 4; i++) {
+                 tempValue[i] = value[i];
+               }
+               value = tempValue;
+         } else {
+        throw new IOException("IpAddress encoding error, wrong length: " +
                             value.length);
-    }
+      }
+       }
     inetAddress = InetAddress.getByAddress(value);
   }
 

After patching and building SNMP4J, drop the resulting .../dist/lib/SNMP4J.jar into OPENNMS_HOME/lib in place of the existing snmp4j-x.y.z.jar. Then add the following line to OPENNMS_HOME/etc/opennms.properties:

org.snmp4j.smi.bugCompat.netSnmpIpAddress8Bytes=true

I'd like several people to test this patch before I submit it upstream to the SNMP4J maintainers for possible inclusion in a future version of their official distribution.
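For anyone who wants to try the truncation logic without rebuilding SNMP4J, here is a minimal standalone sketch. The class name IpAddressCompat and the method normalize are made up for illustration and are not part of SNMP4J. The sample bytes are hypothetical, and the sketch throws IllegalArgumentException where the patch throws IOException.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

// Hypothetical helper, not part of SNMP4J: mirrors the length check in the patch.
public class IpAddressCompat {
    // Property name taken from the patch.
    static final String COMPAT_PROPERTY = "org.snmp4j.smi.bugCompat.netSnmpIpAddress8Bytes";

    /**
     * Returns a 4-byte address. An 8-byte value is truncated to its first four
     * bytes only when compat is true; any other length is rejected.
     */
    public static byte[] normalize(byte[] value, boolean compat) {
        if (value.length == 4) {
            return value;
        }
        if (value.length == 8 && compat) {
            byte[] truncated = new byte[4];
            System.arraycopy(value, 0, truncated, 0, 4);
            return truncated;
        }
        throw new IllegalArgumentException("IpAddress encoding error, wrong length: " + value.length);
    }

    public static void main(String[] args) throws UnknownHostException {
        // Boolean.getBoolean is false when the property is unset.
        boolean compat = Boolean.getBoolean(COMPAT_PROPERTY);
        // Hypothetical 8-byte value of the kind a buggy agent might send.
        byte[] fromAgent = { (byte) 192, (byte) 168, 1, 10, 0, 0, 0, 0 };
        try {
            System.out.println(InetAddress.getByAddress(normalize(fromAgent, compat)));
        } catch (IllegalArgumentException e) {
            System.out.println("Rejected: " + e.getMessage());
        }
    }
}
```

Run it with and without -Dorg.snmp4j.smi.bugCompat.netSnmpIpAddress8Bytes=true on the java command line to see both branches.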


How to package OpenNMS 1.3.6 for Debian / Ubuntu

1.3.6 Debian Packaging is meant as a resource for OpenNMS developers. Everyday users should get pre-built DEBs from the APT repo.

Maven Profile Activation Ordering Bug

There is a bug in Maven 2.0.4 (the version we use as of today, 09 June 2007) that causes profiles to be activated in random order. I note this here because I was at the edge of sanity trying to override a profile. This bug is allegedly fixed in Maven 2.0.5, so I'm going to give that a spin.

JDBC Monitoring

Notes on using JDBCMonitor and JDBCStoredProcedureMonitor

Irrational Musings

At present, most of these deal with things that would help nudge OpenNMS towards 100%-Java (or at least 0% JNI) status.

Move ICMP from JNI / libjicmp to InetAddress::isReachable

This idea seemed great at the time, but the new InetAddress.isReachable method in Java 5 / 1.5.0 may lack the customizability to act as a replacement for the existing JNI method. The problems as I see them:

  • While you can specify a timeout in milliseconds using one signature, all (both as of 1.5.0) signatures return a boolean. That means we can get a literal yes/no answer on whether a node is reachable, but not a round-trip time to store. The workaround is to sample the system time before and after the method call:
import java.io.IOException;
import java.net.InetAddress;
import java.net.UnknownHostException;

public class Pingit {
    public static void main(String[] argv) {
        if (argv.length != 1) {
            System.err.println("Usage: java Pingit <host>");
            System.exit(2);
        }

        InetAddress ia = null;
        try {
            ia = InetAddress.getByName(argv[0]);
        } catch (UnknownHostException e) {
            System.err.println("Unknown host: " + e.getMessage());
            System.exit(1);
        }

        // isReachable only reports true/false, so time the call ourselves.
        // nanoTime is not affected by wall-clock adjustments.
        long start = System.nanoTime();
        boolean isUp = false;
        try {
            // Returns false if no reply arrives within the 5000 ms timeout.
            isUp = ia.isReachable(5000);
        } catch (IOException e) {
            // Thrown on a network error, not on a timeout.
            System.err.println("Network error: " + e.getMessage());
            System.exit(1);
        }
        long rtTime = (System.nanoTime() - start) / 1000000L;

        System.out.println("Host " + ia + (isUp ? " is alive (" + rtTime + "ms)" : " unreachable"));
    }
}

Example run:

[jeffg@mahotkat: /tmp]$ sudo java Pingit yahoo.com
Host yahoo.com/216.109.112.135 is alive (24ms)

The resulting numbers would need careful, rigorous testing against equivalent JNI results.

  • There is no way to specify the payload, or even the payload size, as of Java 5 / 1.5.0. The current JNI method uses a set payload of the string "OpenNMS" to achieve a 56-byte datagram size. Whether or not this is critical I do not know. I can imagine some very paranoid organizations requiring ICMP echo datagrams to be exactly some magical size, in which case we would be dead in the water.
  • The implementation of InetAddress.isReachable in Java 5 / 1.5.0 automatically and silently falls back to using TCP echo if it does not have privileges to create ICMP datagrams, i.e. if OpenNMS is not running as a privileged user. While the automatic fallback may be nice in shops where we cannot run as root, the fact that we never hear about it could confound troubleshooting in cases where we accidentally get demoted to a non-privileged user. Other, uglier permissions issues would probably mask this problem, though.

Reimplement the iplike function in PL/PGSQL rather than C

My reading so far indicates this should be possible. I will try to come up with a proof of concept in the near future. I think Matt said that he and Dave had tried to accomplish this a while back, but had lacked the PL/PGSQL expertise to get the performance where it needed to be. I have zero expertise in the language, but maybe my notes will help somebody someday.

Jeff, have a look at:

 http://bugzilla.opennms.org/cgi-bin/bugzilla/show_bug.cgi?id=1504

Christopher Browne spent a lot of time on this, and there may be some PostgreSQL 8.2 changes that will support our quest further. You might want to touch base with him.

Also, he suggested that we start moving to INET values in the DB rather than strings for IPs. We could begin doing this as we did with the ID columns, by implementing triggers that automatically populate the INET column when it is null. Hopefully there are annotations to support this in our Hibernate DAOs.

UPDATE: We now (as of OpenNMS 1.3.6) ship a PL/PGSQL version of IPLIKE, and fall back on it if the C-language version is unavailable.
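For readers who have not used iplike, here is a rough Java sketch of the matching rule as I understand it: each of the four pattern octets is either *, a single value, a low-high range, or a comma-separated list of values and ranges. The class IpLikeSketch is made up for illustration and is not the shipped C or PL/PGSQL code. Input validation is left out, so a malformed octet raises NumberFormatException.

```java
// Hypothetical illustration of iplike-style matching; not the OpenNMS implementation.
public class IpLikeSketch {
    /** Returns true if the dotted-quad address matches the iplike-style pattern. */
    public static boolean matches(String address, String pattern) {
        String[] octets = address.split("\\.");
        String[] rules = pattern.split("\\.");
        if (octets.length != 4 || rules.length != 4) {
            return false;
        }
        for (int i = 0; i < 4; i++) {
            if (!octetMatches(Integer.parseInt(octets[i]), rules[i])) {
                return false;
            }
        }
        return true;
    }

    // One pattern octet is "*", or a comma-separated list of values and low-high ranges.
    private static boolean octetMatches(int octet, String rule) {
        if (rule.equals("*")) {
            return true;
        }
        for (String part : rule.split(",")) {
            int dash = part.indexOf('-');
            if (dash < 0) {
                if (octet == Integer.parseInt(part)) {
                    return true;
                }
            } else {
                int low = Integer.parseInt(part.substring(0, dash));
                int high = Integer.parseInt(part.substring(dash + 1));
                if (low <= octet && octet <= high) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(matches("192.168.1.10", "192.168.0-3.*"));
        System.out.println(matches("192.168.1.10", "192.168.5,7.*"));
    }
}
```

The string splitting here is the kind of per-row work that made performance a concern in the database; storing addresses as INET values would avoid most of it.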

RRD stress-test on kif

I got a new pair of OpenNMS servers, HP DL360 G5s with one dual-core Xeon Woodcrest CPU each. The daemons node, kif, was specced to optimize disk throughput. Here's the physical layout:

  • OS and apps on a RAID1 -- a single two-disk mirror
  • JRB files on a RAID10 -- a single stripe of two two-disk mirrors

I laid out all but the /boot and swap volumes on LVM volumes, for future expandability. The observant reader will realize that this setup maxes out the disk bays in a G5 DL360, but I can add up to two more physical disks by upgrading to a DL380.

The JRB disks are 72GB 15KRPM SAS disks. The RAID is done in hardware by an HP P400i controller with the 512MB battery-backed write cache upgrade. I am trying to determine which filesystem and mount options offer the best performance for an OpenNMS installation on this physical setup.

I ran the RRD stresser with the following invocation against three different filesystem setups on the same physical / logical layout defined above:

java -jar -Xmx1024m  -Dorg.opennms.rrd.queuing.writethreads=10 -Dstresstest.filecount=30000 \
-Dstresstest.modulus=1000 -Dstresstest.threadcount=75 -Dstresstest.maxupdates=100000 \
-Dstresstest.file=/rrd/stress/ opennms-rrd-stresser-1.3.3-SNAPSHOT-jar-with-dependencies.jar

The raw results are available here

I post-processed these results with an ugly shell+Perl one-liner that averages the significantOpsPending and produces a histogram of these values:

gehlbachj@gehlbachj-laptop:~/projects/rrd-stress$ for file in * ; do echo "$file" ; \
 grep 'significantOpsPending' "$file" | perl -e 'my %counts; my $samp = 0; my $tot = 0; while (<STDIN>) {
 if ($_ =~ /significantOpsPending=(\d+),/) { $counts{$1}++; $samp++; $tot += $1; } } print "Average: "
 . ($tot / $samp) . "\nHistogram:\n"; foreach my $count (sort { $a <=> $b} keys %counts) { print
 "\t$count: $counts{$count}\n"; }' ; done

Results:

rrd-stressresults_ext3+data=writeback,noexec,nosuid,nodev,noatime.txt
Average: 2.96
Histogram:
        0: 15
        1: 26
        2: 9
        3: 14
        4: 8
        5: 9
        6: 6
        7: 11
        8: 1
        12: 1
rrd-stressresults_ext3+noexec,nosuid,nodev,noatime.txt
Average: 1.68
Histogram:
        0: 39
        1: 17
        2: 17
        3: 9
        4: 8
        5: 4
        6: 4
        7: 2
rrd-stressresults_xfs+noexec,nosuid,nodev,noatime.txt
Average: 2.16
Histogram:
        0: 28
        1: 20
        2: 11
        3: 16
        4: 9
        5: 8
        6: 7
        8: 1
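
For anyone who would rather not decipher the Perl, the same post-processing can be sketched in Java (8 or later). The class OpsPendingSummary is made up for illustration, and the sample log lines in main are hypothetical. The only thing assumed about the real output is that each relevant line contains significantOpsPending=N followed by a comma, which is what the one-liner's regex expects.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: averages significantOpsPending and builds a histogram.
public class OpsPendingSummary {
    private static final Pattern OPS = Pattern.compile("significantOpsPending=(\\d+),");

    /** Counts how often each significantOpsPending value occurs, sorted numerically. */
    public static TreeMap<Long, Integer> histogram(List<String> lines) {
        TreeMap<Long, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            Matcher m = OPS.matcher(line);
            if (m.find()) {
                counts.merge(Long.parseLong(m.group(1)), 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Mean of the sampled values; NaN when nothing matched, to avoid dividing by zero. */
    public static double average(Map<Long, Integer> counts) {
        long total = 0;
        long samples = 0;
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            total += e.getKey() * e.getValue();
            samples += e.getValue();
        }
        return samples == 0 ? Double.NaN : (double) total / samples;
    }

    public static void main(String[] args) {
        // Hypothetical lines; the real stresser output has more fields.
        List<String> lines = Arrays.asList(
            "significantOpsPending=0, other=1",
            "significantOpsPending=2, other=1",
            "significantOpsPending=2, other=1");
        TreeMap<Long, Integer> counts = histogram(lines);
        System.out.println("Average: " + average(counts));
        System.out.println("Histogram:");
        for (Map.Entry<Long, Integer> e : counts.entrySet()) {
            System.out.println("\t" + e.getKey() + ": " + e.getValue());
        }
    }
}
```

Unlike the one-liner, this returns NaN instead of failing when a file contains no matching lines.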

MIB Studies