HTTP Collector

HTTP Data Collector

The OpenNMS Community has added another new feature to the project, the HTTP Data Collector. This feature gives network managers the ability to collect metrics for performance reporting via the HTTP Protocol. Traditionally, performance data is gathered by an NMS via the SNMP protocol. While OpenNMS provides an extremely high performance SNMP data collector, this new collector gives network managers the flexibility they need for gathering performance data via the Internet's most widely used and flexible protocol without having to immediately deploy SNMP to gather this data.

Availability

This feature is available from the 1.3.2 release.

IT Executive Summary

The HTTP Data Collector can be used to gather performance data via HTTP servers and the HTTP protocol. The OpenNMS administrator configures this collector in much the same way as data collection is configured for the two other supported protocols, SNMP and JMX. Data is collected by sending the configured URL(s) to the HTTP server and parsing the results with regular expressions for one or more data points. Matched groups from the regular expression are written to RRD repositories and presented in the OpenNMS Web application. Once data is gathered and written to an RRD, the other components of OpenNMS (thresholding, reporting, etc.) will function regardless of the protocol used to collect the data.

Using the extensible ServiceCollector Java Interface, a data collector was developed to provide network managers with extreme flexibility for gathering performance metrics.

Detailed Description

Configuring data collection via the HTTP protocol is easier if one is already familiar with configuring data collection in OpenNMS with the SNMP collector and/or the JMX collector. The HTTP collector is configured as a collectable service in a Collectd package and references collections defined in the http-datacollection-config.xml.

Example Collection

A more complex example that monitors Apache is here.

The HTTP Collector works by using the HTTP protocol to retrieve a response from an HTTP server, parsing that response for numeric attributes, and storing them in an RRD repository. The parsing is based on regular expressions containing one or more match groups (capturing groups, in RE speak). For example, providing the data collector the following expression:

  "(?s).*?Document\sCount:\s+([0-9]+).*"

and a URL that returns the following sample text (http://www.opennms.org/test/resources/httpcolltest.html):

  <!DOCTYPE html
   PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
   "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
   <html>
     <head>
       <title>collector test</title>
     </head>
     <body>
       <p>Document Count: 5</p>
     </body>
   </html>

...the collector will find the value "5", which will be stored in an RRD file.

When defining the document count attribute, you must determine the data type for storage in the RRD repository. Unlike SNMP, where the MIB definition file defines the data type (COUNTER, INTEGER, GAUGE, etc.) for you, you will need to make that determination. In this example, if the document count is a measurement of the quantity of documents served over a given period of time, then a GAUGE would be the data type to specify. If the document count returned is a running total of the number of documents served, then COUNTER[32|64] would be the correct data type.
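To make the distinction concrete (hypothetical numbers): if a COUNTER data source reads 1200 at one collection and 1500 at the next collection 300 seconds later, the RRD stores the rate (1500 - 1200) / 300 = 1 document per second, whereas a GAUGE data source simply stores the sampled value 1500.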

In this example, an attribute will be defined for persistence by referencing the RE match group 1. Match group 1 is associated with the RE group "([0-9]+)" defined in the expression (ignore the "(?s)" for now). The number of attributes (match groups) that can be derived from one HTTP response is unbounded. If you have trouble assembling your own regular expressions, then you might want to look at an applet providing a web-based regular expression test for your convenience [1].
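If you would rather test locally, a few lines of Java using java.util.regex (the same flavor of regular expression the collector uses) will confirm what match group 1 captures; the class and variable names below are only an illustration, not part of OpenNMS:

 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 public class DocCountRegexTest {
     public static void main(String[] args) {
         // Sample response body, as served by doccount.html
         String body = "<html><head><title>collector test</title></head>"
                     + "<body><p>Document Count: 5</p></body></html>";
 
         // The same expression configured for the collector
         Pattern p = Pattern.compile("(?s).*?Document\\sCount:\\s+([0-9]+).*");
         Matcher m = p.matcher(body);
 
         if (m.matches()) {
             // Match group 1 is the value the collector would persist to RRD
             System.out.println("documentCount = " + m.group(1)); // prints 5
         }
     }
 }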

Save this file as doccount.html in the root of your web server; you cannot serve it from the OpenNMS location, as accessing it would require authentication.

The OpenNMS Collector Architecture Revisited

There are two main components of the OpenNMS data collection architecture: the collector daemon (Collectd.java) and the service collector interface (ServiceCollector.java) that is implemented by all data collectors. The collector daemon's function is to schedule data collection for OpenNMS entities (currently nodes and interfaces) using the service collectors defined in collection packages. Additionally, in the collector's configuration, service collectors are associated with services. When a service is discovered on an entity and that service is defined in a collector package, it gets scheduled for collection by collectd. The service collector's "collect" method is called by the scheduler and, like magic, you have a simple yet easily extensible architecture.
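The exact contract lives in ServiceCollector.java in your OpenNMS release; the sketch below is only a rough approximation of its shape (names and signatures are simplified for illustration and are not the real API), intended to show what any collector, HTTP or otherwise, has to provide:

 // Simplified illustration of the collector contract, not the exact OpenNMS interface.
 // Collectd schedules entities per package and calls collect() on the configured interval.
 public interface SimpleServiceCollector {
     // Called once when the collector is loaded, with the package parameters
     void initialize(java.util.Map<String, String> parameters);
 
     // Called by the scheduler for each node/interface on the configured interval;
     // the return value indicates whether the collection succeeded
     int collect(Object agent, java.util.Map<String, Object> parameters);
 
     // Called when the collector is unloaded
     void release();
 }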

Defining HTTP Collection Packages

The HTTP collector (HttpCollector.java) is a new service collector that can be defined in the collector's configuration for collecting data via HTTP. This configuration is identical in every aspect to configuring any service collector, however, the functionality is a bit different than the SNMP collector. Because the SNMP agent information is associated with nodes and interfaces in the database, the collector configuration can be generically assigned to the service "SNMP" for data collection via the SNMP protocol. The node's system object ID can be used by the SNMP service collector to associate collection groups with nodes. When using the HTTP collector, collections are assigned per service and not generically to the "HTTP" service. So, a node having an HTTP service defined may or may not support data collection on the service. To properly configure a node for HTTP collection, other service names are defined to associate collection groups with a set of nodes. This is done using Provisiond through the use of either a requisition or a detector (see detector example below).

Here is a Collectd configuration supporting the Document Count example above:

collectd-configuration.xml

  <package name="doc-count">
    <filter>IPADDR IPLIKE *.*.*.*</filter>
    <include-range begin="1.1.1.1" end="254.254.254.254"/>

    <service name="HttpDocCount" interval="300000" user-defined="false" status="on" >
      <parameter key="collection" value="doc-count-1" />
      <parameter key="retry" value="1" />
      <parameter key="timeout" value="2000" />
      <parameter key="url" value="/doccount.html" />
    </service>

  </package>
  <collector service="HttpDocCount" class-name="org.opennms.netmgt.collectd.HttpCollector" />

This defines a collection package that uses the HTTP service collector for any node in the system (notice the filter and IP range) that supports the "HttpDocCount" service. The attributes that can be collected for these nodes via HTTP are defined in the http-datacollection-config.xml file under the "http-collection" element named "doc-count-1". The parameters are passed to the HTTP service collector to control its behavior. The allowed parameters are those shown in the example:

HttpCollector Parameters

  • collection
  • retry
  • timeout
  • http-collection (deprecated, use collection instead)

Defining HTTP Collection Attributes

Data collection attributes are defined in the http-datacollection-config.xml file. It is much the same format as the datacollection-config.xml file in that it contains many of the same elements. Here is the configuration used for the example collection above:

http-datacollection-config.xml

<?xml version="1.0" encoding="UTF-8"?>
<http-datacollection-config  
    xmlns:http-dc="http://xmlns.opennms.org/xsd/config/http-datacollection" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://xmlns.opennms.org/xsd/config/http-datacollection http://www.opennms.org/xsd/config/http-datacollection-config.xsd" 
    rrdRepository="/usr/opennms/share/rrd/snmp/" >
  <http-collection name="doc-count-1">
    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:8928</rra>
      <rra>RRA:AVERAGE:0.5:12:8784</rra>
      <rra>RRA:MIN:0.5:12:8784</rra>
      <rra>RRA:MAX:0.5:12:8784</rra>
    </rrd>
    <uris>
      <uri name="document-counts">
        <url path="/doccount.html"
             user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412" 
             matches="(?s).*?Document\sCount:\s+([0-9]+).*" response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="documentCount" match-group="1" type="gauge32"/>
        </attributes>
      </uri>
    </uris>
  </http-collection>
</http-datacollection-config>

As you can see, the "rrdRepository" attribute and the "<rrd>" element are identical, and there is an "http-collection" element that behaves like the "collection" element in datacollection-config.xml (the collection service defined in a Collectd package points to the collection name in this file). The configuration elements change from here, but the concept of constructing groups of attributes for collection is much the same as with SNMP and JMX. Where SNMP data collection has the "<group>" element and JMX has the "<mbeans>" element, the HTTP collector has the "<uris>" element, where you define a list of URIs (see RFC 2396).

URLs

The "<url>" element can be defined with all the URI/URL components specified in RFC 2396. This allows data collection URLs to be defined with absolute specificity so that the HTTP collector can build a very precise URL (see the schema here: [XSD]):

<url> attributes

name="method"          use="optional"  default="GET"
name="http-version"    use="optional"  default="1.1"
name="user-agent"      use="optional"
name="virtual-host"         use="optional"
name="scheme"          use="optional"  default="http://"
name="user-info"      use="optional"
name="host"            use="optional"  default="${ipaddr}"
name="port"            use="optional"  default="80"
name="path"            use="required"
name="query"            use="optional"
name="fragment"            use="optional"
name="matches"         use="optional"  default="(.*)"
name="response-range"  use="optional"  default="100-399"

(note the missing "fragment" URL component, soon to be fixed)

scheme

So far, only the "http://" scheme has been tested. No restrictions have yet been placed in the XSD, so feel free to try other schemes, e.g. "ftp://".

method

Currently, only the "GET" method is supported.

http-version

Both 1.0 and 1.1 versions are supported.

user-info

This attribute is enabled but not yet tested. Please provide feedback.

host

This is the most interesting attribute in this list. The default value, "${ipaddr}", resolves to the IP address of the interface for which data is being collected. Changing this parameter allows you to request data from an HTTP server on a different node than the node for which data is collected.

  • SPECIAL NOTE: an enhancement will be made that allows "${ipaddr}" to be part of any of the "<url>" element's attributes.

port

A positive integer representing the TCP port of the HTTP listener on the host.

path

This is the relative path of the URL. Until the "fragment" attribute is added, include the fragment in the path.

query

This is the "CGI" parameters and values if any are required as part of this URL. (Note SPECIAL NOTE above)

fragment

The URL fragment identifying an anchor on a page.

The key <url> attribute is "matches". This is how you tell the HTTP Collector which values to extract from the HTTP server's response for storage in an RRD. As you can see, the "<url>" element in the "document-counts" example above used many of these attributes' default values.
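As a concrete illustration (not part of the shipped configuration), here is roughly how those defaults assemble into the request URL for the doc-count example, using java.net.URI; the IP address below is a made-up stand-in for whatever "${ipaddr}" resolves to:

 import java.net.URI;
 import java.net.URISyntaxException;
 
 public class UrlAssemblyExample {
     public static void main(String[] args) throws URISyntaxException {
         // Defaults from the <url> attributes: scheme "http", port 80; host comes from ${ipaddr}
         String ipaddr = "192.168.0.10";   // hypothetical node IP address
         URI uri = new URI("http", null, ipaddr, 80, "/doccount.html", null, null);
         System.out.println(uri);          // http://192.168.0.10:80/doccount.html
     }
 }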

The next XML element in this configuration is the "<attributes>" element and it defines a collection of data points that will be stored in RRDs.

 <attributes>
   <attrib alias="documentCount" match-group="1" type="gauge32" />
 </attributes>

The attributes defined here will be written as node-level data in the RRD repository. If more than one IP address on a node is found to have the same collectable HTTP service defined, only one address will be scheduled for collection.

Defining HTTP Collection Services

Now the tricky part! When a service is provisioned by Provisiond, the "nodeGainedService" event is generated. The Collector listens for this event and checks the event to see if the new service is one defined in its configuration (see collectd-configuration.xml above) for data collection. In this document's example, the service defined for collection is called "HttpDocCount". The tricky part is defining the service in a requisition or with a detector such that Provisiond correctly detects the HTTP service providing the performance data. There are two basic HTTP data collection scenarios:

  • Collecting performance data for the node that gained this service
  1. directly from an HTTP server on the node itself
  2. indirectly from an HTTP server on a different node

Defining the Service

Just as all other services are defined in OpenNMS, the service that identifies HTTP data collection is defined either directly in a requisition or by configuring a service detector. Here are two possible ways to define the service that meet the requirements for this document's example:

  • Using the HttpDetector to discover the service with an HTTP transaction

Httpdoccount-http-plugin.png

  • Using the LoopDetector to force the service

Httpdoccount-loop-plugin.png

Using the HttpDetector configuration, the service will be created if there is an HTTP server listening and responding to the URL provided. The URL in this configuration is not nearly as flexible as the "<url>" element in the HTTP Collector's configuration, but this will be resolved soon.

Using the LoopDetector, the service is defined to be supported on a node having the interface with the IP Address of 209.61.128.9. (See Passive_Status_Keeper#Loop_Plugin_Method)

Both of these configurations will cause the nodeGainedService event for "HttpDocCount" to be published, and the Collector will schedule it for collection.

Putting it all together

Provisiond generates Event:nodeGainedService:HttpDocCount
    |
    |
    ----> Collector schedules collection for HttpDocCount service
              |
              |
              ----> Scheduler calls "collect" method on specified interval
                        |
                        |
                        ----> HttpCollector requests URL/parses attributes/writes performance data


Summary

When configuring your OpenNMS server for HTTP data collection, remember that each "<uri>" element can contain multiple "<url>" elements and each "<url>" element can contain multiple performance attributes. So, one collection could reference multiple URLs and collect many attributes from each (whether it should reference multiple URLs is left for you to decide).

The HTTP Collector is a very new feature with respect to the lifetime of this vintage open source project. Please update this documentation with edits, feedback, and working examples.

--David 11:56, 26 October 2006 (EDT)

Examples

Weather Station

The HTTP collector can be used to gather performance data from devices that are not easily instrumented via protocols like SNMP.

The Davis Vantage Pro weather station does not support SNMP, but it can publish data to a web page. The output could look something like this:

Weatheroutput.png

There is a lot of interesting numeric data that lends itself to data collection, such as temperature, humidity and barometric pressure, so this example will walk through the steps needed to set up collection in OpenNMS.

First, a service must be created to identify this collection. Since this data only exists on one website, it is probably easiest just to provision the service directly in the requisition. First switch on free-form editing of service names at the top of the page:

Requisition-freeform-names.png

Then add the service directly onto the interface of the node where it should appear:

Requisition-httpweatherstation.png

Once this service has been added to the requisition and the requisition synchronized, the 10.1.1.1 interface will have the HttpWeatherStation service added to it.

The next step is to set up the collector. In collectd-configuration.xml the following is added:

    <package name="weather-station">
        <filter>IPADDR != '0.0.0.0'</filter>
        <include-range begin="1.1.1.1" end="254.254.254.254"/>

      <service name="HttpWeatherStation" interval="300000" user-defined="false" status="on" >
        <parameter key="collection" value="weather" />
        <parameter key="retry" value="1" />
        <parameter key="timeout" value="2000" />
      </service>
    </package>

This package will perform collection on any interface with the HttpWeatherStation service, which in this case was added in the requisition to be only on 10.1.1.1. If there were a number of weather stations on the network, this service could be added to multiple IPs. Remember that the service must be mapped to a class once in the collectd configuration file, so at the bottom add:

    <collector service="HttpWeatherStation" class-name="org.opennms.netmgt.collectd.HttpCollector" />

Notice that there is a parameter called "collection" with a value of "weather" defined for this collector. That is where the actual work of the collector is performed, and it is defined in http-datacollection-config.xml.

<http-datacollection-config  
    xmlns:http-dc="http://xmlns.opennms.org/xsd/config/http-datacollection" 
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
    xsi:schemaLocation="http://xmlns.opennms.org/xsd/config/http-datacollection http://www.opennms.org/xsd/config/http-datacollection-config.xsd" 
    rrdRepository="/opt/opennms/share/rrd/snmp/" >
  <http-collection name="weather">
    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:2016</rra>
      <rra>RRA:AVERAGE:0.5:12:1488</rra>
      <rra>RRA:AVERAGE:0.5:288:366</rra>
      <rra>RRA:MAX:0.5:288:366</rra>
      <rra>RRA:MIN:0.5:288:366</rra>
    </rrd>
    <uris>
      <uri name="weather-station">
        <url path="/"
             virtual-host="www.example.com"
             matches="(?s).*?Temperature.*?3366FF&quot;&gt;([0-9\.]+).*?Humidity.*?3366FF&quot;&gt;([0-9]+)
                     .*?Barometer.*?3366FF&quot;&gt;([0-9\.]+).*?Wind\sChill.*?3366FF&quot;&gt;&lt;small&gt;([0-9\.]+)
                     .*?Heat\sIndex.*?3366FF&quot;&gt;&lt;small&gt;([0-9\.]+).*"
             response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="wsTemperature" match-group="1" type="gauge32"/>
          <attrib alias="wsHumidity"    match-group="2" type="gauge32"/>
          <attrib alias="wsBarometer"   match-group="3" type="gauge32"/>
          <attrib alias="wsWindChill"   match-group="4" type="gauge32"/>
          <attrib alias="wsHeatIndex"   match-group="5" type="gauge32"/>
        </attributes>
      </uri>
    </uris>
  </http-collection>
</http-datacollection-config>

This looks pretty complex, but it is very similar to how SNMP data is loaded in datacollection-config.xml. First, the collection name is defined:

  <http-collection name="weather">

followed by the format of the RRDs:

    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:2016</rra>
      <rra>RRA:AVERAGE:0.5:12:1488</rra>
      <rra>RRA:AVERAGE:0.5:288:366</rra>
      <rra>RRA:MAX:0.5:288:366</rra>
      <rra>RRA:MIN:0.5:288:366</rra>
    </rrd>

This will store "as polled" 5 minute values for one week, hourly averages for 62 days, and min/max/average for each day for one year. This is the default for OpenNMS.

The real work is done in the <uri> object. Since the collector is based on the HTTP monitor class, any option that can be used with the monitor can be used with the collector, such as virtual-host, user-agent, etc.

In this case the virtual-host is necessary, as OpenNMS by default polls by IP address and not host name. To see if it is required, go to "http://[IP]" and see if it returns what is expected. If not, chances are a virtual-host name is required. The path value is also required. Since this weather data exists in the root of the web site, the path is set to "/".
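One quick way to verify this is to make the request by IP address while supplying the expected Host header yourself; the sketch below does this with a plain Java socket (the host name and IP address are placeholders, not values from this example's network):

 import java.io.BufferedReader;
 import java.io.InputStreamReader;
 import java.io.PrintWriter;
 import java.net.Socket;
 
 public class VirtualHostCheck {
     public static void main(String[] args) throws Exception {
         String ip = "10.1.1.1";              // hypothetical weather station IP
         String virtualHost = "www.example.com";
 
         try (Socket socket = new Socket(ip, 80)) {
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             // HTTP/1.1 requires a Host header; this is what the virtual-host attribute supplies
             out.print("GET / HTTP/1.1\r\nHost: " + virtualHost + "\r\nConnection: close\r\n\r\n");
             out.flush();
 
             BufferedReader in = new BufferedReader(new InputStreamReader(socket.getInputStream()));
             String line;
             while ((line = in.readLine()) != null) {
                 System.out.println(line);    // inspect the response for the expected content
             }
         }
     }
 }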

The heart of the collection is in the "matches" attribute. In the example above it has been formatted to span more than one line, but in practice it needs to be all on one line to work.

This is where the source of the web page is examined in order to extract the data. If the data isn't in the source, it can't be collected, so such things as Flash-based web pages won't work in this fashion.

For the weather station the source looks like this:

<td bordercolor="#000000" style="border: thin solid rgb(0,0,0)" ><font face="verdana, Arial,
 Helvetica"><small><strong><font color=Brown>Temperature</font></strong><br></small></font></td>
 <td width="150" bordercolor="#000000" style="border: thin solid rgb(0,0,0)" ><font face="verdana, Arial,
 Helvetica"><strong><small><font color="#3366FF">82.0 F</font><br></small></strong></font>
</td></tr>
<tr>
 <td  bordercolor="#000000" style="border: thin solid rgb(0,0,0)" ><font face="verdana, Arial,
 Helvetica"><small><strong><font color=Brown>Humidity</font></strong><br></small></font></td>
 <td bordercolor="#000000" style="border: thin solid rgb(0,0,0)" ><font face="verdana, Arial,
 Helvetica"><strong><small><font color="#3366FF">35%</font><br></small></strong></font></td>
 </tr>

The regex to extract the data works as follows:

The initial "(?s)" is used to treat newlines as spaces, thus the query can span more than one line.

.*?Temperature.*?3366FF&quot;&gt;([0-9\.]+)

This looks for the first instance of the string "Temperature" that is followed at some point later by "3366FF", a quotation mark, and a greater-than sign. Note that the special characters have to be written using XML entities or else the XML will complain.

It then grabs (via the parentheses) all of the following characters that are either numbers or a period. Without the period (escaped with a backslash) only the whole number would be returned (i.e. 82 instead of 82.0). Then this is repeated for the next value requested off of this web page:

.*?Humidity.*?3366FF&quot;&gt;([0-9]+)

Since humidity is only reported in whole numbers, the period was left out.

This is repeated for all of the other variables. Note that the regex must actually match something in the source, or the collection will fail. For some reason the Davis Weatherlink software puts <small> before the value versus before the <font> tag on some variables, so the first attempt at this regex failed.
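For anyone debugging a similar expression, here is a minimal, self-contained Java check of the Temperature and Humidity portion of the regex against an abbreviated copy of the page source above. Note that the XML entities (&quot; and &gt;) become plain characters in the Java string, and backslashes are doubled as usual in Java string literals:

 import java.util.regex.Matcher;
 import java.util.regex.Pattern;
 
 public class WeatherRegexTest {
     public static void main(String[] args) {
         // Abbreviated page source from the weather station (see above)
         String body = "<font color=Brown>Temperature</font></strong><br></small></font></td>\n"
                     + "<td><font color=\"#3366FF\">82.0 F</font><br></td></tr>\n"
                     + "<font color=Brown>Humidity</font></strong><br></small></font></td>\n"
                     + "<td><font color=\"#3366FF\">35%</font><br></td></tr>";
 
         // First two capture groups of the weather regex, with the XML entities un-escaped
         Pattern p = Pattern.compile(
             "(?s).*?Temperature.*?3366FF\">([0-9\\.]+).*?Humidity.*?3366FF\">([0-9]+).*");
         Matcher m = p.matcher(body);
 
         if (m.matches()) {
             System.out.println("wsTemperature = " + m.group(1)); // 82.0
             System.out.println("wsHumidity    = " + m.group(2)); // 35
         }
     }
 }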

Finally, the values matched in the regex have to be associated with data source names:

        <attributes>
          <attrib alias="wsTemperature" match-group="1" type="gauge32"/>
          <attrib alias="wsHumidity"    match-group="2" type="gauge32"/>
          <attrib alias="wsBarometer"   match-group="3" type="gauge32"/>
          <attrib alias="wsWindChill"   match-group="4" type="gauge32"/>
          <attrib alias="wsHeatIndex"   match-group="5" type="gauge32"/>
        </attributes>

Each set of capturing parentheses (the leading "(?s)" is an inline flag, not a capturing group) is used to extract a value from the matching text. Note that all five of these values are retrieved with just one page access. OpenNMS does not need to fetch the page for each one.

So, once these files are modified and OpenNMS is restarted, the files "wsTemperature" etc. should appear in the "/opt/opennms/share/rrd/snmp/[nodeid]" directory for the nodeid that contains 10.1.1.1. If that is working the only thing left is to create the graphs. Modify snmp-graph.properties to include:

report.weather.temperature.name=Weather Station Temperatures
report.weather.temperature.columns=wsTemperature, wsHeatIndex, wsWindChill
report.weather.temperature.type=nodeSnmp
report.weather.temperature.command=--title="Temperature, Wind Chill and Heat Index" \
 --vertical-label "degrees F" \
 --units-exponent 0 \
 DEF:temp={rrd1}:wsTemperature:AVERAGE \
 DEF:heat={rrd2}:wsHeatIndex:AVERAGE \
 DEF:wind={rrd3}:wsWindChill:AVERAGE \
 LINE2:temp#000000:"Temperature  " \
 GPRINT:temp:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:temp:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:temp:MAX:"Max  \\: %8.2lf %s\\n" \
 LINE2:heat#A00000:"Heat Index   " \
 GPRINT:heat:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:heat:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:heat:MAX:"Max  \\: %8.2lf %s\\n" \
 LINE2:wind#0000A0:"Wind Chill   " \
 GPRINT:wind:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:wind:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:wind:MAX:"Max  \\: %8.2lf %s\\n" 

report.weather.humidity.name=Weather Station Humidity
report.weather.humidity.columns=wsHumidity
report.weather.humidity.type=nodeSnmp
report.weather.humidity.command=--title="Humidity" \
 --vertical-label percent \
 --units-exponent 0 \
 DEF:humidity={rrd1}:wsHumidity:AVERAGE \
 LINE2:humidity#0000A0:"Humidity   " \
 GPRINT:humidity:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:humidity:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:humidity:MAX:"Max  \\: %8.2lf %s\\n" 

(be sure to include weather.temperature and weather.humidity in the "reports=" list at the top of the file)

Which will result in the following graphs:

Temperature.png

Humidity.png

Sourceforge Stats

The OpenNMS Mailing Lists are hosted on Sourceforge and they provide a summary of the number of list subscribers and the number of messages posted. It is possible to extract this data using the HTTP Collector.

First, define the collection in http-datacollection-config.xml:

  <http-collection name="mailinglists">
    <rrd step="1800">
      <rra>RRA:AVERAGE:0.5:1:336</rra>
      <rra>RRA:AVERAGE:0.5:48:366</rra>
      <rra>RRA:MIN:0.5:48:366</rra>
      <rra>RRA:MAX:0.5:48:366</rra>
    </rrd>
    <uris>
      <uri name="MailingLists">
        <url path="/mail/?group_id=4141"
             virtual-host="sourceforge.net"
             user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412" 
             matches="(?s).*?opennms-announce\sArchives&lt;/a&gt;\s([0-9]+).*?count:\s([0-9]+).*?opennms-cvs\sArchives&lt;/a&gt;\s([0-9]+).*?count:\s([0-9]+).*?opennms-devel\sArchives&lt;/a&gt;\s([0-9]+).*?count:\s([0-9]+).*?opennms-discuss\sArchives&lt;/a&gt;\s([0-9]+).*?count:\s([0-9]+).*?opennms-install\sArchives&lt;/a&gt;\s([0-9]+).*?count:\s([0-9]+).*" response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="announcecnt"   match-group="1" type="gauge32"/>
          <attrib alias="announcesubs"  match-group="2" type="gauge32"/>
          <attrib alias="cvscnt"        match-group="3" type="gauge32"/>
          <attrib alias="cvssubs"       match-group="4" type="gauge32"/>
          <attrib alias="develcnt"      match-group="5" type="gauge32"/>
          <attrib alias="develsubs"     match-group="6" type="gauge32"/>
          <attrib alias="discusscnt"    match-group="7" type="gauge32"/>
          <attrib alias="discusssubs"   match-group="8" type="gauge32"/>
          <attrib alias="installcnt"    match-group="9" type="gauge32"/>
          <attrib alias="installsubs"   match-group="10" type="gauge32"/>
        </attributes>
      </uri>
    </uris>
  </http-collection>

There are a few things to note here. Sourceforge doesn't update these stats very often. Since activity stats are only updated once a day, these values will most likely only change once a day. However, in initial tests they were updated only once in a week (this could be due to some server migrations going on at the moment). So there is really no need to query this every five minutes. In this example the site is queried once every 30 minutes:

    <rrd step="1800">
      <rra>RRA:AVERAGE:0.5:1:336</rra>
      <rra>RRA:AVERAGE:0.5:48:366</rra>
      <rra>RRA:MIN:0.5:48:366</rra>
      <rra>RRA:MAX:0.5:48:366</rra>
    </rrd>
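A quick sanity check of these RRAs: 336 rows of raw 30-minute samples cover about one week, while the one-day consolidations (48 x 30 minutes) of the average, minimum, and maximum are kept for 366 rows, or roughly one year.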

The "step" value here is 1800 seconds, or 30 minutes. The another thing to note that the regular expression used only grabs data for five of the mailing lists - it could easily be modified to deal with all of them but these were the five most interesting.

Continuing on with the implementation, the next step is to have the collector reference this collection by editing collectd-configuration.xml:

    <package name="mailinglists">
        <filter>IPADDR != '0.0.0.0'</filter>
        <include-range begin="1.1.1.1" end="254.254.254.254"/>
        <service name="MailingLists" interval="1800000"
            user-defined="false" status="on">
            <parameter key="collection" value="mailinglists"/>
            <parameter key="retry" value="1"/>
            <parameter key="timeout" value="7000"/>
        </service>
    </package>

Restart OpenNMS since changes to the files you've just modified still require this as of the 1.12.6 release.

Note that the interval is 1800000 ms, to match the 30-minute polling cycle.

The final step involving file changes is to add the report to snmp-graph.properties:

report.mailinglists.subscribers.name=Sourceforge Mailing List Subscribers
report.mailinglists.subscribers.columns=announcesubs,cvssubs,develsubs,discusssubs,installsubs
report.mailinglists.subscribers.type=nodeSnmp
report.mailinglists.subscribers.command=--title="Number of Subscribers per List" \
DEF:number1={rrd1}:announcesubs:AVERAGE \
LINE2:number1#ff0000:"announce" \
GPRINT:number1:AVERAGE:" Avg \\: %8.2lf " \
GPRINT:number1:MIN:"Min \\: %8.0lf " \
GPRINT:number1:MAX:"Max \\: %8.0lf \\n" \
DEF:number2={rrd2}:cvssubs:AVERAGE \
LINE2:number2#00ff00:"cvs     " \
GPRINT:number2:AVERAGE:" Avg \\: %8.2lf " \
GPRINT:number2:MIN:"Min \\: %8.0lf " \
GPRINT:number2:MAX:"Max \\: %8.0lf \\n" \
DEF:number3={rrd3}:develsubs:AVERAGE \
LINE2:number3#0000ff:"devel   " \
GPRINT:number3:AVERAGE:" Avg \\: %8.2lf " \
GPRINT:number3:MIN:"Min \\: %8.0lf " \
GPRINT:number3:MAX:"Max \\: %8.0lf \\n" \
DEF:number4={rrd4}:discusssubs:AVERAGE \
LINE2:number4#ffff00:"discuss " \
GPRINT:number4:AVERAGE:" Avg \\: %8.2lf " \
GPRINT:number4:MIN:"Min \\: %8.0lf " \
GPRINT:number4:MAX:"Max \\: %8.0lf \\n" \
DEF:number5={rrd5}:installsubs:AVERAGE \
LINE2:number5#00ffff:"install " \
GPRINT:number5:AVERAGE:" Avg \\: %8.2lf " \
GPRINT:number5:MIN:"Min \\: %8.0lf " \
GPRINT:number5:MAX:"Max \\: %8.0lf \\n" 

report.mailinglists.messages.name=Sourceforge Mailing List Message Count
report.mailinglists.messages.columns=announcecnt,cvscnt,develcnt,discusscnt,installcnt
report.mailinglists.messages.type=nodeSnmp
report.mailinglists.messages.command=--title="Number of Messages per List" \
DEF:number1={rrd1}:announcecnt:AVERAGE \
LINE2:number1#ff0000:"announce" \
GPRINT:number1:MIN:"Min \\: %8.0lf " \
GPRINT:number1:MAX:"Max \\: %8.0lf \\n" \
DEF:number2={rrd2}:cvscnt:AVERAGE \
LINE2:number2#00ff00:"cvs     " \
GPRINT:number2:MIN:"Min \\: %8.0lf " \
GPRINT:number2:MAX:"Max \\: %8.0lf \\n" \
DEF:number3={rrd3}:develcnt:AVERAGE \
LINE2:number3#0000ff:"devel   " \
GPRINT:number3:MIN:"Min \\: %8.0lf " \
GPRINT:number3:MAX:"Max \\: %8.0lf \\n" \
DEF:number4={rrd4}:discusscnt:AVERAGE \
LINE2:number4#ffff00:"discuss " \
GPRINT:number4:MIN:"Min \\: %8.0lf " \
GPRINT:number4:MAX:"Max \\: %8.0lf \\n" \
DEF:number5={rrd5}:installcnt:AVERAGE \
LINE2:number5#00ffff:"install " \
GPRINT:number5:MIN:"Min \\: %8.0lf " \
GPRINT:number5:MAX:"Max \\: %8.0lf \\n" 

And remember to add these new reports to the reports= line at the top of the file:

mailinglists.subscribers, mailinglists.messages

At this point the system is configured, but since the new mailing list service has not been associated with a node, nothing will happen.

Since Sourceforge gets real upset when you port scan them, it is best not to use the default discovery in order to associate this service with a node. The easiest thing to do is to use the Requisition Editor.

Since you'll be adding an entry for a service that has not yet been "born" into your instance of OpenNMS, you'll need to enable free-form editing of service names by clicking the link at the top of the requisition editor page:

Requisition-freeform-names.png

Once the current IP address for sourceforge.net is known, it is a simple matter to add a "sourceforge.net" node with this new collection service via the Admin -> Manage Provisioning Requisitions GUI:

MailingListCollector.png

Should the IP address ever change, simply change it via the GUI and re-import. All of the collected data will remain.

If all goes well, it should result in a graph like this:

MailingListOutput.png

Stock Prices

Since NASDAQ.com prints the current stock index values in HTML, it is possible to use the HTTP collector to mine them. The configuration is as follows:

http-datacollection-config.xml:

  <http-collection name="stocks">
    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:8928</rra>
      <rra>RRA:AVERAGE:0.5:12:8784</rra>
      <rra>RRA:MIN:0.5:12:8784</rra>
      <rra>RRA:MAX:0.5:12:8784</rra>
    </rrd>
    <uris>
      <uri name="Stocks">
        <url path="/"
             virtual-host="www.nasdaq.com"
             user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412" 
             matches="(?s).*NASDAQ&lt;/a&gt;&lt;/td&gt;.*?right;&quot;&gt;([0-9\.]+).*DJIA&lt;/a&gt;&lt;/td&gt;.*?right;&quot;&gt;([0-9\.]+).*P 500.*?&lt;/a&gt;&lt;/td&gt;.*?right;&quot;&gt;([0-9\.]+).*" response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="nasdaq" match-group="1" type="gauge32"/>
          <attrib alias="djia" match-group="2" type="gauge32"/>
          <attrib alias="sp500" match-group="3" type="gauge32"/>
        </attributes>
      </uri>
    </uris>
  </http-collection>

If the above regular expression doesn't work, try the following:

   (?s).*storeIndexInfo..NASDAQ.,.([\d.]+).*storeIndexInfo..DJIA.,.([\d.]+).*storeIndexInfo..S.P 500.,.([\d.]+).*

collectd-configuration.xml:

   <package name="stocks">
       <filter>IPADDR != '0.0.0.0'</filter>
       <include-range begin="1.1.1.1" end="254.254.254.254"/>
       <service name="StockTicker" interval="300000"
           user-defined="false" status="on">
           <parameter key="collection" value="stocks"/>
           <parameter key="retry" value="1"/>
           <parameter key="timeout" value="7000"/>
       </service>
   </package>

snmp-graph.properties:

report.stocks.nasdaq.name=NASDAQ Stock Info from NASDAQ.com
report.stocks.nasdaq.columns=nasdaq
report.stocks.nasdaq.type=nodeSnmp
report.stocks.nasdaq.command=--title="NASDAQ Stock Index Value" \
 --height 200 \
 DEF:nasdaq={rrd1}:nasdaq:AVERAGE \
 LINE2:nasdaq#000000:"NASDAQ  " \
 GPRINT:nasdaq:AVERAGE:"Avg  \\: %8.2lf " \
 GPRINT:nasdaq:MIN:"Min  \\: %8.2lf " \
 GPRINT:nasdaq:MAX:"Max  \\: %8.2lf \\n" 

report.stocks.djia.name=DJIA Stock Info from NASDAQ.com
report.stocks.djia.columns=djia
report.stocks.djia.type=nodeSnmp
report.stocks.djia.command=--title="DJIA Stock Index Value" \
 --height 200 \
 DEF:djia={rrd1}:djia:AVERAGE \
 LINE2:djia#A00000:"DJIA    " \
 GPRINT:djia:AVERAGE:"Avg  \\: %8.2lf " \
 GPRINT:djia:MIN:"Min  \\: %8.2lf " \
 GPRINT:djia:MAX:"Max  \\: %8.2lf \\n" 

report.stocks.sp500.name=S&P 500 Stock Info from NASDAQ.com
report.stocks.sp500.columns=sp500
report.stocks.sp500.type=nodeSnmp
report.stocks.sp500.command=--title="S&P 500 Stock Index Value" \
 --height 200 \
 DEF:sp500={rrd1}:sp500:AVERAGE \
 LINE2:sp500#0000A0:"S&P 500 " \
 GPRINT:sp500:AVERAGE:"Avg  \\: %8.2lf " \
 GPRINT:sp500:MIN:"Min  \\: %8.2lf " \
 GPRINT:sp500:MAX:"Max  \\: %8.2lf \\n" 

Note: remember to add them to the reports= line at the top of the file:

reports=stocks.nasdaq, stocks.djia, stocks.sp500, \

Restart OpenNMS. Follow the instructions for the Sourceforge Mailing List example above to add the service to a "NASDAQ node" using the Requisition Editor. Don't forget that you must click the link at the top of the requisition editor page to enable free-form editing of service names:

Requisition-freeform-names.png

This will result in graphs like the following:

StockNasdaq.png

StockDJIA.png

StockSP500.png

eBay Auction

One of the exercises done in the OpenNMS Basic Training Class is to set up the HTTP collector to track the price of an eBay auction. The configuration is as follows:

http-datacollection-config.xml:

  <http-collection name="ebay">
    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:8928</rra>
      <rra>RRA:AVERAGE:0.5:12:8784</rra>
      <rra>RRA:MIN:0.5:12:8784</rra>
      <rra>RRA:MAX:0.5:12:8784</rra>
    </rrd>
    <uris>
      <uri name="ebay">
        <url path="/ws/eBayISAPI.dll?ViewItem&amp;item=120288459936"
             virtual-host="cgi.ebay.com"
             user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412" 
             matches="(?s).*?Current bid.*?US.*?([.0-9]+).*" response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="ebayprice" match-group="1" type="gauge32"/>
        </attributes>
      </uri>
    </uris>
  </http-collection>

Note the "item=" bit in the path. While this references an auction for a Wii Fit, the URL will become invalid shortly after the auction ends. To change the auction to another item, simply replace the "120288459936" with the item number of the new auction.

collectd-configuration.xml:

    <package name="ebay">
        <filter>IPADDR != '0.0.0.0'</filter>
        <include-range begin="1.1.1.1" end="254.254.254.254"/>
        <service name="EbayAuction" interval="300000"
            user-defined="false" status="on">
            <parameter key="collection" value="ebay"/>
            <parameter key="retry" value="1"/>
            <parameter key="timeout" value="7000"/>
        </service>
    </package>

Remember to add the collector line at the bottom:


    <collector service="EbayAuction" class-name="org.opennms.netmgt.collectd.HttpCollector"/>

snmp-graph.properties:

report.class.ebay.name=This is an eBay Auction
report.class.ebay.columns=ebayprice
report.class.ebay.type=nodeSnmp
report.class.ebay.command=--title="The price of an eBay Auction" \
DEF:number={rrd1}:ebayprice:AVERAGE \
LINE2:number#ff0000:"Price" \
GPRINT:number:AVERAGE:" Avg \\: %8.2lf %s" \
GPRINT:number:MIN:"Min \\: %8.2lf %s" \
GPRINT:number:MAX:"Max \\: %8.2lf %s\\n"

Note: remember to add them to the reports= line at the top of the file:

reports=class.ebay, \

Restart OpenNMS. Follow the instructions for the Sourceforge Mailing List example above to add the service to an "eBay node" using the Requisition Editor. This will result in a graph like the following:

EBayWii.png

One can also generate threshold events when someone bids on the auction. Set up something like this:

threshd-configuration.xml:

    <package name="ebay">
        <filter>IPADDR != '0.0.0.0'</filter>
        <include-range begin="1.1.1.1" end="255.255.255.255"/>
        <service name="EbayAuction" interval="300000"
            user-defined="false" status="on">
            <parameter key="thresholding-group" value="ebay"/>
        </service>
    </package>

And

thresholds.xml:

    <group name="ebay" rrdRepository="/opt/opennms/share/rrd/snmp/">
        <threshold type="absoluteChange" ds-type="node" value="0.01"
            rearm="0.0" trigger="1" filterOperator="or" ds-name="ebayprice"/>
    </group>
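Since the auction price only changes when someone bids, an absoluteChange threshold with value="0.01" should raise a threshold event whenever the collected ebayprice value differs from the previous sample, and that event can then be tied to a notification. (The Amazon example below uses the same pattern, with a custom triggeredUEI.)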

Amazon Item

Amazon is known to dynamically change the price of items it stocks (see this article). Here is a way to use OpenNMS to track a particular item.

First, the "Amazon" service needs to be discovered. One way is to use a provisioning requisition along with the loop detector in the appropriate foreign-source definition:

Amazonloop.png

(It's worth mentioning that the loop detector is sometimes also called the LoopDetector)

It is probably best to put this in a group with a minimal foreign source definition. You won't need to detect other services on that IP address and folks like Amazon might get upset if they see a port scan coming from your OpenNMS server.

Once the service is defined, we need to use the HTTP collector to actually get this information:

http-datacollection-config.xml:

  <http-collection name="Amazon">
    <rrd step="900">
      <rra>RRA:AVERAGE:0.5:1:2016</rra>
      <rra>RRA:AVERAGE:0.5:12:1488</rra>
      <rra>RRA:AVERAGE:0.5:288:366</rra>
      <rra>RRA:MAX:0.5:288:366</rra>
      <rra>RRA:MIN:0.5:288:366</rra>
    </rrd>
    <uris>
      <uri name="Amazon">
        <url path="/gp/product/B006U1VGNS" host="www.amazon.com"
             user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412" 
             matches="(?s).*?buyingPrice.*?([0-9\.]+).*" response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="amazon-price" match-group="1" type="gauge"/>
        </attributes>
      </uri>
    </uris>
  </http-collection>

A few things to note. First, the interval is set to 900 seconds, or 15 minutes. The prices are dynamic, but usually not that dynamic, and when monitoring, one should again be considerate of the monitored server. While five-minute polls would most likely not cause a problem, why take that chance?

The string after the /gp/product in the path is the Amazon item number. While looking this item up in a browser would tend to include a lot more information in the URL, this is the minimum needed to access the page.

collectd-configuration.xml:

    <package name="Amazon">
        <filter>IPADDR != '0.0.0.0'</filter>
        <include-range begin="1.1.1.1" end="254.254.254.254"/>
        <include-range begin="::1" end="ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff"/>
        <service name="Amazon" interval="900000" user-defined="false" status="on">
            <parameter key="collection" value="Amazon"/>
        </service>
    </package>

Note the interval and remember to add the collector line at the bottom:


    <collector service="Amazon" class-name="org.opennms.netmgt.collectd.HttpCollector"/>

snmp-graph.properties:

report.amazon.name=Amazon Auction Report
report.amazon.columns=amazon-price
report.amazon.type=nodeSnmp
report.amazon.command=--title="Item Price" \
DEF:price={rrd1}:amazon-price:AVERAGE \
LINE2:price#00A000:"Price" \
GPRINT:price:AVERAGE:" Avg \\: $%5.2lf " \
GPRINT:price:MIN:"Min \\: $%5.2lf " \
GPRINT:price:MAX:"Max \\: $%5.2lf \\n"

Note: remember to add them to the reports= line at the top of the file:

reports=amazon, \


This will result in a graph like the following:

Amazonprice.png

One can also generate threshold events when the price changes. Set up something like this:

threshd-configuration.xml:

    <package name="ebay">
        <filter>IPADDR != '0.0.0.0'</filter>
        <include-range begin="1.1.1.1" end="255.255.255.255"/>
        <service name="Amazon" interval="300000"
            user-defined="false" status="on">
            <parameter key="thresholding-group" value="amazon"/>
        </service>
    </package>

And

thresholds.xml:


    <group name="amazon" rrdRepository="/opt/opennms/share/rrd/snmp/">
        <threshold description="Tracking an Amazon Price"
            type="absoluteChange" ds-type="node" value="0.01"
            rearm="0.0" trigger="1"
            triggeredUEI="www.opennms.org/thresholds/amazonPriceChange"
            filterOperator="or" ds-name="amazon-price"/>
    </group>