Monitoring Apache with the HTTP collector

From OpenNMS

Revision as of 12:05, 31 August 2011 by Depinski (Talk | contribs)


Building on David's excellent work with the HTTP collector, here's a working example that can collect data on idle and busy workers from Apache web servers running mod_status.

Apache configuration

In your Apache configuration file, find the line that matches the one below and uncomment it.

ExtendedStatus On

In the same file, find the entry that looks like the one below. Remove the Deny line and change the Allow line to say all. This lets any source IP address request the status page. You will obviously want to lock this down later, but it is a good place to start while you get up and running.

<Location /server-status>
    SetHandler server-status
    Order deny,allow
    Allow from all
</Location>

You will need to restart the Apache web server for the changes to take effect.

mod_status output

Apache's mod_status produces a "machine readable" output something like this:

Total Accesses: 2750703
Total kBytes: 17770979
CPULoad: .750477
Uptime: 34582
ReqPerSec: 79.5415
BytesPerSec: 526213
BytesPerReq: 6615.58
BusyWorkers: 16
IdleWorkers: 59
Scoreboard: ___R_____W_____W______W__.......................................__K__W_KW_R_________WK___

Some of these fields look useful. Total Accesses and Total kBytes are counters that are incremented throughout the lifetime of the server process. We could collect those and use them to provide request/s and kByte/s throughput data, but the counters are reset to zero every time the server is restarted. If we store these in RRD data sources of type COUNTER, there will be large spikes in the data as rrdtool or jrobin tries to deal with the fact that the counter has gone back to zero. This is probably too much work to deal with. Fortunately the mod_status output does this calculation for us anyway with the ReqPerSec and BytesPerSec fields. By collecting these as type GAUGE, we can avoid the counter reset issue. I'm going to develop a collection for BytesPerSec, BusyWorkers and IdleWorkers. We could also reasonably collect and graph ReqPerSec and BytesPerReq, which are both interesting metrics.
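To make the counter reset problem concrete, here is a minimal Python sketch of how a COUNTER-style rate could be derived from two samples. It is a simplified model, not the actual rrdtool or jrobin code: the function name counter_rate, the 32-bit wrap assumption and the second sample values are made up for illustration.

```python
def counter_rate(previous, current, seconds, bits=32):
    """Per-second rate between two counter samples.

    Simplified model: a decrease is assumed to be a wrap of a
    `bits`-wide counter, so the wrap value is added back in.
    """
    delta = current - previous
    if delta < 0:
        # Wrong guess when the server was simply restarted: the counter
        # did not wrap, it started again from zero.
        delta += 2 ** bits
    return delta / seconds


# Two samples taken 300 seconds apart during normal operation.
normal = counter_rate(2_750_703, 2_774_565, 300)

# The same interval, but Apache was restarted in between and has only
# served 1200 requests since (hypothetical figure).
after_restart = counter_rate(2_750_703, 1_200, 300)

print(f"normal interval: {normal:.2f} req/s")
print(f"after restart:   {after_restart:.2f} req/s")
```

The second figure is several orders of magnitude larger than the first, which is the kind of spike described above. Collecting ReqPerSec and BytesPerSec as gauges sidesteps this, because each sample is stored as-is rather than as a difference from the previous one.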

We can use the HTTP collector's regular expression capabilities to collect these metrics and tuck them away in RRDs for graphing and thresholding.

Configuring service discovery

The first step is to define a service that we can use to identify nodes that support mod_status.

I assumed that any machine that responds with an HTTP 200 response code when asked for /server-status/ on port 80 is running mod_status. Here's my capsd-configuration.xml entry:

   <protocol-plugin protocol="Apache-Stats" class-name="org.opennms.netmgt.capsd.plugins.HttpPlugin"  scan="on" user-defined="false">
       <property key="port" value="80" />
       <property key="timeout" value="3000" />
       <property key="retry" value="2" />
       <property key="url" value="/server-status/?auto" />
   </protocol-plugin>

This defines a new service, "Apache-Stats". The name is not critical, but it does need to be consistent. I did not define a poller monitor for the service. All I needed to do was discover the service so that I could use the name later on in my data collection configuration.

At this point, I bounced OpenNMS and rescanned a node that I knew offered the service to ensure that the service could be discovered. Sure enough the service showed up as "Not Monitored" in the appropriate node view.

Data collection

It is important to be aware that in OPENNMS_1.3.2_RELEASE, HTTP collection operates at a node level. There can therefore only be _one_ instance of an individual HTTP collection per node. As the HTTP collector notes state, if more than one IP address on a node is found to have the same collectable HTTP service defined, only one address will be scheduled for collection.

There are two files to be configured here, collectd-configuration.xml and the new http-datacollection-config.xml file.

collectd-configuration.xml

I defined this service in the default "example1" package, but you could put it in any appropriate package that included the nodes that you wish to collect on:

   <service name="Apache-Stats" interval="300000" user-defined="false" status="on" >
     <parameter key="http-collection" value="Apache-Stats" />
     <parameter key="retry" value="1" />
     <parameter key="timeout" value="2000" />
     <parameter key="url" value="/server-status"/>
     <parameter key="response-text" value="~.*Total.*"/>
     <parameter key="response" value="200-202,299"/>
   </service>

Note that the service name must match the service name in capsd-configuration.xml. The http-collection parameter value is used later on in http-datacollection-config.xml.

Further down the file, outside of the package definitions, I added a service to class mapping for the service:

       <collector service="Apache-Stats" class-name="org.opennms.netmgt.collectd.HttpCollector" />
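Because the names have to line up across files, a quick consistency check can save a restart or two. The following Python sketch is only an illustration: the two fragments are trimmed, namespace-free stand-ins for the real files, the helper names are hypothetical, and it assumes the names are compared exactly (including case).

```python
import xml.etree.ElementTree as ET

# Trimmed stand-in for the relevant part of collectd-configuration.xml.
COLLECTD_FRAGMENT = """
<package name="example1">
  <service name="Apache-Stats" interval="300000" user-defined="false" status="on">
    <parameter key="http-collection" value="Apache-Stats" />
  </service>
</package>
"""

# Trimmed stand-in for http-datacollection-config.xml.
HTTP_DC_FRAGMENT = """
<http-datacollection-config rrdRepository="/opt/opennms/share/rrd/snmp/">
  <http-collection name="Apache-Stats" />
</http-datacollection-config>
"""


def referenced_collections(collectd_xml):
    """Collection names referenced by http-collection parameters."""
    root = ET.fromstring(collectd_xml)
    return {
        param.get("value")
        for param in root.iter("parameter")
        if param.get("key") == "http-collection"
    }


def defined_collections(http_dc_xml):
    """Collection names defined by http-collection elements."""
    root = ET.fromstring(http_dc_xml)
    return {coll.get("name") for coll in root.iter("http-collection")}


def missing_collections(collectd_xml, http_dc_xml):
    """Names that are referenced but never defined."""
    return referenced_collections(collectd_xml) - defined_collections(http_dc_xml)


print(missing_collections(COLLECTD_FRAGMENT, HTTP_DC_FRAGMENT))
```

An empty set means every referenced collection has a definition. The real files declare XML namespaces, so the tag names would need namespace prefixes before this could be pointed at them.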

http-datacollection-config.xml

I'm including the whole http-datacollection-config.xml file here. The important thing to note is that the http-collection name must match the http-collection value in collectd-configuration.xml (in this case Apache-Stats). I also removed the existing "doc-count" collection from http-datacollection-config.xml as there was no corresponding doc-count collection in my collectd-configuration.xml.

<?xml version="1.0" encoding="UTF-8"?>
 <http-datacollection-config
    xmlns:http-dc="http://xmlns.opennms.org/xsd/config/http-datacollection"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xmlns.opennms.org/xsd/config/http-datacollection
      http://www.opennms.org/xsd/config/http-datacollection-config.xsd"
    rrdRepository="/opt/OpenNMS/share/rrd/snmp/" >
  <http-collection name="Apache-Stats">
    <rrd step="300">
      <rra>RRA:AVERAGE:0.5:1:8928</rra>
      <rra>RRA:AVERAGE:0.5:12:8784</rra>
      <rra>RRA:MIN:0.5:12:8784</rra>
      <rra>RRA:MAX:0.5:12:8784</rra>
    </rrd>
    <uris>
      <uri name="apache">
        <url path="/server-status/" query="auto"
             user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412"
             matches="(?s).*BytesPerSec:\s([0-9]+).*BusyWorkers:\s([0-9]+).*IdleWorkers:\s([0-9]+).*" response-range="100-399" >
        </url>
        <attributes>
          <attrib alias="BytesPerSec" match-group="1" type="gauge32"/>
          <attrib alias="BusyWorkers" match-group="2" type="gauge32"/>
          <attrib alias="IdleWorkers" match-group="3" type="gauge32"/>
        </attributes>
      </uri>
    </uris>
   </http-collection>
 </http-datacollection-config>

The donkey work here is done by the uri element.

  • path="/server-status/" query="auto" tells the collector which URL to request (query="auto" simply appends "?auto" to the URL, which makes the mod_status output machine readable).
  • matches="(?s).*BytesPerSec:\s([0-9]+).*BusyWorkers:\s([0-9]+).*IdleWorkers:\s([0-9]+).*" uses capturing groups to extract the numbers following "BytesPerSec:", "BusyWorkers:" and "IdleWorkers:".
  • The attrib elements store the values of match groups 1, 2 and 3 in the RRDs for "BytesPerSec", "BusyWorkers" and "IdleWorkers" respectively, each with type gauge32.

Note that the (?s) at the beginning of the regular expression is a "Mode Modifier". It sets the regular expression pattern to "Dot-matches-all" mode, which allows a dot to match a newline as well as any other character. This is required because, as you can see from the example at the top of the page, the machine-readable output is spread across several lines. After this, I restarted OpenNMS again to see that collection was taking place (and RRDs appearing).
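The effect of the (?s) modifier is easy to try outside OpenNMS. The sketch below uses Python's re module as a stand-in for the Java regular expression engine, which is an assumption made for convenience; for this particular pattern the two should behave the same way. The sample text is the mod_status output from earlier on this page with the scoreboard line shortened.

```python
import re

# mod_status "?auto" output from earlier on this page (scoreboard shortened).
SAMPLE = """Total Accesses: 2750703
Total kBytes: 17770979
CPULoad: .750477
Uptime: 34582
ReqPerSec: 79.5415
BytesPerSec: 526213
BytesPerReq: 6615.58
BusyWorkers: 16
IdleWorkers: 59
Scoreboard: ___R_____W_____W______W__
"""

# The pattern from the uri element, without the leading mode modifier.
BODY = r".*BytesPerSec:\s([0-9]+).*BusyWorkers:\s([0-9]+).*IdleWorkers:\s([0-9]+).*"

# fullmatch requires the whole response to match, hence the .* at both ends.
with_dotall = re.fullmatch("(?s)" + BODY, SAMPLE)
without_dotall = re.fullmatch(BODY, SAMPLE)

print("with (?s):   ", with_dotall.groups() if with_dotall else None)
print("without (?s):", without_dotall.groups() if without_dotall else None)
```

With the modifier, the three groups come back in the order used by the match-group attributes. Without it, the dot cannot cross a line break and the pattern does not match at all.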

Drawing the Graphs

HTTP collector output is at node level, and snmp-graph.properties needs to be configured accordingly. At this point (OpenNMS_1.3.2_RELEASE) the collector can only collect a single instance of an HTTP collection per node. The RRDs are therefore stored at node level within the RRD directory ($OPENNMS_HOME/share/rrd/snmp/<node_id>). This means that the graphs will be shown in the SNMP Node Data -> Node level performance data section of the node's resource graphs page. This also means that the report type needs to be defined as nodeSnmp under the report section of snmp-graph.properties (see below).

Add the (to be defined) report to the list of reports at the top of the file:

apache.workers

Add the report definition to the bottom of the file:

report.apache.workers.name=Apache HTTP Workers
report.apache.workers.columns=BusyWorkers,IdleWorkers
report.apache.workers.type=nodeSnmp
report.apache.workers.command=--title="Apache HTTP workers" \
    --vertical-label workers \
    DEF:BusyWorkers={rrd1}:BusyWorkers:AVERAGE \
    DEF:IdleWorkers={rrd2}:IdleWorkers:AVERAGE \
    LINE2:BusyWorkers#ff0000:"busy workers " \
    GPRINT:BusyWorkers:AVERAGE:"Avg  \\: %8.2lf %s" \
    GPRINT:BusyWorkers:MIN:"Min  \\: %8.2lf %s" \
    GPRINT:BusyWorkers:MAX:"Max  \\: %8.2lf %s\\n" \
    LINE2:IdleWorkers#00ff00:"idle workers " \
    GPRINT:IdleWorkers:AVERAGE:"Avg  \\: %8.2lf %s" \
    GPRINT:IdleWorkers:MIN:"Min  \\: %8.2lf %s" \
    GPRINT:IdleWorkers:MAX:"Max  \\: %8.2lf %s\\n"

Note that drawing the BytesPerSec graph is left as an exercise for the reader.

Wait a while and then enjoy your new graphs:

apache worker statistics

http-datacollection-config.xml Full

I (Okasgtr) modified your original collectd and capsd configuration to stop false collections from servers that respond to a web server discovery but don't have server-status. In addition to the above changes, the configuration below collects and graphs all of the data from the status page, for those looking for a cut and paste solution. On the original wiki page, the values added to the examples above are shown in red.

<?xml version="1.0" encoding="UTF-8"?>
 <http-datacollection-config
    xmlns:http-dc="http://xmlns.opennms.org/xsd/config/http-datacollection"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://xmlns.opennms.org/xsd/config/http-datacollection http://www.opennms.org/xsd/config/http-datacollection-config.xsd"
    rrdRepository="/opt/opennms/share/rrd/snmp/" >
 <http-collection name="Apache-Stats">
   <rrd step="300">
     <rra>RRA:AVERAGE:0.5:1:8928</rra>
     <rra>RRA:AVERAGE:0.5:12:8784</rra>
     <rra>RRA:MIN:0.5:12:8784</rra>
     <rra>RRA:MAX:0.5:12:8784</rra>
   </rrd>
   <uris>
     <uri name="apache">
       <url path="/server-status/" query="auto"
            user-agent="Mozilla/5.0 (Macintosh; U; PPC Mac OS X; en) AppleWebKit/412 (KHTML, like Gecko) Safari/412"
            matches="(?s).*?Total\sAccesses:\s([0-9]+).*?Total\skBytes:\s([0-9]+).*?CPULoad:\s([0-9\.]+).*?Uptime:\s([0-9]+).*?ReqPerSec:\s([0-9\.]+).*?BytesPerSec:\s([0-9\.]+).*?BytesPerReq:\s([0-9\.]+).*?BusyWorkers:\s([0-9]+).*?IdleWorkers:\s([0-9]+).*" response-range="100-399" >
       </url>
       <attributes>
         <attrib alias="TotalAccesses" match-group="1" type="counter32"/>
         <attrib alias="TotalkBytes" match-group="2" type="counter32"/>
         <attrib alias="CPULoad" match-group="3" type="gauge32"/>
         <attrib alias="Uptime" match-group="4" type="gauge32"/>
         <attrib alias="ReqPerSec" match-group="5" type="gauge32"/>
         <attrib alias="BytesPerSec" match-group="6" type="gauge32"/>
         <attrib alias="BytesPerReq" match-group="7" type="gauge32"/>
         <attrib alias="BusyWorkers" match-group="8" type="gauge32"/>
         <attrib alias="IdleWorkers" match-group="9" type="gauge32"/>
       </attributes>
     </uri>
   </uris>
  </http-collection>
 </http-datacollection-config>
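Before restarting OpenNMS with a regular expression this long, it can be worth checking it against a real response. The Python sketch below does that for the sample output from the top of the page. Python's re module stands in for the Java engine, and the parse_status helper and the conversion to float are illustrative assumptions, not a description of how the collector stores values.

```python
import re

# mod_status "?auto" output from earlier on this page (scoreboard shortened).
SAMPLE = """Total Accesses: 2750703
Total kBytes: 17770979
CPULoad: .750477
Uptime: 34582
ReqPerSec: 79.5415
BytesPerSec: 526213
BytesPerReq: 6615.58
BusyWorkers: 16
IdleWorkers: 59
Scoreboard: ___R_____W_____W______W__
"""

# Aliases in match-group order, as in the attrib elements above.
ALIASES = [
    "TotalAccesses", "TotalkBytes", "CPULoad", "Uptime", "ReqPerSec",
    "BytesPerSec", "BytesPerReq", "BusyWorkers", "IdleWorkers",
]

# The matches attribute from the url element, split across lines for readability.
PATTERN = re.compile(
    r"(?s).*?Total\sAccesses:\s([0-9]+).*?Total\skBytes:\s([0-9]+)"
    r".*?CPULoad:\s([0-9\.]+).*?Uptime:\s([0-9]+).*?ReqPerSec:\s([0-9\.]+)"
    r".*?BytesPerSec:\s([0-9\.]+).*?BytesPerReq:\s([0-9\.]+)"
    r".*?BusyWorkers:\s([0-9]+).*?IdleWorkers:\s([0-9]+).*"
)


def parse_status(text):
    """Map each alias to its captured value, or return None if there is no match."""
    match = PATTERN.fullmatch(text)
    if match is None:
        return None
    return {alias: float(value) for alias, value in zip(ALIASES, match.groups())}


values = parse_status(SAMPLE)
for alias in ALIASES:
    print(f"{alias:14} {values[alias]}")
```

A page that is not mod_status output, such as a default "It works!" page, makes parse_status return None, since the regular expression needs every field to be present.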

Drawing the Graphs Full

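Add the reports to the list of reports at the top of snmp-graph.properties:

apache.workers, apache.bytes, apache.uptime, apache.cpu, apache.access, apache.kbytes, apache.byteperreq, apache.reqpersec

Then add the report definitions to the bottom of the file:
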
report.apache.workers.name=Apache HTTP Workers
report.apache.workers.columns=BusyWorkers,IdleWorkers
report.apache.workers.type=nodeSnmp
report.apache.workers.command=--title="Apache HTTP workers" \
 --vertical-label workers \
 DEF:BusyWorkers={rrd1}:BusyWorkers:AVERAGE \
 DEF:IdleWorkers={rrd2}:IdleWorkers:AVERAGE \
 LINE2:BusyWorkers#ff0000:"busy workers " \
 GPRINT:BusyWorkers:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:BusyWorkers:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:BusyWorkers:MAX:"Max  \\: %8.2lf %s\\n" \
 LINE2:IdleWorkers#00ff00:"idle workers " \
 GPRINT:IdleWorkers:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:IdleWorkers:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:IdleWorkers:MAX:"Max  \\: %8.2lf %s\\n"

report.apache.bytes.name=Apache Bytes Per Second
report.apache.bytes.columns=BytesPerSec
report.apache.bytes.type=nodeSnmp
report.apache.bytes.command=--title="Apache HTTP Bytes Per Second" \
 --vertical-label Bytes \
 DEF:BytesPerSec={rrd1}:BytesPerSec:AVERAGE \
 AREA:BytesPerSec#66CCFF: \
 LINE1:BytesPerSec#000000:"Bytes per second " \
 GPRINT:BytesPerSec:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:BytesPerSec:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:BytesPerSec:MAX:"Max  \\: %8.2lf %s\\n"

report.apache.uptime.name=Apache Uptime
report.apache.uptime.columns=Uptime
report.apache.uptime.type=nodeSnmp
report.apache.uptime.command=--title="Apache HTTP Uptime" \
 --vertical-label UpTime \
 --units-exponent 0 \
 DEF:Uptime={rrd1}:Uptime:AVERAGE \
 CDEF:timesec=Uptime,1,* \
 CDEF:timemin=timesec,60,/ \
 CDEF:timehour=timemin,60,/ \
 CDEF:timeday=timehour,24,/ \
 AREA:timehour#CC99FF: \
 LINE1:timehour#000000:"Hours" \
 GPRINT:timehour:MIN:"Min  \\: %8.2lf" \
 GPRINT:timehour:MAX:"Max  \\: %8.2lf\\n" \
 AREA:timeday#33FF00: \
 LINE1:timeday#33FF00:"Days" \
 GPRINT:timeday:MIN:"Min  \\: %8.2lf" \
 GPRINT:timeday:MAX:"Max  \\: %8.2lf\\n"

report.apache.cpu.name=Apache Cpu Load
report.apache.cpu.columns=CPULoad
report.apache.cpu.type=nodeSnmp
report.apache.cpu.command=--title="Apache Cpu Load" \
 --vertical-label Load \
 DEF:CPULoad={rrd1}:CPULoad:AVERAGE \
 AREA:CPULoad#999999: \
 LINE1:CPULoad#000000:"Load" \
 GPRINT:CPULoad:AVERAGE:"Avg  \\: %8.2lf%%" \
 GPRINT:CPULoad:MIN:"Min  \\: %8.2lf%%" \
 GPRINT:CPULoad:MAX:"Max  \\: %8.2lf%%\\n"

report.apache.access.name=Apache Accesses
report.apache.access.columns=TotalAccesses
report.apache.access.type=nodeSnmp
report.apache.access.command=--title="Apache Total Accesses" \
 --vertical-label Number \
 DEF:TotalAccesses={rrd1}:TotalAccesses:AVERAGE \
 AREA:TotalAccesses#FF6600: \
 LINE1:TotalAccesses#000000:"Total Accesses" \
 GPRINT:TotalAccesses:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:TotalAccesses:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:TotalAccesses:MAX:"Max  \\: %8.2lf %s\\n"

report.apache.kbytes.name=Apache Total kBytes
report.apache.kbytes.columns=TotalkBytes
report.apache.kbytes.type=nodeSnmp
report.apache.kbytes.command=--title="Apache Total kBytes" \
 --vertical-label kBytes \
 DEF:TotalkBytes={rrd1}:TotalkBytes:AVERAGE \
 AREA:TotalkBytes#00cc00: \
 LINE1:TotalkBytes#000000:"Total kBytes" \
 GPRINT:TotalkBytes:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:TotalkBytes:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:TotalkBytes:MAX:"Max  \\: %8.2lf %s\\n"

report.apache.byteperreq.name=Apache Bytes Per Request
report.apache.byteperreq.columns=BytesPerReq
report.apache.byteperreq.type=nodeSnmp
report.apache.byteperreq.command=--title="Apache Bytes Per Request" \
 --vertical-label Bytes \
 DEF:BytesPerReq={rrd1}:BytesPerReq:AVERAGE \
 AREA:BytesPerReq#9999CC: \
 LINE1:BytesPerReq#000000:"Bytes Per Request" \
 GPRINT:BytesPerReq:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:BytesPerReq:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:BytesPerReq:MAX:"Max  \\: %8.2lf %s\\n"

report.apache.reqpersec.name=Apache Requests Per Second
report.apache.reqpersec.columns=ReqPerSec
report.apache.reqpersec.type=nodeSnmp
report.apache.reqpersec.command=--title="Apache Requests Per Second" \
 --vertical-label Requests \
 DEF:ReqPerSec={rrd1}:ReqPerSec:AVERAGE \
 AREA:ReqPerSec#009999: \
 LINE1:ReqPerSec#000000:"Requests Per Second" \
 GPRINT:ReqPerSec:AVERAGE:"Avg  \\: %8.2lf %s" \
 GPRINT:ReqPerSec:MIN:"Min  \\: %8.2lf %s" \
 GPRINT:ReqPerSec:MAX:"Max  \\: %8.2lf %s\\n"