- If you are designing a new OpenNMS system, carefully read the hardware considerations below.
- If you already have a running system, you may still find ways to move closer to the design described below.
- Disk I/O and system memory are the areas to look at first.
- Remember that a 64-bit CPU is required in order for a single process to address more than about 2GB of memory, even with a PAE-aware kernel.
- There are some filesystem parameters to tune for the database and the collected data
- System's shared memory pool might need increasing for the database
- If you have 64-bit hardware, be sure to install a 64-bit operating system in order to address more than 4GB of physical memory
- A very important area, as there are many parameters to tune. Newer PostgreSQL releases (8.4 and later) ship with much saner defaults than older ones.
Java virtual machine
- Heap space, permanent generation size, and garbage collection
- A lot of data can be generated here, so carefully plan what you really need:
- data collection
- data storage and consolidation
If at all possible, use a server with a 64-bit CPU as this will enable the CPU to address more than 4GB of physical memory. Remember that even with a PAE-aware kernel / operating system, most 32-bit OSes don't allow a given process to address more than about 2GB of memory.
Probably the biggest performance improvement on systems collecting a lot of RRD data is to move PostgreSQL and Tomcat to a machine separate from the OpenNMS daemons; it makes a huge difference.
On a server with hardware RAID, consider investing in a battery-backed write cache (BBWC). On an HP DL380 G4, the server's I/O wait dropped from an average of 15% to almost nil after adding a 128 MB BBWC. Additionally, ensure the system has ample memory: on a single-processor HP G4 with 4 GB of RAM monitoring about 300 devices with 700 interfaces, the I/O wait time climbed steadily and hogged the processor, making OpenNMS crawl. We resolved this by upgrading to 12 GB of RAM, which brought the wait time back down to 1%.
For a small collection of monitored nodes, moving the RRD data area into a tmpfs / RAM drive may also alleviate the I/O wait caused by all of the writing required by the RRD data. The trade-off is that a server crash or power-down will cause the RRD files to be lost, unless you implement a sync tool to sync the RAM drive to a disk backup.
Because OpenNMS is well-equipped for gathering and recording details regarding network and systems performance and behavior, it tends to be a write-heavy application. If your environment offers a very large number of data points to be managed, it would serve you well to ensure that a large degree of spindle separation exists. In particular and where possible, ensure that:
- OpenNMS SNMP Collection
- OpenNMS Response Time Collection
- OpenNMS (and system) logging
- PostgreSQL Database
- PostgreSQL Writeahead logging
...occur on separate spindles, and in some cases on separate drives or separate devices. Further, in a *nix environment it may behoove you to put the RRDs on their own mounts, so that you have the option of mounting with the noatime option without compromising other aspects of the system configuration.
The defaults for the OpenNMS directories mentioned above are:

/opt/opennms/share/rrd/snmp
/opt/opennms/share/rrd/response
/opt/opennms/logs or /var/log/opennms

but watch out for symbolic links!
The defaults for the PostgreSQL directories mentioned above vary slightly depending on the distribution.
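Once separate mounts exist, the noatime idea above can be sketched in /etc/fstab. The device names and mount points below are placeholders for illustration, not OpenNMS or PostgreSQL defaults:

```
# Example /etc/fstab entries: RRD data and PostgreSQL data on their own
# spindles, mounted noatime (devices and paths are placeholders)
/dev/sdb1   /opt/opennms/share/rrd   xfs   noatime   0 2
/dev/sdc1   /var/lib/pgsql           xfs   noatime   0 2
```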
As a filesystem, the best performance is achieved with XFS. ext2/ext3 have built-in limits on the number of files per directory and cannot be used on larger installations.
Data storage is the critical factor, so the capacity and performance of the storage must match the size of the installation. The best performance is achieved with SANs (Fibre Channel plus NetApp, EMC, or similar). The important point is that the I/O queue is kept on the storage device and not on the OpenNMS server.
Recently good results for smaller systems have been reported with SSD Drives.
To tell whether you have a bottleneck on your disks, a couple of quick checks help. In "top", look at the I/O wait CPU percentage; press "1" to break out the individual cores/CPUs and see whether one of them shows 100% wait. This could be caused by swap or by any of the directories listed above.
The "nmon" program can show more detailed information: you can see which spindles are being used, when, and how much is read versus written.
Memory-backed File Systems
One option, if your server has a lot of RAM, is to modify the OpenNMS startup scripts to maintain a memory-backed file system, combined with automatic backups and restores that handle any internally decided risk levels/SLAs. In Linux, this would be a tmpfs file system.
# XXX Custom code herein for dealing with memory drives
mount | grep -q rrd
if [ $? -ne 0 ]; then
    # RRD location is not present, create it and
    # unpack our data.
    mount -t tmpfs -o size=2G,nr_inodes=200k,mode=0700 tmpfs /opt/opennms/share/rrd
    cd /
    tar xf /mnt/db-backup/opennms-rrd.tar
fi
# XXX End custom code
This modification to /opt/opennms/bin/opennms is matched with a crontab entry that generates the opennms-rrd.tar file periodically.
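A minimal sketch of such a crontab entry, assuming the backup path used in the mount snippet above. The 30-minute interval and the use of a temporary file are choices, not defaults; writing to a temp file first means a crash never leaves a truncated archive:

```
# /etc/crontab: snapshot the in-memory RRD area to disk every 30 minutes
*/30 * * * * root cd / && tar cf /mnt/db-backup/opennms-rrd.tar.tmp opt/opennms/share/rrd && mv /mnt/db-backup/opennms-rrd.tar.tmp /mnt/db-backup/opennms-rrd.tar
```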
In-the-field: On a DL380 G4, with 6 GB of RAM, 2 GB of RAM was allocated to a memory-backed file system. This reduced the disk I/O load (one shared RAID-10 for Postgres, OS and JRBs; with battery-backed cache) from 300 IOPS to 10 IOPS, along with a correlated drop in load average and response times for the OpenNMS UI.
N.B. In Linux, a tmpfs file system will go to swap if memory pressure demands real memory for applications. This can have a very negative effect on the I/O load and system performance.
- Do run a 64-bit kernel so that OpenNMS will be able to address more than 2GB of memory.
- Don't run in a VM.
- Don't put DB or RRD data on file systems managed by LVM.
- Don't put DB or RRD data on file systems on RAID-5.
- Do put OpenNMS logs and RRDs and PostgreSQL data on separate spindles or separate RAID sets. Read details for postgres and RRD below.
- Do run on a modern kernel. Linux 2.6 and later as well as Solaris 10 or newer are good. Stay away from Linux 2.4, in particular.
- Set the noatime mount flag on the filesystems hosting the log, RRD, and PostgreSQL data mentioned above.
- Adapt the system's shared memory to the database; see Performance tuning#PostgreSQL and system's shared memory.
- Solaris 10 systems may require increasing ICMP buffer size if polling large numbers of systems (ndd -set /dev/icmp icmp_max_buf 2097152). Use 'netstat -s -p | grep ICMP' and check the value of 'icmpInOverflows' to determine if you're overflowing the ICMP buffer.
The default shared_buffers parameter in postgresql.conf is extremely conservative; on modern servers it can usually be raised significantly for a big performance boost and a drop in I/O wait time. This change needs to go hand in hand with raising the shmmax kernel parameter. See the Postgres Wiki tuning page and this PostgreSQL performance page for recommendations on this and other postgresql settings.
If you want to put PostgreSQL on a different box, change the database host in opennms-datasources.xml. The PostgreSQL server will also need iplike installed and configured.
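As a sketch, the JDBC URL in opennms-datasources.xml is what points OpenNMS at the database host. The hostname, port, and credentials below are examples only; check your existing file for the exact attribute set your version uses:

```xml
<!-- opennms-datasources.xml: point the "opennms" data source at a
     remote database server (db.example.org is a placeholder) -->
<jdbc-data-source name="opennms"
                  database-name="opennms"
                  class-name="org.postgresql.Driver"
                  url="jdbc:postgresql://db.example.org:5432/opennms"
                  user-name="opennms"
                  password="opennms" />
```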
To clean extra events out of the database, see Event_Configuration_How-To#The_Database.
PostgreSQL 8.1 and later
These changes to postgresql.conf will probably improve your DB performance if you have enough RAM to support them (about 2 GB of installed RAM for a dedicated server; YMMV). You will probably also need to raise the shmmax kernel parameter on your system.
shared_buffers = 20000
work_mem = 16348
maintenance_work_mem = 65536
vacuum_cost_delay = 50
checkpoint_segments = 20
checkpoint_timeout = 900
wal_buffers = 64
stats_start_collector = on
stats_row_level = on
autovacuum = on
I've also set these higher values on *bigger* systems:
wal_buffers = 256
work_mem = 32768
maintenance_work_mem = 524288
On PostgreSQL 8.3 systems the format has changed to allow amounts to be specified as memory sizes instead of numbers of blocks. Here are the equivalents:
shared_buffers = 164MB
work_mem = 16MB
maintenance_work_mem = 64MB
vacuum_cost_delay = 50
checkpoint_segments = 20
checkpoint_timeout = 15min
wal_buffers = 256kB
stats_start_collector = on
stats_row_level = on
autovacuum = on
If you need the bigger values for larger systems here they are:
wal_buffers = 2048kB
work_mem = 32MB
maintenance_work_mem = 512MB
Systems with lots of RAM and PostgreSQL 8.2
Recently, we've found that raising max_fsm_pages and max_fsm_relations ten-fold on systems with plenty of memory (4 GB+) improves performance dramatically.
#max_fsm_pages = 204800        # min max_fsm_relations*16, 6 bytes each
max_fsm_pages = 2048000
#max_fsm_relations = 1000      # min 100, ~70 bytes each
max_fsm_relations = 10000
(Note that the free space map has been reimplemented in PostgreSQL 8.4 and is now self-maintaining, so the max_fsm_* settings above are not necessary if you're running PostgreSQL 8.4.1 or later - note that 8.4.0 is not supported due to a nasty bug.)
As well as really bumping these:
work_mem = 100MB
maintenance_work_mem = 128MB
Note: To make adjustments to shmmax, do the following:
Start postgresql from the command line:
sudo -u postgres pg_ctl -D /var/lib/pgsql/data start
(adjusting paths as necessary) and look at the error message:
FATAL:  could not create shared memory segment: Invalid argument
DETAIL:  Failed system call was shmget(key=5432001, size=170639360, 03600).
HINT:  This error usually means that PostgreSQL's request for a shared memory segment exceeded your kernel's SHMMAX parameter. You can either reduce the request size or reconfigure the kernel with larger SHMMAX. To reduce the request size (currently 170639360 bytes), reduce PostgreSQL's shared_buffers parameter (currently 20000) and/or its max_connections parameter (currently 100).
Notice the value of "size".
Then up the value of shmmax:
sysctl -w kernel.shmmax=170639360
And restart postgresql (using the normal method such as "service postgresql start")
Finally, edit /etc/sysctl.conf and add the line

kernel.shmmax = 170639360

so the setting survives a reboot.
If your OpenNMS system tends to have long response times and shows
- no disk I/O waits
- a lot of CPU idle time
then try increasing your operating system's shared memory (and that of PostgreSQL) as described above. The values given above are absolute minimums. Increasing the system's shared memory may greatly boost OpenNMS performance, as it speeds up communication between OpenNMS and the database. Try different values for the system's shared memory, even 10 times the minimum or more. For further details see the links to the PostgreSQL wiki documentation mentioned above.
PostgreSQL *any* Version
One additional configuration change that yields a tremendous performance improvement is putting the write-ahead logs on a separate spindle (even better, a separate disk controller/channel). The way to do this is:
- shutdown opennms / tomcat
- shutdown postgresql
- cd to $PGDATA
- mv pg_xlog <file system on different spindle>
- ln -s <file system on different spindle>/pg_xlog pg_xlog
- restart postgresql
Make sure postgres data and write-ahead logs do not live on a RAID-5 disk subsystem.
iplike stored procedure
See the iplike documentation to be sure you are running the best version of iplike.
postgres and disk I/O waits
The standard PostgreSQL configuration writes transactions to disk before committing them. If there are I/O problems (wait states), database transactions suffer and high application response times are the result. On test machines, which often run on inappropriate hardware, synchronous writes may be disabled. In case of a system crash this can leave the database inconsistent, requiring a rollback of the transaction log or worse; for test systems this is normally acceptable.
Try the following configuration changes in postgresql.conf on PostgreSQL 8.3 (or newer):
fsync = off
synchronous_commit = on
commit_delay = 1000
find problems due to long-running queries
If there is reasonable suspicion that some queries run for a very long time, edit postgresql.conf and set (PostgreSQL up to 8.3):

log_min_duration_statement = 1000

This will log all queries running for more than 1000 ms to postgresql.log.
After this change, a stop/start of OpenNMS and PostgreSQL is required. Don't forget to remove this setting once debugging is finished.
You will probably find that most of the time, "bad database response time" is caused not by a single long-running query but by thousands of queries each running for a very short time.
optimization for a lot of small queries
If anybody knows how to optimize PostgreSQL / OpenNMS for this, please add it here! There are parameters in $OPENNMS_HOME/etc/c3p0.properties (the database connection pool) which might help here.
Java Virtual Machine (JVM)
The following phenomena in OpenNMS are typical signs of running low on memory in the Java virtual machine:
- long response times
- garbage collection is running very often and takes a lot of time (see below)
- alarms that should have been cleared automatically are still listed as alarms
Tuning heap size
Enable extensive garbage collection logging (see below) and observe the behaviour in output.log. If garbage collections regularly take a long time (0.5 seconds is an empirical threshold) or run very often (more than every 10-20 seconds), the Java heap size should be increased. If GC runs every 10 seconds and takes 9 seconds, the system is stuck...
Parameters for tuning java may be added in $OPENNMS_HOME/etc/opennms.conf. If that file doesn't already exist, check in $OPENNMS_HOME/etc/examples/opennms.conf for a template.
The most important parameter is the Java heap size.
The default value is 256 (MB), which is sufficient only for test setups with one to five managed devices.
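A minimal sketch of the change in $OPENNMS_HOME/etc/opennms.conf. JAVA_HEAP_SIZE is the variable used by the stock start script (value in megabytes); the 2048 below is just an example, not a recommendation:

```
# $OPENNMS_HOME/etc/opennms.conf
# Java heap size in MB (example value -- size it to your installation)
JAVA_HEAP_SIZE=2048
```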
You can roughly test the performance improvement by opening the event list in OpenNMS, adding ?limit=250 to the URL and pressing Return.
There should now be 250 events in your list. Press F5 (the reload-page key, at least in Firefox and IE) and time how long the page takes to refresh. Repeat this several times to get a good mean value. Then stop OpenNMS, change the heap size as described above, restart OpenNMS and wait about 10 minutes to let it settle down after starting. Repeat the measurements, then increase the heap size again as described above. You will end up with a table like:
heap (MB)   refresh time
1536        5-7 sec.
2048        3-4 sec.
3072        1-2 sec.
Watch memory and swap usage on your system (for example using top) and decide which value to keep in the config file.
To speed up the start phase of the Java virtual machine you might want to add a further option, though startup time is rarely a big problem and the parameter sometimes doesn't help at all.
Tuning the maximum Permanent Generation size
If you're seeing messages in your logs containing a mention of:
java.lang.OutOfMemoryError: PermGen space
Then you probably need to allocate more memory to the garbage collector's permanent generation. This section of JVM memory is allocated separately from the heap, and its default maximum size varies according to the platform on which the JVM is running. The OpenNMS 1.8 start script on UNIX and Linux platforms sets the maximum size to 128MB, but you can adjust this value in $OPENNMS_HOME/etc/opennms.conf. For example:
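The snippet below is one hedged example of such a setting: -XX:MaxPermSize is the standard HotSpot flag for this generation, but the 256m value is an assumption to size for your installation, not an OpenNMS default:

```
# $OPENNMS_HOME/etc/opennms.conf
# Raise the maximum permanent generation size (example value)
ADDITIONAL_MANAGER_OPTIONS="-XX:MaxPermSize=256m"
```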
Tuning garbage collection
If you have a system with many cores and hardware threads, like Sun's Niagara CPU, you might run into the limits described by Amdahl's law, see http://en.wikipedia.org/wiki/Amdahl%27s_law. You can try to optimize garbage collection by using a different garbage collector, see http://java.sun.com/docs/hotspot/gc1.4.2/#3.%20Sizing%20the%20Generations|outline. If you add
ADDITIONAL_MANAGER_OPTIONS="-XX:+UseParallelGC \
    -verbose:gc \
    -XX:+PrintGCDetails \
    -XX:+PrintTenuringDistribution \
    -XX:+PrintGCTimeStamps"
you will get detailed timing information about garbage collection in OpenNMS's output.log. The default garbage collector used by OpenNMS is the incremental collector (-Xincgc); others to try are the concurrent mark-sweep collector (-XX:+UseConcMarkSweepGC) and the parallel collector (-XX:+UseParallelGC), which may be the best choice if you have many cores/threads. Once you have settled on a configuration, remove the lines containing -verbose and -XX:+Print... from the options.
Parallel thread library on Solaris systems
It is also useful to use libumem instead of standard IO libraries on Solaris 10. If you want to enable libumem on an existing application, you can use the LD_PRELOAD environment variable (or LD_PRELOAD_64 for 64 bit applications) to interpose the library on the application and cause it to use the malloc() family of functions from libumem instead of libc.
LD_PRELOAD=libumem.so opennms start
LD_PRELOAD_64=libumem.so opennms start
To confirm that you are using libumem, you can use the pldd(1) command to list the dynamic libraries being used by your application. For example:
$ pgrep -l opennms
2239 opennms
$ pldd 2239
2239: opennms
/lib/libumem.so.1
/usr/lib/libc/libc_hwcap2.so.1
By default the daemons log at WARN level and the webapp logs at DEBUG level. This causes a lot of extra disk I/O. You can reduce the logging substantially by setting the level to WARN in /opt/opennms/etc/log4j.properties and /opt/opennms/webapps/opennms/WEB-INF/log4j.properties.
You don't need to restart OpenNMS; the changes take effect a few seconds later.
High disk I/O load due to data collection is the major reason for performance problems in many OpenNMS systems. Hardware and filesystem layout as described above helps a lot.
Another approach is to omit all unnecessary data collections.
Don't collect what you don't need
While the "default" snmp-collection definition in datacollection-config.xml provides an easy starting point for small networks, in larger environments it is undesirable to collect everything that can be collected. In those environments a better approach is NOT to use the default data collection, but to start by defining packages in collectd-configuration.xml and corresponding snmp-collections in datacollection-config.xml, so that only the values you really care about are collected. See Docu-overview/Data Collection for details.
Don't try to collect what you don't get
If you try to collect a lot of data from nodes that don't provide those values, you will get a lot of threads waiting for timeouts or hitting errors. If specific nodes have problems, look in your $OPENNMS_HOME/share/rrd/snmp/[nodeid] directory for the node(s) in question and note which MIB objects are actually being collected.
Another possibility is to change the logging for collectd from WARN to DEBUG:
$OPENNMS_HOME/etc/log4j.properties:

# Collectd
log4j.category.OpenNMS.Collectd=DEBUG, COLLECTD
and then fgrep for "node[your_nodeid]" in collectd.log.
There you should see which variables OpenNMS tries to collect and which are collected successfully. The successful ones normally end up in the RRD files; all others defined in the data collection for this [type of] node cannot be collected for some reason.
If there are too many unsuccessful attempts, change your datacollection-config.xml. You can omit those values for all devices, or create new collection groups that contain only the MIB objects the node(s) actually provide values for, and add a systemDef for your node(s) referencing those groups. In collectd-configuration.xml, define a separate package for your node and reference the snmp-collection you just created in datacollection-config.xml. Make sure the node is only in this one package. This gives you a clutter-free environment to work in and avoids requesting extraneous MIB objects that will never get a response. Then experiment with different values for max-vars-per-pdu, timeout, and SNMP v1 versus v2c.
Don't forget to change back logging to WARN once you have finished debugging.
Writing all the SNMP-collected data and the service polling results (response times) to RRD files produces a lot of disk I/O, so see the disk tuning notes above. For further tuning see the fundamentals and some more detailed pages like
Tomcat (if not using built-in Jetty server)
Note that there's no need to use Tomcat since OpenNMS version 1.3.7 unless you have a specific requirement that the built-in Jetty server in OpenNMS cannot meet.
If not already done at installation time, allow Tomcat to access more memory than the default. The easiest way to do this is via the CATALINA_OPTS environment variable. If the Tomcat installation in use has a configuration file as above, it can be added to that file; otherwise it is easiest to add it to catalina.sh:

CATALINA_OPTS="-Xmx1024m"

The -Xmx option allows Tomcat to use up to 1 GB of memory. This of course assumes that 1 GB of memory is available on the system; tune the value to the particular server in use.
Jetty built-in server
Similar to the Tomcat configuration, you can change the JVM startup options in the $OPENNMS_HOME/etc/opennms.conf file. To increase the maximum heap size (the -Xmx Java option), add the appropriate setting to $OPENNMS_HOME/etc/opennms.conf.
On Ubuntu $OPENNMS_HOME is defined in /usr/sbin/opennms as /usr/share/opennms, so the option must be added into /usr/share/opennms/etc/opennms.conf file.
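As a hedged sketch of that setting: ADDITIONAL_MANAGER_OPTIONS is the variable shown elsewhere on this page for passing JVM flags, and the 1024m value is an example only:

```
# $OPENNMS_HOME/etc/opennms.conf
# Raise the maximum Java heap size for the built-in Jetty server (example value)
ADDITIONAL_MANAGER_OPTIONS="-Xmx1024m"
```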
Capsd service discovery / rescan
If discovery or rescanning of a node takes a long time, you can turn up the maximum number of threads for initial discovery of services (max-suspect-thread-pool-size) or rescans (max-rescan-thread-pool-size) at the top of capsd-configuration.xml.
Change the logging for capsd in log4j.properties from WARN to DEBUG and check the capsd.log file for the number after "Pool-fiber". If that number is the same as the configured maximum most of the time, you should increase the maximum number of threads. Most servers easily handle 50 threads or more, since the threads spend most of their time waiting for services that don't answer. Don't forget to change the logging back to WARN.
During a rescan, capsd will check every service defined in capsd-configuration.xml on every interface of the device. For every service you can define the number of retries and the timeout value. If you have a device with a lot of interfaces (hundreds) and the default capsd configuration, it has to check about 30 services (the default for OpenNMS 1.6.x) on every interface. If the interfaces are just IP interfaces with no other service like DNS, DHCP, HTTP etc., you have about 30 services timing out for every interface, and probably retries too.
To get an estimate of the time this needs, take:

time = number of interfaces * number of services * (number of retries + 1) * (timeout value / 1000)

Note: the timeout is defined in milliseconds!

time = 100 [interfaces] * 30 [services] * (1 [retry] + 1) * (2000 [timeout in ms] / 1000) = 12,000 seconds = 200 min = 3.3 hours
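The worked example above can be reproduced with plain shell arithmetic. The numbers are the ones from the example, not values to copy into your configuration:

```shell
# Estimated worst-case rescan time for one device, in seconds:
# interfaces * services * (retries + 1) * timeout_ms / 1000
interfaces=100
services=30
retries=1
timeout_ms=2000
echo $(( interfaces * services * (retries + 1) * timeout_ms / 1000 ))
# prints 12000 (seconds), i.e. 200 minutes
```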
Try to reduce the IP ranges, the number of services to check, and the timeout and retry values to something reasonable for your environment.
If you have good hardware and find your pollers are not completing in time, you can turn up the maximum number of poller threads at the top of poller-configuration.xml.
To find out how many threads are actually being used, make sure DEBUG-level logging is enabled for the poller, then against daemon/poller.log run:
$ tail -f poller.log | egrep 'PollerScheduler.*adjust:'
...
2007-09-05 10:30:32,755 DEBUG [PollerScheduler-45 Pool] RunnableConsumerThreadPool$SizingFifoQueue: adjust: started fiber PollerScheduler-45 Pool-fiber2 ratio = 1.0227273, alive = 44
...
2007-09-05 10:30:12,783 DEBUG [PollerScheduler-45 Pool-fiber29] RunnableConsumerThreadPool$SizingFifoQueue: adjust: calling stop on fiber PollerScheduler-45 Pool-fiber3
Watch the output for a while after startup. The "alive" count shows the number of active poller threads (minus one -- the new thread isn't counted). If the number of threads is continually pegged at the maximum (default 30), you might want to add more threads.
Needs updating for 1.12
In OpenNMS 1.12 the RunnableConsumerThreadPool no longer exists, it seems to have been replaced by something new. This section needs corresponding updates.
All incoming events have to be checked against the configured event definitions to classify them and handle their parameters correctly. There are a lot of predefined events in OpenNMS. Incoming events are compared against the list of configured events until the first match is found. If you have a lot of incoming events, consider making the following changes in eventconf.xml:
- comment out vendor events that you don't need
- put the vendor events that make most of your incoming events on top of the list
- Take care that the standard, default and programmatic events keep their place at the end of the list.
Since a lot of events probably hit the standard or default event definitions at the end of the list, re-sorting the event list won't help as much as commenting out unused vendor events.
In the OpenNMS "contrib" directory, we have a small script for helping performance by archiving events into a historical event table and updating the references to the archived event to an event place holder.
You can download the latest version of the script here.
It is recommended that you run this script with a timestamp argument, archiving one day's worth of events at a time starting from the oldest day, up to the point where you want to keep live events (the default is 9 weeks). From then on, run the script without a timestamp parameter from cron as often as you like.
To analyze why your event table is so large, have a look at Event_Maintenance.