Upgrading OpenNMS
Disclaimer: Everything said below applies only to "normal" upgrades (e.g. 1.3.10 to 1.3.11). If you jump versions you might need to consider changes which happened in between. Furthermore, this text is not a guide; it contains hints and best practices for the experienced system administrator who has successfully operated OpenNMS for a few versions.
The Test section applies to all upgrades; the sections below it apply only if you enjoy pain and upgrade from source.
Preparation
Test
Prepare a Regression Test. A "Regression Test" is a test which checks whether a system behaves the same after an upgrade as it did before.
The "Standard" Elements to test are:
1 - Is the old data there?
- The system starts
- The system is accessible over the webui
- You can log in
- The historical data is present (standard data collection and response time data)
- Your custom created data is present (can you see the graphs of the collections/pollers you added yourself)
- Add whatever else you added to the system here and which you should be able to see immediately after login
2 - Does the active part of the system work?
- Put the system in debug mode (log4j, keep the logfile sizes small)
- Discovery works as expected
- Datacollection works as expected
- No real (*) errors are in any logfile
- New data is collected (either you wait and look at the graphs or you check the filesystem)
- Your custom data collection (http..) is working
- The services you activated are running (syslogd? linkd?)
- Trigger a test notification for each of your escalation paths (**)
- The notification makes it up through the escalation path
- Create a situation which triggers a threshold
- The threshold is exceeded and an event (notification depending on your config) is created
- Put the system out of debug mode (log4j)
(*) real errors - something substantial - if the web ui complains about not finding log4j.properties I ignore that
(**) you can not do this from the web ui but have to set it up beforehand; I use the syslog facility
This is what comes to my mind right now - mostly because I failed on one or the other item :-)
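Two of the items above ("new data is collected" and "no real errors are in any logfile") can be checked from the filesystem. What follows is only a minimal sketch: the function names are made up for illustration, the directories are whatever your installation uses, and matching on ERROR or FATAL is an assumption about what a "real" error looks like in your logs. Note that -mmin is supported by GNU and BSD find but is not part of POSIX.

```sh
#!/bin/sh
# Hypothetical helpers, not part of OpenNMS.

# fresh_data_files DIR MINUTES
# Print every file below DIR that was modified within the last MINUTES minutes.
fresh_data_files() {
    dir=$1
    minutes=$2
    find "$dir" -type f -mmin "-$minutes"
}

# log_errors LOGDIR
# Print every line below LOGDIR that contains ERROR or FATAL,
# prefixed with the name of the file it was found in.
log_errors() {
    logdir=$1
    grep -rE 'ERROR|FATAL' "$logdir" || true
}
```

If fresh_data_files prints nothing for your data directory after a few collection intervals, data collection is probably not running. You still have to read the output of log_errors yourself to decide which errors are real.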
Backup
- Follow the remarks on backing up the database if you value the data. If you don't care, why do you upgrade? You could just reinstall ;-)
- As a best practice I save the complete old installation to be able to roll back ASAP if needed
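Saving the complete old installation can be as simple as a timestamped tar archive. This is a sketch under assumptions: backup_install is a hypothetical helper, the archive name is arbitrary, and it covers only the installation directory, not the database. Stop OpenNMS before you archive.

```sh
#!/bin/sh
# Hypothetical helper, not part of OpenNMS.

# backup_install SRC DEST_DIR
# Archive the directory SRC into DEST_DIR as a timestamped .tar.gz
# and print the path of the archive that was written.
backup_install() {
    src=$1
    dest=$2
    stamp=$(date +%Y%m%d-%H%M%S)
    archive="$dest/opennms-backup-$stamp.tar.gz"
    mkdir -p "$dest" || return 1
    # -C keeps only the last path component inside the archive
    tar -czf "$archive" -C "$(dirname "$src")" "$(basename "$src")" || return 1
    echo "$archive"
}
```

Call it with your installation directory and a backup location of your choice. Rolling back then means stopping the new version, unpacking the archive and restoring the database.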
Warp
Now you have prepared a test, backed up the application and the database. Perform the upgrade. Run the test. See everything works. Enjoy the new features. Almost, but what about..
Configuration Changes
The fact that you have changed the configuration of OpenNMS requires that you validate if your old configuration is compatible with the newer code. The easiest way would seemingly be to
- not change configuration file formats (wink, wink)
- copy everything from old to new
However - new features require new configuration and it's part of the development path that the inner logic of the application changes. That can require substantial changes in the configuration as well.
If you want to know beforehand what has changed, you can run a diff on the git repository:
git clone git://opennms.git.sourceforge.net/gitroot/opennms/opennms
cd opennms
git diff opennms-1.6.7..opennms-1.6.8 opennms-daemon/src/main/filtered/etc
A bright mind on the #opennms irc channel has shown this to me (I would have asked my Secretary to print it out and do a written report but this is so much faster!).
Replace the first version (1.6.7 in the example) with what you run (e.g. 1.3.11) and the second (1.6.8) with your target version (e.g. "trunk" or "1.5.92").
Before doing anything I suggest you create a directory in ~etc in which you store the "dist" configuration files. Just to be sure that you can look at them for reference later - it's much easier to look there than to dive in the source tree.
Now that you know what has changed in OpenNMS, you know which configuration files you cannot reuse. Start by moving the ones you can reuse into the new directory. Then merge the ones which have changed.
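The saved "dist" files also tell you which files you touched yourself: compare them against your running configuration. A minimal sketch, assuming the pristine files and your live files sit in two flat directories with the same file names; changed_configs is a hypothetical helper and it does not descend into subdirectories.

```sh
#!/bin/sh
# Hypothetical helper, not part of OpenNMS.

# changed_configs DIST_DIR LOCAL_DIR
# Print the name of every file that exists in both directories
# but whose content differs, i.e. the files you customized.
changed_configs() {
    dist=$1
    local_etc=$2
    for f in "$dist"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        if [ -f "$local_etc/$name" ] && ! cmp -s "$f" "$local_etc/$name"; then
            echo "$name"
        fi
    done
}
```

Files that show up here and also in the git diff between the two versions are the ones to merge by hand. Files that show up only here can usually be carried over as they are.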
(a couple of hours later)
Now you should be able to run through the normal installation procedure (runjava, update the database structure (install -disU)) and .. did I say "stop opennms" at the top? I hope you did.. ok, now start opennms.
After an upgrade I typically want to see what I did wrong ASAP, so I normally do a
~/bin/opennms start & tail -f ~/logs/daemon/*log
which will throw errors on the console almost immediately. 99.9% of my problems after an upgrade were configuration errors on my part. 0.1% were due to the change in the thresholding configuration.
Automated Testing
Because of the significant number of changes I made to the system, and because I almost always forget to copy my custom events into the events directory, I wrote an automated test script in Perl. Have a look at the simple test Module and write small checks to validate your configuration - it helps a lot.
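The same idea works in plain shell. This is not the Perl script mentioned above, just a sketch in the same spirit: check and missing_in_new are hypothetical helpers, and the "ok" / "not ok" output merely imitates the style of simple test modules.

```sh
#!/bin/sh
# Hypothetical helpers, not part of OpenNMS.

# missing_in_new OLD_DIR NEW_DIR
# Print the name of every file that is present in OLD_DIR
# but absent from NEW_DIR (e.g. forgotten custom event files).
missing_in_new() {
    old=$1
    new=$2
    for f in "$old"/*; do
        [ -f "$f" ] || continue
        name=$(basename "$f")
        [ -e "$new/$name" ] || echo "$name"
    done
}

# check DESCRIPTION COMMAND [ARGS...]
# Run COMMAND and print "ok - DESCRIPTION" if it succeeds,
# "not ok - DESCRIPTION" otherwise.
check() {
    desc=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok - $desc"
    else
        echo "not ok - $desc"
        return 1
    fi
}
```

A check for forgotten event files could then read: check "custom events copied" test -z "$(missing_in_new "$OLD_EVENTS" "$NEW_EVENTS")", where the two variables point at the old and new events directories of your installation.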






