From OpenNMS
A queueing strategy for the RRD Interface
OpenNMS developers have been investigating scalability issues with OpenNMS data collection. We have found the following issues to be the most significant factors in our ability to achieve reasonable levels of scalability:
- Non-thread-safe C libraries from RRDtool.
- Disk I/O performance when accessing RRD files.
- Our RRD file strategy (one file per metric).
JRobin
Initially, we adopted a Java implementation of RRDtool from the JRobin project. Even though we found significant performance improvements from writing with multiple threads, we still ran into a performance bottleneck when trying to scale to collections of 200K metrics (roughly 20,000 interfaces using the default datacollection configs) and above. The bottleneck became disk I/O. As long as the system could cache the RRD files, the JRobin implementation seemed to be the answer. However, once the number or size of the files grew beyond the system's ability to cache our read/write requests, performance dropped off significantly and we would miss polls.
Queuing
JRobin helps us achieve one of the project's goals, optionally becoming a 100% Java platform, and it shifted the bottleneck from OpenNMS software to the user's hardware. Even so, we still had not achieved our scalability goals.
If you have been following any of the CVS activity lately, you will have noticed quite a bit of contribution from Matt Brozowski (da Nog). Not only did he create new Java classes to abstract the RRD implementation, he designed a new queuing strategy that works much like a capacitor. The new queuing strategy added the performance boost we needed by:
- Freeing up the data collection threads to focus completely on data collection. Prior to the new queuing strategy, each data collection thread wrote its collected data through the non-thread-safe RRDtool C library. This meant that even though we had 80 threads doing collection by default, they each had to wait in a single-file line to write their data to RRD files before going back to the thread pool and making themselves available for more data collection tasks.
- Priority queuing. Some data needs to reach the disk sooner than other data. Using a priority system, less significant data is written to disk as time allows. Writing this less significant data is also much more efficient, because many data points are written at once rather than one data point per file operation. When insignificant data can't be written every 5 minutes, its file operations are queued up for an I/O opportunity. When that opportunity arrives, all queued data points for a file are written at once.
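The two points above can be sketched in Java roughly as follows. This is only an illustration of the idea, not the code that is in CVS: the class `RrdWriteQueue`, its methods, the file names, and the choice of which metric counts as significant are all made up for the example. Collection threads call `enqueue` and return immediately; a writer thread calls `poll` and receives every value that has piled up for one file, so it can write them in a single file operation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/**
 * Hypothetical sketch of a batching, two-level priority write queue.
 * None of these names come from the OpenNMS source tree.
 */
final class RrdWriteQueue {

    /** Everything waiting to be written to one RRD file. */
    static final class Batch {
        final String file;
        final List<Double> values;

        Batch(String file, List<Double> values) {
            this.file = file;
            this.values = values;
        }
    }

    // Pending values per file; a queued file sits in exactly one of the two queues.
    private final Map<String, List<Double>> pending = new HashMap<>();
    private final Queue<String> significant = new ArrayDeque<>();
    private final Queue<String> insignificant = new ArrayDeque<>();

    /** Called by collection threads: records the value and returns immediately. */
    synchronized void enqueue(String file, double value, boolean isSignificant) {
        List<Double> values = pending.get(file);
        if (values == null) {
            values = new ArrayList<>();
            pending.put(file, values);
            // Simplification: priority is fixed when the file first enters the queue.
            (isSignificant ? significant : insignificant).add(file);
        }
        values.add(value);
    }

    /** Called by a writer thread: returns the next batch, or null if nothing is queued. */
    synchronized Batch poll() {
        String file = significant.poll();
        if (file == null) {
            file = insignificant.poll();
        }
        if (file == null) {
            return null;
        }
        // All values accumulated for this file are handed over as one write.
        return new Batch(file, pending.remove(file));
    }

    public static void main(String[] args) {
        RrdWriteQueue queue = new RrdWriteQueue();
        // Hypothetical file names and priorities, for illustration only.
        queue.enqueue("eth0-ifInErrors.rrd", 1.0, false);
        queue.enqueue("eth0-ifInErrors.rrd", 2.0, false);
        queue.enqueue("eth0-ifInOctets.rrd", 1500.0, true);

        Batch batch;
        while ((batch = queue.poll()) != null) {
            System.out.println(batch.file + " <- " + batch.values);
        }
    }
}
```

In the small `main` above, the significant file is handed to the writer first, and the two values queued for the insignificant file come out together as one batch.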
We have found the behavior to be like that of a capacitor: the queue finds an equilibrium with the disk I/O performance. This happens as the number of data points written per file operation increases and then levels off at a balance where the number of enqueued data points equals the number of dequeued data points.
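A toy simulation makes the equilibrium easier to see. It is a sketch under simplifying assumptions that are ours, not measurements: every file receives exactly one data point per polling interval, the disk completes a fixed number of file operations per interval no matter how many points each operation carries, and files are served oldest first. The class name and the figures (1000 files, 250 writes per interval) are arbitrary.

```java
import java.util.ArrayDeque;
import java.util.Queue;

/** Toy model of the queue reaching equilibrium; all numbers are illustrative assumptions. */
final class QueueEquilibrium {

    /**
     * Simulates the given number of polling intervals and returns how many
     * data points were written to disk in each interval.
     */
    static int[] simulate(int files, int writesPerInterval, int intervals) {
        int[] pendingPoints = new int[files];          // queued points per file
        Queue<Integer> waiting = new ArrayDeque<>();   // files with queued points, oldest first
        int[] written = new int[intervals];

        for (int t = 0; t < intervals; t++) {
            // Collection: every file receives one new data point.
            for (int f = 0; f < files; f++) {
                if (pendingPoints[f] == 0) {
                    waiting.add(f);
                }
                pendingPoints[f]++;
            }
            // Writing: a fixed number of file operations per interval, each of
            // which flushes every point queued for that file.
            for (int w = 0; w < writesPerInterval && !waiting.isEmpty(); w++) {
                int f = waiting.remove();
                written[t] += pendingPoints[f];
                pendingPoints[f] = 0;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int[] written = simulate(1000, 250, 8);
        for (int t = 0; t < written.length; t++) {
            System.out.println("interval " + (t + 1) + ": wrote " + written[t] + " points");
        }
    }
}
```

With these numbers the points written per interval climb from 250 to 500 to 750 and then hold at 1000, which matches the 1000 points enqueued per interval. By then each file operation carries four data points, so the disk keeps up even though it can only touch a quarter of the files per interval.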
So far so good
We have achieved our scalability goals and we are still testing the upper limits. Many thanks to our install base and customers who have allowed us to test on their systems. Since we have pushed the performance bottleneck back to the hardware, we are looking for platforms that can push the bottleneck back on us. Currently, we are able to stay ahead of the disk subsystem using 100 data collection threads and only 2 queuing threads that are responsible for all disk activity (creating RRD files and writing RRD data).
From our research, significant I/O performance improvements have been documented using the Linux 2.6.7+ kernel with the ext3 and Reiser4 file systems. One of our next steps is to test on this or a comparable OS.
To Do items
Here is the remaining work item list for the new queueing strategy:
- Create a broadcast event processor for the Queueing process to receive events. Initially an event is needed for the promote method so that the graphing servlet can promote any low priority data in the queue needed for the graphs requested from the GUI.
- Move JRobin and Queueing options from Java properties to XML configuration files.
- JRobin graphing needs clean up.
- Thresholding (threshd) needs testing and debugging.
- Update build and install to include new jars and runtime requirements.
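For the first item in the list above, the promote operation itself might look something like the sketch below. The event plumbing is left out, and the class `PromotableQueue` and its method names are hypothetical, chosen only to show a low-priority file being moved ahead when a graph needs its data.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of promoting a queued file; not the OpenNMS implementation. */
final class PromotableQueue {
    private final Deque<String> highPriority = new ArrayDeque<>();
    private final Deque<String> lowPriority = new ArrayDeque<>();

    synchronized void enqueueHigh(String file) {
        highPriority.addLast(file);
    }

    synchronized void enqueueLow(String file) {
        lowPriority.addLast(file);
    }

    /**
     * Moves a file from the low-priority queue to the high-priority queue.
     * Returns false if the file was not waiting at low priority.
     */
    synchronized boolean promote(String file) {
        if (!lowPriority.remove(file)) {
            return false;
        }
        highPriority.addLast(file);
        return true;
    }

    /** Returns the next file to write, high priority first, or null if both queues are empty. */
    synchronized String next() {
        String file = highPriority.pollFirst();
        return file != null ? file : lowPriority.pollFirst();
    }
}
```

An event handler for the graphing servlet's request would call `promote` with the files behind the requested graph, so the writer threads reach them before the rest of the low-priority backlog.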
Merge to head and release 1.1.4
- Change RRD file strategy to 1 file per interface.
- Change the current prioritization from a hard-coded decision to a statistical decision.
- Determine the best default JRobin and queueing options for install.
- Document the Queueing strategy and the RRD Strategy.
Comments, please.
-David Hustace






