From OpenNMS
barCamp Categories
Organize the Wiki
Start working on more of a manual/book format to help newbies. (Tarus)
Bugzilla Review
Review Bugzilla bugs for 1.6 and move most of them out (Tarus)
Nagios Plugins
Create some OpenNMS monitors to implement some of the default Nagios check script functionality
GUI Review
- Collect functions
- Sort
- Write down as-is use cases
- Try to design "new" use cases
- Consider Context
RT Integration
- Once the OTRS integration works well, see if it can be transferred to RT
Documentation
I just want to give a few suggestions based on my experience when I started working with OpenNMS. I still don't have an overview of the documentation, so I can't change it myself (and I don't want to change your main pages or documentation style). That's why I added my suggestions here. Easy-to-read documentation will also help you, because you get fewer questions on the mailing list and so have more time to work on things you like more (or not to work at all, at least for a few minutes every day ;-)). Maybe after hours of programming during the Dev-Jam someone could take what is below and work on it to relax, or you could discuss it with a beer in hand ;-)
I have seen that steps in the direction described below were already taken during the last month, but I'm missing a "global concept".
The following suggestions were made with the idea "from the global overview to the details" in mind: first tell what you can do at all with OpenNMS, then list the different functionalities you need for that, and finally explain how they have to be configured to do it.
There is a lot of helpful information in the wiki, but sometimes it's difficult to find. If you don't have the right terms in mind for searching (especially a problem for those who speak other languages), you won't find the information. Some sort of index, like in the good old books from a former age, would be very helpful. Just a little script that gathers all headlines from the wiki pages and arranges them as links would be a leap forward in documentation. Or you could take the output from this script and link the pages to the How To's described below.
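A rough sketch of such an index script, in Python. It assumes the headlines have already been pulled out of each page (fetching pages from the wiki is not shown); the `build_index` name and the `[[Page#Headline|Headline]]` link layout are just assumptions for illustration:

```python
from collections import defaultdict


def build_index(pages: dict) -> str:
    """Build an alphabetical headline index as MediaWiki links.

    `pages` maps a wiki page title to the list of headlines found on that page.
    """
    by_letter = defaultdict(list)
    for title, headlines in pages.items():
        for headline in headlines:
            headline = headline.strip()
            if not headline:
                continue  # skip empty headlines
            by_letter[headline[0].upper()].append((headline, title))

    lines = []
    for letter in sorted(by_letter):
        lines.append(f"== {letter} ==")
        # Sort case-insensitively by headline, then by page title.
        for headline, title in sorted(by_letter[letter], key=lambda e: (e[0].lower(), e[1])):
            lines.append(f"* [[{title}#{headline}|{headline}]] ({title})")
    return "\n".join(lines)


pages = {"Discovery": ["Overview", "Configuration"], "Linkd": ["Configuration"]}
print(build_index(pages))
```

Entries are grouped by first letter, so a headline that appears on several pages (like "Configuration") is listed once per page, side by side, which is exactly where a newbie searching for it would look.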
At the moment the documentation is a mix of "what tasks are done by ... (capsd, for example)" and "what to configure in ... for a specific problem". Maybe it would be better to organize it like this:
- Purpose of the service/application
- Availability
- (basic) description of functionality
- basic configuration
- Problem determination
- How To's ... (collect here all the pages for this service, like "how to collect disk usage data")
Basic configuration should contain only the most important parts, such as enabling the service and the parameters common to all other parts of this configuration.
Problem determination: instead of collecting all those tips in the FAQ, I would propose collecting the questions in the FAQ and the answers here, so the information is not spread over a lot of different pages and sits near the service it concerns. Maybe it is even better to answer common questions here instead of on the mailing list and then post a link to the list; this way people get used to reading the documentation ;-) and will see other parts of it before they ask the next question.
Another suggestion: a more straightforward path to the documentation. At the moment there are at least three links on the main page you might follow when you start working with OpenNMS:
- Configure OpenNMS with some links below
- Other Places of Interest with links to Newbies and Documentation
- Official Documentation
and they point to different pages. In the end you have seen something somewhere but can't remember where to find it again...
Maybe it would be better to start the main page with something like
Welcome to OpenNMS
... the text as is now
Brief overview
Here there should be a link to a page describing the functionality and showing one or two screenshots for non-technical readers. OpenNMS obviously opens the door for many people who didn't have any network management tool before, so they need some simple explanations of what they can do with OpenNMS, like:
- Discover IP devices in the network
- Discover services like DHCP, DNS, routers, web services, databases etc. on those devices
- Test availability of IP devices with ICMP ping
- Test availability of services with specific pollers
- Check performance (response times) of services
- Collect data with SNMP like CPU load, interface traffic and errors, disk usage etc.
- Graph the collected data
- Receive traps and syslogs from devices/applications and generate alarms depending on trap type etc.
- Generate alarms when performance, traffic, error or disk usage thresholds etc. are reached
- Send notifications of events/alarms via e-mail etc.
since they very often don't have any idea yet of what they want to do at all...
Official Documentation
Just the same link as on the left side of the main page.
The Official Documentation page should have a straightforward structure but contain mostly (or only) links to the pages that already exist. With my limited knowledge, I tried to give an example below; see Talk:Dev-Jam_2008#Node Discovery.
The French view on documentation (a.k.a. A French Letter)
When I start with a new software I normally give it 10 minutes to convince me. Guess why I don't know Java ;-) Anyways, if we do a "book" or a "more structured" documentation I would love to see a
- Get started quick
at the very beginning, covering the important bits from "install" to "get the first alarm as an email" in maybe two, three (four) pages, along the lines of "a day in the life of...".
Then the "reader" has a working system which does actually something and is rewarded for his patience.
From there on the system gets explained in more detail. But after five (;-)) pages I'd have something up and running that beeps and blinks.
(miskellc) +1. Actually, +100. This is *exactly* what is needed.
"Official Documentation"
Remark: This is just a suggestion for a new structure for the official documentation. If you came to this page by accident, go to the Docu-overview for the actual official documentation.
Official Documentation
This wiki documentation is organized as follows:
- purpose of a service / application
- Availability (Version information)
- basic description of functionality
- basic configuration
- Problem determination
- How To's
Please keep this in mind when changing or adding documentation, to keep a clear, easy-to-use structure.
Brief technical overview
This section should probably start with a link to a more detailed but still brief global technical overview of OpenNMS, like a picture showing the database and the daemons with their information flows, plus a word or two about the event-driven technology.
This would also be the right place to collect all the links to the "Purpose" sections below, building a more detailed technical overview of OpenNMS functionality.
Installation
Prerequisites
Pre-Installation Performance Tuning
Installation Process
Installation-Problem Determination
Configuration
Logging
Users
LDAP Users
Local Users
Groups
Notification Paths
SNMP Communities
Node Discovery
Purpose
To manage nodes (devices), OpenNMS needs to know that they are there. With node discovery, OpenNMS can automatically discover the nodes in your network and fill the OpenNMS database with them. You can limit the range(s) by defining filters or by adding specific IP addresses.
There are two basic ways for node discovery:
- using discoveryd, which sweeps a range of IP addresses using ICMP ping
- using linkd, which tries to read certain MIB tables from nodes using SNMP to speed up the discovery process and to find links to other nodes
and you can combine both. Depending on the size of your network, the scope you have to survey, and technical restrictions like firewalls, low bandwidth or the limits of the hardware OpenNMS is running on, you have to decide what to configure here.
There is yet another way to fill your database with the nodes you want to manage. If you have some other system providing the required information, you can use the Importer_Service instead of the discovery features described above, or you can use the importer service for the initial filling of the database and then switch to the OpenNMS discovery processes.
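To illustrate what limiting the range(s) means for a ping sweep, here is a small Python sketch that works out which addresses would be pinged from include ranges, extra specific addresses and exclude ranges. The function names and parameters are hypothetical, not discoveryd's actual code, and letting exclude ranges also remove specifically listed addresses is an assumption of this sketch:

```python
import ipaddress


def _expand(begin, end):
    """All IPv4 addresses from begin to end, inclusive."""
    first = int(ipaddress.IPv4Address(begin))
    last = int(ipaddress.IPv4Address(end))
    return {ipaddress.IPv4Address(i) for i in range(first, last + 1)}


def sweep_targets(include_ranges, specifics=(), exclude_ranges=()):
    """Return the sorted addresses a ping sweep would try.

    include_ranges / exclude_ranges: iterables of (begin, end) pairs, inclusive.
    specifics: single addresses to ping in addition to the ranges.
    Assumption: exclude ranges also remove specifically listed addresses.
    """
    targets = set()
    for begin, end in include_ranges:
        targets |= _expand(begin, end)
    targets |= {ipaddress.IPv4Address(a) for a in specifics}
    for begin, end in exclude_ranges:
        targets -= _expand(begin, end)
    return [str(a) for a in sorted(targets)]


print(sweep_targets([("192.168.0.1", "192.168.0.5")],
                    specifics=["10.0.0.1"],
                    exclude_ranges=[("192.168.0.3", "192.168.0.4")]))
# ['10.0.0.1', '192.168.0.1', '192.168.0.2', '192.168.0.5']
```

Building the whole set in memory is fine for a sketch; for large ranges (the "tens of thousands of nodes" case) a real sweep would rather walk the ranges lazily.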
Functionality
Discoveryd and linkd have different methods to discover nodes, see below. Basically it works like this:
- an IP address is detected
- check if the address is already known in the OpenNMS database
- if not, check if the IP address is in the range that should be discovered
- if it should be discovered, add the IP address to the database and generate a "newSuspect" event
- all other daemons that have subscribed to this event receive it and start their part of the work
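The steps above can be sketched in Python roughly like this. `EventBus`, `on_ip_detected` and the in-memory set standing in for the database are hypothetical; only the "newSuspect" event name comes from the description above:

```python
import ipaddress
from collections import defaultdict


class EventBus:
    """Hypothetical stand-in for the OpenNMS event system."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._subscribers[event_name].append(handler)

    def publish(self, event_name, ip):
        for handler in self._subscribers[event_name]:
            handler(ip)


def on_ip_detected(ip, known_ips, ranges, bus):
    """Run the discovery steps for one detected address; True if it was added."""
    if ip in known_ips:
        return False  # already known in the database
    addr = ipaddress.ip_address(ip)
    if not any(addr in net for net in ranges):
        return False  # outside the ranges that should be discovered
    known_ips.add(ip)  # stands in for "add to the database"
    bus.publish("newSuspect", ip)  # every subscribed daemon gets the event
    return True


bus = EventBus()
seen = []
bus.subscribe("newSuspect", seen.append)
known = {"192.168.0.1"}
ranges = [ipaddress.ip_network("192.168.0.0/24")]
for ip in ["192.168.0.1", "192.168.0.7", "10.0.0.1"]:
    on_ip_detected(ip, known, ranges, bus)
print(seen)  # ['192.168.0.7']
```

The point of the event is decoupling: discovery only announces the new address, and whatever daemons care about new nodes subscribe and do their part independently.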
Basic Configuration
To discover nodes with discoveryd or linkd, ICMP ping has to work. If you can't reach a node from your OpenNMS server with ping, OpenNMS won't discover it.
discoveryd
If you have a small network (up to a few hundred nodes), this might be the easiest way to discover nodes in your network. discoveryd is configured via WebGUI Admin / Configure Discovery, which writes the configuration to $opennms_home/etc/discovery-configuration.xml.
See Discovery#Overview for a detailed configuration description.
linkd
As linkd uses SNMP to get more detailed information from the nodes, you have to configure SNMP communities first; see SNMPv3_protocol_configuration.
At the moment there is no GUI for configuring linkd, so you have to go to the $opennms_home/etc directory and edit linkd-configuration.xml. See linkd for a detailed description of linkd.
Linkd performs well discovering new networks of up to some thousands of nodes. If you have networks with tens of thousands of nodes, you should probably use the Importer_Service, at least for the initial filling of the database.
Problem determination
- Node(s) not discovered
Try to ping the node from the command line on your OpenNMS server.
If ping fails, try traceroute to see if there are routing, firewall or other problems.
How To's
Service Discovery
Status Polling
Data Collection
Thresholding
Alarms
Notifications
Automations
Web-UI
Performance Tuning
Suggestion
Check out New_documentation. I started it months ago and got a bit written. It's intended to be more book style, taking users through the process of thinking about how to use OpenNMS. Other documentation can deal with the gory details.
Matt's Suggestions
Where to code
Make a feature branch for each project and do your work there. XXX need a link to the how-to on how to do this.
Once the code is working, in good shape, has tests and documentation, merge it up to trunk.
Project ideas
Matt
- RESTful interfaces
- Maven archetype for OpenNMS daemons
- Allow options for:
- Listening for events
- etc.
- Maven archetype for OpenNMS web applications
Jonathan
- OTRS Integration
- It's in good shape, but needs cleaning up. There are two pieces, one piece in OpenNMS, and one that goes into OTRS (which will ideally be merged into OTRS).
- It probably makes sense to look at the OpenNMS Trouble Ticket API with David, Jonathan, and Johan. The API doesn't seem to handle things like failures.
Johan
- Maps with OpenLaszlo
- Can make a client-side GUI with DHTML (probably would not work well with maps) or Flash 7/8/9.
Craig Miskell
- Rich Web Interfaces
- Have been looking at extJS. So has Alejandro.
- Talk about how collectd works
- With Matt, Jonathan, and Craig.
Alex
- User experience in the GUI
- We need to look at usability. Maybe come up with one or two story lines about how people use OpenNMS. Could feed well into documentation or vice versa.
- Desktop Client: Prototype an Adobe AIR based Desktop Client
- Reduce "unused" space on the desktop: at the moment the logo and other parts of the header, together with the browser header (navigation bar etc.), take up to 25-30% of the desktop (Michael)
- (Done) changed the order of things in the Admin menu and changed the descriptions to be more terse
- Started a sysconfig.jsp - now who can get the server status in there..?
Ben
Jeff
DJ
- Comedy
- Solaris packaging
Craig Gallen
- Update the OSS/J QoS interface and other investigations to support TMForum TIP program
Paul
- Patch capsd to give it an option to monitor only the primary interface on a node
- Implement notification rate limiting (patch notifd or hack something with alarms?)
- Maybe also implement flapping detection and/or improve path outages to handle redundant uplinks?
- Write Acegi module for RADIUS authentication
- Write a small package that can be thrown on a separate system to monitor the NMS daemon itself and do basic notification when it dies
- Get the patches in bugs 2488, 2199, and 2202 merged ;-)
- Get SNMP version auto-detection working again
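For the notification rate limiting idea, one simple option is a sliding window per destination. This is a hypothetical Python sketch (class name and parameters invented here), not how notifd works today:

```python
from collections import deque


class NotificationRateLimiter:
    """Allow at most max_count notifications per destination in any window of `window` seconds."""

    def __init__(self, max_count, window):
        self.max_count = max_count
        self.window = window
        self._sent = {}  # destination -> timestamps of recent notifications

    def allow(self, destination, now):
        sent = self._sent.setdefault(destination, deque())
        # Drop timestamps that have left the window.
        while sent and now - sent[0] >= self.window:
            sent.popleft()
        if len(sent) >= self.max_count:
            return False  # suppress; could be counted and summarized later
        sent.append(now)
        return True


limiter = NotificationRateLimiter(max_count=2, window=60)
print([limiter.allow("admin", t) for t in (0, 10, 20, 60)])
# [True, True, False, True]
```

Suppressed notifications are not recorded, so a storm does not keep a destination blocked forever; whether to send a summary of what was dropped is a separate design decision.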
Mike
Data source definition
- /node/14/....
- /node/*/....
- /node/?/....
CDEF templates
- Put stored value into base units
- Octets to bits
- Kilobytes to bytes
- Deciamps to amps
- sum/avg/max/min defs
- Inverted stack/area of def
- Check Cacti for similar functions
- All
- criteria based
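A small Python sketch of how such CDEF templates could be generated as RRDtool CDEF strings (RPN syntax). The template table, function names and the 1 KB = 1024 bytes conversion are assumptions; sum, avg, max and min are built from RRDtool's `+`, `/`, `MAX` and `MIN` operators:

```python
# Hypothetical template table: RPN suffix that converts a stored value to base units.
BASE_UNIT_RPN = {
    "octets_to_bits": ",8,*",
    "kilobytes_to_bytes": ",1024,*",  # assumes 1 KB = 1024 bytes
    "deciamps_to_amps": ",10,/",
}

# RRDtool RPN operator used to fold each additional source into the result.
AGGREGATE_OPS = {"sum": "+", "avg": "+", "max": "MAX", "min": "MIN"}


def base_unit_cdef(vname, src, conversion):
    """CDEF that puts the stored value `src` into base units."""
    return f"CDEF:{vname}={src}{BASE_UNIT_RPN[conversion]}"


def aggregate_cdef(vname, sources, op):
    """CDEF combining several DEFs with sum/avg/max/min."""
    if not sources:
        raise ValueError("need at least one source")
    rpn_op = AGGREGATE_OPS[op]  # fails early on an unknown op
    rpn = sources[0]
    for src in sources[1:]:
        rpn += f",{src},{rpn_op}"
    if op == "avg":
        rpn += f",{len(sources)},/"  # average = sum / number of sources
    return f"CDEF:{vname}={rpn}"


def inverted_cdef(vname, src):
    """CDEF that negates a series, e.g. to draw outbound traffic below the axis."""
    return f"CDEF:{vname}={src},-1,*"


print(base_unit_cdef("inBits", "ifInOctets", "octets_to_bits"))
print(aggregate_cdef("avgCpu", ["cpu1", "cpu2", "cpu3"], "avg"))
```

With plain `+`, an unknown value in any source makes the combined value unknown for that time step, so a template would have to decide whether that is the wanted behaviour for sums and averages.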
GPRINT templates
- Provide standard easy to use configuration of gprint
- Mapped based on the data type provided automagically
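A possible shape for the data-type-to-GPRINT mapping, again as a Python sketch. The type names and formats in `GPRINT_FORMATS` are made up for illustration; the arguments use the `GPRINT:vname:CF:format` form, where `%s` asks RRDtool for an SI magnitude prefix and `%%` prints a literal percent sign:

```python
# Hypothetical mapping from a data source's type to a GPRINT number format.
GPRINT_FORMATS = {
    "counter": "%8.2lf %s",  # %s adds an SI prefix (k, M, G, ...)
    "gauge": "%8.2lf %s",
    "percent": "%5.1lf %%",  # %% prints a literal percent sign
    "integer": "%8.0lf",
}

LABELS = {"AVERAGE": "Avg", "MIN": "Min", "MAX": "Max", "LAST": "Cur"}


def gprint_lines(vname, data_type, cfs=("AVERAGE", "MIN", "MAX")):
    """One GPRINT argument per consolidation function, formatted by data type."""
    fmt = GPRINT_FORMATS.get(data_type, "%8.2lf %s")  # fall back to a generic format
    return [f"GPRINT:{vname}:{cf}:{LABELS[cf]} {fmt}" for cf in cfs]


for arg in gprint_lines("cpuUtil", "percent"):
    print(arg)
```

The arguments are meant to be passed to rrdtool as a list (no shell), so the spaces inside the format need no extra quoting.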
Use cases for GraphTemplates
- Run a prefabricated template on a resource selected by user at run time
- Show me CPU utilization on Node 14
- Show me Utilization on Node 14 SnmpInterface 1110
- Run an adhoc graph containing attributes from two different resources
- Show me the HTTP response time on node 41 and disk i/o for node 22 disk 14
- Show me the utilization on node 33 on SnmpInterface 1121 and utilization on node 44 on SnmpInterface 1112
- Run a report on a group of resources (Surv Cat, intersection of surv cat, or something) specified by the user in the configuration
- Show me the nodes that match the filter CPU utilization on one graph
- Show me the nodes that match the filter with SNMPIfType 10 utilization on one graph
- Run a report showing graphs for a specific resource type for all nodes given a specified Application.
- Run a report on a group of resources that is related to a single resource specified by the user at run time
- Show me a single graph with the interface utilization for all interfaces on the resource selected, including an aggregate
- Show me a single graph with the disk i/o for all disks on the resource selected
- Run a report for a prefabricated report that involves different related resources, where one resource is selected by the user at runtime
- Show me a single graph with if util and cpu util for that node when the user selects that interface
- Show me CPU utilization on Node 14
- /node/?/nodeSnmp/CpuUtilization
- Show me Utilization on Node 14 SnmpInterface 1110
- /node/?/interfaceSnmp/ifInOctets
- /node/?/interfaceSnmp/ifOutOctets
- Show me the HTTP response time on node 41 and disk i/o for node 22 disk 14
- /node/41/responseTime/10.1.1.1/HTTP
- /node/22/hrStorageDescr/14/diskio
- Show me the utilization on node 33 on SnmpInterface 1121 and utilization on node 44 on SnmpInterface 1112
- /node/33/snmpInterface/1121/ifInOctets
- /node/33/snmpInterface/1121/ifOutOctets
- /node/44/snmpInterface/1112/ifInOctets
- /node/44/snmpInterface/1112/ifOutOctets
- Show one graph with all of the CPU utilization for all the nodes that match the filter
- /node/*:myCPUUs/nodeSnmp/cpuUtil
- Show me the nodes that match the filter with SNMPIfType 10 utilization on one graph
- /node/*/snmpInterface/*:ifTypeTen/ifInOctets
- /node/*/snmpInterface/*:ifTypeTen/ifOutOctets
- Show graph for a specific resource type for all nodes given a specified surv cat.
- /node/*:mysurvcat/?/?
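The path notation above could be resolved against the list of existing resources with a matcher along these lines. This Python sketch is hypothetical: it treats `*` as "any value", `*:name` as "any value accepted by the named filter", `?` as "the value the user picked at run time" (keyed by path position), and anything else as a literal:

```python
def _segment_matches(pattern, value, index, runtime, filters):
    """Check one path segment against one template segment."""
    if pattern == "*":
        return True  # any value
    if pattern.startswith("*:"):
        return filters[pattern[2:]](value)  # any value the named filter accepts
    if pattern == "?":
        return runtime.get(index) == value  # value chosen by the user at run time
    return pattern == value  # literal


def resolve(template, resources, runtime=None, filters=None):
    """Return the resource paths that match a data source template.

    runtime: {segment position: value} for each "?" in the template.
    filters: {filter name: predicate on a segment value} for "*:name" segments.
    """
    runtime = runtime or {}
    filters = filters or {}
    pattern = template.strip("/").split("/")
    matches = []
    for path in resources:
        parts = path.strip("/").split("/")
        if len(parts) == len(pattern) and all(
            _segment_matches(p, v, i, runtime, filters)
            for i, (p, v) in enumerate(zip(pattern, parts))
        ):
            matches.append(path)
    return matches


resources = [
    "/node/14/nodeSnmp/CpuUtilization",
    "/node/22/nodeSnmp/CpuUtilization",
    "/node/33/snmpInterface/1121/ifInOctets",
    "/node/44/snmpInterface/1112/ifInOctets",
]
print(resolve("/node/?/nodeSnmp/CpuUtilization", resources, runtime={1: "14"}))
# ['/node/14/nodeSnmp/CpuUtilization']
```

A real implementation would evaluate filters against nodes or interfaces (surveillance categories, ifType) rather than against the bare path segment, and would fill in `?` from the selection made in the GUI.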
Scratch for GraphTemplates
- Generic
- Saved report that is applied to a single resource dynamically from a user selected option
- Like current prefab graphs
- Non-specific to the attributes
- Based on single resource
- Based on single resource type
- Attribeautiliciousness
- Possibly an adhoc report that is used once or saved report
- Like adhoc but allows multiple resources
- Like secret
- Any attribute
- Grouped
- Saved report like statsd reports
- Defined IP range or other grouping of nodes with a collection of defined attributes of a particular type
- Group attribute by matching criteria
- Aggregate view of the same datasource for the group on the same graph
- Generic dynamic??
- Different resources that are related
- Show me ifInOctets for server and the CPU