A Consumer Goods Company



Written by joed - (Thanks mhuot for the format) - Will add to this as I go along.

I work for a multinational consumer goods company. We have used most of the existing tools on the market. We have about 1000 larger locations in 57 countries around the world. We primarily monitor data centers, network nodes and application hosting services. In our DCs we run Windows servers, AIX, Linux and HP-UX. Most of our core devices have SNMP enabled, but in the Windows world we rely on NSClient quite a bit.

We use OpenNMS primarily to verify SLA adherence for our customers: we monitor latency, traffic and application-specific data. We are also looking at monitoring a web farm of roughly 1800 sites, where virtualization components will become necessary.

Given our geographical spread and the cost of bandwidth, distributed monitoring is quite the rage for us, so the upcoming features in OpenNMS will be very interesting.

We run autodiscovery on a few select ranges, primarily our routing core, data centers and WAN loopback ranges. That way we minimize discovery traffic and maximize the target hit ratio.
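As a rough illustration of why narrow ranges help, here is a minimal Python sketch that checks whether an address falls inside a small set of discovery ranges and counts how many addresses a sweep would touch. The ranges, the labels in the comments and the function names are all hypothetical (the addresses come from the documentation address space), and this is not how OpenNMS itself implements discovery.

```python
import ipaddress

# Hypothetical begin/end pairs; the real ranges are not given in this write-up.
DISCOVERY_RANGES = [
    ("192.0.2.1", "192.0.2.62"),         # stand-in for the routing core
    ("198.51.100.1", "198.51.100.254"),  # stand-in for a data center
    ("203.0.113.1", "203.0.113.30"),     # stand-in for WAN loopbacks
]


def in_discovery_scope(address, ranges=DISCOVERY_RANGES):
    """Return True if the address lies inside any configured range."""
    ip = ipaddress.ip_address(address)
    for begin, end in ranges:
        if ipaddress.ip_address(begin) <= ip <= ipaddress.ip_address(end):
            return True
    return False


def scope_size(ranges=DISCOVERY_RANGES):
    """Count the addresses a discovery sweep over these ranges would probe."""
    return sum(
        int(ipaddress.ip_address(end)) - int(ipaddress.ip_address(begin)) + 1
        for begin, end in ranges
    )
```

With these example ranges a sweep probes a few hundred addresses instead of whole site subnets, which is the traffic saving described above.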

The introduction of OpenNMS has led to many network discussions, as we can now see user-like outage events, short latency drops and peaks in network utilization more clearly.

We also bolted NetFlow collection onto OpenNMS to have a single view of all traffic matters on both the network and the server side.


On normal servers we are interested, beyond the predefined events, in consolidated syslog events and traps (in particular security-oriented ones). We also pull an extensive set of Windows performance counters through NSClient so that we can generate events and send alarms on things like failed backups, MS Exchange MTA queue status, IIS thread count and so on.

On a weekly basis we use the reporting engine and the Vulnscand engine to get a quick view of our current system offerings. As we resell our services within the company, the reporting capabilities give us a quick path to compliance.

We normally log and threshold WAN-related data, and we alarm on outages and application errors. As we also use the Nessus and Snort integration, we correlate quite a bit of data to match things such as vulnerabilities, traps, Snort messages and syslog events.
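The kind of matching described above can be sketched in a few lines of Python. This is only an illustration under assumed conditions: the event dictionaries (`node`, `source`, `time` in seconds), the five-minute window and the `correlate` function are all hypothetical, and the sketch does not reflect how OpenNMS stores or correlates events internally.

```python
from collections import defaultdict


def correlate(events, window_seconds=300):
    """Return nodes where two or more distinct sources fired within the window.

    Each event is a dict with hypothetical keys: node, source, time (seconds).
    """
    by_node = defaultdict(list)
    for event in events:
        by_node[event["node"]].append(event)

    matches = {}
    for node, node_events in by_node.items():
        node_events.sort(key=lambda e: e["time"])
        for i, first in enumerate(node_events):
            # Collect sources seen from this event up to window_seconds later.
            sources = {
                e["source"]
                for e in node_events[i:]
                if e["time"] - first["time"] <= window_seconds
            }
            if len(sources) >= 2:
                matches[node] = sorted(sources)
                break
    return matches
```

For example, a Snort message and a syslog event on the same node a few minutes apart would be reported together, while a trap and a Nessus finding fifteen minutes apart would not.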