Monitoring services or metrics and getting alarms isn't complicated. The more interesting question is: how do you fix them (fast)?
Monitoring systems, even in small or mid-sized environments, create a lot of different alarms. When you work in a team, the person who wrote a test or configured the threshold that raised an alarm is sometimes not the same person who has to understand what happened and what to do next.
Either way, you should have some kind of documentation at hand when you need it, especially when you are on call and get woken up in the middle of the night. A good way to be well prepared for alarm situations is to attach specific instructions or information to each alarm.
Since OpenNMS is event-driven, you can enrich events with additional useful information.
For example, when you get a nodeLostService alarm, you can use the operator instruction field to hold a link template that points to a wiki or internal documentation, such as https://my-wiki/$service.
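To make the idea of a link template concrete, here is a minimal sketch of how a template like https://my-wiki/$service turns into a per-service link. OpenNMS performs its own token expansion inside the event definition; this sketch does not reproduce that mechanism. It only mimics the idea with Python's `string.Template`, whose `$name` placeholders happen to match the URL above. The `my-wiki` host and the `instruction_link` helper are hypothetical.

```python
from string import Template
from urllib.parse import quote

# Hypothetical template mirroring the example URL in the text;
# "my-wiki" is a placeholder host, not a real documentation server.
INSTRUCTION_TEMPLATE = Template("https://my-wiki/$service")


def instruction_link(service: str) -> str:
    """Expand the link template for the service named in a nodeLostService alarm.

    The service name is percent-encoded (including "/") so that names with
    spaces or slashes still produce a single valid URL path segment.
    """
    return INSTRUCTION_TEMPLATE.substitute(service=quote(service, safe=""))


if __name__ == "__main__":
    for svc in ["HTTP", "ICMP", "My App/API"]:
        print(svc, "->", instruction_link(svc))
```

Keying the wiki page on the service name means one template covers every monitored service: adding documentation for a new service is just a matter of creating the matching wiki page.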
As soon as the alarm appears, you can point the person handling it to tested, informative instructions on how to diagnose the situation. This also improves the quality of tickets when they need to be escalated to someone else.