Monitoring pacemaker with nagios

In this article I describe how to monitor the pacemaker cluster resource manager of a Linux cluster with the nagios monitoring system. The nagios plugin check_snmp requests and analyses data from the pacemaker SNMP agent.

The nagios plugin check_snmp

Nagios provides a universal plugin to gather data from SNMP agents: check_snmp. You have to tell the plugin which OID to query and how to interpret the result. The interpretation tells the plugin which values indicate a good state (OK), a not-so-good state (WARNING) and a bad state (CRITICAL). The parameters are passed to the plugin as standard Unix options. The most important are:

check_snmp -H <ip_address> -o <OID> [-w warn_range] [-c crit_range]

You can also configure a community string for SNMPv1 or v2c, or all the cryptographic settings for SNMPv3, with other options. As always, the plugin lists all its options when you call it with --help.
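To make the range syntax concrete, here is a small bash sketch of how a value is mapped to a state under the Nagios plugin range conventions: a plain `N` means `0:N`, `N:` means "alert below N", `~` stands for minus infinity, and a leading `@` alerts inside instead of outside the range. CRITICAL is tested before WARNING, and the state is returned as the usual plugin exit code (0 = OK, 1 = WARNING, 2 = CRITICAL; 3 is UNKNOWN). The functions `range_alerts` and `check_state` are hypothetical helpers, not part of the nagios plugins, and the sketch only handles integers.

```shell
#!/usr/bin/env bash
# Sketch of Nagios threshold-range evaluation (integers only).
# Range syntax: [@]start:end. A missing start means 0, "~" as start
# means -infinity, a missing end means +infinity. Without "@" an alert
# is raised when the value is OUTSIDE start..end, with "@" when INSIDE.

# range_alerts RANGE VALUE -> exit status 0 if RANGE raises an alert
range_alerts() {
    local range=$1 value=$2 inside=0 start=0 end=""
    if [[ $range == "@"* ]]; then
        inside=1
        range=${range#"@"}
    fi
    if [[ $range == *:* ]]; then
        start=${range%%:*}
        end=${range#*:}
        [[ -z $start ]] && start=0     # ":1" is read like "0:1"
    else
        end=$range                     # "10" means "0:10"
    fi
    local in_range=1
    if [[ $start != "~" ]] && (( value < start )); then in_range=0; fi
    if [[ -n $end ]] && (( value > end )); then in_range=0; fi
    if (( inside )); then
        (( in_range == 1 ))
    else
        (( in_range == 0 ))
    fi
}

# check_state VALUE WARN_RANGE CRIT_RANGE
# Prints the state and returns the plugin exit code; CRITICAL wins.
check_state() {
    local value=$1 warn=$2 crit=$3
    if range_alerts "$crit" "$value"; then
        echo CRITICAL; return 2
    elif range_alerts "$warn" "$value"; then
        echo WARNING; return 1
    fi
    echo OK; return 0
}
```

For example, `check_state 1 2: 1:` prints WARNING and returns 1, which matches the two-node check for online nodes further down, while `check_state 1 0 1` shows that a plain threshold `0` already warns for a value of 1.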

Configuration

We want to check a two node cluster for the following conditions:

  • At least one node is online: 2 online nodes are OK, 1 is WARNING and 0 is CRITICAL.
  • There are no resources with failures: one failure in any resource gives a WARNING and more are CRITICAL.
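The two rules above can be written down as plain shell logic, which is a handy reference when choosing the thresholds. This is a minimal bash sketch; the function names are hypothetical and assume a two-node cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers encoding the two monitoring goals for a
# two-node cluster directly, without Nagios range syntax.

# online_nodes_state COUNT: 2 -> OK, 1 -> WARNING, 0 -> CRITICAL
online_nodes_state() {
    local online=$1
    if (( online >= 2 )); then echo OK
    elif (( online == 1 )); then echo WARNING
    else echo CRITICAL
    fi
}

# resource_failures_state COUNT: 0 -> OK, 1 -> WARNING, more -> CRITICAL
resource_failures_state() {
    local failures=$1
    if (( failures == 0 )); then echo OK
    elif (( failures == 1 )); then echo WARNING
    else echo CRITICAL
    fi
}
```

The check_snmp thresholds in the following sections express the same rules in Nagios range syntax.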

Online Nodes

The pacemaker SNMP agent provides the sys4PcmkOnlineNodes OID, which holds the number of nodes in the online state. The nagios check is:

$ check_snmp -H <node> -C public -o sys4PcmkOnlineNodes.0 -w 2: -c 1:
SNMP OK - 2 | PACEMAKER-MIB::sys4PcmkOnlineNodes.0=2

or, if one node is in standby or offline:

SNMP WARNING - *1* | PACEMAKER-MIB::sys4PcmkOnlineNodes.0=1
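The plugin output has two parts: the status text for the admin and, after the `|`, performance data of the form `label=value` (optionally followed by `;warn;crit;min;max`), which nagios can store for graphing. If you want to reuse the result in a script outside nagios, a bash sketch like the following extracts the state and the measured value. `parse_plugin_output` is a hypothetical helper that assumes exactly one performance data item and the `SNMP <STATE>` prefix shown above.

```shell
#!/usr/bin/env bash
# parse_plugin_output LINE
# Splits one line of check_snmp output into the state word and the
# measured value taken from the performance data (after the "|").
parse_plugin_output() {
    local line=$1
    local text="${line%%"|"*}"        # human-readable part
    local perfdata="${line#*"|"}"     # performance data part
    local state="${text#"SNMP "}"     # drop the "SNMP " prefix
    state="${state%%" "*}"            # first word: OK, WARNING, ...
    local value="${perfdata#*=}"      # text after "label="
    value="${value%%;*}"              # strip optional ;warn;crit;min;max
    echo "$state $value"
}

parse_plugin_output 'SNMP WARNING - *1* | PACEMAKER-MIB::sys4PcmkOnlineNodes.0=1'
# prints: WARNING 1
```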

Resource Failures

During normal operation, resources in a cluster should not have any failures. Any non-zero failcount in the cluster is a sign of problems that the admins have to take care of. So the total number of failures in the cluster, sys4PcmkResourceFailures, makes a perfect target for monitoring. A plain threshold N in check_snmp alerts when the value is greater than N, so -w 0 -c 1 gives a WARNING for one failure and a CRITICAL for more. The check_snmp syntax is:

$ check_snmp -H <node> -C public -o sys4PcmkResourceFailures.0 -w 0 -c 1
SNMP OK - 0 | PACEMAKER-MIB::sys4PcmkResourceFailures.0=0

or in case of any errors:

SNMP CRITICAL - *4* | PACEMAKER-MIB::sys4PcmkResourceFailures.0=4

Please mail me (ms@sys4.de) for any further questions.
