Sunday, August 17, 2008

Nagios redundant master server setup

A fail-over Nagios master may be necessary in case the primary system suffers a hardware failure.
The following describes a redundant Nagios master setup using heartbeat and Nagios' built-in mechanisms.

Both Nagios systems have to be set up identically in terms of monitored items.
Nagios runs as a process on both cluster nodes.

Once the basic setup is finished, the following is needed:

- both systems need to have passive checks enabled
- both systems need nsca running
- both systems will need the obsess_over_services option set to 1
- both systems will need an ocsp_command configured (submit_check_result)
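
The checklist above can be verified with a small helper. This is only a sketch: it assumes the settings appear in nagios.cfg exactly as written (no extra whitespace), and that passive checks are switched on through the standard accept_passive_service_checks directive. The function name is made up for illustration.

```shell
#!/bin/sh
# Sketch: report whether a nagios.cfg contains the settings listed above.
# check_nagios_cfg is a hypothetical helper name, not part of Nagios.

check_nagios_cfg() {
    cfg="$1"
    rc=0
    for setting in \
        "accept_passive_service_checks=1" \
        "obsess_over_services=1" \
        "ocsp_command=submit_check_result"
    do
        # -x: match the whole line, -F: treat the setting as a fixed string
        if grep -qxF "$setting" "$cfg"; then
            echo "ok:      $setting"
        else
            echo "missing: $setting"
            rc=1
        fi
    done
    return "$rc"
}
```

It would be called with the path to the main config file on each node, for example check_nagios_cfg /etc/nagios/nagios.cfg (the path is an assumption and differs between distributions).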

The command submit_check_result needs to be defined:

define command{
        command_name    submit_check_result
        command_line    /usr/lib/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
        }

The script referenced in the command definition needs to be put in place on the primary node:
submit_check_result
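
For readers without the script at hand, here is a minimal sketch of what such a helper typically looks like. It is modelled on the example in the Nagios distributed monitoring documentation and is not necessarily the script used in this setup. The send_nsca path, its config file and the standby host name (nodeB) are assumptions.

```shell
#!/bin/sh
# Sketch of a submit_check_result helper that forwards a service check
# result to the NSCA daemon on the standby node.
# Assumptions: send_nsca location, its config file, the standby host name.

SEND_NSCA="${SEND_NSCA:-/usr/sbin/send_nsca}"        # assumed path
SEND_NSCA_CFG="${SEND_NSCA_CFG:-/etc/send_nsca.cfg}" # assumed path
STANDBY_HOST="${STANDBY_HOST:-nodeB}"                # hypothetical host name

# Map the $SERVICESTATE$ string to the numeric return code NSCA expects.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN and anything unexpected
    esac
}

# Build one tab-separated NSCA line: host, service, return code, output.
format_nsca_line() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Pipe the formatted line to send_nsca, addressed at the standby node.
submit_check_result() {
    format_nsca_line "$1" "$2" "$3" "$4" |
        "$SEND_NSCA" -H "$STANDBY_HOST" -c "$SEND_NSCA_CFG"
}
```

Saved as a standalone script, it would end with a call such as submit_check_result "$@" so that the four arguments from the command definition are passed through.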

Next we need an additional start/stop script. Its name has to match the resource name used in the heartbeat configuration further down:
 /etc/init.d/nagios_notifications

#!/bin/bash

case "$1" in
        "start")
                echo "Starting notifications"
                /usr/lib/nagios/libexec/eventhandlers/enable_notifications
                echo "Done"
                exit 0
        ;;
        "stop")
                echo "Stopping notifications"
                /usr/lib/nagios/libexec/eventhandlers/disable_notifications
                echo "Done"
                exit 0
        ;;
        *)
                echo "Usage: $0 [start|stop]"
                exit 1
        ;;
esac
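
The enable_notifications and disable_notifications helpers called above are not shown here. A minimal sketch of what they could do is to write the corresponding external command into the Nagios command file. The command file path is an assumption (check command_file in nagios.cfg), and check_external_commands has to be enabled for Nagios to read it.

```shell
#!/bin/sh
# Sketch of what enable_notifications / disable_notifications could do:
# write an external command into the Nagios command file.
# Assumption: command file location; see command_file in nagios.cfg.

COMMAND_FILE="${COMMAND_FILE:-/var/lib/nagios/rw/nagios.cmd}"  # assumed path

# Write one external command, prefixed with the current Unix timestamp.
nagios_cmd() {
    printf '[%s] %s\n' "$(date +%s)" "$1" >> "$COMMAND_FILE"
}

enable_notifications()  { nagios_cmd "ENABLE_NOTIFICATIONS"; }
disable_notifications() { nagios_cmd "DISABLE_NOTIFICATIONS"; }
```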



Now we can start with the heartbeat setup:

Put the following into /etc/ha.d/haresources on both cluster nodes:

<clustername> \
    IPaddr::<virt ip>/<cidr>/<iface>/<bcast> \
    nagios_notifications

Et voilà.

We now have a basic Nagios cluster in which node B receives the service check results processed on node A.
In case of a hardware failure on node A, node B takes over and has notifications enabled.

It would be best to also disable active checks on node B until fail-over.

I will add an update on this.
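
Until then, one possible approach is sketched below. It is an assumption on my part rather than a tested part of this setup: active check execution is toggled with the Nagios external commands START/STOP_EXECUTING_SVC_CHECKS and START/STOP_EXECUTING_HOST_CHECKS. The command file path and the function name active_checks are assumptions as well.

```shell
#!/bin/sh
# Sketch: toggle active check execution through the external command file.
# Assumption: command file location; see command_file in nagios.cfg.

COMMAND_FILE="${COMMAND_FILE:-/var/lib/nagios/rw/nagios.cmd}"  # assumed path

# active_checks start|stop  (hypothetical helper name)
active_checks() {
    now=$(date +%s)
    case "$1" in
        start) verb=START ;;
        stop)  verb=STOP ;;
        *)     echo "Usage: active_checks {start|stop}" >&2; return 1 ;;
    esac
    # One command each for service checks and host checks.
    {
        printf '[%s] %s_EXECUTING_SVC_CHECKS\n'  "$now" "$verb"
        printf '[%s] %s_EXECUTING_HOST_CHECKS\n' "$now" "$verb"
    } >> "$COMMAND_FILE"
}
```

Calling active_checks start from the "start" branch of the init script and active_checks stop from the "stop" branch would tie active checking to the heartbeat resource, so only the node currently holding the virtual IP runs checks itself.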


