Sunday, August 17, 2008

nagios redundant master servers setup

It may be necessary to have a fail-over nagios master in case the primary system suffers a hardware failure.
The following describes how to set up redundant nagios masters using heartbeat and nagios' own mechanisms.

Both nagios systems have to be set up identically in terms of monitored items.
Nagios runs as a process on both cluster nodes.

Once the basic setup is finished, the following is needed:

- both systems need to have passive checks enabled
- both systems need nsca running
- both systems will need the obsess_over_services option set to 1
- both systems will need an ocsp_command configured (submit_check_result)
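
As a rough cross-check of the list above, the following sketch greps a nagios main config for the matching directives. It assumes that "passive checks enabled" corresponds to accept_passive_service_checks=1 and that the config lives at /etc/nagios/nagios.cfg; both the path and the function names are only illustrative, so adjust them to your installation.

```shell
#!/bin/bash
# Sketch: check that a nagios main config contains the settings listed above.
# ASSUMPTION: the default path below is a guess; pass your own as $1.
NAGIOS_CFG="${1:-/etc/nagios/nagios.cfg}"

# Succeeds if "key=value" appears as an active (uncommented) line in the file.
has_setting() {
    local file="$1" key="$2" value="$3"
    grep -Eq "^[[:space:]]*${key}[[:space:]]*=[[:space:]]*${value}[[:space:]]*$" "$file"
}

# Prints one line per setting that is missing or set differently.
# Returns 0 only if all three settings are present.
check_failover_settings() {
    local file="$1" missing=0 pair
    for pair in accept_passive_service_checks=1 \
                obsess_over_services=1 \
                ocsp_command=submit_check_result; do
        if ! has_setting "$file" "${pair%%=*}" "${pair#*=}"; then
            echo "missing or different: $pair"
            missing=1
        fi
    done
    return "$missing"
}

# Only run the check when the config file is actually readable.
if [ -r "$NAGIOS_CFG" ]; then
    check_failover_settings "$NAGIOS_CFG"
fi
```

Run it on both nodes; no output means all three settings were found.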

The command submit_check_result needs to be configured:

define command{
        command_name    submit_check_result
        command_line    /usr/lib/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
        }

The mentioned script needs to be put into place on the primary node:
submit_check_result
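
The linked script is not reproduced here. As a minimal sketch of what such a helper usually does: it turns the state string into a numeric return code and pipes one tab-separated line to send_nsca. The address of the second node (192.0.2.10) is a placeholder, the send_nsca paths are assumptions, and the function names are made up for illustration.

```shell
#!/bin/bash
# Sketch of a submit_check_result helper (not the original script).
# Arguments: $1 = host name, $2 = service description,
#            $3 = state string, $4 = plugin output
# ASSUMPTIONS: the three defaults below are placeholders for your setup.
NSCA_HOST="${NSCA_HOST:-192.0.2.10}"
SEND_NSCA="${SEND_NSCA:-/usr/sbin/send_nsca}"
SEND_NSCA_CFG="${SEND_NSCA_CFG:-/etc/send_nsca.cfg}"

# Map the nagios state string to the numeric plugin return code.
state_to_code() {
    case "$1" in
        OK)       echo 0 ;;
        WARNING)  echo 1 ;;
        CRITICAL) echo 2 ;;
        *)        echo 3 ;;   # UNKNOWN and anything unexpected
    esac
}

# Build the tab-separated line that send_nsca reads on stdin.
format_result() {
    printf '%s\t%s\t%s\t%s\n' "$1" "$2" "$(state_to_code "$3")" "$4"
}

# Submit only when called with the four arguments nagios passes in.
if [ "$#" -eq 4 ]; then
    format_result "$@" | "$SEND_NSCA" -H "$NSCA_HOST" -c "$SEND_NSCA_CFG"
fi
```

The argument order matches the command_line shown above: host name, service description, service state, service output.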

Next we need an additional start-stop script:
/etc/init.d/nagios_notification

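#!/bin/bash
# Resource script for heartbeat: toggles nagios notifications on fail-over.

case "$1" in
        "start")
                # this node takes over: let its nagios send notifications
                echo "Starting notifications"
                /usr/lib/nagios/libexec/eventhandlers/enable_notifications
                echo "Done"
                exit 0
        ;;
        "stop")
                # this node gives up the resource: keep its nagios silent
                echo "Stopping notifications"
                /usr/lib/nagios/libexec/eventhandlers/disable_notifications
                echo "Done"
                exit 0
        ;;
        *)
                # unknown argument: print usage and signal an error
                echo "Usage: $0 [start|stop]"
                exit 1
        ;;
esac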
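
The enable_notifications and disable_notifications helpers called above are not shown in this post. As a sketch of the idea behind them: they write an external command (ENABLE_NOTIFICATIONS or DISABLE_NOTIFICATIONS) into the nagios command file. The command file path below is an assumption; use the command_file value from your own nagios.cfg. The function names are hypothetical.

```shell
#!/bin/bash
# Sketch of what enable_notifications / disable_notifications boil down to.
# ASSUMPTION: the path below is a guess; take command_file from your nagios.cfg.
COMMAND_FILE="${COMMAND_FILE:-/var/lib/nagios/rw/nagios.cmd}"

# Format one external command line: "[<unix time>] <COMMAND>"
format_external_command() {
    printf '[%s] %s\n' "$2" "$1"
}

# Write the command, stamped with the current time, into the command file.
send_external_command() {
    format_external_command "$1" "$(date +%s)" > "$COMMAND_FILE"
}

# Does nothing unless called with "enable" or "disable".
case "${1:-}" in
    enable)  send_external_command ENABLE_NOTIFICATIONS ;;
    disable) send_external_command DISABLE_NOTIFICATIONS ;;
esac
```

External commands have to be enabled in nagios (check_external_commands=1) for this to have any effect.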



Now we can start with the heartbeat setup:

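Put the following into /etc/ha.d/haresources on both cluster nodes (the first field is the node name of the preferred primary, and the last entry has to match the name of the start-stop script):

<primary node name> \
    IPaddr::<virt ip>/<cidr>/<iface>/<bcast> \
    nagios_notification

Et voilà.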

We now have a basic nagios cluster where node B is informed about updates that are done on node A.
In case of hardware failure node B will take over and have notifications enabled.

Ideally, active checks on node B should also be disabled until fail-over.

I will add an update on this.




Saturday, August 16, 2008

example for munin-nagios integration

Let's assume that you have a USB temperature sensor connected to a machine in your server room.
Active checks would make no sense here, since most of the time the temperature should be OK.
But you want to get notified in case of cooling problems and overheating.




First you need a plugin for munin:
serverroom
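
The linked plugin is not reproduced here. As a rough idea of the shape such a plugin takes: munin calls it with "config" to get the graph description and thresholds, and without arguments to get the current value. The sketch below assumes the sensor reading is available as a plain number in a file; that file path, the thresholds and the function names are all hypothetical. The graph title is set to Serverroom on the assumption that it should match the nagios service description used later.

```shell
#!/bin/bash
# Sketch of a munin plugin named "serverroom" (not the original plugin).
# ASSUMPTION: the sensor value can be read as a plain number from
# SENSOR_FILE; replace read_temperature with whatever your sensor needs.
SENSOR_FILE="${SENSOR_FILE:-/var/run/serverroom.temp}"

# Print the current reading, or "U" (munin's "unknown") if unavailable.
read_temperature() {
    if [ -r "$SENSOR_FILE" ]; then
        cat "$SENSOR_FILE"
    else
        echo U
    fi
}

# Graph description plus warning and critical thresholds (example values).
print_config() {
    echo "graph_title Serverroom"
    echo "graph_vlabel degrees Celsius"
    echo "graph_category sensors"
    echo "temperature.label temperature"
    echo "temperature.warning 28"
    echo "temperature.critical 32"
}

# Current value in the "<field>.value <number>" form munin expects.
print_values() {
    echo "temperature.value $(read_temperature)"
}

case "${1:-}" in
    config) print_config ;;
    *)      print_values ;;
esac
```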



Verify that this plugin is working prior to doing anything else!
Also adapt the warn and crit values to your needs.

Now you need nsca installed on your munin master node.

The next step is a proper munin master configuration for the system that has the USB thermometer connected. We assume that the system is named intranet and that you have configured munin to make use of domains; intranet is located in the domain intern.

contact.nagios.command /usr/sbin/send_nsca -H <nagios server IP> -c /etc/send_nsca.cfg
[intranet.intern]
        notify_alias intranet
        address <your systems IP>
        use_node_name yes

If you omit the "notify_alias" line, all alarms will be sent under the system name with the domain appended (intranet.intern).
With notify_alias you can make sure that nagios receives the alarm under the host name it knows.

Now the nagios system needs to be configured.
Make sure you have nsca installed and running.
Then enable passive checks in nagios.cfg and create a service for the temperature alarm:

define service{
        use                             passive-service
        hostgroup_name                  temperature-servers
        service_description             Serverroom
        check_command                   return-ok
        }



Friday, August 1, 2008

combining nagios and munin

Munin offers a nice way to collect and display system information, so it can easily be used as a monitoring system. It uses simple plugins (shell or Perl code) to gather data from systems.
Nagios is well known as an alerting system.

Munin lets you define warning and critical values. Used on its own, munin highlights the links of items that exceed their warning or critical values.

Additionally munin offers the possibility to make use of nagios passive checks via nsca.



nsca is a nagios add-on.

First, the munin master has to be configured to use the send_nsca command.
Second, the nagios master has to run the nsca daemon (either via inetd or as a standalone daemon).

Since the nsca documentation is very brief, the munin developers wrote their own documentation on how to combine munin and nagios.

The advantage of this setup is that you are informed about state changes immediately.