# Additional Agent-based Checks <a id="agent-based-checks-addon"></a>

If the remote services are not directly accessible through the network, a
local agent installation exposing the results to check queries can
come in handy.

## SNMP <a id="agent-based-checks-snmp"></a>

The SNMP daemon runs on the remote system and answers SNMP queries sent by plugin
binaries. The [Monitoring Plugins package](02-getting-started.md#setting-up-check-plugins) ships
the `check_snmp` plugin binary, but there are plenty of [existing plugins](05-service-monitoring.md#service-monitoring-plugins)
for specific use cases already around, for example monitoring Cisco routers.

The following example uses the [SNMP ITL](10-icinga-template-library.md#plugin-check-command-snmp) `CheckCommand` and only
sets the `snmp_oid` and `snmp_miblist` custom attributes. A service is created for all hosts which
have the `snmp_community` custom attribute.

```
apply Service "uptime" {
  import "generic-service"

  check_command = "snmp"
  vars.snmp_oid = "1.3.6.1.2.1.1.3.0"
  vars.snmp_miblist = "DISMAN-EVENT-MIB"

  assign where host.vars.snmp_community != ""
}
```

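If such a service does not return what you expect, running the plugin by hand helps to narrow down the problem. The following bash sketch only prints a `check_snmp` command line that is roughly equivalent to the `uptime` service above. The function name, the plugin path and the default community `public` are assumptions. The option mapping (`-H` address, `-C` community, `-o` OID, `-m` MIB list) is also assumed. Compare it with the SNMP ITL reference and `check_snmp --help` on your system.

```shell
#!/usr/bin/env bash
# Hypothetical helper (name is illustrative): print, but do not run, a
# check_snmp command line similar to what the "uptime" service executes.
# Assumptions: plugins live in /usr/lib/nagios/plugins and the ITL "snmp"
# command maps snmp_address/-H, snmp_community/-C, snmp_oid/-o, snmp_miblist/-m.
snmp_uptime_cmd() {
  local address="$1"
  local community="${2:-public}"   # assumed default community string
  printf '%s -H %s -C %s -o %s -m %s\n' \
    /usr/lib/nagios/plugins/check_snmp "$address" "$community" \
    1.3.6.1.2.1.1.3.0 DISMAN-EVENT-MIB
}

# Print the command for one host; review it before running it by hand.
snmp_uptime_cmd 192.0.2.10
```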
Additional SNMP plugins are available using the [Manubulon SNMP Plugins](10-icinga-template-library.md#snmp-manubulon-plugin-check-commands).

If no `snmp_miblist` is specified, the plugin will default to `ALL`. As the number of available MIB files
on the system increases, so does the load generated by this plugin if no MIB is specified.
As such, it is recommended to always specify at least one MIB.

## SSH <a id="agent-based-checks-ssh"></a>

Another option is to execute a plugin on the remote server via SSH and fetch
its return code and output. The `by_ssh` command object is part of the built-in templates and
requires the `check_by_ssh` check plugin which is available in the [Monitoring Plugins package](02-getting-started.md#setting-up-check-plugins).

```
object CheckCommand "by_ssh_swap" {
  import "by_ssh"

  vars.by_ssh_command = "/usr/lib/nagios/plugins/check_swap -w $by_ssh_swap_warn$ -c $by_ssh_swap_crit$"
  vars.by_ssh_swap_warn = "75%"
  vars.by_ssh_swap_crit = "50%"
}

object Service "swap" {
  import "generic-service"

  host_name = "remote-ssh-host"

  check_command = "by_ssh_swap"

  vars.by_ssh_logname = "icinga"
}
```

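The SSH path can be tested the same way. Run `check_by_ssh` as the `icinga` user before wiring it into Icinga 2. The sketch below prints a command line roughly equivalent to the `swap` service. It assumes the `by_ssh` template passes the host address as `-H`, `by_ssh_logname` as `-l` and `by_ssh_command` as `-C`. The function name and plugin paths are illustrative.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a check_by_ssh command line similar to the one
# the "swap" service would run (assumed mapping: -H address,
# -l by_ssh_logname, -C by_ssh_command).
by_ssh_swap_cmd() {
  local address="$1"
  local warn="${2:-75%}"   # mirrors vars.by_ssh_swap_warn (free swap)
  local crit="${3:-50%}"   # mirrors vars.by_ssh_swap_crit (free swap)
  local remote="/usr/lib/nagios/plugins/check_swap -w $warn -c $crit"
  # Single quotes keep the remote command together as one -C argument
  # when the printed line is pasted into a shell.
  printf "/usr/lib/nagios/plugins/check_by_ssh -H %s -l icinga -C '%s'\n" \
    "$address" "$remote"
}

by_ssh_swap_cmd remote-ssh-host
```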
## NSClient++ <a id="agent-based-checks-nsclient"></a>

[NSClient++](https://nsclient.org/) works on both Windows and Linux platforms and is well
known for its Windows support. There are alternatives like the WMI interface,
but using `NSClient++` will allow you to run local scripts similar to check plugins fetching
the required output and performance counters.

You can use the `check_nt` plugin from the Monitoring Plugins project to query NSClient++.
Icinga 2 provides the [nscp check command](10-icinga-template-library.md#plugin-check-command-nscp) for this.

Example:

```
object Service "disk" {
  import "generic-service"

  host_name = "remote-windows-host"

  check_command = "nscp"

  vars.nscp_variable = "USEDDISKSPACE"
  vars.nscp_params = "c"
  vars.nscp_warn = 70
  vars.nscp_crit = 80
}
```

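A roughly equivalent manual `check_nt` call for the `disk` service can be printed with the following sketch. It assumes the `nscp` command maps `nscp_variable`, `nscp_params`, `nscp_warn` and `nscp_crit` to `-v`, `-l`, `-w` and `-c`. The function name and plugin path are hypothetical. The port (`-p`) and password (`-s`) options are left out, although your NSClient++ setup may require them.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a check_nt command line similar to the "disk"
# service. Port (-p) and password (-s) are intentionally omitted.
nscp_disk_cmd() {
  local address="$1"
  local drive="${2:-c}"    # mirrors vars.nscp_params
  local warn="${3:-70}"    # mirrors vars.nscp_warn (percent used)
  local crit="${4:-80}"    # mirrors vars.nscp_crit (percent used)
  printf '/usr/lib/nagios/plugins/check_nt -H %s -v USEDDISKSPACE -l %s -w %s -c %s\n' \
    "$address" "$drive" "$warn" "$crit"
}

nscp_disk_cmd remote-windows-host
```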
For details on the `NSClient++` configuration please refer to the [official documentation](https://docs.nsclient.org/).

## NSCA-NG <a id="agent-based-checks-nsca-ng"></a>

[NSCA-ng](http://www.nsca-ng.org) provides a client-server pair that allows the
remote sender to push check results into the Icinga 2 `ExternalCommandListener`
feature.

> **Note**
>
> This addon works similarly to the Icinga 1.x distributed model. If you
> are looking for a real distributed architecture with Icinga 2, refer to the
> [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter instead.

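On the sending side, check results are typically fed to the NSCA-ng client `send_nsca` on standard input. The sketch below formats a service check result as a tab-separated line: host name, service name, return code and plugin output. This is assumed to match the classic `send_nsca` input format; verify the delimiter and separator options in the `send_nsca` manual page. The function name is hypothetical, and the `send_nsca` call is left commented out.

```shell
#!/usr/bin/env bash
# Hypothetical helper: format one service check result as a tab-separated
# line (host, service, return code, plugin output). Assumed to match the
# classic send_nsca input format; check send_nsca(8) before relying on it.
nsca_service_result() {
  local host="$1" service="$2" code="$3" output="$4"
  case "$code" in
    0 | 1 | 2 | 3) ;;
    *) echo "invalid return code: $code" >&2; return 1 ;;
  esac
  printf '%s\t%s\t%s\t%s\n' "$host" "$service" "$code" "$output"
}

# Example (not executed here): pipe the line into the NSCA-ng client.
# nsca_service_result web01 backup 0 "Backup OK" | send_nsca
```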
## NRPE <a id="agent-based-checks-nrpe"></a>

[NRPE](https://docs.icinga.com/latest/en/nrpe.html) runs as a daemon on the remote client including
the required plugins and command definitions.
Icinga 2 calls the `check_nrpe` plugin binary in order to query the configured command on the
remote client.

> **Note**
>
> The NRPE protocol is considered insecure and has multiple flaws in its
> design. Upstream is not willing to fix these issues.
>
> In order to stay safe, please use the native [Icinga 2 client](06-distributed-monitoring.md#distributed-monitoring)
> instead.

The NRPE daemon uses its own configuration format in `nrpe.cfg` while `check_nrpe`
can be embedded into the Icinga 2 `CheckCommand` configuration syntax.

You can use the `check_nrpe` plugin from the NRPE project to query the NRPE daemon.
Icinga 2 provides the [nrpe check command](10-icinga-template-library.md#plugin-check-command-nrpe) for this.

Example:

```
object Service "users" {
  import "generic-service"

  host_name = "remote-nrpe-host"

  check_command = "nrpe"
  vars.nrpe_command = "check_users"
}
```

nrpe.cfg:

```
command[check_users]=/usr/local/icinga/libexec/check_users -w 5 -c 10
```

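On the agent side, NRPE resolves the requested command name against its `command[...]` definitions. You can emulate this lookup roughly in bash to check which command line a given `nrpe_command` maps to. The function is hypothetical. It reads a single file and ignores `include` and `include_dir` directives.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the command line that an nrpe.cfg-style file
# defines for NAME via command[NAME]=..., roughly as the NRPE daemon would
# look it up. include/include_dir directives are not followed.
nrpe_lookup() {
  local cfg="$1" name="$2" line
  while IFS= read -r line || [ -n "$line" ]; do
    case "$line" in
      "command[$name]="*)
        printf '%s\n' "${line#*=}"   # everything after the first '='
        return 0
        ;;
    esac
  done < "$cfg"
  echo "command '$name' not defined in $cfg" >&2
  return 1
}

# Example (the nrpe.cfg path varies by distribution):
# nrpe_lookup /etc/nagios/nrpe.cfg check_users
```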
If you are planning to pass arguments to NRPE using the `-a`
command line parameter, make sure that your NRPE daemon has them
supported and enabled: NRPE must be built with `--enable-command-args`
and `dont_blame_nrpe=1` must be set in `nrpe.cfg`.

> **Note**
>
> Enabling command arguments in NRPE is considered harmful
> and exposes a security risk allowing attackers to execute
> commands remotely. Details at [seclists.org](http://seclists.org/fulldisclosure/2014/Apr/240).

The plugin check command `nrpe` provides the `nrpe_arguments` custom
attribute which expects either a single value or an array of values.

Example:

```
object Service "nrpe-disk-/" {
  import "generic-service"

  host_name = "remote-nrpe-host"

  check_command = "nrpe"
  vars.nrpe_command = "check_disk"
  vars.nrpe_arguments = [ "20%", "10%", "/" ]
}
```

Icinga 2 will execute the `check_nrpe` plugin like this:

```
/usr/lib/nagios/plugins/check_nrpe -H <remote-nrpe-host> -c 'check_disk' -a '20%' '10%' '/'
```

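The following bash function approximates how a single value or an array in `nrpe_arguments` ends up behind `-a`. It only prints the command line and quotes each argument as in the example above. The function name is hypothetical, and the plugin path is taken from that example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a check_nrpe command line; any arguments after
# the command name are appended behind -a, each in single quotes.
nrpe_cmd() {
  local host="$1" command="$2"
  shift 2
  local out="/usr/lib/nagios/plugins/check_nrpe -H $host -c '$command'"
  local arg
  if [ "$#" -gt 0 ]; then
    out="$out -a"
    for arg in "$@"; do
      out="$out '$arg'"
    done
  fi
  printf '%s\n' "$out"
}

nrpe_cmd remote-nrpe-host check_disk "20%" "10%" "/"   # array of values
nrpe_cmd remote-nrpe-host check_users                  # no arguments
```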
NRPE expects all additional arguments in an ordered fashion
and interprets the first value as the `$ARG1$` macro, the second
value as `$ARG2$`, and so on.

nrpe.cfg:

```
command[check_disk]=/usr/local/icinga/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
```

Using the above example with `nrpe_arguments`, the command
executed by the NRPE daemon looks similar to this:

```
/usr/local/icinga/libexec/check_disk -w 20% -c 10% -p /
```

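A few lines of bash can emulate the positional substitution to predict the resulting command line. This is a rough sketch, not NRPE's implementation. The function name is hypothetical. It neither escapes values nor applies NRPE's own checks on argument characters.

```shell
#!/usr/bin/env bash
# Hypothetical helper: substitute positional values into $ARG1$, $ARG2$, ...
# of an NRPE command definition, similar to what the daemon does when
# command arguments are enabled. No escaping or character checks.
nrpe_expand() {
  local template="$1"
  shift
  local i=1 arg placeholder
  for arg in "$@"; do
    placeholder="\$ARG${i}\$"                  # e.g. $ARG1$
    template=${template//"$placeholder"/"$arg"}
    i=$((i + 1))
  done
  printf '%s\n' "$template"
}

nrpe_expand '/usr/local/icinga/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$' 20% 10% /
```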
You can pass arguments in a similar manner to [NSClient++](07-agent-based-monitoring.md#agent-based-checks-nsclient)
when using its NRPE supported check method.

## Passive Check Results and SNMP Traps <a id="agent-based-checks-snmp-traps"></a>

SNMP Traps can be received and filtered by using [SNMPTT](http://snmptt.sourceforge.net/)
and specific trap handlers passing the check results to Icinga 2.

Following the SNMPTT [Format](http://snmptt.sourceforge.net/docs/snmptt.shtml#SNMPTT.CONF-FORMAT)
documentation and the [Icinga external command syntax](24-appendix.md#external-commands-list-detail),
we can create generic services that can accommodate any number of hosts for a given scenario.

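Every trap handler in this section ultimately writes one external command line per result. The following bash sketch builds such a `PROCESS_SERVICE_CHECK_RESULT` line in the format used by the `EXEC` statements below. It rejects states other than 0 (OK), 1 (WARNING), 2 (CRITICAL) and 3 (UNKNOWN). The function name is hypothetical; the command pipe path is the one used throughout this chapter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build an external command line of the form
#   [timestamp] PROCESS_SERVICE_CHECK_RESULT;host;service;state;output
format_service_result() {
  local host="$1" service="$2" state="$3" output="$4"
  local ts="${5:-$(date +%s)}"   # optional fixed timestamp, defaults to now
  case "$state" in
    0 | 1 | 2 | 3) ;;
    *) echo "invalid state: $state" >&2; return 1 ;;
  esac
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
    "$ts" "$host" "$service" "$state" "$output"
}

# Example (not executed here): append a result to the command pipe.
# format_service_result router01 Coldstart 2 "The snmp agent has reinitialized." \
#   >> /var/run/icinga2/cmd/icinga2.cmd
```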
### Simple SNMP Traps <a id="simple-traps"></a>

A simple example might be monitoring host reboots indicated by an SNMP agent reset.
It is important that the event resets itself automatically after the notification
has been dispatched. Set up the manual check parameters so that the event can also
be reset from an initial unhandled state or after a missed reset event.

Add a directive in `snmptt.conf`:

```
EVENT coldStart .1.3.6.1.6.3.1.1.5.1 "Status Events" Normal
FORMAT Device reinitialized (coldStart)
EXEC echo "[$@] PROCESS_SERVICE_CHECK_RESULT;$A;Coldstart;2;The snmp agent has reinitialized." >> /var/run/icinga2/cmd/icinga2.cmd
SDESC
A coldStart trap signifies that the SNMPv2 entity, acting
in an agent role, is reinitializing itself and that its
configuration may have been altered.
EDESC
```

1. Define the `EVENT` as per your need.
2. Construct the `EXEC` statement with the service name matching your template
   applied to your _n_ hosts. The host address inferred by SNMPTT will be the
   correlating factor. You can have SNMPTT provide host names or IP addresses to
   match your Icinga convention.

Add an `EventCommand` configuration object for the passive service auto reset event.

```
object EventCommand "coldstart-reset-event" {
  command = [ SysconfDir + "/icinga2/conf.d/custom/scripts/coldstart_reset_event.sh" ]

  arguments = {
    "-i" = "$service.state_id$"
    "-n" = "$host.name$"
    "-s" = "$service.name$"
  }
}
```

Create the `coldstart_reset_event.sh` shell script to pass the expanded variable
data in. The `$service.state_id$` argument is important: the script only submits the
reset while the service is in a non-OK state, which prevents an endless loop
of event firing after the service has been reset.

```bash
#!/bin/bash
# Auto-resets the Coldstart service by submitting an OK result
# to the Icinga command pipe while the service is in a non-OK state.

SERVICE_STATE_ID=""
HOST_NAME=""
SERVICE_NAME=""

show_help()
{
  cat <<-EOF
Usage: ${0##*/} [-h] -i SERVICE_STATE_ID -n HOST_NAME -s SERVICE_NAME
Writes a coldstart reset event to the Icinga command pipe.

  -h                  Display this help and exit.
  -i SERVICE_STATE_ID The associated service state id.
  -n HOST_NAME        The associated host name.
  -s SERVICE_NAME     The associated service name.
EOF
}

while getopts "hi:n:s:" opt; do
  case "$opt" in
    h)
      show_help
      exit 0
      ;;
    i)
      SERVICE_STATE_ID=$OPTARG
      ;;
    n)
      HOST_NAME=$OPTARG
      ;;
    s)
      SERVICE_NAME=$OPTARG
      ;;
    '?')
      # Unknown option: show usage and signal an error.
      show_help
      exit 1
      ;;
  esac
done

if [ -z "$SERVICE_STATE_ID" ]; then
  show_help
  printf "\n Error: -i required.\n"
  exit 1
fi

if [ -z "$HOST_NAME" ]; then
  show_help
  printf "\n Error: -n required.\n"
  exit 1
fi

if [ -z "$SERVICE_NAME" ]; then
  show_help
  printf "\n Error: -s required.\n"
  exit 1
fi

# Only reset non-OK states. The OK result written below triggers the event
# command again with state id 0, which then does nothing (no endless loop).
if [ "$SERVICE_STATE_ID" -gt 0 ]; then
  echo "[$(date +%s)] PROCESS_SERVICE_CHECK_RESULT;$HOST_NAME;$SERVICE_NAME;0;Auto-reset ($(date +"%m-%d-%Y %T"))." >> /var/run/icinga2/cmd/icinga2.cmd
fi
```

Finally create the `Service` and assign it:

```
apply Service "Coldstart" {
  import "generic-service-custom"

  check_command = "dummy"
  event_command = "coldstart-reset-event"

  enable_notifications = true
  enable_active_checks = false
  enable_passive_checks = true
  enable_flapping = false
  volatile = true
  enable_perfdata = false

  vars.dummy_state = 0
  vars.dummy_text = "Manual reset."

  vars.sla = "24x7"

  assign where (host.vars.os == "Linux" || host.vars.os == "Windows")
}
```

### Complex SNMP Traps <a id="complex-traps"></a>

A more complex example might be passing dynamic data from a trap's varbind list
for a backup scenario where the backup software dispatches status updates. By
combining active and passive checks, the freshness concept known from older
Icinga versions can be leveraged.

By defining the active check to return a hard CRITICAL state (`vars.dummy_state = 2`
with `max_check_attempts = 1`), a missed backup can be reported. As long as passive
updates keep arriving within the check interval, the active check does not run.

Add a directive in `snmptt.conf`:

```
EVENT enterpriseSpecific <YOUR OID> "Status Events" Normal
FORMAT Enterprise specific trap
EXEC echo "[$@] PROCESS_SERVICE_CHECK_RESULT;$A;$1;$2;$3" >> /var/run/icinga2/cmd/icinga2.cmd
SDESC
An enterprise specific trap.
The varbinds in order denote the Icinga service name, state and text.
EDESC
```

1. Define the `EVENT` as per your need using your actual OID.
2. The service name, state and text are extracted from the first three varbinds.
   This has the advantage of accommodating an unlimited set of use cases.

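Because the `EXEC` line copies the varbind values verbatim, a malformed trap can produce an invalid external command. One option is to let `EXEC` call a small wrapper script that validates the values first. The following bash function sketches such a check. The function name and the exact rules are assumptions, not SNMPTT or Icinga 2 behavior: the service name must be non-empty and contain no `;`, the state must be 0 to 3, and newlines in the text are flattened to spaces.

```shell
#!/usr/bin/env bash
# Hypothetical validation step for the complex trap ($1 = service name,
# $2 = state, $3 = text in snmptt.conf). Prints the external command line
# on success, or an error on stderr and returns 1.
trap_varbinds_to_command() {
  local host="$1" service="$2" state="$3" text="$4"
  local ts="${5:-$(date +%s)}"   # optional fixed timestamp, defaults to now
  case "$service" in
    "" | *";"*) echo "rejected service name: '$service'" >&2; return 1 ;;
  esac
  case "$state" in
    0 | 1 | 2 | 3) ;;
    *) echo "rejected state: '$state'" >&2; return 1 ;;
  esac
  # A newline would end the external command early, so flatten newlines.
  text=$(printf '%s' "$text" | tr '\n' ' ')
  printf '[%s] PROCESS_SERVICE_CHECK_RESULT;%s;%s;%s;%s\n' \
    "$ts" "$host" "$service" "$state" "$text"
}

trap_varbinds_to_command host.domain.com Backup 0 "Backup finished"
```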
Create a `Service` for the specific use case associated with the host. If the host
matches and the first varbind value is `Backup`, SNMPTT will submit the corresponding
passive update with the state and text from the second and third varbind:

```
object Service "Backup" {
  import "generic-service-custom"

  host_name = "host.domain.com"
  check_command = "dummy"

  enable_notifications = true
  enable_active_checks = true
  enable_passive_checks = true
  enable_flapping = false
  volatile = true
  max_check_attempts = 1
  check_interval = 87000
  enable_perfdata = false

  vars.sla = "24x7"
  vars.dummy_state = 2
  vars.dummy_text = "No passive check result received."
}
```

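The interplay between passive updates and the active fallback check can be illustrated with a simplified model. The bash sketch below assumes that the active `dummy` check only runs once no passive result has arrived for `check_interval` seconds. At that point it reports CRITICAL as configured in `vars.dummy_state`. The sketch does not model Icinga 2's actual scheduler, and the function name is hypothetical. The interval of 87000 seconds is about 24 hours and 10 minutes, presumably to leave some slack for a daily backup job.

```shell
#!/usr/bin/env bash
# Simplified model (an assumption, not Icinga 2's scheduler) of the
# "Backup" service: print the state that would currently be reported.
#   $1 = epoch of the last passive result, $2 = current epoch,
#   $3 = check_interval in seconds, $4 = state of the last passive result
backup_expected_state() {
  local last_passive="$1" now="$2" interval="${3:-87000}" last_state="${4:-0}"
  if [ $((now - last_passive)) -ge "$interval" ]; then
    echo 2               # active dummy check runs: "No passive check result received."
  else
    echo "$last_state"   # the latest passive result is still fresh
  fi
}

backup_expected_state "$(( $(date +%s) - 3600 ))" "$(date +%s)"
```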