icinga2/doc/07-agent-based-monitoring.md

388 lines
12 KiB
Markdown

# Additional Agent-based Checks <a id="agent-based-checks-addon"></a>
If the remote services are not directly accessible through the network, a
local agent installation exposing the results to check queries can
become handy.
## SNMP <a id="agent-based-checks-snmp"></a>
The SNMP daemon runs on the remote system and answers SNMP queries by plugin
binaries. The [Monitoring Plugins package](02-getting-started.md#setting-up-check-plugins) ships
the `check_snmp` plugin binary, but there are plenty of [existing plugins](05-service-monitoring.md#service-monitoring-plugins)
for specific use cases already around, for example monitoring Cisco routers.
The following example uses the [SNMP ITL](10-icinga-template-library.md#plugin-check-command-snmp) `CheckCommand` and just
overrides the `snmp_oid` custom attribute. A service is created for all hosts which
have the `snmp-community` custom attribute.
```
apply Service "uptime" {
import "generic-service"
check_command = "snmp"
vars.snmp_oid = "1.3.6.1.2.1.1.3.0"
vars.snmp_miblist = "DISMAN-EVENT-MIB"
assign where host.vars.snmp_community != ""
}
```
Additional SNMP plugins are available using the [Manubulon SNMP Plugins](10-icinga-template-library.md#snmp-manubulon-plugin-check-commands).
If no `snmp_miblist` is specified, the plugin will default to `ALL`. As the number of available MIB files
on the system increases so will the load generated by this plugin if no `MIB` is specified.
As such, it is recommended to always specify at least one `MIB`.
## SSH <a id="agent-based-checks-ssh"></a>
Calling a plugin using the SSH protocol to execute a plugin on the remote server fetching
its return code and output. The `by_ssh` command object is part of the built-in templates and
requires the `check_by_ssh` check plugin which is available in the [Monitoring Plugins package](02-getting-started.md#setting-up-check-plugins).
```
object CheckCommand "by_ssh_swap" {
import "by_ssh"
vars.by_ssh_command = "/usr/lib/nagios/plugins/check_swap -w $by_ssh_swap_warn$ -c $by_ssh_swap_crit$"
vars.by_ssh_swap_warn = "75%"
vars.by_ssh_swap_crit = "50%"
}
object Service "swap" {
import "generic-service"
host_name = "remote-ssh-host"
check_command = "by_ssh_swap"
vars.by_ssh_logname = "icinga"
}
```
## NSClient++ <a id="agent-based-checks-nsclient"></a>
[NSClient++](https://nsclient.org/) works on both Windows and Linux platforms and is well
known for its magnificent Windows support. There are alternatives like the WMI interface,
but using `NSClient++` will allow you to run local scripts similar to check plugins fetching
the required output and performance counters.
You can use the `check_nt` plugin from the Monitoring Plugins project to query NSClient++.
Icinga 2 provides the [nscp check command](10-icinga-template-library.md#plugin-check-command-nscp) for this:
Example:
```
object Service "disk" {
import "generic-service"
host_name = "remote-windows-host"
check_command = "nscp"
vars.nscp_variable = "USEDDISKSPACE"
vars.nscp_params = "c"
vars.nscp_warn = 70
vars.nscp_crit = 80
}
```
For details on the `NSClient++` configuration please refer to the [official documentation](https://docs.nsclient.org/).
## NSCA-NG <a id="agent-based-checks-nsca-ng"></a>
[NSCA-ng](http://www.nsca-ng.org) provides a client-server pair that allows the
remote sender to push check results into the Icinga 2 `ExternalCommandListener`
feature.
> **Note**
>
> This addon works in a similar fashion like the Icinga 1.x distributed model. If you
> are looking for a real distributed architecture with Icinga 2, scroll down.
## NRPE <a id="agent-based-checks-nrpe"></a>
[NRPE](https://docs.icinga.com/latest/en/nrpe.html) runs as daemon on the remote client including
the required plugins and command definitions.
Icinga 2 calls the `check_nrpe` plugin binary in order to query the configured command on the
remote client.
> **Note**
>
> The NRPE protocol is considered insecure and has multiple flaws in its
> design. Upstream is not willing to fix these issues.
>
> In order to stay safe, please use the native [Icinga 2 client](06-distributed-monitoring.md#distributed-monitoring)
> instead.
The NRPE daemon uses its own configuration format in nrpe.cfg while `check_nrpe`
can be embedded into the Icinga 2 `CheckCommand` configuration syntax.
You can use the `check_nrpe` plugin from the NRPE project to query the NRPE daemon.
Icinga 2 provides the [nrpe check command](10-icinga-template-library.md#plugin-check-command-nrpe) for this:
Example:
```
object Service "users" {
import "generic-service"
host_name = "remote-nrpe-host"
check_command = "nrpe"
vars.nrpe_command = "check_users"
}
```
nrpe.cfg:
```
command[check_users]=/usr/local/icinga/libexec/check_users -w 5 -c 10
```
If you are planning to pass arguments to NRPE using the `-a`
command line parameter, make sure that your NRPE daemon has them
supported and enabled.
> **Note**
>
> Enabling command arguments in NRPE is considered harmful
> and exposes a security risk allowing attackers to execute
> commands remotely. Details at [seclists.org](http://seclists.org/fulldisclosure/2014/Apr/240).
The plugin check command `nrpe` provides the `nrpe_arguments` custom
attribute which expects either a single value or an array of values.
Example:
```
object Service "nrpe-disk-/" {
import "generic-service"
host_name = "remote-nrpe-host"
check_command = "nrpe"
vars.nrpe_command = "check_disk"
vars.nrpe_arguments = [ "20%", "10%", "/" ]
}
```
Icinga 2 will execute the nrpe plugin like this:
```
/usr/lib/nagios/plugins/check_nrpe -H <remote-nrpe-host> -c 'check_disk' -a '20%' '10%' '/'
```
NRPE expects all additional arguments in an ordered fashion
and interprets the first value as `$ARG1$` macro, the second
value as `$ARG2$`, and so on.
nrpe.cfg:
```
command[check_disk]=/usr/local/icinga/libexec/check_disk -w $ARG1$ -c $ARG2$ -p $ARG3$
```
Using the above example with `nrpe_arguments` the command
executed by the NRPE daemon looks similar to that:
```
/usr/local/icinga/libexec/check_disk -w 20% -c 10% -p /
```
You can pass arguments in a similar manner to [NSClient++](07-agent-based-monitoring.md#agent-based-checks-nsclient)
when using its NRPE supported check method.
## Passive Check Results and SNMP Traps <a id="agent-based-checks-snmp-traps"></a>
SNMP Traps can be received and filtered by using [SNMPTT](http://snmptt.sourceforge.net/)
and specific trap handlers passing the check results to Icinga 2.
Following the SNMPTT [Format](http://snmptt.sourceforge.net/docs/snmptt.shtml#SNMPTT.CONF-FORMAT)
documentation and the Icinga external command syntax found [here](24-appendix.md#external-commands-list-detail)
we can create generic services that can accommodate any number of hosts for a given scenario.
### Simple SNMP Traps <a id="simple-traps"></a>
A simple example might be monitoring host reboots indicated by an SNMP agent reset.
Building the event to auto reset after dispatching a notification is important.
Setup the manual check parameters to reset the event from an initial unhandled
state or from a missed reset event.
Add a directive in `snmptt.conf`
```
EVENT coldStart .1.3.6.1.6.3.1.1.5.1 "Status Events" Normal
FORMAT Device reinitialized (coldStart)
EXEC echo "[$@] PROCESS_SERVICE_CHECK_RESULT;$A;Coldstart;2;The snmp agent has reinitialized." >> /var/run/icinga2/cmd/icinga2.cmd
SDESC
A coldStart trap signifies that the SNMPv2 entity, acting
in an agent role, is reinitializing itself and that its
configuration may have been altered.
EDESC
```
1. Define the `EVENT` as per your need.
2. Construct the `EXEC` statement with the service name matching your template
applied to your _n_ hosts. The host address inferred by SNMPTT will be the
correlating factor. You can have snmptt provide host names or ip addresses to
match your Icinga convention.
Add an `EventCommand` configuration object for the passive service auto reset event.
```
object EventCommand "coldstart-reset-event" {
command = [ ConfigDir + "/conf.d/custom/scripts/coldstart_reset_event.sh" ]
arguments = {
"-i" = "$service.state_id$"
"-n" = "$host.name$"
"-s" = "$service.name$"
}
}
```
Create the `coldstart_reset_event.sh` shell script to pass the expanded variable
data in. The `$service.state_id$` is important in order to prevent an endless loop
of event firing after the service has been reset.
```
#!/bin/bash
SERVICE_STATE_ID=""
HOST_NAME=""
SERVICE_NAME=""
show_help()
{
cat <<-EOF
Usage: ${0##*/} [-h] -n HOST_NAME -s SERVICE_NAME
Writes a coldstart reset event to the Icinga command pipe.
-h Display this help and exit.
-i SERVICE_STATE_ID The associated service state id.
-n HOST_NAME The associated host name.
-s SERVICE_NAME The associated service name.
EOF
}
while getopts "hi:n:s:" opt; do
case "$opt" in
h)
show_help
exit 0
;;
i)
SERVICE_STATE_ID=$OPTARG
;;
n)
HOST_NAME=$OPTARG
;;
s)
SERVICE_NAME=$OPTARG
;;
'?')
show_help
exit 0
;;
esac
done
if [ -z "$SERVICE_STATE_ID" ]; then
show_help
printf "\n Error: -i required.\n"
exit 1
fi
if [ -z "$HOST_NAME" ]; then
show_help
printf "\n Error: -n required.\n"
exit 1
fi
if [ -z "$SERVICE_NAME" ]; then
show_help
printf "\n Error: -s required.\n"
exit 1
fi
if [ "$SERVICE_STATE_ID" -gt 0 ]; then
echo "[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;$HOST_NAME;$SERVICE_NAME;0;Auto-reset (`date +"%m-%d-%Y %T"`)." >> /var/run/icinga2/cmd/icinga2.cmd
fi
```
Finally create the `Service` and assign it:
```
apply Service "Coldstart" {
import "generic-service-custom"
check_command = "dummy"
event_command = "coldstart-reset-event"
enable_notifications = 1
enable_active_checks = 0
enable_passive_checks = 1
enable_flapping = 0
volatile = 1
enable_perfdata = 0
vars.dummy_state = 0
vars.dummy_text = "Manual reset."
vars.sla = "24x7"
assign where (host.vars.os == "Linux" || host.vars.os == "Windows")
}
```
### Complex SNMP Traps <a id="complex-traps"></a>
A more complex example might be passing dynamic data from a traps varbind list
for a backup scenario where the backup software dispatches status updates. By
utilizing active and passive checks, the older freshness concept can be leveraged.
By defining the active check as a hard failed state, a missed backup can be reported.
As long as the most recent passive update has occurred, the active check is bypassed.
Add a directive in `snmptt.conf`
```
EVENT enterpriseSpecific <YOUR OID> "Status Events" Normal
FORMAT Enterprise specific trap
EXEC echo "[$@] PROCESS_SERVICE_CHECK_RESULT;$A;$1;$2;$3" >> /var/run/icinga2/cmd/icinga2.cmd
SDESC
An enterprise specific trap.
The varbinds in order denote the Icinga service name, state and text.
EDESC
```
1. Define the `EVENT` as per your need using your actual oid.
2. The service name, state and text are extracted from the first three varbinds.
This has the advantage of accommodating an unlimited set of use cases.
Create a `Service` for the specific use case associated to the host. If the host
matches and the first varbind value is `Backup`, SNMPTT will submit the corresponding
passive update with the state and text from the second and third varbind:
```
object Service "Backup" {
import "generic-service-custom"
host_name = "host.domain.com"
check_command = "dummy"
enable_notifications = 1
enable_active_checks = 1
enable_passive_checks = 1
enable_flapping = 0
volatile = 1
max_check_attempts = 1
check_interval = 87000
enable_perfdata = 0
vars.sla = "24x7"
vars.dummy_state = 2
vars.dummy_text = "No passive check result received."
}
```