Service Monitoring
The power of Icinga 2 lies in its modularity. There are thousands of community plugins available next to the standard plugins provided by the Monitoring Plugins project.
Requirements
Plugins
All existing Icinga or Nagios plugins work with Icinga 2. Community plugins can be found for example on Icinga Exchange.
The recommended way of setting up these plugins is to copy them into the PluginDir directory.
If you have plugins with many dependencies, consider creating a custom RPM/DEB package which handles the required libraries and binaries.
Configuration management tools such as Puppet, Ansible, Chef or Saltstack also help with automatically installing the plugins on different operating systems. They can also help with installing the required dependencies, e.g. Python libraries, Perl modules, etc.
Plugin Setup
Good plugins provide installation and configuration instructions in their docs and/or README on GitHub.
Sometimes dependencies are not listed, or your distribution differs from the one described. Try running the plugin after setup and ensure it works.
Ensure it works
Prior to using the check plugin with Icinga 2 you should ensure that it is working properly by trying to run it on the console using whichever user Icinga 2 is running as:
RHEL/CentOS/Fedora
sudo -u icinga /usr/lib64/nagios/plugins/check_mysql_health --help
Debian/Ubuntu
sudo -u nagios /usr/lib/nagios/plugins/check_mysql_health --help
Additional libraries may be required for some plugins. Please consult the plugin documentation and/or the included README file for installation instructions. Sometimes plugins contain hard-coded paths to other components. Instead of changing the plugin it might be easier to create a symbolic link to make sure it doesn't get overwritten during the next update.
Sometimes there are plugins which do not exactly fit your requirements. In that case you can modify an existing plugin or just write your own.
Plugin Dependency Errors
Plugins can be scripts (Shell, Python, Perl, Ruby, PHP, etc.) or compiled binaries (C, C++, Go).
These scripts/binaries may require additional libraries which must be installed on every system where they are executed.
Tip
Don't test the plugins on your master instance. Instead, test them on the satellites and clients which execute the checks.
There are errors, now what? Typical errors are missing libraries, binaries or packages.
Python Example
Example for a Python plugin which uses the tinkerforge module to query a network service:
ImportError: No module named tinkerforge.ip_connection
Its documentation points to installing the tinkerforge Python module.
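A quick way to spot this kind of error before Icinga executes the plugin is to probe for the required modules as the user Icinga 2 runs as. The following is a minimal sketch, not part of any plugin: the helper name `missing_modules` is hypothetical, and the list of modules has to come from the plugin's own documentation.

```python
import importlib.util


def missing_modules(names):
    """Return the entries of `names` that cannot be imported on this system."""
    missing = []
    for name in names:
        try:
            spec = importlib.util.find_spec(name)
        except (ImportError, ValueError):
            # Raised for dotted names when the parent package is absent.
            spec = None
        if spec is None:
            missing.append(name)
    return missing


# 'tinkerforge' is the module named in the error message above.
for name in missing_modules(["tinkerforge"]):
    print("missing Python module: %s" % name)
```

Run it on the satellite or client that executes the check, since that is where the module must be installed.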
Perl Example
Example for a Perl plugin which uses SNMP:
Can't locate Net/SNMP.pm in @INC (you may need to install the Net::SNMP module)
Prior to installing the Perl module via CPAN, look for a distribution specific package, e.g. libnet-snmp-perl on Debian/Ubuntu or perl-Net-SNMP on RHEL/CentOS.
Optional: Custom Path
If you are not using the default PluginDir directory, you can create a custom plugin directory and constant and reference this in the created CheckCommand objects.
Create a common directory e.g. /opt/monitoring/plugins and install the plugin there.
mkdir -p /opt/monitoring/plugins
cp check_snmp_int.pl /opt/monitoring/plugins
chmod +x /opt/monitoring/plugins/check_snmp_int.pl
Next create a new global constant, e.g. CustomPluginDir in your constants.conf configuration file:
vim /etc/icinga2/constants.conf
const PluginDir = "/usr/lib/nagios/plugins"
const CustomPluginDir = "/opt/monitoring/plugins"
CheckCommand Definition
Each plugin requires a CheckCommand object in your configuration which can be used in the Service or Host object definition.
Please check if the Icinga 2 package already provides an existing CheckCommand definition.
If that's the case, thoroughly check the required parameters and integrate the check command into your host and service objects. Best practice is to run the plugin on the CLI with the required parameters first.
Example for database size checks with check_mysql_health.
/usr/lib64/nagios/plugins/check_mysql_health --hostname '127.0.0.1' --username root --password icingar0xx --mode sql --name 'select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '\''icinga'\'';' '--name2' 'db_size' --units 'MB' --warning 4096 --critical 8192
The parameter names inside the ITL commands follow the <command name>_<parameter name> schema.
Icinga Director
Navigate into Commands > External Commands and search for mysql_health.
Select mysql_health and navigate into the Fields tab.
In order to access the parameters, the Director requires you to first define the needed custom data fields:
- mysql_health_hostname
- mysql_health_username and mysql_health_password
- mysql_health_mode
- mysql_health_name, mysql_health_name2 and mysql_health_units
- mysql_health_warning and mysql_health_critical
Create a new host template and object where you'll put generic settings like mysql_health_hostname (if it differs from the host's address attribute), mysql_health_username and mysql_health_password.
Create a new service template for mysql-health and set mysql_health as check command. You can also define a default for mysql_health_mode.
Next, create a service apply rule or a new service set which gets assigned to matching host objects.
Icinga Config Files
Create or modify a host object which stores the generic database defaults and prepares details for a service apply for rule.
object Host "icinga2-master1.localdomain" {
  check_command = "hostalive"
  address = "..."
  // Database listens locally, not external
  vars.mysql_health_hostname = "127.0.0.1"
  // Basic database size checks for Icinga DBs
  vars.databases["icinga"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
  vars.databases["icingaweb2"] = {
    mysql_health_warning = 4096 //MB
    mysql_health_critical = 8192 //MB
  }
}
The host object already prepares the database details and thresholds for an advanced apply for rule. The rule uses conditions to fetch host-specific values, or to set default values.
apply Service "db-size-" for (db_name => config in host.vars.databases) {
  check_interval = 1m
  retry_interval = 30s
  check_command = "mysql_health"
  if (config.mysql_health_username) {
    vars.mysql_health_username = config.mysql_health_username
  } else {
    vars.mysql_health_username = "root"
  }
  if (config.mysql_health_password) {
    vars.mysql_health_password = config.mysql_health_password
  } else {
    vars.mysql_health_password = "icingar0xx"
  }
  vars.mysql_health_mode = "sql"
  vars.mysql_health_name = "select sum(data_length + index_length) / 1024 / 1024 from information_schema.tables where table_schema = '" + db_name + "';"
  vars.mysql_health_name2 = "db_size"
  vars.mysql_health_units = "MB"
  if (config.mysql_health_warning) {
    vars.mysql_health_warning = config.mysql_health_warning
  }
  if (config.mysql_health_critical) {
    vars.mysql_health_critical = config.mysql_health_critical
  }
  vars += config
}
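To make the effect of the apply for rule easier to follow, here is a small Python model of the expansion. It is only an illustration under simplifying assumptions, not how Icinga 2 evaluates its configuration: the function name `expand_db_services` and the `DEFAULTS` dictionary are hypothetical, and the password default is left out for brevity (it is handled the same way as the username).

```python
QUERY = (
    "select sum(data_length + index_length) / 1024 / 1024 "
    "from information_schema.tables where table_schema = '%s';"
)

# Defaults the rule sets when the host does not provide a value.
DEFAULTS = {
    "mysql_health_username": "root",
    "mysql_health_mode": "sql",
    "mysql_health_name2": "db_size",
    "mysql_health_units": "MB",
}


def expand_db_services(host_vars):
    """Return one service (name -> custom variables) per database entry."""
    services = {}
    for db_name, config in host_vars.get("databases", {}).items():
        service_vars = dict(DEFAULTS)
        service_vars["mysql_health_name"] = QUERY % db_name
        service_vars.update(config)  # mirrors 'vars += config'
        services["db-size-" + db_name] = service_vars
    return services


HOST_VARS = {
    "mysql_health_hostname": "127.0.0.1",
    "databases": {
        "icinga": {"mysql_health_warning": 4096, "mysql_health_critical": 8192},
        "icingaweb2": {"mysql_health_warning": 4096, "mysql_health_critical": 8192},
    },
}

for name, service_vars in expand_db_services(HOST_VARS).items():
    print(name, service_vars["mysql_health_warning"], service_vars["mysql_health_critical"])
```

For the host above, the model yields the services db-size-icinga and db-size-icingaweb2, each carrying its own thresholds and query.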
New CheckCommand
This chapter describes how to add a new CheckCommand object for a plugin.
Please make sure to follow these conventions when adding a new command object definition:
- Use command arguments whenever possible. The command attribute must be an array in [ ... ] for shell escaping.
- Define a unique prefix for the command's specific arguments. Best practice is to follow this schema: <command name>_<parameter name>. That way you can safely set them on host/service level and you'll always know which command they control.
- Use command argument default values, e.g. for thresholds.
- Use advanced conditions like set_if definitions.
Before starting with the CheckCommand definition, please check the existing objects available inside the ITL. They follow best practices and are maintained by developers and our community.
This example picks a new plugin called check_systemd uploaded to Icinga Exchange in June 2019.
First, install the plugin and ensure that it works. Then run it with the --help parameter to see the actual parameters (docs might be outdated).
./check_systemd.py --help
usage: check_systemd.py [-h] [-c SECONDS] [-e UNIT | -u UNIT] [-v] [-V]
[-w SECONDS]
...
optional arguments:
-h, --help show this help message and exit
-c SECONDS, --critical SECONDS
Startup time in seconds to result in critical status.
-e UNIT, --exclude UNIT
Exclude a systemd unit from the checks. This option
can be applied multiple times. For example: -e mnt-
data.mount -e task.service.
-u UNIT, --unit UNIT Name of the systemd unit that is beeing tested.
-v, --verbose Increase output verbosity (use up to 3 times).
-V, --version show program's version number and exit
-w SECONDS, --warning SECONDS
Startup time in seconds to result in warning status.
The argument description is important: you create the command arguments based on it.
Tip
When you are using the Director, you can prepare the commands as files e.g. inside the global-templates zone. Then run the kickstart wizard again to import the commands as external reference. If you prefer to use the Director GUI/CLI, please apply the steps in the Add Command form.
Start with the basic plugin call without any parameters.
object CheckCommand "systemd" { // Plugin name without 'check_' prefix
  command = [ PluginContribDir + "/check_systemd.py" ] // Use the 'PluginContribDir' constant, see the contributed ITL commands
}
Run a config validation to see if that works: icinga2 daemon -C
Next, analyse the plugin parameters. Plugins with a good help output show optional parameters in square brackets. This is the case for all parameters for this plugin. If there are required parameters, use the required key inside the argument.
The arguments attribute is a dictionary which takes the parameters as keys.
arguments = {
"--unit" = { ... }
}
If there are long parameter names available, prefer them. This increases readability in both the configuration and the executed command line.
The argument value itself is a sub dictionary which has additional keys:
- value which references the runtime macro string
- description where you copy the plugin parameter help text into
- required, set_if, etc. for advanced parameters, check the CheckCommand object chapter.
The runtime macro syntax is required to allow value extraction when the command is executed.
Tip
Inside the Director, store the new command first in order to unveil the Arguments tab.
Best practice is to use the command name as prefix, in this specific case e.g. systemd_unit.
arguments = {
  "--unit" = {
    value = "$systemd_unit$" // The service parameter would then be defined as 'vars.systemd_unit = "icinga2"'
    description = "Name of the systemd unit that is beeing tested."
  }
  "--warning" = {
    value = "$systemd_warning$"
    description = "Startup time in seconds to result in warning status."
  }
  "--critical" = {
    value = "$systemd_critical$"
    description = "Startup time in seconds to result in critical status."
  }
}
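The following Python sketch models how such an arguments dictionary turns into a command line once the runtime macros are resolved. It is a simplified illustration, not Icinga 2's actual resolver: it only handles values that consist of exactly one macro, it assumes that optional arguments without a value are left out and that missing required arguments are an error, and the names `build_command_line`, `COMMAND` and `ARGUMENTS` as well as the plugin path are hypothetical.

```python
import re

# Matches a value made of exactly one runtime macro, e.g. "$systemd_unit$".
MACRO = re.compile(r"\$([A-Za-z0-9_]+)\$")


def build_command_line(command, arguments, custom_vars):
    """Resolve macro values and append only the arguments that have a value."""
    argv = list(command)
    for flag, spec in arguments.items():
        match = MACRO.fullmatch(spec["value"])
        value = custom_vars.get(match.group(1)) if match else spec["value"]
        if value is None:
            if spec.get("required"):
                raise ValueError("required argument %s has no value" % flag)
            continue  # optional argument without a value is left out
        argv.extend([flag, str(value)])
    return argv


# Hypothetical plugin path; the real one is built from PluginContribDir.
COMMAND = ["/path/to/check_systemd.py"]
ARGUMENTS = {
    "--unit": {"value": "$systemd_unit$"},
    "--warning": {"value": "$systemd_warning$"},
    "--critical": {"value": "$systemd_critical$"},
}

print(build_command_line(COMMAND, ARGUMENTS, {"systemd_unit": "icinga2"}))
```

With only systemd_unit set, the resulting command line contains the plugin path followed by --unit icinga2. The threshold arguments appear as soon as systemd_warning or systemd_critical are defined on the host or service.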
This may take a while -- validate the configuration in between up until the CheckCommand definition is done.
Then test and integrate it into your monitoring configuration.
Remember: Do it once and right, and never touch the CheckCommand again. Optional arguments allow different use cases and scenarios.
Once you have created your really good CheckCommand, please consider sharing it with our community by creating a new PR on GitHub. Please also update the documentation for the ITL.
Tip
Inside the Director, you can render the configuration in the Deployment section. Extract the static configuration object and use that as a source for sending it upstream.
Modify Existing CheckCommand
Sometimes an existing CheckCommand inside the ITL is missing a parameter, or it sets a default parameter value that you don't need.
Instead of copying the entire configuration object, you can import an object into another new object.
object CheckCommand "http-custom" {
  import "http" // Import existing http object
  arguments += { // Use additive assignment to add missing parameters
    "--key" = {
      value = "$http_..." // Keep the parameter name the same as with http
    }
  }
  // Override default parameters
  vars.http_address = "..."
}
This CheckCommand can then be referenced in your host/service object definitions.
Plugin API
Icinga 2 supports the native plugin API specification from the Monitoring Plugins project. It is defined in the Monitoring Plugins Development Guidelines.
Output
<STATUS>: <A short description what happened>
OK: MySQL connection time is fine (0.0002s)
WARNING: MySQL connection time is slow (0.5s > 0.1s threshold)
CRITICAL: MySQL connection time is causing degraded performance (3s > 0.5s threshold)
Icinga supports multi-line plugin output. Icinga Web shows only the first line in the listings and the full output in the detail view.
Example for an end2end check with many smaller test cases integrated:
OK: Online banking works.
Testcase 1: Site reached.
Testcase 2: Attempted login, JS loads.
Testcase 3: Login succeeded.
Testcase 4: View current state works.
Testcase 5: Transactions fine.
If the extended output shouldn't be visible in your monitoring, but only for testing, it is recommended to implement the -v or --verbose plugin parameter to allow developers and users to debug further.
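One possible way to implement this in a Python plugin is sketched below: the summary line is always printed, and the per-testcase lines are only appended when -v is given. The function names `build_parser`, `render_output` and `main` are hypothetical, and the test case texts are taken from the example above.

```python
import argparse


def build_parser():
    """Create the argument parser with a repeatable -v/--verbose flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-v", "--verbose", action="count", default=0,
                        help="Increase output verbosity")
    return parser


def render_output(summary, details, verbosity):
    """Return the summary line, followed by the details only when verbose."""
    lines = [summary]
    if verbosity > 0:
        lines.extend(details)
    return "\n".join(lines)


def main(argv):
    """Print the plugin output for the given arguments and return the state."""
    args = build_parser().parse_args(argv)
    details = ["Testcase 1: Site reached.", "Testcase 2: Attempted login, JS loads."]
    print(render_output("OK: Online banking works.", details, args.verbose))
    return 0
```

Calling `main([])` prints only the first line, while `main(["-v"])` adds the test case lines. A real plugin would pass `sys.argv[1:]` and exit with the returned state.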
Status
Value | Status | Description |
---|---|---|
0 | OK | The check went fine and everything is considered working. |
1 | Warning | The check is above the given warning threshold, or anything else is suspicious requiring attention before it breaks. |
2 | Critical | The check exceeded the critical threshold, or something really is broken and will harm the production environment. |
3 | Unknown | Invalid parameters, low level resource errors (IO device busy, no fork resources, TCP sockets, etc.) preventing the actual check. Higher level errors such as DNS resolving, TCP connection timeouts should be treated as Critical instead. Whenever the plugin reaches its timeout (best practice) it should also terminate with Unknown. |
Keep in mind that these are service states. Icinga automatically maps the host state from the returned plugin states.
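The table can be expressed as a small Python sketch that maps a measured value to an exit code using "greater than" thresholds. The helper names `state_for` and `host_state` are hypothetical. The host mapping shown here (OK and Warning count as UP, Critical and Unknown as DOWN) reflects how Icinga 2 is generally described to treat host checks and should be read as an assumption of this sketch.

```python
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def state_for(value, warning, critical):
    """Map a measured value to a plugin exit code; higher values are worse."""
    if value > critical:
        return CRITICAL
    if value > warning:
        return WARNING
    return OK


def host_state(exit_code):
    """Map a plugin exit code to a host state (assumed mapping, see above)."""
    return "UP" if exit_code in (OK, WARNING) else "DOWN"


# Connection times from the output examples: thresholds 0.1s and 0.5s.
for seconds in (0.0002, 0.5, 3):
    code = state_for(seconds, warning=0.1, critical=0.5)
    print(seconds, code, host_state(code))
```

A plugin would pass the resulting code to `sys.exit()` after printing its output line.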
Performance Data Metrics
Timeout
Icinga has a safety mechanism where it kills processes running for too long. The timeout can be specified in CheckCommand objects or on the host/service object.
Best practice is to control the timeout in the plugin itself and provide a clear message followed by the Unknown state.
Example in Python taken from check_tinkerforge:
import argparse
import signal
import sys
from functools import partial

def output(message, state):
    # Minimal stand-in for the plugin's own output helper: print and exit
    print(message)
    sys.exit(state)

def handle_sigalrm(signum, frame, timeout=None):
    # Exit with state 3 (Unknown) when the alarm fires
    output('Plugin timed out after %d seconds' % timeout, 3)

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # ... add more arguments
    parser.add_argument("-t", "--timeout", help="Timeout in seconds (default 10s)", type=int, default=10)
    args = parser.parse_args()
    signal.signal(signal.SIGALRM, partial(handle_sigalrm, timeout=args.timeout))
    signal.alarm(args.timeout)
    # ... perform the check and generate output/status
Versions
Plugins should provide a version via the -V or --version parameter, which is bumped on releases. This makes it possible to identify problems with versions that are too old or too new on the community support channels.
Example in Python taken from check_tinkerforge:
import argparse
import sys

__version__ = '0.9.1'

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    # 'action=version' prints the version string and exits
    parser.add_argument('-V', '--version', action='version', version='%(prog)s v' + sys.modules[__name__].__version__)
    args = parser.parse_args()
Create a new Plugin
Sometimes an existing plugin does not satisfy your requirements. You can either kindly contact the original author about plans to add changes and/or create a patch.
If you just want to format the output and state of an existing plugin it might also be helpful to write a wrapper script. This script could pass all configured parameters, call the plugin script, parse its output/exit code and return your specified output/exit code.
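A wrapper of this kind could look like the following Python sketch. It assumes the wrapped plugin follows the Plugin API exit codes; the function name `run_wrapped` and the `remap` parameter are hypothetical, and anything outside the range 0 to 3 is treated as Unknown.

```python
import subprocess


def run_wrapped(plugin_argv, remap=None, timeout=10):
    """Run a plugin and return (exit_code, output), optionally remapping states."""
    remap = remap or {}
    try:
        proc = subprocess.run(
            plugin_argv,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            universal_newlines=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The plugin could not be started or ran too long: report Unknown.
        return 3, "UNKNOWN: could not run plugin: %s" % exc
    code = proc.returncode if proc.returncode in (0, 1, 2, 3) else 3
    return remap.get(code, code), proc.stdout.strip()
```

For example, `run_wrapped(["/path/to/check_something", "--host", "localhost"], remap={1: 0})` (the plugin path is a placeholder) downgrades Warning to OK while passing the original output through. The wrapper script would then print the output and exit with the returned code. The sketch does not rewrite the status text in the output itself; a wrapper that changes the state may want to do that as well.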
On the other hand plugins for specific services and hardware might not yet exist.
Common best practices when creating a new plugin are for example:
- Choose the programming language wisely
- Scripting languages (Bash, Python, Perl, Ruby, PHP, etc.) are easier to write and set up but their check execution might take longer (invoking the script interpreter as overhead, etc.).
- Plugins written in C/C++, Go, etc. improve check execution time but may generate an overhead with installation and packaging.
- Use a modern VCS such as Git for developing the plugin (e.g. share your plugin on GitHub).
- Add parameters with key-value pairs to your plugin. They should allow long names (e.g. --host localhost) and also short parameters (e.g. -H localhost)
- -h|--help should print the version and all details about parameters and runtime invocation.
- Add a verbose/debug output functionality for detailed on-demand logging.
- Respect the exit codes required by the Plugin API.
- Always add performance data to your plugin output
Example skeleton:
# 1. include optional libraries
# 2. global variables
# 3. helper functions and/or classes
# 4. define timeout condition
if (<timeout_reached>) then
  print "UNKNOWN - Timeout (...) reached | 'time'=30.0"
  exit(3)
endif
# 5. main method
<execute and fetch data>
if (<threshold_critical_condition>) then
  print "CRITICAL - ... | 'time'=0.1 'myperfdatavalue'=5.0"
  exit(2)
else if (<threshold_warning_condition>) then
  print "WARNING - ... | 'time'=0.1 'myperfdatavalue'=3.0"
  exit(1)
else
  print "OK - ... | 'time'=0.2 'myperfdatavalue'=1.0"
  exit(0)
endif
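The main part of the skeleton can be sketched in Python as follows. This is a minimal illustration and not a complete plugin: the names `perfdata` and `check` are hypothetical, the `fetch` callable stands in for whatever the plugin measures, and the performance data follows the common `'label'=value[UOM];warn;crit` layout from the Monitoring Plugins guidelines.

```python
import time

OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def perfdata(label, value, warn=None, crit=None, uom=""):
    """Format one metric as 'label'=value[UOM];warn;crit."""
    fields = [
        "%s%s" % (value, uom),
        "" if warn is None else str(warn),
        "" if crit is None else str(crit),
    ]
    # Drop trailing empty fields so unset thresholds leave no dangling ';'.
    return "'%s'=%s" % (label, ";".join(fields).rstrip(";"))


def check(fetch, warn, crit):
    """Call fetch(), compare the result to the thresholds, return (code, line)."""
    started = time.time()
    value = fetch()
    elapsed = round(time.time() - started, 3)
    if value > crit:
        code = CRITICAL
    elif value > warn:
        code = WARNING
    else:
        code = OK
    metrics = " ".join([
        perfdata("time", elapsed, uom="s"),
        perfdata("myperfdatavalue", value, warn, crit),
    ])
    return code, "%s - value is %s | %s" % (LABELS[code], value, metrics)
```

A plugin would call something like `code, line = check(my_measurement, 3, 4)`, print the line and exit with the code. Timeout handling and argument parsing are left out here.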
There are various plugin libraries available which will help with plugin execution and output formatting too, for example nagiosplugin for Python.
Note
Make sure to test your plugin properly with special cases before putting it into production!
Once you've finished your plugin please upload/sync it to Icinga Exchange. Thanks in advance!
Service Monitoring Overview
The following examples should help you to start implementing your own ideas. There is a variety of plugins available. This collection is not complete -- if you have any updates, please send a documentation patch upstream.
General Monitoring
If the remote service is available (via a network protocol and port), and if a check plugin is also available, you don't necessarily need a local client. Instead, choose a plugin and configure its parameters and thresholds. The following examples are included in the Icinga 2 Template Library:
Linux Monitoring
- disk
- mem, swap
- procs
- users
- running_kernel
- package management: apt, yum, etc.
- ssh
- performance: iostat, check_sar_perf
Windows Monitoring
- check_wmi_plus
- NSClient++ (in combination with the Icinga 2 client and either check_nscp_api or nscp-local check commands)
- Icinga 2 Windows Plugins (disk, load, memory, network, performance counters, ping, procs, service, swap, updates, uptime, users)
- vbs and Powershell scripts
Database Monitoring
- MySQL/MariaDB: mysql_health, mysql, mysql_query
- PostgreSQL: postgres
- Oracle: oracle_health
- MSSQL: mssql_health
- DB2: db2_health
- MongoDB: mongodb
- Elasticsearch: elasticsearch
- Redis: redis
SNMP Monitoring
- Manubulon plugins (interface, storage, load, memory, process)
- snmp, snmpv3
Network Monitoring
Web Monitoring
Java Monitoring
DNS Monitoring
Backup Monitoring
Log Monitoring
Virtualization Monitoring
VMware Monitoring
Tip: If you are encountering timeouts using the VMware Perl SDK, check this blog entry. Ubuntu 16.04 LTS can have troubles with random entropy in Perl asked here. In that case, haveged may help.