Monitoring Basics
This part of the Icinga 2 documentation provides an overview of all the basic monitoring concepts you need to know to run Icinga 2.
Hosts and Services
Icinga 2 can be used to monitor the availability of hosts and services. Services can be virtually anything which can be checked in some way:
- Network services (HTTP, SMTP, SNMP, SSH, etc.)
- Printers
- Switches / Routers
- Temperature Sensors
- Other local or network-accessible services
Host objects provide a mechanism to group together services that are running on the same physical device.
Here is an example of a host object which defines two child services:
    object Host "my-server1" {
      services["ping4"] = {
        check_command = "ping4"
      },
      services["http"] = {
        check_command = "http_ip"
      },
      check = "ping4",
      macros["address"] = "10.0.0.1"
    }
The example host my-server1 creates two services which belong to this host: ping4 and http.
It also specifies that the host should inherit its availability state from the ping4 service.
Note
In Icinga 1.x hosts had their own check command, check interval and notification settings. In Icinga 2, hosts instead inherit their state from one of their child services. No checks are performed for the host itself.
The address macro is used by check commands to determine which network address is associated with the host object.
Host States
Hosts inherit their state from the host check service that is specified using the check attribute.
Hosts can be in any of the following states:
Name | Description |
---|---|
UP | The host is available. |
DOWN | The host is unavailable. |
UNREACHABLE | At least one of the host's dependencies (e.g. its upstream router) is unavailable causing the host to be unreachable. |
Service States
Services can be in any of the following states:
Name | Description |
---|---|
OK | The service is working properly. |
WARNING | The service is experiencing some problems but is still considered to be in working condition. |
CRITICAL | The service is in a critical state. |
UNKNOWN | The check could not determine the service's state. |
Hard and Soft States
When Icinga detects a problem with a service, it re-checks the service a number of times (based on the max_check_attempts and retry_interval settings) before sending notifications. This ensures that no unnecessary notifications are sent for transient failures. During this time the service is in a SOFT state.
If the service is still in a non-OK state after all re-checks have been executed, it switches to a HARD state and notifications are sent.
Name | Description |
---|---|
HARD | The host/service's state hasn't recently changed. |
SOFT | The host/service has recently changed state and is being re-checked. |
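The SOFT to HARD transition described above can be sketched in Python. This is a simplified model and not Icinga's actual implementation: the `ServiceState` class and `process_check_result` function are hypothetical names, the retry_interval timing is left out, and treating a recovery to OK as immediately HARD is an assumption made to keep the sketch short.

```python
from dataclasses import dataclass


@dataclass
class ServiceState:
    """Simplified model of a service's state, state type and attempt counter."""
    state: str = "OK"
    state_type: str = "HARD"
    attempt: int = 1


def process_check_result(svc, new_state, max_check_attempts):
    """Apply one check result; return True if a problem notification would be sent."""
    if new_state == "OK":
        # Assumption: a recovery resets the counter and is HARD right away.
        # Recovery notifications are outside the scope of this sketch.
        svc.state, svc.state_type, svc.attempt = "OK", "HARD", 1
        return False

    if svc.state == "OK":
        # First non-OK result: enter the SOFT state and start counting.
        svc.attempt = 1
        svc.state_type = "SOFT"
    elif svc.state_type == "SOFT":
        # Still being re-checked: count one more attempt.
        svc.attempt += 1
    else:
        # Already in a HARD problem state: no new notification in this sketch.
        svc.state = new_state
        return False

    svc.state = new_state
    if svc.attempt >= max_check_attempts:
        # All re-checks are used up and the service is still non-OK.
        svc.state_type = "HARD"
        return True
    return False
```

With max_check_attempts set to 3, the first two CRITICAL results leave the service in a SOFT state and only the third one switches it to HARD and triggers a notification. A single CRITICAL result followed by OK never leaves the SOFT state, so no notification is sent for the transient failure.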
Check Commands
TODO
Macros
Macros may be used in command definitions to dynamically change how the command is executed.
Here is an example of a command definition which uses user-defined macros:
    object CheckCommand "my-ping" inherits "plugin-check-command" {
      command = [
        "$plugindir$/check_ping",
        "-4",
        "-H", "$address$",
        "-w", "$wrta$,$wpl$%",
        "-c", "$crta$,$cpl$%",
        "-p", "$packets$",
        "-t", "$timeout$"
      ],
      macros = {
        wrta = 100,
        wpl = 5,
        crta = 200,
        cpl = 15,
        packets = 5,
        timeout = 0
      }
    }
Note
If you have previously used Icinga 1.x you may already be familiar with user and argument macros (e.g., USER1 or ARG1). Unlike in Icinga 1.x, macros may have arbitrary names, and arguments are no longer specified in the check_command setting.
Macro names must be enclosed in two $ signs, e.g. $plugindir$. When executing commands Icinga 2 checks the following objects in this order to look up macros:
- User object (only for notifications)
- Service object
- Host object
- Command object
- Global macros in the IcingaMacros variable
This lookup order allows you to define default values for macros in your command objects. The my-ping command shown above uses this to set default values for the latency and packet loss thresholds, the packet count and the timeout.
When using the my-ping command you can override all or some of the macros in the service definition like this:
    object Host "my-server1" {
      services["ping"] = {
        check_command = "my-ping",
        macros["packets"] = 10 // Overrides the default value of 5 given in the command
      },
      macros["address"] = "10.0.0.1"
    }
If a macro isn't defined anywhere, an empty value is used and a warning is emitted to the Icinga 2 log.
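The lookup order, the override in the service definition and the empty-value fallback can be illustrated with a short Python sketch. It is not how Icinga 2 is implemented: `resolve_macro` and `expand` are hypothetical helper names, each object is reduced to a plain dictionary of its macros, and the plugindir value is a made-up path. The user scope is omitted because it only applies to notifications.

```python
import logging
import re

log = logging.getLogger("macro-sketch")

# A macro reference is a name enclosed in two $ signs, e.g. $address$.
MACRO_PATTERN = re.compile(r"\$(\w+)\$")


def resolve_macro(name, scopes):
    """Return the first value found for `name`, searching the scopes in order."""
    for scope in scopes:
        if name in scope:
            return str(scope[name])
    # Not defined anywhere: warn and fall back to an empty value.
    log.warning("macro '%s' is not defined; using an empty value", name)
    return ""


def expand(argument, scopes):
    """Replace every $name$ reference in a single command argument."""
    return MACRO_PATTERN.sub(lambda m: resolve_macro(m.group(1), scopes), argument)


# Values taken from the my-ping command and the my-server1 host shown above.
service_macros = {"packets": 10}
host_macros = {"address": "10.0.0.1"}
command_macros = {"wrta": 100, "wpl": 5, "crta": 200, "cpl": 15,
                  "packets": 5, "timeout": 0}
global_macros = {"plugindir": "/usr/lib/nagios/plugins"}  # hypothetical path

# Lookup order for a service check: service, host, command, global macros.
scopes = [service_macros, host_macros, command_macros, global_macros]

command = ["$plugindir$/check_ping", "-4", "-H", "$address$",
           "-w", "$wrta$,$wpl$%", "-p", "$packets$"]
expanded = [expand(argument, scopes) for argument in command]
```

Because the service scope is searched before the command scope, `$packets$` expands to 10 and not to the command's default of 5, while `$wrta$` and `$wpl$` still come from the command object.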
Note
Macros in capital letters (e.g. HOSTNAME) are reserved for use by Icinga 2 and should not be overwritten by users.
By convention every host should have an address macro. Hosts which have an IPv6 address should also have an address6 macro.
The plugindir macro should be set to the path of your check plugins. The /etc/icinga2/conf.d/macros.conf file is usually used to define global macros including this one.
Host Macros
The following host macros are available in all commands that are executed for hosts or services:
Name | Description |
---|---|
HOSTNAME | The name of the host object. |
HOSTDISPLAYNAME | The value of the display_name attribute. |
HOSTALIAS | This is an alias for the HOSTDISPLAYNAME macro. |
HOSTSTATE | The host's current state. Can be one of UNREACHABLE, UP and DOWN. |
HOSTSTATEID | The host's current state. Can be one of 0 (up), 1 (down) and 2 (unreachable). |
HOSTSTATETYPE | The host's current state type. Can be one of SOFT and HARD. |
HOSTATTEMPT | The current check attempt number. |
MAXHOSTATTEMPT | The maximum number of checks which are executed before changing to a hard state. |
LASTHOSTSTATE | The host's previous state. Can be one of UNREACHABLE, UP and DOWN. |
LASTHOSTSTATEID | The host's previous state. Can be one of 0 (up), 1 (down) and 2 (unreachable). |
LASTHOSTSTATETYPE | The host's previous state type. Can be one of SOFT and HARD. |
HOSTLATENCY | The host's check latency. |
HOSTEXECUTIONTIME | The host's check execution time. |
HOSTOUTPUT | The last check's output. |
HOSTPERFDATA | The last check's performance data. |
LASTHOSTCHECK | The timestamp when the last check was executed. |
HOSTADDRESS | This is an alias for the address macro. If the address macro is not defined the host object's name is used instead. |
HOSTADDRESS6 | This is an alias for the address6 macro. If the address6 macro is not defined the host object's name is used instead. |
Service Macros
The following service macros are available in all commands that are executed for services:
Name | Description |
---|---|
SERVICEDESC | The short name of the service object. |
SERVICEDISPLAYNAME | The value of the display_name attribute. |
SERVICECHECKCOMMAND | The name of the service's check command. |
SERVICESTATE | The service's current state. Can be one of OK, WARNING, CRITICAL, UNCHECKABLE and UNKNOWN. |
SERVICESTATEID | The service's current state. Can be one of 0 (ok), 1 (warning), 2 (critical), 3 (unknown) and 4 (uncheckable). |
SERVICESTATETYPE | The service's current state type. Can be one of SOFT and HARD. |
SERVICEATTEMPT | The current check attempt number. |
MAXSERVICEATTEMPT | The maximum number of checks which are executed before changing to a hard state. |
LASTSERVICESTATE | The service's previous state. Can be one of OK, WARNING, CRITICAL, UNCHECKABLE and UNKNOWN. |
LASTSERVICESTATEID | The service's previous state. Can be one of 0 (ok), 1 (warning), 2 (critical), 3 (unknown) and 4 (uncheckable). |
LASTSERVICESTATETYPE | The service's previous state type. Can be one of SOFT and HARD. |
LASTSERVICESTATECHANGE | The last state change's timestamp. |
SERVICELATENCY | The service's check latency. |
SERVICEEXECUTIONTIME | The service's check execution time. |
SERVICEOUTPUT | The last check's output. |
SERVICEPERFDATA | The last check's performance data. |
LASTSERVICECHECK | The timestamp when the last check was executed. |
User Macros
The following user macros are available in all commands that are executed for users:
Name | Description |
---|---|
CONTACTNAME | The name of the user object. |
CONTACTALIAS | The value of the display_name attribute. |
CONTACTEMAIL | This is an alias for the email macro. |
CONTACTPAGER | This is an alias for the pager macro. |
Global Macros
The following macros are available in all commands:
Name | Description |
---|---|
TIMET | Current UNIX timestamp. |
LONGDATETIME | Current date and time including timezone information. |
SHORTDATETIME | Current date and time. |
DATE | Current date. |
TIME | Current time including timezone information. |
Using Templates
Templates may be used to apply a set of similar settings to more than one object.
For example, rather than manually creating a ping service object for each of your hosts you can use templates to avoid having to copy & paste parts of your config:
    template Host "linux-server" {
      services["ping"] = {
        check_command = "ping4"
      },
      check = "ping4"
    }

    object Host "my-server1" inherits "linux-server" {
      macros["address"] = "10.0.0.1"
    }

    object Host "my-server2" inherits "linux-server" {
      macros["address"] = "10.0.0.2"
    }
In this example my-server1 and my-server2 each get their own ping service.
Objects as well as templates themselves can inherit from an arbitrary number of templates. Attributes inherited from a template can be overridden in the object if necessary.
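The way inherited attributes are combined and overridden can be modelled with a small Python sketch. It is only an illustration under several assumptions: attributes are flattened into a single dictionary per object, templates listed later are assumed to take precedence over earlier ones (the text above does not specify this), and the `resolve` function and the my-server3 host are hypothetical.

```python
def resolve(name, definitions):
    """Return the effective attributes of an object or template.

    `definitions` maps a name to a tuple of
    (list of parent template names, the definition's own attributes).
    """
    parents, own = definitions[name]
    merged = {}
    for parent in parents:
        # Templates can inherit from other templates, so resolve recursively.
        # Assumption: a later template overrides an earlier one.
        merged.update(resolve(parent, definitions))
    # The definition's own attributes override everything inherited.
    merged.update(own)
    return merged


definitions = {
    "linux-server": ([], {"check": "ping4"}),
    "my-server1": (["linux-server"], {"address": "10.0.0.1"}),
    # Hypothetical host that overrides the inherited check attribute.
    "my-server3": (["linux-server"], {"address": "10.0.0.3", "check": "http"}),
}

server1 = resolve("my-server1", definitions)
server3 = resolve("my-server3", definitions)
```

Here `server1` keeps the check attribute inherited from linux-server, while `server3` replaces it with its own value.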
Groups
TODO
Host/Service Dependencies
TODO
Notifications
TODO