## Monitoring Basics

This part of the Icinga 2 documentation provides an overview of all the basic
monitoring concepts you need to know to run Icinga 2.

### Hosts and Services

Icinga 2 can be used to monitor the availability of hosts and services. Services
can be virtually anything which can be checked in some way:

* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Printers
* Switches / Routers
* Temperature Sensors
* Other local or network-accessible services

Host objects provide a mechanism to group together services that are running
on the same physical device.

Here is an example of a host object which defines two child services:

    object Host "my-server1" {
      services["ping4"] = {
        check_command = "ping4"
      },

      services["http"] = {
        check_command = "http_ip"
      },

      check = "ping4",

      macros["address"] = "10.0.0.1"
    }

The example host *my-server1* creates two services which belong to this host:
*ping4* and *http*.

It also specifies that the host should inherit its availability state from the
*ping4* service.

> **Note**
>
> In Icinga 1.x hosts had their own check command, check interval and
> notification settings. In Icinga 2, hosts instead inherit their state
> from one of their child services. No checks are performed for the host
> itself.

The *address* macro is used by check commands to determine which network
address is associated with the host object.

#### Host States

Name        | Description
------------|--------------
UP          | The host is available.
DOWN        | The host is unavailable.
UNREACHABLE | At least one of the host's dependencies (e.g. its upstream router) is unavailable, causing the host to be unreachable.

#### Service States

Name        | Description
------------|--------------
OK          | The service is fully available.
WARNING     | The service is experiencing some problems but is still considered available.
CRITICAL    | The service is in a critical state.
UNKNOWN     | The check could not determine the service's state.

#### Hard and Soft States

When Icinga 2 detects a problem with a service, it re-checks the service a number
of times (based on the *max_check_attempts* and *retry_interval* settings) before
sending notifications. This ensures that no unnecessary notifications are sent
for transient failures. During this time the service is in a *SOFT* state.

If the service is still in a non-OK state after all re-checks have been
executed, it switches to a *HARD* state and notifications are sent.

Name        | Description
------------|--------------
HARD        | The host/service's state hasn't recently changed.
SOFT        | The host/service has recently changed state and is being re-checked.

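
The transition from *SOFT* to *HARD* can be pictured with a small Python
model. This is only a sketch of the behaviour described above, not Icinga 2's
implementation: the `CheckedService` class and its `process_check_result`
method are hypothetical names, and the handling of recoveries and of repeated
failures in a *HARD* state is an assumption made to keep the example short.

```python
from dataclasses import dataclass

OK = "OK"


@dataclass
class CheckedService:
    """Hypothetical model of a service's state; not an Icinga 2 data structure."""

    max_check_attempts: int = 3
    state: str = OK
    state_type: str = "HARD"
    attempt: int = 1

    def process_check_result(self, new_state: str) -> bool:
        """Record a check result and return True if a problem notification is due."""
        if new_state == OK:
            # Assumption: a recovery resets the attempt counter; recovery
            # notifications are not modelled here.
            self.state, self.state_type, self.attempt = OK, "HARD", 1
            return False

        if self.state == OK:
            # First failed check after an OK state.
            self.attempt = 1
        elif self.state_type == "SOFT":
            # A re-check that failed again.
            self.attempt += 1
        else:
            # Already in a HARD problem state; nothing changes in this model.
            self.state = new_state
            return False

        self.state = new_state
        if self.attempt >= self.max_check_attempts:
            # All re-checks are used up: switch to HARD and notify.
            self.state_type = "HARD"
            return True

        self.state_type = "SOFT"
        return False


service = CheckedService(max_check_attempts=3)
notifications = [service.process_check_result("CRITICAL") for _ in range(3)]
```

With *max_check_attempts* set to 3 in this model, the first two failed checks
leave the service in a *SOFT* state and only the third one triggers a
notification.
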
### Check Commands

TODO

### Macros

Macros may be used in command definitions to dynamically change how the command
is executed.

Here is an example of a command definition which uses user-defined macros:

    object CheckCommand "my-ping" inherits "plugin-check-command" {
      command = [
        "$plugindir$/check_ping",
        "-4",
        "-H", "$address$",
        "-w", "$wrta$,$wpl$%",
        "-c", "$crta$,$cpl$%",
        "-p", "$packets$",
        "-t", "$timeout$"
      ],

      macros = {
        wrta = 100,
        wpl = 5,

        crta = 200,
        cpl = 15,

        packets = 5,
        timeout = 0
      }
    }

> **Note**
>
> If you have previously used Icinga 1.x you may already be familiar with
> user and argument macros (e.g., USER1 or ARG1). Unlike in Icinga 1.x, macros
> may have arbitrary names, and arguments are no longer specified in the
> check_command setting.

Macro names must be enclosed in two *$* signs, e.g. *$plugindir$*. When
executing the command, Icinga 2 checks the following objects in this
order to look up macros:

1. User object (only for notifications)
2. Service object
3. Host object
4. Command object
5. Global macros in the IcingaMacros variable

This lookup order allows you to define default values for macros in your
command objects. The *my-ping* command shown above uses this to set default
values for the latency and packet loss thresholds, the packet count and the
timeout.

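
The lookup order above amounts to a "first match wins" search. The following
Python sketch illustrates the idea under stated assumptions: `resolve_macro`
is a hypothetical helper, each object is reduced to a plain dictionary of its
macros, and the *plugindir* path is a made-up value. It is not how Icinga 2
stores or resolves macros internally.

```python
def resolve_macro(name, scopes):
    """Return the value from the first scope that defines the macro, else None."""
    for scope in scopes:
        if name in scope:
            return scope[name]
    return None


# Macro dictionaries ordered as in the list above (no user object, because
# this models a check rather than a notification).
service_macros = {"packets": 10}
host_macros = {"address": "10.0.0.1"}
command_macros = {"wrta": 100, "wpl": 5, "crta": 200, "cpl": 15,
                  "packets": 5, "timeout": 0}
global_macros = {"plugindir": "/usr/lib/nagios/plugins"}  # hypothetical path

scopes = [service_macros, host_macros, command_macros, global_macros]

packets = resolve_macro("packets", scopes)  # found in the service first
wrta = resolve_macro("wrta", scopes)        # falls through to the command default
```

Because the command object is consulted after the service and host objects,
a value set in the command only takes effect when nothing more specific
defines the same macro.
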
When using the *my-ping* command you can overwrite all or some of the macros
in the service definition like this:

    object Host "my-server1" {
      services["ping"] = {
        check_command = "my-ping",

        macros["packets"] = 10 // Overwrites the default value of 5 given in the command
      },

      macros["address"] = "10.0.0.1"
    }

If a macro isn't defined anywhere, an empty value is used and a warning is
emitted to the Icinga 2 log.

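
The substitution step, including the fallback for undefined macros, could be
sketched in Python as follows. `expand_macros` and `MACRO_PATTERN` are
hypothetical names, the set of characters allowed in a macro name is an
assumption, and the warning text is invented for the example; only the
behaviour (empty value plus a logged warning) comes from the description
above.

```python
import logging
import re

# Assumption: macro names consist of letters, digits and underscores.
MACRO_PATTERN = re.compile(r"\$([A-Za-z0-9_]+)\$")


def expand_macros(text, scopes):
    """Replace every $name$ in text using the first scope that defines name."""

    def replace(match):
        name = match.group(1)
        for scope in scopes:
            if name in scope:
                return str(scope[name])
        # Undefined macro: log a warning and substitute an empty value.
        logging.warning("Macro '%s' is not defined.", name)
        return ""

    return MACRO_PATTERN.sub(replace, text)


scopes = [{"packets": 10}, {"wrta": 100, "wpl": 5, "packets": 5}]
warning_threshold = expand_macros("$wrta$,$wpl$%", scopes)
packet_count = expand_macros("$packets$", scopes)
missing = expand_macros("$address$", scopes)
```

In this sketch an undefined macro does not abort the command; the argument is
simply left empty, which is why the warning in the log is worth checking.
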
> **Note**
>
> Macros in capital letters (e.g. HOSTNAME) are reserved for use by Icinga 2
> and should not be overwritten by users.

By convention every host should have an *address* macro. Hosts
which have an IPv6 address should also have an *address6* macro.

The *plugindir* macro should be set to the path of your check plugins. The
*/etc/icinga2/conf.d/macros.conf* file is usually used to define global macros
including this one.

#### Host Macros

The following host macros are available in all commands that are executed for
hosts or services:

Name                   | Description
-----------------------|--------------
HOSTNAME               | The name of the host object.
HOSTDISPLAYNAME        | The value of the display_name attribute.
HOSTALIAS              | This is an alias for the *HOSTDISPLAYNAME* macro.
HOSTSTATE              | The host's current state. Can be one of UNREACHABLE, UP and DOWN.
HOSTSTATEID            | The host's current state. Can be one of 0 (up), 1 (down) and 2 (unreachable).
HOSTSTATETYPE          | The host's current state type. Can be one of SOFT and HARD.
HOSTATTEMPT            | The current check attempt number.
MAXHOSTATTEMPT         | The maximum number of checks which are executed before changing to a hard state.
LASTHOSTSTATE          | The host's previous state. Can be one of UNREACHABLE, UP and DOWN.
LASTHOSTSTATEID        | The host's previous state. Can be one of 0 (up), 1 (down) and 2 (unreachable).
LASTHOSTSTATETYPE      | The host's previous state type. Can be one of SOFT and HARD.
HOSTLATENCY            | The host's check latency.
HOSTEXECUTIONTIME      | The host's check execution time.
HOSTOUTPUT             | The last check's output.
HOSTPERFDATA           | The last check's performance data.
LASTHOSTCHECK          | The timestamp when the last check was executed.
HOSTADDRESS            | This is an alias for the *address* macro. If the *address* macro is not defined the host object's name is used instead.
HOSTADDRESS6           | This is an alias for the *address6* macro. If the *address6* macro is not defined the host object's name is used instead.

#### Service Macros

The following service macros are available in all commands that are executed for
services:

Name                   | Description
-----------------------|--------------
SERVICEDESC            | The short name of the service object.
SERVICEDISPLAYNAME     | The value of the display_name attribute.
SERVICECHECKCOMMAND    | This is an alias for the *SERVICEDISPLAYNAME* macro.
SERVICESTATE           | The service's current state. Can be one of OK, WARNING, CRITICAL, UNCHECKABLE and UNKNOWN.
SERVICESTATEID         | The service's current state. Can be one of 0 (ok), 1 (warning), 2 (critical), 3 (unknown) and 4 (uncheckable).
SERVICESTATETYPE       | The service's current state type. Can be one of SOFT and HARD.
SERVICEATTEMPT         | The current check attempt number.
MAXSERVICEATTEMPT      | The maximum number of checks which are executed before changing to a hard state.
LASTSERVICESTATE       | The service's previous state. Can be one of OK, WARNING, CRITICAL, UNCHECKABLE and UNKNOWN.
LASTSERVICESTATEID     | The service's previous state. Can be one of 0 (ok), 1 (warning), 2 (critical), 3 (unknown) and 4 (uncheckable).
LASTSERVICESTATETYPE   | The service's previous state type. Can be one of SOFT and HARD.
LASTSERVICESTATECHANGE | The last state change's timestamp.
SERVICELATENCY         | The service's check latency.
SERVICEEXECUTIONTIME   | The service's check execution time.
SERVICEOUTPUT          | The last check's output.
SERVICEPERFDATA        | The last check's performance data.
LASTSERVICECHECK       | The timestamp when the last check was executed.

#### User Macros

The following user macros are available in all commands that are executed for
users:

Name                   | Description
-----------------------|---------------
CONTACTNAME            | The name of the user object.
CONTACTALIAS           | The value of the display_name attribute.
CONTACTEMAIL           | This is an alias for the *email* macro.
CONTACTPAGER           | This is an alias for the *pager* macro.

### Using Templates

Templates may be used to apply a set of similar settings to more than one
object.

For example, rather than manually creating a *ping* service object for each of
your hosts, you can use templates to avoid copying and pasting parts of your
config:

    template Host "linux-server" {
      services["ping"] = {
        check_command = "ping4"
      },

      check = "ping4"
    }

    object Host "my-server1" inherits "linux-server" {
      macros["address"] = "10.0.0.1"
    }

    object Host "my-server2" inherits "linux-server" {
      macros["address"] = "10.0.0.2"
    }

In this example *my-server1* and *my-server2* each get their own ping
service check.

Objects as well as templates themselves can inherit from an arbitrary number of
templates. Attributes inherited from a template can be overwritten in the
object if necessary.

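
One way to picture template inheritance is as a recursive merge of attribute
dictionaries. The Python sketch below is a simplified model, not Icinga 2's
config compiler: `resolve_attributes` and the `inherits` key are hypothetical,
the merge is shallow (a whole attribute is replaced rather than merged
entry by entry), and the rule that later templates win over earlier ones is an
assumption that the text above does not state.

```python
def resolve_attributes(obj, templates):
    """Merge an object's attributes with those of the templates it inherits from."""
    merged = {}
    # Apply parent templates first (assumption: later templates override earlier ones).
    for parent_name in obj.get("inherits", []):
        merged.update(resolve_attributes(templates[parent_name], templates))
    # The object's own attributes overwrite anything inherited.
    merged.update({key: value for key, value in obj.items() if key != "inherits"})
    return merged


templates = {
    "linux-server": {
        "services": {"ping": {"check_command": "ping4"}},
        "check": "ping4",
    },
}

my_server1 = {"inherits": ["linux-server"], "macros": {"address": "10.0.0.1"}}
my_server2 = {"inherits": ["linux-server"], "macros": {"address": "10.0.0.2"},
              "check": "http"}  # hypothetical override of an inherited attribute

resolved1 = resolve_attributes(my_server1, templates)
resolved2 = resolve_attributes(my_server2, templates)
```

In this model `resolved1` picks up the *ping* service and the `check` setting
from the template, while `resolved2` keeps the inherited service but replaces
`check` with its own value.
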
### Groups

TODO

### Host/Service Dependencies

TODO

### Notifications

TODO