icinga2/doc/2.3-monitoring-basics.md

293 lines
10 KiB
Markdown
Raw Normal View History

## Monitoring Basics
2013-10-01 12:59:02 +02:00
This part of the Icinga 2 documentation provides an overview of all the basic
monitoring concepts you need to know to run Icinga 2.
2013-10-01 12:59:02 +02:00
### Hosts and Services
2013-10-01 12:59:02 +02:00
Icinga 2 can be used to monitor the availability of hosts and services. Services
can be virtually anything which can be checked in some way:
2013-10-01 12:59:02 +02:00
* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Printers
* Switches / Routers
* Temperature Sensors
* Other local or network-accessible services
Host objects provide a mechanism to group together services that are running
on the same physical device.
Here is an example of a host object which defines two child services:
object Host "my-server1" {
services["ping4"] = {
check_command = "ping4"
},
services["http"] = {
check_command = "http_ip"
},
check = "ping4",
macros["address"] = "10.0.0.1"
}
The example host *my-server1* creates two services which belong to this host:
*ping4* and *http*.
It also specifies that the host should inherit its availability state from the
*ping4* service.
> **Note**
>
> In Icinga 1.x hosts had their own check command, check interval and
> notification settings. Instead, in Icinga 2 hosts inherit their state
> from one of its child services. No checks are performed for the host
> itself.
The *address* macro is used by check commands to determine which network
address is associated with the host object.
#### Host States
2013-10-01 15:33:34 +02:00
Hosts inherit their state from the host check service that is specified using
the *check* attribute.
Hosts can be in any of the following states:
2013-10-01 12:59:02 +02:00
Name | Description
------------|--------------
UP | The host is available.
DOWN | The host is unavailable.
UNREACHABLE | At least one of the host's dependencies (e.g. its upstream router) is unavailable causing the host to be unreachable.
#### Service States
2013-10-01 15:33:34 +02:00
Services can be in any of the following states:
2013-10-01 12:59:02 +02:00
Name | Description
------------|--------------
2013-10-01 15:33:34 +02:00
OK | The service is working properly.
WARNING | The service is experiencing some problems but is still considered to be in working condition.
2013-10-01 12:59:02 +02:00
CRITICAL | The service is in a critical state.
UNKNOWN | The check could not determine the service's state.
#### Hard and Soft States
When detecting a problem with a service Icinga re-checks the service a number of
times (based on the *max_check_attempts* and *retry_interval* settings) before sending
notifications. This ensures that no unnecessary notifications are sent for
transient failures. During this time the service is in a *SOFT* state.
After all re-checks have been executed and the service is still in a non-OK
state the service switches to a *HARD* state and notifications are sent.
Name | Description
------------|--------------
HARD | The host/service's state hasn't recently changed.
SOFT | The host/service has recently changed state and is being re-checked.
### Check Commands
TODO
### Macros
2013-10-01 12:59:02 +02:00
Macros may be used in command definitions to dynamically change how the command
is executed.
2013-10-01 12:59:02 +02:00
Here is an example of a command definition which uses user-defined macros:
2013-10-01 12:59:02 +02:00
object CheckCommand "my-ping" inherits "plugin-check-command" {
command = [
"$plugindir$/check_ping",
"-4",
"-H", "$address$",
"-w", "$wrta$,$wpl$%",
"-c", "$crta$,$cpl$%",
"-p", "$packets$",
"-t", "$timeout$"
],
2013-10-01 12:59:02 +02:00
macros = {
wrta = 100,
wpl = 5,
2013-10-01 12:59:02 +02:00
crta = 200,
cpl = 15,
packets = 5,
timeout = 0
}
}
> **Note**
>
> If you have previously used Icinga 1.x you may already be familiar with
> user and argument macros (e.g., USER1 or ARG1). Unlike in Icinga 1.x macros
> may have arbitrary names and arguments are no longer specified in the
> check_command setting.
Macro names must be enclosed in two *$* signs, e.g. *$plugindir$*. When
2013-10-01 15:33:34 +02:00
executing commands Icinga 2 checks the following objects in this
2013-10-01 12:59:02 +02:00
order to look up macros:
1. User object (only for notifications)
2. Service object
3. Host object
4. Command object
5. Global macros in the IcingaMacros variable
This execution order allows you to define default values for macros in your
command objects. The *my-ping* command shown above uses this to set default
values for some of the latency thresholds and timeouts.
2013-10-01 15:33:34 +02:00
When using the *my-ping* command you can override all or some of the macros
2013-10-01 12:59:02 +02:00
in the service definition like this:
object Host "my-server1" {
services["ping"] = {
check_command = "my-ping",
2013-10-01 15:33:34 +02:00
macros["packets"] = 10 // Overrides the default value of 5 given in the command
2013-10-01 12:59:02 +02:00
},
macros["address"] = "10.0.0.1"
}
If a macro isn't defined anywhere an empty value is used and a warning is
emitted to the Icinga 2 log.
> **Note**
>
> Macros in capital letters (e.g. HOSTNAME) are reserved for use by Icinga 2
> and should not be overwritten by users.
By convention every host should have an *address* macro. Hosts
which have an IPv6 address should also have an *address6* macro.
The *plugindir* macro should be set to the path of your check plugins. The
*/etc/icinga2/conf.d/macros.conf* file is usually used to define global macros
including this one.
#### Host Macros
The following host macros are available in all commands that are executed for
hosts or services:
Name | Description
-----------------------|--------------
2013-10-01 15:33:34 +02:00
HOSTNAME | The name of the host object.
HOSTDISPLAYNAME | The value of the display_name attribute.
HOSTALIAS | This is an alias for the *HOSTDISPLAYNAME* macro.
HOSTSTATE | The host's current state. Can be one of UNREACHABLE, UP and DOWN.
HOSTSTATEID | The host's current state. Can be one of 0 (up), 1 (down) and 2 (unreachable).
HOSTSTATETYPE | The host's current state type. Can be one of SOFT and HARD.
HOSTATTEMPT | The current check attempt number.
MAXHOSTATTEMPT | The maximum number of checks which are executed before changing to a hard state.
LASTHOSTSTATE | The host's previous state. Can be one of UNREACHABLE, UP and DOWN.
LASTHOSTSTATEID | The host's previous state. Can be one of 0 (up), 1 (down) and 2 (unreachable).
LASTHOSTSTATETYPE | The host's previous state type. Can be one of SOFT and HARD.
HOSTLATENCY | The host's check latency.
HOSTEXECUTIONTIME | The host's check execution time.
HOSTOUTPUT | The last check's output.
HOSTPERFDATA | The last check's performance data.
LASTHOSTCHECK | The timestamp when the last check was executed.
HOSTADDRESS | This is an alias for the *address* macro. If the *address* macro is not defined the host object's name is used instead.
HOSTADDRESS6 | This is an alias for the *address6* macro. If the *address* macro is not defined the host object's name is used instead.
2013-10-01 12:59:02 +02:00
#### Service Macros
The following service macros are available in all commands that are executed for
services:
Name | Description
-----------------------|--------------
2013-10-01 15:33:34 +02:00
SERVICEDESC | The short name of the service object.
SERVICEDISPLAYNAME | The value of the display_name attribute.
SERVICECHECKCOMMAND | This is an alias for the *SERVICEDISPLAYNAME* macro.
SERVICESTATE | The service's current state. Can be one of OK, WARNING, CRITICAL, UNCHECKABLE and UNKNOWN.
SERVICESTATEID | The service's current state. Can be one of 0 (ok), 1 (warning), 2 (critical), 3 (unknown) and 4 (uncheckable).
SERVICESTATETYPE | The service's current state type. Can be one of SOFT and HARD.
SERVICEATTEMPT | The current check attempt number.
MAXSERVICEATTEMPT | The maximum number of checks which are executed before changing to a hard state.
LASTSERVICESTATE | The service's previous state. Can be one of OK, WARNING, CRITICAL, UNCHECKABLE and UNKNOWN.
LASTSERVICESTATEID | The service's previous state. Can be one of 0 (ok), 1 (warning), 2 (critical), 3 (unknown) and 4 (uncheckable).
LASTSERVICESTATETYPE | The service's previous state type. Can be one of SOFT and HARD.
LASTSERVICESTATECHANGE | The last state change's timestamp.
SERVICELATENCY | The service's check latency.
SERVICEEXECUTIONTIME | The service's check execution time.
SERVICEOUTPUT | The last check's output.
SERVICEPERFDATA | The last check's performance data.
LASTSERVICECHECK | The timestamp when the last check was executed.
2013-10-01 12:59:02 +02:00
#### User Macros
The following service macros are available in all commands that are executed for
users:
2013-10-01 15:33:34 +02:00
Name | Description
-----------------------|--------------
2013-10-01 12:59:02 +02:00
CONTACTNAME | The name of the user object.
CONTACTALIAS | The value of the display_name attribute.
CONTACTEMAIL | This is an alias for the *email* macro.
CONTACTPAGER | This is an alias for the *pager* macro.
2013-10-01 15:33:34 +02:00
#### Global Macros
The following macros are available in all commands:
Name | Description
-----------------------|--------------
TIMET | Current UNIX timestamp.
LONGDATETIME | Current date and time including timezone information.
SHORTDATETIME | Current date and time.
DATE | Current date.
TIME | Current time including timezone information.
2013-10-01 12:59:02 +02:00
### Using Templates
Templates may be used to apply a set of similar settings to more than one
object.
For example, rather than manually creating a *ping* service object for each of
your hosts you can use templates to avoid having to copy & paste parts of your
config:
template Host "linux-server" {
services["ping"] = {
check_command = "ping4"
},
check = "ping4"
}
object Host "my-server1" inherits "linux-server" {
macros["address"] = "10.0.0.1"
}
object Host "my-server2" inherits "linux-server" {
macros["address"] = "10.0.0.2"
}
In this example both *my-server1* and *my-server2* each get their own ping
service check.
Objects as well as templates themselves can inherit from an arbitrary number of
2013-10-01 15:33:34 +02:00
templates. Attributes inherited from a template can be overridden in the
2013-10-01 12:59:02 +02:00
object if necessary.
2013-10-01 12:59:02 +02:00
### Groups
TODO
2013-10-01 12:59:02 +02:00
### Host/Service Dependencies
TODO
2013-10-01 12:59:02 +02:00
### Notifications
TODO