# <a id="monitoring-basics"></a> Monitoring Basics
This part of the Icinga 2 documentation provides an overview of all the basic
monitoring concepts you need to know to run Icinga 2.

Keep in mind that these examples were written for a Linux server. If you are
using Windows, you will need to change the services accordingly. See the [ITL reference](10-icinga-template-library.md#windows-plugins)
for further information.
## <a id="hosts-services"></a> Hosts and Services
Icinga 2 can be used to monitor the availability of hosts and services. Hosts
and services can be virtually anything which can be checked in some way:
* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Printers
* Switches or routers
* Temperature sensors
* Other local or network-accessible services
Host objects provide a mechanism to group services that are running
on the same physical device.
Here is an example of a host object which defines two child services:

    object Host "my-server1" {
      address = "10.0.0.1"
      check_command = "hostalive"
    }

    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }

    object Service "http" {
      host_name = "my-server1"
      check_command = "http"
    }

The example creates two services, `ping4` and `http`, which belong to the
host `my-server1`.
It also specifies that the host should perform its own check using the `hostalive`
check command.
The `address` attribute is used by check commands to determine which network
address is associated with the host object.

Details on troubleshooting check problems can be found [here](15-troubleshooting.md#troubleshooting).
### <a id="host-states"></a> Host States
Hosts can be in any of the following states:
Name | Description
------------|--------------
UP | The host is available.
DOWN | The host is unavailable.
### <a id="service-states"></a> Service States
Services can be in any of the following states:
Name | Description
------------|--------------
OK | The service is working properly.
WARNING | The service is experiencing some problems but is still considered to be in working condition.
CRITICAL | The service is in a critical state.
UNKNOWN | The check could not determine the service's state.
### <a id="hard-soft-states"></a> Hard and Soft States
When detecting a problem with a host/service Icinga re-checks the object a number of
times (based on the `max_check_attempts` and `retry_interval` settings) before sending
notifications. This ensures that no unnecessary notifications are sent for
transient failures. During this time the object is in a `SOFT` state.
After all re-checks have been executed and the object is still in a non-OK
state the host/service switches to a `HARD` state and notifications are sent.
Name | Description
------------|--------------
HARD | The host/service's state hasn't recently changed.
SOFT | The host/service has recently changed state and is being re-checked.
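The interplay of these settings can be sketched with a service definition. This is a minimal illustration, not part of the sample configuration: the service name `http-retries` and the chosen values are assumptions. With these values, a failing check is repeated every minute, and the third consecutive non-OK result switches the service to a `HARD` state.

```
object Service "http-retries" {
  host_name = "my-server1"
  check_command = "http"

  check_interval = 5m      // interval between regular checks
  retry_interval = 1m      // shorter interval used for re-checks in a SOFT state
  max_check_attempts = 3   // number of attempts before the state becomes HARD
}
```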
### <a id="host-service-checks"></a> Host and Service Checks
Hosts and services determine their state by running checks at regular intervals.

    object Host "router" {
      check_command = "hostalive"
      address = "10.0.0.1"
    }

The `hostalive` command is one of several built-in check commands. It sends ICMP
echo requests to the IP address specified in the `address` attribute to determine
whether a host is online.
A number of other [built-in check commands](10-icinga-template-library.md#plugin-check-commands) are also
available. In addition to these commands the next few chapters will explain in
detail how to set up your own check commands.

## <a id="object-inheritance-using-templates"></a> Templates
Templates may be used to apply a set of identical attributes to more than one
object:

    template Service "generic-service" {
      max_check_attempts = 3
      check_interval = 5m
      retry_interval = 1m
      enable_perfdata = true
    }

    apply Service "ping4" {
      import "generic-service"

      check_command = "ping4"

      assign where host.address
    }

    apply Service "ping6" {
      import "generic-service"

      check_command = "ping6"

      assign where host.address6
    }

In this example the `ping4` and `ping6` services inherit properties from the
template `generic-service`.

Objects as well as templates themselves can import an arbitrary number of
other templates. Attributes inherited from a template can be overridden in the
object if necessary.

You can also import existing non-template objects. Note that templates
and objects share the same namespace, i.e. you can't define a template
that has the same name as an object.
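Overriding an inherited attribute could look like the following sketch. The service name `ping4-fast` is a hypothetical example; it assumes the `generic-service` template and the `my-server1` host shown earlier.

```
object Service "ping4-fast" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "ping4"

  // Overrides the 5m inherited from the `generic-service` template.
  check_interval = 1m
}
```

All other attributes, such as `max_check_attempts` and `retry_interval`, keep the values set in the template.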
## <a id="custom-attributes"></a> Custom Attributes
In addition to built-in attributes you can define your own attributes:

    object Host "localhost" {
      vars.ssh_port = 2222
    }

Valid values for custom attributes include:

* [Strings](17-language-reference.md#string-literals), [numbers](17-language-reference.md#numeric-literals) and [booleans](17-language-reference.md#boolean-literals)
* [Arrays](17-language-reference.md#array) and [dictionaries](17-language-reference.md#dictionary)
* [Functions](3-monitoring-basics.md#custom-attributes-functions)
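The value types listed above can be sketched on a single host. The host name `my-server2` and all attribute names except `ssh_port` are made up for illustration.

```
object Host "my-server2" {
  address = "10.0.0.2"
  check_command = "hostalive"

  vars.os = "Linux"                  // string
  vars.ssh_port = 2222               // number
  vars.is_production = true          // boolean
  vars.mountpoints = [ "/", "/var" ] // array
  vars.contact = {                   // dictionary
    team = "noc"
    pager = "555-0100"
  }
}
```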
### <a id="custom-attributes-functions"></a> Functions as Custom Attributes
Icinga 2 lets you specify [functions](17-language-reference.md#functions) for custom attributes.
The special case here is that whenever Icinga 2 needs the value for such a custom attribute it runs
the function and uses whatever value the function returns:

    object CheckCommand "random-value" {
      command = [ PluginDir + "/check_dummy", "0", "$text$" ]

      vars.text = {{ Math.random() * 100 }}
    }

This example uses the [abbreviated lambda syntax](17-language-reference.md#nullary-lambdas).

These functions have access to a number of variables:

Variable | Description
-------------|---------------
user | The User object (for notifications).
service | The Service object (for service checks/notifications/event handlers).
host | The Host object.
command | The command object (e.g. a CheckCommand object for checks).

Here's an example:

    vars.text = {{ host.check_interval }}

In addition to these variables the `macro` function can be used to retrieve the
value of arbitrary macro expressions:

    vars.text = {{
      if (macro("$address$") == "127.0.0.1") {
        log("Running a check for localhost!")
      }

      return "Some text"
    }}

The `resolve_arguments` function can be used to resolve a command and its arguments in much
the same way Icinga does this for the `command` and `arguments` attributes of
commands. The `by_ssh` command uses this functionality to let users specify a
command and arguments that should be executed via SSH:

    arguments = {
      "-C" = {{
        var command = macro("$by_ssh_command$")
        var arguments = macro("$by_ssh_arguments$")

        if (typeof(command) == String && !arguments) {
          return command
        }

        var escaped_args = []
        for (arg in resolve_arguments(command, arguments)) {
          escaped_args.add(escape_shell_arg(arg))
        }
        return escaped_args.join(" ")
      }}
      ...
    }

Accessing object attributes at runtime inside these functions is described in the
[advanced topics](8-advanced-topics.md#access-object-attributes-at-runtime) chapter.

## <a id="runtime-macros"></a> Runtime Macros
Macros can be used to access other objects' attributes at runtime. For example they
are used in command definitions to figure out which IP address a check should be
run against:

    object CheckCommand "my-ping" {
      command = [ PluginDir + "/check_ping", "-H", "$ping_address$" ]

      arguments = {
        "-w" = "$ping_wrta$,$ping_wpl$%"
        "-c" = "$ping_crta$,$ping_cpl$%"
        "-p" = "$ping_packets$"
      }

      vars.ping_address = "$address$"

      vars.ping_wrta = 100
      vars.ping_wpl = 5

      vars.ping_crta = 250
      vars.ping_cpl = 10

      vars.ping_packets = 5
    }

    object Host "router" {
      check_command = "my-ping"
      address = "10.0.0.1"
    }

In this example we are using the `$address$` macro to refer to the host's `address`
attribute.

We can also directly refer to custom attributes, e.g. by using `$ping_wrta$`. Icinga
automatically tries to find the closest match for the attribute you specified. The
exact rules for this are explained in the next section.

> **Note**
>
> When using the `$` sign as single character you must escape it with an
> additional dollar character (`$$`).

### <a id="macro-evaluation-order"></a> Evaluation Order
When executing commands Icinga 2 checks the following objects in this order to look
up macros and their respective values:
1. User object (only for notifications)
2. Service object
3. Host object
4. Command object
5. Global custom attributes in the `Vars` constant
This execution order allows you to define default values for custom attributes
in your command objects.
Here's how you can override the custom attribute `ping_packets` from the previous
example:

    object Service "ping" {
      host_name = "localhost"
      check_command = "my-ping"

      vars.ping_packets = 10 // Overrides the default value of 5 given in the command
    }

If a custom attribute isn't defined anywhere, an empty value is used and a warning is
written to the Icinga 2 log.

You can also directly refer to a specific attribute -- thereby ignoring these evaluation
rules -- by specifying the full attribute name:

    $service.vars.ping_wrta$

This retrieves the value of the `ping_wrta` custom attribute for the service. This
returns an empty value if the service does not have such a custom attribute, no matter
whether another object such as the host has this attribute.
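As a minimal sketch of the full attribute name syntax, the hypothetical command below reads the packet count from the service only. The command name `my-ping-service-only` and the fixed thresholds are illustrative assumptions.

```
object CheckCommand "my-ping-service-only" {
  command = [ PluginDir + "/check_ping", "-H", "$address$" ]

  arguments = {
    "-w" = "100,5%"
    "-c" = "250,10%"
    // Looked up on the service only; host, command and global
    // custom attributes named `ping_packets` are not consulted.
    "-p" = "$service.vars.ping_packets$"
  }
}
```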
### <a id="host-runtime-macros"></a> Host Runtime Macros
The following host macros are available in all commands that are executed for
hosts or services:

Name | Description
-----------------------------|--------------
host.name | The name of the host object.
host.display_name | The value of the `display_name` attribute.
host.state | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
host.state_id | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
host.state_type | The host's current state type. Can be one of `SOFT` and `HARD`.
host.check_attempt | The current check attempt number.
host.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
host.last_state | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
host.last_state_id | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
host.last_state_type | The host's previous state type. Can be one of `SOFT` and `HARD`.
host.last_state_change | The last state change's timestamp.
host.downtime_depth | The number of active downtimes.
host.duration_sec | The time since the last state change.
host.latency | The host's check latency.
host.execution_time | The host's check execution time.
host.output | The last check's output.
host.perfdata | The last check's performance data.
host.last_check | The timestamp when the last check was executed.
host.check_source | The monitoring instance that performed the last check.
host.num_services | Number of services associated with the host.
host.num_services_ok | Number of services associated with the host which are in an `OK` state.
host.num_services_warning | Number of services associated with the host which are in a `WARNING` state.
host.num_services_unknown | Number of services associated with the host which are in an `UNKNOWN` state.
host.num_services_critical | Number of services associated with the host which are in a `CRITICAL` state.
### <a id="service-runtime-macros"></a> Service Runtime Macros
The following service macros are available in all commands that are executed for
services:
Name | Description
---------------------------|--------------
service.name | The short name of the service object.
service.display_name | The value of the `display_name` attribute.
service.check_command | The short name of the command along with any arguments to be used for the check.
service.state | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
service.state_id | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
service.state_type | The service's current state type. Can be one of `SOFT` and `HARD`.
service.check_attempt | The current check attempt number.
service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
service.last_state | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
service.last_state_id | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
service.last_state_type | The service's previous state type. Can be one of `SOFT` and `HARD`.
service.last_state_change | The last state change's timestamp.
service.downtime_depth | The number of active downtimes.
service.duration_sec | The time since the last state change.
service.latency | The service's check latency.
service.execution_time | The service's check execution time.
service.output | The last check's output.
service.perfdata | The last check's performance data.
service.last_check | The timestamp when the last check was executed.
service.check_source | The monitoring instance that performed the last check.
### <a id="command-runtime-macros"></a> Command Runtime Macros
The following macros are available in all commands:
Name | Description
-----------------------|--------------
command.name | The name of the command object.
### <a id="user-runtime-macros"></a> User Runtime Macros
The following macros are available in all commands that are executed for
users:

Name | Description
-----------------------|--------------
user.name | The name of the user object.
user.display_name | The value of the `display_name` attribute.
### <a id="notification-runtime-macros"></a> Notification Runtime Macros
Name | Description
-----------------------|--------------
notification.type | The type of the notification.
notification.author | The author of the notification comment, if any.
notification.comment | The comment of the notification, if any.
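The host, user and notification macros are typically passed to a notification script as environment variables. The following is a sketch under stated assumptions: the command name and the script path `log-host-notification.sh` are hypothetical, while `env` and the `SysconfDir` constant are regular parts of the configuration language.

```
object NotificationCommand "log-host-notification" {
  // Hypothetical script; replace with your own notification script.
  command = [ SysconfDir + "/icinga2/scripts/log-host-notification.sh" ]

  env = {
    NOTIFICATIONTYPE = "$notification.type$"
    NOTIFICATIONAUTHOR = "$notification.author$"
    NOTIFICATIONCOMMENT = "$notification.comment$"
    HOSTNAME = "$host.name$"
    HOSTSTATE = "$host.state$"
    HOSTOUTPUT = "$host.output$"
    USERNAME = "$user.name$"
  }
}
```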
### <a id="global-runtime-macros"></a> Global Runtime Macros
The following macros are available in all executed commands:
Name | Description
-----------------------|--------------
icinga.timet | Current UNIX timestamp.
icinga.long_date_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
icinga.date | Current date. Example: `2014-01-03`
icinga.time | Current time including timezone information. Example: `11:23:08 +0000`
icinga.uptime | Current uptime of the Icinga 2 process.
The following macros provide global statistics:
Name | Description
----------------------------------|--------------
icinga.num_services_ok | Current number of services in state 'OK'.
icinga.num_services_warning | Current number of services in state 'Warning'.
icinga.num_services_critical | Current number of services in state 'Critical'.
icinga.num_services_unknown | Current number of services in state 'Unknown'.
icinga.num_services_pending | Current number of pending services.
icinga.num_services_unreachable | Current number of unreachable services.
icinga.num_services_flapping | Current number of flapping services.
icinga.num_services_in_downtime | Current number of services in downtime.
icinga.num_services_acknowledged | Current number of acknowledged service problems.
icinga.num_hosts_up | Current number of hosts in state 'Up'.
icinga.num_hosts_down | Current number of hosts in state 'Down'.
icinga.num_hosts_unreachable | Current number of unreachable hosts.
icinga.num_hosts_pending | Current number of pending hosts.
icinga.num_hosts_flapping | Current number of flapping hosts.
icinga.num_hosts_in_downtime | Current number of hosts in downtime.
icinga.num_hosts_acknowledged | Current number of acknowledged host problems.
## <a id="using-apply"></a> Apply Rules
Instead of assigning each object ([Service](9-object-types.md#objecttype-service),
[Notification](9-object-types.md#objecttype-notification), [Dependency](9-object-types.md#objecttype-dependency),
[ScheduledDowntime](9-object-types.md#objecttype-scheduleddowntime))
based on attribute identifiers, for example `host_name`, objects can be [applied](17-language-reference.md#apply).

Before you start using the apply rules keep the following in mind:

* Define the best match.
    * A set of unique [custom attributes](3-monitoring-basics.md#custom-attributes) for these hosts/services?
    * Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it?
    * A generic pattern [match](18-library-reference.md#global-functions-match) on the host/service name?
    * [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](17-language-reference.md#expression-operators)
* All expressions must return a boolean value (for example, an empty string is equal to `false`)

> **Note**
>
> You can set/override object attributes in apply rules using the respectively available
> objects in that scope (host and/or service objects).
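Matching on group membership could be sketched as follows. The group name `linux-servers` and the `os` custom attribute are assumptions for illustration; the `disk` check command and the `generic-service` template are assumed to be available.

```
object HostGroup "linux-servers" {
  display_name = "Linux Servers"

  assign where host.vars.os == "Linux"
}

apply Service "disk" {
  import "generic-service"

  check_command = "disk"

  // Apply to every host that is a member of the host group.
  assign where "linux-servers" in host.groups
}
```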
[Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays. That way you can use them
not only for matching on their existence or values in apply expressions, but also to assign
("inherit") their values to the objects generated by apply rules.

* [Apply services to hosts](3-monitoring-basics.md#using-apply-services)
* [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications)
* [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-dependencies)
* [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)

A more advanced example is using [apply with for loops on arrays or
dictionaries](3-monitoring-basics.md#using-apply-for), for example provided by
[custom attributes](3-monitoring-basics.md#custom-attributes) or groups.

> **Tip**
>
> Building configuration in that dynamic way requires detailed information
> of the generated objects. Use the `object list` [CLI command](11-cli-commands.md#cli-command-object)
> after successful [configuration validation](11-cli-commands.md#config-validation).

### <a id="using-apply-expressions"></a> Apply Rules Expressions
You can use simple or advanced combinations of apply rule expressions. Each
expression must evaluate to the boolean value `true`. An empty string,
for instance, is interpreted as `false`. Similarly, undefined
attributes evaluate to `false`.

Returns `false`:

    assign where host.vars.attribute_does_not_exist

Multiple `assign where` condition rows are evaluated as an `OR` condition.
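A short sketch of the `OR` behaviour: the service name `ssh-unix` and the `FreeBSD` value are hypothetical, and the rule assumes hosts carry an `os` custom attribute.

```
apply Service "ssh-unix" {
  import "generic-service"

  check_command = "ssh"

  // Either row is sufficient; together they behave like
  // host.vars.os == "Linux" || host.vars.os == "FreeBSD"
  assign where host.vars.os == "Linux"
  assign where host.vars.os == "FreeBSD"
}
```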
You can combine multiple expressions to match only a subset of objects. In some cases,
you want to be able to add more than one assign/ignore where expression which matches
a specific condition. To achieve this you can use the logical `and` and `or` operators.

The following example [matches](18-library-reference.md#global-functions-match) all hosts whose name matches the `*mysql*` pattern and (`&&`) whose custom attribute `prod_mysql_db`
matches the `db-*` pattern. Hosts with the custom attribute `test_server` set to `true`,
or whose name matches the `*internal` pattern, are ignored.

    object HostGroup "mysql-server" {
      display_name = "MySQL Server"

      assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }

Similar example for advanced notification apply rule filters: If the service
attribute `notes` [matches](18-library-reference.md#global-functions-match) the `has gold support 24x7` string `AND` one of the
two conditions passes, either the `customer` host custom attribute is set to `customer-xy`
`OR` the host custom attribute `always_notify` is set to `true`.

The notification is ignored for services whose host name ends with `*internal`
`OR` whose `priority` custom attribute is [less than](17-language-reference.md#expression-operators) `2` while the host
custom attribute `is_clustered` is set to `true`.

    template Notification "cust-xy-notification" {
      users = [ "noc-xy", "mgmt-xy" ]
      command = "mail-service-notification"
    }

    apply Notification "notify-cust-xy-mysql" to Service {
      import "cust-xy-notification"

      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }

More advanced examples are covered [here](8-advanced-topics.md#use-functions-assign-where).

### <a id="using-apply-services"></a> Apply Services to Hosts
The sample configuration already includes a detailed example in [hosts.conf](4-configuring-icinga-2.md#hosts-conf)
and [services.conf](4-configuring-icinga-2.md#services-conf) for this use case.

The example for `ssh` applies a service object to all hosts with the `address`
attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`.

    apply Service "ssh" {
      import "generic-service"

      check_command = "ssh"

      assign where host.address && host.vars.os == "Linux"
    }

Other detailed examples are used in their respective chapters, for example
[apply services with custom command arguments](3-monitoring-basics.md#command-passing-parameters).

### <a id="using-apply-notifications"></a> Apply Notifications to Hosts and Services
Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
manner:

    apply Notification "mail-noc" to Service {
      import "mail-service-notification"

      user_groups = [ "noc" ]

      assign where host.vars.notification.mail
    }

In this example the `mail-noc` notification will be created as an object for all services whose host has the
`notification.mail` custom attribute defined. The notification command is set to `mail-service-notification`
and all members of the user group `noc` will get notified.
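A host matched by this rule could be sketched like this. The host name `my-server3` and the `groups` key inside the dictionary are illustrative assumptions; only the presence of `vars.notification.mail` matters for the `assign where` expression.

```
object Host "my-server3" {
  address = "10.0.0.3"
  check_command = "hostalive"

  // Nested custom attribute checked by `assign where host.vars.notification.mail`.
  vars.notification["mail"] = {
    groups = [ "noc" ]
  }
}
```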

It is also possible to generally apply a notification template and dynamically overwrite values from
the template by checking for custom attributes. This can be achieved by using [conditional statements](17-language-reference.md#conditional-statements):

    apply Notification "host-mail-noc" to Host {
      import "mail-host-notification"

      // replace interval inherited from `mail-host-notification` template with new notification interval set by a host custom attribute
      if (host.vars.notification_interval) {
        interval = host.vars.notification_interval
      }

      // same with notification period
      if (host.vars.notification_period) {
        period = host.vars.notification_period
      }

      // Send SMS instead of email if the host's custom attribute `notification_type` is set to `sms`
      if (host.vars.notification_type == "sms") {
        command = "sms-host-notification"
      } else {
        command = "mail-host-notification"
      }

      user_groups = [ "noc" ]

      assign where host.address
    }

In the example above, the notification template `mail-host-notification`, which contains all relevant
notification settings, is applied to all host objects where `host.address` is defined.
Each host object is then checked for custom attributes (`host.vars.notification_interval`,
`host.vars.notification_period` and `host.vars.notification_type`). Depending on whether the custom
attribute is set and which value it has, the value from the notification template is dynamically
overwritten.

The corresponding host object could look like this:

    object Host "host1" {
      import "host-linux-prod"
      display_name = "host1"
      address = "192.168.1.50"
      vars.notification_interval = 1h
      vars.notification_period = "24x7"
      vars.notification_type = "sms"
    }

### <a id="using-apply-dependencies"></a> Apply Dependencies to Hosts and Services
Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter.

### <a id="using-apply-scheduledowntimes"></a> Apply Recurring Downtimes to Hosts and Services
The sample configuration includes an example in [downtimes.conf](4-configuring-icinga-2.md#downtimes-conf).

Detailed examples can be found in the [recurring downtimes](8-advanced-topics.md#recurring-downtimes) chapter.

### <a id="using-apply-for"></a> Using Apply For Rules
In addition to the standard way of using [apply rules](3-monitoring-basics.md#using-apply),
objects can be applied based on a set (array or
dictionary) using [apply for](17-language-reference.md#apply-for) expressions.

The sample configuration already includes a detailed example in [hosts.conf](4-configuring-icinga-2.md#hosts-conf)
and [services.conf](4-configuring-icinga-2.md#services-conf) for this use case.

Take the following example: A host provides the SNMP OIDs for different service check
types. This could look like the following example:

    object Host "router-v6" {
      check_command = "hostalive"
      address6 = "::1"

      vars.oids["if01"] = "1.1.1.1.1"
      vars.oids["temp"] = "1.1.1.1.2"
      vars.oids["bgp"] = "1.1.1.1.5"
    }

Now we want to create service checks for `if01` and `temp`, but not `bgp`.
Furthermore we want to pass the SNMP OID stored as dictionary value to the
custom attribute called `vars.snmp_oid` -- this is the command argument required
by the [snmp](10-icinga-template-library.md#plugin-check-command-snmp) check command.
The service's `display_name` should be set to the identifier inside the dictionary.

    apply Service for (identifier => oid in host.vars.oids) {
      check_command = "snmp"
      display_name = identifier
      vars.snmp_oid = oid

      ignore where identifier == "bgp" //don't generate service for bgp checks
    }

Icinga 2 evaluates the `apply for` rule for all objects with the custom attribute
`oids` set. It then iterates over all list items inside the `for` loop and evaluates the
`assign/ignore where` expressions. You can access the loop variable
in these expressions, e.g. for ignoring certain values.
In this example we'd ignore the `bgp` identifier and avoid generating an unwanted service.

We could extend the configuration by also matching the `oid` value on certain
[regex](18-library-reference.md#global-functions-regex)/[wildcard match](18-library-reference.md#global-functions-match) patterns for example.
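Such an extension might look like the sketch below. The OID subtree `1.1.1.1.9` is a made-up value chosen only to show the wildcard `match` call on the loop value.

```
apply Service for (identifier => oid in host.vars.oids) {
  check_command = "snmp"
  display_name = identifier
  vars.snmp_oid = oid

  // Skip the bgp entry by its key, as before.
  ignore where identifier == "bgp"
  // Additionally skip every OID below a (hypothetical) subtree.
  ignore where match("1.1.1.1.9.*", oid)
}
```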

> **Note**
>
> You don't need an `assign where` expression that only checks for the existence
> of the custom attribute.

That way you'll save duplicated apply rules by combining them into one
generic `apply for` rule generating the object name with or without a prefix.

#### <a id="using-apply-for-custom-attribute-override"></a> Apply For and Custom Attribute Override
Imagine a different, more advanced example: You are monitoring a network device (host)
with many interfaces (services). The following requirements/problems apply:

* Each interface service check should be named with a prefix and a name defined in your host object (which could be generated from your CMDB, etc.)
* Each interface has its own VLAN tag
* Some interfaces have QoS enabled
* Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be
dynamically generated

Tip: Define the SNMP community as global constant in your [constants.conf](4-configuring-icinga-2.md#constants-conf) file.

    const IftrafficSnmpCommunity = "public"

By defining the `interfaces` dictionary with three example interfaces on the `cisco-catalyst-6509-34`
host object, you'll make sure to pass the [custom attribute](3-monitoring-basics.md#custom-attributes)
storage required by the for loop in the service apply rule.

    object Host "cisco-catalyst-6509-34" {
      import "generic-host"
      display_name = "Catalyst 6509 #34 VIE21"
      address = "127.0.1.4"

      /* "GigabitEthernet0/2" is the interface name,
       * and key name in service apply for later on
       */
      vars.interfaces["GigabitEthernet0/2"] = {
        /* define all custom attributes with the
         * same name required for command parameters/arguments
         * in service apply (look into your CheckCommand definition)
         */
        iftraffic_units = "g"
        iftraffic_community = IftrafficSnmpCommunity
        iftraffic_bandwidth = 1
        vlan = "internal"
        qos = "disabled"
      }
      vars.interfaces["GigabitEthernet0/4"] = {
        iftraffic_units = "g"
        //iftraffic_community = IftrafficSnmpCommunity
        iftraffic_bandwidth = 1
        vlan = "remote"
        qos = "enabled"
      }
      vars.interfaces["MgmtInterface1"] = {
        iftraffic_community = IftrafficSnmpCommunity
        vlan = "mgmt"
        interface_address = "127.99.0.100" #special management ip
      }
    }

You can also omit the `"if-"` string; all generated service names are then taken
directly from the `interface_name` loop variable.
The config dictionary contains all key-value pairs for the specific interface in one
loop cycle, like `iftraffic_units`, `vlan`, and `qos` for the specified interface.
You can either map the custom attributes from the `interface_config` dictionary to
local custom attributes stashed into `vars` or, if the names already match the required command
argument parameters (for example `iftraffic_units`), add the whole
`interface_config` dictionary to the `vars` dictionary using the `+=` operator.
After `vars` is fully populated, all object attributes can be calculated from
the provided host attributes. For strings, you can use string concatenation with the `+` operator.
2016-09-21 16:38:14 +02:00
You can also specify the `display_name`, `check_command`, interval, `notes`, `notes_url`, `action_url`, etc.
attributes that way. Attribute strings can be [concatenated](17-language-reference.md#expression-operators),
for example for adding a more detailed service `display_name`.
2015-05-13 14:14:30 +02:00
2016-08-13 15:59:06 +02:00
This example also uses [if conditions](17-language-reference.md#conditional-statements)
to add a local default value if specific values are not set.
Conversely, you can override specific custom attributes inherited from a service template if they are set.
2015-05-13 14:14:30 +02:00
/* loop over the host.vars.interfaces dictionary
* for (key => value in dict) means `interface_name` as key
* and `interface_config` as value. Access config attributes
* with the indexer (`.`) character.
*/
apply Service "if-" for (interface_name => interface_config in host.vars.interfaces) {
import "generic-service"
check_command = "iftraffic"
display_name = "IF-" + interface_name
2014-11-07 03:40:46 +01:00
2015-05-13 14:14:30 +02:00
/* use the key as command argument (no duplication of values in host.vars.interfaces) */
vars.iftraffic_interface = interface_name
2014-11-07 03:40:46 +01:00
2015-05-13 14:14:30 +02:00
/* map the custom attributes as command arguments */
vars.iftraffic_units = interface_config.iftraffic_units
vars.iftraffic_community = interface_config.iftraffic_community
2014-11-07 03:40:46 +01:00
2015-05-13 14:14:30 +02:00
/* the above can be achieved in a shorter fashion if the names inside host.vars.interfaces
* are the _exact_ same as required as command parameter by the check command
* definition.
*/
vars += interface_config
2014-11-07 03:40:46 +01:00
2015-05-13 18:24:19 +02:00
/* set a default value for units and bandwidth */
2015-05-13 14:14:30 +02:00
if (interface_config.iftraffic_units == "") {
2015-05-13 18:24:19 +02:00
vars.iftraffic_units = "m"
}
if (interface_config.iftraffic_bandwidth == "") {
vars.iftraffic_bandwidth = 1
2015-05-13 14:14:30 +02:00
}
if (interface_config.vlan == "") {
vars.vlan = "not set"
}
if (interface_config.qos == "") {
vars.qos = "not set"
}
2014-11-07 03:40:46 +01:00
2015-05-13 18:24:19 +02:00
/* fall back to the global constant if the community is
* not provided by the `interfaces` dictionary on the host
*/
if (len(interface_config.iftraffic_community) == 0 || len(vars.iftraffic_community) == 0) {
vars.iftraffic_community = IftrafficSnmpCommunity
}
2015-05-13 14:14:30 +02:00
/* Calculate some additional object attributes after populating the `vars` dictionary */
2015-05-13 18:24:19 +02:00
notes = "Interface check for " + interface_name + " (units: '" + interface_config.iftraffic_units + "') in VLAN '" + vars.vlan + "' with QoS '" + vars.qos + "'"
2014-11-07 03:40:46 +01:00
notes_url = "http://foreman.company.com/hosts/" + host.name
2015-05-13 14:14:30 +02:00
action_url = "http://snmp.checker.company.com/" + host.name + "/if-" + interface_name
2014-11-07 03:40:46 +01:00
}
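For comparison, a shortened sketch of the same rule without the `"if-"` prefix, relying only on `vars += interface_config`. It assumes every key in `host.vars.interfaces` already matches the parameter names expected by the `iftraffic` CheckCommand and is meant as an alternative to the rule above, not an addition to it. The generated services would then be named after the interface keys alone, for example `GigabitEthernet0/2`:

```
apply Service for (interface_name => interface_config in host.vars.interfaces) {
  import "generic-service"
  check_command = "iftraffic"

  /* the dictionary key is still passed explicitly as the interface name */
  vars.iftraffic_interface = interface_name

  /* copy all per-interface settings in one step */
  vars += interface_config
}
```

The shorter form trades the explicit defaults of the longer rule for brevity, so missing keys such as `qos` simply stay unset.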
2015-05-13 14:14:30 +02:00
2015-05-13 18:24:19 +02:00
2016-12-08 17:38:41 +01:00
This example makes use of the [check_iftraffic](https://exchange.icinga.com/exchange/iftraffic) plugin.
The `CheckCommand` definition can be found in the
[contributed plugin check commands](10-icinga-template-library.md#plugin-contrib-command-iftraffic)
-- make sure to include them in your [icinga2 configuration file](4-configuring-icinga-2.md#icinga2-conf).
2015-05-13 14:14:30 +02:00
2014-11-07 03:40:46 +01:00
> **Tip**
>
> Building configuration in that dynamic way requires detailed information
> about the generated objects. Use the `object list` [CLI command](11-cli-commands.md#cli-command-object)
> after successful [configuration validation](11-cli-commands.md#config-validation).
2014-11-07 03:40:46 +01:00
2015-06-16 17:34:53 +02:00
Verify that the apply for rule successfully created the service objects with the
inherited custom attributes:
# icinga2 daemon -C
# icinga2 object list --type Service --name *catalyst*
2015-11-20 15:57:16 +01:00
Object 'cisco-catalyst-6509-34!if-GigabitEthernet0/2' of type 'Service':
2015-05-13 18:24:19 +02:00
......
2015-05-13 14:14:30 +02:00
* vars
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26
* iftraffic_bandwidth = 1
2015-05-13 14:14:30 +02:00
* iftraffic_community = "public"
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65
2015-05-13 14:14:30 +02:00
* iftraffic_interface = "GigabitEthernet0/2"
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43
2015-05-13 14:14:30 +02:00
* iftraffic_units = "g"
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57
2015-05-13 14:14:30 +02:00
* qos = "disabled"
* vlan = "internal"
2015-05-13 18:24:19 +02:00
2015-05-13 14:14:30 +02:00
Object 'cisco-catalyst-6509-34!if-GigabitEthernet0/4' of type 'Service':
...
* vars
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26
* iftraffic_bandwidth = 1
* iftraffic_community = "public"
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 79:5-79:53
2015-05-13 14:14:30 +02:00
* iftraffic_interface = "GigabitEthernet0/4"
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43
2015-05-13 14:14:30 +02:00
* iftraffic_units = "g"
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57
2015-05-13 14:14:30 +02:00
* qos = "enabled"
* vlan = "renote"
Object 'cisco-catalyst-6509-34!if-MgmtInterface1' of type 'Service':
...
* vars
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 59:3-59:26
* iftraffic_bandwidth = 1
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 66:5-66:32
2015-05-13 14:14:30 +02:00
* iftraffic_community = "public"
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 53:3-53:65
2015-05-13 14:14:30 +02:00
* iftraffic_interface = "MgmtInterface1"
2015-05-13 18:24:19 +02:00
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 49:3-49:43
* iftraffic_units = "m"
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 52:3-52:57
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 63:5-63:30
2015-05-13 14:14:30 +02:00
* interface_address = "127.99.0.100"
* qos = "not set"
% = modified in '/etc/icinga2/conf.d/iftraffic.conf', lines 72:5-72:24
* vlan = "mgmt"
2014-11-07 03:40:46 +01:00
2015-03-19 17:18:36 +01:00
### <a id="using-apply-object-attributes"></a> Use Object Attributes in Apply Rules
2014-11-07 03:40:46 +01:00
Since apply rules are evaluated after the generic objects, you
can reference existing host and/or service object attributes as
values for any object attribute specified in that apply rule.
object Host "opennebula-host" {
import "generic-host"
address = "10.1.1.2"
vars.hosting["xyz"] = {
http_uri = "/shop"
2015-01-22 16:09:28 +01:00
customer_name = "Customer xyz"
customer_id = "7568"
support_contract = "gold"
2014-11-07 03:40:46 +01:00
}
vars.hosting["abc"] = {
http_uri = "/shop"
2015-01-22 16:09:28 +01:00
customer_name = "Customer xyz"
customer_id = "7568"
support_contract = "silver"
2014-11-07 03:40:46 +01:00
}
}
apply Service for (customer => config in host.vars.hosting) {
import "generic-service"
check_command = "ping4"
vars.qos = "disabled"
vars += config
vars.http_uri = "/" + customer + "/" + config.http_uri
display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id
notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."
notes_url = "http://foreman.company.com/hosts/" + host.name
action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id
}
2015-02-11 11:51:58 +01:00
## <a id="groups"></a> Groups
2014-05-04 11:25:12 +02:00
2015-02-11 11:51:58 +01:00
A group is a collection of similar objects. Groups are primarily used as a
visualization aid in web interfaces.
2014-05-04 11:25:12 +02:00
Group membership is defined at the respective object itself. If
you have a host group named `windows`, for example, and want to assign
specific hosts to this group so you can view it later on your
alert dashboard, first create a `HostGroup` object:
2014-05-04 11:25:12 +02:00
object HostGroup "windows" {
display_name = "Windows Servers"
}
2015-02-11 11:51:58 +01:00
Then add your hosts to this group:
2014-05-04 11:25:12 +02:00
template Host "windows-server" {
groups += [ "windows" ]
}
object Host "mssql-srv1" {
import "windows-server"
vars.mssql_port = 1433
}
object Host "mssql-srv2" {
import "windows-server"
vars.mssql_port = 1433
}
2015-02-11 11:51:58 +01:00
This can be done for service and user groups the same way:
2014-05-04 11:25:12 +02:00
object UserGroup "windows-mssql-admins" {
display_name = "Windows MSSQL Admins"
}
template User "generic-windows-mssql-users" {
groups += [ "windows-mssql-admins" ]
}
object User "win-mssql-noc" {
import "generic-windows-mssql-users"
email = "noc@example.com"
}
object User "win-mssql-ops" {
import "generic-windows-mssql-users"
email = "ops@example.com"
}
2015-02-11 11:51:58 +01:00
### <a id="group-assign-intro"></a> Group Membership Assign
2014-05-04 11:25:12 +02:00
2015-02-11 14:15:03 +01:00
Instead of manually assigning each object to a group you can also assign objects
2015-02-11 11:51:58 +01:00
to a group based on their attributes:
2014-05-04 11:25:12 +02:00
2014-11-07 03:40:46 +01:00
object HostGroup "prod-mssql" {
display_name = "Production MSSQL Servers"
2015-02-11 11:51:58 +01:00
2014-11-07 03:40:46 +01:00
assign where host.vars.mssql_port && host.vars.prod_mysql_db
ignore where host.vars.test_server == true
ignore where match("*internal", host.name)
2014-05-04 11:25:12 +02:00
}
2015-02-11 11:51:58 +01:00
In this example all hosts with both the `mssql_port` and `prod_mysql_db` custom attributes set
will be added as members to the host group `prod-mssql`. However, all
hosts whose name [matches](18-library-reference.md#global-functions-match) the pattern `*internal`
or which have the `test_server` attribute set to `true` are **not** added to this group.
2015-02-11 11:51:58 +01:00
Details on the `assign where` syntax can be found in the
[Language Reference](17-language-reference.md#apply).
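The same mechanism works for other group types. A small sketch for a service group, assuming your services follow an `http*` naming scheme (the group name and the pattern are illustrative):

```
object ServiceGroup "http" {
  display_name = "HTTP Checks"

  /* collect every service whose name starts with "http" */
  assign where match("http*", service.name)
}
```

With such a rule, new services join the group automatically as soon as their name matches; no `groups` attribute is needed on the service itself.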
2014-05-04 11:25:12 +02:00
2014-05-22 20:14:26 +02:00
## <a id="notifications"></a> Notifications
Notifications for service and host problems are an integral part of your
monitoring setup.
When a host or service is in a downtime, a problem has been acknowledged or
the dependency logic determined that the host/service is unreachable, no
notifications are sent. You can configure additional type and state filters
to refine which notifications are actually sent.
2016-04-22 15:59:35 +02:00
There are many ways of sending notifications, e.g. by email, XMPP,
2014-05-22 20:14:26 +02:00
IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
Instead it relies on external mechanisms such as shell scripts to notify users.
2016-08-13 15:59:06 +02:00
More notification methods are listed in the [addons and plugins ](13-addons.md#notification-scripts-interfaces )
2015-06-23 16:19:54 +02:00
chapter.
2014-05-22 20:14:26 +02:00
A notification specification requires one or more users (and/or user groups)
who will be notified in case of problems. These users must have all custom
attributes defined which will be used in the `NotificationCommand` on execution.
The user `icingaadmin` in the example below will get notified only on `OK`, `Warning` and
`Critical` states and for the `Problem` and `Recovery` notification types.
object User "icingaadmin" {
display_name = "Icinga 2 Admin"
enable_notifications = true
states = [ OK, Warning, Critical ]
types = [ Problem, Recovery ]
email = "icinga@localhost"
}
2014-05-23 01:01:06 +02:00
If you don't set the `states` and `types` configuration attributes for the `User`
object, notifications for all states and types will be sent.
2016-08-13 15:59:06 +02:00
Details on troubleshooting notification problems can be found [here ](15-troubleshooting.md#troubleshooting ).
2014-05-22 20:14:26 +02:00
2014-05-29 15:34:01 +02:00
> **Note**
>
2016-08-13 15:59:06 +02:00
> Make sure that the [notification](11-cli-commands.md#enable-features) feature is enabled
2014-05-29 15:34:01 +02:00
> in order to execute notification commands.
2014-05-22 20:14:26 +02:00
You should choose which information you (and your notified users) are interested in
case of emergency, and also which information does not provide any value to you and
your environment.
2015-01-22 09:40:25 +01:00
An example notification command is explained [here ](3-monitoring-basics.md#notification-commands ).
2014-05-22 20:14:26 +02:00
You can add all shared attributes to a `Notification` template which is inherited
to the defined notifications. That way you'll save duplicated attributes in each
`Notification` object. Attributes can be overridden locally.
template Notification "generic-notification" {
interval = 15m
command = "mail-service-notification"
states = [ Warning, Critical, Unknown ]
types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
2014-05-29 16:54:57 +02:00
FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
2014-05-22 20:14:26 +02:00
period = "24x7"
}
2015-01-25 16:07:43 +01:00
The time period `24x7` is included as example configuration with Icinga 2.
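For reference, a time period covering every day of the week could look roughly like the following sketch. It assumes the `legacy-timeperiod` template is available; compare it with the `timeperiods.conf` in your installation rather than treating it as the exact shipped definition, and do not define it a second time if it already exists there:

```
object TimePeriod "24x7" {
  import "legacy-timeperiod"

  display_name = "Icinga 2 24x7 TimePeriod"

  /* one range per weekday, each covering the full day */
  ranges = {
    "monday"    = "00:00-24:00"
    "tuesday"   = "00:00-24:00"
    "wednesday" = "00:00-24:00"
    "thursday"  = "00:00-24:00"
    "friday"    = "00:00-24:00"
    "saturday"  = "00:00-24:00"
    "sunday"    = "00:00-24:00"
  }
}
```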
2014-05-22 20:14:26 +02:00
Use the `apply` keyword to create `Notification` objects for your services:
2014-11-07 03:40:46 +01:00
apply Notification "notify-cust-xy-mysql" to Service {
2014-05-22 20:14:26 +02:00
import "generic-notification"
2014-11-07 03:40:46 +01:00
users = [ "noc-xy", "mgmt-xy" ]
2014-05-22 20:14:26 +02:00
2014-11-07 03:40:46 +01:00
assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
2014-05-22 20:14:26 +02:00
}
2014-11-07 03:40:46 +01:00
2014-05-22 20:14:26 +02:00
Instead of assigning users to notifications, you can also add the `user_groups`
attribute with a list of user groups to the `Notification` object. Icinga 2 will
send notifications to all group members.
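A short sketch of the group-based variant. The `icingaadmins` group and the notification name are assumptions for illustration; users join the group through their own `groups` attribute, as shown in the Groups section:

```
object UserGroup "icingaadmins" {
  display_name = "Icinga 2 Admin Group"
}

apply Notification "mail-admin-group" to Service {
  import "generic-notification"

  /* notify every member of the group instead of listing users */
  user_groups = [ "icingaadmins" ]

  assign where service.name == "ping4"
}
```

Adding or removing a recipient then only requires a change to the `User` object, not to every notification rule.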
2014-11-04 22:03:39 +01:00
> **Note**
>
> Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown`
> states for services, `Down` for hosts) will receive `Recovery` notifications.
2014-05-22 20:14:26 +02:00
### <a id="notification-escalations"></a> Notification Escalations
2014-05-29 16:54:57 +02:00
When a problem notification is sent and a problem still exists at the time of re-notification
2014-05-22 20:14:26 +02:00
you may want to escalate the problem to the next support level. A different approach
2014-06-15 17:34:21 +02:00
is to configure the default notification by email, and escalate the problem via SMS
2014-05-22 20:14:26 +02:00
if not already solved.
You can define notification start and end times as additional configuration
attributes making the `Notification` object a so-called `notification escalation` .
Using templates you can share the basic notification attributes such as users or the
`interval` (and override them for the escalation then).
2014-06-15 17:34:21 +02:00
Using the example from above, you can define additional users being escalated for SMS
2014-05-22 20:14:26 +02:00
notifications between start and end time.
object User "icinga-oncall-2nd-level" {
display_name = "Icinga 2nd Level"
vars.mobile = "+1 555 424642"
}
object User "icinga-oncall-1st-level" {
display_name = "Icinga 1st Level"
vars.mobile = "+1 555 424642"
}
2015-06-16 16:01:02 +02:00
Define an additional [NotificationCommand ](3-monitoring-basics.md#notification-commands ) for SMS notifications.
2014-05-22 20:14:26 +02:00
> **Note**
>
> The example is not complete as there are many different SMS providers.
> Please note that sending SMS notifications will require an SMS provider
2016-04-22 15:59:35 +02:00
> or local hardware with an active SIM card.
2014-05-22 20:14:26 +02:00
object NotificationCommand "sms-notification" {
command = [
PluginDir + "/send_sms_notification",
"$mobile$",
"..."
]
}
2014-11-07 03:40:46 +01:00
The two new notification escalations are added onto the local host
2014-05-22 20:14:26 +02:00
and its service `ping4` using the `generic-notification` template.
The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
command) after `30m` until `1h` .
> **Note**
>
> The `interval` was set to 15m in the `generic-notification`
> template example. Lower that value in your escalations by using a secondary
> template or by overriding the attribute directly in the
> `escalation-sms-2nd-level` notification.
2016-04-22 15:59:35 +02:00
If the problem is neither resolved nor acknowledged (either of which would prevent further notifications),
the `icinga-oncall-1st-level` user will be notified through `escalation-sms-1st-level` `1h` after the
initial problem notification, but only for one hour (`2h` as `end` key for the `times` dictionary).
apply Notification "mail" to Service {
import "generic-notification"
command = "mail-notification"
users = [ "icingaadmin" ]
assign where service.name == "ping4"
}
apply Notification "escalation-sms-2nd-level" to Service {
import "generic-notification"
command = "sms-notification"
users = [ "icinga-oncall-2nd-level" ]
times = {
begin = 30m
end = 1h
}
assign where service.name == "ping4"
}
apply Notification "escalation-sms-1st-level" to Service {
import "generic-notification"
command = "sms-notification"
users = [ "icinga-oncall-1st-level" ]
times = {
begin = 1h
end = 2h
}
assign where service.name == "ping4"
}
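The note about lowering the `interval` could be applied as in the following sketch, which is a variant of the `escalation-sms-2nd-level` rule shown above and would replace it rather than sit next to it. The `5m` value is an assumption chosen only to fit several re-notifications into the 30 minute escalation window:

```
apply Notification "escalation-sms-2nd-level" to Service {
  import "generic-notification"
  command = "sms-notification"
  users = [ "icinga-oncall-2nd-level" ]

  /* overrides the 15m inherited from "generic-notification" */
  interval = 5m

  times = {
    begin = 30m
    end = 1h
  }

  assign where service.name == "ping4"
}
```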
2014-06-15 23:17:16 +02:00
### <a id="notification-delay"></a> Notification Delay
2014-05-22 20:14:26 +02:00
2016-04-22 15:59:35 +02:00
Sometimes the problem in question should not be announced when the notification is due
(the object reaching the `HARD` state), but after a certain period. In Icinga 2
2014-06-15 23:17:16 +02:00
you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
2016-04-22 15:59:35 +02:00
postpone the notification window for 15 minutes. Leave out the `end` key -- if not set,
2014-11-07 03:40:46 +01:00
Icinga 2 will not check against any end time for this notification. Make sure to
specify a relatively low notification `interval` to get notified soon enough again.
2014-05-22 20:14:26 +02:00
apply Notification "mail" to Service {
import "generic-notification"
command = "mail-notification"
users = [ "icingaadmin" ]
2014-11-07 03:40:46 +01:00
interval = 5m
times.begin = 15m // delay notification window
2014-05-22 20:14:26 +02:00
assign where service.name == "ping4"
}
2014-08-27 16:30:35 +02:00
### <a id="disable-renotification"></a> Disable Re-notifications
If you prefer to be notified only once, you can disable re-notifications by setting the
`interval` attribute to `0` .
apply Notification "notify-once" to Service {
import "generic-notification"
command = "mail-notification"
users = [ "icingaadmin" ]
interval = 0 // disable re-notification
assign where service.name == "ping4"
}
2014-05-22 20:14:26 +02:00
### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
If there are no notification state and type filter attributes defined at the `Notification`
2016-04-22 15:59:35 +02:00
or `User` object, Icinga 2 assumes that all states and types are being notified.
2014-05-22 20:14:26 +02:00
Available state and type filters for notifications are:
template Notification "generic-notification" {
states = [ Warning, Critical, Unknown ]
types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
}
2016-04-22 15:59:35 +02:00
If you are familiar with Icinga 1.x `notification_options`, please note that they have been split
into type and state to allow more fine-grained filtering, for example on downtimes and flapping.
2015-02-23 22:29:46 +01:00
You can filter for acknowledgements and custom notifications too.
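A sketch of such a finer-grained filter: a notification that only announces downtime and flapping events. The object names and the email address are hypothetical; the remaining attributes follow the earlier examples. The user deliberately sets no `states` or `types`, so the user object itself does not filter these events out:

```
object User "downtime-watcher" {
  /* no `states`/`types` set: this user accepts all states and types */
  email = "ops@example.com"
}

apply Notification "downtime-flapping-info" to Service {
  import "generic-notification"
  command = "mail-notification"
  users = [ "downtime-watcher" ]

  /* overrides the type filter inherited from the template */
  types = [ DowntimeStart, DowntimeEnd, DowntimeRemoved, FlappingStart, FlappingEnd ]

  assign where service.name == "ping4"
}
```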
2014-05-22 20:14:26 +02:00
2014-05-04 11:25:12 +02:00
## <a id="commands"></a> Commands
Icinga 2 uses three different command object types to specify how
2014-05-29 16:54:57 +02:00
checks should be performed, notifications should be sent, and
2014-05-04 11:25:12 +02:00
events should be handled.
### <a id="check-commands"></a> Check Commands
2016-08-13 15:59:06 +02:00
[CheckCommand](9-object-types.md#objecttype-checkcommand) objects define the command line used to
call a check.
2016-08-13 15:59:06 +02:00
[CheckCommand ](9-object-types.md#objecttype-checkcommand ) objects are referenced by
[Host ](9-object-types.md#objecttype-host ) and [Service ](9-object-types.md#objecttype-service ) objects
2014-09-15 19:03:55 +02:00
using the `check_command` attribute.
2014-05-04 11:25:12 +02:00
2014-05-29 15:34:01 +02:00
> **Note**
>
2016-08-13 15:59:06 +02:00
> Make sure that the [checker](11-cli-commands.md#enable-features) feature is enabled in order to
2014-05-29 15:34:01 +02:00
> execute checks.
2014-05-22 16:46:29 +02:00
#### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
2014-05-04 11:25:12 +02:00
Unless you have done so already, download your check plugin and put it
into the [PluginDir](4-configuring-icinga-2.md#constants-conf) directory. The following example uses the
`check_mysql` plugin contained in the Monitoring Plugins package.
The plugin path and all command arguments are passed as a list of
double-quoted string arguments for proper shell escaping.
Call the `check_mysql` plugin with the `--help` parameter to see
all available options:
2016-04-22 15:59:35 +02:00
icinga@icinga2 $ /usr/lib64/nagios/plugins/check_mysql --help
...
2015-06-16 19:58:32 +02:00
This program tests connections to a MySQL server
2016-04-22 15:59:35 +02:00
Usage:
check_mysql [-d database] [-H host] [-P port] [-s socket]
[-u user] [-p password] [-S] [-l] [-a cert] [-k key]
[-C ca-cert] [-D ca-dir] [-L ciphers] [-f optfile] [-g group]
2014-06-15 23:17:16 +02:00
2015-06-16 17:34:53 +02:00
The next step is to understand how [command parameters](3-monitoring-basics.md#command-passing-parameters)
are passed from a host or service object, and to add a [CheckCommand](9-object-types.md#objecttype-checkcommand)
definition based on these required parameters and/or default values.
2014-05-22 16:46:29 +02:00
2016-08-13 15:59:06 +02:00
Please continue reading in the [plugins section ](5-service-monitoring.md#service-monitoring-plugins ) for additional integration examples.
2015-05-29 10:20:30 +02:00
2014-05-22 16:46:29 +02:00
#### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
2014-11-27 16:57:58 +01:00
Check command parameters are defined as custom attributes which can be accessed as runtime macros
by the executed check command.
2014-05-22 16:46:29 +02:00
2015-06-16 17:34:53 +02:00
The check command parameters for ITL provided plugin check command definitions are documented
2016-08-13 15:59:06 +02:00
[here ](10-icinga-template-library.md#plugin-check-commands ), for example
[disk ](10-icinga-template-library.md#plugin-check-command-disk ).
2015-06-16 17:34:53 +02:00
In order to practice passing command parameters you should [integrate your own plugin ](3-monitoring-basics.md#command-plugin-integration ).
The following example will use `check_mysql` provided by the [Monitoring Plugins installation ](2-getting-started.md#setting-up-check-plugins ).
Define the default check command custom attributes, for example `mysql_user` and `mysql_password`
(the naming schema is freely definable), and optionally their default threshold values. You can
then use these custom attributes as runtime macros for [command arguments](3-monitoring-basics.md#command-arguments)
on the command line.
2014-05-04 11:25:12 +02:00
2014-09-15 19:03:55 +02:00
> **Tip**
>
> Use a common command type as prefix for your command arguments to increase
2015-06-16 17:34:53 +02:00
> readability. `mysql_user` helps understanding the context better than just
> `user` as argument.
2014-09-15 19:03:55 +02:00
2014-05-04 11:25:12 +02:00
The default custom attributes can be overridden by the custom attributes
2015-06-16 17:34:53 +02:00
defined in the host or service using the check command `my-mysql` . The custom attributes
2014-05-04 11:25:12 +02:00
can also be inherited from a parent template using additive inheritance (`+=`).
2015-06-16 17:34:53 +02:00
# vim /etc/icinga2/conf.d/commands.conf
object CheckCommand "my-mysql" {
command = [ PluginDir + "/check_mysql" ] //constants.conf -> const PluginDir
2014-06-15 23:17:16 +02:00
2014-06-11 14:05:47 +02:00
arguments = {
2015-06-16 17:34:53 +02:00
"-H" = "$mysql_host$"
"-u" = {
2015-03-10 18:46:27 +01:00
required = true
2015-06-16 17:34:53 +02:00
value = "$mysql_user$"
2015-03-10 18:46:27 +01:00
}
2015-06-16 17:34:53 +02:00
"-p" = "$mysql_password$"
"-P" = "$mysql_port$"
"-s" = "$mysql_socket$"
"-a" = "$mysql_cert$"
"-d" = "$mysql_database$"
"-k" = "$mysql_key$"
"-C" = "$mysql_ca_cert$"
"-D" = "$mysql_ca_dir$"
"-L" = "$mysql_ciphers$"
"-f" = "$mysql_optfile$"
"-g" = "$mysql_group$"
"-S" = {
set_if = "$mysql_check_slave$"
description = "Check if the slave thread is running properly."
2015-03-10 18:46:27 +01:00
}
2015-06-16 17:34:53 +02:00
"-l" = {
set_if = "$mysql_ssl$"
description = "Use ssl encryption"
2015-03-10 18:46:27 +01:00
}
2014-06-11 14:05:47 +02:00
}
2014-06-15 23:17:16 +02:00
2015-06-16 17:34:53 +02:00
vars.mysql_check_slave = false
vars.mysql_ssl = false
vars.mysql_host = "$address$"
2014-05-04 11:25:12 +02:00
}
2015-06-16 17:34:53 +02:00
The check command definition also sets `mysql_host` to the `$address$` default value. You can override
this command parameter if, for example, your MySQL host is not running on the same server's IP address.
Make sure to pass all required command parameters, such as `mysql_user`, `mysql_password` and `mysql_database`.
`MysqlUsername` and `MysqlPassword` are specified as [global constants](4-configuring-icinga-2.md#constants-conf)
in this example.
# vim /etc/icinga2/conf.d/services.conf
apply Service "mysql-icinga-db-health" {
import "generic-service"
check_command = "my-mysql"
vars.mysql_user = MysqlUsername
vars.mysql_password = MysqlPassword
vars.mysql_database = "icinga"
vars.mysql_host = "192.168.33.11"
assign where match("icinga2*", host.name)
ignore where host.vars.no_health_check == true
}
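The two constants referenced above might be declared like this in `constants.conf`. The values are placeholders for illustration, not defaults shipped with Icinga 2:

```
/* hypothetical credentials for the monitoring database user */
const MysqlUsername = "icinga"
const MysqlPassword = "change-me"
```

Keeping the credentials in constants means the apply rule itself contains no secrets and can be reused across environments.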
Take a different example: The example host configuration in [hosts.conf](4-configuring-icinga-2.md#hosts-conf)
also applies an `ssh` service check. Your host's SSH port is not the default `22`, but set to `2022`.
You can pass the command parameter as custom attribute `ssh_port` directly inside the service apply rule
inside [services.conf](4-configuring-icinga-2.md#services-conf):
apply Service "ssh" {
import "generic-service"
check_command = "ssh"
vars.ssh_port = 2022 //custom command parameter
assign where (host.address || host.address6) && host.vars.os == "Linux"
}
If you prefer to configure this at the host instead of the service, modify the host
object instead. The runtime macro resolving order is described [here](3-monitoring-basics.md#macro-evaluation-order).
2016-04-22 15:59:35 +02:00
object Host NodeName {
...
vars.ssh_port = 2022
}
2015-06-16 17:34:53 +02:00
#### <a id="command-passing-parameters-apply-for"></a> Passing Check Command Parameters Using Apply For
2014-06-11 14:05:47 +02:00
2015-06-16 17:34:53 +02:00
The host `my-server` with the generated services from the `basic-partitions` dictionary (see
[apply for](3-monitoring-basics.md#using-apply-for) for details) checks a basic set of disk partitions
with modified custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
free disk space).
The custom attribute `disk_partitions` can either hold a single string or an array of
string values for passing multiple partitions to the `check_disk` check plugin.
2014-05-04 11:25:12 +02:00
2014-11-27 16:57:58 +01:00
object Host "my-server" {
import "generic-host"
2014-05-04 11:25:12 +02:00
address = "127.0.0.1"
address6 = "::1"
2014-11-27 16:57:58 +01:00
vars.local_disks["basic-partitions"] = {
disk_partitions = [ "/", "/tmp", "/var", "/home" ]
}
2014-05-04 11:25:12 +02:00
}
2014-11-27 16:57:58 +01:00
apply Service for (disk => config in host.vars.local_disks) {
2014-05-04 11:25:12 +02:00
import "generic-service"
2014-06-11 14:05:47 +02:00
check_command = "my-disk"
2014-05-04 11:25:12 +02:00
2014-11-27 16:57:58 +01:00
vars += config
2015-03-10 18:46:27 +01:00
vars.disk_wfree = "10%"
vars.disk_cfree = "5%"
2014-05-04 11:25:12 +02:00
}
2014-11-27 16:57:58 +01:00
More details on using arrays in custom attributes can be found in
2015-06-16 16:01:02 +02:00
[this chapter ](3-monitoring-basics.md#custom-attributes ).
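The `my-disk` check command referenced in the apply rule is not defined in this section. A hedged sketch of what such a definition might look like, assuming the `check_disk` plugin from the Monitoring Plugins and reusing the custom attribute names from the example:

```
object CheckCommand "my-disk" {
  command = [ PluginDir + "/check_disk" ]

  arguments = {
    "-w" = "$disk_wfree$"
    "-c" = "$disk_cfree$"
    "-p" = {
      value = "$disk_partitions$"
      /* for an array value the key is repeated once per element */
      repeat_key = true
    }
  }
}
```

Because `disk_partitions` is an array in the host example, each partition ends up as its own `-p` argument on the command line.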
2014-11-27 16:57:58 +01:00
2014-06-11 14:05:47 +02:00
#### <a id="command-arguments"></a> Command Arguments
When a check command line is defined using the `command` attribute, Icinga 2
resolves all macros in the static string or array. Sometimes you need to
extend the argument list based on a condition evaluated at command execution
time, or to make arguments optional so that they are only set if the
macro value can be resolved by Icinga 2.


    object CheckCommand "check_http" {
      command = [ PluginDir + "/check_http" ]

      arguments = {
        "-H" = "$http_vhost$"
        "-I" = "$http_address$"
        "-u" = "$http_uri$"
        "-p" = "$http_port$"
        "-S" = {
          set_if = "$http_ssl$"
        }
        "--sni" = {
          set_if = "$http_sni$"
        }
        "-a" = {
          value = "$http_auth_pair$"
          description = "Username:password on sites with basic authentication"
        }
        "--no-body" = {
          set_if = "$http_ignore_body$"
        }
        "-r" = "$http_expect_body_regex$"
        "-w" = "$http_warn_time$"
        "-c" = "$http_critical_time$"
        "-e" = "$http_expect$"
      }

      vars.http_address = "$address$"
      vars.http_ssl = false
      vars.http_sni = false
    }

The example shows the `check_http` check command defining the most common
arguments. Each of them is optional by default and is omitted if
the value is not set. For example, if the service calling the check command
does not have `vars.http_port` set, `-p` is not added to the command
line.

If the `vars.http_ssl` custom attribute is set in the service, host or command
object definition, Icinga 2 adds the `-S` argument to the command line based on
the numeric value of `set_if`. String values are not supported.

If the macro value cannot be resolved, Icinga 2 does not add the defined argument
to the final command argument array. An empty string as macro value does not
cause the argument to be omitted.

That way you can use the `check_http` command definition for checks both with and
without SSL enabled, which saves you from duplicating command definitions.

Details on all available options can be found in the
[CheckCommand object definition](9-object-types.md#objecttype-checkcommand).

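To illustrate how optional arguments are resolved, here is a minimal sketch of a
service using a `check_http` command defined with `-H`, `-I`, `-u`, `-p`, `-S` and
`--sni` arguments as described above. The service name, the virtual host and the
URI are hypothetical values; the host `my-server1` is assumed to exist.

```icinga2
/* hypothetical service, values chosen for illustration only */
object Service "https-vhost" {
  host_name = "my-server1"
  check_command = "check_http"

  vars.http_vhost = "www.example.com"  /* adds -H www.example.com */
  vars.http_uri = "/status"            /* adds -u /status */
  vars.http_ssl = true                 /* set_if evaluates to true, adds -S */

  /* http_port is not set, so -p is omitted;
     http_sni keeps the command default (false), so --sni is omitted */
}
```

Under these assumptions the `-I` argument is still added, because the command
object provides the default `vars.http_address = "$address$"`.
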
#### <a id="command-environment-variables"></a> Environment Variables
The `env` command object attribute specifies a list of environment variables whose values
are calculated from either runtime macros or custom attributes. These variables are exported
prior to executing the command.

This is useful, for example, for keeping sensitive information off the command line
when passing credentials to database checks:

    object CheckCommand "mysql-health" {
      command = [
        PluginDir + "/check_mysql"
      ]

      arguments = {
        "-H" = "$mysql_address$"
        "-d" = "$mysql_database$"
      }

      vars.mysql_address = "$address$"
      vars.mysql_database = "icinga"
      vars.mysql_user = "icinga_check"
      vars.mysql_pass = "password"

      env.MYSQLUSER = "$mysql_user$"
      env.MYSQLPASS = "$mysql_pass$"
    }

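The defaults in the command's `vars` can be overridden per service. The following
is only a sketch, assuming a `mysql-health` check command that exports
`$mysql_user$` and `$mysql_pass$` through `env` as described above; the service
name, host name and credential values are placeholders.

```icinga2
/* hypothetical service overriding the command defaults */
object Service "mysql-health-shop" {
  host_name = "my-server1"
  check_command = "mysql-health"

  vars.mysql_database = "shop"
  vars.mysql_user = "monitoring"      /* exported as MYSQLUSER */
  vars.mysql_pass = "placeholder"     /* exported as MYSQLPASS */
}
```

Because the credentials are passed through the environment, they do not show up
in the argument list of the executed plugin.
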
### <a id="notification-commands"></a> Notification Commands
[NotificationCommand](9-object-types.md#objecttype-notificationcommand) objects define how notifications are delivered to external
interfaces (email, XMPP, IRC, Twitter, etc.).

[NotificationCommand](9-object-types.md#objecttype-notificationcommand) objects are referenced by
[Notification](9-object-types.md#objecttype-notification) objects using the `command` attribute.

> **Note**
>
> Make sure that the [notification](11-cli-commands.md#enable-features) feature is enabled
> in order to execute notification commands.

Below is an example which uses runtime macros from Icinga 2 (such as `$service.output$` for
the current check output) and sends an email to the user(s) associated with the
notification itself (`$user.email$`).

If you want to specify default values for some of the custom attribute definitions,
you can add a `vars` dictionary as shown for the `CheckCommand` object.

    object NotificationCommand "mail-service-notification" {
      command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]

      env = {
        NOTIFICATIONTYPE = "$notification.type$"
        SERVICEDESC = "$service.name$"
        HOSTALIAS = "$host.display_name$"
        HOSTADDRESS = "$address$"
        SERVICESTATE = "$service.state$"
        LONGDATETIME = "$icinga.long_date_time$"
        SERVICEOUTPUT = "$service.output$"
        NOTIFICATIONAUTHORNAME = "$notification.author$"
        NOTIFICATIONCOMMENT = "$notification.comment$"
        HOSTDISPLAYNAME = "$host.display_name$"
        SERVICEDISPLAYNAME = "$service.display_name$"
        USEREMAIL = "$user.email$"
      }
    }

The `command` attribute in the `mail-service-notification` command refers to the following
shell script. The macros specified in the `env` dictionary are exported
as environment variables and can be used in the notification script:

    #!/usr/bin/env bash

    # Build the mail body from the environment variables exported by Icinga 2.
    template=$(cat <<TEMPLATE
    ***** Icinga *****
    Notification Type: $NOTIFICATIONTYPE
    Service: $SERVICEDESC
    Host: $HOSTALIAS
    Address: $HOSTADDRESS
    State: $SERVICESTATE
    Date/Time: $LONGDATETIME
    Additional Info: $SERVICEOUTPUT
    Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
    TEMPLATE
    )

    # Quote the expansions so that whitespace and line breaks in the values are preserved.
    /usr/bin/printf "%b" "$template" | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" "$USEREMAIL"

> **Note**
>
> This example is for `exim` only. It requires changes for `sendmail` and
> other MTAs.

While it is possible to specify the entire notification command directly
in the NotificationCommand object, it is generally advisable to create a
shell script in the `/etc/icinga2/scripts` directory and have the
NotificationCommand object refer to that.
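
To show how a notification command is referenced, here is a minimal sketch of a
`Notification` apply rule using a `NotificationCommand` named
`mail-service-notification`. The user `icingaadmin`, its email address and the
`assign` condition are assumptions made for illustration.

```icinga2
/* hypothetical user receiving the mails; $user.email$ resolves to this value */
object User "icingaadmin" {
  email = "icinga@localhost"
}

/* hypothetical rule: notify for all services named "http" */
apply Notification "mail-service" to Service {
  command = "mail-service-notification"
  users = [ "icingaadmin" ]

  assign where service.name == "http"
}
```

The `command` attribute only carries the name of the command object, so the same
script can be reused by any number of notification rules.
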
### <a id="event-commands"></a> Event Commands
Unlike notifications, event commands for hosts/services are called on every
check execution if one of these conditions matches:

* The host/service is in a [soft state](3-monitoring-basics.md#hard-soft-states)
* The host/service state changes into a [hard state](3-monitoring-basics.md#hard-soft-states)
* The host/service state recovers from a [soft or hard state](3-monitoring-basics.md#hard-soft-states) to [OK](3-monitoring-basics.md#service-states)/[Up](3-monitoring-basics.md#host-states)

[EventCommand](9-object-types.md#objecttype-eventcommand) objects are referenced by
[Host](9-object-types.md#objecttype-host) and [Service](9-object-types.md#objecttype-service) objects
using the `event_command` attribute.

Therefore the `EventCommand` object should define a command line
evaluating the current service state and other service runtime attributes
available through runtime macros. Runtime macros such as `$service.state_type$`
and `$service.state$` are processed by Icinga 2 and allow fine-grained
control over which events are triggered.

If you are using a client as [command endpoint](6-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint),
the event command will be executed on the client itself (similar to the check
command).

Common use cases are a failing HTTP check that requires an immediate
restart via event command, or an application that is locked and requires
a restart upon detection.

#### <a id="event-command-restart-service-daemon"></a> Use Event Commands to Restart Service Daemon
The following example will trigger a restart of the `httpd` daemon
via ssh when the `http` service check fails. If the service state is
`OK`, it will not trigger any event action.

Requirements:

* ssh connection
* icinga user with public key authentication
* icinga user with sudo permissions for restarting the httpd daemon.

Example on Debian:

    # ls /home/icinga/.ssh/
    authorized_keys

    # visudo
    icinga ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart

Define a generic [EventCommand](9-object-types.md#objecttype-eventcommand) object `event_by_ssh`
which can be used for all event commands triggered using ssh:

    /* pass event commands through ssh */
    object EventCommand "event_by_ssh" {
      command = [ PluginDir + "/check_by_ssh" ]

      arguments = {
        "-H" = "$event_by_ssh_address$"
        "-p" = "$event_by_ssh_port$"
        "-C" = "$event_by_ssh_command$"
        "-l" = "$event_by_ssh_logname$"
        "-i" = "$event_by_ssh_identity$"
        "-q" = {
          set_if = "$event_by_ssh_quiet$"
        }
        "-w" = "$event_by_ssh_warn$"
        "-c" = "$event_by_ssh_crit$"
        "-t" = "$event_by_ssh_timeout$"
      }

      vars.event_by_ssh_address = "$address$"
      vars.event_by_ssh_quiet = false
    }

The actual event command only passes the `event_by_ssh_command` attribute.
The `event_by_ssh_service` custom attribute takes care of passing the correct
daemon name, while `test $service.state_id$ -gt 0` makes sure that the daemon
is only restarted when the service is not in an `OK` state.

    object EventCommand "event_by_ssh_restart_service" {
      import "event_by_ssh"

      //only restart the daemon if state > 0 (not-ok)
      //requires sudo permissions for the icinga user
      vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo /etc/init.d/$event_by_ssh_service$ restart"
    }

Now set the `event_command` attribute to `event_by_ssh_restart_service` and tell it
which service should be restarted using the `event_by_ssh_service` attribute.

    object Service "http" {
      import "generic-service"
      host_name = "remote-http-host"
      check_command = "http"

      event_command = "event_by_ssh_restart_service"
      vars.event_by_ssh_service = "$host.vars.httpd_name$"

      //vars.event_by_ssh_logname = "icinga"
      //vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa.pub"
    }

Each host with this service must then define the `httpd_name` custom attribute
(for example generated from your CMDB):

    object Host "remote-http-host" {
      import "generic-host"
      address = "192.168.1.100"

      vars.httpd_name = "apache2"
    }

You can test-drive this example by manually stopping the `httpd` daemon
on your `remote-http-host`. Enable the `debuglog` feature and tail the
`/var/log/icinga2/debug.log` file.

Remote Host Terminal:

    # date; service apache2 status
    Mon Sep 15 18:57:39 CEST 2014
    Apache2 is running (pid 23651).
    # date; service apache2 stop
    Mon Sep 15 18:57:47 CEST 2014
    [ ok ] Stopping web server: apache2 ... waiting .

Icinga 2 Host Terminal:

    [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100': PID 32622
    [2014-09-15 18:58:32 +0200] notice/Process: PID 32622 ('/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100') terminated with exit code 2
    [2014-09-15 18:58:32 +0200] notice/Checkable: State Change: Checkable remote-http-host!http soft state change from OK to CRITICAL detected.
    [2014-09-15 18:58:32 +0200] notice/Checkable: Executing event handler 'event_by_ssh_restart_service' for service 'remote-http-host!http'
    [2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100': PID 32623
    [2014-09-15 18:58:33 +0200] notice/Process: PID 32623 ('/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100') terminated with exit code 0

Remote Host Terminal:

    # date; service apache2 status
    Mon Sep 15 18:58:44 CEST 2014
    Apache2 is running (pid 24908).

## <a id="dependencies"></a> Dependencies
Icinga 2 uses host and service [Dependency](9-object-types.md#objecttype-dependency) objects
for determining their network reachability.

A service can depend on a host, and vice versa. A service has an implicit
dependency (parent) to its host. A host to host dependency acts implicitly
as host parent relation.
When dependencies are calculated, not only the immediate parent is taken into
account but all parents are inherited.

The `parent_host_name` and `parent_service_name` attributes are mandatory for
service dependencies, while only `parent_host_name` is required for host dependencies.
[Apply rules](3-monitoring-basics.md#using-apply) allow you to
[determine these attributes](3-monitoring-basics.md#dependencies-apply-custom-attributes) in a more
dynamic fashion if required.

    parent_host_name = "core-router"
    parent_service_name = "uplink-port"

Notifications are suppressed by default if a host or service becomes unreachable.
You can control that option by defining the `disable_notifications` attribute.

    disable_notifications = false

If the dependency should be triggered in the parent object's soft state, you
need to set `ignore_soft_states` to `false`.

The dependency state filter must be defined based on the parent object being
either a host (`Up`, `Down`) or a service (`OK`, `Warning`, `Critical`, `Unknown`).

The following example will make the dependency fail and trigger it if the parent
object is **not** in one of these states:

    states = [ OK, Critical, Unknown ]

Rephrased: If the parent service object changes into the `Warning` state, this
dependency will fail and render all child objects (hosts or services) unreachable.

You can determine the child's reachability by querying the `is_reachable` attribute,
for example in [DB IDO](23-appendix.md#schema-db-ido-extensions).

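Put together, the attributes discussed in this section could be combined in a
single object as sketched below. The dependency name and the child host and
service (`my-server1`, `http`) are hypothetical; the parent objects
`core-router` and `uplink-port` are assumed to exist.

```icinga2
/* hypothetical dependency combining the attributes discussed above */
object Dependency "uplink-reachability" {
  parent_host_name = "core-router"
  parent_service_name = "uplink-port"

  child_host_name = "my-server1"
  child_service_name = "http"

  /* the dependency fails if the parent service is in the Warning state */
  states = [ OK, Critical, Unknown ]

  /* keep sending notifications for the child, also react to soft states */
  disable_notifications = false
  ignore_soft_states = false
}
```

For more than a handful of objects an apply rule is usually the better choice,
since it avoids repeating the child attributes for every service.
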
### <a id="dependencies-implicit-host-service"></a> Implicit Dependencies for Services on Host
Icinga 2 automatically adds an implicit dependency for services on their host. That way
service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
does not overwrite other dependencies and implicitly sets `disable_notifications = true` and
`states = [ Up ]` for all service objects.

Service checks are still executed. If you want to prevent them from happening, you can
apply the following dependency to all services setting their host as `parent_host_name`
and disabling the checks. `assign where true` matches on all `Service` objects.

    apply Dependency "disable-host-service-checks" to Service {
      disable_checks = true
      assign where true
    }

### <a id="dependencies-network-reachability"></a> Dependencies for Network Reachability
A common scenario is the Icinga 2 server behind a router. Checking internet
access by pinging the Google DNS server `google-dns` is a common method, but
will fail in case the `dsl-router` host is down. Therefore the example below
defines a host dependency which acts implicitly as parent relation too.

Furthermore the host may be reachable but ping probes are dropped by the
router's firewall. In case the `dsl-router`'s `ping4` service check fails, all
further checks for the `ping4` service on host `google-dns` should
be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.

    object Host "dsl-router" {
      import "generic-host"
      address = "192.168.1.1"
    }

    object Host "google-dns" {
      import "generic-host"
      address = "8.8.8.8"
    }

    apply Service "ping4" {
      import "generic-service"

      check_command = "ping4"

      assign where host.address
    }

    apply Dependency "internet" to Host {
      parent_host_name = "dsl-router"
      disable_checks = true
      disable_notifications = true

      assign where host.name != "dsl-router"
    }

    apply Dependency "internet" to Service {
      parent_host_name = "dsl-router"
      parent_service_name = "ping4"
      disable_checks = true

      assign where host.name != "dsl-router"
    }

### <a id="dependencies-apply-custom-attributes"></a> Apply Dependencies based on Custom Attributes
You can use [apply rules](3-monitoring-basics.md#using-apply) to set parent or
child attributes, e.g. `parent_host_name`, to other objects'
attributes.

A common example is virtual machines hosted on a master. The object
name of that master is auto-generated from your CMDB or VMware inventory
into the host's custom attributes (or a generic template for your
cloud).

Define your master host object:

    /* your master */
    object Host "master.example.com" {
      import "generic-host"
    }

Add a generic template defining all common host attributes:

    /* generic template for your virtual machines */
    template Host "generic-vm" {
      import "generic-host"
    }

Add a template for all hosts on your example.com cloud setting
custom attribute `vm_parent` to `master.example.com`:

    template Host "generic-vm-example.com" {
      import "generic-vm"
      vars.vm_parent = "master.example.com"
    }

Define your guest hosts:

    object Host "www.example1.com" {
      import "generic-vm-example.com"
    }

    object Host "www.example2.com" {
      import "generic-vm-example.com"
    }

Apply the host dependency to all child hosts importing the
`generic-vm` template and set the `parent_host_name`
to the previously defined custom attribute `host.vars.vm_parent`.

    apply Dependency "vm-host-to-parent-master" to Host {
      parent_host_name = host.vars.vm_parent
      assign where "generic-vm" in host.templates
    }

You can extend this example, and make your services depend on the
`master.example.com` host too. Their local scope allows you to use
`host.vars.vm_parent` similar to the example above.

    apply Dependency "vm-service-to-parent-master" to Service {
      parent_host_name = host.vars.vm_parent
      assign where "generic-vm" in host.templates
    }

That way you don't need to wait for your guest hosts to become
unreachable when the master host goes down. Instead the services
will detect their reachability immediately when executing checks.

> **Note**
>
> This method of setting locally scoped variables only works in
> apply rules, but not in object definitions.

### <a id="dependencies-agent-checks"></a> Dependencies for Agent Checks
Another classic example is agent-based checks. You would define a health check
for the agent daemon responding to your requests, and make all other services
querying that daemon depend on that health check.

The following configuration defines two NRPE-based service checks `nrpe-load`
and `nrpe-disk` applied to the host `nrpe-server` [matched](18-library-reference.md#global-functions-match)
by its name. The health check is defined as the `nrpe-health` service.

    apply Service "nrpe-health" {
      import "generic-service"
      check_command = "nrpe"
      assign where match("nrpe-*", host.name)
    }

    apply Service "nrpe-load" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_load"
      assign where match("nrpe-*", host.name)
    }

    apply Service "nrpe-disk" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_disk"
      assign where match("nrpe-*", host.name)
    }

    object Host "nrpe-server" {
      import "generic-host"
      address = "192.168.1.5"
    }

    apply Dependency "disable-nrpe-checks" to Service {
      parent_service_name = "nrpe-health"

      states = [ OK ]
      disable_checks = true
      disable_notifications = true
      assign where service.check_command == "nrpe"
      ignore where service.name == "nrpe-health"
    }

The `disable-nrpe-checks` dependency is applied to all services
on the `nrpe-server` host which use the `nrpe` check command,
but not to the `nrpe-health` service itself.