mirror of https://github.com/Icinga/icinga2.git
# <a id="monitoring-basics"></a> Monitoring Basics
|
|
|
|
This part of the Icinga 2 documentation provides an overview of all the basic
|
|
monitoring concepts you need to know to run Icinga 2.
|
|
|
|
## <a id="hosts-services"></a> Hosts and Services
|
|
|
|
Icinga 2 can be used to monitor the availability of hosts and services. Hosts
|
|
and services can be virtually anything which can be checked in some way:
|
|
|
|
* Network services (HTTP, SMTP, SNMP, SSH, etc.)
|
|
* Printers
|
|
* Switches / routers
|
|
* Temperature sensors
|
|
* Other local or network-accessible services
|
|
|
|
Host objects provide a mechanism to group services that are running
|
|
on the same physical device.
|
|
|
|
Here is an example of a host object which defines two child services:
|
|
|
|
object Host "my-server1" {
|
|
address = "10.0.0.1"
|
|
check_command = "hostalive"
|
|
}
|
|
|
|
object Service "ping4" {
|
|
host_name = "my-server1"
|
|
check_command = "ping4"
|
|
}
|
|
|
|
object Service "http" {
|
|
host_name = "my-server1"
|
|
check_command = "http"
|
|
}
|
|
|
|
The example creates two services `ping4` and `http` which belong to the
|
|
host `my-server1`.
|
|
|
|
It also specifies that the host should perform its own check using the `hostalive`
|
|
check command.
|
|
|
|
The `address` attribute is used by check commands to determine which network
|
|
address is associated with the host object.
|
|
|
|
Details on troubleshooting check problems can be found [here](12-troubleshooting.md#troubleshooting).
|
|
|
|
### <a id="host-states"></a> Host States
|
|
|
|
Hosts can be in any of the following states:
|
|
|
|
Name | Description
|
|
------------|--------------
|
|
UP | The host is available.
|
|
DOWN | The host is unavailable.
|
|
|
|
### <a id="service-states"></a> Service States
|
|
|
|
Services can be in any of the following states:
|
|
|
|
Name | Description
|
|
------------|--------------
|
|
OK | The service is working properly.
|
|
WARNING | The service is experiencing some problems but is still considered to be in working condition.
|
|
CRITICAL | The service is in a critical state.
|
|
UNKNOWN | The check could not determine the service's state.
|
|
|
|
### <a id="hard-soft-states"></a> Hard and Soft States
|
|
|
|
When detecting a problem with a host/service Icinga re-checks the object a number of
|
|
times (based on the `max_check_attempts` and `retry_interval` settings) before sending
|
|
notifications. This ensures that no unnecessary notifications are sent for
|
|
transient failures. During this time the object is in a `SOFT` state.
|
|
|
|
After all re-checks have been executed and the object is still in a non-OK
|
|
state the host/service switches to a `HARD` state and notifications are sent.
|
|
|
|
Name | Description
|
|
------------|--------------
|
|
HARD | The host/service's state hasn't recently changed.
|
|
SOFT | The host/service has recently changed state and is being re-checked.
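
As a minimal sketch of how these settings interact, consider the following service definition. The service name `http-retry-demo` and the interval values are assumptions chosen for illustration; `my-server1` is the host from the first example.

```
object Service "http-retry-demo" {
  host_name = "my-server1"
  check_command = "http"

  check_interval = 5m      // regular interval while the state is HARD
  retry_interval = 1m      // shorter interval used for re-checks in a SOFT state
  max_check_attempts = 3   // the third consecutive non-OK result makes the state HARD
}
```

With these values, a persistently failing check would stay in a `SOFT` state for roughly two minutes after the first failure before it becomes `HARD` and notifications can be sent.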
|
|
|
|
### <a id="host-service-checks"></a> Host and Service Checks
|
|
|
|
Hosts and Services determine their state from a check result returned from a check
|
|
execution to the Icinga 2 application. By default the `generic-host` example template
|
|
will define `hostalive` as host check. If your host is unreachable for ping, you should
|
|
consider using a different check command, for instance the `http` check command, or if
|
|
there is no check available, the `dummy` check command.
|
|
|
|
object Host "uncheckable-host" {
|
|
check_command = "dummy"
|
|
vars.dummy_state = 0 // 0 = OK, reported as UP for hosts
|
|
vars.dummy_text = "Pretending to be OK."
|
|
}
|
|
|
|
Service checks could also use a `dummy` check, but the common strategy is to
|
|
[integrate an existing plugin](3-monitoring-basics.md#command-plugin-integration) as
|
|
[check command](3-monitoring-basics.md#check-commands) and [reference](3-monitoring-basics.md#command-passing-parameters)
|
|
that in your [Service](5-object-types.md#objecttype-service) object definition.
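
The following sketch illustrates that strategy under a few assumptions: the `check_tcp` plugin from the Monitoring Plugins project is installed in `PluginDir`, and the names `my-tcp-port`, `my_tcp_port` and `tcp-8080` are hypothetical. It is not a definition shipped with Icinga 2.

```
object CheckCommand "my-tcp-port" {
  // Needed by the releases this chapter describes; assumed to be
  // optional in newer releases.
  import "plugin-check-command"

  command = [ PluginDir + "/check_tcp" ]

  arguments = {
    "-H" = "$address$"        // resolved from the host's address attribute
    "-p" = "$my_tcp_port$"    // resolved from the custom attribute set below
  }
}

object Service "tcp-8080" {
  host_name = "my-server1"
  check_command = "my-tcp-port"

  vars.my_tcp_port = 8080     // passed to the plugin as the -p argument
}
```

Keeping the plugin call in a `CheckCommand` and the per-service values in `vars` means the same command can be reused by many services with different parameters.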
|
|
|
|
## <a id="configuration-best-practice"></a> Configuration Best Practice
|
|
|
|
The [Getting Started](2-getting-started.md#getting-started) chapter already introduced various aspects
of the Icinga 2 configuration language. Before you configure additional
hosts, services, notifications, dependencies, etc., think about your
requirements first and then decide on a strategy.
|
|
|
|
There are many ways of creating Icinga 2 configuration objects:
|
|
|
|
* Manually with your preferred editor, for example vi(m), nano, notepad, etc.
|
|
* Generated by a [configuration management tool](9-addons-plugins.md#configuration-tools) such as Puppet, Chef, Ansible, etc.
|
|
* A configuration addon for Icinga 2
|
|
* A custom exporter script from your CMDB or inventory tool
|
|
* your own.
|
|
|
|
In order to find the best strategy for your own configuration, ask yourself the following questions:
|
|
|
|
* Do your hosts share a common group of services (for example Linux hosts with disk, load, etc. checks)?
* Does only a small set of users receive notifications and escalations for all hosts/services?

If you can answer at least one of these questions with yes, look into [apply rules](3-monitoring-basics.md#using-apply)
instead of defining objects on a per host and service basis.
|
|
|
|
* Are you required to define specific configuration for each host/service?
* Does your configuration generation tool already know about the host-service relationship?

If so, define the objects individually and set object-specific attributes such as `host_name` accordingly.
|
|
|
|
Finding the best files and directory tree for your configuration is up to you. Make sure that
|
|
the [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) configuration file includes them, and then think about:
|
|
|
|
* tree-based on locations, hostgroups, specific host attributes with sub levels of directories.
|
|
* flat `hosts.conf`, `services.conf`, etc files for rule based configuration.
|
|
* generated configuration with one file per host and a global configuration for groups, users, etc.
|
|
* one big file generated from an external application (probably a bad idea for maintaining changes).
|
|
* your own.
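
Whichever layout you pick, the files have to be pulled in from `icinga2.conf`. A minimal sketch, assuming a `conf.d` directory next to `icinga2.conf` and a hypothetical second tree named `sites.d`:

```
// Load every *.conf file found below conf.d/, including sub directories.
include_recursive "conf.d"

// Hypothetical additional tree with an explicit file name pattern.
include_recursive "sites.d", "*.conf"
```

A tree-based layout and a flat layout can both be loaded this way; only the directory contents differ.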
|
|
|
|
Whichever strategy you choose, you should additionally check the following:
|
|
|
|
* Are there any specific attributes describing the host/service you could set as `vars` custom attributes?
|
|
You can later use them for applying assign/ignore rules, or export them into external interfaces.
|
|
* Put hosts into hostgroups, services into servicegroups and use these attributes for your apply rules.
|
|
* Use templates to store generic attributes for your objects and apply rules making your configuration more readable.
|
|
Details can be found in the [using templates](3-monitoring-basics.md#object-inheritance-using-templates) chapter.
|
|
* Apply rules may overlap. Keep a central place (for example, [services.conf](4-configuring-icinga-2.md#services-conf) or [notifications.conf](4-configuring-icinga-2.md#notifications-conf)) storing
|
|
the configuration instead of defining apply rules deep in your configuration tree.
|
|
* Every plugin used as check, notification or event command requires a `Command` definition.
|
|
Further details can be looked up in the [check commands](3-monitoring-basics.md#check-commands) chapter.
|
|
|
|
If you happen to have further questions, do not hesitate to join the [community support channels](https://support.icinga.org)
|
|
and ask community members for their experience and best practices.
|
|
|
|
|
|
### <a id="object-inheritance-using-templates"></a> Object Inheritance Using Templates
|
|
|
|
Templates may be used to apply a set of identical attributes to more than one
|
|
object:
|
|
|
|
template Service "generic-service" {
|
|
max_check_attempts = 3
|
|
check_interval = 5m
|
|
retry_interval = 1m
|
|
enable_perfdata = true
|
|
}
|
|
|
|
template Service "ipv6-service" {
|
|
notes = "IPv6 critical != IPv4 broken."
|
|
}
|
|
|
|
apply Service "ping4" {
|
|
import "generic-service"
|
|
|
|
check_command = "ping4"
|
|
|
|
assign where host.address
|
|
}
|
|
|
|
apply Service "ping6" {
|
|
import "generic-service"
|
|
import "ipv6-service"
|
|
|
|
check_command = "ping6"
|
|
|
|
assign where host.address6
|
|
}
|
|
|
|
|
|
In this example the `ping4` and `ping6` services inherit properties from the
|
|
template `generic-service`. The `ping6` service additionally imports the `ipv6-service`
|
|
template with the `notes` attribute.
|
|
|
|
Objects as well as templates themselves can import an arbitrary number of
|
|
templates. Attributes inherited from a template can be overridden in the
|
|
object if necessary.
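
As a short sketch of such an override, the apply rule below imports `generic-service` from the example above but replaces the inherited `check_interval`. The service name `ping4-frequent` and the custom attribute `needs_fast_ping` are hypothetical.

```
apply Service "ping4-frequent" {
  import "generic-service"

  check_command = "ping4"
  check_interval = 1m   // overrides the 5m inherited from generic-service

  assign where host.vars.needs_fast_ping == true
}
```

All other attributes of the template, such as `max_check_attempts` and `retry_interval`, remain unchanged in the generated services.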
|
|
|
|
You can also import existing non-template objects into other objects. This
requires you to use unique names, because templates and objects share
the same namespace.
|
|
|
|
Example for importing objects:
|
|
|
|
object CheckCommand "snmp-simple" {
|
|
...
|
|
vars.snmp_defaults = ...
|
|
}
|
|
|
|
object CheckCommand "snmp-advanced" {
|
|
import "snmp-simple"
|
|
...
|
|
vars.snmp_advanced = ...
|
|
}
|
|
|
|
### <a id="using-apply"></a> Apply objects based on rules
|
|
|
|
Instead of assigning each object ([Service](5-object-types.md#objecttype-service),
[Notification](5-object-types.md#objecttype-notification), [Dependency](5-object-types.md#objecttype-dependency),
[ScheduledDowntime](5-object-types.md#objecttype-scheduleddowntime))
to its host or service through identifier attributes such as `host_name`, objects can be [applied](15-language-reference.md#apply) based on rules.
|
|
|
|
Before you start using the apply rules keep the following in mind:
|
|
|
|
* Define the best match.
|
|
* A set of unique [custom attributes](3-monitoring-basics.md#custom-attributes-apply) for these hosts/services?
|
|
* Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it?
|
|
* A generic pattern [match](15-language-reference.md#function-calls) on the host/service name?
|
|
* [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](15-language-reference.md#expression-operators)
|
|
* All expressions must return a boolean value (an empty string, for example, evaluates to `false`)
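
The matching strategies listed above can be sketched in a single apply rule. The host group `linux-servers`, the `web-*` naming scheme and the custom attributes are assumptions made for this illustration; the `disk` check command is assumed to be available from the template library.

```
apply Service "disk" {
  import "generic-service"

  check_command = "disk"

  // Match on a custom attribute value.
  assign where host.vars.os == "Linux"

  // Match on group membership.
  assign where "linux-servers" in host.groups

  // Match on a wildcard pattern in the host name.
  assign where match("web-*", host.name)

  // Combined expression: skip internal test servers.
  ignore where host.vars.test_server == true && match("*internal", host.name)
}
```

Each `assign where` row is an alternative, so a host only needs to satisfy one of them, while a matching `ignore where` row excludes it again.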
|
|
|
|
> **Note**
|
|
>
|
|
> You can set/override object attributes in apply rules using the respectively available
|
|
> objects in that scope (host and/or service objects).
|
|
|
|
[Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays. That way you can use them
not only for matching on their existence or values in apply expressions, but also to assign
("inherit") their values to the objects generated by apply rules.
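
A minimal sketch of this idea follows. The host `web-01`, its address and the user group `icingaadmins` are hypothetical, and the `mail-host-notification` template is assumed to exist (for example from the sample configuration).

```
object Host "web-01" {
  import "generic-host"
  address = "192.0.2.10"

  // Nested dictionary stored in a custom attribute.
  vars.notification["mail"] = {
    groups = [ "icingaadmins" ]
  }
}

apply Notification "mail-icingaadmin" to Host {
  import "mail-host-notification"

  // Copy the nested array from the host into the generated object.
  user_groups = host.vars.notification.mail.groups

  // Only generate the object if the nested dictionary exists.
  assign where host.vars.notification.mail
}
```

The same custom attribute serves two purposes here: its existence decides whether the rule matches, and its content configures the generated notification.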
|
|
|
|
* [Apply services to hosts](3-monitoring-basics.md#using-apply-services)
|
|
* [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications)
|
|
* [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-dependencies)
|
|
* [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)
|
|
|
|
A more advanced example is using [apply with for loops on arrays or
dictionaries](#using-apply-for), for example provided by
[custom attributes](3-monitoring-basics.md#custom-attributes-apply) or groups.
|
|
|
|
> **Tip**
|
|
>
|
|
> Building configuration in that dynamic way requires detailed information
|
|
> of the generated objects. Use the `object list` [CLI command](7-cli-commands.md#cli-command-object)
|
|
> after successful [configuration validation](7-cli-commands.md#config-validation).
|
|
|
|
|
|
#### <a id="using-apply-expressions"></a> Apply Rules Expressions
|
|
|
|
You can use simple or advanced combinations of apply rule expressions. An
expression only matches if it evaluates to the boolean `true` value. An empty string,
for instance, is interpreted as `false`. In a similar fashion, undefined
attributes evaluate to `false`.
|
|
|
|
Returns `false`:
|
|
|
|
assign where host.vars.attribute_does_not_exist
|
|
|
|
Multiple `assign where` condition rows are evaluated as `OR` condition.
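
To illustrate the `OR` semantics, the two host groups in this sketch are expected to end up with the same members. The group names and the `os` values are hypothetical.

```
// Two rows: a host matches if either row is true.
object HostGroup "linux-or-bsd" {
  assign where host.vars.os == "Linux"
  assign where host.vars.os == "FreeBSD"
}

// One row using the || operator expresses the same condition.
object HostGroup "linux-or-bsd-single-row" {
  assign where host.vars.os == "Linux" || host.vars.os == "FreeBSD"
}
```

Separate rows tend to be easier to read and to extend than one long expression.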
|
|
|
|
You can combine multiple expressions for matching only a subset of objects. In some cases,
|
|
you want to be able to add more than one assign/ignore where expression which matches
|
|
a specific condition. To achieve this you can use the logical `and` and `or` operators.
|
|
|
|
|
|
Match all `*mysql*` patterns in the host name and (`&&`) custom attribute `prod_mysql_db`
|
|
matches the `db-*` pattern. All hosts with the custom attribute `test_server` set to `true`
|
|
should be ignored, or any host name ending with `*internal` pattern.
|
|
|
|
object HostGroup "mysql-server" {
|
|
display_name = "MySQL Server"
|
|
|
|
assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
|
|
ignore where host.vars.test_server == true
|
|
ignore where match("*internal", host.name)
|
|
}
|
|
|
|
A similar example for advanced notification apply rule filters: the rule matches if the service
attribute `notes` contains the `has gold support 24x7` string `AND` one of the
two conditions passes: either the `customer` host custom attribute is set to `customer-xy`
`OR` the host custom attribute `always_notify` is set to `true`.

The notification is ignored for services whose host name ends with `*internal`
`OR` whose `priority` custom attribute is [less than](15-language-reference.md#expression-operators) `2`
while the host custom attribute `is_clustered` is set to `true`.
|
|
|
|
template Notification "cust-xy-notification" {
|
|
users = [ "noc-xy", "mgmt-xy" ]
|
|
command = "mail-service-notification"
|
|
}
|
|
|
|
apply Notification "notify-cust-xy-mysql" to Service {
|
|
import "cust-xy-notification"
|
|
|
|
assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
|
|
ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
|
|
}
|
|
|
|
|
|
|
|
|
|
#### <a id="using-apply-services"></a> Apply Services to Hosts
|
|
|
|
The sample configuration already includes a detailed example in [hosts.conf](4-configuring-icinga-2.md#hosts-conf)
|
|
and [services.conf](4-configuring-icinga-2.md#services-conf) for this use case.
|
|
|
|
The example for `ssh` applies a service object to all hosts with the `address`
|
|
attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`.
|
|
|
|
apply Service "ssh" {
|
|
import "generic-service"
|
|
|
|
check_command = "ssh"
|
|
|
|
assign where host.address && host.vars.os == "Linux"
|
|
}
|
|
|
|
|
|
Other detailed scenario examples are used in their respective chapters, for example
|
|
[apply services with custom command arguments](3-monitoring-basics.md#using-apply-services-command-arguments).
|
|
|
|
#### <a id="using-apply-notifications"></a> Apply Notifications to Hosts and Services
|
|
|
|
Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
|
|
manner:
|
|
|
|
|
|
apply Notification "mail-noc" to Service {
|
|
import "mail-service-notification"
|
|
|
|
user_groups = [ "noc" ]
|
|
|
|
assign where host.vars.notification.mail
|
|
}
|
|
|
|
|
|
In this example the `mail-noc` notification will be created as an object for all services whose host has the
`notification.mail` custom attribute defined. The notification imports the `mail-service-notification` template,
which is expected to set the notification command, and all members of the user group `noc` will get notified.
|
|
|
|
#### <a id="using-apply-dependencies"></a> Apply Dependencies to Hosts and Services
|
|
|
|
Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter.
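
As a brief sketch, a dependency apply rule could look as follows. The parent host `dsl-router` and the dependency name `internet` are assumptions for this illustration.

```
apply Dependency "internet" to Host {
  parent_host_name = "dsl-router"

  // While the parent is down, suppress checks and notifications
  // for the dependent hosts.
  disable_checks = true
  disable_notifications = true

  // Every host except the parent itself depends on the router.
  assign where host.name != "dsl-router"
}
```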
|
|
|
|
#### <a id="using-apply-scheduledowntimes"></a> Apply Recurring Downtimes to Hosts and Services
|
|
|
|
The sample configuration includes an example in [downtimes.conf](4-configuring-icinga-2.md#downtimes-conf).
|
|
|
|
Detailed examples can be found in the [recurring downtimes](3-monitoring-basics.md#recurring-downtimes) chapter.
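
A minimal sketch of such a rule is shown here. The downtime name, the service group `backup`, the author and the time ranges are hypothetical values.

```
apply ScheduledDowntime "backup-downtime" to Service {
  author = "icingaadmin"
  comment = "Scheduled downtime for backup"

  // Recurring time windows per weekday.
  ranges = {
    monday = "02:00-03:00"
    tuesday = "02:00-03:00"
    wednesday = "02:00-03:00"
    thursday = "02:00-03:00"
    friday = "02:00-03:00"
    saturday = "02:00-03:00"
    sunday = "02:00-03:00"
  }

  assign where "backup" in service.groups
}
```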
|
|
|
|
|
|
#### <a id="using-apply-for"></a> Using Apply For Rules
|
|
|
|
In addition to the standard apply rules, you may need to generate
objects based on a set (array or dictionary). That way you'll save quite
a lot of duplicated apply rules by combining them into one generic rule that generates
the object name with or without a prefix.
|
|
|
|
The sample configuration already includes a detailed example in [hosts.conf](4-configuring-icinga-2.md#hosts-conf)
|
|
and [services.conf](4-configuring-icinga-2.md#services-conf) for this use case.
|
|
|
|
Imagine a different example: You are monitoring your switch (hosts) with many
|
|
interfaces (services). The following requirements/problems apply:
|
|
|
|
* Each interface service check should be named with a prefix and a running number
|
|
* Each interface has its own vlan tag
|
|
* Some interfaces have QoS enabled
|
|
* Additional attributes such as `display_name`, `notes`, `notes_url` and `action_url` must be
  dynamically generated
|
|
|
|
By defining the `interfaces` dictionary with three example interfaces on the `core-switch`
host object, you provide the data that the for loop in the service apply
rule iterates over.
|
|
|
|
|
|
object Host "core-switch" {
|
|
import "generic-host"
|
|
address = "127.0.0.1"
|
|
|
|
vars.interfaces["0"] = {
|
|
port = 1
|
|
vlan = "internal"
|
|
address = "127.0.0.2"
|
|
qos = "enabled"
|
|
}
|
|
vars.interfaces["1"] = {
|
|
port = 2
|
|
vlan = "mgmt"
|
|
address = "127.0.1.2"
|
|
}
|
|
vars.interfaces["2"] = {
|
|
port = 3
|
|
vlan = "remote"
|
|
address = "127.0.2.2"
|
|
}
|
|
}
|
|
|
|
You can also omit the `"if-"` string, then all generated service names are directly
|
|
taken from the `if_name` variable value.
|
|
|
|
The config dictionary contains all key-value pairs for the specific interface in one
|
|
loop cycle, like `port`, `vlan`, `address` and `qos` for the `0` interface.
|
|
|
|
By defining a default value for the custom attribute `qos` in the `vars` dictionary
before adding the `config` dictionary, we'll ensure that this attribute is always defined.

After `vars` is fully populated, all object attributes can be set. For strings, you can use
string concatenation with the `+` operator.

You can also specify the check command that way.
|
|
|
|
apply Service "if-" for (if_name => config in host.vars.interfaces) {
|
|
import "generic-service"
|
|
check_command = "ping4"
|
|
|
|
vars.qos = "disabled"
|
|
vars += config
|
|
|
|
display_name = "if-" + if_name + "-" + vars.vlan
|
|
|
|
notes = "Interface check for Port " + string(vars.port) + " in VLAN " + vars.vlan + " on Address " + vars.address + " QoS " + vars.qos
|
|
notes_url = "http://foreman.company.com/hosts/" + host.name
|
|
action_url = "http://snmp.checker.company.com/" + host.name + "/if-" + if_name
|
|
}
|
|
|
|
Note that numbers must be explicitly cast to strings when they are concatenated with strings.
This can be achieved by wrapping them in the [string()](15-language-reference.md#function-calls) function.
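
Apply for rules can also iterate over an array instead of a dictionary, in which case the loop has a single variable. The sketch below assumes a hypothetical host `web-02` with a hypothetical custom attribute `vhost_names`; `http_vhost` is assumed to be the custom attribute the `http` check command reads for the virtual host name.

```
object Host "web-02" {
  import "generic-host"
  address = "192.0.2.20"

  // Plain array of strings, one entry per virtual host.
  vars.vhost_names = [ "www.example.com", "shop.example.com" ]
}

apply Service "http-" for (vhost in host.vars.vhost_names) {
  import "generic-service"

  check_command = "http"

  // Hand the current array element over to the check command.
  vars.http_vhost = vhost

  display_name = "HTTP " + vhost
}
```

Because the array elements are already strings, no `string()` cast is needed for the concatenation in `display_name`.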
|
|
|
|
> **Tip**
|
|
>
|
|
> Building configuration in that dynamic way requires detailed information
|
|
> of the generated objects. Use the `object list` [CLI command](7-cli-commands.md#cli-command-object)
|
|
> after successful [configuration validation](7-cli-commands.md#config-validation).
|
|
|
|
|
|
#### <a id="using-apply-object attributes"></a> Use Object Attributes in Apply Rules
|
|
|
|
Since apply rules are evaluated after the generic objects, you
|
|
can reference existing host and/or service object attributes as
|
|
values for any object attribute specified in that apply rule.
|
|
|
|
object Host "opennebula-host" {
|
|
import "generic-host"
|
|
address = "10.1.1.2"
|
|
|
|
vars.hosting["xyz"] = {
|
|
http_uri = "/shop"
|
|
customer_name = "Customer xyz"
|
|
customer_id = "7568"
|
|
support_contract = "gold"
|
|
}
|
|
vars.hosting["abc"] = {
|
|
http_uri = "/shop"
|
|
customer_name = "Customer abc"
|
|
customer_id = "7568"
|
|
support_contract = "silver"
|
|
}
|
|
}
|
|
|
|
apply Service for (customer => config in host.vars.hosting) {
|
|
import "generic-service"
|
|
check_command = "ping4"
|
|
|
|
vars.qos = "disabled"
|
|
|
|
vars += config
|
|
|
|
vars.http_uri = "/" + customer + config.http_uri // customer is the loop key; http_uri already starts with "/"
|
|
|
|
display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id
|
|
|
|
notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."
|
|
|
|
notes_url = "http://foreman.company.com/hosts/" + host.name
|
|
action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id
|
|
}
|
|
|
|
### <a id="groups"></a> Groups
|
|
|
|
Groups are used for combining hosts, services, and users into
|
|
accessible configuration attributes and views in external (web)
|
|
interfaces.
|
|
|
|
Group membership is defined at the respective object itself. If
you have a hostgroup named `windows`, for example, and want to assign
specific hosts to this group in order to view the group later on your
alert dashboard, first create the hostgroup:
|
|
|
|
object HostGroup "windows" {
|
|
display_name = "Windows Servers"
|
|
}
|
|
|
|
Then add your hosts to this hostgroup:
|
|
|
|
template Host "windows-server" {
|
|
groups += [ "windows" ]
|
|
}
|
|
|
|
object Host "mssql-srv1" {
|
|
import "windows-server"
|
|
|
|
vars.mssql_port = 1433
|
|
}
|
|
|
|
object Host "mssql-srv2" {
|
|
import "windows-server"
|
|
|
|
vars.mssql_port = 1433
|
|
}
|
|
|
|
This can be done for service and user groups the same way. Additionally
|
|
the user groups are associated as attributes in `Notification` objects.
|
|
|
|
object UserGroup "windows-mssql-admins" {
|
|
display_name = "Windows MSSQL Admins"
|
|
}
|
|
|
|
template User "generic-windows-mssql-users" {
|
|
groups += [ "windows-mssql-admins" ]
|
|
}
|
|
|
|
object User "win-mssql-noc" {
|
|
import "generic-windows-mssql-users"
|
|
|
|
email = "noc@example.com"
|
|
}
|
|
|
|
object User "win-mssql-ops" {
|
|
import "generic-windows-mssql-users"
|
|
|
|
email = "ops@example.com"
|
|
}
|
|
|
|
#### <a id="group-assign-intro"></a> Group Membership Assign
|
|
|
|
If a number of hosts, services, or users match a pattern,
it's reasonable to assign them to the group object with a rule.
Details on the `assign where` syntax can be found [here](15-language-reference.md#apply).
|
|
|
|
object HostGroup "prod-mssql" {
|
|
display_name = "Production MSSQL Servers"
|
|
assign where host.vars.mssql_port && host.vars.prod_mysql_db
|
|
ignore where host.vars.test_server == true
|
|
ignore where match("*internal", host.name)
|
|
}
|
|
|
|
In this example, adapted from above, all hosts with the `vars` attributes `mssql_port`
and `prod_mysql_db` set will be added as members to the host group `prod-mssql`. All `*internal`
hosts and all hosts with the `test_server` attribute set to `true` will be ignored.
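
The same mechanism works for service groups. A short sketch, assuming that the relevant services follow a `ping*` naming scheme (the group name and display name are hypothetical):

```
object ServiceGroup "ping" {
  display_name = "Ping Checks"

  // Every service whose name starts with "ping" becomes a member.
  assign where match("ping*", service.name)
}
```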
|
|
|
|
## <a id="notifications"></a> Notifications
|
|
|
|
Notifications for service and host problems are an integral part of your
|
|
monitoring setup.
|
|
|
|
When a host or service is in a downtime, a problem has been acknowledged or
|
|
the dependency logic determined that the host/service is unreachable, no
|
|
notifications are sent. You can configure additional type and state filters
|
|
refining the notifications being actually sent.
|
|
|
|
There are many ways of sending notifications, e.g. by e-mail, XMPP,
|
|
IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
|
|
Instead it relies on external mechanisms such as shell scripts to notify users.
|
|
|
|
A notification specification requires one or more users (and/or user groups)
|
|
who will be notified in case of problems. These users must have all custom
|
|
attributes defined which will be used in the `NotificationCommand` on execution.
|
|
|
|
The user `icingaadmin` in the example below will get notified only on `OK`, `WARNING` and
`CRITICAL` states and `problem` and `recovery` notification types.
|
|
|
|
object User "icingaadmin" {
|
|
display_name = "Icinga 2 Admin"
|
|
enable_notifications = true
|
|
states = [ OK, Warning, Critical ]
|
|
types = [ Problem, Recovery ]
|
|
email = "icinga@localhost"
|
|
}
|
|
|
|
If you don't set the `states` and `types` configuration attributes for the `User`
|
|
object, notifications for all states and types will be sent.
|
|
|
|
Details on troubleshooting notification problems can be found [here](12-troubleshooting.md#troubleshooting).
|
|
|
|
> **Note**
|
|
>
|
|
> Make sure that the [notification](7-cli-commands.md#features) feature is enabled on your master instance
|
|
> in order to execute notification commands.
|
|
|
|
You should choose which information you (and your notified users) are interested in
in case of an emergency, and also which information does not provide any value to you and
your environment.
|
|
|
|
An example notification command is explained [here](3-monitoring-basics.md#notification-commands).
|
|
|
|
You can add all shared attributes to a `Notification` template which is imported
by the defined notifications. That way you avoid duplicating attributes in each
`Notification` object. Attributes can be overridden locally.
|
|
|
|
template Notification "generic-notification" {
|
|
interval = 15m
|
|
|
|
command = "mail-service-notification"
|
|
|
|
states = [ Warning, Critical, Unknown ]
|
|
types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
|
|
FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
|
|
|
|
period = "24x7"
|
|
}
|
|
|
|
The time period `24x7` is included as example configuration with Icinga 2.
|
|
|
|
Use the `apply` keyword to create `Notification` objects for your services:
|
|
|
|
apply Notification "notify-cust-xy-mysql" to Service {
|
|
import "generic-notification"
|
|
|
|
users = [ "noc-xy", "mgmt-xy" ]
|
|
|
|
assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
|
|
ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
|
|
}
|
|
|
|
|
|
Instead of assigning users to notifications, you can also add the `user_groups`
|
|
attribute with a list of user groups to the `Notification` object. Icinga 2 will
|
|
send notifications to all group members.
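
A brief sketch of a group-based notification, reusing the `windows-mssql-admins` user group and the `mssql_port` custom attribute from the groups chapter; the notification name is hypothetical.

```
apply Notification "notify-mssql-admins" to Service {
  import "generic-notification"

  // Notify every member of the group instead of listing users one by one.
  user_groups = [ "windows-mssql-admins" ]

  assign where host.vars.mssql_port
}
```

Adding or removing a user from the group then changes who is notified without touching the apply rule.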
|
|
|
|
> **Note**
|
|
>
|
|
> Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown`
|
|
> states for services, `Down` for hosts) will receive `Recovery` notifications.
|
|
|
|
### <a id="notification-escalations"></a> Notification Escalations
|
|
|
|
When a problem notification is sent and a problem still exists at the time of re-notification
|
|
you may want to escalate the problem to the next support level. A different approach
|
|
is to configure the default notification by email, and escalate the problem via SMS
|
|
if not already solved.
|
|
|
|
You can define notification start and end times as additional configuration
|
|
attributes making the `Notification` object a so-called `notification escalation`.
|
|
Using templates you can share the basic notification attributes such as users or the
|
|
`interval` (and override them for the escalation then).
|
|
|
|
Using the example from above, you can define additional users being escalated for SMS
|
|
notifications between start and end time.
|
|
|
|
object User "icinga-oncall-2nd-level" {
|
|
display_name = "Icinga 2nd Level"
|
|
|
|
vars.mobile = "+1 555 424642"
|
|
}
|
|
|
|
object User "icinga-oncall-1st-level" {
|
|
display_name = "Icinga 1st Level"
|
|
|
|
vars.mobile = "+1 555 424642"
|
|
}
|
|
|
|
Define an additional [NotificationCommand](#notification) for SMS notifications.
|
|
|
|
> **Note**
|
|
>
|
|
> The example is not complete as there are many different SMS providers.
|
|
> Please note that sending SMS notifications will require an SMS provider
|
|
> or local hardware with a SIM card active.
|
|
|
|
object NotificationCommand "sms-notification" {
|
|
command = [
|
|
PluginDir + "/send_sms_notification",
"$mobile$",
"..."
]
|
|
}
|
|
|
|
The two new notification escalations are added onto the local host
|
|
and its service `ping4` using the `generic-notification` template.
|
|
The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
|
|
command) after `30m` until `1h`.
|
|
|
|
> **Note**
|
|
>
|
|
> The `interval` was set to 15m in the `generic-notification`
|
|
> template example. Lower that value in your escalations by using a secondary
|
|
> template or by overriding the attribute directly in the `notifications` array
|
|
> position for `escalation-sms-2nd-level`.
|
|
|
|
If the problem is neither resolved nor acknowledged (which would prevent further notifications),
the `icinga-oncall-1st-level` user will be notified through the `escalation-sms-1st-level` escalation `1h` after the initial problem was
notified, but only for one hour (`2h` as `end` key for the `times` dictionary).
|
|
|
|
apply Notification "mail" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
apply Notification "escalation-sms-2nd-level" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "sms-notification"
|
|
users = [ "icinga-oncall-2nd-level" ]
|
|
|
|
times = {
|
|
begin = 30m
|
|
end = 1h
|
|
}
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
apply Notification "escalation-sms-1st-level" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "sms-notification"
|
|
users = [ "icinga-oncall-1st-level" ]
|
|
|
|
times = {
|
|
begin = 1h
|
|
end = 2h
|
|
}
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
### <a id="notification-delay"></a> Notification Delay
|
|
|
|
Sometimes the problem in question should not be notified when the notification is due
|
|
(the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2
|
|
you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
|
|
postpone the notification window for 15 minutes. Leave out the `end` key - if not set,
|
|
Icinga 2 will not check against any end time for this notification. Make sure to
|
|
specify a relatively low notification `interval` to get notified soon enough again.
|
|
|
|
apply Notification "mail" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
|
|
interval = 5m
|
|
|
|
times.begin = 15m // delay notification window
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
### <a id="disable-renotification"></a> Disable Re-notifications
|
|
|
|
If you prefer to be notified only once, you can disable re-notifications by setting the
|
|
`interval` attribute to `0`.
|
|
|
|
apply Notification "notify-once" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
|
|
interval = 0 // disable re-notification
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
|
|
|
|
If there are no notification state and type filter attributes defined at the `Notification`
|
|
or `User` object Icinga 2 assumes that all states and types are being notified.
|
|
|
|
Available state and type filters for notifications are:
|
|
|
|
template Notification "generic-notification" {
|
|
|
|
states = [ Warning, Critical, Unknown ]
|
|
types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
|
|
FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
|
|
}
|
|
|
|
If you are familiar with Icinga 1.x `notification_options`, please note that they have been split
into type and state to allow more fine-grained filtering, for example on downtimes and flapping.
You can filter for acknowledgements and custom notifications too.
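
The same filter attributes can be set on a `User` object. The following is a minimal
sketch, not a shipped example: the user name `filtered-admin` and the e-mail address are
hypothetical, and `OK` is included in `states` on the assumption that recovery
notifications should pass the state filter as well.

```icinga2
/* hypothetical user; name and address are placeholders */
object User "filtered-admin" {
  display_name = "Filtered Admin"
  email = "admin@example.com"

  /* only critical problems and their recoveries */
  states = [ OK, Critical ]
  types = [ Problem, Recovery ]
}
```

With this definition the user would not receive warnings, acknowledgements, or
downtime and flapping notifications, even if the `Notification` object itself allows them.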
|
|
|
|
|
|
## <a id="timeperiods"></a> Time Periods
|
|
|
|
Time periods define the time ranges in which Icinga 2 triggers event actions.
For example, the `check_period` attribute controls whether a service check
is executed, and the `period` and `notification_period` configuration
attributes for `Notification` and `User` objects control whether a
notification is sent to users.
|
|
|
|
> **Note**
|
|
>
|
|
> If you are familiar with Icinga 1.x: these time period definitions
> are called `legacy timeperiods` in Icinga 2.
>
> An Icinga 2 legacy timeperiod requires the `ITL` provided template
> `legacy-timeperiod`.
|
|
|
|
The `TimePeriod` attribute `ranges` may contain multiple directives,
|
|
including weekdays, days of the month, and calendar dates.
|
|
These types may overlap/override other types in your ranges dictionary.
|
|
|
|
The descending order of precedence is as follows:
|
|
|
|
* Calendar date (2008-01-01)
|
|
* Specific month date (January 1st)
|
|
* Generic month date (Day 15)
|
|
* Offset weekday of specific month (2nd Tuesday in December)
|
|
* Offset weekday (3rd Monday)
|
|
* Normal weekday (Tuesday)
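
The precedence list above can be illustrated with a range dictionary that mixes two
directive types. This is a sketch under assumptions: the object name, the date, and
the shortened hours are made up for illustration. December 24th, 2014 is a Wednesday,
so both the `wednesday` entry and the calendar date match that day, and according
to the list above the calendar date should win.

```icinga2
/* hypothetical time period; name, date and hours are placeholders */
object TimePeriod "workhours-short-day" {
  import "legacy-timeperiod"

  display_name = "Work hours with one shortened day"
  ranges = {
    "monday"     = "09:00-17:00"
    "tuesday"    = "09:00-17:00"
    "wednesday"  = "09:00-17:00"
    "thursday"   = "09:00-17:00"
    "friday"     = "09:00-17:00"

    /* calendar date: higher precedence than the weekday entry for that day */
    "2014-12-24" = "09:00-12:00"
  }
}
```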
|
|
|
|
If you don't set any `check_period` or `notification_period` attribute
on your configuration objects, Icinga 2 assumes `24x7` as time period,
as shown below.
|
|
|
|
object TimePeriod "24x7" {
|
|
import "legacy-timeperiod"
|
|
|
|
display_name = "Icinga 2 24x7 TimePeriod"
|
|
ranges = {
|
|
"monday" = "00:00-24:00"
|
|
"tuesday" = "00:00-24:00"
|
|
"wednesday" = "00:00-24:00"
|
|
"thursday" = "00:00-24:00"
|
|
"friday" = "00:00-24:00"
|
|
"saturday" = "00:00-24:00"
|
|
"sunday" = "00:00-24:00"
|
|
}
|
|
}
|
|
|
|
If your operations staff should only be notified during work hours,
create a new time period named `workhours` defining a work day from
09:00 to 17:00.
|
|
|
|
object TimePeriod "workhours" {
|
|
import "legacy-timeperiod"
|
|
|
|
display_name = "Icinga 2 8x5 TimePeriod"
|
|
ranges = {
|
|
"monday" = "09:00-17:00"
|
|
"tuesday" = "09:00-17:00"
|
|
"wednesday" = "09:00-17:00"
|
|
"thursday" = "09:00-17:00"
|
|
"friday" = "09:00-17:00"
|
|
}
|
|
}
|
|
|
|
Use the `period` attribute to assign time periods to
|
|
`Notification` and `Dependency` objects:
|
|
|
|
object Notification "mail" {
|
|
import "generic-notification"
|
|
|
|
host_name = "localhost"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
period = "workhours"
|
|
}
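
Time periods can be referenced from other object types as well. The sketch below
assumes the `workhours` time period defined above; the service name `http-workhours`,
the user name `workhours-admin`, and the e-mail address are hypothetical. It restricts
check execution with `check_period` and limits a user's notifications with `period`.

```icinga2
/* only execute this check during work hours */
object Service "http-workhours" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "http"
  check_period = "workhours"
}

/* this user is only notified during work hours */
object User "workhours-admin" {
  email = "workhours-admin@example.com"
  period = "workhours"
}
```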
|
|
|
|
|
|
## <a id="commands"></a> Commands
|
|
|
|
Icinga 2 uses three different command object types to specify how
|
|
checks should be performed, notifications should be sent, and
|
|
events should be handled.
|
|
|
|
### <a id="command-environment-variables"></a> Environment Variables for Commands
|
|
|
|
Please check [Runtime Custom Attributes as Environment Variables](3-monitoring-basics.md#runtime-custom-attribute-env-vars).
|
|
|
|
|
|
### <a id="check-commands"></a> Check Commands
|
|
|
|
[CheckCommand](5-object-types.md#objecttype-checkcommand) objects define the command line
used to call a check.
|
|
|
|
[CheckCommand](5-object-types.md#objecttype-checkcommand) objects are referenced by
|
|
[Host](5-object-types.md#objecttype-host) and [Service](5-object-types.md#objecttype-service) objects
|
|
using the `check_command` attribute.
|
|
|
|
> **Note**
|
|
>
|
|
> Make sure that the [checker](7-cli-commands.md#features) feature is enabled in order to
|
|
> execute checks.
|
|
|
|
#### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
|
|
|
|
[CheckCommand](5-object-types.md#objecttype-checkcommand) objects require the [ITL template](6-icinga-template-library.md#itl-plugin-check-command)
|
|
`plugin-check-command` to support native plugin based check methods.
|
|
|
|
Unless you have done so already, download your check plugin and put it
|
|
into the [PluginDir](4-configuring-icinga-2.md#constants-conf) directory. The following example uses the
|
|
`check_disk` plugin contained in the Monitoring Plugins package.
|
|
|
|
The plugin path and all command arguments are specified as a list of
double-quoted strings to ensure proper shell escaping.
|
|
|
|
Call the `check_disk` plugin with the `--help` parameter to see
|
|
all available options. Our example defines warning (`-w`) and
|
|
critical (`-c`) thresholds for the disk usage. Without any
|
|
partition defined (`-p`) it will check all local partitions.
|
|
|
|
icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
|
|
...
|
|
This plugin checks the amount of used disk space on a mounted file system
|
|
and generates an alert if free space is less than one of the threshold values
|
|
|
|
|
|
Usage:
|
|
check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
|
|
[-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
|
|
[-t timeout] [-u unit] [-v] [-X type] [-N type]
|
|
...
|
|
|
|
> **Note**
|
|
>
|
|
> Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.
|
|
|
|
The next step is to understand how command parameters are passed from
a host or service object, and to add a [CheckCommand](5-object-types.md#objecttype-checkcommand)
definition based on these required parameters and/or default values.
|
|
|
|
#### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
|
|
|
|
Check command parameters are defined as custom attributes which can be accessed as runtime macros
|
|
by the executed check command.
|
|
|
|
Define the default check command custom attributes `disk_wfree` and `disk_cfree`
(the naming schema is freely definable) and their default threshold values. You can
then use these custom attributes as runtime macros for [command arguments](3-monitoring-basics.md#command-arguments)
on the command line.
|
|
|
|
> **Tip**
|
|
>
|
|
> Use a common command type as prefix for your command arguments to increase
> readability. `disk_wfree` conveys the context better than just
> `wfree` as an argument.
|
|
|
|
The default custom attributes can be overridden by the custom attributes
|
|
defined in the service using the check command `my-disk`. The custom attributes
|
|
can also be inherited from a parent template using additive inheritance (`+=`).
|
|
|
|
object CheckCommand "my-disk" {
|
|
import "plugin-check-command"
|
|
|
|
command = [ PluginDir + "/check_disk" ]
|
|
|
|
arguments = {
|
|
"-w" = "$disk_wfree$%"
|
|
"-c" = "$disk_cfree$%"
|
|
"-W" = "$disk_inode_wfree$%"
|
|
"-K" = "$disk_inode_cfree$%"
|
|
"-p" = "$disk_partitions$"
|
|
"-x" = "$disk_partitions_excluded$"
|
|
}
|
|
|
|
vars.disk_wfree = 20
|
|
vars.disk_cfree = 10
|
|
}
|
|
|
|
> **Note**
|
|
>
|
|
> A proper example for the `check_disk` plugin is already shipped with Icinga 2
|
|
> ready to use with the [plugin check commands](6-icinga-template-library.md#plugin-check-command-disk).
|
|
|
|
The host `my-server` with the applied service `basic-partitions` checks a basic set of disk partitions
with modified custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
free disk space).

The custom attribute `disk_partitions` can either hold a single string or an array of
string values for passing multiple partitions to the `check_disk` check plugin.
|
|
|
|
object Host "my-server" {
|
|
import "generic-host"
|
|
address = "127.0.0.1"
|
|
address6 = "::1"
|
|
|
|
vars.local_disks["basic-partitions"] = {
|
|
disk_partitions = [ "/", "/tmp", "/var", "/home" ]
|
|
}
|
|
}
|
|
|
|
apply Service for (disk => config in host.vars.local_disks) {
|
|
import "generic-service"
|
|
check_command = "my-disk"
|
|
|
|
vars += config
|
|
|
|
vars.disk_wfree = 10
|
|
vars.disk_cfree = 5
|
|
}
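
For comparison, here is a sketch of the single-string form of `disk_partitions`.
The host name `my-server2` and the dictionary key `root-partition` are hypothetical;
the apply rule above would pick up the entry and generate a service named
`root-partition` that passes only `/` to the plugin.

```icinga2
/* hypothetical second host using a single string instead of an array */
object Host "my-server2" {
  import "generic-host"
  address = "127.0.0.1"

  vars.local_disks["root-partition"] = {
    disk_partitions = "/"
  }
}
```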
|
|
|
|
|
|
More details on using arrays in custom attributes can be found in
|
|
[this chapter](3-monitoring-basics.md#runtime-custom-attributes).
|
|
|
|
|
|
#### <a id="command-arguments"></a> Command Arguments
|
|
|
|
When you define a check command line using the `command` attribute, Icinga 2
resolves all macros in the static string or array. Sometimes you need to
extend the argument list based on a condition evaluated at command
execution, or to make arguments optional so that they are only set if
Icinga 2 can resolve the macro value.
|
|
|
|
object CheckCommand "check_http" {
|
|
import "plugin-check-command"
|
|
|
|
command = [ PluginDir + "/check_http" ]
|
|
|
|
arguments = {
|
|
"-H" = "$http_vhost$"
|
|
"-I" = "$http_address$"
|
|
"-u" = "$http_uri$"
|
|
"-p" = "$http_port$"
|
|
"-S" = {
|
|
set_if = "$http_ssl$"
|
|
}
|
|
"--sni" = {
|
|
set_if = "$http_sni$"
|
|
}
|
|
"-a" = {
|
|
value = "$http_auth_pair$"
|
|
description = "Username:password on sites with basic authentication"
|
|
}
|
|
"--no-body" = {
|
|
set_if = "$http_ignore_body$"
|
|
}
|
|
"-r" = "$http_expect_body_regex$"
|
|
"-w" = "$http_warn_time$"
|
|
"-c" = "$http_critical_time$"
|
|
"-e" = "$http_expect$"
|
|
}
|
|
|
|
vars.http_address = "$address$"
|
|
vars.http_ssl = false
|
|
vars.http_sni = false
|
|
}
|
|
|
|
The example shows the `check_http` check command defining the most common
|
|
arguments. Each of them is optional by default and will be omitted if
|
|
the value is not set. For example if the service calling the check command
|
|
does not have `vars.http_port` set, it won't get added to the command
|
|
line.
|
|
|
|
If the `vars.http_ssl` custom attribute is set in the service, host or command
|
|
object definition, Icinga 2 will add the `-S` argument based on the `set_if`
|
|
numeric value to the command line. String values are not supported.
|
|
|
|
If the macro value cannot be resolved, Icinga 2 will not add the defined argument
|
|
to the final command argument array. Empty strings for macro values won't omit
|
|
the argument.
|
|
|
|
That way you can use the `check_http` command definition for checks both with and
without SSL enabled, saving you from duplicating command definitions.
|
|
|
|
Details on all available options can be found in the
|
|
[CheckCommand object definition](5-object-types.md#objecttype-checkcommand).
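
To make the effect of `set_if` and optional arguments concrete, here is a sketch of two
services using the `check_http` command defined above. The service names and the
virtual host name are hypothetical, and `my-server1` is assumed to exist as a host.

```icinga2
/* plain HTTP: http_ssl keeps the command default (false), so -S is not added */
object Service "http-plain" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "check_http"
}

/* HTTPS: set_if evaluates http_ssl and adds -S,
   and -H is added because http_vhost is set */
object Service "http-ssl" {
  import "generic-service"

  host_name = "my-server1"
  check_command = "check_http"

  vars.http_ssl = true
  vars.http_vhost = "www.example.com"
}
```

In both cases `-I` is added, because `http_address` defaults to `$address$` in the
command definition. Arguments such as `-p` or `-u` are left out, since neither
service sets the corresponding custom attribute.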
|
|
|
|
### <a id="using-apply-services-command-arguments"></a> Apply Services with Custom Command Arguments
|
|
|
|
Imagine the following scenario: The `my-host1` host is reachable using the default port 22, while
|
|
the `my-host2` host requires a different port on 2222. Both hosts are in the hostgroup `my-linux-servers`.
|
|
|
|
object HostGroup "my-linux-servers" {
|
|
display_name = "Linux Servers"
|
|
assign where host.vars.os == "Linux"
|
|
}
|
|
|
|
/* this one has port 22 opened */
|
|
object Host "my-host1" {
|
|
import "generic-host"
|
|
address = "129.168.1.50"
|
|
vars.os = "Linux"
|
|
}
|
|
|
|
/* this one listens on a different ssh port */
|
|
object Host "my-host2" {
|
|
import "generic-host"
|
|
address = "129.168.2.50"
|
|
vars.os = "Linux"
|
|
vars.custom_ssh_port = 2222
|
|
}
|
|
|
|
All hosts in the `my-linux-servers` hostgroup should get the `my-ssh` service applied based on an
|
|
[apply rule](15-language-reference.md#apply). The optional `ssh_port` command argument should be inherited from the host
|
|
the service is applied to. If not set, the check command `my-ssh` will omit the argument.
|
|
The `host` argument is special: `skip_key` tells Icinga 2 to ignore the key, and directly put the
|
|
value onto the command line. The `order` attribute specifies that this argument is the first one
|
|
(`-1` is smaller than the other defaults).
|
|
|
|
object CheckCommand "my-ssh" {
|
|
import "plugin-check-command"
|
|
|
|
command = [ PluginDir + "/check_ssh" ]
|
|
|
|
arguments = {
|
|
"-p" = "$ssh_port$"
|
|
"host" = {
|
|
value = "$ssh_address$"
|
|
skip_key = true
|
|
order = -1
|
|
}
|
|
}
|
|
|
|
vars.ssh_address = "$address$"
|
|
}
|
|
|
|
/* apply ssh service */
|
|
apply Service "my-ssh" {
|
|
import "generic-service"
|
|
check_command = "my-ssh"
|
|
|
|
//set the command argument for ssh port with a custom host attribute, if set
|
|
vars.ssh_port = "$host.vars.custom_ssh_port$"
|
|
|
|
assign where "my-linux-servers" in host.groups
|
|
}
|
|
|
|
The `my-host1` host gets the `my-ssh` service checking on the default port:
|
|
|
|
[2014-05-26 21:52:23 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '129.168.1.50': PID 27281
|
|
|
|
The `my-host2` host passes its `custom_ssh_port` variable on to the service, which executes a different command:
|
|
|
|
[2014-05-26 21:51:32 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '-p', '2222', '129.168.2.50': PID 26956
|
|
|
|
|
|
### <a id="notification-commands"></a> Notification Commands
|
|
|
|
[NotificationCommand](5-object-types.md#objecttype-notificationcommand) objects define how notifications are delivered to external
|
|
interfaces (E-Mail, XMPP, IRC, Twitter, etc).
|
|
|
|
[NotificationCommand](5-object-types.md#objecttype-notificationcommand) objects are referenced by
|
|
[Notification](5-object-types.md#objecttype-notification) objects using the `command` attribute.
|
|
|
|
`NotificationCommand` objects require the [ITL template](6-icinga-template-library.md#itl-plugin-notification-command)
|
|
`plugin-notification-command` to support native plugin-based notifications.
|
|
|
|
> **Note**
|
|
>
|
|
> Make sure that the [notification](7-cli-commands.md#features) feature is enabled on your master instance
|
|
> in order to execute notification commands.
|
|
|
|
Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
|
|
the current check output) sending an email to the user(s) associated with the
|
|
notification itself (`$user.email$`).
|
|
|
|
If you want to specify default values for some of the custom attribute definitions,
|
|
you can add a `vars` dictionary as shown for the `CheckCommand` object.
|
|
|
|
object NotificationCommand "mail-service-notification" {
|
|
import "plugin-notification-command"
|
|
|
|
command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]
|
|
|
|
env = {
|
|
NOTIFICATIONTYPE = "$notification.type$"
|
|
SERVICEDESC = "$service.name$"
|
|
HOSTALIAS = "$host.display_name$"
|
|
HOSTADDRESS = "$address$"
|
|
SERVICESTATE = "$service.state$"
|
|
LONGDATETIME = "$icinga.long_date_time$"
|
|
SERVICEOUTPUT = "$service.output$"
|
|
NOTIFICATIONAUTHORNAME = "$notification.author$"
|
|
NOTIFICATIONCOMMENT = "$notification.comment$"
|
|
HOSTDISPLAYNAME = "$host.display_name$"
|
|
SERVICEDISPLAYNAME = "$service.display_name$"
|
|
USEREMAIL = "$user.email$"
|
|
}
|
|
}
|
|
|
|
The `command` attribute in the `mail-service-notification` command refers to the following
shell script. The macros specified in the `env` dictionary are exported
as environment variables and can be used in the notification script:
|
|
|
|
#!/usr/bin/env bash
|
|
template=$(cat <<TEMPLATE
|
|
***** Icinga *****
|
|
|
|
Notification Type: $NOTIFICATIONTYPE
|
|
|
|
Service: $SERVICEDESC
|
|
Host: $HOSTALIAS
|
|
Address: $HOSTADDRESS
|
|
State: $SERVICESTATE
|
|
|
|
Date/Time: $LONGDATETIME
|
|
|
|
Additional Info: $SERVICEOUTPUT
|
|
|
|
Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
|
|
TEMPLATE
|
|
)
|
|
|
|
/usr/bin/printf "%b" "$template" | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" "$USEREMAIL"
|
|
|
|
> **Note**
|
|
>
|
|
> This example is for `exim` only. Requires changes for `sendmail` and
|
|
> other MTAs.
|
|
|
|
While it's possible to specify the entire notification command right
in the NotificationCommand object, it is generally advisable to create a
shell script in the `/etc/icinga2/scripts` directory and have the
NotificationCommand object refer to that.
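
To tie the pieces together, the following sketch shows a `Notification` that references
the `mail-service-notification` command through its `command` attribute. The user name
`mail-admin`, the notification name `mail-service`, and the e-mail address are
hypothetical; the `generic-notification` template is assumed to exist as in the earlier
examples.

```icinga2
/* hypothetical recipient; the address is what $user.email$ resolves to */
object User "mail-admin" {
  display_name = "Mail Admin"
  email = "mail-admin@example.com"
}

/* reference the NotificationCommand by name */
apply Notification "mail-service" to Service {
  import "generic-notification"

  command = "mail-service-notification"
  users = [ "mail-admin" ]

  assign where service.name == "ping4"
}
```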
|
|
|
|
### <a id="event-commands"></a> Event Commands
|
|
|
|
Unlike notifications, event commands for hosts/services are called on every
check execution if one of these conditions matches:
|
|
|
|
* The host/service is in a [soft state](3-monitoring-basics.md#hard-soft-states)
|
|
* The host/service state changes into a [hard state](3-monitoring-basics.md#hard-soft-states)
|
|
* The host/service state recovers from a [soft or hard state](3-monitoring-basics.md#hard-soft-states) to [OK](3-monitoring-basics.md#service-states)/[Up](3-monitoring-basics.md#host-states)
|
|
|
|
[EventCommand](5-object-types.md#objecttype-eventcommand) objects are referenced by
|
|
[Host](5-object-types.md#objecttype-host) and [Service](5-object-types.md#objecttype-service) objects
|
|
using the `event_command` attribute.
|
|
|
|
Therefore the `EventCommand` object should define a command line
evaluating the current service state and other service runtime attributes
available through runtime macros. Icinga 2 resolves runtime macros such as
`$service.state_type$` and `$service.state$`, which allows you to trigger
events in a fine-grained manner.
|
|
|
|
Common use case scenarios are a failing HTTP check requiring an immediate
|
|
restart via event command, or if an application is locked and requires
|
|
a restart upon detection.
|
|
|
|
`EventCommand` objects require the ITL template `plugin-event-command`
|
|
to support native plugin based checks.
|
|
|
|
#### <a id="event-command-restart-service-daemon"></a> Use Event Commands to Restart Service Daemon
|
|
|
|
The following example will trigger a restart of the `httpd` daemon
via ssh when the `http` service check fails. If the service state is
`OK`, it will not trigger any event action.
|
|
|
|
Requirements:
|
|
|
|
* ssh connection
|
|
* icinga user with public key authentication
|
|
* icinga user with sudo permissions for restarting the httpd daemon.
|
|
|
|
Example on Debian:
|
|
|
|
# ls /home/icinga/.ssh/
|
|
authorized_keys
|
|
|
|
# visudo
|
|
icinga ALL=(ALL) NOPASSWD: /etc/init.d/apache2 restart
|
|
|
|
|
|
Define a generic [EventCommand](5-object-types.md#objecttype-eventcommand) object `event_by_ssh`
|
|
which can be used for all event commands triggered using ssh:
|
|
|
|
/* pass event commands through ssh */
|
|
object EventCommand "event_by_ssh" {
|
|
import "plugin-event-command"
|
|
|
|
command = [ PluginDir + "/check_by_ssh" ]
|
|
|
|
arguments = {
|
|
"-H" = "$event_by_ssh_address$"
|
|
"-p" = "$event_by_ssh_port$"
|
|
"-C" = "$event_by_ssh_command$"
|
|
"-l" = "$event_by_ssh_logname$"
|
|
"-i" = "$event_by_ssh_identity$"
|
|
"-q" = {
|
|
set_if = "$event_by_ssh_quiet$"
|
|
}
|
|
"-w" = "$event_by_ssh_warn$"
|
|
"-c" = "$event_by_ssh_crit$"
|
|
"-t" = "$event_by_ssh_timeout$"
|
|
}
|
|
|
|
vars.event_by_ssh_address = "$address$"
|
|
vars.event_by_ssh_quiet = false
|
|
}
|
|
|
|
The actual event command only passes the `event_by_ssh_command` attribute.
The `event_by_ssh_service` custom attribute takes care of passing the correct
daemon name, while `test $service.state_id$ -gt 0` makes sure that the daemon
is only restarted when the service is not in the `OK` state.
|
|
|
|
|
|
object EventCommand "event_by_ssh_restart_service" {
|
|
import "event_by_ssh"
|
|
|
|
//only restart the daemon if state > 0 (not-ok)
|
|
//requires sudo permissions for the icinga user
|
|
vars.event_by_ssh_command = "test $service.state_id$ -gt 0 && sudo /etc/init.d/$event_by_ssh_service$ restart"
|
|
}
|
|
|
|
|
|
Now set the `event_command` attribute to `event_by_ssh_restart_service` and tell it
|
|
which service should be restarted using the `event_by_ssh_service` attribute.
|
|
|
|
object Service "http" {
|
|
import "generic-service"
|
|
host_name = "remote-http-host"
|
|
check_command = "http"
|
|
|
|
event_command = "event_by_ssh_restart_service"
|
|
vars.event_by_ssh_service = "$host.vars.httpd_name$"
|
|
|
|
//vars.event_by_ssh_logname = "icinga"
|
|
//vars.event_by_ssh_identity = "/home/icinga/.ssh/id_rsa.pub"
|
|
}
|
|
|
|
|
|
Each host with this service then must define the `httpd_name` custom attribute
|
|
(for example generated from your cmdb):
|
|
|
|
object Host "remote-http-host" {
|
|
import "generic-host"
|
|
address = "192.168.1.100"
|
|
|
|
vars.httpd_name = "apache2"
|
|
}
|
|
|
|
You can test-drive this example by manually stopping the `httpd` daemon
on your `remote-http-host`. Enable the `debuglog` feature and tail the
`/var/log/icinga2/debug.log` file.
|
|
|
|
Remote Host Terminal:
|
|
|
|
# date; service apache2 status
|
|
Mon Sep 15 18:57:39 CEST 2014
|
|
Apache2 is running (pid 23651).
|
|
# date; service apache2 stop
|
|
Mon Sep 15 18:57:47 CEST 2014
|
|
[ ok ] Stopping web server: apache2 ... waiting .
|
|
|
|
Icinga 2 Host Terminal:
|
|
|
|
[2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100': PID 32622
|
|
[2014-09-15 18:58:32 +0200] notice/Process: PID 32622 ('/usr/lib64/nagios/plugins/check_http' '-I' '192.168.1.100') terminated with exit code 2
|
|
[2014-09-15 18:58:32 +0200] notice/Checkable: State Change: Checkable remote-http-host!http soft state change from OK to CRITICAL detected.
|
|
[2014-09-15 18:58:32 +0200] notice/Checkable: Executing event handler 'event_by_ssh_restart_service' for service 'remote-http-host!http'
|
|
[2014-09-15 18:58:32 +0200] notice/Process: Running command '/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100': PID 32623
|
|
[2014-09-15 18:58:33 +0200] notice/Process: PID 32623 ('/usr/lib64/nagios/plugins/check_by_ssh' '-C' 'test 2 -gt 0 && sudo /etc/init.d/apache2 restart' '-H' '192.168.1.100') terminated with exit code 0
|
|
|
|
Remote Host Terminal:
|
|
|
|
# date; service apache2 status
|
|
Mon Sep 15 18:58:44 CEST 2014
|
|
Apache2 is running (pid 24908).
|
|
|
|
|
|
|
|
|
|
## <a id="dependencies"></a> Dependencies
|
|
|
|
Icinga 2 uses host and service [Dependency](5-object-types.md#objecttype-dependency) objects
to determine their network reachability.
|
|
|
|
A service can depend on a host, and vice versa. A service has an implicit
|
|
dependency (parent) to its host. A host to host dependency acts implicitly
|
|
as host parent relation.
|
|
When dependencies are calculated, not only the immediate parent is taken into
|
|
account but all parents are inherited.
|
|
|
|
The `parent_host_name` and `parent_service_name` attributes are mandatory for
|
|
service dependencies, `parent_host_name` is required for host dependencies.
|
|
[Apply rules](3-monitoring-basics.md#using-apply) will allow you to
|
|
[determine these attributes](3-monitoring-basics.md#dependencies-apply-custom-attributes) in a more
|
|
dynamic fashion if required.
|
|
|
|
parent_host_name = "core-router"
|
|
parent_service_name = "uplink-port"
|
|
|
|
Notifications are suppressed by default if a host or service becomes unreachable.
|
|
You can control that option by defining the `disable_notifications` attribute.
|
|
|
|
disable_notifications = false
|
|
|
|
The dependency state filter must be defined based on the parent object being
|
|
either a host (`Up`, `Down`) or a service (`OK`, `Warning`, `Critical`, `Unknown`).
|
|
|
|
The following example will make the dependency fail and trigger it if the parent
|
|
object is **not** in one of these states:
|
|
|
|
states = [ OK, Critical, Unknown ]
|
|
|
|
Rephrased: If the parent service object changes into the `Warning` state, this
|
|
dependency will fail and render all child objects (hosts or services) unreachable.
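
Putting the fragments above into one object might look like the following sketch.
The dependency name `uplink-to-http` and the child objects (`my-server1` and its
`http` service) are assumptions chosen for illustration; the parent attributes,
the state filter, and `disable_notifications` are taken from the snippets above.

```icinga2
/* hypothetical dependency: "http" on "my-server1" depends on the uplink port */
object Dependency "uplink-to-http" {
  parent_host_name = "core-router"
  parent_service_name = "uplink-port"

  child_host_name = "my-server1"
  child_service_name = "http"

  /* the dependency fails if the parent is NOT in one of these states */
  states = [ OK, Critical, Unknown ]

  /* keep sending notifications even if the child becomes unreachable */
  disable_notifications = false
}
```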
|
|
|
|
You can determine the child's reachability by querying the `is_reachable` attribute,
for example in [DB IDO](17-appendix.md#schema-db-ido-extensions).
|
|
|
|
### <a id="dependencies-implicit-host-service"></a> Implicit Dependencies for Services on Host
|
|
|
|
Icinga 2 automatically adds an implicit dependency for services on their host. That way
service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
does not overwrite other dependencies and implicitly sets `disable_notifications = true` and
`states = [ Up ]` for all service objects.
|
|
|
|
Service checks are still executed. If you want to prevent them from happening, you can
|
|
apply the following dependency to all services setting their host as `parent_host_name`
|
|
and disabling the checks. `assign where true` matches on all `Service` objects.
|
|
|
|
apply Dependency "disable-host-service-checks" to Service {
|
|
disable_checks = true
|
|
assign where true
|
|
}
|
|
|
|
### <a id="dependencies-network-reachability"></a> Dependencies for Network Reachability
|
|
|
|
A common scenario is the Icinga 2 server behind a router. Checking internet
|
|
access by pinging the Google DNS server `google-dns` is a common method, but
|
|
will fail in case the `dsl-router` host is down. Therefore the example below
|
|
defines a host dependency which acts implicitly as parent relation too.
|
|
|
|
Furthermore the host may be reachable but ping probes are dropped by the
router's firewall. In case the `dsl-router` `ping4` service check fails, all
further checks for the `ping4` service on host `google-dns` should
be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
|
|
|
|
object Host "dsl-router" {
|
|
import "generic-host"
|
|
address = "192.168.1.1"
|
|
}
|
|
|
|
object Host "google-dns" {
|
|
import "generic-host"
|
|
address = "8.8.8.8"
|
|
}
|
|
|
|
apply Service "ping4" {
|
|
import "generic-service"
|
|
|
|
check_command = "ping4"
|
|
|
|
assign where host.address
|
|
}
|
|
|
|
apply Dependency "internet" to Host {
|
|
parent_host_name = "dsl-router"
|
|
disable_checks = true
|
|
disable_notifications = true
|
|
|
|
assign where host.name != "dsl-router"
|
|
}
|
|
|
|
apply Dependency "internet" to Service {
|
|
parent_host_name = "dsl-router"
|
|
parent_service_name = "ping4"
|
|
disable_checks = true
|
|
|
|
assign where host.name != "dsl-router"
|
|
}
|
|
|
|
### <a id="dependencies-apply-custom-attributes"></a> Apply Dependencies based on Custom Attributes
|
|
|
|
You can use [apply rules](3-monitoring-basics.md#using-apply) to set parent or
child attributes, e.g. `parent_host_name`, to other objects'
attributes.
|
|
|
|
A common example is virtual machines hosted on a master. The object
name of that master is auto-generated from your CMDB or VMware inventory
into the host's custom attributes (or a generic template for your
cloud).
|
|
|
|
Define your master host object:
|
|
|
|
/* your master */
|
|
object Host "master.example.com" {
|
|
import "generic-host"
|
|
}
|
|
|
|
Add a generic template defining all common host attributes:
|
|
|
|
/* generic template for your virtual machines */
|
|
template Host "generic-vm" {
|
|
import "generic-host"
|
|
}
|
|
|
|
Add a template for all hosts on your example.com cloud setting
|
|
custom attribute `vm_parent` to `master.example.com`:
|
|
|
|
template Host "generic-vm-example.com" {
|
|
import "generic-vm"
|
|
vars.vm_parent = "master.example.com"
|
|
}
|
|
|
|
Define your guest hosts:
|
|
|
|
object Host "www.example1.com" {
|
|
import "generic-vm-example.com"
|
|
}
|
|
|
|
object Host "www.example2.com" {
|
|
import "generic-vm-example.com"
|
|
}
|
|
|
|
Apply the host dependency to all child hosts importing the
|
|
`generic-vm` template and set the `parent_host_name`
|
|
to the previously defined custom attribute `host.vars.vm_parent`.
|
|
|
|
apply Dependency "vm-host-to-parent-master" to Host {
|
|
parent_host_name = host.vars.vm_parent
|
|
assign where "generic-vm" in host.templates
|
|
}
|
|
|
|
You can extend this example, and make your services depend on the
|
|
`master.example.com` host too. Their local scope allows you to use
|
|
`host.vars.vm_parent` similar to the example above.
|
|
|
|
apply Dependency "vm-service-to-parent-master" to Service {
|
|
parent_host_name = host.vars.vm_parent
|
|
assign where "generic-vm" in host.templates
|
|
}
|
|
|
|
That way you don't need to wait for your guest hosts to become
unreachable when the master host goes down. Instead the services
will detect their reachability immediately when executing checks.
|
|
|
|
> **Note**
|
|
>
|
|
> This method with setting locally scoped variables only works in
|
|
> apply rules, but not in object definitions.
|
|
|
|
|
|
### <a id="dependencies-agent-checks"></a> Dependencies for Agent Checks
|
|
|
|
Another classic example is agent-based checks. You would define a health check
for the agent daemon responding to your requests, and make all other services
querying that daemon depend on that health check.
|
|
|
|
The following configuration defines two nrpe based service checks `nrpe-load`
|
|
and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
|
|
`nrpe-health` service.
|
|
|
|
apply Service "nrpe-health" {
|
|
import "generic-service"
|
|
check_command = "nrpe"
|
|
assign where match("nrpe-*", host.name)
|
|
}
|
|
|
|
apply Service "nrpe-load" {
|
|
import "generic-service"
|
|
check_command = "nrpe"
|
|
vars.nrpe_command = "check_load"
|
|
assign where match("nrpe-*", host.name)
|
|
}
|
|
|
|
apply Service "nrpe-disk" {
|
|
import "generic-service"
|
|
check_command = "nrpe"
|
|
vars.nrpe_command = "check_disk"
|
|
assign where match("nrpe-*", host.name)
|
|
}
|
|
|
|
object Host "nrpe-server" {
|
|
import "generic-host"
|
|
address = "192.168.1.5"
|
|
}
|
|
|
|
apply Dependency "disable-nrpe-checks" to Service {
|
|
parent_service_name = "nrpe-health"
|
|
|
|
states = [ OK ]
|
|
disable_checks = true
|
|
disable_notifications = true
|
|
assign where service.check_command == "nrpe"
|
|
ignore where service.name == "nrpe-health"
|
|
}
|
|
|
|
The `disable-nrpe-checks` dependency is applied to all services
on the `nrpe-server` host that use the `nrpe` check command,
but not to the `nrpe-health` service itself.
|
|
|
|
|
|
## <a id="downtimes"></a> Downtimes
|
|
|
|
Downtimes can be scheduled for planned server maintenance or
any other targeted service outage you are aware of in advance.
|
|
|
|
Downtimes will suppress any notifications, and may trigger other
|
|
downtimes too. If the downtime was set by accident, or the duration
|
|
exceeds the maintenance, you can manually cancel the downtime.
|
|
Planned downtimes will also be taken into account for SLA reporting
|
|
tools calculating the SLAs based on the state and downtime history.
|
|
|
|
Multiple downtimes for a single object may overlap. This is useful
when your maintenance takes longer than expected and you want to extend the window.
If there are multiple downtimes triggered for one object, the overall downtime depth
will be greater than `1`.
|
|
|
|
|
|
If the downtime was scheduled after the problem changed to a critical hard
|
|
state triggering a problem notification, and the service recovers during
|
|
the downtime window, the recovery notification won't be suppressed.
|
|
|
|
### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes
|
|
|
|
A `fixed` downtime will be activated at the defined start time, and
|
|
removed at the end time. During this time window the service state
|
|
will change to `NOT-OK` and then actually trigger the downtime.
|
|
Notifications are suppressed and the downtime depth is incremented.
|
|
|
|
Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00. After 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
|
|
|
|
Unlike a `fixed` downtime, a `flexible` downtime will be triggered
|
|
by the state change in the time span defined by start and end time,
|
|
and then last for the specified duration in minutes.
|
|
|
|
Imagine the following scenario: Your service is frequently polled
|
|
by users trying to grab free deleted domains for immediate registration.
|
|
Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
|
|
a network outage visible to the monitoring. The service is still alive,
|
|
but answering too slow to Icinga 2 service checks.
|
|
For that reason, you may want to schedule a downtime between 07:30 and
|
|
08:00 with a duration of 15 minutes. The downtime will then last from
|
|
its trigger time until the duration is over. After that, the downtime
|
|
is removed (may happen before or after the actual end time!).
|
|
|
|
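
The timing of the flexible downtime scenario above can be worked through with a few lines of shell. This is only an arithmetic sketch: the function name `flexible_downtime_end` is made up for illustration, and times are expressed as seconds since midnight to keep the numbers readable.

```shell
# flexible_downtime_end TRIGGER DURATION
# Prints the time at which a flexible downtime is removed:
# the trigger time plus the duration (both in seconds).
flexible_downtime_end() {
    echo $(( $1 + $2 ))
}

window_end=28800                                 # 08:00 as seconds since midnight
removed_at=$(flexible_downtime_end 28200 900)    # triggered at 07:50, lasts 15 minutes

if [ "$removed_at" -gt "$window_end" ]; then
    echo "downtime is removed $(( removed_at - window_end )) seconds after the window ends"
fi
```

With a trigger at 07:50 the downtime ends at 08:05, five minutes after the scheduled window has closed. A trigger at 07:30 would end it at 07:45, well before the window ends.
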
### <a id="scheduling-downtime"></a> Scheduling a downtime

A downtime can be scheduled either through a web interface or by sending an
[external command](3-monitoring-basics.md#external-commands)
to the external command pipe provided by the `ExternalCommandListener` configuration.

Fixed downtimes require a start and end time (a duration will be ignored).
Flexible downtimes need a start and end time for the time span, and a duration
independent from that time span.

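
As a sketch of the external command route, the helper below prints a `SCHEDULE_SVC_DOWNTIME` line in the same `[timestamp] COMMAND;arguments` format used by the forced check example in the External Commands section. The helper name `schedule_svc_downtime` is hypothetical. The parameter order (host, service, start, end, fixed, trigger id, duration, author, comment) and the duration unit (assumed to be seconds) follow the Icinga 1.x external command documentation and should be verified against the external command list for your version.

```shell
# Hypothetical helper: prints a SCHEDULE_SVC_DOWNTIME external command line.
# Arguments: host service start end fixed trigger_id duration author comment
#   fixed      1 = fixed downtime, 0 = flexible downtime
#   trigger_id 0 = not triggered by another downtime
#   duration   assumed to be seconds; only relevant for flexible downtimes
schedule_svc_downtime() {
    printf '[%s] SCHEDULE_SVC_DOWNTIME;%s;%s;%s;%s;%s;%s;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4" "$5" "$6" "$7" "$8" "$9"
}

start=$(date +%s)
end=$(( start + 3600 ))

# Fixed downtime for the next hour (the duration field is ignored).
schedule_svc_downtime localhost ping4 "$start" "$end" 1 0 0 icingaadmin "Distribution upgrade"

# Flexible downtime: 15 minutes (900 seconds), triggered inside the same window.
schedule_svc_downtime localhost ping4 "$start" "$end" 0 0 900 icingaadmin "Expected slow responses"
```

To actually submit a command, the output would be appended to the command pipe, for example `schedule_svc_downtime ... >> /var/run/icinga2/cmd/icinga2.cmd`.
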
### <a id="triggered-downtimes"></a> Triggered Downtimes

Specifying a triggering downtime is optional when scheduling a downtime. If there
is already a downtime scheduled for a future maintenance, the current downtime can
be triggered by that downtime. This is useful if you have scheduled a host downtime
and are now scheduling a child host's downtime which gets triggered by the parent
downtime on a NOT-OK state change.

### <a id="recurring-downtimes"></a> Recurring Downtimes

[ScheduledDowntime objects](5-object-types.md#objecttype-scheduleddowntime) can be used to set up
recurring downtimes for services.

Example:

    apply ScheduledDowntime "backup-downtime" to Service {
      author = "icingaadmin"
      comment = "Scheduled downtime for backup"

      ranges = {
        monday = "02:00-03:00"
        tuesday = "02:00-03:00"
        wednesday = "02:00-03:00"
        thursday = "02:00-03:00"
        friday = "02:00-03:00"
        saturday = "02:00-03:00"
        sunday = "02:00-03:00"
      }

      assign where "backup" in service.groups
    }

## <a id="comments-intro"></a> Comments

Comments can be added at runtime and are persistent over restarts. You can
add useful information for others on repeating incidents (for example
"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
is primarily accessible using web interfaces.

Comments can be added and deleted through the external command pipe
provided with the `ExternalCommandListener` configuration. The caller must
pass the comment id when manipulating an existing comment.

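
The two comment actions could be sketched as follows. The helper names are hypothetical, and the parameter order of `ADD_SVC_COMMENT` (host, service, persistent flag, author, comment) and `DEL_SVC_COMMENT` (comment id) is taken from the Icinga 1.x external command documentation, so treat it as an assumption to check for your version. The comment id `42` is a placeholder.

```shell
# Hypothetical helpers printing comment-related external command lines.

# add_svc_comment HOST SERVICE AUTHOR COMMENT
# The third field (1) marks the comment as persistent.
add_svc_comment() {
    printf '[%s] ADD_SVC_COMMENT;%s;%s;1;%s;%s\n' "$(date +%s)" "$1" "$2" "$3" "$4"
}

# del_svc_comment COMMENT_ID
# Deleting requires the id of the existing comment.
del_svc_comment() {
    printf '[%s] DEL_SVC_COMMENT;%s\n' "$(date +%s)" "$1"
}

add_svc_comment localhost ping4 icingaadmin "stale nfs mount, see last incident"
del_svc_comment 42
```
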
## <a id="acknowledgements"></a> Acknowledgements

If a problem is alerted and notified, you may signal to the other notification
recipients that you are aware of the problem and will handle it.

By sending an acknowledgement to Icinga 2 (using the external command pipe
provided with `ExternalCommandListener` configuration) all future notifications
are suppressed, a new comment is added with the provided description and
a notification with the type `NotificationFilterAcknowledgement` is sent
to all notified users.

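
A minimal sketch of sending such an acknowledgement through the command pipe is shown below. The helper name is hypothetical. The field order of `ACKNOWLEDGE_SVC_PROBLEM` (host, service, sticky, notify, persistent, author, comment) and the meaning of the flag values are assumptions based on the Icinga 1.x external command documentation.

```shell
# acknowledge_svc_problem HOST SERVICE AUTHOR COMMENT
# Fixed flags (assumed semantics): sticky=2, notify=1, persistent=1
acknowledge_svc_problem() {
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM;%s;%s;2;1;1;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4"
}

acknowledge_svc_problem localhost ping4 icingaadmin "Looking into it"
```
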
### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements

Once a problem is acknowledged it may disappear from your `handled problems`
dashboard and no one ever looks at it again, since the acknowledgement
suppresses notifications too.

This `fire-and-forget` action is quite common. If you're sure that a
current problem should be resolved in the future at a defined time,
you can define an expiration time when acknowledging the problem.

Icinga 2 will clear the acknowledgement when it expires and start to
re-notify if the problem persists.

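
An expiration time is passed as an additional UNIX timestamp. The sketch below assumes the `ACKNOWLEDGE_SVC_PROBLEM_EXPIRE` external command with the field order host, service, sticky, notify, persistent, expiry timestamp, author, comment; the helper name is hypothetical and the four hour expiry is an arbitrary example value.

```shell
# acknowledge_svc_problem_expire HOST SERVICE EXPIRE_TS AUTHOR COMMENT
# EXPIRE_TS is the UNIX timestamp at which the acknowledgement is cleared.
acknowledge_svc_problem_expire() {
    printf '[%s] ACKNOWLEDGE_SVC_PROBLEM_EXPIRE;%s;%s;2;1;1;%s;%s;%s\n' \
        "$(date +%s)" "$1" "$2" "$3" "$4" "$5"
}

# Acknowledge now, expire in four hours.
expire=$(( $(date +%s) + 4 * 3600 ))
acknowledge_svc_problem_expire localhost ping4 "$expire" icingaadmin "Fix scheduled for this afternoon"
```
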
## <a id="custom-attributes"></a> Custom Attributes

### <a id="custom-attributes-apply"></a> Using Custom Attributes for Apply Rules

Custom attributes are not only used at runtime in command definitions to pass
command arguments, but are also a smart way to define patterns and groups
for applying objects for dynamic config generation.

There are several ways of using custom attributes with [apply rules](3-monitoring-basics.md#using-apply):

* As simple attribute literal ([number](15-language-reference.md#numeric-literals), [string](15-language-reference.md#string-literals),
  [boolean](15-language-reference.md#boolean-literals)) for expression conditions (`assign where`, `ignore where`)
* As [array](15-language-reference.md#array) or [dictionary](15-language-reference.md#dictionary) attribute with nested values
  (e.g. dictionaries in dictionaries) in [apply for](3-monitoring-basics.md#using-apply-for) rules.
* As a [function object](#functions)

Features like [DB IDO](3-monitoring-basics.md#db-ido), [Livestatus](#setting-up-livestatus) or [StatusData](#status-data)
dump array and dictionary values as a JSON-encoded string, and set `is_json` or `cv_is_json`, respectively, to `1`.

If arrays are used in runtime macros (for example `$host.groups$`) all entries
are separated using the `;` character. If an entry contains a semicolon itself,
it is escaped like this: `entry1;ent\;ry2;entry3`.

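
The separator and escaping rule for array macros can be reproduced with a small shell function. This only mimics the documented output format; it is not how Icinga 2 implements it, and the function name `join_macro_array` is made up for illustration.

```shell
# join_macro_array ENTRY...
# Joins all entries with ';' and escapes a ';' inside an entry as '\;'.
join_macro_array() {
    first=1
    for entry in "$@"; do
        escaped=$(printf '%s' "$entry" | sed 's/;/\\;/g')
        if [ "$first" -eq 1 ]; then
            printf '%s' "$escaped"
            first=0
        else
            printf ';%s' "$escaped"
        fi
    done
    printf '\n'
}

join_macro_array entry1 'ent;ry2' entry3   # prints entry1;ent\;ry2;entry3
```
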
### <a id="runtime-custom-attributes"></a> Using Custom Attributes at Runtime

Custom attributes may be used in command definitions to dynamically change how the command
is executed.

Additionally there are Icinga 2 features such as the [PerfdataWriter](3-monitoring-basics.md#performance-data) feature
which use custom runtime attributes to format their output.

> **Tip**
>
> Custom attributes are identified by the `vars` dictionary attribute as short name.
> Accessing the different attribute keys is possible using the [index accessor](15-language-reference.md#indexer) `.`.

Custom attributes in command definitions or performance data templates are evaluated at
runtime when executing a command. These custom attributes cannot be used elsewhere,
for example in other configuration attributes.

Custom attribute values must be either a string, a number, a boolean value or an array.
Dictionaries cannot be used at the time of writing.

Arrays can be used to pass multiple arguments with or without repeating the key string.
This helps when passing multiple parameters to check plugins requiring them. Prominent
plugin examples are:

* [check_disk -p](6-icinga-template-library.md#plugin-check-command-disk)
* [check_nrpe -a](6-icinga-template-library.md#plugin-check-command-nrpe)
* [check_nscp -l](6-icinga-template-library.md#plugin-check-command-nscp)
* [check_dns -a](6-icinga-template-library.md#plugin-check-command-dns)

More details on how to use `repeat_key` and other command argument options can be
found in [this section](5-object-types.md#objecttype-checkcommand-arguments).

> **Note**
>
> If a macro value cannot be resolved, be it a single macro, or a recursive macro
> containing an array of macros, the entire command argument is skipped.

This is an example of a command definition which uses user-defined custom attributes:

    object CheckCommand "my-icmp" {
      import "plugin-check-command"
      command = [ "/bin/sudo", PluginDir + "/check_icmp" ]

      arguments = {
        "-H" = {
          value = "$icmp_targets$"
          repeat_key = false
          order = 1
        }
        "-w" = "$icmp_wrta$,$icmp_wpl$%"
        "-c" = "$icmp_crta$,$icmp_cpl$%"
        "-s" = "$icmp_source$"
        "-n" = "$icmp_packets$"
        "-i" = "$icmp_packet_interval$"
        "-I" = "$icmp_target_interval$"
        "-m" = "$icmp_hosts_alive$"
        "-b" = "$icmp_data_bytes$"
        "-t" = "$icmp_timeout$"
      }

      vars.icmp_wrta = 200.00
      vars.icmp_wpl = 40
      vars.icmp_crta = 500.00
      vars.icmp_cpl = 80

      vars.notes = "Requires setuid root or sudo."
    }

Custom attribute names used at runtime must be enclosed in two `$` signs,
for example `$address$`.

> **Note**
>
> When using the `$` sign as single character, you need to escape it with an
> additional dollar sign (`$$`).

This example also makes use of the [command arguments](3-monitoring-basics.md#command-arguments) passed
to the command line.

You can use the above example `CheckCommand` definition and
[pass command argument parameters](3-monitoring-basics.md#command-passing-parameters) like this:

    object Host "my-icmp-host" {
      import "generic-host"
      address = "192.168.1.10"
      vars.address_mgmt = "192.168.2.10"
      vars.address_web = "192.168.10.10"
      vars.icmp_targets = [ "$address$", "$host.vars.address_mgmt$", "$host.vars.address_web$" ]
    }

    apply Service "my-icmp" {
      check_command = "my-icmp"
      check_interval = 1m
      retry_interval = 30s

      vars.icmp_targets = host.vars.icmp_targets

      assign where host.vars.icmp_targets
    }

### <a id="runtime-custom-attributes-evaluation-order"></a> Runtime Custom Attributes Evaluation Order

When executing commands Icinga 2 checks the following objects in this order to look
up custom attributes and their respective values:

1. User object (only for notifications)
2. Service object
3. Host object
4. Command object
5. Global custom attributes in the `vars` constant

This lookup order allows you to define default values for custom attributes
in your command objects. The `my-ping` command shown above uses this to set
default values for some of the latency thresholds and timeouts.

When using the `my-ping` command you can override some or all of the custom
attributes in the service definition like this:

    object Service "ping" {
      host_name = "localhost"
      check_command = "my-ping"

      vars.ping_packets = 10 // Overrides the default value of 5 given in the command
    }

If a custom attribute isn't defined anywhere, an empty value is used and a warning is
emitted to the Icinga 2 log.

> **Best Practice**
>
> By convention every host should have an `address` attribute. Hosts
> which have an IPv6 address should also have an `address6` attribute.

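
The lookup order described in this section behaves like a "first match wins" search. The following shell sketch models that idea only; it is not Icinga 2 code, the function name `first_defined` is hypothetical, and an empty string stands in for "not defined on this object".

```shell
# first_defined VALUE...
# Prints the first non-empty value and returns 0, or returns 1 if all are empty.
first_defined() {
    for value in "$@"; do
        if [ -n "$value" ]; then
            printf '%s\n' "$value"
            return 0
        fi
    done
    return 1
}

# Candidates in lookup order: user, service, host, command, global vars.
# Only the service (10) and the command (5) define ping_packets here,
# so the service value is found first.
first_defined "" "10" "" "5" ""
```
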
### <a id="runtime-custom-attribute-env-vars"></a> Runtime Custom Attributes as Environment Variables

The `env` command object attribute specifies a list of environment variables with values calculated
from either runtime macros or custom attributes which should be exported as environment variables
prior to executing the command.

This is useful, for example, to keep sensitive information such as credentials for
database checks out of the command line:

    object CheckCommand "mysql-health" {
      import "plugin-check-command"

      command = [
        PluginDir + "/check_mysql"
      ]

      arguments = {
        "-H" = "$mysql_address$"
        "-d" = "$mysql_database$"
      }

      vars.mysql_address = "$address$"
      vars.mysql_database = "icinga"
      vars.mysql_user = "icinga_check"
      vars.mysql_pass = "password"

      env.MYSQLUSER = "$mysql_user$"
      env.MYSQLPASS = "$mysql_pass$"
    }

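
From the plugin's point of view the exported values are ordinary environment variables. The fragment below is a hypothetical piece of a plugin wrapper, not part of any shipped plugin: it only checks that `MYSQLUSER` and `MYSQLPASS` (the names used in the example above) are present and reports `UNKNOWN` with the conventional plugin exit code `3` otherwise.

```shell
# Hypothetical plugin fragment: reads credentials from the environment
# instead of from command line arguments.
check_credentials_present() {
    if [ -z "${MYSQLUSER:-}" ] || [ -z "${MYSQLPASS:-}" ]; then
        echo "UNKNOWN - MYSQLUSER or MYSQLPASS not set in the environment"
        return 3
    fi
    # Never echo the password itself; the user name is enough for diagnostics.
    echo "OK - credentials for user '$MYSQLUSER' found in the environment"
    return 0
}
```

Calling `check_credentials_present` in a process started with `MYSQLUSER=icinga_check` and `MYSQLPASS=password` in its environment reports the user name, while the password never shows up in the process list.
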
### <a id="multiple-host-addresses-custom-attributes"></a> Multiple Host Addresses using Custom Attributes

The following example defines a `Host` with three different interface addresses defined as
custom attributes in the `vars` dictionary. The `if-eth0` and `if-eth1` services will import
these values into the `address` custom attribute. This attribute is available through the
generic `$address$` runtime macro.

    object Host "multi-ip" {
      check_command = "dummy"
      vars.address_lo = "127.0.0.1"
      vars.address_eth0 = "10.0.0.10"
      vars.address_eth1 = "192.168.1.10"
    }

    apply Service "if-eth0" {
      import "generic-service"

      vars.address = "$host.vars.address_eth0$"
      check_command = "my-generic-interface-check"

      assign where host.vars.address_eth0 != ""
    }

    apply Service "if-eth1" {
      import "generic-service"

      vars.address = "$host.vars.address_eth1$"
      check_command = "my-generic-interface-check"

      assign where host.vars.address_eth1 != ""
    }

    object CheckCommand "my-generic-interface-check" {
      import "plugin-check-command"

      command = "echo \"This would be the service $service.description$ using the address value: $address$\""
    }

The `CheckCommand` object is just an example to help you with testing and
understanding the different custom attributes and runtime macros.

### <a id="modified-attributes"></a> Modified Attributes

Icinga 2 allows you to modify object attributes at runtime so that they differ
from the values defined in the local configuration. These modified attributes are
stored as a bit-shifted value and made available in backends. Icinga 2 stores
modified attributes in its state file and restores them on restart.

Modified attributes can be reset using external commands.

## <a id="runtime-macros"></a> Runtime Macros

In addition to custom attributes, Icinga 2 makes further runtime macros available.
These runtime macros reflect the current object state and may change over time, while
custom attributes are configured statically (but can be modified at runtime using
external commands).

### <a id="runtime-macro-evaluation-order"></a> Runtime Macro Evaluation Order

Custom attributes can be accessed at [runtime](3-monitoring-basics.md#runtime-custom-attributes) using their
identifier omitting the `vars.` prefix.
There are special cases when those custom attributes are not set and Icinga 2 provides
a fallback to existing object attributes, for example `host.address`.

In the following example the `$address$` macro will be resolved with the value of `vars.address`.

    object Host "localhost" {
      import "generic-host"
      check_command = "my-host-macro-test"
      address = "127.0.0.1"
      vars.address = "127.2.2.2"
    }

    object CheckCommand "my-host-macro-test" {
      command = "echo \"address: $address$ host.address: $host.address$ host.vars.address: $host.vars.address$\""
    }

The check command output will look like

    "address: 127.2.2.2 host.address: 127.0.0.1 host.vars.address: 127.2.2.2"

If you alter the host object and remove the `vars.address` line, Icinga 2 will fail to look up `$address$` in the
custom attributes dictionary and then fall back to the host object's `address` attribute.

The check command output will change to

    "address: 127.0.0.1 host.address: 127.0.0.1 host.vars.address: "

The same example can be defined for services, overriding the `address` field based on a specific host custom attribute.

    object Host "localhost" {
      import "generic-host"
      address = "127.0.0.1"
      vars.macro_address = "127.3.3.3"
    }

    apply Service "my-macro-test" to Host {
      import "generic-service"
      check_command = "my-service-macro-test"
      vars.address = "$host.vars.macro_address$"

      assign where host.address
    }

    object CheckCommand "my-service-macro-test" {
      command = "echo \"address: $address$ host.address: $host.address$ host.vars.macro_address: $host.vars.macro_address$ service.vars.address: $service.vars.address$\""
    }

When the service check is executed the output looks like

    "address: 127.3.3.3 host.address: 127.0.0.1 host.vars.macro_address: 127.3.3.3 service.vars.address: 127.3.3.3"

That way you can easily override existing macros which are accessed by their short name, like `$address$`,
and avoid defining multiple check commands (one for `$address$` and one for `$host.vars.macro_address$`).

### <a id="host-runtime-macros"></a> Host Runtime Macros

The following host macros are available in all commands that are executed for
hosts or services:

  Name                         | Description
  -----------------------------|--------------
  host.name                    | The name of the host object.
  host.display_name            | The value of the `display_name` attribute.
  host.state                   | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
  host.state_id                | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
  host.state_type              | The host's current state type. Can be one of `SOFT` and `HARD`.
  host.check_attempt           | The current check attempt number.
  host.max_check_attempts      | The maximum number of checks which are executed before changing to a hard state.
  host.last_state              | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
  host.last_state_id           | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
  host.last_state_type         | The host's previous state type. Can be one of `SOFT` and `HARD`.
  host.last_state_change       | The last state change's timestamp.
  host.downtime_depth          | The number of active downtimes.
  host.duration_sec            | The time since the last state change.
  host.latency                 | The host's check latency.
  host.execution_time          | The host's check execution time.
  host.output                  | The last check's output.
  host.perfdata                | The last check's performance data.
  host.last_check              | The timestamp when the last check was executed.
  host.check_source            | The monitoring instance that performed the last check.
  host.num_services            | Number of services associated with the host.
  host.num_services_ok         | Number of services associated with the host which are in an `OK` state.
  host.num_services_warning    | Number of services associated with the host which are in a `WARNING` state.
  host.num_services_unknown    | Number of services associated with the host which are in an `UNKNOWN` state.
  host.num_services_critical   | Number of services associated with the host which are in a `CRITICAL` state.

### <a id="service-runtime-macros"></a> Service Runtime Macros

The following service macros are available in all commands that are executed for
services:

  Name                       | Description
  ---------------------------|--------------
  service.name               | The short name of the service object.
  service.display_name       | The value of the `display_name` attribute.
  service.check_command      | The short name of the command along with any arguments to be used for the check.
  service.state              | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
  service.state_id           | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
  service.state_type         | The service's current state type. Can be one of `SOFT` and `HARD`.
  service.check_attempt      | The current check attempt number.
  service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
  service.last_state         | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
  service.last_state_id      | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
  service.last_state_type    | The service's previous state type. Can be one of `SOFT` and `HARD`.
  service.last_state_change  | The last state change's timestamp.
  service.downtime_depth     | The number of active downtimes.
  service.duration_sec       | The time since the last state change.
  service.latency            | The service's check latency.
  service.execution_time     | The service's check execution time.
  service.output             | The last check's output.
  service.perfdata           | The last check's performance data.
  service.last_check         | The timestamp when the last check was executed.
  service.check_source       | The monitoring instance that performed the last check.

### <a id="command-runtime-macros"></a> Command Runtime Macros

The following macros are available in all commands:

  Name                   | Description
  -----------------------|--------------
  command.name           | The name of the command object.

### <a id="user-runtime-macros"></a> User Runtime Macros

The following macros are available in all commands that are executed for
users:

  Name                   | Description
  -----------------------|--------------
  user.name              | The name of the user object.
  user.display_name      | The value of the `display_name` attribute.

### <a id="notification-runtime-macros"></a> Notification Runtime Macros

  Name                   | Description
  -----------------------|--------------
  notification.type      | The type of the notification.
  notification.author    | The author of the notification comment, if existing.
  notification.comment   | The comment of the notification, if existing.

### <a id="global-runtime-macros"></a> Global Runtime Macros

The following macros are available in all executed commands:

  Name                   | Description
  -----------------------|--------------
  icinga.timet           | Current UNIX timestamp.
  icinga.long_date_time  | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
  icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
  icinga.date            | Current date. Example: `2014-01-03`
  icinga.time            | Current time including timezone information. Example: `11:23:08 +0000`
  icinga.uptime          | Current uptime of the Icinga 2 process.

The following macros provide global statistics:

  Name                              | Description
  ----------------------------------|--------------
  icinga.num_services_ok            | Current number of services in state 'OK'.
  icinga.num_services_warning       | Current number of services in state 'Warning'.
  icinga.num_services_critical      | Current number of services in state 'Critical'.
  icinga.num_services_unknown       | Current number of services in state 'Unknown'.
  icinga.num_services_pending       | Current number of pending services.
  icinga.num_services_unreachable   | Current number of unreachable services.
  icinga.num_services_flapping      | Current number of flapping services.
  icinga.num_services_in_downtime   | Current number of services in downtime.
  icinga.num_services_acknowledged  | Current number of acknowledged service problems.
  icinga.num_hosts_up               | Current number of hosts in state 'Up'.
  icinga.num_hosts_down             | Current number of hosts in state 'Down'.
  icinga.num_hosts_unreachable      | Current number of unreachable hosts.
  icinga.num_hosts_flapping         | Current number of flapping hosts.
  icinga.num_hosts_in_downtime      | Current number of hosts in downtime.
  icinga.num_hosts_acknowledged     | Current number of acknowledged host problems.

## <a id="check-result-freshness"></a> Check Result Freshness

In Icinga 2 active check freshness is enabled by default. A check result is
considered outdated if no new check result has come in within the period of time
defined by the `check_interval` attribute.

    threshold = last check execution time + check interval

Passive check freshness is calculated from the `check_interval` attribute if set.

    threshold = last check result time + check interval

If the freshness check fails, a new check is executed as defined by the
`check_command` attribute.

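
The threshold formulas above amount to simple timestamp arithmetic. The sketch below illustrates them with made-up function names and example values; it is not how Icinga 2 evaluates freshness internally.

```shell
# freshness_threshold LAST INTERVAL
# LAST is the last check execution time (active checks) or the last check
# result time (passive checks), INTERVAL the check interval; both in seconds.
freshness_threshold() {
    echo $(( $1 + $2 ))
}

# is_stale NOW LAST INTERVAL
# Succeeds if NOW is past the freshness threshold.
is_stale() {
    [ "$1" -gt "$(freshness_threshold "$2" "$3")" ]
}

last=1382014885        # example timestamp of the last check
interval=300           # check_interval of 5 minutes
echo "threshold: $(freshness_threshold "$last" "$interval")"

if is_stale "$(date +%s)" "$last" "$interval"; then
    echo "result is outdated, a new check would be executed"
fi
```
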
## <a id="check-flapping"></a> Check Flapping

The flapping algorithm used in Icinga 2 does not store the past states but
calculates the flapping threshold from a single value based on counters and
half-life values. Icinga 2 compares the value with a single flapping threshold
configuration attribute named `flapping_threshold`.

Flapping detection can be enabled or disabled using the `enable_flapping` attribute.

## <a id="volatile-services"></a> Volatile Services

By default all services remain in a non-volatile state. When a problem
occurs, the `SOFT` state applies and once the check counter reaches the
`max_check_attempts` value, a `HARD` state transition happens.
Notifications are only triggered by `HARD` state changes and are then
re-sent as defined by the `interval` attribute.

It may be reasonable to have a volatile service which stays in a `HARD`
state type if the service stays in a `NOT-OK` state. That way each
service recheck will automatically trigger a notification unless the
service is acknowledged or in a scheduled downtime.

## <a id="external-commands"></a> External Commands

Icinga 2 provides an external command pipe for processing commands
triggering specific actions (for example rescheduling a service check
through the web interface).

In order to enable the `ExternalCommandListener` configuration use the
following command and restart Icinga 2 afterwards:

    # icinga2 feature enable command

Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
using the default configuration.

Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:

    # /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd

    # tail -f /var/log/messages

    Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
    Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'

### <a id="external-command-list"></a> External Command List

A list of currently supported external commands can be found [here](17-appendix.md#external-commands-list-detail).

Detailed information on the commands and their required parameters can be found
in the [Icinga 1.x documentation](http://docs.icinga.org/latest/en/extcommands2.html).

## <a id="logging"></a> Logging

Icinga 2 supports three different types of logging:

* File logging
* Syslog (on *NIX-based operating systems)
* Console logging (`STDOUT` on tty)

You can enable or disable loggers using the `icinga2 feature enable`
and `icinga2 feature disable` commands:

  Feature  | Description
  ---------|------------
  debuglog | Debug log (path: `/var/log/icinga2/debug.log`, severity: `debug` or higher)
  mainlog  | Main log (path: `/var/log/icinga2/icinga2.log`, severity: `information` or higher)
  syslog   | Syslog (severity: `warning` or higher)

By default the `mainlog` feature is enabled. When running Icinga 2
on a terminal, log messages with severity `information` or higher are
written to the console.

## <a id="performance-data"></a> Performance Data

When a host or service check is executed, plugins should provide so-called
`performance data`. In addition, check performance data
can be fetched using Icinga 2 runtime macros such as the check latency
or the current service state (or additional custom attributes).

The performance data can be passed to external applications which aggregate and
store them in their backends. These tools usually generate graphs for historical
reporting and trending.

Well-known addons processing Icinga performance data are PNP4Nagios,
inGraph and Graphite.

### <a id="writing-performance-data-files"></a> Writing Performance Data Files

PNP4Nagios, inGraph and Graphios use performance data collector daemons to fetch
the current performance files for their backend updates.

Therefore the Icinga 2 `PerfdataWriter` object allows you to define
the output template format for hosts and services using Icinga 2
runtime macros.

    host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
    service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"

The default templates are already provided with the Icinga 2 feature configuration
which can be enabled using

    # icinga2 feature enable perfdata

By default all performance data files are rotated every 15 seconds into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.

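
What an external collector has to do with the rotated files can be sketched in a few lines of shell and awk. This is a hypothetical stand-in for a real collector daemon: the function name `process_perfdata_dir` is made up, and the parsing assumes the tab-separated `KEY::VALUE` layout of the format templates shown above.

```shell
# Hypothetical collector: prints host, service and performance data for every
# line in the rotated files found in DIR, then removes the processed files.
process_perfdata_dir() {
    for file in "$1"/host-perfdata.* "$1"/service-perfdata.*; do
        [ -f "$file" ] || continue    # skip glob patterns that matched nothing
        awk -F '\t' '{
            split("", f)              # reset the field map for each line
            for (i = 1; i <= NF; i++) {
                sep = index($i, "::")
                if (sep > 0)
                    f[substr($i, 1, sep - 1)] = substr($i, sep + 2)
            }
            perf = ("SERVICEPERFDATA" in f) ? f["SERVICEPERFDATA"] : f["HOSTPERFDATA"]
            printf "%s\t%s\t%s\n", f["HOSTNAME"], f["SERVICEDESC"], perf
        }' "$file"
        rm -f "$file"                 # processed files must be removed
    done
}
```

With the default configuration this would be called as `process_perfdata_dir /var/spool/icinga2/perfdata`. A real collector would forward the values to its backend instead of printing them.
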
### <a id="graphite-carbon-cache-writer"></a> Graphite Carbon Cache Writer

While there are some Graphite collector scripts and daemons like Graphios available for
Icinga 1.x it's more reasonable to directly process the check and plugin performance data
in memory in Icinga 2. Once there are new metrics available, Icinga 2 will directly
write them to the defined Graphite Carbon daemon TCP socket.

You can enable the feature using

    # icinga2 feature enable graphite

By default the `GraphiteWriter` object expects the Graphite Carbon Cache to listen at
`127.0.0.1` on TCP port `2003`.

The current naming schema is

    icinga.<hostname>.<metricname>
    icinga.<hostname>.<servicename>.<metricname>

You can customize the metric prefix name by using the `host_name_template` and
`service_name_template` configuration attributes.

The example below uses [runtime macros](3-monitoring-basics.md#runtime-macros) and a
[global constant](15-language-reference.md#constants) named `GraphiteEnv`. The constant name
is freely definable and should be put in the [constants.conf](4-configuring-icinga-2.md#constants-conf) file.

    const GraphiteEnv = "icinga.env1"

    object GraphiteWriter "graphite" {
      host_name_template = GraphiteEnv + ".$host.name$"
      service_name_template = GraphiteEnv + ".$host.name$.$service.name$"
    }

To make sure Icinga 2 writes a valid label into Graphite some characters are replaced
with `_` in the target name:

    \/.-  (and space)

The resulting name in Graphite might look like:

    www-01 / http-cert / response time
    icinga.www_01.http_cert.response_time

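
The character replacement can be reproduced with `tr` to check what a given host, service or metric name will look like in Graphite. The helper names are hypothetical and the sketch only mirrors the documented rule for the default `icinga.<hostname>.<servicename>.<metricname>` schema.

```shell
# escape_graphite NAME
# Replaces backslash, slash, dot, space and hyphen with "_".
escape_graphite() {
    printf '%s\n' "$1" | tr '\\/. -' '_____'
}

# graphite_metric HOST SERVICE METRIC
# Prints the escaped service metric path using the default "icinga" prefix.
graphite_metric() {
    printf 'icinga.%s.%s.%s\n' \
        "$(escape_graphite "$1")" "$(escape_graphite "$2")" "$(escape_graphite "$3")"
}

graphite_metric "www-01" "http-cert" "response time"   # prints icinga.www_01.http_cert.response_time
```
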
In addition to the performance data retrieved from the check plugin, Icinga 2 sends
internal check statistic data to Graphite:

  metric             | description
  -------------------|------------------------------------------
  current_attempt    | current check attempt
  max_check_attempts | maximum check attempts until the hard state is reached
  reachable          | checked object is reachable
  downtime_depth     | number of downtimes this object is in
  execution_time     | check execution time
  latency            | check latency
  state              | current state of the checked object
  state_type         | 0=SOFT, 1=HARD state

The following example illustrates how to configure the storage-schemas for Graphite Carbon
Cache. Please make sure that the order is correct because the first match wins.

    [icinga_internals]
    pattern = ^icinga\..*\.(max_check_attempts|reachable|current_attempt|execution_time|latency|state|state_type)
    retentions = 5m:7d

    [icinga_default]
    # intervals like PNP4Nagios uses them per default
    pattern = ^icinga\.
    retentions = 1m:2d,5m:10d,30m:90d,360m:4y

### <a id="gelfwriter"></a> GELF Writer

The `Graylog Extended Log Format` (short: [GELF](http://www.graylog2.org/resources/gelf))
can be used to send application logs directly to a TCP socket.

While it has been specified by the [graylog2](http://www.graylog2.org/) project as their
[input resource standard](http://www.graylog2.org/resources/gelf), other tools such as
[Logstash](http://www.logstash.net) also support `GELF` as
[input type](http://logstash.net/docs/latest/inputs/gelf).

You can enable the feature using

    # icinga2 feature enable gelf

By default the `GelfWriter` object expects the GELF receiver to listen at `127.0.0.1` on TCP port `12201`.
The default `source` attribute is set to `icinga2`. Change it if you need a different value.

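For orientation, a GELF message is a JSON document, and over TCP each message is terminated by a null byte. The Python sketch below builds such a frame with the fields required by GELF 1.1. It does not claim to reproduce the exact fields Icinga 2 sends; `build_gelf_frame` and the extra field `service` are assumptions made for this example.

```python
import json
import time


def build_gelf_frame(source, short_message, timestamp=None, **extra):
    """Serialize one GELF 1.1 message as a null-terminated TCP frame."""
    message = {
        "version": "1.1",
        "host": source,
        "short_message": short_message,
        "timestamp": time.time() if timestamp is None else timestamp,
    }
    # GELF requires additional fields to be prefixed with an underscore.
    message.update({"_" + key: value for key, value in extra.items()})
    return json.dumps(message).encode("utf-8") + b"\x00"


# Hypothetical check result using the default source name "icinga2".
frame = build_gelf_frame("icinga2", "PING OK", timestamp=1382115688, service="ping4")
```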
Currently these events are processed:

* Check results
* State changes
* Notifications

### <a id="opentsdb-writer"></a> OpenTSDB Writer

Some OpenTSDB collector scripts and daemons, such as tcollector, are available for
Icinga 1.x, but in Icinga 2 it is more reasonable to process the check and plugin
performance data directly in memory. As soon as new metrics are available, Icinga 2
writes them directly to the defined TSDB TCP socket.

You can enable the feature using

    # icinga2 feature enable opentsdb

By default the `OpenTsdbWriter` object expects the TSD to listen at
`127.0.0.1` on port `4242`.

The current naming schema is

    icinga.host.<metricname>
    icinga.service.<servicename>.<metricname>

for host and service checks. The tag `host` is always applied.

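As a rough illustration of the naming schema, the Python sketch below formats a data point as a line for OpenTSDB's text-based `put` command. It only mirrors the schema described above and is not the writer's actual implementation; `tsdb_put_line`, the metric name `rta` and the sample values are assumptions.

```python
def tsdb_put_line(metric, timestamp, value, host, service=None):
    """Build one OpenTSDB `put` line following the naming schema above."""
    if service is None:
        name = f"icinga.host.{metric}"
    else:
        name = f"icinga.service.{service}.{metric}"
    # The host tag is always applied.
    return f"put {name} {timestamp} {value} host={host}"


# Hypothetical data points for a host check and a service check.
print(tsdb_put_line("rta", 1382115688, 0.5, "www-01"))
print(tsdb_put_line("rta", 1382115688, 0.5, "www-01", service="ping4"))
```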
To make sure Icinga 2 writes a valid metric into OpenTSDB, some characters are replaced
with `_` in the target name:

    \  (and space)

The resulting name in OpenTSDB might look like:

    www-01 / http-cert / response time
    icinga.http_cert.response_time

In addition to the performance data retrieved from the check plugin, Icinga 2 sends
internal check statistic data to OpenTSDB:

  metric             | description
  -------------------|------------------------------------------
  current_attempt    | current check attempt
  max_check_attempts | maximum check attempts until the hard state is reached
  reachable          | checked object is reachable
  downtime_depth     | number of downtimes this object is in
  execution_time     | check execution time
  latency            | check latency
  state              | current state of the checked object
  state_type         | 0=SOFT, 1=HARD state

While `reachable`, `state` and `state_type` are metrics for the host or service, the
other metrics follow the naming schema

    icinga.check.<metricname>

with the following tags

  tag     | description
  --------|------------------------------------------
  type    | the check type, one of [host, service]
  host    | name of the host the check ran on
  service | the service name (if type=service)

> **Note**
>
> You might want to set the `tsd.core.auto_create_metrics` setting to `true`
> in your opentsdb.conf configuration file.


## <a id="status-data"></a> Status Data

Icinga 1.x writes object configuration data and status data at a cyclic
interval to its `objects.cache` and `status.dat` files. Icinga 2 provides
the `StatusDataWriter` object which dumps all configuration objects and
status updates at a regular interval.

    # icinga2 feature enable statusdata

Icinga 1.x Classic UI requires this data set as part of its backend.

> **Note**
>
> If you are not using any web interface or addon which uses these files,
> you can safely disable this feature.


## <a id="compat-logging"></a> Compat Logging

In Icinga 2, the Icinga 1.x log format is called the `Compat Log` and is
provided by the `CompatLogger` object.

These logs are not only used for informational representation in
external web interfaces parsing the logs, but also to generate
SLA reports and trends in Icinga 1.x Classic UI. Furthermore the
[Livestatus](11-livestatus.md#setting-up-livestatus) feature uses these logs for answering queries to
historical tables.

The `CompatLogger` object can be enabled with

    # icinga2 feature enable compatlog

By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`/var/log/icinga2/compat/archives`.

The format cannot be changed without breaking compatibility with
existing log parsers.

    # tail -f /var/log/icinga2/compat/icinga.log

    [1382115688] LOG ROTATION: HOURLY
    [1382115688] LOG VERSION: 2.0
    [1382115688] HOST STATE: CURRENT;localhost;UP;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;disk;WARNING;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;http;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;load;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;ping4;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;ping6;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;processes;WARNING;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;ssh;OK;HARD;1;
    [1382115688] SERVICE STATE: CURRENT;localhost;users;OK;HARD;1;
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;disk;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;http;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;load;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping6;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;processes;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ssh;1382115705
    [1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;users;1382115705
    [1382115731] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;ping6;2;critical test|
    [1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test


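Each line in the sample above has the shape `[timestamp] ENTRY TYPE: field;field;...`. A log parser could split it along those lines, as in the Python sketch below. The sketch is based only on the sample shown here, so treat it as a starting point rather than a complete parser; `LINE_RE` and `parse_compat_line` are names chosen for this example.

```python
import re

# "[<unix timestamp>] <entry type>: <semicolon-separated fields>"
LINE_RE = re.compile(r"^\[(\d+)\] ([^:]+): (.*)$")


def parse_compat_line(line):
    """Split a compat log line into (timestamp, entry type, fields)."""
    match = LINE_RE.match(line.rstrip("\n"))
    if match is None:
        return None
    timestamp, entry_type, rest = match.groups()
    return int(timestamp), entry_type, rest.split(";")


sample = "[1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test"
print(parse_compat_line(sample))
```

For a `SERVICE ALERT` entry like the one in the sample, the fields are the host name, the service name, the state, the state type, the check attempt and the plugin output, in that order.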
## <a id="db-ido"></a> DB IDO

The IDO (Icinga Data Output) modules for Icinga 2 take care of exporting all
configuration and status information into a database. The IDO database is used
by a number of projects including Icinga Web 1.x and 2.

Details on the installation can be found in the [Configuring DB IDO](2-getting-started.md#configuring-db-ido)
chapter. Details on the configuration can be found in the
[IdoMysqlConnection](5-object-types.md#objecttype-idomysqlconnection) and
[IdoPgsqlConnection](5-object-types.md#objecttype-idopgsqlconnection)
object configuration documentation.
The DB IDO feature supports [High Availability](8-monitoring-remote-systems.md#high-availability-db-ido) in
the Icinga 2 cluster.

The following example query checks the health of the current Icinga 2 instance,
which writes its status to the DB IDO backend table `icinga_programstatus`
every 10 seconds. The query looks 60 seconds into the past, which is a reasonable
amount of time; adjust it to your requirements. If the condition is not met,
the query returns an empty result.

> **Tip**
>
> Use [check plugins](9-addons-plugins.md#plugins) to monitor the backend.

Replace the `default` string with your instance name, if different.

Example for MySQL:

    # mysql -u root -p icinga -e "SELECT status_update_time FROM icinga_programstatus ps
      JOIN icinga_instances i ON ps.instance_id=i.instance_id
      WHERE (UNIX_TIMESTAMP(ps.status_update_time) > UNIX_TIMESTAMP(NOW())-60)
      AND i.instance_name='default';"

    +---------------------+
    | status_update_time  |
    +---------------------+
    | 2014-05-29 14:29:56 |
    +---------------------+

Example for PostgreSQL:

    # export PGPASSWORD=icinga; psql -U icinga -d icinga -c "SELECT ps.status_update_time FROM icinga_programstatus AS ps
      JOIN icinga_instances AS i ON ps.instance_id=i.instance_id
      WHERE (extract(epoch from ps.status_update_time) > extract(epoch from now())-60)
      AND i.instance_name='default'";

     status_update_time
    ------------------------
     2014-05-29 15:11:38+02
    (1 row)

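The condition both queries apply can be stated in a few lines of Python: the last status update must be newer than the current time minus the allowed age. The sketch below only restates that comparison for clarity and does not talk to a database; `is_status_fresh` and the sample timestamps are assumptions for this example.

```python
from datetime import datetime, timedelta


def is_status_fresh(status_update_time, now, max_age_seconds=60):
    """Return True if the last status update is newer than now - max_age_seconds."""
    return status_update_time > now - timedelta(seconds=max_age_seconds)


# Hypothetical timestamps: the update is 34 seconds old, then 94 seconds old.
last_update = datetime(2014, 5, 29, 14, 29, 56)
print(is_status_fresh(last_update, datetime(2014, 5, 29, 14, 30, 30)))
print(is_status_fresh(last_update, datetime(2014, 5, 29, 14, 31, 30)))
```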
A detailed list of the available table attributes can be found in the [DB IDO Schema documentation](17-appendix.md#schema-db-ido).


## <a id="check-result-files"></a> Check Result Files

Icinga 1.x writes its check result files to a temporary spool directory
where they are processed at a regular interval.
While this is extremely inefficient in terms of performance, it has
proven useful for passing passive check results directly into Icinga 1.x,
skipping the external command pipe.

Several clustered/distributed environments and check-aggregation addons
use that method. In order to support step-by-step migration of these
environments, Icinga 2 supports the `CheckResultReader` object.

No feature configuration is available for it; define the object
on demand in your Icinga 2 object configuration.

    object CheckResultReader "reader" {
      spool_dir = "/data/check-results"
    }