mirror of https://github.com/Icinga/icinga2.git
1531 lines
58 KiB
Markdown
1531 lines
58 KiB
Markdown
# <a id="monitoring-basics"></a> Monitoring Basics
|
|
|
|
This part of the Icinga 2 documentation provides an overview of all the basic
|
|
monitoring concepts you need to know to run Icinga 2.
|
|
|
|
## <a id="hosts-services"></a> Hosts and Services
|
|
|
|
Icinga 2 can be used to monitor the availability of hosts and services. Hosts
|
|
and services can be virtually anything which can be checked in some way:
|
|
|
|
* Network services (HTTP, SMTP, SNMP, SSH, etc.)
|
|
* Printers
|
|
* Switches / Routers
|
|
* Temperature Sensors
|
|
* Other local or network-accessible services
|
|
|
|
Host objects provide a mechanism to group services that are running
|
|
on the same physical device.
|
|
|
|
Here is an example of a host object which defines two child services:
|
|
|
|
object Host "my-server1" {
|
|
address = "10.0.0.1"
|
|
check_command = "hostalive"
|
|
}
|
|
|
|
object Service "ping4" {
|
|
host_name = "localhost"
|
|
check_command = "ping4"
|
|
}
|
|
|
|
object Service "http" {
|
|
host_name = "localhost"
|
|
check_command = "http_ip"
|
|
}
|
|
|
|
The example creates two services `ping4` and `http` which belong to the
|
|
host `my-server1`.
|
|
|
|
It also specifies that the host should perform its own check using the `hostalive`
|
|
check command.
|
|
|
|
The `address` custom attribute is used by check commands to determine which network
|
|
address is associated with the host object.
|
|
|
|
Details on troubleshooting check problems can be found [here](#troubleshooting).
|
|
|
|
### <a id="host-states"></a> Host States
|
|
|
|
Hosts can be in any of the following states:
|
|
|
|
Name | Description
|
|
------------|--------------
|
|
UP | The host is available.
|
|
DOWN | The host is unavailable.
|
|
|
|
### <a id="service-states"></a> Service States
|
|
|
|
Services can be in any of the following states:
|
|
|
|
Name | Description
|
|
------------|--------------
|
|
OK | The service is working properly.
|
|
WARNING | The service is experiencing some problems but is still considered to be in working condition.
|
|
CRITICAL | The service is in a critical state.
|
|
UNKNOWN | The check could not determine the service's state.
|
|
|
|
### <a id="hard-soft-states"></a> Hard and Soft States
|
|
|
|
When detecting a problem with a host/service Icinga re-checks the object a number of
|
|
times (based on the `max_check_attempts` and `retry_interval` settings) before sending
|
|
notifications. This ensures that no unnecessary notifications are sent for
|
|
transient failures. During this time the object is in a `SOFT` state.
|
|
|
|
After all re-checks have been executed and the object is still in a non-OK
|
|
state the host/service switches to a `HARD` state and notifications are sent.
|
|
|
|
Name | Description
|
|
------------|--------------
|
|
HARD | The host/service's state hasn't recently changed.
|
|
SOFT | The host/service has recently changed state and is being re-checked.
|
|
|
|
|
|
## <a id="using-templates"></a> Using Templates
|
|
|
|
Templates may be used to apply a set of identical attributes to more than one
|
|
object:
|
|
|
|
template Service "generic-service" {
|
|
max_check_attempts = 3
|
|
check_interval = 5m
|
|
retry_interval = 1m
|
|
enable_perfdata = true
|
|
}
|
|
|
|
object Service "ping4" {
|
|
import "generic-service"
|
|
|
|
host_name = "localhost"
|
|
check_command = "ping4"
|
|
}
|
|
|
|
object Service "ping6" {
|
|
import "generic-service"
|
|
|
|
host_name = "localhost"
|
|
check_command = "ping6"
|
|
}
|
|
|
|
In this example the `ping4` and `ping6` services inherit properties from the
|
|
template `generic-service`.
|
|
|
|
Objects as well as templates themselves can import an arbitrary number of
|
|
templates. Attributes inherited from a template can be overridden in the
|
|
object if necessary.
|
|
|
|
## <a id="using-apply"></a> Apply objects based on rules
|
|
|
|
Instead of assigning each object (`Service`, `Notification`, `Dependency`, `ScheduledDowntime`)
|
|
based on attribute identifiers for example `host_name` objects can be [applied](#apply).
|
|
|
|
apply Service "load" {
|
|
import "generic-service"
|
|
|
|
check_command = "load"
|
|
|
|
assign where "linux-server" in host.groups
|
|
ignore where host.vars.no_load_check
|
|
}
|
|
|
|
In this example the `load` service will be created as object for all hosts in the `linux-server`
|
|
host group. If the `no_load_check` custom attribute is set, the host will be
|
|
ignored.
|
|
|
|
Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
|
|
manner:
|
|
|
|
apply Notification "mail-noc" to Service {
|
|
import "mail-service-notification"
|
|
command = "mail-service-notification"
|
|
user_groups = [ "noc" ]
|
|
|
|
assign where service.vars.sla == "24x7"
|
|
}
|
|
|
|
In this example the `mail-noc` notification will be created as object for all services having the
|
|
`sla` custom attribute set to `24x7`. The notification command is set to `mail-service-notification`
|
|
and all members of the user group `noc` will get notified.
|
|
|
|
`Dependency` and `ScheduledDowntime` objects can be applied in a similar fashion.
|
|
|
|
|
|
## <a id="groups"></a> Groups
|
|
|
|
Groups are used for combining hosts, services, and users into
|
|
accessible configuration attributes and views in external (web)
|
|
interfaces.
|
|
|
|
Group membership is defined at the respective object itself. If
|
|
you have a hostgroup name `windows` for example, and want to assign
|
|
specific hosts to this group for later viewing the group on your
|
|
alert dashboard, first create the hostgroup:
|
|
|
|
object HostGroup "windows" {
|
|
display_name = "Windows Servers"
|
|
}
|
|
|
|
Then add your hosts to this hostgroup
|
|
|
|
template Host "windows-server" {
|
|
groups += [ "windows" ]
|
|
}
|
|
|
|
object Host "mssql-srv1" {
|
|
import "windows-server"
|
|
|
|
vars.mssql_port = 1433
|
|
}
|
|
|
|
object Host "mssql-srv2" {
|
|
import "windows-server"
|
|
|
|
vars.mssql_port = 1433
|
|
}
|
|
|
|
This can be done for service and user groups the same way. Additionally
|
|
the user groups are associated as attributes in `Notification` objects.
|
|
|
|
object UserGroup "windows-mssql-admins" {
|
|
display_name = "Windows MSSQL Admins"
|
|
}
|
|
|
|
template User "generic-windows-mssql-users" {
|
|
groups += [ "windows-mssql-admins" ]
|
|
}
|
|
|
|
object User "win-mssql-noc" {
|
|
import "generic-windows-mssql-users"
|
|
|
|
email = "noc@example.com"
|
|
}
|
|
|
|
object User "win-mssql-ops" {
|
|
import "generic-windows-mssql-users"
|
|
|
|
email = "ops@example.com"
|
|
}
|
|
|
|
### <a id="group-assign"></a> Group Membership Assign
|
|
|
|
If there is a certain number of hosts, services or users matching a pattern
|
|
it's reasonable to assign the group object to these members.
|
|
Details on the `assign where` syntax can be found [here](#apply)
|
|
|
|
object HostGroup "mssql" {
|
|
display_name = "MSSQL Servers"
|
|
assign where host.vars.mssql_port
|
|
}
|
|
|
|
In this inherited example from above all hosts with the `var` `mssql_port`
|
|
set will be added as members to the host group `mssql`.
|
|
|
|
## <a id="notifications"></a> Notifications
|
|
|
|
Notifications for service and host problems are an integral part of your
|
|
monitoring setup.
|
|
|
|
When a host or service is in a downtime, a problem has been acknowledged or
|
|
the dependency logic determined that the host/service is unreachable, no
|
|
notirications are sent. You can configure additional type and state filters
|
|
refining the notifications being actually sent.
|
|
|
|
There are many ways of sending notifications, e.g. by e-mail, XMPP,
|
|
IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
|
|
Instead it relies on external mechanisms such as shell scripts to notify users.
|
|
|
|
A notification specification requires one or more users (and/or user groups)
|
|
who will be notified in case of problems. These users must have all custom
|
|
attributes defined which will be used in the `NotificationCommand` on execution.
|
|
|
|
The user `icingaadmin` in the example below will get notified only on `WARNING` and
|
|
`CRITICAL` states and `problem` and `recovery` notification types.
|
|
|
|
object User "icingaadmin" {
|
|
display_name = "Icinga 2 Admin"
|
|
enable_notifications = true
|
|
states = [ OK, Warning, Critical ]
|
|
types = [ Problem, Recovery ]
|
|
email = "icinga@localhost"
|
|
}
|
|
|
|
If you don't set the `states` and `types` configuration attributes for the `User`
|
|
object, notifications for all states and types will be sent.
|
|
|
|
Details on troubleshooting notification problems can be found [here](#troubleshooting).
|
|
|
|
You should choose which information you (and your notified users) are interested in
|
|
case of emergency, and also which information does not provide any value to you and
|
|
your environment.
|
|
|
|
An example notification command is explained [here](#notification-commands).
|
|
|
|
You can add all shared attributes to a `Notification` template which is inherited
|
|
to the defined notifications. That way you'll save duplicated attributes in each
|
|
`Notification` object. Attributes can be overridden locally.
|
|
|
|
template Notification "generic-notification" {
|
|
interval = 15m
|
|
|
|
command = "mail-service-notification"
|
|
|
|
states = [ Warning, Critical, Unknown ]
|
|
types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
|
|
FlappingEnd, DowntimeStart,DowntimeEnd, DowntimeRemoved ]
|
|
|
|
period = "24x7"
|
|
}
|
|
|
|
The time period `24x7` is shipped as example configuration with Icinga 2.
|
|
|
|
Use the `apply` keyword to create `Notification` objects for your services:
|
|
|
|
apply Notification "mail" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
|
|
assign where service.name == "mysql"
|
|
}
|
|
|
|
Instead of assigning users to notifications, you can also add the `user_groups`
|
|
attribute with a list of user groups to the `Notification` object. Icinga 2 will
|
|
send notifications to all group members.
|
|
|
|
### <a id="notification-escalations"></a> Notification Escalations
|
|
|
|
When a problem notification is sent and a problem still exists after re-notification
|
|
you may want to escalate the problem to the next support level. A different approach
|
|
is to configure the default notification by email, and escalate the problem via sms
|
|
if not already solved.
|
|
|
|
You can define notification start and end times as additional configuration
|
|
attributes making the `Notification` object a so-called `notification escalation`.
|
|
Using templates you can share the basic notification attributes such as users or the
|
|
`interval` (and override them for the escalation then).
|
|
|
|
Using the example from above, you can define additional users being escalated for sms
|
|
notifications between start and end time.
|
|
|
|
object User "icinga-oncall-2nd-level" {
|
|
display_name = "Icinga 2nd Level"
|
|
|
|
vars.mobile = "+1 555 424642"
|
|
}
|
|
|
|
object User "icinga-oncall-1st-level" {
|
|
display_name = "Icinga 1st Level"
|
|
|
|
vars.mobile = "+1 555 424642"
|
|
}
|
|
|
|
Define an additional `NotificationCommand` for SMS notifications.
|
|
|
|
> **Note**
|
|
>
|
|
> The example is not complete as there are many different SMS providers.
|
|
> Please note that sending SMS notifications will require an SMS provider
|
|
> or local hardware with a SIM card active.
|
|
|
|
object NotificationCommand "sms-notification" {
|
|
command = [
|
|
PluginDir + "/send_sms_notification",
|
|
"$mobile$",
|
|
"..."
|
|
}
|
|
|
|
The two new notification escalations are added onto the host `localhost`
|
|
and its service `ping4` using the `generic-notification` template.
|
|
The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
|
|
command) after `30m` until `1h`.
|
|
|
|
> **Note**
|
|
>
|
|
> The `interval` was set to 15m in the `generic-notification`
|
|
> template example. Lower that value in your escalations by using a secondary
|
|
> template or overriding the attribute directly in the `notifications` array
|
|
> position for `escalation-sms-2nd-level`.
|
|
|
|
If the problem does not get resolved or acknowledged preventing further notifications
|
|
the `escalation-sms-1st-level` user will be escalated `1h` after the initial problem was
|
|
notified, but only for one hour (`2h` as `end` key for the `times` dictionary).
|
|
|
|
apply Notification "mail" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
apply Notification "escalation-sms-2nd-level" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "sms-notification"
|
|
users = [ "icinga-oncall-2nd-level" ]
|
|
|
|
times = {
|
|
begin = 30m
|
|
end = 1h
|
|
}
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
apply Notification "escalation-sms-1st-level" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "sms-notification"
|
|
users = [ "icinga-oncall-1st-level" ]
|
|
|
|
times = {
|
|
begin = 1h
|
|
end = 2h
|
|
}
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
### <a id="first-notification-delay"></a> First Notification Delay
|
|
|
|
Sometimes the problem in question should not be notified when the first notification
|
|
happens, but a defined time duration afterwards. In Icinga 2 you can use the `times`
|
|
dictionary and set `begin = 15m` as key and value if you want to suppress notifications
|
|
in the first 15 minutes. Leave out the `end` key - if not set, Icinga 2 will not check against any
|
|
end time for this notification.
|
|
|
|
apply Notification "mail" to Service {
|
|
import "generic-notification"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
|
|
times.begin = 15m // delay first notification
|
|
|
|
assign where service.name == "ping4"
|
|
}
|
|
|
|
### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
|
|
|
|
If there are no notification state and type filter attributes defined at the `Notification`
|
|
or `User` object Icinga 2 assumes that all states and types are being notified.
|
|
|
|
Available state and type filters for notifications are:
|
|
|
|
template Notification "generic-notification" {
|
|
|
|
states = [ Warning, Critical, Unknown ]
|
|
types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
|
|
FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
|
|
}
|
|
|
|
If you are familiar with Icinga 1.x `notification_options` please note that they have been split
|
|
into type and state, and allow more fine granular filtering for example on downtimes and flapping.
|
|
You can filter for acknowledgements and custom notifications too.
|
|
|
|
|
|
## <a id="timeperiods"></a> Time Periods
|
|
|
|
Time Periods define time ranges in Icinga where event actions are
|
|
triggered, for example whether a service check is executed or not within
|
|
the `check_period` attribute. Or a notification should be sent to
|
|
users or not, filtered by the `period` and `notification_period`
|
|
configuration attributes for `Notification` and `User` objects.
|
|
|
|
> **Note**
|
|
>
|
|
> If you are familar with Icinga 1.x - these time period definitions
|
|
> are called `legacy timeperiods` in Icinga 2.
|
|
>
|
|
> An Icinga 2 legacy timeperiod requires the `ITL` provided template
|
|
>`legacy-timeperiod`.
|
|
|
|
The `TimePeriod` attribute `ranges` may contain multiple directives,
|
|
including weekdays, days of the month, and calendar dates.
|
|
These types may overlap/override other types in your ranges dictionary.
|
|
|
|
The descending order of precedence is as follows:
|
|
|
|
* Calendar date (2008-01-01)
|
|
* Specific month date (January 1st)
|
|
* Generic month date (Day 15)
|
|
* Offset weekday of specific month (2nd Tuesday in December)
|
|
* Offset weekday (3rd Monday)
|
|
* Normal weekday (Tuesday)
|
|
|
|
If you don't set any `check_period` or `notification_period` attribute
|
|
on your configuration objects Icinga 2 assumes `24x7` as time period
|
|
as shown below.
|
|
|
|
object TimePeriod "24x7" {
|
|
import "legacy-timeperiod"
|
|
|
|
display_name = "Icinga 2 24x7 TimePeriod"
|
|
ranges = {
|
|
"monday" = "00:00-24:00"
|
|
"tuesday" = "00:00-24:00"
|
|
"wednesday" = "00:00-24:00"
|
|
"thursday" = "00:00-24:00"
|
|
"friday" = "00:00-24:00"
|
|
"saturday" = "00:00-24:00"
|
|
"sunday" = "00:00-24:00"
|
|
}
|
|
}
|
|
|
|
If your operation staff should only be notified during workhours
|
|
create a new timeperiod named `workhours` defining a work day from
|
|
09:00 to 17:00.
|
|
|
|
object TimePeriod "workhours" {
|
|
import "legacy-timeperiod"
|
|
|
|
display_name = "Icinga 2 8x5 TimePeriod"
|
|
ranges = {
|
|
"monday" = "09:00-17:00"
|
|
"tuesday" = "09:00-17:00"
|
|
"wednesday" = "09:00-17:00"
|
|
"thursday" = "09:00-17:00"
|
|
"friday" = "09:00-17:00"
|
|
}
|
|
}
|
|
|
|
Use the `period` attribute to assign time periods to
|
|
`Notification` and `Dependency` objects:
|
|
|
|
object Notification "mail" {
|
|
import "generic-notification"
|
|
|
|
host_name = "localhost"
|
|
|
|
command = "mail-notification"
|
|
users = [ "icingaadmin" ]
|
|
period = "workhours"
|
|
}
|
|
|
|
|
|
## <a id="commands"></a> Commands
|
|
|
|
Icinga 2 uses three different command object types to specify how
|
|
checks should be performed, notifications should be sent and
|
|
events should be handled.
|
|
|
|
### <a id="command-environment-variables"></a> Environment Variables for Commands
|
|
|
|
Please check [Runtime Custom Attributes as Environment Variables](#runtime-custom-attribute-env-vars).
|
|
|
|
|
|
### <a id="check-commands"></a> Check Commands
|
|
|
|
`CheckCommand` objects define the command line how a check is called.
|
|
|
|
#### <a id="command-plugin-integration"></a> Integrate the Plugin with a CheckCommand Definition
|
|
|
|
`CheckCommand` objects require the [ITL template](#itl-plugin-check-command)
|
|
`plugin-check-command` to support native plugin based check methods.
|
|
|
|
Unless you have done so already, download your check plugin and put it
|
|
into the `PluginDir` directory. The following example uses the
|
|
`check_disk` plugin shipped with the Monitoring Plugins package.
|
|
|
|
The plugin path and all command arguments are made a list of
|
|
double-quoted string arguments for proper shell escaping.
|
|
|
|
Call the `check_disk` plugin with the `--help` parameter to see
|
|
all available options. Our example defines warning (`-w`) and
|
|
critical (`-c`) thresholds for the disk usage. Without any
|
|
partition defined (`-p`) it will check all local partitions.
|
|
|
|
icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
|
|
...
|
|
This plugin checks the amount of used disk space on a mounted file system
|
|
and generates an alert if free space is less than one of the threshold values
|
|
|
|
|
|
Usage:
|
|
check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
|
|
[-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
|
|
[-t timeout] [-u unit] [-v] [-X type] [-N type]
|
|
...
|
|
|
|
Next step is to understand how command parameters are being passed from
|
|
a host or service object, and add a `CheckCommand` definition based on these
|
|
required parameters and/or default values.
|
|
|
|
#### <a id="command-passing-parameters"></a> Passing Check Command Parameters from Host or Service
|
|
|
|
Unline Icinga 1.x check command parameters are defined as custom attributes
|
|
which can be accessed as runtime macros by the executed check command.
|
|
|
|
Define the default check command custom attribute `disk_wfree` and `disk_cfree`
|
|
(freely definable naming schema) and their default threshold values. You can
|
|
then use these custom attributes as runtime macros on the command line.
|
|
|
|
The default custom attributes can be overridden by the custom attributes
|
|
defined in the service using the check command `disk`. The custom attributes
|
|
can also be inherited from a parent template using additive inheritance (`+=`).
|
|
|
|
object CheckCommand "disk" {
|
|
import "plugin-check-command"
|
|
|
|
command = [
|
|
PluginDir + "/check_disk",
|
|
"-w", "$disk_wfree$%",
|
|
"-c", "$disk_cfree$%"
|
|
],
|
|
|
|
vars.disk_wfree = 20
|
|
vars.disk_cfree = 10
|
|
}
|
|
|
|
The host `localhost` with the service `disk` checks all disks with modified
|
|
custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
|
|
free disk space).
|
|
|
|
object Host "localhost" {
|
|
import "generic-host"
|
|
|
|
address = "127.0.0.1"
|
|
address6 = "::1"
|
|
}
|
|
|
|
object Service "disk" {
|
|
import "generic-service"
|
|
|
|
host_name = "localhost"
|
|
check_command = "disk"
|
|
|
|
vars.disk_wfree = 10
|
|
vars.disk_cfree = 5
|
|
}
|
|
|
|
|
|
### <a id="notification-commands"></a> Notification Commands
|
|
|
|
`NotificationCommand` objects define how notifications are delivered to external
|
|
interfaces (E-Mail, XMPP, IRC, Twitter, etc).
|
|
|
|
`NotificationCommand` objects require the [ITL template](#itl-plugin-notification-command)
|
|
`plugin-notification-command` to support native plugin-based notifications.
|
|
|
|
Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
|
|
the current check output) sending an email to the user(s) associated with the
|
|
notification itself (`$user.email$`).
|
|
|
|
If you want to specify default values for some of the custom attribute definitions,
|
|
you can add a `vars` dictionary as shown for the `CheckCommand` object.
|
|
|
|
object NotificationCommand "mail-service-notification" {
|
|
import "plugin-notification-command"
|
|
|
|
command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]
|
|
|
|
env = {
|
|
"NOTIFICATIONTYPE" = "$notification.type$"
|
|
"SERVICEDESC" = "$service.name$"
|
|
"HOSTALIAS" = "$host.display_name$",
|
|
"HOSTADDRESS" = "$address$",
|
|
"SERVICESTATE" = "$service.state$",
|
|
"LONGDATETIME" = "$icinga.long_date_time$",
|
|
"SERVICEOUTPUT" = "$service.output$",
|
|
"NOTIFICATIONAUTHORNAME" = "$notification.author$",
|
|
"NOTIFICATIONCOMMENT" = "$notification.comment$",
|
|
"HOSTDISPLAYNAME" = "$host.display_name$",
|
|
"SERVICEDISPLAYNAME" = "$service.display_name$",
|
|
"USEREMAIL" = "$user.email$"
|
|
}
|
|
}
|
|
|
|
The command attribute in the `mail-service-notification` command refers to the following
|
|
shell script. The macros specified in the `env` array are exported
|
|
as environment variables and can be used in the notification script:
|
|
|
|
#!/usr/bin/env bash
|
|
template=$(cat <<TEMPLATE
|
|
***** Icinga *****
|
|
|
|
Notification Type: $NOTIFICATIONTYPE
|
|
|
|
Service: $SERVICEDESC
|
|
Host: $HOSTALIAS
|
|
Address: $HOSTADDRESS
|
|
State: $SERVICESTATE
|
|
|
|
Date/Time: $LONGDATETIME
|
|
|
|
Additional Info: $SERVICEOUTPUT
|
|
|
|
Comment: [$NOTIFICATIONAUTHORNAME] $NOTIFICATIONCOMMENT
|
|
TEMPLATE
|
|
)
|
|
|
|
/usr/bin/printf "%b" $template | mail -s "$NOTIFICATIONTYPE - $HOSTDISPLAYNAME - $SERVICEDISPLAYNAME is $SERVICESTATE" $USEREMAIL
|
|
|
|
While it's possible to specify the entire notification command right
|
|
in the NotificationCommand object it is generally advisable to create a
|
|
shell script in the `/etc/icinga2/scripts` directory and have the
|
|
NotificationCommand object refer to that.
|
|
|
|
### <a id="event-commands"></a> Event Commands
|
|
|
|
Unlike notifications event commands are called on every service state change
|
|
if defined. Therefore the `EventCommand` object should define a command line
|
|
evaluating the current service state and other service runtime attributes
|
|
available through runtime vars. Runtime macros such as `$SERVICESTATETYPE$`
|
|
and `$SERVICESTATE$` will be processed by Icinga 2 helping on fine-granular
|
|
events being triggered.
|
|
|
|
Common use case scenarios are a failing HTTP check requiring an immediate
|
|
restart via event command, or if an application is locked and requires
|
|
a restart upon detection.
|
|
|
|
`EventCommand` objects require the ITL template `plugin-event-command`
|
|
to support native plugin based checks.
|
|
|
|
When the event command is triggered on a service state change, it will
|
|
send a check result using the `process_check_result` script forcibly
|
|
changing the service state back to `OK` (`-r 0`) providing some debug
|
|
information in the check output (`-o`).
|
|
|
|
object EventCommand "plugin-event-process-check-result" {
|
|
import "plugin-event-command"
|
|
|
|
command = [
|
|
PluginDir + "/process_check_result",
|
|
"-H", "$host.name$",
|
|
"-S", "$service.name$",
|
|
"-c", LocalStateDir + "/run/icinga2/cmd/icinga2.cmd",
|
|
"-r", "0",
|
|
"-o", "Event Handler triggered in state '$service.state$' with output '$service.output$'."
|
|
]
|
|
}
|
|
|
|
### <a id="commands-arguments"></a> Command Arguments
|
|
|
|
By defining a check command line using the `command` attribute Icinga 2
|
|
will resolve all macros in the static string or array. Sometimes it is
|
|
required to extend the arguments list based on a met condition evaluated
|
|
at command execution. Or making arguments optional - only set if the
|
|
macro value can be resolved by Icinga 2.
|
|
|
|
object CheckCommand "check_http" {
|
|
import "plugin-check-command"
|
|
|
|
command = PluginDir + "/check_http"
|
|
|
|
arguments = {
|
|
"-H" = "$http_vhost$"
|
|
"-I" = "$http_address$"
|
|
"-u" = "$http_uri$"
|
|
"-p" = "$http_port$"
|
|
"-S" = {
|
|
set_if = "$http_ssl$"
|
|
}
|
|
"-w" = "$http_warn_time$"
|
|
"-c" = "$http_critical_time$"
|
|
}
|
|
|
|
vars.http_address = "$address$"
|
|
vars.http_ssl = false
|
|
}
|
|
|
|
The example shows the `check_http` check command defining the most common
|
|
arguments. Each of them is optional by default and will be omitted if
|
|
the value is not set. For example if the service calling the check command
|
|
does not have `vars.http_port` set, it won't get added to the command
|
|
line.
|
|
If the `vars.http_ssl` custom attribute is set in the service, host or command
|
|
object definition, Icinga 2 will add the `-S` argument based on the `set_if`
|
|
option to the command line.
|
|
That way you can use the `check_http` command definition for both, with and
|
|
without SSL enabled checks saving you duplicated command definitions.
|
|
|
|
Details on all available options can be found in the
|
|
[CheckCommand object definition](#objecttype-checkcommand).
|
|
|
|
|
|
## <a id="dependencies"></a> Dependencies
|
|
|
|
Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects
|
|
for determing their network reachability.
|
|
The `parent_host_name` and `parent_service_name` attributes are mandatory for
|
|
service dependencies, `parent_host_name` is required for host dependencies.
|
|
|
|
A service can depend on a host, and vice versa. A service has an implicit
|
|
dependency (parent) to its host. A host to host dependency acts implicit
|
|
as host parent relation.
|
|
When dependencies are calculated, not only the immediate parent is taken into
|
|
account but all parents are inherited.
|
|
|
|
Notifications are suppressed if a host or service becomes unreachable.
|
|
|
|
A common scenario is the Icinga 2 server behind a router. Checking internet
|
|
access by pinging the Google DNS server `google-dns` is a common method, but
|
|
will fail in case the `dsl-router` host is down. Therefore the example below
|
|
defines a host dependency which acts implicit as parent relation too.
|
|
|
|
Furthermore the host may be reachable but ping probes are dropped by the
|
|
router's firewall. In case the `dsl-router``ping4` service check fails, all
|
|
further checks for the `ping4` service on host `google-dns` service should
|
|
be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
|
|
|
|
object Host "dsl-router" {
|
|
address = "192.168.1.1"
|
|
}
|
|
|
|
object Host "google-dns" {
|
|
address = "8.8.8.8"
|
|
}
|
|
|
|
apply Service "ping4" {
|
|
import "generic-service"
|
|
|
|
check_command = "ping4"
|
|
|
|
assign where host.address
|
|
}
|
|
|
|
apply Dependency "internet" to Service {
|
|
parent_host_name = "dsl-router"
|
|
disable_checks = true
|
|
|
|
assign where host.name != "dsl-router"
|
|
}
|
|
|
|
Another classic example are agent based checks. You would define a health check
|
|
for the agent daemon responding to your requests, and make all other services
|
|
querying that daemon depend on that health check.
|
|
|
|
The following configuration defines two nrpe based service checks `nrpe-load`
|
|
and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
|
|
`nrpe-health` service.
|
|
|
|
apply Service "nrpe-health" {
|
|
import "generic-service"
|
|
check_command = "nrpe"
|
|
assign where match("nrpe-*", host.name)
|
|
}
|
|
|
|
apply Service "nrpe-load" {
|
|
import "generic-service"
|
|
check_command = "nrpe"
|
|
vars.nrpe_command = "check_load"
|
|
assign where match("nrpe-*", host.name)
|
|
}
|
|
|
|
apply Service "nrpe-disk" {
|
|
import "generic-service"
|
|
check_command = "nrpe"
|
|
vars.nrpe_command = "check_disk"
|
|
assign where match("nrpe-*", host.name)
|
|
}
|
|
|
|
object Host "nrpe-server" {
|
|
import "generic-host"
|
|
address = "192.168.1.5",
|
|
}
|
|
|
|
apply Dependency "disable-nrpe-checks" to Service {
|
|
parent_service_name = "nrpe-health"
|
|
|
|
states = [ Warning, Critical, Unknown ]
|
|
disable_checks = true
|
|
disable_notifications = true
|
|
assign where match("nrpe-*", host.name)
|
|
ignore where service.name == "nrpe-health"
|
|
}
|
|
|
|
The `disable-nrpe-checks` dependency is applied to all services
|
|
on the `nrpe-service` host but not the `nrpe-health` service itself.
|
|
|
|
|
|
## <a id="downtimes"></a> Downtimes
|
|
|
|
Downtimes can be scheduled for planned server maintenance or
|
|
any other targetted service outage you are aware of in advance.
|
|
|
|
Downtimes will suppress any notifications, and may trigger other
|
|
downtimes too. If the downtime was set by accident, or the duration
|
|
exceeds the maintenance, you can manually cancel the downtime.
|
|
Planned downtimes will also be taken into account for SLA reporting
|
|
tools calculating the SLAs based on the state and downtime history.
|
|
|
|
Downtimes may overlap with their start and end times. If there
|
|
are multiple downtimes triggered for one object, the overall downtime depth
|
|
will be more than `1`. This is useful when you want to extend
|
|
your maintenance window taking longer than expected.
|
|
|
|
If the downtime was scheduled after the problem changed to a critical hard
|
|
state triggering a problem notification, and the service recovers during
|
|
the downtime window, the recovery notification won't be suppressed.
|
|
|
|
### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes
|
|
|
|
A `fixed` downtime will be activated at the defined start time, and
|
|
removed at the end time. During this time window the service state
|
|
will change to `NOT-OK` and then actually trigger the downtime.
|
|
Notifications are suppressed and the downtime depth is incremented.
|
|
|
|
Common scenarios are a planned distribution upgrade on your linux
|
|
servers, or database updates in your warehouse. The customer knows
|
|
about a fixed downtime window between 23:00 and 24:00. After 24:00
|
|
all problems should be alerted again. Solution is simple -
|
|
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.
|
|
|
|
Unlike a `fixed` downtime, a `flexible` downtime end does not necessarily
|
|
happen at the provided end time. Instead the downtime will be triggered
|
|
by the state change in the time span defined by start and end time, but
|
|
then last a defined duration in minutes.
|
|
|
|
Imagine the following scenario: Your service is frequently polled
|
|
by users trying to grab free deleted domains for immediate registration.
|
|
Between 07:30 and 08:00 the impact will hit for 15 minutes and generate
|
|
a network outage visible to the monitoring. The service is still alive,
|
|
but answering too slow to Icinga 2 service checks.
|
|
For that reason, you may want to schedule a downtime between 07:30 and
|
|
08:00 with a duration of 15 minutes. The downtime will then last from
|
|
its trigger time until the duration is over. After that, the downtime
|
|
is removed (may happen before or after the actual end time!).
|
|
|
|
### <a id="scheduling-downtime"></a> Scheduling a downtime
|
|
|
|
This can either happen through a web interface or by sending an [external command](#external-commands)
|
|
to the external command pipe provided by the `ExternalCommandListener` configuration.
|
|
|
|
Fixed downtimes require a start and end time (a duration will be ignored).
|
|
Flexible downtimes need a start and end time for the time span, and a duration
|
|
independent from that time span.
|
|
|
|
### <a id="triggered-downtimes"></a> Triggered Downtimes
|
|
|
|
This is optional when scheduling a downtime. If there is already a downtime
|
|
scheduled for a future maintenance, the current downtime can be triggered by
|
|
that downtime. This renders useful if you have scheduled a host downtime and
|
|
are now scheduling a child host's downtime getting triggered by the parent
|
|
downtime on NOT-OK state change.
|
|
|
|
### <a id="recurring-downtimes"></a> Recurring Downtimes
|
|
|
|
[ScheduledDowntime objects](#objecttype-scheduleddowntime) can be used to set up
|
|
recurring downtimes for services.
|
|
|
|
Example:
|
|
|
|
apply ScheduledDowntime "backup-downtime" to Service {
|
|
author = "icingaadmin"
|
|
comment = "Scheduled downtime for backup"
|
|
|
|
ranges = {
|
|
monday = "02:00-03:00"
|
|
tuesday = "02:00-03:00"
|
|
wednesday = "02:00-03:00"
|
|
thursday = "02:00-03:00"
|
|
friday = "02:00-03:00"
|
|
saturday = "02:00-03:00"
|
|
sunday = "02:00-03:00"
|
|
}
|
|
|
|
assign where "backup" in service.groups
|
|
}
|
|
|
|
|
|
## <a id="comments"></a> Comments
|
|
|
|
Comments can be added at runtime and are persistent over restarts. You can
|
|
add useful information for others on repeating incidents (for example
|
|
"last time syslog at 100% cpu on 17.10.2013 due to stale nfs mount") which
|
|
is primarly accessible using web interfaces.
|
|
|
|
Adding and deleting comment actions are possible through the external command pipe
|
|
provided with the `ExternalCommandListener` configuration. The caller must
|
|
pass the comment id in case of manipulating an existing comment.
|
|
|
|
|
|
## <a id="acknowledgements"></a> Acknowledgements
|
|
|
|
If a problem is alerted and notified you may signal the other notification
|
|
receipients that you are aware of the problem and will handle it.
|
|
|
|
By sending an acknowledgement to Icinga 2 (using the external command pipe
|
|
provided with `ExternalCommandListener` configuration) all future notifications
|
|
are suppressed, a new comment is added with the provided description and
|
|
a notification with the type `NotificationFilterAcknowledgement` is sent
|
|
to all notified users.
|
|
|
|
### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements
|
|
|
|
Once a problem is acknowledged it may disappear from your `handled problems`
|
|
dashboard and no-one ever looks at it again since it will suppress
|
|
notifications too.
|
|
|
|
This `fire-and-forget` action is quite common. If you're sure that a
|
|
current problem should be resolved in the future at a defined time,
|
|
you can define an expiration time when acknowledging the problem.
|
|
|
|
Icinga 2 will clear the acknowledgement when expired and start to
|
|
re-notify if the problem persists.
|
|
|
|
|
|
|
|
## <a id="custom-attributes"></a> Custom Attributes
|
|
|
|
### <a id="runtime-custom-attributes"></a> Using Custom Attributes at Runtime
|
|
|
|
Custom attributes may be used in command definitions to dynamically change how the command
|
|
is executed.
|
|
|
|
Additionally there are Icinga 2 features such as the `PerfDataWriter` type
|
|
which use custom attributes to format their output.
|
|
|
|
> **Tip**
|
|
>
|
|
> Custom attributes are identified by the 'vars' dictionary attribute as short name.
|
|
> Accessing the different attribute keys is possible using the '.' accessor.
|
|
|
|
Custom attributes in command definitions or performance data templates are evaluated at
|
|
runtime when executing a command. These custom attributes cannot be used elsewhere
|
|
(e.g. in other configuration attributes).
|
|
|
|
Here is an example of a command definition which uses user-defined custom attributes:
|
|
|
|
object CheckCommand "my-ping" {
|
|
import "plugin-check-command"
|
|
|
|
command = [
|
|
PluginDir + "/check_ping",
|
|
"-4",
|
|
"-H", "$address$",
|
|
"-w", "$ping_wrta$,$ping_wpl$%",
|
|
"-c", "$ping_crta$,$ping_cpl$%",
|
|
"-p", "$ping_packets$",
|
|
"-t", "$ping_timeout$"
|
|
]
|
|
|
|
vars.ping_wrta = 100
|
|
vars.ping_wpl = 5
|
|
vars.ping_crta = 200
|
|
vars.ping_cpl = 15
|
|
vars.ping_packets = 5
|
|
vars.ping_timeout = 0
|
|
}
|
|
|
|
Custom attribute names used at runtime must be enclosed in two `$` signs, e.g.
|
|
`$address$`. When using the `$` sign as single character, you need to escape
|
|
it with an additional dollar sign (`$$`).
|
|
|
|
### <a id="runtime-custom-attributes-evaluation-order"></a> Runtime Custom Attributes Evaluation Order
|
|
|
|
When executing commands Icinga 2 checks the following objects in this order to look
|
|
up custom attributes and their respective values:
|
|
|
|
1. User object (only for notifications)
|
|
2. Service object
|
|
3. Host object
|
|
4. Command object
|
|
5. Global custom attributes in the Vars constant
|
|
|
|
This execution order allows you to define default values for custom attributes
|
|
in your command objects. The `my-ping` command shown above uses this to set
|
|
default values for some of the latency thresholds and timeouts.
|
|
|
|
When using the `my-ping` command you can override all or some of the custom
|
|
attributes in the service definition like this:
|
|
|
|
object Service "ping" {
|
|
host_name = "localhost"
|
|
check_command = "my-ping"
|
|
|
|
vars.ping_packets = 10 // Overrides the default value of 5 given in the command
|
|
}
|
|
|
|
If a custom attribute isn't defined anywhere an empty value is used and a warning is
|
|
emitted to the Icinga 2 log.
|
|
|
|
> **Best Practice**
|
|
>
|
|
> By convention every host should have an `address` attribute. Hosts
|
|
> which have an IPv6 address should also have an `address6` attribute.
|
|
|
|
### <a id="runtime-custom-attribute-env-vars"></a> Runtime Custom Attributes as Environment Variables
|
|
|
|
The `env` command object attribute specifies a list of environment variables with values calculated
|
|
from either runtime macros or custom attributes which should be exported as environment variables
|
|
prior to executing the command.
|
|
|
|
This is useful for example for hiding sensitive information on the command line output
|
|
when passing credentials to database checks:
|
|
|
|
object CheckCommand "mysql-health" {
|
|
import "plugin-check-command",
|
|
|
|
command = PluginDir + "/check_mysql -H $address$ -d $db$",
|
|
|
|
vars.mysql_user = "icinga_check",
|
|
vars.mysql_pass = "password"
|
|
|
|
env.MYSQLUSER = "$mysql_user$",
|
|
env.MYSQLPASS = "$mysql_pass$"
|
|
}
|
|
|
|
### <a id="modified-attributes"></a> Modified Attributes
|
|
|
|
Icinga 2 allows you to modify defined object attributes at runtime different to
|
|
the local configuration object attributes. These modified attributes are
|
|
stored as bit-shifted-value and made available in backends. Icinga 2 stores
|
|
modified attributes in its state file and restores them on restart.
|
|
|
|
Modified Attributes can be reset using external commands.
|
|
|
|
|
|
## <a id="runtime-macros"></a> Runtime Macros
|
|
|
|
Next to custom attributes there are additional runtime macros made available by Icinga 2.
|
|
These runtime macros reflect the current object state and may change over time while
|
|
custom attributes are configured statically (but can be modified at runtime using
|
|
external commands).
|
|
|
|
### <a id="runtime-macro-evaluation-order"></a> Runtime Macro Evaluation Order
|
|
|
|
Custom attributes can be accessed at [runtime](#runtime-custom-attributes) using their
|
|
identifier omitting the `vars.` prefix.
|
|
There are special cases when those custom attributes are not set and Icinga 2 provides
|
|
a fallback to existing object attributes for example `host.address`.
|
|
|
|
In the following example the `$address$` macro will be resolved with the value of `vars.address`.
|
|
|
|
object Host "localhost" {
|
|
import "generic-host"
|
|
check_command = "my-host-macro-test"
|
|
address = "127.0.0.1"
|
|
vars.address = "127.2.2.2"
|
|
}
|
|
|
|
object CheckCommand "my-host-macro-test" {
|
|
command = "echo \"address: $address$ host.address: $host.address$ host.vars.address: $host.vars.address$\""
|
|
}
|
|
|
|
The check command output will look like
|
|
|
|
"address: 127.2.2.2 host.address: 127.0.0.1 host.vars.address: 127.2.2.2"
|
|
|
|
If you alter the host object and remove the `vars.address` line, Icinga 2 will fail to look up `$address$` in the
|
|
custom attributes dictionary and then look for the host object's attribute.
|
|
|
|
The check command output will change to
|
|
|
|
"address: 127.0.0.1 host.address: 127.0.0.1 host.vars.address: "
|
|
|
|
|
|
The same example can be defined for services overriding the `address` field based on a specific host custom attribute.
|
|
|
|
object Host "localhost" {
|
|
import "generic-host"
|
|
address = "127.0.0.1"
|
|
vars.macro_address = "127.3.3.3"
|
|
}
|
|
|
|
apply Service "my-macro-test" to Host {
|
|
import "generic-service"
|
|
check_command = "my-service-macro-test"
|
|
vars.address = "$host.vars.macro_address$"
|
|
|
|
assign where host.address
|
|
}
|
|
|
|
object CheckCommand "my-service-macro-test" {
|
|
command = "echo \"address: $address$ host.address: $host.address$ host.vars.macro_address: $host.vars.macro_address$ service.vars.address: $service.vars.address$\""
|
|
}
|
|
|
|
When the service check is executed the output looks like
|
|
|
|
"address: 127.3.3.3 host.address: 127.0.0.1 host.vars.macro_address: 127.3.3.3 service.vars.address: 127.3.3.3"
|
|
|
|
That way you can easily override existing macros being accessed by their short name like `$address$` and refrain
|
|
from defining multiple check commands (one for `$address$` and one for `$host.vars.macro_address$`).
|
|
|
|
|
|
### <a id="host-runtime-macros"></a> Host Runtime Macros
|
|
|
|
The following host custom attributes are available in all commands that are executed for
|
|
hosts or services:
|
|
|
|
Name | Description
|
|
-----------------------------|--------------
|
|
host.name | The name of the host object.
|
|
host.display_name | The value of the `display_name` attribute.
|
|
host.state | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
|
|
host.state_id | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
|
|
host.state_type | The host's current state type. Can be one of `SOFT` and `HARD`.
|
|
host.check_attempt | The current check attempt number.
|
|
host.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
|
|
host.last_state | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
|
|
host.last_state_id | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
|
|
host.last_state_type | The host's previous state type. Can be one of `SOFT` and `HARD`.
|
|
host.last_state_change | The last state change's timestamp.
|
|
host.duration_sec | The time since the last state change.
|
|
host.latency | The host's check latency.
|
|
host.execution_time | The host's check execution time.
|
|
host.output | The last check's output.
|
|
host.perfdata | The last check's performance data.
|
|
host.last_check | The timestamp when the last check was executed.
|
|
host.num_services | Number of services associated with the host.
|
|
host.num_services_ok | Number of services associated with the host which are in an `OK` state.
|
|
host.num_services_warning | Number of services associated with the host which are in a `WARNING` state.
|
|
host.num_services_unknown | Number of services associated with the host which are in an `UNKNOWN` state.
|
|
host.num_services_critical | Number of services associated with the host which are in a `CRITICAL` state.
|
|
|
|
### <a id="service-runtime-macros"></a> Service Runtime Macros
|
|
|
|
The following service macros are available in all commands that are executed for
|
|
services:
|
|
|
|
Name | Description
|
|
---------------------------|--------------
|
|
service.name | The short name of the service object.
|
|
service.display_name | The value of the `display_name` attribute.
|
|
service.check_command | The short name of the command along with any arguments to be used for the check.
|
|
service.state | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
|
|
service.state_id | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
|
|
service.state_type | The service's current state type. Can be one of `SOFT` and `HARD`.
|
|
service.check_attempt | The current check attempt number.
|
|
service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
|
|
service.last_state | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
|
|
service.last_state_id | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
|
|
service.last_state_type | The service's previous state type. Can be one of `SOFT` and `HARD`.
|
|
service.last_state_change | The last state change's timestamp.
|
|
service.duration_sec | The time since the last state change.
|
|
service.latency | The service's check latency.
|
|
service.execution_time | The service's check execution time.
|
|
service.output | The last check's output.
|
|
service.perfdata | The last check's performance data.
|
|
service.last_check | The timestamp when the last check was executed.
|
|
|
|
### <a id="command-runtime-macros"></a> Command Runtime Macros
|
|
|
|
The following custom attributes are available in all commands:
|
|
|
|
Name | Description
|
|
-----------------------|--------------
|
|
command.name | The name of the command object.
|
|
|
|
### <a id="user-runtime-macros"></a> User Runtime Macros
|
|
|
|
The following custom attributes are available in all commands that are executed for
|
|
users:
|
|
|
|
Name | Description
|
|
-----------------------|--------------
|
|
user.name | The name of the user object.
|
|
user.display_name | The value of the display_name attribute.
|
|
|
|
### <a id="notification-runtime-macros"></a> Notification Runtime Macros
|
|
|
|
Name | Description
|
|
-----------------------|--------------
|
|
notification.type | The type of the notification.
|
|
notification.author | The author of the notification comment, if existing.
|
|
notification.comment | The comment of the notification, if existing.
|
|
|
|
### <a id="global-runtime-macros"></a> Global Runtime Macros
|
|
|
|
The following macros are available in all executed commands:
|
|
|
|
Name | Description
|
|
-----------------------|--------------
|
|
icinga.timet | Current UNIX timestamp.
|
|
icinga.long_date_time | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
|
|
icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
|
|
icinga.date | Current date. Example: `2014-01-03`
|
|
icinga.time | Current time including timezone information. Example: `11:23:08 +0000`
|
|
icinga.uptime | Current uptime of the Icinga 2 process.
|
|
|
|
The following macros provide global statistics:
|
|
|
|
Name | Description
|
|
----------------------------------|--------------
|
|
icinga.num_services_ok | Current number of services in state 'OK'.
|
|
icinga.num_services_warning | Current number of services in state 'Warning'.
|
|
icinga.num_services_critical | Current number of services in state 'Critical'.
|
|
icinga.num_services_unknown | Current number of services in state 'Unknown'.
|
|
icinga.num_services_pending | Current number of pending services.
|
|
icinga.num_services_unreachable | Current number of unreachable services.
|
|
icinga.num_services_flapping | Current number of flapping services.
|
|
icinga.num_services_in_downtime | Current number of services in downtime.
|
|
icinga.num_services_acknowledged | Current number of acknowledged service problems.
|
|
icinga.num_hosts_up | Current number of hosts in state 'Up'.
|
|
icinga.num_hosts_down | Current number of hosts in state 'Down'.
|
|
icinga.num_hosts_unreachable | Current number of unreachable hosts.
|
|
icinga.num_hosts_flapping | Current number of flapping hosts.
|
|
icinga.num_hosts_in_downtime | Current number of hosts in downtime.
|
|
icinga.num_hosts_acknowledged | Current number of acknowledged host problems.
|
|
|
|
|
|
## <a id="check-result-freshness"></a> Check Result Freshness
|
|
|
|
In Icinga 2 active check freshness is enabled by default. It is determined by the
|
|
`check_interval` attribute and no incoming check results in that period of time.
|
|
|
|
threshold = last check execution time + check interval
|
|
|
|
Passive check freshness is calculated from the `check_interval` attribute if set.
|
|
|
|
threshold = last check result time + check interval
|
|
|
|
If the freshness checks are invalid, a new check is executed defined by the
|
|
`check_command` attribute.
|
|
|
|
|
|
## <a id="check-flapping"></a> Check Flapping
|
|
|
|
The flapping algorithm used in Icinga 2 does not store the past states but
|
|
calculcates the flapping threshold from a single value based on counters and
|
|
half-life values. Icinga 2 compares the value with a single flapping threshold
|
|
configuration attribute named `flapping_threshold`.
|
|
|
|
Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
|
|
|
|
|
|
## <a id="volatile-services"></a> Volatile Services
|
|
|
|
By default all services remain in a non-volatile state. When a problem
|
|
occurs, the `SOFT` state applies and once `max_check_attempts` attribute
|
|
is reached with the check counter, a `HARD` state transition happens.
|
|
Notifications are only triggered by `HARD` state changes and are then
|
|
re-sent defined by the `interval` attribute.
|
|
|
|
It may be reasonable to have a volatile service which stays in a `HARD`
|
|
state type if the service stays in a `NOT-OK` state. That way each
|
|
service recheck will automatically trigger a notification unless the
|
|
service is acknowledged or in a scheduled downtime.
|
|
|
|
|
|
## <a id="external-commands"></a> External Commands
|
|
|
|
Icinga 2 provides an external command pipe for processing commands
|
|
triggering specific actions (for example rescheduling a service check
|
|
through the web interface).
|
|
|
|
In order to enable the `ExternalCommandListener` configuration use the
|
|
following command and restart Icinga 2 afterwards:
|
|
|
|
# icinga2-enable-feature command
|
|
|
|
Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
|
|
using the default configuration.
|
|
|
|
Web interfaces and other Icinga addons are able to send commands to
|
|
Icinga 2 through the external command pipe, for example for rescheduling
|
|
a forced service check:
|
|
|
|
# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
|
|
|
|
# tail -f /var/log/messages
|
|
|
|
Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
|
|
Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
|
|
|
|
By default the command pipe file is owned by the group `icingacmd` with read/write
|
|
permissions. Add your webserver's user to the group `icingacmd` to
|
|
enable sending commands to Icinga 2 through your web interface:
|
|
|
|
# usermod -G -a icingacmd www-data
|
|
|
|
Debian packages use `nagios` as the default user and group name. Therefore change `icingacmd` to
|
|
`nagios`.
|
|
|
|
### <a id="external-command-list"></a> External Command List
|
|
|
|
A list of currently supported external commands can be found [here](#external-commands-list-detail)
|
|
|
|
Detailed information on the commands and their required parameters can be found
|
|
on the [Icinga 1.x documentation](http://docs.icinga.org/latest/en/extcommands2.html).
|
|
|
|
|
|
## <a id="event-handlers"></a> Event Handlers
|
|
|
|
Event handlers are defined as `EventCommand` objects in Icinga 2.
|
|
|
|
Unlike notifications event commands are called on every host/service execution
|
|
if defined. Therefore the `EventCommand` object should define a command line
|
|
evaluating the current service state and other service runtime attributes
|
|
available through runtime macros. Runtime macros such as `$service.state_type$`
|
|
and `$service.state$` will be processed by Icinga 2 helping on fine-granular
|
|
events being triggered.
|
|
|
|
Common use case scenarios are a failing HTTP check requiring an immediate
|
|
restart via event command, or if an application is locked and requires
|
|
a restart upon detection.
|
|
|
|
|
|
## <a id="logging"></a> Logging
|
|
|
|
Icinga 2 supports three different types of logging:
|
|
|
|
* File logging
|
|
* Syslog (on *NIX-based operating systems)
|
|
* Console logging (`STDOUT` on tty)
|
|
|
|
You can enable additional loggers using the `icinga2-enable-feature`
|
|
and `icinga2-disable-feature` commands to configure loggers:
|
|
|
|
Feature | Description
|
|
---------|------------
|
|
debuglog | Debug log (path: `/var/log/icinga2/debug.log`, severity: `debug` or higher)
|
|
mainlog | Main log (path: `/var/log/icinga2/icinga2.log`, severity: `information` or higher)
|
|
syslog | Syslog (severity: `warning` or higher)
|
|
|
|
By default file the `mainlog` feature is enabled. When running Icinga 2
|
|
on a terminal log messages with severity `information` or higher are
|
|
written to the console.
|
|
|
|
|
|
## <a id="performance-data"></a> Performance Data
|
|
|
|
When a host or service check is executed plugins should provide so-called
|
|
`performance data`. Next to that additional check performance data
|
|
can be fetched using Icinga 2 runtime macros such as the check latency
|
|
or the current service state (or additional custom attributes).
|
|
|
|
The performance data can be passed to external applications which aggregate and
|
|
store them in their backends. These tools usually generate graphs for historical
|
|
reporting and trending.
|
|
|
|
Well-known addons processing Icinga performance data are PNP4Nagios,
|
|
inGraph and Graphite.
|
|
|
|
### <a id="writing-performance-data-files"></a> Writing Performance Data Files
|
|
|
|
PNP4Nagios, inGraph and Graphios use performance data collector daemons to fetch
|
|
the current performance files for their backend updates.
|
|
|
|
Therefore the Icinga 2 `PerfdataWriter` object allows you to define
|
|
the output template format for host and services backed with Icinga 2
|
|
runtime vars.
|
|
|
|
host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.checkcommand$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.statetype$"
|
|
service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.description$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.checkcommand$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.statetype$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.statetype$"
|
|
|
|
The default templates are already provided with the Icinga 2 feature configuration
|
|
which can be enabled using
|
|
|
|
# icinga2-enable-feature perfdata
|
|
|
|
By default all performance data files are rotated in a 15 seconds interval into
|
|
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
|
|
`service-perfdata.<timestamp>`.
|
|
External collectors need to parse the rotated performance data files and then
|
|
remove the processed files.
|
|
|
|
### <a id="graphite-carbon-cache-writer"></a> Graphite Carbon Cache Writer
|
|
|
|
While there are some Graphite collector scripts and daemons like Graphios available for
|
|
Icinga 1.x it's more reasonable to directly process the check and plugin performance
|
|
in memory in Icinga 2. Once there are new metrics available, Icinga 2 will directly
|
|
write them to the defined Graphite Carbon daemon tcp socket.
|
|
|
|
You can enable the feature using
|
|
|
|
# icinga2-enable-feature graphite
|
|
|
|
By default the `GraphiteWriter` object expects the Graphite Carbon Cache to listen at
|
|
`127.0.0.1` on port `2003`.
|
|
|
|
The current naming schema is
|
|
|
|
icinga.<hostname>.<metricname>
|
|
icinga.<hostname>.<servicename>.<metricname>
|
|
|
|
|
|
|
|
## <a id="status-data"></a> Status Data
|
|
|
|
Icinga 1.x writes object configuration data and status data in a cyclic
|
|
interval to its `objects.cache` and `status.dat` files. Icinga 2 provides
|
|
the `StatusDataWriter` object which dumps all configuration objects and
|
|
status updates in a regular interval.
|
|
|
|
# icinga2-enable-feature statusdata
|
|
|
|
Icinga 1.x Classic UI requires this data set as part of its backend.
|
|
|
|
> **Note**
|
|
>
|
|
> If you are not using any web interface or addon which uses these files
|
|
> you can safely disable this feature.
|
|
|
|
|
|
|
|
## <a id="compat-logging"></a> Compat Logging
|
|
|
|
The Icinga 1.x log format is considered being the `Compat Log`
|
|
in Icinga 2 provided with the `CompatLogger` object.
|
|
|
|
These logs are not only used for informational representation in
|
|
external web interfaces parsing the logs, but also to generate
|
|
SLA reports and trends in Icinga 1.x Classic UI. Futhermore the
|
|
`Livestatus` feature uses these logs for answering queries to
|
|
historical tables.
|
|
|
|
The `CompatLogger` object can be enabled with
|
|
|
|
# icinga2-enable-feature compatlog
|
|
|
|
By default, the Icinga 1.x log file called `icinga.log` is located
|
|
in `/var/log/icinga2/compat`. Rotated log files are moved into
|
|
`var/log/icinga2/compat/archives`.
|
|
|
|
The format cannot be changed without breaking compatibility to
|
|
existing log parsers.
|
|
|
|
# tail -f /var/log/icinga2/compat/icinga.log
|
|
|
|
[1382115688] LOG ROTATION: HOURLY
|
|
[1382115688] LOG VERSION: 2.0
|
|
[1382115688] HOST STATE: CURRENT;localhost;UP;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;disk;WARNING;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;http;OK;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;load;OK;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;ping4;OK;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;ping6;OK;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;processes;WARNING;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;ssh;OK;HARD;1;
|
|
[1382115688] SERVICE STATE: CURRENT;localhost;users;OK;HARD;1;
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;disk;1382115705
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;http;1382115705
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;load;1382115705
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382115705
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ping6;1382115705
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;processes;1382115705
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;ssh;1382115705
|
|
[1382115706] EXTERNAL COMMAND: SCHEDULE_FORCED_SVC_CHECK;localhost;users;1382115705
|
|
[1382115731] EXTERNAL COMMAND: PROCESS_SERVICE_CHECK_RESULT;localhost;ping6;2;critical test|
|
|
[1382115731] SERVICE ALERT: localhost;ping6;CRITICAL;SOFT;2;critical test
|
|
|
|
|
|
|
|
## <a id="check-result-files"></a> Check Result Files
|
|
|
|
Icinga 1.x writes its check result files into a temporary spool directory
|
|
where it reads these check result files in a regular interval from.
|
|
While this is extremly inefficient in performance regards it has been
|
|
rendered useful for passing passive check results directly into Icinga 1.x
|
|
skipping the external command pipe.
|
|
|
|
Several clustered/distributed environments and check-aggregation addons
|
|
use that method. In order to support step-by-step migration of these
|
|
environments, Icinga 2 ships the `CheckResultReader` object.
|
|
|
|
There is no feature configuration available, but it must be defined
|
|
on-demand in your Icinga 2 objects configuration.
|
|
|
|
object CheckResultReader "reader" {
|
|
spool_dir = "/data/check-results"
|
|
}
|
|
|
|
|
|
|
|
|
|
|
|
|