# Monitoring Basics

This part of the Icinga 2 documentation provides an overview of all the basic monitoring concepts you need to know to run Icinga 2. Keep in mind that these examples are written with a Linux server in mind. If you are using Windows, you will need to change the services accordingly. See the [ITL reference](7-icinga-template-library.md#windows-plugins) for further information.

## Hosts and Services

Icinga 2 can be used to monitor the availability of hosts and services. Hosts and services can be virtually anything which can be checked in some way:

* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Printers
* Switches / routers
* Temperature sensors
* Other local or network-accessible services

Host objects provide a mechanism to group services that are running on the same physical device.

Here is an example of a host object which defines two child services:

    object Host "my-server1" {
      address = "10.0.0.1"
      check_command = "hostalive"
    }

    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }

    object Service "http" {
      host_name = "my-server1"
      check_command = "http"
    }

The example creates two services `ping4` and `http` which belong to the host `my-server1`.

It also specifies that the host should perform its own check using the `hostalive` check command.

The `address` attribute is used by check commands to determine which network address is associated with the host object.

Details on troubleshooting check problems can be found [here](16-troubleshooting.md#troubleshooting).

### Host States

Hosts can be in any of the following states:

Name        | Description
------------|--------------
UP          | The host is available.
DOWN        | The host is unavailable.

### Service States

Services can be in any of the following states:

Name        | Description
------------|--------------
OK          | The service is working properly.
WARNING     | The service is experiencing some problems but is still considered to be in working condition.
CRITICAL    | The service is in a critical state.
UNKNOWN     | The check could not determine the service's state.

### Hard and Soft States

When detecting a problem with a host/service, Icinga re-checks the object a number of times (based on the `max_check_attempts` and `retry_interval` settings) before sending notifications. This ensures that no unnecessary notifications are sent for transient failures. During this time the object is in a `SOFT` state.

After all re-checks have been executed and the object is still in a non-OK state, the host/service switches to a `HARD` state and notifications are sent.

Name        | Description
------------|--------------
HARD        | The host/service's state hasn't recently changed.
SOFT        | The host/service has recently changed state and is being re-checked.

### Host and Service Checks

Hosts and services determine their state by running checks at regular intervals.

    object Host "router" {
      check_command = "hostalive"
      address = "10.0.0.1"
    }

The `hostalive` command is one of several built-in check commands. It sends ICMP echo requests to the IP address specified in the `address` attribute to determine whether a host is online.

A number of other [built-in check commands](7-icinga-template-library.md#plugin-check-commands) are also available. In addition to these commands the next few chapters will explain in detail how to set up your own check commands.

## Templates

Templates may be used to apply a set of identical attributes to more than one object:

    template Service "generic-service" {
      max_check_attempts = 3
      check_interval = 5m
      retry_interval = 1m
      enable_perfdata = true
    }

    apply Service "ping4" {
      import "generic-service"

      check_command = "ping4"

      assign where host.address
    }

    apply Service "ping6" {
      import "generic-service"

      check_command = "ping6"

      assign where host.address6
    }

In this example the `ping4` and `ping6` services inherit properties from the template `generic-service`.

Objects as well as templates themselves can import an arbitrary number of other templates.
Attributes inherited from a template can be overridden in the object if necessary.

You can also import existing non-template objects. Note that templates and objects share the same namespace, i.e. you can't define a template that has the same name as an object.

## Custom Attributes

In addition to built-in attributes you can define your own attributes:

    object Host "localhost" {
      vars.ssh_port = 2222
    }

Valid values for custom attributes include:

* Strings and numbers
* Arrays and dictionaries
* Functions

### Functions as Custom Attributes

Icinga 2 lets you specify functions for custom attributes. The special case here is that whenever Icinga 2 needs the value for such a custom attribute it runs the function and uses whatever value the function returns:

    object CheckCommand "random-value" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_dummy", "0", "$text$" ]

      vars.text = {{ Math.random() * 100 }}
    }

This example uses the [abbreviated lambda syntax](19-language-reference.md#nullary-lambdas).

These functions have access to a number of variables:

Variable     | Description
-------------|---------------
user         | The User object (for notifications).
service      | The Service object (for service checks/notifications/event handlers).
host         | The Host object.
command      | The command object (e.g. a CheckCommand object for checks).

Here's an example:

    vars.text = {{ host.check_interval }}

In addition to these variables the `macro` function can be used to retrieve the value of arbitrary macro expressions:

    vars.text = {{
      if (macro("$address$") == "127.0.0.1") {
        log("Running a check for localhost!")
      }

      return "Some text"
    }}

Accessing object attributes at runtime inside these functions is described in the [advanced topics](4-advanced-topics.md#access-object-attributes-at-runtime) chapter.

## Runtime Macros

Macros can be used to access other objects' attributes at runtime.
For example they are used in command definitions to figure out which IP address a check should be run against:

    object CheckCommand "my-ping" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_ping", "-H", "$ping_address$" ]

      arguments = {
        "-w" = "$ping_wrta$,$ping_wpl$%"
        "-c" = "$ping_crta$,$ping_cpl$%"
        "-p" = "$ping_packets$"
      }

      vars.ping_address = "$address$"

      vars.ping_wrta = 100
      vars.ping_wpl = 5

      vars.ping_crta = 250
      vars.ping_cpl = 10

      vars.ping_packets = 5
    }

    object Host "router" {
      check_command = "my-ping"
      address = "10.0.0.1"
    }

In this example we are using the `$address$` macro to refer to the host's `address` attribute.

We can also directly refer to custom attributes, e.g. by using `$ping_wrta$`. Icinga automatically tries to find the closest match for the attribute you specified. The exact rules for this are explained in the next section.

### Evaluation Order

When executing commands Icinga 2 checks the following objects in this order to look up macros and their respective values:

1. User object (only for notifications)
2. Service object
3. Host object
4. Command object
5. Global custom attributes in the `Vars` constant

This execution order allows you to define default values for custom attributes in your command objects.

Here's how you can override the custom attribute `ping_packets` from the previous example:

    object Service "ping" {
      host_name = "localhost"
      check_command = "my-ping"

      vars.ping_packets = 10 // Overrides the default value of 5 given in the command
    }

If a custom attribute isn't defined anywhere an empty value is used and a warning is written to the Icinga 2 log.

You can also directly refer to a specific attribute - thereby ignoring these evaluation rules - by specifying the full attribute name:

    $service.vars.ping_wrta$

This retrieves the value of the `ping_wrta` custom attribute for the service.
This returns an empty value if the service does not have such a custom attribute, no matter whether another object such as the host has this attribute.

### Host Runtime Macros

The following host macros are available in all commands that are executed for hosts or services:

Name                         | Description
-----------------------------|--------------
host.name                    | The name of the host object.
host.display_name            | The value of the `display_name` attribute.
host.state                   | The host's current state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
host.state_id                | The host's current state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
host.state_type              | The host's current state type. Can be one of `SOFT` and `HARD`.
host.check_attempt           | The current check attempt number.
host.max_check_attempts      | The maximum number of checks which are executed before changing to a hard state.
host.last_state              | The host's previous state. Can be one of `UNREACHABLE`, `UP` and `DOWN`.
host.last_state_id           | The host's previous state. Can be one of `0` (up), `1` (down) and `2` (unreachable).
host.last_state_type         | The host's previous state type. Can be one of `SOFT` and `HARD`.
host.last_state_change       | The last state change's timestamp.
host.downtime_depth          | The number of active downtimes.
host.duration_sec            | The time since the last state change.
host.latency                 | The host's check latency.
host.execution_time          | The host's check execution time.
host.output                  | The last check's output.
host.perfdata                | The last check's performance data.
host.last_check              | The timestamp when the last check was executed.
host.check_source            | The monitoring instance that performed the last check.
host.num_services            | Number of services associated with the host.
host.num_services_ok         | Number of services associated with the host which are in an `OK` state.
host.num_services_warning    | Number of services associated with the host which are in a `WARNING` state.
host.num_services_unknown    | Number of services associated with the host which are in an `UNKNOWN` state.
host.num_services_critical   | Number of services associated with the host which are in a `CRITICAL` state.

### Service Runtime Macros

The following service macros are available in all commands that are executed for services:

Name                       | Description
---------------------------|--------------
service.name               | The short name of the service object.
service.display_name       | The value of the `display_name` attribute.
service.check_command      | The short name of the command along with any arguments to be used for the check.
service.state              | The service's current state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
service.state_id           | The service's current state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
service.state_type         | The service's current state type. Can be one of `SOFT` and `HARD`.
service.check_attempt      | The current check attempt number.
service.max_check_attempts | The maximum number of checks which are executed before changing to a hard state.
service.last_state         | The service's previous state. Can be one of `OK`, `WARNING`, `CRITICAL` and `UNKNOWN`.
service.last_state_id      | The service's previous state. Can be one of `0` (ok), `1` (warning), `2` (critical) and `3` (unknown).
service.last_state_type    | The service's previous state type. Can be one of `SOFT` and `HARD`.
service.last_state_change  | The last state change's timestamp.
service.downtime_depth     | The number of active downtimes.
service.duration_sec       | The time since the last state change.
service.latency            | The service's check latency.
service.execution_time     | The service's check execution time.
service.output             | The last check's output.
service.perfdata           | The last check's performance data.
service.last_check         | The timestamp when the last check was executed.
service.check_source       | The monitoring instance that performed the last check.
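
The host and service runtime macros listed above can be combined in a single command argument. The following is a minimal sketch and not part of the sample configuration: the command name `show-runtime-macros` and the service name `macro-demo` are made up for illustration, and it assumes that the `check_dummy` plugin is installed in `PluginDir`.

```
object CheckCommand "show-runtime-macros" {
  import "plugin-check-command"

  // check_dummy exits with the given state (0 = OK) and prints the text argument.
  // The macros inside the text are resolved when the check is executed.
  command = [
    PluginDir + "/check_dummy", "0",
    "$service.name$ on $host.name$: attempt $service.check_attempt$ of $service.max_check_attempts$"
  ]
}

// Hypothetical service which runs the command on every host with an address.
apply Service "macro-demo" {
  check_command = "show-runtime-macros"

  assign where host.address
}
```

The check output should then contain the service and host names together with the current check attempt, which can be a quick way to inspect macro values.
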
### Command Runtime Macros

The following macros are available in all commands:

Name                   | Description
-----------------------|--------------
command.name           | The name of the command object.

### User Runtime Macros

The following macros are available in all commands that are executed for users:

Name                   | Description
-----------------------|--------------
user.name              | The name of the user object.
user.display_name      | The value of the `display_name` attribute.

### Notification Runtime Macros

Name                   | Description
-----------------------|--------------
notification.type      | The type of the notification.
notification.author    | The author of the notification comment, if existing.
notification.comment   | The comment of the notification, if existing.

### Global Runtime Macros

The following macros are available in all executed commands:

Name                   | Description
-----------------------|--------------
icinga.timet           | Current UNIX timestamp.
icinga.long_date_time  | Current date and time including timezone information. Example: `2014-01-03 11:23:08 +0000`
icinga.short_date_time | Current date and time. Example: `2014-01-03 11:23:08`
icinga.date            | Current date. Example: `2014-01-03`
icinga.time            | Current time including timezone information. Example: `11:23:08 +0000`
icinga.uptime          | Current uptime of the Icinga 2 process.

The following macros provide global statistics:

Name                              | Description
----------------------------------|--------------
icinga.num_services_ok            | Current number of services in state 'OK'.
icinga.num_services_warning       | Current number of services in state 'Warning'.
icinga.num_services_critical      | Current number of services in state 'Critical'.
icinga.num_services_unknown       | Current number of services in state 'Unknown'.
icinga.num_services_pending       | Current number of pending services.
icinga.num_services_unreachable   | Current number of unreachable services.
icinga.num_services_flapping      | Current number of flapping services.
icinga.num_services_in_downtime   | Current number of services in downtime.
icinga.num_services_acknowledged  | Current number of acknowledged service problems.
icinga.num_hosts_up               | Current number of hosts in state 'Up'.
icinga.num_hosts_down             | Current number of hosts in state 'Down'.
icinga.num_hosts_unreachable      | Current number of unreachable hosts.
icinga.num_hosts_flapping         | Current number of flapping hosts.
icinga.num_hosts_in_downtime      | Current number of hosts in downtime.
icinga.num_hosts_acknowledged     | Current number of acknowledged host problems.

## Apply Rules

Instead of assigning each object ([Service](6-object-types.md#objecttype-service), [Notification](6-object-types.md#objecttype-notification), [Dependency](6-object-types.md#objecttype-dependency), [ScheduledDowntime](6-object-types.md#objecttype-scheduleddowntime)) through an attribute identifier such as `host_name`, objects can be [applied](19-language-reference.md#apply).

Before you start using the apply rules keep the following in mind:

* Define the best match.
    * A set of unique [custom attributes](#custom-attributes-apply) for these hosts/services?
    * Or [group](3-monitoring-basics.md#groups) memberships, e.g. a host being a member of a hostgroup, applying services to it?
    * A generic pattern [match](19-language-reference.md#function-calls) on the host/service name?
    * [Multiple expressions combined](3-monitoring-basics.md#using-apply-expressions) with `&&` or `||` [operators](19-language-reference.md#expression-operators)
* All expressions must return a boolean value (an empty string, for example, is equal to `false`).

> **Note**
>
> You can set/override object attributes in apply rules using the respectively available
> objects in that scope (host and/or service objects).

[Custom attributes](3-monitoring-basics.md#custom-attributes) can also store nested dictionaries and arrays.
That way you can use them not only for matching their existence or values in apply expressions, but also to assign ("inherit") their values to the objects generated from apply rules.

* [Apply services to hosts](3-monitoring-basics.md#using-apply-services)
* [Apply notifications to hosts and services](3-monitoring-basics.md#using-apply-notifications)
* [Apply dependencies to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)
* [Apply scheduled downtimes to hosts and services](3-monitoring-basics.md#using-apply-scheduledowntimes)

A more advanced example is using [apply with for loops on arrays or dictionaries](#using-apply-for), for example provided by [custom attributes](#custom-attributes-apply) or groups.

> **Tip**
>
> Building configuration in that dynamic way requires detailed information
> of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object)
> after successful [configuration validation](8-cli-commands.md#config-validation).

### Apply Rules Expressions

You can use simple or advanced combinations of apply rule expressions. Each expression must evaluate to the boolean `true` value for the rule to match. An empty string, for instance, is interpreted as `false`. Similarly, undefined attributes return `false`.

Returns `false`:

    assign where host.vars.attribute_does_not_exist

Multiple `assign where` rows are combined with a logical `OR`.

You can combine multiple expressions for matching only a subset of objects. In some cases, you want to be able to add more than one assign/ignore where expression which matches a specific condition. To achieve this you can use the logical `and` and `or` operators.

The following example matches all hosts whose name matches the `*mysql*` pattern and (`&&`) whose custom attribute `prod_mysql_db` matches the `db-*` pattern. Hosts with the custom attribute `test_server` set to `true`, or whose name matches the `*internal` pattern, are ignored.

    object HostGroup "mysql-server" {
      display_name = "MySQL Server"

      assign where match("*mysql*", host.name) && match("db-*", host.vars.prod_mysql_db)
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }

A similar example for advanced notification apply rule filters: the notification applies if the service attribute `notes` contains the string `has gold support 24x7` `AND` one of two conditions holds: either the host custom attribute `customer` is set to `customer-xy` `OR` the host custom attribute `always_notify` is set to `true`.

The notification is ignored for services whose host name matches `*internal`, `OR` whose `priority` custom attribute is [less than](19-language-reference.md#expression-operators) `2` while the host custom attribute `is_clustered` is set to `true`.

    template Notification "cust-xy-notification" {
      users = [ "noc-xy", "mgmt-xy" ]
      command = "mail-service-notification"
    }

    apply Notification "notify-cust-xy-mysql" to Service {
      import "cust-xy-notification"

      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }

### Apply Services to Hosts

The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf) and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case.

The example for `ssh` applies a service object to all hosts with the `address` attribute being defined and the custom attribute `os` set to the string `Linux` in `vars`.

    apply Service "ssh" {
      import "generic-service"

      check_command = "ssh"

      assign where host.address && host.vars.os == "Linux"
    }

Other detailed scenario examples are used in their respective chapters, for example [apply services with custom command arguments](#using-apply-services-command-arguments).
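
As a further sketch of the same pattern, an `ignore where` expression can exclude hosts from a service apply rule. The service name `http` and the reuse of the `test_server` custom attribute are assumptions made for illustration only, and the `generic-service` template is taken from the examples in this chapter.

```
apply Service "http" {
  import "generic-service"

  check_command = "http"

  // Apply to all Linux hosts which have an address ...
  assign where host.address && host.vars.os == "Linux"
  // ... except hosts flagged as test servers (hypothetical custom attribute).
  ignore where host.vars.test_server == true
}
```

A host that matches both expressions does not get the service, because `ignore where` takes precedence over `assign where`.
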
### Apply Notifications to Hosts and Services

Notifications are applied to specific targets (`Host` or `Service`) and work in a similar manner:

    apply Notification "mail-noc" to Service {
      import "mail-service-notification"

      user_groups = [ "noc" ]

      assign where host.vars.notification.mail
    }

In this example the `mail-noc` notification will be created as an object for all services whose host has the `notification.mail` custom attribute defined. The notification command is set to `mail-service-notification` and all members of the user group `noc` will get notified.

### Apply Dependencies to Hosts and Services

Detailed examples can be found in the [dependencies](3-monitoring-basics.md#dependencies) chapter.

### Apply Recurring Downtimes to Hosts and Services

The sample configuration includes an example in [downtimes.conf](5-configuring-icinga-2.md#downtimes-conf).

Detailed examples can be found in the [recurring downtimes](4-advanced-topics.md#recurring-downtimes) chapter.

### Using Apply For Rules

In addition to the standard way of using [apply rules](3-monitoring-basics.md#using-apply), you may need to generate objects from apply rules based on a set (array or dictionary).

The sample configuration already includes a detailed example in [hosts.conf](5-configuring-icinga-2.md#hosts-conf) and [services.conf](5-configuring-icinga-2.md#services-conf) for this use case.

Take the following example: A host provides the SNMP OIDs for different service check types. This could look like the following example:

    object Host "router-v6" {
      check_command = "hostalive"
      address6 = "::1"

      vars.oids["if01"] = "1.1.1.1.1"
      vars.oids["temp"] = "1.1.1.1.2"
      vars.oids["bgp"] = "1.1.1.1.5"
    }

Now we want to create service checks for `if01` and `temp` but not `bgp`. Furthermore we want to pass the SNMP OID stored as dictionary value to the custom attribute called `vars.snmp_oid` - this is the command argument required by the [snmp](7-icinga-template-library.md#plugin-check-command-snmp) check command.
The service's `display_name` should be set to the identifier inside the dictionary.

    apply Service for (identifier => oid in host.vars.oids) {
      check_command = "snmp"
      display_name = identifier
      vars.snmp_oid = oid

      ignore where identifier == "bgp" //don't generate service for bgp checks
    }

Icinga 2 evaluates the `apply for` rule for all objects with the custom attribute `oids` set. It then iterates over all list items inside the `for` loop and evaluates the `assign/ignore where` expressions. You can access the loop variable in these expressions, e.g. for ignoring certain values. In this example we'd ignore the `bgp` identifier and avoid generating an unwanted service. We could extend the configuration by also matching the `oid` value on certain regex/wildcard patterns for example.

> **Note**
>
> You don't need an `assign where` expression only checking for existence
> of the custom attribute.

That way you'll save duplicated apply rules by combining them into one generic `apply for` rule generating the object name with or without a prefix.

#### Apply For and Custom Attribute Override

Imagine a different more advanced example: You are monitoring your switch (hosts) with many interfaces (services). The following requirements/problems apply:

* Each interface service check should be named with a prefix and a running number
* Each interface has its own vlan tag
* Some interfaces have QoS enabled
* Additional attributes such as `display_name` or `notes`, `notes_url` and `action_url` must be dynamically generated

By defining the `interfaces` dictionary with three example interfaces on the `core-switch` host object, you'll make sure to pass the storage required by the for loop in the service apply rule.

    object Host "core-switch" {
      import "generic-host"
      address = "127.0.0.1"

      vars.interfaces["0"] = {
        port = 1
        vlan = "internal"
        address = "127.0.0.2"
        qos = "enabled"
      }
      vars.interfaces["1"] = {
        port = 2
        vlan = "mgmt"
        address = "127.0.1.2"
      }
      vars.interfaces["2"] = {
        port = 3
        vlan = "remote"
        address = "127.0.2.2"
      }
    }

You can also omit the `"if-"` string, then all generated service names are directly taken from the `if_name` variable value.

The config dictionary contains all key-value pairs for the specific interface in one loop cycle, like `port`, `vlan`, `address` and `qos` for the `0` interface.

By defining a default value for the custom attribute `qos` in the `vars` dictionary before adding the `config` dictionary we'll ensure that this attribute is always defined.

After `vars` is fully populated, all object attributes can be set. For strings, you can use string concatenation with the `+` operator.

You can also specify the check command that way.

    apply Service "if-" for (if_name => config in host.vars.interfaces) {
      import "generic-service"
      check_command = "ping4"

      vars.qos = "disabled"
      vars += config

      display_name = "if-" + if_name + "-" + vars.vlan

      notes = "Interface check for Port " + string(vars.port) + " in VLAN " + vars.vlan + " on Address " + vars.address + " QoS " + vars.qos
      notes_url = "http://foreman.company.com/hosts/" + host.name
      action_url = "http://snmp.checker.company.com/" + host.name + "if-" + if_name
    }

Note that numbers must be explicitly cast to strings when adding them to strings. This can be achieved by wrapping them into the [string()](19-language-reference.md#function-calls) function.

> **Tip**
>
> Building configuration in that dynamic way requires detailed information
> of the generated objects. Use the `object list` [CLI command](8-cli-commands.md#cli-command-object)
> after successful [configuration validation](8-cli-commands.md#config-validation).
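
The variant without a name prefix mentioned above might look like the following sketch. It assumes the `core-switch` host and the `generic-service` template from this example; the `display_name` text is made up for illustration. The generated service names are then the dictionary keys themselves (`0`, `1` and `2`).

```
apply Service for (if_name => config in host.vars.interfaces) {
  import "generic-service"
  check_command = "ping4"

  // Set the default first, then merge the per-interface settings on top.
  vars.qos = "disabled"
  vars += config

  display_name = "Interface " + if_name + " (VLAN " + vars.vlan + ")"
}
```

Without a prefix, service names from several `apply for` rules on the same host are more likely to collide, so a prefix is usually the safer choice when more than one rule iterates over similar keys.
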
### Use Object Attributes in Apply Rules

Since apply rules are evaluated after the generic objects, you can reference existing host and/or service object attributes as values for any object attribute specified in that apply rule.

    object Host "opennebula-host" {
      import "generic-host"
      address = "10.1.1.2"

      vars.hosting["xyz"] = {
        http_uri = "/shop"
        customer_name = "Customer xyz"
        customer_id = "7568"
        support_contract = "gold"
      }
      vars.hosting["abc"] = {
        http_uri = "/shop"
        customer_name = "Customer xyz"
        customer_id = "7568"
        support_contract = "silver"
      }
    }

    apply Service for (customer => config in host.vars.hosting) {
      import "generic-service"
      check_command = "ping4"

      vars.qos = "disabled"

      vars += config

      // "customer" is the loop key, e.g. "xyz" or "abc"
      vars.http_uri = "/" + customer + "/" + config.http_uri

      display_name = "Shop Check for " + vars.customer_name + "-" + vars.customer_id

      notes = "Support contract: " + vars.support_contract + " for Customer " + vars.customer_name + " (" + vars.customer_id + ")."

      notes_url = "http://foreman.company.com/hosts/" + host.name
      action_url = "http://snmp.checker.company.com/" + host.name + "/" + vars.customer_id
    }

## Groups

A group is a collection of similar objects. Groups are primarily used as a visualization aid in web interfaces.

Group membership is defined at the respective object itself.
If you have a hostgroup named `windows` for example, and want to assign specific hosts to this group for later viewing the group on your alert dashboard, first create a HostGroup object:

    object HostGroup "windows" {
      display_name = "Windows Servers"
    }

Then add your hosts to this group:

    template Host "windows-server" {
      groups += [ "windows" ]
    }

    object Host "mssql-srv1" {
      import "windows-server"

      vars.mssql_port = 1433
    }

    object Host "mssql-srv2" {
      import "windows-server"

      vars.mssql_port = 1433
    }

This can be done for service and user groups the same way:

    object UserGroup "windows-mssql-admins" {
      display_name = "Windows MSSQL Admins"
    }

    template User "generic-windows-mssql-users" {
      groups += [ "windows-mssql-admins" ]
    }

    object User "win-mssql-noc" {
      import "generic-windows-mssql-users"

      email = "noc@example.com"
    }

    object User "win-mssql-ops" {
      import "generic-windows-mssql-users"

      email = "ops@example.com"
    }

### Group Membership Assign

Instead of manually assigning each object to a group you can also assign objects to a group based on their attributes:

    object HostGroup "prod-mssql" {
      display_name = "Production MSSQL Servers"

      assign where host.vars.mssql_port && host.vars.prod_mysql_db
      ignore where host.vars.test_server == true
      ignore where match("*internal", host.name)
    }

In this example all hosts with the custom attributes `mssql_port` and `prod_mysql_db` set will be added as members to the host group `prod-mssql`. However, hosts whose name matches `*internal` or whose `test_server` attribute is set to `true` are not added to this group.

Details on the `assign where` syntax can be found in the [Language Reference](19-language-reference.md#apply).

## Notifications

Notifications for service and host problems are an integral part of your monitoring setup.

When a host or service is in a downtime, a problem has been acknowledged or the dependency logic determined that the host/service is unreachable, no notifications are sent. You can configure additional type and state filters refining the notifications being actually sent.

There are many ways of sending notifications, e.g. by e-mail, XMPP, IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications. Instead it relies on external mechanisms such as shell scripts to notify users.

A notification specification requires one or more users (and/or user groups) who will be notified in case of problems. These users must have all custom attributes defined which will be used in the `NotificationCommand` on execution.

The user `icingaadmin` in the example below will get notified only on `OK`, `WARNING` and `CRITICAL` states and `problem` and `recovery` notification types.

    object User "icingaadmin" {
      display_name = "Icinga 2 Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }

If you don't set the `states` and `types` configuration attributes for the `User` object, notifications for all states and types will be sent.

Details on troubleshooting notification problems can be found [here](16-troubleshooting.md#troubleshooting).

> **Note**
>
> Make sure that the [notification](8-cli-commands.md#features) feature is enabled
> in order to execute notification commands.

You should choose which information you (and your notified users) are interested in in case of an emergency, and also which information does not provide any value to you and your environment.

An example notification command is explained [here](3-monitoring-basics.md#notification-commands).

You can add all shared attributes to a `Notification` template which is inherited by the defined notifications. That way you'll save duplicated attributes in each `Notification` object. Attributes can be overridden locally.

    template Notification "generic-notification" {
      interval = 15m

      command = "mail-service-notification"

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]

      period = "24x7"
    }

The time period `24x7` is included as example configuration with Icinga 2.

Use the `apply` keyword to create `Notification` objects for your services:

    apply Notification "notify-cust-xy-mysql" to Service {
      import "generic-notification"

      users = [ "noc-xy", "mgmt-xy" ]

      assign where match("*has gold support 24x7*", service.notes) && (host.vars.customer == "customer-xy" || host.vars.always_notify == true)
      ignore where match("*internal", host.name) || (service.vars.priority < 2 && host.vars.is_clustered == true)
    }

Instead of assigning users to notifications, you can also add the `user_groups` attribute with a list of user groups to the `Notification` object. Icinga 2 will send notifications to all group members.

> **Note**
>
> Only users who have been notified of a problem before (`Warning`, `Critical`, `Unknown`
> states for services, `Down` for hosts) will receive `Recovery` notifications.

### Notification Escalations

When a problem notification is sent and a problem still exists at the time of re-notification you may want to escalate the problem to the next support level. A different approach is to configure the default notification by email, and escalate the problem via SMS if not already solved.

You can define notification start and end times as additional configuration attributes making the `Notification` object a so-called `notification escalation`. Using templates you can share the basic notification attributes such as users or the `interval` (and override them for the escalation then).

Using the example from above, you can define additional users being escalated for SMS notifications between start and end time.

    object User "icinga-oncall-2nd-level" {
      display_name = "Icinga 2nd Level"

      vars.mobile = "+1 555 424642"
    }

    object User "icinga-oncall-1st-level" {
      display_name = "Icinga 1st Level"

      vars.mobile = "+1 555 424642"
    }

Define an additional [NotificationCommand](#notification) for SMS notifications.

> **Note**
>
> The example is not complete as there are many different SMS providers.
> Please note that sending SMS notifications will require an SMS provider
> or local hardware with a SIM card active.

    object NotificationCommand "sms-notification" {
      command = [
        PluginDir + "/send_sms_notification",
        "$mobile$",
        "..."
      ]
    }

The two new notification escalations are added onto the local host and its service `ping4` using the `generic-notification` template. The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification` command) after `30m` until `1h`.

> **Note**
>
> The `interval` was set to 15m in the `generic-notification`
> template example. Lower that value in your escalations by using a secondary
> template or by overriding the attribute directly in the
> `escalation-sms-2nd-level` notification.

If the problem is neither resolved nor acknowledged (which would prevent further notifications), the `icinga-oncall-1st-level` user will be notified through the `escalation-sms-1st-level` notification `1h` after the initial problem notification, but only for one hour (`2h` as `end` key for the `times` dictionary).

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-2nd-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-2nd-level" ]

      times = {
        begin = 30m
        end = 1h
      }

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-1st-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-1st-level" ]

      times = {
        begin = 1h
        end = 2h
      }

      assign where service.name == "ping4"
    }

### Notification Delay

Sometimes the problem in question should not be notified when the notification is due
(the object reaching the `HARD` state) but a defined time duration afterwards. In Icinga 2
you can use the `times` dictionary and set `begin = 15m` as key and value if you want to
postpone the notification window for 15 minutes. Leave out the `end` key - if not set,
Icinga 2 will not check against any end time for this notification. Make sure to
specify a relatively low notification `interval` to get notified soon enough again.

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      interval = 5m

      times.begin = 15m // delay notification window

      assign where service.name == "ping4"
    }

### Disable Re-notifications

If you prefer to be notified only once, you can disable re-notifications by setting the
`interval` attribute to `0`.

    apply Notification "notify-once" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      interval = 0 // disable re-notification

      assign where service.name == "ping4"
    }

### Notification Filters by State and Type

If there are no notification state and type filter attributes defined at the `Notification`
or `User` object Icinga 2 assumes that all states and types are being notified.
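
State and type filters can be set on a `User` object as well as on a `Notification` object. The following is a minimal sketch of a user-level filter, not a definitive setup: the user name `filtered-oncall` and its email address are hypothetical and only serve as an illustration.

```
// Hypothetical user, for illustration only.
object User "filtered-oncall" {
  display_name = "Filtered On-Call"
  email = "oncall@example.com" // placeholder address

  // Limit this user to the OK and Critical service states ...
  states = [ OK, Critical ]
  // ... and to problem and recovery notifications.
  types = [ Problem, Recovery ]
}
```

Under these assumptions the user would not be notified about `Warning` or `Unknown` states, nor about downtime or flapping events, even if the `Notification` object itself allows them.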

Available state and type filters for notifications are:

    template Notification "generic-notification" {

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
    }

If you are familiar with Icinga 1.x `notification_options`, please note that they have been
split into type and state to allow more fine-grained filtering, for example on downtimes
and flapping. You can filter for acknowledgements and custom notifications too.

## Commands

Icinga 2 uses three different command object types to specify how
checks should be performed, notifications should be sent, and
events should be handled.

### Check Commands

[CheckCommand](6-object-types.md#objecttype-checkcommand) objects define the command line
used to run a check.

[CheckCommand](6-object-types.md#objecttype-checkcommand) objects are referenced by
[Host](6-object-types.md#objecttype-host) and [Service](6-object-types.md#objecttype-service) objects
using the `check_command` attribute.

> **Note**
>
> Make sure that the [checker](8-cli-commands.md#features) feature is enabled in order to
> execute checks.

#### Integrate the Plugin with a CheckCommand Definition

[CheckCommand](6-object-types.md#objecttype-checkcommand) objects require the
[ITL template](7-icinga-template-library.md#itl-plugin-check-command)
`plugin-check-command` to support native plugin based check methods.

Unless you have done so already, download your check plugin and put it
into the [PluginDir](5-configuring-icinga-2.md#constants-conf) directory. The following example uses the
`check_disk` plugin contained in the Monitoring Plugins package.

The plugin path and all command arguments are given as a list of
double-quoted strings for proper shell escaping.

Call the `check_disk` plugin with the `--help` parameter to see
all available options. Our example defines warning (`-w`) and
critical (`-c`) thresholds for the disk usage.
Without any partition defined (`-p`) it will check all local partitions.

    icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
    ...
    This plugin checks the amount of used disk space on a mounted file system
    and generates an alert if free space is less than one of the threshold values


    Usage:
     check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
    [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
    [-t timeout] [-u unit] [-v] [-X type] [-N type]
    ...

> **Note**
>
> Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.

The next step is to understand how command parameters are passed from
a host or service object, and to add a [CheckCommand](6-object-types.md#objecttype-checkcommand)
definition based on these required parameters and/or default values.

#### Passing Check Command Parameters from Host or Service

Check command parameters are defined as custom attributes which can be accessed as runtime macros
by the executed check command.

Define the default check command custom attributes `disk_wfree` and `disk_cfree`
(the naming schema is freely definable) and their default threshold values. You can
then use these custom attributes as runtime macros for [command arguments](3-monitoring-basics.md#command-arguments)
on the command line.

> **Tip**
>
> Use a common command type as prefix for your command arguments to increase
> readability. `disk_wfree` conveys the context better than just
> `wfree` as argument.

The default custom attributes can be overridden by the custom attributes
defined in the service using the check command `my-disk`. The custom attributes
can also be inherited from a parent template using additive inheritance (`+=`).

    object CheckCommand "my-disk" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_disk" ]

      arguments = {
        "-w" = {
          value = "$disk_wfree$"
          description = "Exit with WARNING status if less than INTEGER units of disk are free or Exit with WARNING status if less than PERCENT of disk space is free"
          required = true
        }
        "-c" = {
          value = "$disk_cfree$"
          description = "Exit with CRITICAL status if less than INTEGER units of disk are free or Exit with CRITICAL status if less than PERCENT of disk space is free"
          required = true
        }
        "-W" = {
          value = "$disk_inode_wfree$"
          description = "Exit with WARNING status if less than PERCENT of inode space is free"
        }
        "-K" = {
          value = "$disk_inode_cfree$"
          description = "Exit with CRITICAL status if less than PERCENT of inode space is free"
        }
        "-p" = {
          value = "$disk_partitions$"
          description = "Path or partition (may be repeated)"
          repeat_key = true
          order = 1
        }
        "-x" = {
          value = "$disk_partitions_excluded$"
          description = "Ignore device (only works if -p unspecified)"
        }
      }

      vars.disk_wfree = "20%"
      vars.disk_cfree = "10%"
    }

> **Note**
>
> A proper example for the `check_disk` plugin is already shipped with Icinga 2
> ready to use with the [plugin check commands](7-icinga-template-library.md#plugin-check-command-disk).

The host `my-server` with the applied service `basic-partitions` checks a basic set of disk partitions
with modified custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
free disk space).

The custom attribute `disk_partitions` can either hold a single string or an array of
string values for passing multiple partitions to the `check_disk` check plugin.

    object Host "my-server" {
      import "generic-host"
      address = "127.0.0.1"
      address6 = "::1"

      vars.local_disks["basic-partitions"] = {
        disk_partitions = [ "/", "/tmp", "/var", "/home" ]
      }
    }

    apply Service for (disk => config in host.vars.local_disks) {
      import "generic-service"
      check_command = "my-disk"

      vars += config

      vars.disk_wfree = "10%"
      vars.disk_cfree = "5%"
    }

More details on using arrays in custom attributes can be found in
[this chapter](#runtime-custom-attributes).

#### Command Arguments

When a check command line is defined using the `command` attribute, Icinga 2
resolves all macros in the static string or array. Sometimes you need to
extend the argument list based on a condition evaluated at command execution,
or to make arguments optional so that they are only set if Icinga 2 can
resolve the macro value.

    object CheckCommand "check_http" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_http" ]

      arguments = {
        "-H" = "$http_vhost$"
        "-I" = "$http_address$"
        "-u" = "$http_uri$"
        "-p" = "$http_port$"
        "-S" = {
          set_if = "$http_ssl$"
        }
        "--sni" = {
          set_if = "$http_sni$"
        }
        "-a" = {
          value = "$http_auth_pair$"
          description = "Username:password on sites with basic authentication"
        }
        "--no-body" = {
          set_if = "$http_ignore_body$"
        }
        "-r" = "$http_expect_body_regex$"
        "-w" = "$http_warn_time$"
        "-c" = "$http_critical_time$"
        "-e" = "$http_expect$"
      }

      vars.http_address = "$address$"
      vars.http_ssl = false
      vars.http_sni = false
    }

The example shows the `check_http` check command defining the most common
arguments. Each of them is optional by default and will be omitted if
the value is not set. For example, if the service calling the check command
does not have `vars.http_port` set, the `-p` argument won't be added to the
command line.

If the `vars.http_ssl` custom attribute is set in the service, host or command
object definition, Icinga 2 will add the `-S` argument based on the `set_if`
numeric value to the command line. String values are not supported.
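
To illustrate the effect of `set_if`, here is a minimal sketch with two services that use the `check_http` command defined above. The service names `http-plain` and `http-ssl` and the virtual host `www.example.com` are hypothetical; the host `my-server` is taken from the earlier disk example.

```
// Hypothetical service: vars.http_ssl keeps the command default (false),
// so the -S argument is not added.
object Service "http-plain" {
  host_name = "my-server"
  check_command = "check_http"

  vars.http_vhost = "www.example.com"
}

// Hypothetical service: vars.http_ssl overrides the command default,
// so set_if evaluates to true and -S is added.
object Service "http-ssl" {
  host_name = "my-server"
  check_command = "check_http"

  vars.http_vhost = "www.example.com"
  vars.http_ssl = true
}
```

With these definitions both services would pass `-H` and `-I` (the latter resolved from the host's `address` through the `http_address` default), while arguments such as `-u` or `-p` stay off the command line because their macros are not set.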

If the macro value cannot be resolved, Icinga 2 will not add the defined argument
to the final command argument array. Empty strings for macro values won't omit
the argument.

That way you can use the `check_http` command definition for checks both with and
without SSL enabled, which saves you from duplicating command definitions.

Details on all available options can be found in the
[CheckCommand object definition](6-object-types.md#objecttype-checkcommand).

#### Environment Variables

The `env` command object attribute specifies a list of environment variables whose values
are calculated from either runtime macros or custom attributes and which are exported
prior to executing the command.

This is useful, for example, for keeping sensitive information off the command line
when passing credentials to database checks:

    object CheckCommand "mysql-health" {
      import "plugin-check-command"

      command = [
        PluginDir + "/check_mysql"
      ]

      arguments = {
        "-H" = "$mysql_address$"
        "-d" = "$mysql_database$"
      }

      vars.mysql_address = "$address$"
      vars.mysql_database = "icinga"
      vars.mysql_user = "icinga_check"
      vars.mysql_pass = "password"

      env.MYSQLUSER = "$mysql_user$"
      env.MYSQLPASS = "$mysql_pass$"
    }

### Notification Commands

[NotificationCommand](6-object-types.md#objecttype-notificationcommand) objects define how notifications are delivered to external
interfaces (E-Mail, XMPP, IRC, Twitter, etc).

[NotificationCommand](6-object-types.md#objecttype-notificationcommand) objects are referenced by
[Notification](6-object-types.md#objecttype-notification) objects using the `command` attribute.

`NotificationCommand` objects require the [ITL template](7-icinga-template-library.md#itl-plugin-notification-command)
`plugin-notification-command` to support native plugin-based notifications.

> **Note**
>
> Make sure that the [notification](8-cli-commands.md#features) feature is enabled
> in order to execute notification commands.

Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
the current check output) to send an email to the user(s) associated with the
notification itself (`$user.email$`).

If you want to specify default values for some of the custom attribute definitions,
you can add a `vars` dictionary as shown for the `CheckCommand` object.

    object NotificationCommand "mail-service-notification" {
      import "plugin-notification-command"

      command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]

      env = {
        NOTIFICATIONTYPE = "$notification.type$"
        SERVICEDESC = "$service.name$"
        HOSTALIAS = "$host.display_name$"
        HOSTADDRESS = "$address$"
        SERVICESTATE = "$service.state$"
        LONGDATETIME = "$icinga.long_date_time$"
        SERVICEOUTPUT = "$service.output$"
        NOTIFICATIONAUTHORNAME = "$notification.author$"
        NOTIFICATIONCOMMENT = "$notification.comment$"
        HOSTDISPLAYNAME = "$host.display_name$"
        SERVICEDISPLAYNAME = "$service.display_name$"
        USEREMAIL = "$user.email$"
      }
    }

The `command` attribute in the `mail-service-notification` command refers to the following
shell script. The macros specified in the `env` dictionary are exported
as environment variables and can be used in the notification script:

    #!/usr/bin/env bash
    template=$(cat <