Documentation: Move notifications/dependencies up.

2014-05-22 20:14:26 +02:00 · 2014-05-22 20:14:26 +02:00 · 110b7b4a53
parent 2027b301a7
commit 110b7b4a53
1 changed files with 280 additions and 281 deletions
--- a/doc/3-monitoring-basics.md
+++ b/doc/3-monitoring-basics.md
@ -218,6 +218,211 @@ Details on the `assign where` syntax can be found [here](#group-assign)
 In this inherited example from above all hosts with the `var` `mssql_port`
 set will be added as members to the host group `mssql`.
 ## <a id="notifications"></a> Notifications
 Notifications for service and host problems are an integral part of your
 monitoring setup.
 When a host or service is in a downtime, a problem has been acknowledged or
 the dependency logic determined that the host/service is unreachable, no
 notirications are sent. You can configure additional type and state filters
 refining the notifications being actually sent.
 There are many ways of sending notifications, e.g. by e-mail, XMPP,
 IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
 Instead it relies on external mechanisms such as shell scripts to notify users.
 A notification specification requires one or more users (and/or user groups)
 who will be notified in case of problems. These users must have all custom
 attributes defined which will be used in the `NotificationCommand` on execution.
 The user `icingaadmin` in the example below will get notified only on `WARNING` and
 `CRITICAL` states and `problem` and `recovery` notification types.
    object User "icingaadmin" {
      display_name = "Icinga 2 Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }
 If you don't set the `states` and `types`
 configuration attributes for the `User` object, notifications for all states and types
 will be sent.
 You should choose which information you (and your notified users) are interested in
 case of emergency, and also which information does not provide any value to you and
 your environment.
 An example notification command is explained [here](#notification-commands).
 You can add all shared attributes to a `Notification` template which is inherited
 to the defined notifications. That way you'll save duplicated attributes in each
 `Notification` object. Attributes can be overridden locally.
    template Notification "generic-notification" {
      interval = 15m
      command = "mail-service-notification"
      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart,DowntimeEnd, DowntimeRemoved ]
      period = "24x7"
    }
 The time period `24x7` is shipped as example configuration with Icinga 2.
 Use the `apply` keyword to create `Notification` objects for your services:
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      assign where service.name == "mysql"
    }
 Instead of assigning users to notifications, you can also add the `user_groups`
 attribute with a list of user groups to the `Notification` object. Icinga 2 will
 send notifications to all group members.
 ### <a id="notification-escalations"></a> Notification Escalations
 When a problem notification is sent and a problem still exists after re-notification
 you may want to escalate the problem to the next support level. A different approach
 is to configure the default notification by email, and escalate the problem via sms
 if not already solved.
 You can define notification start and end times as additional configuration
 attributes making the `Notification` object a so-called `notification escalation`.
 Using templates you can share the basic notification attributes such as users or the
 `interval` (and override them for the escalation then).
 Using the example from above, you can define additional users being escalated for sms
 notifications between start and end time.
    object User "icinga-oncall-2nd-level" {
      display_name = "Icinga 2nd Level"
      vars.mobile = "+1 555 424642"
    }
    object User "icinga-oncall-1st-level" {
      display_name = "Icinga 1st Level"
      vars.mobile = "+1 555 424642"
    }
 Define an additional `NotificationCommand` for SMS notifications.
 > **Note**
 >
 > The example is not complete as there are many different SMS providers.
 > Please note that sending SMS notifications will require an SMS provider
 > or local hardware with a SIM card active.
    object NotificationCommand "sms-notification" {
       command = [
         PluginDir + "/send_sms_notification",
         "$mobile$",
         "..."
    }
 The two new notification escalations are added onto the host `localhost`
 and its service `ping4` using the `generic-notification` template.
 The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
 command) after `30m` until `1h`.
 > **Note**
 >
 > The `interval` was set to 15m in the `generic-notification`
 > template example. Lower that value in your escalations by using a secondary
 > template or overriding the attribute directly in the `notifications` array
 > position for `escalation-sms-2nd-level`.
 If the problem does not get resolved or acknowledged preventing further notifications
 the `escalation-sms-1st-level` user will be escalated `1h` after the initial problem was
 notified, but only for one hour (`2h` as `end` key for the `times` dictionary).
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      assign where service.name == "ping4"
    }
    apply Notification "escalation-sms-2nd-level" to Service {
      import "generic-notification"
      command = "sms-notification"
      users = [ "icinga-oncall-2nd-level" ]
      times = {
        begin = 30m
        end = 1h
      }
      assign where service.name == "ping4"
    }
    apply Notification "escalation-sms-1st-level" to Service {
      import "generic-notification"
      command = "sms-notification"
      users = [ "icinga-oncall-1st-level" ]
      times = {
        begin = 1h
        end = 2h
      }
      assign where service.name == "ping4"
    }
 ### <a id="first-notification-delay"></a> First Notification Delay
 Sometimes the problem in question should not be notified when the first notification
 happens, but a defined time duration afterwards. In Icinga 2 you can use the `times`
 dictionary and set `begin = 15m` as key and value if you want to suppress notifications
 in the first 15 minutes. Leave out the `end` key - if not set, Icinga 2 will not check against any
 end time for this notification.
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      times.begin = 15m // delay first notification
      assign where service.name == "ping4"
    }
 ### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
 If there are no notification state and type filter attributes defined at the `Notification`
 or `User` object Icinga 2 assumes that all states and types are being notified.
 Available state and type filters for notifications are:
    template Notification "generic-notification" {
      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
    }
 If you are familiar with Icinga 1.x `notification_options` please note that they have been split
 into type and state, and allow more fine granular filtering for example on downtimes and flapping.
 You can filter for acknowledgements and custom notifications too.
 ## <a id="timeperiods"></a> Time Periods
@ -298,7 +503,6 @@ Use the `period` attribute to assign time periods to
    }
 ## <a id="commands"></a> Commands
 Icinga 2 uses three different command object types to specify how
@ -539,210 +743,99 @@ Details on all available options can be found in the
 [CheckCommand object definition](#objecttype-checkcommand).
-## <a id="notifications"></a> Notifications
+## <a id="dependencies"></a> Dependencies
-Notifications for service and host problems are an integral part of your
+Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects
-monitoring setup.
+for determing their network reachability.
 The `parent_host_name` and `parent_service_name` attributes are mandatory for
 service dependencies, `parent_host_name` is required for host dependencies.
-When a host or service is in a downtime, a problem has been acknowledged or
+A service can depend on a host, and vice versa. A service has an implicit
-the dependency logic determined that the host/service is unreachable, no
+dependency (parent) to its host. A host to host dependency acts implicit
-notirications are sent. You can configure additional type and state filters
+as host parent relation.
-refining the notifications being actually sent.
+When dependencies are calculated, not only the immediate parent is taken into
 account but all parents are inherited.
-There are many ways of sending notifications, e.g. by e-mail, XMPP,
+Notifications are suppressed if a host or service becomes unreachable.
 IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
 Instead it relies on external mechanisms such as shell scripts to notify users.
-A notification specification requires one or more users (and/or user groups)
+A common scenario is the Icinga 2 server behind a router. Checking internet
-who will be notified in case of problems. These users must have all custom
+access by pinging the Google DNS server `google-dns` is a common method, but
-attributes defined which will be used in the `NotificationCommand` on execution.
+will fail in case the `dsl-router` host is down. Therefore the example below
 defines a host dependency which acts implicit as parent relation too.
-The user `icingaadmin` in the example below will get notified only on `WARNING` and
+Furthermore the host may be reachable but ping probes are dropped by the
-`CRITICAL` states and `problem` and `recovery` notification types.
+router's firewall. In case the `dsl-router``ping4` service check fails, all
 further checks for the `ping4` service on host `google-dns` service should
 be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
-    object User "icingaadmin" {
+    object Host "dsl-router" {
-      display_name = "Icinga 2 Admin"
+      address = "192.168.1.1"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }
-If you don't set the `states` and `types`
+    object Host "google-dns" {
-configuration attributes for the `User` object, notifications for all states and types
+      address = "8.8.8.8"
-will be sent.
+    }
-You should choose which information you (and your notified users) are interested in
+    apply Service "ping4" {
-case of emergency, and also which information does not provide any value to you and
+      import "generic-service"
 your environment.
-An example notification command is explained [here](#notification-commands).
+      check_command = "ping4"
-You can add all shared attributes to a `Notification` template which is inherited
+      assign where host.address
-to the defined notifications. That way you'll save duplicated attributes in each
+    }
 `Notification` object. Attributes can be overridden locally.
    apply Dependency "internet" to Service {
      parent_host_name = "dsl-router"
      disable_checks = true
-    template Notification "generic-notification" {
+      assign where host.name != "dsl-router"
-      interval = 15m
+    }
-      command = "mail-service-notification"
+Another classic example are agent based checks. You would define a health check
 for the agent daemon responding to your requests, and make all other services
 querying that daemon depend on that health check.
 The following configuration defines two nrpe based service checks `nrpe-load`
 and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
 `nrpe-health` service.
    apply Service "nrpe-health" {
      import "generic-service"
      check_command = "nrpe"
      assign where match("nrpe-*", host.name)
    }
    apply Service "nrpe-load" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_load"
      assign where match("nrpe-*", host.name)
    }
    apply Service "nrpe-disk" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_disk"
      assign where match("nrpe-*", host.name)
    }
    object Host "nrpe-server" {
      import "generic-host"
      address = "192.168.1.5",
    }
    apply Dependency "disable-nrpe-checks" to Service {
      parent_service_name = "nrpe-health"
      states = [ Warning, Critical, Unknown ]
-      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
+      disable_checks = true
-                FlappingEnd, DowntimeStart,DowntimeEnd, DowntimeRemoved ]
+      disable_notifications = true
-
+      assign where match("nrpe-*", host.name)
-      period = "24x7"
+      ignore where service.name == "nrpe-health"
    }
-The time period `24x7` is shipped as example configuration with Icinga 2.
+The `disable-nrpe-checks` dependency is applied to all services
-
+on the `nrpe-service` host but not the `nrpe-health` service itself.
 Use the `apply` keyword to create `Notification` objects for your services:
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      assign where service.name == "mysql"
    }
 Instead of assigning users to notifications, you can also add the `user_groups`
 attribute with a list of user groups to the `Notification` object. Icinga 2 will
 send notifications to all group members.
 ### <a id="notification-escalations"></a> Notification Escalations
 When a problem notification is sent and a problem still exists after re-notification
 you may want to escalate the problem to the next support level. A different approach
 is to configure the default notification by email, and escalate the problem via sms
 if not already solved.
 You can define notification start and end times as additional configuration
 attributes making the `Notification` object a so-called `notification escalation`.
 Using templates you can share the basic notification attributes such as users or the
 `interval` (and override them for the escalation then).
 Using the example from above, you can define additional users being escalated for sms
 notifications between start and end time.
    object User "icinga-oncall-2nd-level" {
      display_name = "Icinga 2nd Level"
      vars.mobile = "+1 555 424642"
    }
    object User "icinga-oncall-1st-level" {
      display_name = "Icinga 1st Level"
      vars.mobile = "+1 555 424642"
    }
 Define an additional `NotificationCommand` for SMS notifications.
 > **Note**
 >
 > The example is not complete as there are many different SMS providers.
 > Please note that sending SMS notifications will require an SMS provider
 > or local hardware with a SIM card active.
    object NotificationCommand "sms-notification" {
       command = [
         PluginDir + "/send_sms_notification",
         "$mobile$",
         "..."
    }
 The two new notification escalations are added onto the host `localhost`
 and its service `ping4` using the `generic-notification` template.
 The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
 command) after `30m` until `1h`.
 > **Note**
 >
 > The `interval` was set to 15m in the `generic-notification`
 > template example. Lower that value in your escalations by using a secondary
 > template or overriding the attribute directly in the `notifications` array
 > position for `escalation-sms-2nd-level`.
 If the problem does not get resolved or acknowledged preventing further notifications
 the `escalation-sms-1st-level` user will be escalated `1h` after the initial problem was
 notified, but only for one hour (`2h` as `end` key for the `times` dictionary).
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      assign where service.name == "ping4"
    }
    apply Notification "escalation-sms-2nd-level" to Service {
      import "generic-notification"
      command = "sms-notification"
      users = [ "icinga-oncall-2nd-level" ]
      times = {
        begin = 30m
        end = 1h
      }
      assign where service.name == "ping4"
    }
    apply Notification "escalation-sms-1st-level" to Service {
      import "generic-notification"
      command = "sms-notification"
      users = [ "icinga-oncall-1st-level" ]
      times = {
        begin = 1h
        end = 2h
      }
      assign where service.name == "ping4"
    }
 ### <a id="first-notification-delay"></a> First Notification Delay
 Sometimes the problem in question should not be notified when the first notification
 happens, but a defined time duration afterwards. In Icinga 2 you can use the `times`
 dictionary and set `begin = 15m` as key and value if you want to suppress notifications
 in the first 15 minutes. Leave out the `end` key - if not set, Icinga 2 will not check against any
 end time for this notification.
    apply Notification "mail" to Service {
      import "generic-notification"
      command = "mail-notification"
      users = [ "icingaadmin" ]
      times.begin = 15m // delay first notification
      assign where service.name == "ping4"
    }
 ### <a id="notification-filters-state-type"></a> Notification Filters by State and Type
 If there are no notification state and type filter attributes defined at the `Notification`
 or `User` object Icinga 2 assumes that all states and types are being notified.
 Available state and type filters for notifications are:
    template Notification "generic-notification" {
      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
    }
 If you are familiar with Icinga 1.x `notification_options` please note that they have been split
 into type and state, and allow more fine granular filtering for example on downtimes and flapping.
 You can filter for acknowledgements and custom notifications too.
 ## <a id="downtimes"></a> Downtimes
@ -872,100 +965,6 @@ Icinga 2 will clear the acknowledgement when expired and start to
 re-notify if the problem persists.
 ## <a id="dependencies"></a> Dependencies
 Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects
 for determing their network reachability.
 The `parent_host_name` and `parent_service_name` attributes are mandatory for
 service dependencies, `parent_host_name` is required for host dependencies.
 A service can depend on a host, and vice versa. A service has an implicit
 dependency (parent) to its host. A host to host dependency acts implicit
 as host parent relation.
 When dependencies are calculated, not only the immediate parent is taken into
 account but all parents are inherited.
 Notifications are suppressed if a host or service becomes unreachable.
 A common scenario is the Icinga 2 server behind a router. Checking internet
 access by pinging the Google DNS server `google-dns` is a common method, but
 will fail in case the `dsl-router` host is down. Therefore the example below
 defines a host dependency which acts implicit as parent relation too.
 Furthermore the host may be reachable but ping probes are dropped by the
 router's firewall. In case the `dsl-router``ping4` service check fails, all
 further checks for the `ping4` service on host `google-dns` service should
 be suppressed. This is achieved by setting the `disable_checks` attribute to `true`.
    object Host "dsl-router" {
      address = "192.168.1.1"
    }
    object Host "google-dns" {
      address = "8.8.8.8"
    }
    apply Service "ping4" {
      import "generic-service"
      check_command = "ping4"
      assign where host.address
    }
    apply Dependency "internet" to Service {
      parent_host_name = "dsl-router"
      disable_checks = true
      assign where host.name != "dsl-router"
    }
 Another classic example are agent based checks. You would define a health check
 for the agent daemon responding to your requests, and make all other services
 querying that daemon depend on that health check.
 The following configuration defines two nrpe based service checks `nrpe-load`
 and `nrpe-disk` applied to the `nrpe-server`. The health check is defined as
 `nrpe-health` service.
    apply Service "nrpe-health" {
      import "generic-service"
      check_command = "nrpe"
      assign where match("nrpe-*", host.name)
    }
    apply Service "nrpe-load" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_load"
      assign where match("nrpe-*", host.name)
    }
    apply Service "nrpe-disk" {
      import "generic-service"
      check_command = "nrpe"
      vars.nrpe_command = "check_disk"
      assign where match("nrpe-*", host.name)
    }
    object Host "nrpe-server" {
      import "generic-host"
      address = "192.168.1.5",
    }
    apply Dependency "disable-nrpe-checks" to Service {
      parent_service_name = "nrpe-health"
      states = [ Warning, Critical, Unknown ]
      disable_checks = true
      disable_notifications = true
      assign where match("nrpe-*", host.name)
      ignore where service.name == "nrpe-health"
    }
 The `disable-nrpe-checks` dependency is applied to all services
 on the `nrpe-service` host but not the `nrpe-health` service itself.
 ## <a id="custom-attributes"></a> Custom Attributes