# Monitoring Basics

This part of the Icinga 2 documentation provides an overview of all the basic
monitoring concepts you need to know to run Icinga 2.

## Hosts and Services

Icinga 2 can be used to monitor the availability of hosts and services. Hosts
and services can be virtually anything which can be checked in some way:

* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Printers
* Switches / routers
* Temperature sensors
* Other local or network-accessible services

Host objects provide a mechanism to group services that are running
on the same physical device.

Here is an example of a host object which defines two child services:

    object Host "my-server1" {
      address = "10.0.0.1"
      check_command = "hostalive"
    }

    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }

    object Service "http" {
      host_name = "my-server1"
      check_command = "http"
    }

The example creates two services `ping4` and `http` which belong to the
host `my-server1`.

It also specifies that the host should perform its own check using the `hostalive`
check command.

The `address` attribute is used by check commands to determine which network
address is associated with the host object.

Details on troubleshooting check problems can be found [here](#troubleshooting).

### Host States

Hosts can be in any of the following states:

  Name        | Description
  ------------|--------------
  UP          | The host is available.
  DOWN        | The host is unavailable.

### Service States

Services can be in any of the following states:

  Name        | Description
  ------------|--------------
  OK          | The service is working properly.
  WARNING     | The service is experiencing some problems but is still considered to be in working condition.
  CRITICAL    | The service is in a critical state.
  UNKNOWN     | The check could not determine the service's state.


### Hard and Soft States

When detecting a problem with a host/service, Icinga re-checks the object a number of
times (based on the `max_check_attempts` and `retry_interval` settings) before sending
notifications. This ensures that no unnecessary notifications are sent for
transient failures. During this time the object is in a `SOFT` state.

After all re-checks have been executed and the object is still in a non-OK
state, the host/service switches to a `HARD` state and notifications are sent.

  Name        | Description
  ------------|--------------
  HARD        | The host/service's state hasn't recently changed.
  SOFT        | The host/service has recently changed state and is being re-checked.

### Host and Service Checks

Hosts and services determine their state from a check result that a check
execution returns to the Icinga 2 application. By default the `generic-host` example template
defines `hostalive` as the host check. If your host cannot be reached by ping, you should
consider using a different check command, for instance the `http` check command, or, if
there is no check available, the `dummy` check command.

    object Host "uncheckable-host" {
      check_command = "dummy"
      vars.dummy_state = 0 // exit code 0, reported as UP
      vars.dummy_text = "Pretending to be OK."
    }

Service checks could also use a `dummy` check, but the common strategy is to
[integrate an existing plugin](#command-plugin-integration) as
[check command](#check-commands) and [reference](#command-passing-parameters)
that in your [Service](#objecttype-service) object definition.

## Configuration Best Practice

The [Getting Started](#getting-started) chapter already introduced various aspects
of the Icinga 2 configuration language. If you are ready to configure additional
hosts, services, notifications, dependencies, etc., you should think about the
requirements first and then decide on a strategy.

There are many ways of creating Icinga 2 configuration objects:

* Manually with your preferred editor, for example vi(m), nano, notepad, etc.
* Generated by a configuration management tool such as Puppet, Chef, Ansible, etc.
* A configuration addon for Icinga 2
* A custom exporter script from your CMDB or inventory tool
* Your own approach.

In order to find the best strategy for your own configuration, ask yourself the following questions:

* Do your hosts share a common group of services (for example Linux hosts with disk, load, etc. checks)?
* Does only a small set of users receive notifications and escalations for all hosts/services?

If you can answer at least one of these questions with yes, look into the
[apply rules](#using-apply) logic instead of defining objects on a per-host and per-service basis.

* Are you required to define specific configuration for each host/service?
* Does your configuration generation tool already know about the host-service relationship?

Then you should use the object-specific configuration settings, such as `host_name`, accordingly.

Finding the best files and directory tree for your configuration is up to you. Make sure that
the [icinga2.conf](#icinga2-conf) configuration file includes them, and then think about:

* A tree based on locations, hostgroups, or specific host attributes, with sub-levels of directories.
* Flat `hosts.conf`, `services.conf`, etc. files for rule-based configuration.
* Generated configuration with one file per host and a global configuration for groups, users, etc.
* One big file generated from an external application (probably a bad idea for maintaining changes).
* Your own layout.

Whichever strategy you choose, you should additionally check the following:

* Are there any specific attributes describing the host/service you could set as `vars` custom attributes?
You can later use them for applying assign/ignore rules, or export them into external interfaces.
* Put hosts into hostgroups, services into servicegroups and use these attributes for your apply rules.
* Use templates to store generic attributes for your objects and apply rules, making your configuration more readable.
Details can be found in the [using templates](#using-templates) chapter.
* Apply rules may overlap. Keep a central place (for example, `services.conf` or `notifications.conf`) storing
the configuration instead of defining apply rules deep in your configuration tree.
* Every plugin used as check, notification or event command requires a `Command` definition.
Further details can be looked up in the [check commands](#check-commands) chapter.

If you have further questions, do not hesitate to join the [community support channels](https://support.icinga.org)
and ask community members for their experience and best practices.

### Object Inheritance Using Templates

Templates may be used to apply a set of identical attributes to more than one
object:

    template Service "generic-service" {
      max_check_attempts = 3
      check_interval = 5m
      retry_interval = 1m
      enable_perfdata = true
    }

    object Service "ping4" {
      import "generic-service"

      host_name = "localhost"
      check_command = "ping4"
    }

    object Service "ping6" {
      import "generic-service"

      host_name = "localhost"
      check_command = "ping6"
    }

In this example the `ping4` and `ping6` services inherit properties from the
template `generic-service`.

Objects as well as templates themselves can import an arbitrary number of
templates. Attributes inherited from a template can be overridden in the
object if necessary.

### Apply objects based on rules

Instead of assigning each object (`Service`, `Notification`, `Dependency`, `ScheduledDowntime`)
based on attribute identifiers, for example `host_name`, objects can be [applied](#apply).

Detailed scenario examples are used in their respective chapters, for example
[apply services with custom command arguments](#using-apply-services-command-arguments).

#### Apply Services to Hosts

    apply Service "load" {
      import "generic-service"

      check_command = "load"

      assign where "linux-server" in host.groups
      ignore where host.vars.no_load_check
    }

In this example the `load` service will be created as object for all hosts in the `linux-server`
host group. If the `no_load_check` custom attribute is set, the host will be
ignored.

#### Apply Notifications to Hosts and Services

Notifications are applied to specific targets (`Host` or `Service`) and work in a similar
manner:

    apply Notification "mail-noc" to Service {
      import "mail-service-notification"

      command = "mail-service-notification"
      user_groups = [ "noc" ]

      assign where service.vars.sla == "24x7"
    }

In this example the `mail-noc` notification will be created as object for all services having the
`sla` custom attribute set to `24x7`. The notification command is set to `mail-service-notification`
and all members of the user group `noc` will get notified.

#### Apply Dependencies to Hosts and Services

Detailed examples can be found in the [dependencies](#dependencies) chapter.

#### Apply Recurring Downtimes to Hosts and Services

Detailed examples can be found in the [recurring downtimes](#recurring-downtimes) chapter.

### Groups

Groups are used for combining hosts, services, and users into
accessible configuration attributes and views in external (web)
interfaces.

Group membership is defined at the respective object itself. If
you have a hostgroup named `windows`, for example, and want to assign
specific hosts to this group for later viewing the group on your
alert dashboard, first create the hostgroup:

    object HostGroup "windows" {
      display_name = "Windows Servers"
    }

Then add your hosts to this hostgroup:

    template Host "windows-server" {
      groups += [ "windows" ]
    }

    object Host "mssql-srv1" {
      import "windows-server"

      vars.mssql_port = 1433
    }

    object Host "mssql-srv2" {
      import "windows-server"

      vars.mssql_port = 1433
    }

This can be done for service and user groups the same way.
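
As a minimal sketch of the same pattern for services, the following defines a service group and a template that adds its importers to that group. The group name `web`, the template name `web-service` and the display name are hypothetical and only serve as an illustration; the `http` service on `my-server1` is taken from the first example in this chapter.

```
// Hypothetical service group, analogous to the "windows" host group above
object ServiceGroup "web" {
  display_name = "Web Services"
}

// Every service importing this template becomes a member of "web"
template Service "web-service" {
  groups += [ "web" ]
}

object Service "http" {
  import "web-service"

  host_name = "my-server1"
  check_command = "http"
}
```

Using `+=` instead of `=` keeps any group memberships that were inherited from other imported templates.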
Additionally the user groups are associated as attributes in `Notification` objects.

    object UserGroup "windows-mssql-admins" {
      display_name = "Windows MSSQL Admins"
    }

    template User "generic-windows-mssql-users" {
      groups += [ "windows-mssql-admins" ]
    }

    object User "win-mssql-noc" {
      import "generic-windows-mssql-users"

      email = "noc@example.com"
    }

    object User "win-mssql-ops" {
      import "generic-windows-mssql-users"

      email = "ops@example.com"
    }

#### Group Membership Assign

If a number of hosts, services, or users match a pattern,
it is reasonable to let the group object assign these members itself.
Details on the `assign where` syntax can be found [here](#apply).

    object HostGroup "mssql" {
      display_name = "MSSQL Servers"
      assign where host.vars.mssql_port
    }

In this example, which builds on the hosts defined above, all hosts with the `vars` attribute `mssql_port`
set will be added as members to the host group `mssql`.

## Notifications

Notifications for service and host problems are an integral part of your
monitoring setup.

When a host or service is in a downtime, a problem has been acknowledged or
the dependency logic determined that the host/service is unreachable, no
notifications are sent. You can configure additional type and state filters
refining the notifications being actually sent.

There are many ways of sending notifications, e.g. by e-mail, XMPP,
IRC, Twitter, etc. On its own Icinga 2 does not know how to send notifications.
Instead it relies on external mechanisms such as shell scripts to notify users.

A notification specification requires one or more users (and/or user groups)
who will be notified in case of problems. These users must have all custom
attributes defined which will be used in the `NotificationCommand` on execution.

The user `icingaadmin` in the example below will get notified only on `WARNING` and
`CRITICAL` problems. In addition, `recovery` notifications are sent; they require
the `OK` state in the `states` filter.

    object User "icingaadmin" {
      display_name = "Icinga 2 Admin"
      enable_notifications = true
      states = [ OK, Warning, Critical ]
      types = [ Problem, Recovery ]
      email = "icinga@localhost"
    }

If you don't set the `states` and `types` configuration attributes for the `User`
object, notifications for all states and types will be sent.

Details on troubleshooting notification problems can be found [here](#troubleshooting).

> **Note**
>
> Make sure that the [notification](#features) feature is enabled on your master instance
> in order to execute notification commands.

You should choose which information you (and your notified users) are interested in
case of emergency, and also which information does not provide any value to you and
your environment.

An example notification command is explained [here](#notification-commands).

You can add all shared attributes to a `Notification` template which is inherited
by the defined notifications. That way you avoid duplicating attributes in each
`Notification` object. Attributes can be overridden locally.

    template Notification "generic-notification" {
      interval = 15m

      command = "mail-service-notification"

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]

      period = "24x7"
    }

The time period `24x7` is shipped as example configuration with Icinga 2.

Use the `apply` keyword to create `Notification` objects for your services:

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      assign where service.name == "mysql"
    }

Instead of assigning users to notifications, you can also add the `user_groups`
attribute with a list of user groups to the `Notification` object. Icinga 2 will
send notifications to all group members.

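
The group-based variant described above could look like the following sketch. It reuses the `windows-mssql-admins` user group from the Groups section and the `generic-notification` template; the notification name `mail-mssql-admins` is hypothetical.

```
apply Notification "mail-mssql-admins" to Service {
  import "generic-notification"

  command = "mail-notification"

  // All members of this user group are notified;
  // no individual "users" list is required.
  user_groups = [ "windows-mssql-admins" ]

  assign where service.name == "mysql"
}
```

Adding or removing a user from the group then changes who is notified without touching the `Notification` object itself.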

### Notification Escalations

When a problem notification is sent and a problem still exists at the time of re-notification
you may want to escalate the problem to the next support level. A different approach
is to configure the default notification by email, and escalate the problem via SMS
if not already solved.

You can define notification start and end times as additional configuration
attributes making the `Notification` object a so-called `notification escalation`.
Using templates you can share the basic notification attributes such as users or the
`interval` (and override them for the escalation then).

Using the example from above, you can define additional users being escalated for SMS
notifications between start and end time.

    object User "icinga-oncall-2nd-level" {
      display_name = "Icinga 2nd Level"

      vars.mobile = "+1 555 424642"
    }

    object User "icinga-oncall-1st-level" {
      display_name = "Icinga 1st Level"

      vars.mobile = "+1 555 424642"
    }

Define an additional `NotificationCommand` for SMS notifications.

> **Note**
>
> The example is not complete as there are many different SMS providers.
> Please note that sending SMS notifications will require an SMS provider
> or local hardware with a SIM card active.

    object NotificationCommand "sms-notification" {
       command = [
         PluginDir + "/send_sms_notification",
         "$mobile$",
         "..."
       ]
    }

The two new notification escalations are added onto the host `localhost`
and its service `ping4` using the `generic-notification` template.
The user `icinga-oncall-2nd-level` will get notified by SMS (`sms-notification`
command) after `30m` until `1h`.

> **Note**
>
> The `interval` was set to 15m in the `generic-notification`
> template example. Lower that value in your escalations by using a secondary
> template or by overriding the attribute directly in the
> `escalation-sms-2nd-level` notification.


If the problem is neither resolved nor acknowledged (which would prevent further
notifications), the `icinga-oncall-1st-level` user will be notified through the
`escalation-sms-1st-level` notification `1h` after the initial problem was
notified, but only for one hour (`2h` as `end` key for the `times` dictionary).

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-2nd-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-2nd-level" ]

      times = {
        begin = 30m
        end = 1h
      }

      assign where service.name == "ping4"
    }

    apply Notification "escalation-sms-1st-level" to Service {
      import "generic-notification"

      command = "sms-notification"
      users = [ "icinga-oncall-1st-level" ]

      times = {
        begin = 1h
        end = 2h
      }

      assign where service.name == "ping4"
    }

### Notification Delay

Sometimes a problem should not be notified as soon as the notification is due
(the object reaching the `HARD` state), but only after a defined time has passed.
In Icinga 2 you can use the `times` dictionary and set `begin = 15m` as key and value if
you want to postpone the first notification for 15 minutes. Leave out the `end` key - if not set,
Icinga 2 will not check against any end time for this notification.

    apply Notification "mail" to Service {
      import "generic-notification"

      command = "mail-notification"
      users = [ "icingaadmin" ]

      times.begin = 15m // delay first notification

      assign where service.name == "ping4"
    }

### Notification Filters by State and Type

If there are no notification state and type filter attributes defined at the `Notification`
or `User` object, Icinga 2 assumes that all states and types are being notified.

Available state and type filters for notifications are:

    template Notification "generic-notification" {

      states = [ Warning, Critical, Unknown ]
      types = [ Problem, Acknowledgement, Recovery, Custom, FlappingStart,
                FlappingEnd, DowntimeStart, DowntimeEnd, DowntimeRemoved ]
    }

If you are familiar with Icinga 1.x `notification_options`, please note that they have been split
into type and state filters to allow more fine-grained filtering, for example on downtimes and flapping.
You can filter for acknowledgements and custom notifications too.

## Time Periods

Time periods define time ranges in Icinga during which event actions are
triggered, for example whether a service check is executed or not within
the `check_period` attribute, or whether a notification should be sent to
users or not, filtered by the `period` and `notification_period`
configuration attributes for `Notification` and `User` objects.

> **Note**
>
> If you are familiar with Icinga 1.x - these time period definitions
> are called `legacy timeperiods` in Icinga 2.
>
> An Icinga 2 legacy timeperiod requires the `ITL` provided template
> `legacy-timeperiod`.

The `TimePeriod` attribute `ranges` may contain multiple directives,
including weekdays, days of the month, and calendar dates.
These types may overlap/override other types in your ranges dictionary.

The descending order of precedence is as follows:

* Calendar date (2008-01-01)
* Specific month date (January 1st)
* Generic month date (Day 15)
* Offset weekday of specific month (2nd Tuesday in December)
* Offset weekday (3rd Monday)
* Normal weekday (Tuesday)

If you don't set any `check_period` or `notification_period` attribute
on your configuration objects, Icinga 2 assumes `24x7` as time period
as shown below.

    object TimePeriod "24x7" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 24x7 TimePeriod"
      ranges = {
        "monday"    = "00:00-24:00"
        "tuesday"   = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday"  = "00:00-24:00"
        "friday"    = "00:00-24:00"
        "saturday"  = "00:00-24:00"
        "sunday"    = "00:00-24:00"
      }
    }

If your operations staff should only be notified during work hours,
create a new timeperiod named `workhours` defining a work day from
09:00 to 17:00.

    object TimePeriod "workhours" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 8x5 TimePeriod"
      ranges = {
        "monday"    = "09:00-17:00"
        "tuesday"   = "09:00-17:00"
        "wednesday" = "09:00-17:00"
        "thursday"  = "09:00-17:00"
        "friday"    = "09:00-17:00"
      }
    }

Use the `period` attribute to assign time periods to
`Notification` and `Dependency` objects:

    object Notification "mail" {
      import "generic-notification"

      host_name = "localhost"

      command = "mail-notification"
      users = [ "icingaadmin" ]
      period = "workhours"
    }

## Commands

Icinga 2 uses three different command object types to specify how
checks should be performed, notifications should be sent, and
events should be handled.

### Environment Variables for Commands

Please check [Runtime Custom Attributes as Environment Variables](#runtime-custom-attribute-env-vars).

### Check Commands

`CheckCommand` objects define the command line used to call a check.

> **Note**
>
> Make sure that the [checker](#features) feature is enabled in order to
> execute checks.

#### Integrate the Plugin with a CheckCommand Definition

`CheckCommand` objects require the [ITL template](#itl-plugin-check-command)
`plugin-check-command` to support native plugin based check methods.

Unless you have done so already, download your check plugin and put it
into the `PluginDir` directory. The following example uses the
`check_disk` plugin shipped with the Monitoring Plugins package.

The plugin path and all command arguments are specified as a list of
double-quoted string arguments for proper shell escaping.

Call the `check_disk` plugin with the `--help` parameter to see
all available options. Our example defines warning (`-w`) and
critical (`-c`) thresholds for the disk usage. Without any
partition defined (`-p`) it will check all local partitions.

    icinga@icinga2 $ /usr/lib/nagios/plugins/check_disk --help
    ...
    This plugin checks the amount of used disk space on a mounted file system
    and generates an alert if free space is less than one of the threshold values


    Usage:
     check_disk -w limit -c limit [-W limit] [-K limit] {-p path | -x device}
    [-C] [-E] [-e] [-f] [-g group ] [-k] [-l] [-M] [-m] [-R path ] [-r path ]
    [-t timeout] [-u unit] [-v] [-X type] [-N type]
    ...

> **Note**
>
> Don't execute plugins as `root` and always use the absolute path to the plugin! Trust us.

The next step is to understand how command parameters are passed from
a host or service object, and to add a `CheckCommand` definition based on these
required parameters and/or default values.

#### Passing Check Command Parameters from Host or Service

Unlike in Icinga 1.x, check command parameters are defined as custom attributes
which can be accessed as runtime macros by the executed check command.

Define the default check command custom attributes `disk_wfree` and `disk_cfree`
(the naming schema is freely definable) and their default threshold values. You can
then use these custom attributes as runtime macros for [command arguments](#command-arguments)
on the command line.

The default custom attributes can be overridden by the custom attributes
defined in the service using the check command `my-disk`. The custom attributes
can also be inherited from a parent template using additive inheritance (`+=`).

    object CheckCommand "my-disk" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_disk" ]

      arguments = {
        "-w" = "$disk_wfree$%"
        "-c" = "$disk_cfree$%"
      }

      vars.disk_wfree = 20
      vars.disk_cfree = 10
    }

The host `localhost` with the service `my-disk` checks all disks with modified
custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
free disk space).

    object Host "localhost" {
      import "generic-host"

      address = "127.0.0.1"
      address6 = "::1"
    }

    object Service "my-disk" {
      import "generic-service"

      host_name = "localhost"
      check_command = "my-disk"

      vars.disk_wfree = 10
      vars.disk_cfree = 5
    }

#### Command Arguments

When a check command line is defined using the `command` attribute, Icinga 2
resolves all macros in the static string or array. Sometimes it is
required to extend the arguments list based on a condition evaluated
at command execution, or to make arguments optional so that they are only set if the
macro value can be resolved by Icinga 2.

    object CheckCommand "check_http" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_http" ]

      arguments = {
        "-H" = "$http_vhost$"
        "-I" = "$http_address$"
        "-u" = "$http_uri$"
        "-p" = "$http_port$"
        "-S" = {
          set_if = "$http_ssl$"
        }
        "--sni" = {
          set_if = "$http_sni$"
        }
        "-a" = {
          value = "$http_auth_pair$"
          description = "Username:password on sites with basic authentication"
        }
        "--no-body" = {
          set_if = "$http_ignore_body$"
        }
        "-r" = "$http_expect_body_regex$"
        "-w" = "$http_warn_time$"
        "-c" = "$http_critical_time$"
        "-e" = "$http_expect$"
      }

      vars.http_address = "$address$"
      vars.http_ssl = false
      vars.http_sni = false
    }

The example shows the `check_http` check command defining the most common
arguments. Each of them is optional by default and will be omitted if
the value is not set. For example, if the service calling the check command
does not have `vars.http_port` set, the `-p` argument won't get added to the command line.
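
To illustrate the optional arguments, here is a minimal sketch of a service that uses the `check_http` command defined above. The service name `http-8080`, the port and the URI are hypothetical values chosen for this example; the `generic-service` template and the `localhost` host come from the earlier examples.

```
object Service "http-8080" {
  import "generic-service"

  host_name = "localhost"
  check_command = "check_http"

  // Only arguments whose macros resolve are passed to the plugin.
  vars.http_port = 8080      // resolves $http_port$, so "-p" is added
  vars.http_uri = "/status"  // resolves $http_uri$, so "-u" is added

  // http_vhost, http_warn_time, etc. are not set here,
  // so "-H", "-w" and the other unset arguments are omitted.
}
```

The `-I` argument is still passed because the command itself sets `vars.http_address = "$address$"` as a default.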

If the `vars.http_ssl` custom attribute is set in the service, host or command
object definition, Icinga 2 will add the `-S` argument based on the `set_if`
option to the command line.

That way you can use a single `check_http` command definition for checks both with and
without SSL enabled, which saves you from duplicating command definitions.

Details on all available options can be found in the
[CheckCommand object definition](#objecttype-checkcommand).

### Apply Services with custom Command Arguments

Imagine the following scenario: the `my-host1` host is reachable using the default port 22, while
the `my-host2` host requires a different port, 2222. Both hosts are in the hostgroup `my-linux-servers`.

    object HostGroup "my-linux-servers" {
      display_name = "Linux Servers"
      assign where host.vars.os == "Linux"
    }

    /* this one has port 22 opened */
    object Host "my-host1" {
      import "generic-host"
      address = "129.168.1.50"
      vars.os = "Linux"
    }

    /* this one listens on a different ssh port */
    object Host "my-host2" {
      import "generic-host"
      address = "129.168.2.50"
      vars.os = "Linux"
      vars.custom_ssh_port = 2222
    }

All hosts in the `my-linux-servers` hostgroup should get the `my-ssh` service applied based on an
[apply rule](#apply). The optional `ssh_port` command argument should be inherited from the host
the service is applied to. If not set, the check command `my-ssh` will omit the argument.

    object CheckCommand "my-ssh" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_ssh" ]

      arguments = {
        "-p" = "$ssh_port$"
        "host" = {
          value = "$ssh_address$"
          skip_key = true
          order = -1
        }
      }

      vars.ssh_address = "$address$"
    }

    /* apply ssh service */
    apply Service "my-ssh" {
      import "generic-service"
      check_command = "my-ssh"

      //set the command argument for ssh port with a custom host attribute, if set
      vars.ssh_port = "$host.vars.custom_ssh_port$"

      assign where "my-linux-servers" in host.groups
    }

The `my-host1` host will get the `my-ssh` service checking on the default port:

    [2014-05-26 21:52:23 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '129.168.1.50': PID 27281

The `my-host2` host will pass its `custom_ssh_port` variable on to the service and execute a different command:

    [2014-05-26 21:51:32 +0200] notice/Process: Running command '/usr/lib/nagios/plugins/check_ssh', '-p', '2222', '129.168.2.50': PID 26956

### Notification Commands

`NotificationCommand` objects define how notifications are delivered to external
interfaces (E-Mail, XMPP, IRC, Twitter, etc).

`NotificationCommand` objects require the [ITL template](#itl-plugin-notification-command)
`plugin-notification-command` to support native plugin-based notifications.

> **Note**
>
> Make sure that the [notification](#features) feature is enabled on your master instance
> in order to execute notification commands.

Below is an example using runtime macros from Icinga 2 (such as `$service.output$` for
the current check output) sending an email to the user(s) associated with the
notification itself (`$user.email$`).

If you want to specify default values for some of the custom attribute definitions,
you can add a `vars` dictionary as shown for the `CheckCommand` object.

    object NotificationCommand "mail-service-notification" {
      import "plugin-notification-command"

      command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]

      env = {
        NOTIFICATIONTYPE = "$notification.type$"
        SERVICEDESC = "$service.name$"
        HOSTALIAS = "$host.display_name$"
        HOSTADDRESS = "$address$"
        SERVICESTATE = "$service.state$"
        LONGDATETIME = "$icinga.long_date_time$"
        SERVICEOUTPUT = "$service.output$"
        NOTIFICATIONAUTHORNAME = "$notification.author$"
        NOTIFICATIONCOMMENT = "$notification.comment$"
        HOSTDISPLAYNAME = "$host.display_name$"
        SERVICEDISPLAYNAME = "$service.display_name$"
        USEREMAIL = "$user.email$"
      }
    }

The command attribute in the `mail-service-notification` command refers to the following
shell script. The macros specified in the `env` dictionary are exported
as environment variables and can be used in the notification script:

    #!/usr/bin/env bash
    template=$(cat <