# Monitoring Basics

This part of the Icinga 2 documentation provides an overview of all the basic
monitoring concepts you need to know to run Icinga 2.

## Hosts and Services

Icinga 2 can be used to monitor the availability of hosts and services. Hosts
and services can be virtually anything which can be checked in some way:

* Network services (HTTP, SMTP, SNMP, SSH, etc.)
* Printers
* Switches / Routers
* Temperature Sensors
* Other local or network-accessible services

Host objects provide a mechanism to group services that are running
on the same physical device.

Here is an example of a host object and two services that belong to it:

    object Host "my-server1" {
      address = "10.0.0.1"
      check_command = "hostalive"
    }

    object Service "ping4" {
      host_name = "my-server1"
      check_command = "ping4"
    }

    object Service "http" {
      host_name = "my-server1"
      check_command = "http_ip"
    }

The example creates two services `ping4` and `http` which belong to the
host `my-server1`.

It also specifies that the host should perform its own check using the
`hostalive` check command.

The `address` attribute is used by check commands to determine which network
address is associated with the host object.

### Host States

Hosts can be in any of the following states:

Name        | Description
------------|--------------
UP          | The host is available.
DOWN        | The host is unavailable.

### Service States

Services can be in any of the following states:

Name        | Description
------------|--------------
OK          | The service is working properly.
WARNING     | The service is experiencing some problems but is still considered to be in working condition.
CRITICAL    | The service is in a critical state.
UNKNOWN     | The check could not determine the service's state.

### Hard and Soft States

When detecting a problem with a host/service, Icinga re-checks the object a
number of times (based on the `max_check_attempts` and `retry_interval`
settings) before sending notifications.
This ensures that no unnecessary notifications are sent for transient failures.
During this time the object is in a `SOFT` state.

After all re-checks have been executed and the object is still in a non-OK
state, the host/service switches to a `HARD` state and notifications are sent.

Name        | Description
------------|--------------
HARD        | The host/service's state hasn't recently changed.
SOFT        | The host/service has recently changed state and is being re-checked.

## Using Templates

Templates may be used to apply a set of identical attributes to more than one
object:

    template Service "generic-service" {
      max_check_attempts = 3
      check_interval = 5m
      retry_interval = 1m
      enable_perfdata = true
    }

    object Service "ping4" {
      import "generic-service"

      host_name = "localhost"
      check_command = "ping4"
    }

    object Service "ping6" {
      import "generic-service"

      host_name = "localhost"
      check_command = "ping6"
    }

In this example the `ping4` and `ping6` services inherit properties from the
template `generic-service`.

Objects as well as templates themselves can import an arbitrary number of
templates. Attributes inherited from a template can be overridden in the
object if necessary.

## Apply objects based on rules

Instead of assigning each object (`Service`, `Notification`, `Dependency`,
`ScheduledDowntime`) explicitly by attribute identifiers such as `host_name`,
objects can be [applied](#apply) based on rules:

    apply Service "load" {
      import "generic-service"

      check_command = "load"

      assign where "linux-server" in host.groups
      ignore where host.vars.no_load_check
    }

In this example the `load` service will be created as an object for all hosts
in the `linux-server` host group. Hosts which have the `no_load_check` custom
attribute set are ignored.
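
To illustrate how the rule above is evaluated per host, here is a minimal sketch. The host names `web1` and `db1` and their addresses are hypothetical, and the sketch assumes that the `generic-host` template used elsewhere in this documentation is available. Both hosts are members of the `linux-server` host group, so the `assign where` expression matches for both; `db1` additionally sets the custom attribute `no_load_check`, so the `ignore where` expression excludes it and only `web1` should receive a `load` service.

```
// Host group referenced by the assign where expression of the apply rule.
object HostGroup "linux-server" {
  display_name = "Linux Servers"
}

// Hypothetical host: member of "linux-server", so the apply rule
// creates the service "load" for it.
object Host "web1" {
  import "generic-host"

  address = "10.0.0.11"
  groups += [ "linux-server" ]
}

// Hypothetical host: also a member, but vars.no_load_check is set,
// so the ignore where expression matches and no "load" service is created.
object Host "db1" {
  import "generic-host"

  address = "10.0.0.12"
  groups += [ "linux-server" ]
  vars.no_load_check = true
}
```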

Notifications are applied to specific targets (`Host` or `Service`) and work
in a similar manner:

    apply Notification "mail-noc" to Service {
      import "mail-service-notification"
      command = "mail-service-notification"
      user_groups = [ "noc" ]

      assign where service.vars.sla == "24x7"
    }

In this example the `mail-noc` notification will be created as an object for
all services having the `sla` custom attribute set to `24x7`. The notification
command is set to `mail-service-notification` and all members of the user group
`noc` will get notified.

`Dependency` and `ScheduledDowntime` objects can be applied in a similar fashion.

## Groups

Groups combine hosts, services, and users so that they can be referenced as a
whole in configuration attributes and viewed together in external (web)
interfaces.

Group membership is defined at the respective object itself. If you have a
hostgroup named `windows`, for example, and want to assign specific hosts to
this group so that you can later view the group on your alert dashboard, first
create the hostgroup:

    object HostGroup "windows" {
      display_name = "Windows Servers"
    }

Then add your hosts to this hostgroup:

    template Host "windows-server" {
      groups += [ "windows" ]
    }

    object Host "mssql-srv1" {
      import "windows-server"

      vars.mssql_port = 1433
    }

    object Host "mssql-srv2" {
      import "windows-server"

      vars.mssql_port = 1433
    }

This can be done for service and user groups the same way. Additionally, user
groups can be assigned to `Notification` objects using the `user_groups`
attribute.

    object UserGroup "windows-mssql-admins" {
      display_name = "Windows MSSQL Admins"
    }

    template User "generic-windows-mssql-users" {
      groups += [ "windows-mssql-admins" ]
    }

    object User "win-mssql-noc" {
      import "generic-windows-mssql-users"

      email = "noc@example.com"
    }

    object User "win-mssql-ops" {
      import "generic-windows-mssql-users"

      email = "ops@example.com"
    }

### Group Membership Assign

If a number of hosts, services or users match a common pattern, it is more
convenient to define their membership in the group object itself.
Details on the `assign where` syntax can be found [here](#group-assign).

    object HostGroup "mssql" {
      display_name = "MSSQL Servers"
      assign where host.vars.mssql_port
    }

In this example, which builds on the hosts defined above, all hosts with the
custom attribute `mssql_port` set will be added as members of the host group
`mssql`.

## Time Periods

Time periods define time ranges in Icinga during which event actions are
triggered: for example, whether a service check is executed or not, as
controlled by the `check_period` attribute, or whether a notification is sent
to users or not, as filtered by the `period` and `notification_period`
configuration attributes of `Notification` and `User` objects.

> **Note**
>
> If you are familiar with Icinga 1.x, these time period definitions
> are called `legacy timeperiods` in Icinga 2.
>
> An Icinga 2 legacy timeperiod requires the ITL-provided template
> `legacy-timeperiod`.

The `TimePeriod` attribute `ranges` may contain multiple directives,
including weekdays, days of the month, and calendar dates.
These types may overlap and override each other in your `ranges` dictionary.

The descending order of precedence is as follows:

* Calendar date (2008-01-01)
* Specific month date (January 1st)
* Generic month date (Day 15)
* Offset weekday of specific month (2nd Tuesday in December)
* Offset weekday (3rd Monday)
* Normal weekday (Tuesday)

If you don't set any `check_period` or `notification_period` attribute on
your configuration objects, Icinga 2 assumes the `24x7` time period shown
below.

    object TimePeriod "24x7" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 24x7 TimePeriod"
      ranges = {
        "monday"    = "00:00-24:00"
        "tuesday"   = "00:00-24:00"
        "wednesday" = "00:00-24:00"
        "thursday"  = "00:00-24:00"
        "friday"    = "00:00-24:00"
        "saturday"  = "00:00-24:00"
        "sunday"    = "00:00-24:00"
      }
    }

If your operations staff should only be notified during working hours, create
a new timeperiod named `workhours` defining a work day from 09:00 to 17:00.

    object TimePeriod "workhours" {
      import "legacy-timeperiod"

      display_name = "Icinga 2 8x5 TimePeriod"
      ranges = {
        "monday"    = "09:00-17:00"
        "tuesday"   = "09:00-17:00"
        "wednesday" = "09:00-17:00"
        "thursday"  = "09:00-17:00"
        "friday"    = "09:00-17:00"
      }
    }

Use the `period` attribute to assign time periods to `Notification` and
`Dependency` objects:

    object Notification "mail" {
      import "generic-notification"

      host_name = "localhost"

      command = "mail-notification"
      users = [ "icingaadmin" ]
      period = "workhours"
    }

## Commands

Icinga 2 uses three different command object types to specify how checks
should be performed (`CheckCommand`), how notifications should be sent
(`NotificationCommand`), and how events should be handled (`EventCommand`).

### Environment Variables for Commands

Please check [Runtime Custom Attributes as Environment Variables](#runtime-custom-attribute-env-vars).

### Check Commands

`CheckCommand` objects define the command line used to call a check.

`CheckCommand` objects require the [ITL template](#itl-plugin-check-command)
`plugin-check-command` to support native plugin based check methods.

Unless you have done so already, download your check plugin and put it
into the `PluginDir` directory. The following example uses the
`check_disk` plugin shipped with the Monitoring Plugins package.

The plugin path and all command arguments are specified as a list of
double-quoted strings to ensure proper shell escaping.

Call the `check_disk` plugin with the `--help` parameter to see all available
options. Our example defines warning (`-w`) and critical (`-c`) thresholds for
the disk usage. Without any partition defined (`-p`) it will check all local
partitions.

Define the default check command custom attributes `disk_wfree` and
`disk_cfree` (freely definable naming schema) and their default threshold
values. You can then use these custom attributes as runtime macros on the
command line.

The default custom attributes can be overridden by the custom attributes
defined in the service using the check command `disk`.
The custom attributes can also be inherited from a parent template using
additive inheritance (`+=`).

    object CheckCommand "disk" {
      import "plugin-check-command"

      command = [
        PluginDir + "/check_disk",
        "-w", "$disk_wfree$%",
        "-c", "$disk_cfree$%"
      ]

      vars.disk_wfree = 20
      vars.disk_cfree = 10
    }

The host `localhost` with the service `disk` checks all disks with modified
custom attributes (warning thresholds at `10%`, critical thresholds at `5%`
free disk space).

    object Host "localhost" {
      import "generic-host"

      address = "127.0.0.1"
      address6 = "::1"
    }

    object Service "disk" {
      import "generic-service"
      host_name = "localhost"
      check_command = "disk"

      vars.disk_wfree = 10
      vars.disk_cfree = 5
    }

### Notification Commands

`NotificationCommand` objects define how notifications are delivered to
external interfaces (email, XMPP, IRC, Twitter, etc.).

`NotificationCommand` objects require the [ITL template](#itl-plugin-notification-command)
`plugin-notification-command` to support native plugin-based notifications.

If you want to specify default values for some of the custom attribute
definitions, you can add a `vars` dictionary as shown for the `CheckCommand`
object.

Below is an example using runtime macros from Icinga 2 (such as
`$service.output$` for the current check output) to send an email to the
user(s) associated with the notification itself (`$user.email$`).

    object NotificationCommand "mail-service-notification" {
      import "plugin-notification-command"

      command = [ SysconfDir + "/icinga2/scripts/mail-notification.sh" ]

      env = {
        "NOTIFICATIONTYPE" = "$notification.type$"
        "SERVICEDESC" = "$service.name$"
        "HOSTALIAS" = "$host.display_name$"
        "HOSTADDRESS" = "$address$"
        "SERVICESTATE" = "$service.state$"
        "LONGDATETIME" = "$icinga.long_date_time$"
        "SERVICEOUTPUT" = "$service.output$"
        "NOTIFICATIONAUTHORNAME" = "$notification.author$"
        "NOTIFICATIONCOMMENT" = "$notification.comment$"
        "HOSTDISPLAYNAME" = "$host.display_name$"
        "SERVICEDISPLAYNAME" = "$service.display_name$"
        "USEREMAIL" = "$user.email$"
      }
    }

The `command` attribute of the `mail-service-notification` command refers to
the following shell script. The macros specified in the `env` dictionary are
exported as environment variables and can be used in the notification script:

    #!/usr/bin/env bash
    template=$(cat <