# <a id="advanced-topics"></a> Advanced Topics

## <a id="downtimes"></a> Downtimes

Downtimes can be scheduled for planned server maintenance or
any other targeted service outage you are aware of in advance.

Downtimes suppress all notifications and may trigger other
downtimes too. If a downtime was set by accident, or it lasts longer
than the actual maintenance, you can manually cancel it.
Planned downtimes are also taken into account by SLA reporting
tools which calculate SLAs based on the state and downtime history.

> **Note**
>
> Downtimes may overlap with their start and end times. If there
> are multiple downtimes triggered for one object, the overall downtime depth
> will be greater than `1`. This is useful when you need to extend
> a maintenance window that takes longer than expected.

### <a id="fixed-flexible-downtimes"></a> Fixed and Flexible Downtimes

A `fixed` downtime will be activated at the defined start time and
removed at the end time. If the service state changes to `NOT-OK` during
this time window, the downtime is actually triggered:
notifications are suppressed and the downtime depth is incremented.

Common scenarios are a planned distribution upgrade on your Linux
servers, or database updates in your warehouse. The customer knows
about a fixed downtime window between 23:00 and 24:00; after 24:00
all problems should be alerted again. The solution is simple:
schedule a `fixed` downtime starting at 23:00 and ending at 24:00.

Unlike a `fixed` downtime, a `flexible` downtime does not necessarily
end at the provided end time. Instead the downtime is triggered
by a state change within the time span defined by start and end time, and
then lasts for the defined duration in minutes.

Imagine the following scenario: your service is frequently polled
by users trying to grab deleted domains that have become free for immediate registration.
Between 07:30 and 08:00 the impact hits for 15 minutes and causes
a network outage visible to the monitoring. The service is still alive,
but answers Icinga 2 service checks too slowly.
For that reason, you may want to schedule a downtime between 07:30 and
08:00 with a duration of 15 minutes. The downtime will then last from
its trigger time until the duration is over. After that the downtime
is removed (which may happen before or after the actual end time!).

### <a id="scheduling-downtime"></a> Scheduling a Downtime

A downtime can be scheduled either through a web interface (Icinga 1.x Classic UI or Web)
or by using the external command pipe provided by the `ExternalCommandListener`
configuration.

Fixed downtimes require a start and end time (a given duration will be ignored).
Flexible downtimes require a start and end time for the time span, plus a duration
independent of that time span.
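
The following minimal sketch schedules downtimes through the external command pipe.
It assumes the default command pipe path `/var/run/icinga2/cmd/icinga2.cmd` (adjust it to
your `ExternalCommandListener` configuration); the host name `localhost`, the service name
`load` and the timestamps are examples only, and the duration in the external command is
given in seconds:

    # fixed service downtime: the fixed flag is 1, the duration field is ignored
    echo "[`date +%s`] SCHEDULE_SVC_DOWNTIME;localhost;load;1388530800;1388534400;1;0;0;icingaadmin;Planned maintenance" \
        >> /var/run/icinga2/cmd/icinga2.cmd

    # flexible service downtime: the fixed flag is 0 and the duration is 15 minutes (900 seconds)
    echo "[`date +%s`] SCHEDULE_SVC_DOWNTIME;localhost;load;1388530800;1388534400;0;0;900;icingaadmin;Planned maintenance" \
        >> /var/run/icinga2/cmd/icinga2.cmd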

> **Note**
>
> Modern web interfaces treat services in a downtime as `handled`.

### <a id="triggered-downtimes"></a> Triggered Downtimes

Triggering is optional when scheduling a downtime. If there is already a downtime
scheduled for a future maintenance, the current downtime can be triggered by
that downtime. This is useful if you have scheduled a host downtime and
are now scheduling a child host's downtime which gets triggered by the parent
downtime on a `NOT-OK` state change.

### <a id="recurring-downtimes"></a> Recurring Downtimes

[ScheduledDowntime objects](#objecttype-scheduleddowntime) can be used to set up
recurring downtimes for services.

Example:

    template ScheduledDowntime "backup-downtime" {
      author = "icingaadmin",
      comment = "Scheduled downtime for backup",

      ranges = {
        monday = "02:00-03:00",
        tuesday = "02:00-03:00",
        wednesday = "02:00-03:00",
        thursday = "02:00-03:00",
        friday = "02:00-03:00",
        saturday = "02:00-03:00",
        sunday = "02:00-03:00"
      }
    }

    object Host "localhost" inherits "generic-host" {
      ...
      services["load"] = {
        templates = [ "generic-service" ],

        check_command = "load",

        scheduled_downtimes["backup"] = {
          templates = [ "backup-downtime" ]
        }
      },
    }

## <a id="comments"></a> Comments

Comments can be added at runtime and are persistent over restarts. You can
add useful information for others about repeating incidents (for example
"last time syslog was at 100% CPU on 17.10.2013 due to a stale NFS mount") which
is primarily accessible using web interfaces.

Adding and deleting comments is possible through the external command pipe
provided with the `ExternalCommandListener` configuration. The caller must
pass the comment id when manipulating an existing comment.
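
A minimal sketch using the command pipe (again assuming the default command pipe path;
host name, service name and comment id are examples):

    # add a persistent comment (the "1" flag) for the "load" service on "localhost"
    echo "[`date +%s`] ADD_SVC_COMMENT;localhost;load;1;icingaadmin;Stale NFS mount caused the last incident" \
        >> /var/run/icinga2/cmd/icinga2.cmd

    # delete an existing comment by its comment id (here: 1)
    echo "[`date +%s`] DEL_SVC_COMMENT;1" >> /var/run/icinga2/cmd/icinga2.cmd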

## <a id="acknowledgements"></a> Acknowledgements

If a problem is alerted and notified, you may signal the other notification
recipients that you are aware of the problem and will handle it.

By sending an acknowledgement to Icinga 2 (using the external command pipe
provided with the `ExternalCommandListener` configuration) all future notifications
are suppressed, a new comment is added with the provided description and
a notification with the type `NotificationFilterAcknowledgement` is sent
to all notified users.
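
A minimal sketch of acknowledging a service problem through the command pipe (default
command pipe path, host and service names are examples; the three numeric flags are
sticky, notify and persistent):

    # acknowledge the "load" problem on "localhost": sticky, send a notification, keep the comment
    echo "[`date +%s`] ACKNOWLEDGE_SVC_PROBLEM;localhost;load;2;1;1;icingaadmin;Working on it" \
        >> /var/run/icinga2/cmd/icinga2.cmd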

> **Note**
>
> Modern web interfaces treat acknowledged problems as `handled`.

### <a id="expiring-acknowledgements"></a> Expiring Acknowledgements

Once a problem is acknowledged it may disappear from your `handled problems`
dashboard and no one ever looks at it again, since it suppresses
notifications too.

This `fire-and-forget` action is quite common. If you're sure that a
current problem should be resolved at a defined time in the future,
you can define an expiration time when acknowledging the problem.

Icinga 2 will clear the acknowledgement when it expires and start to
re-notify if the problem persists.
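
A sketch using the `ACKNOWLEDGE_SVC_PROBLEM_EXPIRE` external command, which takes an
additional expiry timestamp; the assumption here is that this command is accepted by
your command pipe, and the timestamp, host and service names below are examples:

    # acknowledge the problem until the given UNIX timestamp, then notifications resume
    echo "[`date +%s`] ACKNOWLEDGE_SVC_PROBLEM_EXPIRE;localhost;load;2;1;1;1388534400;icingaadmin;Fix scheduled for tonight" \
        >> /var/run/icinga2/cmd/icinga2.cmd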

## <a id="cluster"></a> Cluster

An Icinga 2 cluster consists of two or more nodes and can span multiple
architectures. A base concept of Icinga 2 is the possibility to add additional
features using components; in case of a cluster setup you have to add the
cluster feature to all nodes. Before you start configuring the different nodes,
it's necessary to set up the underlying communication layer based on SSL.

### <a id="certificate-authority-certificates"></a> Certificate Authority and Certificates

Icinga 2 comes with two scripts which help you create a CA and node certificates
for your Icinga 2 cluster.

The first step is the creation of the CA using the following command:

    icinga2-build-ca

Please make sure to export a variable pointing to an empty folder for the created
CA files beforehand:

    export ICINGA_CA="/root/icinga-ca"

In the next step you have to create a certificate and a key file for every node
using the following command:

    icinga2-build-key icinga-node-1

Please create a certificate and a key file for every node in the Icinga 2
cluster and save the CA key in case you want to set up certificates for
additional nodes at a later date.
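
For example, the per-node keys could be generated in a small loop (the node names
below are placeholders for your own node names):

    export ICINGA_CA="/root/icinga-ca"

    for node in icinga-node-1 icinga-node-2 icinga-node-3; do
        icinga2-build-key $node
    done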

### <a id="enable-cluster-configuration"></a> Enable the Cluster Configuration

Until the cluster component is moved into an independent feature you have to
enable the required libraries in the `icinga2.conf` configuration file:

    library "cluster"

### <a id="configure-clusterlistener-object"></a> Configure the ClusterListener Object

The ClusterListener needs to be configured on every node in the cluster with the
following settings:

Configuration Setting |Value
----------------------|------------------------------------
ca_path               | path to the ca.crt file
cert_path             | path to the server certificate
key_path              | path to the server key
bind_port             | port for incoming and outgoing connections
peers                 | array of all reachable nodes

A sample config part can look like this:

    /**
     * Load cluster library and configure ClusterListener using certificate files
     */
    library "cluster"

    object ClusterListener "cluster" {
      ca_path = "/etc/icinga2/ca/ca.crt",
      cert_path = "/etc/icinga2/ca/icinga-node-1.crt",
      key_path = "/etc/icinga2/ca/icinga-node-1.key",

      bind_port = 8888,

      peers = [ "icinga-node-2" ]
    }

> **Note**
>
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file should not be world-readable.

The `peers` attribute configures the direction used to connect multiple nodes together. If you have
a three node cluster consisting of

* node-1
* node-2
* node-3

and `node-3` is only reachable from `node-2`, you have to take this into account in your
peer configuration.
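
A possible sketch for this scenario, assuming each node name corresponds to a configured
`Endpoint` object and reusing the `ClusterListener` attributes shown above:

    /**
     * ClusterListener on node-1: only connects to node-2
     */
    object ClusterListener "cluster" {
      ...
      peers = [ "node-2" ]
    }

    /**
     * ClusterListener on node-2: connects to node-3 (node-1 already connects to node-2)
     */
    object ClusterListener "cluster" {
      ...
      peers = [ "node-3" ]
    }

    /**
     * ClusterListener on node-3: no peers configured, waits for the incoming connection from node-2
     */
    object ClusterListener "cluster" {
      ...
    }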

### <a id="configure-cluster-endpoints"></a> Configure Cluster Endpoints

In addition to the configured port and hostname, every endpoint can be given specific
abilities to send configuration files to other nodes, and can limit the hosts allowed
to send configuration files.

Configuration Setting |Value
----------------------|------------------------------------
host                  | hostname
port                  | port
accept_config         | defines all nodes allowed to send configuration files
config_files          | defines all files to be sent to that node - MUST BE AN ABSOLUTE PATH

A sample config part can look like this:

    /**
     * Configure config master endpoint
     */
    object Endpoint "icinga-node-1" {
      host = "icinga-node-1.localdomain",
      port = 8888,
      config_files = ["/etc/icinga2/conf.d/*.conf"]
    }

If you update the configuration files on the configured file sender, it will
force a restart on all receiving nodes after validating the new config.

A sample config part for a config receiver endpoint can look like this:

    /**
     * Configure config receiver endpoint
     */
    object Endpoint "icinga-node-2" {
      host = "icinga-node-2.localdomain",
      port = 8888,
      accept_config = [ "icinga-node-1" ]
    }

By default these configuration files are saved in `/var/lib/icinga2/cluster/config`.

In order to load configuration files which were received from a remote Icinga 2
instance you will have to add the following include directive to your
`icinga2.conf` configuration file:

    include (IcingaLocalStateDir + "/lib/icinga2/cluster/config/*/*")

### <a id="initial-cluster-sync"></a> Initial Cluster Sync

In order to make sure that all of your cluster nodes have the same state you will
have to pick one of the nodes as your initial "master" and copy its state file
to all the other nodes.

You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying
the state file you should make sure that all your cluster nodes are properly shut
down.
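
A minimal sketch of that initial sync (the target node name is a placeholder; run this
with all Icinga 2 instances stopped):

    # on the designated "master" node, copy the state file to each other node
    scp /var/lib/icinga2/icinga2.state icinga-node-2:/var/lib/icinga2/icinga2.state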

### <a id="assign-services-to-cluster-nodes"></a> Assign Services to Cluster Nodes

By default all services are distributed among the cluster nodes which have the `Checker`
feature enabled.
If you require specific services to be executed only by one or more checker nodes
within the cluster, you must define `authorities` as an additional service object
attribute. The required endpoints must be defined as an array.

    object Host "dmz-host1" inherits "generic-host" {
      services["dmz-oracledb"] = {
        templates = [ "generic-service" ],
        authorities = [ "icinga-node-1" ],
      }
    }

> **Tip**
>
> The most common use case is building a classic master-slave setup. The master node
> does not have the `Checker` feature enabled, and the slave nodes check
> services based on their location, inheriting from a global service template
> which defines the authorities.

### <a id="cluster-health-check"></a> Cluster Health Check

The Icinga 2 [ITL](#itl) ships an internal check command which checks all configured
`Endpoint` objects in the cluster setup. The check result becomes critical if
one or more configured nodes are not connected.

Example:

    object Host "icinga2a" inherits "generic-host" {
      services["cluster"] = {
        templates = [ "generic-service" ],
        check_interval = 1m,
        check_command = "cluster",
        authorities = [ "icinga2a" ]
      },
    }

> **Note**
>
> Each cluster node should execute its own local cluster health check to
> get an idea about network related connection problems from different
> points of view. Use the `authorities` attribute to assign the service
> check to the configured node.

### <a id="host-multiple-cluster-nodes"></a> Host With Multiple Cluster Nodes

Special scenarios might require multiple cluster nodes running on a single host.
By default Icinga 2 and its features will place their runtime data below the prefix
`IcingaLocalStateDir`. By default packages will set that path to `/var`.
You can either set that variable as a constant configuration
definition in [icinga2.conf](#icinga2-conf) or pass it as a runtime variable to
the Icinga 2 daemon.

    # icinga2 -c /etc/icinga2/node1/icinga2.conf -DIcingaLocalStateDir=/opt/node1/var

## <a id="domains"></a> Domains

A [Service](#objecttype-service) object can be restricted using the `domains` attribute
array specifying endpoint privileges.
A Domain object specifies the ACLs applied for each [Endpoint](#objecttype-endpoint).

The following example assigns the domain `dmz-db` to the service `dmz-oracledb`. Endpoint
`icinga-node-dmz-1` does not allow any object modification (no commands, no check results) and only
relays local messages to the remote node(s). The endpoint `icinga-node-dmz-2` processes all
messages read and write (it accepts check results and commands and also relays messages to remote
nodes).

That way the service `dmz-oracledb` on endpoint `icinga-node-dmz-1` will not be modified
by any cluster event message, and could also be checked by the local authority, presenting
a different state history. `icinga-node-dmz-2` still receives all cluster message updates
from the `icinga-node-dmz-1` endpoint.

    object Host "dmz-host1" inherits "generic-host" {
      services["dmz-oracledb"] = {
        templates = [ "generic-service" ],
        domains = [ "dmz-db" ],
        authorities = [ "icinga-node-dmz-1", "icinga-node-dmz-2" ],
      }
    }

    object Domain "dmz-db" {
      acl = {
        icinga-node-dmz-1 = (DomainPrivReadOnly),
        icinga-node-dmz-2 = (DomainPrivReadWrite)
      }
    }

## <a id="dependencies"></a> Dependencies

Icinga 2 uses host and service [Dependency](#objecttype-dependency) objects, either directly
defined or as an inline definition in the `dependencies` dictionary. The `parent_host` and `parent_service`
attributes are mandatory; the `child_host` and `child_service` attributes are obsolete within
inline definitions in an existing service object or service inline definition.

A service can depend on a host, and vice versa. A service has an implicit dependency (parent)
on its host. A host-to-host dependency acts implicitly as a host parent relation.
When dependencies are calculated, not only the immediate parent is taken into
account; all parents are inherited.

A common scenario is the Icinga 2 server behind a router. Checking internet
access by pinging the Google DNS server `google-dns` is a common method, but it
will fail if the `dsl-router` host is down. Therefore the example below
defines a host dependency, which implicitly acts as a parent relation too.

Furthermore the host may be reachable while ping samples are dropped by the
router's firewall. In case the `dsl-router` `ping4` service check fails, all
further checks for the `google-dns` `ping4` service should be suppressed.
This is achieved by setting the `disable_checks` attribute to `true`.

    object Host "dsl-router" {
      services["ping4"] = {
        templates = "generic-service",
        check_command = "ping4"
      }

      macros = {
        address = "192.168.1.1",
      },
    }

    object Host "google-dns" {
      services["ping4"] = {
        templates = "generic-service",
        check_command = "ping4",
        dependencies["dsl-router-ping4"] = {
          parent_host = "dsl-router",
          parent_service = "ping4",
          disable_checks = true
        }
      }

      macros = {
        address = "8.8.8.8",
      },

      dependencies["dsl-router"] = {
        parent_host = "dsl-router"
      },
    }

## <a id="check-result-freshness"></a> Check Result Freshness

In Icinga 2 active check freshness is enabled by default. It is determined by the
`check_interval` attribute and the absence of incoming check results during that period of time.

    threshold = last check execution time + check interval

Passive check freshness is calculated from the `check_interval` attribute, if set.

    threshold = last check result time + check interval

If the freshness threshold is exceeded, a new check defined by the
`check_command` attribute is executed.
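
As an illustration, a passive check result can be fed in through the external command
pipe; as long as such a result arrives within `check_interval`, the service stays fresh,
otherwise the freshness check triggers an active check using `check_command` (the command
pipe path, host and service names are examples):

    # submit a passive check result (return code 0 = OK) for the "load" service on "localhost"
    echo "[`date +%s`] PROCESS_SERVICE_CHECK_RESULT;localhost;load;0;OK - backup finished" \
        >> /var/run/icinga2/cmd/icinga2.cmd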

## <a id="check-flapping"></a> Check Flapping

The flapping algorithm used in Icinga 2 does not store the past states but
calculates the flapping threshold from a single value based on counters and
half-life values. Icinga 2 compares this value with a single flapping threshold
configuration attribute named `flapping_threshold`.

> **Note**
>
> Flapping must be explicitly enabled by setting the `Service` object attribute
> `enable_flapping = 1`.
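
A minimal configuration sketch reusing the inline service syntax from the examples
above; the threshold value is only an example:

    object Host "localhost" inherits "generic-host" {
      services["load"] = {
        templates = [ "generic-service" ],
        check_command = "load",

        enable_flapping = 1,
        flapping_threshold = 30
      },
    }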

## <a id="volatile-services"></a> Volatile Services

By default all services remain in a non-volatile state. When a problem
occurs, the `SOFT` state applies, and once the check counter reaches the
`max_check_attempts` attribute a `HARD` state transition happens.
Notifications are only triggered by `HARD` state changes and are then
re-sent at the interval defined by the `notification_interval` attribute.

It may be reasonable to have a volatile service which stays in a `HARD`
state type as long as the service stays in a `NOT-OK` state. That way each
service recheck will automatically trigger a notification unless the
service is acknowledged or in a scheduled downtime.
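
A minimal sketch, assuming the service attribute enabling this behaviour is named
`volatile`; that attribute name, the service name and the `dummy` check command are
assumptions for illustration, not taken from this chapter:

    object Host "localhost" inherits "generic-host" {
      services["snmp-trap"] = {
        templates = [ "generic-service" ],
        check_command = "dummy",

        volatile = 1    /* assumed attribute name for a volatile service */
      },
    }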

## <a id="modified-attributes"></a> Modified Attributes

Icinga 2 allows you to modify defined object attributes at runtime, differing from
the local configuration object attributes. These modified attributes are
stored as a bit-shifted value and made available in backends. Icinga 2 stores
modified attributes in its state file and restores them on restart.

Modified attributes can be reset using external commands.
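
A sketch using the classic `CHANGE_SVC_MODATTR` external command, where the value is the
modified-attributes bitmask and `0` resets all of them (command pipe path, host and
service names are examples):

    # reset all modified attributes of the "load" service on "localhost"
    echo "[`date +%s`] CHANGE_SVC_MODATTR;localhost;load;0" >> /var/run/icinga2/cmd/icinga2.cmd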

## <a id="plugin-api"></a> Plugin API

Currently the native plugin API inherited from the `Monitoring Plugins` (formerly
`Nagios Plugins`) project is available.
Future specifications will be documented here.

### <a id="monitoring-plugin-api"></a> Monitoring Plugin API

The `Monitoring Plugin API` (formerly `Nagios Plugin API`) is defined in the
[Monitoring Plugins Development Guidelines](https://www.monitoring-plugins.org/doc/guidelines.html).
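
In short, a plugin prints a single line of status text (optionally followed by `|` and
performance data) and exits with `0` (OK), `1` (WARNING), `2` (CRITICAL) or `3` (UNKNOWN).
A minimal sketch of such a plugin; the checked value and thresholds are made up for
illustration:

    #!/bin/sh
    # hypothetical example: compare a value against fixed warning/critical thresholds
    value=42

    if [ "$value" -gt 90 ]; then
        echo "EXAMPLE CRITICAL - value is $value | value=$value"
        exit 2
    elif [ "$value" -gt 75 ]; then
        echo "EXAMPLE WARNING - value is $value | value=$value"
        exit 1
    fi

    echo "EXAMPLE OK - value is $value | value=$value"
    exit 0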