Docs: Improve features chapter and add details on HA setups

refs #4855
This commit is contained in:
Michael Friedrich 2019-05-08 17:48:13 +02:00
parent 15326caf38
commit 07790e456b
2 changed files with 239 additions and 96 deletions


@ -60,7 +60,7 @@ Use your distribution's package manager to install the `pnp4nagios` package.
If you're planning to use it, configure it to use the
[bulk mode with npcd and npcdmod](https://docs.pnp4nagios.org/pnp-0.6/modes#bulk_mode_with_npcd_and_npcdmod)
in combination with Icinga 2's [PerfdataWriter](14-features.md#performance-data). NPCD collects the performance
in combination with Icinga 2's [PerfdataWriter](14-features.md#writing-performance-data-files). NPCD collects the performance
data files which Icinga 2 generates.
Enable the performance data writer in Icinga 2:


@ -38,7 +38,13 @@ files then:
By default, log files will be rotated daily.
## DB IDO <a id="db-ido"></a>
## Core Backends <a id="core-backends"></a>
### REST API <a id="core-backends-api"></a>
The REST API is documented [here](12-icinga2-api.md#icinga2-api) as a core feature.
### IDO Database (DB IDO) <a id="db-ido"></a>
The IDO (Icinga Data Output) feature for Icinga 2 takes care of exporting all
configuration and status information into a database. The IDO database is used
@ -49,10 +55,8 @@ chapter. Details on the configuration can be found in the
[IdoMysqlConnection](09-object-types.md#objecttype-idomysqlconnection) and
[IdoPgsqlConnection](09-object-types.md#objecttype-idopgsqlconnection)
object configuration documentation.
The DB IDO feature supports [High Availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-db-ido) in
the Icinga 2 cluster.
### DB IDO Health <a id="db-ido-health"></a>
#### DB IDO Health <a id="db-ido-health"></a>
If the monitoring health indicator is critical in Icinga Web 2,
you can use the following queries to manually check whether Icinga 2
@ -100,7 +104,21 @@ status_update_time
A detailed list on the available table attributes can be found in the [DB IDO Schema documentation](24-appendix.md#schema-db-ido).
### DB IDO Cleanup <a id="db-ido-cleanup"></a>
#### DB IDO in Cluster HA Zones <a id="db-ido-cluster-ha"></a>
The DB IDO feature supports [High Availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-db-ido) in
the Icinga 2 cluster.
By default, both endpoints in a zone calculate which
endpoint activates the feature; the other endpoint
automatically pauses it. If the cluster connection
breaks at some point, the paused IDO feature automatically
takes over (failover).
You can disable this behaviour by setting `enable_ha = false`
in both feature configuration files.
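For reference, a minimal sketch of this setting, assuming the MySQL variant in `/etc/icinga2/features-enabled/ido-mysql.conf` (credentials and host are placeholders):
```
object IdoMysqlConnection "ido-mysql" {
  user = "icinga"
  password = "icinga"
  host = "localhost"
  database = "icinga"

  // Do not negotiate feature authority with the other endpoint;
  // run the IDO feature on this endpoint unconditionally.
  enable_ha = false
}
```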
#### DB IDO Cleanup <a id="db-ido-cleanup"></a>
Objects get deactivated when they are deleted from the configuration.
This is visible with the `is_active` column in the `icinga_objects` table.
@ -125,7 +143,7 @@ Example if you prefer to keep notification history for 30 days:
The historical tables are populated depending on the data `categories` specified.
Some tables are empty by default.
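As a hedged example, again assuming the MySQL variant, additional categories can be appended to populate tables which stay empty by default:
```
object IdoMysqlConnection "ido-mysql" {
  user = "icinga"
  password = "icinga"
  host = "localhost"
  database = "icinga"

  // Additionally store check results and log entries in the
  // corresponding history tables.
  categories += [ DbCatCheck, DbCatLog ]
}
```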
### DB IDO Tuning <a id="db-ido-tuning"></a>
#### DB IDO Tuning <a id="db-ido-tuning"></a>
As with any application database, there are ways to optimize and tune the database performance.
@ -171,98 +189,30 @@ VACUUM
> Don't use `VACUUM FULL` as this has a severe impact on performance.
## External Commands <a id="external-commands"></a>
## Metrics <a id="metrics"></a>
> **Note**
>
> Please use the [REST API](12-icinga2-api.md#icinga2-api) as modern and secure alternative
> for external actions.
Whenever a host or service check is executed, or a check result is received via the REST API,
best practice is to provide performance data.
> **Note**
>
> This feature is DEPRECATED and will be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).
This data is parsed by features sending metrics to time series databases (TSDB):
Icinga 2 provides an external command pipe for processing commands
triggering specific actions (for example rescheduling a service check
through the web interface).
* [Graphite](14-features.md#graphite-carbon-cache-writer)
* [InfluxDB](14-features.md#influxdb-writer)
* [OpenTSDB](14-features.md#opentsdb-writer)
In order to enable the `ExternalCommandListener` configuration use the
following command and restart Icinga 2 afterwards:
Metrics, state changes and notifications can be managed with the following integrations:
```
# icinga2 feature enable command
```
* [Elastic Stack](14-features.md#elastic-stack-integration)
* [Graylog](14-features.md#graylog-integration)
Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
using the default configuration.
Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:
### Graphite Writer <a id="graphite-carbon-cache-writer"></a>
```
# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
[Graphite](13-addons.md#addons-graphing-graphite) is a tool stack for storing
metrics and needs to be running prior to enabling the `graphite` feature.
# tail -f /var/log/messages
Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
```
A list of currently supported external commands can be found [here](24-appendix.md#external-commands-list-detail).
Detailed information on the commands and their required parameters can be found
on the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).
## Performance Data <a id="performance-data"></a>
When a host or service check is executed plugins should provide so-called
`performance data`. Next to that additional check performance data
can be fetched using Icinga 2 runtime macros such as the check latency
or the current service state (or additional custom attributes).
The performance data can be passed to external applications which aggregate and
store them in their backends. These tools usually generate graphs for historical
reporting and trending.
Well-known addons processing Icinga performance data are [PNP4Nagios](13-addons.md#addons-graphing-pnp),
[Graphite](13-addons.md#addons-graphing-graphite) or [OpenTSDB](14-features.md#opentsdb-writer).
### Writing Performance Data Files <a id="writing-performance-data-files"></a>
PNP4Nagios and Graphios use performance data collector daemons to fetch
the current performance files for their backend updates.
Therefore the Icinga 2 [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter)
feature allows you to define the output template format for host and services helped
with Icinga 2 runtime vars.
```
host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
```
The default templates are already provided with the Icinga 2 feature configuration
which can be enabled using
```
# icinga2 feature enable perfdata
```
By default all performance data files are rotated in a 15 seconds interval into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.
### Graphite Carbon Cache Writer <a id="graphite-carbon-cache-writer"></a>
While there are some [Graphite](13-addons.md#addons-graphing-graphite)
collector scripts and daemons like Graphios available for Icinga 1.x it's more
reasonable to directly process the check and plugin performance
in memory in Icinga 2. Once there are new metrics available, Icinga 2 will directly
write them to the defined Graphite Carbon daemon tcp socket.
Icinga 2 writes parsed metrics directly to Graphite's Carbon Cache
TCP port, defaulting to `2003`.
You can enable the feature using
@ -273,7 +223,7 @@ You can enable the feature using
By default the [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) feature
expects the Graphite Carbon Cache to listen at `127.0.0.1` on TCP port `2003`.
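To point the feature at a remote Carbon Cache instead, a sketch along these lines should work (the address is a placeholder):
```
object GraphiteWriter "graphite" {
  host = "carbon.example.com"
  port = 2003
}
```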
#### Current Graphite Schema <a id="graphite-carbon-cache-writer-schema"></a>
#### Graphite Schema <a id="graphite-carbon-cache-writer-schema"></a>
The current naming schema is defined as follows. The [Icinga Web 2 Graphite module](https://github.com/icinga/icingaweb2-module-graphite)
depends on this schema.
@ -308,7 +258,8 @@ Metric values are stored like this:
<prefix>.perfdata.<perfdata-label>.value
```
The following characters are escaped in perfdata labels:
The following characters are escaped in performance labels
parsed from plugin output:
Character | Escaped character
--------------|--------------------------
@ -317,7 +268,7 @@ The following characters are escaped in perfdata labels:
/ | _
:: | .
Note that perfdata labels may contain dots (`.`) allowing to
Note that labels may contain dots (`.`), allowing you to
add further levels inside the Graphite tree.
`::` adds support for [multi performance labels](http://my-plugin.de/wiki/projects/check_multi/configuration/performance)
and is therefore replaced by `.`.
@ -369,6 +320,25 @@ pattern = ^icinga2\.
retentions = 1m:2d,5m:10d,30m:90d,360m:4y
```
#### Graphite in Cluster HA Zones <a id="graphite-carbon-cache-writer-cluster-ha"></a>
The Graphite feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.
By default, all endpoints in a zone will activate the feature and start
writing metrics to a Carbon Cache socket. In HA-enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. Each endpoint then calculates the feature authority; only
one endpoint actively writes metrics while the other endpoints pause
the feature.
When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.
The recommended way of running Graphite in this scenario is a dedicated server
where Carbon Cache/Relay runs as the receiver.
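A hedged sketch of the HA-aware configuration, deployed identically on both endpoints of the zone (the Carbon Cache address is a placeholder):
```
object GraphiteWriter "graphite" {
  host = "carbon.example.com"
  port = 2003

  // Calculate feature authority with the other zone endpoint(s);
  // only one endpoint actively writes metrics at a time.
  enable_ha = true
}
```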
### InfluxDB Writer <a id="influxdb-writer"></a>
@ -447,6 +417,25 @@ object InfluxdbWriter "influxdb" {
}
```
#### InfluxDB in Cluster HA Zones <a id="influxdb-writer-cluster-ha"></a>
The InfluxDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.
By default, all endpoints in a zone will activate the feature and start
writing metrics to the InfluxDB HTTP API. In HA-enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. Each endpoint then calculates the feature authority; only
one endpoint actively writes metrics while the other endpoints pause
the feature.
When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.
The recommended way of running InfluxDB in this scenario is a dedicated server
where either the InfluxDB HTTP API or Telegraf acting as a proxy is running.
### Elastic Stack Integration <a id="elastic-stack-integration"></a>
[Icingabeat](https://github.com/icinga/icingabeat) is an Elastic Beat that fetches data
@ -524,6 +513,26 @@ check_result.perfdata.<perfdata-label>.warn
check_result.perfdata.<perfdata-label>.crit
```
#### Elasticsearch in Cluster HA Zones <a id="elasticsearch-writer-cluster-ha"></a>
The Elasticsearch feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.
By default, all endpoints in a zone will activate the feature and start
writing events to the Elasticsearch HTTP API. In HA-enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. Each endpoint then calculates the feature authority; only
one endpoint actively writes events while the other endpoints pause
the feature.
When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that events are written even if the cluster fails.
The recommended way of running Elasticsearch in this scenario is a dedicated server
where you run either the Elasticsearch HTTP API, a TLS-secured HTTP proxy,
or Logstash for additional filtering.
### Graylog Integration <a id="graylog-integration"></a>
#### GELF Writer <a id="gelfwriter"></a>
@ -550,6 +559,24 @@ Currently these events are processed:
* State changes
* Notifications
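A minimal GelfWriter sketch, assuming a GELF TCP input reachable on port `12201` (host and source are placeholders):
```
object GelfWriter "gelf" {
  host = "graylog.example.com"
  port = 12201
  source = "icinga2"
}
```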
#### Graylog/GELF in Cluster HA Zones <a id="gelf-writer-cluster-ha"></a>
The GELF feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.
By default, all endpoints in a zone will activate the feature and start
writing events to the Graylog HTTP API. In HA-enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. Each endpoint then calculates the feature authority; only
one endpoint actively writes events while the other endpoints pause
the feature.
When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that events are written even if the cluster fails.
The recommended way of running Graylog in this scenario is a dedicated server
where the Graylog HTTP API is listening.
### OpenTSDB Writer <a id="opentsdb-writer"></a>
@ -625,6 +652,75 @@ with the following tags
> You might want to set the `tsd.core.auto_create_metrics` setting to `true`
> in your `opentsdb.conf` configuration file.
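In `opentsdb.conf` this corresponds to:
```
tsd.core.auto_create_metrics = true
```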
#### OpenTSDB in Cluster HA Zones <a id="opentsdb-writer-cluster-ha"></a>
The OpenTSDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.
By default, all endpoints in a zone will activate the feature and start
writing metrics to the OpenTSDB listener. In HA-enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. Each endpoint then calculates the feature authority; only
one endpoint actively writes metrics while the other endpoints pause
the feature.
When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.
The recommended way of running OpenTSDB in this scenario is a dedicated server
where OpenTSDB is running.
### Writing Performance Data Files <a id="writing-performance-data-files"></a>
PNP and Graphios use performance data collector daemons to fetch
the current performance data files for their backend updates.
The Icinga 2 [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter)
feature therefore allows you to define the output template format for hosts
and services, filled in with Icinga 2 runtime macros.
```
host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
```
The default templates are already provided with the Icinga 2 feature configuration,
which can be enabled using:
```
# icinga2 feature enable perfdata
```
By default, all performance data files are rotated in a 15 second interval into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.
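A hedged sketch of adjusting the target paths and the rotation interval via the [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter) object (the paths shown mirror the defaults mentioned above):
```
object PerfdataWriter "perfdata" {
  host_perfdata_path = "/var/spool/icinga2/perfdata/host-perfdata"
  service_perfdata_path = "/var/spool/icinga2/perfdata/service-perfdata"

  // Rotate the performance data files every 15 seconds,
  // matching the behaviour described above.
  rotation_interval = 15s
}
```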
#### Perfdata Files in Cluster HA Zones <a id="perfdata-writer-cluster-ha"></a>
The Perfdata feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.
By default, all endpoints in a zone will activate the feature and start
writing metrics to the local spool directory. In HA-enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. Each endpoint then calculates the feature authority; only
one endpoint actively writes metrics while the other endpoints pause
the feature.
When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.
The recommended way of running the Perfdata feature is to mount the perfdata spool
directory via NFS on a central server where PNP with the NPCD collector
is running.
## Livestatus <a id="setting-up-livestatus"></a>
@ -831,7 +927,9 @@ The `commands` table is populated with `CheckCommand`, `EventCommand` and `Notif
A detailed list on the available table attributes can be found in the [Livestatus Schema documentation](24-appendix.md#schema-livestatus).
## Status Data Files <a id="status-data"></a>
## Deprecated Features <a id="deprecated-features"></a>
### Status Data Files <a id="status-data"></a>
> **Note**
>
@ -850,7 +948,7 @@ status updates in a regular interval.
If you are not using any web interface or addon which uses these files,
you can safely disable this feature.
## Compat Log Files <a id="compat-logging"></a>
### Compat Log Files <a id="compat-logging"></a>
> **Note**
>
@ -876,7 +974,52 @@ By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`var/log/icinga2/compat/archives`.
## Check Result Files <a id="check-result-files"></a>
### External Command Pipe <a id="external-commands"></a>
> **Note**
>
> Please use the [REST API](12-icinga2-api.md#icinga2-api) as a modern and secure alternative
> for external actions.
> **Note**
>
> This feature is DEPRECATED and will be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).
Icinga 2 provides an external command pipe for processing commands
that trigger specific actions (for example, rescheduling a service check
through the web interface).
To enable the `ExternalCommandListener` configuration, use the
following command and restart Icinga 2 afterwards:
```
# icinga2 feature enable command
```
Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
using the default configuration.
Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:
```
# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
# tail -f /var/log/messages
Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
```
A list of currently supported external commands can be found [here](24-appendix.md#external-commands-list-detail).
Detailed information on the commands and their required parameters can be found
in the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).
### Check Result Files <a id="check-result-files"></a>
> **Note**
>