mirror of https://github.com/Icinga/icinga2.git
Merge pull request #7169 from Icinga/feature/enhance-docs
Docs: Improve distributed, features HA, reachability chapters
commit 81075088f1
@ -2383,14 +2383,19 @@ states = [ OK, Critical, Unknown ]
> If the parent service object changes into the `Warning` state, this
> dependency will fail and render all child objects (hosts or services) unreachable.

You can determine the child's reachability by querying the `is_reachable` attribute
in for example [DB IDO](24-appendix.md#schema-db-ido-extensions).
You can determine the child's reachability by querying the `last_reachable` attribute
via the [REST API](12-icinga2-api.md#icinga2-api).

> **Note**
>
> Reachability calculation depends on fresh and processed check results. If dependencies
> disable checks for child objects, this won't work reliably.
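
For illustration, if a parent's `Warning` state should not break the dependency, the
`states` filter can be widened. This is only a sketch; the host, service and custom
variable names are placeholders, not part of the original example:

```
apply Dependency "dns-parent" to Service {
  // placeholder parent objects
  parent_host_name = "dns-master.localdomain"
  parent_service_name = "dns"

  // Warning on the parent no longer fails this dependency,
  // so child services remain reachable
  states = [ OK, Warning ]
  disable_notifications = true

  assign where service.vars.depends_on_dns == true
}
```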
### Implicit Dependencies for Services on Host <a id="dependencies-implicit-host-service"></a>

Icinga 2 automatically adds an implicit dependency for services on their host. That way
service notifications are suppressed when a host is `DOWN` or `UNREACHABLE`. This dependency
does not overwrite other dependencies and implicitely sets `disable_notifications = true` and
does not overwrite other dependencies and implicitly sets `disable_notifications = true` and
`states = [ Up ]` for all service objects.
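
Conceptually, this implicit dependency behaves roughly like the following explicit sketch.
It is illustrative only; Icinga 2 applies this internally and you do not configure it yourself:

```
apply Dependency "host-service-implicit" to Service {
  // every service depends on its own host
  parent_host_name = host.name

  // what the implicit dependency sets
  disable_notifications = true
  states = [ Up ]

  assign where true
}
```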

Service checks are still executed. If you want to prevent them from happening, you can
@ -1266,15 +1266,26 @@ If you are eager to start fresh instead you might take a look into the

The following examples should give you an idea on how to build your own
distributed monitoring environment. We've seen them all in production
environments and received feedback from our [community](https://icinga.com/community/)
environments and received feedback from our [community](https://community.icinga.com/)
and [partner support](https://icinga.com/support/) channels:

* Single master with clients.
* HA master with clients as command endpoint.
* Three level cluster with config HA masters, satellites receiving config sync, and clients checked using command endpoint.
* [Single master with client](06-distributed-monitoring.md#distributed-monitoring-master-clients).
* [HA master with clients as command endpoint](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-clients)
* [Three level cluster](06-distributed-monitoring.md#distributed-monitoring-scenarios-master-satellite-client) with config HA masters, satellites receiving config sync, and clients checked using command endpoint.

You can also extend the cluster tree depth to four levels e.g. with 2 satellite levels.
Just keep in mind that multiple levels become harder to debug in case of errors.

You can also start with a single master setup, and later add a secondary
master endpoint. This requires an extra step with the [initial sync](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-initial-sync)
for cloning the runtime state. This is described in detail [here](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-clients).

### Master with Clients <a id="distributed-monitoring-master-clients"></a>

In this scenario, a single master node runs the check scheduler, notifications
and IDO database backend and uses the [command endpoint mode](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint)
to execute checks on the remote clients.
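
As a rough sketch (the client address is illustrative, not a complete setup), the zone and
endpoint layout on the master for this scenario could look like this:

```
object Endpoint "icinga2-master1.localdomain" {
}

object Endpoint "icinga2-client1.localdomain" {
  host = "192.168.56.111" // the master actively connects to the client
}

object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain" ]
}

object Zone "icinga2-client1.localdomain" {
  endpoints = [ "icinga2-client1.localdomain" ]
  parent = "master"
}

object Zone "global-templates" {
  global = true
}
```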

![Icinga 2 Distributed Master with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_master_clients.png)

* `icinga2-master1.localdomain` is the primary master node.
@ -1441,16 +1452,22 @@ Validate the configuration and restart Icinga 2 on the master node `icinga2-mast
Open Icinga Web 2 and check the two newly created client hosts with two new services
-- one executed locally (`ping4`) and one using command endpoint (`disk`).

### High-Availability Master with Clients <a id="distributed-monitoring-scenarios-ha-master-clients"></a>

![Icinga 2 Distributed High Availability Master with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_ha_master_clients.png)
### High-Availability Master with Clients <a id="distributed-monitoring-scenarios-ha-master-clients"></a>

This scenario is similar to the one in the [previous section](06-distributed-monitoring.md#distributed-monitoring-master-clients). The only difference is that we will now set up two master nodes in a high-availability setup.
These nodes must be configured as zone and endpoint objects.

![Icinga 2 Distributed High Availability Master with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_ha_master_clients.png)

The setup uses the capabilities of the Icinga 2 cluster. All zone members
replicate cluster events amongst each other. In addition to that, several Icinga 2
features can enable HA functionality.
features can enable [HA functionality](06-distributed-monitoring.md#distributed-monitoring-high-availability-features).

Best practice is to run the database backend on a dedicated server/cluster and
only expose a virtual IP address to Icinga and the IDO feature. By default, only one
endpoint will actively write to the backend then. Typical setups for MySQL clusters
involve Galera; more tips can be found on our [community forums](https://community.icinga.com/).

**Note**: All nodes in the same zone require that you enable the same features for high-availability (HA).
@ -1481,6 +1498,12 @@ you can disable the HA feature and write to a local database on each node.
Both methods require that you configure Icinga Web 2 accordingly (monitoring
backend, IDO database, used transports, etc.).

> **Note**
>
> You can also start with a single master shown [here](06-distributed-monitoring.md#distributed-monitoring-master-clients) and later add
> the second master. This requires an extra step with the [initial sync](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-initial-sync)
> for cloning the runtime state. Once that is done, proceed here.

The zone hierarchy could look like this. It involves putting the two master nodes
`icinga2-master1.localdomain` and `icinga2-master2.localdomain` into the `master` zone.
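
A minimal sketch of those `zones.conf` entries on both masters (the addresses are illustrative):

```
object Endpoint "icinga2-master1.localdomain" {
  host = "192.168.56.101"
}

object Endpoint "icinga2-master2.localdomain" {
  host = "192.168.56.102"
}

object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ]
}
```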
@ -1659,17 +1682,22 @@ to make sure that your cluster notifies you in case of failure.

### Three Levels with Master, Satellites, and Clients <a id="distributed-monitoring-scenarios-master-satellite-client"></a>

![Icinga 2 Distributed Master and Satellites with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_master_satellite_client.png)

This scenario combines everything you've learned so far: High-availability masters,
satellites receiving their configuration from the master zone, and clients checked via command
endpoint from the satellite zones.

![Icinga 2 Distributed Master and Satellites with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_master_satellite_client.png)

> **Tip**:
>
> It can get complicated, so grab a pen and paper and bring your thoughts to life.
> Play around with a test setup before using it in a production environment!

Best practice is to run the database backend on a dedicated server/cluster and
only expose a virtual IP address to Icinga and the IDO feature. By default, only one
endpoint will actively write to the backend then. Typical setups for MySQL clusters
involve Galera; more tips can be found on our [community forums](https://community.icinga.com/).

Overview:

* `icinga2-master1.localdomain` is the configuration master node.
@ -2747,6 +2775,24 @@ object Endpoint "icinga2-master2.localdomain" {
}
```

### Initial Sync for new Endpoints in a Zone <a id="distributed-monitoring-advanced-hints-initial-sync"></a>

In order to make sure that all of your zone endpoints have the same state you need
to pick the authoritative running one and copy the following content:

* State file from `/var/lib/icinga2/icinga2.state`
* Internal config package for runtime created objects (downtimes, comments, hosts, etc.) at `/var/lib/icinga2/api/packages/_api`

If you need already deployed config packages from the Director, or synced cluster zones,
you can also sync the entire `/var/lib/icinga2` directory. This directory should also be
included in your [backup strategy](02-getting-started.md#install-backup).

> **Note**
>
> Ensure that all endpoints are shut down during this procedure. Once you have
> synced the cached files, proceed with configuring the remaining endpoints
> to let them know about the new master/satellite node (zones.conf).

### Manual Certificate Creation <a id="distributed-monitoring-advanced-hints-certificates-manual"></a>

#### Create CA on the Master <a id="distributed-monitoring-advanced-hints-certificates-manual-ca"></a>
@ -60,7 +60,7 @@ Use your distribution's package manager to install the `pnp4nagios` package.

If you're planning to use it, configure it to use the
[bulk mode with npcd and npcdmod](https://docs.pnp4nagios.org/pnp-0.6/modes#bulk_mode_with_npcd_and_npcdmod)
in combination with Icinga 2's [PerfdataWriter](14-features.md#performance-data). NPCD collects the performance
in combination with Icinga 2's [PerfdataWriter](14-features.md#writing-performance-data-files). NPCD collects the performance
data files which Icinga 2 generates.

Enable the performance data writer in Icinga 2
@ -38,7 +38,13 @@ files then:

By default, log files will be rotated daily.

## DB IDO <a id="db-ido"></a>
## Core Backends <a id="core-backends"></a>

### REST API <a id="core-backends-api"></a>

The REST API is documented [here](12-icinga2-api.md#icinga2-api) as a core feature.

### IDO Database (DB IDO) <a id="db-ido"></a>

The IDO (Icinga Data Output) feature for Icinga 2 takes care of exporting all
configuration and status information into a database. The IDO database is used
@ -49,10 +55,8 @@ chapter. Details on the configuration can be found in the
[IdoMysqlConnection](09-object-types.md#objecttype-idomysqlconnection) and
[IdoPgsqlConnection](09-object-types.md#objecttype-idopgsqlconnection)
object configuration documentation.
The DB IDO feature supports [High Availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-db-ido) in
the Icinga 2 cluster.

### DB IDO Health <a id="db-ido-health"></a>
#### DB IDO Health <a id="db-ido-health"></a>

If the monitoring health indicator is critical in Icinga Web 2,
you can use the following queries to manually check whether Icinga 2
@ -100,7 +104,21 @@ status_update_time

A detailed list on the available table attributes can be found in the [DB IDO Schema documentation](24-appendix.md#schema-db-ido).

### DB IDO Cleanup <a id="db-ido-cleanup"></a>
#### DB IDO in Cluster HA Zones <a id="db-ido-cluster-ha"></a>

The DB IDO feature supports [High Availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-db-ido) in
the Icinga 2 cluster.

By default, both endpoints in a zone calculate which endpoint activates the feature;
the other endpoint automatically pauses it. If the cluster connection
breaks at some point, the paused IDO feature automatically
takes over (failover).

You can disable this behaviour by setting `enable_ha = false`
in both feature configuration files.
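
For example, to let each endpoint write to its own local database, you could set this in
`/etc/icinga2/features-enabled/ido-mysql.conf` on both nodes (credentials are placeholders):

```
object IdoMysqlConnection "ido-mysql" {
  user = "icinga"
  password = "icinga"
  host = "localhost"
  database = "icinga"

  // disable HA failover, both endpoints keep the feature active
  enable_ha = false
}
```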

#### DB IDO Cleanup <a id="db-ido-cleanup"></a>

Objects get deactivated when they are deleted from the configuration.
This is visible with the `is_active` column in the `icinga_objects` table.
@ -125,7 +143,7 @@ Example if you prefer to keep notification history for 30 days:
The historical tables are populated depending on the data `categories` specified.
Some tables are empty by default.

### DB IDO Tuning <a id="db-ido-tuning"></a>
#### DB IDO Tuning <a id="db-ido-tuning"></a>

As with any application database, there are ways to optimize and tune the database performance.
@ -171,98 +189,30 @@ VACUUM
> Don't use `VACUUM FULL` as this has a severe impact on performance.


## External Commands <a id="external-commands"></a>
## Metrics <a id="metrics"></a>

> **Note**
>
> Please use the [REST API](12-icinga2-api.md#icinga2-api) as modern and secure alternative
> for external actions.
Whenever a host or service check is executed, or received via the REST API,
best practice is to provide performance data.

> **Note**
>
> This feature is DEPRECATED and will be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).
This data is parsed by features sending metrics to time series databases (TSDB):

Icinga 2 provides an external command pipe for processing commands
triggering specific actions (for example rescheduling a service check
through the web interface).
* [Graphite](14-features.md#graphite-carbon-cache-writer)
* [InfluxDB](14-features.md#influxdb-writer)
* [OpenTSDB](14-features.md#opentsdb-writer)

In order to enable the `ExternalCommandListener` configuration use the
following command and restart Icinga 2 afterwards:
Metrics, state changes and notifications can be managed with the following integrations:

```
# icinga2 feature enable command
```
* [Elastic Stack](14-features.md#elastic-stack-integration)
* [Graylog](14-features.md#graylog-integration)

Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
using the default configuration.

Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:
### Graphite Writer <a id="graphite-carbon-cache-writer"></a>

```
# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd
[Graphite](13-addons.md#addons-graphing-graphite) is a tool stack for storing
metrics and needs to be running prior to enabling the `graphite` feature.

# tail -f /var/log/messages

Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
```

A list of currently supported external commands can be found [here](24-appendix.md#external-commands-list-detail).

Detailed information on the commands and their required parameters can be found
on the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).

## Performance Data <a id="performance-data"></a>

When a host or service check is executed plugins should provide so-called
`performance data`. Next to that additional check performance data
can be fetched using Icinga 2 runtime macros such as the check latency
or the current service state (or additional custom attributes).

The performance data can be passed to external applications which aggregate and
store them in their backends. These tools usually generate graphs for historical
reporting and trending.

Well-known addons processing Icinga performance data are [PNP4Nagios](13-addons.md#addons-graphing-pnp),
[Graphite](13-addons.md#addons-graphing-graphite) or [OpenTSDB](14-features.md#opentsdb-writer).

### Writing Performance Data Files <a id="writing-performance-data-files"></a>

PNP4Nagios and Graphios use performance data collector daemons to fetch
the current performance files for their backend updates.

Therefore the Icinga 2 [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter)
feature allows you to define the output template format for host and services helped
with Icinga 2 runtime vars.

```
host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
```

The default templates are already provided with the Icinga 2 feature configuration
which can be enabled using

```
# icinga2 feature enable perfdata
```

By default all performance data files are rotated in a 15 seconds interval into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.

### Graphite Carbon Cache Writer <a id="graphite-carbon-cache-writer"></a>

While there are some [Graphite](13-addons.md#addons-graphing-graphite)
collector scripts and daemons like Graphios available for Icinga 1.x it's more
reasonable to directly process the check and plugin performance
in memory in Icinga 2. Once there are new metrics available, Icinga 2 will directly
write them to the defined Graphite Carbon daemon tcp socket.
Icinga 2 writes parsed metrics directly to Graphite's Carbon Cache
TCP port, defaulting to `2003`.

You can enable the feature using
@ -273,7 +223,7 @@ You can enable the feature using
By default the [GraphiteWriter](09-object-types.md#objecttype-graphitewriter) feature
expects the Graphite Carbon Cache to listen at `127.0.0.1` on TCP port `2003`.
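
If your Carbon Cache listens somewhere else, the feature object can be adjusted, e.g. in
`/etc/icinga2/features-enabled/graphite.conf` (the host below is illustrative):

```
object GraphiteWriter "graphite" {
  host = "carbon.example.com" // illustrative Carbon Cache/Relay host
  port = 2003
}
```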

#### Current Graphite Schema <a id="graphite-carbon-cache-writer-schema"></a>
#### Graphite Schema <a id="graphite-carbon-cache-writer-schema"></a>

The current naming schema is defined as follows. The [Icinga Web 2 Graphite module](https://github.com/icinga/icingaweb2-module-graphite)
depends on this schema.
@ -308,7 +258,8 @@ Metric values are stored like this:
<prefix>.perfdata.<perfdata-label>.value
```

The following characters are escaped in perfdata labels:
The following characters are escaped in performance labels
parsed from plugin output:

Character | Escaped character
--------------|--------------------------
@ -317,7 +268,7 @@ The following characters are escaped in perfdata labels:
/ | _
:: | .

Note that perfdata labels may contain dots (`.`) allowing to
Note that labels may contain dots (`.`) allowing to
add more subsequent levels inside the Graphite tree.
`::` adds support for [multi performance labels](http://my-plugin.de/wiki/projects/check_multi/configuration/performance)
and is therefore replaced by `.`.
@ -369,6 +320,25 @@ pattern = ^icinga2\.
retentions = 1m:2d,5m:10d,30m:90d,360m:4y
```

#### Graphite in Cluster HA Zones <a id="graphite-carbon-cache-writer-cluster-ha"></a>

The Graphite feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing metrics to a Carbon Cache socket. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.

The recommended way of running Graphite in this scenario is a dedicated server
where Carbon Cache/Relay is running as receiver.
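
A sketch of such an HA-enabled feature configuration, kept identical on all endpoints in the
zone (the receiver host name is illustrative):

```
object GraphiteWriter "graphite" {
  host = "carbon.example.com" // dedicated Carbon Cache/Relay receiver
  port = 2003

  // only one endpoint per zone actively writes, the others pause the feature
  enable_ha = true
}
```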


### InfluxDB Writer <a id="influxdb-writer"></a>
@ -447,6 +417,25 @@ object InfluxdbWriter "influxdb" {
}
```

#### InfluxDB in Cluster HA Zones <a id="influxdb-writer-cluster-ha"></a>

The InfluxDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing metrics to the InfluxDB HTTP API. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.

The recommended way of running InfluxDB in this scenario is a dedicated server
where the InfluxDB HTTP API, or Telegraf as a proxy, is running.

### Elastic Stack Integration <a id="elastic-stack-integration"></a>

[Icingabeat](https://github.com/icinga/icingabeat) is an Elastic Beat that fetches data
@ -524,6 +513,26 @@ check_result.perfdata.<perfdata-label>.warn
check_result.perfdata.<perfdata-label>.crit
```

#### Elasticsearch in Cluster HA Zones <a id="elasticsearch-writer-cluster-ha"></a>

The Elasticsearch feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing events to the Elasticsearch HTTP API. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes events, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that events are written even if the cluster fails.

The recommended way of running Elasticsearch in this scenario is a dedicated server
where you either have the Elasticsearch HTTP API, or a TLS secured HTTP proxy,
or Logstash for additional filtering.

### Graylog Integration <a id="graylog-integration"></a>

#### GELF Writer <a id="gelfwriter"></a>
@ -550,6 +559,24 @@ Currently these events are processed:
* State changes
* Notifications
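
For reference, a minimal GelfWriter feature configuration might look like the sketch below
(host and port are illustrative; Graylog's GELF TCP input commonly listens on `12201`):

```
object GelfWriter "gelf" {
  host = "graylog.example.com"
  port = 12201
}
```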

#### Graylog/GELF in Cluster HA Zones <a id="gelf-writer-cluster-ha"></a>

The GELF feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing events to the Graylog HTTP API. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes events, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that events are written even if the cluster fails.

The recommended way of running Graylog in this scenario is a dedicated server
where you have the Graylog HTTP API listening.

### OpenTSDB Writer <a id="opentsdb-writer"></a>
@ -625,6 +652,75 @@ with the following tags
> You might want to set the tsd.core.auto_create_metrics setting to `true`
> in your opentsdb.conf configuration file.

#### OpenTSDB in Cluster HA Zones <a id="opentsdb-writer-cluster-ha"></a>

The OpenTSDB feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing metrics to the OpenTSDB listener. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.

The recommended way of running OpenTSDB in this scenario is a dedicated server
where you have OpenTSDB running.


### Writing Performance Data Files <a id="writing-performance-data-files"></a>

PNP and Graphios use performance data collector daemons to fetch
the current performance files for their backend updates.

Therefore the Icinga 2 [PerfdataWriter](09-object-types.md#objecttype-perfdatawriter)
feature allows you to define the output template format for host and services helped
with Icinga 2 runtime vars.

```
host_format_template = "DATATYPE::HOSTPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tHOSTPERFDATA::$host.perfdata$\tHOSTCHECKCOMMAND::$host.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$"
service_format_template = "DATATYPE::SERVICEPERFDATA\tTIMET::$icinga.timet$\tHOSTNAME::$host.name$\tSERVICEDESC::$service.name$\tSERVICEPERFDATA::$service.perfdata$\tSERVICECHECKCOMMAND::$service.check_command$\tHOSTSTATE::$host.state$\tHOSTSTATETYPE::$host.state_type$\tSERVICESTATE::$service.state$\tSERVICESTATETYPE::$service.state_type$"
```

The default templates are already provided with the Icinga 2 feature configuration
which can be enabled using

```
# icinga2 feature enable perfdata
```

By default all performance data files are rotated in a 15 seconds interval into
the `/var/spool/icinga2/perfdata/` directory as `host-perfdata.<timestamp>` and
`service-perfdata.<timestamp>`.
External collectors need to parse the rotated performance data files and then
remove the processed files.

#### Perfdata Files in Cluster HA Zones <a id="perfdata-writer-cluster-ha"></a>

The Perfdata feature supports [high availability](06-distributed-monitoring.md#distributed-monitoring-high-availability-features)
in cluster zones since 2.11.

By default, all endpoints in a zone will activate the feature and start
writing metrics to the local spool directory. In HA enabled scenarios,
it is possible to set `enable_ha = true` in all feature configuration
files. This allows each endpoint to calculate the feature authority:
only one endpoint actively writes metrics, while the other endpoints
pause the feature.

When the cluster connection breaks at some point, the remaining endpoint(s)
in that zone will automatically resume the feature. This built-in failover
mechanism ensures that metrics are written even if the cluster fails.

The recommended way of running Perfdata is to mount the perfdata spool
directory via NFS on a central server where PNP with the NPCD collector
is running.



## Livestatus <a id="setting-up-livestatus"></a>
@ -831,7 +927,9 @@ The `commands` table is populated with `CheckCommand`, `EventCommand` and `Notif
A detailed list on the available table attributes can be found in the [Livestatus Schema documentation](24-appendix.md#schema-livestatus).


## Status Data Files <a id="status-data"></a>
## Deprecated Features <a id="deprecated-features"></a>

### Status Data Files <a id="status-data"></a>

> **Note**
>
@ -850,7 +948,7 @@ status updates in a regular interval.
If you are not using any web interface or addon which uses these files,
you can safely disable this feature.

## Compat Log Files <a id="compat-logging"></a>
### Compat Log Files <a id="compat-logging"></a>

> **Note**
>
@ -876,7 +974,52 @@ By default, the Icinga 1.x log file called `icinga.log` is located
in `/var/log/icinga2/compat`. Rotated log files are moved into
`/var/log/icinga2/compat/archives`.

## Check Result Files <a id="check-result-files"></a>
### External Command Pipe <a id="external-commands"></a>

> **Note**
>
> Please use the [REST API](12-icinga2-api.md#icinga2-api) as a modern and secure alternative
> for external actions.

> **Note**
>
> This feature is DEPRECATED and will be removed in future releases.
> Check the [roadmap](https://github.com/Icinga/icinga2/milestones).

Icinga 2 provides an external command pipe for processing commands
triggering specific actions (for example rescheduling a service check
through the web interface).

In order to enable the `ExternalCommandListener` configuration use the
following command and restart Icinga 2 afterwards:

```
# icinga2 feature enable command
```

Icinga 2 creates the command pipe file as `/var/run/icinga2/cmd/icinga2.cmd`
using the default configuration.
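
If you need the pipe in a different location, the feature object can be adjusted. A minimal
sketch of `/etc/icinga2/features-enabled/command.conf` (the path below is the default):

```
object ExternalCommandListener "command" {
  command_path = "/var/run/icinga2/cmd/icinga2.cmd"
}
```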

Web interfaces and other Icinga addons are able to send commands to
Icinga 2 through the external command pipe, for example for rescheduling
a forced service check:

```
# /bin/echo "[`date +%s`] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;`date +%s`" >> /var/run/icinga2/cmd/icinga2.cmd

# tail -f /var/log/messages

Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'
```

A list of currently supported external commands can be found [here](24-appendix.md#external-commands-list-detail).

Detailed information on the commands and their required parameters can be found
on the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).


### Check Result Files <a id="check-result-files"></a>

> **Note**
>