mirror of https://github.com/Icinga/icinga2.git
Documentation: Enhance cluster troubleshooting; add HA command_endpoint
fixes #9419 fixes #9420
parent f42bf537c3
commit 7e37609b1e
@@ -155,7 +155,7 @@ graphical installer for Windows based client setup.
Your client setup requires the following

* A ready configured and installed [master node](10-icinga2-client.md#icinga2-client-installation-master-setup)
-* SSL signed certificate for communication with the master (Use [CSR auto-signing](certifiates-csr-autosigning)).
+* SSL signed certificate for communication with the master (Use [CSR auto-signing](10-icinga2-client.md#csr-autosigning-requirements)).
* Enabled API feature, and a local Endpoint and Zone object configuration
* Firewall ACLs for the communication port (default 5665)
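
The cluster port only needs plain TCP reachability between the endpoints. A minimal sketch of the firewall ACL requirement, assuming iptables is the active firewall and the default port 5665 is kept:

    # iptables -A INPUT -p tcp --dport 5665 -j ACCEPT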

@@ -600,8 +600,8 @@ defined endpoint. The check result is then received asynchronously through the c
    vars.users_wgreater = 10
    vars.users_cgreater = 20

-   /* assign where a remote client is set */
-   assign where host.vars.remote_client
+   /* assign where a remote client pattern is matched */
+   assign where match("*-remote", host.name)
}

@@ -391,12 +391,19 @@ master instances anymore.

## <a id="cluster-health-check"></a> Cluster Health Check

-The Icinga 2 [ITL](7-icinga-template-library.md#icinga-template-library) ships an internal check command checking all configured
-`EndPoints` in the cluster setup. The check result will become critical if
-one or more configured nodes are not connected.
+The Icinga 2 [ITL](7-icinga-template-library.md#icinga-template-library) provides
+an internal check command checking all configured `EndPoints` in the cluster setup.
+The check result will become critical if one or more configured nodes are not connected.

Example:

+    object Host "icinga2a" {
+      display_name = "Health Checks on icinga2a"
+
+      address = "192.168.33.10"
+      check_command = "hostalive"
+    }
+
    object Service "cluster" {
      check_command = "cluster"
      check_interval = 5s

@@ -423,6 +430,31 @@ Example for the `checker` zone checking the connection to the `master` zone:
      host_name = "icinga2b"
    }

+## <a id="cluster-health-check-command-endpoint"></a> Cluster Health Check with Command Endpoints
+
+If you are planning to sync the zone configuration inside a [High-Availability](12-distributed-monitoring-ha.md#distributed-monitoring-high-availability)
+cluster zone, you can also use the `command_endpoint` object attribute to
+pin host/service checks to a specific endpoint inside the same zone.
+
+This requires the `accept_commands` setting inside the [ApiListener](12-distributed-monitoring-ha.md#configure-apilistener-object)
+object set to `true` similar to the [remote client command execution bridge](10-icinga2-client.md#icinga2-client-configuration-command-bridge)
+setup.
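
A minimal sketch of that setting, assuming the ApiListener is defined as usual in `features-available/api.conf` and the certificate paths stay as generated by the setup wizards:

    object ApiListener "api" {
      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
      ca_path = SysconfDir + "/icinga2/pki/ca.crt"

      /* allow command execution messages from other endpoints */
      accept_commands = true
    }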

+Make sure to set `command_endpoint` to the correct endpoint instance.
+The example below assumes that the endpoint name is the same as the
+host name configured for health checks. If it differs, define a host
+custom attribute providing [this information](10-icinga2-client.md#icinga2-client-configuration-command-bridge-master-config).
+
+    apply Service "cluster-ha" {
+      check_command = "cluster"
+      check_interval = 5s
+      retry_interval = 1s
+      /* make sure host.name is the same as endpoint name */
+      command_endpoint = host.name
+
+      assign where regex("^icinga2[ab]", host.name)
+    }
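
If the endpoint name does not match the host name, a host custom attribute can carry the mapping instead. A sketch using a hypothetical `vars.health_check_endpoint` attribute:

    object Host "icinga2b" {
      check_command = "hostalive"
      address = "192.168.33.20"

      /* hypothetical custom attribute mapping this host to its endpoint */
      vars.health_check_endpoint = "icinga2b"
    }

    apply Service "cluster-ha" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s
      /* pin the check to the endpoint stored on the host */
      command_endpoint = host.vars.health_check_endpoint

      assign where host.vars.health_check_endpoint
    }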

## <a id="cluster-scenarios"></a> Cluster Scenarios

@@ -169,6 +169,11 @@ or modify these attributes in the current object.

## <a id="troubleshooting-cluster"></a> Cluster Troubleshooting

+This applies to anything using the cluster protocol:
+
+* [Distributed and High-Availability](12-distributed-monitoring-ha.md#distributed-monitoring-high-availability) scenarios
+* [Remote client](10-icinga2-client.md#icinga2-client-scenarios) scenarios
+
You should configure the [cluster health checks](12-distributed-monitoring-ha.md#cluster-health-check) if you haven't
done so already.

@@ -196,16 +201,50 @@ happens (default port is `5665`).

### <a id="troubleshooting-cluster-ssl-errors"></a> Cluster Troubleshooting SSL Errors

-If the cluster communication fails with cryptic SSL error messages, make sure to check
+If the cluster communication fails with SSL error messages, make sure to check
the following

* File permissions on the SSL certificate files
* Does the used CA match for all cluster endpoints?
+* Verify the `Issuer` being your trusted CA
+* Verify the `Subject` containing your endpoint's common name (CN)
+* Check the validity of the certificate itself

-Examples:
+Steps:

    # ls -la /etc/icinga2/pki

+    # cd /etc/icinga2/pki/
+    # openssl x509 -in icinga2a.crt -text
+    Certificate:
+        Data:
+            Version: 1 (0x0)
+            Serial Number: 2 (0x2)
+        Signature Algorithm: sha1WithRSAEncryption
+            Issuer: C=DE, ST=Bavaria, L=Nuremberg, O=NETWAYS GmbH, OU=Monitoring, CN=Icinga CA
+            Validity
+                Not Before: Jan 7 13:17:38 2014 GMT
+                Not After : Jan 5 13:17:38 2024 GMT
+            Subject: C=DE, ST=Bavaria, L=Nuremberg, O=NETWAYS GmbH, OU=Monitoring, CN=icinga2a
+            Subject Public Key Info:
+                Public Key Algorithm: rsaEncryption
+                    Public-Key: (4096 bit)
+                    Modulus:
+                        ...
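
To answer the question whether all endpoints use the same CA, comparing a checksum of `ca.crt` on each node is a quick additional test; a sketch assuming the default `/etc/icinga2/pki` location:

    # sha256sum /etc/icinga2/pki/ca.crt

The output must be identical on every endpoint in the cluster.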

+Try to manually connect to the cluster node:
+
+    # openssl s_client -connect 192.168.33.10:5665
+
+Unauthenticated nodes are able to connect, as required by the
+[CSR auto-signing](10-icinga2-client.md#csr-autosigning-requirements) functionality.
+
+    [2015-06-10 03:28:11 +0200] information/ApiListener: New client connection for identity 'icinga-client' (unauthenticated)
+
+If this message does not go away, make sure to verify the client's certificate and
+its received `ca.crt` in `/etc/icinga2/pki`.
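
One way to do that verification is `openssl verify` against the received CA; a sketch assuming the client's certificate is named after its node name, e.g. `icinga-client.crt`:

    # openssl verify -CAfile /etc/icinga2/pki/ca.crt /etc/icinga2/pki/icinga-client.crt
    /etc/icinga2/pki/icinga-client.crt: OK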

### <a id="troubleshooting-cluster-message-errors"></a> Cluster Troubleshooting Message Errors

@@ -216,6 +255,21 @@ they remain in a Split-Brain-mode and history may differ.
Although the Icinga 2 cluster protocol stores historical events in a replay log for later synchronisation,
you should make sure to check why the network connection failed.
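
A sketch for inspecting the replay log on disk, assuming the default data directory `/var/lib/icinga2`:

    # ls -la /var/lib/icinga2/api/log/

Growing files in this directory while an endpoint is disconnected indicate messages queued for later synchronisation.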

+### <a id="troubleshooting-cluster-command-endpoint-errors"></a> Cluster Troubleshooting Command Endpoint Errors
+
+Command endpoints can be used for clients acting as [remote command execution bridge](10-icinga2-client.md#icinga2-client-configuration-command-bridge)
+as well as inside a [High-Availability cluster](12-distributed-monitoring-ha.md#distributed-monitoring-high-availability).
+
+There is no CLI command for manually executing the check, but you can verify
+the following (e.g. by invoking a forced check from the web interface):
+
+* `icinga2.log` contains connection and execution errors
+* `CheckCommand` definition not found on the remote client
+* Referenced check plugin not found on the remote client
+* Runtime warnings and errors, e.g. unresolved runtime macros or configuration problems
+* Specific error messages are also populated into `UNKNOWN` check results including a detailed error message in their output
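
For the first point, a hedged example of searching the main log for failures, assuming the default log location `/var/log/icinga2/icinga2.log`:

    # grep -E 'warning|critical' /var/log/icinga2/icinga2.log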

### <a id="troubleshooting-cluster-config-sync"></a> Cluster Troubleshooting Config Sync

If the cluster zones do not sync their configuration, make sure to check the following: