mirror of https://github.com/Icinga/icinga2.git
Documentation: Rewrite cluster docs
* Re-organize structure
* New section with HA features
* Permissions and security
* How to add a new node
* Cluster requirements
* Additional hints on installation
* More troubleshooting

fixes #6743
fixes #6703
fixes #6997
parent 3972aa20c4
commit 32c20132d0
@@ -144,74 +144,52 @@ passing the check results to Icinga 2.
remote sender to push check results into the Icinga 2 `ExternalCommandListener`
feature.

> **Note**
>
> This addon works in a similar fashion to the Icinga 1.x distributed model. If you
> are looking for a real distributed architecture with Icinga 2, scroll down.

## <a id="distributed-monitoring-high-availability"></a> Distributed Monitoring and High Availability

An Icinga 2 cluster consists of two or more nodes and can reside on multiple
architectures. The base concept of Icinga 2 is the possibility to add additional
features using components. In case of a cluster setup you have to add the `api` feature
to all nodes.

Building distributed environments with high availability included is fairly easy with Icinga 2.
The cluster feature is built-in and allows you to build many scenarios based on your requirements:

* [High Availability](#cluster-scenarios-high-availability). All instances in the `Zone` elect one active master and run as an Active/Active cluster.
* [Distributed Zones](#cluster-scenarios-distributed-zones). A master zone and one or more satellites in their zones.
* [Load Distribution](#cluster-scenarios-load-distribution). A configuration master and multiple checker satellites.

You can combine these scenarios into a global setup fitting your requirements.

Each instance has its own event scheduler and does not depend on a centralized master
coordinating and distributing the events. In case of a cluster failure, all nodes
continue to run independently. Be alarmed when your cluster fails and a split-brain scenario
is in effect - all alive instances continue to do their job, and history will begin to differ.

> **Note**
>
> Before you start, make sure to read the [requirements](#distributed-monitoring-requirements).

### <a id="cluster-requirements"></a> Cluster Requirements

Before you start deploying, keep the following things in mind:

* Your [SSL CA and certificates](#certificate-authority-certificates) are mandatory for secure communication
* Get pen and paper or a drawing board and design your nodes and zones!
** all nodes in a cluster zone provide high availability functionality and trust each other
** cluster zones can be built in a top-down design where the child trusts the parent
** communication between zones is bi-directional, which means that a DMZ-located node can still reach the master node, or vice versa
* Update firewall rules and ACLs
* Decide whether to use the built-in [configuration synchronization](#cluster-zone-config-sync) or an external tool (Puppet, Ansible, Chef, Salt, etc.) to manage the configuration deployment

> **Tip**
>
> If you're looking for troubleshooting cluster problems, check the general
> [troubleshooting](#troubleshooting-cluster) section.

Before you start configuring the different nodes it is necessary to set up the underlying
communication layer based on SSL.

#### <a id="cluster-naming-convention"></a> Cluster Naming Convention

The SSL certificate common name (CN) will be used by the [ApiListener](#objecttype-apilistener)
object to determine the local authority. This name must match the local [Endpoint](#objecttype-endpoint)
@@ -240,13 +218,103 @@ The [Endpoint](#objecttype-endpoint) name is further referenced as `endpoints` a
      endpoints = [ "icinga2a", "icinga2b" ]
    }

Specifying the local node name using the [NodeName](#configure-nodename) variable requires
the same name as used for the endpoint name and common name above. If not set, the FQDN is used.

    const NodeName = "icinga2a"

### <a id="certificate-authority-certificates"></a> Certificate Authority and Certificates

Icinga 2 ships two scripts assisting with CA and node certificate creation
for your Icinga 2 cluster.

> **Note**
>
> You're free to use your own method to generate a valid CA and signed client
> certificates.

Please make sure to export the environment variable `ICINGA_CA` pointing to
an empty folder for the newly created CA files:

    # export ICINGA_CA="/root/icinga-ca"

The scripts will put all generated data and the required certificates in there.

The first step is the creation of the certificate authority (CA) by running the
following command:

    # icinga2-build-ca

Now create a certificate and key file for each node by running the following command
(replace `icinga2a` with the required hostname):

    # icinga2-build-key icinga2a

Repeat the step for all nodes in your cluster scenario.

Save the CA key in a secure location in case you want to set up certificates for
additional nodes at a later time.

Navigate to the location of your newly generated certificate files, and manually
copy/transfer them to `/etc/icinga2/pki` in your Icinga 2 configuration folder
(a sketch of this step follows the file list below).

> **Note**
>
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file must not be world-readable.

Each node requires the following files in `/etc/icinga2/pki` (replace `fqdn-nodename` with
the host's FQDN):

* ca.crt
* <fqdn-nodename>.crt
* <fqdn-nodename>.key

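A hedged sketch of that copy step for the node `icinga2a` (the generated file names and the use of `scp`/`ssh` are assumptions; any secure transfer method works):

    # scp $ICINGA_CA/ca.crt $ICINGA_CA/icinga2a.crt $ICINGA_CA/icinga2a.key root@icinga2a:/etc/icinga2/pki/
    # ssh root@icinga2a chmod 600 /etc/icinga2/pki/icinga2a.key
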
### <a id="cluster-configuration"></a> Cluster Configuration

The following sections describe which configuration must be updated/created
in order to get your cluster running with basic functionality.

* [configure the node name](#configure-nodename)
* [configure the ApiListener object](#configure-apilistener-object)
* [configure cluster endpoints](#configure-cluster-endpoints)
* [configure cluster zones](#configure-cluster-zones)

Once you're finished with the basic setup the following sections will
describe how to use [zone configuration synchronisation](#cluster-zone-config-sync)
and configure [cluster scenarios](#cluster-scenarios).

#### <a id="configure-nodename"></a> Configure the Icinga Node Name

Instead of using the default FQDN as node name you can optionally set
that value using the [NodeName](#global-constants) constant.

> **Note**
>
> Skip this step if your FQDN already matches the default `NodeName` set
> in `/etc/icinga2/constants.conf`.

This setting must be unique for each node, and must also match
the name of the local [Endpoint](#objecttype-endpoint) object and the
SSL certificate common name as described in the
[cluster naming convention](#cluster-naming-convention).

    vim /etc/icinga2/constants.conf

    /* Our local instance name. By default this is the server's hostname as returned by `hostname --fqdn`.
     * This should be the common name from the API certificate.
     */
    const NodeName = "icinga2a"

Read further about additional [naming conventions](#cluster-naming-convention).

Not specifying the node name will make Icinga 2 use the FQDN. Make sure that all
configured endpoint names and common names are in sync.

#### <a id="configure-apilistener-object"></a> Configure the ApiListener Object

The [ApiListener](#objecttype-apilistener) object needs to be configured on
every node in the cluster with the following settings:
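As a rough sketch only (attribute values are assumptions and must match the certificate file names created above), such an object typically combines the certificate paths with the `accept_config` switch:

    object ApiListener "api" {
      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
      ca_path = SysconfDir + "/icinga2/pki/ca.crt"

      // accept zone configuration synchronised from the parent zone
      accept_config = true

      bind_port = 5665
    }
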
@@ -272,8 +340,7 @@ synchronisation enabled for this node.
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file must not be world-readable.


#### <a id="configure-cluster-endpoints"></a> Configure Cluster Endpoints

`Endpoint` objects specify the `host` and `port` settings for the cluster nodes.
This configuration can be the same on all nodes in the cluster only containing
@@ -292,8 +359,7 @@ A sample configuration looks like:
If this endpoint object is reachable on a different port, you must configure the
`ApiListener` on the local `Endpoint` object accordingly too.

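For orientation, a minimal sketch of such an `Endpoint` object (the host name and port are assumptions; the sample configuration referenced above is authoritative):

    object Endpoint "icinga2a" {
      host = "icinga2a.localdomain"
      port = 5665
    }
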
#### <a id="configure-cluster-zones"></a> Configure Cluster Zones

`Zone` objects specify the endpoints located in a zone. That way your distributed setup can be
seen as zones connected together instead of multiple instances in that specific zone.

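A minimal sketch of two connected zones (the endpoint name `icinga2c` is an assumption; the zone names follow the surrounding examples):

    object Zone "config-ha-master" {
      endpoints = [ "icinga2a", "icinga2b" ]
    }

    object Zone "checker" {
      endpoints = [ "icinga2c" ]
      parent = "config-ha-master"
    }
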
@@ -324,7 +390,7 @@ the defined parent zone `config-ha-master`.
    }


### <a id="cluster-zone-config-sync"></a> Zone Configuration Synchronisation

By default all objects for specific zones should be organized in
@@ -376,12 +442,19 @@ process.
> determines the required include directory. This can be overridden using the
> [global constant](#global-constants) `ZonesDir`.

#### <a id="zone-global-config-templates"></a> Global Configuration Zone for Templates

If your zone configuration setup shares the same templates, groups, commands, timeperiods, etc.,
you would have to duplicate quite a lot of configuration objects making the merged configuration
on your configuration master unique.

> **Note**
>
> Only put templates, groups, etc. into this zone. DO NOT add checkable objects such as
> hosts or services here. If they are checked by all instances globally, this will lead
> to duplicated check results and unclear state history. It is not easy to troubleshoot either -
> you've been warned.

That is not necessary if you define a global zone shipping all those templates. By setting
`global = true` you ensure that this zone serving common configuration templates will be
synchronized to all involved nodes (only if they accept configuration though).

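A sketch of such a global zone (the name `global-templates` is an assumption; pick one name and configure it identically on all nodes):

    object Zone "global-templates" {
      global = true
    }
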
@@ -406,11 +479,11 @@ your zone configuration visible to all nodes.
> **Note**
>
> If the remote node does not have this zone configured, it will ignore the configuration
> update, if it accepts synchronized configuration.

If you don't require any global configuration, skip this setting.

#### <a id="zone-config-sync-permissions"></a> Zone Configuration Synchronisation Permissions

Each [ApiListener](#objecttype-apilistener) object must have the `accept_config` attribute
set to `true` to receive configuration from the parent `Zone` members. Default value is `false`.
@@ -422,15 +495,13 @@ set to `true` to receive configuration from the parent `Zone` members. Default v
      accept_config = true
    }

If `accept_config` is set to `false`, this instance won't accept configuration from remote
master instances anymore.

> **Tip**
>
> Look into the [troubleshooting guides](#troubleshooting-cluster-config-sync) for debugging
> problems with the configuration synchronisation.


### <a id="cluster-health-check"></a> Cluster Health Check
@@ -441,12 +512,12 @@ one or more configured nodes are not connected.

Example:

    object Service "cluster" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s

      host_name = "icinga2a"
    }

Each cluster node should execute its own local cluster health check to
@@ -458,78 +529,33 @@ connected zones.

Example for the `checker` zone checking the connection to the `master` zone:

    object Service "cluster-zone-master" {
      check_command = "cluster-zone"
      check_interval = 5s
      retry_interval = 1s
      vars.cluster_zone = "master"

      host_name = "icinga2b"
    }

### <a id="cluster-scenarios"></a> Cluster Scenarios

All cluster nodes are full-featured Icinga 2 instances. You only need to enable
the features for their role (for example, a `Checker` node only requires the `checker`
feature enabled, but not `notification` or `ido-mysql` features).

#### <a id="cluster-scenarios-security"></a> Security in Cluster Scenarios

While there are certain capabilities to ensure the safe communication between all
nodes (firewalls, policies, software hardening, etc.) the Icinga 2 cluster also provides
additional security itself:

* [SSL certificates](#certificate-authority-certificates) are mandatory for cluster communication.
* Child zones only receive event updates (check results, commands, etc.) for their configured objects.
* Zones cannot influence/interfere with other zones. Each checked object is assigned to only one zone.
* All nodes in a zone trust each other.
* [Configuration sync](#zone-config-sync-permissions) is disabled by default.

#### <a id="cluster-scenarios-features"></a> Features in Cluster Zones
@@ -539,11 +565,13 @@ Even further all commands are distributed amongst connected nodes. For example,
re-schedule a check or acknowledge a problem on the master, and it gets replicated to the
actual slave checker node.

DB IDO on the left, graphite on the right side - works (if you disable
[DB IDO HA](#high-availability-db-ido)).
Icinga Web 2 on the left, checker and notifications on the right side - works too.
Everything on the left and on the right side - make sure to deal with
[load-balanced notifications and checks](#high-availability-features) in a
[HA zone](#cluster-scenarios-high-availability).

#### <a id="cluster-scenarios-distributed-zones"></a> Distributed Zones

That scenario fits if your instances are spread over the globe and they all report
@@ -612,7 +640,6 @@ The zones would look like:
The `nuremberg-master` zone will only execute local checks, and receive
check results from the satellite nodes in the zones `berlin` and `vienna`.

#### <a id="cluster-scenarios-load-distribution"></a> Load Distribution

If you are planning to off-load the checks to a defined set of remote workers
@@ -663,17 +690,13 @@ Zones:
      global = true
    }


#### <a id="cluster-scenarios-high-availability"></a> Cluster High Availability

High availability with Icinga 2 is possible by putting multiple nodes into
a dedicated `Zone`. All nodes will elect one active master, and retry an
election once the current active master fails.

Selected features provide advanced [HA functionality](#high-availability-features).
Checks and notifications are load-balanced between nodes in the high availability
zone.

@@ -693,7 +716,6 @@ Two or more nodes in a high availability setup require an [initial cluster sync]
> configuration files in the `zones.d` directory. All other nodes must not
> have that directory populated. Details in the [Configuration Sync Chapter](#cluster-zone-config-sync).

#### <a id="cluster-scenarios-multiple-hierachies"></a> Multiple Hierarchies

Your master zone collects all check results for reporting and graphing and also
@@ -717,3 +739,110 @@ department instances. Furthermore the master NOC is able to see what's going on.

The instances in the departments will serve a local interface, and allow the administrators
to reschedule checks or acknowledge problems for their services.

### <a id="high-availability-features"></a> High Availability for Icinga 2 features

All nodes in the same zone require the same features enabled for High Availability (HA)
amongst them.

By default the following features provide advanced HA functionality:

* [Checks](#high-availability-checks) (load balanced, automated failover)
* [Notifications](#high-availability-notifications) (load balanced, automated failover)
* DB IDO (Run-Once, automated failover)

#### <a id="high-availability-checks"></a> High Availability with Checks

All nodes in the same zone automatically load-balance the check execution. When one instance
fails the other nodes will automatically take over the remaining checks.

> **Note**
>
> If a node should not check anything, disable the `checker` feature explicitly and
> reload Icinga 2.

    # icinga2-disable-feature checker
    # service icinga2 reload

#### <a id="high-availability-notifications"></a> High Availability with Notifications

Notifications are load balanced amongst all nodes in a zone. By default this functionality
is enabled.
If your nodes should notify independently of any other nodes (this will cause
duplicated notifications if not properly handled!), you can set `enable_ha = false`
in the [NotificationComponent](#objecttype-notificationcomponent) feature.

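A sketch of that override, assuming the default `notification` feature object shipped by the packages in `/etc/icinga2/features-available/notification.conf`:

    library "notification"

    object NotificationComponent "notification" {
      // send notifications from every node independently (beware of duplicates)
      enable_ha = false
    }
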
#### <a id="high-availability-db-ido"></a> High Availability with DB IDO

All instances within the same zone (e.g. the `master` zone as HA cluster) must
have the DB IDO feature enabled.

Example DB IDO MySQL:

    # icinga2-enable-feature ido-mysql
    The feature 'ido-mysql' is already enabled.

By default the DB IDO feature only runs on the elected zone master. All other passive
nodes disable the active IDO database connection at runtime.

> **Note**
>
> The DB IDO HA feature can be disabled by setting the `enable_ha` attribute to `false`
> for the [IdoMysqlConnection](#objecttype-idomysqlconnection) or
> [IdoPgsqlConnection](#objecttype-idopgsqlconnection) object on all nodes in the
> same zone.
>
> All endpoints will then enable the DB IDO feature, connect to the configured
> database and dump configuration, status and historical data on their own.

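A sketch for MySQL, assuming the default `ido-mysql` feature file and credentials shipped by the packages (adjust to your own database settings):

    library "db_ido_mysql"

    object IdoMysqlConnection "ido-mysql" {
      user = "icinga"
      password = "icinga"
      host = "localhost"
      database = "icinga"

      // run the IDO connection on every node in this zone
      enable_ha = false
    }
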
If the instance with the active DB IDO connection dies, the HA functionality will
re-enable the DB IDO connection on the newly elected zone master.

The DB IDO feature will try to determine which cluster endpoint is currently writing
to the database and bail out if another endpoint is active. You can manually verify that
by running the following query:

    icinga=> SELECT status_update_time, endpoint_name FROM icinga_programstatus;
       status_update_time   | endpoint_name
    ------------------------+---------------
     2014-08-15 15:52:26+02 | icinga2a
    (1 row)

This is useful when the cluster connection between endpoints breaks, and prevents
data duplication in split-brain scenarios. The failover timeout can be set using the
`failover_timeout` attribute, but not lower than 60 seconds.

### <a id="cluster-add-node"></a> Add a New Cluster Endpoint

These steps are required for integrating a new cluster endpoint:

* generate a new [SSL client certificate](#certificate-authority-certificates)
* identify its location in the zones
* update the `zones.conf` file on each involved node ([endpoint](#configure-cluster-endpoints), [zones](#configure-cluster-zones)) - see the sketch below
** a new slave zone node requires updates for the master and slave zones
* if the node requires the existing zone history: [initial cluster sync](#initial-cluster-sync)
* add a [cluster health check](#cluster-health-check)

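A hedged sketch of such a `zones.conf` update for a new slave zone node (the endpoint name, zone names, parent zone and address are assumptions; adapt them to your zone design):

    // the new endpoint, known to the parent zone and to the new node itself
    object Endpoint "icinga2c" {
      host = "192.168.56.103"
    }

    // the new slave zone with its parent zone
    object Zone "satellite-c" {
      endpoints = [ "icinga2c" ]
      parent = "master"
    }
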
#### <a id="initial-cluster-sync"></a> Initial Cluster Sync

In order to make sure that all of your cluster nodes have the same state you will
have to pick one of the nodes as your initial "master" and copy its state file
to all the other nodes.

You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying
the state file you should make sure that all your cluster nodes are properly shut
down.

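A hedged sketch of that procedure (the target node `icinga2b` and the use of `scp` are assumptions): stop Icinga 2 on all nodes, copy the state file from the chosen master to every other node, then start all nodes again.

    # service icinga2 stop
    # scp /var/lib/icinga2/icinga2.state icinga2b:/var/lib/icinga2/icinga2.state
    # service icinga2 start
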
### <a id="host-multiple-cluster-nodes"></a> Host With Multiple Cluster Nodes

Special scenarios might require multiple cluster nodes running on a single host.
By default Icinga 2 and its features will place their runtime data below the prefix
`LocalStateDir`. By default packages will set that path to `/var`.
You can either set that variable as a constant configuration
definition in [icinga2.conf](#icinga2-conf) or pass it as a runtime variable to
the Icinga 2 daemon.

    # icinga2 -c /etc/icinga2/node1/icinga2.conf -DLocalStateDir=/opt/node1/var

@@ -164,6 +164,14 @@ they remain in a Split-Brain-mode and history may differ.
Although the Icinga 2 cluster protocol stores historical events in a replay log for later synchronisation,
you should make sure to check why the network connection failed.

### <a id="troubleshooting-cluster-config-sync"></a> Cluster Troubleshooting Config Sync

If the cluster zones do not sync their configuration, make sure to check the following:

* Within a config master zone, only one configuration master is allowed to have its config in `/etc/icinga2/zones.d`.
** The master syncs the configuration to `/var/lib/icinga2/api/zones/` during startup and only syncs valid configuration to the other nodes
** The other nodes receive the configuration into `/var/lib/icinga2/api/zones/`
* The `icinga2.log` log file will indicate whether this ApiListener [accepts config](#zone-config-sync-permissions) or not (see the commands below)

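A hedged way to verify both points on a receiving node (paths assume the default package layout; the grep pattern is an assumption and may need adjusting to your log messages):

    # ls -R /var/lib/icinga2/api/zones/
    # grep -i apilistener /var/log/icinga2/icinga2.log | tail
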
## <a id="debug"></a> Debug Icinga 2
@@ -8,7 +8,7 @@ const PluginDir = "@ICINGA2_PLUGINDIR@"

/* Our local instance name. By default this is the server's hostname as returned by `hostname --fqdn`.
 * This should be the common name from the API certificate.
 */
//const NodeName = "localhost"

/* Our local zone name. */