# <a id="distributed-monitoring-high-availability"></a> Distributed Monitoring and High Availability

Building distributed environments with high availability included is fairly easy with Icinga 2.
The cluster feature is built-in and allows you to build many scenarios based on your requirements:

* [High Availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability). All instances in the `Zone` run as Active/Active cluster.
* [Distributed Zones](12-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones). A master zone and one or more satellites in their zones.
* [Load Distribution](12-distributed-monitoring-ha.md#cluster-scenarios-load-distribution). A configuration master and multiple checker satellites.

You can combine these scenarios into a global setup fitting your requirements.

Each instance has its own event scheduler and does not depend on a centralized master
coordinating and distributing the events. In case of a cluster failure, all nodes
continue to run independently. Make sure you are alerted when your cluster fails and a split-brain scenario
is in effect - all alive instances continue to do their job, and history will begin to differ.


## <a id="cluster-requirements"></a> Cluster Requirements

Before you start deploying, keep the following things in mind:

* Your [SSL CA and certificates](12-distributed-monitoring-ha.md#manual-certificate-generation) are mandatory for secure communication.
* Get pen and paper or a drawing board and design your nodes and zones!
    * All nodes in a cluster zone provide high availability functionality and trust each other.
    * Cluster zones can be built in a top-down design where the child trusts the parent.
    * Communication between zones happens bi-directionally, which means that a DMZ-located node can still reach the master node, or vice versa.
* Update firewall rules and ACLs (see the sketch after this list).
* Decide whether to use the built-in [configuration synchronization](12-distributed-monitoring-ha.md#cluster-zone-config-sync) or use an external tool (Puppet, Ansible, Chef, Salt, etc.) to manage the configuration deployment.

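The cluster nodes communicate on the API port, 5665/tcp by default, so this port must be allowed between all involved nodes. A minimal sketch, assuming a `firewalld`-based system and the default port (adapt to iptables or your distribution's tooling as needed):

    # firewall-cmd --permanent --add-port=5665/tcp
    # firewall-cmd --reload
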
> **Tip**
>
> If you're troubleshooting cluster problems, check the general
> [troubleshooting](16-troubleshooting.md#troubleshooting-cluster) section.

## <a id="manual-certificate-generation"></a> Manual SSL Certificate Generation

Icinga 2 provides [CLI commands](8-cli-commands.md#cli-command-pki) assisting with CA
and node certificate creation for your Icinga 2 distributed setup.

> **Tip**
>
> You can also use the master and client setup wizards to install the cluster nodes
> using CSR-Autosigning.
>
> The manual steps are helpful if you want to use your own and/or existing CA (for example
> a Puppet CA).

> **Note**
>
> You're free to use your own method to generate a valid CA and signed client
> certificates.

The first step is the creation of the certificate authority (CA) by running the
following command:

    # icinga2 pki new-ca

Now create a certificate and key file for each node by running the following commands
(replace `icinga2a` with the required hostname):

    # icinga2 pki new-cert --cn icinga2a --key icinga2a.key --csr icinga2a.csr
    # icinga2 pki sign-csr --csr icinga2a.csr --cert icinga2a.crt

Repeat these steps for all nodes in your cluster scenario.

Save the CA key in a secure location in case you want to set up certificates for
additional nodes at a later time.

Navigate to the location of your newly generated certificate files, and manually
copy/transfer them to `/etc/icinga2/pki` in your Icinga 2 configuration folder.

> **Note**
>
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file must not be world-readable.

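A minimal sketch for fixing ownership and permissions, assuming the Icinga 2 process runs as the `icinga` user and group (the actual account name depends on your distribution's packages):

    # chown -R icinga:icinga /etc/icinga2/pki
    # chmod 600 /etc/icinga2/pki/*.key
    # chmod 644 /etc/icinga2/pki/*.crt
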
Each node requires the following files in `/etc/icinga2/pki` (replace `fqdn-nodename` with
the host's FQDN):

* ca.crt
* <fqdn-nodename>.crt
* <fqdn-nodename>.key

If you're planning to use your existing CA and certificates, please note that you *must not*
use wildcard certificates. The common name (CN) is mandatory for the cluster communication and
therefore must be unique for each connecting instance.

### <a id="cluster-naming-convention"></a> Cluster Naming Convention

The SSL certificate common name (CN) will be used by the [ApiListener](6-object-types.md#objecttype-apilistener)
object to determine the local authority. This name must match the local [Endpoint](6-object-types.md#objecttype-endpoint)
object name.

Example:

    # icinga2 pki new-cert --cn icinga2a --key icinga2a.key --csr icinga2a.csr
    # icinga2 pki sign-csr --csr icinga2a.csr --cert icinga2a.crt

    # vim zones.conf

    object Endpoint "icinga2a" {
      host = "icinga2a.icinga.org"
    }

The [Endpoint](6-object-types.md#objecttype-endpoint) name is further referenced as `endpoints` attribute on the
[Zone](6-object-types.md#objecttype-zone) object.

    object Endpoint "icinga2b" {
      host = "icinga2b.icinga.org"
    }

    object Zone "config-ha-master" {
      endpoints = [ "icinga2a", "icinga2b" ]
    }

Specifying the local node name using the [NodeName](12-distributed-monitoring-ha.md#configure-nodename) constant requires
the same name as used for the endpoint name and common name above. If not set, the FQDN is used.

    const NodeName = "icinga2a"


## <a id="cluster-configuration"></a> Cluster Configuration

The following sections describe which configuration must be updated or created
in order to get your cluster running with basic functionality.

* [configure the node name](12-distributed-monitoring-ha.md#configure-nodename)
* [configure the ApiListener object](12-distributed-monitoring-ha.md#configure-apilistener-object)
* [configure cluster endpoints](12-distributed-monitoring-ha.md#configure-cluster-endpoints)
* [configure cluster zones](12-distributed-monitoring-ha.md#configure-cluster-zones)

Once you're finished with the basic setup, the following sections describe
how to use [zone configuration synchronisation](12-distributed-monitoring-ha.md#cluster-zone-config-sync)
and configure [cluster scenarios](12-distributed-monitoring-ha.md#cluster-scenarios).

### <a id="configure-nodename"></a> Configure the Icinga Node Name

Instead of using the default FQDN as node name you can optionally set
that value using the [NodeName](19-language-reference.md#constants) constant.

> **Note**
>
> Skip this step if your FQDN already matches the default `NodeName` set
> in `/etc/icinga2/constants.conf`.

This setting must be unique for each node, and must also match
the name of the local [Endpoint](6-object-types.md#objecttype-endpoint) object and the
SSL certificate common name as described in the
[cluster naming convention](12-distributed-monitoring-ha.md#cluster-naming-convention).

    vim /etc/icinga2/constants.conf

    /* Our local instance name. By default this is the server's hostname as returned by `hostname --fqdn`.
     * This should be the common name from the API certificate.
     */
    const NodeName = "icinga2a"


Read further about additional [naming conventions](12-distributed-monitoring-ha.md#cluster-naming-convention).

If you do not specify the node name, Icinga 2 will use the FQDN. Make sure that all
configured endpoint names and common names are in sync.

### <a id="configure-apilistener-object"></a> Configure the ApiListener Object

The [ApiListener](6-object-types.md#objecttype-apilistener) object needs to be configured on
every node in the cluster.

A sample config looks like:

    object ApiListener "api" {
      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
      ca_path = SysconfDir + "/icinga2/pki/ca.crt"
      accept_config = true
      accept_commands = true
    }

You can simply enable the `api` feature using

    # icinga2 feature enable api

Edit `/etc/icinga2/features-enabled/api.conf` if you require the configuration
synchronisation enabled for this node. Set the `accept_config` attribute to `true`.

If you want to use this node as [remote client for command execution](10-icinga2-client.md#icinga2-client-configuration-command-bridge),
set the `accept_commands` attribute to `true`.

> **Note**
>
> The certificate files must be readable by the user Icinga 2 is running as. Also,
> the private key file must not be world-readable.

### <a id="configure-cluster-endpoints"></a> Configure Cluster Endpoints

`Endpoint` objects specify the `host` and `port` settings for the cluster node
connection information.
This configuration can be the same on all nodes in the cluster, as it only contains
connection information.

A sample configuration looks like:

    /**
     * Configure config master endpoint
     */

    object Endpoint "icinga2a" {
      host = "icinga2a.icinga.org"
    }

If this endpoint is reachable on a different port, you must configure the `port`
attribute on the `Endpoint` object and the `ApiListener` object on that node accordingly too.

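A minimal sketch, assuming the remote instance listens on port 5666 instead of the default 5665 (the port value is purely illustrative):

    object Endpoint "icinga2a" {
      host = "icinga2a.icinga.org"
      port = "5666"
    }

The `ApiListener` object on the `icinga2a` node itself would then set `bind_port = "5666"` to match.
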
If you don't want the local instance to connect to the remote instance, remove the
`host` attribute locally. Keep in mind that the configuration is then different amongst
all instances and point-of-view dependent.

### <a id="configure-cluster-zones"></a> Configure Cluster Zones

`Zone` objects specify the endpoints located in a zone. That way your distributed setup can be
seen as zones connected together instead of multiple instances in that specific zone.

Zones can be used for [high availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability),
[distributed setups](12-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones) and
[load distribution](12-distributed-monitoring-ha.md#cluster-scenarios-load-distribution).
Furthermore zones are used for the [Icinga 2 remote client](10-icinga2-client.md#icinga2-client).

Each Icinga 2 `Endpoint` must be put into its respective `Zone`. In this example, you will
define the zone `config-ha-master` where the `icinga2a` and `icinga2b` endpoints
are located. The `check-satellite` zone consists of `icinga2c` only, but more nodes could
be added.

The `config-ha-master` zone acts as High-Availability setup - the Icinga 2 instances elect
one instance running a check, notification or feature (DB IDO), for example `icinga2a`. In case of
failure of the `icinga2a` instance, `icinga2b` will take over automatically.

    object Zone "config-ha-master" {
      endpoints = [ "icinga2a", "icinga2b" ]
    }

The `check-satellite` zone is a separate location and only sends back its check results to
the defined parent zone `config-ha-master`.

    object Zone "check-satellite" {
      endpoints = [ "icinga2c" ]
      parent = "config-ha-master"
    }


## <a id="cluster-zone-config-sync"></a> Zone Configuration Synchronisation

By default all objects for specific zones should be organized in

    /etc/icinga2/zones.d/<zonename>

on the configuration master.

Your child zones and endpoint members **must not** have their config copied to `zones.d`.
The built-in configuration synchronisation takes care of that if your nodes accept
configuration from the parent zone. You can define that in the
[ApiListener](12-distributed-monitoring-ha.md#configure-apilistener-object) object by configuring the `accept_config`
attribute accordingly.

You should remove the sample config included in `conf.d` by commenting out the `include_recursive`
statement in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf):

    //include_recursive "conf.d"

This applies to any other unused configuration directories as well (e.g. `repository.d`
if not used).

Better use a dedicated directory name for local configuration like `local` or similar, and
include that one if your nodes require local configuration not being synced to other nodes. That's
useful for local [health checks](12-distributed-monitoring-ha.md#cluster-health-check) for example.

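A minimal sketch for such a local-only include in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf), assuming you created a directory named `local` next to `conf.d` (the directory name is just an example):

    include_recursive "local"
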
> **Note**
>
> In a [high availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability)
> setup only one assigned node can act as configuration master. All other zone
> member nodes **must not** have the `/etc/icinga2/zones.d` directory populated.


These zone packages are then distributed to all nodes in the same zone, and
to their respective target zone instances.

Each configured zone must exist with the same directory name. The parent zone
syncs the configuration to the child zones, if allowed using the `accept_config`
attribute of the [ApiListener](12-distributed-monitoring-ha.md#configure-apilistener-object) object.

Config on node `icinga2a`:

    object Zone "master" {
      endpoints = [ "icinga2a" ]
    }

    object Zone "checker" {
      endpoints = [ "icinga2b" ]
      parent = "master"
    }

    /etc/icinga2/zones.d
      master
        health.conf
      checker
        health.conf
        demo.conf

Config on node `icinga2b`:

    object Zone "master" {
      endpoints = [ "icinga2a" ]
    }

    object Zone "checker" {
      endpoints = [ "icinga2b" ]
      parent = "master"
    }

    /etc/icinga2/zones.d
      EMPTY_IF_CONFIG_SYNC_ENABLED

If the local configuration is newer than the received update, Icinga 2 will skip the synchronisation
process.

> **Note**
>
> `zones.d` must not be included in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf). Icinga 2 automatically
> determines the required include directory. This can be overridden using the
> [global constant](19-language-reference.md#constants) `ZonesDir`.

### <a id="zone-global-config-templates"></a> Global Configuration Zone for Templates

If your zone configuration setup shares the same templates, groups, commands, timeperiods, etc.,
you would have to duplicate quite a lot of configuration objects to make the merged configuration
on your configuration master unique.

> **Note**
>
> Only put templates, groups, etc. into this zone. DO NOT add checkable objects such as
> hosts or services here. If they are checked by all instances globally, this will lead
> to duplicated check results and unclear state history. It is not easy to troubleshoot either -
> you've been warned.

That duplication is not necessary if you define a global zone shipping all those templates. By setting
`global = true` you ensure that this zone serving common configuration templates will be
synchronized to all involved nodes (only if they accept configuration though).

Config on the configuration master:

    /etc/icinga2/zones.d
      global-templates/
        templates.conf
        groups.conf
      master
        health.conf
      checker
        health.conf
        demo.conf

In this example, the global zone is called `global-templates` and must be defined in
your zone configuration visible to all nodes.

    object Zone "global-templates" {
      global = true
    }

> **Note**
>
> If the remote node does not have this zone configured, it will ignore the configuration
> update, even if it accepts synchronized configuration.

If you don't require any global configuration, skip this setting.

### <a id="zone-config-sync-permissions"></a> Zone Configuration Synchronisation Permissions

Each [ApiListener](6-object-types.md#objecttype-apilistener) object must have the `accept_config` attribute
set to `true` to receive configuration from the parent `Zone` members. The default value is `false`.

    object ApiListener "api" {
      cert_path = SysconfDir + "/icinga2/pki/" + NodeName + ".crt"
      key_path = SysconfDir + "/icinga2/pki/" + NodeName + ".key"
      ca_path = SysconfDir + "/icinga2/pki/ca.crt"
      accept_config = true
    }

If `accept_config` is set to `false`, this instance won't accept configuration from remote
master instances anymore.

> **Tip**
>
> Look into the [troubleshooting guides](16-troubleshooting.md#troubleshooting-cluster-config-sync) for debugging
> problems with the configuration synchronisation.


### <a id="zone-config-sync-best-practice"></a> Zone Configuration Synchronisation Best Practice

The configuration synchronisation works with multiple hierarchies. The following example
illustrates a quite common setup where the master is responsible for configuration deployment:

* [High-Availability master zone](12-distributed-monitoring-ha.md#distributed-monitoring-high-availability)
* [Distributed satellites](12-distributed-monitoring-ha.md#cluster-scenarios-distributed-zones)
* [Remote clients](10-icinga2-client.md#icinga2-client-scenarios) connected to the satellite

While you could use the clients with local configuration and service discovery on the satellite/master
(**bottom up**), the configuration sync usually works more reasonably **top-down** in a cascaded scenario.

Take pen and paper and draw your network scenario including the involved zone and endpoint names.
Once you've added them to your zones.conf as connection and permission configuration, start over with
the actual configuration organization:

* Ensure that `command` object definitions are globally available. That way you can use the
`command_endpoint` configuration more easily on clients as [command execution bridge](10-icinga2-client.md#icinga2-client-configuration-command-bridge) (see the sketch after this list).
* Generic templates, timeperiods and downtimes should be synchronized in a global zone as well.
* [Apply rules](3-monitoring-basics.md#using-apply) can be synchronized globally. Keep in mind that they are evaluated on each instance,
and might require additional filters (e.g. `match("icinga2*", NodeName)` or similar) based on the zone information.
* [Apply rules](3-monitoring-basics.md#using-apply) specified inside zone directories will only affect endpoints in the same zone or below.
* Host configuration must be put into the specific zone directory.
* Duplicated host and service objects (also generated by faulty apply rules) will generate a configuration error.
* Consider using custom constants in your host/service configuration. Each instance may set their local value, e.g. for `PluginDir`.

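A minimal sketch of the first two points, assuming a global zone named `global-templates` and a hypothetical custom attribute `vars.client_endpoint` set on the client hosts (both names are illustrative, not mandated by Icinga 2):

    /* zones.d/global-templates/commands.conf */
    object CheckCommand "my-disk" {
      import "plugin-check-command"

      command = [ PluginDir + "/check_disk" ]
    }

    /* zones.d/global-templates/apply_services.conf */
    apply Service "disk" {
      check_command = "my-disk"

      /* execute the check on the client endpoint, not on the satellite/master */
      command_endpoint = host.vars.client_endpoint

      assign where host.vars.client_endpoint
    }
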
This example specifies the following hierarchy over three levels:

* `ha-master` zone with two child zones `dmz1-checker` and `dmz2-checker`
* `dmz1-checker` has two client child zones `dmz1-client1` and `dmz1-client2`
* `dmz2-checker` has one client child zone `dmz2-client9`

The configuration tree could look like this:

    # tree /etc/icinga2/zones.d
    /etc/icinga2/zones.d
    ├── dmz1-checker
    │   └── health.conf
    ├── dmz1-client1
    │   └── hosts.conf
    ├── dmz1-client2
    │   └── hosts.conf
    ├── dmz2-checker
    │   └── health.conf
    ├── dmz2-client9
    │   └── hosts.conf
    ├── global-templates
    │   ├── apply_notifications.conf
    │   ├── apply_services.conf
    │   ├── commands.conf
    │   ├── groups.conf
    │   ├── templates.conf
    │   └── users.conf
    ├── ha-master
    │   └── health.conf
    └── README

    7 directories, 13 files

If you prefer a different naming schema for directories or file names, go for it. If you
are unsure about the best method, join the [support channels](1-about.md#support) and discuss
with the community.

If you are planning to synchronize local service health checks inside a zone, look into the
[command endpoint](12-distributed-monitoring-ha.md#cluster-health-check-command-endpoint)
explanations.



## <a id="cluster-health-check"></a> Cluster Health Check

The Icinga 2 [ITL](7-icinga-template-library.md#icinga-template-library) provides
an internal check command which checks all configured endpoints in the cluster setup.
The check result will become critical if one or more configured nodes are not connected.

Example:

    object Host "icinga2a" {
      display_name = "Health Checks on icinga2a"

      address = "192.168.33.10"
      check_command = "hostalive"
    }

    object Service "cluster" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s

      host_name = "icinga2a"
    }

Each cluster node should execute its own local cluster health check to
get an idea about network related connection problems from different
points of view.

Additionally you can monitor the connection from the local zone to the remote
connected zones.

Example for the `checker` zone checking the connection to the `master` zone:

    object Service "cluster-zone-master" {
      check_command = "cluster-zone"
      check_interval = 5s
      retry_interval = 1s
      vars.cluster_zone = "master"

      host_name = "icinga2b"
    }

## <a id="cluster-health-check-command-endpoint"></a> Cluster Health Check with Command Endpoints

If you are planning to sync the zone configuration inside a [High-Availability](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability)
cluster zone, you can also use the `command_endpoint` object attribute to
pin host/service checks to a specific endpoint inside the same zone.

This requires the `accept_commands` setting inside the [ApiListener](12-distributed-monitoring-ha.md#configure-apilistener-object)
object set to `true`, similar to the [remote client command execution bridge](10-icinga2-client.md#icinga2-client-configuration-command-bridge)
setup.

Make sure to set `command_endpoint` to the correct endpoint instance.
The example below assumes that the endpoint name is the same as the
host name configured for health checks. If it differs, define a host
custom attribute providing [this information](10-icinga2-client.md#icinga2-client-configuration-command-bridge-master-config).

    apply Service "cluster-ha" {
      check_command = "cluster"
      check_interval = 5s
      retry_interval = 1s
      /* make sure host.name is the same as the endpoint name */
      command_endpoint = host.name

      assign where regex("^icinga2[ab]", host.name)
    }


## <a id="cluster-scenarios"></a> Cluster Scenarios

All cluster nodes are full-featured Icinga 2 instances. You only need to enable
the features required for their role (for example, a `Checker` node only requires the `checker`
feature enabled, but not the `notification` or `ido-mysql` features).

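For example, a dedicated checker node could be trimmed down like this (a sketch; adjust the feature list to the role you actually assign to the node):

    # icinga2 feature enable checker
    # icinga2 feature disable notification ido-mysql
    # service icinga2 restart
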
> **Tip**
>
> There's a [Vagrant demo setup](https://github.com/Icinga/icinga-vagrant/tree/master/icinga2x-cluster)
> available featuring a two node cluster showcasing several aspects (config sync,
> remote command execution, etc.).

### <a id="cluster-scenarios-master-satellite-clients"></a> Cluster with Master, Satellites and Remote Clients

You can combine "classic" cluster scenarios from HA to Master-Checker with the
Icinga 2 Remote Client modes. Each instance plays a certain role in that picture.

Imagine the following scenario:

* The master zone acts as High-Availability zone
* Remote satellite zones execute local checks and report them to the master
* All satellites query remote clients and receive check results (which they also replay to the master)
* All involved nodes share the same configuration logic: zones, endpoints, apilisteners

You'll need to think about the following:

* Deploy the entire configuration from the master to satellites and cascading remote clients? ("top down")
* Use local client configuration instead and report the inventory to satellites and cascading to the master? ("bottom up")
* Combine that with command execution bridges on remote clients and also satellites


### <a id="cluster-scenarios-security"></a> Security in Cluster Scenarios

While there are certain capabilities to ensure the safe communication between all
nodes (firewalls, policies, software hardening, etc.) the Icinga 2 cluster also provides
additional security itself:

* [SSL certificates](12-distributed-monitoring-ha.md#manual-certificate-generation) are mandatory for cluster communication.
* Child zones only receive event updates (check results, commands, etc.) for their configured objects.
* Zones cannot influence/interfere with other zones. Each checked object is assigned to only one zone.
* All nodes in a zone trust each other.
* [Configuration sync](12-distributed-monitoring-ha.md#zone-config-sync-permissions) is disabled by default.

### <a id="cluster-scenarios-features"></a> Features in Cluster Zones

Each cluster zone may use all available features. If you have multiple locations
or departments, they may write to their local database, or populate graphite.
Furthermore, all commands are distributed amongst connected nodes. For example, you could
re-schedule a check or acknowledge a problem on the master, and it gets replicated to the
actual slave checker node.

> **Note**
>
> All features must be the same on all endpoints inside an [HA zone](12-distributed-monitoring-ha.md#cluster-scenarios-high-availability).
> There are additional [High-Availability-enabled features](12-distributed-monitoring-ha.md#high-availability-features) available.

| ### <a id="cluster-scenarios-distributed-zones"></a> Distributed Zones
 | |
| 
 | |
| That scenario fits if your instances are spread over the globe and they all report
 | |
| to a master instance. Their network connection only works towards the master master
 | |
| (or the master is able to connect, depending on firewall policies) which means
 | |
| remote instances won't see each/connect to each other.
 | |
| 
 | |
| All events (check results, downtimes, comments, etc) are synced to the master node,
 | |
| but the remote nodes can still run local features such as a web interface, reporting,
 | |
| graphing, etc. in their own specified zone.
 | |
| 
 | |
| Imagine the following example with a master node in Nuremberg, and two remote DMZ
 | |
| based instances in Berlin and Vienna. Additonally you'll specify
 | |
| [global templates](12-distributed-monitoring-ha.md#zone-global-config-templates) available in all zones.
 | |
| 
 | |
| The configuration tree on the master instance `nuremberg` could look like this:
 | |
| 
 | |
|     zones.d
 | |
|       global-templates/
 | |
|         templates.conf
 | |
|         groups.conf
 | |
|       nuremberg/
 | |
|         local.conf
 | |
|       berlin/
 | |
|         hosts.conf
 | |
|       vienna/
 | |
|         hosts.conf
 | |
| 
 | |
| The configuration deployment will take care of automatically synchronising
 | |
| the child zone configuration:
 | |
| 
 | |
| * The master node sends `zones.d/berlin` to the `berlin` child zone.
 | |
| * The master node sends `zones.d/vienna` to the `vienna` child zone.
 | |
| * The master node sends `zones.d/global-templates` to the `vienna` and `berlin` child zones.
 | |
| 
 | |
| The endpoint configuration would look like:
 | |
| 
 | |
|     object Endpoint "nuremberg-master" {
 | |
|       host = "nuremberg.icinga.org"
 | |
|     }
 | |
| 
 | |
|     object Endpoint "berlin-satellite" {
 | |
|       host = "berlin.icinga.org"
 | |
|     }
 | |
| 
 | |
|     object Endpoint "vienna-satellite" {
 | |
|       host = "vienna.icinga.org"
 | |
|     }
 | |
| 
 | |
| The zones would look like:
 | |
| 
 | |
|     object Zone "nuremberg" {
 | |
|       endpoints = [ "nuremberg-master" ]
 | |
|     }
 | |
| 
 | |
|     object Zone "berlin" {
 | |
|       endpoints = [ "berlin-satellite" ]
 | |
|       parent = "nuremberg"
 | |
|     }
 | |
| 
 | |
|     object Zone "vienna" {
 | |
|       endpoints = [ "vienna-satellite" ]
 | |
|       parent = "nuremberg"
 | |
|     }
 | |
| 
 | |
|     object Zone "global-templates" {
 | |
|       global = true
 | |
|     }
 | |
| 
 | |
| The `nuremberg-master` zone will only execute local checks, and receive
 | |
| check results from the satellite nodes in the zones `berlin` and `vienna`.
 | |
| 
 | |
| > **Note**
 | |
| >
 | |
| > The child zones `berlin` and `vienna` will get their configuration synchronised
 | |
| > from the configuration master 'nuremberg'. The endpoints in the child
 | |
| > zones **must not** have their `zones.d` directory populated if this endpoint
 | |
| > [accepts synced configuration](12-distributed-monitoring-ha.md#zone-config-sync-permissions).
 | |
| 
 | |
| ### <a id="cluster-scenarios-load-distribution"></a> Load Distribution
 | |
| 
 | |
| If you are planning to off-load the checks to a defined set of remote workers
 | |
| you can achieve that by:
 | |
| 
 | |
| * Deploying the configuration on all nodes.
 | |
| * Let Icinga 2 distribute the load amongst all available nodes.
 | |
| 
 | |
| That way all remote check instances will receive the same configuration
 | |
| but only execute their part. The master instance located in the `master` zone
 | |
| can also execute checks, but you may also disable the `Checker` feature.
 | |
| 
 | |
| Configuration on the master node:
 | |
| 
 | |
|     zones.d/
 | |
|       global-templates/
 | |
|       master/
 | |
|       checker/
 | |
| 
 | |
| If you are planning to have some checks executed by a specific set of checker nodes
 | |
| you have to define additional zones and define these check objects there.
 | |
| 
 | |
| Endpoints:
 | |
| 
 | |
|     object Endpoint "master-node" {
 | |
|       host = "master.icinga.org"
 | |
|     }
 | |
| 
 | |
|     object Endpoint "checker1-node" {
 | |
|       host = "checker1.icinga.org"
 | |
|     }
 | |
| 
 | |
|     object Endpoint "checker2-node" {
 | |
|       host = "checker2.icinga.org"
 | |
|     }
 | |
| 
 | |
| 
 | |
| Zones:
 | |
| 
 | |
|     object Zone "master" {
 | |
|       endpoints = [ "master-node" ]
 | |
|     }
 | |
| 
 | |
|     object Zone "checker" {
 | |
|       endpoints = [ "checker1-node", "checker2-node" ]
 | |
|       parent = "master"
 | |
|     }
 | |
| 
 | |
|     object Zone "global-templates" {
 | |
|       global = true
 | |
|     }
 | |
| 
 | |
| > **Note**
 | |
| >
 | |
| > The child zones `checker` will get its configuration synchronised
 | |
| > from the configuration master 'master'. The endpoints in the child
 | |
| > zone **must not** have their `zones.d` directory populated if this endpoint
 | |
| > [accepts synced configuration](12-distributed-monitoring-ha.md#zone-config-sync-permissions).
 | |
| 
 | |
| ### <a id="cluster-scenarios-high-availability"></a> Cluster High Availability
 | |
| 
 | |
| High availability with Icinga 2 is possible by putting multiple nodes into
 | |
| a dedicated [zone](12-distributed-monitoring-ha.md#configure-cluster-zones). All nodes will elect one
 | |
| active master, and retry an election once the current active master is down.
 | |
| 
 | |
| Selected features provide advanced [HA functionality](12-distributed-monitoring-ha.md#high-availability-features).
 | |
| Checks and notifications are load-balanced between nodes in the high availability
 | |
| zone.
 | |
| 
 | |
| Connections from other zones will be accepted by all active and passive nodes
 | |
| but all are forwarded to the current active master dealing with the check results,
 | |
| commands, etc.
 | |
| 
 | |
|     object Zone "config-ha-master" {
 | |
|       endpoints = [ "icinga2a", "icinga2b", "icinga2c" ]
 | |
|     }
 | |
| 
 | |
| Two or more nodes in a high availability setup require an [initial cluster sync](12-distributed-monitoring-ha.md#initial-cluster-sync).
 | |
| 
 | |
| > **Note**
 | |
| >
 | |
| > Keep in mind that **only one node acts as configuration master** having the
 | |
| > configuration files in the `zones.d` directory. All other nodes **must not**
 | |
| > have that directory populated. Instead they are required to
 | |
| > [accept synced configuration](12-distributed-monitoring-ha.md#zone-config-sync-permissions).
 | |
| > Details in the [Configuration Sync Chapter](12-distributed-monitoring-ha.md#cluster-zone-config-sync).
 | |
| 
 | |
| ### <a id="cluster-scenarios-multiple-hierarchies"></a> Multiple Hierarchies
 | |
| 
 | |
| Your master zone collects all check results for reporting and graphing and also
 | |
| does some sort of additional notifications.
 | |
| The customers got their own instances in their local DMZ zones. They are limited to read/write
 | |
| only their services, but replicate all events back to the master instance.
 | |
| Within each DMZ there are additional check instances also serving interfaces for local
 | |
| departments. The customers instances will collect all results, but also send them back to
 | |
| your master instance.
 | |
| Additionally the customers instance on the second level in the middle prohibits you from
 | |
| sending commands to the subjacent department nodes. You're only allowed to receive the
 | |
| results, and a subset of each customers configuration too.
 | |
| 
 | |
| Your master zone will generate global reports, aggregate alert notifications, and check
 | |
| additional dependencies (for example, the customers internet uplink and bandwidth usage).
 | |
| 
 | |
| The customers zone instances will only check a subset of local services and delegate the rest
 | |
| to each department. Even though it acts as configuration master with a master dashboard
 | |
| for all departments managing their configuration tree which is then deployed to all
 | |
| department instances. Furthermore the master NOC is able to see what's going on.
 | |
| 
 | |
| The instances in the departments will serve a local interface, and allow the administrators
 | |
| to reschedule checks or acknowledge problems for their services.
 | |
| 
 | |
| 
 | |
| ## <a id="high-availability-features"></a> High Availability for Icinga 2 features
 | |
| 
 | |
| All nodes in the same zone require the same features enabled for High Availability (HA)
 | |
| amongst them.
 | |
| 
 | |
| By default the following features provide advanced HA functionality:
 | |
| 
 | |
| * [Checks](12-distributed-monitoring-ha.md#high-availability-checks) (load balanced, automated failover)
 | |
| * [Notifications](12-distributed-monitoring-ha.md#high-availability-notifications) (load balanced, automated failover)
 | |
| * [DB IDO](12-distributed-monitoring-ha.md#high-availability-db-ido) (Run-Once, automated failover)
 | |
| 
 | |
| ### <a id="high-availability-checks"></a> High Availability with Checks
 | |
| 
 | |
| All instances within the same zone (e.g. the `master` zone as HA cluster) must
 | |
| have the `checker` feature enabled.
 | |
| 
 | |
| Example:
 | |
| 
 | |
|     # icinga2 feature enable checker
 | |
| 
 | |
| All nodes in the same zone load-balance the check execution. When one instance shuts down
 | |
| the other nodes will automatically take over the reamining checks.
 | |
| 
 | |
| ### <a id="high-availability-notifications"></a> High Availability with Notifications
 | |
| 
 | |
| All instances within the same zone (e.g. the `master` zone as HA cluster) must
 | |
| have the `notification` feature enabled.
 | |
| 
 | |
| Example:
 | |
| 
 | |
|     # icinga2 feature enable notification
 | |
| 
 | |
| Notifications are load balanced amongst all nodes in a zone. By default this functionality
 | |
| is enabled.
 | |
| If your nodes should notify independent from any other nodes (this will cause
 | |
| duplicated notifications if not properly handled!), you can set `enable_ha = false`
 | |
| in the [NotificationComponent](6-object-types.md#objecttype-notificationcomponent) feature.
 | |
| 
 | |
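A minimal sketch, assuming you edit the feature configuration shipped with the `notification` feature (the exact path may vary with your packaging):

    # vim /etc/icinga2/features-enabled/notification.conf

    object NotificationComponent "notification" {
      enable_ha = false
    }
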
| ### <a id="high-availability-db-ido"></a> High Availability with DB IDO
 | |
| 
 | |
| All instances within the same zone (e.g. the `master` zone as HA cluster) must
 | |
| have the DB IDO feature enabled.
 | |
| 
 | |
| Example DB IDO MySQL:
 | |
| 
 | |
|     # icinga2 feature enable ido-mysql
 | |
| 
 | |
| By default the DB IDO feature only runs on one node. All other nodes in the same zone disable
 | |
| the active IDO database connection at runtime. The node with the active DB IDO connection is
 | |
| not necessarily the zone master.
 | |
| 
 | |
| > **Note**
 | |
| >
 | |
| > The DB IDO HA feature can be disabled by setting the `enable_ha` attribute to `false`
 | |
| > for the [IdoMysqlConnection](6-object-types.md#objecttype-idomysqlconnection) or
 | |
| > [IdoPgsqlConnection](6-object-types.md#objecttype-idopgsqlconnection) object on **all** nodes in the
 | |
| > **same** zone.
 | |
| >
 | |
| > All endpoints will enable the DB IDO feature and connect to the configured
 | |
| > database and dump configuration, status and historical data on their own.
 | |
| 
 | |
| If the instance with the active DB IDO connection dies, the HA functionality will
 | |
| automatically elect a new DB IDO master.
 | |
| 
 | |
| The DB IDO feature will try to determine which cluster endpoint is currently writing
 | |
| to the database and bail out if another endpoint is active. You can manually verify that
 | |
| by running the following query:
 | |
| 
 | |
|     icinga=> SELECT status_update_time, endpoint_name FROM icinga_programstatus;
 | |
|        status_update_time   | endpoint_name
 | |
|     ------------------------+---------------
 | |
|      2014-08-15 15:52:26+02 | icinga2a
 | |
|     (1 Zeile)
 | |
| 
 | |
| This is useful when the cluster connection between endpoints breaks, and prevents
 | |
| data duplication in split-brain-scenarios. The failover timeout can be set for the
 | |
| `failover_timeout` attribute, but not lower than 60 seconds.
 | |
| 
 | |
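A minimal sketch showing both attributes on an [IdoMysqlConnection](6-object-types.md#objecttype-idomysqlconnection) object (the connection credentials are placeholders; only set `enable_ha = false` if every node really should write on its own):

    object IdoMysqlConnection "ido-mysql" {
      user = "icinga"
      password = "icinga"
      host = "localhost"
      database = "icinga"

      enable_ha = true
      failover_timeout = 60s
    }
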

## <a id="cluster-add-node"></a> Add a New Cluster Endpoint

These steps are required for integrating a new cluster endpoint:

* generate a new [SSL client certificate](12-distributed-monitoring-ha.md#manual-certificate-generation)
* identify its location in the zones
* update the `zones.conf` file on each involved node ([endpoint](12-distributed-monitoring-ha.md#configure-cluster-endpoints), [zones](12-distributed-monitoring-ha.md#configure-cluster-zones))
    * a new slave zone node requires updates for the master and slave zones
    * verify if this endpoint requires [configuration synchronisation](12-distributed-monitoring-ha.md#cluster-zone-config-sync) enabled
* if the node requires the existing zone history: [initial cluster sync](12-distributed-monitoring-ha.md#initial-cluster-sync)
* add a [cluster health check](12-distributed-monitoring-ha.md#cluster-health-check)

### <a id="initial-cluster-sync"></a> Initial Cluster Sync

In order to make sure that all of your cluster nodes have the same state you will
have to pick one of the nodes as your initial "master" and copy its state file
to all the other nodes.

You can find the state file in `/var/lib/icinga2/icinga2.state`. Before copying
the state file you should make sure that all your cluster nodes are properly shut
down.

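A minimal sketch of that procedure, assuming `icinga2b` is one of the nodes receiving the state file and is reachable via SSH:

    # service icinga2 stop     # on all cluster nodes
    # scp /var/lib/icinga2/icinga2.state icinga2b:/var/lib/icinga2/icinga2.state
    # service icinga2 start    # on all cluster nodes
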

## <a id="host-multiple-cluster-nodes"></a> Host With Multiple Cluster Nodes

Special scenarios might require multiple cluster nodes running on a single host.
By default Icinga 2 and its features will place their runtime data below the prefix
`LocalStateDir`. By default packages will set that path to `/var`.
You can either set that variable as a constant configuration
definition in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) or pass it as a runtime variable to
the Icinga 2 daemon.

    # icinga2 -c /etc/icinga2/node1/icinga2.conf -DLocalStateDir=/opt/node1/var