diff --git a/doc/06-distributed-monitoring.md b/doc/06-distributed-monitoring.md index f630ef021..a423eb392 100644 --- a/doc/06-distributed-monitoring.md +++ b/doc/06-distributed-monitoring.md @@ -2,7 +2,7 @@ This chapter will guide you through the setup of a distributed monitoring environment, including high-availability clustering and setup details -for the Icinga 2 client. +for Icinga masters, satellites and agents. ## Roles: Master, Satellites, and Clients @@ -10,7 +10,7 @@ Icinga 2 nodes can be given names for easier understanding: * A `master` node which is on top of the hierarchy. * A `satellite` node which is a child of a `satellite` or `master` node. -* A `client` node which works as an `agent` connected to `master` and/or `satellite` nodes. +* An `agent` node which is connected to `master` and/or `satellite` nodes. ![Icinga 2 Distributed Roles](images/distributed-monitoring/icinga2_distributed_roles.png) @@ -23,17 +23,26 @@ Rephrasing this picture into more details: * A `satellite` node may execute checks on its own or delegate check execution to child nodes. * A `satellite` node can receive configuration for hosts/services, etc. from the parent node. * A `satellite` node continues to run even if the master node is temporarily unavailable. -* A `client` node only has a parent node. - * A `client` node will either run its own configured checks or receive command execution events from the parent node. +* An `agent` node only has a parent node. + * An `agent` node will either run its own configured checks or receive command execution events from the parent node. + +A client can be a secondary master, a satellite or an agent. It +typically requests something from the primary master or parent node. The following sections will refer to these roles and explain the differences and the possibilities this kind of setup offers. +> **Note** +> +> Previous versions of this documentation used the term `Icinga client`. 
+> This has been refined into `Icinga agent` and is visible in the docs, +> backends and web interfaces. + **Tip**: If you just want to install a single master node that monitors several hosts -(i.e. Icinga 2 clients), continue reading -- we'll start with +(i.e. Icinga agents), continue reading -- we'll start with simple examples. In case you are planning a huge cluster setup with multiple levels and -lots of clients, read on -- we'll deal with these cases later on. +lots of satellites and agents, read on -- we'll deal with these cases later on. The installation on each system is the same: You need to install the [Icinga 2 package](02-installation.md#setting-up-icinga2) and the required [plugins](02-installation.md#setting-up-check-plugins). @@ -74,8 +83,8 @@ trust hierarchy allows for example the `master` zone to send configuration files to the `satellite` zone. Read more about this in the [security section](06-distributed-monitoring.md#distributed-monitoring-security). -`client` nodes also have their own unique zone. By convention you -can use the FQDN for the zone name. +`agent` nodes also have their own unique zone. By convention you +must use the FQDN for the zone name. ## Endpoints @@ -108,7 +117,7 @@ All endpoints in the same zone work as high-availability setup. For example, if you have two nodes in the `master` zone, they will load-balance the check execution. Endpoint objects are important for specifying the connection -information, e.g. if the master should actively try to connect to a client. +information, e.g. if the master should actively try to connect to an agent. The zone membership is defined inside the `Zone` object definition using the `endpoints` attribute with an array of `Endpoint` names. 
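As a minimal sketch of the zone membership described above, the following `zones.conf` fragment puts two endpoints into the `master` zone and one agent endpoint into its own child zone. The name `icinga2-master2.localdomain` and the address `192.168.56.102` are assumptions made for illustration; the other names and addresses follow the examples used elsewhere in this chapter.

```
// First master endpoint, as used throughout this chapter.
object Endpoint "icinga2-master1.localdomain" {
  host = "192.168.56.101"
}

// Hypothetical second master endpoint (name and address are assumptions).
object Endpoint "icinga2-master2.localdomain" {
  host = "192.168.56.102"
}

// Both endpoints are members of the same zone and therefore
// form a high-availability setup.
object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ]
}

// Setting `host` tells the local node to actively connect to this endpoint.
object Endpoint "icinga2-agent1.localdomain" {
  host = "192.168.56.111"
}

// The agent has its own zone, named after its FQDN, with `master` as parent.
object Zone "icinga2-agent1.localdomain" {
  endpoints = [ "icinga2-agent1.localdomain" ]
  parent = "master"
}
```

If the `host` attribute is omitted from the agent's `Endpoint` object, the master does not attempt the connection itself and waits for the agent to connect.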
@@ -164,7 +173,7 @@ While there are certain mechanisms to ensure a secure communication between all nodes (firewalls, policies, software hardening, etc.), Icinga 2 also provides additional security: -* SSL certificates are mandatory for communication between nodes. The CLI commands +* TLS/SSL certificates are mandatory for communication between nodes. The CLI commands help you create those certificates. * Child zones only receive updates (check results, commands, etc.) for their configured objects. * Child zones are not allowed to push configuration updates to parent zones. @@ -177,22 +186,22 @@ The connection is secured by TLS. The message protocol uses an internal API, and as such message types and names may change internally and are not documented. Zones build the trust relationship in a distributed environment. If you do not specify -a zone for a client and specify the parent zone, its zone members e.g. the master instance -won't trust the client. +a zone for an agent/satellite and specify the parent zone, its zone members e.g. the master instance +won't trust the agent/satellite. Building this trust is key in your distributed environment. That way the parent node knows that it is able to send messages to the child zone, e.g. configuration objects, configuration in global zones, commands to be executed in this zone/for this endpoint. It also receives check results from the child zone for checkable objects (host/service). -Vice versa, the client trusts the master and accepts configuration and commands if enabled -in the api feature. If the client would send configuration to the parent zone, the parent nodes -will deny it. The parent zone is the configuration entity, and does not trust clients in this matter. -A client could attempt to modify a different client for example, or inject a check command +Vice versa, the agent/satellite trusts the master and accepts configuration and commands if enabled +in the api feature. 
If the agent/satellite sends configuration to the parent zone, the parent nodes +will deny it. The parent zone is the configuration entity, and does not trust agents/satellites in this matter. +An agent/satellite could, for example, attempt to modify a different agent/satellite or inject a check command with malicious code. -While it may sound complicated for client setups, it removes the problem with different roles -and configurations for a master and a client. Both of them work the same way, are configured +While it may sound complicated for agent/satellite setups, it removes the problem with different roles +and configurations for a master and child nodes. Both of them work the same way, are configured in the same way (Zone, Endpoint, ApiListener), and you can troubleshoot and debug them in just one go. ## Versions and Upgrade @@ -203,14 +212,14 @@ Prior to upgrading, make sure to plan a maintenance window. The Icinga project aims to allow the following compatibility: ``` -master (2.11) >= satellite (2.10) >= clients (2.9) +master (2.11) >= satellite (2.10) >= agent (2.9) ``` -Older client versions may work, but there's no guarantee. Always keep in mind that +Older agent versions may work, but there's no guarantee. Always keep in mind that older versions are out of support and can contain bugs. In terms of an upgrade, ensure that the master is upgraded first, then -involved satellites, and last the Icinga 2 clients. If you are on v2.10 +involved satellites, and last the Icinga agents. If you are on v2.10 currently, first upgrade the master instance(s) to 2.11, and then proceed with the satellites. Things are getting easier with any sort of automation tool (Puppet, Ansible, etc.). @@ -263,7 +272,7 @@ Welcome to the Icinga 2 Setup Wizard! We will guide you through all required configuration details. 
-Please specify if this is a satellite/client setup ('n' installs a master setup) [Y/n]: n +Please specify if this is a satellite/agent setup ('n' installs a master setup) [Y/n]: n Starting the Master setup routine... @@ -295,7 +304,7 @@ Now restart your Icinga 2 daemon to finish the installation! You can verify that the CA public and private keys are stored in the `/var/lib/icinga2/ca` directory. Keep this path secure and include it in your [backups](02-installation.md#install-backup). -In case you lose the CA private key you have to generate a new CA for signing new client +In case you lose the CA private key you have to generate a new CA for signing new agent/satellite certificate requests. You then have to also re-create new signed certificates for all existing nodes. @@ -314,7 +323,7 @@ and should be the same on all master instances. You can avoid signing and deploying certificates [manually](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-certificates-manual) by using built-in methods for auto-signing certificate signing requests (CSR): -* [CSR Auto-Signing](06-distributed-monitoring.md#distributed-monitoring-setup-csr-auto-signing) which uses a client ticket generated on the master as trust identifier. +* [CSR Auto-Signing](06-distributed-monitoring.md#distributed-monitoring-setup-csr-auto-signing) which uses a client (an agent or a satellite) ticket generated on the master as trust identifier. * [On-Demand CSR Signing](06-distributed-monitoring.md#distributed-monitoring-setup-on-demand-csr-signing) which allows to sign pending certificate requests on the master. Both methods are described in detail below. @@ -325,8 +334,8 @@ Both methods are described in detail below. ### CSR Auto-Signing -A client which sends a certificate signing request (CSR) must authenticate itself -in a trusted way. The master generates a client ticket which is included in this request. +A client can be a secondary master, a satellite or an agent. 
It sends a certificate signing request (CSR) +and must authenticate itself in a trusted way. The master generates a client ticket which is included in this request. That way the master can verify that the request matches the previously trusted ticket and sign the request. @@ -334,12 +343,12 @@ and sign the request. > > Icinga 2 v2.8 added the possibility to forward signing requests on a satellite > to the master node. This is called `CA Proxy` in blog posts and design drafts. -> This functionality helps with the setup of [three level clusters](#06-distributed-monitoring.md#distributed-monitoring-scenarios-master-satellite-client) +> This functionality helps with the setup of [three level clusters](06-distributed-monitoring.md#distributed-monitoring-scenarios-master-satellite-agents) > and more. Advantages: -* Nodes can be installed by different users who have received the client ticket. +* Nodes (secondary master, satellites, agents) can be installed by different users who have received the client ticket. * No manual interaction necessary on the master node. * Automation tools like Puppet, Ansible, etc. can retrieve the pre-generated ticket in their client catalog and run the node setup directly. @@ -350,7 +359,7 @@ Disadvantages: * No central signing management. -Setup wizards for satellite/client nodes will ask you for this specific client ticket. +Setup wizards for agent/satellite nodes will ask you for this specific client ticket. There are two possible ways to retrieve the ticket: @@ -361,9 +370,9 @@ Required information: Parameter | Description --------------------|-------------------- - Common name (CN) | **Required.** The common name for the satellite/client. By convention this should be the FQDN. + Common name (CN) | **Required.** The common name for the agent/satellite. By convention this should be the FQDN. 
-The following example shows how to generate a ticket on the master node `icinga2-master1.localdomain` for the client `icinga2-agent1.localdomain`: +The following example shows how to generate a ticket on the master node `icinga2-master1.localdomain` for the agent `icinga2-agent1.localdomain`: ``` [root@icinga2-master1.localdomain /]# icinga2 pki ticket --cn icinga2-agent1.localdomain @@ -388,7 +397,7 @@ Retrieve the ticket on the master node `icinga2-master1.localdomain` with `curl` -X POST 'https://localhost:5665/v1/actions/generate-ticket' -d '{ "cn": "icinga2-agent1.localdomain" }' ``` -Store that ticket number for the satellite/client setup below. +Store that ticket number for the agent/satellite setup below. > **Note** > @@ -399,8 +408,9 @@ Store that ticket number for the satellite/client setup below. ### On-Demand CSR Signing -The client sends a certificate signing request to specified parent node without any -ticket. The admin on the master is responsible for reviewing and signing the requests +The client can be a secondary master, satellite or agent. +It sends a certificate signing request to the specified parent node without any +ticket. The admin on the primary master is responsible for reviewing and signing the requests with the private CA key. This could either be directly the master, or a satellite which forwards the request @@ -460,18 +470,22 @@ information/cli: Certificate 5c31ca0e2269c10363a97e40e3f2b2cd56493f9194d5b185254 ``` If you want to restore a certificate you have removed, you can use `ca restore`. + + -## Client/Satellite Setup +## Agent/Satellite Setup -This section describes the setup of a satellite and/or client connected to an +This section describes the setup of an agent or satellite connected to an existing master node setup. If you haven't done so already, please [run the master setup](06-distributed-monitoring.md#distributed-monitoring-setup-master). Icinga 2 on the master node must be running and accepting connections on port `5665`. 
+ + -### Client/Satellite Setup on Linux +### Agent/Satellite Setup on Linux -Please ensure that you've run all the steps mentioned in the [client/satellite section](06-distributed-monitoring.md#distributed-monitoring-setup-satellite-client). +Please ensure that you've run all the steps mentioned in the [agent/satellite section](06-distributed-monitoring.md#distributed-monitoring-setup-agent-satellite). Install the [Icinga 2 package](02-installation.md#setting-up-icinga2) and setup the required [plugins](02-installation.md#setting-up-check-plugins) if you haven't done @@ -479,7 +493,7 @@ so already. The next step is to run the `node wizard` CLI command. -In this example we're generating a ticket on the master node `icinga2-master1.localdomain` for the client `icinga2-agent1.localdomain`: +In this example we're generating a ticket on the master node `icinga2-master1.localdomain` for the agent `icinga2-agent1.localdomain`: ``` [root@icinga2-master1.localdomain /]# icinga2 pki ticket --cn icinga2-agent1.localdomain @@ -488,7 +502,7 @@ In this example we're generating a ticket on the master node `icinga2-master1.lo Note: You don't need this step if you have chosen to use [On-Demand CSR Signing](06-distributed-monitoring.md#distributed-monitoring-setup-on-demand-csr-signing). -Start the wizard on the client `icinga2-agent1.localdomain`: +Start the wizard on the agent `icinga2-agent1.localdomain`: ``` [root@icinga2-agent1.localdomain /]# icinga2 node wizard @@ -498,10 +512,10 @@ Welcome to the Icinga 2 Setup Wizard! We will guide you through all required configuration details. ``` -Press `Enter` or add `y` to start a satellite or client setup. +Press `Enter` or add `y` to start a satellite or agent setup. 
``` -Please specify if this is a satellite/client setup ('n' installs a master setup) [Y/n]: +Please specify if this is an agent/satellite setup ('n' installs a master setup) [Y/n]: ``` Press `Enter` to use the proposed name in brackets, or add a specific common name (CN). By convention @@ -612,7 +626,7 @@ Set the local zone name to something else, if you are installing a satellite or Local zone name [icinga2-agent1.localdomain]: ``` -Set the parent zone name to something else than `master` if this client connects to a satellite instance instead of the master. +Set the parent zone name to something other than `master` if this agent connects to a satellite instance instead of the master. ``` Parent zone name [master]: @@ -629,7 +643,7 @@ Do you want to specify additional global zones? [y/N]: N ``` Last but not least the wizard asks you whether you want to disable the inclusion of the local configuration -directory in `conf.d`, or not. Defaults to disabled, as clients either are checked via command endpoint, or +directory in `conf.d`, or not. Defaults to disabled, as agents either are checked via command endpoint, or they receive configuration synced from the parent zone. ``` @@ -667,8 +681,8 @@ Here is an overview of all parameters in detail: Common name (CN) | **Required.** By convention this should be the host's FQDN. Defaults to the FQDN. Master common name | **Required.** Use the common name you've specified for your master node before. Establish connection to the parent node | **Optional.** Whether the node should attempt to connect to the parent node or not. Defaults to `y`. - Master/Satellite endpoint host | **Required if the the client needs to connect to the master/satellite.** The parent endpoint's IP address or FQDN. This information is included in the `Endpoint` object configuration in the `zones.conf` file. - Master/Satellite endpoint port | **Optional if the the client needs to connect to the master/satellite.** The parent endpoints's listening port. 
This information is included in the `Endpoint` object configuration. + Master/Satellite endpoint host | **Required if the agent needs to connect to the master/satellite.** The parent endpoint's IP address or FQDN. This information is included in the `Endpoint` object configuration in the `zones.conf` file. + Master/Satellite endpoint port | **Optional if the agent needs to connect to the master/satellite.** The parent endpoint's listening port. This information is included in the `Endpoint` object configuration. Add more master/satellite endpoints | **Optional.** If you have multiple master/satellite nodes configured, add them here. Parent Certificate information | **Required.** Verify that the connecting host really is the requested master node. Request ticket | **Optional.** Add the [ticket](06-distributed-monitoring.md#distributed-monitoring-setup-csr-auto-signing) generated on the master. @@ -676,8 +690,8 @@ Here is an overview of all parameters in detail: API bind port | **Optional.** Allows to specify the port the ApiListener is bound to. For advanced usage only (requires changing the default port 5665 everywhere). Accept config | **Optional.** Whether this node accepts configuration sync from the master node (required for [config sync mode](06-distributed-monitoring.md#distributed-monitoring-top-down-config-sync)). For [security reasons](06-distributed-monitoring.md#distributed-monitoring-security) this defaults to `n`. Accept commands | **Optional.** Whether this node accepts command execution messages from the master node (required for [command endpoint mode](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint)). For [security reasons](06-distributed-monitoring.md#distributed-monitoring-security) this defaults to `n`. - Local zone name | **Optional.** Allows to specify the name for the local zone. This comes in handy when this instance is a satellite, not a client. Defaults to the FQDN. 
- Parent zone name | **Optional.** Allows to specify the name for the parent zone. This is important if the client has a satellite instance as parent, not the master. Defaults to `master`. + Local zone name | **Optional.** Allows to specify the name for the local zone. This comes in handy when this instance is a satellite, not an agent. Defaults to the FQDN. + Parent zone name | **Optional.** Allows to specify the name for the parent zone. This is important if the agent has a satellite instance as parent, not the master. Defaults to `master`. Global zones | **Optional.** Allows to specify more global zones in addition to `global-templates` and `director-global`. Defaults to `n`. Disable conf.d | **Optional.** Allows to disable the inclusion of the `conf.d` directory which holds local example configuration. Clients should retrieve their configuration from the parent node, or act as command endpoint execution bridge. Defaults to `y`. @@ -687,7 +701,7 @@ The setup wizard will ensure that the following steps are taken: * Create a certificate signing request (CSR) for the local node. * Request a signed certificate i(optional with the provided ticket number) on the master node. * Allow to verify the parent node's certificate. -* Store the signed client certificate and ca.crt in `/var/lib/icinga2/certs`. +* Store the signed agent/satellite certificate and ca.crt in `/var/lib/icinga2/certs`. * Update the `zones.conf` file with the new zone hierarchy. * Update `/etc/icinga2/features-enabled/api.conf` (`accept_config`, `accept_commands`) and `constants.conf`. * Update `/etc/icinga2/icinga2.conf` and comment out `include_recursive "conf.d"`. @@ -696,18 +710,20 @@ You can verify that the certificate files are stored in the `/var/lib/icinga2/ce > **Note** > -> If the client is not directly connected to the certificate signing master, -> signing requests and responses might need some minutes to fully update the client certificates. 
+> If the agent is not directly connected to the certificate signing master, +> signing requests and responses might need some minutes to fully update the agent certificates. > > If you have chosen to use [On-Demand CSR Signing](06-distributed-monitoring.md#distributed-monitoring-setup-on-demand-csr-signing) > certificates need to be signed on the master first. Ticket-less setups require at least Icinga 2 v2.8+ on all involved instances. -Now that you've successfully installed a Linux/Unix satellite/client instance, please proceed to +Now that you've successfully installed a Linux/Unix agent/satellite instance, please proceed to the [configuration modes](06-distributed-monitoring.md#distributed-monitoring-configuration-modes). + + -### Client Setup on Windows +### Agent Setup on Windows Download the MSI-Installer package from [https://packages.icinga.com/windows/](https://packages.icinga.com/windows/). @@ -725,10 +741,10 @@ to get you started more easily. > **Note** > -> Please note that Icinga 2 was designed to run as light-weight client on Windows. +> Please note that Icinga 2 was designed to run as light-weight agent on Windows. > There is no support for satellite instances. -#### Windows Client Setup Start +#### Windows Agent Setup Start Run the MSI-Installer package and follow the instructions shown in the screenshots. @@ -763,7 +779,7 @@ Add the following details: Parameter | Description -------------------------------|------------------------------- - Instance name | **Required.** The master/satellite endpoint name where this client is a direct child of. + Instance name | **Required.** The master/satellite endpoint name where this agent is a direct child of. Master/Satellite endpoint host | **Required.** The master or satellite's IP address or FQDN. This information is included in the `Endpoint` object configuration in the `zones.conf` file. Master/Satellite endpoint port | **Optional.** The master or satellite's listening port. 
This information is included in the `Endpoint` object configuration. @@ -790,7 +806,7 @@ Verify the certificate from the master/satellite instance where this node should ![Icinga 2 Windows Setup](images/distributed-monitoring/icinga2_windows_setup_wizard_04.png) -#### Bundled NSClient++ Setup +#### Bundled NSClient++ Setup If you have chosen to install/update the NSClient++ package, the Icinga 2 setup wizard asks you to do so. @@ -828,7 +844,7 @@ The NSClient++ REST API can be used to query metrics. [check_nscp_api](06-distri uses this transport method. -#### Finish Windows Client Setup +#### Finish Windows Client Setup Finish the Windows setup wizard. @@ -849,21 +865,22 @@ Click `Examine Config` in the setup wizard to open a new Explorer window. ![Icinga 2 Windows Setup](images/distributed-monitoring/icinga2_windows_setup_wizard_examine_config.png) -The configuration files can be modified with your favorite editor e.g. Notepad. +The configuration files can be modified with your favorite editor, e.g. Notepad++ or vim in PowerShell (installed via Chocolatey). -In order to use the [top down](06-distributed-monitoring.md#distributed-monitoring-top-down) client +In order to use the [top down](06-distributed-monitoring.md#distributed-monitoring-top-down) agent configuration prepare the following steps. -You don't need any local configuration on the client except for +You don't need any local configuration on the agent except for CheckCommand definitions which can be synced using the global zone above. Therefore disable the inclusion of the `conf.d` directory in the `icinga2.conf` file. + Navigate to `C:\ProgramData\icinga2\etc\icinga2` and open the `icinga2.conf` file in your preferred editor. Remove or comment (`//`) the following line: ``` -// Commented out, not required on a client with top down mode +// Commented out, not required on an agent with top down mode //include_recursive "conf.d" ``` @@ -887,25 +904,26 @@ and restart the `icinga2` service. 
Alternatively, you can use the `net {start,st ![Icinga 2 Windows Service Start/Stop](images/distributed-monitoring/icinga2_windows_cmd_admin_net_start_stop.png) -Now that you've successfully installed a Windows client, please proceed to +Now that you've successfully installed a Windows agent, please proceed to the [detailed configuration modes](06-distributed-monitoring.md#distributed-monitoring-configuration-modes). + ## Configuration Modes There are different ways to ensure that the Icinga 2 cluster nodes execute checks, send notifications, etc. The preferred method is to configure monitoring objects on the master -and distribute the configuration to satellites and clients. +and distribute the configuration to satellites and agents. -The following chapters will explain this in detail with hands-on manual configuration +The following chapters explain this in detail with hands-on manual configuration examples. You should test and implement this once to fully understand how it works. Once you are familiar with Icinga 2 and distributed monitoring, you can start with additional integrations to manage and deploy your configuration: -* [Icinga Director](https://github.com/icinga/icingaweb2-module-director) provides a web interface to manage configuration and also allows to sync imported resources (CMDB, PuppetDB, etc.) +* [Icinga Director](https://icinga.com/docs/director/latest/) provides a web interface to manage configuration and also allows to sync imported resources (CMDB, PuppetDB, etc.) * [Ansible Roles](https://github.com/Icinga/icinga2-ansible) * [Puppet Module](https://github.com/Icinga/puppet-icinga2) * [Chef Cookbook](https://github.com/Icinga/chef-icinga2) @@ -919,14 +937,14 @@ There are two different behaviors with check execution: * Send a command execution event remotely: The scheduler still runs on the parent node. * Sync the host/service objects directly to the child node: Checks are executed locally. 
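The two behaviors listed above can be contrasted in a short configuration sketch. The file locations in the comments and the zone name `satellite` are assumptions for illustration; the object names, addresses and the `agent_endpoint` custom variable follow the examples used in this chapter.

```
// Behavior 1, command endpoint: the object stays in the parent zone,
// e.g. in a file below /etc/icinga2/zones.d/master/ (assumed path).
// The parent node schedules the check and only sends the execution
// event to the endpoint named in `command_endpoint`.
apply Service "disk" {
  check_command = "disk"
  command_endpoint = host.vars.agent_endpoint
  assign where host.vars.agent_endpoint
}

// Behavior 2, config sync: the object is placed in the child zone's
// directory on the master, e.g. below /etc/icinga2/zones.d/satellite/
// (assumed zone name and path). It is synced to that zone, and the
// child node schedules and executes the check locally.
object Host "icinga2-satellite1.localdomain" {
  check_command = "hostalive"
  address = "192.168.56.105"
}
```

The visible difference is the `command_endpoint` attribute: with it, scheduling remains on the parent; without it, the zone that owns the object runs the check itself.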
-Again, technically it does not matter whether this is a `client` or a `satellite` +Again, technically it does not matter whether this is an `agent` or a `satellite` which is receiving configuration or command execution events. ### Top Down Command Endpoint -This mode will force the Icinga 2 node to execute commands remotely on a specified endpoint. -The host/service object configuration is located on the master/satellite and the client only -needs the CheckCommand object definitions being used there. +This mode forces the Icinga 2 node to execute commands remotely on a specified endpoint. +The host/service object configuration is located on the master/satellite and the agent only +needs the CheckCommand object definitions available. Every endpoint has its own remote check queue. The amount of checks executed simultaneously can be limited on the endpoint with the `MaxConcurrentChecks` constant defined in [constants.conf](04-configuration.md#constants-conf). Icinga 2 may discard check requests, @@ -936,7 +954,7 @@ if the remote check queue is full. Advantages: -* No local checks need to be defined on the child node (client). +* No local checks need to be defined on the child node (agent). * Light-weight remote check execution (asynchronous events). * No [replay log](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-command-endpoint-log-duration) is necessary for the child node. * Pin checks to specific endpoints (if the child zone consists of 2 endpoints). @@ -952,7 +970,7 @@ commands, you need to configure the `Zone` and `Endpoint` hierarchy on all nodes. * `icinga2-master1.localdomain` is the configuration master in this scenario. -* `icinga2-agent1.localdomain` acts as client which receives command execution messages via command endpoint from the master. In addition, it receives the global check command configuration from the master. 
+* `icinga2-agent1.localdomain` acts as agent which receives command execution messages via command endpoint from the master. In addition, it receives the global check command configuration from the master. Include the endpoint and zone configuration on **both** nodes in the file `/etc/icinga2/zones.conf`. @@ -967,13 +985,14 @@ object Endpoint "icinga2-master1.localdomain" { object Endpoint "icinga2-agent1.localdomain" { host = "192.168.56.111" + log_duration = 0 // Disable the replay log for command endpoint agents } ``` -Next, you need to define two zones. There is no naming convention, best practice is to either use `master`, `satellite`/`client-fqdn` or to choose region names for example `Europe`, `USA` and `Asia`, though. +Next, you need to define two zones. There is no naming convention, best practice is to either use `master`, `satellite`/`agent-fqdn` or to choose region names for example `Europe`, `USA` and `Asia`, though. -**Note**: Each client requires its own zone and endpoint configuration. Best practice -is to use the client's FQDN for all object names. +**Note**: Each agent requires its own zone and endpoint configuration. Best practice +is to use the agent's FQDN for all object names. The `master` zone is a parent of the `icinga2-agent1.localdomain` zone: @@ -991,7 +1010,7 @@ object Zone "icinga2-agent1.localdomain" { } ``` -You don't need any local configuration on the client except for +You don't need any local configuration on the agent except for CheckCommand definitions which can be synced using the global zone above. Therefore disable the inclusion of the `conf.d` directory in `/etc/icinga2/icinga2.conf`. @@ -999,7 +1018,7 @@ in `/etc/icinga2/icinga2.conf`. 
``` [root@icinga2-agent1.localdomain /]# vim /etc/icinga2/icinga2.conf -// Commented out, not required on a client as command endpoint +// Commented out, not required on an agent as command endpoint //include_recursive "conf.d" ``` @@ -1021,8 +1040,8 @@ Example on CentOS 7: [root@icinga2-master1.localdomain /]# systemctl restart icinga2 ``` -Once the clients have successfully connected, you are ready for the next step: **execute -a remote check on the client using the command endpoint**. +Once the agents have successfully connected, you are ready for the next step: **execute +a remote check on the agent using the command endpoint**. Include the host and service object configuration in the `master` zone -- this will help adding a secondary master for high-availability later. @@ -1035,8 +1054,15 @@ Add the host and service objects you want to monitor. There is no limitation for files and directories -- best practice is to sort things by type. -By convention a master/satellite/client host object should use the same name as the endpoint object. -You can also add multiple hosts which execute checks against remote services/clients. +By convention a master/satellite/agent host object should use the same name as the endpoint object. +You can also add multiple hosts which execute checks against remote services/agents. + +The following example adds the `agent_endpoint` custom variable to the +host and stores its name (FQDN). _Versions older than 2.11 +used the `client_endpoint` custom variable._ + +This custom variable serves two purposes: 1) Service apply rules can match against it. +2) Apply rules can retrieve its value and assign it to the `command_endpoint` attribute. 
``` [root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/master @@ -1046,11 +1072,11 @@ object Host "icinga2-agent1.localdomain" { check_command = "hostalive" //check is executed on the master address = "192.168.56.111" - vars.client_endpoint = name //follows the convention that host name == endpoint name + vars.agent_endpoint = name //follows the convention that host name == endpoint name } ``` -Given that you are monitoring a Linux client, we'll add a remote [disk](10-icinga-template-library.md#plugin-check-command-disk) +Given that you are monitoring a Linux agent, add a remote [disk](10-icinga-template-library.md#plugin-check-command-disk) check. ``` @@ -1059,10 +1085,11 @@ check. apply Service "disk" { check_command = "disk" - //specify where the check is executed - command_endpoint = host.vars.client_endpoint + // Specify the remote agent as command execution endpoint, fetch the host custom variable + command_endpoint = host.vars.agent_endpoint - assign where host.vars.client_endpoint + // Only assign where a host is marked as agent endpoint + assign where host.vars.agent_endpoint } ``` @@ -1095,7 +1122,7 @@ The following steps will happen: * The `icinga2-agent1.localdomain` node receives the execute command event with additional command parameters. * The `icinga2-agent1.localdomain` node maps the command parameters to the local check command, executes the check locally, and sends back the check result message. -As you can see, no interaction from your side is required on the client itself, and it's not necessary to reload the Icinga 2 service on the client. +As you can see, no interaction from your side is required on the agent itself, and it's not necessary to reload the Icinga 2 service on the agent. You have learned the basics about command endpoint checks. 
Proceed with the [scenarios](06-distributed-monitoring.md#distributed-monitoring-scenarios) @@ -1130,30 +1157,27 @@ commands, you need to configure the `Zone` and `Endpoint` hierarchy on all nodes. * `icinga2-master1.localdomain` is the configuration master in this scenario. -* `icinga2-agent2.localdomain` acts as client which receives configuration from the master. Checks are scheduled locally. +* `icinga2-satellite1.localdomain` acts as satellite which receives configuration from the master. Checks are scheduled locally. Include the endpoint and zone configuration on **both** nodes in the file `/etc/icinga2/zones.conf`. The endpoint configuration could look like this: ``` -[root@icinga2-agent2.localdomain /]# vim /etc/icinga2/zones.conf +[root@icinga2-satellite1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { host = "192.168.56.101" } -object Endpoint "icinga2-agent2.localdomain" { - host = "192.168.56.112" +object Endpoint "icinga2-satellite1.localdomain" { + host = "192.168.56.105" } ``` -Next, you need to define two zones. There is no naming convention, best practice is to either use `master`, `satellite`/`client-fqdn` or to choose region names for example `Europe`, `USA` and `Asia`, though. +Next, you need to define two zones. There is no naming convention, best practice is to either use `master`, `satellite`/`agent-fqdn` or to choose region names for example `Europe`, `USA` and `Asia`, though. -**Note**: Each client requires its own zone and endpoint configuration. Best practice -is to use the client's FQDN for all object names. 
- -The `master` zone is a parent of the `icinga2-agent2.localdomain` zone: +The `master` zone is a parent of the `satellite` zone: ``` [root@icinga2-agent2.localdomain /]# vim /etc/icinga2/zones.conf @@ -1162,19 +1186,19 @@ object Zone "master" { endpoints = [ "icinga2-master1.localdomain" ] //array with endpoint names } -object Zone "icinga2-agent2.localdomain" { - endpoints = [ "icinga2-agent2.localdomain" ] +object Zone "satellite" { + endpoints = [ "icinga2-satellite1.localdomain" ] parent = "master" //establish zone hierarchy } ``` -Edit the `api` feature on the client `icinga2-agent2.localdomain` in +Edit the `api` feature on the satellite `icinga2-satellite1.localdomain` in the `/etc/icinga2/features-enabled/api.conf` file and set `accept_config` to `true`. ``` -[root@icinga2-agent2.localdomain /]# vim /etc/icinga2/features-enabled/api.conf +[root@icinga2-satellite1.localdomain /]# vim /etc/icinga2/features-enabled/api.conf object ApiListener "api" { //... @@ -1188,8 +1212,8 @@ on both nodes. Example on CentOS 7: ``` -[root@icinga2-agent2.localdomain /]# icinga2 daemon -C -[root@icinga2-agent2.localdomain /]# systemctl restart icinga2 +[root@icinga2-satellite1.localdomain /]# icinga2 daemon -C +[root@icinga2-satellite1.localdomain /]# systemctl restart icinga2 [root@icinga2-master1.localdomain /]# icinga2 daemon -C [root@icinga2-master1.localdomain /]# systemctl restart icinga2 @@ -1198,43 +1222,44 @@ Example on CentOS 7: **Tip**: Best practice is to use a [global zone](06-distributed-monitoring.md#distributed-monitoring-global-zone-config-sync) for common configuration items (check commands, templates, groups, etc.). -Once the clients have connected successfully, it's time for the next step: **execute -a local check on the client using the configuration sync**. +Once the satellite(s) have connected successfully, it's time for the next step: **execute +a local check on the satellite using the configuration sync**. 
Navigate to `/etc/icinga2/zones.d` on your master node `icinga2-master1.localdomain` and create a new directory with the same -name as your satellite/client zone name: +name as your satellite/agent zone name: ``` -[root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/zones.d/icinga2-agent2.localdomain +[root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/zones.d/satellite ``` Add the host and service objects you want to monitor. There is no limitation for files and directories -- best practice is to sort things by type. -By convention a master/satellite/client host object should use the same name as the endpoint object. -You can also add multiple hosts which execute checks against remote services/clients. +By convention a master/satellite/agent host object should use the same name as the endpoint object. +You can also add multiple hosts which execute checks against remote services/agents via [command endpoint](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) +checks. 
``` -[root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/icinga2-agent2.localdomain -[root@icinga2-master1.localdomain /etc/icinga2/zones.d/icinga2-agent2.localdomain]# vim hosts.conf +[root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/satellite +[root@icinga2-master1.localdomain /etc/icinga2/zones.d/satellite]# vim hosts.conf -object Host "icinga2-agent2.localdomain" { +object Host "icinga2-satellite1.localdomain" { check_command = "hostalive" - address = "192.168.56.112" + address = "192.168.56.105" - zone = "master" //optional trick: sync the required host object to the client, but enforce the "master" zone to execute the check + zone = "master" //optional trick: sync the required host object to the satellite, but enforce the "master" zone to execute the check } ``` -Given that you are monitoring a Linux client we'll just add a local [disk](10-icinga-template-library.md#plugin-check-command-disk) +Given that you are monitoring a Linux satellite, add a local [disk](10-icinga-template-library.md#plugin-check-command-disk) check. ``` -[root@icinga2-master1.localdomain /etc/icinga2/zones.d/icinga2-agent2.localdomain]# vim services.conf +[root@icinga2-master1.localdomain /etc/icinga2/zones.d/satellite]# vim services.conf object Service "disk" { - host_name = "icinga2-agent2.localdomain" + host_name = "icinga2-satellite1.localdomain" check_command = "disk" } @@ -1257,11 +1282,10 @@ The following steps will happen: * Icinga 2 validates the configuration on `icinga2-master1.localdomain`. * Icinga 2 copies the configuration into its zone config store in `/var/lib/icinga2/api/zones`. * The `icinga2-master1.localdomain` node sends a config update event to all endpoints in the same or direct child zones. -* The `icinga2-agent2.localdomain` node accepts config and populates the local zone config store with the received config files. -* The `icinga2-agent2.localdomain` node validates the configuration and automatically restarts.
+* The `icinga2-satellite1.localdomain` node accepts config and populates the local zone config store with the received config files. +* The `icinga2-satellite1.localdomain` node validates the configuration and automatically restarts. -Again, there is no interaction required on the client -itself. +Again, there is no interaction required on the satellite itself. You can also use the config sync inside a high-availability zone to ensure that all config objects are synced among zone members. @@ -1287,32 +1311,35 @@ distributed monitoring environment. We've seen them all in production environments and received feedback from our [community](https://community.icinga.com/) and [partner support](https://icinga.com/support/) channels: -* [Single master with client](06-distributed-monitoring.md#distributed-monitoring-master-clients). -* [HA master with clients as command endpoint](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-clients) -* [Three level cluster](06-distributed-monitoring.md#distributed-monitoring-scenarios-master-satellite-client) with config HA masters, satellites receiving config sync, and clients checked using command endpoint. +* [Single master with agents](06-distributed-monitoring.md#distributed-monitoring-master-agents). +* [HA master with agents as command endpoint](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-agents) +* [Three level cluster](06-distributed-monitoring.md#distributed-monitoring-scenarios-master-satellite-agents) with config HA masters, satellites receiving config sync, and agents checked using command endpoint. You can also extend the cluster tree depth to four levels e.g. with 2 satellite levels. Just keep in mind that multiple levels become harder to debug in case of errors. You can also start with a single master setup, and later add a secondary master endpoint. 
This requires an extra step with the [initial sync](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-initial-sync) -for cloning the runtime state. This is described in detail [here](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-clients). +for cloning the runtime state. This is described in detail [here](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-agents). -### Master with Clients + + + +### Master with Agents In this scenario, a single master node runs the check scheduler, notifications and IDO database backend and uses the [command endpoint mode](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) -to execute checks on the remote clients. +to execute checks on the remote agents. ![Icinga 2 Distributed Master with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_master_clients.png) * `icinga2-master1.localdomain` is the primary master node. -* `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` are two child nodes as clients. +* `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` are two child nodes as agents. Setup requirements: * Set up `icinga2-master1.localdomain` as [master](06-distributed-monitoring.md#distributed-monitoring-setup-master). -* Set up `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` as [client](06-distributed-monitoring.md#distributed-monitoring-setup-satellite-client). +* Set up `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` as [agent](06-distributed-monitoring.md#distributed-monitoring-setup-agent-satellite). 
Edit the `zones.conf` configuration file on the master: @@ -1320,14 +1347,17 @@ Edit the `zones.conf` configuration file on the master: [root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { + // That's us } object Endpoint "icinga2-agent1.localdomain" { - host = "192.168.56.111" //the master actively tries to connect to the client + host = "192.168.56.111" // The master actively tries to connect to the agent + log_duration = 0 // Disable the replay log for command endpoint agents } object Endpoint "icinga2-agent2.localdomain" { - host = "192.168.56.112" //the master actively tries to connect to the client + host = "192.168.56.112" // The master actively tries to connect to the agent + log_duration = 0 // Disable the replay log for command endpoint agents } object Zone "master" { @@ -1350,24 +1380,28 @@ object Zone "icinga2-agent2.localdomain" { object Zone "global-templates" { global = true } +object Zone "director-global" { + global = true +} ``` -The two client nodes do not necessarily need to know about each other. The only important thing +The two agent nodes do not need to know about each other. The only important thing is that they know about the parent zone and their endpoint members (and optionally the global zone). If you specify the `host` attribute in the `icinga2-master1.localdomain` endpoint object, -the client will actively try to connect to the master node. Since we've specified the client -endpoint's attribute on the master node already, we don't want the clients to connect to the +the agent will actively try to connect to the master node. Since you've specified the agent +endpoint's attribute on the master node already, you don't want the agents to connect to the master. 
**Choose one [connection direction](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-connection-direction).** ``` [root@icinga2-agent1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 'host' attribute } object Endpoint "icinga2-agent1.localdomain" { + // That's us } object Zone "master" { @@ -1384,14 +1418,19 @@ object Zone "icinga2-agent1.localdomain" { object Zone "global-templates" { global = true } - +object Zone "director-global" { + global = true +} +``` +``` [root@icinga2-agent2.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 'host' attribute } object Endpoint "icinga2-agent2.localdomain" { + // That's us } object Zone "master" { @@ -1408,9 +1447,12 @@ object Zone "icinga2-agent2.localdomain" { object Zone "global-templates" { global = true } +object Zone "director-global" { + global = true +} ``` -Now it is time to define the two client hosts and apply service checks using +Now it is time to define the two agent hosts and apply service checks using the command endpoint execution method on them. Note: You can also use the config sync mode here. 
@@ -1420,7 +1462,7 @@ Create a new configuration directory on the master node: [root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/zones.d/master ``` -Add the two client nodes as host objects: +Add the two agent nodes as host objects: ``` [root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/master @@ -1429,13 +1471,15 @@ Add the two client nodes as host objects: object Host "icinga2-agent1.localdomain" { check_command = "hostalive" address = "192.168.56.111" - vars.client_endpoint = name //follows the convention that host name == endpoint name + + vars.agent_endpoint = name //follows the convention that host name == endpoint name } object Host "icinga2-agent2.localdomain" { check_command = "hostalive" address = "192.168.56.112" - vars.client_endpoint = name //follows the convention that host name == endpoint name + + vars.agent_endpoint = name //follows the convention that host name == endpoint name } ``` @@ -1446,6 +1490,7 @@ Add services using command endpoint checks: apply Service "ping4" { check_command = "ping4" + //check is executed on the master node assign where host.address } @@ -1453,10 +1498,11 @@ apply Service "ping4" { apply Service "disk" { check_command = "disk" - //specify where the check is executed - command_endpoint = host.vars.client_endpoint + // Execute the check on the remote command endpoint + command_endpoint = host.vars.agent_endpoint - assign where host.vars.client_endpoint + // Assign the service onto an agent + assign where host.vars.agent_endpoint } ``` @@ -1467,19 +1513,39 @@ Validate the configuration and restart Icinga 2 on the master node `icinga2-mast [root@icinga2-master1.localdomain /]# systemctl restart icinga2 ``` -Open Icinga Web 2 and check the two newly created client hosts with two new services +Open Icinga Web 2 and check the two newly created agent hosts with two new services -- one executed locally (`ping4`) and one using command endpoint (`disk`). 
+> **Note** +> +> You don't necessarily need to add the agent endpoint/zone configuration objects +> into the master's `zones.conf` file. Instead, you can put them into `/etc/icinga2/zones.d/master` +> either in `hosts.conf` shown above, or in a new file called `agents.conf`. -### High-Availability Master with Clients +> **Tip**: +> +> It's a good idea to add [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks) +> to make sure that your cluster notifies you in case of failure. -This scenario is similar to the one in the [previous section](06-distributed-monitoring.md#distributed-monitoring-master-clients). The only difference is that we will now set up two master nodes in a high-availability setup. +In terms of health checks, consider adding the following for this scenario: + +- Master node(s) check the connection to the agents +- Optional: Add dependencies for the agent host to prevent unwanted notifications when agents are unreachable + +Proceed in [this chapter](06-distributed-monitoring.md#distributed-monitoring-health-checks-master-agents). + + + + +### High-Availability Master with Agents + +This scenario is similar to the one in the [previous section](06-distributed-monitoring.md#distributed-monitoring-master-agents). The only difference is that we will now set up two master nodes in a high-availability setup. These nodes must be configured as zone and endpoints objects. ![Icinga 2 Distributed High Availability Master with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_ha_master_clients.png) The setup uses the capabilities of the Icinga 2 cluster. All zone members -replicate cluster events amongst each other. In addition to that, several Icinga 2 +replicate cluster events between each other. In addition to that, several Icinga 2 features can enable [HA functionality](06-distributed-monitoring.md#distributed-monitoring-high-availability-features).
Best practice is to run the database backend on a dedicated server/cluster and @@ -1495,16 +1561,16 @@ Overview: * `icinga2-master1.localdomain` is the config master master node. * `icinga2-master2.localdomain` is the secondary master master node without config in `zones.d`. -* `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` are two child nodes as clients. +* `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` are two child nodes as agents. Setup requirements: * Set up `icinga2-master1.localdomain` as [master](06-distributed-monitoring.md#distributed-monitoring-setup-master). -* Set up `icinga2-master2.localdomain` as [client](06-distributed-monitoring.md#distributed-monitoring-setup-satellite-client) (we will modify the generated configuration). -* Set up `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` as [clients](06-distributed-monitoring.md#distributed-monitoring-setup-satellite-client) (when asked for adding multiple masters, set to `y` and add the secondary master `icinga2-master2.localdomain`). +* Set up `icinga2-master2.localdomain` as [satellite](06-distributed-monitoring.md#distributed-monitoring-setup-agent-satellite) (**we will modify the generated configuration**). +* Set up `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` as [agents](06-distributed-monitoring.md#distributed-monitoring-setup-agent-satellite) (when asked for adding multiple masters, set to `y` and add the secondary master `icinga2-master2.localdomain`). In case you don't want to use the CLI commands, you can also manually create and sync the -required SSL certificates. We will modify and discuss all the details of the automatically generated configuration here. +required TLS certificates. We will modify and discuss all the details of the automatically generated configuration here. 
Since there are now two nodes in the same zone, we must consider the [high-availability features](06-distributed-monitoring.md#distributed-monitoring-high-availability-features). @@ -1520,74 +1586,89 @@ backend, IDO database, used transports, etc.). > **Note** > -> You can also start with a single master shown [here](06-distributed-monitoring.md#distributed-monitoring-master-clients) and later add +> You can also start with a single master shown [here](06-distributed-monitoring.md#distributed-monitoring-master-agents) and later add > the second master. This requires an extra step with the [initial sync](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-initial-sync) > for cloning the runtime state after done. Once done, proceed here. -The zone hierarchy could look like this. It involves putting the two master nodes -`icinga2-master1.localdomain` and `icinga2-master2.localdomain` into the `master` zone. +In this scenario, we are not adding the agent configuration immediately +to the `zones.conf` file but will establish the hierarchy later. 
+ +The first master looks like this: ``` [root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { - host = "192.168.56.101" + // That's us } object Endpoint "icinga2-master2.localdomain" { - host = "192.168.56.102" -} - -object Endpoint "icinga2-agent1.localdomain" { - host = "192.168.56.111" //the master actively tries to connect to the client -} - -object Endpoint "icinga2-agent2.localdomain" { - host = "192.168.56.112" //the master actively tries to connect to the client + host = "192.168.56.102" // Actively connect to the secondary master } object Zone "master" { endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ] } -object Zone "icinga2-agent1.localdomain" { - endpoints = [ "icinga2-agent1.localdomain" ] +/* sync global commands */ +object Zone "global-templates" { + global = true +} +object Zone "director-global" { + global = true +} +``` - parent = "master" +The secondary master waits for connection attempts from the first master, +and therefore does not try to connect to it again. + +``` +[root@icinga2-master2.localdomain /]# vim /etc/icinga2/zones.conf + +object Endpoint "icinga2-master1.localdomain" { + // The first master already connects to us } -object Zone "icinga2-agent2.localdomain" { - endpoints = [ "icinga2-agent2.localdomain" ] +object Endpoint "icinga2-master2.localdomain" { + // That's us +} - parent = "master" +object Zone "master" { endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ] } /* sync global commands */ object Zone "global-templates" { global = true } +object Zone "director-global" { + global = true +} ``` -The two client nodes do not necessarily need to know about each other. The only important thing +Restart both masters and ensure the initial connection and TLS handshake work. + +The two agent nodes do not need to know about each other.
The only important thing is that they know about the parent zone and their endpoint members (and optionally about the global zone). If you specify the `host` attribute in the `icinga2-master1.localdomain` and `icinga2-master2.localdomain` -endpoint objects, the client will actively try to connect to the master node. Since we've specified the client -endpoint's attribute on the master node already, we don't want the clients to connect to the +endpoint objects, the agent will actively try to connect to the master node. Since we've specified the agent +endpoint's attribute on the master node already, we don't want the agent to connect to the master nodes. **Choose one [connection direction](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-connection-direction).** ``` [root@icinga2-agent1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 'host' attribute } object Endpoint "icinga2-master2.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 'host' attribute } object Endpoint "icinga2-agent1.localdomain" { + // That's us } object Zone "master" { @@ -1604,18 +1685,25 @@ object Zone "icinga2-agent1.localdomain" { object Zone "global-templates" { global = true } +object Zone "director-global" { + global = true +} +``` + +``` [root@icinga2-agent2.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 'host' attribute } object Endpoint "icinga2-master2.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 
'host' attribute } object Endpoint "icinga2-agent2.localdomain" { + //That's us } object Zone "master" { @@ -1632,11 +1720,13 @@ object Zone "icinga2-agent2.localdomain" { object Zone "global-templates" { global = true } +object Zone "director-global" { + global = true +} ``` -Now it is time to define the two client hosts and apply service checks using -the command endpoint execution method. Note: You can also use the -config sync mode here. +Now it is time to define the two agent hosts and apply service checks using +the command endpoint execution method. Create a new configuration directory on the master node `icinga2-master1.localdomain`. **Note**: The secondary master node `icinga2-master2.localdomain` receives the @@ -1646,22 +1736,74 @@ configuration using the [config sync mode](06-distributed-monitoring.md#distribu [root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/zones.d/master ``` -Add the two client nodes as host objects: +Add the two agent nodes with their zone/endpoint and host object configuration. + +> **Note** +> +> In order to keep things in sync between the two HA masters, +> keep the `zones.conf` file as small as possible. +> +> You can create the agent zone and endpoint objects inside the +> master zone and have them synced to the secondary master. +> The cluster config sync enforces a reload allowing the secondary +> master to connect to the agents as well. + +Edit the `zones.conf` file and ensure that the agent zone/endpoint objects +are **not** specified in there. + +Then navigate into `/etc/icinga2/zones.d/master` and create a new file `agents.conf`. 
``` [root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/master +[root@icinga2-master1.localdomain /etc/icinga2/zones.d/master]# vim agents.conf + +//----------------------------------------------- +// Endpoints + +object Endpoint "icinga2-agent1.localdomain" { + host = "192.168.56.111" // The master actively tries to connect to the agent + log_duration = 0 // Disable the replay log for command endpoint agents +} + +object Endpoint "icinga2-agent2.localdomain" { + host = "192.168.56.112" // The master actively tries to connect to the agent + log_duration = 0 // Disable the replay log for command endpoint agents +} + +//----------------------------------------------- +// Zones + +object Zone "icinga2-agent1.localdomain" { + endpoints = [ "icinga2-agent1.localdomain" ] + + parent = "master" +} + +object Zone "icinga2-agent2.localdomain" { + endpoints = [ "icinga2-agent2.localdomain" ] + + parent = "master" +} +``` + +Whenever you need to add an agent again, edit the mentioned files. + +Next, create the corresponding host objects for the agents. Use the same names +for host and endpoint objects. 
+ +``` [root@icinga2-master1.localdomain /etc/icinga2/zones.d/master]# vim hosts.conf object Host "icinga2-agent1.localdomain" { check_command = "hostalive" address = "192.168.56.111" - vars.client_endpoint = name //follows the convention that host name == endpoint name + vars.agent_endpoint = name //follows the convention that host name == endpoint name } object Host "icinga2-agent2.localdomain" { check_command = "hostalive" address = "192.168.56.112" - vars.client_endpoint = name //follows the convention that host name == endpoint name + vars.agent_endpoint = name //follows the convention that host name == endpoint name } ``` @@ -1672,17 +1814,18 @@ Add services using command endpoint checks: apply Service "ping4" { check_command = "ping4" - //check is executed on the master node + + // Check is executed on the master node assign where host.address } apply Service "disk" { check_command = "disk" - //specify where the check is executed - command_endpoint = host.vars.client_endpoint + // Check is executed on the remote command endpoint + command_endpoint = host.vars.agent_endpoint - assign where host.vars.client_endpoint + assign where host.vars.agent_endpoint } ``` @@ -1693,17 +1836,28 @@ Validate the configuration and restart Icinga 2 on the master node `icinga2-mast [root@icinga2-master1.localdomain /]# systemctl restart icinga2 ``` -Open Icinga Web 2 and check the two newly created client hosts with two new services +Open Icinga Web 2 and check the two newly created agent hosts with two new services -- one executed locally (`ping4`) and one using command endpoint (`disk`). -**Tip**: It's a good idea to add [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks) +> **Tip**: +> +> It's a good idea to add [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks) to make sure that your cluster notifies you in case of failure. 
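
The health checks this tip refers to could look like the following minimal sketch for the agents in this scenario. It assumes the `cluster-zone` check command from the Icinga Template Library and the convention that each agent's zone name equals its host object name (FQDN). The object names `agent-health` and `agent-health-check` are illustrative and not taken from the text above.

```
// Sketch only: runs on the master (no command_endpoint) and checks
// whether the agent's zone is connected.
apply Service "agent-health" {
  check_command = "cluster-zone"

  // Assumption: the agent zone name is the same as the host object name
  vars.cluster_zone = host.name

  // Only assign where a host is marked as agent endpoint
  assign where host.vars.agent_endpoint
}

// Optional: suppress notifications for agent services while the agent is unreachable
apply Dependency "agent-health-check" to Service {
  parent_service_name = "agent-health"

  states = [ OK ] // Child services count as reachable only while the parent is OK
  disable_notifications = true

  assign where host.vars.agent_endpoint
  ignore where service.name == "agent-health" // Do not make the health check depend on itself
}
```

Placing both objects in `/etc/icinga2/zones.d/master` keeps them in sync between the two masters, in the same way as the host and service objects above.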
+In terms of health checks, consider adding the following for this scenario: -### Three Levels with Master, Satellites, and Clients +- Master node(s) check the connection to the agents +- Optional: Add dependencies for the agent host to prevent unwanted notifications when agents are unreachable + +Proceed in [this chapter](06-distributed-monitoring.md#distributed-monitoring-health-checks-master-agents). + + + + +### Three Levels with Masters, Satellites and Agents This scenario combines everything you've learned so far: High-availability masters, -satellites receiving their configuration from the master zone, and clients checked via command +satellites receiving their configuration from the master zone, and agents checked via command endpoint from the satellite zones. ![Icinga 2 Distributed Master and Satellites with Clients](images/distributed-monitoring/icinga2_distributed_scenarios_master_satellite_client.png) @@ -1724,17 +1878,17 @@ Overview: * `icinga2-master1.localdomain` is the configuration master master node. * `icinga2-master2.localdomain` is the secondary master master node without configuration in `zones.d`. * `icinga2-satellite1.localdomain` and `icinga2-satellite2.localdomain` are satellite nodes in a `master` child zone. They forward CSR signing requests to the master zone. -* `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` are two child nodes as clients. +* `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` are two child nodes as agents. Setup requirements: * Set up `icinga2-master1.localdomain` as [master](06-distributed-monitoring.md#distributed-monitoring-setup-master). -* Set up `icinga2-master2.localdomain`, `icinga2-satellite1.localdomain` and `icinga2-satellite2.localdomain` as [clients](06-distributed-monitoring.md#distributed-monitoring-setup-satellite-client) (we will modify the generated configuration). 
-* Set up `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` as [clients](06-distributed-monitoring.md#distributed-monitoring-setup-satellite-client). +* Set up `icinga2-master2.localdomain`, `icinga2-satellite1.localdomain` and `icinga2-satellite2.localdomain` as [agents](06-distributed-monitoring.md#distributed-monitoring-setup-agent-satellite) (we will modify the generated configuration). +* Set up `icinga2-agent1.localdomain` and `icinga2-agent2.localdomain` as [agents](06-distributed-monitoring.md#distributed-monitoring-setup-agent-satellite). When being asked for the parent endpoint providing CSR auto-signing capabilities, please add one of the satellite nodes. **Note**: This requires Icinga 2 v2.8+ -and the `CA Proxy` on all master, satellite and client nodes. +and the `CA Proxy` on all master, satellite and agent nodes. Example for `icinga2-agent1.localdomain`: @@ -1811,7 +1965,7 @@ in the generated zone configuration file. Local zone name [icinga2-agent1.localdomain]: icinga2-agent1.localdomain ``` -Set the parent zone name to `satellite` for this client. +Set the parent zone name to `satellite` for this agent. ``` Parent zone name [master]: satellite @@ -1828,8 +1982,8 @@ Do you want to specify additional global zones? [y/N]: N ``` Last but not least the wizard asks you whether you want to disable the inclusion of the local configuration -directory in `conf.d`, or not. Defaults to disabled, as clients either are checked via command endpoint, or -they receive configuration synced from the parent zone. +directory in `conf.d`, or not. Defaults to disabled, since agents are checked via command endpoint and the example +configuration would collide with this mode. 
``` Do you want to disable the inclusion of the conf.d directory [Y/n]: Y @@ -1850,25 +2004,54 @@ must include the `host` attribute for the satellite endpoints: [root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { - //that's us + // That's us } object Endpoint "icinga2-master2.localdomain" { - host = "192.168.56.102" + host = "192.168.56.102" // Actively connect to the second master. } object Endpoint "icinga2-satellite1.localdomain" { - host = "192.168.56.105" + host = "192.168.56.105" // Actively connect to the satellites. } object Endpoint "icinga2-satellite2.localdomain" { - host = "192.168.56.106" + host = "192.168.56.106" // Actively connect to the satellites. } object Zone "master" { endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ] } +``` +The endpoint configuration on the secondary master looks similar, +but changes the connection attributes - the first master already +tries to connect, there is no need for a secondary attempt. + +``` +[root@icinga2-master2.localdomain /]# vim /etc/icinga2/zones.conf + +object Endpoint "icinga2-master1.localdomain" { + // First master already connects to us +} + +object Endpoint "icinga2-master2.localdomain" { + // That's us +} + +object Endpoint "icinga2-satellite1.localdomain" { + host = "192.168.56.105" // Actively connect to the satellites. +} + +object Endpoint "icinga2-satellite2.localdomain" { + host = "192.168.56.106" // Actively connect to the satellites. +} +``` + +The zone configuration on both masters looks the same. Add this +to the corresponding `zones.conf` entries for the endpoints. + +``` object Zone "satellite" { endpoints = [ "icinga2-satellite1.localdomain", "icinga2-satellite2.localdomain" ] @@ -1894,21 +2077,50 @@ instances. 
[root@icinga2-satellite1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-master1.localdomain" { - //this endpoint will connect to us + // This endpoint will connect to us } object Endpoint "icinga2-master2.localdomain" { - //this endpoint will connect to us + // This endpoint will connect to us } object Endpoint "icinga2-satellite1.localdomain" { - //that's us + // That's us } object Endpoint "icinga2-satellite2.localdomain" { - host = "192.168.56.106" + host = "192.168.56.106" // Actively connect to the secondary satellite +} +``` + +Again, only one side is required to establish the connection inside the HA zone. +Since satellite1 already connects to satellite2, leave out the `host` attribute +for `icinga2-satellite1.localdomain` on satellite2. + +``` +[root@icinga2-satellite2.localdomain /]# vim /etc/icinga2/zones.conf + +object Endpoint "icinga2-master1.localdomain" { + // This endpoint will connect to us } +object Endpoint "icinga2-master2.localdomain" { + // This endpoint will connect to us +} + +object Endpoint "icinga2-satellite1.localdomain" { + // First satellite already connects to us +} + +object Endpoint "icinga2-satellite2.localdomain" { + // That's us +} +``` + +The zone configuration on both satellites looks the same. Add this +to the corresponding `zones.conf` entries for the endpoints. + +``` object Zone "master" { endpoints = [ "icinga2-master1.localdomain", "icinga2-master2.localdomain" ] } @@ -1928,15 +2140,19 @@ object Zone "director-global" { global = true } ``` + Keep in mind to control the endpoint [connection direction](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-connection-direction) using the `host` attribute, also for other endpoints in the same zone. -Adopt the configuration for `icinga2-master2.localdomain` and `icinga2-satellite2.localdomain`. 
- Since we want to use [top down command endpoint](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) checks, -we must configure the client endpoint and zone objects. -In order to minimize the effort, we'll sync the client zone and endpoint configuration to the -satellites where the connection information is needed as well. +we must configure the agent endpoint and zone objects. + +In order to minimize the effort, we'll sync the agent zone and endpoint configuration to the +satellites where the connection information is needed as well. Note: This only works with satellites +and agents, since there already is a trust relationship between the master and the satellite zone. +The cluster config sync to the satellite triggers an automated reload, which starts the connection attempts to the agents. + +`icinga2-master1.localdomain` is the configuration master where everything is stored: ``` [root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/zones.d/{master,satellite,global-templates} @@ -1945,7 +2161,8 @@ satellites where the connection information is needed as well.
[root@icinga2-master1.localdomain /etc/icinga2/zones.d/satellite]# vim icinga2-agent1.localdomain.conf object Endpoint "icinga2-agent1.localdomain" { - host = "192.168.56.111" //the satellite actively tries to connect to the client + host = "192.168.56.111" // The satellite actively tries to connect to the agent + log_duration = 0 // Disable the replay log for command endpoint agents } object Zone "icinga2-agent1.localdomain" { @@ -1957,7 +2174,8 @@ object Zone "icinga2-agent1.localdomain" { [root@icinga2-master1.localdomain /etc/icinga2/zones.d/satellite]# vim icinga2-agent2.localdomain.conf object Endpoint "icinga2-agent2.localdomain" { - host = "192.168.56.112" //the satellite actively tries to connect to the client + host = "192.168.56.112" // The satellite actively tries to connect to the agent + log_duration = 0 // Disable the replay log for command endpoint agents } object Zone "icinga2-agent2.localdomain" { @@ -1967,13 +2185,20 @@ object Zone "icinga2-agent2.localdomain" { } ``` -The two client nodes do not necessarily need to know about each other, either. The only important thing +The two agent nodes do not need to know about each other. The only important thing is that they know about the parent zone (the satellite) and their endpoint members (and optionally the global zone). -If you specify the `host` attribute in the `icinga2-satellite1.localdomain` and `icinga2-satellite2.localdomain` -endpoint objects, the client node will actively try to connect to the satellite node. Since we've specified the client -endpoint's attribute on the satellite node already, we don't want the client node to connect to the -satellite nodes. **Choose one [connection direction](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-connection-direction).** +> **Tip** +> +> In the example above we've specified the `host` attribute in the agent endpoint configuration. In this mode, +> the satellites actively connect to the agents.
This costs some resources on the satellite -- if you prefer to +> offload the connection attempts to the agent, or your DMZ requires this, you can also change the **[connection direction](06-distributed-monitoring.md#distributed-monitoring-advanced-hints-connection-direction).** +> +> 1) Don't set the `host` attribute for the agent endpoints put into `zones.d/satellite`. +> 2) Modify each agent's `zones.conf` file and add the `host` attribute to all parent satellites. You can automate this using the `node wizard/setup` CLI commands. + +The agents are waiting for the satellites to connect, therefore they don't specify +the `host` attribute in the endpoint objects locally. Example for `icinga2-agent1.localdomain`: @@ -1981,15 +2206,15 @@ Example for `icinga2-agent1.localdomain`: [root@icinga2-agent1.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-satellite1.localdomain" { - //do not actively connect to the satellite by leaving out the 'host' attribute + // Do not actively connect to the satellite by leaving out the 'host' attribute } object Endpoint "icinga2-satellite2.localdomain" { - //do not actively connect to the satellite by leaving out the 'host' attribute + // Do not actively connect to the satellite by leaving out the 'host' attribute } object Endpoint "icinga2-agent1.localdomain" { - //that's us + // That's us } object Zone "satellite" { @@ -2018,15 +2243,15 @@ Example for `icinga2-agent2.localdomain`: [root@icinga2-agent2.localdomain /]# vim /etc/icinga2/zones.conf object Endpoint "icinga2-satellite1.localdomain" { - //do not actively connect to the satellite by leaving out the 'host' attribute + // Do not actively connect to the satellite by leaving out the 'host' attribute } object Endpoint "icinga2-satellite2.localdomain" { - //do not actively connect to the satellite by leaving out the 'host' attribute + // Do not actively connect to the satellite by leaving out the 'host' attribute } object Endpoint "icinga2-agent2.localdomain"
{ - //that's us + // That's us } object Zone "satellite" { @@ -2049,18 +2274,18 @@ object Zone "director-global" { } ``` -Now it is time to define the two client hosts on the master, sync them to the satellites +Now it is time to define the two agent hosts on the master, sync them to the satellites and apply service checks using the command endpoint execution method to them. -Add the two client nodes as host objects to the `satellite` zone. +Add the two agent nodes as host objects to the `satellite` zone. We've already created the directories in `/etc/icinga2/zones.d` including the files for the -zone and endpoint configuration for the clients. +zone and endpoint configuration for the agents. ``` [root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/satellite ``` -Add the host object configuration for the `icinga2-agent1.localdomain` client. You should +Add the host object configuration for the `icinga2-agent1.localdomain` agent. You should have created the configuration file in the previous steps and it should contain the endpoint and zone object configuration already. @@ -2070,11 +2295,12 @@ and zone object configuration already.
object Host "icinga2-agent1.localdomain" { check_command = "hostalive" address = "192.168.56.111" - vars.client_endpoint = name //follows the convention that host name == endpoint name + + vars.agent_endpoint = name // Follows the convention that host name == endpoint name } ``` -Add the host object configuration for the `icinga2-agent2.localdomain` client configuration file: +Add the host object configuration to the `icinga2-agent2.localdomain` agent configuration file: ``` [root@icinga2-master1.localdomain /etc/icinga2/zones.d/satellite]# vim icinga2-agent2.localdomain.conf @@ -2082,7 +2308,8 @@ Add the host object configuration for the `icinga2-agent2.localdomain` client co object Host "icinga2-agent2.localdomain" { check_command = "hostalive" address = "192.168.56.112" - vars.client_endpoint = name //follows the convention that host name == endpoint name + + vars.agent_endpoint = name // Follows the convention that host name == endpoint name } ``` @@ -2093,7 +2320,8 @@ Add a service object which is executed on the satellite nodes (e.g. `ping4`). Pi apply Service "ping4" { check_command = "ping4" - //check is executed on the satellite node + + // Check is executed on the satellite node assign where host.zone == "satellite" && host.address } ``` @@ -2106,10 +2334,10 @@ Add services using command endpoint checks.
Pin the apply rules to the `satellit apply Service "disk" { check_command = "disk" - //specify where the check is executed - command_endpoint = host.vars.client_endpoint + // Execute the check on the remote command endpoint + command_endpoint = host.vars.agent_endpoint - assign where host.zone == "satellite" && host.vars.client_endpoint + assign where host.zone == "satellite" && host.vars.agent_endpoint } ``` @@ -2120,7 +2348,7 @@ Validate the configuration and restart Icinga 2 on the master node `icinga2-mast [root@icinga2-master1.localdomain /]# systemctl restart icinga2 ``` -Open Icinga Web 2 and check the two newly created client hosts with two new services +Open Icinga Web 2 and check the two newly created agent hosts with two new services -- one executed locally (`ping4`) and one using command endpoint (`disk`). > **Tip**: @@ -2128,6 +2356,15 @@ Open Icinga Web 2 and check the two newly created client hosts with two new serv > It's a good idea to add [health checks](06-distributed-monitoring.md#distributed-monitoring-health-checks) to make sure that your cluster notifies you in case of failure. +In terms of health checks, consider adding the following for this scenario: + +- Master nodes check whether the satellite zone is connected +- Satellite nodes check the connection to the agents +- Optional: Add dependencies for the agent host to prevent unwanted notifications when agents are unreachable + +Proceed in [this chapter](06-distributed-monitoring.md#distributed-monitoring-health-checks-master-satellite-agent). + + ## Best Practice We've put together a collection of configuration examples from community feedback. @@ -2145,13 +2382,14 @@ to all nodes depending on them. Common examples are: * Group objects. * TimePeriod objects. -Plugin scripts and binaries cannot be synced, this is for Icinga 2 +Plugin scripts and binaries must not be synced, this is for Icinga 2 configuration files only. 
Use your preferred package repository and/or configuration management tool (Puppet, Ansible, Chef, etc.) -for that. +for keeping packages and scripts up to date. **Note**: Checkable objects (hosts and services) cannot be put into a global -zone. The configuration validation will terminate with an error. +zone. The configuration validation will terminate with an error. Apply rules +work, as they are evaluated locally on each endpoint. The zone object configuration must be deployed on all nodes which should receive the global configuration files: @@ -2187,7 +2425,7 @@ object CheckCommand "webinject" { } ``` -Restart the client(s) which should receive the global zone before +Restart the endpoint(s) which should receive the global zone before restarting the parent master/satellite nodes. Then validate the configuration on the master node and restart Icinga 2. @@ -2211,8 +2449,138 @@ checks. In order to minimize the problems caused by this, you should configure additional health checks. -The `cluster` check, for example, will check if all endpoints in the current zone and the directly -connected zones are working properly: +#### cluster-zone with Masters and Agents + +The `cluster-zone` check will test whether the configured target zone is currently +connected or not. This example adds a health check for the [HA master with agents scenario](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-agents). + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/services.conf + +apply Service "agent-health" { + check_command = "cluster-zone" + + display_name = "cluster-health-" + host.name + + /* This follows the convention that the agent zone name is the FQDN which is the same as the host object name. */ + vars.cluster_zone = host.name + + assign where host.vars.agent_endpoint +} +``` + +In order to prevent unwanted notifications, add a service dependency which gets applied to +all services using the command endpoint mode.
+ +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/dependencies.conf + +apply Dependency "agent-health-check" to Service { + parent_service_name = "agent-health" + + states = [ OK ] // Fail if the parent service state switches to NOT-OK + disable_notifications = true + + assign where host.vars.agent_endpoint // Automatically assigns all agent endpoint checks as child services on the matched host + ignore where service.name == "agent-health" // Avoid a self reference from child to parent +} +``` + +#### cluster-zone with Masters, Satellites and Agents + +This example adds health checks for the [master, satellites and agents scenario](06-distributed-monitoring.md#distributed-monitoring-scenarios-master-satellite-agents). + +Whenever the connection between the master and satellite zone breaks, +you may encounter late check results in Icinga Web. In order to view +this failure and also send notifications, add the following configuration: + +First, add the two masters as host objects to the master zone, if not already +existing. + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/hosts.conf + +object Host "icinga2-master1.localdomain" { + check_command = "hostalive" + + address = "192.168.56.101" +} + +object Host "icinga2-master2.localdomain" { + check_command = "hostalive" + + address = "192.168.56.102" +} +``` + +Add service health checks against the satellite zone. + +``` +[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/health.conf + +apply Service "satellite-zone-health" { + check_command = "cluster-zone" + check_interval = 30s + retry_interval = 10s + + vars.cluster_zone = "satellite" + + assign where match("icinga2-master*.localdomain", host.name) +} +``` + +**Don't forget to create notification apply rules for these services.** + +Next are health checks for agents connected to the satellite zone. 
+Navigate into the satellite directory in `zones.d`: + +``` +[root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/satellite +``` + +You should already have configured agent host objects following [the master, satellite, agents scenario](06-distributed-monitoring.md#distributed-monitoring-scenarios-master-satellite-agents). +Add a new configuration file where all the health checks are defined. + +``` +[root@icinga2-master1.localdomain /etc/icinga2/zones.d/satellite]# vim health.conf + +apply Service "agent-health" { + check_command = "cluster-zone" + + display_name = "agent-health-" + host.name + + // This follows the convention that the agent zone name is the FQDN which is the same as the host object name. + vars.cluster_zone = host.name + + // Create this health check for agent hosts in the satellite zone + assign where host.zone == "satellite" && host.vars.agent_endpoint +} +``` + +In order to prevent unwanted notifications, add a service dependency which gets applied to +all services using the command endpoint mode. + +``` +[root@icinga2-master1.localdomain /etc/icinga2/zones.d/satellite]# vim health.conf + +apply Dependency "agent-health-check" to Service { + parent_service_name = "agent-health" + + states = [ OK ] // Fail if the parent service state switches to NOT-OK + disable_notifications = true + + assign where host.zone == "satellite" && host.vars.agent_endpoint // Automatically assigns all agent endpoint checks as child services on the matched host + ignore where service.name == "agent-health" // Avoid a self reference from child to parent +} +``` + +This is all done on the configuration master, and requires the scenario to be fully up and running. + +#### Cluster Check + +The `cluster` check will check if all endpoints in the current zone and the directly +connected zones are working properly. The disadvantage of using this check is that +you cannot monitor 3 or more cluster levels with it. 
``` [root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/zones.d/master @@ -2234,57 +2602,6 @@ object Service "cluster" { } ``` -The `cluster-zone` check will test whether the configured target zone is currently -connected or not. This example adds a health check for the [ha master with clients scenario](06-distributed-monitoring.md#distributed-monitoring-scenarios-ha-master-clients). - -``` -[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/services.conf - -apply Service "cluster-health" { - check_command = "cluster-zone" - - display_name = "cluster-health-" + host.name - - /* This follows the convention that the client zone name is the FQDN which is the same as the host object name. */ - vars.cluster_zone = host.name - - assign where host.vars.client_endpoint -} -``` - -In case you cannot assign the `cluster_zone` attribute, add specific -checks to your cluster: - -``` -[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/cluster.conf - -object Service "cluster-zone-satellite" { - check_command = "cluster-zone" - check_interval = 5s - retry_interval = 1s - vars.cluster_zone = "satellite" - - host_name = "icinga2-master1.localdomain" -} -``` - -If you are using top down checks with command endpoint configuration, you can -add a dependency which prevents notifications for all other failing services: - -``` -[root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.d/master/dependencies.conf - -apply Dependency "health-check" to Service { - parent_service_name = "cluster-health" - - states = [ OK ] - disable_notifications = true - - assign where host.vars.client_endpoint - ignore where service.name == "cluster-health" -} -``` - ### Pin Checks in a Zone In case you want to pin specific checks to their endpoints in a given zone you'll need to use @@ -2329,7 +2646,7 @@ C:\WINDOWS\system32>netsh advfirewall firewall add rule name="ICMP Allow incomin #### Icinga 2 -If your master/satellite nodes should actively connect to 
the Windows client +If your master/satellite nodes should actively connect to the Windows agent you'll also need to ensure that port `5665` is enabled. ``` @@ -2346,7 +2663,7 @@ C:\WINDOWS\system32>netsh advfirewall firewall add rule name="Open port 8443 (NS ``` For security reasons, it is advised to enable the NSClient++ HTTP API for local -connection from the Icinga 2 client only. Remote connections to the HTTP API +connection from the Icinga agent only. Remote connections to the HTTP API are not recommended with using the legacy HTTP API. ### Windows Client and Plugins @@ -2354,7 +2671,7 @@ are not recommended with using the legacy HTTP API. The Icinga 2 package on Windows already provides several plugins. Detailed [documentation](10-icinga-template-library.md#windows-plugins) is available for all check command definitions. -Add the following `include` statement on all your nodes (master, satellite, client): +Add the following `include` statement on all your nodes (master, satellite, agent): ``` vim /etc/icinga2/icinga2.conf @@ -2362,10 +2679,10 @@ vim /etc/icinga2/icinga2.conf include ``` -Based on the [master with clients](06-distributed-monitoring.md#distributed-monitoring-master-clients) +Based on the [master with agents](06-distributed-monitoring.md#distributed-monitoring-master-agents) scenario we'll now add a local disk check. 
-First, add the client node as host object: +First, add the agent node as host object: ``` [root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/master @@ -2374,7 +2691,7 @@ First, add the client node as host object: object Host "icinga2-agent2.localdomain" { check_command = "hostalive" address = "192.168.56.112" - vars.client_endpoint = name //follows the convention that host name == endpoint name + vars.agent_endpoint = name //follows the convention that host name == endpoint name vars.os_type = "windows" } ``` @@ -2391,9 +2708,9 @@ apply Service "disk C:" { vars.disk_win_path = "C:" //specify where the check is executed - command_endpoint = host.vars.client_endpoint + command_endpoint = host.vars.agent_endpoint - assign where host.vars.os_type == "windows" && host.vars.client_endpoint + assign where host.vars.os_type == "windows" && host.vars.agent_endpoint } ``` @@ -2415,7 +2732,7 @@ for the requirements. There are two methods available for querying NSClient++: -* Query the [HTTP API](06-distributed-monitoring.md#distributed-monitoring-windows-nscp-check-api) locally from an Icinga 2 client (requires a running NSClient++ service) +* Query the [HTTP API](06-distributed-monitoring.md#distributed-monitoring-windows-nscp-check-api) locally from an Icinga agent (requires a running NSClient++ service) * Run a [local CLI check](06-distributed-monitoring.md#distributed-monitoring-windows-nscp-check-local) (does not require NSClient++ as a service) Both methods have their advantages and disadvantages. One thing to @@ -2424,14 +2741,14 @@ CPU utilization, please use the HTTP API instead of the CLI sample call. #### NSCLient++ with check_nscp_api -The [Windows setup](06-distributed-monitoring.md#distributed-monitoring-setup-client-windows) already allows +The [Windows setup](06-distributed-monitoring.md#distributed-monitoring-setup-agent-windows) already allows you to install the NSClient++ package. 
In addition to the Windows plugins you can use the [nscp_api command](10-icinga-template-library.md#nscp-check-api) provided by the Icinga Template Library (ITL). The initial setup for the NSClient++ API and the required arguments is the described in the ITL chapter for the [nscp_api](10-icinga-template-library.md#nscp-check-api) CheckCommand. -Based on the [master with clients](06-distributed-monitoring.md#distributed-monitoring-master-clients) +Based on the [master with agents](06-distributed-monitoring.md#distributed-monitoring-master-agents) scenario we'll now add a local nscp check which queries the NSClient++ API to check the free disk space. Define a host object called `icinga2-agent2.localdomain` on the master. Add the `nscp_api_password` @@ -2442,12 +2759,13 @@ custom variable and specify the drives to check. [root@icinga2-master1.localdomain /etc/icinga2/zones.d/master]# vim hosts.conf object Host "icinga2-agent1.localdomain" { -check_command = "hostalive" -address = "192.168.56.111" -vars.client_endpoint = name //follows the convention that host name == endpoint name -vars.os_type = "Windows" -vars.nscp_api_password = "icinga" -vars.drives = [ "C:", "D:" ] + check_command = "hostalive" + address = "192.168.56.111" + + vars.agent_endpoint = name //follows the convention that host name == endpoint name + vars.os_type = "Windows" + vars.nscp_api_password = "icinga" + vars.drives = [ "C:", "D:" ] } ``` @@ -2461,7 +2779,7 @@ apply Service "nscp-api-" for (drive in host.vars.drives) { import "generic-service" check_command = "nscp_api" - command_endpoint = host.vars.client_endpoint + command_endpoint = host.vars.agent_endpoint //display_name = "nscp-drive-" + drive @@ -2490,7 +2808,7 @@ the command on the master. This also requires a different value for `nscp_api_ho which defaults to `host.address`. 
``` - //command_endpoint = host.vars.client_endpoint + //command_endpoint = host.vars.agent_endpoint //vars.nscp_api_host = "localhost" ``` @@ -2505,12 +2823,13 @@ If you want to monitor specific Windows services, you could use the following ex [root@icinga2-master1.localdomain /etc/icinga2/zones.d/master]# vim hosts.conf object Host "icinga2-agent1.localdomain" { -check_command = "hostalive" -address = "192.168.56.111" -vars.client_endpoint = name //follows the convention that host name == endpoint name -vars.os_type = "Windows" -vars.nscp_api_password = "icinga" -vars.services = [ "Windows Update", "wscsvc" ] + check_command = "hostalive" + address = "192.168.56.111" + + vars.agent_endpoint = name //follows the convention that host name == endpoint name + vars.os_type = "Windows" + vars.nscp_api_password = "icinga" + vars.services = [ "Windows Update", "wscsvc" ] } [root@icinga2-master1.localdomain /etc/icinga2/zones.d/master]# vim services.conf @@ -2519,7 +2838,7 @@ apply Service "nscp-api-" for (svc in host.vars.services) { import "generic-service" check_command = "nscp_api" - command_endpoint = host.vars.client_endpoint + command_endpoint = host.vars.agent_endpoint //display_name = "nscp-service-" + svc @@ -2534,14 +2853,14 @@ apply Service "nscp-api-" for (svc in host.vars.services) { #### NSCLient++ with nscp-local -The [Windows setup](06-distributed-monitoring.md#distributed-monitoring-setup-client-windows) already allows -you to install the NSClient++ package. In addition to the Windows plugins you can +The [Windows setup](06-distributed-monitoring.md#distributed-monitoring-setup-agent-windows) allows +you to install the bundled NSClient++ package. In addition to the Windows plugins you can use the [nscp-local commands](10-icinga-template-library.md#nscp-plugin-check-commands) provided by the Icinga Template Library (ITL). 
![Icinga 2 Distributed Monitoring Windows Client with NSClient++](images/distributed-monitoring/icinga2_distributed_windows_nscp.png) -Add the following `include` statement on all your nodes (master, satellite, client): +Add the following `include` statement on all your nodes (master, satellite, agent): ``` vim /etc/icinga2/icinga2.conf @@ -2552,10 +2871,10 @@ include The CheckCommand definitions will automatically determine the installed path to the `nscp.exe` binary. -Based on the [master with clients](06-distributed-monitoring.md#distributed-monitoring-master-clients) +Based on the [master with agents](06-distributed-monitoring.md#distributed-monitoring-master-agents) scenario we'll now add a local nscp check querying a given performance counter. -First, add the client node as host object: +First, add the agent node as host object: ``` [root@icinga2-master1.localdomain /]# cd /etc/icinga2/zones.d/master @@ -2564,7 +2883,8 @@ First, add the client node as host object: object Host "icinga2-agent1.localdomain" { check_command = "hostalive" address = "192.168.56.111" - vars.client_endpoint = name //follows the convention that host name == endpoint name + + vars.agent_endpoint = name //follows the convention that host name == endpoint name vars.os_type = "windows" } ``` @@ -2577,7 +2897,7 @@ Next, add a performance counter check using command endpoint checks (details in apply Service "nscp-local-counter-cpu" { check_command = "nscp-local-counter" - command_endpoint = host.vars.client_endpoint + command_endpoint = host.vars.agent_endpoint vars.nscp_counter_name = "\\Processor(_total)\\% Processor Time" vars.nscp_counter_perfsyntax = "Total Processor Time" @@ -2586,7 +2906,7 @@ apply Service "nscp-local-counter-cpu" { vars.nscp_counter_showall = true - assign where host.vars.os_type == "windows" && host.vars.client_endpoint + assign where host.vars.os_type == "windows" && host.vars.agent_endpoint } ``` @@ -2711,11 +3031,11 @@ data duplication in split-brain-scenarios. 
The failover timeout can be set for t ### Endpoint Connection Direction -Nodes will attempt to connect to another node when its local [Endpoint](09-object-types.md#objecttype-endpoint) object +Endpoints attempt to connect to another endpoint when its local [Endpoint](09-object-types.md#objecttype-endpoint) object configuration specifies a valid `host` attribute (FQDN or IP address). Example for the master node `icinga2-master1.localdomain` actively connecting -to the client node `icinga2-agent1.localdomain`: +to the agent node `icinga2-agent1.localdomain`: ``` [root@icinga2-master1.localdomain /]# vim /etc/icinga2/zones.conf @@ -2723,12 +3043,12 @@ to the client node `icinga2-agent1.localdomain`: //... object Endpoint "icinga2-agent1.localdomain" { - host = "192.168.56.111" //the master actively tries to connect to the client - log_duration = 0 + host = "192.168.56.111" // The master actively tries to connect to the agent + log_duration = 0 // Disable the replay log for command endpoint agents } ``` -Example for the client node `icinga2-agent1.localdomain` not actively +Example for the agent node `icinga2-agent1.localdomain` not actively connecting to the master node `icinga2-master1.localdomain`: ``` @@ -2737,16 +3057,17 @@ connecting to the master node `icinga2-master1.localdomain`: //... object Endpoint "icinga2-master1.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute - log_duration = 0 + // Do not actively connect to the master by leaving out the 'host' attribute + log_duration = 0 // Disable the replay log for command endpoint agents } ``` -It is not necessary that both the master and the client node establish +It is not necessary that both the master and the agent node establish two connections to each other. Icinga 2 will only use one connection -and close the second connection if established. +and close the second connection if established. 
This generates useless +CPU cycles and leads to blocking resources when the connection times out. -**Tip**: Choose either to let master/satellite nodes connect to client nodes +**Tip**: Choose either to let master/satellite nodes connect to agent nodes or vice versa. @@ -2757,12 +3078,12 @@ keep the same history (check results, notifications, etc.) when nodes are tempor disconnected and then reconnect. This functionality is not needed when a master/satellite node is sending check -execution events to a client which is purely configured for [command endpoint](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) -checks only. +execution events to an agent which is configured as [command endpoint](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) +for check execution. The [Endpoint](09-object-types.md#objecttype-endpoint) object attribute `log_duration` can be lowered or set to 0 to fully disable any log replay updates when the -client is not connected. +agent is not connected. Configuration on the master node `icinga2-master1.localdomain`: @@ -2772,17 +3093,17 @@ Configuration on the master node `icinga2-master1.localdomain`: //... object Endpoint "icinga2-agent1.localdomain" { - host = "192.168.56.111" //the master actively tries to connect to the client + host = "192.168.56.111" // The master actively tries to connect to the agent log_duration = 0 } object Endpoint "icinga2-agent2.localdomain" { - host = "192.168.56.112" //the master actively tries to connect to the client + host = "192.168.56.112" // The master actively tries to connect to the agent log_duration = 0 } ``` -Configuration on the client `icinga2-agent1.localdomain`: +Configuration on the agent `icinga2-agent1.localdomain`: ``` [root@icinga2-agent1.localdomain /]# vim /etc/icinga2/zones.conf @@ -2790,12 +3111,12 @@ Configuration on the client `icinga2-agent1.localdomain`: //...
object Endpoint "icinga2-master1.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 'host' attribute log_duration = 0 } object Endpoint "icinga2-master2.localdomain" { - //do not actively connect to the master by leaving out the 'host' attribute + // Do not actively connect to the master by leaving out the 'host' attribute log_duration = 0 } ``` @@ -2918,7 +3239,7 @@ This will tremendously help when someone is trying to help in the [community cha ### Silent Windows Setup -If you want to install the client silently/unattended, use the `/qn` modifier. The +If you want to install the agent silently/unattended, use the `/qn` modifier. The installation should not trigger a restart, but if you want to be completely sure, you can use the `/norestart` modifier. ``` @@ -2967,8 +3288,10 @@ In case you don't need anything in `conf.d`, use the following command line: [root@icinga2-master1.localdomain /]# icinga2 node setup --master --disable-confd ``` + + -#### Node Setup with Satellites/Clients +#### Node Setup with Agents/Satellites Make sure that the `/var/lib/icinga2/certs` directory exists and is owned by the `icinga` user (or the user Icinga 2 is running as). @@ -3027,7 +3350,7 @@ Pass the following details to the `node setup` CLI command: Trusted master certificate | **Required.** Add the previously fetched trusted master certificate (this step means that you've verified its origin). Parent host | **Optional.** FQDN or IP address of the parent host. This is where the command connects for CSR signing. If not specified, you need to manually copy the parent's public CA certificate file into `/var/lib/icinga2/certs/ca.crt` in order to start Icinga 2. Parent endpoint | **Required.** Specify the parent's endpoint name. - Client zone name | **Required.** Specify the client's zone name. + Local zone name | **Required.** Specify the agent/satellite zone name. 
Parent zone name | **Optional.** Specify the parent's zone name. Accept config | **Optional.** Whether this node accepts configuration sync from the master node (required for [config sync mode](06-distributed-monitoring.md#distributed-monitoring-top-down-config-sync)). Accept commands | **Optional.** Whether this node accepts command execution messages from the master node (required for [command endpoint mode](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint)). @@ -3052,7 +3375,7 @@ Example: --disable-confd ``` -In case the client should connect to the master node, you'll +In case the agent/satellite should connect to the master node, you'll need to modify the `--endpoint` parameter using the format `cn,host,port`: ``` @@ -3060,13 +3383,13 @@ need to modify the `--endpoint` parameter using the format `cn,host,port`: ``` Specify the parent zone using the `--parent_zone` parameter. This is useful -if the client connects to a satellite, not the master instance. +if the agent connects to a satellite, not the master instance. ``` --parent_zone satellite ``` -In case the client should know the additional global zone `linux-templates`, you'll +In case the agent should know the additional global zone `linux-templates`, you'll need to set the `--global-zones` parameter. ``` @@ -3077,7 +3400,7 @@ The `--parent-host` parameter is optional since v2.9 and allows you to perform a You cannot restart Icinga 2 yet; the CLI command asked you to manually copy the parent's public CA certificate file into `/var/lib/icinga2/certs/ca.crt`. Once Icinga 2 is started, it sends a ticket signing request to the parent node. If you have provided a ticket, the master node -signs the request and sends it back to the client which performs a certificate update in-memory. +signs the request and sends it back to the agent/satellite which performs a certificate update in-memory.
In case you did not provide a ticket, you need to manually sign the CSR on the master node which holds the CA's key pair. @@ -3085,7 +3408,7 @@ which holds the CA's key pair. **You can find additional best practices below.** -If this client node is configured as [remote command endpoint execution](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) +If this agent node is configured as [remote command endpoint execution](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) you can safely disable the `checker` feature. The `node setup` CLI command already disabled the `notification` feature. ``` @@ -3093,7 +3416,7 @@ you can safely disable the `checker` feature. The `node setup` CLI command alrea ``` Disable "conf.d" inclusion if this is a [top down](06-distributed-monitoring.md#distributed-monitoring-top-down) -configured client. +configured agent. ``` [root@icinga2-agent1.localdomain /]# sed -i 's/include_recursive "conf.d"/\/\/include_recursive "conf.d"/g' /etc/icinga2/icinga2.conf @@ -3106,7 +3429,7 @@ configured client. ``` [root@icinga2-agent1.localdomain /]# cat <<EOF >/etc/icinga2/conf.d/api-users.conf object ApiUser "root" { - password = "clientsupersecretpassword" + password = "agentsupersecretpassword" permissions = ["*"] } EOF @@ -3130,7 +3453,7 @@ Your automation tool must then configure the master node in the meantime. ``` # cat <<EOF >>/etc/icinga2/zones.conf object Endpoint "icinga2-agent1.localdomain" { - //client connects itself + // Agent connects itself } object Zone "icinga2-agent1.localdomain" { @@ -3143,6 +3466,10 @@ EOF ## Using Multiple Environments +> **Note** +> +> This documentation only covers the basics. Full functionality requires a not yet released addon. + In some cases it can be desired to run multiple Icinga instances on the same host. Two potential scenarios include: