From 95f0a7a0221e03b0d95ef3707cd909c44e37b46a Mon Sep 17 00:00:00 2001
From: Michael Friedrich
Date: Fri, 8 Sep 2017 13:40:09 +0200
Subject: [PATCH] Docs: Technical Concepts for cluster and signing

refs #5450
---
 doc/03-monitoring-basics.md                   |   2 +-
 doc/07-agent-based-monitoring.md              |   2 +-
 doc/14-features.md                            |  15 +-
 doc/15-troubleshooting.md                     |   2 +-
 doc/19-technical-concepts.md                  | 270 ++++++++++++++++++
 ...ript-debugger.md => 20-script-debugger.md} |   0
 doc/{20-development.md => 21-development.md}  |   0
 doc/{21-selinux.md => 22-selinux.md}          |   0
 ...a-1x.md => 23-migrating-from-icinga-1x.md} |  16 +-
 doc/{23-appendix.md => 24-appendix.md}        |  12 +-
 10 files changed, 297 insertions(+), 22 deletions(-)
 create mode 100644 doc/19-technical-concepts.md
 rename doc/{19-script-debugger.md => 20-script-debugger.md} (100%)
 rename doc/{20-development.md => 21-development.md} (100%)
 rename doc/{21-selinux.md => 22-selinux.md} (100%)
 rename doc/{22-migrating-from-icinga-1x.md => 23-migrating-from-icinga-1x.md} (99%)
 rename doc/{23-appendix.md => 24-appendix.md} (99%)

diff --git a/doc/03-monitoring-basics.md b/doc/03-monitoring-basics.md
index eb8b6c197..2bd040f37 100644
--- a/doc/03-monitoring-basics.md
+++ b/doc/03-monitoring-basics.md
@@ -2155,7 +2155,7 @@ Rephrased: If the parent service object changes into the `Warning` state, this
dependency will fail and render all child objects (hosts or services) unreachable.

You can determine the child's reachability by querying the `is_reachable` attribute
-in for example [DB IDO](23-appendix.md#schema-db-ido-extensions).
+in for example [DB IDO](24-appendix.md#schema-db-ido-extensions).

### Implicit Dependencies for Services on Host

diff --git a/doc/07-agent-based-monitoring.md b/doc/07-agent-based-monitoring.md
index a79feba47..e2597a341 100644
--- a/doc/07-agent-based-monitoring.md
+++ b/doc/07-agent-based-monitoring.md
@@ -181,7 +181,7 @@ SNMP Traps can be received and filtered by using [SNMPTT](http://snmptt.sourcefo
and specific trap handlers passing the check results to Icinga 2.

Following the SNMPTT [Format](http://snmptt.sourceforge.net/docs/snmptt.shtml#SNMPTT.CONF-FORMAT)
-documentation and the Icinga external command syntax found [here](23-appendix.md#external-commands-list-detail)
+documentation and the Icinga external command syntax found [here](24-appendix.md#external-commands-list-detail)
we can create generic services that can accommodate any number of hosts for a given scenario.

### Simple SNMP Traps

diff --git a/doc/14-features.md b/doc/14-features.md
index 5b7b5e372..f356e3073 100644
--- a/doc/14-features.md
+++ b/doc/14-features.md
@@ -78,11 +78,16 @@ Example for PostgreSQL:

    (1 Zeile)

-A detailed list on the available table attributes can be found in the [DB IDO Schema documentation](23-appendix.md#schema-db-ido).
+A detailed list on the available table attributes can be found in the [DB IDO Schema documentation](24-appendix.md#schema-db-ido).

## External Commands

+> **Note**
+>
+> Please use the [REST API](12-icinga2-api.md#icinga2-api) as a modern and secure alternative
+> for external actions.
+
Icinga 2 provides an external command pipe for processing commands
triggering specific actions (for example rescheduling a service check
through the web interface).
@@ -106,7 +111,7 @@ a forced service check:

    Oct 17 15:01:25 icinga-server icinga2: Executing external command: [1382014885] SCHEDULE_FORCED_SVC_CHECK;localhost;ping4;1382014885
    Oct 17 15:01:25 icinga-server icinga2: Rescheduling next check for service 'ping4'

-A list of currently supported external commands can be found [here](23-appendix.md#external-commands-list-detail).
+A list of currently supported external commands can be found [here](24-appendix.md#external-commands-list-detail).

Detailed information on the commands and their required parameters can be found
on the [Icinga 1.x documentation](https://docs.icinga.com/latest/en/extcommands2.html).

@@ -441,7 +446,7 @@ re-implementation of the Livestatus protocol which is compatible with MK
Livestatus.

Details on the available tables and attributes with Icinga 2 can be found
-in the [Livestatus Schema](23-appendix.md#schema-livestatus) section.
+in the [Livestatus Schema](24-appendix.md#schema-livestatus) section.

You can enable Livestatus using icinga2 feature enable:

@@ -517,7 +522,7 @@ Example using the tcp socket listening on port `6558`:

### Livestatus COMMAND Queries

-A list of available external commands and their parameters can be found [here](23-appendix.md#external-commands-list-detail)
+A list of available external commands and their parameters can be found [here](24-appendix.md#external-commands-list-detail)

    $ echo -e 'COMMAND ' | netcat 127.0.0.1 6558

@@ -618,7 +623,7 @@ Default separators.

The `commands` table is populated with `CheckCommand`, `EventCommand` and `NotificationCommand` objects.

-A detailed list on the available table attributes can be found in the [Livestatus Schema documentation](23-appendix.md#schema-livestatus).
+A detailed list on the available table attributes can be found in the [Livestatus Schema documentation](24-appendix.md#schema-livestatus).

## Status Data Files

diff --git a/doc/15-troubleshooting.md b/doc/15-troubleshooting.md
index f8ee88465..f9702911e 100644
--- a/doc/15-troubleshooting.md
+++ b/doc/15-troubleshooting.md
@@ -29,7 +29,7 @@ findings and details please.
* The newest Icinga 2 crash log if relevant, located in `/var/log/icinga2/crash`
* Additional details
  * If the check command failed, what's the output of your manual plugin tests?
-  * In case of [debugging](20-development.md#development) Icinga 2, the full back traces and outputs
+  * In case of [debugging](21-development.md#development) Icinga 2, the full back traces and outputs

## Analyze your Environment

diff --git a/doc/19-technical-concepts.md b/doc/19-technical-concepts.md
new file mode 100644
index 000000000..10d0b9a07
--- /dev/null
+++ b/doc/19-technical-concepts.md
@@ -0,0 +1,270 @@
# Technical Concepts

This chapter provides insights into specific Icinga 2 components, libraries,
features and other technical concepts and design decisions.

## Features

Features are implemented in specific libraries and can be enabled
using CLI commands.

Features either write specific data or receive data.

Examples of features which write data: [DB IDO](14-features.md#db-ido), [Graphite](14-features.md#graphite-carbon-cache-writer), [InfluxDB](14-features.md#influxdb-writer), [GELF](14-features.md#gelfwriter), etc.
Examples of features which receive data: [REST API](12-icinga2-api.md#icinga2-api), etc.

The implementation of features makes use of existing libraries
and functionality. This makes the code more abstract, but shorter
and easier to read.

Features register callback functions for the specific events they want
to handle.
For example, the `GraphiteWriter` feature subscribes to new CheckResult events.

Each time Icinga 2 receives and processes a new check result, this
event is triggered and forwarded to all subscribers.

The GraphiteWriter feature calls the registered function and processes
the received data. Features which connect Icinga 2 to external interfaces
normally parse and reformat the received data into an applicable format.

The GraphiteWriter uses a TCP socket to communicate with the carbon cache
daemon of Graphite. The InfluxDBWriter instead writes bulk metric messages
to InfluxDB's HTTP API.

## Cluster

### Communication

Icinga 2 uses its own certificate authority (CA) by default. The
public and private CA keys can be generated on the signing master.

Each node certificate must be signed by the private CA key.

Note: The following description uses `parent node` and `child node`.
This also applies to nodes in the same cluster zone.

During the connection attempt, an SSL handshake is performed.
If the public certificate of a child node is not signed by the same
CA, the child node is not trusted and the connection will be closed.

If the SSL handshake succeeds, the parent node reads the
certificate's common name (CN) of the child node and looks for
a matching local Endpoint object name in its configuration.

If no Endpoint object is found, further communication
(runtime and config sync, etc.) is terminated.

The child node also checks the CN from the parent node's public
certificate. If the child node does not find a matching local Endpoint
object name in its configuration, it will not trust the parent node.

Both checks prevent accepting cluster messages from an untrusted
source endpoint.

If an Endpoint match was found, there is one additional security
mechanism in place: Endpoints belong to a Zone hierarchy.

Several cluster messages can only be sent "top down", while others,
like check results, are allowed to be sent from the child node to the parent node.

Once these checks succeed, the cluster messages are exchanged and processed.

### CSR Signing

In order to make things easier, Icinga 2 provides built-in methods
which allow child nodes to request a signed certificate from the
signing master.

Icinga 2 v2.8 introduces the possibility to request certificates
from indirectly connected nodes. This is required for multi-level
cluster environments with masters, satellites and clients.

CSR signing in general starts with the master setup. This step
ensures that the master is in a working CSR signing state with:

* the public and private CA keys in `/var/lib/icinga2/ca`
* the private `TicketSalt` constant defined inside the `api` feature
* cluster communication ready, i.e. Icinga 2 listening on port 5665

The child node setup, which is run with CLI commands, will then
attempt to connect to the parent node. This is not necessarily
the signing master instance, but could also be a parent satellite node.

During this process the child node asks the user to verify the
parent node's public certificate to prevent MITM attacks.

There are two methods to request signed certificates:

* Add a ticket to the request. This ticket was generated on the master
beforehand and contains hashed details of the client it has been created for.
The signing master uses this information to automatically sign the certificate
request.

* Do not add a ticket to the request. It will be sent to the signing master
which stores the pending request. Manual user interaction with CLI commands
is necessary to sign the request.
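
Below is a rough CLI sketch of the two methods. The CN is an example only, and
the `ca list` command is assumed here as the companion to the `ca sign` command
described further down for looking up pending requests.

For the ticket method, the ticket is generated on the signing master beforehand
and passed to the client setup:

    # icinga2 pki ticket --cn 'icinga2-client1.localdomain'

For the ticket-less method, the pending certificate requests are listed on the
signing master and the matching fingerprint is signed manually:

    # icinga2 ca list
    # icinga2 ca sign <fingerprint>
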
The certificate request is sent as a `pki::RequestCertificate` cluster
message to the parent node.

If the parent node is not the signing master, it stores the request
in `/var/lib/icinga2/certificate-requests` and forwards the
cluster message to its parent node.

Once the message arrives on the signing master, it first verifies that
the sent certificate request is valid. This is to prevent unwanted errors
or modified requests from the "proxy" node.

After verification, the signing master checks if the request contains
a valid signing ticket. It hashes the certificate's common name and
compares the value to the received ticket number.

If the ticket is valid, the certificate request is immediately signed
with the CA key. The signed certificate is sent back to the client inside
a `pki::UpdateCertificate` cluster message.

If the receiving node was not the origin of the certificate request,
it only updates the cached request for its child node and sends another
cluster message down to that child node (e.g. from a satellite to a client).

If no ticket was specified, the signing master waits until the certificate
is signed manually with the `ca sign` CLI command.

> **Note**
>
> Push notifications for manual request signing are not yet implemented (TODO).

Once the child node reconnects, it synchronizes all signed certificate requests.
This can take a few minutes and requires all nodes to reconnect to each other.

#### CSR Signing: Clients without parent connection

There is an additional scenario: the setup on a child node does
not necessarily need a connection to the parent node.

This mode leaves the node in a semi-configured state. You need
to manually copy the master's public CA key into `/var/lib/icinga2/certs/ca.crt`
on the client before starting Icinga 2.

The parent node needs to actively connect to the child node.
Once this connection succeeds, the child node will actively
request a signed certificate.

The update procedure works the same way as above.

### High Availability

High availability is automatically enabled between two nodes in the same
cluster zone.

This requires the same configuration and enabled features on both nodes.

HA zone members trust each other and share event updates as cluster messages.
This includes, for example, check results, next check timestamp updates,
acknowledgements and notifications.

This ensures that both nodes are synchronized. If one node goes away, the
remaining node takes over and continues to operate as normal.

Cluster nodes automatically determine the authority for configuration
objects. Objects a node is not authoritative for are activated, but paused.
You can verify this by querying the `paused` attribute for all objects via
the REST API or the debug console.

Nodes inside an HA zone calculate the object authority independently of each other.

The number of endpoints in a zone is defined through the configuration. This number
is used in a local modulo calculation to determine whether the node is
responsible for a given object or not.

This object authority is important for the selected features explained below.

Since features are configuration objects too, you must ensure that all nodes
inside the HA zone share the same enabled features. If configured otherwise,
one node might have the `checker` feature enabled while the other does not.
This leads to late or missing check results, because the node without the
`checker` feature still holds the object authority for half of the checkable
objects, and that half is never executed.
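
A quick way to inspect the authority distribution is to query the `paused`
attribute mentioned above via the REST API. This is only a sketch and assumes
an `ApiUser` named `root` with the password `icinga` on the local node; adjust
the credentials and endpoint to your environment:

    # curl -k -s -u root:icinga 'https://localhost:5665/v1/objects/hosts?attrs=name&attrs=paused' | python -m json.tool

Running the same query against both HA zone members should show each object
paused on one node and active on the other.
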
### High Availability: Checker

The `checker` feature only executes checks for `Checkable` objects (Host, Service)
for which it is authoritative.

That way each node only executes checks for a segment of the overall configuration objects.

The cluster message routing ensures that all check results are synchronized
to nodes which are not authoritative for this configuration object.

### High Availability: Notifications

The `notification` feature only sends notifications for `Notification` objects
for which it is authoritative.

That way each node only executes notifications for a segment of all notification objects.

Notified users and other event details are synchronized throughout the cluster.
This is required if, for example, the DB IDO feature is active on the other node.

### High Availability: DB IDO

If you don't have HA enabled for the IDO feature, both nodes will
write their status and historical data to their own separate database
backends.

In order to avoid data separation and a split view (each node would require its
own Icinga Web 2 installation on top), the high availability option was added
to the DB IDO feature. This is enabled by default with the `enable_ha` setting.

This requires a central database backend. Best practice is to use a MySQL cluster
with a virtual IP.

Both Icinga 2 nodes require the connection and credential details configured in
their DB IDO feature.

During startup, Icinga 2 calculates whether the feature configuration object
is authoritative on this node or not. The order is an alphanumeric
comparison, e.g. if you have `master1` and `master2`, Icinga 2 will enable
the DB IDO feature on `master2` by default.

If the connection between endpoints drops, the object authority is re-calculated.

In order to prevent data duplication in a split-brain scenario where both
nodes would write into the same database, there is another safety mechanism
in place.

The decision which node is allowed to write to the database is derived from
a quorum inside the `programstatus` table. On database connect, each node
verifies whether the `endpoint_name` column contains its own name or that of
another endpoint. In addition, the DB IDO feature compares the `last_update_time`
column against the current timestamp plus the configured `failover_timeout` offset.

That way only one active DB IDO feature writes to the database, even if the
nodes are not currently connected in the cluster zone. This prevents data
duplication in historical tables.

diff --git a/doc/19-script-debugger.md b/doc/20-script-debugger.md
similarity index 100%
rename from doc/19-script-debugger.md
rename to doc/20-script-debugger.md
diff --git a/doc/20-development.md b/doc/21-development.md
similarity index 100%
rename from doc/20-development.md
rename to doc/21-development.md
diff --git a/doc/21-selinux.md b/doc/22-selinux.md
similarity index 100%
rename from doc/21-selinux.md
rename to doc/22-selinux.md
diff --git a/doc/22-migrating-from-icinga-1x.md b/doc/23-migrating-from-icinga-1x.md
similarity index 99%
rename from doc/22-migrating-from-icinga-1x.md
rename to doc/23-migrating-from-icinga-1x.md
index 831841e95..4faeccc3b 100644
--- a/doc/22-migrating-from-icinga-1x.md
+++ b/doc/23-migrating-from-icinga-1x.md
@@ -11,7 +11,7 @@ on your migration requirements.
For a long-term migration of your configuration you should consider re-creating
your configuration based on the proposed Icinga 2 configuration paradigm.

-Please read the [next chapter](22-migrating-from-icinga-1x.md#differences-1x-2) to find out more about the differences
+Please read the [next chapter](23-migrating-from-icinga-1x.md#differences-1x-2) to find out more about the differences
between 1.x and 2.

### Manual Config Migration Hints

@@ -24,7 +24,7 @@ The examples are taken from Icinga 1.x test and production environments and conv
straight into a possible Icinga 2 format. If you found a different strategy, please let us know!

-If you require in-depth explanations, please check the [next chapter](22-migrating-from-icinga-1x.md#differences-1x-2).
+If you require in-depth explanations, please check the [next chapter](23-migrating-from-icinga-1x.md#differences-1x-2).

#### Manual Config Migration Hints for Intervals

@@ -185,7 +185,7 @@ While you could manually migrate this like (please note the new generic command

#### Manual Config Migration Hints for Runtime Macros

-Runtime macros have been renamed. A detailed comparison table can be found [here](22-migrating-from-icinga-1x.md#differences-1x-2-runtime-macros).
+Runtime macros have been renamed. A detailed comparison table can be found [here](23-migrating-from-icinga-1x.md#differences-1x-2-runtime-macros).

For example, accessing the service check output looks like the following in Icinga 1.x:

@@ -257,7 +257,7 @@ while the service check command resolves its value to the service attribute attr

#### Manual Config Migration Hints for Contacts (Users)

Contacts in Icinga 1.x act as users in Icinga 2, but do not have any notification commands specified.
-This migration part is explained in the [next chapter](22-migrating-from-icinga-1x.md#manual-config-migration-hints-notifications).
+This migration part is explained in the [next chapter](23-migrating-from-icinga-1x.md#manual-config-migration-hints-notifications).

    define contact{
      contact_name                   testconfig-user
      use                            generic-user
      alias                          Icinga Test User
      email                          icinga@localhost
    }

-The `service_notification_options` can be [mapped](22-migrating-from-icinga-1x.md#manual-config-migration-hints-notification-filters)
+The `service_notification_options` can be [mapped](23-migrating-from-icinga-1x.md#manual-config-migration-hints-notification-filters)
into generic `state` and `type` filters, if additional notification filtering is required.
`alias` gets renamed to `display_name`.

@@ -319,7 +319,7 @@ Assign it to the host or service and set the newly generated notification comman

Convert the `notification_options` attribute from Icinga 1.x to Icinga 2 `states` and `types`. Details
-[here](22-migrating-from-icinga-1x.md#manual-config-migration-hints-notification-filters). Add the notification period.
+[here](23-migrating-from-icinga-1x.md#manual-config-migration-hints-notification-filters). Add the notification period.

    states = [ OK, Warning, Critical ]
    types = [ Recovery, Problem, Custom ]

@@ -556,7 +556,7 @@ enabled.

      assign where "hg_svcdep2" in host.groups
    }

-Host dependencies are explained in the [next chapter](22-migrating-from-icinga-1x.md#manual-config-migration-hints-host-parents).
+Host dependencies are explained in the [next chapter](23-migrating-from-icinga-1x.md#manual-config-migration-hints-host-parents).
@@ -955,7 +955,7 @@ In Icinga 1.x arguments are specified in the `check_command` attribute and are
separated from the command name using an exclamation mark (`!`).

Please check the migration hints for a detailed
-[migration example](22-migrating-from-icinga-1x.md#manual-config-migration-hints-check-command-arguments).
+[migration example](23-migrating-from-icinga-1x.md#manual-config-migration-hints-check-command-arguments).

> **Note**
>

diff --git a/doc/23-appendix.md b/doc/24-appendix.md
similarity index 99%
rename from doc/23-appendix.md
rename to doc/24-appendix.md
index 3453ee993..f57c1bf33 100644
--- a/doc/23-appendix.md
+++ b/doc/24-appendix.md
@@ -692,16 +692,16 @@ Not supported: `debug_info`.

#### Livestatus Hostsbygroup Table Attributes

-All [hosts](23-appendix.md#schema-livestatus-hosts-table-attributes) table attributes grouped with
-the [hostgroups](23-appendix.md#schema-livestatus-hostgroups-table-attributes) table prefixed with `hostgroup_`.
+All [hosts](24-appendix.md#schema-livestatus-hosts-table-attributes) table attributes grouped with
+the [hostgroups](24-appendix.md#schema-livestatus-hostgroups-table-attributes) table prefixed with `hostgroup_`.

#### Livestatus Servicesbygroup Table Attributes

-All [services](23-appendix.md#schema-livestatus-services-table-attributes) table attributes grouped with
-the [servicegroups](23-appendix.md#schema-livestatus-servicegroups-table-attributes) table prefixed with `servicegroup_`.
+All [services](24-appendix.md#schema-livestatus-services-table-attributes) table attributes grouped with
+the [servicegroups](24-appendix.md#schema-livestatus-servicegroups-table-attributes) table prefixed with `servicegroup_`.

#### Livestatus Servicesbyhostgroup Table Attributes

-All [services](23-appendix.md#schema-livestatus-services-table-attributes) table attributes grouped with
-the [hostgroups](23-appendix.md#schema-livestatus-hostgroups-table-attributes) table prefixed with `hostgroup_`.
+All [services](24-appendix.md#schema-livestatus-services-table-attributes) table attributes grouped with
+the [hostgroups](24-appendix.md#schema-livestatus-hostgroups-table-attributes) table prefixed with `hostgroup_`.