From 9c7b3351be78647917fbac735bd245fb5c1c3421 Mon Sep 17 00:00:00 2001
From: Michael Friedrich
Date: Wed, 9 Aug 2017 20:49:40 +0200
Subject: [PATCH] Docs: Enhance Troubleshooting with nscp-local, check_source, wrong thresholds

---
 doc/15-troubleshooting.md | 126 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 126 insertions(+)

diff --git a/doc/15-troubleshooting.md b/doc/15-troubleshooting.md
index 5150ccac6..68baf5541 100644
--- a/doc/15-troubleshooting.md
+++ b/doc/15-troubleshooting.md
@@ -307,6 +307,132 @@ Fetch all check result events matching the `event.service` name `random`:
 
     $ curl -k -s -u root:icinga -X POST 'https://localhost:5665/v1/events?queue=debugchecks&types=CheckResult&filter=match%28%22random*%22,event.service%29'
 
+
+### Analyze Check Source
+
+Sometimes checks are not executed on the remote host but on the master, or on another unintended endpoint.
+This can lead to unwanted results or NOT-OK states.
+
+The `check_source` attribute is the best indication of where a check command
+was actually executed. This could be a satellite with synced configuration
+or a client acting as a remote command bridge -- both report the endpoint
+where the plugin was called as the check source.
+
+Example for retrieving the check source from all `disk` services using a
+[regex match](18-library-reference.md#global-functions-regex) on the name:
+
+    $ curl -k -s -u root:icinga -H 'Accept: application/json' -H 'X-HTTP-Method-Override: GET' -X POST 'https://localhost:5665/v1/objects/services' \
+     -d '{ "filter": "regex(pattern, service.name)", "filter_vars": { "pattern": "^disk" }, "attrs": [ "__name", "last_check_result" ] }' | python -m json.tool
+    {
+        "results": [
+            {
+                "attrs": {
+                    "__name": "icinga2-client1.localdomain!disk",
+                    "last_check_result": {
+                        "active": true,
+                        "check_source": "icinga2-client1.localdomain",
+
+    ...
+
+                    }
+                },
+                "joins": {},
+                "meta": {},
+                "name": "icinga2-client1.localdomain!disk",
+                "type": "Service"
+            }
+        ]
+    }
+
+Example for using the `icinga2 console` CLI command evaluation functionality:
+
+    $ ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://root@localhost:5665/' \
+     --eval 'get_service("icinga2-client1.localdomain", "disk").last_check_result.check_source' | python -m json.tool
+
+    "icinga2-client1.localdomain"
+
+
+### NSClient++ Check Errors with nscp-local
+
+The [nscp-local](10-icinga-template-library.md#nscp-check-local) CheckCommand object definitions call the local `nscp.exe` command.
+If a Windows client service check fails to find the `nscp.exe` command, the log output looks like this:
+
+```
+Command ".\nscp.exe" "client" "-a" "drive=d" "-a" "show-all" "-b" "-q" "check_drivesize" failed to execute: 2, "The system cannot find the file specified."
+```
+
+or
+
+```
+Command ".
+scp.exe" "client" "-a" "drive=d" "-a" "show-all" "-b" "-q" "check_drivesize" failed to execute: 2, "The system cannot find the file specified."
+```
+
+The above actually prints `.\\nscp.exe`, where the escaped `\n` character is interpreted as a new line.
+
+Both errors suggest that the `NscpPath` constant is empty or set to a `.` character.
+This could mean the following:
+
+* The command is **not executed on the Windows client**. Check the [check_source](15-troubleshooting.md#checks-check-source) attribute from the check result.
+* You are using an outdated NSClient++ version (0.3.x or 0.4.x) which is not compatible with Icinga 2.
+* You are using a custom NSClient++ installer which does not register the correct GUID for NSClient++.
+
+More troubleshooting:
+
+Retrieve the `NscpPath` constant on your Windows client:
+
+```
+C:\Program Files\ICINGA2\sbin\icinga2.exe variable get NscpPath
+```
+
+If the variable is returned empty, manually test how Icinga 2 would resolve
+its path (this can be found inside the ITL):
+
+```
+C:\Program Files\ICINGA2\sbin\icinga2.exe console --eval "dirname(msi_get_component_path(\"{5C45463A-4AE9-4325-96DB-6E239C034F93}\"))"
+```
+
+If this command does not return anything, NSClient++ is not properly installed.
+Verify this in the `Programs and Features` (`appwiz.cpl`) control panel.
+
+You can run the bundled NSClient++ installer from the Icinga 2 Windows package.
+The msi package is located in `C:\Program Files\ICINGA2\sbin`.
+
+The bundled NSClient++ version has been properly tested with Icinga 2. Keep that
+in mind when using a different package.
+
+
+### Check Thresholds Not Applied
+
+This can happen with [clients as command endpoint execution](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint).
+
+If you have, for example, a client host `icinga2-client1.localdomain`
+and a service `disk` check defined on the master, the warning and
+critical thresholds are sometimes not applied and unwanted notification
+alerts are raised.
+
+This happens because the client itself includes a host object with
+its `NodeName` and a basic set of checks in the [conf.d](04-configuring-icinga-2.md#conf-d)
+directory, e.g. `disk` with the default thresholds.
+
+Clients which have the `checker` feature enabled will attempt
+to execute checks for local services and send their results
+back to the master.
+
+> **Note**
+>
+> This is part of the deprecated client bottom-up mode.
+
+If you now have the same host and service objects on the
+master, you will receive wrong check results from the client.
+
+Solution:
+
+* Disable the `checker` feature on clients: `icinga2 feature disable checker`.
+* Remove the inclusion of [conf.d](04-configuring-icinga-2.md#conf-d) as suggested in the [client setup docs](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint).
+
+
+
 ### Check Fork Errors
 
 We've learned that newer kernel versions introduce a [fork limit for cgroups](https://lwn.net/Articles/663873/)