mirror of
https://github.com/Icinga/icinga2.git
synced 2025-07-23 05:34:48 +02:00
Docs: Enhance Troubleshooting with nscp-local, check_source, wrong thresholds
refs #5487
parent
2b3a65d928
commit
6844c50748
@ -307,6 +307,132 @@ Fetch all check result events matching the `event.service` name `random`:

    $ curl -k -s -u root:icinga -X POST 'https://localhost:5665/v1/events?queue=debugchecks&types=CheckResult&filter=match%28%22random*%22,event.service%29'

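The percent-encoded `filter` parameter in the URL above can also be generated rather than typed by hand. The following is a minimal Python sketch using only the standard library; the helper name `encode_filter` is hypothetical and not part of Icinga 2.

```python
from urllib.parse import quote


def encode_filter(expression: str) -> str:
    """Percent-encode a filter expression for use in a URL query string.

    '*' and ',' are left unencoded to mirror the curl example;
    parentheses and double quotes are encoded.
    """
    return quote(expression, safe="*,")


encoded = encode_filter('match("random*",event.service)')
print(encoded)
```

The printed value can be appended to the `filter=` query parameter of the events URL.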
### Analyze Check Source <a id="checks-check-source"></a>

Sometimes checks are not executed on the intended remote host, but on the
master or another node instead. This can lead to unwanted results or NOT-OK states.

The `check_source` attribute is the best indication of where a check command
was actually executed. This could be a satellite with synced configuration
or a client acting as remote command bridge. Both return the node where the
plugin was called as the check source.

Example for retrieving the check source from all `disk` services using a
[regex match](18-library-reference.md#global-functions-regex) on the name:

    $ curl -k -s -u root:icinga -H 'Accept: application/json' -H 'X-HTTP-Method-Override: GET' -X POST 'https://localhost:5665/v1/objects/services' \
    -d '{ "filter": "regex(pattern, service.name)", "filter_vars": { "pattern": "^disk" }, "attrs": [ "__name", "last_check_result" ] }' | python -m json.tool
    {
        "results": [
            {
                "attrs": {
                    "__name": "icinga2-client1.localdomain!disk",
                    "last_check_result": {
                        "active": true,
                        "check_source": "icinga2-client1.localdomain",

                        ...

                    }
                },
                "joins": {},
                "meta": {},
                "name": "icinga2-client1.localdomain!disk",
                "type": "Service"
            }
        ]
    }

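With many services, comparing `check_source` against the expected host by eye gets tedious. The following is a minimal Python sketch, assuming the response layout shown above and assuming each check is expected to run on the host named before the `!` in `__name`. The function name `unexpected_check_sources` and the second sample entry are hypothetical.

```python
def unexpected_check_sources(response: dict) -> list:
    """Return (service, check_source) pairs where the check did not run
    on the host the service belongs to."""
    mismatches = []
    for result in response.get("results", []):
        attrs = result.get("attrs", {})
        full_name = attrs.get("__name", "")
        # "__name" has the form "<host>!<service>".
        host_name = full_name.split("!", 1)[0]
        # last_check_result may be null for services never checked.
        check_result = attrs.get("last_check_result") or {}
        source = check_result.get("check_source")
        if source is not None and source != host_name:
            mismatches.append((full_name, source))
    return mismatches


# Sample data: the first entry mirrors the output above; the second is a
# made-up service whose check was executed on a (hypothetical) master.
sample = {
    "results": [
        {"attrs": {"__name": "icinga2-client1.localdomain!disk",
                   "last_check_result": {
                       "check_source": "icinga2-client1.localdomain"}}},
        {"attrs": {"__name": "icinga2-client2.localdomain!disk",
                   "last_check_result": {
                       "check_source": "icinga2-master1.localdomain"}}},
    ]
}

print(unexpected_check_sources(sample))
```

A mismatch is only a hint: services for hosts that are checked from a satellite legitimately report the satellite as their check source.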
Example for using the `icinga2 console` CLI command evaluation functionality:

    $ ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://root@localhost:5665/' \
    --eval 'get_service("icinga2-client1.localdomain", "disk").last_check_result.check_source' | python -m json.tool

    "icinga2-client1.localdomain"


### NSClient++ Check Errors with nscp-local <a id="nsclient-check-errors-nscp-local"></a>

The [nscp-local](10-icinga-template-library.md#nscp-check-local) CheckCommand object definitions call the local `nscp.exe` command.
If a Windows client service check fails to find the `nscp.exe` command, the log output looks like this:

```
Command ".\nscp.exe" "client" "-a" "drive=d" "-a" "show-all" "-b" "-q" "check_drivesize" failed to execute: 2, "The system cannot find the file specified."
```

or

```
Command ".
scp.exe" "client" "-a" "drive=d" "-a" "show-all" "-b" "-q" "check_drivesize" failed to execute: 2, "The system cannot find the file specified."
```

The second output actually prints `.\\nscp.exe`, where the escaped `\n` sequence is interpreted as a newline.

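The same effect can be reproduced in any language that interprets `\n` inside string literals. The following small Python illustration is not Icinga 2 code, and the variable names are made up for this sketch.

```python
# In an ordinary string literal, backslash-n becomes a newline character,
# which splits the path into "." and "scp.exe".
interpreted = ".\nscp.exe"

# Escaping the backslash keeps the path intact.
escaped = ".\\nscp.exe"

print(interpreted.splitlines())
print(escaped)
```

The first `print` shows the two fragments seen in the second log output, the second shows the intended relative path.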

Both errors suggest that the `NscpPath` constant is empty or set to a `.` character.
This could mean the following:

* The command is **not executed on the Windows client**. Check the [check_source](15-troubleshooting.md#checks-check-source) attribute from the check result.
* You are using an outdated NSClient++ version (0.3.x or 0.4.x) which is not compatible with Icinga 2.
* You are using a custom NSClient++ installer which does not register the correct GUID for NSClient++.

More troubleshooting:

Retrieve the `NscpPath` constant on your Windows client:

```
"C:\Program Files\ICINGA2\sbin\icinga2.exe" variable get NscpPath
```

If the variable is returned empty, manually test how Icinga 2 would resolve
its path (the expression can be found inside the ITL):

```
"C:\Program Files\ICINGA2\sbin\icinga2.exe" console --eval "dirname(msi_get_component_path(\"{5C45463A-4AE9-4325-96DB-6E239C034F93}\"))"
```

If this command does not return anything, NSClient++ is not properly installed.
Verify this in the `Programs and Features` (`appwiz.cpl`) control panel.

You can run the bundled NSClient++ installer from the Icinga 2 Windows package.
The MSI package is located in `C:\Program Files\ICINGA2\sbin`.

The bundled NSClient++ version has been tested with Icinga 2. Keep that
in mind when using a different package.


### Check Thresholds Not Applied <a id="check-thresholds-not-applied"></a>

This can happen with [clients as command endpoint execution](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint).

If, for example, you have a client host `icinga2-client1.localdomain`
and a `disk` service check defined on the master, the warning and
critical thresholds are sometimes not applied and unwanted notification
alerts are raised.

This happens because the client itself includes a host object with
its `NodeName` and a basic set of checks in the [conf.d](04-configuring-icinga-2.md#conf-d)
directory, e.g. `disk` with the default thresholds.

Clients which have the `checker` feature enabled will attempt
to execute checks for local services and send their results
back to the master.

> **Note**
>
> This is part of the deprecated client bottom up mode.

If the same host and service objects also exist on the
master, you will receive wrong check results from the client.

Solution:

* Disable the `checker` feature on clients: `icinga2 feature disable checker`.
* Remove the inclusion of [conf.d](04-configuring-icinga-2.md#conf-d) as suggested in the [client setup docs](06-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint).


### Check Fork Errors <a id="check-fork-errors"></a>

We've learned that newer kernel versions introduce a [fork limit for cgroups](https://lwn.net/Articles/663873/)