Update documentation (troubleshooting, monitor Icinga 2, configs, integrations, etc.)

fixes #5137
fixes #5140
fixes #1880
fixes #5142
fixes #5144
Michael Friedrich 2017-04-05 19:49:00 +02:00
parent 966c2a4602
commit 1c816ac9ad
8 changed files with 456 additions and 104 deletions

View File

@ -3,7 +3,7 @@
Icinga 2 comes with a number of CLI commands which support bash autocompletion.
These CLI commands will allow you to use certain functionality
provided by and around the Icinga 2 daemon.
provided by and around Icinga 2.
Each CLI command provides its own help and usage information, so please
make sure to always run them with the `--help` parameter.
@ -84,13 +84,13 @@ options.
Bash Auto-Completion (pressing `<TAB>`) is provided only for the corresponding context.
While `--config` will suggest and auto-complete files and directories on disk,
`feature enable` will only suggest disabled features. Try it yourself.
While `--config` suggests and auto-completes files and directories on disk,
`feature enable` only suggests disabled features.
RPM and Debian packages install the bash completion files into
`/etc/bash_completion.d/icinga2`.
You will need to install the `bash-completion` package if not already installed.
You need to install the `bash-completion` package if not already installed.
RHEL/CentOS/Fedora:
@ -117,11 +117,13 @@ into your current session and test it:
By default the `icinga2` binary loads the `icinga` library. A different application type
can be specified with the `--app` command-line option.
Note: This is not needed by the average Icinga user, only developers.
### Libraries
Instead of loading libraries using the [`library` config directive](17-language-reference.md#library)
you can also use the `--library` command-line option.
Note: This is not needed by the average Icinga user, only developers.
### Constants
@ -135,8 +137,8 @@ brackets like this:
include <test.conf>
This would cause Icinga 2 to search its include path for the configuration file
`test.conf`. By default the installation path for the Icinga Template Library
This causes Icinga 2 to search its include path for the configuration file
`test.conf`. By default the installation path for the [Icinga Template Library](10-icinga-template-library.md#icinga-template-library)
is the only search directory.
Using the `--include` command-line option additional search directories can be
@ -145,11 +147,11 @@ added.
## <a id="cli-command-console"></a> CLI command: Console
The CLI command `console` can be used to evaluate Icinga 2 config expressions, e.g. to test
[functions](17-language-reference.md#functions).
The CLI command `console` can be used to debug and evaluate Icinga 2 config expressions,
e.g. to test [functions](17-language-reference.md#functions) in your local sandbox.
$ icinga2 console
Icinga 2 (version: v2.4.0)
Icinga 2 (version: v2.6.0)
<1> => function test(name) {
<1> .. log("Hello " + name)
<1> .. }
@ -159,6 +161,7 @@ The CLI command `console` can be used to evaluate Icinga 2 config expressions, e
null
<3> =>
Further usage examples can be found in the [library reference](18-library-reference.md#library-reference) chapter.
On operating systems without the `libedit` library installed there is no
support for line-editing or a command history. However you can
@ -166,18 +169,23 @@ use the `rlwrap` program if you require those features:
$ rlwrap icinga2 console
The `console` can be used to connect to a running Icinga 2 instance using
The debug console can be used to connect to a running Icinga 2 instance using
the [REST API](12-icinga2-api.md#icinga2-api). [API permissions](12-icinga2-api.md#icinga2-api-permissions)
are required for executing config expressions and auto-completion.
> **Note**
> The console does not currently support SSL certificate verification.
>
> The debug console does not currently support SSL certificate verification.
>
> Runtime modifications are not validated and might cause the Icinga 2
> daemon to crash or behave in an unexpected way. Use these runtime changes
> at your own risk and rather *inspect and debug objects read-only*.
You can specify the API URL using the `--connect` parameter.
Although the password can be specified there, process arguments on UNIX platforms are
usually visible to other users (e.g. through `ps`). In order to securely specify the
user credentials the console supports two environment variables:
user credentials the debug console supports two environment variables:
Environment variable | Description
---------------------|-------------
@ -248,7 +256,7 @@ Here's an example that retrieves the command that was used by Icinga to check th
## <a id="cli-command-daemon"></a> CLI command: Daemon
The CLI command `daemon` provides the functionality to start/stop Icinga 2.
Furthermore it provides the [configuration validation](11-cli-commands.md#config-validation).
Furthermore it allows you to run the [configuration validation](11-cli-commands.md#config-validation).
# icinga2 daemon --help
icinga2 - The Icinga 2 network monitoring daemon (version: v2.6.0)
@ -286,7 +294,7 @@ Furthermore it provides the [configuration validation](11-cli-commands.md#config
### Config Files
Using the `--config` option you can specify one or more configuration files.
You can specify one or more configuration files with the `--config` option.
Configuration files are processed in the order they're specified on the command-line.
When no configuration file is specified and the `--no-config` option is not used
@ -295,7 +303,7 @@ Icinga 2 automatically falls back to using the configuration file
### Config Validation
The `--validate` option can be used to check if your configuration files
The `--validate` option can be used to check if configuration files
contain errors. If any errors are found, the exit status is 1, otherwise 0
is returned. More details in the [configuration validation](11-cli-commands.md#config-validation) chapter.
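Because the exit status distinguishes valid from invalid configuration, the validation can gate a reload in scripts. The following is a minimal sketch, not an official helper: the function name `check_config` is hypothetical, and the usage comment assumes that `icinga2` is in your `PATH` and that a `service` wrapper is available on your system.

```shell
# Hypothetical helper: run the given validation command quietly and
# report the result based on its exit status (0 = valid, non-zero = errors).
check_config() {
    if "$@" >/dev/null 2>&1; then
        echo "configuration valid"
    else
        echo "configuration invalid"
        return 1
    fi
}

# Possible usage (assumption: a `service` wrapper exists on your platform):
# check_config icinga2 daemon --validate && service icinga2 reload
```

Passing the command as arguments keeps the helper independent of the installation path and makes it easy to try out with any other command first.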
@ -374,9 +382,15 @@ nodes in a [distributed monitoring](6-distributed-monitoring.md#distributed-moni
## <a id="cli-command-object"></a> CLI command: Object
The `object` CLI command can be used to list all configuration objects and their
attributes. The command also shows where each of the attributes was modified.
attributes. The command also shows where each of the attributes was modified and as such
provides debug information for further configuration problem analysis.
That way you can also identify which objects have been created from your [apply rules](17-language-reference.md#apply).
Runtime modifications via the [REST API](12-icinga2-api.md#icinga2-api-config-objects)
are not immediately updated. Furthermore there is a known issue with
[group assign expressions](17-language-reference.md#group-assign) which are not reflected in the host object output.
You need to restart Icinga 2 in order to update the `icinga2.debug` cache file.
More information can be found in the [troubleshooting](15-troubleshooting.md#list-configuration-objects) section.
# icinga2 object --help

View File

@ -1612,6 +1612,11 @@ The following parameters need to be specified (either as URL parameters or in a
The [API permission](12-icinga2-api.md#icinga2-api-permissions) `console` is required for executing
expressions.
> **Note**
>
> Runtime modifications via `execute-script` calls are not validated and might cause the Icinga 2
> daemon to crash or behave in an unexpected way. Use these runtime changes at your own risk.
If you specify a session identifier, the same script context can be reused for multiple requests. This allows you to, for example, set a local variable in a request and use that local variable in another request. Sessions automatically expire after a set period of inactivity (currently 30 minutes).
Example for fetching the command line from the local host's last check result:
@ -1695,7 +1700,9 @@ The Windows installer already includes Icinga Studio. On Debian and Ubuntu the p
### <a id="icinga2-api-clients-cli-console"></a> Icinga 2 Console
By default the [console CLI command](11-cli-commands.md#cli-command-console) evaluates expressions in a local interpreter, i.e. independently from your Icinga 2 daemon. Using the `--connect` parameter you can use the Icinga 2 console to evaluate expressions via the API.
By default the [console CLI command](11-cli-commands.md#cli-command-console) evaluates
expressions in a local interpreter, i.e. independently from your Icinga 2 daemon.
Add the `--connect` parameter to debug and evaluate expressions via the API.
### <a id="icinga2-api-clients-programmatic-examples"></a> API Clients Programmatic Examples

View File

@ -29,7 +29,7 @@ platforms. This configuration ensures that the `icinga2.log`, `error.log` and
The IDO (Icinga Data Output) modules for Icinga 2 take care of exporting all
configuration and status information into a database. The IDO database is used
by a number of projects including Icinga Web 1.x and 2.
by Icinga Web 2.
Details on the installation can be found in the [Configuring DB IDO](2-getting-started.md#configuring-db-ido-mysql)
chapter. Details on the configuration can be found in the
@ -336,7 +336,9 @@ expects the InfluxDB daemon to listen at `127.0.0.1` on port `8086`.
More configuration details can be found [here](9-object-types.md#objecttype-influxdbwriter).
### <a id="gelfwriter"></a> GELF Writer
### <a id="graylog-integration"></a> Graylog Integration
#### <a id="gelfwriter"></a> GELF Writer
The `Graylog Extended Log Format` (short: [GELF](http://www.graylog2.org/resources/gelf))
can be used to send application logs directly to a TCP socket.
@ -358,7 +360,16 @@ Currently these events are processed:
* State changes
* Notifications
### <a id="logstash-writer"></a> Logstash Writer
### <a id="elastic-stack-integration"></a> Elastic Stack Integration
[Icingabeat](https://github.com/icinga/icingabeat) is an Elastic Beat that fetches data
from the Icinga 2 API and sends it either directly to Elasticsearch or Logstash.
More integrations in development:
* [Logstash output](https://github.com/Icinga/logstash-output-icinga) for the Icinga 2 API.
* [Logstash Grok Pattern](https://github.com/Icinga/logstash-grok-pattern) for Icinga 2 logs.
#### <a id="logstash-writer"></a> Logstash Writer
[Logstash](https://www.elastic.co/products/logstash) receives
and processes event messages sent by Icinga 2 and the [LogstashWriter](9-object-types.md#objecttype-logstashwriter)

View File

@ -1,19 +1,111 @@
# <a id="troubleshooting"></a> Icinga 2 Troubleshooting
## <a id="troubleshooting-information-required"></a> Which information is required
## <a id="troubleshooting-information-required"></a> Required Information
* Run `icinga2 troubleshoot` to collect required troubleshooting information
* Alternative, manual steps:
Please provide any detail which may help to reproduce and understand your issue.
Whether you ask on the community channels or you create an issue at [GitHub](https://github.com/Icinga), make sure
that others can follow your explanations. If necessary, draw a picture and attach it for
better illustration. This is especially helpful if you are troubleshooting a distributed
setup.
We've come across many community questions and compiled this list. Please add your own
findings and details.
* Describe the expected behavior in your own words.
* Describe the actual behavior in one or two sentences.
* Provide general information such as:
* How was Icinga 2 installed (and from which repository, if applicable) and which distribution are you using?
* `icinga2 --version`
* `icinga2 feature list`
* `icinga2 daemon --validate`
* Relevant output from your main and debug log ( `icinga2 object list --type='filelogger'` )
* The newest Icinga 2 crash log if relevant
* Your icinga2.conf and, if you run multiple Icinga 2 instances, your zones.conf
* How was Icinga 2 installed (and which repository in case) and which distribution are you using
* Provide complete configuration snippets explaining your problem in detail
* If the check command failed, what's the output of your manual plugin tests?
* In case of [debugging](20-development.md#development) Icinga 2, the full back traces and outputs
* `icinga2 daemon -C`
* [Icinga Web 2](https://www.icinga.com/products/icinga-web-2/) version (screenshot from System - About)
* [Icinga Web 2 modules](https://www.icinga.com/products/icinga-web-2-modules/) e.g. the Icinga Director (optional)
* Configuration insights:
* Provide complete configuration snippets explaining your problem in detail
* Your [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) file
* If you run multiple Icinga 2 instances, the [zones.conf](4-configuring-icinga-2.md#zones-conf) file (or `icinga2 object list --type Endpoint` and `icinga2 object list --type Zone`) from all affected nodes.
* Logs
* Relevant output from your main and [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) in `/var/log/icinga2`. Please add step-by-step explanations with timestamps if required.
* The newest Icinga 2 crash log if relevant, located in `/var/log/icinga2/crash`
* Additional details
* If the check command failed, what's the output of your manual plugin tests?
* In case of [debugging](20-development.md#development) Icinga 2, the full back traces and outputs
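The command outputs from the list above can be gathered in one go. This is a minimal sketch under assumptions: the function name `collect` and the output file name are hypothetical, and the commented usage only covers the commands named in the list.

```shell
# Hypothetical helper: print the command as a header, then its output.
# A failing or missing command is noted instead of aborting the collection.
collect() {
    printf '### %s\n' "$*"
    "$@" 2>&1 || printf '(exit status %s)\n' "$?"
}

# Possible usage, writing everything into one file to attach to an issue:
# {
#     collect icinga2 --version
#     collect icinga2 feature list
#     collect icinga2 daemon -C
# } > icinga2-info.txt
```

Review the resulting file and obfuscate sensitive details before you share it.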
## <a id="troubleshooting-analyze-environment"></a> Analyze your Environment
There are many components involved on a server running Icinga 2. When you
analyze a problem, keep in mind that basic system administration knowledge
is also key to identify bottlenecks and issues.
> **Tip**
>
> [Monitor Icinga 2](8-advanced-topics.md#monitoring-icinga) and use the hints for further analysis.
* Analyze the system's performance and identify bottlenecks and issues.
* Collect details about all applications (e.g. Icinga 2, MySQL, Apache, Graphite, Elastic, etc.).
* If data is exchanged via network (e.g. central MySQL cluster), make sure to monitor the bandwidth capabilities too.
* Add graphs and screenshots to your issue description.
Install tools which help you to do so. Opinions differ, let us know if you have any additions here!
## <a id="troubleshooting-analyze-environment-linux"></a> Analyze your Linux/Unix Environment
[htop](http://hisham.hm/htop/) is a better replacement for `top` and helps to analyze processes
interactively.
```
yum install htop
apt-get install htop
```
If you are for example experiencing performance issues, open `htop` and take a screenshot.
Add it to your question and/or bug report.
Analyze disk I/O performance in Grafana, take a screenshot and obfuscate any sensitive details.
Attach it when posting a question to the community channels.
The [sysstat](https://github.com/sysstat/sysstat) package provides a number of tools to
analyze the performance on Linux. On FreeBSD you could use `systat` for example.
```
yum install sysstat
apt-get install sysstat
```
Example for `vmstat` (summary of memory, processes, etc.):
```
# summary
vmstat -s
# print timestamps, format in MB, stats every 1 second, 5 times
vmstat -t -S M 1 5
```
Example for `iostat`:
```
watch -n 1 iostat
```
Example for `sar`:
```
sar     # CPU
sar -r  # RAM
sar -q  # load average
sar -b  # I/O
```
`sysstat` also provides the `iostat` binary.
If you are missing checks and metrics found in your analysis, add them to your monitoring!
## <a id="troubleshooting-analyze-environment-windows"></a> Analyze your Windows Environment
A good starting point on Windows is the set of tools found in the [Sysinternals Suite](https://technet.microsoft.com/en-us/sysinternals/bb842062.aspx).
You can also start `perfmon` and analyze specific performance counters.
Keep notes which could be important for your monitoring, and add service
checks later on.
## <a id="troubleshooting-enable-debug-output"></a> Enable Debug Output
@ -22,14 +114,14 @@ Enable the `debuglog` feature:
# icinga2 feature enable debuglog
# service icinga2 restart
You can find the debug log file in `/var/log/icinga2/debug.log`.
The debug log file can be found in `/var/log/icinga2/debug.log`.
Alternatively you may run Icinga 2 in the foreground with debugging enabled. Specify the console
log severity as an additional parameter argument to `-x`.
# /usr/sbin/icinga2 daemon -x notice
The log level can be one of `critical`, `warning`, `information`, `notice`
The [log severity](9-object-types.md#objecttype-filelogger) can be one of `critical`, `warning`, `information`, `notice`
and `debug`.
## <a id="list-configuration-objects"></a> List Configuration Objects
@ -98,10 +190,16 @@ You can also filter by name and type:
[2014-10-15 14:27:19 +0200] information/cli: Parsed 175 objects.
Runtime modifications via the [REST API](12-icinga2-api.md#icinga2-api-config-objects)
are not immediately updated. Furthermore there is a known issue with
[group assign expressions](17-language-reference.md#group-assign) which are not reflected in the host object output.
You need to restart Icinga 2 in order to update the `icinga2.debug` cache file.
## <a id="check-command-definitions"></a> Where are the check command definitions?
Icinga 2 features a number of built-in [check command definitions](10-icinga-template-library.md#plugin-check-commands) which are
included using
included with
include <itl>
include <plugins>
@ -123,7 +221,8 @@ for their check result containing the executed shell command.
to fetch the checkable object, its check result and the executed shell command.
* Alternatively enable the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) and look for the executed command.
Example for a service object query using a [regex match]() on the name:
Example for a service object query using a [regex match](18-library-reference.md#global-functions-regex)
on the name:
$ curl -k -s -u root:icinga -H 'Accept: application/json' -H 'X-HTTP-Method-Override: GET' -X POST 'https://localhost:5665/v1/objects/services' \
-d '{ "filter": "regex(pattern, service.name)", "filter_vars": { "pattern": "^http" }, "attrs": [ "__name", "last_check_result" ] }' | python -m json.tool
@ -194,17 +293,99 @@ Fetch all check result events matching the `event.service` name `random`:
$ curl -k -s -u root:icinga -X POST 'https://localhost:5665/v1/events?queue=debugchecks&types=CheckResult&filter=match%28%22random*%22,event.service%29'
### <a id="late-check-results"></a> Late Check Results
[Icinga Web 2](https://www.icinga.com/products/icinga-web-2/) provides
a dashboard overview for `overdue checks`.
The REST API provides the [status](12-icinga2-api.md#icinga2-api-status) URL endpoint with some generic metrics
on Icinga and its features.
# curl -k -s -u root:icinga 'https://localhost:5665/v1/status' | python -m json.tool | less
You can also calculate late check results via the REST API:
* Fetch the `last_check` timestamp from each object
* Compare the timestamp with the current time minus a multiple of `check_interval` (adjust the factor, e.g. five times `check_interval`, to see which results are really late)
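The comparison from the steps above can be sketched as a small shell function. It assumes you have already extracted `last_check` (a UNIX timestamp) and `check_interval` (in seconds) for an object from the API response. The function name `is_late` and the default factor of 2 are illustrative choices and not part of Icinga 2.

```shell
# is_late LAST_CHECK CHECK_INTERVAL [NOW] [FACTOR]
# Succeeds (exit status 0) if LAST_CHECK is older than NOW - FACTOR * CHECK_INTERVAL.
# awk is used because the timestamps may contain fractional seconds.
is_late() {
    awk -v last="$1" -v interval="$2" \
        -v now="${3:-$(date +%s)}" -v factor="${4:-2}" \
        'BEGIN { exit !(last < now - factor * interval) }'
}

# Example: a 60 second interval and a last check 200 seconds before "now"
# is_late 1000 60 1200 && echo "late"
```

Raising the factor (fourth argument) filters out results that are only slightly overdue.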
You can use the [icinga2 console](11-cli-commands.md#cli-command-console) to connect to the instance, fetch all data
and calculate the differences. More information can be found in [this blog post](https://www.icinga.com/2016/08/11/analyse-icinga-2-problems-using-the-console-api/).
# ICINGA2_API_USERNAME=root ICINGA2_API_PASSWORD=icinga icinga2 console --connect 'https://localhost:5665/'
<1> => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res
[ [ "10807-host!10807-service", "2016-06-10 15:54:55 +0200" ], [ "mbmif.int.netways.de!disk /", "2016-01-26 16:32:29 +0100" ] ]
Or if you are just interested in numbers, call [len](18-library-reference.md#array-len) on the result array `res`:
<2> => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res.len()
2.000000
If you need to analyze that problem multiple times, just add the current formatted timestamp
and repeat the commands.
<23> => DateTime(get_time()).to_string()
"2017-04-04 16:09:39 +0200"
<24> => var res = []; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res.add([s.__name, DateTime(s.last_check).to_string()]) }; res.len()
8287.000000
More details about the Icinga 2 DSL and its possibilities can be
found in the [language](17-language-reference.md#language-reference) and [library](18-library-reference.md#library-reference) reference chapters.
### <a id="late-check-results-distributed"></a> Late Check Results in Distributed Environments
When it comes to a distributed HA setup, each node is responsible for a load-balanced amount of checks.
Host and Service objects provide the attribute `paused`. If this is set to `false`, the current node
actively attempts to schedule and execute checks. Otherwise the node is not responsible for them.
<3> => var res = {}; for (s in get_objects(Service).filter(s => s.last_check < get_time() - 2 * s.check_interval)) { res[s.paused] += 1 }; res
{
@false = 2.000000
@true = 1.000000
}
Why is this analysis important? If the numbers are not inverted in an HA zone
with two members, this may give a hint that the cluster nodes are in a split-brain scenario, or you've
found a bug in the cluster.
If you are running a cluster setup where the master/satellite executes checks on the client via
[top down command endpoint](6-distributed-monitoring.md#distributed-monitoring-top-down-command-endpoint) mode,
you might want to know which zones are affected.
This analysis assumes that clients which are not connected have the string `connected` in their
service check result output and their state is `UNKNOWN`.
<4> => var res = {}; for (s in get_objects(Service)) { if (s.state==3) { if (match("*connected*", s.last_check_result.output)) { res[s.zone] += [s.host_name] } } }; for (k => v in res) { res[k] = len(v.unique()) }; res
{
Asia = 31.000000
Europe = 214.000000
USA = 207.000000
}
The result set shows the configured zones and the number of unique affected hosts in each of them. To print
the host names instead of the numbers, omit the `len()` call inside the for loop.
## <a id="notifications-not-sent"></a> Notifications are not sent
* Check the debug log to see if a notification is triggered.
* Check the [debug log](15-troubleshooting.md#troubleshooting-enable-debug-output) to see if a notification is triggered.
* If yes, verify that all conditions are satisfied.
* Are any errors on the notification command execution logged?
Please add these details along with your own description
to any question or issue posted to the community channels.
Verify the following configuration:
* Is the host/service `enable_notifications` attribute set, and if so, to which value?
* Do the notification attributes `states`, `types`, `period` match the notification conditions?
* Do the user attributes `states`, `types`, `period` match the notification conditions?
* Do the [notification](9-object-types.md#objecttype-notification) attributes `states`, `types`, `period` match the notification conditions?
* Do the [user](9-object-types.md#objecttype-user) attributes `states`, `types`, `period` match the notification conditions?
* Are there any notification `begin` and `end` times configured?
* Make sure the [notification](11-cli-commands.md#enable-features) feature is enabled.
* Does the referenced NotificationCommand work when executed as Icinga user on the shell?
@ -232,18 +413,33 @@ to `features-enabled` and that the latter is included in [icinga2.conf](4-config
* Are the feature attributes set correctly according to the documentation?
* Any errors on the logs?
Look up the [object type](9-object-types.md#object-types) for the required feature and verify it is enabled:
# icinga2 object list --type <feature object type>
Example for the `graphite` feature:
# icinga2 object list --type GraphiteWriter
## <a id="configuration-ignored"></a> Configuration is ignored
* Make sure that the line(s) are not [commented out](17-language-reference.md#comments) (starting with `//` or `#`, or
encapsulated by `/* ... */`).
* Is the configuration file included in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf)?
Run the [configuration validation](11-cli-commands.md#config-validation) and add `notice` as log severity.
Search for the file which should be included, e.g. using the `grep` CLI command.
# icinga2 daemon -C -x notice | grep command
## <a id="configuration-attribute-inheritance"></a> Configuration attributes are inherited from
Icinga 2 allows you to import templates using the [import](17-language-reference.md#template-imports) keyword. If these templates
contain additional attributes, your objects will automatically inherit them. You can override
or modify these attributes in the current object.
The [object list](15-troubleshooting.md#list-configuration-objects) CLI command allows you to verify the attribute origin.
## <a id="configuration-value-dollar-sign"></a> Configuration Value with Single Dollar Sign
In case your configuration validation fails with a missing closing dollar sign error message, you
@ -251,6 +447,9 @@ did not properly escape the single dollar sign preventing its usage as [runtime
critical/config: Error: Validation failed for Object 'ping4' (Type: 'Service') at /etc/icinga2/zones.d/global-templates/windows.conf:24: Closing $ not found in macro format string 'top-syntax=${list}'.
Correct the custom attribute value to
"top-syntax=$${list}"
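If many values need the same fix, the escaping can be scripted. This is a naive sketch: the function name `escape_dollar` is hypothetical, and it doubles every `$`, so it must not be applied to values that are already escaped or that contain intended runtime macros.

```shell
# Hypothetical helper: double every dollar sign in the given string.
escape_dollar() {
    printf '%s\n' "$1" | sed 's/\$/$$/g'
}

# Possible usage with the value from the error message above:
# escape_dollar 'top-syntax=${list}'
```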
## <a id="troubleshooting-cluster"></a> Cluster and Clients Troubleshooting
@ -261,19 +460,19 @@ done so already.
> **Note**
>
> Some problems just exist due to wrong file permissions or packet filters applied. Make
> Some problems just exist due to wrong file permissions or applied packet filters. Make
> sure to check these in the first place.
### <a id="troubleshooting-cluster-connection-errors"></a> Cluster Troubleshooting Connection Errors
General connection errors normally lead you to one of the following problems:
General connection errors can be caused by one of the following problems:
* Wrong network configuration
* Packet loss on the connection
* Incorrect network configuration
* Packet loss
* Firewall rules preventing traffic
Use tools like `netstat`, `tcpdump`, `nmap`, etc. to make sure that the cluster communication
happens (default port is `5665`).
works (default port is `5665`).
# tcpdump -n port 5665 -i any

View File

@ -4,9 +4,9 @@
These functions are globally available in [assign/ignore where expressions](3-monitoring-basics.md#using-apply-expressions),
[functions](17-language-reference.md#functions), [API filters](12-icinga2-api.md#icinga2-api-filters)
and the [Icinga 2 console](11-cli-commands.md#cli-command-console).
and the [Icinga 2 debug console](11-cli-commands.md#cli-command-console).
You can use the [Icinga 2 console](11-cli-commands.md#cli-command-console)
You can use the [Icinga 2 debug console](11-cli-commands.md#cli-command-console)
as a sandbox to test these functions before implementing
them in your scenarios.

View File

@ -4,9 +4,8 @@ This chapter provides an introduction into best practices with your Icinga 2 con
The configuration files which are automatically created when installing the Icinga 2 packages
are a good way to start with Icinga 2.
If you're interested in a detailed explanation of each language feature used in those
configuration files, you can find more information in the [Language Reference](17-language-reference.md#language-reference)
chapter.
The [Language Reference](17-language-reference.md#language-reference) chapter explains details
on value types (string, number, dictionaries, etc.) and the general configuration syntax.
## <a id="configuration-best-practice"></a> Configuration Best Practice
@ -17,12 +16,12 @@ decide for a possible strategy.
There are many ways of creating Icinga 2 configuration objects:
* Manually with your preferred editor, for example vi(m), nano, notepad, etc.
* A configuration tool for Icinga 2 e.g. the [Icinga Director](https://github.com/Icinga/icingaweb2-module-director)
* Generated by a [configuration management tool](13-addons.md#configuration-tools) such as Puppet, Chef, Ansible, etc.
* A configuration addon for Icinga 2 ([Icinga Director](https://github.com/Icinga/icingaweb2-module-director))
* A custom exporter script from your CMDB or inventory tool
* your own.
* etc.
In order to find the best strategy for your own configuration, ask yourself the following questions:
Find the best strategy for your own configuration and ask yourself the following questions:
* Do your hosts share a common group of services (for example linux hosts with disk, load, etc. checks)?
* Only a small set of users receives notifications and escalations for all hosts/services?
@ -36,11 +35,12 @@ host and service basis.
Then you should look for the object specific configuration setting `host_name` etc. accordingly.
Finding the best files and directory tree for your configuration is up to you. Make sure that
the [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) configuration file includes them,
and then think about:
You decide on the "best" layout for configuration files and directories. Ensure that
the [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) configuration file includes them.
* tree-based on locations, hostgroups, specific host attributes with sub levels of directories.
Consider these ideas:
* tree-based on locations, host groups, specific host attributes with sub levels of directories.
* flat `hosts.conf`, `services.conf`, etc. files for rule based configuration.
* generated configuration with one file per host and a global configuration for groups, users, etc.
* one big file generated from an external application (probably a bad idea for maintaining changes).
@ -62,12 +62,33 @@ If you are planning to use a distributed monitoring setup with master, satellite
take the configuration location into account too. Everything configured on the master, synced to all other
nodes? Or any specific local configuration (e.g. health checks)?
TODO
There is a detailed chapter on [distributed monitoring scenarios](6-distributed-monitoring.md#distributed-monitoring-scenarios).
Please make sure to read the [introduction](6-distributed-monitoring.md#distributed-monitoring) first.
If you happen to have further questions, do not hesitate to join the
[community support channels](https://www.icinga.com/community/get-involved/)
and ask community members for their experience and best practices.
## <a id="your-configuration"></a> Your Configuration
If you prefer to organize your own local object tree, you can also remove
`include_recursive "conf.d"` from your icinga2.conf file.
Create a new configuration directory, e.g. `objects.d` and include it
in your icinga2.conf file.
[root@icinga2-master1.localdomain /]# mkdir -p /etc/icinga2/objects.d
[root@icinga2-master1.localdomain /]# vim /etc/icinga2/icinga2.conf
/* Local object configuration on our master instance. */
include_recursive "objects.d"
This approach is used by the [Icinga 2 Puppet module](https://github.com/Icinga/puppet-icinga2).
If you plan to set up a distributed environment with HA clusters and clients, please refer to [this chapter](6-distributed-monitoring.md#distributed-monitoring-top-down)
for examples with `zones.d` as configuration directory.
## <a id="configuring-icinga2-overview"></a> Configuration Overview
### <a id="icinga2-conf"></a> icinga2.conf
@ -148,6 +169,10 @@ This `include_recursive` directive is used for discovery of services on remote c
and their generated configuration described in
[this chapter](6-distributed-monitoring.md#distributed-monitoring-bottom-up).
**Note**: This has been DEPRECATED in Icinga 2 v2.6 and is **not** required for
satellites and clients using the [top down approach](6-distributed-monitoring.md#distributed-monitoring-top-down).
You can safely disable/remove it.
/**
* Although in theory you could define all your objects in this file
@ -177,7 +202,6 @@ Example:
/* The directory which contains the plugins from the Monitoring Plugins project. */
const PluginDir = "/usr/lib64/nagios/plugins"
/* The directory which contains the Manubulon plugins.
* Check the documentation, chapter "SNMP Manubulon Plugin Check Commands", for details.
*/
@ -197,9 +221,24 @@ Example:
The `ZoneName` and `TicketSalt` constants are required for remote client
and distributed setups only.
### <a id="zones-conf"></a> zones.conf
This file can be used to specify the required [Zone](9-object-types.md#objecttype-zone)
and [Endpoint](9-object-types.md#objecttype-endpoint) configuration object for
[distributed monitoring](6-distributed-monitoring.md#distributed-monitoring).
By default the `NodeName` and `ZoneName` [constants](4-configuring-icinga-2.md#constants-conf) will be used.
It also contains several [global zones](6-distributed-monitoring.md#distributed-monitoring-global-zone-config-sync)
for distributed monitoring environments.
Please make sure to replace the default values in this configuration with real names,
i.e. use the FQDN as described in [this chapter](6-distributed-monitoring.md#distributed-monitoring-conventions)
for your `Zone` and `Endpoint` object names.
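
As a rough illustration of the note above, a `zones.conf` using real names instead of the
`NodeName` and `ZoneName` constants could look like the following sketch. The FQDN, the IP
address and the zone names are assumptions chosen for this example only; adjust them to
your environment.

```
/* Hypothetical master endpoint, named after its FQDN. */
object Endpoint "icinga2-master1.localdomain" {
  host = "192.168.56.101" // example address, not taken from this documentation
}

/* The zone lists its endpoints by their object names. */
object Zone "master" {
  endpoints = [ "icinga2-master1.localdomain" ]
}

/* A global zone is synced to all nodes which have it configured. */
object Zone "global-templates" {
  global = true
}
```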
### <a id="conf-d"></a> The conf.d Directory
This directory contains example configuration which should help you get started
This directory contains **example configuration** which should help you get started
with monitoring the local host and its services. It is included in the
[icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) configuration file by default.
@ -207,8 +246,10 @@ It can be used as reference example for your own configuration strategy.
Just keep in mind to include the main directories in the
[icinga2.conf](4-configuring-icinga-2.md#icinga2-conf) file.
You are certainly not bound to it. Remove it if you prefer your own
way of deploying Icinga 2 configuration.
> **Note**
>
> You can remove the include directive in [icinga2.conf](4-configuring-icinga-2.md#icinga2-conf)
> if you prefer your own way of deploying Icinga 2 configuration.
Further details on configuration best practice and how to build your
own strategy are described in [this chapter](4-configuring-icinga-2.md#configuration-best-practice).


@ -183,6 +183,8 @@ Instead, choose a plugin and configure its parameters and thresholds. The follow
* [disk](10-icinga-template-library.md#plugin-check-command-disk)
* [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap)
* [procs](10-icinga-template-library.md#plugin-check-command-processes)
* [users](10-icinga-template-library.md#plugin-check-command-users)
* [running_kernel](10-icinga-template-library.md#plugin-contrib-command-running_kernel)
* package management: [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum), etc.
* [ssh](10-icinga-template-library.md#plugin-check-command-ssh)
@ -269,6 +271,7 @@ check [this blog entry](http://www.claudiokuenzler.com/blog/650/slow-vmware-perl
* [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [ssmtp](10-icinga-template-library.md#plugin-check-command-ssmtp)
* [imap](10-icinga-template-library.md#plugin-check-command-imap), [simap](10-icinga-template-library.md#plugin-check-command-simap)
* [pop](10-icinga-template-library.md#plugin-check-command-pop), [spop](10-icinga-template-library.md#plugin-check-command-spop)
* [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
### <a id="service-monitoring-hardware"></a> Hardware Monitoring


@ -343,7 +343,121 @@ and adds the excluded time period names as an array.
}
}
## <a id="advanced-use-of-apply-rules"></a> Advanced Use of Apply Rules
## <a id="check-result-freshness"></a> Check Result Freshness
In Icinga 2, active check freshness is enabled by default. A check result is considered
stale if no new check result arrives within the period defined by the `check_interval` attribute.

    threshold = last check execution time + check interval

Passive check freshness is calculated from the `check_interval` attribute, if set.

    threshold = last check result time + check interval

If the freshness check fails, a new check is executed as defined by the
`check_command` attribute.
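
A minimal sketch of a passively fed service which relies on the freshness check could look
like this. The service name, the interval and the `assign where` expression are assumptions
for illustration; the `dummy` CheckCommand is used here as the command that runs once the
threshold is exceeded.

```
apply Service "external-check" {
  /* Executed only when no check result arrived before the freshness threshold. */
  check_command = "dummy"

  /* threshold = last check result time + 10 minutes */
  check_interval = 10m

  /* Report UNKNOWN (3) with an explanatory text for stale results. */
  vars.dummy_state = 3
  vars.dummy_text = "No check results received."

  assign where host.name == "icinga2-master1.localdomain"
}
```

For example, if the last passive result arrived at 12:00, the threshold is 12:10. Without a
newer result by then, the `dummy` command runs and the service turns UNKNOWN.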
## <a id="check-flapping"></a> Check Flapping
The flapping algorithm used in Icinga 2 does not store past states. Instead, it
calculates a single flapping value based on counters and half-life values and
compares it with the flapping threshold configured in the
`flapping_threshold` attribute.
Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
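
The two attributes can be combined as in the following sketch. The service and the
threshold value are chosen for illustration only and are not a recommendation.

```
apply Service "ping4" {
  check_command = "ping4"

  /* Turn flapping detection on for this service. */
  enable_flapping = true

  /* Threshold in percent; 30 is an example value. */
  flapping_threshold = 30

  assign where host.address
}
```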
## <a id="volatile-services"></a> Volatile Services
By default all services remain in a non-volatile state. When a problem
occurs, the `SOFT` state applies, and once the check counter reaches the
`max_check_attempts` attribute value, a `HARD` state transition happens.
Notifications are only triggered by `HARD` state changes and are then
re-sent as defined by the `interval` attribute.
It may be reasonable to have a volatile service which stays in a `HARD`
state type if the service stays in a `NOT-OK` state. That way each
service recheck will automatically trigger a notification unless the
service is acknowledged or in a scheduled downtime.
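
A volatile service is enabled with the `volatile` attribute. The following is a sketch
only; the service name, the check command and the `assign where` expression are
placeholders.

```
apply Service "volatile-example" {
  /* Placeholder check; use the plugin which detects your one-off events. */
  check_command = "dummy"

  /* Every NOT-OK check result is treated as a HARD state. */
  volatile = true

  assign where host.name == "icinga2-master1.localdomain"
}
```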
## <a id="monitoring-icinga"></a> Monitoring Icinga 2
Why should you do that? Icinga and its components run like any other
service application on your server. There are predictable issues
such as "disk space is running low" and your monitoring suffers from just
that.
You would also like to ensure that features and backends are running
and storing required data. Be it the database backend where Icinga Web 2
presents fancy dashboards, forwarded metrics to Graphite or InfluxDB or
the entire distributed setup.
This list isn't complete but should help with your own setup.
Windows client specific checks are highlighted.
Type | Description | Plugins and CheckCommands
----------------|-------------------------------|-----------------------------------------------------
System | Filesystem | [disk](10-icinga-template-library.md#plugin-check-command-disk), [disk-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Memory, Swap | [mem](10-icinga-template-library.md#plugin-contrib-command-mem), [swap](10-icinga-template-library.md#plugin-check-command-swap), [memory](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Hardware | [hpasm](10-icinga-template-library.md#plugin-contrib-command-hpasm), [ipmi-sensor](10-icinga-template-library.md#plugin-contrib-command-ipmi-sensor)
System | Virtualization | [VMware](10-icinga-template-library.md#plugin-contrib-vmware), [esxi_hardware](10-icinga-template-library.md#plugin-contrib-command-esxi-hardware)
System | Processes | [procs](10-icinga-template-library.md#plugin-check-command-processes), [service-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | System Activity Reports | [check_sar_perf](https://github.com/dnsmichi/icinga-plugins/blob/master/scripts/check_sar_perf.py)
System | I/O | [iostat](10-icinga-template-library.md#plugin-contrib-command-iostat)
System | Network interfaces | [nwc_health](10-icinga-template-library.md#plugin-contrib-command-nwc_health), [interfaces](10-icinga-template-library.md#plugin-contrib-command-interfaces)
System | Users | [users](10-icinga-template-library.md#plugin-check-command-users), [users-windows](10-icinga-template-library.md#windows-plugins) (Windows Client)
System | Logs | Forward them to [Elastic Stack](14-features.md#elastic-stack-integration) or [Graylog](14-features.md#graylog-integration) and add your own alerts.
System | NTP | [ntp_time](10-icinga-template-library.md#plugin-check-command-ntp-time)
System | Updates | [apt](10-icinga-template-library.md#plugin-check-command-apt), [yum](10-icinga-template-library.md#plugin-contrib-command-yum)
Icinga | Status & Stats | [icinga](10-icinga-template-library.md#itl-icinga) (more below)
Icinga | Cluster & Clients | [health checks](6-distributed-monitoring.md#distributed-monitoring-health-checks)
Database | MySQL | [mysql_health](10-icinga-template-library.md#plugin-contrib-command-mysql_health)
Database | PostgreSQL | [postgres](10-icinga-template-library.md#plugin-contrib-command-postgres)
Database | Housekeeping | Check the database size and growth and analyse metrics to examine trends.
Database | DB IDO | [ido](10-icinga-template-library.md#itl-icinga-ido) (more below)
Webserver | Apache2, Nginx, etc. | [http](10-icinga-template-library.md#plugin-check-command-http), [apache_status](10-icinga-template-library.md#plugin-contrib-command-apache_status), [nginx_status](10-icinga-template-library.md#plugin-contrib-command-nginx_status)
Webserver | Certificates | [http](10-icinga-template-library.md#plugin-check-command-http)
Webserver | Authorization | [http](10-icinga-template-library.md#plugin-check-command-http)
Notifications | Mail (queue) | [smtp](10-icinga-template-library.md#plugin-check-command-smtp), [mailq](10-icinga-template-library.md#plugin-check-command-mailq)
Notifications | SMS (GSM modem) | [check_sms3_status](https://exchange.icinga.com/netways/check_sms3status)
Notifications | Messengers, Cloud services | XMPP, Twitter, IRC, Telegram, PagerDuty, VictorOps, etc.
Metrics | PNP, RRDTool | [check_pnp_rrds](https://github.com/lingej/pnp4nagios/tree/master/scripts) checks for stale RRD files.
Metrics | Graphite | [graphite](10-icinga-template-library.md#plugin-contrib-command-graphite)
Metrics | InfluxDB | [check_influxdb](https://exchange.icinga.com/Mikanoshi/InfluxDB+data+monitoring+plugin)
Metrics | Elastic Stack | [elasticsearch](10-icinga-template-library.md#plugin-contrib-command-elasticsearch), [Elastic Stack integration](14-features.md#elastic-stack-integration)
Metrics | Graylog | [Graylog integration](14-features.md#graylog-integration)
The [icinga](10-icinga-template-library.md#itl-icinga) CheckCommand provides metrics for the runtime stats of
Icinga 2. You can forward them to your preferred graphing solution.
If you require more metrics you can also query the [REST API](12-icinga2-api.md#icinga2-api) and write
your own custom check plugin. Alternatively, you can use the built-in [object accessor functions](8-advanced-topics.md#access-object-attributes-at-runtime)
to calculate stats in-memory.
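
A minimal sketch for adding the `icinga` check to the master nodes might look like this.
The service name and the host name pattern are assumptions for this example.

```
apply Service "icinga" {
  /* Built-in check which reports runtime stats as performance data. */
  check_command = "icinga"

  assign where match("master*.localdomain", host.name)
}
```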
There is a built-in [ido](10-icinga-template-library.md#itl-icinga-ido) check available for DB IDO MySQL/PostgreSQL
which provides additional metrics for the IDO database.
```
apply Service "ido-mysql" {
  check_command = "ido"

  vars.ido_type = "IdoMysqlConnection"
  vars.ido_name = "ido-mysql" // the name defined in /etc/icinga2/features-enabled/ido-mysql.conf

  assign where match("master*.localdomain", host.name)
}
```
More specific database queries can be found in the [DB IDO](14-features.md#db-ido) chapter.
Distributed setups should include specific [health checks](6-distributed-monitoring.md#distributed-monitoring-health-checks).
You might also want to add additional checks for SSL certificate expiration.
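
One possible way to watch certificate expiration is the `ssl` CheckCommand from the
template library, sketched below. The service name, the port (5665 is assumed to be the
Icinga 2 cluster/API port here) and the thresholds are illustrative assumptions.

```
apply Service "icinga-certificate" {
  check_command = "ssl"

  /* Assumed port of the Icinga 2 API listener. */
  vars.ssl_port = 5665

  /* Example thresholds in days before the certificate expires. */
  vars.ssl_cert_valid_days_warn = 30
  vars.ssl_cert_valid_days_critical = 14

  assign where match("master*.localdomain", host.name)
}
```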
## <a id="advanced-configuration-hints"></a> Advanced Configuration Hints
### <a id="advanced-use-of-apply-rules"></a> Advanced Use of Apply Rules
[Apply rules](3-monitoring-basics.md#using-apply) can be used to create a rule set which is
entirely based on host objects and their attributes.
@ -426,7 +540,7 @@ service checks in this example.
In addition to defining check parameters this way, you can also enrich the `display_name`
attribute with more details. This will be shown in Icinga Web 2, for example.
## <a id="use-functions-object-config"></a> Use Functions in Object Configuration
### <a id="use-functions-object-config"></a> Use Functions in Object Configuration
There is a limited scope where functions can be used as object attributes such as:
@ -449,7 +563,7 @@ inside the `icinga2.log` file depending in your log severity
* Use the `icinga2 console` to test basic functionality (e.g. iterating over a dictionary)
* Build them step-by-step. You can always refactor your code later on.
### <a id="use-functions-command-arguments-setif"></a> Use Functions in Command Arguments set_if
#### <a id="use-functions-command-arguments-setif"></a> Use Functions in Command Arguments set_if
The `set_if` attribute inside the command arguments definition in the
[CheckCommand object definition](9-object-types.md#objecttype-checkcommand) is primarily used to
@ -528,7 +642,7 @@ The more programmatic approach for `set_if` could look like this:
}
### <a id="use-functions-command-attribute"></a> Use Functions as Command Attribute
#### <a id="use-functions-command-attribute"></a> Use Functions as Command Attribute
This comes in handy for [NotificationCommands](9-object-types.md#objecttype-notificationcommand)
or [EventCommands](9-object-types.md#objecttype-eventcommand) which do not require
@ -582,7 +696,7 @@ You can omit the `log()` calls, they only help debugging.
}
}
### <a id="custom-functions-as-attribute"></a> Use Custom Functions as Attribute
#### <a id="custom-functions-as-attribute"></a> Use Custom Functions as Attribute
To use custom functions as attributes, the function must be defined in a
slightly unexpected way. The following example shows how to assign values
@ -609,7 +723,7 @@ as value for `ping_wrta`, all other hosts use 100.
assign where true
}
### <a id="use-functions-assign-where"></a> Use Functions in Assign Where Expressions
#### <a id="use-functions-assign-where"></a> Use Functions in Assign Where Expressions
If a simple expression for matching a name or checking if an item
exists in an array or dictionary does not fit, you should consider
@ -698,7 +812,7 @@ with the `vars_app` dictionary.
assign where check_app_type(host, "ABAP")
}
## <a id="access-object-attributes-at-runtime"></a> Access Object Attributes at Runtime
### <a id="access-object-attributes-at-runtime"></a> Access Object Attributes at Runtime
The [Object Accessor Functions](18-library-reference.md#object-accessor-functions)
can be used to retrieve references to other objects by name.
@ -801,40 +915,3 @@ time of the day compared to the defined time period.
}
## <a id="check-result-freshness"></a> Check Result Freshness
In Icinga 2 active check freshness is enabled by default. It is determined by the
`check_interval` attribute and no incoming check results in that period of time.
threshold = last check execution time + check interval
Passive check freshness is calculated from the `check_interval` attribute if set.
threshold = last check result time + check interval
If the freshness checks are invalid, a new check is executed defined by the
`check_command` attribute.
## <a id="check-flapping"></a> Check Flapping
The flapping algorithm used in Icinga 2 does not store the past states but
calculates the flapping threshold from a single value based on counters and
half-life values. Icinga 2 compares the value with a single flapping threshold
configuration attribute named `flapping_threshold`.
Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
## <a id="volatile-services"></a> Volatile Services
By default all services remain in a non-volatile state. When a problem
occurs, the `SOFT` state applies and once `max_check_attempts` attribute
is reached with the check counter, a `HARD` state transition happens.
Notifications are only triggered by `HARD` state changes and are then
re-sent defined by the `interval` attribute.
It may be reasonable to have a volatile service which stays in a `HARD`
state type if the service stays in a `NOT-OK` state. That way each
service recheck will automatically trigger a notification unless the
service is acknowledged or in a scheduled downtime.