Introduce redundancy groups for Dependency Objects

Traditional behaviour was to regard all dependencies as cumulative (i.e., the parent is considered unreachable if any one dependency is violated). Commit ed5892238916ab667a4c9d904bd73acd3ed162f2 made all dependencies redundant (i.e., the parent is considered unreachable only if all dependencies are violated). This may lead to unrelated services (or even hosts vs. services) inadvertently being regarded as redundant to each other.

Most importantly, applying the explicit "disable-host-service-checks" dependency described in the "Monitoring Basics" chapter will defeat all other dependencies.

This commit introduces a new "redundancy_group" attribute for dependencies.
Specifying a redundancy_group causes a dependency to be regarded as redundant only inside that redundancy group.
Dependencies lacking a redundancy_group attribute are regarded as essential for the parent.

This allows for both cumulative and redundant dependencies, and even a combination of the two (a cumulation of redundancies, like SSH depending on both LDAP and DNS to function, while operating redundant LDAP servers as well as redundant DNS resolvers).
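
As an illustration of the intended semantics (not the code changed by this commit), the following self-contained C++ sketch evaluates reachability for a set of dependencies. The `Dep` struct and the free function `IsReachable` are hypothetical stand-ins invented for this example; the real logic lives in `Checkable::IsReachable` and operates on `Dependency::Ptr` objects.

```cpp
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for a dependency object (assumption for this sketch):
// only the redundancy group and the current availability are modelled.
struct Dep {
    std::string redundancy_group; // empty string: essential (cumulative) dependency
    bool available;               // true if the dependency currently holds
};

// Returns true if the checkable owning `deps` is reachable:
//  - every dependency without a redundancy group must be available, and
//  - every redundancy group must contain at least one available dependency.
bool IsReachable(const std::vector<Dep>& deps) {
    std::map<std::string, bool> groupSatisfied;

    for (const Dep& dep : deps) {
        if (dep.redundancy_group.empty()) {
            if (!dep.available)
                return false; // an essential dependency failed
            continue;
        }

        // operator[] value-initializes a new entry to false
        bool& satisfied = groupSatisfied[dep.redundancy_group];
        satisfied = satisfied || dep.available;
    }

    for (const auto& entry : groupSatisfied) {
        if (!entry.second)
            return false; // all members of this group failed
    }

    return true;
}
```

In the SSH example, one failing LDAP server and one failing DNS resolver leave SSH reachable as long as one member of each group is still available; losing both LDAP servers makes it unreachable regardless of the state of DNS.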

This commit lacks changes to the tests.
Edgar Fuß 2020-09-08 17:25:45 +02:00 committed by Alexander A. Klimov
parent d9767cff3f
commit cfef9fdadc
3 changed files with 40 additions and 9 deletions


@@ -201,6 +201,7 @@ Configuration Attributes:
 parent\_service\_name | Object name | **Optional.** The parent service. If omitted, this dependency object is treated as host dependency.
 child\_host\_name | Object name | **Required.** The child host.
 child\_service\_name | Object name | **Optional.** The child service. If omitted, this dependency object is treated as host dependency.
+redundancy\_group | String | **Optional.** Puts the dependency into a group of mutually redundant ones. See discussion below.
 disable\_checks | Boolean | **Optional.** Whether to disable checks (i.e., don't schedule active checks and drop passive results) when this dependency fails. Defaults to false.
 disable\_notifications | Boolean | **Optional.** Whether to disable notifications when this dependency fails. Defaults to true.
 ignore\_soft\_states | Boolean | **Optional.** Whether to ignore soft states for the reachability calculation. Defaults to true.
@@ -218,6 +219,16 @@ Up
 Down
 ```
+Redundancy groups:
+Sometimes you want dependencies to accumulate (i.e., the parent is considered reachable only if no dependency is violated), sometimes you want them to be regarded as redundant (i.e., the parent is considered unreachable only if no dependency is fulfilled), or even a mixture of both. Think of a host connected to both a network and a storage switch vs. a host connected to redundant routers, or a service like SSH depending on both LDAP and DNS to function, while operating redundant LDAP servers as well as redundant DNS resolvers.
+Behaviour prior to 2.12.0 was to regard all dependencies as cumulative; 2.12.0 made all dependencies redundant.
+This may lead to unrelated services inadvertently being regarded as redundant to each other.
+Specifying a `redundancy_group` causes a dependency to be regarded as redundant only inside that redundancy group.
+Dependencies lacking a `redundancy_group` attribute are regarded as essential for the parent.
 When using [apply rules](03-monitoring-basics.md#using-apply) for dependencies, you can leave out certain attributes which will be
 automatically determined by Icinga 2.


@@ -74,25 +74,43 @@ bool Checkable::IsReachable(DependencyType dt, Dependency::Ptr *failedDependency
     auto deps = GetDependencies();
-    int countDeps = deps.size();
-    int countFailed = 0;
+    std::unordered_map<std::string, Dependency::Ptr> violated; // key: redundancy group, value: nullptr if satisfied, violating dependency otherwise
     for (const Dependency::Ptr& dep : deps) {
-        if (!dep->IsAvailable(dt)) {
-            countFailed++;
+        std::string redundancy_group = dep->GetRedundancyGroup();
-            if (failedDependency)
-                *failedDependency = dep;
+        if (!dep->IsAvailable(dt)) {
+            if (redundancy_group.empty()) {
+                Log(LogDebug, "Checkable")
+                    << "Non-redundant dependency '" << dep->GetName() << "' failed for checkable '" << GetName() << "': Marking as unreachable.";
+                if (failedDependency)
+                    *failedDependency = dep;
+                return false;
+            }
+            // tentatively mark this dependency group as failed unless it is already marked;
+            // so it either passed before (don't overwrite) or already failed (so don't care)
+            if (violated.find(redundancy_group) == violated.end())
+                violated.insert(std::make_pair(redundancy_group, dep));
+        } else if (!redundancy_group.empty()) {
+            // definitely mark this dependency group as passed; operator[] overwrites a
+            // tentative failure recorded earlier, which insert() would leave in place
+            violated[redundancy_group] = nullptr;
         }
     }
-    /* If there are dependencies, and all of them failed, mark as unreachable. */
-    if (countDeps > 0 && countFailed == countDeps) {
+    auto violator = std::find_if(violated.begin(), violated.end(), [](const std::pair<std::string, Dependency::Ptr>&v) { return v.second != nullptr; });
+    if (violator != violated.end()) {
         Log(LogDebug, "Checkable")
-            << "All dependencies have failed for checkable '" << GetName() << "': Marking as unreachable.";
+            << "All dependencies in redundancy group '" << violator->first << "' have failed for checkable '" << GetName() << "': Marking as unreachable.";
+        if (failedDependency)
+            *failedDependency = violator->second;
         return false;
     }
     if (failedDependency)
         *failedDependency = nullptr;


@@ -77,6 +77,8 @@ class Dependency : CustomVarObject < DependencyNameComposer
         }}}
     };
+    [config] String redundancy_group;
     [config, navigation] name(TimePeriod) period (PeriodRaw) {
         navigate {{{
             return TimePeriod::GetByName(GetPeriodRaw());