Previously, the `command#timeout` which by default is `1m`, was used to reschedule
the just sent remote check. However, this results into a bunch of extra checks being
sent to the remote host, even though the first one is still running. That's because
if one want to override the default timeout of the command for a specific host/service,
one has to set the `checkable#check_timeout` attribute to the desired value. So, this
commit makes sure that the `checkable#check_timeout` attribute (if set) is used to
reschedule the remote check.
Previously, recovery and ACK notifications were not delivered to users
who weren't notified about the problem state while having a configured
`Problem` type filter. However, since the type filter can also be
configured on the `Notification` object level, this resulted to an
incorrect behaviour. This PR changes the existing logic so that the
recovery and ACK notifications gets dropped only if the `Problem` filter
is configured on both the `User` and `Notification` object levels.
Previously, if you enable flapping for a Checkable but the corresponding
`Notification` object does not have `FlappingStart` or `FlappingEnd`
types set, the `no_more_notifications` flag wasn't reset to false again.
This commit ensures that this flag is always reset on `Recovery` even
the type filter does not match including when we miss the `Recovery` due
to Flapping state.
Notification objects can refer User and Group objects similar to how they can
refer Host and Service objects, so that dependency feels quite natural. Note
that for evaluating most configuration, this order doesn't really matter, the
configuration will successfully evaluate in either case, the difference can be
noticed mainly in more advanced configurations, for example when dynamically
assigning user based on their groups. When accessing user objects from the
Notification object definition (like in the following example), without this
change, only groups configured directly in groups attribute of User objects are
visible and those added via assign clauses in UserGroup objects are missing.
With this commit, these are also visible.
apply Notification "n" to Host {
for (var u in get_objects(User)) {
log(u.name + " -> " + Json.encode(u.groups))
}
# [...]
}
There are inputs to mktime() where the behavior is not specified and there's
also no single obviously correct behavior. In particular, this affects how
auto-detection of whether DST is in effect is done when tm_isdst = -1 is set
and the time specified does not exist at all or exists twice on that day.
If different implementations are used within an Icinga 2 cluster, that can lead
to inconsistent behavior because different nodes may interpret the same
TimePeriod differently.
This commit introduces a wrapper to mktime(), namely Utility::NormalizeTm()
that implements the behavior provided by glibc. The choice for glibc's behavior
is pretty arbitrary, it was simply picked because most systems that are
officially/fully supported use it (with the only exception being Windows), so
this should give the least possible amount of user-visible changes.
As part of this commit, the closely related helper function mktime_const() is
also moved to Utility::TmToTimestamp() and made a wrapper around the newly
introduced NormalizeTm().
for (const T& needle : haystack) creates the illusion that haystack is a
container of T and we're just borrowing needle. In these cases that's not true.
The timestamps used both in the CheckerComponent and Checkable debug
logs were printed in the scientific notation, making them effectively
useless.
> debug/CheckerComponent: Scheduling info for checkable 'host!service' (2025-02-26 14:53:16 +0100): Object 'host!service', Next Check: 2025-02-26 14:53:16 +0100(1.74058e+09).
> debug/Checkable: Update checkable 'host!service' with check interval '300' from last check time at 2025-02-26 14:48:47 +0100 (1.74058e+09) to next check time at 2025-02-26 14:58:12 +0100 (1.74058e+09).
Switching to std::fixed actually shows the complete Unix timestamp.
> debug/CheckerComponent: Scheduling info for checkable 'host!service' (2025-02-26 15:36:44 +0000): Object 'host!service', Next Check: 2025-02-26 15:36:44 +0000 (1740584204).
> debug/Checkable: Update checkable 'host!service' with check interval '60' from last check time at 2025-02-26 15:37:11 +0000 (1740584232) to next check time at 2025-02-26 15:38:09 +0000 (1740584290).
So far, Service::GetSeverity() only considered the state of its own host, i.e.
the implicit service to its own host dependency, and treated it similar to
acknowledgements and downtimes. In contrast, Host::GetSeverity() considered
reachability and treated it like a state, i.e. for the severity calculation,
the host was either up, down, or unreachable.
This commit changes the following things:
1. Make the service severity also consider explicitly configured dependencies
by using IsReachable().
2. Prefer acknowledgements and downtimes over unreachability in the severity
calculation so that if an already acknowledged or in-downtime services (i.e.
already handled service) becomes unreachable, it shouln't become more
severe.
3. To unify host and service severities a bit, hosts now use the same logic
that treats reachability more like acknowledgements/downtimes instead of
like a state (changing the other way around would the state from the check
plugin would not affect the severity for unrachable services anymore).
No change in functionality. The first two branches actually set the final
return value for the method, so they can just return directly, removing the
need to have the rest of the function inside an else block.
No change in functionality. The first two branches actually set the final
return value for the method, so they can just return directly, removing the
need to have the rest of the function inside an else block.
This prevents the use of DependencyGroup for storing the dependencies during
the early registration (m_DependencyGroupsPushedToRegistry = false),
m_PendingDependencies is introduced as a replacement to store the dependencies
at that time.
Previously the dependency state was evaluated by picking the first
dependency object from the batched members. However, since the
dependency `disable_{checks,notifications` attributes aren't taken into
account when batching the members, the evaluated state may yield a wrong
result for some Checkables due to some random dependency from other
Checkable of that group that has the `disable_{checks,notifications`
attrs set. This commit forces the callers to always provide the child
Checkable the state is evaluated for and picks only the dependency
objects of that child Checkable.
The new implementation just counts reachable and available parents and
determines the overall result by comparing numbers, see inline comments for
more information.
This also fixes an issue in the previous implementation: if it didn't return
early from the loop, it would just return the state of the last parent
considered which may not actually represent the group state accurately.
Services downtimes scheduled via the `all_services` flag get already
removed automatically when removing their parent downtimes (introduced
with #8913). Now, this commit makes it possible to perform the same actions
for all child downtimes, i.e. not only for those of service objects, but
for all child objects represented in the dependency tree.