Disregard passive check results while no active checks are being scheduled due to violated dependencies.
This copes with the fact that programs feeding passive check results into Icinga may have no notion of reachability and so drive a checkable into HARD state although dependencies have caused active check scheduling being suspended. This may prevent superflous problem notifications being emitted during recovery.
As disable_checks defaults to false, it was regarded OK (by @Al2Klimov) to make this behaviour (which resembles the active check case) unconditional and not conditionalize it on an additional attribute.
In the description of disable_checks, note that a value of true both disables scheduling of active checks and drops passive check results.
This delays the log message stating that the initial dump is done until
all queries are actually done and now logs a meaningful duration. In
addition, this delays the return of the function and therefore when
state variables are updated by the caller.
This commit sets the activation priority if IcingaDB objects to 100 (the
same value as IDO uses) so that it get's activated after most regular
config objects (hosts, services, ...).
Before (note how Icinga 2 continues to active objects for over a minute
after IcingaDB is started and thinks the initial dump is done):
[2021-01-19 08:33:19 +0000] information/IcingaDB: 'icingadb' started.
[2021-01-19 08:34:02 +0000] information/IcingaDB: Initial config/status dump finished in 28.247 seconds.
[2021-01-19 08:35:49 +0000] information/ConfigItem: Activated all objects.
After (now activation of objects is done right after IcingaDB is
started, as it's one of the last objects to be activated):
[2021-01-19 08:39:01 +0000] information/IcingaDB: 'icingadb' started.
[2021-01-19 08:39:02 +0000] information/ConfigItem: Activated all objects.
[2021-01-19 08:39:38 +0000] information/IcingaDB: Initial config/status dump finished in 21.6606 seconds.
There is an assertion that after activating items, all these items are
active, which sounds reasonable at first. However, with concurrent API
queries, some of these could already be deleted and therefore be
deactivated again.
At numerous places in the code, something like this is performed:
String name = Downtime::AddDowntime(...);
Downtime::Ptr downtime = Downtime::GetByName(name);
However, `downtime` can be a `nullptr` after this as it is possible that
the downtime is deleted in between.
This commit changes the return type of `Downtime::AddDowntime` to return
a Downtime::Ptr instead of the full name of the downtime. `AddDowntime`
performs the very same `GetByName()` operation internally, but handles
the `nullptr` case correctly and throws an exception.
Since commit d9010c7b9f, ActivateItems no
longer uses the WorkQueue upq to perform tasks but instead performs
these locally. One instance of `upq.Join()`/`upq.HasExceptions()`
remained in the function, but I believe this was just missed when
removing the `upq.Enqueue()` call just before.
This commit removes the corresponding parameter and updates all call
sites accordingly.
`this` could be deleted after `Notification::BeginExecuteNotification`
exited and before `Notification::ExecuteNotificationHelper` finished.
This is fixed by constructing a `Notification::Ptr` and operate on that
one as it is properly reference-counted.
Only two out of three cases were handled properly by the code: host
downtimes referencing a deleted host and service downtimes referencing a
deleted service worked fine. However, if a service downtime references a
deleted host, `Host::GetByName()` returns `nullptr` which isn't
accounted for. Use `Service::GetByNamePair()` instead as this performs a
check for the host being null internally.
Boost.Beast changed the signature of
boost::beast::http::basic_fields::set in version 1.74 so that no longer
allows passing an icinga::String instance as value. This adds a
conversion function so that it works again.
Boost.Beast changed the signature of the previously used generic `set`
method so that it no longer accepts integer types, however there is
alreay a more specific method for setting the Content-Length header, so
use this one instead.
When a CRL is specified in the ApiListener configuration, Icinga 2 only
used it when connections were established so far, but not when a
certificate is requested. This allows a node to automatically renew a
revoked certificate if it meets the other conditions for auto-renewal
(issued before 2017 or expires in less than 30 days).
The function was never used and it's implementation contains a bug where
a buffer of too small size is used as a paramter to ERR_error_string.
According to the `man 3 ERR_error_info`, the buffer has to be at least
256 bytes in size.
Also the function seems of limited use as it allows to output the tag
object used with additional error information for exceptions in Boost.
However, you boost::get_error_info<>() just returns the value type but
not the full tag object from the exception.
Merge AsyncTryActivateZonesStage and TryActivateZonesStageCallback and
name the result TryActivateZonesStage. The old split was a leftover from
the one being a callback function with no actual meaningful separation.
Anonymous connections are normally only used for requesting a
certificate and are closed after this request is received. However, the
request is only sent if the child has successfully verified the
certificate of its parent so that it is an authenticated connection from
its perspective. In case this verification fails, both ends view it as
an anonymous connection and never actually use it but attempt a
reconnect after 10 seconds leaking the connection. Therefore close it
after a timeout.
RelayMessageOne used to relay the message only to one other endpoint for
other zones, which is fine, as long as the target zone is a child/parent
zone but breaks if the target zone is a global one. In this case, the
message has to be forwarded within the local zone as well as to one node
in each child zone.
The initial config object sync for each new connection (in
`ApiListener::SendRuntimeConfigObjects()`) only considers currently
existing objects and has no way to pass the information that objects
were deleted in the meantime.
This commit logs config object deletions to the replay log if required
so that there is a chance that it will be propagated to nodes that were
offline when the deletion happened.
Note that this can only be considered a workaround as the replay log
might be pruned or could even be completely disabled. Also, there still
seems to be a race-condition between the config sync and replay log of
multiple new connections at the same time.
Before postgres 9.1, this setting defaulted to off and icinga2 code was
making heavy use of this feature. Since postgres 9.1, this settings
defaults to on. During the adoption of postgres >= 9.1, the icinga2
postgres ido code maintained compatibility by setting it to off
explicitly.
In the mean time, the postgres ido code has been converted to using the
`E'...'` escape literal syntax exclusively.
The last remaining step is now to no longer force the setting to off
because no query is using the feature any longer.
Closes github issue #8122.