Currently, when processing a `CheckResult`, Icinga 2 first triggers an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.
Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult` and invoke
`Checkable#SetNextCheck()` and `Checkable#ProcessCheckResult()` with the
newly received values. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same
state, which causes the state histories to be fully resynchronised after
a takeover/Icinga 2 reload.
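
To illustrate why the locally recalculated timestamps practically never
match, here is a toy model in C++ (a sketch only, not the actual
`Checkable` scheduling code): both endpoints align the next check to the
same check interval, but each one shifts it by its own randomly
initialised offset.

```cpp
#include <cmath>
#include <cstdio>
#include <random>

int main()
{
    // Toy model: each endpoint draws its own random scheduling offset,
    // mirroring the fact that every node initialises the offset
    // independently.
    std::mt19937 rng{std::random_device{}()};
    std::uniform_real_distribution<double> randomOffset(0.0, 60.0);

    const double now = 1700000000.0; // current Unix timestamp
    const double interval = 60.0;    // check interval in seconds

    for (int endpoint = 1; endpoint <= 2; ++endpoint) {
        double offset = randomOffset(rng);

        // Align the next check to the interval, shifted by the local
        // offset (illustrative formula, not the exact Icinga 2 one).
        double next = now + interval - std::fmod(now + offset, interval);

        std::printf("endpoint %d: next check at %.3f\n", endpoint, next);
    }
}
```

Because the two offsets are drawn independently, the two timestamps
differ, which is why the two Icinga DB instances end up with different
checksums for the same state.
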
A day specification like "monday -1" refers to the last Monday of the month.
However, there was an off-by-one error if the first day of the next
month is the same day of the week, i.e. a Monday in this example.
`LegacyTimePeriod::FindNthWeekday()` picks a day to start the search for
the day in question. When given a negative n to search for the n-th last
day, it wrongly used the first day of the following month as the start
and counted it as if it were within the current month. This resulted in
a 1/7 chance that the result was one week too late.
This is fixed by using the last day of the current month as the starting
point instead.
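
The fixed search can be sketched like this in C++ (an illustration only,
not the actual `LegacyTimePeriod::FindNthWeekday()` code; it assumes a
C++20 `<chrono>` implementation with calendar support): for a negative n
it starts at the last day of the requested month and walks backwards.

```cpp
#include <chrono>
#include <iostream>

using namespace std::chrono;

// Sketch of the corrected behaviour for negative n ("n-th last"): start
// at the *last* day of the given month, not at the first day of the
// following month, walk back to the wanted weekday, then go back
// another |n|-1 weeks.
sys_days FindNthLastWeekday(year_month ym, weekday wd, int n)
{
    sys_days day{ym / last}; // last day of the current month

    while (weekday{day} != wd) {
        day -= days{1};
    }

    return day - weeks{-n - 1};
}

int main()
{
    // "monday -1" in June 2024: July 1st 2024 is also a Monday, i.e. the
    // case that used to trigger the off-by-one. Starting at June 30th
    // correctly yields 2024-06-24 instead of a date one week too late.
    std::cout << year_month_day{FindNthLastWeekday(2024y / June, Monday, -1)} << '\n';
}
```
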
Before ae693cb7e1df1b885142854cf8a0f8a7600a3fb7 (#9577) we repeatedly looped over all items in parallel like this:

```
while not types.done:
    for t in types:
        if not t.done and t.dependencies.done:
            with parallel(all_items, CONCURRENCY) as some_items:
                for i in some_items:
                    if i.type is t:
                        i.commit()
```

I.e. all items got distributed over `CONCURRENCY` threads, but not always equally. E.g. when it was the hosts' turn, maybe only two threads got hosts and did all the work; the others did no actual work (due to the lack of hosts in their queues), which reduced the performance. c721c302cd9c96bee25a20b3862dad347345648a (#6581) fixed this by shuffling `all_items` first. ae693cb7e1df1b885142854cf8a0f8a7600a3fb7 (#9577) made the latter unnecessary by replacing the above algorithm with this:

```
while not types.done:
    for t in types:
        if not t.done and t.dependencies.done:
            with parallel(all_items[t], CONCURRENCY) as some_items:
                for i in some_items:
                    if i.type is t:
                        i.commit()
```

I.e. `parallel()` gets only items of type `t`, so all threads get e.g. hosts.

When an HTTP connection dies prematurely while the response is being
sent, `http::async_write()` sets the error code accordingly, e.g. to
broken pipe. When `async_flush()` is called afterwards, it sometimes
never returns, which leaks resources because the coroutine is never
cleaned up. This commit makes the individual functions throw exceptions
instead of silently ignoring the errors, so that the function terminates
early and the error also gets logged.
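
The pattern behind the fix can be sketched with Boost.Beast/Asio (an
illustration only, not the actual Icinga 2 connection code; the function
and variable names are made up): binding an `error_code` via `yield[ec]`
merely stores the failure and is easy to ignore, while passing the plain
`yield_context` makes the operation throw `boost::system::system_error`,
so the coroutine unwinds immediately and the caller can log the error.

```cpp
#include <boost/asio/spawn.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>

namespace asio = boost::asio;
namespace beast = boost::beast;
namespace http = boost::beast::http;

// Hypothetical helper, only used to contrast the two styles.
void SendResponse(beast::tcp_stream& stream,
    http::response<http::string_body>& response, asio::yield_context yield)
{
    // Old style: the error (e.g. broken pipe) only ends up in ec and is
    // silently dropped, so follow-up calls run on a dead connection.
    //
    //   boost::system::error_code ec;
    //   http::async_write(stream, response, yield[ec]);

    // New style: without an error_code binding the operation throws on
    // failure, so this function terminates early and the exception can
    // be caught and logged by the coroutine driving the connection.
    http::async_write(stream, response, yield);
}
```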