In fact, this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk). So once the remote signals EOF, don't try to process the remaining queue until write error (which can't be associated with a particular message anyway, due to buffering), but just let the peer go. Flush already half-written messages, though, if possible.
The reason for introducing AsioTlsStream::GracefulDisconnect() was to handle
the TLS shutdown properly with a timeout since it involves a timeout. However,
the implementation of this timeout involves spwaning coroutines which are
redundant in some cases. This commit adds comments to the remaining calls of
async_shutdown() stating why calling it is safe in these places.
Calling `AsioTlsStream::async_shutdown()` performs a TLS shutdown which
exchanges messages (that's why it takes a `yield_context`) and thus has the
potential to block the coroutine. Therefore, it should be protected with a
timeout. As `async_shutdown()` doesn't simply take a timeout, this has to be
implemented using a timer. So far, these timers are scattered throughout the
codebase with some places missing them entirely. This commit adds helper
functions to properly shutdown a TLS connection with a single function call.
All usages of `AsioTlsStream` were already using `Shared<AsioTlsStream>` to
keep a reference-counted instance. This commit moves the reference counting to
`AsioTlsStream` itself by inheriting from `SharedObject`. This will allow to
implement methods making use of the fact that these objects are
reference-counted.
The changes outside of `lib/base/tlsstream.hpp` are merely replacing
`Shared<AsioTlsStream>::Ptr` with `AsioTlsStream::Ptr` everywhere.
Currently, for each `Disconnect()` call, we spawn a coroutine, but every
one of them is just usesless, except the first one. However, since all
`Disconnect()` usages share the same asio strand and cannot interfere
with each other, spawning another coroutine within `Disconnect()` isn't
even necessary. When a coroutine calls `Disconnect()` now, it will
immediately initiate an async shutdown of the socket, potentially causing
the coroutine to yield and allowing the others to resume. Therefore, the
`m_ShuttingDown` flag is still required by the coroutines to be checked
regularly.
This fixes an issue where recovery notifications get lost if they happen
outside of a notification time period.
Not all calls to `Checkable::NotificationReasonApplies()` need
`GetStateBeforeSuppression()` to be checked. In fact, for one caller,
`FireSuppressedNotifications()` in
`lib/notification/notificationcomponent.cpp`, the state before suppression may
not even be initialized properly, so that the default value of OK is used which
can lead to incorrect return values. Note the difference between suppressions
happening on the level of the `Checkable` object level and the `Notification`
object level. Only the first sets the state before suppression in the
`Checkable` object, but so far, also the latter used that value incorrectly.
This commit moves the check of `GetStateBeforeSuppression()` from
`Checkable::NotificationReasonApplies()` to the one place where it's actually
relevant: `Checkable::FireSuppressedNotifications()`. This made the existing
call to `NotificationReasonApplies()` unneccessary as it would always return
true: the `type` argument is computed based on the current check result, so
there's no need to check it against the current check result.
Something is definitely going wrong if a client tries to reconnect to
this endpoint while it still has an active connection to that client. So
we shouldn't hide this, but at least log it at info level. Apart from
that, I've added some additional information about the currently active
client, such as when the last message was sent and received.
When the `Desconnect()` method is called, clients are not disconnected
immediately. Instead, a new coroutine is spawned using the same strand
as the other coroutines. This coroutine calls `async_shutdown` on the
TCP socket, which might be blocking. However, in order not to block
indefintely, the `Timeout` class cancels all operations on the socket
after `10` seconds. Though, the timeout does not trigger the handler
immediately; it creates spawns another coroutine using the same strand
as in the `JsonRpcConnection` class. This can cause unexpected delays if
e.g. `HandleIncomingMessages` gets resumed before the coroutine from the
timeout class. Apart from that, the coroutine for writing messages uses
the same condition, making the two symmetrical.
This reverts commit 850f79e774735a4a366e58f3fa68e446769bdd1a
which has already been cherry-picked there, but is also needed for v2.14.3.
This has the same effect as `git merge support/2.14`, but involves no merge,
no conflict resolution, less commits and a smaller diff.
The previous validation in set_verify_callback() could be bypassed, tricking
Icinga 2 into treating invalid certificates as valid. To fix this, the
validation checks were moved into the IsVerifyOK() function.
This is tracked as CVE-2024-49369, more details will be published at a later time.
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.
Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult`and invoke
`checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and causes the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.