The reason for introducing AsioTlsStream::GracefulDisconnect() was to handle
the TLS shutdown properly, which requires a timeout since it involves waiting
for the remote peer. However, the implementation of this timeout involves
spawning coroutines which are redundant in some cases. This commit adds
comments to the remaining calls of async_shutdown() stating why calling it is
safe in these places.
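A minimal sketch of such an annotated call, assuming a Boost.Asio SSL stream and a coroutine already running on the connection's strand (the function and variable names are made up for illustration, this is not the actual Icinga 2 code):

```cpp
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/asio/ssl.hpp>

namespace asio = boost::asio;
namespace ssl = asio::ssl;

void ShutdownInCoroutine(ssl::stream<asio::ip::tcp::socket>& tlsStream, asio::yield_context yc)
{
    boost::system::error_code ec;

    // Safe to call async_shutdown() directly here: we already run inside a
    // coroutine on the connection's strand and the surrounding operation is
    // guarded by a timeout, so going through GracefulDisconnect() and
    // spawning an extra coroutine just for the timeout would be redundant.
    tlsStream.async_shutdown(yc[ec]);
}
```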
The .ti files call `DependencyGraph::AddDependency(this, service.get())`. Obviously, `service.get()` is the parent and `this` (Downtime, Notification, ...) is the child. The DependencyGraph terminology should reflect this so as not to confuse its future users.
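A hypothetical signature sketch of the renamed terminology (the parameter types and names are illustrative, not the actual ones):

```cpp
class Object;

class DependencyGraph
{
public:
    // The first argument is the dependent (child) object, the second its
    // parent, matching call sites like
    // DependencyGraph::AddDependency(this, service.get()).
    static void AddDependency(Object* child, Object* parent);
};
```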
Currently, for each `Disconnect()` call, we spawn a coroutine, but every
one of them except the first is useless. Moreover, since all
`Disconnect()` usages share the same asio strand and cannot interfere
with each other, spawning another coroutine within `Disconnect()` isn't
even necessary. When a coroutine calls `Disconnect()` now, it will
immediately initiate an async shutdown of the socket, potentially causing
the coroutine to yield and allowing the others to resume. Therefore, the
`m_ShuttingDown` flag still has to be checked regularly by the coroutines.
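A simplified sketch of that behavior, with hypothetical member names and without the actual Icinga 2 plumbing: `Disconnect()` starts the TLS shutdown directly from the calling coroutine, and the long-running coroutines keep checking `m_ShuttingDown`:

```cpp
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/asio/ssl.hpp>

namespace asio = boost::asio;
namespace ssl = asio::ssl;

class Connection
{
public:
    Connection(asio::io_context& io, ssl::context& ctx)
        : m_Strand(asio::make_strand(io)), m_Stream(io, ctx) {}

    // Called from within a coroutine already running on m_Strand.
    void Disconnect(asio::yield_context yc)
    {
        if (m_ShuttingDown)
            return; // Every call after the first would be redundant.

        m_ShuttingDown = true;

        boost::system::error_code ec;
        // Starts the TLS shutdown right away. This may suspend the calling
        // coroutine, letting the other coroutines on the strand resume, so
        // they still have to re-check m_ShuttingDown regularly.
        m_Stream.async_shutdown(yc[ec]);
    }

    void HandleIncomingMessages(asio::yield_context yc)
    {
        while (!m_ShuttingDown) {
            // ... read and process one message, then re-check the flag ...
        }
    }

private:
    asio::strand<asio::io_context::executor_type> m_Strand;
    ssl::stream<asio::ip::tcp::socket> m_Stream;
    bool m_ShuttingDown = false;
};
```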
In fact, this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk). So once the remote signals EOF, don't try to process the remaining queue until a write error occurs (which can't be associated with a particular message anyway, due to buffering), but just let the peer go. Flush already half-written messages, though, if possible.
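A deliberately simplified sketch of that write loop (hypothetical names and types, no Icinga 2 internals), checking the shutdown flag per message as well as per bulk:

```cpp
#include <deque>
#include <string>
#include <vector>

void WriteQueuedMessages(std::deque<std::vector<std::string>>& bulks,
                         std::string& writeBuffer, const bool& shuttingDown)
{
    while (!bulks.empty() && !shuttingDown) {    // outer loop: one bulk at a time
        auto bulk (std::move(bulks.front()));
        bulks.pop_front();

        for (auto& message : bulk) {             // inner loop: one message at a time
            if (shuttingDown) {
                // Don't queue anything new, but data already half-written into
                // writeBuffer can still be flushed by the caller if possible.
                return;
            }

            writeBuffer += message;              // stands in for the actual async write
        }
    }
}
```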
When the `Disconnect()` method is called, clients are not disconnected
immediately. Instead, a new coroutine is spawned using the same strand
as the other coroutines. This coroutine calls `async_shutdown` on the
TCP socket, which might block. However, in order not to block
indefinitely, the `Timeout` class cancels all operations on the socket
after `10` seconds. The timeout does not trigger the handler immediately,
though; it spawns yet another coroutine using the same strand
as in the `JsonRpcConnection` class. This can cause unexpected delays if
e.g. `HandleIncomingMessages` gets resumed before the coroutine from the
timeout class. Apart from that, the coroutine for writing messages uses
the same condition, making the two symmetrical.
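A simplified sketch of the pattern described above (hypothetical names, assuming the classic two-argument `boost::asio::spawn()` overload): `Disconnect()` spawns a coroutine which calls `async_shutdown()`, and the timeout itself spawns yet another coroutine on the same strand, which is why its cancellation can be delayed by other coroutines resuming first.

```cpp
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/asio/ssl.hpp>
#include <chrono>
#include <memory>

namespace asio = boost::asio;
namespace ssl = asio::ssl;
using Strand = asio::strand<asio::io_context::executor_type>;
using TlsStream = ssl::stream<asio::ip::tcp::socket>;

void Disconnect(Strand strand, const std::shared_ptr<TlsStream>& tlsStream)
{
    asio::spawn(strand, [strand, tlsStream](asio::yield_context yc) {
        auto timer (std::make_shared<asio::steady_timer>(strand, std::chrono::seconds(10)));

        // The timeout: yet another coroutine on the same strand. If e.g. a
        // message-handling coroutine gets resumed first, the cancellation
        // (and thus the shutdown) is delayed.
        asio::spawn(strand, [tlsStream, timer](asio::yield_context yc2) {
            boost::system::error_code ec;
            timer->async_wait(yc2[ec]);

            if (!ec) { // timer expired, i.e. it wasn't cancelled in time
                tlsStream->lowest_layer().cancel(ec);
            }
        });

        boost::system::error_code ec;
        tlsStream->async_shutdown(yc[ec]); // may block until cancelled

        timer->cancel();
    });
}
```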
Something is definitely going wrong if a client tries to reconnect to
this endpoint while it still has an active connection to that client. So
we shouldn't hide this, but at least log it at info level. Apart from
that, I've added some additional information about the currently active
client, such as when the last message was sent and received.
Especially ApiListener#ReplayLog() enqueued lots of messages into
JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if
the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.
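A deliberately trivial sketch of the idea (hypothetical names, no Icinga 2 types): once the connection is shutting down, new outgoing messages are simply dropped instead of being queued in RAM.

```cpp
#include <deque>
#include <string>

void SendMessage(std::deque<std::string>& outgoingMessages,
                 const std::string& message, bool shuttingDown)
{
    if (shuttingDown)
        return; // Disconnect() was requested, don't pile up more work in RAM.

    outgoingMessages.emplace_back(message);
}
```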
When an HTTP connection dies prematurely while the response is being sent,
`http::async_write()` sets the error code to something like broken pipe.
When calling `async_flush()` afterwards, it sometimes happens that
this never returns. This results in a resource leak as the coroutine isn't
cleaned up. This commit makes the individual functions throw exceptions instead
of silently ignoring the errors, so the function terminates early and an
error is logged as well.
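A minimal sketch of the change (simplified, hypothetical function, not the actual Icinga 2 code), assuming Boost.Beast and a stackful coroutine:

```cpp
#include <boost/asio/spawn.hpp>
#include <boost/beast/core.hpp>
#include <boost/beast/http.hpp>
#include <boost/system/system_error.hpp>

namespace http = boost::beast::http;

template<class Stream, class Response>
void WriteResponse(Stream& stream, Response& response, boost::asio::yield_context yc)
{
    boost::system::error_code ec;
    http::async_write(stream, response, yc[ec]);

    if (ec) {
        // E.g. broken pipe: don't continue with async_flush(), which may
        // never return; terminate the coroutine with an exception instead,
        // so the caller can log the error and clean up.
        throw boost::system::system_error(ec);
    }
}
```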
Given that the internal `config::Update` cluster events use this
as well to create received runtime objects, we don't want to first
persist the config file and then load and validate it with `CompileFile`.
Otherwise, we are forced to remove the newly created file whenever we
can't validate, commit or activate it. This would also have the
downside that two cluster events for the same object arriving at the
same moment from two different endpoints would result in two different
threads simultaneously creating and loading the same config file -
whereby only one of them passes validation, while the other
faces an object `re-definition` error and tries to remove the config
file it mistakenly thinks it has created. As a consequence, an object
successfully created by the former thread is implicitly deleted by the
latter, causing objects to mysteriously disappear.
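A sketch of the intended order, using hypothetical helpers for illustration only: validate the received config text first and persist it to disk only once it passed, so a failed validation never leaves a file behind that a concurrent cluster event for the same object would then fight over.

```cpp
#include <fstream>
#include <stdexcept>
#include <string>

// Placeholder for the real compile/validate step.
static bool ValidateConfigText(const std::string& configText)
{
    return !configText.empty();
}

static void PersistConfigFile(const std::string& path, const std::string& configText)
{
    std::ofstream(path) << configText;
}

static void HandleConfigUpdate(const std::string& path, const std::string& configText)
{
    // Validate first, persist afterwards.
    if (!ValidateConfigText(configText))
        throw std::invalid_argument("received config did not validate");

    PersistConfigFile(path, configText);
}
```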
Closing and re-opening that very same log file shouldn't reset the
counter; otherwise some log files may exceed the max limit per file, as
their offset indicator is reset each time they are re-opened.
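A hypothetical sketch of the idea (not the actual Icinga 2 code): when re-opening an existing log file, continue counting from its current size instead of resetting the per-file byte counter to zero, so the size limit still holds across close/re-open cycles.

```cpp
#include <cstdint>
#include <filesystem>
#include <fstream>
#include <string>
#include <system_error>

class LogFileWriter
{
public:
    void Open(const std::string& path)
    {
        m_Stream.open(path, std::ios::app);

        std::error_code ec;
        auto size (std::filesystem::file_size(path, ec));
        m_BytesWritten = ec ? 0 : size; // keep counting from the existing size
    }

    bool ExceedsLimit(std::uintmax_t maxBytes) const
    {
        return m_BytesWritten >= maxBytes;
    }

    void Write(const std::string& line)
    {
        m_Stream << line << '\n';
        m_BytesWritten += line.size() + 1;
    }

private:
    std::ofstream m_Stream;
    std::uintmax_t m_BytesWritten = 0;
};
```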
On incoming connection timeout we log the remote endpoint, which isn't
available once the peer has already disconnected - instead, an exception is
thrown. Fetch it while we're still connected, so that we neither lose it nor
run into that exception.
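A minimal sketch, assuming a plain Boost.Asio TCP socket (the function name is made up): `remote_endpoint()` throws once the peer is gone, so capture it early, or use the non-throwing overload, and reuse the stored value for the later timeout log message.

```cpp
#include <boost/asio.hpp>
#include <string>

namespace asio = boost::asio;

std::string DescribePeer(const asio::ip::tcp::socket& socket)
{
    boost::system::error_code ec;
    auto endpoint (socket.remote_endpoint(ec));

    // Capture the address while still connected; afterwards we'd only get
    // an error (or, with the throwing overload, an exception) instead of
    // the log-worthy endpoint.
    return ec ? "<disconnected>" : endpoint.address().to_string();
}
```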
Although there is locking involved here, it shouldn't take too long for
the thread to actually acquire the lock, since there aren't that many
threads dealing with endpoint clients concurrently. It would just
pointlessly waste time trying to obtain a CPU slot.