by not calling `std::atomic<T>::atomic(void)`.
After default construction, the instance "does not contain a T object, and its only valid uses are destruction and initialization by std::atomic_init", which we don't call. So the only safe option is `std::atomic<T>::atomic(T)`.
https://en.cppreference.com/w/cpp/atomic/atomic/atomic
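For illustration, a minimal sketch (not taken from the Icinga 2 sources) of the difference between the two constructors:

```cpp
#include <atomic>

// Value construction, std::atomic<T>::atomic(T): the atomic holds a
// well-defined initial value and is safe to use right away.
std::atomic<bool> m_ShuttingDown (false);

// Default construction, std::atomic<T>::atomic(void): before C++20 the
// instance contains no T object yet; anything other than destruction or
// std::atomic_init() on it is undefined behavior.
// std::atomic<bool> m_Uninitialized;
```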
When the `Disconnect()` method is called, clients are not disconnected
immediately. Instead, a new coroutine is spawned using the same strand
as the other coroutines. This coroutine calls `async_shutdown` on the
TCP socket, which may block. In order not to block indefinitely, the
`Timeout` class cancels all operations on the socket after `10` seconds.
However, the timeout does not trigger the handler immediately; it spawns
yet another coroutine using the same strand as in the `JsonRpcConnection`
class. This can cause unexpected delays if e.g. `HandleIncomingMessages`
gets resumed before the coroutine from the timeout class. Apart from
that, the coroutine for writing messages uses the same condition, making
the two symmetrical.
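A rough sketch of this pattern (member names like `m_IoStrand` and `m_Stream` are assumptions, not the actual `JsonRpcConnection` code; it assumes Boost.Asio with an `io_context::strand` and the classic `boost::asio::spawn(strand, fn)` overload, and it stands in a bound timer handler for the real `Timeout` class, which spawns its own coroutine on the strand):

```cpp
void Connection::Disconnect()
{
	namespace asio = boost::asio;

	// Only schedule the shutdown; it runs later on the connection's strand.
	asio::spawn(m_IoStrand, [this](asio::yield_context yc) {
		// Safety net: cancel outstanding socket operations after 10 seconds so
		// a peer that never answers the shutdown can't block us forever.
		asio::steady_timer timeout (m_IoStrand.context());
		timeout.expires_after(std::chrono::seconds(10));
		timeout.async_wait(asio::bind_executor(m_IoStrand, [this](boost::system::error_code ec) {
			if (!ec) // timer expired, not cancelled: abort the stuck shutdown
				m_Stream->lowest_layer().cancel();
		}));

		boost::system::error_code ec;
		m_Stream->async_shutdown(yc[ec]); // may take a while, bounded by the timer
		timeout.cancel();
	});
}
```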
Something is definitely going wrong if a client tries to reconnect to
this endpoint while it still has an active connection to that client, so
we shouldn't hide this but should at least log it at the info level.
Apart from that, I've added some additional information about the
currently active client, such as when the last message was sent and
received.
This fixes an issue where recovery notifications get lost if they happen
outside of a notification time period.
Not all calls to `Checkable::NotificationReasonApplies()` need
`GetStateBeforeSuppression()` to be checked. In fact, for one caller,
`FireSuppressedNotifications()` in
`lib/notification/notificationcomponent.cpp`, the state before suppression may
not even be initialized properly, so that the default value of OK is used which
can lead to incorrect return values. Note the difference between suppressions
happening at the `Checkable` object level and at the `Notification` object
level. Only the former sets the state before suppression in the `Checkable`
object, but so far the latter incorrectly used that value as well.
This commit moves the check of `GetStateBeforeSuppression()` from
`Checkable::NotificationReasonApplies()` to the one place where it's actually
relevant: `Checkable::FireSuppressedNotifications()`. This made the existing
call to `NotificationReasonApplies()` unnecessary as it would always return
true: the `type` argument is computed based on the current check result, so
there's no need to check it against the current check result.
`m_IsNoOp` was introduced to avoid building up log messages that will later be
discarded, like debug messages if no debug logging is configured. However, it
looks like the template `operator<<` implemented in the header file was
forgotten when adding this feature: all other places writing into `m_Buffer`
already have an if guard like the one added by this commit.
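For reference, a condensed sketch of the guard in question (simplified, not the exact Icinga 2 header):

```cpp
template<typename T>
Log& operator<<(const T& val)
{
	// Skip formatting entirely when the message would be discarded anyway,
	// e.g. a debug message while no debug logging is configured.
	if (!m_IsNoOp)
		m_Buffer << val;

	return *this;
}
```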
```
error: no matching function for call to 'intrusive_ptr_release'
...
candidate function not viable: cannot convert argument of incomplete type 'icinga::Notification *' to 'Object *' for 1st argument
void intrusive_ptr_release(Object *object);
```
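The error boils down to `boost::intrusive_ptr` needing the full class definition at the point where the pointer is released. A hypothetical minimal reproduction (function and file names are made up for illustration):

```cpp
#include <boost/intrusive_ptr.hpp>

namespace icinga {

class Object;
void intrusive_ptr_release(Object *object);

// Forward declaration only: the compiler does not know yet that Notification
// derives from Object, so a Notification* cannot convert to Object*.
class Notification;

void Reset(boost::intrusive_ptr<Notification>& p)
{
	p.reset(); // error: no matching function for call to 'intrusive_ptr_release'
}

}

// Including the header with the full Notification definition makes the
// conversion to Object* visible and resolves the error.
```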
In particular, ApiListener#ReplayLog() enqueued lots of messages into
JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if
the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.
This allows the function to be used with either a double timestamp or a pointer
to a tm struct. With this, a similar implementation inside the tests can simply
use our regular function.
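A minimal sketch of the resulting pair of overloads (the name `FormatDateTime` and the exact signatures are assumptions, not necessarily the Icinga 2 API):

```cpp
#include <ctime>
#include <string>

// The tm-based overload does the actual formatting ...
std::string FormatDateTime(const char *format, const tm *t);

// ... while the double-based overload just converts the timestamp and
// delegates, so tests can use whichever input they already have.
std::string FormatDateTime(const char *format, double ts)
{
	auto seconds = static_cast<time_t>(ts); // range-checked via numeric_cast in the real code, see below
	tm t {};
	localtime_r(&seconds, &t); // POSIX variant; see the localtime_s() note below
	return FormatDateTime(format, &t);
}
```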
So far, the return value of strftime() was simply ignored and the output buffer
passed to the icinga::String constructor. However, there are error conditions
where strftime() returns 0 to signal an error, like if the buffer was too small
for the output. In that case, there's no guarantee on the buffer contents and
reading it can result in undefined behavior. Unfortunately, returning 0 can
also indicate success and strftime() doesn't set errno, so there's no reliable
way to distinguish both situations. Thus, the implementation now returns the
empty string in both cases.
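A sketch of that handling (buffer size and names chosen for illustration, not the exact implementation):

```cpp
#include <ctime>
#include <string>

std::string FormatDateTime(const char *format, const tm *t)
{
	char buf[128];
	size_t n = strftime(buf, sizeof(buf), format, t);

	// n == 0 may mean "error" (e.g. the output didn't fit into buf) or a
	// genuinely empty result; the buffer contents are unspecified in the error
	// case, so only the first n bytes are ever read.
	return std::string(buf, n);
}
```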
I attempted to use std::put_time() at first as that allows for better error
handling, however, there were problems with the implementation on Windows (see
inline comment), so I put that plan on hold and left strftime() there for the
time being.
localtime() is not thread-safe as it returns a pointer to a shared tm struct.
Everywhere except on Windows, localtime_r() is used already which avoids the
problem by using a struct allocated by the caller for the output.
Windows actually has a similar function called localtime_s() which has the same
properties, just with a different name and order of arguments.
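A sketch of a portable wrapper along these lines (the name `LocalTime` is made up for illustration):

```cpp
#include <ctime>

// Both variants fill a caller-provided tm instead of the shared static buffer
// that plain localtime() returns, making the call thread-safe.
static tm LocalTime(time_t ts)
{
	tm result {};
#ifdef _WIN32
	localtime_s(&result, &ts); // Windows: result first, then the time_t
#else
	localtime_r(&ts, &result); // POSIX: time_t first, then the result
#endif
	return result;
}
```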
The previous implementation actually had undefined behavior when called with a
double that can't be represented as time_t. With boost::numeric_cast, there's a
convenient cast available that avoids this and throws an exception on
overflow.
It's undefined behavior ([0], where the implicit conversion rule comes into
play because the C-style cast uses static_cast [1] which in turn uses the
implicit conversion as per rule 5 of [2]):
> A prvalue of floating-point type can be converted to a prvalue of any integer
> type. The fractional part is truncated, that is, the fractional part is
> discarded.
>
> * If the truncated value cannot fit into the destination type, the behavior
> is undefined (even when the destination type is unsigned, modulo arithmetic
> does not apply).
Note that on Linux amd64, the undefined behavior typically manifests itself in
the result being the minimal value of time_t which then results in localtime_r
failing with EOVERFLOW.
[0]: https://en.cppreference.com/w/cpp/language/implicit_conversion#Floating.E2.80.93integral_conversions
[1]: https://en.cppreference.com/w/cpp/language/explicit_cast
[2]: https://en.cppreference.com/w/cpp/language/static_cast
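For comparison, a sketch of the checked conversion (hypothetical helper name, not the exact patch):

```cpp
#include <boost/numeric/conversion/cast.hpp>
#include <ctime>

time_t ToTimeT(double ts)
{
	// A plain (time_t)ts is undefined behavior if ts is out of range for
	// time_t; numeric_cast checks the range first and throws instead.
	return boost::numeric_cast<time_t>(ts);
	// e.g. ToTimeT(1.0e20) throws boost::numeric::positive_overflow
	// rather than silently yielding the minimal time_t value.
}
```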
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.
Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult` and invoke
`Checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and cause the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.