Commit Graph

6424 Commits

Author SHA1 Message Date
Alexander Aleksandrovič Klimov 9a8620d923
Merge pull request #10213 from Icinga/do-not-read-data-on-disconnect
JsonRpcConnection: Don't read any data on shutdown
2024-11-07 12:32:02 +01:00
Alexander Aleksandrovič Klimov fb64c4f057
Atomic#Atomic(): remove superfluous atomic write 2024-11-06 11:37:02 +01:00
Alexander Aleksandrovič Klimov a77259adc1
Atomic<T>#Atomic(T): fix C++ compliance
by not calling `std::atomic<T>::atomic(void)`.

After the latter the instance "does not contain a T object, and its only valid uses are destruction and initialization by std::atomic_init" which we don't call. So the only safe option is `std::atomic<T>::atomic(T)`.

https://en.cppreference.com/w/cpp/atomic/atomic/atomic
2024-11-05 13:15:22 +01:00
Yonas Habteab 1c34610a78 JsonRpcConnection: Don't read any data on shutdown
When the `Desconnect()` method is called, clients are not disconnected
immediately. Instead, a new coroutine is spawned using the same strand
as the other coroutines. This coroutine calls `async_shutdown` on the
TCP socket, which might be blocking. However, in order not to block
indefintely, the `Timeout` class cancels all operations on the socket
after `10` seconds. Though, the timeout does not trigger the handler
immediately; it creates spawns another coroutine using the same strand
as in the `JsonRpcConnection` class. This can cause unexpected delays if
e.g. `HandleIncomingMessages` gets resumed before the coroutine from the
timeout class. Apart from that, the coroutine for writing messages uses
the same condition, making the two symmetrical.
2024-10-31 17:09:13 +01:00
Yonas Habteab d894792c36
Merge pull request #10209 from Icinga/log-error-context-only-once
ApiListener: Log error context only once
2024-10-31 13:14:42 +01:00
Alexander Aleksandrovič Klimov 5f487aff1b
Merge pull request #10201 from Icinga/Validation-failed
Remove redundant "Validation failed" prefix from ValidationError exceptions
2024-10-31 12:30:39 +01:00
Yonas Habteab 8574357443 ApiListener: Log error context only once
When logging at the warning level, the logger will automatically look up
for registered context and append them to the log entry accordingly.
2024-10-30 16:55:13 +01:00
Yonas Habteab e8b7baa298 JsonRpcConnection: Drop unused `m_NextHeartbeat` variable 2024-10-30 14:31:48 +01:00
Yonas Habteab 10775f4481
Merge pull request #10207 from Icinga/log-connected-endpoint-connection-attempts
ApiListener: Log connection attempts from an already connected client prominently
2024-10-30 13:31:44 +01:00
Yonas Habteab 9d4625e1ec ApiListener: Log connection attempts from an already connected client
Something is definitely going wrong if a client tries to reconnect to
this endpoint while it still has an active connection to that client. So
we shouldn't hide this, but at least log it at info level. Apart from
that, I've added some additional information about the currently active
client, such as when the last message was sent and received.
2024-10-30 11:26:21 +01:00
Alexander Aleksandrovič Klimov 4ca68e444e
Merge pull request #10204 from Icinga/an-HA
doc/: fix "a HA" -> "an HA"
2024-10-24 11:30:24 +02:00
Alexander Aleksandrovič Klimov fb8badfd2e
Merge pull request #10187 from Icinga/state-before-suppression
Fix lost recovery notifications after recovery outside of notification time period
2024-10-24 10:07:59 +02:00
Alexander Aleksandrovič Klimov 7df6baf146
Merge pull request #10176 from Icinga/ICINGA2_UNITY_BUILD=OFF-ICINGA2_WITH_LIVESTATUS=ON
Fix build on Mac with -DICINGA2_UNITY_BUILD=OFF -DICINGA2_WITH_LIVESTATUS=ON
2024-10-24 10:03:57 +02:00
Alexander A. Klimov 095e5982f4 doc/: fix "a HA" -> "an HA" 2024-10-24 09:44:36 +02:00
Alexander A. Klimov 7a4ba59961 Remove redundant "Validation failed" prefix from ValidationError exceptions
ValidationError#ValidationError() already prefixes #m_What,
which #what() returns, with "Validation failed for object".
2024-10-23 13:06:12 +02:00
Yonas Habteab f4e61ef9bd
Merge pull request #10177 from Icinga/log-noop-fix
Log: fix some parts of messages not being discarded early
2024-10-21 09:31:19 +02:00
Julian Brost 7d0a43f926 Use `Checkable::GetStateBeforeSuppression()` only where relevant
This fixes an issue where recovery notifications get lost if they happen
outside of a notification time period.

Not all calls to `Checkable::NotificationReasonApplies()` need
`GetStateBeforeSuppression()` to be checked. In fact, for one caller,
`FireSuppressedNotifications()` in
`lib/notification/notificationcomponent.cpp`, the state before suppression may
not even be initialized properly, so that the default value of OK is used which
can lead to incorrect return values. Note the difference between suppressions
happening on the level of the `Checkable` object level and the `Notification`
object level. Only the first sets the state before suppression in the
`Checkable` object, but so far, also the latter used that value incorrectly.

This commit moves the check of `GetStateBeforeSuppression()` from
`Checkable::NotificationReasonApplies()` to the one place where it's actually
relevant: `Checkable::FireSuppressedNotifications()`. This made the existing
call to `NotificationReasonApplies()` unneccessary as it would always return
true: the `type` argument is computed based on the current check result, so
there's no need to check it against the current check result.
2024-10-11 13:21:10 +02:00
Julian Brost 5e9e0bbcdf
Merge pull request #10059 from Icinga/IcingaDB-TimestampToMilliseconds-limit
IcingaDB::TimestampToMilliseconds(): limit output to four year digits
2024-10-02 09:19:03 +02:00
Alexander A. Klimov dc4869c3aa IcingaDB::TimestampToMilliseconds(): limit output to four year digits
Too high timestamps may overflow uint64_t (and the YYYY format) and negative
ones don't fit into uint64_t. Those may crash our Go daemon.
2024-09-30 16:54:40 +02:00
Julian Brost f0e084d530 Log: fix some parts of messages not being discarded early
`m_IsNoOp` was introduced to avoid building up log messages that will later be
discarded, like debug messages if no debug logging is configured. However, it
looks like the template operator<< implemented in the header file was forgotten
when adding this feature, all other places writing into `m_Buffer` already have
an if guard like added by this commit.
2024-09-27 14:23:05 +02:00
Alexander A. Klimov 2bbeaec916 Fix build on Mac with -DICINGA2_UNITY_BUILD=OFF -DICINGA2_WITH_LIVESTATUS=ON
error: no matching function for call to 'intrusive_ptr_release'
...
candidate function not viable: cannot convert argument of incomplete type 'icinga::Notification *' to 'Object *' for 1st argument
void intrusive_ptr_release(Object *object);
2024-09-27 12:41:11 +02:00
Julian Brost b6b1506bda
Merge pull request #10140 from Icinga/drop-cpu-bound-work-usage-from-ifwapi
Don't use thread-local var in coroutine & drop superfluous `CpuBoundWork` usage
2024-09-27 11:31:58 +02:00
Yonas Habteab 92df9ef8c3
Merge pull request #10148 from Icinga/enhanced-sort-types-by-load-dependencies
Sort config types by their load dependencies once
2024-09-26 15:27:41 +02:00
Sebastian Grund 8c68c6e9d8
Add closing quotationmarks in Validator for influxdb writer config 2024-09-25 13:03:00 +02:00
Yonas Habteab 467e8b18e7 Type: Simplify sort by load dependencies algorithm 2024-09-20 16:18:12 +02:00
Alexander A. Klimov 31f3acaa13 ConfigItem::CommitNewItems(): pre-sort types by their load dependencies once
to avoid complicated nested loops, iterating over the same types and
checking dependencies over and over, skipping already completed ones.
2024-09-20 16:18:12 +02:00
Alexander A. Klimov b848934d57 Introduce Type::GetConfigTypesSortedByLoadDependencies() 2024-09-20 16:18:12 +02:00
Yonas Habteab 26f43b0b48 IcingaDB: Don't sync partially initialised objects 2024-09-11 14:08:27 +02:00
Yonas Habteab 74009f0fcb Don't use thread-local variable in coroutine & process final `cr` in global thread pool 2024-09-05 17:36:03 +02:00
Yonas Habteab c9159494c0 HttpServerConnection: Drop yet another superfluous `CpuBoundWork` usage 2024-09-05 15:10:14 +02:00
Alexander Aleksandrovič Klimov 79e3cb2a95 Utility::ReleaseHelper(): remove detection of EOL distros
We only support /etc/os-release owners.
2024-09-04 10:26:50 +02:00
Alexander Aleksandrovič Klimov 0951230ce1
Merge pull request #9991 from Icinga/JsonRpcConnection-9985
JsonRpcConnection#Send*(): discard messages ASAP once shutting down
2024-09-03 15:13:30 +02:00
Julian Brost 4c6b93d617
Merge pull request #10011 from Icinga/next-check-cluster-sync-issue
Checkable: Don't recalculate `next_check` for remotely generated `cr`
2024-08-30 13:37:41 +02:00
Yonas Habteab 9f84c1516e ApiListener: Reorder logging in `ApiTimerHandler()` 2024-08-28 16:53:53 +02:00
Yonas Habteab e062ceb901 ApiListener: Catch & supress clients runtime errors 2024-08-28 16:53:53 +02:00
Julian Brost 88e79ea41a
Merge pull request #10111 from Icinga/unregister-invalid-objects-properly
Unregister invalid config objects properly
2024-08-27 14:30:38 +02:00
Yonas Habteab 932a53449d JsonRpcConnection: Raise an exception when trying to send to disconnected clients 2024-08-27 14:23:41 +02:00
Julian Brost 9222a63ff7 Make sure log file is reopened when `ApiListener::ReplayLog()` returns 2024-08-27 14:23:41 +02:00
Yonas Habteab a5a83e311a Defer: Allow empty initialization & add `SetFunc()` method 2024-08-27 14:23:41 +02:00
Yonas Habteab 73db30c08b Use `Defer` class for cleanup in `ApiListener::ReplayLog()` 2024-08-27 14:23:41 +02:00
Alexander A. Klimov f074e24d2a ApiListener#ReplayLog(): stop reading files ASAP on send error 2024-08-27 14:23:41 +02:00
Alexander A. Klimov b538ad2528 JsonRpcConnection#Send*(): discard messages ASAP once shutting down
Especially ApiListener#ReplayLog() enqueued lots of messages into
JsonRpcConnection#{m_IoStrand,m_OutgoingMessagesQueue} (RAM) even if
the connection was shut(ting) down. Now #Disconnect() takes effect ASAP.
2024-08-27 14:23:41 +02:00
Alexander A. Klimov 33f8ea6dcc JsonRpcConnection#Disconnect(): spawn coroutine only if necessary
by checking the now atomic #m_ShuttingDown outside of it.
2024-08-27 14:23:41 +02:00
Julian Brost 39ae2e8ca4 Utility::FormatDateTime(): provide an overload for tm*
This allows the function to be used both with a double timestamp or a pointer
to a tm struct. With this, a similar implementation inside the tests can simply
use our regular function.
2024-08-23 12:48:50 +02:00
Julian Brost d5b3ffaa6d Utility::FormatDateTime(): handle invalid format strings on Windows
On Windows, the strftime() function family invokes an invalid parameter handler
when the format string is invalid (see the "Remarks" section in their
documentation). std::put_time() shows the same behavior as it uses
_wcsftime_l() internally. The default invalid parameter handler may terminate
the process, which can be a problem given that the format string can be
specified by the user from the Icinga DSL.

Thus, temporarily set a thread-local no-op handler to disable the default one
allowing the program to continue. This then simply results in the function
returning an error which then results in an exception as we ask the stream to
throw one.

See also:
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/strftime-wcsftime-strftime-l-wcsftime-l?view=msvc-170
https://learn.microsoft.com/en-us/cpp/c-runtime-library/parameter-validation?view=msvc-170
https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/set-invalid-parameter-handler-set-thread-local-invalid-parameter-handler?view=msvc-170
2024-08-23 12:48:50 +02:00
Julian Brost 0285028689 Utility::FormatDateTime(): handle errors from strftime()
So far, the return value of strftime() was simply ignored and the output buffer
passed to the icinga::String constructor. However, there are error conditions
where strftime() returns 0 to signal an error, like if the buffer was too small
for the output. In that case, there's no guarantee on the buffer contents and
reading it can result in undefined behavior. Unfortunately, returning 0 can
also indicate success and strftime() doesn't set errno, so there's no reliable
way to distinguish both situations. Thus, the implementation now returns the
empty string in both cases.

I attempted to use std::put_time() at first as that allows for better error
handling, however, there were problems with the implementation on Windows (see
inline comment), so I put that plan on hold at left strftime() there for the
time being.
2024-08-23 12:42:54 +02:00
Julian Brost c2c66908f6 Utility::FormatDateTime(): use localtime_s() on Windows
localtime() is not thread-safe as it returns a pointer to a shared tm struct.
Everywhere except on Windows, localtime_r() is used already which avoids the
problem by using a struct allocated by the caller for the output.

Windows actually has a similar function called localtime_s() which has the same
properties, just with a different name and order of arguments.
2024-08-23 12:42:32 +02:00
Julian Brost 704acdc698 Utility::FormatDateTime(): use boost::numeric_cast<>()
The previous implementation actually had undefined behavior when called with a
double that can't be represented as time_t. With boost::numeric_cast, there's a
convenient cast available that avoids this and throws an exceptions on
overflow.

It's undefined behavior ([0], where the implicit conversion rule comes into
play because the C-style cast uses static_cast [1] which in turn uses the
imlicit conversion as per rule 5 of [2]):

> A prvalue of floating-point type can be converted to a prvalue of any integer
> type. The fractional part is truncated, that is, the fractional part is
> discarded.
>
> * If the truncated value cannot fit into the destination type, the behavior
>   is undefined (even when the destination type is unsigned, modulo arithmetic
>   does not apply).

Note that on Linux amd64, the undefined behavior typically manifests itself in
the result being the minimal value of time_t which then results in localtime_r
failing with EOVERFLOW.

[0]: https://en.cppreference.com/w/cpp/language/implicit_conversion#Floating.E2.80.93integral_conversions
[1]: https://en.cppreference.com/w/cpp/language/explicit_cast
[2]: https://en.cppreference.com/w/cpp/language/static_cast
2024-08-23 12:42:30 +02:00
Julian Brost 4c83d793a6
Merge pull request #9983 from Icinga/broken-timeperiod
Fix broken `TimePeriod/ScheduledDowntime`s
2024-08-20 10:05:59 +02:00
Yonas Habteab ca7cc54438 Checkable: Don't recalculate `next_check` while processing remotely genrated check
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.

Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult`and invoke
`checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and causes the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.
2024-08-16 16:15:56 +02:00