6570 Commits

Author SHA1 Message Date
Lorenz Kästle
e7381193c8
Reject infinite performance data values
Some fault monitoring plugins may return "inf" or "-inf" as
values due to a failure to initialize or other errors.

This patch introduces a check on whether the parse value is infinite
(or negative infinite) and rejects the data point if that is the case.

The reasoning here is: There is no possible way a value of "inf" is ever
a true measuring or even useful. Furthermore, when passed to the
performance data writers, it may be rejected by the backend and lead
to further complications.
2025-01-09 11:46:34 +01:00
Yonas Habteab
1425641931 Don't endlessly wait on writer coroutine on disconnect 2025-01-08 16:30:36 +01:00
Yonas Habteab
41373ad0e5 Log before & after an RPC client is disconnected 2025-01-08 16:30:36 +01:00
Yonas Habteab
3af7cfe2ec JsonRpcConnection: Don't drop client from cache prematurely
PR #7445 incorrectly assumed that a peer that had already disconnected
and never reconnected was due to the endpoint client being dropped after
a successful socket shutdown. However, the issue at that time was that
there was not a single timeout guards that could cancel the `async_shutdown`
call, petentially blocking indefinetely. Although removing the client from
cache early might have allowed the endpoint to reconnect, it did not
resolve the underlying problem. Now that we have a proper cancellation
timeout, we can wait until the currently used socket is fully closed
before dropping the client from our cache. When our socket termination
works reliably, the `ApiListener` reconnect timer should attempt to
reconnect this endpoint after the next tick. Additionally, we now have
logs both for before and after socket termination, which may help
identify if it is hanging somewhere in between.
2025-01-08 16:30:36 +01:00
Alexander A. Klimov
8f72891228 Document Timeout 2025-01-07 18:20:54 +01:00
Alexander A. Klimov
3ca7ff7bf4 Timeout: explicitly delete #Timeout(const Timeout&), #Timeout(Timeout&&), #operator=(const Timeout&), #operator=(Timeout&&) 2025-01-07 18:20:52 +01:00
Alexander A. Klimov
27e0e236cb Move Timeout instances from heap to stack 2025-01-07 18:20:50 +01:00
Alexander A. Klimov
d77d7506f1 Don't call Timeout#Cancel() where Timeout#~Timeout() is called 2025-01-07 18:20:14 +01:00
Alexander A. Klimov
959b162913 Timeout#~Timeout(), #Cancel(): support boost::asio::io_context running on multiple threads 2025-01-07 18:19:42 +01:00
Alexander A. Klimov
cb51649363 Timeout#Timeout(): drop unnecessary template parameters 2025-01-07 18:19:39 +01:00
Alexander A. Klimov
d2285bcf0e While using Timeout, don't unnecessarily keep the strand alive via smart pointer 2025-01-07 18:18:46 +01:00
Alexander A. Klimov
faaeb4eb2e Timeout: use a plain callback, not an unnecessary coroutine 2025-01-07 18:18:24 +01:00
Alexander A. Klimov
92ab913226 Timeout#Timeout(): don't pass yield_context to callback
It's not used. Also, the callback shall run completely at once. This ensures that it won't (continue to) run once another coroutine on the strand calls Timeout#Cancel().
2025-01-07 18:18:18 +01:00
Julian Brost
880632b93a
Merge pull request #9861 from ymartin-ovh/issue-9752
icinga2: address comment loading where host reference is not found
2025-01-07 14:12:03 +01:00
Julian Brost
cf125dd8d5 Simplify DependencyGraph:RemoveDependency() method 2025-01-07 11:07:46 +01:00
Yonas Habteab
ff0e12e6ac ApiListener: Sync runtime configs in order 2025-01-07 11:07:46 +01:00
Yonas Habteab
015374e69d DependencyGraph: Allow lookups by parent & child dependencies 2025-01-07 11:07:46 +01:00
Alexander Aleksandrovič Klimov
383773eb2b
Merge pull request #10264 from Icinga/DependencyGraph-ConfigObject
DependencyGraph: use ConfigObject*, not Object*
2024-12-18 13:36:56 +01:00
Alexander A. Klimov
3a09cf72d6 DependencyGraph: use ConfigObject*, not Object*
This saves dynamic_cast<ConfigObject*> + if() on every item of GetChildren().
2024-12-17 18:33:05 +01:00
Julian Brost
452386cdb6
Merge pull request #10005 from Icinga/graceful-tls-disconnect
Add a dedicated method for disconnecting TLS connections
2024-12-12 16:20:14 +01:00
Julian Brost
3642ca3369
Merge pull request #10263 from Icinga/DependencyGraph-parent-child
DependencyGraph: switch "parent" and "child" terminology
2024-12-12 15:13:08 +01:00
Julian Brost
a506d562ae Add comment for remaining uses of async_shutdown() why it's safe
The reason for introducing AsioTlsStream::GracefulDisconnect() was to handle
the TLS shutdown properly with a timeout since it involves a timeout. However,
the implementation of this timeout involves spwaning coroutines which are
redundant in some cases. This commit adds comments to the remaining calls of
async_shutdown() stating why calling it is safe in these places.
2024-12-12 12:10:59 +01:00
Julian Brost
e6d103d0dd HttpServerConnection: use AsioTlsStream::GracefulDisconnect()
This new helper function has proper timeout handling which was missing here.
2024-12-12 12:10:59 +01:00
Julian Brost
007e3fbe7e JsonRpcConnection: use AsioTlsStream::GracefulDisconnect()
This new helper functions allows deduplicating the timeout handling for
`async_shutdown()`.
2024-12-12 12:10:59 +01:00
Julian Brost
56d5811283 AsioTlsStream: add GracefulDisconnect() and ForceDisconnect()
Calling `AsioTlsStream::async_shutdown()` performs a TLS shutdown which
exchanges messages (that's why it takes a `yield_context`) and thus has the
potential to block the coroutine. Therefore, it should be protected with a
timeout. As `async_shutdown()` doesn't simply take a timeout, this has to be
implemented using a timer. So far, these timers are scattered throughout the
codebase with some places missing them entirely. This commit adds helper
functions to properly shutdown a TLS connection with a single function call.
2024-12-12 12:10:59 +01:00
Alexander A. Klimov
188ba53b74 DependencyGraph: switch "parent" and "child" terminology
The .ti files call `DependencyGraph::AddDependency(this, service.get())`. Obviously, `service.get()` is the parent and `this` (Downtime, Notification, ...) is the child. The DependencyGraph terminology should reflect this not to confuse its future users.
2024-12-04 10:57:30 +01:00
Alexander Aleksandrovič Klimov
8f51f54f19
Merge pull request #10221 from Icinga/Al2Klimov-patch-7
JsonRpcConnection: don't write new messages on shutdown
2024-11-29 09:24:10 +01:00
Yonas Habteab
4564c068fe JsonRpcConnection: Log message processing time stats
Co-Authored-By: Julian Brost <julian.brost@icinga.com>
2024-11-27 09:57:38 +01:00
Yonas Habteab
e0b053cbe1 HttpServerConnection: Log noticable CPU semaphore wait time 2024-11-27 09:57:38 +01:00
Yonas Habteab
3218908595
Merge pull request #10214 from Icinga/useless-http-coroutines
HttpServerConnection: Don't spawn useless coroutines
2024-11-19 15:53:54 +01:00
Yonas Habteab
2931aea9bb
Merge pull request #7818 from Icinga/bugfix/no_more_notifications-7758
Don't set Notification#no_more_notifications on custom notifications
2024-11-15 14:43:12 +01:00
Alexander A. Klimov
35a705752f Don't set Notification#no_more_notifications on custom notifications 2024-11-15 13:03:22 +01:00
Alvar Penning
0bbe7a9b2f
IcingaDB Check: Multiple Responsible Instances
By design, only one Icinga 2 instance should be responsible in the HA
context. If this promise is broken, the Icinga 2 IcingaDB check should
report it.

The code did not check for invalid data in icingadb:telemetry:heartbeat.
With this change, it will go CRITICAL with a descriptive message and
report the actual number of icingadb_responsible_instances in the
performance data.
2024-11-15 12:56:45 +01:00
Yonas Habteab
5c0f9bfdaa HttpServerConnection: Don't spawn useless coroutines
Currently, for each `Disconnect()` call, we spawn a coroutine, but every
one of them is just usesless, except the first one. However, since all
`Disconnect()` usages share the same asio strand and cannot interfere
with each other, spawning another coroutine within `Disconnect()` isn't
even necessary. When a coroutine calls `Disconnect()` now, it will
immediately initiate an async shutdown of the socket, potentially causing
the coroutine to yield and allowing the others to resume. Therefore, the
`m_ShuttingDown` flag is still required by the coroutines to be checked
regularly.
2024-11-14 16:47:01 +01:00
Yonas Habteab
d68ee3fcf8
Merge pull request #10224 from Icinga/Empty-constant
Make icinga::Empty constant to prevent accidental changes
2024-11-14 10:35:36 +01:00
Julian Brost
67175c43c0
Merge pull request #10102 from Icinga/icingadb-redis-username
Icinga DB: Config no_user_modify and Support Redis username authentication
2024-11-12 17:04:20 +01:00
Julian Brost
5817e7666b
Merge commit from fork
Security: fix TLS certificate validation bypass
2024-11-12 15:01:57 +01:00
Alexander A. Klimov
09160ea9eb Make icinga::Empty constant to prevent accidental changes 2024-11-11 16:31:04 +01:00
Alexander Aleksandrovič Klimov
aa7f159a0f
JsonRpcConnection: don't write new messages on shutdown
In fact, this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk). So once the remote signals EOF, don't try to process the remaining queue until write error (which can't be associated with a particular message anyway, due to buffering), but just let the peer go. Flush already half-written messages, though, if possible.
2024-11-07 17:32:12 +01:00
Alexander Aleksandrovič Klimov
9a8620d923
Merge pull request #10213 from Icinga/do-not-read-data-on-disconnect
JsonRpcConnection: Don't read any data on shutdown
2024-11-07 12:32:02 +01:00
Alexander Aleksandrovič Klimov
fb64c4f057
Atomic#Atomic(): remove superfluous atomic write 2024-11-06 11:37:02 +01:00
Alexander Aleksandrovič Klimov
a77259adc1
Atomic<T>#Atomic(T): fix C++ compliance
by not calling `std::atomic<T>::atomic(void)`.

After the latter the instance "does not contain a T object, and its only valid uses are destruction and initialization by std::atomic_init" which we don't call. So the only safe option is `std::atomic<T>::atomic(T)`.

https://en.cppreference.com/w/cpp/atomic/atomic/atomic
2024-11-05 13:15:22 +01:00
Yonas Habteab
1c34610a78 JsonRpcConnection: Don't read any data on shutdown
When the `Desconnect()` method is called, clients are not disconnected
immediately. Instead, a new coroutine is spawned using the same strand
as the other coroutines. This coroutine calls `async_shutdown` on the
TCP socket, which might be blocking. However, in order not to block
indefintely, the `Timeout` class cancels all operations on the socket
after `10` seconds. Though, the timeout does not trigger the handler
immediately; it creates spawns another coroutine using the same strand
as in the `JsonRpcConnection` class. This can cause unexpected delays if
e.g. `HandleIncomingMessages` gets resumed before the coroutine from the
timeout class. Apart from that, the coroutine for writing messages uses
the same condition, making the two symmetrical.
2024-10-31 17:09:13 +01:00
Yonas Habteab
d894792c36
Merge pull request #10209 from Icinga/log-error-context-only-once
ApiListener: Log error context only once
2024-10-31 13:14:42 +01:00
Alexander Aleksandrovič Klimov
5f487aff1b
Merge pull request #10201 from Icinga/Validation-failed
Remove redundant "Validation failed" prefix from ValidationError exceptions
2024-10-31 12:30:39 +01:00
Yonas Habteab
8574357443 ApiListener: Log error context only once
When logging at the warning level, the logger will automatically look up
for registered context and append them to the log entry accordingly.
2024-10-30 16:55:13 +01:00
Yonas Habteab
e8b7baa298 JsonRpcConnection: Drop unused m_NextHeartbeat variable 2024-10-30 14:31:48 +01:00
Yonas Habteab
10775f4481
Merge pull request #10207 from Icinga/log-connected-endpoint-connection-attempts
ApiListener: Log connection attempts from an already connected client prominently
2024-10-30 13:31:44 +01:00
Yonas Habteab
9d4625e1ec ApiListener: Log connection attempts from an already connected client
Something is definitely going wrong if a client tries to reconnect to
this endpoint while it still has an active connection to that client. So
we shouldn't hide this, but at least log it at info level. Apart from
that, I've added some additional information about the currently active
client, such as when the last message was sent and received.
2024-10-30 11:26:21 +01:00
Alexander Aleksandrovič Klimov
4ca68e444e
Merge pull request #10204 from Icinga/an-HA
doc/: fix "a HA" -> "an HA"
2024-10-24 11:30:24 +02:00