6413 Commits

Author SHA1 Message Date
Alexander A. Klimov
e8da951c1f ApiListener#RelayMessageOne(): log to which Endpoint messages are relayed
if they're for our parent Zone.
2025-04-07 11:55:21 +02:00
Alexander A. Klimov
fc70825cb2 Zone#GetEndpoints(): return endpoints in the specified order, not randomly
ApiListener#RelayMessageOne() relays every given message to the first connected endpoint Zone#GetEndpoints() returns. Randomness in combination with bad luck can direct more traffic (from a particular network segment) to one master than the admin wants.

This change lets the Zone#endpoints order prefer one endpoint over the other.
2025-04-07 11:55:20 +02:00
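To illustrate the relay behavior described in fc70825cb2, here is a minimal sketch of picking the first connected endpoint from a list kept in configured order. `EndpointSketch` and `PickRelayTarget` are hypothetical names used only for illustration; they are not the actual Icinga 2 classes or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for an Icinga 2 Endpoint; not the real class.
struct EndpointSketch {
    std::string name;
    bool connected;
};

// Returns the first connected endpoint in the given (configured) order,
// or nullptr if no endpoint is connected.
const EndpointSketch* PickRelayTarget(const std::vector<EndpointSketch>& endpoints)
{
    for (const auto& endpoint : endpoints) {
        if (endpoint.connected) {
            return &endpoint;
        }
    }
    return nullptr;
}
```

With a stable order, listing the preferred master first means it receives the relayed messages whenever it is connected; the later entries only act as fallbacks.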
Alexander A. Klimov
cf9ca50d19 Introduce Endpoint#seconds_{reading_messages,awaiting_semaphore,processing_messages} for the API 2025-04-07 11:55:19 +02:00
Alexander A. Klimov
2b1e1a5a08 Benchmark message reading/waiting/processing time per endpoint 2025-04-07 11:55:19 +02:00
Alexander A. Klimov
002d422738 Introduce AtomicDuration 2025-04-07 11:55:19 +02:00
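The commit titles above do not show how `AtomicDuration` is implemented. As a rough sketch of the general idea only (a duration counter that several threads can add to and that can be read as seconds), one could write something like the following. The class name `AtomicDurationSketch` and its members are assumptions, not the Icinga 2 API.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical sketch: accumulates elapsed time as a nanosecond count.
class AtomicDurationSketch {
public:
    // Adds a measured duration; safe to call from several threads.
    AtomicDurationSketch& operator+=(std::chrono::nanoseconds elapsed) noexcept
    {
        m_Nanos.fetch_add(elapsed.count(), std::memory_order_relaxed);
        return *this;
    }

    // Returns the accumulated time in seconds, e.g. for exposing via an API.
    double Seconds() const noexcept
    {
        std::chrono::nanoseconds total(m_Nanos.load(std::memory_order_relaxed));
        return std::chrono::duration<double>(total).count();
    }

private:
    std::atomic<std::int64_t> m_Nanos{0};
};
```

Relaxed memory ordering is used here on the assumption that the counter is a pure statistic and does not synchronize any other data.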
Yonas Habteab
7a26dea9ff IcingaDB: Don't publish useless data to Redis
The Icinga DB daemon processes the data from the `IcingaApplication`
type only and Icinga DB Web also uses only those stats. However, before
this commit, Icinga DB published all kinds of useless stats to Redis
each second, like the number of (un)reachable hosts, services, and so
on, which is a waste of CPU and some other resources. This commit reduces
the published data drastically to only those simple stats coming from
the `IcingaApplication` type.
2025-04-07 11:55:18 +02:00
Yonas Habteab
067131aa51 Value: Add a specialized rvalue reference of Get()
The move `String(Value&&)` constructor tries to partially move `String`
values from a `Value` type. However, since there was no appropriate
`Value::Get<T>()` implementation that binds to the requested move
operation, the compiler will actually not move the value but copy it
instead as the only available implementation of `Value::Get<T>()`
returns a const reference `const T&`. This commit adds a new overload
that returns a non-const reference and allows optionally moving the string
value of a Value type.
2025-04-07 11:55:16 +02:00
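The overload resolution issue described in 067131aa51 can be sketched with ref-qualified member functions. `ValueSketch` below is a hypothetical, heavily simplified stand-in that only holds a string; the real `Value::Get<T>()` is a template over a variant, and its exact return type may differ from the `std::string&&` assumed here.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical stand-in for icinga::Value, reduced to a single string.
class ValueSketch {
public:
    explicit ValueSketch(std::string data) : m_Data(std::move(data)) {}

    // Selected for lvalues: the caller only gets read access, so
    // constructing a new string from the result copies.
    const std::string& Get() const& { return m_Data; }

    // Selected for rvalues (e.g. std::move(value).Get()): the object is
    // expiring, so the caller may move the string out of it.
    std::string&& Get() && { return std::move(m_Data); }

private:
    std::string m_Data;
};
```

Without the second overload, `std::move(value).Get()` would still bind to the `const&` version and silently copy, which is the behavior the commit message describes.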
Yonas Habteab
8236d74669 String: Mark move constructor & assignment op as noexcept
The Icinga DB code performs intensive operations on certain STL containers,
primarily on `std::vector<String>`. Specifically, it inserts 2-3 new elements
at the beginning of a vector containing thousands of elements. Without this commit,
all the existing elements would be unnecessarily copied just to accommodate the new
elements at the front. By making this change, the compiler is able to optimize STL
operations like `push_back`, `emplace_back`, and `insert`, enabling it to prefer the
move constructor over copy operations, provided it is guaranteed that no exceptions
will be thrown.
2025-04-07 11:55:16 +02:00
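The effect of `noexcept` described in 8236d74669 can be observed with two toy types that count their copies. This sketch only covers the reallocation case: when `std::vector` has to move its elements to new storage, it falls back to copying if the move constructor might throw. `ThrowingMove`, `NoexceptMove` and `CopiesOnReallocation` are hypothetical names, not Icinga 2 code.

```cpp
#include <cassert>
#include <type_traits>
#include <vector>

// Move constructor is not declared noexcept, as String was before the commit.
struct ThrowingMove {
    static int copies;
    ThrowingMove() = default;
    ThrowingMove(const ThrowingMove&) { ++copies; }
    ThrowingMove(ThrowingMove&&) {}
};
int ThrowingMove::copies = 0;

// Same type, but with a noexcept move constructor.
struct NoexceptMove {
    static int copies;
    NoexceptMove() = default;
    NoexceptMove(const NoexceptMove&) { ++copies; }
    NoexceptMove(NoexceptMove&&) noexcept {}
};
int NoexceptMove::copies = 0;

// Fills a vector with four elements, forces one reallocation and returns
// how many copy constructions that reallocation caused.
template<typename T>
int CopiesOnReallocation()
{
    std::vector<T> v;
    v.reserve(4);
    for (int i = 0; i < 4; ++i) {
        v.emplace_back(); // constructed in place, no copy or move
    }
    T::copies = 0;
    v.reserve(v.capacity() + 1); // must relocate all existing elements
    return T::copies;
}
```

Calling `CopiesOnReallocation<ThrowingMove>()` reports one copy per existing element, while the `NoexceptMove` variant reports none, because the elements are moved instead.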
Julian Brost
6e545a531b
Merge pull request #10339 from Icinga/fix-perf-data-normalization
PerfData: Don't discard min/max values even if crit/warn thresholds aren’t given
2025-02-04 17:57:05 +01:00
Maciej Dems
56116d2218 Fix missing values in PerfData normalization 2025-02-04 13:53:45 +01:00
Yonas Habteab
c026f03ee2 Don't abruptly close anonymous connections
This was mistakenly introduced with PR #7686 due to too many open
connections (#7680). This was wrong in the sense that closing the
connection is simply out of place here and should have been handled
differently. After we revised the RPC connection disconnect procedure
with `v2.14.4`, it becomes clear why it is wrong, because the connection
is closed abruptly before the corresponding response (`result`) has
even been written. Now if you remove the disconnect here, shouldn't the
issue #7680 occur again, you ask? The answer is no, because we now also
have a maximum timeout of `10s` for anonymous connections, after which
they are automatically closed. Thanks to the introduction of this
timeout by @julianbrost in #8479, this `Disconnect()` call has become
superfluous.
2025-02-04 13:15:41 +01:00
Alexander Aleksandrovič Klimov
c538324040
Merge pull request #10304 from Icinga/win-progfiles-icinga2-var214
On Windows, don't create C:\Program Files\Icinga2\var during MSI build
2025-01-16 14:03:58 +01:00
Alexander A. Klimov
e241a240a8 On Windows, don't create C:\Program Files\Icinga2\var during MSI build 2025-01-16 12:04:12 +01:00
Alexander A. Klimov
32d604f954 Ido*sqlConnection#FieldToEscapedString(): don't write out of range time
MySQL's FROM_UNIXTIME() NULLs ts <1970, errors for >2038.
Postgres' TO_TIMESTAMP() errors for all ts not between 4713BC - 294276AD.
2025-01-14 10:05:36 +01:00
Alexander A. Klimov
ecce7f8dcb Ido*sqlConnection#FieldToEscapedString(): don't overflow timestamps > long 2025-01-14 10:05:36 +01:00
Yonas Habteab
2c0925cedd
Merge pull request #10293 from Icinga/graceful-tls-disconnect-214
Add a dedicated method for disconnecting TLS connections
2025-01-14 10:03:22 +01:00
Yonas Habteab
ee98a9e335
Merge pull request #10298 from Icinga/timestamp-serialization-issues
IcingaDB: limit several numbers not to crash Go daemon
2025-01-13 16:27:27 +01:00
Yonas Habteab
f53e5343c8
Merge pull request #10292 from Icinga/rpc-sync-failures
Runtime RPC sync failures
2025-01-13 14:44:02 +01:00
Alexander A. Klimov
cf895e7e3f IcingaDB::TimestampToMilliseconds(): limit output to four year digits
Too high timestamps may overflow uint64_t (and the YYYY format) and negative
ones don't fit into uint64_t. Those may crash our Go daemon.
2025-01-13 14:30:54 +01:00
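The clamping described in cf895e7e3f might look roughly like the sketch below. The function name `TimestampToMillisSketch` and the exact bounds are assumptions: the upper limit used here is the last millisecond of the year 9999 (UTC), and negative or NaN inputs are mapped to 0.

```cpp
#include <cassert>
#include <cstdint>

// 9999-12-31T23:59:59.999Z in milliseconds since the Unix epoch.
constexpr std::uint64_t kMaxMillis = 253402300799999ULL;

// Converts a Unix timestamp in seconds to milliseconds, clamped so that the
// result fits into uint64_t and into a four-digit year.
std::uint64_t TimestampToMillisSketch(double seconds)
{
    if (!(seconds > 0.0)) {
        return 0; // negative values, zero and NaN
    }
    double millis = seconds * 1000.0;
    if (millis >= static_cast<double>(kMaxMillis)) {
        return kMaxMillis;
    }
    return static_cast<std::uint64_t>(millis);
}
```

Checking the range on the `double` before casting matters, because converting an out-of-range floating-point value to an unsigned integer is undefined behavior in C++.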
Alexander A. Klimov
c21e99a15c IcingaDB#SerializeState(): limit execution_time and latency to 2^32-1
not to write higher values into Redis than the Icinga DB schema can hold.
This fixes yet another potential Go daemon crash.
2025-01-13 14:30:32 +01:00
Yonas Habteab
14b854d891
Merge pull request #10296 from Icinga/comment-loading-nullptr-deference
Address comment loading where host reference is not found gracefully
2025-01-13 13:17:40 +01:00
Yonas Habteab
7defb0c942
Merge pull request #10295 from Icinga/do-not-write-new-messages-on-shutdown
JsonRpcConnection: don't write new messages on shutdown
2025-01-13 13:11:17 +01:00
Yannick Martin
ec2645d33c icinga2: address comment loading where host reference is not found
address #9752: check if host reference is valid
2025-01-13 11:19:42 +01:00
Alexander Aleksandrovič Klimov
ebf905a220 JsonRpcConnection: don't write new messages on shutdown
In fact, this is already done for the outer loop (for each bulk), just not yet for the inner one (for each message of a bulk). So once the remote signals EOF, don't try to process the remaining queue until write error (which can't be associated with a particular message anyway, due to buffering), but just let the peer go. Flush already half-written messages, though, if possible.
2025-01-13 11:17:23 +01:00
Alexander A. Klimov
2bc1c8e1dc Document Timeout 2025-01-13 10:42:36 +01:00
Alexander A. Klimov
e544fef7a2 Timeout: explicitly delete #Timeout(const Timeout&), #Timeout(Timeout&&), #operator=(const Timeout&), #operator=(Timeout&&) 2025-01-13 10:42:36 +01:00
Alexander A. Klimov
d956920bd7 Move Timeout instances from heap to stack 2025-01-13 10:42:36 +01:00
Alexander A. Klimov
1703f99d14 Don't call Timeout#Cancel() where Timeout#~Timeout() is called 2025-01-13 10:42:36 +01:00
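The Timeout commits listed here (deleted copy and move operations, stack allocation, no explicit `Cancel()` before destruction) suggest a scope-bound guard object. A minimal sketch of that pattern follows, assuming the destructor itself cancels the timeout. `TimeoutSketch` is a hypothetical class without any of the real Boost.Asio timer logic.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical scope-bound guard; the real Timeout wraps an asio timer.
class TimeoutSketch {
public:
    TimeoutSketch() = default;

    // Neither copyable nor movable, so an instance stays tied to its scope.
    TimeoutSketch(const TimeoutSketch&) = delete;
    TimeoutSketch(TimeoutSketch&&) = delete;
    TimeoutSketch& operator=(const TimeoutSketch&) = delete;
    TimeoutSketch& operator=(TimeoutSketch&&) = delete;

    // Leaving the scope cancels the timeout, so callers need no explicit
    // Cancel() right before the object is destroyed.
    ~TimeoutSketch() { Cancel(); }

    void Cancel() { m_Cancelled = true; }
    bool IsCancelled() const { return m_Cancelled; }

private:
    bool m_Cancelled = false;
};

static_assert(!std::is_copy_constructible<TimeoutSketch>::value, "must not be copyable");
static_assert(!std::is_move_constructible<TimeoutSketch>::value, "must not be movable");
```

Because such an object cannot be copied or moved, declaring it as a local variable is enough to bound its lifetime to the operation it protects.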
Alexander A. Klimov
fe1420523a Timeout#~Timeout(), #Cancel(): support boost::asio::io_context running on multiple threads 2025-01-13 10:42:36 +01:00
Alexander A. Klimov
a47508b7b3 Timeout#Timeout(): drop unnecessary template parameters 2025-01-13 10:42:36 +01:00
Alexander A. Klimov
d69291739f While using Timeout, don't unnecessarily keep the strand alive via smart pointer 2025-01-13 10:42:36 +01:00
Alexander A. Klimov
ff5ae18b9c Timeout: use a plain callback, not an unnecessary coroutine 2025-01-13 10:42:36 +01:00
Alexander A. Klimov
f839707c4a Timeout#Timeout(): don't pass yield_context to callback
It's not used. Also, the callback shall run completely at once. This ensures that it won't (continue to) run once another coroutine on the strand calls Timeout#Cancel().
2025-01-13 10:42:36 +01:00
Yonas Habteab
a88d6988b4 JsonRpcConnection: Log message processing time stats
Co-Authored-By: Julian Brost <julian.brost@icinga.com>
2025-01-13 10:39:23 +01:00
Yonas Habteab
7225d78047 HttpServerConnection: Log noticeable CPU semaphore wait time 2025-01-13 10:39:23 +01:00
Yonas Habteab
7b30cb3431 Don't endlessly wait on writer coroutine on disconnect 2025-01-13 10:36:21 +01:00
Yonas Habteab
f2fbb61ad8 Log before & after an RPC client is disconnected 2025-01-13 10:36:21 +01:00
Yonas Habteab
7ed5c6a2c7 JsonRpcConnection: Don't drop client from cache prematurely
PR #7445 incorrectly assumed that a peer that had already disconnected
and never reconnected was due to the endpoint client being dropped after
a successful socket shutdown. However, the issue at that time was that
there was not a single timeout guard that could cancel the `async_shutdown`
call, potentially blocking indefinitely. Although removing the client from
cache early might have allowed the endpoint to reconnect, it did not
resolve the underlying problem. Now that we have a proper cancellation
timeout, we can wait until the currently used socket is fully closed
before dropping the client from our cache. When our socket termination
works reliably, the `ApiListener` reconnect timer should attempt to
reconnect this endpoint after the next tick. Additionally, we now have
logs both for before and after socket termination, which may help
identify if it is hanging somewhere in between.
2025-01-13 10:36:21 +01:00
Julian Brost
2fffb28ab0 Add comment for remaining uses of async_shutdown() why it's safe
The reason for introducing AsioTlsStream::GracefulDisconnect() was to handle
the TLS shutdown properly with a timeout since it can block. However,
the implementation of this timeout involves spawning coroutines which are
redundant in some cases. This commit adds comments to the remaining calls of
async_shutdown() stating why calling it is safe in these places.
2025-01-13 10:33:11 +01:00
Julian Brost
f99d35ed91 HttpServerConnection: use AsioTlsStream::GracefulDisconnect()
This new helper function has proper timeout handling which was missing here.
2025-01-13 10:33:11 +01:00
Julian Brost
28776cb37c JsonRpcConnection: use AsioTlsStream::GracefulDisconnect()
This new helper function allows deduplicating the timeout handling for
`async_shutdown()`.
2025-01-13 10:33:11 +01:00
Julian Brost
a593bdfa5f AsioTlsStream: add GracefulDisconnect() and ForceDisconnect()
Calling `AsioTlsStream::async_shutdown()` performs a TLS shutdown which
exchanges messages (that's why it takes a `yield_context`) and thus has the
potential to block the coroutine. Therefore, it should be protected with a
timeout. As `async_shutdown()` doesn't simply take a timeout, this has to be
implemented using a timer. So far, these timers are scattered throughout the
codebase with some places missing them entirely. This commit adds helper
functions to properly shut down a TLS connection with a single function call.
2025-01-13 10:33:11 +01:00
Julian Brost
156ba265e1 Simplify DependencyGraph::RemoveDependency() method 2025-01-13 10:25:48 +01:00
Yonas Habteab
7501525550 ApiListener: Sync runtime configs in order 2025-01-13 10:25:48 +01:00
Yonas Habteab
5c0ce6350c DependencyGraph: Allow lookups by parent & child dependencies 2025-01-13 10:25:48 +01:00
Alexander A. Klimov
c64ff492ec DependencyGraph: use ConfigObject*, not Object*
This saves dynamic_cast<ConfigObject*> + if() on every item of GetChildren().
2025-01-13 10:25:42 +01:00
Alexander A. Klimov
6aa2355427 DependencyGraph: switch "parent" and "child" terminology
The .ti files call `DependencyGraph::AddDependency(this, service.get())`. Obviously, `service.get()` is the parent and `this` (Downtime, Notification, ...) is the child. The DependencyGraph terminology should reflect this not to confuse its future users.
2025-01-13 10:23:28 +01:00
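To make the parent/child terminology from 6aa2355427 concrete, and to show what lookups in both directions can look like, here is a small sketch that keeps two indexes in sync. It identifies objects by name instead of `ConfigObject*` and omits locking; `DependencyGraphSketch` and its members are hypothetical and do not mirror the real container layout.

```cpp
#include <cassert>
#include <map>
#include <set>
#include <string>

// Hypothetical sketch of a dependency graph that can be queried both ways.
class DependencyGraphSketch {
public:
    // Same argument order as the call quoted above: the dependent object
    // (child) comes first, the object it depends on (parent) second.
    void AddDependency(const std::string& child, const std::string& parent)
    {
        m_ChildrenByParent[parent].insert(child);
        m_ParentsByChild[child].insert(parent);
    }

    void RemoveDependency(const std::string& child, const std::string& parent)
    {
        Erase(m_ChildrenByParent, parent, child);
        Erase(m_ParentsByChild, child, parent);
    }

    std::set<std::string> GetChildren(const std::string& parent) const
    {
        return Lookup(m_ChildrenByParent, parent);
    }

    std::set<std::string> GetParents(const std::string& child) const
    {
        return Lookup(m_ParentsByChild, child);
    }

private:
    using Index = std::map<std::string, std::set<std::string>>;

    static std::set<std::string> Lookup(const Index& index, const std::string& key)
    {
        auto it = index.find(key);
        return it == index.end() ? std::set<std::string>() : it->second;
    }

    // Removes value from the set stored under key; drops the set once empty.
    static void Erase(Index& index, const std::string& key, const std::string& value)
    {
        auto it = index.find(key);
        if (it == index.end()) {
            return;
        }
        it->second.erase(value);
        if (it->second.empty()) {
            index.erase(it);
        }
    }

    Index m_ChildrenByParent;
    Index m_ParentsByChild;
};
```

For example, after `AddDependency("downtime1", "service1")`, the service is the parent and the downtime is its child, matching the terminology the commit settles on.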
Alexander A. Klimov
3b40ba1fe4 doc/: fix "a HA" -> "an HA" 2024-12-02 10:13:54 +01:00
Yonas Habteab
d768c90937 HttpServerConnection: Don't spawn useless coroutines
Currently, for each `Disconnect()` call, we spawn a coroutine, but every
one of them is just useless, except the first one. However, since all
`Disconnect()` usages share the same asio strand and cannot interfere
with each other, spawning another coroutine within `Disconnect()` isn't
even necessary. When a coroutine calls `Disconnect()` now, it will
immediately initiate an async shutdown of the socket, potentially causing
the coroutine to yield and allowing the others to resume. Therefore, the
`m_ShuttingDown` flag is still required by the coroutines to be checked
regularly.
2024-11-19 16:08:37 +01:00
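The role of the `m_ShuttingDown` flag in d768c90937 can be sketched without any coroutines. The example assumes that all methods run on the same strand (so no locking is needed) and that only the first `Disconnect()` call does any work. `ConnectionSketch` is a hypothetical class, and the counter merely stands in for the real asynchronous socket shutdown.

```cpp
#include <cassert>

// Hypothetical, single-threaded stand-in for a connection whose methods
// all run on one strand.
class ConnectionSketch {
public:
    // Starts the shutdown on the first call; later calls return immediately.
    void Disconnect()
    {
        if (m_ShuttingDown) {
            return;
        }
        m_ShuttingDown = true;
        ++m_ShutdownsStarted; // placeholder for the async socket shutdown
    }

    // Work loops check this between steps and stop once a shutdown began.
    bool IsShuttingDown() const { return m_ShuttingDown; }

    int ShutdownsStarted() const { return m_ShutdownsStarted; }

private:
    bool m_ShuttingDown = false;
    int m_ShutdownsStarted = 0;
};
```

Since the shutdown may yield to other coroutines on the strand, those coroutines still need to check the flag regularly, as the commit message points out.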
Yonas Habteab
6f9ae05948
Merge pull request #10239 from Icinga/state-before-suppression214
Fix lost recovery notifications after recovery outside of notification time period
2024-11-14 13:49:15 +01:00