1177 Commits

Author SHA1 Message Date
Yonas Habteab
ce3275d27f Disallow stage deletions during reload
Once the new worker process has read the config, it also includes a
`include */include.conf` statement within the config packages root
directory, and from there on we must not allow to delete any stage
directory from the config package. Otherwise, when the worker actually
evaluates that include statement, it will fail to find the directory
where the include file is located, or the `active.conf` file, which is
included from each stage's `include.conf` file, thus causing the worker
fail.

Co-Authored-By: Johannes Schmidt <johannes.schmidt@icinga.com>
2025-07-24 16:02:30 +02:00
Yonas Habteab
1ac4d83963 Use AtomicFile where applicable in ConfigPackageUtility 2025-07-24 10:54:39 +02:00
Yonas Habteab
35f42fa5a3 Handle concurrent config package updates gracefully
Previously, we used a simple boolean to track the state of the package updates,
and didn't reset it back when the config validation was successful because it was
assumed that if we successfully validated the config beforehand, then the worker
would also successfully reload the config afterwards, and that the old worker would
be terminated. However, this assumption is not always true due to a number of reasons
that I can't even think of right now, but the most obvious one is that after we successfully
validated the config, the config  might have changed again before the worker was able
to reload it. If that happens, then the new worker might fail to successfully validate
the config due to the recent changes, in which case the old worker would remain active,
and this flag would still be set to true, causing any subsequent requests to fail with a
`423` until you manually restart the Icinga 2 service.

So, in order to prevent such a situation, we are additionally tracking the last time a reload
failed and allow to bypass the `m_RunningPackageUpdates` flag only if the last reload failed
time was changed since the previous request.
2025-07-24 10:54:39 +02:00
Julian Brost
827f85c327
Merge pull request #10387 from Icinga/cnt-msg
Introduce Endpoint#messages_received_per_type
2025-07-16 17:29:24 +02:00
Johannes Schmidt
82bb636d2b Use WaitGroup to wait for or abort HTTP requests
The wait group gets passed to HttpServerConnection, then down to the
HttpHandlers. For those handlers that modify the program state, the
wait group is locked so ApiListener will wait on Stop() for the
request to complete. If the request iterates over config objects,
a further check on the state of the wait group is added to abort early
and not delay program shutdown. In that case, 503 responses will be
sent to the client.

Additionally, in HttpServerConnection, no further requests than the
one already started will be allowed once the wait group is joining.
2025-06-13 14:48:15 +02:00
Johannes Schmidt
33777f6f3f Disconnect JSON-RPC clients on ApiListner::Stop() 2025-06-13 14:48:15 +02:00
Johannes Schmidt
00802ed9fa Stop ApiListener::ListenerCoroutineProc() when Stop() is called 2025-06-13 14:48:11 +02:00
Julian Brost
4e08adc532
Merge pull request #10161 from Icinga/authBYhost-10157
ApiListener::UpdateObjectAuthority(): distribute auth. by object's host
2025-06-06 15:03:32 +02:00
Yonas Habteab
3d04eb456a
Merge pull request #10415 from Icinga/abort-no-endpoint-conns
Abort connections with no valid endpoint
2025-06-02 16:21:46 +02:00
Julian Brost
c253e7eb6e
Merge pull request #10397 from Icinga/activation-priority-10179
Checkable#ProcessCheckResult(): discard🗑️ CR or delay its producers shutdown
2025-05-28 12:30:40 +02:00
Alexander A. Klimov
f4691dd054 Require to pass WaitGroup::Ptr to several methods
Namely:

Checkable#ProcessCheckResult()
ClusterCheckTask::ScriptFunc()
ClusterZoneCheckTask::ScriptFunc()
DummyCheckTask::ScriptFunc()
ExceptionCheckTask::ScriptFunc()
IcingaCheckTask::ScriptFunc()
IfwApiCheckTask::ScriptFunc()
NullCheckTask::ScriptFunc()
PluginCheckTask::ScriptFunc()
RandomCheckTask::ScriptFunc()
SleepCheckTask::ScriptFunc()
IdoCheckTask::ScriptFunc()
IcingadbCheck::ScriptFunc()
CheckCommand#Execute()
Checkable#ExecuteCheck()
ClusterEvents::ExecuteCheckFromQueue()
ExternalCommandProcessor::Process*CheckResult()
ExternalCommandCallback
ExternalCommandProcessor::Execute()
ExternalCommandProcessor::ExecuteFromFile()
ExternalCommandProcessor::ProcessFile()
LivestatusQuery#ExecuteCommandHelper()
LivestatusQuery#Execute()
2025-05-23 14:53:58 +02:00
Alexander A. Klimov
c7cca7b460 Add a StoppableWaitGroup, and join it on #Stop(), to:
ApiListener
CheckerComponent
ExternalCommandListener
LivestatusListener
2025-05-23 14:53:58 +02:00
Alexander A. Klimov
77b86bba52 Move l_MyCapabilities -> ApiCapabilities::MyCapabilities 2025-05-23 10:44:16 +02:00
Alexander A. Klimov
6cd83ba2b8 ApiListener::UpdateObjectAuthority(): distribute auth. by object's host
Pin child objects of hosts (HOST!...) to the same endpoint as the host.
This reduces cross-object action latency withing the same host.
If all endpoints know this algorithm, we can use it.
2025-05-23 10:44:16 +02:00
Alexander A. Klimov
18f810a1ea For the same zone, call ApiListener::UpdateObjectAuthority() in icinga::Hello
after setting remote capabilities. They'll become important for teamwork in a zone.
2025-05-22 15:01:06 +02:00
Alexander Aleksandrovič Klimov
ec2080dcc1
Merge pull request #9731 from Icinga/fix-compiler-warnings-by-copy-constructing-loop-variables-explicitly
Fix compiler warnings by (copy-)constructing loop variables explicitly or not at all
2025-05-21 14:26:47 +02:00
Alexander A. Klimov
22e75f08fa Fix compiler warnings by not unnecessarily (copy-)constructing loop variables 2025-05-21 11:36:32 +02:00
Alexander A. Klimov
98d097517b Introduce Endpoint#messages_received_per_type 2025-04-29 11:42:14 +02:00
Alexander A. Klimov
3dd7b15808 Count incoming messages per type and endpoint 2025-04-29 11:42:14 +02:00
Alexander A. Klimov
c566f6dc31 ApiFunction: store own name 2025-04-29 11:42:14 +02:00
Alexander A. Klimov
331ba1f661 Rename AsioConditionVariable to AsioEvent
The current implementation is rather similar to Python's threading.Event, than to a CV.
2025-04-29 11:39:42 +02:00
Johannes Schmidt
353386f404 Abort verified JSON-RPC connections with no valid endpoint 2025-04-23 16:55:16 +02:00
Yonas Habteab
1da497be89 Fix inverted IsHACluster check 2025-04-23 15:18:06 +02:00
Alexander A. Klimov
c2ddd20ef3 Fix compiler warnings by (copy-)constructing loop variables explicitly
for (const T& needle : haystack) creates the illusion that haystack is a
container of T and we're just borrowing needle. In these cases that's not true.
2025-04-22 13:55:49 +02:00
Julian Brost
ccfc72267f Prefer icinga::String::GetData() over icinga::String::CStr()
Creating the string_view from the std::string (as returned by GetData()) uses
the stored length instead of having to detect it by finding '\0'.
2025-04-14 17:30:19 +02:00
Alexander Aleksandrovič Klimov
011c67964e Don't use boost::asio::io_context::strand method removed in Boost 1.87 2025-04-14 17:30:19 +02:00
Alexander Aleksandrovič Klimov
7bd35d8c6b Don't use boost::asio::ip::tcp::resolver::query
It was removed in Boost 1.87.
2025-04-14 17:30:19 +02:00
Alexander A. Klimov
402a6bbf40 Remove unused EventQueue::Unregister() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
d19c0637ee Remove unused EventQueue::UnregisterIfUnused() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
41f61ccba4 Remove unused ApiFunction::Unregister() 2025-03-18 11:22:56 +01:00
Alexander A. Klimov
cce03c5903 Remove unused ApiAction::Unregister() 2025-03-18 11:22:56 +01:00
Yonas Habteab
fa63fda75b ApiListener: Simplify deferred SSL shutdown in NewClientHandlerInternal() 2025-03-13 12:12:28 +01:00
Julian Brost
21c9ad5323
Merge pull request #10332 from Icinga/do-not-close-connection-in-request-cert-handler
Don't abruptly close anonymous connections
2025-02-04 10:58:17 +01:00
Alexander Aleksandrovič Klimov
065dfe4c40
Merge pull request #9928 from Icinga/no-data-received-on-new-api-connection
API: also log error behind "No data received on new API connection"
2025-02-03 15:39:26 +01:00
Yonas Habteab
25bbac1677 Don't abruptly close anonymous connections
This was mistakenly introduced with PR #7686 due to too many open
connections (#7680). This was wrong in the sense that closing the
connection is simply out of place here and should have been handled
differently. After we revised the RPC connection disconnect procedure
with `v2.14.4`, it becomes clear why it is wrong, because the connection
is closed abruptly before the corresponding response (`result`) has
even been written. Now if you remove the disconnect here, shouldn't the
issue #7680 occur again, you ask? The answer is no, because we now also
have a maximum timeout of `10s` for anonymous connections, after which
they are automatically closed. Thanks to the introduction of this
timeout by @julianbrost in #8479, this `Disconnect()` call has become
superfluous.
2025-01-30 17:45:27 +01:00
Alexander A. Klimov
411c57aac5 API: also log error behind "No data received on new API connection" 2025-01-24 11:28:16 +01:00
Julian Brost
78883669d3
Merge pull request #8169 from Icinga/bugfix/object-query-all-attrs-8167
GET /v1/objects/*: handle "attrs":[] as expected
2025-01-24 09:14:17 +01:00
Alexander A. Klimov
e18c923abb GET /v1/objects/*: handle "attrs":[] as expected
... i.e. yield no attrs and not all.

refs #8167
2025-01-21 11:36:55 +01:00
Alexander Aleksandrovič Klimov
866db3ba3c
Merge pull request #10137 from Icinga/win-progfiles-icinga2-var
On Windows, don't create C:\Program Files\Icinga2\var during MSI build
2025-01-16 12:02:33 +01:00
Yonas Habteab
1425641931 Don't endlessly wait on writer coroutine on disconnect 2025-01-08 16:30:36 +01:00
Yonas Habteab
41373ad0e5 Log before & after an RPC client is disconnected 2025-01-08 16:30:36 +01:00
Yonas Habteab
3af7cfe2ec JsonRpcConnection: Don't drop client from cache prematurely
PR #7445 incorrectly assumed that a peer that had already disconnected
and never reconnected was due to the endpoint client being dropped after
a successful socket shutdown. However, the issue at that time was that
there was not a single timeout guards that could cancel the `async_shutdown`
call, petentially blocking indefinetely. Although removing the client from
cache early might have allowed the endpoint to reconnect, it did not
resolve the underlying problem. Now that we have a proper cancellation
timeout, we can wait until the currently used socket is fully closed
before dropping the client from our cache. When our socket termination
works reliably, the `ApiListener` reconnect timer should attempt to
reconnect this endpoint after the next tick. Additionally, we now have
logs both for before and after socket termination, which may help
identify if it is hanging somewhere in between.
2025-01-08 16:30:36 +01:00
Alexander A. Klimov
27e0e236cb Move Timeout instances from heap to stack 2025-01-07 18:20:50 +01:00
Alexander A. Klimov
d77d7506f1 Don't call Timeout#Cancel() where Timeout#~Timeout() is called 2025-01-07 18:20:14 +01:00
Alexander A. Klimov
cb51649363 Timeout#Timeout(): drop unnecessary template parameters 2025-01-07 18:19:39 +01:00
Alexander A. Klimov
d2285bcf0e While using Timeout, don't unnecessarily keep the strand alive via smart pointer 2025-01-07 18:18:46 +01:00
Alexander A. Klimov
92ab913226 Timeout#Timeout(): don't pass yield_context to callback
It's not used. Also, the callback shall run completely at once. This ensures that it won't (continue to) run once another coroutine on the strand calls Timeout#Cancel().
2025-01-07 18:18:18 +01:00
Yonas Habteab
ff0e12e6ac ApiListener: Sync runtime configs in order 2025-01-07 11:07:46 +01:00
Alexander Aleksandrovič Klimov
383773eb2b
Merge pull request #10264 from Icinga/DependencyGraph-ConfigObject
DependencyGraph: use ConfigObject*, not Object*
2024-12-18 13:36:56 +01:00
Alexander A. Klimov
3a09cf72d6 DependencyGraph: use ConfigObject*, not Object*
This saves dynamic_cast<ConfigObject*> + if() on every item of GetChildren().
2024-12-17 18:33:05 +01:00