6509 Commits

Author SHA1 Message Date
Yonas Habteab
e0fd0d3df4 Introduce & use enum DowntimeRemovalReason 2024-05-23 09:34:15 +02:00
Alexander Aleksandrovič Klimov
cc3965c3ce
Merge pull request #10065 from Icinga/heavy-update-missing-table-relations
Update `object#config_hash` after all relations queries
2024-05-22 15:38:31 +02:00
Yonas Habteab
1019398d55 Update object#config_hash after all relations queries 2024-05-22 13:39:30 +02:00
Yonas Habteab
3d64240ee3
Merge pull request #10066 from Icinga/Checkable-RemoveAllDowntimes
Remove unused Checkable#RemoveAllDowntimes()
2024-05-21 17:13:16 +02:00
Alexander A. Klimov
e2bdb8a2f1 Remove unused Checkable#RemoveAllDowntimes() 2024-05-21 14:28:39 +02:00
Alexander A. Klimov
f9adf18111 IcingaDB#SerializeState(): limit execution_time and latency to 2^32-1
not to write higher values into Redis than the Icinga DB schema can hold.
This fixes yet another potential Go daemon crash.
2024-05-15 12:55:41 +02:00
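The clamping described in f9adf18111 can be illustrated with a minimal sketch. It is not the actual `IcingaDB#SerializeState()` code: the helper name `clamp_to_uint32` and the choice of milliseconds as the unit are assumptions made for illustration only.

```python
# Largest value an unsigned 32-bit field can hold (the 2^32-1 from the commit message).
UINT32_MAX = 2**32 - 1


def clamp_to_uint32(value: int) -> int:
    """Cap a non-negative integer at 2^32-1 (hypothetical helper)."""
    return min(value, UINT32_MAX)


# Assumed unit: milliseconds. 5_000_000_000 ms is roughly 58 days and would not fit.
execution_time = clamp_to_uint32(5_000_000_000)
latency = clamp_to_uint32(1_234)
```

Values that already fit pass through unchanged; only oversized ones are capped before being written out.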
Alexander Aleksandrovič Klimov
8c2eb3c1ed
Merge pull request #10049 from Icinga/AddDowntime-trigger_name
Downtime::AddDowntime(): NULL-check pointer before deref not to crash
2024-05-06 10:26:26 +02:00
Alexander Aleksandrovič Klimov
d8f8d64f1a
Merge pull request #10027 from macdems/master
Fix missing values in PerfData normalization
2024-04-25 19:38:21 +02:00
Maciej Dems
2bb5cc62e2 Fix missing values in PerfData normalization 2024-04-25 17:41:12 +02:00
Alexander A. Klimov
5f80ac17aa l_LegacyDowntimesCache: delete removed objects not to leak memory 2024-04-25 12:13:52 +02:00
Alexander A. Klimov
c0f87dd4c9 /v1/actions/schedule-downtime: reject request on invalid trigger_name
For this purpose, look up the specified Downtime. Also pass Downtime objects,
not just names, to Downtime::AddDowntime() so as not to look it up twice.
2024-04-25 12:13:52 +02:00
Alexander A. Klimov
f0b5239a15 [Refactor] Downtime::GetDowntimeIDFromLegacyID(): return the Downtime itself
not just its name.
2024-04-25 12:13:52 +02:00
Alexander A. Klimov
28b0f7a48c [Refactor] l_LegacyDowntimesCache: store Downtime objects, not just their names
to avoid names of vanished objects.
2024-04-24 12:33:56 +02:00
Alexander A. Klimov
bb13e98ca5 PluginCheckTask::ProcessFinishedHandler(): warn about exit codes outside 0..3
in the plugin output as well, in addition to the warning log.
2024-04-23 17:45:31 +02:00
Alexander A. Klimov
e33befabfb Make ProcessResult#ExitStatus and CheckResult#exit_status 64-bit ints
so that they can hold Windows exit codes like 3221225477 (>2147483647).
2024-04-23 17:45:31 +02:00
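Why e33befabfb needs a wider type can be shown with the exit code quoted in its message. The sketch below uses Python's `ctypes` fixed-width integers purely as a stand-in for the C++ types involved. It does not reproduce the actual `ProcessResult` or `CheckResult` definitions.

```python
import ctypes

exit_code = 3221225477  # 0xC0000005, the example from the commit message

INT32_MAX = 2**31 - 1  # 2147483647
fits_in_int32 = exit_code <= INT32_MAX

# ctypes performs no overflow checking, so the value wraps like a C cast would.
wrapped = ctypes.c_int32(exit_code).value
preserved = ctypes.c_int64(exit_code).value
```

Stored in a signed 32-bit integer, the code wraps around to a negative number, while a 64-bit integer keeps it intact.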
Alexander A. Klimov
5c17465a19 OpenTsdbWriter#CheckResultHandler(): skip custom tags with empty values
refs #7724
2024-04-18 11:36:21 +02:00
Yannick Martin
5e92450877 icinga2: address comment loading where host reference is not found
address #9752: check if host reference is valid
2024-03-11 12:42:23 +01:00
Julian Brost
31be43ff6c
Merge pull request #10018 from Icinga/revert-9980-config-sync-conflicts
Revert "Process `config::update/delete` cluster events gracefully"
2024-03-08 16:58:28 +01:00
Julian Brost
af97431bfb
Merge pull request #10006 from Icinga/http-error-handling
HttpServerConnection: use exceptions for error handling
2024-03-08 15:06:51 +01:00
Yonas Habteab
a924a49cd8
Revert "Process config::update/delete cluster events gracefully" 2024-03-07 17:17:17 +01:00
Julian Brost
097ba00a9c
Merge pull request #10008 from Icinga/Al2Klimov-patch-12
Don't unnecessarily shuffle items before config validation
2024-03-07 16:44:38 +01:00
Alexander Aleksandrovič Klimov
629038344b
OpenTsdbWriter#CheckResultHandler(): clarify log messages
Clarify which "host or service" an "Unable to resolve macro" debug log message refers to.
2024-02-22 10:34:35 +01:00
Julian Brost
abea2f270c
Merge pull request #9997 from Icinga/ListenerCoroutineProc-remote_endpoint
ApiListener#ListenerCoroutineProc(): get remote endpoint ASAP for logging
2024-02-20 13:46:02 +01:00
Alexander Aleksandrovič Klimov
51cdd593da
Don't unnecessarily shuffle items before config validation
Before ae693cb7e1df1b885142854cf8a0f8a7600a3fb7 (#9577) we've repeatedly looped over all items in parallel like this:

while not types.done:
  for t in types:
    if not t.done and t.dependencies.done:
      with parallel(all_items, CONCURRENCY) as some_items:
        for i in some_items:
          if i.type is t:
            i.commit()

I.e. all items got distributed over CONCURRENCY threads, but not always equally. E.g. it was the hosts' turn, but only two threads got hosts and did all the work. The others didn't do actual work (due to the lack of hosts in their queue) which reduced the performance. c721c302cd9c96bee25a20b3862dad347345648a (#6581) fixed it by shuffling all_items first. ae693cb7e1df1b885142854cf8a0f8a7600a3fb7 (#9577) made the latter unnecessary by replacing the above algorithm with this:

while not types.done:
  for t in types:
    if not t.done and t.dependencies.done:
      with parallel(all_items[t], CONCURRENCY) as some_items:
        for i in some_items:
          if i.type is t:
            i.commit()

I.e. parallel() gets only items of type t, so all threads get e.g. hosts.
2024-02-19 14:26:06 +01:00
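The load imbalance described in 51cdd593da can be reproduced with a small simulation. It assumes that `parallel()` hands each thread one contiguous slice of its input; the real `WorkQueue` distribution may differ. The helpers `chunks` and `busy_threads` are hypothetical.

```python
import random

CONCURRENCY = 4


def chunks(items, n):
    """Split items into n contiguous, nearly equal slices (assumed distribution)."""
    size, rest = divmod(len(items), n)
    out, start = [], 0
    for k in range(n):
        end = start + size + (1 if k < rest else 0)
        out.append(items[start:end])
        start = end
    return out


def busy_threads(items, wanted_type, n=CONCURRENCY):
    """Count the slices that contain at least one item of wanted_type."""
    return sum(any(t == wanted_type for t in chunk) for chunk in chunks(items, n))


# Items grouped by type, as they might come out of config loading.
all_items = ["host"] * 100 + ["service"] * 300

# Old algorithm without shuffling: all hosts sit in the first slice.
old_unshuffled = busy_threads(all_items, "host")

# Old algorithm with the shuffle from #6581: hosts spread over the slices.
shuffled = all_items[:]
random.Random(0).shuffle(shuffled)
old_shuffled = busy_threads(shuffled, "host")

# New algorithm from #9577: only hosts are passed in, so no shuffle is needed.
hosts_only = [t for t in all_items if t == "host"]
new_per_type = busy_threads(hosts_only, "host")
```

In this setup only one thread does the work in the unshuffled case, while both the shuffled and the per-type variants keep all four threads busy, which is why the shuffle became redundant.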
Julian Brost
700c5a13d7 HttpServerConnection: use exceptions for error handling
When an HTTP connection dies prematurely while the response is being sent,
`http::async_write()` sets the error code to something like broken pipe.
When `async_flush()` is called afterwards, it sometimes never returns.
This results in a resource leak as the coroutine isn't cleaned up. This
commit makes the individual functions throw exceptions instead of silently
ignoring the errors, so the function terminates early and the error is
logged.
2024-02-19 14:12:41 +01:00
Julian Brost
04ef105caa
Merge pull request #9980 from Icinga/config-sync-conflicts
Process `config::update/delete` cluster events gracefully
2024-02-19 13:49:41 +01:00
Julian Brost
7d1c887a32
Merge pull request #9999 from Icinga/reset-log-message-count-correctly
ApiListener: Reset `m_LogMessageCount` when rotating
2024-02-15 17:06:16 +01:00
Alexander Aleksandrovič Klimov
9db1c4aca3
Merge pull request #8011 from Icinga/bugfix/reset-sigpipe-6912
Reset all signal handlers of child processes
2024-02-15 12:22:36 +01:00
Yonas Habteab
456144c1dc ApiListener: Process cluster config updates sequentially 2024-02-14 14:25:53 +01:00
Yonas Habteab
40011b0584 Introduce ObjectNamesMutex helper class 2024-02-14 14:25:53 +01:00
Alexander Aleksandrovič Klimov
1a8ce5a90e
Merge pull request #9575 from Icinga/WorkQueue-ParallelFor
WorkQueue#ParallelFor(): allocate lambda once per thread, not once per item
2024-02-14 12:59:50 +01:00
Julian Brost
2be08aa2e0
Merge pull request #9992 from Icinga/remove-redundat-cpu-bound-work
Drop redundant `CpuBoundWork` usage in `JsonRpcConnection::Disconnect()`
2024-02-13 15:51:34 +01:00
Julian Brost
fc6a106345
Merge pull request #9994 from Icinga/redundant-cpu-bound-work-usages
Drop redundant `CpuBoundWork` usages in `lib/remote`
2024-02-13 14:53:59 +01:00
Alexander Aleksandrovič Klimov
48eb563ca0
Merge pull request #9736 from Icinga/stream-read-allow_partial
Stream#Read(): remove de facto unused param allow_partial
2024-02-13 13:04:15 +01:00
Yonas Habteab
008fcd1744 Preserve runtime objects in a tmp file for the entire validation process
Given that the internal `config::Update` cluster events use this as well
to create received runtime objects, we don't want to first persist the
conf file and then load and validate it with `CompileFile`. Otherwise, we
are forced to remove the newly created file whenever we can't validate,
commit or activate it. This would also have the downside that two cluster
events for the same object arriving at the same moment from two different
endpoints would result in two different threads simultaneously creating
and loading the same config file, whereby only one of them passes the
validation, while the other faces an object `re-definition` error and
tries to remove the config file it mistakenly thinks it has created. As a
consequence, an object successfully created by the former is implicitly
deleted by the latter thread, causing the objects to mysteriously
disappear.
2024-02-12 15:18:32 +01:00
Yonas Habteab
6e66cd9aff ApiListener: Reset m_LogMessageCount when rotating
Closing and re-opening that very same log file shouldn't reset the
counter, otherwise some log files may exceed the max limit per file as
their offset indicator is reset each time they are re-opened.
2024-02-09 18:04:20 +01:00
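The counter behaviour described in 6e66cd9aff can be sketched as follows. The class `ReplayLog` and its methods are hypothetical and only model the description above. They are not the actual `ApiListener` code.

```python
class ReplayLog:
    """Hypothetical model of a log file with a per-file message limit."""

    def __init__(self, max_messages: int):
        self.max_messages = max_messages
        self.message_count = 0  # plays the role of m_LogMessageCount
        self.rotations = 0

    def reopen(self) -> None:
        # Same file as before: the counter is deliberately left untouched.
        pass

    def rotate(self) -> None:
        # A new file starts here, so only now the counter is reset.
        self.rotations += 1
        self.message_count = 0

    def write(self, message: str) -> None:
        self.message_count += 1
        if self.message_count >= self.max_messages:
            self.rotate()


log = ReplayLog(max_messages=3)
log.write("a")
log.reopen()  # must not reset the count
log.write("b")
log.write("c")  # third message in the same file triggers the rotation
```

If `reopen()` reset the counter instead, the file in this example would receive its third message without ever being rotated.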
Yonas Habteab
eb813cfb99 HttpServerConnection: Drop superfluous CpuBoundWork usage 2024-02-09 15:17:26 +01:00
Alexander A. Klimov
62e1d7650d ApiListener#ListenerCoroutineProc(): get remote endpoint ASAP for logging
On incoming connection timeout we log the remote endpoint, which isn't
available if the peer has already disconnected: an exception is thrown
instead. Fetch it while we're still connected so that we neither lose it
nor get an exception.
2024-02-09 12:27:25 +01:00
Yonas Habteab
32531fe909 EventsHandler: Drop superfluous CpuBoundWork usage 2024-02-09 12:00:50 +01:00
Eric Lippmann
c7293de91d IoEngine: Always log coroutine exception diagnostics
While analyzing a possible memory leak, we encountered several coroutine
exception messages, which unfortunately do not provide any information
about what exactly went wrong, as exception diagnostics were previously
only logged at the notice level.
2024-02-08 12:09:06 +01:00
Yonas Habteab
72266434df Drop redundant CpuBoundWork usages in lib/remote 2024-02-08 11:30:23 +01:00
Yonas Habteab
e2793f1d88 Drop redundant CpuBoundWork usage in JsonRpcConnection::Disconnect()
Although there is locking involved here, it shouldn't take too long for
the thread to actually acquire it, since there aren't that many threads
dealing with endpoint clients concurrently. Trying to obtain a CPU slot
here just wastes time.
2024-02-08 11:24:55 +01:00
Alexander Aleksandrovič Klimov
e9fcbf400f
Merge pull request #9966 from Icinga/Al2Klimov-patch-3
HttpServerConnection: remove duplicate ")" from a log message
2024-01-18 10:46:51 +01:00
Alexander A. Klimov
d48b369554 Reset all signal handlers of child processes
... not to disturb check plugins.

refs #6912
2024-01-17 12:25:59 +01:00
Alexander Aleksandrovič Klimov
966b46e808
Merge pull request #9965 from Icinga/http-request-time
HttpServerConnection: log request processing time as well
2024-01-17 11:30:33 +01:00
Julian Brost
b1fe15f694
Merge pull request #9962 from Icinga/influx-disk-9948
Influx DB: truncate timestamps to whole seconds to save disk space
2024-01-17 08:50:16 +01:00
Alexander A. Klimov
b6874cc8d4 HttpServerConnection: log request processing time as well 2024-01-16 17:52:07 +01:00
Alexander Aleksandrovič Klimov
6a4cb5c12c
HttpServerConnection: remove duplicate ")" from a log message
The commit 5c32a5a7dcd220598d36b2b47e745d14c23edb93, which introduced it, clearly shows that the other ")" already existed legitimately.
2024-01-16 16:31:00 +01:00
Alexander A. Klimov
cc9db3756f Revert "Influx DB: don't unneccessarily truncate timestamps to whole seconds"
This reverts commit eaa3cd83adf860732b955a77b8f5fca7e30c65c2.
2024-01-16 12:19:48 +01:00
Alexander A. Klimov
fc5b1178c6 Revert "Remove no-op InfluxDB URL param"
This reverts commit 21f548d3c07189c6a413cf88c2b60cc9ada73497.
2024-01-16 12:19:47 +01:00