6527 Commits

Author SHA1 Message Date
Julian Brost
4c83d793a6
Merge pull request #9983 from Icinga/broken-timeperiod
Fix broken `TimePeriod/ScheduledDowntime`s
2024-08-20 10:05:59 +02:00
Yonas Habteab
ca7cc54438 Checkable: Don't recalculate next_check while processing remotely genrated check
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.

Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult`and invoke
`checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and causes the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.
2024-08-16 16:15:56 +02:00
Alexander Aleksandrovič Klimov
02ba5e4101
Merge pull request #10015 from Icinga/malloc_info
/v1/debug/malloc_info: call malloc_info(3) if available
2024-08-12 14:41:09 +02:00
Alexander A. Klimov
f3c7ac11e9 /v1/debug/malloc_info: call malloc_info(3) if available
The GNU libc function malloc_info(3) provides memory allocation and usage
statistics of Icinga 2 itself.
2024-08-09 12:59:25 +02:00
Julian Brost
2bfa1f1649
Merge pull request #10107 from Icinga/timeperiod-nth-day-of-month-off-by-one
Timeperiods: fix off by one when calculating n-th last weekday of the month
2024-08-08 14:40:18 +02:00
Julian Brost
c45829b59f Timeperiods: fix off by one when calculating n-th last weekday of the month
A day specification like "monday -1" refers to the last Monday of the month.
However, there was an off by one if the first day of the next month is the same
day of the week, i.e. a Monday in this example.

LegacyTimePeriod::FindNthWeekday() picks a day to start the search for the day
in question. When given a negative n to search for the n-th last day, it
wrongly used the first day of the following month as the start and counted it
as if it was within the current month. This resulted in a 1/7 chance that the
result was one week too late.

This is fixed by using the last day of the current month instead.
2024-08-07 12:06:05 +02:00
Yonas Habteab
c4edecc1fb Unregister invalid config objects properly 2024-08-06 16:59:30 +02:00
Julian Brost
07d253009a
Merge pull request #10013 from Icinga/broken-runtime-config-sync
Fix broken runtime config sync
2024-08-06 11:57:24 +02:00
Yonas Habteab
86347013a6 Check segemnt start date inclusively in TimePeriod::IsInside() 2024-08-01 16:16:48 +02:00
Yonas Habteab
4daa03dc02 Fix broken timeperiods/scheduleddowntimes 2024-08-01 15:14:34 +02:00
Yonas Habteab
546dea95a2 Don't allow to modify/create/delete an object concurrently 2024-06-13 11:26:19 +02:00
Yonas Habteab
099f664ce6 ConfigObjectUtility#CreateObject(): Use Defer for config path cleanup 2024-06-13 11:26:19 +02:00
Yonas Habteab
433e2de13a ApiListener: Process cluster config updates sequentially 2024-06-13 11:26:19 +02:00
Yonas Habteab
1a55b68541 Introduce RAII style ObjectNameLock class 2024-06-13 11:26:19 +02:00
Yonas Habteab
2218ebd6b0 ConfigObjectUtility: Use AtomicFile to store object config files 2024-06-13 11:26:19 +02:00
Alexander Aleksandrovič Klimov
f1be9b73ab
Merge pull request #10060 from Icinga/IcingaDB-SerializeState-execution_time-latency
IcingaDB#SerializeState(): limit execution_time and latency to 2^32-1
2024-06-13 09:55:45 +02:00
Yonas Habteab
81a94a0759 Don't fail to remove obsolete downtimes 2024-05-23 10:09:41 +02:00
Yonas Habteab
4eeccce36c Don't loose args in recursive Downtime::RemoveDowntime() call 2024-05-23 10:09:41 +02:00
Yonas Habteab
e0fd0d3df4 Introduce & use enum DowntimeRemovalReason 2024-05-23 09:34:15 +02:00
Alexander Aleksandrovič Klimov
cc3965c3ce
Merge pull request #10065 from Icinga/heavy-update-missing-table-relations
Update `object#config_hash` after all relations queries
2024-05-22 15:38:31 +02:00
Yonas Habteab
1019398d55 Update object#config_hash after all relations queries 2024-05-22 13:39:30 +02:00
Yonas Habteab
3d64240ee3
Merge pull request #10066 from Icinga/Checkable-RemoveAllDowntimes
Remove unused Checkable#RemoveAllDowntimes()
2024-05-21 17:13:16 +02:00
Alexander A. Klimov
e2bdb8a2f1 Remove unused Checkable#RemoveAllDowntimes() 2024-05-21 14:28:39 +02:00
Alexander A. Klimov
f9adf18111 IcingaDB#SerializeState(): limit execution_time and latency to 2^32-1
not to write higher values into Redis than the Icinga DB schema can hold.
This fixes yet another potential Go daemon crash.
2024-05-15 12:55:41 +02:00
Alexander Aleksandrovič Klimov
8c2eb3c1ed
Merge pull request #10049 from Icinga/AddDowntime-trigger_name
Downtime::AddDowntime(): NULL-check pointer before deref not to crash
2024-05-06 10:26:26 +02:00
Alexander Aleksandrovič Klimov
d8f8d64f1a
Merge pull request #10027 from macdems/master
Fix missing values in PerfData normalization
2024-04-25 19:38:21 +02:00
Maciej Dems
2bb5cc62e2 Fix missing values in PerfData normalization 2024-04-25 17:41:12 +02:00
Alexander A. Klimov
5f80ac17aa l_LegacyDowntimesCache: delete removed objects not to leak memory 2024-04-25 12:13:52 +02:00
Alexander A. Klimov
c0f87dd4c9 /v1/actions/schedule-downtime: reject request on invalid trigger_name
For this purpose lookup the specified Downtime. Also pass Downtime objects,
not just names, to Downtime::AddDowntime() not to lookup it twice.
2024-04-25 12:13:52 +02:00
Alexander A. Klimov
f0b5239a15 [Refactor] Downtime::GetDowntimeIDFromLegacyID(): return the Downtime itself
not just its name.
2024-04-25 12:13:52 +02:00
Alexander A. Klimov
28b0f7a48c [Refactor] l_LegacyDowntimesCache: store Downtime objects, not just their names
to avoid names of vanished objects.
2024-04-24 12:33:56 +02:00
Alexander A. Klimov
bb13e98ca5 PluginCheckTask::ProcessFinishedHandler(): warn about exit codes outside 0..3
in the plugin output as well, in addition to the warning log.
2024-04-23 17:45:31 +02:00
Alexander A. Klimov
e33befabfb Make ProcessResult#ExitStatus and CheckResult#exit_status 64-bit ints
so that they can hold Windows exit codes like 3221225477 (>2147483647).
2024-04-23 17:45:31 +02:00
Alexander A. Klimov
5c17465a19 OpenTsdbWriter#CheckResultHandler(): skip custom tags with empty values
refs #7724
2024-04-18 11:36:21 +02:00
Yannick Martin
5e92450877 icinga2: address comment loading where host reference is not found
address #9752: check if host reference is valid
2024-03-11 12:42:23 +01:00
Julian Brost
31be43ff6c
Merge pull request #10018 from Icinga/revert-9980-config-sync-conflicts
Revert "Process `config::update/delete` cluster events gracefully"
2024-03-08 16:58:28 +01:00
Julian Brost
af97431bfb
Merge pull request #10006 from Icinga/http-error-handling
HttpServerConnection: use exceptions for error handling
2024-03-08 15:06:51 +01:00
Yonas Habteab
a924a49cd8
Revert "Process config::update/delete cluster events gracefully" 2024-03-07 17:17:17 +01:00
Julian Brost
097ba00a9c
Merge pull request #10008 from Icinga/Al2Klimov-patch-12
Don't unnecessarily shuffle items before config validation
2024-03-07 16:44:38 +01:00
Alexander Aleksandrovič Klimov
629038344b
OpenTsdbWriter#CheckResultHandler(): clarify log messages
Clarify which "host or service" an "Unable to resolve macro" debug log message refers to.
2024-02-22 10:34:35 +01:00
Julian Brost
abea2f270c
Merge pull request #9997 from Icinga/ListenerCoroutineProc-remote_endpoint
ApiListener#ListenerCoroutineProc(): get remote endpoint ASAP for logging
2024-02-20 13:46:02 +01:00
Alexander Aleksandrovič Klimov
51cdd593da
Don't unnecessarily shuffle items before config validation
Before ae693cb7e1df1b885142854cf8a0f8a7600a3fb7 (#9577) we've repeatedly looped over all items in parallel like this:

while not types.done:
  for t in types:
    if not t.done and t.dependencies.done:
      with parallel(all_items, CONCURRENCY) as some_items:
        for i in some_items:
          if i.type is t:
            i.commit()

I.e. all items got distributed over CONCURRENCY threads, but not always equally. E.g. it was the hosts' turn, but only two threads got hosts and did all the work. The others didn't do actual work (due to the lack of hosts in their queue) which reduced the performance. c721c302cd9c96bee25a20b3862dad347345648a (#6581) fixed it by shuffling all_items first. ae693cb7e1df1b885142854cf8a0f8a7600a3fb7 (#9577) made the latter unnecessary by replacing the above algorithm with this:

while not types.done:
  for t in types:
    if not t.done and t.dependencies.done:
      with parallel(all_items[t], CONCURRENCY) as some_items:
        for i in some_items:
          if i.type is t:
            i.commit()

I.e. parallel() gets only items of type t, so all threads get e.g. hosts.
2024-02-19 14:26:06 +01:00
Julian Brost
700c5a13d7 HttpServerConnection: use exceptions for error handling
When a HTTP connection dies prematurely while the response is sent,
`http::async_write()` sets the error code to something like broken pipe for
example. When calling `async_flush()` afterwards, it sometimes happens that
this never returns. This results in a resource leak as the coroutine isn't
cleaned up. This commit makes the individual functions throw exceptions instead
of silently ignoring the errors, resulting in the function terminating early
and also resulting in an error being logged as well.
2024-02-19 14:12:41 +01:00
Julian Brost
04ef105caa
Merge pull request #9980 from Icinga/config-sync-conflicts
Process `config::update/delete` cluster events gracefully
2024-02-19 13:49:41 +01:00
Julian Brost
7d1c887a32
Merge pull request #9999 from Icinga/reset-log-message-count-correctly
ApiListener: Reset `m_LogMessageCount` when rotating
2024-02-15 17:06:16 +01:00
Alexander Aleksandrovič Klimov
9db1c4aca3
Merge pull request #8011 from Icinga/bugfix/reset-sigpipe-6912
Reset all signal handlers of child processes
2024-02-15 12:22:36 +01:00
Yonas Habteab
456144c1dc ApiListener: Process cluster config updates sequentially 2024-02-14 14:25:53 +01:00
Yonas Habteab
40011b0584 Introduce ObjectNamesMutex helper class 2024-02-14 14:25:53 +01:00
Alexander Aleksandrovič Klimov
1a8ce5a90e
Merge pull request #9575 from Icinga/WorkQueue-ParallelFor
WorkQueue#ParallelFor(): allocate lambda once per thread, not once per item
2024-02-14 12:59:50 +01:00
Julian Brost
2be08aa2e0
Merge pull request #9992 from Icinga/remove-redundat-cpu-bound-work
Drop redundant `CpuBoundWork` usage in `JsonRpcConnection::Disconnect()`
2024-02-13 15:51:34 +01:00