Commit Graph

6425 Commits

Author SHA1 Message Date
Yonas Habteab ca7cc54438 Checkable: Don't recalculate `next_check` while processing remotely genrated check
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.

Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult`and invoke
`checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and causes the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.
2024-08-16 16:15:56 +02:00
Alexander Aleksandrovič Klimov 02ba5e4101
Merge pull request #10015 from Icinga/malloc_info
/v1/debug/malloc_info: call malloc_info(3) if available
2024-08-12 14:41:09 +02:00
Alexander A. Klimov f3c7ac11e9 /v1/debug/malloc_info: call malloc_info(3) if available
The GNU libc function malloc_info(3) provides memory allocation and usage
statistics of Icinga 2 itself.
2024-08-09 12:59:25 +02:00
Julian Brost 2bfa1f1649
Merge pull request #10107 from Icinga/timeperiod-nth-day-of-month-off-by-one
Timeperiods: fix off by one when calculating n-th last weekday of the month
2024-08-08 14:40:18 +02:00
Julian Brost c45829b59f Timeperiods: fix off by one when calculating n-th last weekday of the month
A day specification like "monday -1" refers to the last Monday of the month.
However, there was an off by one if the first day of the next month is the same
day of the week, i.e. a Monday in this example.

LegacyTimePeriod::FindNthWeekday() picks a day to start the search for the day
in question. When given a negative n to search for the n-th last day, it
wrongly used the first day of the following month as the start and counted it
as if it was within the current month. This resulted in a 1/7 chance that the
result was one week too late.

This is fixed by using the last day of the current month instead.
2024-08-07 12:06:05 +02:00
Yonas Habteab c4edecc1fb Unregister invalid config objects properly 2024-08-06 16:59:30 +02:00
Julian Brost 07d253009a
Merge pull request #10013 from Icinga/broken-runtime-config-sync
Fix broken runtime config sync
2024-08-06 11:57:24 +02:00
Yonas Habteab 86347013a6 Check segemnt start date inclusively in `TimePeriod::IsInside()` 2024-08-01 16:16:48 +02:00
Yonas Habteab 4daa03dc02 Fix broken timeperiods/scheduleddowntimes 2024-08-01 15:14:34 +02:00
Yonas Habteab 546dea95a2 Don't allow to modify/create/delete an object concurrently 2024-06-13 11:26:19 +02:00
Yonas Habteab 099f664ce6 `ConfigObjectUtility#CreateObject()`: Use `Defer` for config path cleanup 2024-06-13 11:26:19 +02:00
Yonas Habteab 433e2de13a ApiListener: Process cluster config updates sequentially 2024-06-13 11:26:19 +02:00
Yonas Habteab 1a55b68541 Introduce RAII style `ObjectNameLock` class 2024-06-13 11:26:19 +02:00
Yonas Habteab 2218ebd6b0 `ConfigObjectUtility`: Use `AtomicFile` to store object config files 2024-06-13 11:26:19 +02:00
Alexander Aleksandrovič Klimov f1be9b73ab
Merge pull request #10060 from Icinga/IcingaDB-SerializeState-execution_time-latency
IcingaDB#SerializeState(): limit execution_time and latency to 2^32-1
2024-06-13 09:55:45 +02:00
Yonas Habteab 81a94a0759 Don't fail to remove obsolete downtimes 2024-05-23 10:09:41 +02:00
Yonas Habteab 4eeccce36c Don't loose args in recursive `Downtime::RemoveDowntime()` call 2024-05-23 10:09:41 +02:00
Yonas Habteab e0fd0d3df4 Introduce & use enum `DowntimeRemovalReason` 2024-05-23 09:34:15 +02:00
Alexander Aleksandrovič Klimov cc3965c3ce
Merge pull request #10065 from Icinga/heavy-update-missing-table-relations
Update `object#config_hash` after all relations queries
2024-05-22 15:38:31 +02:00
Yonas Habteab 1019398d55 Update object#config_hash after all relations queries 2024-05-22 13:39:30 +02:00
Yonas Habteab 3d64240ee3
Merge pull request #10066 from Icinga/Checkable-RemoveAllDowntimes
Remove unused Checkable#RemoveAllDowntimes()
2024-05-21 17:13:16 +02:00
Alexander A. Klimov e2bdb8a2f1 Remove unused Checkable#RemoveAllDowntimes() 2024-05-21 14:28:39 +02:00
Alexander A. Klimov f9adf18111 IcingaDB#SerializeState(): limit execution_time and latency to 2^32-1
not to write higher values into Redis than the Icinga DB schema can hold.
This fixes yet another potential Go daemon crash.
2024-05-15 12:55:41 +02:00
Alexander Aleksandrovič Klimov 8c2eb3c1ed
Merge pull request #10049 from Icinga/AddDowntime-trigger_name
Downtime::AddDowntime(): NULL-check pointer before deref not to crash
2024-05-06 10:26:26 +02:00
Alexander Aleksandrovič Klimov d8f8d64f1a
Merge pull request #10027 from macdems/master
Fix missing values in PerfData normalization
2024-04-25 19:38:21 +02:00
Maciej Dems 2bb5cc62e2 Fix missing values in PerfData normalization 2024-04-25 17:41:12 +02:00
Alexander A. Klimov 5f80ac17aa l_LegacyDowntimesCache: delete removed objects not to leak memory 2024-04-25 12:13:52 +02:00
Alexander A. Klimov c0f87dd4c9 /v1/actions/schedule-downtime: reject request on invalid trigger_name
For this purpose lookup the specified Downtime. Also pass Downtime objects,
not just names, to Downtime::AddDowntime() not to lookup it twice.
2024-04-25 12:13:52 +02:00
Alexander A. Klimov f0b5239a15 [Refactor] Downtime::GetDowntimeIDFromLegacyID(): return the Downtime itself
not just its name.
2024-04-25 12:13:52 +02:00
Alexander A. Klimov 28b0f7a48c [Refactor] l_LegacyDowntimesCache: store Downtime objects, not just their names
to avoid names of vanished objects.
2024-04-24 12:33:56 +02:00
Alexander A. Klimov bb13e98ca5 PluginCheckTask::ProcessFinishedHandler(): warn about exit codes outside 0..3
in the plugin output as well, in addition to the warning log.
2024-04-23 17:45:31 +02:00
Alexander A. Klimov e33befabfb Make ProcessResult#ExitStatus and CheckResult#exit_status 64-bit ints
so that they can hold Windows exit codes like 3221225477 (>2147483647).
2024-04-23 17:45:31 +02:00
Alexander A. Klimov 5c17465a19 OpenTsdbWriter#CheckResultHandler(): skip custom tags with empty values
refs #7724
2024-04-18 11:36:21 +02:00
Julian Brost 31be43ff6c
Merge pull request #10018 from Icinga/revert-9980-config-sync-conflicts
Revert "Process `config::update/delete` cluster events gracefully"
2024-03-08 16:58:28 +01:00
Julian Brost af97431bfb
Merge pull request #10006 from Icinga/http-error-handling
HttpServerConnection: use exceptions for error handling
2024-03-08 15:06:51 +01:00
Yonas Habteab a924a49cd8
Revert "Process `config::update/delete` cluster events gracefully" 2024-03-07 17:17:17 +01:00
Julian Brost 097ba00a9c
Merge pull request #10008 from Icinga/Al2Klimov-patch-12
Don't unnecessarily shuffle items before config validation
2024-03-07 16:44:38 +01:00
Alexander Aleksandrovič Klimov 629038344b
OpenTsdbWriter#CheckResultHandler(): clarify log messages
Clarify which "host or service" an "Unable to resolve macro" debug log message refers to.
2024-02-22 10:34:35 +01:00
Julian Brost abea2f270c
Merge pull request #9997 from Icinga/ListenerCoroutineProc-remote_endpoint
ApiListener#ListenerCoroutineProc(): get remote endpoint ASAP for logging
2024-02-20 13:46:02 +01:00
Alexander Aleksandrovič Klimov 51cdd593da
Don't unnecessarily shuffle items before config validation
Before ae693cb7e1 (#9577) we've repeatedly looped over all items in parallel like this:

while not types.done:
  for t in types:
    if not t.done and t.dependencies.done:
      with parallel(all_items, CONCURRENCY) as some_items:
        for i in some_items:
          if i.type is t:
            i.commit()

I.e. all items got distributed over CONCURRENCY threads, but not always equally. E.g. it was the hosts' turn, but only two threads got hosts and did all the work. The others didn't do actual work (due to the lack of hosts in their queue) which reduced the performance. c721c302cd (#6581) fixed it by shuffling all_items first. ae693cb7e1 (#9577) made the latter unnecessary by replacing the above algorithm with this:

while not types.done:
  for t in types:
    if not t.done and t.dependencies.done:
      with parallel(all_items[t], CONCURRENCY) as some_items:
        for i in some_items:
          if i.type is t:
            i.commit()

I.e. parallel() gets only items of type t, so all threads get e.g. hosts.
2024-02-19 14:26:06 +01:00
Julian Brost 700c5a13d7 HttpServerConnection: use exceptions for error handling
When a HTTP connection dies prematurely while the response is sent,
`http::async_write()` sets the error code to something like broken pipe for
example. When calling `async_flush()` afterwards, it sometimes happens that
this never returns. This results in a resource leak as the coroutine isn't
cleaned up. This commit makes the individual functions throw exceptions instead
of silently ignoring the errors, resulting in the function terminating early
and also resulting in an error being logged as well.
2024-02-19 14:12:41 +01:00
Julian Brost 04ef105caa
Merge pull request #9980 from Icinga/config-sync-conflicts
Process `config::update/delete` cluster events gracefully
2024-02-19 13:49:41 +01:00
Julian Brost 7d1c887a32
Merge pull request #9999 from Icinga/reset-log-message-count-correctly
ApiListener: Reset `m_LogMessageCount` when rotating
2024-02-15 17:06:16 +01:00
Alexander Aleksandrovič Klimov 9db1c4aca3
Merge pull request #8011 from Icinga/bugfix/reset-sigpipe-6912
Reset all signal handlers of child processes
2024-02-15 12:22:36 +01:00
Yonas Habteab 456144c1dc ApiListener: Process cluster config updates sequentially 2024-02-14 14:25:53 +01:00
Yonas Habteab 40011b0584 Introduce `ObjectNamesMutex` helper class 2024-02-14 14:25:53 +01:00
Alexander Aleksandrovič Klimov 1a8ce5a90e
Merge pull request #9575 from Icinga/WorkQueue-ParallelFor
WorkQueue#ParallelFor(): allocate lambda once per thread, not once per item
2024-02-14 12:59:50 +01:00
Julian Brost 2be08aa2e0
Merge pull request #9992 from Icinga/remove-redundat-cpu-bound-work
Drop redundant `CpuBoundWork` usage in `JsonRpcConnection::Disconnect()`
2024-02-13 15:51:34 +01:00
Julian Brost fc6a106345
Merge pull request #9994 from Icinga/redundant-cpu-bound-work-usages
Drop redundant `CpuBoundWork` usages in `lib/remote`
2024-02-13 14:53:59 +01:00
Alexander Aleksandrovič Klimov 48eb563ca0
Merge pull request #9736 from Icinga/stream-read-allow_partial
Stream#Read(): remove de facto unused param allow_partial
2024-02-13 13:04:15 +01:00
Yonas Habteab 008fcd1744 Preserve runtime objects in a tmp file for the entire validation process
Given that the internal `config::Update` cluster events are using this
as well to create received runtime objects, we don't want to persist
first the conf file and the load and validate it with `CompileFile`.
Otherwise, we are forced to remove the newly created file whenever we
can't validate, commit or activate it. This also would also have the
downside that two cluster events for the same object arriving at the
same moment from two different endpoints would result in two different
threads simultaneously creating and loading the same config file -
whereby only one of the surpasses the validation, while the other is
facing an object `re-definition` error and tries to remove that config
file it mistakenly thinks it has created. As a consequence, an object
successfully created by the former is implicitly deleted by the latter
thread, causing the objects to mysteriously disappear.
2024-02-12 15:18:32 +01:00
Yonas Habteab 6e66cd9aff ApiListener: Reset `m_LogMessageCount` when rotating
Closing and re-opening that very same log file shouldn't reset the
counter, otherwise some log files may exceed the max limit per file as
their offset indicator is reset each time they are re-opened.
2024-02-09 18:04:20 +01:00
Yonas Habteab eb813cfb99 HttpServerConnection: Drop superfluous `CpuBoundWork` usage 2024-02-09 15:17:26 +01:00
Alexander A. Klimov 62e1d7650d ApiListener#ListenerCoroutineProc(): get remote endpoint ASAP for logging
On incoming connection timeout we log the remote endpoint which isn't
available if it was already disconnected - an exception is thrown.  Get it
as long as we're still connected not to lose it, nor to get an exception.
2024-02-09 12:27:25 +01:00
Yonas Habteab 32531fe909 EventsHandler: Drop superfluous `CpuBoundWork` usage 2024-02-09 12:00:50 +01:00
Eric Lippmann c7293de91d IoEngine: Always log coroutine exception diagnostics
While analyzing a possible memory leak, we encountered several coroutine
exception messages, which unfortunately do not provide any information
about what exactly went wrong, as exception diagnostics were previously
only logged at the notice level.
2024-02-08 12:09:06 +01:00
Yonas Habteab 72266434df Drop redundant `CpuBoundWork` usages in `lib/remote` 2024-02-08 11:30:23 +01:00
Yonas Habteab e2793f1d88 Drop redundant `CpuBoundWork` usage in `JsonRpcConnection::Disconnect()`
Although there is locking involved here, it shoudln't take too long for
the thread to actually acquire it, since there aren't that many threads
dealing with endpoint clients concurrently. It's just wasting pointless
time trying to obtain a CPU slot.
2024-02-08 11:24:55 +01:00
Alexander Aleksandrovič Klimov e9fcbf400f
Merge pull request #9966 from Icinga/Al2Klimov-patch-3
HttpServerConnection: remove duplicate ")" from a log message
2024-01-18 10:46:51 +01:00
Alexander A. Klimov d48b369554 Reset all signal handlers of child processes
... not to disturb check plugins.

refs #6912
2024-01-17 12:25:59 +01:00
Alexander Aleksandrovič Klimov 966b46e808
Merge pull request #9965 from Icinga/http-request-time
HttpServerConnection: log request processing time as well
2024-01-17 11:30:33 +01:00
Julian Brost b1fe15f694
Merge pull request #9962 from Icinga/influx-disk-9948
Influx DB: truncate timestamps to whole seconds to save disk space
2024-01-17 08:50:16 +01:00
Alexander A. Klimov b6874cc8d4 HttpServerConnection: log request processing time as well 2024-01-16 17:52:07 +01:00
Alexander Aleksandrovič Klimov 6a4cb5c12c
HttpServerConnection: remove duplicate ")" from a log message
The commit 5c32a5a7dc, which introduced it, clearly shows that the other ")" already existed legitimately.
2024-01-16 16:31:00 +01:00
Alexander A. Klimov cc9db3756f Revert "Influx DB: don't unneccessarily truncate timestamps to whole seconds"
This reverts commit eaa3cd83ad.
2024-01-16 12:19:48 +01:00
Alexander A. Klimov fc5b1178c6 Revert "Remove no-op InfluxDB URL param"
This reverts commit 21f548d3c0.
2024-01-16 12:19:47 +01:00
Alexander Aleksandrovič Klimov 28b2db8446
Merge pull request #9851 from Icinga/Al2Klimov-patch-3
Make ObjectImpl<Logger>#GetSeverity() non-virtual
2023-12-22 12:44:51 +01:00
Alexander Aleksandrovič Klimov 6c03598678
Merge pull request #9896 from Icinga/provide-cancel_time-where-has_been_cancelled-may-be-1
Disallow triggering a cancelled downtime, but provide cancel_time in Icinga DB downtime history where has_been_cancelled may be 1
2023-12-20 10:03:09 +01:00
Alexander Aleksandrovič Klimov 949d983a76
Merge pull request #9895 from Icinga/targeted-api-filter
FilterUtility::GetFilterTargets(): don't run filter for specific object(s) for all objects
2023-12-19 15:18:41 +01:00
Alexander Aleksandrovič Klimov 8b2e28a869
Merge pull request #9891 from Icinga/renew-the-ca-9890
ApiListener#Start(): auto-renew CA on its owner
2023-12-19 14:57:47 +01:00
Alexander Aleksandrovič Klimov 96cfc4abe8
Merge pull request #9887 from Icinga/argument-list-too-long-9340
PluginNotificationTask::ScriptFunc(): on Linux truncate output and comment
2023-12-19 14:36:57 +01:00
Alexander A. Klimov 175153ce6a PluginNotificationTask::ScriptFunc(): on Linux truncate output and comment
not to run into an exec(3) error E2BIG due to a too long argument.
This sends a notification with truncated output instead of not sending.
2023-12-19 12:21:03 +01:00
Alexander A. Klimov 966216f4ba RequestCertificateHandler(): also renew if CA needs a renewal
and a newer one is available.
2023-12-18 15:28:11 +01:00
Alexander A. Klimov 551c3afa60 CertificateToString(): allow raw pointer input 2023-12-18 15:28:11 +01:00
Alexander A. Klimov bc778116e9 ApiListener#Start(): auto-renew CA on its owner
otherwise it would expire.
2023-12-18 15:28:11 +01:00
Alexander A. Klimov 36a08b0497 ApiListener#RenewCert(): enable optional CA creation 2023-12-18 15:28:11 +01:00
Alexander A. Klimov 7b55df6f11 CreateCertIcingaCA(EVP_PKEY*, X509_NAME*): enable optional CA creation 2023-12-18 15:28:11 +01:00
Alexander Aleksandrovič Klimov 953eeba061
Merge pull request #9893 from Icinga/do-not-re-notify-if-filtered-states-don-t-change-4503
Discard likely duplicate problem notifications via Notification#last_notified_state_per_user
2023-12-13 16:13:28 +01:00
Alexander A. Klimov ecfc9033b0 FilterUtility::GetFilterTargets(): don't run filter for specific object(s) for all objects 2023-12-13 16:02:50 +01:00
Alexander A. Klimov 15191bcd74 ApplyRule::GetTarget*s(): support constant strings from variables
in addition to literal strings. This is for sandboxed filters with some
variables pre-set by the caller. They're "constant" in that scope, too.
2023-12-13 16:02:50 +01:00
Alexander A. Klimov a04cef1890 Introduce DictExpression#GetExpressions() 2023-12-13 16:02:50 +01:00
Alexander A. Klimov 8bcae97ecc Introduce Dictionary#GetRef() 2023-12-13 16:02:50 +01:00
Alexander A. Klimov 97cd05db7a Notification#BeginExecuteNotification(): on recovery clear last_notified_state_per_user 2023-12-13 13:21:22 +01:00
Alexander A. Klimov 44e9c6f40d Notification#BeginExecuteNotification(): discard likely duplicate problem notifications 2023-12-13 13:21:19 +01:00
Alexander A. Klimov 74f52c6fcd Introduce IsCaUptodate() by splitting IsCertUptodate() 2023-12-13 12:08:34 +01:00
Julian Brost 871fa67b52
Merge pull request #9885 from Icinga/renegotiation 2023-12-12 17:38:09 +01:00
Alexander A. Klimov 2cff763295 Cluster-sync Notification#last_notified_state_per_user 2023-12-12 15:29:50 +01:00
Alexander A. Klimov b25ba7a316 Notification#BeginExecuteNotification(): track state change notifications 2023-12-07 12:43:30 +01:00
Julian Brost d2a7117007
Merge pull request #9899 from Icinga/icinga2-crashes-silently-9897
IcingaDB#SendConfigDelete(): fix missing nullptr check before deref
2023-11-21 11:03:28 +01:00
Alexander Aleksandrovič Klimov 7fc7d054af
Merge pull request #9841 from WuerthPhoenix/fix-9840-lock-console-api-during-reload 2023-11-21 10:36:26 +01:00
Alexander A. Klimov 7174dc864d IcingaDB#SendConfigDelete(): fix missing nullptr check before deref 2023-11-10 17:43:33 +01:00
Alexander A. Klimov 9aaa9901bd Icinga DB downtime history: provide cancel_time where has_been_cancelled may be 1
The table sla_history_downtime requires a downtime_end.
The Go daemon takes the cancel_time if has_been_cancelled is 1.
So we must supply a cancel_time whereever has_been_cancelled is 1.
Otherwise the Go daemon can't process some entries.
2023-11-08 15:22:39 +01:00
Alexander A. Klimov 7ce9457a4a Disable TLS renegotiation
The API doesn't need it and a customer's security scanner
is afraid of a potential DoS attack vector.
2023-11-06 18:46:37 +01:00
Theo Buehler 1f06589f7a Remove dead code in GetSignatureAlgorithm()
This code was added in commit 548eb93 and never did anything useful.
Using X509_get_signature_nid() or its expanded version in the pre-1.1
branch is the correct way of retrieving the signature algorithm of a
certificate.

CLA: trivial
2023-10-20 18:55:44 +02:00
Julian Brost bba6a76f4a
Merge pull request #9853 from Icinga/GelfWriter-m_StreamMutex
GelfWriter: protect m_Stream via m_WorkQueue, not ObjectLock(this)
2023-09-07 11:46:38 +02:00
Alexander Aleksandrovič Klimov e5d988a2fe
Merge pull request #7799 from Icinga/bugfix/file-end
Fix file endings
2023-08-25 11:06:19 +02:00
Alexander A. Klimov 4ee10a6c20 GelfWriter: protect m_Stream via m_WorkQueue, not ObjectLock(this)
On shutdown or HA re-connect ConfigObject#SetAuthority(false) is called which
does ObjectLock(this) and ConfigObject#Pause(). GelfWriter#Pause(), with the
above ObjectLock, calls m_WorkQueue.Join(). But items inside that also doing
ObjectLock(this) cause a deadlock.
2023-08-24 17:48:09 +02:00
Alexander Aleksandrovič Klimov 993c9b742d
Make ObjectImpl<Logger>#GetSeverity() non-virtual
After all it's not overridden.
2023-08-15 13:03:31 +02:00
Mattia Codato 41e21cb8cf Prevent calls to command API while the configuration is reloading.
Fixes #9840
2023-08-09 08:45:04 +02:00
Alexander A. Klimov 1308ad62af Stream#Read(): remove de facto unused param allow_partial
The only caller passes true, so no one forbids partial reads (even implicitly).
All usages in the implementation just assert it being true (allowed).
2023-07-13 16:55:48 +02:00