152 Commits

Author SHA1 Message Date
Julian Brost
c253e7eb6e
Merge pull request #10397 from Icinga/activation-priority-10179
Checkable#ProcessCheckResult(): discard🗑️ CR or delay its producers shutdown
2025-05-28 12:30:40 +02:00
Alexander A. Klimov
36743f3100 Checkable#ProcessCheckResult(): discard CR or delay its producers shutdown 2025-05-23 14:53:58 +02:00
Alexander A. Klimov
f4691dd054 Require to pass WaitGroup::Ptr to several methods
Namely:

Checkable#ProcessCheckResult()
ClusterCheckTask::ScriptFunc()
ClusterZoneCheckTask::ScriptFunc()
DummyCheckTask::ScriptFunc()
ExceptionCheckTask::ScriptFunc()
IcingaCheckTask::ScriptFunc()
IfwApiCheckTask::ScriptFunc()
NullCheckTask::ScriptFunc()
PluginCheckTask::ScriptFunc()
RandomCheckTask::ScriptFunc()
SleepCheckTask::ScriptFunc()
IdoCheckTask::ScriptFunc()
IcingadbCheck::ScriptFunc()
CheckCommand#Execute()
Checkable#ExecuteCheck()
ClusterEvents::ExecuteCheckFromQueue()
ExternalCommandProcessor::Process*CheckResult()
ExternalCommandCallback
ExternalCommandProcessor::Execute()
ExternalCommandProcessor::ExecuteFromFile()
ExternalCommandProcessor::ProcessFile()
LivestatusQuery#ExecuteCommandHelper()
LivestatusQuery#Execute()
2025-05-23 14:53:58 +02:00
Yonas Habteab
2b0f73987e
Merge pull request #10443 from Icinga/use-correct-timeout-for-command-endpoints
Checkable: Use correct timeout for rescheduling remote checks
2025-05-23 10:14:49 +02:00
Alexander A. Klimov
69d2a0442a Remove unused ProcessingResult::NoCheckResult
No one passes a NULL CR to Checkable#ProcessCheckResult() anymore.
2025-05-20 10:41:26 +02:00
Yonas Habteab
0c2215a9f8 Checkable: Use correct timeout for rescheduling remote checks
Previously, the `command#timeout` which by default is `1m`, was used to reschedule
the just sent remote check. However, this results into a bunch of extra checks being
sent to the remote host, even though the first one is still running. That's because
if one want to override the default timeout of the command for a specific host/service,
one has to set the `checkable#check_timeout` attribute to the desired value. So, this
commit makes sure that the `checkable#check_timeout` attribute (if set) is used to
reschedule the remote check.
2025-05-19 14:07:33 +02:00
Yonas Habteab
9cc3971288
Merge pull request #10352 from Icinga/checkable-checkercomponent-fixed-timestamp-debug-logs
Fixed double output for timestamps in debug log
2025-04-14 12:16:51 +02:00
Alvar Penning
2ce34e8134
Fixed double output for timestamps in debug log
The timestamps used both in the CheckerComponent and Checkable debug
logs were printed in the scientific notation, making them effectively
useless.

> debug/CheckerComponent: Scheduling info for checkable 'host!service' (2025-02-26 14:53:16 +0100): Object 'host!service', Next Check: 2025-02-26 14:53:16 +0100(1.74058e+09).
> debug/Checkable: Update checkable 'host!service' with check interval '300' from last check time at 2025-02-26 14:48:47 +0100 (1.74058e+09) to next check time at 2025-02-26 14:58:12 +0100 (1.74058e+09).

Switching to std::fixed actually shows the complete Unix timestamp.

> debug/CheckerComponent: Scheduling info for checkable 'host!service' (2025-02-26 15:36:44 +0000): Object 'host!service', Next Check: 2025-02-26 15:36:44 +0000 (1740584204).
> debug/Checkable: Update checkable 'host!service' with check interval '60' from last check time at 2025-02-26 15:37:11 +0000 (1740584232) to next check time at 2025-02-26 15:38:09 +0000 (1740584290).
2025-04-14 10:09:42 +02:00
Yonas Habteab
772420a438 Checkable: Don't always trigger reachablity changed signal
But only when the current check result being processed affects the child
Checkables in any way.
2025-03-12 16:19:22 +01:00
Alexander A. Klimov
095e5982f4 doc/: fix "a HA" -> "an HA" 2024-10-24 09:44:36 +02:00
Yonas Habteab
ca7cc54438 Checkable: Don't recalculate next_check while processing remotely genrated check
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.

Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult`and invoke
`checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and causes the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.
2024-08-16 16:15:56 +02:00
Alexander A. Klimov
6414fd19f5 Checkable#ProcessCheckResult(): only clean up ack comments older than check result
Normally if for some reason an ack comment still exists on a checkable not
acked anymore, still clean it up. But while replaying log config objects
incl. ack comments come before check results and acks. I.e. 1) ack comment,
2) DOWN check result and 3) ack. Not 1) DOWN check result, 2) ack and 3) ack
comment. So the checkable is temporarily not acked, but already has the ack
comment. In this case the DOWN check result which is older than the ack
comment shall not clean up the latter.
2023-03-03 15:48:34 +01:00
Alexander A. Klimov
dceb29c742 Checkable#RemoveCommentsByType(): remove redundant parameter 2023-03-03 11:53:02 +01:00
Alexander A. Klimov
83021f8231 CONTEXT: use << everywhere to unify usages 2022-11-30 11:06:51 +01:00
Damiano Chini
9d9810b44d Return correct status codes in process-check-result API 2022-04-26 13:33:59 +02:00
Julian Brost
39cee3538a Checkable: improve state notifications after suppression ends
This commit changes the Checkable notification suppression logic (notifications
are currently suppressed on the Checkable if it is unreachable, in a downtime,
or acknowledged) to that after the suppression reason ends, a state
notification is sent if and only if the first hard state after is different
from the last hard state from before. If the checkable is in a soft state after
the suppression ends, the notification is further suppressed until a hard state
is reached.

To achieve this behavior, a new attribute state_before_suppression is added to
Checkable. This attribute is set to the last hard state the first time either a
PROBLEM or a RECOVERY notification is suppressed. Compared to from before,
neither of these two flags in the suppressed_notification will ever be cleared
while the supression is still ongoing but only after the suppression ended and
the current state is compared with the old state stored in
state_before_suppression.
2022-03-03 14:25:23 +01:00
Julian Brost
3bb9cdb8cc Prevent deadlock in ProcessCheckResult
Without this commit, children and parents of a checkable were rescheduled on a
state change while holding the lock for the current checkable. If both ends of
a dependency are checked at the same time and both change state, they could end
up in a deadlock waiting for each other.

This commit fixes this problem by changing the code so that other checkables
are rescheduled only after releasing the lock for the current checkable.
2022-02-17 16:13:25 +01:00
Alexander A. Klimov
2ef3dd6a38 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged less often
Call it only on state changes to reduce no-op Redis/IDO updates a lot.

refs #9143
2022-02-03 11:12:53 +01:00
Alexander A. Klimov
4c38715ef2 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged last
to ensure Checkable#IsReachable() returns correctly for dependency children inside OnReachabilityChanged().
That needs the dependency parent to be already in the correct state.

refs #9143
2022-01-25 13:33:46 +01:00
Alexander A. Klimov
1fee3f1b12 IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state
Now it gets everything from one source, the CheckResult.

refs #9132
2022-01-10 19:18:11 +01:00
Julian Brost
c71029f2e8 Set downtime trigger time deterministically
When triggering a downtime, the time of the causing event is now passed on as
the trigger time. That time is:

* For fixed downtimes: the later one of start and entry time.
* If a check result triggers the downtime: The execution end of the check
  result.
* If another downtime triggers the downtime: The trigger time of the first
  downtime.

This is done so two nodes in a HA setup can write consistent Icinga DB downtime
history streams.

refs #9101
2021-12-08 14:15:50 +01:00
Michael Insel
da394b2ab0
Implement scheduling_source attribute (#6326)
* Implement scheduling_source attribute

This implements the attribute `scheduling_source` for hosts and services to show which endpoint is running the scheduler for the check.

refs #4814
2021-07-20 11:10:26 +02:00
Alexander Aleksandrovič Klimov
aa0baf6f69
Merge pull request #8099 from Icinga/feature/std-mutex
Use std::mutex, not boost::mutex
2021-02-04 10:19:04 +01:00
Alexander A. Klimov
c3388e9af6 Use std::mutex, not boost::mutex 2021-02-03 09:54:57 +01:00
Julian Brost
9219f68c83
Merge pull request #8158 from Icinga/bugfix/check-source-passive-7948
Checkable#ProcessCheckResult(): don't overwrite check source
2021-01-26 10:49:55 +01:00
Alexander A. Klimov
c3eba7e88d Checkable#ProcessCheckResult(): don't overwrite check source
... set by passive check results.

refs #7948
2021-01-25 16:05:03 +01:00
Alexander Aleksandrovič Klimov
124f98eed4
Merge pull request #8600 from Icinga/feature/flapping-ignore-unknown
Flapping: Allow to ignore states in flapping detection
2021-01-21 13:47:44 +01:00
Noah Hilverling
e060995fd8 Flapping: Allow to ignore states in flapping calculation 2021-01-20 11:09:03 +01:00
Alexander Aleksandrovič Klimov
915a3c3001
Merge pull request #8436 from Icinga/bugfix/children-recover-too-late
On recovery: re-check children
2020-12-11 15:41:31 +01:00
Alexander A. Klimov
854939a8ce On recovery: re-check children 2020-12-02 12:24:40 +01:00
Alexander Aleksandrovič Klimov
4f6fecc74c
Merge pull request #8101 from Icinga/bugfix/timestamps-checkresult-differ-across-nodes-8092
State timestamps set by the same check result differ across nodes
2020-10-30 17:24:15 +01:00
Noah Hilverling
ddf1e50d93 ProcessCheckResult(): Make sure hosts aren't locked during Service::GetSeverity() 2020-08-11 15:24:54 +02:00
Alexander A. Klimov
4585a404d6 Checkable#ExecuteCheck(): set #last_check_started to now before #UpdateNextCheck()
refs #7888
2020-07-28 11:54:13 +02:00
Yonas Habteab
3ecaf1e4a4 Use executionEnd instead of GetTime() 2020-07-09 10:44:38 +02:00
Noah Hilverling
4c9e4959f3
Merge pull request #7823 from Icinga/bugfix/unify-application-start-times
Fix timing point for Application::GetStartTime() (related to command endpoint grace period)
2020-03-09 09:45:57 +01:00
Noah Hilverling
8f061ae80e Fix OnHostProblemChanged signal 2020-03-04 10:55:07 +01:00
Michael Friedrich
8e62fc8efb Fix 'check_timeout' not being forwarded to agent command endpoints
fixes #6992
2020-02-27 11:46:52 +01:00
Michael Friedrich
d53eb34520 Unify Application::GetStartTime() and drop GetMainTime()
This essentially moves the start time into the scope when main
starts to "do something", after the reload and configuration handling
is done.
2020-02-11 17:26:15 +01:00
Noah Hilverling
71ef1de964
Merge pull request #7667 from Icinga/feature/icingadb-acks-history
IcingaDB: populate icinga:history:stream:acknowledgement
2019-12-05 09:20:36 +01:00
Alexander A. Klimov
798c56b809 IcingaDB: update service state on Host#problem change
refs #7673
2019-12-03 17:37:51 +01:00
Alexander A. Klimov
ea5403a55c Extend Checkable::OnAcknowledgementCleared by removedBy 2019-12-03 17:00:54 +01:00
Alexander A. Klimov
6c7a9eb651 IcingaDB#SendStatusUpdate(): add icinga:history:stream:state#previous_soft_state 2019-11-04 12:59:57 +01:00
Alexander A. Klimov
746a48e2ca RedisWriter: add icinga:history:stream:{state,notification}#previous_hard_state 2019-11-02 14:00:24 +01:00
Alexander A. Klimov
efc7f2cf8d Correct current_concurrent_checks to actually running checks
refs #7416
2019-08-15 13:39:01 +02:00
Michael Friedrich
a3c6797310 Fix compiler warnings and style 2019-07-10 11:51:58 +02:00
Alexander A. Klimov
ed56fa34dc Re-send suppressed notifications
refs #5919
2019-07-09 16:38:50 +02:00
Alexander Stoll
471dbc79a3 Remove double whitespaces for notifications log message
Add space to checkable debug message to unify timestamp format
2019-05-22 14:13:14 +02:00
Michael Friedrich
93030709f5 Implement previous_state_change 2019-03-27 11:43:14 +01:00
Michael Friedrich
d14a88235d Replace Copyright header with a short version, part I
CLion -> replace in path
2019-02-25 14:48:22 +01:00
Michael Friedrich
d1fb1a8eda Refactor conditions and add debug log messages for future crs and skipped crs 2019-02-08 13:32:13 +01:00