Commit Graph

143 Commits

Author SHA1 Message Date
Alexander A. Klimov 095e5982f4 doc/: fix "a HA" -> "an HA" 2024-10-24 09:44:36 +02:00
Yonas Habteab ca7cc54438 Checkable: Don't recalculate `next_check` while processing remotely genrated check
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.

Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult`and invoke
`checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and causes the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.
2024-08-16 16:15:56 +02:00
Alexander A. Klimov 6414fd19f5 Checkable#ProcessCheckResult(): only clean up ack comments older than check result
Normally if for some reason an ack comment still exists on a checkable not
acked anymore, still clean it up. But while replaying log config objects
incl. ack comments come before check results and acks. I.e. 1) ack comment,
2) DOWN check result and 3) ack. Not 1) DOWN check result, 2) ack and 3) ack
comment. So the checkable is temporarily not acked, but already has the ack
comment. In this case the DOWN check result which is older than the ack
comment shall not clean up the latter.
2023-03-03 15:48:34 +01:00
Alexander A. Klimov dceb29c742 Checkable#RemoveCommentsByType(): remove redundant parameter 2023-03-03 11:53:02 +01:00
Alexander A. Klimov 83021f8231 CONTEXT: use << everywhere to unify usages 2022-11-30 11:06:51 +01:00
Damiano Chini 9d9810b44d Return correct status codes in process-check-result API 2022-04-26 13:33:59 +02:00
Julian Brost 39cee3538a Checkable: improve state notifications after suppression ends
This commit changes the Checkable notification suppression logic (notifications
are currently suppressed on the Checkable if it is unreachable, in a downtime,
or acknowledged) to that after the suppression reason ends, a state
notification is sent if and only if the first hard state after is different
from the last hard state from before. If the checkable is in a soft state after
the suppression ends, the notification is further suppressed until a hard state
is reached.

To achieve this behavior, a new attribute state_before_suppression is added to
Checkable. This attribute is set to the last hard state the first time either a
PROBLEM or a RECOVERY notification is suppressed. Compared to from before,
neither of these two flags in the suppressed_notification will ever be cleared
while the supression is still ongoing but only after the suppression ended and
the current state is compared with the old state stored in
state_before_suppression.
2022-03-03 14:25:23 +01:00
Julian Brost 3bb9cdb8cc Prevent deadlock in ProcessCheckResult
Without this commit, children and parents of a checkable were rescheduled on a
state change while holding the lock for the current checkable. If both ends of
a dependency are checked at the same time and both change state, they could end
up in a deadlock waiting for each other.

This commit fixes this problem by changing the code so that other checkables
are rescheduled only after releasing the lock for the current checkable.
2022-02-17 16:13:25 +01:00
Alexander A. Klimov 2ef3dd6a38 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged less often
Call it only on state changes to reduce no-op Redis/IDO updates a lot.

refs #9143
2022-02-03 11:12:53 +01:00
Alexander A. Klimov 4c38715ef2 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged last
to ensure Checkable#IsReachable() returns correctly for dependency children inside OnReachabilityChanged().
That needs the dependency parent to be already in the correct state.

refs #9143
2022-01-25 13:33:46 +01:00
Alexander A. Klimov 1fee3f1b12 IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state
Now it gets everything from one source, the CheckResult.

refs #9132
2022-01-10 19:18:11 +01:00
Julian Brost c71029f2e8 Set downtime trigger time deterministically
When triggering a downtime, the time of the causing event is now passed on as
the trigger time. That time is:

* For fixed downtimes: the later one of start and entry time.
* If a check result triggers the downtime: The execution end of the check
  result.
* If another downtime triggers the downtime: The trigger time of the first
  downtime.

This is done so two nodes in a HA setup can write consistent Icinga DB downtime
history streams.

refs #9101
2021-12-08 14:15:50 +01:00
Michael Insel da394b2ab0
Implement scheduling_source attribute (#6326)
* Implement scheduling_source attribute

This implements the attribute `scheduling_source` for hosts and services to show which endpoint is running the scheduler for the check.

refs #4814
2021-07-20 11:10:26 +02:00
Alexander Aleksandrovič Klimov aa0baf6f69
Merge pull request #8099 from Icinga/feature/std-mutex
Use std::mutex, not boost::mutex
2021-02-04 10:19:04 +01:00
Alexander A. Klimov c3388e9af6 Use std::mutex, not boost::mutex 2021-02-03 09:54:57 +01:00
Julian Brost 9219f68c83
Merge pull request #8158 from Icinga/bugfix/check-source-passive-7948
Checkable#ProcessCheckResult(): don't overwrite check source
2021-01-26 10:49:55 +01:00
Alexander A. Klimov c3eba7e88d Checkable#ProcessCheckResult(): don't overwrite check source
... set by passive check results.

refs #7948
2021-01-25 16:05:03 +01:00
Alexander Aleksandrovič Klimov 124f98eed4
Merge pull request #8600 from Icinga/feature/flapping-ignore-unknown
Flapping: Allow to ignore states in flapping detection
2021-01-21 13:47:44 +01:00
Noah Hilverling e060995fd8 Flapping: Allow to ignore states in flapping calculation 2021-01-20 11:09:03 +01:00
Alexander Aleksandrovič Klimov 915a3c3001
Merge pull request #8436 from Icinga/bugfix/children-recover-too-late
On recovery: re-check children
2020-12-11 15:41:31 +01:00
Alexander A. Klimov 854939a8ce On recovery: re-check children 2020-12-02 12:24:40 +01:00
Alexander Aleksandrovič Klimov 4f6fecc74c
Merge pull request #8101 from Icinga/bugfix/timestamps-checkresult-differ-across-nodes-8092
State timestamps set by the same check result differ across nodes
2020-10-30 17:24:15 +01:00
Noah Hilverling ddf1e50d93 ProcessCheckResult(): Make sure hosts aren't locked during Service::GetSeverity() 2020-08-11 15:24:54 +02:00
Alexander A. Klimov 4585a404d6 Checkable#ExecuteCheck(): set #last_check_started to now before #UpdateNextCheck()
refs #7888
2020-07-28 11:54:13 +02:00
Yonas Habteab 3ecaf1e4a4 Use executionEnd instead of GetTime() 2020-07-09 10:44:38 +02:00
Noah Hilverling 4c9e4959f3
Merge pull request #7823 from Icinga/bugfix/unify-application-start-times
Fix timing point for Application::GetStartTime() (related to command endpoint grace period)
2020-03-09 09:45:57 +01:00
Noah Hilverling 8f061ae80e Fix OnHostProblemChanged signal 2020-03-04 10:55:07 +01:00
Michael Friedrich 8e62fc8efb Fix 'check_timeout' not being forwarded to agent command endpoints
fixes #6992
2020-02-27 11:46:52 +01:00
Michael Friedrich d53eb34520 Unify Application::GetStartTime() and drop GetMainTime()
This essentially moves the start time into the scope when main
starts to "do something", after the reload and configuration handling
is done.
2020-02-11 17:26:15 +01:00
Noah Hilverling 71ef1de964
Merge pull request #7667 from Icinga/feature/icingadb-acks-history
IcingaDB: populate icinga:history:stream:acknowledgement
2019-12-05 09:20:36 +01:00
Alexander A. Klimov 798c56b809 IcingaDB: update service state on Host#problem change
refs #7673
2019-12-03 17:37:51 +01:00
Alexander A. Klimov ea5403a55c Extend Checkable::OnAcknowledgementCleared by removedBy 2019-12-03 17:00:54 +01:00
Alexander A. Klimov 6c7a9eb651 IcingaDB#SendStatusUpdate(): add icinga:history:stream:state#previous_soft_state 2019-11-04 12:59:57 +01:00
Alexander A. Klimov 746a48e2ca RedisWriter: add icinga:history:stream:{state,notification}#previous_hard_state 2019-11-02 14:00:24 +01:00
Alexander A. Klimov efc7f2cf8d Correct current_concurrent_checks to actually running checks
refs #7416
2019-08-15 13:39:01 +02:00
Michael Friedrich a3c6797310 Fix compiler warnings and style 2019-07-10 11:51:58 +02:00
Alexander A. Klimov ed56fa34dc Re-send suppressed notifications
refs #5919
2019-07-09 16:38:50 +02:00
Alexander Stoll 471dbc79a3 Remove double whitespaces for notifications log message
Add space to checkable debug message to unify timestamp format
2019-05-22 14:13:14 +02:00
Michael Friedrich 93030709f5 Implement previous_state_change 2019-03-27 11:43:14 +01:00
Michael Friedrich d14a88235d Replace Copyright header with a short version, part I
CLion -> replace in path
2019-02-25 14:48:22 +01:00
Michael Friedrich d1fb1a8eda Refactor conditions and add debug log messages for future crs and skipped crs 2019-02-08 13:32:13 +01:00
Jean Flach c97f3c80f5 Fix checkresults from the future breaking checks 2019-02-08 12:08:40 +01:00
Michael Friedrich 6b7f651478
Merge pull request #6899 from Icinga/bugfix/localtime-zero-windows
Log: Ensure not to pass negative values to localtime()
2019-01-24 10:58:43 +01:00
Michael Friedrich 2fc33996b6 Log: Ensure not to pass negative values to localtime()
refs #6887
2019-01-16 17:27:38 +01:00
Alexander A. Klimov 9ae738d17f Allow Checkable#retry_interval to be 0
refs #6871
2019-01-09 11:27:33 +01:00
Michael Friedrich dab53448bc icinga.com: Update *.{h,c}pp 2018-10-18 09:27:04 +02:00
Jordi van Scheijen bc1dc9c7a7 Fix issue 5022 2018-09-27 07:52:37 +02:00
Bas Couwenberg 0891380789 Fix spelling errors.
* occured -> occurred
 * dosen't -> doesn't
2018-07-21 10:38:09 +02:00
Michael Friedrich ab9a32d67d Fix missing next check update causing the scheduler to execute checks too often
Regression from #6217, only in git master.

fixes #6421
2018-07-02 16:17:53 +02:00
Michael Friedrich 601c54e44e Add more debug logging for check scheduling 2018-07-02 16:17:33 +02:00