Commit Graph

139 Commits

Author SHA1 Message Date
Alexander A. Klimov 83021f8231 CONTEXT: use << everywhere to unify usages 2022-11-30 11:06:51 +01:00
Damiano Chini 9d9810b44d Return correct status codes in process-check-result API 2022-04-26 13:33:59 +02:00
Julian Brost 39cee3538a Checkable: improve state notifications after suppression ends
This commit changes the Checkable notification suppression logic (notifications
are currently suppressed on the Checkable if it is unreachable, in a downtime,
or acknowledged) so that, after the suppression reason ends, a state
notification is sent if and only if the first hard state afterwards differs
from the last hard state before the suppression. If the checkable is in a soft
state after the suppression ends, the notification remains suppressed until a
hard state is reached.

To achieve this behavior, a new attribute state_before_suppression is added to
Checkable. This attribute is set to the last hard state the first time either a
PROBLEM or a RECOVERY notification is suppressed. Unlike before, neither of the
two flags in suppressed_notifications is cleared while the suppression is still
ongoing. They are cleared only after the suppression has ended and the current
state has been compared with the old state stored in state_before_suppression.
2022-03-03 14:25:23 +01:00
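The suppression rule described in the commit above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not Icinga 2's implementation: the types `State`, `StateType`, and `SuppressionTracker` and their methods are hypothetical, and only the names state_before_suppression and suppressed_notifications come from the commit message.

```cpp
#include <optional>

// Hypothetical, simplified state model; not Icinga 2's actual types.
enum class State { Ok, Warning, Critical, Unknown };
enum class StateType { Soft, Hard };

// Hypothetical tracker mirroring the idea of state_before_suppression.
struct SuppressionTracker {
    std::optional<State> stateBeforeSuppression; // set once, on the first suppressed notification
    bool suppressedProblem = false;              // stand-ins for the suppressed_notifications flags
    bool suppressedRecovery = false;

    // Called whenever a PROBLEM or RECOVERY notification is suppressed.
    void OnNotificationSuppressed(bool isProblem, State lastHardState) {
        if (!stateBeforeSuppression)
            stateBeforeSuppression = lastHardState; // only the first suppression records the state
        if (isProblem)
            suppressedProblem = true;
        else
            suppressedRecovery = true;
    }

    // Called once the suppression reason (downtime, ack, unreachability) has ended.
    // Returns true if a state notification should be sent now.
    bool ShouldNotifyAfterSuppression(State current, StateType type) {
        if (!stateBeforeSuppression)
            return false; // nothing was suppressed
        if (type == StateType::Soft)
            return false; // keep suppressing until a hard state is reached
        bool changed = current != *stateBeforeSuppression;
        // Flags and saved state are cleared only after the comparison.
        stateBeforeSuppression.reset();
        suppressedProblem = suppressedRecovery = false;
        return changed;
    }
};
```

With this rule, a checkable that goes OK, then CRITICAL during a downtime, then back to OK before the downtime ends produces no notification at all, because the hard states before and after the suppression are equal.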
Julian Brost 3bb9cdb8cc Prevent deadlock in ProcessCheckResult
Without this commit, children and parents of a checkable were rescheduled on a
state change while holding the lock for the current checkable. If both ends of
a dependency are checked at the same time and both change state, they could end
up in a deadlock waiting for each other.

This commit fixes this problem by changing the code so that other checkables
are rescheduled only after releasing the lock for the current checkable.
2022-02-17 16:13:25 +01:00
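The locking pattern behind the deadlock fix above can be sketched like this: collect the dependency neighbors while holding the checkable's own lock, release it, and only then reschedule the neighbors, each of which takes its own lock. The `Node` type and its members are hypothetical illustrations, not Icinga 2's `Checkable` API.

```cpp
#include <mutex>
#include <vector>

// Hypothetical, simplified checkable; names are illustrative, not Icinga 2's API.
struct Node {
    std::mutex mtx;
    int state = 0;
    int rescheduleCount = 0;
    std::vector<Node*> neighbors; // parents and children in the dependency graph

    // Rescheduling takes the lock of the node being rescheduled.
    void Reschedule() {
        std::lock_guard<std::mutex> lock(mtx);
        ++rescheduleCount;
    }

    void ProcessResult(int newState) {
        std::vector<Node*> toReschedule;
        {
            std::lock_guard<std::mutex> lock(mtx);
            if (newState != state) {
                state = newState;
                toReschedule = neighbors; // copy the targets while holding our own lock
            }
        } // our own lock is released here
        // No lock is held while locking neighbors, so two nodes that change
        // state concurrently can never end up waiting on each other.
        for (Node* n : toReschedule)
            n->Reschedule();
    }
};
```

The old behavior corresponds to calling `Reschedule()` inside the locked scope: if A and B were neighbors and processed results at the same moment, each would hold its own mutex while waiting for the other's.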
Alexander A. Klimov 2ef3dd6a38 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged less often
Call it only on state changes to reduce no-op Redis/IDO updates a lot.

refs #9143
2022-02-03 11:12:53 +01:00
Alexander A. Klimov 4c38715ef2 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged last
to ensure Checkable#IsReachable() returns the correct result for dependency children inside OnReachabilityChanged().
That requires the dependency parent to already be in the correct state.

refs #9143
2022-01-25 13:33:46 +01:00
Alexander A. Klimov 1fee3f1b12 IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state
Now it gets everything from one source, the CheckResult.

refs #9132
2022-01-10 19:18:11 +01:00
Julian Brost c71029f2e8 Set downtime trigger time deterministically
When triggering a downtime, the time of the causing event is now passed on as
the trigger time. That time is:

* For fixed downtimes: the later of the start time and the entry time.
* If a check result triggers the downtime: The execution end of the check
  result.
* If another downtime triggers the downtime: The trigger time of the first
  downtime.

This is done so that two nodes in an HA setup write consistent Icinga DB downtime
history streams.

refs #9101
2021-12-08 14:15:50 +01:00
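The three trigger-time rules listed in the commit above can be written down as small pure functions. The function names below are hypothetical and not part of Icinga 2. The point of the sketch is that each result depends only on data both HA nodes share (downtime and check result timestamps), never on the local clock.

```cpp
#include <algorithm>

// Hypothetical helpers illustrating the trigger-time rules; not Icinga 2's API.
// All times are Unix timestamps in seconds.

// Fixed downtime: the later of the start time and the entry time.
double TriggerTimeForFixedDowntime(double startTime, double entryTime) {
    return std::max(startTime, entryTime);
}

// Triggered by a check result: the execution end of that check result.
double TriggerTimeFromCheckResult(double executionEnd) {
    return executionEnd;
}

// Triggered by another downtime: the trigger time of that first downtime.
double TriggerTimeFromParentDowntime(double parentTriggerTime) {
    return parentTriggerTime;
}
```

For example, a fixed downtime created at 1000 with a start time of 1500 is triggered at 1500 on every node, regardless of when each node processes it.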
Michael Insel da394b2ab0
Implement scheduling_source attribute (#6326)
* Implement scheduling_source attribute

This implements the attribute `scheduling_source` for hosts and services to show which endpoint is running the scheduler for the check.

refs #4814
2021-07-20 11:10:26 +02:00
Alexander Aleksandrovič Klimov aa0baf6f69
Merge pull request #8099 from Icinga/feature/std-mutex
Use std::mutex, not boost::mutex
2021-02-04 10:19:04 +01:00
Alexander A. Klimov c3388e9af6 Use std::mutex, not boost::mutex 2021-02-03 09:54:57 +01:00
Julian Brost 9219f68c83
Merge pull request #8158 from Icinga/bugfix/check-source-passive-7948
Checkable#ProcessCheckResult(): don't overwrite check source
2021-01-26 10:49:55 +01:00
Alexander A. Klimov c3eba7e88d Checkable#ProcessCheckResult(): don't overwrite check source
... set by passive check results.

refs #7948
2021-01-25 16:05:03 +01:00
Alexander Aleksandrovič Klimov 124f98eed4
Merge pull request #8600 from Icinga/feature/flapping-ignore-unknown
Flapping: Allow to ignore states in flapping detection
2021-01-21 13:47:44 +01:00
Noah Hilverling e060995fd8 Flapping: Allow to ignore states in flapping calculation 2021-01-20 11:09:03 +01:00
Alexander Aleksandrovič Klimov 915a3c3001
Merge pull request #8436 from Icinga/bugfix/children-recover-too-late
On recovery: re-check children
2020-12-11 15:41:31 +01:00
Alexander A. Klimov 854939a8ce On recovery: re-check children 2020-12-02 12:24:40 +01:00
Alexander Aleksandrovič Klimov 4f6fecc74c
Merge pull request #8101 from Icinga/bugfix/timestamps-checkresult-differ-across-nodes-8092
State timestamps set by the same check result differ across nodes
2020-10-30 17:24:15 +01:00
Noah Hilverling ddf1e50d93 ProcessCheckResult(): Make sure hosts aren't locked during Service::GetSeverity() 2020-08-11 15:24:54 +02:00
Alexander A. Klimov 4585a404d6 Checkable#ExecuteCheck(): set #last_check_started to now before #UpdateNextCheck()
refs #7888
2020-07-28 11:54:13 +02:00
Yonas Habteab 3ecaf1e4a4 Use executionEnd instead of GetTime() 2020-07-09 10:44:38 +02:00
Noah Hilverling 4c9e4959f3
Merge pull request #7823 from Icinga/bugfix/unify-application-start-times
Fix timing point for Application::GetStartTime() (related to command endpoint grace period)
2020-03-09 09:45:57 +01:00
Noah Hilverling 8f061ae80e Fix OnHostProblemChanged signal 2020-03-04 10:55:07 +01:00
Michael Friedrich 8e62fc8efb Fix 'check_timeout' not being forwarded to agent command endpoints
fixes #6992
2020-02-27 11:46:52 +01:00
Michael Friedrich d53eb34520 Unify Application::GetStartTime() and drop GetMainTime()
This essentially moves the start time into the scope when main
starts to "do something", after the reload and configuration handling
is done.
2020-02-11 17:26:15 +01:00
Noah Hilverling 71ef1de964
Merge pull request #7667 from Icinga/feature/icingadb-acks-history
IcingaDB: populate icinga:history:stream:acknowledgement
2019-12-05 09:20:36 +01:00
Alexander A. Klimov 798c56b809 IcingaDB: update service state on Host#problem change
refs #7673
2019-12-03 17:37:51 +01:00
Alexander A. Klimov ea5403a55c Extend Checkable::OnAcknowledgementCleared by removedBy 2019-12-03 17:00:54 +01:00
Alexander A. Klimov 6c7a9eb651 IcingaDB#SendStatusUpdate(): add icinga:history:stream:state#previous_soft_state 2019-11-04 12:59:57 +01:00
Alexander A. Klimov 746a48e2ca RedisWriter: add icinga:history:stream:{state,notification}#previous_hard_state 2019-11-02 14:00:24 +01:00
Alexander A. Klimov efc7f2cf8d Correct current_concurrent_checks to actually running checks
refs #7416
2019-08-15 13:39:01 +02:00
Michael Friedrich a3c6797310 Fix compiler warnings and style 2019-07-10 11:51:58 +02:00
Alexander A. Klimov ed56fa34dc Re-send suppressed notifications
refs #5919
2019-07-09 16:38:50 +02:00
Alexander Stoll 471dbc79a3 Remove double whitespace in notification log message
Add space to checkable debug message to unify timestamp format
2019-05-22 14:13:14 +02:00
Michael Friedrich 93030709f5 Implement previous_state_change 2019-03-27 11:43:14 +01:00
Michael Friedrich d14a88235d Replace Copyright header with a short version, part I
CLion -> replace in path
2019-02-25 14:48:22 +01:00
Michael Friedrich d1fb1a8eda Refactor conditions and add debug log messages for future crs and skipped crs 2019-02-08 13:32:13 +01:00
Jean Flach c97f3c80f5 Fix checkresults from the future breaking checks 2019-02-08 12:08:40 +01:00
Michael Friedrich 6b7f651478
Merge pull request #6899 from Icinga/bugfix/localtime-zero-windows
Log: Ensure not to pass negative values to localtime()
2019-01-24 10:58:43 +01:00
Michael Friedrich 2fc33996b6 Log: Ensure not to pass negative values to localtime()
refs #6887
2019-01-16 17:27:38 +01:00
Alexander A. Klimov 9ae738d17f Allow Checkable#retry_interval to be 0
refs #6871
2019-01-09 11:27:33 +01:00
Michael Friedrich dab53448bc icinga.com: Update *.{h,c}pp 2018-10-18 09:27:04 +02:00
Jordi van Scheijen bc1dc9c7a7 Fix issue 5022 2018-09-27 07:52:37 +02:00
Bas Couwenberg 0891380789 Fix spelling errors.
* occured -> occurred
* dosen't -> doesn't
2018-07-21 10:38:09 +02:00
Michael Friedrich ab9a32d67d Fix missing next check update causing the scheduler to execute checks too often
Regression from #6217, only in git master.

fixes #6421
2018-07-02 16:17:53 +02:00
Michael Friedrich 601c54e44e Add more debug logging for check scheduling 2018-07-02 16:17:33 +02:00
Noah Hilverling d8b400dc17 Fix checks with command_endpoint not returning any check results
fixes #6337
2018-05-29 13:51:34 +02:00
Jean Flach 1a9c1591c0 Fix check behavior on restart
This patch changes the way checkresults are handled during a restart.

  1. Check results coming in during a shutdown are ignored.
  2. Upon start, checks that should have run (next_check in the past)
  are rescheduled within the first minute.

This new behavior means there will be no more "Unknown - Terminated"
check results during a restart, and checks with a high check_interval will
run earlier if they were already due. The downside is that after Icinga2
has been down for a while, a large number of checks will run within the
first minute. The max concurrent checks limit should take care of this,
though.
2018-04-10 15:52:50 +02:00
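The startup rescheduling described in the commit above (overdue checks spread across the first minute) might look roughly like the sketch below. It assumes a uniform random spread over 60 seconds; the commit does not say how the offsets are chosen, and `ScheduledCheck` and `RescheduleOverdueOnStartup` are hypothetical names, not Icinga 2's scheduler code.

```cpp
#include <random>
#include <vector>

// Hypothetical, simplified check record; not Icinga 2's actual type.
struct ScheduledCheck {
    double nextCheck; // Unix timestamp in seconds
};

// Moves every overdue check to a random point within the first minute after
// startup; checks whose next_check is still in the future are left untouched.
// The uniform spread is an assumption made for illustration.
void RescheduleOverdueOnStartup(std::vector<ScheduledCheck>& checks, double now, std::mt19937& rng) {
    std::uniform_real_distribution<double> spread(0.0, 60.0); // yields values in [0, 60)
    for (auto& c : checks) {
        if (c.nextCheck < now)
            c.nextCheck = now + spread(rng);
    }
}
```

Spreading the overdue checks, instead of running them all at `now`, softens the burst mentioned in the commit; the concurrent checks limit then bounds how many actually execute at once.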
Noah Hilverling e28277175b Implement concurrent checks limit for remote checks
fixes #4841
2018-01-29 14:50:14 +01:00
Gunnar Beutner c2fb9fe226 Use initializer lists for arrays and dictionaries 2018-01-16 12:27:44 +01:00