icinga2

mirror of https://github.com/Icinga/icinga2.git synced 2025-08-21 01:28:21 +02:00

Author	SHA1	Message	Date
Julian Brost	069a56d84f	Use `Checkable::GetStateBeforeSuppression()` only where relevant This fixes an issue where recovery notifications get lost if they happen outside of a notification time period. Not all calls to `Checkable::NotificationReasonApplies()` need `GetStateBeforeSuppression()` to be checked. In fact, for one caller, `FireSuppressedNotifications()` in `lib/notification/notificationcomponent.cpp`, the state before suppression may not even be initialized properly, so that the default value of OK is used which can lead to incorrect return values. Note the difference between suppressions happening at the `Checkable` object level and at the `Notification` object level. Only the first sets the state before suppression in the `Checkable` object, but so far, also the latter used that value incorrectly. This commit moves the check of `GetStateBeforeSuppression()` from `Checkable::NotificationReasonApplies()` to the one place where it's actually relevant: `Checkable::FireSuppressedNotifications()`. This made the existing call to `NotificationReasonApplies()` unnecessary as it would always return true: the `type` argument is computed based on the current check result, so there's no need to check it against the current check result.	2024-11-14 15:01:14 +01:00
Yonas Habteab	f66f99516b	Checkable: Don't recalculate `next_check` while processing remotely generated check Currently, when processing a `CheckResult`, it will first trigger an `OnNextCheckChanged` event, which is sent to all connected endpoints. Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult` event is fired, which is of course also sent to all connected endpoints. Next, the other endpoints receive the `event::SetNextCheck` cluster event followed by `event::CheckResult` and invoke `Checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly received check. So they also try to recalculate the next check themselves and invalidate the previously received next check timestamp from the source endpoint. Since each endpoint randomly initialises its own scheduling offset, the recalculated next check will always differ by a split second/millisecond on each of them. As a consequence, two Icinga DB HA instances will generate two different checksums for the same state, which causes the state histories to be fully resynchronised after a takeover/Icinga 2 reload.	2024-09-20 12:24:39 +02:00
Yonas Habteab	20a73f6d13	Merge pull request #10127 from Icinga/AddDowntime-trigger_name-2.13 Downtime::AddDowntime(): NULL-check pointer before deref not to crash	2024-09-18 10:50:16 +02:00
Yonas Habteab	3d1a2b4e44	Merge pull request #10128 from Icinga/broken-timeperiod-2.13 Fix broken `TimePeriod/ScheduledDowntime`s	2024-09-18 10:36:41 +02:00
Alexander A. Klimov	7761d6f78b	l_LegacyDowntimesCache: delete removed objects not to leak memory	2024-09-17 16:44:57 +02:00
Alexander A. Klimov	d09d051fad	/v1/actions/schedule-downtime: reject request on invalid trigger_name For this purpose lookup the specified Downtime. Also pass Downtime objects, not just names, to Downtime::AddDowntime() not to lookup it twice.	2024-09-17 16:44:57 +02:00
Alexander A. Klimov	0bba441a86	[Refactor] Downtime::GetDowntimeIDFromLegacyID(): return the Downtime itself not just its name.	2024-09-17 16:44:57 +02:00
Alexander A. Klimov	5a155346e7	[Refactor] l_LegacyDowntimesCache: store Downtime objects, not just their names to avoid names of vanished objects.	2024-09-17 16:44:57 +02:00
Yonas Habteab	e611dc65ea	Check segment start date inclusively in `TimePeriod::IsInside()`	2024-09-17 16:44:32 +02:00
Yonas Habteab	3de7ec062b	Fix broken timeperiods/scheduleddowntimes	2024-09-17 16:44:32 +02:00
Julian Brost	6986417533	Timeperiods: fix off by one when calculating n-th last weekday of the month A day specification like "monday -1" refers to the last Monday of the month. However, there was an off by one if the first day of the next month is the same day of the week, i.e. a Monday in this example. LegacyTimePeriod::FindNthWeekday() picks a day to start the search for the day in question. When given a negative n to search for the n-th last day, it wrongly used the first day of the following month as the start and counted it as if it was within the current month. This resulted in a 1/7 chance that the result was one week too late. This is fixed by using the last day of the current month instead.	2024-09-17 16:44:09 +02:00
Alexander Aleksandrovič Klimov	45d5a3f5f3	Merge pull request #9817 from Icinga/flexible-downtimes-disappear-too-early-9797 Downtime#Start(): trigger flexible downtimes not earlier than fixed ones	2023-07-05 17:06:03 +02:00
Alexander Aleksandrovič Klimov	51afc74310	Merge pull request #9820 from Icinga/2.13.8/checkable-processcheckresult-only-clean-up-ack-comments-older-than-check-result-9718 Checkable#ProcessCheckResult(): only clean up ack comments older than check result	2023-07-05 11:20:55 +02:00
Yonas Habteab	ff0b45eca0	PluginUtility: Fix PerfData not being parsed correctly The problem was that some PerfData labels contained several whitespace characters, not just one, and were therefore parsed incorrectly in `SplitPerfdata()`. I.e. the condition in line 144 checks whether the first and last characters are normal quotes, but since the label can contain spaces at the beginning and at the end, this check failed. This PR fixes the problem by removing all whitespace from the beginning and end before starting to parse the actual label.	2023-07-04 11:10:38 +02:00
Alexander A. Klimov	6dffc57a37	Checkable#ProcessCheckResult(): only clean up ack comments older than check result Normally if for some reason an ack comment still exists on a checkable not acked anymore, still clean it up. But while replaying log config objects incl. ack comments come before check results and acks. I.e. 1) ack comment, 2) DOWN check result and 3) ack. Not 1) DOWN check result, 2) ack and 3) ack comment. So the checkable is temporarily not acked, but already has the ack comment. In this case the DOWN check result which is older than the ack comment shall not clean up the latter.	2023-07-04 10:56:30 +02:00
Alexander A. Klimov	c160c4b62e	Checkable#RemoveAckComments(): add optional comment entry time filter	2023-07-04 10:56:30 +02:00
Alexander A. Klimov	0470fe12a7	Checkable#RemoveCommentsByType(): remove redundant parameter	2023-07-04 10:56:30 +02:00
Alexander A. Klimov	43c4feb645	Downtime#Start(): trigger flexible downtimes not earlier than fixed ones the last state change could be a long time ago. If it's longer than the new downtime's duration, the downtime expires immediately. trigger time + duration < now	2023-07-04 10:39:14 +02:00
Alexander A. Klimov	34844c146d	Deduplicate and stabilize fragile filesystem transactions by using AtomicFile so they ensure all or nothing of a file gets replaced.	2023-02-15 17:19:57 +01:00
Yonas Habteab	a15bbfe913	Use service short name for evaluating targeted service rules	2022-11-04 12:47:41 +01:00
Alexander A. Klimov	cc67510063	ApplyRule::RuleMap: reduce complexity, save unnecessary lookups	2022-11-04 12:47:41 +01:00
Alexander A. Klimov	0bf093af14	Targeted apply rules: don't unnecessarily eval filter	2022-11-04 12:47:41 +01:00
Alexander A. Klimov	840bfb6be1	Unify storages of regular/targeted apply rules: std::vector<ApplyRule::Ptr>	2022-11-04 12:47:41 +01:00
Alexander A. Klimov	1935137a8d	Separately handle apply rules targeting only specific parent objects not to unnecessarily run e.g. the filter assign where host.name=="example.com" for all hosts other than example.com.	2022-11-04 12:47:41 +01:00
Alexander A. Klimov	fcedb01d0d	Lookup apply rules faster by Type, not String and by map instead of ==/!= 1. The lookup of apply rules per source type now implies no String(const char) (no malloc()) and just pointer (uint64) comparisons 2. Apply rules are now also grouped by target type via a nested map, that obsoletes checking the target type while iterating over all rules per source type	2022-11-02 11:06:12 +01:00
Alexander A. Klimov	17c652d8dd	Construct string once, not unnecessarily N times	2022-10-31 12:40:59 +01:00
Alexander Aleksandrovič Klimov	6bf16fe4f1	Merge pull request #9395 from Icinga/bugfix/atomic-members-2.13 Replace EventuallyAtomic with AtomicOrLocked which falls back to a mutex	2022-06-23 11:32:18 +02:00
Alexander A. Klimov	20ae49ad49	Introduce Command#arguments[].separator ... for letting check commands produce argv like --key=value, not just --key value. refs #6277	2022-06-14 15:04:09 +02:00
Julian Brost	184548f4fe	Replace EventuallyAtomic with AtomicOrLocked which falls back to a mutex Apparently there was a reason for making the members of generated classes atomic. However, this was only done for some types, others were still accessed using non-atomic operations. For members of type T::Ptr (i.e. intrusive_ptr<T>), this can result in a double free when multiple threads access the same variable and at least one of them writes to the variable. This commit makes use of std::atomic<T> for more T (it removes the additional constraint sizeof(T) <= sizeof(void*)) and uses a type including a mutex for load and store operations as a fallback.	2022-06-14 13:46:40 +02:00
Julian Brost	d56afbc51a	Take host state into account when sending suppressed notifications Checkable::FireSuppressedNotifications() compares the time of the current checkable with the last recovery time of parents to avoid notification right after a parent recovered and before the current checkable was checked. This commit makes this check also include the host if the checkable is a service. This makes the behavior consistent with the documentation that states there is an implicit dependency on the host (which isn't realized as implicitly generating a Dependency object unfortunately).	2022-04-22 12:04:46 +02:00
Alexander A. Klimov	45b723644c	Introduce Comment#sticky Carries whether ack was sticky for ack comments.	2022-03-30 09:45:39 +02:00
Julian Brost	f67a5532dc	Merge pull request #9285 from Icinga/bugfix/suppressed-state-notifications-2.13 Checkable: send state notifications after suppression if and only if the state differs compared to before the suppression started	2022-03-29 15:16:04 +02:00
Julian Brost	3be1202eb3	Merge pull request #9290 from Icinga/bugfix/override-default-template-apply-rules-7914 Apply rules: import default templates first	2022-03-29 13:55:41 +02:00
Alexander A. Klimov	07cd15f48f	Apply rules: import default templates first ... to allow to override the attributes they set. refs #7914	2022-03-24 14:04:58 +01:00
Julian Brost	ccb18a04ec	Checkable: Add test for state notifications after a suppression ends	2022-03-09 17:06:09 +01:00
Julian Brost	6303d8df09	Checkable: sync state_before_suppression in cluster This ensures that in case of a failover in an HA zone, the other can take over properly and has the required state to send the proper notifications.	2022-03-09 17:06:09 +01:00
Julian Brost	29fc3ad151	Checkable: improve state notifications after suppression ends This commit changes the Checkable notification suppression logic (notifications are currently suppressed on the Checkable if it is unreachable, in a downtime, or acknowledged) so that after the suppression reason ends, a state notification is sent if and only if the first hard state after is different from the last hard state from before. If the checkable is in a soft state after the suppression ends, the notification is further suppressed until a hard state is reached. To achieve this behavior, a new attribute state_before_suppression is added to Checkable. This attribute is set to the last hard state the first time either a PROBLEM or a RECOVERY notification is suppressed. Compared to before, neither of these two flags in the suppressed_notification will ever be cleared while the suppression is still ongoing but only after the suppression ended and the current state is compared with the old state stored in state_before_suppression.	2022-03-09 17:06:09 +01:00
Julian Brost	e09eaa3ad2	Merge pull request #9239 from Icinga/bugfix/adjust-behavior-of-service-get-severity Service#GetSeverity(): behave as the respective IDO query of Icinga Web	2022-03-08 16:34:18 +01:00
Julian Brost	dbe13e2f32	Merge pull request #9238 from Icinga/bugfix/timeperiod-dst-2.0 LegacyTimePeriod::ScriptFunc: fix DST edge-cases	2022-03-08 15:22:09 +01:00
Julian Brost	12293d999c	Merge pull request #9190 from Icinga/bugfix/sync-missing-history-information-213 Icinga DB: ensure consistent history streams in HA setup	2022-03-07 11:32:15 +01:00
Julian Brost	50ef32a0ad	Merge pull request #9228 from Icinga/bugfix/processcheckresult-dependency-deadlock-2.13 Prevent deadlock in ProcessCheckResult	2022-03-07 11:16:00 +01:00
Julian Brost	c6bac19da8	Merge pull request #9241 from Icinga/bugfix/icingadb-reachabilitychangehandler-9143 Icinga DB: ensure is_reachable and severity don't miss updates	2022-03-07 09:27:35 +01:00
Julian Brost	53a389769c	Merge pull request #9260 from Icinga/bugfix/event-handler-spamming-8704-213 Checkable#ExecuteEventHandler(): don't outsource event command run twice	2022-02-25 16:52:27 +01:00
Alexander A. Klimov	74935dad7b	Checkable#ExecuteEventHandler(): don't outsource event command run twice refs #8704	2022-02-24 14:03:57 +01:00
Alexander A. Klimov	88b041c7c9	Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged less often Call it only on state changes to reduce no-op Redis/IDO updates a lot. refs #9143	2022-02-23 16:06:31 +01:00
Alexander A. Klimov	4ea65076b0	Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged last to ensure Checkable#IsReachable() returns correctly for dependency children inside OnReachabilityChanged(). That needs the dependency parent to be already in the correct state. refs #9143	2022-02-23 16:06:31 +01:00
Alexander Aleksandrovič Klimov	501691cdde	Service#GetSeverity(): behave as the respective IDO query of Icinga Web which doesn't include host reachability.	2022-02-21 15:30:49 +01:00
Julian Brost	26246a4601	LegacyTimePeriod::ScriptFunc: fix DST edge-cases This change fixes two problems: * The internal functions used by ScriptFunc more or less expect to operate on full days, but ScriptFunc may have called them with some random timestamp during the day. This is fixed by always using midnight of the day as reference time. * Previously, the code advanced a timestamp to the next day by adding 24 hours. On days with DST changes, this could either still be on the same day (a day may have 25 hours) or skip an entire day (a day may have 23 hours). This is fixed by using a struct tm to advance the time to the next day.	2022-02-21 15:24:15 +01:00
Julian Brost	c55615a048	Prevent deadlock in ProcessCheckResult Without this commit, children and parents of a checkable were rescheduled on a state change while holding the lock for the current checkable. If both ends of a dependency are checked at the same time and both change state, they could end up in a deadlock waiting for each other. This commit fixes this problem by changing the code so that other checkables are rescheduled only after releasing the lock for the current checkable.	2022-02-18 14:21:59 +01:00
Julian Brost	4c2f6faa61	Sync comment and downtime removal info for Icinga DB history When a comment or downtime is removed manually, the name of the requestor and timestamp have to be synced to other nodes in the cluster to allow all of them to generate a consistent Icinga DB history stream. refs #9101	2022-01-24 18:03:03 +01:00
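
The off-by-one described in commit 6986417533 (n-th last weekday of the month) can be illustrated with a small sketch. This is not the code of `LegacyTimePeriod::FindNthWeekday()`; the helper names `DaysInMonth`, `DayOfWeek` and `NthLastWeekday` are hypothetical and only show the idea of starting the backwards search at the last day of the current month rather than at the first day of the following month.

```cpp
#include <cassert>

// Hypothetical helper: number of days in a Gregorian month (month is 1-12).
int DaysInMonth(int year, int month) {
    static const int days[] = {31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31};
    bool leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
    return (month == 2 && leap) ? 29 : days[month - 1];
}

// Hypothetical helper: day of week, 0 = Sunday ... 6 = Saturday
// (Sakamoto's method for the Gregorian calendar).
int DayOfWeek(int year, int month, int day) {
    static const int t[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    if (month < 3)
        year -= 1;
    return (year + year / 4 - year / 100 + year / 400 + t[month - 1] + day) % 7;
}

// Day of month of the n-th last occurrence of wday (n >= 1).
// The search starts at the LAST day of the month, so a matching weekday
// on the first day of the next month is never counted.
// The result is < 1 if the month has fewer than n such weekdays.
int NthLastWeekday(int year, int month, int wday, int n) {
    assert(n >= 1);
    int last = DaysInMonth(year, month);
    int lastWday = DayOfWeek(year, month, last);
    int back = (lastWday - wday + 7) % 7; // days to step back to the last wday
    return last - back - 7 * (n - 1);
}
```

For June 2024, where 1 July is a Monday, `NthLastWeekday(2024, 6, 1, 1)` yields 24, whereas a search that starts on 1 July and counts it as part of June would land one week too late.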


1674 Commits
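
The PerfData fix in commit ff0b45eca0 comes down to trimming surrounding whitespace from a label before testing whether it is quoted. The following is a minimal sketch of that idea only, not the actual `SplitPerfdata()` implementation; the function name `NormalizePerfdataLabel` is hypothetical.

```cpp
#include <string>

// Hypothetical helper: trim surrounding whitespace first, then strip one
// pair of enclosing single quotes if both are present.
std::string NormalizePerfdataLabel(const std::string& raw) {
    const char* ws = " \t\r\n";
    std::string::size_type begin = raw.find_first_not_of(ws);
    if (begin == std::string::npos)
        return ""; // label consists of whitespace only
    std::string::size_type end = raw.find_last_not_of(ws);
    std::string label = raw.substr(begin, end - begin + 1);

    // Without the trim above, a label such as "  'disk usage' " would fail
    // this check because its first and last characters are spaces.
    if (label.size() >= 2 && label.front() == '\'' && label.back() == '\'')
        label = label.substr(1, label.size() - 2);
    return label;
}
```

With this ordering, `NormalizePerfdataLabel("  'disk usage' ")` returns `disk usage`, while an unquoted label such as `load1` is returned unchanged.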