1724 Commits

Author SHA1 Message Date
Yannick Martin
ec2645d33c icinga2: address comment loading where host reference is not found
address #9752: check if host reference is valid
2025-01-13 11:19:42 +01:00
Alexander A. Klimov
3b40ba1fe4 doc/: fix "a HA" -> "an HA" 2024-12-02 10:13:54 +01:00
Julian Brost
e313b3f2aa Use Checkable::GetStateBeforeSuppression() only where relevant
This fixes an issue where recovery notifications get lost if they happen
outside of a notification time period.

Not all calls to `Checkable::NotificationReasonApplies()` need
`GetStateBeforeSuppression()` to be checked. In fact, for one caller,
`FireSuppressedNotifications()` in
`lib/notification/notificationcomponent.cpp`, the state before suppression may
not even be initialized properly, so that the default value of OK is used which
can lead to incorrect return values. Note the difference between suppressions
happening on the level of the `Checkable` object level and the `Notification`
object level. Only the first sets the state before suppression in the
`Checkable` object, but so far, also the latter used that value incorrectly.

This commit moves the check of `GetStateBeforeSuppression()` from
`Checkable::NotificationReasonApplies()` to the one place where it's actually
relevant: `Checkable::FireSuppressedNotifications()`. This made the existing
call to `NotificationReasonApplies()` unneccessary as it would always return
true: the `type` argument is computed based on the current check result, so
there's no need to check it against the current check result.
2024-11-14 12:07:02 +01:00
Yonas Habteab
200f198a35 Checkable: Don't recalculate next_check while processing remotely genrated check
Currently, when processing a `CheckResult`, it will first trigger an
`OnNextCheckChanged` event, which is sent to all connected endpoints.
Then, when `Checkable::ProcessCheckResult()` returns, an `OnCheckResult`
event is fired, which is of course also sent to all connected endpoints.

Next, the other endpoints receive the `event::SetNextCheck` cluster
event followed by `event::CheckResult`and invoke
`checkable#SetNextCheck()` and `Checkable#CheckResult()` with the newly
received check. So they also try to recalculate the next check
themselves and invalidate the previously received next check timestamp
from the source endpoint. Since each endpoint randomly initialises its
own scheduling offset, the recalculated next check will always differ by
a split second/millisecond on each of them. As a consequence, two Icinga
DB HA instances will generate two different checksums for the same state
and causes the state histories to be fully resynchronised after a
takeover/Icinga 2 reload.
2024-09-19 13:27:50 +02:00
Yonas Habteab
1d0a984d33
Merge pull request #10124 from Icinga/do-not-fail-removing-obsolete-downtimes-2.14
Don't fail to remove obsolete downtimes, remove RemoveAllDowntimes()
2024-09-17 17:17:42 +02:00
Yonas Habteab
2d970bcd3b
Merge pull request #10123 from Icinga/AddDowntime-trigger_name-2.14
Downtime::AddDowntime(): NULL-check pointer before deref not to crash
2024-09-17 16:09:16 +02:00
Yonas Habteab
221041487c
Merge pull request #10122 from Icinga/broken-timeperiod-2.14
Fix broken `TimePeriod/ScheduledDowntime`s
2024-09-17 15:56:49 +02:00
Yonas Habteab
06b01cb574
Merge pull request #10125 from Icinga/output-exit-code-2.14
Mention plugin exit codes outside [0..3] in the plugin output and warning log
2024-09-17 15:18:27 +02:00
Yonas Habteab
ad7495dc0e Don't fail to remove obsolete downtimes 2024-09-17 15:06:45 +02:00
Yonas Habteab
6347c9089d Don't loose args in recursive Downtime::RemoveDowntime() call 2024-09-17 15:06:45 +02:00
Yonas Habteab
088f4e0f48 Introduce & use enum DowntimeRemovalReason 2024-09-17 15:06:45 +02:00
Alexander A. Klimov
d7a590e3ef Remove unused Checkable#RemoveAllDowntimes() 2024-09-17 15:06:45 +02:00
Yonas Habteab
574cbb4b4e Check segemnt start date inclusively in TimePeriod::IsInside() 2024-09-17 12:33:27 +02:00
Yonas Habteab
ed16377349 Fix broken timeperiods/scheduleddowntimes 2024-09-17 12:33:27 +02:00
Alexander A. Klimov
b3c914242c l_LegacyDowntimesCache: delete removed objects not to leak memory 2024-09-17 12:33:16 +02:00
Alexander A. Klimov
58a10ad312 /v1/actions/schedule-downtime: reject request on invalid trigger_name
For this purpose lookup the specified Downtime. Also pass Downtime objects,
not just names, to Downtime::AddDowntime() not to lookup it twice.
2024-09-17 12:33:16 +02:00
Alexander A. Klimov
a3dd32e6e5 [Refactor] Downtime::GetDowntimeIDFromLegacyID(): return the Downtime itself
not just its name.
2024-09-17 12:33:16 +02:00
Alexander A. Klimov
9cf585b625 [Refactor] l_LegacyDowntimesCache: store Downtime objects, not just their names
to avoid names of vanished objects.
2024-09-17 12:33:16 +02:00
Alexander A. Klimov
0605577d4d Make ProcessResult#ExitStatus and CheckResult#exit_status 64-bit ints
so that they can hold Windows exit codes like 3221225477 (>2147483647).
2024-09-17 12:33:01 +02:00
Julian Brost
3247e83957 Timeperiods: fix off by one when calculating n-th last weekday of the month
A day specification like "monday -1" refers to the last Monday of the month.
However, there was an off by one if the first day of the next month is the same
day of the week, i.e. a Monday in this example.

LegacyTimePeriod::FindNthWeekday() picks a day to start the search for the day
in question. When given a negative n to search for the n-th last day, it
wrongly used the first day of the following month as the start and counted it
as if it was within the current month. This resulted in a 1/7 chance that the
result was one week too late.

This is fixed by using the last day of the current month instead.
2024-09-17 12:32:10 +02:00
Alexander A. Klimov
d7500ca1bd Notification#BeginExecuteNotification(): on recovery clear last_notified_state_per_user 2023-12-13 16:14:57 +01:00
Alexander A. Klimov
bbadf1f27b Notification#BeginExecuteNotification(): discard likely duplicate problem notifications 2023-12-13 16:14:57 +01:00
Alexander A. Klimov
0ae2bdc444 Cluster-sync Notification#last_notified_state_per_user 2023-12-13 16:14:57 +01:00
Alexander A. Klimov
66ba9f446a Notification#BeginExecuteNotification(): track state change notifications 2023-12-13 16:14:57 +01:00
Julian Brost
7c381ae12f
Merge pull request #9779 from Icinga/macroprocessor-resolvemacro-quasi-cv-object-icingaapplication
MacroProcessor::ResolveMacro(): treat quasi-CV-object IcingaApplication as real CV-object
2023-05-31 20:41:31 +02:00
Alexander A. Klimov
a9c80ffb2e MacroProcessor::ResolveMacro(): treat quasi-CV-object IcingaApplication as real CV-object
As MacroProcessor checked just for CustomVarObject base class, but
IcingaApplication provided the vars attribute by itself, it had to also
resolve CV macros by itself. That logic diverged from MacroProcessor so that
macros inside IcingaApplication CVs weren't resolved. Until now.
2023-05-31 16:35:09 +02:00
Julian Brost
0e25644151
Merge pull request #8969 from Icinga/bugfix/perfdata-dont-get-parsed-correctly-8912
PluginUtility: Fix PerfData parsing for values separated with multiple spaces
2023-05-22 17:16:31 +02:00
Julian Brost
eca8890d49
Merge pull request #9718 from Icinga/acknowledgement-sync-between-masters-are-not-working-9652
Checkable#ProcessCheckResult(): only clean up ack comments older than check result
2023-05-05 15:29:38 +02:00
Julian Brost
af9d67b262
Merge pull request #9726 from Icinga/43624b
Remove -and notify- expired downtimes immediately, not every 60s II
2023-05-02 11:25:03 +02:00
Alexander A. Klimov
58b788cd51 Downtime#Start(): trigger flexible downtimes not earlier than fixed ones
the last state change could be a long time ago. If it's longer than
the new downtime's duration, the downtime expires immediately.

trigger time + duration < now
2023-04-18 16:55:32 +02:00
Alexander A. Klimov
0ac1cd1ecb Rename Downtime::DowntimesExpireTimerHandler()
to actually reflect its purpose.
2023-04-14 14:52:05 +02:00
Alexander A. Klimov
6adf2d19e4 Remove -and notify- expired downtimes immediately, not every 60s
Don't look for expired downtimes in a timer fired every 60s,
but fire one timer per downtime once at expire time.
2023-04-14 14:52:05 +02:00
Julian Brost
8228fae740
Merge pull request #8627 from WuerthPhoenix/bug/agent-cannot-update-executions-8616
Fix update execution message discarded. refs #8616
2023-04-13 19:29:49 +02:00
Julian Brost
f505325ff9
Merge pull request #9445 from Icinga/9365
Disallow config modifications via API during reload
2023-04-13 17:11:58 +02:00
Mattia Codato
c5c17928a6 Allow to exec command on endpoint where the checkable is not present but checkable has command_endpoint specified 2023-04-13 14:44:07 +02:00
Alexander A. Klimov
2ee776b5ab Disallow config modifications via API during reload
Once the new main process has read the config,
it misses subsequent modifications from the old process otherwise.
2023-04-12 14:45:40 +02:00
Julian Brost
50018c1d2b
Merge pull request #8218 from efuss/redundancy_group
Introduce redundancy groups for Dependency Objects
2023-04-05 18:49:58 +02:00
Yonas Habteab
24d95e1178 PluginUtility: Fix PerfData don't get parsed correctly
The problem was that some PerfData labels contained several whitespace characters,
not just one, and therefore it was parsed incorrectly in `SplitPerfdata()`. I.e. the condition
in line 144 checks whether the first and last character is a normal quote, but since the
label can contain spaces at the beginning and at the end respectively, this caused the problems.

This PR fixes the problem by removing all occurring whitespace from the beginning and end,
before starting to parse the actual label.
2023-04-05 15:37:54 +02:00
Alexander A. Klimov
21b68455ce Use Timer::Create() instead of new Timer()
git ls-files -z |xargs -0 perl -pi -e 's/\bnew Timer\b/Timer::Create/g'

ex. in Timer::Create() itself.
2023-04-04 10:35:20 +02:00
Alexander A. Klimov
35248b1b63 Code style 2023-04-03 13:39:08 +02:00
Alexander A. Klimov
cc872dac1f Remove CheckResultReader which has been deprecated for 5 major versions 2023-04-03 11:39:21 +02:00
Alexander A. Klimov
6414fd19f5 Checkable#ProcessCheckResult(): only clean up ack comments older than check result
Normally if for some reason an ack comment still exists on a checkable not
acked anymore, still clean it up. But while replaying log config objects
incl. ack comments come before check results and acks. I.e. 1) ack comment,
2) DOWN check result and 3) ack. Not 1) DOWN check result, 2) ack and 3) ack
comment. So the checkable is temporarily not acked, but already has the ack
comment. In this case the DOWN check result which is older than the ack
comment shall not clean up the latter.
2023-03-03 15:48:34 +01:00
Alexander A. Klimov
4662d4477b Checkable#RemoveAckComments(): add optional comment entry time filter 2023-03-03 15:48:11 +01:00
Alexander A. Klimov
dceb29c742 Checkable#RemoveCommentsByType(): remove redundant parameter 2023-03-03 11:53:02 +01:00
Mattia Codato
912fdb9700 Fix update execution message discarded
refs Icinga#8616
2023-03-02 17:50:39 +01:00
Alexander A. Klimov
bbf2e80002 Remove StatusDataWriter which has been deprecated for 5 major versions 2023-03-01 17:16:28 +01:00
Edgar Fuß
20d7e1b5e6 Fix use of std::unordered_map::insert() as pointed out by Nathaniel Wesley Filardo in GitHup Pull Request #8999 2023-02-21 16:23:40 +01:00
Edgar Fuß
5bba609e60 Add missing #include 2023-02-21 16:23:40 +01:00
Edgar Fuß
cfef9fdadc Introduce redundancy groups for Dependency Objects
Traditional behaviour was to regard all dependecies as cumulative (e.g., the parent considered unreachable if any one dependency is violated), commit ed5892238916ab667a4c9d904bd73acd3ed162f2 made all dependencies regarded redundant (e.g., the parent considered unreachable only if all dependency are violated). This may lead to unrelated services (or even hosts vs. services) inadvertantly regarded to be redundant to each other.

Most importantly, applying the explicit "disable-host-service-checks" dependency described in the "Monitoring Basics" chapter will defeat all other dependencies.

This commit introduces a new "redundancy_group" attribute for dependencies.
Specifying a redundancy_group causes a dependency to be regarded as redundant only inside that redundancy group.
Dependencies lacking a redundancy_group attribute are regarded as essential for the parent.

This allows for both cumulative and redundant dependencies and even a combination (cumulation of redundancies, like SSH depeding on both LDAP and DNS to function, while operating redundant LDAP servers as well as redundant DNS resolvers).

This commit lacks changes to the tests.
2023-02-21 16:23:36 +01:00
Julian Brost
a84a0a3cee
Merge pull request #8302 from Icinga/bugfix/windows-systemroot-aliases-6259
Macros: support $env.ENV_VAR_NAME$
2023-02-20 13:09:15 +01:00