Commit Graph

1702 Commits

Author SHA1 Message Date
Alexander A. Klimov d468d7993c Lookup apply rules faster by Type*, not String and by map instead of ==/!=
1. The lookup of apply rules per source type now implies
   no String(const char*) (no malloc()) and just pointer (uint64) comparisions
2. Apply rules are now also grouped by target type via a nested map, that obsoletes
   checking the target type while iterating over all rules per source type
2022-10-19 13:43:51 +02:00
Alexander A. Klimov ce1a122618 Construct string once, not unnecessarily N times 2022-10-17 15:54:02 +02:00
Yonas Habteab 28c29c1fbc Don't allow to change object parent,host/service_name at runtime 2022-09-09 18:26:28 +02:00
Julian Brost 3220fecd4c
Merge pull request #7919 from Icinga/feature/parameter-delimiters-check-execution-6277
Introduce Command#arguments[].separator
2022-05-23 13:23:36 +02:00
Alexander A. Klimov 069c3968d9 Introduce Command#arguments[].sep
... for letting check commands produce argv like --key=value,
not just --key value.

refs #6277
2022-05-11 17:50:12 +02:00
Julian Brost 4184dcd62c
Merge pull request #9354 from WuerthPhoenix/feature/return-correct-status-in-process-check-result-api
Return correct status codes in process-check-result API
2022-05-05 15:30:09 +02:00
Julian Brost abe2dfa763 Replace EventuallyAtomic with AtomicOrLocked which falls back to a mutex
Apparently there was a reason for making the members of generated classes
atomic. However, this was only done for some types, others were still accessed
using non-atomic operations. For members of type T::Ptr (i.e.  intrusive_ptr<T>),
this can result in a double free when multiple threads access the same variable
and at least one of them writes to the variable.

This commit makes use of std::atomic<T> for more T (it removes the additional
constraint sizeof(T) <= sizeof(void*)) and uses a type including a mutex for
load and store operations as a fallback.
2022-05-03 12:02:46 +02:00
Damiano Chini 9d9810b44d Return correct status codes in process-check-result API 2022-04-26 13:33:59 +02:00
Julian Brost 51cd7e7b0b Take host state into account when sending suppressed notifications
Checkable::FireSuppressedNotifications() compares the time of the current
checkable with the last recovery time of parents to avoid notification right
after a parent recovered and before the current checkable was checked.

This commit makes this check also include to host if the checkable is a
service.  This makes the behavior consistent with the documentation that states
there is an implicit dependency on the host (which isn't realized as implicitly
generating a Dependency object unfortunately).
2022-04-19 16:13:15 +02:00
Alexander Aleksandrovič Klimov bbc2b59b0d
Merge pull request #9287 from Icinga/9275
Icinga DB: correct ack comments' is_sticky
2022-03-28 22:42:52 +02:00
Alexander A. Klimov 4399e82d9d Introduce Comment#sticky
Carries whether ack was sticky for ack comments.
2022-03-24 16:42:18 +01:00
Julian Brost ba154d2a38
Merge pull request #7929 from Icinga/bugfix/override-default-template-apply-rules-7914
Apply rules: import default templates first
2022-03-23 11:30:51 +01:00
Julian Brost bf5b905707
Merge pull request #9250 from Icinga/feature/fix-compiler-warning-do-not-move-local-variables
Fix compiler warnings don't move local variables
2022-03-08 11:37:09 +01:00
Julian Brost 90848f602b Checkable: Add test for state notifications after a suppression ends 2022-03-03 14:25:23 +01:00
Julian Brost cbc0b21b86 Checkable: sync state_before_suppression in cluster
This ensures that in case of a failover in an HA zone, the other can take over
properly and has the required state to send the proper notifications.
2022-03-03 14:25:23 +01:00
Julian Brost 39cee3538a Checkable: improve state notifications after suppression ends
This commit changes the Checkable notification suppression logic (notifications
are currently suppressed on the Checkable if it is unreachable, in a downtime,
or acknowledged) to that after the suppression reason ends, a state
notification is sent if and only if the first hard state after is different
from the last hard state from before. If the checkable is in a soft state after
the suppression ends, the notification is further suppressed until a hard state
is reached.

To achieve this behavior, a new attribute state_before_suppression is added to
Checkable. This attribute is set to the last hard state the first time either a
PROBLEM or a RECOVERY notification is suppressed. Compared to from before,
neither of these two flags in the suppressed_notification will ever be cleared
while the supression is still ongoing but only after the suppression ended and
the current state is compared with the old state stored in
state_before_suppression.
2022-03-03 14:25:23 +01:00
Julian Brost 9d3eba8383
Merge pull request #9259 from Icinga/bugfix/event-handler-spamming-8704
Checkable#ExecuteEventHandler(): don't outsource event command run twice
2022-02-25 16:51:31 +01:00
Alexander A. Klimov 74935dad7b Checkable#ExecuteEventHandler(): don't outsource event command run twice
refs #8704
2022-02-24 14:03:57 +01:00
Yonas Habteab a0607aceff Fix compiler warnings don't move local variables 2022-02-22 17:51:43 +01:00
Julian Brost 3bb9cdb8cc Prevent deadlock in ProcessCheckResult
Without this commit, children and parents of a checkable were rescheduled on a
state change while holding the lock for the current checkable. If both ends of
a dependency are checked at the same time and both change state, they could end
up in a deadlock waiting for each other.

This commit fixes this problem by changing the code so that other checkables
are rescheduled only after releasing the lock for the current checkable.
2022-02-17 16:13:25 +01:00
Julian Brost 1b0ad099f1
Merge pull request #9154 from Icinga/bugfix/icingadb-reachabilitychangehandler-9143
Icinga DB: ensure is_reachable and severity don't miss updates
2022-02-03 14:53:51 +01:00
Alexander A. Klimov 2ef3dd6a38 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged less often
Call it only on state changes to reduce no-op Redis/IDO updates a lot.

refs #9143
2022-02-03 11:12:53 +01:00
Alexander Aleksandrovič Klimov ff712f6b23
Service#GetSeverity(): behave as the respective IDO query of Icinga Web
which doesn't include host reachability.
2022-01-27 12:21:06 +01:00
Alexander A. Klimov 4c38715ef2 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged last
to ensure Checkable#IsReachable() returns correctly for dependency children inside OnReachabilityChanged().
That needs the dependency parent to be already in the correct state.

refs #9143
2022-01-25 13:33:46 +01:00
Julian Brost 6390911262
Merge pull request #9123 from Icinga/bugfix/icinga2-crashes-when-sending-notifications-8186
Avoid "type" key in dicts being part of object state attrs
2022-01-19 11:48:40 +01:00
Julian Brost 463b159414
Merge pull request #9171 from Icinga/bugfix/icinga-db-notification-history-might-use-incorrect-previous_hard_state-9132
IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state
2022-01-18 16:54:16 +01:00
Alexander A. Klimov 1fee3f1b12 IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state
Now it gets everything from one source, the CheckResult.

refs #9132
2022-01-10 19:18:11 +01:00
Julian Brost e518dc2436
Merge pull request #9112 from Icinga/bugfix/sync-missing-history-information
Icinga DB: ensure consistent history streams in HA setup
2022-01-07 15:14:06 +01:00
Julian Brost 3e73a262cc Sync comment and downtime removal info for Icinga DB history
When a comment or downtime is removed manually, the name of the requestor and
timestamp have to be synced to other nodes in the cluster to allow all of them
to generate a consistent Icinga DB history stream.

refs #9101
2022-01-05 10:27:13 +01:00
Alexander Aleksandrovič Klimov 80663cf5e6
Merge pull request #9048 from Icinga/bugfix/timeperiod-dst-2.0
LegacyTimePeriod::ScriptFunc: fix DST edge-cases
2022-01-03 18:11:32 +01:00
Julian Brost 13ea635188 Don't trigger a fixed downtime like a flexible one
When creating a fixed downtime that starts immediately while the checkable is
in a non-OK state, previously the code path for flexible downtimes was used to
trigger this downtime. This is fixed by this commit which resolves two issued:

1. Missing downtime start notification: notifications work differently for
   fixed and flexible downtimes. This resulted in missing downtime start
   notifications under the conditions described above.
2. Incorrect downtime trigger time: this code path would incorrectly assume the
   timestamp of the last checkable as the trigger time which is incorrect for
   fixed downtimes.
2021-12-14 11:02:40 +01:00
Alexander A. Klimov eb71fb7529 Avoid "type" key in dicts being part of object state attrs
not to confuse the state file deserializator with e.g. `"type":32` on startup.
That would unexpectedly restore null (not `{"type":32}`) as there's no type "32".

refs #8186
2021-12-13 17:56:12 +01:00
Julian Brost c71029f2e8 Set downtime trigger time deterministically
When triggering a downtime, the time of the causing event is now passed on as
the trigger time. That time is:

* For fixed downtimes: the later one of start and entry time.
* If a check result triggers the downtime: The execution end of the check
  result.
* If another downtime triggers the downtime: The trigger time of the first
  downtime.

This is done so two nodes in a HA setup can write consistent Icinga DB downtime
history streams.

refs #9101
2021-12-08 14:15:50 +01:00
Alexander Aleksandrovič Klimov 31c564182a
Merge pull request #8990 from Icinga/bugfix/downtime-all-services-on-child-hosts
Fix scheduling of downtimes for all services on child hosts
2021-12-07 12:48:01 +01:00
Julian Brost 596fcdc123 Downtime::DowntimesExpireTimerHandler: don't copy vector
`ConfigType::GetObjectsByType<Downtime>()` already returns a
`std::vector<Downtime::Ptr>` so there is no point in copying it into another
vector of the same type just to then iterate the copied vector instead of the
original one.
2021-12-01 13:05:23 +01:00
Julian Brost 2ad0a4b8c3 Add missing include to fix non-unity builds
This commit fixes the following build error:

    [ 55%] Building CXX object lib/icinga/CMakeFiles/icinga.dir/usergroup.cpp.o
    lib/icinga/usergroup.cpp:79:24: error: incomplete type ‘icinga::Notification’ used in nested name specifier
       79 | std::set<Notification::Ptr> UserGroup::GetNotifications() const
          |                        ^~~
2021-11-17 16:11:15 +01:00
Julian Brost a740b1d66c LegacyTimePeriod::ScriptFunc: fix DST edge-cases
This change fixes two problems:
* The internal functions used by ScriptFunc more or less expect to operate on
  full days, but ScriptFunc may have called them with some random timestamp
  during the day. This is fixed by always using midnight of the day as
  reference time.
* Previously, the code advanced a timestamp to the next day by adding 24 hours.
  On days with DST changes, this could either still be on the same day (a day
  may have 25 hours) or skip an entire day (a day may have 23 hours). This is
  fixed by using a struct tm to advance the time to the next day.
2021-11-17 13:09:10 +01:00
Noah Hilverling 73e0d6e61b Icinga DB: Make sure object relationships are handled correctly 2021-11-12 13:34:57 +01:00
Julian Brost bb0dcdf0b4 Prevent duplicate donwtimes when combining child_options and all_services 2021-09-03 15:44:01 +02:00
Julian Brost e556d3c489 Fix scheduling of downtimes for all services on child hosts
The loop iterated over the services of the wrong host resulting in duplicate
downtimes scheduled for services of the parent host instead of downtimes for
services of the child host.
2021-09-03 15:19:27 +02:00
Alexander A. Klimov 2818245e01 Introduce Checkable#GetLastComment() 2021-07-29 12:10:42 +02:00
Julian Brost 42eb055c5f
Merge pull request #8921 from Icinga/bugfix/timeperiod-dst
TimePeriod/ScheduledDowntime: improve DST handling
2021-07-27 18:11:34 +02:00
Noah Hilverling 07145d2e61
Merge pull request #8913 from Icinga/feature/remove-child-downtimes
API Action "remove-downtime": Also remove child downtimes
2021-07-27 18:02:15 +02:00
Noah Hilverling 7217959206 API Action 'remove-downtime': Also remove child downtimes 2021-07-23 13:53:44 +02:00
Julian Brost 4273f30157 LegacyTimePeriod: Prevent modification of input parameters
Many functions of LegacyTimePeriod take a tm pointer as an input parameter and
then pass it to mktime() which actually modifies it. This causes problems if
tm_isdst was intentionally set to -1 (to automatically detect whether DST is
active at some time) and then a function is called that implicitly sets
tm_isdst and then the values of tm are modified in a way that crosses a DST
change. This resulted in 1 hour offsets with ScheduledDowntimes on days with
DST changes.
2021-07-22 15:17:06 +02:00
Michael Insel da394b2ab0
Implement scheduling_source attribute (#6326)
* Implement scheduling_source attribute

This implements the attribute `scheduling_source` for hosts and services to show which endpoint is running the scheduler for the check.

refs #4814
2021-07-20 11:10:26 +02:00
Alexander Aleksandrovič Klimov bad8059969
Merge pull request #8761 from Icinga/feature/icingadb-perfdata
Icinga DB: introduce icinga:*:state#normalized_performance_data
2021-07-07 12:29:21 +02:00
Julian Brost 7d2a1bbffe
Merge pull request #8310 from Icinga/feature/scheduleddowntime-change-remove-downtimes-8309
On ScheduledDowntime change: remove downtimes created before change
2021-07-07 10:44:08 +02:00
Alexander A. Klimov 43e4ab4760 Checkable::NotifyDowntimeEnd(): don't send Downtime end notification unless triggered
... for fixed Downtimes as well.
2021-07-06 12:50:44 +02:00
Alexander A. Klimov ea5411a6e0 PluginUtility::FormatPerfdata(): normalize UoMs if desired 2021-07-05 19:05:32 +02:00
Alexander A. Klimov 666c5818bb On ScheduledDowntime change: remove future downtimes created before change
refs #8309
2021-07-02 10:37:29 +02:00
Alexander Aleksandrovič Klimov 31f97d3e6a
Merge pull request #8828 from Icinga/bugfix/execute-command-origin-check
event::ExecuteCommand: add missing origin check
2021-06-29 18:08:07 +02:00
Alexander A. Klimov bcc3870f3a On ScheduledDowntime change: ignore downtimes created before change
... while creating new downtimes.

refs #8309
2021-06-29 17:08:41 +02:00
Alexander A. Klimov 1ee26ac89e Introduce Downtime#config_owner_hash
refs #8309
2021-06-29 16:38:33 +02:00
Julian Brost 8f585bd2ee event::ExecuteCommand: add missing origin check
Only handle messages with a trusted origin in
ClusterEvents::ExecuteCommandAPIHandler. Previously, it would not locally
execute any command but forward them to other nodes where they would then have
a trusted origin and be executed.
2021-06-29 11:15:22 +02:00
Julian Brost 5fdfd47176
Merge pull request #8848 from Icinga/bugfix/harden-scheduled-downtimes
ScheduledDowntime::TimerProc(): Catch exceptions to make sure other downtimes are still created
2021-06-28 17:16:57 +02:00
Noah Hilverling f48ad574d7 ScheduledDowntime::TimerProc(): Catch exceptions to make sure other downtimes are still created 2021-06-24 14:05:08 +02:00
Alexander A. Klimov d8e5e07c4f Downtime#Start(): trigger fixed downtimes immediately instead of waiting for the timer
... not to cause e.g. notifications if a problem occurs
between the downtime start time and the timer routine.
2021-06-23 19:16:15 +02:00
Alexander A. Klimov f28b9fb7f3 ScheduledDowntime: ignore not related Downtimes while creating Downtimes 2021-05-19 16:10:57 +02:00
Alexander Aleksandrovič Klimov 1c0ce89cb3
Merge pull request #8681 from Icinga/bugfix/problem-notification-at-downtime-end
Send problem notifications after downtime end for checkables in child zones
2021-03-22 17:56:25 +01:00
Alexander Aleksandrovič Klimov ef8619f76b
Merge pull request #8601 from Icinga/feature/replace-std-boost-bind-with-lambdas-7006
Feature: Replace std/boost::bind() with lambdas
2021-03-18 17:56:13 +01:00
Julian Brost 29727e06c0 Only handle event::SetSuppressed{Notifications,NotificationTypes} within the local zone
Note that even when passing `nullptr` as target zone to `RelayMessage()`, the
cluster message will still be sent to the parent zone. These incoming messages
will now be rejected by the parent nodes. At the moment, there's no way to only
send within the local zone.
2021-03-17 15:05:12 +01:00
Yonas Habteab 43ba2da39c Replace std/boost::bind() function with lambda expression 2021-03-10 16:29:40 +01:00
Julian Brost 02fd60934f
Merge pull request #8008 from Icinga/bugfix/ascii-tables-in-plugin-output-8006
PluginUtility::ParseCheckOutput(): if it doesn't look like perfdata, it's not perfdata
2021-03-05 17:19:38 +01:00
Julian Brost ddbad7937d
Merge pull request #8622 from Icinga/bugfix/dependency-ti-typo-8180
dependency.ti: fix typo
2021-02-05 11:49:03 +01:00
Alexander A. Klimov ebfa73388f dependency.ti: fix typo
refs #8180
2021-02-04 18:29:54 +01:00
Alexander Aleksandrovič Klimov aa0baf6f69
Merge pull request #8099 from Icinga/feature/std-mutex
Use std::mutex, not boost::mutex
2021-02-04 10:19:04 +01:00
Alexander A. Klimov c3388e9af6 Use std::mutex, not boost::mutex 2021-02-03 09:54:57 +01:00
Alexander Aleksandrovič Klimov 9a867c2c25
Merge pull request #8513 from Icinga/bugfix/notifications-downtime-change-in-timeperiod-8509
FireSuppressedNotifications(const Notification::Ptr&): don't send notifications while suppressed by checkable
2021-01-28 10:01:23 +01:00
Alexander Aleksandrovič Klimov 8d1e958275
Make code doc more readable
Co-authored-by: Julian Brost <julian.brost@icinga.com>
2021-01-27 15:43:37 +01:00
Julian Brost 9219f68c83
Merge pull request #8158 from Icinga/bugfix/check-source-passive-7948
Checkable#ProcessCheckResult(): don't overwrite check source
2021-01-26 10:49:55 +01:00
Alexander A. Klimov c3eba7e88d Checkable#ProcessCheckResult(): don't overwrite check source
... set by passive check results.

refs #7948
2021-01-25 16:05:03 +01:00
Alexander Aleksandrovič Klimov 124f98eed4
Merge pull request #8600 from Icinga/feature/flapping-ignore-unknown
Flapping: Allow to ignore states in flapping detection
2021-01-21 13:47:44 +01:00
Alexander Aleksandrovič Klimov ef23ae5f3c
Merge pull request #8267 from efuss/passive_reach
Drop passive check results for unreachable hosts/services
2021-01-20 17:07:52 +01:00
Noah Hilverling e060995fd8 Flapping: Allow to ignore states in flapping calculation 2021-01-20 11:09:03 +01:00
Edgar Fuß 3c050fcc46 Drop passive check results for unreachable hosts/services
Disregard passive check results while no active checks are being scheduled due to violated dependencies.

This copes with the fact that programs feeding passive check results into Icinga may have no notion of reachability and so drive a checkable into HARD state although dependencies have caused active check scheduling being suspended. This may prevent superflous problem notifications being emitted during recovery.

As disable_checks defaults to false, it was regarded OK (by @Al2Klimov) to make this behaviour (which resembles the active check case) unconditional and not conditionalize it on an additional attribute.

In the description of disable_checks, note that a value of true both disables scheduling of active checks and drops passive check results.
2021-01-19 20:08:38 +01:00
Alexander Aleksandrovič Klimov cbd0d6ea6e
Merge pull request #8588 from Icinga/bugfix/concurrent-schedule-downtime-delete-host
Fix null pointer dereferences when deleting objects while scheduling downtimes
2021-01-19 13:51:08 +01:00
Julian Brost 88e5744d54 AddDowntime: return Downtime::Ptr instead of String containing the name
At numerous places in the code, something like this is performed:

    String name = Downtime::AddDowntime(...);
    Downtime::Ptr downtime = Downtime::GetByName(name);

However, `downtime` can be a `nullptr` after this as it is possible that
the downtime is deleted in between.

This commit changes the return type of `Downtime::AddDowntime` to return
a Downtime::Ptr instead of the full name of the downtime. `AddDowntime`
performs the very same `GetByName()` operation internally, but handles
the `nullptr` case correctly and throws an exception.
2021-01-15 16:34:48 +01:00
Alexander Aleksandrovič Klimov c549a6657e
Merge pull request #8562 from Icinga/bugfix/fix-no-renotification-for-non-ok-state-changes-8545
Fix no re-notification for non OK state changes with time delay
2021-01-14 17:49:29 +01:00
Alexander Aleksandrovič Klimov 70b438a2bf
Merge pull request #8104 from Icinga/bugfix/remove-downtime-returns-wrong-status-7408
API: Display a correct status when removing a downtime
2021-01-14 17:49:00 +01:00
Yonas Habteab 997ad86225 Fix no re-notification for non OK state changes with time delay 2021-01-14 11:54:25 +01:00
Julian Brost f12666c166
Merge pull request #8157 from Icinga/bugfix/temporary-files-5124
Clean up temp files
2021-01-13 15:45:29 +01:00
Alexander A. Klimov 450b2117d2 Add ".tmp" to state and modified attributes temp files
refs #5124
2021-01-12 17:35:29 +01:00
Alexander A. Klimov 18c2dae941 Clean up temp files
refs #5124
2021-01-12 17:35:29 +01:00
Julian Brost aea06a27dc Use reference-counted pointer in notification callback
`this` could be deleted after `Notification::BeginExecuteNotification`
exited and before `Notification::ExecuteNotificationHelper` finished.
This is fixed by constructing a `Notification::Ptr` and operate on that
one as it is properly reference-counted.
2021-01-12 17:19:29 +01:00
Yonas Habteab 756abbb2ff ApiEvents: Implement new API event streams response 2021-01-11 14:59:48 +01:00
Julian Brost 0276c0b052 Properly handle service downtime referencing a deleted host
Only two out of three cases were handled properly by the code: host
downtimes referencing a deleted host and service downtimes referencing a
deleted service worked fine. However, if a service downtime references a
deleted host, `Host::GetByName()` returns `nullptr` which isn't
accounted for. Use `Service::GetByNamePair()` instead as this performs a
check for the host being null internally.
2021-01-08 11:12:15 +01:00
Alexander A. Klimov f311dfb775 Apply rules: import default templates first
... to allow to override the attributes they set.

refs #7914
2020-12-14 18:15:18 +01:00
Alexander A. Klimov 5547488cd5 Introduce Checkable#NotificationReasonSuppressed()
refs #8509
2020-12-14 13:27:58 +01:00
Alexander Aleksandrovič Klimov 915a3c3001
Merge pull request #8436 from Icinga/bugfix/children-recover-too-late
On recovery: re-check children
2020-12-11 15:41:31 +01:00
Yonas Habteab dd02e3b6d8 API: Display a correct status code when removing a scheduled downtime 2020-12-07 13:19:41 +01:00
Julian Brost f2a532de32
Merge pull request #8035 from Icinga/feature/expiry-date-comments-4663
/v1/actions/add-comment: add param expiry
2020-12-04 15:48:50 +01:00
Alexander A. Klimov 854939a8ce On recovery: re-check children 2020-12-02 12:24:40 +01:00
Alexander A. Klimov 668bf06424 Don't fire suppressed notifications if last parent recovery >= last check result 2020-12-02 12:03:19 +01:00
Alexander Aleksandrovič Klimov bee4ac7f7c
Merge pull request #8040 from Icinga/feature/v1-actions-execute-command-8034
Add API endpoint: /v1/actions/execute-command
2020-12-02 10:53:24 +01:00
Alexander A. Klimov 5cfac1f643 Fix function and variable names
refs #8034
2020-11-23 16:43:47 +01:00
Alexander A. Klimov 0ad1ab20aa Fix code style
refs #8034
2020-11-23 16:39:24 +01:00
Alexander Aleksandrovič Klimov 4f6fecc74c
Merge pull request #8101 from Icinga/bugfix/timestamps-checkresult-differ-across-nodes-8092
State timestamps set by the same check result differ across nodes
2020-10-30 17:24:15 +01:00
Alexander Aleksandrovič Klimov 3fa1eab344
Merge pull request #7736 from froehl/7735
API-Event StateChange & CheckResult: Added acknowledgement and downtime_depth…
2020-10-29 13:41:52 +01:00
Alexander A. Klimov bb851b0558 Merge branch 'master' into feature/v1-actions-execute-command-8034 2020-10-28 18:37:08 +01:00