Commit Graph

5968 Commits

Author SHA1 Message Date
Julian Brost cbc0b21b86 Checkable: sync state_before_suppression in cluster
This ensures that in case of a failover in an HA zone, the other can take over
properly and has the required state to send the proper notifications.
2022-03-03 14:25:23 +01:00
Julian Brost 39cee3538a Checkable: improve state notifications after suppression ends
This commit changes the Checkable notification suppression logic (notifications
are currently suppressed on the Checkable if it is unreachable, in a downtime,
or acknowledged) to that after the suppression reason ends, a state
notification is sent if and only if the first hard state after is different
from the last hard state from before. If the checkable is in a soft state after
the suppression ends, the notification is further suppressed until a hard state
is reached.

To achieve this behavior, a new attribute state_before_suppression is added to
Checkable. This attribute is set to the last hard state the first time either a
PROBLEM or a RECOVERY notification is suppressed. Compared to from before,
neither of these two flags in the suppressed_notification will ever be cleared
while the supression is still ongoing but only after the suppression ended and
the current state is compared with the old state stored in
state_before_suppression.
2022-03-03 14:25:23 +01:00
Alexander A. Klimov 6b5106ffdd IcingaDB#Stop(): don't block shutdown, timeout instead 2022-03-02 16:39:44 +01:00
Alexander A. Klimov 3a8efcb4ea IcingaDB#Send*(): don't enqueue any history once stopped 2022-03-02 16:39:44 +01:00
Alexander A. Klimov cac22fe38b RedisConnection#Connect(): wait for all promises to be completed
by the read loop from the previous connection.
2022-03-02 16:39:44 +01:00
Alexander A. Klimov 9585a63fa0 Introduce IoEngine::YieldCurrentCoroutine() 2022-03-02 16:39:44 +01:00
Alexander A. Klimov 732d5c472d RedisConnection#ReadLoop(): don't crash (silently) if a promise to be set is already set 2022-03-02 16:39:37 +01:00
Alexander A. Klimov 50fee6aeb9 Icinga DB: include amount of history kept in memory in /v1/status 2022-03-02 16:39:37 +01:00
Alexander A. Klimov ad0fe764f7 Icinga DB: log amount of history kept in memory every 10s 2022-03-02 16:39:37 +01:00
Alexander A. Klimov 8ea62f7fc7 Icinga DB: keep history in memory until written to Redis
by putting the messages into a Bulker and retrying each chunk.
2022-03-02 16:39:37 +01:00
Alexander A. Klimov 9a8d388734 Introduce Bulker 2022-03-02 16:39:37 +01:00
Alexander Aleksandrovič Klimov 3fee562e7a
Merge pull request #9256 from Icinga/bugfix/add-some-missing-locks
Add some missing locks to prevent data races
2022-03-01 16:12:50 +01:00
Julian Brost 9d3eba8383
Merge pull request #9259 from Icinga/bugfix/event-handler-spamming-8704
Checkable#ExecuteEventHandler(): don't outsource event command run twice
2022-02-25 16:51:31 +01:00
Yonas Habteab f00a3c9693 ConfigObject: Initialize local static var at declaration to ensure thread safety 2022-02-25 15:23:49 +01:00
Yonas Habteab fb21345bfd ConfigItem: Use atomic variables for notified and commited items count 2022-02-25 15:17:33 +01:00
Alexander A. Klimov 74935dad7b Checkable#ExecuteEventHandler(): don't outsource event command run twice
refs #8704
2022-02-24 14:03:57 +01:00
Julian Brost 5383df3c79
Merge pull request #9212 from Icinga/bugfix/multi-ido-notification-id
IDO: fix incorrect contacts in notification history with multiple IDO instances on a single node
2022-02-21 11:40:46 +01:00
Julian Brost 8e81faf3e0
Merge pull request #9221 from Icinga/bugfix/processcheckresult-dependency-deadlock
Prevent deadlock in ProcessCheckResult
2022-02-18 14:14:46 +01:00
Julian Brost 99008755b5
Merge pull request #9213 from Icinga/feature/icingadb-add-previous_soft_state-to-host_state-and-service_state-9210
IcingaDB: Add previous_soft_state to host_state and service_state
2022-02-18 14:09:35 +01:00
Julian Brost 3bb9cdb8cc Prevent deadlock in ProcessCheckResult
Without this commit, children and parents of a checkable were rescheduled on a
state change while holding the lock for the current checkable. If both ends of
a dependency are checked at the same time and both change state, they could end
up in a deadlock waiting for each other.

This commit fixes this problem by changing the code so that other checkables
are rescheduled only after releasing the lock for the current checkable.
2022-02-17 16:13:25 +01:00
Alexander A. Klimov c613e62454 IcingaDB: Add previous_soft_state to host_state and service_state
refs #9210
2022-02-14 11:32:46 +01:00
Julian Brost 7c9d0fff01 IDO: use per-instance notification_id in history
When there are multiple active IDO instances on the same node, before this
commit, all of them would share a single DbValue object for the notification_id
column of the icinga_contactnotifications table. This resulted in the issue
that one database references the notification_id in another database.

This commit fixes this by using a separate DbValue value for each IDO instance.
This needs a new signal as the existing OnQuery and OnMultipleQueries signals
perform the same queries on all IDO instances, but different queries are needed
here per instance (they only differ in the referenced DbValue). Therefore, a
new signal OnMakeQueries is added that takes a std::function which is called
once per IDO instance and can access callbacks to perform one or multiple
queries only on this specific IDO instance.
2022-02-10 16:36:35 +01:00
Julian Brost 1b0ad099f1
Merge pull request #9154 from Icinga/bugfix/icingadb-reachabilitychangehandler-9143
Icinga DB: ensure is_reachable and severity don't miss updates
2022-02-03 14:53:51 +01:00
Alexander A. Klimov 2ef3dd6a38 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged less often
Call it only on state changes to reduce no-op Redis/IDO updates a lot.

refs #9143
2022-02-03 11:12:53 +01:00
Alexander Aleksandrovič Klimov ff712f6b23
Service#GetSeverity(): behave as the respective IDO query of Icinga Web
which doesn't include host reachability.
2022-01-27 12:21:06 +01:00
Alexander A. Klimov 4c38715ef2 Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged last
to ensure Checkable#IsReachable() returns correctly for dependency children inside OnReachabilityChanged().
That needs the dependency parent to be already in the correct state.

refs #9143
2022-01-25 13:33:46 +01:00
Alexander A. Klimov 84d09876b4 Icinga DB: ensure is_reachable and severity don't miss updates
refs #9143
2022-01-25 13:33:46 +01:00
Julian Brost 185fab3761
Merge pull request #9144 from Icinga/bugfix/icingadb-state-history
Icinga DB: don't write state history for ack/downtime/host problem changes
2022-01-20 12:00:24 +01:00
Julian Brost 6390911262
Merge pull request #9123 from Icinga/bugfix/icinga2-crashes-when-sending-notifications-8186
Avoid "type" key in dicts being part of object state attrs
2022-01-19 11:48:40 +01:00
Julian Brost 463b159414
Merge pull request #9171 from Icinga/bugfix/icinga-db-notification-history-might-use-incorrect-previous_hard_state-9132
IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state
2022-01-18 16:54:16 +01:00
Julian Brost 31da6a56e6 Icinga DB: remove obsolete StateChangeHandler overload
This version of StateChangeHandler is no longer called anywhere as it was the
wrong function for all previous callers anyways.
2022-01-18 12:26:43 +01:00
Julian Brost cf73c6136b Icinga DB: make host problem change events update the state tables but not write state history
StateChangeHandler() is the function used when the actual hard/soft state
changes and thus also writes state history. This is not desired in this case,
instead, a runtime update should be generated, therefore call UpdateState()
instead.

refs #9063
2022-01-18 12:26:43 +01:00
Julian Brost 855e342b63 Icinga DB: make acknowledgement events update the state tables but not write state history
StateChangeHandler() is the function used when the actual hard/soft state
changes and thus also writes state history. This is not desired in this case,
instead, a runtime update should be generated, therefore call UpdateState()
instead.

refs #9063
2022-01-18 12:26:43 +01:00
Julian Brost f63268b0dd Icinga DB: make downtime events update the state tables but not write state history
StateChangeHandler() is the function used when the actual hard/soft state
changes and thus also writes state history. This is not desired in this case,
instead, a runtime update should be generated, therefore call UpdateState()
instead.

refs #9063
2022-01-18 12:26:43 +01:00
Julian Brost 447884be72 Icinga DB: don't reimplement volatile state update in SendConfigUpdate
Sending a volatile state update is already implemented in UpdateState, so just
use that function instead of generating the update queries.
2022-01-18 12:26:43 +01:00
Julian Brost a6d6cb788e Icinga DB: Merge SendStatusUpdate into UpdateState
Previously, both funktions did related operations but had unclear and confusing
naming:
- UpdateState updated the icinga:{host,service}:state Redis keys.
- SendStatusUpdate sent a runtime update for the icinga:{host,service}:state.

This commit merges both functions into one with a new mode parameter. The
following modes are now supported:
- Volatile: Update the icinga:{host,service}:state Redis key.
- Full: Perform the volatile state update and in addition send a corresponding
  runtime update so that this state update gets written through to the
  persistent database by a running icingadb process.
- RuntimeOnly: Special mode for callers that can ensure that a volatile update
  for the current state was already performed but has to be upgraded to a full
  update.

refs #9063
2022-01-18 12:26:43 +01:00
Alexander A. Klimov 1fee3f1b12 IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state
Now it gets everything from one source, the CheckResult.

refs #9132
2022-01-10 19:18:11 +01:00
Julian Brost 3d04b04172
Merge pull request #9138 from Icinga/bugfix/mysql-schema-versions
Make MySQL schema version in full schema file and upgrade files consistent
2022-01-10 09:54:38 +01:00
Julian Brost e518dc2436
Merge pull request #9112 from Icinga/bugfix/sync-missing-history-information
Icinga DB: ensure consistent history streams in HA setup
2022-01-07 15:14:06 +01:00
Julian Brost a99c04030c
Merge pull request #9150 from Icinga/bugfix/icingadb-cmd-arg-order-int
Icinga DB: ensure icinga:*command:argument#order is an int
2022-01-05 16:07:30 +01:00
Julian Brost 3e73a262cc Sync comment and downtime removal info for Icinga DB history
When a comment or downtime is removed manually, the name of the requestor and
timestamp have to be synced to other nodes in the cluster to allow all of them
to generate a consistent Icinga DB history stream.

refs #9101
2022-01-05 10:27:13 +01:00
Alexander Aleksandrovič Klimov 1b50d912a0
Merge pull request #9137 from Icinga/bugfix/influxdb-writer-synchronization
Fix unsafe concurrent access to m_DataBuffer in InfluxdbCommonWriter
2022-01-04 17:37:28 +01:00
Alexander A. Klimov e9e555468d Handle "type" key in dicts being part of object state attrs
i.e. the confusion of the state file deserializator with e.g. `"type":32` on startup.
That would unexpectedly restore (the now ignored) null (not `{"type":32}`) as there's no type "32".

refs #8186
2022-01-04 17:17:20 +01:00
Alexander Aleksandrovič Klimov 80663cf5e6
Merge pull request #9048 from Icinga/bugfix/timeperiod-dst-2.0
LegacyTimePeriod::ScriptFunc: fix DST edge-cases
2022-01-03 18:11:32 +01:00
Alexander A. Klimov a8c9d19dae Icinga DB: ensure icinga:*command:argument#order is an int
The config parser requires *Command#arguments#order to be a Number, i.e. 42,
4.2 or even "4.2". That's int-casted where needed, now also for Icinga DB.

Before:

```
object CheckCommand "9117" {
	command = [ "true" ]
	arguments = {
		"4.2" = { order = "4.2" }
	}
}
```

2022-01-03T13:25:07.166+0100	FATAL	icingadb	json: cannot unmarshal string into Go value of type int64
2022-01-03 13:28:19 +01:00
Julian Brost 33781496da InfluxdbCommonWriter: use atomic_size_t to data buffer size from stats function
m_DataBuffer may be modified concurrently while StatsFunc() is called, thus
it's unsafe to call size() on it. As write access to m_DataBuffer is already
synchronized by only modifying it from the single work queue thread, instead of
adding a mutex, this commit adds a new std::atomic_size_t which is additionally
updated when modifying m_DataBuffer and can safely be accessed in StatsFunc().
2022-01-03 12:24:26 +01:00
Julian Brost e6300aacf9 InfluxdbCommonWriter: only flush from work queue
There is no explicit synchronization of access to m_DataBuffer which is fine if
it is only accessed from the single-threaded work queue. However, Stop() also
called Flush() in another thread, leading to concurrent write access to
m_DataBuffer which can result in a crash due to use after free/double free.

Changes in this commit:
* Flush() is renamed to FlushWQ() to show that it should only be called from
  the work queue. Additionally, it now asserts that it is running on the work
  queue.
* Visibility of some data members is changed from protected to private. No
  other classes have to access these at the moment. By this change, accidental
  concurrent access from derived classes in the future is prevented.
* Stop() now flushes by posting FlushWQ() to the work queue and joining it.
2022-01-03 12:24:26 +01:00
Julian Brost 23693248d4 Make MySQL schema version in full schema file and upgrade files consistent
In the 2.12.6 release, the full schema file sets the version to 1.14.3, whereas
the latest available upgrade file 2.11.0.sql sets it to 1.15.0. Therefore, ship
a new upgrade file 2.12.7.sql for all users who imported their schema with
version 2.11.0 or later and never performed an upgrade since then. Their
databases incorrectly state schema version 1.14.3 and is bumped to the correct
version 1.15.0 by the upgrade.

In the 2.13.2 release, the full schema file sets the version to 1.15.0, whereas
the latest available upgrade file 2.13.0.sql sets it to 1.15.1. Therefore,
rename the incorrectly named upgrade file 2.13.1.sql (it was not shipped in
this or any other release so far) to 2.13.3.sql for users who imported their
schema with version 2.13.0 or later and never performed an upgrade since then.
Their databases incorrectly state schema version 1.15.0 and are bumped to the
correct version 1.15.1 by the upgrade.

The full schema is not touched by this commit as for the current branch, this
was already fixed by 815533b334.
2021-12-16 15:48:12 +01:00
Julian Brost 13ea635188 Don't trigger a fixed downtime like a flexible one
When creating a fixed downtime that starts immediately while the checkable is
in a non-OK state, previously the code path for flexible downtimes was used to
trigger this downtime. This is fixed by this commit which resolves two issued:

1. Missing downtime start notification: notifications work differently for
   fixed and flexible downtimes. This resulted in missing downtime start
   notifications under the conditions described above.
2. Incorrect downtime trigger time: this code path would incorrectly assume the
   timestamp of the last checkable as the trigger time which is incorrect for
   fixed downtimes.
2021-12-14 11:02:40 +01:00
Alexander A. Klimov eb71fb7529 Avoid "type" key in dicts being part of object state attrs
not to confuse the state file deserializator with e.g. `"type":32` on startup.
That would unexpectedly restore null (not `{"type":32}`) as there's no type "32".

refs #8186
2021-12-13 17:56:12 +01:00