icinga2

Commit Graph

Author	SHA1	Message	Date
Julian Brost	cbc0b21b86	Checkable: sync state_before_suppression in cluster This ensures that in case of a failover in an HA zone, the other can take over properly and has the required state to send the proper notifications.	2022-03-03 14:25:23 +01:00
Julian Brost	39cee3538a	Checkable: improve state notifications after suppression ends This commit changes the Checkable notification suppression logic (notifications are currently suppressed on the Checkable if it is unreachable, in a downtime, or acknowledged) so that after the suppression reason ends, a state notification is sent if and only if the first hard state after is different from the last hard state from before. If the checkable is in a soft state after the suppression ends, the notification is further suppressed until a hard state is reached. To achieve this behavior, a new attribute state_before_suppression is added to Checkable. This attribute is set to the last hard state the first time either a PROBLEM or a RECOVERY notification is suppressed. In contrast to the previous behavior, neither of these two flags in the suppressed_notification will ever be cleared while the suppression is still ongoing but only after the suppression ended and the current state is compared with the old state stored in state_before_suppression.	2022-03-03 14:25:23 +01:00
Alexander A. Klimov	6b5106ffdd	IcingaDB#Stop(): don't block shutdown, timeout instead	2022-03-02 16:39:44 +01:00
Alexander A. Klimov	3a8efcb4ea	IcingaDB#Send*(): don't enqueue any history once stopped	2022-03-02 16:39:44 +01:00
Alexander A. Klimov	cac22fe38b	RedisConnection#Connect(): wait for all promises to be completed by the read loop from the previous connection.	2022-03-02 16:39:44 +01:00
Alexander A. Klimov	9585a63fa0	Introduce IoEngine::YieldCurrentCoroutine()	2022-03-02 16:39:44 +01:00
Alexander A. Klimov	732d5c472d	RedisConnection#ReadLoop(): don't crash (silently) if a promise to be set is already set	2022-03-02 16:39:37 +01:00
Alexander A. Klimov	50fee6aeb9	Icinga DB: include amount of history kept in memory in /v1/status	2022-03-02 16:39:37 +01:00
Alexander A. Klimov	ad0fe764f7	Icinga DB: log amount of history kept in memory every 10s	2022-03-02 16:39:37 +01:00
Alexander A. Klimov	8ea62f7fc7	Icinga DB: keep history in memory until written to Redis by putting the messages into a Bulker and retrying each chunk.	2022-03-02 16:39:37 +01:00
Alexander A. Klimov	9a8d388734	Introduce Bulker	2022-03-02 16:39:37 +01:00
Alexander Aleksandrovič Klimov	3fee562e7a	Merge pull request #9256 from Icinga/bugfix/add-some-missing-locks Add some missing locks to prevent data races	2022-03-01 16:12:50 +01:00
Julian Brost	9d3eba8383	Merge pull request #9259 from Icinga/bugfix/event-handler-spamming-8704 Checkable#ExecuteEventHandler(): don't outsource event command run twice	2022-02-25 16:51:31 +01:00
Yonas Habteab	f00a3c9693	ConfigObject: Initialize local static var at declaration to ensure thread safety	2022-02-25 15:23:49 +01:00
Yonas Habteab	fb21345bfd	ConfigItem: Use atomic variables for notified and committed items count	2022-02-25 15:17:33 +01:00
Alexander A. Klimov	74935dad7b	Checkable#ExecuteEventHandler(): don't outsource event command run twice refs #8704	2022-02-24 14:03:57 +01:00
Julian Brost	5383df3c79	Merge pull request #9212 from Icinga/bugfix/multi-ido-notification-id IDO: fix incorrect contacts in notification history with multiple IDO instances on a single node	2022-02-21 11:40:46 +01:00
Julian Brost	8e81faf3e0	Merge pull request #9221 from Icinga/bugfix/processcheckresult-dependency-deadlock Prevent deadlock in ProcessCheckResult	2022-02-18 14:14:46 +01:00
Julian Brost	99008755b5	Merge pull request #9213 from Icinga/feature/icingadb-add-previous_soft_state-to-host_state-and-service_state-9210 IcingaDB: Add previous_soft_state to host_state and service_state	2022-02-18 14:09:35 +01:00
Julian Brost	3bb9cdb8cc	Prevent deadlock in ProcessCheckResult Without this commit, children and parents of a checkable were rescheduled on a state change while holding the lock for the current checkable. If both ends of a dependency are checked at the same time and both change state, they could end up in a deadlock waiting for each other. This commit fixes this problem by changing the code so that other checkables are rescheduled only after releasing the lock for the current checkable.	2022-02-17 16:13:25 +01:00
Alexander A. Klimov	c613e62454	IcingaDB: Add previous_soft_state to host_state and service_state refs #9210	2022-02-14 11:32:46 +01:00
Julian Brost	7c9d0fff01	IDO: use per-instance notification_id in history When there are multiple active IDO instances on the same node, before this commit, all of them would share a single DbValue object for the notification_id column of the icinga_contactnotifications table. This resulted in the issue that one database references the notification_id in another database. This commit fixes this by using a separate DbValue value for each IDO instance. This needs a new signal as the existing OnQuery and OnMultipleQueries signals perform the same queries on all IDO instances, but different queries are needed here per instance (they only differ in the referenced DbValue). Therefore, a new signal OnMakeQueries is added that takes a std::function which is called once per IDO instance and can access callbacks to perform one or multiple queries only on this specific IDO instance.	2022-02-10 16:36:35 +01:00
Julian Brost	1b0ad099f1	Merge pull request #9154 from Icinga/bugfix/icingadb-reachabilitychangehandler-9143 Icinga DB: ensure is_reachable and severity don't miss updates	2022-02-03 14:53:51 +01:00
Alexander A. Klimov	2ef3dd6a38	Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged less often Call it only on state changes to reduce no-op Redis/IDO updates a lot. refs #9143	2022-02-03 11:12:53 +01:00
Alexander Aleksandrovič Klimov	ff712f6b23	Service#GetSeverity(): behave as the respective IDO query of Icinga Web which doesn't include host reachability.	2022-01-27 12:21:06 +01:00
Alexander A. Klimov	4c38715ef2	Checkable#ProcessCheckResult(): call Checkable::OnReachabilityChanged last to ensure Checkable#IsReachable() returns correctly for dependency children inside OnReachabilityChanged(). That needs the dependency parent to be already in the correct state. refs #9143	2022-01-25 13:33:46 +01:00
Alexander A. Klimov	84d09876b4	Icinga DB: ensure is_reachable and severity don't miss updates refs #9143	2022-01-25 13:33:46 +01:00
Julian Brost	185fab3761	Merge pull request #9144 from Icinga/bugfix/icingadb-state-history Icinga DB: don't write state history for ack/downtime/host problem changes	2022-01-20 12:00:24 +01:00
Julian Brost	6390911262	Merge pull request #9123 from Icinga/bugfix/icinga2-crashes-when-sending-notifications-8186 Avoid "type" key in dicts being part of object state attrs	2022-01-19 11:48:40 +01:00
Julian Brost	463b159414	Merge pull request #9171 from Icinga/bugfix/icinga-db-notification-history-might-use-incorrect-previous_hard_state-9132 IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state	2022-01-18 16:54:16 +01:00
Julian Brost	31da6a56e6	Icinga DB: remove obsolete StateChangeHandler overload This version of StateChangeHandler is no longer called anywhere as it was the wrong function for all previous callers anyways.	2022-01-18 12:26:43 +01:00
Julian Brost	cf73c6136b	Icinga DB: make host problem change events update the state tables but not write state history StateChangeHandler() is the function used when the actual hard/soft state changes and thus also writes state history. This is not desired in this case, instead, a runtime update should be generated, therefore call UpdateState() instead. refs #9063	2022-01-18 12:26:43 +01:00
Julian Brost	855e342b63	Icinga DB: make acknowledgement events update the state tables but not write state history StateChangeHandler() is the function used when the actual hard/soft state changes and thus also writes state history. This is not desired in this case, instead, a runtime update should be generated, therefore call UpdateState() instead. refs #9063	2022-01-18 12:26:43 +01:00
Julian Brost	f63268b0dd	Icinga DB: make downtime events update the state tables but not write state history StateChangeHandler() is the function used when the actual hard/soft state changes and thus also writes state history. This is not desired in this case, instead, a runtime update should be generated, therefore call UpdateState() instead. refs #9063	2022-01-18 12:26:43 +01:00
Julian Brost	447884be72	Icinga DB: don't reimplement volatile state update in SendConfigUpdate Sending a volatile state update is already implemented in UpdateState, so just use that function instead of generating the update queries.	2022-01-18 12:26:43 +01:00
Julian Brost	a6d6cb788e	Icinga DB: Merge SendStatusUpdate into UpdateState Previously, both functions did related operations but had unclear and confusing naming: - UpdateState updated the icinga:{host,service}:state Redis keys. - SendStatusUpdate sent a runtime update for the icinga:{host,service}:state. This commit merges both functions into one with a new mode parameter. The following modes are now supported: - Volatile: Update the icinga:{host,service}:state Redis key. - Full: Perform the volatile state update and in addition send a corresponding runtime update so that this state update gets written through to the persistent database by a running icingadb process. - RuntimeOnly: Special mode for callers that can ensure that a volatile update for the current state was already performed but has to be upgraded to a full update. refs #9063	2022-01-18 12:26:43 +01:00
Alexander A. Klimov	1fee3f1b12	IcingaDB#SendSentNotification(): make stream deterministic via CheckResult#previous_hard_state Now it gets everything from one source, the CheckResult. refs #9132	2022-01-10 19:18:11 +01:00
Julian Brost	3d04b04172	Merge pull request #9138 from Icinga/bugfix/mysql-schema-versions Make MySQL schema version in full schema file and upgrade files consistent	2022-01-10 09:54:38 +01:00
Julian Brost	e518dc2436	Merge pull request #9112 from Icinga/bugfix/sync-missing-history-information Icinga DB: ensure consistent history streams in HA setup	2022-01-07 15:14:06 +01:00
Julian Brost	a99c04030c	Merge pull request #9150 from Icinga/bugfix/icingadb-cmd-arg-order-int Icinga DB: ensure icinga:*command:argument#order is an int	2022-01-05 16:07:30 +01:00
Julian Brost	3e73a262cc	Sync comment and downtime removal info for Icinga DB history When a comment or downtime is removed manually, the name of the requestor and timestamp have to be synced to other nodes in the cluster to allow all of them to generate a consistent Icinga DB history stream. refs #9101	2022-01-05 10:27:13 +01:00
Alexander Aleksandrovič Klimov	1b50d912a0	Merge pull request #9137 from Icinga/bugfix/influxdb-writer-synchronization Fix unsafe concurrent access to m_DataBuffer in InfluxdbCommonWriter	2022-01-04 17:37:28 +01:00
Alexander A. Klimov	e9e555468d	Handle "type" key in dicts being part of object state attrs i.e. the confusion of the state file deserializer with e.g. `"type":32` on startup. That would unexpectedly restore (the now ignored) null (not `{"type":32}`) as there's no type "32". refs #8186	2022-01-04 17:17:20 +01:00
Alexander Aleksandrovič Klimov	80663cf5e6	Merge pull request #9048 from Icinga/bugfix/timeperiod-dst-2.0 LegacyTimePeriod::ScriptFunc: fix DST edge-cases	2022-01-03 18:11:32 +01:00
Alexander A. Klimov	a8c9d19dae	Icinga DB: ensure icinga:command:argument#order is an int The config parser requires Command#arguments#order to be a Number, i.e. 42, 4.2 or even "4.2". That's int-casted where needed, now also for Icinga DB. Before: ``` object CheckCommand "9117" { command = [ "true" ] arguments = { "4.2" = { order = "4.2" } } } ``` 2022-01-03T13:25:07.166+0100 FATAL icingadb json: cannot unmarshal string into Go value of type int64	2022-01-03 13:28:19 +01:00
Julian Brost	33781496da	InfluxdbCommonWriter: use atomic_size_t to expose the data buffer size to the stats function m_DataBuffer may be modified concurrently while StatsFunc() is called, thus it's unsafe to call size() on it. As write access to m_DataBuffer is already synchronized by only modifying it from the single work queue thread, instead of adding a mutex, this commit adds a new std::atomic_size_t which is additionally updated when modifying m_DataBuffer and can safely be accessed in StatsFunc().	2022-01-03 12:24:26 +01:00
Julian Brost	e6300aacf9	InfluxdbCommonWriter: only flush from work queue There is no explicit synchronization of access to m_DataBuffer which is fine if it is only accessed from the single-threaded work queue. However, Stop() also called Flush() in another thread, leading to concurrent write access to m_DataBuffer which can result in a crash due to use after free/double free. Changes in this commit: * Flush() is renamed to FlushWQ() to show that it should only be called from the work queue. Additionally, it now asserts that it is running on the work queue. * Visibility of some data members is changed from protected to private. No other classes have to access these at the moment. By this change, accidental concurrent access from derived classes in the future is prevented. * Stop() now flushes by posting FlushWQ() to the work queue and joining it.	2022-01-03 12:24:26 +01:00
Julian Brost	23693248d4	Make MySQL schema version in full schema file and upgrade files consistent In the 2.12.6 release, the full schema file sets the version to 1.14.3, whereas the latest available upgrade file 2.11.0.sql sets it to 1.15.0. Therefore, ship a new upgrade file 2.12.7.sql for all users who imported their schema with version 2.11.0 or later and never performed an upgrade since then. Their databases incorrectly state schema version 1.14.3 and are bumped to the correct version 1.15.0 by the upgrade. In the 2.13.2 release, the full schema file sets the version to 1.15.0, whereas the latest available upgrade file 2.13.0.sql sets it to 1.15.1. Therefore, rename the incorrectly named upgrade file 2.13.1.sql (it was not shipped in this or any other release so far) to 2.13.3.sql for users who imported their schema with version 2.13.0 or later and never performed an upgrade since then. Their databases incorrectly state schema version 1.15.0 and are bumped to the correct version 1.15.1 by the upgrade. The full schema is not touched by this commit as for the current branch, this was already fixed by `815533b334`.	2021-12-16 15:48:12 +01:00
Julian Brost	13ea635188	Don't trigger a fixed downtime like a flexible one When creating a fixed downtime that starts immediately while the checkable is in a non-OK state, previously the code path for flexible downtimes was used to trigger this downtime. This is fixed by this commit which resolves two issues: 1. Missing downtime start notification: notifications work differently for fixed and flexible downtimes. This resulted in missing downtime start notifications under the conditions described above. 2. Incorrect downtime trigger time: this code path would incorrectly assume the timestamp of the last checkable as the trigger time which is incorrect for fixed downtimes.	2021-12-14 11:02:40 +01:00
Alexander A. Klimov	eb71fb7529	Avoid "type" key in dicts being part of object state attrs not to confuse the state file deserializer with e.g. `"type":32` on startup. That would unexpectedly restore null (not `{"type":32}`) as there's no type "32". refs #8186	2021-12-13 17:56:12 +01:00
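
The suppression behavior described in 39cee3538a can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Icinga 2 implementation: `SuppressionTracker`, `OnNotificationSuppressed` and `ShouldNotifyAfterSuppression` are hypothetical names, and only the idea of remembering the last hard state mirrors the state_before_suppression attribute named in the commit message.

```cpp
#include <cassert>

enum class State { Ok, Warning, Critical };
enum class StateType { Soft, Hard };

// Hypothetical helper, not part of Icinga 2.
struct SuppressionTracker {
    bool hasStoredState = false;              // true while a notification is pending
    State stateBeforeSuppression = State::Ok; // last hard state before suppression

    // Called whenever a PROBLEM or RECOVERY notification is suppressed.
    // Only the first call stores the state; later calls keep the old value.
    void OnNotificationSuppressed(State lastHardState) {
        if (!hasStoredState) {
            stateBeforeSuppression = lastHardState;
            hasStoredState = true;
        }
    }

    // Called after the suppression reason has ended.
    // Returns true if a state notification should be sent now.
    bool ShouldNotifyAfterSuppression(State current, StateType type) {
        if (!hasStoredState)
            return false;                     // nothing was suppressed
        if (type == StateType::Soft)
            return false;                     // wait until a hard state is reached
        bool notify = (current != stateBeforeSuppression);
        hasStoredState = false;               // comparison done, clear the record
        return notify;
    }
};

// Usage: a downtime starts while OK, the checkable turns CRITICAL during it.
inline void SuppressionExample() {
    SuppressionTracker t;
    t.OnNotificationSuppressed(State::Ok);
    assert(!t.ShouldNotifyAfterSuppression(State::Critical, StateType::Soft));
    assert(t.ShouldNotifyAfterSuppression(State::Critical, StateType::Hard));
}
```

If the first hard state after the suppression equals the stored one, the sketch clears the record and sends nothing, which matches the "if and only if" wording of the commit message.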

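The locking change described in 3bb9cdb8cc (reschedule other checkables only after releasing the lock of the current one) follows a common pattern that can be sketched like this. The sketch is an assumption-laden illustration, not code from the repository: `Node`, `ProcessResult` and `Reschedule` are hypothetical names standing in for a checkable, ProcessCheckResult and the rescheduling of dependency parents and children.

```cpp
#include <cassert>
#include <mutex>
#include <vector>

// Hypothetical stand-in for a checkable with dependency parents/children.
struct Node {
    std::mutex mutex;
    int state = 0;
    int rescheduleCount = 0;
    std::vector<Node*> related; // parents and children

    void Reschedule() {
        std::lock_guard<std::mutex> lock(mutex);
        ++rescheduleCount;
    }
};

// Applies a new state. Related nodes are collected while holding the lock
// but rescheduled only after it has been released, so two nodes that
// process results at the same time never wait for each other's lock
// while still holding their own.
void ProcessResult(Node& node, int newState) {
    std::vector<Node*> toReschedule;
    {
        std::lock_guard<std::mutex> lock(node.mutex);
        if (node.state != newState) {
            node.state = newState;
            toReschedule = node.related; // copy, do not call out yet
        }
    } // node.mutex is released here

    for (Node* other : toReschedule) {
        assert(other != nullptr);
        other->Reschedule(); // takes only the other node's lock
    }
}
```

With two nodes that list each other in `related`, calling `ProcessResult` on both changes each state once and increments each `rescheduleCount` once, without ever holding both locks at the same time.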

5968 Commits