Commit Graph

5932 Commits

Author SHA1 Message Date
Julian Brost f63268b0dd Icinga DB: make downtime events update the state tables but not write state history
StateChangeHandler() is the function used when the actual hard/soft state
changes and thus also writes state history. This is not desired in this case,
instead, a runtime update should be generated, therefore call UpdateState()
instead.

refs #9063
2022-01-18 12:26:43 +01:00
Julian Brost 447884be72 Icinga DB: don't reimplement volatile state update in SendConfigUpdate
Sending a volatile state update is already implemented in UpdateState, so just
use that function instead of generating the update queries.
2022-01-18 12:26:43 +01:00
Julian Brost a6d6cb788e Icinga DB: Merge SendStatusUpdate into UpdateState
Previously, both functions did related operations but had unclear and confusing
naming:
- UpdateState updated the icinga:{host,service}:state Redis keys.
- SendStatusUpdate sent a runtime update for the icinga:{host,service}:state.

This commit merges both functions into one with a new mode parameter. The
following modes are now supported:
- Volatile: Update the icinga:{host,service}:state Redis key.
- Full: Perform the volatile state update and in addition send a corresponding
  runtime update so that this state update gets written through to the
  persistent database by a running icingadb process.
- RuntimeOnly: Special mode for callers that can ensure that a volatile update
  for the current state was already performed but has to be upgraded to a full
  update.

refs #9063
2022-01-18 12:26:43 +01:00
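
The three modes listed in commit a6d6cb788e can be pictured as bit flags, where Full is the combination of the other two. The following is only a minimal sketch under that assumption: the enum layout, `HasMode()` and the string-returning `UpdateState()` are hypothetical stand-ins that record which actions a call would perform, not the actual Icinga DB code.

```cpp
#include <string>
#include <vector>

// Hypothetical flag layout: Full combines the two partial modes.
enum class StateUpdate : unsigned {
    Volatile    = 1u << 0, // update the icinga:{host,service}:state Redis key
    RuntimeOnly = 1u << 1, // only send the corresponding runtime update
    Full        = Volatile | RuntimeOnly
};

inline bool HasMode(StateUpdate mode, StateUpdate flag)
{
    return (static_cast<unsigned>(mode) & static_cast<unsigned>(flag)) != 0;
}

// Stand-in for the merged function: it returns the actions it would perform
// instead of talking to Redis.
std::vector<std::string> UpdateState(StateUpdate mode)
{
    std::vector<std::string> actions;

    if (HasMode(mode, StateUpdate::Volatile))
        actions.push_back("volatile");

    if (HasMode(mode, StateUpdate::RuntimeOnly))
        actions.push_back("runtime");

    return actions;
}
```

With this layout a caller that already performed the volatile update can request `StateUpdate::RuntimeOnly` to upgrade it to a full update without writing the Redis key twice.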
Julian Brost 3d04b04172
Merge pull request #9138 from Icinga/bugfix/mysql-schema-versions
Make MySQL schema version in full schema file and upgrade files consistent
2022-01-10 09:54:38 +01:00
Julian Brost e518dc2436
Merge pull request #9112 from Icinga/bugfix/sync-missing-history-information
Icinga DB: ensure consistent history streams in HA setup
2022-01-07 15:14:06 +01:00
Julian Brost a99c04030c
Merge pull request #9150 from Icinga/bugfix/icingadb-cmd-arg-order-int
Icinga DB: ensure icinga:*command:argument#order is an int
2022-01-05 16:07:30 +01:00
Julian Brost 3e73a262cc Sync comment and downtime removal info for Icinga DB history
When a comment or downtime is removed manually, the name of the requestor and
timestamp have to be synced to other nodes in the cluster to allow all of them
to generate a consistent Icinga DB history stream.

refs #9101
2022-01-05 10:27:13 +01:00
Alexander Aleksandrovič Klimov 1b50d912a0
Merge pull request #9137 from Icinga/bugfix/influxdb-writer-synchronization
Fix unsafe concurrent access to m_DataBuffer in InfluxdbCommonWriter
2022-01-04 17:37:28 +01:00
Alexander Aleksandrovič Klimov 80663cf5e6
Merge pull request #9048 from Icinga/bugfix/timeperiod-dst-2.0
LegacyTimePeriod::ScriptFunc: fix DST edge-cases
2022-01-03 18:11:32 +01:00
Alexander A. Klimov a8c9d19dae Icinga DB: ensure icinga:*command:argument#order is an int
The config parser requires *Command#arguments#order to be a Number, i.e. 42,
4.2 or even "4.2". That's int-casted where needed, now also for Icinga DB.

Before:

```
object CheckCommand "9117" {
	command = [ "true" ]
	arguments = {
		"4.2" = { order = "4.2" }
	}
}
```

2022-01-03T13:25:07.166+0100	FATAL	icingadb	json: cannot unmarshal string into Go value of type int64
2022-01-03 13:28:19 +01:00
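
The int cast described in commit a8c9d19dae can be illustrated as follows. This is a rough sketch only: `OrderToInt()` is a hypothetical helper, and it assumes that a Number given as a string is parsed as a floating point value first and then truncated toward zero.

```cpp
#include <string>

// Hypothetical helper: a Number given as a double is truncated toward zero.
long long OrderToInt(double value)
{
    return static_cast<long long>(value);
}

// Hypothetical helper: a Number given as a string such as "4.2" is parsed
// first (std::stod throws if the string is not numeric), then truncated.
long long OrderToInt(const std::string& value)
{
    return OrderToInt(std::stod(value));
}
```

Under these assumptions 42, 4.2 and "4.2" all end up as integers, so a consumer expecting an int64 no longer receives a string.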
Julian Brost 33781496da InfluxdbCommonWriter: use atomic_size_t to read data buffer size from stats function
m_DataBuffer may be modified concurrently while StatsFunc() is called, thus
it's unsafe to call size() on it. As write access to m_DataBuffer is already
synchronized by only modifying it from the single work queue thread, instead of
adding a mutex, this commit adds a new std::atomic_size_t which is additionally
updated when modifying m_DataBuffer and can safely be accessed in StatsFunc().
2022-01-03 12:24:26 +01:00
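
The approach from commit 33781496da, mirroring the buffer size in an atomic counter instead of adding a mutex, might look roughly like this. The class and method names (`BufferedWriter`, `Append()`, `Clear()`, `GetPendingCount()`) are made up for this sketch; only `m_DataBuffer` and the use of `std::atomic_size_t` come from the commit message.

```cpp
#include <atomic>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

class BufferedWriter {
public:
    // Must only be called from the single thread that owns m_DataBuffer.
    void Append(std::string line)
    {
        m_DataBuffer.push_back(std::move(line));
        m_DataBufferSize.store(m_DataBuffer.size());
    }

    // Must only be called from the single thread that owns m_DataBuffer.
    void Clear()
    {
        m_DataBuffer.clear();
        m_DataBufferSize.store(0);
    }

    // Safe to call from any thread: it never touches the vector itself.
    std::size_t GetPendingCount() const
    {
        return m_DataBufferSize.load();
    }

private:
    std::vector<std::string> m_DataBuffer;
    std::atomic_size_t m_DataBufferSize{0};
};
```

A reader may observe a value that is one update behind, which is usually acceptable for statistics.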
Julian Brost e6300aacf9 InfluxdbCommonWriter: only flush from work queue
There is no explicit synchronization of access to m_DataBuffer which is fine if
it is only accessed from the single-threaded work queue. However, Stop() also
called Flush() in another thread, leading to concurrent write access to
m_DataBuffer which can result in a crash due to use after free/double free.

Changes in this commit:
* Flush() is renamed to FlushWQ() to show that it should only be called from
  the work queue. Additionally, it now asserts that it is running on the work
  queue.
* Visibility of some data members is changed from protected to private. No
  other classes have to access these at the moment. By this change, accidental
  concurrent access from derived classes in the future is prevented.
* Stop() now flushes by posting FlushWQ() to the work queue and joining it.
2022-01-03 12:24:26 +01:00
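
The pattern from commit e6300aacf9, where the buffer is only touched on the work queue and Stop() posts the flush and joins, can be sketched as below. `SingleThreadQueue` is a simplified hypothetical stand-in for a single-threaded work queue and `Writer` is not the real InfluxdbCommonWriter; only the names `FlushWQ()`, `Stop()` and `m_DataBuffer` are taken from the commit message.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical minimal work queue with exactly one worker thread.
class SingleThreadQueue {
public:
    SingleThreadQueue() : m_Worker([this] { Run(); }) {}
    ~SingleThreadQueue() { Join(); }

    void Enqueue(std::function<void()> task)
    {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Tasks.push_back(std::move(task));
        }
        m_CV.notify_one();
    }

    // Lets the worker finish all queued tasks, then waits for it to exit.
    void Join()
    {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Stop = true;
        }
        m_CV.notify_one();
        if (m_Worker.joinable())
            m_Worker.join();
    }

    bool IsWorkerThread()
    {
        std::lock_guard<std::mutex> lock(m_Mutex);
        return std::this_thread::get_id() == m_WorkerId;
    }

private:
    void Run()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_WorkerId = std::this_thread::get_id();
        for (;;) {
            m_CV.wait(lock, [this] { return m_Stop || !m_Tasks.empty(); });
            if (m_Tasks.empty())
                return; // only reached once m_Stop is set
            std::function<void()> task = std::move(m_Tasks.front());
            m_Tasks.pop_front();
            lock.unlock(); // run the task without holding the lock
            task();
            lock.lock();
        }
    }

    std::mutex m_Mutex;
    std::condition_variable m_CV;
    std::deque<std::function<void()>> m_Tasks;
    bool m_Stop = false;
    std::thread::id m_WorkerId;
    std::thread m_Worker; // declared last so it starts after the other members exist
};

class Writer {
public:
    void Add(int value)
    {
        m_Queue.Enqueue([this, value] { m_DataBuffer.push_back(value); });
    }

    // Stop() never touches m_DataBuffer itself: it posts the flush and joins.
    void Stop()
    {
        m_Queue.Enqueue([this] { FlushWQ(); });
        m_Queue.Join();
    }

    std::size_t GetFlushedCount() const { return m_Flushed; }

private:
    void FlushWQ()
    {
        assert(m_Queue.IsWorkerThread());
        m_Flushed += m_DataBuffer.size();
        m_DataBuffer.clear();
    }

    std::vector<int> m_DataBuffer;
    std::size_t m_Flushed = 0;
    SingleThreadQueue m_Queue; // declared last so it is joined before the buffer is destroyed
};
```

Because every access to `m_DataBuffer` runs on the one worker thread, no extra lock around the buffer is needed in this sketch. `GetFlushedCount()` is only meaningful after `Stop()` has returned.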
Julian Brost 23693248d4 Make MySQL schema version in full schema file and upgrade files consistent
In the 2.12.6 release, the full schema file sets the version to 1.14.3, whereas
the latest available upgrade file 2.11.0.sql sets it to 1.15.0. Therefore, ship
a new upgrade file 2.12.7.sql for all users who imported their schema with
version 2.11.0 or later and never performed an upgrade since then. Their
databases incorrectly state schema version 1.14.3 and are bumped to the correct
version 1.15.0 by the upgrade.

In the 2.13.2 release, the full schema file sets the version to 1.15.0, whereas
the latest available upgrade file 2.13.0.sql sets it to 1.15.1. Therefore,
rename the incorrectly named upgrade file 2.13.1.sql (it was not shipped in
this or any other release so far) to 2.13.3.sql for users who imported their
schema with version 2.13.0 or later and never performed an upgrade since then.
Their databases incorrectly state schema version 1.15.0 and are bumped to the
correct version 1.15.1 by the upgrade.

The full schema is not touched by this commit as for the current branch, this
was already fixed by 815533b334.
2021-12-16 15:48:12 +01:00
Julian Brost 13ea635188 Don't trigger a fixed downtime like a flexible one
When creating a fixed downtime that starts immediately while the checkable is
in a non-OK state, previously the code path for flexible downtimes was used to
trigger this downtime. This is fixed by this commit which resolves two issues:

1. Missing downtime start notification: notifications work differently for
   fixed and flexible downtimes. This resulted in missing downtime start
   notifications under the conditions described above.
2. Incorrect downtime trigger time: this code path would incorrectly assume the
   timestamp of the last checkable as the trigger time which is incorrect for
   fixed downtimes.
2021-12-14 11:02:40 +01:00
Julian Brost c71029f2e8 Set downtime trigger time deterministically
When triggering a downtime, the time of the causing event is now passed on as
the trigger time. That time is:

* For fixed downtimes: the later one of start and entry time.
* If a check result triggers the downtime: The execution end of the check
  result.
* If another downtime triggers the downtime: The trigger time of the first
  downtime.

This is done so two nodes in a HA setup can write consistent Icinga DB downtime
history streams.

refs #9101
2021-12-08 14:15:50 +01:00
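
The trigger time rules from commit c71029f2e8 can be written down as a small pure function. This is a sketch under assumptions: the `TriggerCause` enum, the `TriggerInput` struct and its field names are invented for illustration and do not reflect the actual Downtime class.

```cpp
#include <algorithm>

// Hypothetical description of what caused the downtime to trigger.
enum class TriggerCause { Fixed, CheckResult, OtherDowntime };

// Hypothetical bundle of the timestamps involved (seconds since the epoch).
struct TriggerInput {
    double startTime;          // scheduled start of the downtime
    double entryTime;          // time the downtime was created
    double checkExecutionEnd;  // execution end of the triggering check result
    double firstTriggerTime;   // trigger time of the triggering downtime
};

double TriggerTime(TriggerCause cause, const TriggerInput& in)
{
    switch (cause) {
        case TriggerCause::Fixed:
            return std::max(in.startTime, in.entryTime);
        case TriggerCause::CheckResult:
            return in.checkExecutionEnd;
        case TriggerCause::OtherDowntime:
            return in.firstTriggerTime;
    }

    return 0.0; // not reached for valid enum values
}
```

Since the result depends only on values that both nodes share, and never on the local clock, two nodes would compute the same trigger time.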
Alexander Aleksandrovič Klimov 577cf94b59
Merge pull request #8956 from Icinga/Al2Klimov-patch-3
Fix IDO MySQL schema version
2021-12-07 15:31:00 +01:00
Alexander Aleksandrovič Klimov 31c564182a
Merge pull request #8990 from Icinga/bugfix/downtime-all-services-on-child-hosts
Fix scheduling of downtimes for all services on child hosts
2021-12-07 12:48:01 +01:00
Julian Brost 596fcdc123 Downtime::DowntimesExpireTimerHandler: don't copy vector
`ConfigType::GetObjectsByType<Downtime>()` already returns a
`std::vector<Downtime::Ptr>` so there is no point in copying it into another
vector of the same type just to then iterate the copied vector instead of the
original one.
2021-12-01 13:05:23 +01:00
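
The point of commit 596fcdc123 is that a vector returned by value can be iterated directly. A minimal sketch, in which `Registry()`, `GetDowntimes()` and the reduced `Downtime` struct are hypothetical stand-ins for `ConfigType::GetObjectsByType<Downtime>()` and the real class:

```cpp
#include <cstddef>
#include <memory>
#include <vector>

struct Downtime {
    typedef std::shared_ptr<Downtime> Ptr;
    double endTime;
};

// Hypothetical global registry backing the lookup function below.
std::vector<Downtime::Ptr>& Registry()
{
    static std::vector<Downtime::Ptr> registry;
    return registry;
}

// Returns a vector by value, like the function named in the commit message.
std::vector<Downtime::Ptr> GetDowntimes()
{
    return Registry();
}

std::size_t CountExpired(double now)
{
    std::size_t expired = 0;

    // The temporary returned by GetDowntimes() lives until the loop ends,
    // so copying it into a second vector first is unnecessary.
    for (const Downtime::Ptr& downtime : GetDowntimes()) {
        if (downtime->endTime < now)
            ++expired;
    }

    return expired;
}
```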
Yonas Habteab 361807f7a9
Adjust inconsistent pki log messages (#8965) 2021-11-22 16:06:55 +01:00
Julian Brost d09925189a
Merge pull request #9037 from Icinga/Al2Klimov-patch-4
InfluxdbCommonWriter#Flush(): fix log message
2021-11-19 17:09:05 +01:00
Julian Brost 2ad0a4b8c3 Add missing include to fix non-unity builds
This commit fixes the following build error:

    [ 55%] Building CXX object lib/icinga/CMakeFiles/icinga.dir/usergroup.cpp.o
    lib/icinga/usergroup.cpp:79:24: error: incomplete type ‘icinga::Notification’ used in nested name specifier
       79 | std::set<Notification::Ptr> UserGroup::GetNotifications() const
          |                        ^~~
2021-11-17 16:11:15 +01:00
Julian Brost a740b1d66c LegacyTimePeriod::ScriptFunc: fix DST edge-cases
This change fixes two problems:
* The internal functions used by ScriptFunc more or less expect to operate on
  full days, but ScriptFunc may have called them with some random timestamp
  during the day. This is fixed by always using midnight of the day as
  reference time.
* Previously, the code advanced a timestamp to the next day by adding 24 hours.
  On days with DST changes, this could either still be on the same day (a day
  may have 25 hours) or skip an entire day (a day may have 23 hours). This is
  fixed by using a struct tm to advance the time to the next day.
2021-11-17 13:09:10 +01:00
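
Both fixes from commit a740b1d66c, using midnight as the reference time and advancing by calendar day instead of by 24 hours, can be sketched with the standard C time functions. The helper names `MidnightOf()` and `NextMidnight()` are hypothetical, and `std::localtime()` is used for brevity although it is not thread-safe.

```cpp
#include <ctime>

// Local midnight of the day that contains ts.
std::time_t MidnightOf(std::time_t ts)
{
    std::tm local = *std::localtime(&ts);
    local.tm_hour = 0;
    local.tm_min = 0;
    local.tm_sec = 0;
    local.tm_isdst = -1; // let mktime() determine whether DST applies
    return std::mktime(&local);
}

// Local midnight of the following day. Incrementing tm_mday and letting
// mktime() normalize the result handles month ends as well as days that
// are 23 or 25 hours long, which "ts + 24 * 60 * 60" does not.
std::time_t NextMidnight(std::time_t ts)
{
    std::tm local = *std::localtime(&ts);
    local.tm_mday += 1;
    local.tm_hour = 0;
    local.tm_min = 0;
    local.tm_sec = 0;
    local.tm_isdst = -1;
    return std::mktime(&local);
}
```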
Noah Hilverling 4d3b1709fd
Merge pull request #9009 from Icinga/bugfix/icingadb-runtime-updates-delete-relationships
Icinga DB: Make sure object relationships are handled correctly during runtime updates
2021-11-12 17:52:59 +01:00
Julian Brost b9e6273ba0 Icinga DB: only log queries at debug level 2021-11-12 15:41:17 +01:00
Noah Hilverling 7a0796061a IcingaDB::AddObjectDataToRuntimeUpdates(): Copy data before modifying 2021-11-12 13:34:57 +01:00
Noah Hilverling 10bde2075a Dictionary: Make sure underlying map is ordered 2021-11-12 13:34:57 +01:00
Noah Hilverling 73e0d6e61b Icinga DB: Make sure object relationships are handled correctly 2021-11-12 13:34:57 +01:00
Noah Hilverling 4e79eb080c
Merge pull request #9058 from Icinga/bugfix/icingadb-prefix-command_id
IcingaDB: Prefix command_id with command type
2021-11-11 11:50:26 +01:00
Noah Hilverling c1098bef35
Merge pull request #9061 from Icinga/add-downtime-duration-and-service-state-host-id-streams
Icinga DB: Add `downtime.duration` & `service_state.host_id` to Redis
2021-11-11 10:19:47 +01:00
Noah Hilverling a9c2304c61 IcingaDB: Prefix command_id with command type 2021-11-09 12:26:30 +01:00
Eric Lippmann 35053ac1dd Icinga DB: Sync groups earlier
Host and service groups are structural information that are used
for Web filters and should therefore be synchronized as soon as
possible.
2021-11-09 11:17:01 +01:00
Alexander A. Klimov 07c8440fd2 Icinga DB: sync checkables along with their states first
`WorkQueue#ParallelFor(x, false, y)` will enqueue x's items in FIFO order,
so x has to start with host and service.
2021-11-09 11:17:01 +01:00
Yonas Habteab fe5aa1e18d Icinga DB: Add `service_state.host_id` to Redis 2021-11-09 11:08:22 +01:00
Yonas Habteab 5dc45baebb Icinga DB: Add `downtime.duration` & `scheduled_duration` to Redis 2021-11-09 11:08:22 +01:00
Julian Brost 848f1ae167
Merge pull request #8998 from Icinga/bugfix/icingadb-program-start-milliseconds
Icinga DB: set value in milliseconds for program_start in stats/heartbeat
2021-11-08 18:18:19 +01:00
Julian Brost 524fe92a1d
Merge pull request #9028 from Icinga/bugfix/icingadb-zone-parent
IcingaDB: actually write parent to parent_id of zones
2021-11-08 18:08:48 +01:00
Julian Brost e46d83b6be Icinga DB: set value in milliseconds for program_start in stats/heartbeat 2021-11-08 14:37:08 +01:00
Noah Hilverling 0b9317a5bf IcingaDB: Remove GetObjectIdentifiersWithoutEnv()
Having the command type be a part of the command ID isn't needed anywhere. Removing this simplifies the way we generate IDs in general, because we don't need Prepend() anymore.

The command type was only needed to prevent ID collisions within the command_envvar and command_argument tables. Those tables have since been separated into {check,event,notification}command_envvar and {check,event,notification}command_argument tables.
2021-11-05 17:01:40 +01:00
Julian Brost 3c8672b4dc Icinga DB: increase Redis schema version
PR #9036 introduces some incompatible changes to the Redis schema, most
importantly where Icinga DB has to read the environment from: now it has to use
a new top-level key of the icinga:stats message instead of a value in the
IcingaApplication part of that message.
2021-11-05 14:14:37 +01:00
Julian Brost 6007848146 IcingaDB: export environment_id via API
Primarily required for Icinga DB integration tests at the moment, but could
also be helpful in other situations.
2021-11-05 14:14:37 +01:00
Julian Brost 4ade4c757b IcingaDB: write new environment to icinga:stats stream 2021-11-05 14:14:37 +01:00
Julian Brost 525dd50859 IcingaDB: introduce a new environment ID derived from the CA public key
In order to avoid changes to the environment ID, it is now no longer derived
from the Environment constant but instead from the public key of the CA
certificate. This ensures that it is different between clusters by default, so
no additional changes have to be done to allow two clusters to use Icinga DB to
write into the same database.

To prevent the ID from changing when the CA certificate is replaced, it is also
persisted into the file /var/lib/icinga2/icingadb.env, so if that file exists,
it takes precedence over the CA certificate.
2021-11-05 14:14:37 +01:00
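
The precedence rule from commit 525dd50859, where a persisted file wins over the ID derived from the CA certificate, could be sketched like this. `LoadOrPersistEnvironmentId()` is a hypothetical helper, the derivation from the CA public key is left out (the derived ID is passed in as a string), and the one-line file format is an assumption.

```cpp
#include <fstream>
#include <string>

// Returns the ID stored at path if that file exists and is non-empty.
// Otherwise stores derivedId there and returns it, so that later calls
// keep returning the same ID even if the derived value changes.
std::string LoadOrPersistEnvironmentId(const std::string& path, const std::string& derivedId)
{
    std::ifstream in(path);
    std::string persisted;

    if (in && std::getline(in, persisted) && !persisted.empty())
        return persisted;

    std::ofstream out(path, std::ios::trunc);
    out << derivedId << '\n';

    return derivedId;
}
```

A production version would also have to handle write errors and write the file atomically, both of which this sketch ignores.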
Julian Brost 6cd3a483a0 tlsutility: move hex encoding into a separate function BinaryToHex 2021-11-05 14:14:37 +01:00
Julian Brost f976e351f4
Merge pull request #9044 from Icinga/bugfix/idb-dump-buf-lost
Icinga DB init. dump: flush both buffered states and state checksums
2021-11-04 12:26:28 +01:00
Alexander A. Klimov 0ff7d0a06e Icinga DB: raise icinga:schema 1 -> 2 2021-11-02 15:00:55 +01:00
Alexander A. Klimov b1714a10c2 Icinga DB: make icinga:history:stream:*#event_id deterministic
... i.e. UUID -> SHA1(env, eventType, x...) given that SHA1(env, x...) = type-specific ID.
Rationale: allow both masters to write the same history concurrently (while not
in split-brain), so that REPLACE INTO deduplicates the same events written twice.

* ack: SHA1(env, "ack_set"|"ack_clear", checkable.name, setTime)
* comment: SHA1(env, "comment_add"|"comment_remove", comment.name)
* downtime: SHA1(env, "downtime_start"|"downtime_end", downtime.name)
* flapping: SHA1(env, "flapping_start"|"flapping_end", checkable.name, startTime)
* notification: SHA1(env, "notification", notification.name, notificationType, sendTime)
* state: SHA1(env, "state_change", checkable.name, changeTime)
2021-11-02 15:00:03 +01:00
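
The idea behind commit b1714a10c2 is that an ID computed only from shared inputs is identical on both masters, whereas a random UUID is not. The sketch below illustrates that property but deliberately swaps SHA1 for 64-bit FNV-1a to stay short and dependency-free; the length-prefixed packing of the fields and the `EventId()` helper are likewise assumptions, not the encoding Icinga DB uses.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>
#include <vector>

// 64-bit FNV-1a, used here only as a stand-in for SHA1.
std::uint64_t Fnv1a(const std::string& data)
{
    std::uint64_t hash = 14695981039346656037ULL; // FNV offset basis
    for (unsigned char c : data) {
        hash ^= c;
        hash *= 1099511628211ULL; // FNV prime
    }
    return hash;
}

// Hypothetical helper: hashes env, eventType and the type-specific parts.
// Each field is length-prefixed so that ("ab", "c") and ("a", "bc") differ.
std::string EventId(const std::string& env, const std::string& eventType,
    const std::vector<std::string>& parts)
{
    std::string packed;
    auto append = [&packed](const std::string& field) {
        packed += std::to_string(field.size());
        packed += ':';
        packed += field;
    };

    append(env);
    append(eventType);
    for (const std::string& part : parts)
        append(part);

    char hex[17];
    std::snprintf(hex, sizeof(hex), "%016llx",
        static_cast<unsigned long long>(Fnv1a(packed)));
    return hex;
}
```

Two nodes calling `EventId("env", "downtime_start", {"host!downtime"})` get the same string, so writing the event twice with REPLACE INTO results in a single row.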
Alexander A. Klimov 5c44365c4e Icinga DB: make icinga:history:stream:notification#id deterministic
... i.e. UUID -> SHA1(x..., send time) given that SHA1(x...) = notification id.
Rationale: allow both masters to write the same notification history concurrently (while
not in split-brain), so that REPLACE INTO deduplicates the same events written twice.
2021-11-02 15:00:03 +01:00
Alexander A. Klimov c2422c56fe Icinga DB: make icinga:history:stream:state#id deterministic
... i.e. UUID -> SHA1(x..., check time) given that SHA1(x...) = checkable id.
Rationale: allow both masters to write the same state history concurrently (while
not in split-brain), so that REPLACE INTO deduplicates the same events written twice.
2021-11-02 15:00:03 +01:00
Alexander Aleksandrovič Klimov f5f8ccb1f4
Merge pull request #9020 from Icinga/feature/icingaeb-schema-version
Icinga DB: publish Redis schema version via XADD icinga:schema
2021-10-25 13:21:37 +02:00
Alexander A. Klimov d8b4768471 Icinga DB init. dump: flush both buffered states and state checksums
not to dump x states, but only x - (x % bulk) state checksums.
2021-10-21 13:49:24 +02:00
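
The bug fixed by commit d8b4768471 is the classic missing final flush of a bulk buffer. A minimal sketch, assuming two parallel buffers that are flushed whenever `bulk` items have accumulated; `DumpStates()` and `DumpResult` are hypothetical and only count what would be written.

```cpp
#include <cstddef>
#include <vector>

struct DumpResult {
    std::size_t states = 0;
    std::size_t checksums = 0;
};

DumpResult DumpStates(std::size_t count, std::size_t bulk)
{
    DumpResult written;
    std::vector<std::size_t> states, checksums;

    for (std::size_t i = 0; i < count; ++i) {
        states.push_back(i);
        checksums.push_back(i);

        if (states.size() >= bulk) {
            written.states += states.size();
            states.clear();
            written.checksums += checksums.size();
            checksums.clear();
        }
    }

    // Final flush of BOTH buffers. Skipping it for the checksums would
    // write only count - (count % bulk) of them.
    written.states += states.size();
    written.checksums += checksums.size();

    return written;
}
```

For example, with 10 states and a bulk size of 4, the loop flushes 8 items and the final flush writes the remaining 2 for each buffer.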