IcingaDB Check: Multiple Responsible Instances

By design, only one Icinga 2 instance should be responsible in the HA
context. If this promise is broken, the Icinga 2 IcingaDB check should
report it.

Previously, the check did not validate the data in
icingadb:telemetry:heartbeat. With this change, it goes CRITICAL with a
descriptive message when both this instance and another instance are
responsible, and it reports the actual number of responsible instances
as icingadb_responsible_instances in the performance data.
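
The decision logic described above can be sketched in isolation. This is a minimal illustration, not the Icinga 2 implementation: the `ResponsibilityResult` struct and the `EvaluateResponsibility` function are hypothetical names introduced here, and only the two boolean inputs and the message texts are taken from the change itself.

```cpp
#include <string>

// Hypothetical stand-in for the check outcome; not an Icinga 2 type.
struct ResponsibilityResult {
	bool critical;             // true if the state should be CRITICAL
	int responsibleInstances;  // value reported as icingadb_responsible_instances
	std::string message;       // human-readable explanation
};

// Hypothetical helper mirroring the branch order of the change:
// two responsible instances and zero responsible instances are both errors.
ResponsibilityResult EvaluateResponsibility(bool weResponsible, bool otherResponsible)
{
	ResponsibilityResult result;

	// Count each side separately so that the split-brain case yields 2, not 1.
	result.responsibleInstances = int(weResponsible) + int(otherResponsible);

	if (weResponsible && otherResponsible) {
		result.critical = true;
		result.message = "Both this instance and another instance are responsible!";
	} else if (weResponsible) {
		result.critical = false;
		result.message = "Responsible";
	} else if (otherResponsible) {
		result.critical = false;
		result.message = "Not responsible, but another instance is";
	} else {
		result.critical = true;
		result.message = "No instance is responsible!";
	}

	return result;
}
```

Summing the two flags rather than OR-ing them is what makes the double-responsibility case visible in the performance data: the OR form can only ever report 0 or 1.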

Alvar Penning 2024-11-15 12:56:45 +01:00
parent 211bae87b5
commit 0bbe7a9b2f

@@ -227,7 +227,9 @@ void IcingadbCheckTask::ScriptFunc(const Checkable::Ptr& checkable, const CheckR
 		perfdata->Add(new PerfdataValue("icinga2_heartbeat_age", heartbeatLag, false, "seconds", heartbeatLagWarning, Empty, 0));
 	}
 
-	if (weResponsible) {
+	if (weResponsible && otherResponsible) {
+		critmsgs << " Both this instance and another instance are responsible!";
+	} else if (weResponsible) {
 		idbokmsgs << "\n* Responsible";
 	} else if (otherResponsible) {
 		idbokmsgs << "\n* Not responsible, but another instance is";
@@ -235,7 +237,7 @@ void IcingadbCheckTask::ScriptFunc(const Checkable::Ptr& checkable, const CheckR
 		critmsgs << " No instance is responsible!";
 	}
 
-	perfdata->Add(new PerfdataValue("icingadb_responsible_instances", int(weResponsible || otherResponsible), false, "", Empty, Empty, 0, 1));
+	perfdata->Add(new PerfdataValue("icingadb_responsible_instances", int(weResponsible) + int(otherResponsible), false, "", Empty, Empty, 0, 1));
 
 	const auto clockDriftWarning (5);
 	const auto clockDriftCritical (30);