Checkable: update next_check ts in ExecuteCheck only if it's needed

Since the scheduler accounts for already running checks, we only need to update the `next_check` timestamp in `Checkable::ExecuteCheck()` only where it actually makes sense to do so, and as for local checks this doesn't make sense at all. There only two cases where we need to update the next check beforehand: 1) The execute command event is sent to a connected remote endpoint, so we need to set the next check to a time in the future until we actually receive the check result back from the remote endpoint. However, it must not be too far in the future to avoid that the check is not re-run for too long in case the remote endpoint never responds. 2) The check is a remote check, but either the endpoint is currently syncing replay logs or not connected at all, and we are within the magical 5min cold startup window. In these cases, the check is effectively skipped, and there will be no check result for it coming in, we manually update the next check normally as if the check was executed. In the other cases, either the check is executed locally, which means the `m_RunningCheck` flag already prevents the scheduler from re-running the check, or this is a remote check and the endpoint is not connected, but we are outside the cold startup window, in which case we also don't do anything as we've already called `Checkable::ProcessCheckResult()` with an appropriate error state, which in turn will call `Checkable::UpdateNextCheck()`.
2025-09-26 19:18:48 +02:00 · 2025-09-24 09:37:48 +02:00 · 2025-09-24 09:37:48 +02:00 · f049f5258a
commit f049f5258a
parent 5fe6b68b89
1 changed files with 16 additions and 10 deletions
--- a/lib/icinga/checkable-check.cpp
+++ b/lib/icinga/checkable-check.cpp
@ -576,12 +576,6 @@ void Checkable::ExecuteCheck(const WaitGroup::Ptr& producer)

 	SetLastCheckStarted(Utility::GetTime());

-	/* This calls SetNextCheck() which updates the CheckerComponent's idle/pending
-	 * queues and ensures that checks are not fired multiple times. ProcessCheckResult()
-	 * is called too late. See #6421.
-	 */
-	UpdateNextCheck();
-
 	bool reachable = IsReachable();

 	{
@ -643,11 +637,16 @@ void Checkable::ExecuteCheck(const WaitGroup::Ptr& producer)
 			if (listener)
 				listener->SyncSendMessage(endpoint, message);

-			/* Re-schedule the check so we don't run it again until after we've received
-			 * a check result from the remote instance. The check will be re-scheduled
-			 * using the proper check interval once we've received a check result.
+			/*
+			 * Let the checker use a dummy next check time until we actually receive the check result from the
+			 * remote endpoint. This should be sufficiently far in the future to avoid excessive CPU load by
+			 * constantly re-running the check, but not too far in the future to avoid that the check is not
+			 * re-run for too long in case the remote endpoint never responds. We add a small grace period
+			 * to the check command timeout to account for network latency and processing time on the remote
+			 * endpoint. So, we only need to silently update this without notifying any listeners, and once
+			 * this function returns, the checker is going access it via GetNextCheck() again.
 			 */
-			SetNextCheck(Utility::GetTime() + checkTimeout + 30);
+			SetNextCheck(Utility::GetTime() + checkTimeout + 30, true);

 		/*
 		 * Let the user know that there was a problem with the check if
@ -670,6 +669,13 @@ void Checkable::ExecuteCheck(const WaitGroup::Ptr& producer)
 			cr->SetOutput(output);

 			ProcessCheckResult(cr, producer);
+		} else {
+			/**
+			 * The endpoint is currently either syncing its state or not connected yet and we are within
+			 * the magical 5min cold startup window. In both cases, we just don't do anything and wait for
+			 * the next check interval to re-try the check again. So, this check is effectively skipped.
+			 */
+			UpdateNextCheck();
 		}

 		/**