Handle concurrent config package updates gracefully

Previously, we used a simple boolean to track whether a package update was in
progress and didn't reset it after a successful config validation. The assumption
was that if the config validated successfully beforehand, the new worker would also
reload it successfully afterwards and the old worker would be terminated. However,
this assumption doesn't always hold. The most obvious counterexample is that the
config might change again after we validated it but before the new worker was able
to reload it. In that case, the new worker might fail to validate the config due to
the recent changes, so the old worker remains active with the flag still set to true,
and all subsequent requests fail with a `423` until you manually restart the
Icinga 2 service.

To prevent such a situation, we now additionally track the last time a reload failed
and allow a request to bypass the `m_RunningPackageUpdates` flag only if that timestamp
has changed since the previous request.
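
The check described above can be sketched in isolation. This is a minimal, self-contained illustration rather than the actual handler code: `PackageUpdateGate`, `TryAcquire`, `Release` and `DemoGate` are hypothetical names, and the `lastReloadFailed` argument stands in for the value returned by `Application::GetLastReloadFailed()`.

```cpp
#include <cassert>
#include <mutex>

// Hypothetical stand-in for the handler's file-local state (illustrative names only).
class PackageUpdateGate
{
public:
	// Returns true if the caller may start a package update.
	// lastReloadFailed mimics the timestamp of the most recent failed reload.
	bool TryAcquire(double lastReloadFailed)
	{
		std::lock_guard<std::mutex> lock(m_Mutex);

		// Reject only if an update is marked as running AND no reload
		// has failed since that update was admitted.
		if (m_Running && m_LastReloadFailed == lastReloadFailed)
			return false;

		m_Running = true;
		m_LastReloadFailed = lastReloadFailed;
		return true;
	}

	// Clears the flag, e.g. when the update finished without triggering a reload.
	void Release()
	{
		std::lock_guard<std::mutex> lock(m_Mutex);
		m_Running = false;
	}

private:
	std::mutex m_Mutex; // Protects the two members below.
	bool m_Running = false;
	double m_LastReloadFailed = 0;
};

// Walks through the stuck-flag scenario from the commit message.
void DemoGate()
{
	PackageUpdateGate gate;

	assert(gate.TryAcquire(0));    // First request is admitted.
	assert(!gate.TryAcquire(0));   // Second one would get a 423: update still running.

	// The reload fails, so the failure timestamp changes (arbitrary example value).
	assert(gate.TryAcquire(100));  // The stale flag is bypassed.
	assert(!gate.TryAcquire(100)); // ...but only once per failed reload.
}
```

Comparing the timestamps for equality is enough here because the stored value is only ever copied from the same source, never computed.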
Yonas Habteab 2025-06-18 11:58:18 +02:00
parent 827f85c327
commit 04fe22788c
2 changed files with 32 additions and 8 deletions


@@ -12,7 +12,10 @@ using namespace icinga;
 REGISTER_URLHANDLER("/v1/config/stages", ConfigStagesHandler);
-std::atomic<bool> ConfigStagesHandler::m_RunningPackageUpdates (false);
+static bool l_RunningPackageUpdates(false);
+// A timestamp that indicates the last time an Icinga 2 reload failed.
+static double l_LastReloadFailedTime(0);
+static std::mutex l_RunningPackageUpdatesMutex; // Protects the above two variables.
 bool ConfigStagesHandler::HandleRequest(
 	const WaitGroup::Ptr&,
@@ -132,12 +135,36 @@ void ConfigStagesHandler::HandlePost(
 	if (reload && !activate)
 		BOOST_THROW_EXCEPTION(std::invalid_argument("Parameter 'reload' must be false when 'activate' is false."));
-	if (m_RunningPackageUpdates.exchange(true)) {
-		return HttpUtility::SendJsonError(response, params, 423,
-			"Conflicting request, there is already an ongoing package update in progress. Please try it again later.");
+	{
+		std::lock_guard runningPackageUpdatesLock(l_RunningPackageUpdatesMutex);
+		double currentReloadFailedTime = Application::GetLastReloadFailed();
+		/**
+		 * Once the m_RunningPackageUpdates flag is set, it typically remains set until the current worker process is
+		 * terminated, in which case the new worker will have its own m_RunningPackageUpdates flag set to false.
+		 * However, if the reload fails for any reason, the m_RunningPackageUpdates flag will remain set to true
+		 * in the current worker process, which will prevent any further package updates from being processed until
+		 * the next Icinga 2 restart.
+		 *
+		 * So, in order to prevent such a situation, we are additionally tracking the last time a reload failed
+		 * and allow to bypass the m_RunningPackageUpdates flag only if the last reload failed time was changed
+		 * since the previous request.
+		 */
+		if (l_RunningPackageUpdates && l_LastReloadFailedTime == currentReloadFailedTime) {
+			return HttpUtility::SendJsonError(
+				response, params, 423,
+				"Conflicting request, there is already an ongoing package update in progress. Please try it again later."
+			);
 		}
-	auto resetPackageUpdates (Shared<Defer>::Make([]() { ConfigStagesHandler::m_RunningPackageUpdates.store(false); }));
+		l_RunningPackageUpdates = true;
+		l_LastReloadFailedTime = currentReloadFailedTime;
+	}
+	auto resetPackageUpdates (Shared<Defer>::Make([]() {
+		std::lock_guard lock(l_RunningPackageUpdatesMutex);
+		l_RunningPackageUpdates = false;
+	}));
 	std::unique_lock<std::mutex> lock(ConfigPackageUtility::GetStaticPackageMutex());
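
The `resetPackageUpdates` guard in the hunk above clears the flag through a deferred callback once the last reference to the guard goes away. The sketch below illustrates that pattern with standard library types only. `ScopeExit`, `RunUpdate`, `IsRunning` and `DemoReset` are hypothetical names, and the real `Defer` and `Shared` helpers in Icinga 2 may behave differently. That the guard is held in a shared pointer so it can outlive the request handler is an assumption, not something this diff shows.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <stdexcept>
#include <utility>

// Hypothetical minimal scope guard: runs the callback when destroyed.
class ScopeExit
{
public:
	explicit ScopeExit(std::function<void()> fn) : m_Fn(std::move(fn)) {}
	ScopeExit(const ScopeExit&) = delete;
	ScopeExit& operator=(const ScopeExit&) = delete;
	~ScopeExit() { if (m_Fn) m_Fn(); }

private:
	std::function<void()> m_Fn;
};

static std::mutex g_Mutex;     // Protects g_Running.
static bool g_Running = false; // Illustrative counterpart of the "update running" flag.

bool IsRunning()
{
	std::lock_guard<std::mutex> lock(g_Mutex);
	return g_Running;
}

// Marks an update as running and guarantees the flag is cleared again,
// no matter whether the function returns normally or throws.
void RunUpdate(bool fail)
{
	{
		std::lock_guard<std::mutex> lock(g_Mutex);
		g_Running = true;
	}

	// The callback fires when the last shared_ptr copy is destroyed.
	auto reset = std::make_shared<ScopeExit>([]() {
		std::lock_guard<std::mutex> lock(g_Mutex);
		g_Running = false;
	});

	if (fail)
		throw std::runtime_error("simulated failure while handling the update");
}

void DemoReset()
{
	RunUpdate(false);
	assert(!IsRunning()); // Cleared on normal return.

	try {
		RunUpdate(true);
	} catch (const std::runtime_error&) {
		// Expected in this demo.
	}
	assert(!IsRunning()); // Cleared during stack unwinding as well.
}
```

Taking the mutex inside the callback mirrors the diff: the flag is now a plain `bool`, so every access has to happen under the lock.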


@@ -4,7 +4,6 @@
 #define CONFIGSTAGESHANDLER_H
 #include "remote/httphandler.hpp"
-#include <atomic>
 namespace icinga
 {
@@ -48,8 +47,6 @@ private:
 		boost::beast::http::response<boost::beast::http::string_body>& response,
 		const Dictionary::Ptr& params
 	);
-	static std::atomic<bool> m_RunningPackageUpdates;
 };
 }