Merge pull request #7150 from Icinga/bugfix/api-config-package-active-stage-name

Ensure that runtime created API objects survive a restart
Michael Friedrich 2019-04-30 14:22:13 +02:00 committed by GitHub
commit 759b090f81
8 changed files with 295 additions and 9 deletions

@ -747,7 +747,7 @@ $ curl -k -s -u root:icinga -H 'Accept: application/json' -X DELETE 'https://loc
}
```
## REST API Troubleshooting: No Objects Found <a id="troubleshooting-api-no-objects-found"></a>
### REST API Troubleshooting: No Objects Found <a id="troubleshooting-api-no-objects-found"></a>
Please note that the `404` status with no objects being found can also originate
from missing or too strict object permissions for the authenticated user.
@ -761,6 +761,93 @@ In order to analyse and fix the problem, please check the following:
- use an administrative account with full permissions to check whether the objects are actually there.
- verify the permissions on the affected ApiUser object and fix them.
### Missing Runtime Objects (Hosts, Downtimes, etc.) <a id="troubleshooting-api-missing-runtime-objects"></a>
Runtime objects consume the internal config packages shared with
the REST API config packages. Each host, downtime, comment, service, etc. created
via the REST API is stored in the `_api` package.
This includes downtimes and comments, which were sometimes stored in the wrong
directory path because the active-stage file was empty, truncated or unreadable
at the time they were created.
Wrong:
```
/var/lib/icinga2/api/packages/_api//conf.d/downtimes/1234-5678-9012-3456.conf
```
Correct:
```
/var/lib/icinga2/api/packages/_api/abcd-ef12-3456-7890/conf.d/downtimes/1234-5678-9012-3456.conf
```
At creation time, the object lives in memory but its storage on disk is broken. After a
restart, the object is gone; a missing downtime, for example, re-enables unwanted notifications.
`abcd-ef12-3456-7890` is the active stage name which wasn't correctly
read by the Icinga daemon. This information is stored in `/var/lib/icinga2/api/packages/_api/active-stage`.
2.11 now limits the direct active-stage file access (this is hidden from the user),
and caches active stages for packages in-memory.
Bonus on startup/config validation: Icinga now logs a critical message when a deployed
config package is broken.
```
icinga2 daemon -C
[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.
```
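The condition behind this message can also be checked by hand. The following is a minimal sketch and not part of Icinga itself: the helper name `find_broken_packages` is hypothetical, and it assumes that every subdirectory of the package prefix is a config package and that a missing or empty `active-stage` file is what makes a package broken. It only reads from the file system.

```shell
#!/bin/sh
# Hypothetical helper: list config packages whose active-stage file is
# missing or empty. The default prefix is the path used in this document.
find_broken_packages() {
    prefix="${1:-/var/lib/icinga2/api/packages}"
    for pkg in "$prefix"/*/; do
        # Skip the literal pattern when the glob matched nothing.
        [ -d "$pkg" ] || continue
        # -s is true only when the file exists and is not empty.
        if [ ! -s "${pkg}active-stage" ]; then
            basename "$pkg"
        fi
    done
}
```

Called without an argument, it inspects `/var/lib/icinga2/api/packages` and prints one package name per line. Run it as a user who is allowed to read that directory.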
In order to fix the broken config package and mark a deployed stage as active
again, create a backup first and then carefully follow these steps:
Navigate into the API package prefix.
```
cd /var/lib/icinga2/api/packages
```
Change into the broken package directory and list all directories and files
ordered by latest changes.
```
cd _api
ls -lahtr
drwx------ 4 michi wheel 128B Mar 27 14:39 ..
-rw-r--r-- 1 michi wheel 25B Mar 27 14:39 include.conf
-rw-r--r-- 1 michi wheel 405B Mar 27 14:39 active.conf
drwx------ 7 michi wheel 224B Mar 27 15:01 abcd-ef12-3456-7890
drwx------ 5 michi wheel 160B Apr 26 12:47 .
```
As you can see, the `active-stage` file is missing. If it exists, verify that its content
matches the name of the stage directory.
If you have more than one stage directory here, pick the latest modified
directory. Copy the directory name `abcd-ef12-3456-7890` and
add it into a new file `active-stage`. This can be done like this:
```
echo "abcd-ef12-3456-7890" > active-stage
```
Re-run config validation.
```
icinga2 daemon -C
```
The validation should not show an error.
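The manual steps above can be collected into one function. This is a sketch under stated assumptions, not an official tool: the name `repair_active_stage` and the backup location are hypothetical, the backup target must not exist yet, and stage directory names are assumed to contain no newlines. It leaves a non-empty `active-stage` file untouched.

```shell
#!/bin/sh
# Hypothetical helper: back up a package directory, then point
# active-stage at the most recently modified stage directory.
repair_active_stage() {
    pkg_dir="$1"
    backup_dir="$2"

    # Keep a copy of the package before touching anything.
    cp -a "$pkg_dir" "$backup_dir" || return 1

    # Leave a non-empty active-stage file alone.
    if [ -s "$pkg_dir/active-stage" ]; then
        return 0
    fi

    # ls -td sorts newest first; the trailing slash in the glob
    # restricts the matches to directories.
    latest=$(cd "$pkg_dir" && ls -td -- */ 2>/dev/null | head -n 1)
    latest="${latest%/}"
    [ -n "$latest" ] || return 1

    # Write the stage name, with a trailing newline as echo would.
    printf '%s\n' "$latest" > "$pkg_dir/active-stage"
}
```

A call could look like `repair_active_stage /var/lib/icinga2/api/packages/_api /root/_api.backup`, where the second path is only an example. Afterwards, check that the owner and permissions of the new file match the other files in the package, and re-run `icinga2 daemon -C`.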
> **Note**
>
> The internal `_api` config package structure may change in the future. Do not modify
> things in there manually or with scripts unless guided here or asked by a developer.
## Certificate Troubleshooting <a id="troubleshooting-certificate"></a>

@ -109,6 +109,28 @@ The deprecated `concurrent_checks` attribute in the [checker feature](09-object-
has no effect anymore if set. Please use the [MaxConcurrentChecks](17-language-reference.md#icinga-constants-global-config)
constant in [constants.conf](04-configuring-icinga-2.md#constants-conf) instead.
### REST API <a id="upgrading-to-2-11-api"></a>
#### Config Packages <a id="upgrading-to-2-11-api-config-packages"></a>
Deployed configuration packages require an active stage; many previous stages
may exist alongside it. This mechanism is used by the Icinga Director as an external
consumer, and by Icinga itself for storing runtime created objects inside the `_api`
package.
This includes downtimes and comments, which were sometimes stored in the wrong
directory path because the active-stage file was empty, truncated or unreadable
at the time they were created.
2.11 makes this mechanism more stable and detects broken config packages.
```
[2019-04-26 12:58:14 +0200] critical/ApiListener: Cannot detect active stage for package '_api'. Broken config package, check the troubleshooting documentation.
```
In order to fix this, please follow [this troubleshooting entry](15-troubleshooting.md#troubleshooting-api-missing-runtime-objects).
## Upgrading to v2.10 <a id="upgrading-to-2-10"></a>
### Path Constant Changes <a id="upgrading-to-2-10-path-constant-changes"></a>

@ -289,7 +289,13 @@ int DaemonCommand::Run(const po::variables_map& vm, const std::vector<std::strin
}
/* Remove ignored Downtime/Comment objects. */
ConfigItem::RemoveIgnoredItems(ConfigObjectUtility::GetConfigDir());
try {
String configDir = ConfigObjectUtility::GetConfigDir();
ConfigItem::RemoveIgnoredItems(configDir);
} catch (const std::exception& ex) {
Log(LogNotice, "cli")
<< "Cannot clean ignored downtimes/comments: " << ex.what();
}
#ifndef _WIN32
struct sigaction sa;

@ -282,7 +282,15 @@ void ApiListener::UpdateConfigObject(const ConfigObject::Ptr& object, const Mess
params->Set("version", object->GetVersion());
if (object->GetPackage() == "_api") {
String file = ConfigObjectUtility::GetObjectConfigPath(object->GetReflectionType(), object->GetName());
String file;
try {
file = ConfigObjectUtility::GetObjectConfigPath(object->GetReflectionType(), object->GetName());
} catch (const std::exception& ex) {
Log(LogNotice, "ApiListener")
<< "Cannot sync object '" << object->GetName() << "': " << ex.what();
return;
}
std::ifstream fp(file.CStr(), std::ifstream::binary);
if (!fp)

@ -6,6 +6,7 @@
#include "remote/endpoint.hpp"
#include "remote/jsonrpc.hpp"
#include "remote/apifunction.hpp"
#include "remote/configpackageutility.hpp"
#include "base/convert.hpp"
#include "base/defer.hpp"
#include "base/io-engine.hpp"
@ -134,6 +135,9 @@ void ApiListener::OnConfigLoaded()
Log(LogWarning, "ApiListener", "Please read the upgrading documentation for v2.8: https://icinga.com/docs/icinga2/latest/doc/16-upgrading-icinga-2/");
}
/* Cache API packages and their active stage name. */
UpdateActivePackageStagesCache();
/* set up SSL context */
std::shared_ptr<X509> cert;
try {
@ -267,6 +271,11 @@ void ApiListener::Start(bool runtimeCreated)
m_CleanupCertificateRequestsTimer->Start();
m_CleanupCertificateRequestsTimer->Reschedule(0);
m_ApiPackageIntegrityTimer = new Timer();
m_ApiPackageIntegrityTimer->OnTimerExpired.connect(std::bind(&ApiListener::CheckApiPackageIntegrity, this));
m_ApiPackageIntegrityTimer->SetInterval(300);
m_ApiPackageIntegrityTimer->Start();
OnMasterChanged(true);
}
@ -1537,6 +1546,83 @@ Endpoint::Ptr ApiListener::GetLocalEndpoint() const
return m_LocalEndpoint;
}
void ApiListener::UpdateActivePackageStagesCache()
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);
for (auto package : ConfigPackageUtility::GetPackages()) {
String activeStage;
try {
activeStage = ConfigPackageUtility::GetActiveStageFromFile(package);
} catch (const std::exception& ex) {
Log(LogCritical, "ApiListener")
<< ex.what();
continue;
}
Log(LogNotice, "ApiListener")
<< "Updating cache: Config package '" << package << "' has active stage '" << activeStage << "'.";
m_ActivePackageStages[package] = activeStage;
}
}
void ApiListener::CheckApiPackageIntegrity()
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);
for (auto package : ConfigPackageUtility::GetPackages()) {
String activeStage;
try {
activeStage = ConfigPackageUtility::GetActiveStageFromFile(package);
} catch (const std::exception& ex) {
/* An error means that the stage is broken, try to repair it. */
auto it = m_ActivePackageStages.find(package);
if (it == m_ActivePackageStages.end())
continue;
String activeStageCached = it->second;
Log(LogInformation, "ApiListener")
<< "Repairing broken API config package '" << package
<< "', setting active stage '" << activeStageCached << "'.";
ConfigPackageUtility::SetActiveStageToFile(package, activeStageCached);
}
}
}
void ApiListener::SetActivePackageStage(const String& package, const String& stage)
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);
m_ActivePackageStages[package] = stage;
}
String ApiListener::GetActivePackageStage(const String& package)
{
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);
if (m_ActivePackageStages.find(package) == m_ActivePackageStages.end())
BOOST_THROW_EXCEPTION(ScriptError("Package " + package + " has no active stage."));
return m_ActivePackageStages[package];
}
void ApiListener::RemoveActivePackageStage(const String& package)
{
/* This is the rare occasion when a package has been deleted. */
boost::mutex::scoped_lock lock(m_ActivePackageStagesLock);
auto it = m_ActivePackageStages.find(package);
if (it == m_ActivePackageStages.end())
return;
m_ActivePackageStages.erase(it);
}
void ApiListener::ValidateTlsProtocolmin(const Lazy<String>& lvalue, const ValidationUtils& utils)
{
ObjectImpl<ApiListener>::ValidateTlsProtocolmin(lvalue, utils);

@ -84,6 +84,11 @@ public:
static Value ConfigUpdateObjectAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params);
static Value ConfigDeleteObjectAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params);
/* API config packages */
void SetActivePackageStage(const String& package, const String& stage);
String GetActivePackageStage(const String& package);
void RemoveActivePackageStage(const String& package);
static Value HelloAPIHandler(const MessageOrigin::Ptr& origin, const Dictionary::Ptr& params);
static void UpdateObjectAuthority();
@ -119,6 +124,8 @@ private:
Timer::Ptr m_ReconnectTimer;
Timer::Ptr m_AuthorityTimer;
Timer::Ptr m_CleanupCertificateRequestsTimer;
Timer::Ptr m_ApiPackageIntegrityTimer;
Endpoint::Ptr m_LocalEndpoint;
static ApiListener::Ptr m_Instance;
@ -126,6 +133,7 @@ private:
void ApiTimerHandler();
void ApiReconnectTimerHandler();
void CleanupCertificateRequestsTimerHandler();
void CheckApiPackageIntegrity();
bool AddListener(const String& node, const String& service);
void AddConnection(const Endpoint::Ptr& endpoint);
@ -175,6 +183,12 @@ private:
void SendRuntimeConfigObjects(const JsonRpcConnection::Ptr& aclient);
void SyncClient(const JsonRpcConnection::Ptr& aclient, const Endpoint::Ptr& endpoint, bool needSync);
/* API Config Packages */
mutable boost::mutex m_ActivePackageStagesLock;
std::map<String, String> m_ActivePackageStages;
void UpdateActivePackageStagesCache();
};
}

@ -1,6 +1,7 @@
/* Icinga 2 | (c) 2012 Icinga GmbH | GPLv2+ */
#include "remote/configpackageutility.hpp"
#include "remote/apilistener.hpp"
#include "base/application.hpp"
#include "base/exception.hpp"
#include "base/utility.hpp"
@ -34,6 +35,14 @@ void ConfigPackageUtility::DeletePackage(const String& name)
if (!Utility::PathExists(path))
BOOST_THROW_EXCEPTION(std::invalid_argument("Package does not exist."));
ApiListener::Ptr listener = ApiListener::GetInstance();
/* config packages without API make no sense. */
if (!listener)
BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured."));
listener->RemoveActivePackageStage(name);
Utility::RemoveDirRecursive(path);
Application::RequestRestart();
}
@ -157,10 +166,7 @@ void ConfigPackageUtility::WriteStageConfig(const String& packageName, const Str
void ConfigPackageUtility::ActivateStage(const String& packageName, const String& stageName)
{
String activeStagePath = GetPackageDir() + "/" + packageName + "/active-stage";
std::ofstream fpActiveStage(activeStagePath.CStr(), std::ofstream::out | std::ostream::binary | std::ostream::trunc);
fpActiveStage << stageName;
fpActiveStage.close();
SetActiveStage(packageName, stageName);
WritePackageConfig(packageName);
}
@ -242,8 +248,11 @@ std::vector<String> ConfigPackageUtility::GetStages(const String& packageName)
return stages;
}
String ConfigPackageUtility::GetActiveStage(const String& packageName)
String ConfigPackageUtility::GetActiveStageFromFile(const String& packageName)
{
/* Lock the transaction, reading this only happens on startup or when something really is broken. */
boost::mutex::scoped_lock lock(GetStaticMutex());
String path = GetPackageDir() + "/" + packageName + "/active-stage";
std::ifstream fp;
@ -255,11 +264,62 @@ String ConfigPackageUtility::GetActiveStage(const String& packageName)
fp.close();
if (fp.fail())
return "";
BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot detect active stage for package '" + packageName + "'. Broken config package, check the troubleshooting documentation."));
return stage.Trim();
}
void ConfigPackageUtility::SetActiveStageToFile(const String& packageName, const String& stageName)
{
boost::mutex::scoped_lock lock(GetStaticMutex());
String activeStagePath = GetPackageDir() + "/" + packageName + "/active-stage";
std::ofstream fpActiveStage(activeStagePath.CStr(), std::ofstream::out | std::ostream::binary | std::ostream::trunc); //TODO: fstream exceptions
fpActiveStage << stageName;
fpActiveStage.close();
}
String ConfigPackageUtility::GetActiveStage(const String& packageName)
{
ApiListener::Ptr listener = ApiListener::GetInstance();
/* config packages without API make no sense. */
if (!listener)
BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured."));
String activeStage;
/* First use runtime state. */
try {
activeStage = listener->GetActivePackageStage(packageName);
} catch (const std::exception& ex) {
/* Fallback to reading the file, happens on restarts. */
activeStage = GetActiveStageFromFile(packageName);
/* When we've read something, correct memory. */
if (!activeStage.IsEmpty())
listener->SetActivePackageStage(packageName, activeStage);
else
BOOST_THROW_EXCEPTION(std::invalid_argument("Cannot detect active stage for package '" + packageName + "'. Broken config package, check the troubleshooting documentation."));
}
return activeStage;
}
void ConfigPackageUtility::SetActiveStage(const String& packageName, const String& stageName)
{
ApiListener::Ptr listener = ApiListener::GetInstance();
/* config packages without API make no sense. */
if (!listener)
BOOST_THROW_EXCEPTION(std::invalid_argument("No ApiListener instance configured."));
listener->SetActivePackageStage(packageName, stageName);
/* Also update the marker on disk for restarts. */
SetActiveStageToFile(packageName, stageName);
}
std::vector<std::pair<String, bool> > ConfigPackageUtility::GetFiles(const String& packageName, const String& stageName)
{

@ -32,7 +32,10 @@ public:
static String CreateStage(const String& packageName, const Dictionary::Ptr& files = nullptr);
static void DeleteStage(const String& packageName, const String& stageName);
static std::vector<String> GetStages(const String& packageName);
static String GetActiveStageFromFile(const String& packageName);
static String GetActiveStage(const String& packageName);
static void SetActiveStage(const String& packageName, const String& stageName);
static void SetActiveStageToFile(const String& packageName, const String& stageName);
static void ActivateStage(const String& packageName, const String& stageName);
static void AsyncTryActivateStage(const String& packageName, const String& stageName, bool reload);