mirror of https://github.com/Icinga/icinga2.git
Merge pull request #5693 from Icinga/fix/flapping-old-4982
Re-implement flapping fixes #4982
commit 9ba5b4f4b7
@ -414,19 +414,42 @@ Example output in Icinga Web 2:
|
|||
|
||||
Icinga 2 supports optional detection of hosts and services that are "flapping".
|
||||
|
||||
Flapping occurs when a service or host changes state too frequently, resulting
|
||||
in a storm of problem and recovery notifications. Flapping can be the source of
|
||||
configuration problems (i.e. thresholds set too low), troublesome services,
|
||||
or real network problems.
|
||||
Flapping occurs when a service or host changes state too frequently, which would result in a storm of problem and
|
||||
recovery notifications. With flapping enabled a flapping notification will be sent while other notifications are
|
||||
suppressed until it calms down after receiving the same status from checks a few times. Flapping detection can help detect
|
||||
configuration problems (wrong thresholds), troublesome services, or network problems.
|
||||
|
||||
Flapping detection can be enabled or disabled using the `enable_flapping` attribute.
|
||||
The `flapping_threshold` attributes allows to specify the percentage of state changes
|
||||
when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to flap.
|
||||
The `flapping_threshold_high` and `flapping_threshold_low` attributes specify the thresholds that control
|
||||
when a [host](09-object-types.md#objecttype-host) or [service](objecttype-service) is considered to be flapping.
|
||||
|
||||
Note: There are known issues with flapping detection. Please refrain from enabling
|
||||
flapping until [#4982](https://github.com/Icinga/icinga2/issues/4982) is fixed.
|
||||
The default thresholds are 30% for high and 25% for low. If the computed flapping value exceeds the high threshold, a
|
||||
host or service is considered flapping until it drops below the low flapping threshold.
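
The two-threshold behavior described here can be sketched in C++ as follows. This is only an illustrative sketch: `NextFlappingState` is a hypothetical helper that does not exist in Icinga 2, and its comparisons are modeled on the ones in `Checkable::UpdateFlappingStatus` from this commit.

```cpp
#include <cassert>

// Hypothetical helper (not part of Icinga 2): decides the next flapping
// state from the current state and the computed flapping value in percent.
bool NextFlappingState(bool currentlyFlapping, double flappingValue,
    double thresholdLow = 25.0, double thresholdHigh = 30.0)
{
    // Assumption of this sketch: the low threshold never exceeds the high one.
    assert(thresholdLow <= thresholdHigh);

    // Already flapping: stay flapping while the value is above the low threshold.
    if (currentlyFlapping)
        return flappingValue > thresholdLow;

    // Not flapping yet: start only once the value exceeds the high threshold.
    return flappingValue > thresholdHigh;
}
```

A value between the two thresholds, such as 27%, therefore keeps whatever state the object was already in, which avoids toggling when the value hovers around a single limit.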
|
||||
|
||||
## Volatile Services <a id="volatile-services"></a>
|
||||
`FlappingStart` and `FlappingEnd` notifications will be sent out accordingly, if configured. See the chapter on
|
||||
[notifications](alert-notifications) for details.
|
||||
|
||||
> Note: Flapping makes no distinction between hard and soft states. All state changes count and notifications
|
||||
> will be sent out regardless of the object's state.
|
||||
|
||||
### How it works <a id="how-it-works"></a>
|
||||
|
||||
Icinga 2 saves the last 20 state changes for every host and service. See the graphic below:
|
||||
|
||||
![Icinga 2 Flapping State Timeline](images/advanced-topics/flapping-state-graph.png)
|
||||
|
||||
All the states are weighted, with the most recent one being worth the most (1.18) and the 20th the least (0.8). The
|
||||
states in between are evenly distributed. The final flapping value is the sum of the weighted state changes divided by the total
|
||||
count of 20.
|
||||
|
||||
In the example above, the added states would have a total value of 7.82 (`0.84 + 0.86 + 0.88 + 0.9 + 0.98 + 1.06 + 1.12 + 1.18`).
|
||||
This yields a flapping percentage of 39.1% (`7.82 / 20 * 100`). As the default upper flapping threshold is 30%, it would be
|
||||
considered flapping.
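
The arithmetic above can be reproduced with a small C++ sketch. The helper names `FlappingValue` and `DocsExample` are hypothetical. The weight formula `0.8 + 0.02 * i` is taken from `Checkable::UpdateFlappingStatus` in this commit, and the positions 2, 3, 4, 5, 9, 13, 16 and 19 are derived from the listed weights under the assumption that index 0 is the oldest result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical helper: computes the flapping percentage from 20 state-change
// flags, ordered from oldest (index 0) to newest (index 19).
double FlappingValue(const std::array<bool, 20>& stateChanges)
{
    double weighted = 0.0;

    for (std::size_t i = 0; i < stateChanges.size(); i++) {
        // Weights rise linearly from 0.8 (oldest) to 1.18 (newest).
        if (stateChanges[i])
            weighted += 0.8 + 0.02 * i;
    }

    return 100.0 * weighted / 20.0;
}

// Builds the example: one state change at every position whose weight
// appears in the sum (0.84 -> 2, 0.86 -> 3, ..., 1.18 -> 19).
std::array<bool, 20> DocsExample()
{
    std::array<bool, 20> changes{};

    for (std::size_t i : {2, 3, 4, 5, 9, 13, 16, 19}) {
        assert(i < changes.size());
        changes[i] = true;
    }

    return changes;
}
```

Calling `FlappingValue(DocsExample())` gives roughly 39.1, subject to floating-point rounding, so any comparison against that figure should allow a small tolerance.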
|
||||
|
||||
If the next seven check results were not state changes, the flapping percentage would fall below the lower threshold
|
||||
of 25% and therefore the host or service would recover from flapping.
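
The recovery can be simulated with a stand-alone sketch of the ring buffer. `FlapTracker` and `ReplayDocsExample` are hypothetical names. Icinga 2 stores this state on the checkable object itself, so the code below only imitates the logic of `Checkable::UpdateFlappingStatus` from this commit under the default thresholds.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>

// Hypothetical stand-alone tracker for the last 20 check results.
struct FlapTracker
{
    std::bitset<20> buffer;       // one bit per result: was it a state change?
    std::size_t oldestIndex = 0;  // slot that the next result overwrites
    bool flapping = false;
    double value = 0.0;           // current flapping value in percent
    double thresholdLow = 25.0;
    double thresholdHigh = 30.0;

    // Records one check result and recomputes the value and flapping state.
    void Update(bool stateChange)
    {
        buffer[oldestIndex] = stateChange;
        oldestIndex = (oldestIndex + 1) % buffer.size();

        double weighted = 0.0;
        for (std::size_t i = 0; i < buffer.size(); i++) {
            // i == 0 is the oldest remaining result, i == 19 the newest.
            if (buffer[(oldestIndex + i) % buffer.size()])
                weighted += 0.8 + 0.02 * i;
        }

        value = 100.0 * weighted / 20.0;
        // Hysteresis: compare against the low threshold while flapping.
        flapping = value > (flapping ? thresholdLow : thresholdHigh);
    }
};

// Replays the documented example, then appends results without a state change.
FlapTracker ReplayDocsExample(int stableResults)
{
    assert(stableResults >= 0);

    FlapTracker tracker;
    for (std::size_t i = 0; i < 20; i++) {
        bool change = (i >= 2 && i <= 5) || i == 9 || i == 13 || i == 16 || i == 19;
        tracker.Update(change);
    }

    for (int i = 0; i < stableResults; i++)
        tracker.Update(false);

    return tracker;
}
```

`ReplayDocsExample(0)` ends in the flapping state, while `ReplayDocsExample(7)` ends with a value below 25% and `flapping` set to false, matching the recovery described above.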
|
||||
|
||||
# Volatile Services <a id="volatile-services"></a>
|
||||
|
||||
By default all services remain in a non-volatile state. When a problem
|
||||
occurs, the `SOFT` state applies and once `max_check_attempts` attribute
|
||||
|
|
|
@ -730,7 +730,8 @@ Configuration Attributes:
|
|||
enable\_flapping | Boolean | **Optional.** Whether flap detection is enabled. Defaults to false.
|
||||
enable\_perfdata | Boolean | **Optional.** Whether performance data processing is enabled. Defaults to true.
|
||||
event\_command | Object name | **Optional.** The name of an event command that should be executed every time the host's state changes or the host is in a `SOFT` state.
|
||||
flapping\_threshold | Number | **Optional.** The flapping threshold in percent when a host is considered to be flapping.
|
||||
flapping\_threshold\_high | Number | **Optional.** Flapping upper bound in percent for a host to be considered flapping. Default `30.0`
|
||||
flapping\_threshold\_low | Number | **Optional.** Flapping lower bound in percent for a host to be considered not flapping. Default `25.0`
|
||||
volatile | Boolean | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to false.
|
||||
zone | Object name | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details.
|
||||
command\_endpoint | Object name | **Optional.** The endpoint where commands are executed on.
|
||||
|
@ -767,6 +768,7 @@ Runtime Attributes:
|
|||
downtime\_depth | Number | Whether the host has one or more active downtimes.
|
||||
flapping\_last\_change | Timestamp | When the last flapping change occurred (as a UNIX timestamp).
|
||||
flapping | Boolean | Whether the host is flapping between states.
|
||||
flapping\_current | Number | Current flapping value in percent (see flapping\_thresholds)
|
||||
state | Number | The current state (0 = UP, 1 = DOWN).
|
||||
last\_state | Number | The previous state (0 = UP, 1 = DOWN).
|
||||
last\_hard\_state | Number | The last hard state (0 = UP, 1 = DOWN).
|
||||
|
@ -1465,9 +1467,10 @@ Configuration Attributes:
|
|||
enable\_passive\_checks | Boolean | **Optional.** Whether passive checks are enabled. Defaults to `true`.
|
||||
enable\_event\_handler | Boolean | **Optional.** Enables event handlers for this service. Defaults to `true`.
|
||||
enable\_flapping | Boolean | **Optional.** Whether flap detection is enabled. Defaults to `false`.
|
||||
flapping\_threshold\_high | Number | **Optional.** Flapping upper bound in percent for a service to be considered flapping. Defaults to `30.0`.
|
||||
flapping\_threshold\_low | Number | **Optional.** Flapping lower bound in percent for a service to be considered not flapping. Defaults to `25.0`.
|
||||
enable\_perfdata | Boolean | **Optional.** Whether performance data processing is enabled. Defaults to `true`.
|
||||
event\_command | Object name | **Optional.** The name of an event command that should be executed every time the service's state changes or the service is in a `SOFT` state.
|
||||
flapping\_threshold | Number | **Optional.** The flapping threshold in percent when a service is considered to be flapping.
|
||||
volatile | Boolean | **Optional.** The volatile setting enables always `HARD` state types if `NOT-OK` state changes occur. Defaults to `false`.
|
||||
zone | Object name | **Optional.** The zone this object is a member of. Please read the [distributed monitoring](06-distributed-monitoring.md#distributed-monitoring) chapter for details.
|
||||
name | String | **Required.** The service name. Must be unique on a per-host basis. For advanced usage in [apply rules](03-monitoring-basics.md#using-apply) only.
|
||||
|
@ -1502,6 +1505,7 @@ Runtime Attributes:
|
|||
acknowledgement\_expiry | Timestamp | When the acknowledgement expires (as a UNIX timestamp; 0 = no expiry).
|
||||
downtime\_depth | Number | Whether the service has one or more active downtimes.
|
||||
flapping\_last\_change | Timestamp | When the last flapping change occurred (as a UNIX timestamp).
|
||||
flapping\_current | Number | Current flapping value in percent (see flapping\_thresholds)
|
||||
flapping | Boolean | Whether the service is flapping between states.
|
||||
state | Number | The current state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
|
||||
last\_state | Number | The previous state (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN).
|
||||
|
|
|
@ -317,10 +317,10 @@ void CompatLogger::FlappingChangedHandler(const Checkable::Ptr& checkable)
|
|||
String flapping_output;
|
||||
|
||||
if (checkable->IsFlapping()) {
|
||||
flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
|
||||
flapping_output = "Checkable appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)";
|
||||
flapping_state_str = "STARTED";
|
||||
} else {
|
||||
flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
|
||||
flapping_output = "Checkable appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)";
|
||||
flapping_state_str = "STOPPED";
|
||||
}
|
||||
|
||||
|
|
|
@ -296,8 +296,8 @@ void StatusDataWriter::DumpHostObject(std::ostream& fp, const Host::Ptr& host)
|
|||
fp << "\n";
|
||||
|
||||
fp << "\t" << "initial_state" "\t" "o" "\n"
|
||||
"\t" "low_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n"
|
||||
"\t" "high_flap_threshold" "\t" << host->GetFlappingThreshold() << "\n"
|
||||
"\t" "low_flap_threshold" "\t" << host->GetFlappingThresholdLow() << "\n"
|
||||
"\t" "high_flap_threshold" "\t" << host->GetFlappingThresholdHigh() << "\n"
|
||||
"\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(host) << "\n"
|
||||
"\t" "check_freshness" "\t" "1" "\n";
|
||||
|
||||
|
@ -470,8 +470,8 @@ void StatusDataWriter::DumpServiceObject(std::ostream& fp, const Service::Ptr& s
|
|||
String icon_image_alt = service->GetIconImageAlt();
|
||||
|
||||
fp << "\t" "initial_state" "\t" "o" "\n"
|
||||
"\t" "low_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n"
|
||||
"\t" "high_flap_threshold" "\t" << service->GetFlappingThreshold() << "\n"
|
||||
"\t" "low_flap_threshold" "\t" << service->GetFlappingThresholdLow() << "\n"
|
||||
"\t" "high_flap_threshold" "\t" << service->GetFlappingThresholdHigh() << "\n"
|
||||
"\t" "process_perf_data" "\t" << CompatUtility::GetCheckableProcessPerformanceData(service) << "\n"
|
||||
"\t" "check_freshness" << "\t" "1" "\n";
|
||||
if (!notes.IsEmpty())
|
||||
|
|
|
@ -1194,10 +1194,10 @@ void DbEvents::AddFlappingChangedLogHistory(const Checkable::Ptr& checkable)
|
|||
String flapping_output;
|
||||
|
||||
if (checkable->IsFlapping()) {
|
||||
flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
|
||||
flapping_output = "Service appears to have started flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change >= " + Convert::ToString(checkable->GetFlappingThresholdHigh()) + "% threshold)";
|
||||
flapping_state_str = "STARTED";
|
||||
} else {
|
||||
flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThreshold()) + "% threshold)";
|
||||
flapping_output = "Service appears to have stopped flapping (" + Convert::ToString(checkable->GetFlappingCurrent()) + "% change < " + Convert::ToString(checkable->GetFlappingThresholdLow()) + "% threshold)";
|
||||
flapping_state_str = "STOPPED";
|
||||
}
|
||||
|
||||
|
@ -1323,8 +1323,8 @@ void DbEvents::AddFlappingChangedHistory(const Checkable::Ptr& checkable)
|
|||
fields1->Set("flapping_type", service ? 1 : 0);
|
||||
fields1->Set("object_id", checkable);
|
||||
fields1->Set("percent_state_change", checkable->GetFlappingCurrent());
|
||||
fields1->Set("low_threshold", checkable->GetFlappingThreshold());
|
||||
fields1->Set("high_threshold", checkable->GetFlappingThreshold());
|
||||
fields1->Set("low_threshold", checkable->GetFlappingThresholdLow());
|
||||
fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh());
|
||||
|
||||
fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */
|
||||
|
||||
|
@ -1369,8 +1369,8 @@ void DbEvents::AddEnableFlappingChangedHistory(const Checkable::Ptr& checkable)
|
|||
fields1->Set("flapping_type", service ? 1 : 0);
|
||||
fields1->Set("object_id", checkable);
|
||||
fields1->Set("percent_state_change", checkable->GetFlappingCurrent());
|
||||
fields1->Set("low_threshold", checkable->GetFlappingThreshold());
|
||||
fields1->Set("high_threshold", checkable->GetFlappingThreshold());
|
||||
fields1->Set("low_threshold", checkable->GetFlappingThresholdLow());
|
||||
fields1->Set("high_threshold", checkable->GetFlappingThresholdHigh());
|
||||
|
||||
fields1->Set("instance_id", 0); /* DbConnection class fills in real ID */
|
||||
|
||||
|
|
|
@ -315,14 +315,11 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig
|
|||
olock.Lock();
|
||||
SetLastCheckResult(cr);
|
||||
|
||||
bool was_flapping, is_flapping;
|
||||
bool was_flapping = IsFlapping();
|
||||
|
||||
was_flapping = IsFlapping();
|
||||
UpdateFlappingStatus(old_state != cr->GetState());
|
||||
|
||||
if (GetStateType() == StateTypeHard)
|
||||
UpdateFlappingStatus(stateChange);
|
||||
|
||||
is_flapping = IsFlapping();
|
||||
bool is_flapping = IsFlapping();
|
||||
|
||||
if (cr->GetActive()) {
|
||||
UpdateNextCheck(origin);
|
||||
|
@ -368,13 +365,13 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig
|
|||
ExecuteEventHandler();
|
||||
|
||||
/* Flapping start/end notifications */
|
||||
if (send_notification && !was_flapping && is_flapping) {
|
||||
if (!in_downtime && !was_flapping && is_flapping) {
|
||||
/* FlappingStart notifications happen on state changes, not in downtimes */
|
||||
if (!IsPaused())
|
||||
OnNotificationsRequested(this, NotificationFlappingStart, cr, "", "", MessageOrigin::Ptr());
|
||||
|
||||
Log(LogNotice, "Checkable")
|
||||
<< "Flapping: Checkable '" << GetName() << "' started flapping (" << GetFlappingThreshold() << "% < " << GetFlappingCurrent() << "%).";
|
||||
<< "Flapping: Checkable '" << GetName() << "' started flapping (Current flapping value " << GetFlappingCurrent() << "% > threshold " << GetFlappingThresholdHigh() << "%).";
|
||||
|
||||
NotifyFlapping(origin);
|
||||
} else if (!in_downtime && was_flapping && !is_flapping) {
|
||||
|
@ -383,7 +380,7 @@ void Checkable::ProcessCheckResult(const CheckResult::Ptr& cr, const MessageOrig
|
|||
OnNotificationsRequested(this, NotificationFlappingEnd, cr, "", "", MessageOrigin::Ptr());
|
||||
|
||||
Log(LogNotice, "Checkable")
|
||||
<< "Flapping: Checkable '" << GetName() << "' stopped flapping (" << GetFlappingThreshold() << "% >= " << GetFlappingCurrent() << "%).";
|
||||
<< "Flapping: Checkable '" << GetName() << "' stopped flapping (Current flapping value " << GetFlappingCurrent() << "% < threshold " << GetFlappingThresholdLow() << "%).";
|
||||
|
||||
NotifyFlapping(origin);
|
||||
}
|
||||
|
|
|
@ -17,58 +17,43 @@
|
|||
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. *
|
||||
******************************************************************************/
|
||||
|
||||
#include <bitset>
|
||||
#include "icinga/checkable.hpp"
|
||||
#include "icinga/icingaapplication.hpp"
|
||||
#include "base/utility.hpp"
|
||||
|
||||
using namespace icinga;
|
||||
|
||||
#define FLAPPING_INTERVAL (30 * 60)
|
||||
|
||||
double Checkable::GetFlappingCurrent(void) const
|
||||
{
|
||||
if (GetFlappingPositive() + GetFlappingNegative() <= 0)
|
||||
return 0;
|
||||
|
||||
return 100 * GetFlappingPositive() / (GetFlappingPositive() + GetFlappingNegative());
|
||||
}
|
||||
|
||||
void Checkable::UpdateFlappingStatus(bool stateChange)
|
||||
{
|
||||
double ts, now;
|
||||
long positive, negative;
|
||||
std::bitset<20> stateChangeBuf = GetFlappingBuffer();
|
||||
int oldestIndex = (GetFlappingBuffer() & 0xFF00000) >> 20;
|
||||
|
||||
now = Utility::GetTime();
|
||||
stateChangeBuf[oldestIndex] = stateChange;
|
||||
oldestIndex = (oldestIndex + 1) % 20;
|
||||
|
||||
ts = GetFlappingLastChange();
|
||||
positive = GetFlappingPositive();
|
||||
negative = GetFlappingNegative();
|
||||
double stateChanges = 0;
|
||||
|
||||
double diff = now - ts;
|
||||
|
||||
if (positive + negative > FLAPPING_INTERVAL) {
|
||||
double pct = (positive + negative - FLAPPING_INTERVAL) / FLAPPING_INTERVAL;
|
||||
positive -= pct * positive;
|
||||
negative -= pct * negative;
|
||||
for (int i = 0; i < 20; i++) {
|
||||
if (stateChangeBuf[(oldestIndex + i) % 20])
|
||||
stateChanges += 0.8 + (0.02 * i);
|
||||
}
|
||||
|
||||
if (stateChange)
|
||||
positive += diff;
|
||||
double flappingValue = 100.0 * stateChanges / 20.0;
|
||||
|
||||
bool flapping;
|
||||
|
||||
if (GetFlapping())
|
||||
flapping = flappingValue > GetFlappingThresholdLow();
|
||||
else
|
||||
negative += diff;
|
||||
flapping = flappingValue > GetFlappingThresholdHigh();
|
||||
|
||||
if (positive < 0)
|
||||
positive = 0;
|
||||
if (flapping != GetFlapping())
|
||||
SetFlappingLastChange(Utility::GetTime());
|
||||
|
||||
if (negative < 0)
|
||||
negative = 0;
|
||||
|
||||
// Log(LogDebug, "Checkable")
|
||||
// << "Flapping counter for '" << GetName() << "' is positive=" << positive << ", negative=" << negative;
|
||||
|
||||
SetFlappingLastChange(now);
|
||||
SetFlappingPositive(positive);
|
||||
SetFlappingNegative(negative);
|
||||
SetFlappingBuffer((stateChangeBuf.to_ulong() | (oldestIndex << 20)));
|
||||
SetFlappingCurrent(flappingValue);
|
||||
SetFlapping(flapping);
|
||||
}
|
||||
|
||||
bool Checkable::IsFlapping(void) const
|
||||
|
@ -76,5 +61,5 @@ bool Checkable::IsFlapping(void) const
|
|||
if (!GetEnableFlapping() || !IcingaApplication::GetInstance()->GetEnableFlapping())
|
||||
return false;
|
||||
else
|
||||
return GetFlappingCurrent() > GetFlappingThreshold();
|
||||
return GetFlapping();
|
||||
}
|
||||
|
|
|
@ -180,8 +180,6 @@ public:
|
|||
intrusive_ptr<EventCommand> GetEventCommand(void) const;
|
||||
|
||||
/* Flapping Detection */
|
||||
double GetFlappingCurrent(void) const;
|
||||
|
||||
bool IsFlapping(void) const;
|
||||
void UpdateFlappingStatus(bool stateChange);
|
||||
|
||||
|
|
|
@ -70,9 +70,7 @@ abstract class Checkable : CustomVarObject
|
|||
}}}
|
||||
};
|
||||
[config] bool volatile;
|
||||
[config] double flapping_threshold {
|
||||
default {{{ return 30; }}}
|
||||
};
|
||||
|
||||
[config] bool enable_active_checks {
|
||||
default {{{ return true; }}}
|
||||
};
|
||||
|
@ -92,6 +90,16 @@ abstract class Checkable : CustomVarObject
|
|||
default {{{ return true; }}}
|
||||
};
|
||||
|
||||
[config, deprecated] double flapping_threshold;
|
||||
|
||||
[config] double flapping_threshold_low {
|
||||
default {{{ return 25; }}}
|
||||
};
|
||||
|
||||
[config] double flapping_threshold_high{
|
||||
default {{{ return 30; }}}
|
||||
};
|
||||
|
||||
[config] String notes;
|
||||
[config] String notes_url;
|
||||
[config] String action_url;
|
||||
|
@ -139,12 +147,6 @@ abstract class Checkable : CustomVarObject
|
|||
};
|
||||
[state] Timestamp acknowledgement_expiry;
|
||||
[state] bool force_next_notification;
|
||||
[state] int flapping_positive;
|
||||
[state] int flapping_negative;
|
||||
[state] Timestamp flapping_last_change;
|
||||
[no_storage, protected] bool flapping {
|
||||
get {{{ return false; }}}
|
||||
};
|
||||
[no_storage] Timestamp last_check {
|
||||
get;
|
||||
};
|
||||
|
@ -152,6 +154,13 @@ abstract class Checkable : CustomVarObject
|
|||
get;
|
||||
};
|
||||
|
||||
[state] double flapping_current {
|
||||
default {{{ return 0; }}}
|
||||
};
|
||||
[state] Timestamp flapping_last_change;
|
||||
[state, no_user_view, no_user_modify] int flapping_buffer;
|
||||
[state, protected] bool flapping;
|
||||
|
||||
[config, navigation] name(Endpoint) command_endpoint (CommandEndpointRaw) {
|
||||
navigate {{{
|
||||
return Endpoint::GetByName(GetCommandEndpointRaw());
|
||||
|
|
|
@ -300,12 +300,12 @@ int CompatUtility::GetCheckableIsVolatile(const Checkable::Ptr& checkable)
|
|||
|
||||
double CompatUtility::GetCheckableLowFlapThreshold(const Checkable::Ptr& checkable)
|
||||
{
|
||||
return checkable->GetFlappingThreshold();
|
||||
return checkable->GetFlappingThresholdLow();
|
||||
}
|
||||
|
||||
double CompatUtility::GetCheckableHighFlapThreshold(const Checkable::Ptr& checkable)
|
||||
{
|
||||
return checkable->GetFlappingThreshold();
|
||||
return checkable->GetFlappingThresholdHigh();
|
||||
}
|
||||
|
||||
int CompatUtility::GetCheckableFreshnessChecksEnabled(const Checkable::Ptr& checkable)
|
||||
|
|
|
@ -99,10 +99,10 @@ add_boost_test(base
|
|||
icinga_checkresult/service_1attempt
|
||||
icinga_checkresult/service_2attempts
|
||||
icinga_checkresult/service_3attempts
|
||||
icinga_checkresult/host_flapping_notification
|
||||
icinga_checkresult/service_flapping_notification
|
||||
icinga_notification/state_filter
|
||||
icinga_notification/type_filter
|
||||
icinga_checkresult/host_flapping_notification
|
||||
icinga_checkresult/service_flapping_notification
|
||||
icinga_notification/state_filter
|
||||
icinga_notification/type_filter
|
||||
icinga_macros/simple
|
||||
icinga_perfdata/empty
|
||||
icinga_perfdata/simple
|
||||
|
@ -136,3 +136,17 @@ if(ICINGA2_WITH_LIVESTATUS)
|
|||
TESTS livestatus/hosts livestatus/services
|
||||
)
|
||||
endif()
|
||||
|
||||
set(icinga_checkable_test_SOURCES
|
||||
icinga-checkable-flapping.cpp
|
||||
)
|
||||
|
||||
add_boost_test(icinga_checkable
|
||||
SOURCES icinga-checkable-test.cpp ${icinga_checkable_test_SOURCES}
|
||||
LIBRARIES base config icinga cli
|
||||
TESTS icinga_checkable_flapping/host_not_flapping
|
||||
icinga_checkable_flapping/host_flapping
|
||||
icinga_checkable_flapping/host_flapping_recover
|
||||
icinga_checkable_flapping/host_flapping_docs_example
|
||||
)
|
||||
|
||||
|
|
|
@ -0,0 +1,260 @@
|
|||
/******************************************************************************
|
||||
* Icinga 2 *
|
||||
* Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/) *
|
||||
* *
|
||||
* This program is free software; you can redistribute it and/or *
|
||||
* modify it under the terms of the GNU General Public License *
|
||||
* as published by the Free Software Foundation; either version 2 *
|
||||
* of the License, or (at your option) any later version. *
|
||||
* *
|
||||
* This program is distributed in the hope that it will be useful, *
|
||||
* but WITHOUT ANY WARRANTY; without even the implied warranty of *
|
||||
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
|
||||
* GNU General Public License for more details. *
|
||||
* *
|
||||
* You should have received a copy of the GNU General Public License *
|
||||
* along with this program; if not, write to the Free Software Foundation *
|
||||
* Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. *
|
||||
******************************************************************************/
|
||||
|
||||
#include <boost/test/unit_test.hpp>
|
||||
#include <bitset>
|
||||
#include "icinga/host.hpp"
|
||||
#include <iostream>
|
||||
|
||||
using namespace icinga;
|
||||
|
||||
#ifdef I2_DEBUG
|
||||
static CheckResult::Ptr MakeCheckResult(ServiceState state)
|
||||
{
|
||||
CheckResult::Ptr cr = new CheckResult();
|
||||
|
||||
cr->SetState(state);
|
||||
|
||||
double now = Utility::GetTime();
|
||||
cr->SetScheduleStart(now);
|
||||
cr->SetScheduleEnd(now);
|
||||
cr->SetExecutionStart(now);
|
||||
cr->SetExecutionEnd(now);
|
||||
|
||||
Utility::IncrementTime(60);
|
||||
|
||||
return cr;
|
||||
}
|
||||
|
||||
static void LogFlapping(const Checkable::Ptr& obj)
|
||||
{
|
||||
std::bitset<20> stateChangeBuf = obj->GetFlappingBuffer();
|
||||
int oldestIndex = (obj->GetFlappingBuffer() & 0xFF00000) >> 20;
|
||||
|
||||
std::cout << "Flapping: " << obj->IsFlapping() << "\nHT: " << obj->GetFlappingThresholdHigh() << " LT: " << obj->GetFlappingThresholdLow()
|
||||
<< "\nOur value: " << obj->GetFlappingCurrent() << "\nPtr: " << oldestIndex << " Buf: " << stateChangeBuf << '\n';
|
||||
}
|
||||
|
||||
|
||||
static void LogHostStatus(const Host::Ptr &host)
|
||||
{
|
||||
std::cout << "Current status: state: " << host->GetState() << " state_type: " << host->GetStateType()
|
||||
<< " check attempt: " << host->GetCheckAttempt() << "/" << host->GetMaxCheckAttempts() << std::endl;
|
||||
}
|
||||
#endif /* I2_DEBUG */
|
||||
|
||||
BOOST_AUTO_TEST_SUITE(icinga_checkable_flapping)
|
||||
|
||||
BOOST_AUTO_TEST_CASE(host_not_flapping)
|
||||
{
|
||||
#ifndef I2_DEBUG
|
||||
BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
|
||||
#else /* I2_DEBUG */
|
||||
std::cout << "Running test with a non-flapping host...\n";
|
||||
|
||||
Host::Ptr host = new Host();
|
||||
host->SetName("test");
|
||||
host->SetEnableFlapping(true);
|
||||
host->SetMaxCheckAttempts(5);
|
||||
|
||||
// Host otherwise is soft down
|
||||
host->SetState(HostUp);
|
||||
host->SetStateType(StateTypeHard);
|
||||
|
||||
Utility::SetTime(0);
|
||||
|
||||
BOOST_CHECK(host->GetFlappingCurrent() == 0);
|
||||
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
|
||||
// watch the state being stable
|
||||
int i = 0;
|
||||
while (i++ < 10) {
|
||||
// For some reason, elusive to me, the first check is a state change
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
|
||||
BOOST_CHECK(host->GetState() == 0);
|
||||
BOOST_CHECK(host->GetCheckAttempt() == 1);
|
||||
BOOST_CHECK(host->GetStateType() == StateTypeHard);
|
||||
|
||||
//Should not be flapping
|
||||
BOOST_CHECK(!host->IsFlapping());
|
||||
BOOST_CHECK(host->GetFlappingCurrent() < 30.0);
|
||||
}
|
||||
#endif /* I2_DEBUG */
|
||||
}
|
||||
|
||||
BOOST_AUTO_TEST_CASE(host_flapping)
|
||||
{
|
||||
#ifndef I2_DEBUG
|
||||
BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
|
||||
#else /* I2_DEBUG */
|
||||
std::cout << "Running test with host changing state with every check...\n";
|
||||
|
||||
Host::Ptr host = new Host();
|
||||
host->SetName("test");
|
||||
host->SetEnableFlapping(true);
|
||||
host->SetMaxCheckAttempts(5);
|
||||
|
||||
Utility::SetTime(0);
|
||||
|
||||
int i = 0;
|
||||
while (i++ < 25) {
|
||||
if (i % 2)
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
else
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
|
||||
//30 Percent is our high Threshold
|
||||
if (i >= 6) {
|
||||
BOOST_CHECK(host->IsFlapping());
|
||||
} else {
|
||||
BOOST_CHECK(!host->IsFlapping());
|
||||
}
|
||||
}
|
||||
#endif /* I2_DEBUG */
|
||||
}
|
||||
|
||||
BOOST_AUTO_TEST_CASE(host_flapping_recover)
|
||||
{
|
||||
#ifndef I2_DEBUG
|
||||
BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
|
||||
#else /* I2_DEBUG */
|
||||
std::cout << "Running test with flapping recovery...\n";
|
||||
|
||||
Host::Ptr host = new Host();
|
||||
host->SetName("test");
|
||||
host->SetEnableFlapping(true);
|
||||
host->SetMaxCheckAttempts(5);
|
||||
|
||||
// Host otherwise is soft down
|
||||
host->SetState(HostUp);
|
||||
host->SetStateType(StateTypeHard);
|
||||
|
||||
Utility::SetTime(0);
|
||||
|
||||
// A few warnings
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
for (int i = 0; i <= 7; i++) {
|
||||
if (i % 2)
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
else
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
}
|
||||
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
|
||||
// We should be flapping now
|
||||
BOOST_CHECK(host->GetFlappingCurrent() > 30.0);
|
||||
BOOST_CHECK(host->IsFlapping());
|
||||
|
||||
// Now recover from flapping
|
||||
int count = 0;
|
||||
while (host->IsFlapping()) {
|
||||
BOOST_CHECK(host->GetFlappingCurrent() > 25.0);
|
||||
BOOST_CHECK(host->IsFlapping());
|
||||
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
count++;
|
||||
}
|
||||
|
||||
std::cout << "Recovered from flapping after " << count << " Warning results.\n";
|
||||
|
||||
BOOST_CHECK(host->GetFlappingCurrent() < 25.0);
|
||||
BOOST_CHECK(!host->IsFlapping());
|
||||
#endif /* I2_DEBUG */
|
||||
}
|
||||
|
||||
BOOST_AUTO_TEST_CASE(host_flapping_docs_example)
|
||||
{
|
||||
#ifndef I2_DEBUG
|
||||
BOOST_WARN_MESSAGE(false, "This test can only be run in a debug build!");
|
||||
#else /* I2_DEBUG */
|
||||
std::cout << "Simulating the documentation example...\n";
|
||||
|
||||
Host::Ptr host = new Host();
|
||||
host->SetName("test");
|
||||
host->SetEnableFlapping(true);
|
||||
host->SetMaxCheckAttempts(5);
|
||||
|
||||
// Host otherwise is soft down
|
||||
host->SetState(HostUp);
|
||||
host->SetStateType(StateTypeHard);
|
||||
|
||||
Utility::SetTime(0);
|
||||
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceOK));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceWarning));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
BOOST_CHECK(host->GetFlappingCurrent() == 39.1);
|
||||
BOOST_CHECK(host->IsFlapping());
|
||||
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
host->ProcessCheckResult(MakeCheckResult(ServiceCritical));
|
||||
|
||||
LogFlapping(host);
|
||||
LogHostStatus(host);
|
||||
BOOST_CHECK(host->GetFlappingCurrent() < 25.0);
|
||||
BOOST_CHECK(!host->IsFlapping());
|
||||
#endif
|
||||
}
|
||||
|
||||
BOOST_AUTO_TEST_SUITE_END()
|
|
@@ -0,0 +1,66 @@
/******************************************************************************
 * Icinga 2 *
 * Copyright (C) 2012-2016 Icinga Development Team (https://www.icinga.org/) *
 * *
 * This program is free software; you can redistribute it and/or *
 * modify it under the terms of the GNU General Public License *
 * as published by the Free Software Foundation; either version 2 *
 * of the License, or (at your option) any later version. *
 * *
 * This program is distributed in the hope that it will be useful, *
 * but WITHOUT ANY WARRANTY; without even the implied warranty of *
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the *
 * GNU General Public License for more details. *
 * *
 * You should have received a copy of the GNU General Public License *
 * along with this program; if not, write to the Free Software Foundation *
 * Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301, USA. *
 ******************************************************************************/

#define BOOST_TEST_MAIN
#define BOOST_TEST_MODULE icinga2_test

#include "cli/daemonutility.hpp"
#include "base/application.hpp"
#include "base/loader.hpp"
#include <BoostTestTargetConfig.h>
#include <fstream>

using namespace icinga;

struct IcingaCheckableFixture
{
	IcingaCheckableFixture(void)
	{
		BOOST_TEST_MESSAGE("setup running Icinga 2 core");

		Application::InitializeBase();

		/* start the Icinga application and load the configuration */
		Application::DeclareSysconfDir("etc");
		Application::DeclareLocalStateDir("var");

		ActivationScope ascope;

		Loader::LoadExtensionLibrary("icinga");
		Loader::LoadExtensionLibrary("methods"); //loaded by ITL

		std::vector<std::string> configs;
		std::vector<ConfigItem::Ptr> newItems;

		DaemonUtility::LoadConfigFiles(configs, newItems, "icinga2.debug", "icinga2.vars");

		/* ignore config errors */
		WorkQueue upq;
		ConfigItem::ActivateItems(upq, newItems);
	}

	~IcingaCheckableFixture(void)
	{
		BOOST_TEST_MESSAGE("cleanup Icinga 2 core");
		Application::UninitializeBase();
	}
};

BOOST_GLOBAL_FIXTURE(IcingaCheckableFixture);

@@ -395,11 +395,9 @@ BOOST_AUTO_TEST_CASE(host_flapping_notification)
#else /* I2_DEBUG */
	boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2));

	int softStateCount = 20;
	int timeStepInterval = 60;

	Host::Ptr host = new Host();
	host->SetMaxCheckAttempts(softStateCount);
	host->Activate();
	host->SetAuthority(true);
	host->SetStateRaw(ServiceOK);

@@ -418,18 +416,25 @@ BOOST_AUTO_TEST_CASE(host_flapping_notification)

	std::cout << "Inserting flapping check results" << std::endl;

	for (int i = 0; i < softStateCount; i++) {
	for (int i = 0; i < 10; i++) {
		ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical);
		host->ProcessCheckResult(MakeCheckResult(state));
		Utility::IncrementTime(timeStepInterval);
	}

	std::cout << "Checking host state (must be flapping in SOFT state)" << std::endl;
	BOOST_CHECK(host->GetStateType() == StateTypeSoft);
	BOOST_CHECK(host->IsFlapping() == true);

	std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl;
	CheckNotification(host, false, NotificationFlappingStart);
	CheckNotification(host, true, NotificationFlappingStart);

	std::cout << "Now calm down..." << std::endl;

	for (int i = 0; i < 20; i++) {
		host->ProcessCheckResult(MakeCheckResult(ServiceOK));
		Utility::IncrementTime(timeStepInterval);
	}

	CheckNotification(host, true, NotificationFlappingEnd);


	c.disconnect();

@@ -443,11 +448,9 @@ BOOST_AUTO_TEST_CASE(service_flapping_notification)
#else /* I2_DEBUG */
	boost::signals2::connection c = Checkable::OnNotificationsRequested.connect(boost::bind(&NotificationHandler, _1, _2));

	int softStateCount = 20;
	int timeStepInterval = 60;

	Host::Ptr service = new Host();
	service->SetMaxCheckAttempts(softStateCount);
	service->Activate();
	service->SetAuthority(true);
	service->SetStateRaw(ServiceOK);

@@ -466,18 +469,24 @@ BOOST_AUTO_TEST_CASE(service_flapping_notification)

	std::cout << "Inserting flapping check results" << std::endl;

	for (int i = 0; i < softStateCount; i++) {
	for (int i = 0; i < 10; i++) {
		ServiceState state = (i % 2 == 0 ? ServiceOK : ServiceCritical);
		service->ProcessCheckResult(MakeCheckResult(state));
		Utility::IncrementTime(timeStepInterval);
	}

	std::cout << "Checking service state (must be flapping in SOFT state)" << std::endl;
	BOOST_CHECK(service->GetStateType() == StateTypeSoft);
	BOOST_CHECK(service->IsFlapping() == true);

	std::cout << "No FlappingStart notification type must have been triggered in a SOFT state" << std::endl;
	CheckNotification(service, false, NotificationFlappingStart);
	CheckNotification(service, true, NotificationFlappingStart);

	std::cout << "Now calm down..." << std::endl;

	for (int i = 0; i < 20; i++) {
		service->ProcessCheckResult(MakeCheckResult(ServiceOK));
		Utility::IncrementTime(timeStepInterval);
	}

	CheckNotification(service, true, NotificationFlappingEnd);

	c.disconnect();

@@ -514,7 +514,7 @@ void ClassCompiler::HandleClass(const Klass& klass, const ClassDebugInfo&)
			if (field.Type.GetRealType().find("::Ptr") != std::string::npos)
				m_Impl << "\t" << "if (" << argName << ")" << std::endl;
			else
				m_Impl << "\t" << "if (!" << argName << ".IsEmpty())" << std::endl;
				m_Impl << "\t" << "if (" << argName << " != GetDefault" << field.GetFriendlyName() << "())" << std::endl;

			m_Impl << "\t\t" << "Log(LogWarning, \"" << klass.Name << "\") << \"Attribute '" << field.Name << "' for object '\" << dynamic_cast<ConfigObject *>(this)->GetName() << \"' of type '\" << dynamic_cast<ConfigObject *>(this)->GetReflectionType()->GetName() << \"' is deprecated and should not be used.\";" << std::endl;
		}
