So far, the documentation has claimed that loggers have a default severity
(information for FileLogger and warning for SyslogLogger). However, this was
not the case, and not setting the severity resulted in a configuration error.
This commit changes the default value to be information for all loggers.
This commit adds a timeout for establishing both new outgoing and incoming
connections. This timeout applies to everything until the connection is in a
state where either JsonRpcConnection or HttpServerConnection takes over.
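A minimal sketch of such a handshake timeout with Boost.Asio (the 15-second
duration, the template, and all names are assumptions, not the actual
implementation):

#include <boost/asio.hpp>
#include <memory>

// guard the connect/handshake phase of a socket with a deadline
template<class Socket>
void GuardHandshake(boost::asio::io_context& io, const std::shared_ptr<Socket>& socket)
{
	auto timeout (std::make_shared<boost::asio::deadline_timer>(io));
	timeout->expires_from_now(boost::posix_time::seconds(15));
	timeout->async_wait([socket, timeout](const boost::system::error_code& ec) {
		if (!ec) // the deadline expired, the wait was not cancelled
			socket->lowest_layer().cancel(); // abort connect/TLS handshake
	});

	// ... once connected, cancel the timer and hand the socket over
	// to JsonRpcConnection or HttpServerConnection: timeout->cancel();
}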
As `silent` no longer only controls the generation of log messages, a better
name is required. This commit renames it, inverts its value to reflect the new
name, and adds a documentation comment.
When Icinga 2 is started as a service, the early log messages generated
until the FileLogger object is activated are lost, which makes it really
hard to debug issues that (only) occur when Icinga 2 reloads.
With this commit, these early log messages are written to the Windows
Event Log.
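A minimal sketch of writing such a message via the Win32 Event Log API
(the event source name and the zero category/event ID are assumptions):

#include <windows.h>

void WriteEarlyLogMessage(const char *message)
{
	HANDLE source = RegisterEventSourceA(nullptr, "Icinga 2");
	if (!source)
		return;

	const char *strings[] = { message };
	ReportEventA(source, EVENTLOG_INFORMATION_TYPE, 0 /* category */,
		0 /* event ID */, nullptr /* user SID */, 1 /* one string */,
		0 /* no raw data */, strings, nullptr);
	DeregisterEventSource(source);
}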
68a0079c26 introduced two problems that are fixed
with this commit:
1. The new truncated/hashed name did not use EscapeName()
2. There was a possible collision of names when creating objects with a full
name of format "[80 characters]...[40 hex digits]" (i.e. the same as the
truncated/hashed variant but short enough that it isn't hashed)
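A heavily hedged sketch of the fixed scheme; the 123-character threshold
and the helpers MatchesTruncatedPattern() and Sha1Hex() are hypothetical:

String TrimName(const String& name)
{
	// fix 2: names that already look like "[80 chars]...[40 hex digits]"
	// must be hashed too, otherwise they collide with hashed long names
	if (name.GetLength() <= 123 && !MatchesTruncatedPattern(name))
		return EscapeName(name); // fix 1: EscapeName() in every branch

	return EscapeName(name.SubStr(0, 80) + "..." + Sha1Hex(name));
}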
The old validation regex only matched if the name consisted entirely of
invalid characters, rather than checking that it contains none, i.e.
something like "foo/bar" was considered valid.
This commit replaces the regex with a check that all characters in the name are
allowed characters.
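A minimal sketch of such a whitelist check; the exact allowed character
set is an assumption for illustration:

#include <algorithm>
#include <string>

bool IsValidObjectName(const std::string& name)
{
	auto allowed = [](char c) {
		return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
			(c >= '0' && c <= '9') || c == '_' || c == '-' || c == '.';
	};

	// every single character must be allowed; "foo/bar" now fails
	return std::all_of(name.begin(), name.end(), allowed);
}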
I.e. keep the serializations as simple as possible:
null => null
true => true
42.0 => 42
"foobar" => foobar
{{42}} => Object of type 'Function'
(["foobar"] and {"foo"="bar"} can't occur there.)
Even if a double represents an integer value, it might not be safe to cast it
to long long as it may overflow the type. Instead, just print the double value
with 0 decimals using std::setprecision.
Before:
<1> => 18446744073709551616.to_string()
"-9223372036854775808"
After:
<1> => 18446744073709551616.to_string()
"18446744073709551616"
Fixes the following build error:
/home/jbrost/dev/icinga2/lib/base/stdiostream.cpp: In member function ‘virtual size_t icinga::StdioStream::Read(void*, size_t, bool)’:
/home/jbrost/dev/icinga2/lib/base/stdiostream.cpp:28:15: error: invalid use of incomplete type ‘std::iostream’ {aka ‘class std::basic_iostream<char>’}
28 | m_InnerStream->read(static_cast<char *>(buffer), size);
| ^~
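The error indicates that only a forward declaration of std::iostream
(e.g. from <iosfwd>) was visible at the call site; presumably the fix
boils down to including the full definition:

#include <iostream> // complete std::basic_iostream, needed for ->read()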
Note that even when passing `nullptr` as target zone to `RelayMessage()`, the
cluster message will still be sent to the parent zone. These incoming messages
will now be rejected by the parent nodes. At the moment, there's no way to only
send within the local zone.
Unfortunately, the symbol resolution of boost::stacktrace is broken on
FreeBSD, therefore fall back to using backtrace_symbols() to print the
stack trace saved by Boost.
Additionally, -D_GNU_SOURCE is required on FreeBSD for the
_Unwind_Backtrace function used by boost::stacktrace.
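A minimal sketch of the fallback, assuming the frame addresses of the
saved Boost stacktrace are handed to backtrace_symbols():

#include <execinfo.h>
#include <boost/stacktrace.hpp>
#include <cstdlib>
#include <iostream>
#include <vector>

void PrintStacktrace(const boost::stacktrace::stacktrace& stack)
{
	std::vector<void *> addrs;
	for (const auto& frame : stack)
		addrs.push_back(const_cast<void *>(frame.address()));

	// resolve the raw addresses with backtrace_symbols() instead of Boost
	char **symbols = backtrace_symbols(addrs.data(), static_cast<int>(addrs.size()));
	if (!symbols)
		return;

	for (std::size_t i = 0; i < addrs.size(); i++)
		std::cout << symbols[i] << "\n";

	std::free(symbols);
}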
This makes the format more similar to what the uncaught C++ and SEH
exception handlers write. Previously there was no indication in the
crash log that a SIGABRT happened.
Maybe this will save the next person who has to look at this code some
time. Please don't blame me for the implementation, I'm just trying to
reconstruct what it does.
The logic for selecting the traces to print stays the same, but there
are fewer nested ifs now. This changes the format of the returned string
a bit by adding a heading for both traces.
By default, DiagnosticInformation uses the stack trace saved when the
exception was thrown, but this mechanism is not in use on Windows.
Gathering a stacktrace in the terminate handler serves as a fallback.
On Windows, the termination handler is executed for uncaught C++
exceptions unless a SEH unhandled exception filter is also set. In this
case, this filter has to explicitly chain the default filter to keep
this behavior.
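A minimal sketch of the chaining, assuming the filter is installed via
SetUnhandledExceptionFilter(); the function names are illustrative:

#include <windows.h>

static LPTOP_LEVEL_EXCEPTION_FILTER l_PreviousFilter;

static LONG WINAPI CrashExceptionFilter(EXCEPTION_POINTERS *info)
{
	// ... write the crash log ...

	// explicitly chain the previously installed (default) filter so
	// uncaught C++ exceptions still reach the terminate handler
	if (l_PreviousFilter)
		return l_PreviousFilter(info);

	return EXCEPTION_CONTINUE_SEARCH;
}

void InstallCrashExceptionFilter()
{
	l_PreviousFilter = SetUnhandledExceptionFilter(CrashExceptionFilter);
}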
Previously:
1. You delete an object from a config file
2. You reload Icinga
3. Icinga fetches all objects and whether they're active from the IDO
4. Icinga recognizes that the just deleted object doesn't exist anymore
5. Icinga marks it as inactive in the IDO, but not in memory
6. You re-create the just deleted object via API
7. Icinga still thinks it's active and doesn't activate it - it's invisible
refs #8584
Previously, the initial config dump was started in a timer executed
every 15 seconds. During the first execution of the timer, the Redis
connection is typically not established yet. Therefore, this delayed the
initial sync by up to 15 seconds.
This commit instead triggers the sync from a callback that is executed
after the connection is successfully established.
The timer is removed completely. At first glance, it looks like it would
ensure that a lost connection is re-established, but this is handled
internally by RedisConnection. After the config has been dumped once, that
timer wouldn't ever attempt a reconnect anyway.
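A sketch of the idea with hypothetical names (SetConnectedCallback() and
its signature are assumptions):

// run the initial dump as soon as RedisConnection reports an
// established connection, instead of polling from a 15-second timer
redis->SetConnectedCallback([icingadb]() {
	icingadb->UpdateAllConfigObjects(); // initial config/status dump
});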
Disregard passive check results while no active checks are being scheduled due to violated dependencies.
This copes with the fact that programs feeding passive check results into Icinga may have no notion of reachability, and so could drive a checkable into a HARD state although dependencies have caused active check scheduling to be suspended. This may prevent superfluous problem notifications from being emitted during recovery.
As disable_checks defaults to false, it was regarded OK (by @Al2Klimov) to make this behaviour (which resembles the active check case) unconditional rather than gating it behind an additional attribute.
In the description of disable_checks, note that a value of true both disables scheduling of active checks and drops passive check results.
This delays the log message stating that the initial dump is done until
all queries are actually done and now logs a meaningful duration. In
addition, this delays the return of the function and therefore when
state variables are updated by the caller.
This commit sets the activation priority of IcingaDB objects to 100 (the
same value as the IDO uses) so that they get activated after most regular
config objects (hosts, services, ...).
Before (note how Icinga 2 continues to activate objects for over a minute
after IcingaDB is started and thinks the initial dump is done):
[2021-01-19 08:33:19 +0000] information/IcingaDB: 'icingadb' started.
[2021-01-19 08:34:02 +0000] information/IcingaDB: Initial config/status dump finished in 28.247 seconds.
[2021-01-19 08:35:49 +0000] information/ConfigItem: Activated all objects.
After (now activation of objects is done right after IcingaDB is
started, as it's one of the last objects to be activated):
[2021-01-19 08:39:01 +0000] information/IcingaDB: 'icingadb' started.
[2021-01-19 08:39:02 +0000] information/ConfigItem: Activated all objects.
[2021-01-19 08:39:38 +0000] information/IcingaDB: Initial config/status dump finished in 21.6606 seconds.
There is an assertion that after activating items, all these items are
active, which sounds reasonable at first. However, with concurrent API
queries, some of these could already be deleted and therefore be
deactivated again.
At numerous places in the code, something like this is performed:
String name = Downtime::AddDowntime(...);
Downtime::Ptr downtime = Downtime::GetByName(name);
However, `downtime` can be a `nullptr` after this as it is possible that
the downtime is deleted in between.
This commit changes the return type of `Downtime::AddDowntime` to return
a Downtime::Ptr instead of the full name of the downtime. `AddDowntime`
performs the very same `GetByName()` operation internally, but handles
the `nullptr` case correctly and throws an exception.
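With the changed return type, call sites reduce to a single safe step
(sketch, arguments elided as above):

// before: the lookup may race with a concurrent deletion
String name = Downtime::AddDowntime(...);
Downtime::Ptr downtime = Downtime::GetByName(name); // possibly nullptr

// after: AddDowntime() resolves the object itself and throws
// instead of handing out a nullptr
Downtime::Ptr downtime = Downtime::AddDowntime(...);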
Since commit d9010c7b9f, ActivateItems no
longer uses the WorkQueue upq to perform tasks but instead performs
these locally. One instance of `upq.Join()`/`upq.HasExceptions()`
remained in the function, but I believe this was just missed when
removing the `upq.Enqueue()` call just before.
This commit removes the corresponding parameter and updates all call
sites accordingly.
`this` could be deleted after `Notification::BeginExecuteNotification`
exited and before `Notification::ExecuteNotificationHelper` finished.
This is fixed by constructing a `Notification::Ptr` and operating on that
one, as it is properly reference-counted.
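A minimal sketch of the pattern; the helper call and its arguments are
placeholders:

// capturing a reference-counted Ptr keeps the object alive for the
// whole asynchronous call; a raw `this` carries no reference
Notification::Ptr notification (this);

Utility::QueueAsyncCallback([notification]() {
	notification->ExecuteNotificationHelper(/* ... */);
});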
Only two out of three cases were handled properly by the code: host
downtimes referencing a deleted host and service downtimes referencing a
deleted service worked fine. However, if a service downtime references a
deleted host, `Host::GetByName()` returns `nullptr` which isn't
accounted for. Use `Service::GetByNamePair()` instead as this performs a
check for the host being null internally.
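A sketch of both variants (GetServiceByShortName() stands in for the
original lookup and is an assumption):

// before: blows up when the host was deleted in the meantime
Host::Ptr host = Host::GetByName(hostName); // nullptr for a deleted host
Service::Ptr service = host->GetServiceByShortName(serviceName); // nullptr dereference

// after: the nullptr check happens inside GetByNamePair()
Service::Ptr service = Service::GetByNamePair(hostName, serviceName);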
Boost.Beast changed the signature of
boost::beast::http::basic_fields::set in version 1.74 so that it no longer
allows passing an icinga::String instance as the value. This adds a
conversion function so that it works again.
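A hedged sketch of such a conversion; the helper name is an assumption
and the actual commit may convert differently:

#include <boost/beast/http.hpp>
#include "base/string.hpp" // icinga::String

// expose the String's buffer as the view type basic_fields::set accepts
inline boost::beast::string_view ToStringView(const icinga::String& s)
{
	return boost::beast::string_view(s.CStr(), s.GetLength());
}

// usage at a call site:
// response.set(boost::beast::http::field::content_type, ToStringView(contentType));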
Boost.Beast changed the signature of the previously used generic `set`
method so that it no longer accepts integer types. However, there is
already a more specific method for setting the Content-Length header, so
use that one instead.
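A sketch of the change at a call site (variable names assumed):

#include <boost/beast/http.hpp>
#include <string>

void SetBody(boost::beast::http::response<boost::beast::http::string_body>& response,
	std::string body)
{
	// before: relied on the generic set() accepting an integer, which
	// no longer compiles with Boost >= 1.74
	//response.set(boost::beast::http::field::content_length, body.size());

	// after: the dedicated Content-Length setter
	response.content_length(body.size());
	response.body() = std::move(body);
}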
When a CRL is specified in the ApiListener configuration, Icinga 2 so far
only used it when establishing connections, but not when a certificate is
requested. Checking the CRL there as well allows a node to automatically
renew a revoked certificate if it meets the other conditions for
auto-renewal (issued before 2017 or expires in less than 30 days).
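A hedged sketch of a revocation check with OpenSSL; the store setup and
the helper name are simplified assumptions:

#include <openssl/x509.h>
#include <openssl/x509_vfy.h>

bool IsCertificateRevoked(X509_STORE *store, X509 *cert)
{
	X509_STORE_set_flags(store, X509_V_FLAG_CRL_CHECK);

	X509_STORE_CTX *ctx = X509_STORE_CTX_new();
	X509_STORE_CTX_init(ctx, store, cert, nullptr);

	bool revoked = X509_verify_cert(ctx) != 1 &&
		X509_STORE_CTX_get_error(ctx) == X509_V_ERR_CERT_REVOKED;

	X509_STORE_CTX_free(ctx);
	return revoked;
}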
The function was never used, and its implementation contains a bug where
a buffer of too small a size is used as a parameter to ERR_error_string.
According to ERR_error_string(3), the buffer has to be at least 256 bytes
in size.
The function also seems of limited use, as it is meant to output the tag
object used to attach additional error information to Boost exceptions.
However, `boost::get_error_info<>()` only returns the value type, not the
full tag object, from the exception.
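For reference, a safe usage sketch with the length-checked variant:

#include <openssl/err.h>

void LogOpenSSLError()
{
	char errbuf[256]; // ERR_error_string(3): at least 256 bytes
	ERR_error_string_n(ERR_get_error(), errbuf, sizeof(errbuf));
	// ... log errbuf ...
}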
Scientific notation is generally allowed by our performance data
parser. However, in the edge case where scientific notation is delivered
with an uppercase E, our parser does not recognize it and drops the
entry as invalid performance data.
This wrong interpretation as invalid performance data affects all parts
of the entry: the value itself, the warning value, the critical value,
the minimum value and the maximum value.
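A minimal sketch of an exponent-tolerant numeric parse; std::stod accepts
both markers, the helper name is illustrative:

#include <string>

double ParsePerfDataValue(const std::string& token)
{
	// "1.5e3" and "1.5E3" both parse to 1500.0
	return std::stod(token);
}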