So far, the documentation has claimed that loggers have a default severity
(information for FileLogger and warning for SyslogLogger). However, this was
not the case and not setting the severity resulted in a configuration error.
This commit changes the default value to be information for all loggers.
This commit adds a timeout for establishing both outgoing and incoming
connections. This timeout applies to everything until the connection is in a
state where either JsonRpcConnection or HttpServerConnection takes over.
As silent no longer controls only the generation of log messages, a better
name is required. This changes its name, inverts its value to reflect the new
name and adds a documentation comment.
When Icinga 2 is started as a service, the early log messages generated
until the FileLogger object is activated are lost, which makes it really
hard to debug issues that (only) occur when Icinga 2 reloads.
With this commit, these early log messages are written to the Windows
Event Log.
68a0079c26 introduced two problems that are fixed
with this commit:
1. The new truncated/hashed name did not use EscapeName()
2. There was a possible collision of names when creating objects with a full
name of format "[80 characters]...[40 hex digits]" (i.e. the same as the
truncated/hashed variant but short enough that it isn't hashed)
The old validation regex only rejected names consisting entirely of invalid
characters instead of names containing any of them, i.e. something like
"foo/bar" was considered valid.
This commit replaces the regex with a check that all characters in the name are
allowed characters.
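
A minimal sketch of such a per-character check, for illustration only. The
function name `IsValidName` and the allow-list in `l_AllowedChars` are
hypothetical; the actual set of allowed characters is not stated in this
message.

```cpp
#include <algorithm>
#include <string>

// Hypothetical allow-list, chosen only for this example.
static const std::string l_AllowedChars =
    "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_-.";

// Returns true only if every character of the name is in the allow-list.
// Note: an empty name passes this check; rejecting it would be a separate rule.
bool IsValidName(const std::string& name)
{
    return std::all_of(name.begin(), name.end(), [](char c) {
        return l_AllowedChars.find(c) != std::string::npos;
    });
}
```

With this allow-list, "foobar" is accepted and "foo/bar" is rejected, because
a single disallowed character is enough to fail the check.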
I.e. keep the serializations as simple as possible:
null => null
true => true
42.0 => 42
"foobar" => foobar
{{42}} => Object of type 'Function'
(["foobar"] and {"foo"="bar"} can't occur there.)
Even if a double represents an integer value, it might not be safe to cast it
to long long as it may overflow the type. Instead, just print the double
value with 0 decimals using std::setprecision.
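
A minimal sketch of the idea, not the actual change. The helper name
`FormatIntegralDouble` is hypothetical, and combining std::setprecision with
std::fixed is an assumption made here to get plain decimal output without an
exponent.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: prints a double holding an integral value without
// casting it to long long, so values beyond the long long range stay intact.
std::string FormatIntegralDouble(double value)
{
    std::ostringstream out;
    // std::fixed avoids scientific notation, setprecision(0) drops decimals.
    out << std::fixed << std::setprecision(0) << value;
    return out.str();
}
```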
Before:
<1> => 18446744073709551616.to_string()
"-9223372036854775808"
After:
<1> => 18446744073709551616.to_string()
"18446744073709551616"
Fixes the following build error:
/home/jbrost/dev/icinga2/lib/base/stdiostream.cpp: In member function ‘virtual size_t icinga::StdioStream::Read(void*, size_t, bool)’:
/home/jbrost/dev/icinga2/lib/base/stdiostream.cpp:28:15: error: invalid use of incomplete type ‘std::iostream’ {aka ‘class std::basic_iostream<char>’}
28 | m_InnerStream->read(static_cast<char *>(buffer), size);
| ^~
Note that even when passing `nullptr` as target zone to `RelayMessage()`, the
cluster message will still be sent to the parent zone. These incoming messages
will now be rejected by the parent nodes. At the moment, there's no way to only
send within the local zone.
Unfortunately, the symbol resolution of boost::stacktrace is broken on
FreeBSD, therefore fall back to using backtrace_symbols() to print the
stack trace saved by Boost.
Additionally, -D_GNU_SOURCE is required on FreeBSD for the
_Unwind_Backtrace function used by boost::stacktrace.
This makes the format more similar to what the uncaught C++ and SEH
exception handlers write. Previously there was no indication in the
crash log that a SIGABRT happened.
Maybe this will save the next person who has to look at this code some
time. Please don't blame me for the implementation, I'm just trying to
reconstruct what it does.
The logic for selecting the traces to print stays the same, but there
are fewer nested ifs now. This changes the format of the returned string
a bit by adding a heading for both traces.
By default, DiagnosticInformation uses the stack trace saved when the
exception was thrown, but this mechanism is not in use on Windows.
Gathering a stacktrace in the terminate handler serves as a fallback.
On Windows, the termination handler is executed for uncaught C++
exceptions unless a SEH unhandled exception filter is also set. In this
case, this filter has to explicitly chain the default filter to keep
this behavior.
Previously:
1. You delete an object from a config file
2. You reload Icinga
3. Icinga fetches all objects and whether they're active from the IDO
4. Icinga recognizes that the just deleted object doesn't exist anymore
5. Icinga marks it as inactive in the IDO, but not in memory
6. You re-create the just deleted object via API
7. Icinga still thinks it's active and doesn't activate it - it's invisible
refs #8584
Previously, the initial config dump was started in a timer executed
every 15 seconds. During the first execution of the timer, the Redis
connection is typically not established yet. Therefore, this delayed the
initial sync by up to 15 seconds.
This commit instead triggers the sync from a callback that is executed
after the connection is successfully established.
The timer is removed completely. At first glance, it looks like it would
ensure that a lost connection is reestablished, but this is handled
internally by RedisConnection. After the config has been dumped once,
that timer wouldn't ever attempt a reconnect anyway.
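
The callback approach described above can be sketched roughly as follows.
`FakeConnection`, `SetConnectedCallback`, `SimulateConnected` and
`ConfigDumper` are hypothetical stand-ins for this example and are not meant
to mirror the real RedisConnection interface.

```cpp
#include <functional>
#include <utility>

// Hypothetical stand-in for a connection that reports when it is established.
class FakeConnection
{
public:
    void SetConnectedCallback(std::function<void()> callback)
    {
        m_Callback = std::move(callback);
    }

    // Simulates the moment the connection has been established successfully.
    void SimulateConnected()
    {
        if (m_Callback)
            m_Callback();
    }

private:
    std::function<void()> m_Callback;
};

// Hypothetical stand-in for the component that performs the initial dump.
struct ConfigDumper
{
    int DumpCount = 0;

    void DumpInitialConfig()
    {
        ++DumpCount;
    }
};
```

Registering `DumpInitialConfig()` as the callback starts the dump as soon as
the connection is up, instead of waiting for the next timer tick.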
Disregard passive check results while no active checks are being scheduled due to violated dependencies.
This copes with the fact that programs feeding passive check results into Icinga may have no notion of reachability and so drive a checkable into HARD state although dependencies have caused active check scheduling to be suspended. This may prevent superfluous problem notifications being emitted during recovery.
As disable_checks defaults to false, it was regarded OK (by @Al2Klimov) to make this behaviour (which resembles the active check case) unconditional and not conditionalize it on an additional attribute.
In the description of disable_checks, note that a value of true both disables scheduling of active checks and drops passive check results.
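
A simplified sketch of the rule, under stated assumptions. `DependencyState`,
`ChecksSuspended` and `ShouldProcessCheckResult` are hypothetical names for
this example; only the disable_checks attribute comes from the text above.

```cpp
#include <vector>

enum class CheckSource { Active, Passive };

// Hypothetical, simplified view of one dependency of a checkable.
struct DependencyState
{
    bool DisableChecks; // mirrors the disable_checks attribute
    bool Failed;        // true while the dependency is violated
};

// True if any violated dependency with disable_checks set suspends checks.
bool ChecksSuspended(const std::vector<DependencyState>& deps)
{
    for (const DependencyState& dep : deps) {
        if (dep.DisableChecks && dep.Failed)
            return true;
    }

    return false;
}

// Passive results are dropped in the same situation in which active checks
// would not be scheduled.
bool ShouldProcessCheckResult(CheckSource source, const std::vector<DependencyState>& deps)
{
    if (source == CheckSource::Passive && ChecksSuspended(deps))
        return false;

    return true;
}
```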
This delays the log message stating that the initial dump is done until
all queries are actually done and now logs a meaningful duration. In
addition, this delays the return of the function and therefore when
state variables are updated by the caller.
This commit sets the activation priority of IcingaDB objects to 100 (the
same value as IDO uses) so that they get activated after most regular
config objects (hosts, services, ...).
Before (note how Icinga 2 continues to activate objects for over a minute
after IcingaDB is started and thinks the initial dump is done):
[2021-01-19 08:33:19 +0000] information/IcingaDB: 'icingadb' started.
[2021-01-19 08:34:02 +0000] information/IcingaDB: Initial config/status dump finished in 28.247 seconds.
[2021-01-19 08:35:49 +0000] information/ConfigItem: Activated all objects.
After (now activation of objects is done right after IcingaDB is
started, as it's one of the last objects to be activated):
[2021-01-19 08:39:01 +0000] information/IcingaDB: 'icingadb' started.
[2021-01-19 08:39:02 +0000] information/ConfigItem: Activated all objects.
[2021-01-19 08:39:38 +0000] information/IcingaDB: Initial config/status dump finished in 21.6606 seconds.
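
The effect of the priority on activation order can be sketched as follows.
`TypeInfo` and `SortByActivationPriority` are hypothetical, and the sketch
assumes that lower values are activated first and that regular config objects
use a priority of 0.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical stand-in for a config object type with an activation priority.
struct TypeInfo
{
    std::string Name;
    int ActivationPriority; // assumption: lower values are activated first
};

// Returns the types in the order in which this sketch would activate them.
// stable_sort keeps the relative order of types sharing the same priority.
std::vector<TypeInfo> SortByActivationPriority(std::vector<TypeInfo> types)
{
    std::stable_sort(types.begin(), types.end(), [](const TypeInfo& a, const TypeInfo& b) {
        return a.ActivationPriority < b.ActivationPriority;
    });

    return types;
}
```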
There is an assertion that after activating items, all these items are
active, which sounds reasonable at first. However, with concurrent API
queries, some of these could already be deleted and therefore be
deactivated again.
At numerous places in the code, something like this is performed:
String name = Downtime::AddDowntime(...);
Downtime::Ptr downtime = Downtime::GetByName(name);
However, `downtime` can be a `nullptr` after this as it is possible that
the downtime is deleted in between.
This commit changes the return type of `Downtime::AddDowntime` to return
a Downtime::Ptr instead of the full name of the downtime. `AddDowntime`
performs the very same `GetByName()` operation internally, but handles
the `nullptr` case correctly and throws an exception.
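
The resulting pattern can be sketched like this. `DowntimeRegistry` and the
reduced `Downtime` struct are hypothetical stand-ins, not the real classes,
and the sketch is single-threaded, so the `nullptr` branch only marks where
the concurrent-deletion case would be handled.

```cpp
#include <map>
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical, reduced stand-in for the real Downtime class.
struct Downtime
{
    std::string Name;
};

// Hypothetical registry that owns downtimes by name.
class DowntimeRegistry
{
public:
    std::shared_ptr<Downtime> GetByName(const std::string& name) const
    {
        auto it = m_Items.find(name);

        if (it == m_Items.end())
            return nullptr;

        return it->second;
    }

    // Returns the object itself instead of its name, so callers do not need
    // a second lookup that could fail.
    std::shared_ptr<Downtime> AddDowntime(const std::string& name)
    {
        m_Items[name] = std::make_shared<Downtime>(Downtime{name});

        std::shared_ptr<Downtime> downtime = GetByName(name);

        if (!downtime)
            throw std::runtime_error("Could not create downtime object.");

        return downtime;
    }

    void Remove(const std::string& name)
    {
        m_Items.erase(name);
    }

private:
    std::map<std::string, std::shared_ptr<Downtime>> m_Items;
};
```

Because the caller holds the returned pointer, the object stays usable even
if it is removed from the registry right after creation.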