# Technical Concepts <a id="technical-concepts"></a>

This chapter provides technical concepts and design insights
into specific Icinga 2 components such as:

* [Application](19-technical-concepts.md#technical-concepts-application)
* [Configuration](19-technical-concepts.md#technical-concepts-configuration)
* [Features](19-technical-concepts.md#technical-concepts-features)
* [Check Scheduler](19-technical-concepts.md#technical-concepts-check-scheduler)
* [Checks](19-technical-concepts.md#technical-concepts-checks)
* [Cluster](19-technical-concepts.md#technical-concepts-cluster)
* [TLS Network IO](19-technical-concepts.md#technical-concepts-tls-network-io)

## Application <a id="technical-concepts-application"></a>

### CLI Commands <a id="technical-concepts-application-cli-commands"></a>

The Icinga 2 application is managed with different CLI sub commands.
`daemon` takes care of loading the configuration files, running the
application as a daemon, etc.
Other sub commands allow you to enable features, generate and request
TLS certificates or enter the debug console.

The main entry point for each CLI command parses the command line
parameters and then triggers the required actions.

### daemon CLI command <a id="technical-concepts-application-cli-commands-daemon"></a>

This CLI command loads the configuration files, starting with `icinga2.conf`.
The [configuration compiler](19-technical-concepts.md#technical-concepts-configuration) parses the
file and detects additional file includes, constants, and any other DSL
specific declaration.

At this stage, the configuration will already be checked against the
defined grammar in the scanner, and custom object validators will also be
checked.

If the user provided `-C/--validate`, the CLI command returns with the
validation exit code.

When running as a daemon, additional parameters are checked, e.g. whether
this application was triggered by a reload, needs to daemonize with fork()
involved and has to update the objects' authority. The latter is important for
HA-enabled cluster zones.

## Configuration <a id="technical-concepts-configuration"></a>

### Lexer <a id="technical-concepts-configuration-lexer"></a>

The lexer stage does not understand the DSL itself; it only
maps specific character sequences into identifiers.

This allows Icinga to detect the beginning of a string with `"`,
read the following characters and determine the end of the
string with another `"`.

Other parts covered by the lexer are escape sequences inside a string,
e.g. `"\"abc"`.

The lexer also identifies logical operators, e.g. `&` or `in`,
specific keywords like `object`, `import`, etc., and comment blocks.

Please check `lib/config/config_lexer.ll` for details.

Icinga uses [Flex](https://github.com/westes/flex) in the first stage.

> Flex (The Fast Lexical Analyzer)
>
> Flex is a fast lexical analyser generator. It is a tool for generating programs
> that perform pattern-matching on text. Flex is a free (but non-GNU) implementation
> of the original Unix lex program.

### Parser <a id="technical-concepts-configuration-parser"></a>

The parser stage puts the identifiers from the lexer into more
context with flow control and sequences.

The following comparison is parsed into a left term, an operator
and a right term.

```
x > 5
```

The DSL contains many elements which require a specific order,
and sometimes only a left term for example.

The parser also takes care of parsing an object declaration for
example. It already knows from the lexer that `object` marks the
beginning of an object. It then expects a type string afterwards,
and the object name - which can be either a string with double quotes
or a previously defined constant.

An opening bracket `{` in this specific context starts the object
scope, which is also stored for later scope specific variable access.

If there's an apply rule defined, this follows the same principle.
The config parser detects the scope of an apply rule and generates
Icinga 2 C++ code for the parsed string tokens.

```
assign where host.vars.sla == "24x7"
```

is parsed into an assign token identifier, and the string expression
is compiled into a new `ApplyExpression` object.

The flow control inside the parser ensures that for example `ignore where`
can only be defined when a previous `assign where` was given - or when
inside an apply for rule.

Another example is specific object types which allow assign expressions,
namely group objects. Other objects must throw a configuration error.

Please check `lib/config/config_parser.yy` for more details,
and the [language reference](17-language-reference.md#language-reference) chapter for
documented DSL keywords and sequences.

> Icinga uses [Bison](https://en.wikipedia.org/wiki/GNU_bison) as its parser generator
> which reads a specification of a context-free language, warns about any parsing
> ambiguities, and generates a parser in C++ which reads sequences of tokens and
> decides whether the sequence conforms to the syntax specified by the grammar.


### Compiler <a id="technical-concepts-configuration-compiler"></a>

The config compiler initializes the scanner inside the [lexer](19-technical-concepts.md#technical-concepts-configuration-lexer)
stage.

The configuration files are parsed into memory from inside the [daemon CLI command](19-technical-concepts.md#technical-concepts-application-cli-commands-daemon)
which invokes the config validation in `ValidateConfigFiles()`. This compiles the
files into an AST expression which is executed.

At this stage, the expressions generate so-called "config items" which
are a pre-stage of the later compiled objects.

`ConfigItem::CommitItems` takes care of committing the items, and doing a
rollback on failure. It also checks against matching apply rules from the previous run
and generates statistics about the objects which can be seen by the config validation.

`ConfigItem::CommitNewItems` collects the registered types and items,
and checks for a specific required order, e.g. a service object needs
a host object first.

The following stages then happen:

- **Commit**: A workqueue then commits the items in a parallel fashion for this specific type. The object gets its name, and the AST expression is executed. It is then registered in the item's `m_Object` member as a reference.
- **OnAllConfigLoaded**: Special signal for each object to pre-load required object attributes, resolve group membership, initialize functions and timers.
- **CreateChildObjects**: Run apply rules for this specific type.
- **CommitNewItems**: Apply rules may generate new config items; this ensures that they run through the stages again.

Note that the items are now committed and the configuration is validated and loaded
into memory. The final config objects are not yet activated though.

This only happens after the validation, when the application is about to be run
with `ConfigItem::ActivateItems`.

Each item has an object created in `m_Object` which is checked in a loop.
Again, the dependency order of activated objects is important here, e.g. logger features come first, then
config objects, and the checker, api, etc. features last. This is done by sorting the objects
based on their type specific activation priority.

The following signals are triggered in the stages:

- **PreActivate**: Setting the `active` flag for the config object.
- **Activate**: Calls `Start()` on the object, sets the local HA authority and notifies subscribers that this object is now activated (e.g. for config updates in the DB backend).


### References <a id="technical-concepts-configuration-references"></a>

* [The Icinga Config Compiler: An Overview](https://www.netways.de/blog/2018/07/12/the-icinga-config-compiler-an-overview/)
* [A parser/lexer/compiler for the Leonardo language](https://github.com/EmilGedda/Leonardo)
* [I wrote a programming language. Here’s how you can, too.](https://medium.freecodecamp.org/the-programming-language-pipeline-91d3f449c919)
* [http://onoffswitch.net/building-a-custom-lexer/](http://onoffswitch.net/building-a-custom-lexer/)
* [Writing an Interpreter with Lex, Yacc, and Memphis](http://memphis.compilertools.net/interpreter.html)
* [Flex](https://github.com/westes/flex)
* [GNU Bison](https://www.gnu.org/software/bison/)

## Core <a id="technical-concepts-core"></a>

### Core: Reload Handling <a id="technical-concepts-core-reload"></a>

The initial design of the reload state machine looks like this:

* receive reload signal SIGHUP
* fork a child process, start configuration validation in parallel work queues
* parent process continues with old configuration objects and the event scheduling
(doing checks, replicating cluster events, triggering alert notifications, etc.)
* validation NOT ok: child process terminates, parent process continues with old configuration state
* validation ok: child process signals parent process to terminate and save its current state (all events until now) into the icinga2 state file
* parent process shuts down writing icinga2.state file
* child process waits until the parent process is gone, reads the icinga2 state file and synchronizes all historical and status data
* child becomes the new session leader

Since Icinga 2.6, there are two processes when checked with `ps aux | grep icinga2` or `pidof icinga2`.
This was to ensure that feature file descriptors don't leak into the plugin process (e.g. DB IDO MySQL sockets).

Icinga 2.9 changed the reload handling a bit with SIGUSR2 signals
and systemd notify messages.

With systemd, it could occur that the process tree was broken, resulting
in all remaining processes being killed on stop instead of a clean exit.
You can read the full story [here](https://github.com/Icinga/icinga2/issues/7309).

With 2.11 you'll now see 3 processes:

- The umbrella process which takes care of signal handling and process spawning/stopping
- The main process with the check scheduler, notifications, etc.
- The execution helper process

During reload, the umbrella process spawns a new reload process which validates the configuration.
Once successful, the new reload process signals the umbrella process that it is finished.
The umbrella process forwards the signal and tells the old main process to shut down.
The old main process writes the icinga2.state file. The umbrella process signals
the reload process that the main process terminated.

The reload process was in idle wait before, and now continues to read the written
state file and run the event loop (checks, notifications, "events", ...). The reload
process itself also spawns the execution helper process again.


## Features <a id="technical-concepts-features"></a>

Features are implemented in specific libraries and can be enabled
using CLI commands.

Features either write specific data or receive data.

Examples for writing data: [DB IDO](14-features.md#db-ido), [Graphite](14-features.md#graphite-carbon-cache-writer), [InfluxDB](14-features.md#influxdb-writer), [GELF](14-features.md#gelfwriter), etc.
Examples for receiving data: [REST API](12-icinga2-api.md#icinga2-api), etc.

The implementation of features makes use of existing libraries
and functionality. This makes the code more abstract, but shorter
and easier to read.

Features register callback functions on specific events they want
to handle. For example, the `GraphiteWriter` feature subscribes to
new CheckResult events.

Each time Icinga 2 receives and processes a new check result, this
event is triggered and forwarded to all subscribers.

The GraphiteWriter feature calls the registered function and processes
the received data. Features which connect Icinga 2 to external interfaces
normally parse and reformat the received data into an applicable format.

Since this check result signal is blocking, many of the features include a work queue
with asynchronous task handling.
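
The following is a minimal, self-contained sketch of this pattern (illustrative
stand-ins only, not the actual Icinga 2 classes): the handler registered for new
check results only enqueues a task, so the emitting thread returns immediately
while a worker thread performs the slow I/O.

```
#include <condition_variable>
#include <functional>
#include <iostream>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

struct CheckResult { std::string Output; };              // hypothetical stand-in

class WorkQueue {                                         // simplified asynchronous task queue
public:
	WorkQueue() : m_Worker([this] { Run(); }) {}
	~WorkQueue() {
		{
			std::lock_guard<std::mutex> lock(m_Mutex);
			m_Stopped = true;
		}
		m_CV.notify_one();
		m_Worker.join();
	}
	void Enqueue(std::function<void()> task) {            // returns immediately
		{
			std::lock_guard<std::mutex> lock(m_Mutex);
			m_Tasks.push(std::move(task));
		}
		m_CV.notify_one();
	}
private:
	void Run() {
		for (;;) {
			std::function<void()> task;
			{
				std::unique_lock<std::mutex> lock(m_Mutex);
				m_CV.wait(lock, [this] { return m_Stopped || !m_Tasks.empty(); });
				if (m_Tasks.empty())
					return;                               // stopped and drained
				task = std::move(m_Tasks.front());
				m_Tasks.pop();
			}
			task();                                       // slow I/O happens here, off the signal thread
		}
	}
	std::mutex m_Mutex;
	std::condition_variable m_CV;
	std::queue<std::function<void()>> m_Tasks;
	bool m_Stopped = false;
	std::thread m_Worker;
};

// "Signal": all registered handlers are invoked synchronously for a new check result.
std::vector<std::function<void(const CheckResult&)>> OnNewCheckResult;

int main() {
	WorkQueue graphiteQueue;

	// The feature registers its callback; the handler only enqueues work.
	OnNewCheckResult.push_back([&graphiteQueue](const CheckResult& cr) {
		graphiteQueue.Enqueue([cr] { std::cout << "sending to carbon: " << cr.Output << "\n"; });
	});

	// The core processes a new check result and fires the signal.
	CheckResult cr{"PING OK - 10ms"};
	for (const auto& handler : OnNewCheckResult)
		handler(cr);
}
```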

The GraphiteWriter uses a TCP socket to communicate with the carbon cache
daemon of Graphite. The InfluxDBWriter instead writes bulk metric messages
to InfluxDB's HTTP API, similar to Elasticsearch.


## Check Scheduler <a id="technical-concepts-check-scheduler"></a>

The check scheduler starts a thread which loops forever. It waits for
check events being inserted into `m_IdleCheckables`.

If the current pending check event number is larger than the configured
max concurrent checks, the thread waits until free slots are available again.

In addition, further checks on enabled checks, check periods, etc. are
performed. Once all conditions have passed, the next check timestamp is
calculated and updated. This is also the timestamp where Icinga expects
a new check result ("freshness check").

The object is removed from idle checkables, and inserted into the
pending checkables list. This can be seen via REST API metrics for the
checker component feature as well.

The actual check execution happens asynchronously using the application's
thread pool.

Once the check returns, it is removed from pending checkables and again
inserted into idle checkables. This ensures that the scheduler takes this
checkable event into account in the next iteration.
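
A heavily simplified sketch of this idle/pending bookkeeping (illustrative names
and values only, not the actual checker component code):

```
#include <cstddef>
#include <functional>
#include <set>
#include <string>

// Illustrative sketch of the scheduler bookkeeping described above.
struct Checkable {
	std::string Name;
	double NextCheck = 0;                              // planned execution timestamp
	bool EnableActiveChecks = true;
};

struct NextCheckLess {
	bool operator()(const Checkable* a, const Checkable* b) const {
		if (a->NextCheck != b->NextCheck)
			return a->NextCheck < b->NextCheck;
		return std::less<const Checkable*>()(a, b);    // tie breaker for equal timestamps
	}
};

std::set<Checkable*, NextCheckLess> idleCheckables;    // waiting for their next check
std::set<Checkable*> pendingCheckables;                // currently executing
const std::size_t maxConcurrentChecks = 512;           // hypothetical configured limit

void SchedulerIteration(double now) {
	if (idleCheckables.empty() || pendingCheckables.size() >= maxConcurrentChecks)
		return;                                        // wait for new events or for free slots

	Checkable* checkable = *idleCheckables.begin();    // earliest planned check first
	if (checkable->NextCheck > now || !checkable->EnableActiveChecks)
		return;                                        // not due yet, or active checks disabled

	idleCheckables.erase(idleCheckables.begin());      // idle -> pending
	pendingCheckables.insert(checkable);

	checkable->NextCheck = now + 300;                  // next planned check, also the freshness deadline
	// ExecuteCheckAsync(checkable);                   // would run in the thread pool; once the result
	//                                                 // arrives, the checkable moves back to idle.
}
```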

### Start <a id="technical-concepts-check-scheduler-start"></a>

When checkable objects get activated during the startup phase,
the checker feature registers a handler for this event. This is due
to the fact that the `checker` feature is fully optional, and e.g. not
used on command endpoint clients.

Whenever such an object activation signal is triggered, Icinga 2 checks
whether it is [authoritative for this object](19-technical-concepts.md#technical-concepts-cluster-ha-object-authority).
This means that inside an HA enabled zone with two endpoints, only non-paused checkable objects are
actively inserted into the idle checkable list for the check scheduler.

### Initial Check <a id="technical-concepts-check-scheduler-initial"></a>

When a new checkable object (host or service) is initially added to the
configuration, Icinga 2 performs the following during startup:

* `Checkable::Start()` is called and calculates the first check time
* With a spread delta, the next check time is actually set.

If the next check should happen within a time frame of 60 seconds,
Icinga 2 calculates a delta from a random value. The minimum of `check_interval`
and 60 seconds is used as the basis, multiplied by a random value between 0 and 1.
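
Expressed as a short sketch (illustrative only, not the literal source code):

```
#include <algorithm>
#include <cstdlib>

// Illustrative sketch of the described start-up spread.
double CalculateInitialNextCheck(double now, double checkInterval, int randomValue /* 0..RAND_MAX */)
{
	double random01 = randomValue / static_cast<double>(RAND_MAX);   // 0.0 .. 1.0
	double delta = std::min(checkInterval, 60.0) * random01;         // 0 .. min(check_interval, 60) seconds
	return now + delta;
}
```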

In the best case, this check gets executed immediately after application start.
In the worst case, the check is scheduled at the latest 60 seconds after
start.

The reason for delaying and spreading checks during startup is that
the application typically needs more resources at this time (cluster connections,
feature warmup, initial syncs, etc.). Immediate check execution with
thousands of checks could lead to performance problems and additional
events for each received check result.

Therefore the initial check window is 60 seconds on application startup,
with a random spread for all checkables. This is not predictable over multiple restarts
for specific checkable objects; the delta changes every time.

### Scheduling Offset <a id="technical-concepts-check-scheduler-offset"></a>

There's a high chance that many checkable objects get executed at the same time
and interval after startup. The initial scheduling spreads that a little, but
Icinga 2 also attempts to keep fixed intervals, even with high check latency.

During startup, Icinga 2 calculates the scheduling offset from a random number:

* `Checkable::Checkable()` calls `SetSchedulingOffset()` with `Utility::Random()`
* The offset is a pseudo-random integral value between `0` and `RAND_MAX`.

Whenever the next check time is updated with `Checkable::UpdateNextCheck()`,
the scheduling offset is taken into account.

Depending on the state type (SOFT or HARD), either the `retry_interval` or `check_interval`
is used. If the interval is greater than 1 second, the time adjustment is calculated in the
following way:

`now * 100 + offset` divided by `interval * 100`, using the remainder (that's what `fmod()` is for)
and dividing this again onto base 100.

Example: offset is 6500, interval 300, now is 1542190472.14.

```
1542190472.14 * 100 + 6500 = 154219053714
300 * 100 = 30000
154219053714 / 30000 = 5140635.1238

(5140635.1238 - 5140635.0) * 30000 = 3714
3714 / 100 = 37.14
```

37.14 seconds as an offset would be far too much, so the offset is used again for a second
calculation, this time with a base of 5 times the actual interval.

Again, the remainder is calculated from the offset and `interval * 5`. This is divided onto base 100 again,
with an additional 0.5 seconds delay.

Example: offset is 6500, interval 300.

```
6500 / 1500 = 4.333333333333333
(4.333333333333333 - 4.0) * 1500 = 500
500 / 100 = 5
5 + 0.5 = 5.5
```

The minimum value between the first adjustment and the second offset calculation based on the interval is
taken, in the above example `5.5` wins.

The actual next check time subtracts the adjusted time from the future interval addition to provide
a more widespread scheduling time among all checkable objects.

`nextCheck = now - adj + interval`
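
Put together as a small sketch (this follows the calculation described above; see
`Checkable::UpdateNextCheck()` for the authoritative implementation, the function
below is illustrative only):

```
#include <algorithm>
#include <cmath>

// Illustrative sketch of the adjustment described above (not a verbatim copy of the source).
double CalculateNextCheck(double now, double interval, double offset)
{
	double adj = 0;

	if (interval > 1)
		adj = std::fmod(now * 100 + offset, interval * 100) / 100.0;          // first adjustment

	if (adj != 0)
		adj = std::min(adj, std::fmod(offset, interval * 5) / 100.0 + 0.5);   // second calculation, take the minimum

	return now - adj + interval;
}
```

For the first example above (`interval = 300`, `offset = 6500`) this yields `adj = 5.5`
and therefore `nextCheck = now + 294.5`.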

You may ask what other values can result from this offset calculation. Consider
the following examples with different interval settings.

Example: offset is 34567, interval 60, now is 1542190472.

```
1542190472 * 100 + 34567 = 154219081767
60 * 100 = 6000
154219081767 / 6000 = 25703180.2945
(25703180.2945 - 25703180.0) * 6000 / 100 = 17.67

34567 / 300 = 115.223333333333333
(115.223333333333333 - 115.0) * 300 / 100 + 0.5 = 1.17
```

`1m` interval starts at `now + 1.2s`.

Example: offset is 12345, interval 86400, now is 1542190472.

```
1542190472 * 100 + 12345 = 154219059545
86400 * 100 = 8640000
154219059545 / 8640000 = 17849.428188078703704
(17849.428188078703704 - 17849) * 8640000 = 3699545
3699545 / 100 = 36995.45

12345 / 432000 = 0.028576388888889
0.028576388888889 * 432000 / 100 + 0.5 = 123.95
```

`1d` interval starts at `now + 2m4s`.

> **Note**
>
> In case you have a better algorithm at hand, feel free to discuss this in a PR on GitHub.
> It needs to fulfill two things: 1) spread and shuffle execution times on each `next_check` update
> 2) not too narrow a window for both long and short intervals.
> Application startup and initial checks need to be handled with care in a slightly different
> fashion.

When `SetNextCheck()` is called, there are signals registered. One of them sits
inside the `CheckerComponent` class whose handler `CheckerComponent::NextCheckChangedHandler()`
deletes/inserts the next check event in the scheduling queue. This basically
is a list with multiple indexes with the keys for scheduling info and the object.


## Checks <a id="technical-concepts-checks"></a>

### Check Latency and Execution Time <a id="technical-concepts-checks-latency"></a>

Each check command execution logs the start and end time, from which
Icinga 2 (and the end user) is able to calculate the plugin execution time.

```
GetExecutionEnd() - GetExecutionStart()
```

The higher the execution time, the higher the command timeout must be set. Furthermore,
users and developers are encouraged to look into plugin optimizations to minimize the
execution time. Sometimes it is better to let an external daemon/script do the checks
and feed them back via REST API.

Icinga 2 stores the scheduled start and end time for a check. If the actual
check execution time differs from the scheduled time, e.g. due to performance
problems or limited execution slots (concurrent checks), this value is stored
and computed from inside the check result.

The difference between the two deltas is called `check latency`.

```
(GetScheduleEnd() - GetScheduleStart()) - CalculateExecutionTime()
```
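
A short illustrative example with made-up numbers: a check was scheduled to start at
`10:00:00.0`, its scheduled end was recorded at `10:00:05.5`, and the plugin itself
ran for `5.0` seconds.

```
schedule start   = 10:00:00.0
schedule end     = 10:00:05.5
execution time   = 5.0s

latency = (schedule end - schedule start) - execution time = 5.5s - 5.0s = 0.5s
```

In this example the check started half a second later than planned, e.g. because no
free check slot was available at the scheduled time.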

### Severity <a id="technical-concepts-checks-severity"></a>

The severity attribute was introduced with Icinga v2.11 and provides
a value calculated from a bit mask of specific checkable object states.

The severity value is pre-calculated for visualization interfaces
such as Icinga Web which sorts the problem dashboard by severity by default.

The higher the severity number is, the more important the problem is.

Flags:

```
/**
 * Severity Flags
 *
 * @ingroup icinga
 */
enum SeverityFlag
{
	SeverityFlagDowntime = 1,
	SeverityFlagAcknowledgement = 2,
	SeverityFlagHostDown = 4,
	SeverityFlagUnhandled = 8,
	SeverityFlagPending = 16,
	SeverityFlagWarning = 32,
	SeverityFlagUnknown = 64,
	SeverityFlagCritical = 128,
};
```


Host:

```
	/* OK/Warning = Up, Critical/Unknown = Down */
	if (!HasBeenChecked())
		severity |= SeverityFlagPending;
	else if (state == ServiceUnknown)
		severity |= SeverityFlagCritical;
	else if (state == ServiceCritical)
		severity |= SeverityFlagCritical;

	if (IsInDowntime())
		severity |= SeverityFlagDowntime;
	else if (IsAcknowledged())
		severity |= SeverityFlagAcknowledgement;
	else
		severity |= SeverityFlagUnhandled;
```


Service:

```
	if (!HasBeenChecked())
		severity |= SeverityFlagPending;
	else if (state == ServiceWarning)
		severity |= SeverityFlagWarning;
	else if (state == ServiceUnknown)
		severity |= SeverityFlagUnknown;
	else if (state == ServiceCritical)
		severity |= SeverityFlagCritical;

	if (IsInDowntime())
		severity |= SeverityFlagDowntime;
	else if (IsAcknowledged())
		severity |= SeverityFlagAcknowledgement;
	else if (m_Host->GetProblem())
		severity |= SeverityFlagHostDown;
	else
		severity |= SeverityFlagUnhandled;
```



## Cluster <a id="technical-concepts-cluster"></a>

This documentation refers to the technical roles between cluster
endpoints.

- The `server` or `parent` role accepts incoming connection attempts and handles requests
- The `client` role actively connects to remote endpoints to receive config/commands, request certificates, etc.

A client role is not necessarily bound to the Icinga agent.
It may also be a satellite which actively connects to the
master.

### Communication <a id="technical-concepts-cluster-communication"></a>

Icinga 2 uses its own certificate authority (CA) by default. The
public and private CA keys can be generated on the signing master.

Each node certificate must be signed by the private CA key.

Note: The following description uses `parent node` and `child node`.
This also applies to nodes in the same cluster zone.

During the connection attempt, a TLS handshake is performed.
If the public certificate of a child node is not signed by the same
CA, the child node is not trusted and the connection will be closed.

If the TLS handshake succeeds, the parent node reads the
certificate's common name (CN) of the child node and looks for
a local Endpoint object name configuration.

If there is no Endpoint object found, further communication
(runtime and config sync, etc.) is terminated.

The child node also checks the CN from the parent node's public
certificate. If the child node does not find any local Endpoint
object name configuration, it will not trust the parent node.

Both checks prevent accepting cluster messages from an untrusted
source endpoint.

If an Endpoint match was found, there is one additional security
mechanism in place: Endpoints belong to a Zone hierarchy.

Several cluster messages can only be sent "top down", others like
check results are allowed to be sent from the child to the parent node.

Once this check succeeds, the cluster messages are exchanged and processed.


### CSR Signing <a id="technical-concepts-cluster-csr-signing"></a>

In order to make things easier, Icinga 2 provides built-in methods
to allow child nodes to request a signed certificate from the
signing master.

Icinga 2 v2.8 introduced the possibility to request certificates
from indirectly connected nodes. This is required for multi level
cluster environments with masters, satellites and agents.

CSR Signing in general starts with the master setup. This step
ensures that the master is in a working CSR signing state with:

* public and private CA key in `/var/lib/icinga2/ca`
* private `TicketSalt` constant defined inside the `api` feature
* Cluster communication is ready and Icinga 2 listens on port 5665

The child node setup which is run with CLI commands will now
attempt to connect to the parent node. This is not necessarily
the signing master instance, but could also be a parent satellite node.

During this process the child node asks the user to verify the
parent node's public certificate to prevent MITM attacks.

There are two methods to request signed certificates:

* Add the ticket into the request. This ticket was generated on the master
beforehand and contains hashed details for which client it has been created.
The signing master uses this information to automatically sign the certificate
request.

* Do not add a ticket into the request. It will be sent to the signing master
which stores the pending request. Manual user interaction with CLI commands
is necessary to sign the request.

The certificate request is sent as `pki::RequestCertificate` cluster
message to the parent node.

If the parent node is not the signing master, it stores the request
in `/var/lib/icinga2/certificate-requests` and forwards the
cluster message to its parent node.

Once the message arrives on the signing master, it first verifies that
the sent certificate request is valid. This is to prevent unwanted errors
or modified requests from the "proxy" node.

After verification, the signing master checks if the request contains
a valid signing ticket. It hashes the certificate's common name and
compares the value to the received ticket number.

If the ticket is valid, the certificate request is immediately signed
with the CA key. The request is sent back to the client inside a `pki::UpdateCertificate`
cluster message.

If the child node was not the certificate request origin, it only updates
the cached request for the child node and sends another cluster message
down to its child node (e.g. from a satellite to an agent).


If no ticket was specified, the signing master waits until the
`ca sign` CLI command manually signs the certificate.

> **Note**
>
> Push notifications for manual request signing are not yet implemented (TODO).

Once the child node reconnects, it synchronizes all signed certificate requests.
This takes some minutes and requires all nodes to reconnect to each other.


#### CSR Signing: Clients without parent connection <a id="technical-concepts-cluster-csr-signing-clients-no-connection"></a>

There is an additional scenario: The setup on a child node does
not necessarily need a connection to the parent node.

This mode leaves the node in a semi-configured state. You need
to manually copy the master's public CA key into `/var/lib/icinga2/certs/ca.crt`
on the client before starting Icinga 2.

> **Note**
>
> The `client` in this case can be either a satellite or an agent.

The parent node needs to actively connect to the child node.
Once this connection succeeds, the child node will actively
request a signed certificate.

The update procedure works the same way as above.

### High Availability <a id="technical-concepts-cluster-ha"></a>

General high availability is automatically enabled between two endpoints in the same
cluster zone.

**This requires the same configuration and enabled features on both nodes.**

HA zone members trust each other and share event updates as cluster messages.
This includes for example check results, next check timestamp updates, acknowledgements
or notifications.

This ensures that both nodes are synchronized. If one node goes away, the
remaining node takes over and continues as normal.

#### High Availability: Object Authority <a id="technical-concepts-cluster-ha-object-authority"></a>

Cluster nodes automatically determine the authority for configuration
objects. By default, all config objects are set to `HARunEverywhere` and
as such the object authority is true for any config object on any instance.

Specific objects can override and influence this setting, e.g. with `HARunOnce`
instead, prior to config object activation.

This is done when the daemon starts and at a regular interval inside
the ApiListener class, specifically calling `ApiListener::UpdateObjectAuthority()`.

The algorithm works like this:

* Determine whether this instance is assigned to a local zone and endpoint.
* Collect all endpoints in this zone if they are connected.
* If there are two endpoints, but only this instance sees itself and the application start is less than 60 seconds in the past, do nothing (wait for the cluster reconnect to take place, grace period).
* Sort the collected endpoints by name.
* Iterate over all config types and their respective objects
 * Ignore !active objects
 * Ignore objects which are !HARunOnce. This means they can run multiple times in a zone and don't need an authority update.
 * If this instance doesn't have a local zone, set authority to true. This is for non-clustered standalone environments where everything belongs to this instance.
 * Calculate the object authority based on the connected endpoint names.
 * Set the authority (true or false)
| 
 | ||
| The object authority calculation works "offline" without any message exchange.
 | ||
| Each instance alculates the SDBM hash of the config object name, puts that in contrast
 | ||
| modulo the connected endpoints size.
 | ||
| This index is used to lookup the corresponding endpoint in the connected endpoints array,
 | ||
| including the local endpoint. Whether the local endpoint is equal to the selected endpoint,
 | ||
| or not, this sets the authority to `true` or `false`.
 | ||
| 
 | ||
| ```
 | ||
| authority = endpoints[Utility::SDBM(object->GetName()) % endpoints.size()] == my_endpoint;
 | ||
| ```
 | ||

`ConfigObject::SetAuthority(bool authority)` triggers the following events:

* Authority is true and the object is currently paused: Resume the object and set `paused` to `false`.
* Authority is false and the object is not paused: Pause the object and set `paused` to `true`.

**This results in activated but paused objects on one endpoint.** You can verify
that by querying the `paused` attribute for all objects via REST API
or debug console on both endpoints.

Endpoints inside an HA zone calculate the object authority independently from each other.
This object authority is important for selected features explained below.

Since features are configuration objects too, you must ensure that all nodes
inside the HA zone share the same enabled features. If configured otherwise,
one might have a checker feature on the left node, nothing on the right node.
This leads to late check results because one half is not executed by the right
node which holds half of the object authorities.

By default, features are enabled to "Run-Everywhere". Specific features which
support HA awareness provide the `enable_ha` configuration attribute. When `enable_ha`
is set to `true` (usually the default), "Run-Once" is set and the feature pauses on one side.

```
vim /etc/icinga2/features-enabled/graphite.conf

object GraphiteWriter "graphite" {
  ...
  enable_ha = true
}
```

Once such a feature is paused, there won't be any more event handling, e.g. the Elasticsearch
feature won't process any check results or write to the Elasticsearch REST API.

When the cluster connection drops, the feature configuration object is updated with
the new object authority by the ApiListener timer and resumes its operation. You can see
that by grepping the log file for `resumed` and `paused`.

```
[2018-10-24 13:28:28 +0200] information/GraphiteWriter: 'g-ha' paused.
```

```
[2018-10-24 13:28:28 +0200] information/GraphiteWriter: 'g-ha' resumed.
```

Specific features with HA capabilities are explained below.

#### High Availability: Checker <a id="technical-concepts-cluster-ha-checker"></a>

The `checker` feature only executes checks for `Checkable` objects (Host, Service)
where it is authoritative.

That way each node only executes checks for a segment of the overall configuration objects.

The cluster message routing ensures that all check results are synchronized
to nodes which are not authoritative for this configuration object.


#### High Availability: Notifications <a id="technical-concepts-cluster-notifications"></a>

The `notification` feature only sends notifications for `Notification` objects
where it is authoritative.

That way each node only executes notifications for a segment of all notification objects.

Notified users and other event details are synchronized throughout the cluster.
This is required if for example the DB IDO feature is active on the other node.

#### High Availability: DB IDO <a id="technical-concepts-cluster-ha-ido"></a>

If you don't have HA enabled for the IDO feature, both nodes will
write their status and historical data to their own separate database
backends.

In order to avoid data separation and a split view (each node would require its
own Icinga Web 2 installation on top), the high availability option was added
to the DB IDO feature. This is enabled by default with the `enable_ha` setting.

This requires a central database backend. Best practice is to use a MySQL cluster
with a virtual IP.

Both Icinga 2 nodes require the connection and credential details configured in
their DB IDO feature.

During startup, Icinga 2 calculates whether the feature configuration object
is authoritative on this node or not. The order is an alpha-numeric
comparison, e.g. if you have `master1` and `master2`, Icinga 2 will enable
the DB IDO feature on `master2` by default.

If the connection between endpoints drops, the object authority is re-calculated.

In order to prevent data duplication in a split-brain scenario where both
nodes would write into the same database, there is another safety mechanism
in place.

The split-brain decision which node will write to the database is calculated
from a quorum inside the `programstatus` table. Each node verifies
on database connect whether the `endpoint_name` column is not itself.
In addition to that, the DB IDO feature compares the `last_update_time` column
against the current timestamp plus the configured `failover_timeout` offset.
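
As a rough sketch, the resulting decision on each node looks like this
(illustrative pseudo-logic with made-up names, not the actual DB IDO code):

```
#include <string>

// Illustrative pseudo-logic for the failover decision described above.
bool ShouldWriteToIdoDatabase(const std::string& dbEndpointName, double dbLastUpdateTime,
                              const std::string& localEndpointName, double failoverTimeout,
                              double now)
{
	// Another endpoint owns the database and updated it recently enough: stay paused.
	if (dbEndpointName != localEndpointName && dbLastUpdateTime + failoverTimeout > now)
		return false;

	// Either this node already owns the database or the other node timed out: take over.
	return true;
}
```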

That way only one active DB IDO feature writes to the database, even if the two
endpoints are not currently connected in the cluster zone. This prevents data duplication
in historical tables.

### Health Checks <a id="technical-concepts-cluster-health-checks"></a>

#### cluster-zone <a id="technical-concepts-cluster-health-checks-cluster-zone"></a>

This built-in check provides the possibility to check for connectivity between
zones.

If you for example need to know whether the `master` zone is connected and processing
messages with the child zone called `satellite` in this example, you can configure
the [cluster-zone](10-icinga-template-library.md#itl-icinga-cluster-zone) check as a new service on all `master` zone hosts.

```
vim /etc/zones.d/master/host1.conf

object Service "cluster-zone-satellite" {
  check_command = "cluster-zone"
  host_name = "host1"

  vars.cluster_zone = "satellite"
}
```

The check itself changes to NOT-OK if one or more child endpoints in the child zone
are not connected to parent zone endpoints.

In addition to the overall connectivity check, the log lag is calculated based
on the to-be-sent replay log. Each instance stores that for its configured endpoint
objects.

This health check iterates over the target zone (`cluster_zone`) and its endpoints.

The log lag is greater than zero if

* the replay log synchronization is in progress and not yet finished or
* the endpoint is not connected, and no replay log sync happened (obviously).

The final log lag value is the worst value detected. If satellite1 has a log lag of
`1.5` and satellite2 only has `0.5`, the computed value will be `1.5`.

You can control the check state by using optional warning and critical thresholds
for the log lag value.

If this service exists multiple times, e.g. for each master host object, the log lag
may differ based on the execution time. This happens for example on restart of
an instance when the log replay is in progress and a health check is executed at different
times.
If the endpoint is not connected, both master instances may have saved a different log replay
position from the last synchronisation.

The lag value is returned as performance metric key `slave_lag`.

Icinga 2 v2.9+ adds more performance metrics for these values:

* `last_messages_sent` and `last_messages_received` as UNIX timestamp
* `sum_messages_sent_per_second` and `sum_messages_received_per_second`
* `sum_bytes_sent_per_second` and `sum_bytes_received_per_second`


### Config Sync <a id="technical-concepts-cluster-config-sync"></a>

The visible feature for the user is to put configuration files in `/etc/icinga2/zones.d/<zonename>`
and have them synced automatically to all involved zones and endpoints.

This not only includes host and service objects being checked
in a satellite zone, but also additional config objects such as
commands, groups, timeperiods and also templates.

Additional considerations which add complexity:

- Putting files into zone directory names removes the burden of setting the `zone` attribute on each object in this directory. This is done automatically by the config compiler.
- Inclusion of `zones.d` happens automatically, the user shouldn't be bothered with this.
- Before the REST API was created, only static configuration files in `/etc/icinga2/zones.d` existed. With the addition of config packages, additional `zones.d` targets must be registered (e.g. used by the Director).
- Only one config master is allowed. This one identifies itself with configuration files in `/etc/icinga2/zones.d`. This is not necessarily the zone master seen in the debug logs; that one is important for internal message routing.
- Objects and templates which cannot be bound into a specific zone (e.g. hosts in the satellite zone) must be made available "globally".
- Users must be able to deny the synchronisation of specific zones, e.g. for security reasons.

#### Config Sync: Config Master <a id="technical-concepts-cluster-config-sync-config-master"></a>

All zones must be configured and included in the `zones.conf` config file beforehand.
The zone names are the identifiers for the directories underneath the `/etc/icinga2/zones.d`
directory. If a zone is not configured, it will not be included in the config sync - keep this
in mind for troubleshooting.

When the config master starts, the content of `/etc/icinga2/zones.d` is automatically
included. There's no need for an additional entry in `icinga2.conf` like `conf.d`.
You can verify this by running the config validation on debug level:

```
icinga2 daemon -C -x debug | grep 'zones.d'

[2019-06-19 15:16:19 +0200] notice/ConfigCompiler: Compiling config file: /etc/icinga2/zones.d/global-templates/commands.conf
```

Once the config validation succeeds, the startup routine for the daemon
copies the files into the "production" directory in `/var/lib/icinga2/api/zones`.
This directory is used on all endpoints where Icinga stores the received configuration,
with the exception of the config master, which reads it from `/etc/icinga2/zones.d` instead.

These operations are logged for better visibility.

```
[2019-06-19 15:26:38 +0200] information/ApiListener: Copying 1 zone configuration files for zone 'global-templates' to '/var/lib/icinga2/api/zones/global-templates'.
[2019-06-19 15:26:38 +0200] information/ApiListener: Updating configuration file: /var/lib/icinga2/api/zones/global-templates//_etc/commands.conf
```

The master is finished at this point. Depending on the cluster configuration,
the next iteration happens with a connected endpoint after successful TLS handshake and certificate
authentication.

It calls `SendConfigUpdate(client)` which sends the [config::Update](19-technical-concepts.md#technical-concepts-json-rpc-messages-config-update)
JSON-RPC message including all required zones and their configuration file content.


#### Config Sync: Receive Config <a id="technical-concepts-cluster-config-sync-receive-config"></a>

The secondary master endpoint and endpoints in a child zone will be connected to the config
master. The endpoint receives the [config::Update](19-technical-concepts.md#technical-concepts-json-rpc-messages-config-update)
JSON-RPC message and processes the content in `ConfigUpdateHandler()`. This method checks
whether config should be accepted. In addition to that, it locks a local mutex to avoid race conditions
with multiple syncs in parallel.

After that, the received configuration content is analysed.

> **Note**
>
> The cluster design allows satellite endpoints to connect to the secondary master first.
> There is no immediate need to always connect to the config master first, especially since
> the satellite endpoints don't know that.
>
> The secondary master not only stores the master zone config files, but also all child zones.
> This is also the case for any HA enabled zone with more than one endpoint.


2.11 puts the received configuration files into a staging directory in
`/var/lib/icinga2/api/zones-stage`. Previous versions directly wrote the
files into production which could have led to broken configuration on the
next manual restart.

```
[2019-06-19 16:08:29 +0200] information/ApiListener: New client connection for identity 'master1' to [127.0.0.1]:5665
[2019-06-19 16:08:30 +0200] information/ApiListener: Applying config update from endpoint 'master1' of zone 'master'.
[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration for zone 'agent' from endpoint 'master1'. Comparing the checksums.
[2019-06-19 16:08:30 +0200] information/ApiListener: Stage: Updating received configuration file '/var/lib/icinga2/api/zones-stage/agent//_etc/host.conf' for zone 'agent'.
[2019-06-19 16:08:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/agent' (176 Bytes).
[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration for zone 'master' from endpoint 'master1'. Comparing the checksums.
[2019-06-19 16:08:30 +0200] information/ApiListener: Applying configuration file update for path '/var/lib/icinga2/api/zones-stage/master' (17 Bytes).
[2019-06-19 16:08:30 +0200] information/ApiListener: Received configuration from endpoint 'master1' is different to production, triggering validation and reload.
```

It then validates the received configuration in its own config stage. There is
a parameter override in place which disables the automatic inclusion of the production
config in `/var/lib/icinga2/api/zones`.

Once completed, the reload is triggered. This follows the same configurable timeout
as with the global reload.

```
[2019-06-19 16:52:26 +0200] information/ApiListener: Config validation for stage '/var/lib/icinga2/api/zones-stage/' was OK, replacing into '/var/lib/icinga2/api/zones/' and triggering reload.
[2019-06-19 16:52:27 +0200] information/Application: Got reload command: Started new instance with PID '19945' (timeout is 300s).
[2019-06-19 16:52:28 +0200] information/Application: Reload requested, letting new process take over.
```

Whenever the staged configuration validation fails, Icinga logs this including a reference
to the startup log file which includes additional errors.

```
[2019-06-19 15:45:27 +0200] critical/ApiListener: Config validation failed for staged cluster config sync in '/var/lib/icinga2/api/zones-stage/'. Aborting. Logs: '/var/lib/icinga2/api/zones-stage//startup.log'
```


#### Config Sync: Changes and Reload <a id="technical-concepts-cluster-config-sync-changes-reload"></a>

Whenever a new configuration is received, it is validated and upon success, the
daemon automatically reloads. While the daemon continues with checks, the reload
cannot hand over open TCP connections. That being said, reloading the daemon every time
a configuration is synchronized would leave many endpoints disconnected.

Therefore the cluster config sync checks whether the configuration files actually
changed, and will only trigger a reload when such a change happened.

2.11 calculates a checksum from each file content and compares this to the
production configuration. Previous versions used additional metadata with timestamps from
files which sometimes led to problems with asynchronous dates.
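
A simplified sketch of that comparison (illustrative names only; the real code works
on per-file checksum maps like those shown in the debug log further below):

```
#include <map>
#include <string>

// Illustrative sketch: compare per-file content checksums between the staged and
// the production zone configuration. Any difference triggers validation and reload.
bool ConfigChanged(const std::map<std::string, std::string>& productionChecksums,
                   const std::map<std::string, std::string>& stageChecksums)
{
	if (productionChecksums.size() != stageChecksums.size())
		return true;                            // files were added or removed

	for (const auto& kv : stageChecksums) {
		auto it = productionChecksums.find(kv.first);
		if (it == productionChecksums.end() || it->second != kv.second)
			return true;                        // new file or changed content
	}

	return false;
}
```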

> **Note**
>
> For compatibility reasons, the timestamp metadata algorithm is still intact, e.g.
> when the client is 2.11 already, but the parent endpoint is still on 2.10.

Icinga logs a warning when this happens.

```
Received configuration update without checksums from parent endpoint satellite1. This behaviour is deprecated. Please upgrade the parent endpoint to 2.11+
```


The debug log provides more details on the actual checksums and checks. Future output
may change; use this solely for troubleshooting and debugging whenever the cluster
config sync fails.

```
[2019-06-19 16:13:16 +0200] information/ApiListener: Received configuration for zone 'agent' from endpoint 'master1'. Comparing the checksums.
[2019-06-19 16:13:16 +0200] debug/ApiListener: Checking for config change between stage and production. Old (3): '{"/.checksums":"7ede1276a9a32019c1412a52779804a976e163943e268ec4066e6b6ec4d15d73","/.timestamp":"ec4354b0eca455f7c2ca386fddf5b9ea810d826d402b3b6ac56ba63b55c2892c","/_etc/host.conf":"35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44"}' vs. new (3): '{"/.checksums":"84a586435d732327e2152e7c9b6d85a340cc917b89ae30972042f3dc344ea7cf","/.timestamp":"0fd6facf35e49ab1b2a161872fa7ad794564eba08624373d99d31c32a7a4c7d3","/_etc/host.conf":"0d62075e89be14088de1979644b40f33a8f185fcb4bb6ff1f7da2f63c7723fcb"}'.
[2019-06-19 16:13:16 +0200] debug/ApiListener: Checking /_etc/host.conf for checksum: 35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44
[2019-06-19 16:13:16 +0200] debug/ApiListener: Path '/_etc/host.conf' doesn't match old checksum '0d62075e89be14088de1979644b40f33a8f185fcb4bb6ff1f7da2f63c7723fcb' with new checksum '35d4823684d83a5ab0ca853c9a3aa8e592adfca66210762cdf2e54339ccf0a44'.
```


#### Config Sync: Trust <a id="technical-concepts-cluster-config-sync-trust"></a>

The config sync follows the "top down" approach, where the master endpoint in the master
zone is allowed to synchronize configuration to the child zone, e.g. the satellite zone.

Endpoints in the same zone, e.g. a secondary master, receive configuration for the same
zone and all child zones.

Endpoints in the satellite zone trust the parent zone, and will accept the pushed
configuration via JSON-RPC cluster messages. By default, this is disabled and must
be enabled with the `accept_config` attribute in the ApiListener feature (manually or with CLI
helpers).

The satellite zone will not only accept zone configuration for its own zone, but also
all configured child zones. That is why it is important to configure the zone hierarchy
on the satellite as well.

Child zones are not allowed to sync configuration up to the parent zone. Each Icinga instance
evaluates this during startup and knows on endpoint connect which config zones need to be synced.


Global zones have a special trust relationship: They are synced to all child zones, be it
a satellite zone or agent zone. Since checkable objects such as a Host or a Service object
must have only one endpoint as authority, they cannot be put into a global zone (denied by
the config compiler).

Apply rules and templates are allowed, since they are evaluated in the endpoint which received
the synced configuration. Keep in mind that there may be differences on the master and the satellite
when e.g. hostgroup membership is used for `assign where` expressions, but the groups are only
available on the master.


### Cluster: Message Routing <a id="technical-concepts-cluster-message-routing"></a>

One fundamental part of the cluster message routing is the MessageOrigin object.
This is created when a new JSON-RPC message is received in `JsonRpcConnection::MessageHandler()`.

It contains

- FromZone being extracted from the endpoint object which owns the JsonRpcConnection
- FromClient being the JsonRpcConnection bound to the endpoint object
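
Reduced to these two attributes, the origin information looks roughly like this
(simplified sketch with stand-in types, not the actual class definition):

```
#include <memory>

// Hypothetical stand-ins for the real Zone and JsonRpcConnection classes.
class Zone {};
class JsonRpcConnection {};

// Simplified sketch of the message origin information described above.
struct MessageOrigin
{
	std::shared_ptr<Zone> FromZone;                  // zone of the endpoint which sent the message
	std::shared_ptr<JsonRpcConnection> FromClient;   // connection the message was received on

	bool IsLocal() const
	{
		return !FromClient;                          // no connection => the event originated locally
	}
};
```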
 | ||
| 
 | ||
| These attributes are checked in message receive api handlers for security access. E.g. whether a
 | ||
| message origin is from a child zone which is not allowed, etc.
 | ||
| This is explained in the [JSON-RPC messages](19-technical-concepts.md#technical-concepts-json-rpc-messages) chapter.
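
A simplified sketch of that check pattern, using stand-in types rather than the real Icinga classes:

```
// Stand-ins for illustration only.
struct Endpoint;
struct Zone;

struct JsonRpcConnection {
    Endpoint* GetEndpoint() const { return ConnectedEndpoint; }
    Endpoint* ConnectedEndpoint = nullptr;
};

struct MessageOrigin {
    Zone* FromZone = nullptr;                // zone the message came from
    JsonRpcConnection* FromClient = nullptr; // connection the message arrived on
};

// A receiving handler typically drops messages from unconfigured (anonymous)
// senders and from zones which may not access the object in question.
bool AcceptMessage(const MessageOrigin& origin, bool originZoneCanAccessObject)
{
    if (!origin.FromClient || !origin.FromClient->GetEndpoint())
        return false; // no configured endpoint behind this connection -> drop

    if (origin.FromZone && !originZoneCanAccessObject)
        return false; // sender's zone is not allowed to access this object -> drop

    return true;
}
```
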
 | ||
| 
 | ||
| Whenever such a message is processed on the client, it may trigger additional cluster events
 | ||
| which are sent back to other endpoints. Therefore it is key to always pass the MessageOrigin
 | ||
| `origin` when processing these messages locally.
 | ||
| 
 | ||
| Example:
 | ||
| 
 | ||
| - Client receives a CheckResult from another endpoint in the same zone, call it `sender` for now
 | ||
- Calls ProcessCheckResult() to store the CR and calculate states, notifications, etc.
 | ||
| - Calls the OnNewCheckResult() signal to trigger IDO updates
 | ||
| 
 | ||
| OnNewCheckResult() also calls a registered cluster handler which forwards the CheckResult to other cluster members.
 | ||
| 
 | ||
Without any origin details, this CheckResult would be relayed back to the `sender` endpoint,
which would process the message again (ProcessCheckResult(), OnNewCheckResult()), send it back, and so on.
 | ||
| 
 | ||
| That creates a loop which our cluster protocol needs to prevent at all cost.
 | ||
| 
 | ||
| RelayMessageOne() takes care of the routing. This involves fetching the targetZone for this message and its endpoints.
 | ||
| 
 | ||
| - Don't relay messages to ourselves.
 | ||
| - Don't relay messages to disconnected endpoints.
 | ||
| - Don't relay the message to the zone through more than one endpoint unless this is our own zone.
 | ||
| - Don't relay messages back to the endpoint which we got the message from. **THIS**
 | ||
| - Don't relay messages back to the zone which we got the message from.
 | ||
| - Only relay message to the zone master if we're not currently the zone master.
 | ||
| 
 | ||
| ```
 | ||
|  e1 is zone master, e2 and e3 are zone members.
 | ||
| 
 | ||
|  Message is sent from e2 or e3:
 | ||
|    !isMaster == true
 | ||
|    targetEndpoint e1 is zone master -> send the message
 | ||
|    targetEndpoint e3 is not zone master -> skip it, avoid routing loops
 | ||
| 
 | ||
|  Message is sent from e1:
 | ||
|    !isMaster == false -> send the messages to e2 and e3 being the zone routing master.
 | ||
| ```
 | ||
| 
 | ||
| With passing the `origin` the following condition prevents sending a message back to sender:
 | ||
| 
 | ||
| ```
 | ||
| if (origin && origin->FromClient && targetEndpoint == origin->FromClient->GetEndpoint()) {
 | ||
| ```
 | ||
| 
 | ||
| This message then simply gets skipped for this specific Endpoint and is never sent.
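
The skip rules above can be sketched as a filter over the candidate endpoints of the target zone. The types and the `SelectRelayTargets()` helper below are simplified stand-ins for illustration, not the real `RelayMessageOne()` implementation:

```
#include <string>
#include <vector>

// Stand-in types for illustration.
struct Endpoint {
    std::string name;
    std::string zone;
    bool connected = false;
};

struct Origin {
    const Endpoint* fromEndpoint = nullptr; // endpoint the message was received from
    std::string fromZone;                   // zone the message was received from
};

// Apply the skip rules listed above to the endpoints of the target zone.
std::vector<const Endpoint*> SelectRelayTargets(const std::vector<Endpoint>& zoneEndpoints,
                                                const Endpoint& self,
                                                const std::string& zoneMasterName,
                                                bool targetIsOwnZone,
                                                const Origin* origin)
{
    std::vector<const Endpoint*> targets;
    bool relayedToZone = false;

    for (const Endpoint& target : zoneEndpoints) {
        if (target.name == self.name)
            continue; // not to ourselves
        if (!target.connected)
            continue; // not to disconnected endpoints
        if (origin && origin->fromEndpoint && target.name == origin->fromEndpoint->name)
            continue; // not back to the endpoint we got the message from
        if (origin && target.zone == origin->fromZone)
            continue; // not back into the zone we got the message from
        if (!targetIsOwnZone && relayedToZone)
            continue; // at most one endpoint per foreign zone
        if (self.name != zoneMasterName && target.name != zoneMasterName)
            continue; // non-masters only relay to the zone master

        targets.push_back(&target);
        relayedToZone = true;
    }

    return targets;
}
```
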
 | ||
| 
 | ||
| This analysis originates from a long-lasting [downtime loop bug](https://github.com/Icinga/icinga2/issues/7198).
 | ||
| 
 | ||
| ## TLS Network IO <a id="technical-concepts-tls-network-io"></a>
 | ||
| 
 | ||
| ### TLS Connection Handling <a id="technical-concepts-tls-network-io-connection-handling"></a>
 | ||
| 
 | ||
| Icinga supports two connection directions, controlled via the `host` attribute
 | ||
| inside the Endpoint objects:
 | ||
| 
 | ||
| * Outgoing connection attempts
 | ||
| * Incoming connection handling
 | ||
| 
 | ||
| Once the connection is established, higher layers can exchange JSON-RPC and
 | ||
HTTP messages. It doesn't matter in which direction these messages go.
 | ||
| 
 | ||
This offers a big advantage over single-direction connections such as
HTTP-only polling. Also, connections are kept alive as long as data
is transmitted.
 | ||
| 
 | ||
When the master connects to the child zone member(s), this requires more
resources on the master. Keep this in mind when endpoints are not reachable:
the TCP connection timeout blocks other resources. Moving a satellite zone
in the middle between masters and agents helps to split the tasks - the master
processes and stores data, deploys configuration and serves the API, while the
satellites schedule the checks, connect to the agents and receive
check results.
 | ||
| 
 | ||
| Agents/Clients can also connect to the parent endpoints - be it a master or
 | ||
| a satellite. This is the preferred way out of a DMZ, and also reduces the
 | ||
overhead of connecting to e.g. 2000 agents from the master. You can
 | ||
| benchmark this when TCP connections are broken and timeouts are encountered.
 | ||
| 
 | ||
| #### Master Processes Incoming Connection <a id="technical-concepts-tls-network-io-connection-handling-incoming"></a>
 | ||
| 
 | ||
| * The node starts a new ApiListener, this invokes `AddListener()`
 | ||
|     * Setup TLS Context (SslContext)
 | ||
|     * Initialize global I/O engine and create a TCP acceptor
 | ||
|     * Resolve bind host/port (optional)
 | ||
|     * Listen on IPv4 and IPv6
 | ||
|     * Re-use socket address and port
 | ||
|     * Listen on port 5665 with `INT_MAX` possible sockets
 | ||
| * Spawn a new Coroutine which listens for new incoming connections as 'TCP server' pattern
 | ||
|     * Accept new connections asynchronously
 | ||
|     * Spawn a new Coroutine which handles the new client connection in a different context, Role: Server
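
A minimal sketch of this "TCP server" coroutine pattern with Boost.ASIO, leaving out TLS and the Icinga-specific client handling:

```
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <memory>

namespace asio = boost::asio;
using asio::ip::tcp;

// Accept connections asynchronously and hand each client over to its own coroutine.
void ListenerCoroutine(asio::io_context& io, asio::yield_context yield)
{
    // Binds with address re-use and listens on port 5665 (IPv6, dual-stack where available).
    tcp::acceptor acceptor(io, tcp::endpoint(tcp::v6(), 5665));

    for (;;) {
        boost::system::error_code ec;
        auto client = std::make_shared<tcp::socket>(io);

        acceptor.async_accept(*client, yield[ec]); // suspends the coroutine until a client connects

        if (ec)
            continue;

        // New coroutine per connection, role: server. TLS handshake and
        // message handling would start here.
        asio::spawn(io, [client](asio::yield_context) {
            // ... handle the client ...
        });
    }
}

int main()
{
    asio::io_context io;
    asio::spawn(io, [&io](asio::yield_context yield) { ListenerCoroutine(io, yield); });
    io.run();
}
```
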
 | ||
| 
 | ||
| #### Master Connects Outgoing <a id="technical-concepts-tls-network-io-connection-handling-outgoing"></a>
 | ||
| 
 | ||
* The node starts a timer with a 10 second interval and `ApiReconnectTimerHandler()` as callback
    * Loop over all configured zones, excluding global zones and zones that are not a direct parent or child
    * Get the endpoints configured in these zones, excluding: the local endpoint, endpoints without a 'host' attribute, endpoints already connected or with a connection attempt in progress
    * Call `AddConnection()`
 | ||
| * Spawn a new Coroutine after making the TLS context
 | ||
|     * Use the global I/O engine for socket I/O
 | ||
|     * Create TLS stream
 | ||
|     * Connect to endpoint host/port details
 | ||
|     * Handle the client connection, Role: Client
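
A simplified sketch of the reconnect handler logic described above; `Zone`, `Endpoint` and `AddConnection()` here are stand-ins for the real configuration objects and connection logic:

```
#include <string>
#include <vector>

// Stand-in types for illustration.
struct Endpoint {
    std::string name;
    std::string host;          // empty when the endpoint only accepts connections
    bool connected = false;
    bool connecting = false;
};

struct Zone {
    bool isGlobal = false;
    bool isDirectParentOrChild = false;
    std::vector<Endpoint*> endpoints;
};

void AddConnection(Endpoint*)
{
    // Would create the TLS context, spawn a client coroutine and connect.
}

void ApiReconnectTimerHandler(const std::vector<Zone>& zones, const std::string& localEndpointName)
{
    for (const Zone& zone : zones) {
        if (zone.isGlobal || !zone.isDirectParentOrChild)
            continue; // only direct parent/child zones are connected

        for (Endpoint* endpoint : zone.endpoints) {
            if (endpoint->name == localEndpointName)
                continue; // not ourselves
            if (endpoint->host.empty())
                continue; // no 'host' attribute -> that endpoint connects to us
            if (endpoint->connected || endpoint->connecting)
                continue; // already connected or connection attempt in progress

            AddConnection(endpoint);
        }
    }
}
```
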
 | ||
| 
 | ||
| #### TLS Handshake <a id="technical-concepts-tls-network-io-connection-handling-handshake"></a>
 | ||
| 
 | ||
| * Create a TLS connection in sslConn and perform an asynchronous TLS handshake
 | ||
| * Get the peer certificate
 | ||
| * Verify the presented certificate: `ssl::verify_peer` and `ssl::verify_client_once`
 | ||
| * Get the certificate CN and compare it against the endpoint name - if not matching, return and close the connection
 | ||
| 
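A minimal sketch of the server-side handshake and CN comparison with Boost.ASIO and OpenSSL; error details and the real Icinga certificate helpers are left out:

```
#include <boost/asio.hpp>
#include <boost/asio/spawn.hpp>
#include <boost/asio/ssl.hpp>
#include <openssl/ssl.h>
#include <openssl/x509.h>
#include <string>

namespace asio = boost::asio;
namespace ssl = asio::ssl;

bool HandshakeAndVerify(ssl::stream<asio::ip::tcp::socket>& sslConn,
                        const std::string& expectedEndpointName, asio::yield_context yield)
{
    boost::system::error_code ec;

    sslConn.set_verify_mode(ssl::verify_peer | ssl::verify_client_once);
    sslConn.async_handshake(ssl::stream_base::server, yield[ec]);
    if (ec)
        return false; // handshake or certificate verification failed

    X509* cert = SSL_get_peer_certificate(sslConn.native_handle());
    if (!cert)
        return false; // no peer certificate presented

    char cn[256] = {};
    X509_NAME_get_text_by_NID(X509_get_subject_name(cert), NID_commonName, cn, sizeof(cn));
    X509_free(cert);

    // The certificate CN must match the configured endpoint name.
    return expectedEndpointName == cn;
}
```
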
 | ||
| #### Data Exchange <a id="technical-concepts-tls-network-io-connection-data-exchange"></a>
 | ||
| 
 | ||
Everything runs through TLS; we don't use any "raw" connections nor plain message handling.
 | ||
| 
 | ||
| HTTP and JSON-RPC messages share the same port and API, so additional handling is required.
 | ||
| 
 | ||
After a new connection is established and the TLS handshake succeeds, the first byte is read.
It either starts a JSON-RPC message in Netstring format, i.e. a number, or a plain HTTP request.
 | ||
| 
 | ||
| ```
 | ||
| HTTP/1.1
 | ||
| 
 | ||
| 2:{}
 | ||
| ```
 | ||
| 
 | ||
Depending on this, `ClientJsonRpc` or `ClientHttp` is assigned.
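
The dispatch itself boils down to a check on that first byte. A minimal sketch, assuming the Netstring length prefix always starts with a digit:

```
#include <cctype>

// Role names follow the text above.
enum class ClientRole { ClientJsonRpc, ClientHttp };

ClientRole DetectClientProtocol(unsigned char firstByte)
{
    // A Netstring-framed JSON-RPC message starts with its length prefix (a digit),
    // anything else is treated as an HTTP request.
    return std::isdigit(firstByte) ? ClientRole::ClientJsonRpc : ClientRole::ClientHttp;
}
```
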
 | ||
| 
 | ||
| JSON-RPC:
 | ||
| 
 | ||
| * Create a new JsonRpcConnection object
 | ||
|     * When the endpoint object is configured, spawn a Coroutine which takes care of syncing the client (file and runtime config, replay log, etc.)
 | ||
    * If no endpoint object is configured, the connection is treated as an anonymous client (with a configurable limit). Such a client may send a CSR signing request, for example.
 | ||
|     * Start the JsonRpcConnection - this spawns Coroutines to HandleIncomingMessages, WriteOutgoingMessages, HandleAndWriteHeartbeats and CheckLiveness
 | ||
| 
 | ||
| HTTP:
 | ||
| 
 | ||
| * Create a new HttpServerConnection
 | ||
|      * Start the HttpServerConnection - this spawns Coroutines to ProcessMessages and CheckLiveness
 | ||
| 
 | ||
| 
 | ||
| All the mentioned Coroutines run asynchronously using the global I/O engine's context.
 | ||
| More details on this topic can be found in [this blogpost](https://www.netways.de/blog/2019/04/04/modern-c-programming-coroutines-with-boost/).
 | ||
| 
 | ||
| The lower levels of context switching and sharing or event polling are
 | ||
| hidden in Boost ASIO, Beast, Coroutine and Context libraries.
 | ||
| 
 | ||
| #### Data Exchange: Coroutines and I/O Engine <a id="technical-concepts-tls-network-io-connection-data-exchange-coroutines"></a>
 | ||
| 
 | ||
| Light-weight and fast operations such as connection handling or TLS handshakes
 | ||
| are performed in the default `IoBoundWorkSlot` pool inside the I/O engine.
 | ||
| 
 | ||
| The I/O engine has another pool available: `CpuBoundWork`.
 | ||
| 
 | ||
This is used for processing CPU intensive tasks, such as handling an HTTP request.
 | ||
| Depending on the available CPU cores, this is limited to `std::thread::hardware_concurrency() * 3u / 2u`.
 | ||
| 
 | ||
| ```
 | ||
| 1 core * 3 / 2 = 1
 | ||
| 2 cores * 3 / 2 = 3
 | ||
| 8 cores * 3 / 2 = 12
 | ||
| 16 cores * 3 / 2 = 24
 | ||
| ```
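
The following stand-in (not the real Icinga implementation) illustrates the idea: an RAII guard limits how many tasks may run CPU intensive work concurrently, using the limit shown above. The real `CpuBoundWork` is coroutine-aware and yields instead of blocking a thread while waiting for a free slot.

```
#include <algorithm>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>

// Illustrative stand-in for the CPU bound work pool.
class CpuBoundPool {
public:
    CpuBoundPool()
        : m_Free(std::max<std::size_t>(1, std::thread::hardware_concurrency() * 3u / 2u))
    { }

    void Acquire()
    {
        std::unique_lock<std::mutex> lock(m_Mutex);
        m_CV.wait(lock, [this] { return m_Free > 0; });
        --m_Free;
    }

    void Release()
    {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            ++m_Free;
        }
        m_CV.notify_one();
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_CV;
    std::size_t m_Free;
};

// RAII guard, e.g. acquired before parsing and processing an HTTP request.
class CpuBoundWork {
public:
    explicit CpuBoundWork(CpuBoundPool& pool) : m_Pool(pool) { m_Pool.Acquire(); }
    ~CpuBoundWork() { m_Pool.Release(); }

private:
    CpuBoundPool& m_Pool;
};
```
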
 | ||
| 
 | ||
| The I/O engine itself is used with all network I/O in Icinga, not only the cluster
 | ||
| and the REST API. Features such as Graphite, InfluxDB, etc. also consume its functionality.
 | ||
| 
 | ||
There are `2 * CPU cores` threads available which run the event loop
in the I/O engine. They poll the I/O service with `m_IoService.run();`
and trigger asynchronous event progress for waiting coroutines.
 | ||
| 
 | ||
| <!--
 | ||
| ## REST API <a id="technical-concepts-rest-api"></a>
 | ||
| 
 | ||
| Icinga 2 provides its own HTTP server which shares the port 5665 with
 | ||
| the JSON-RPC cluster protocol.
 | ||
| -->
 | ||
| 
 | ||
| ## JSON-RPC Message API <a id="technical-concepts-json-rpc-messages"></a>
 | ||
| 
 | ||
| **The JSON-RPC message API is not a public API for end users.** In case you want
 | ||
| to interact with Icinga, use the [REST API](12-icinga2-api.md#icinga2-api).
 | ||
| 
 | ||
| This section describes the internal cluster messages exchanged between endpoints.
 | ||
| 
 | ||
| > **Tip**
 | ||
| >
 | ||
> Debug builds run with `icinga2 daemon -DInternal.DebugJsonRpc=1` reveal the JSON-RPC messages.
 | ||
| 
 | ||
| ### Registered Handler Functions
 | ||
| 
 | ||
| Functions by example:
 | ||
| 
 | ||
| Event Sender: `Checkable::OnNewCheckResult`
 | ||
| 
 | ||
| ```
 | ||
| On<xyz>.connect(&xyzHandler)
 | ||
| ```
 | ||
| 
 | ||
| Event Receiver (Client): `CheckResultAPIHandler` in `REGISTER_APIFUNCTION`
 | ||
| 
 | ||
| ```
 | ||
| <xyz>APIHandler()
 | ||
| ```
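
Put together for the `event::CheckResult` example above, the wiring roughly looks like this (simplified fragments of the registrations in `clusterevents.cpp`; the exact signatures differ):

```
// Sender side: the connected handler builds the cluster message and relays it
// to the other endpoints.
Checkable::OnNewCheckResult.connect(&ClusterEvents::CheckResultHandler);

// Receiver side: register the handler for the JSON-RPC method "event::CheckResult".
REGISTER_APIFUNCTION(CheckResult, event, &ClusterEvents::CheckResultAPIHandler);
```
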
 | ||
| 
 | ||
| ### Messages
 | ||
| 
 | ||
| #### icinga::Hello <a id="technical-concepts-json-rpc-messages-icinga-hello"></a>
 | ||
| 
 | ||
| > Location: `apilistener.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | icinga::Hello
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Currently empty.
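
Internally, such a message is assembled as a `Dictionary` before being serialized onto the connection. A simplified fragment (using Icinga's `Dictionary` type, not standalone code):

```
// Message body as described above; params is currently an empty dictionary.
Dictionary::Ptr message = new Dictionary({
    { "jsonrpc", "2.0" },
    { "method", "icinga::Hello" },
    { "params", new Dictionary() }
});
```
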
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: When a new client connects in `NewClientHandlerInternal()`.
 | ||
| Event Receiver: `HelloAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| None, this is a required message.
 | ||
| 
 | ||
| #### event::Heartbeat <a id="technical-concepts-json-rpc-messages-event-heartbeat"></a>
 | ||
| 
 | ||
| > Location: `jsonrpcconnection-heartbeat.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::Heartbeat
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key       | Type          | Description
 | ||
| ----------|---------------|------------------
 | ||
| timeout   | Number        | Heartbeat timeout, sender sets 120s.
 | ||
| 
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `JsonRpcConnection::HeartbeatTimerHandler`
 | ||
| Event Receiver: `HeartbeatAPIHandler`
 | ||
| 
 | ||
| Both sender and receiver exchange this heartbeat message. If the sender detects
 | ||
| that a client endpoint hasn't sent anything in the updated timeout span, it disconnects
 | ||
| the client. This is to avoid stale connections with no message processing.
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| None, this is a required message.
 | ||
| 
 | ||
| #### event::CheckResult <a id="technical-concepts-json-rpc-messages-event-checkresult"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::CheckResult
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key       | Type          | Description
 | ||
| ----------|---------------|------------------
 | ||
| host      | String        | Host name
 | ||
| service   | String        | Service name
 | ||
| cr        | Serialized CR | Check result
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnNewCheckResult`
 | ||
| Event Receiver: `CheckResultAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Hosts/services do not exist
 | ||
* Origin is a remote command endpoint different from the configured one, and its zone is not allowed to access this checkable.
 | ||
| 
 | ||
| #### event::SetNextCheck <a id="technical-concepts-json-rpc-messages-event-setnextcheck"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::SetNextCheck
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key         | Type          | Description
 | ||
| ------------|---------------|------------------
 | ||
| host        | String        | Host name
 | ||
| service     | String        | Service name
 | ||
| next\_check | Timestamp     | Next scheduled time as UNIX timestamp.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnNextCheckChanged`
 | ||
| Event Receiver: `NextCheckChangedAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
| * Origin endpoint's zone is not allowed to access this checkable.
 | ||
| 
 | ||
| #### event::SuppressedNotifications <a id="technical-concepts-json-rpc-messages-event-setsupressednotifications"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::SuppressedNotifications
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
Key                      | Type          | Description
-------------------------|---------------|------------------
host                     | String        | Host name
service                  | String        | Service name
supressed\_notifications | Number        | Bitmask for suppressed notifications.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnSuppressedNotificationsChanged`
 | ||
| Event Receiver: `SuppressedNotificationsChangedAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
| * Origin endpoint's zone is not allowed to access this checkable.
 | ||
| 
 | ||
| 
 | ||
| #### event::SetNextNotification <a id="technical-concepts-json-rpc-messages-event-setnextnotification"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::SetNextNotification
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key                | Type          | Description
 | ||
| -------------------|---------------|------------------
 | ||
| host               | String        | Host name
 | ||
| service            | String        | Service name
 | ||
| notification       | String        | Notification name
 | ||
| next\_notification | Timestamp     | Next scheduled notification time as UNIX timestamp.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Notification::OnNextNotificationChanged`
 | ||
| Event Receiver: `NextNotificationChangedAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Notification does not exist.
 | ||
| * Origin endpoint's zone is not allowed to access this checkable.
 | ||
| 
 | ||
| #### event::SetForceNextCheck <a id="technical-concepts-json-rpc-messages-event-setforcenextcheck"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::SetForceNextCheck
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key       | Type          | Description
 | ||
| ----------|---------------|------------------
 | ||
| host      | String        | Host name
 | ||
| service   | String        | Service name
 | ||
| forced    | Boolean       | Forced next check (execute now)
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnForceNextCheckChanged`
 | ||
| Event Receiver: `ForceNextCheckChangedAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
| * Origin endpoint's zone is not allowed to access this checkable.
 | ||
| 
 | ||
| #### event::SetForceNextNotification <a id="technical-concepts-json-rpc-messages-event-setforcenextnotification"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::SetForceNextNotification
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key       | Type          | Description
 | ||
| ----------|---------------|------------------
 | ||
| host      | String        | Host name
 | ||
| service   | String        | Service name
 | ||
| forced    | Boolean       | Forced next check (execute now)
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
Event Sender: `Checkable::OnForceNextNotificationChanged`
 | ||
| Event Receiver: `ForceNextNotificationChangedAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
| * Origin endpoint's zone is not allowed to access this checkable.
 | ||
| 
 | ||
| #### event::SetAcknowledgement <a id="technical-concepts-json-rpc-messages-event-setacknowledgement"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::SetAcknowledgement
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key        | Type          | Description
 | ||
| -----------|---------------|------------------
 | ||
| host       | String        | Host name
 | ||
| service    | String        | Service name
 | ||
| author     | String        | Acknowledgement author name.
 | ||
| comment    | String        | Acknowledgement comment content.
 | ||
| acktype    | Number        | Acknowledgement type (0=None, 1=Normal, 2=Sticky)
 | ||
| notify     | Boolean       | Notification should be sent.
 | ||
| persistent | Boolean       | Whether the comment is persistent.
 | ||
| expiry     | Timestamp     | Optional expire time as UNIX timestamp.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
Event Sender: `Checkable::OnAcknowledgementSet`
Event Receiver: `AcknowledgementSetAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
| * Origin endpoint's zone is not allowed to access this checkable.
 | ||
| 
 | ||
| #### event::ClearAcknowledgement <a id="technical-concepts-json-rpc-messages-event-clearacknowledgement"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::ClearAcknowledgement
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key       | Type          | Description
 | ||
| ----------|---------------|------------------
 | ||
| host      | String        | Host name
 | ||
| service   | String        | Service name
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnAcknowledgementCleared`
 | ||
| Event Receiver: `AcknowledgementClearedAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
| * Origin endpoint's zone is not allowed to access this checkable.
 | ||
| 
 | ||
| #### event::SendNotifications <a id="technical-concepts-json-rpc-messages-event-sendnotifications"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::SendNotifications
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key       | Type          | Description
 | ||
| ----------|---------------|------------------
 | ||
| host      | String        | Host name
 | ||
| service   | String        | Service name
 | ||
| cr        | Serialized CR | Check result
 | ||
| type      | Number        | enum NotificationType, same as `types` for notification objects.
 | ||
| author    | String        | Author name
 | ||
| text      | String        | Notification text
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnNotificationsRequested`
 | ||
| Event Receiver: `SendNotificationsAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
* Origin endpoint's zone is the same as the receiver's. This binds notification messages to the HA zone.
 | ||
| 
 | ||
| #### event::NotificationSentUser <a id="technical-concepts-json-rpc-messages-event-notificationsentuser"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::NotificationSentUser
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key           | Type            | Description
 | ||
| --------------|-----------------|------------------
 | ||
| host          | String          | Host name
 | ||
| service       | String          | Service name
 | ||
| notification  | String          | Notification name.
 | ||
| user          | String          | Notified user name.
 | ||
| type          | Number          | enum NotificationType, same as `types` in Notification objects.
 | ||
| cr            | Serialized CR   | Check result.
 | ||
| author        | String          | Notification author (for specific types)
 | ||
| text          | String          | Notification text (for specific types)
 | ||
| command       | String          | Notification command name.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnNotificationSentToUser`
 | ||
| Event Receiver: `NotificationSentUserAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
* Origin endpoint's zone is the same as the receiver's. This binds notification messages to the HA zone.
 | ||
| 
 | ||
| #### event::NotificationSentToAllUsers <a id="technical-concepts-json-rpc-messages-event-notificationsenttoallusers"></a>
 | ||
| 
 | ||
| > Location: `clusterevents.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::NotificationSentToAllUsers
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key                         | Type            | Description
 | ||
| ----------------------------|-----------------|------------------
 | ||
| host                        | String          | Host name
 | ||
| service                     | String          | Service name
 | ||
| notification                | String          | Notification name.
 | ||
| users                       | Array of String | Notified user names.
 | ||
| type                        | Number          | enum NotificationType, same as `types` in Notification objects.
 | ||
| cr                          | Serialized CR   | Check result.
 | ||
| author                      | String          | Notification author (for specific types)
 | ||
| text                        | String          | Notification text (for specific types)
 | ||
| last\_notification          | Timestamp       | Last notification time as UNIX timestamp.
 | ||
| next\_notification          | Timestamp       | Next scheduled notification time as UNIX timestamp.
 | ||
| notification\_number        | Number          | Current notification number in problem state.
 | ||
| last\_problem\_notification | Timestamp       | Last problem notification time as UNIX timestamp.
 | ||
| no\_more\_notifications     | Boolean         | Whether to send future notifications when this notification becomes active on this HA node.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `Checkable::OnNotificationSentToAllUsers`
 | ||
| Event Receiver: `NotificationSentToAllUsersAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Checkable does not exist.
 | ||
* Origin endpoint's zone is the same as the receiver's. This binds notification messages to the HA zone.
 | ||
| 
 | ||
| #### event::ExecuteCommand <a id="technical-concepts-json-rpc-messages-event-executecommand"></a>
 | ||
| 
 | ||
| > Location: `clusterevents-check.cpp` and `checkable-check.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | event::ExecuteCommand
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key           | Type          | Description
 | ||
| --------------|---------------|------------------
 | ||
| host          | String        | Host name.
 | ||
| service       | String        | Service name.
 | ||
| command\_type | String        | `check_command` or `event_command`.
 | ||
| command       | String        | CheckCommand or EventCommand name.
 | ||
| macros        | Dictionary    | Command arguments as key/value pairs for remote execution.
 | ||
| 
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| **Event Sender:** This gets constructed directly in `Checkable::ExecuteCheck()` or `Checkable::ExecuteEventHandler()` when a remote command endpoint is configured.
 | ||
| 
 | ||
* `Get{CheckCommand,EventCommand}()->Execute()` simulates an execution and extracts all command arguments into the `macros` dictionary (inside lib/methods tasks).
 | ||
| * When the endpoint is connected, the message is constructed and sent directly.
 | ||
* When the endpoint is not connected, is not syncing the replay log, and more than 5 minutes have passed since application start, an UNKNOWN check result is generated for the user ("not connected").
 | ||
| 
 | ||
| **Event Receiver:** `ExecuteCommandAPIHandler`
 | ||
| 
 | ||
| Special handling, calls `ClusterEvents::EnqueueCheck()` for command endpoint checks.
 | ||
| This function enqueues check tasks into a queue which is controlled in `RemoteCheckThreadProc()`.
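
The queueing itself can be sketched as a simple worker queue. The class below is an illustrative stand-in, not the real `EnqueueCheck()` / `RemoteCheckThreadProc()` implementation, which additionally guards the queue size:

```
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>

// Illustrative stand-in for the remote check task queue.
class RemoteCheckQueue {
public:
    void Enqueue(std::function<void()> check)
    {
        {
            std::lock_guard<std::mutex> lock(m_Mutex);
            m_Queue.push_back(std::move(check));
        }
        m_CV.notify_one();
    }

    void WorkerThreadProc()
    {
        for (;;) {
            std::function<void()> check;
            {
                std::unique_lock<std::mutex> lock(m_Mutex);
                m_CV.wait(lock, [this] { return !m_Queue.empty(); });
                check = std::move(m_Queue.front());
                m_Queue.pop_front();
            }
            check(); // executes the check and returns the result to the sender
        }
    }

private:
    std::mutex m_Mutex;
    std::condition_variable m_CV;
    std::deque<std::function<void()>> m_Queue;
};
```
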
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * Origin endpoint's zone is not a parent zone of the receiver endpoint.
 | ||
| * `accept_commands = false` in the `api` feature configuration sends back an UNKNOWN check result to the sender.
 | ||
| 
 | ||
| The receiver constructs a virtual host object and looks for the local CheckCommand object.
 | ||
| 
 | ||
| Returns UNKNOWN as check result to the sender
 | ||
| 
 | ||
| * when the CheckCommand object does not exist.
 | ||
| * when there was an exception triggered from check execution, e.g. the plugin binary could not be executed or similar.
 | ||
| 
 | ||
| The returned messages are synced directly to the sender's endpoint, no cluster broadcast.
 | ||
| 
 | ||
| > **Note**: EventCommand errors are just logged on the remote endpoint.
 | ||
| 
 | ||
| #### config::Update <a id="technical-concepts-json-rpc-messages-config-update"></a>
 | ||
| 
 | ||
| > Location: `apilistener-filesync.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | config::Update
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key        | Type          | Description
 | ||
| -----------|---------------|------------------
 | ||
| update     | Dictionary    | Config file paths and their content.
 | ||
| update\_v2 | Dictionary    | Additional meta config files introduced in 2.4+ for compatibility reasons.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| **Event Sender:** `SendConfigUpdate()` called in `ApiListener::SyncClient()` when a new client endpoint connects.
 | ||
**Event Receiver:** `ConfigUpdateHandler` reads the config update content and stores it in `/var/lib/icinga2/api`.
When it detects a configuration change, the function requests an application restart.
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
| * The origin sender is not in a parent zone of the receiver.
 | ||
| * `api` feature does not accept config.
 | ||
| 
 | ||
| Config updates will be ignored when:
 | ||
| 
 | ||
| * The zone is not configured on the receiver endpoint.
 | ||
| * The zone is authoritative on this instance (this only happens on a master which has `/etc/icinga2/zones.d` populated, and prevents sync loops)
 | ||
| 
 | ||
| #### config::UpdateObject <a id="technical-concepts-json-rpc-messages-config-updateobject"></a>
 | ||
| 
 | ||
| > Location: `apilistener-configsync.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | config::UpdateObject
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key                  | Type        | Description
 | ||
| ---------------------|-------------|------------------
 | ||
| name                 | String      | Object name.
 | ||
| type                 | String      | Object type name.
 | ||
| version              | Number      | Object version.
 | ||
| config               | String      | Config file content for `_api` packages.
 | ||
| modified\_attributes | Dictionary  | Modified attributes at runtime as key value pairs.
 | ||
| original\_attributes | Array       | Original attributes as array of keys.
 | ||
| 
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
**Event Sender:** Either on client connect (full sync), or when an object is created or updated at runtime.
 | ||
| 
 | ||
| `ApiListener::SendRuntimeConfigObjects()` gets called when a new endpoint is connected
 | ||
| and runtime created config objects need to be synced. This invokes a call to `UpdateConfigObject()`
 | ||
| to only sync this JsonRpcConnection client.
 | ||
| 
 | ||
| `ConfigObject::OnActiveChanged` (created or deleted) or `ConfigObject::OnVersionChanged` (updated)
 | ||
| also call `UpdateConfigObject()`.
 | ||
| 
 | ||
| **Event Receiver:** `ConfigUpdateObjectAPIHandler` calls `ConfigObjectUtility::CreateObject()` in order
 | ||
| to create the object if it is not already existing. Afterwards, all modified attributes are applied
 | ||
| and in case, original attributes are restored. The object version is set as well, keeping it in sync
 | ||
| with the sender.
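
The receiver-side decision can be sketched as follows (see also the error handling notes below); the types are simplified stand-ins, not the real `ConfigObject`/`ConfigObjectUtility` classes:

```
#include <map>
#include <string>

// Illustrative stand-in for a runtime created object.
struct RuntimeObject {
    double version = 0;
    std::map<std::string, std::string> modifiedAttributes;
};

// Returns true if the received update was applied.
bool ApplyConfigUpdateObject(std::map<std::string, RuntimeObject>& objects,
                             const std::string& name, double receivedVersion,
                             const std::map<std::string, std::string>& modifiedAttributes)
{
    auto it = objects.find(name);

    if (it == objects.end()) {
        // Object does not exist yet -> create it (ConfigObjectUtility::CreateObject()
        // in the real code), then apply the attributes below.
        it = objects.emplace(name, RuntimeObject{}).first;
    } else if (it->second.version >= receivedVersion) {
        // Local object is as new or newer -> keep the local state.
        return false;
    }

    it->second.modifiedAttributes = modifiedAttributes; // apply/restore attributes
    it->second.version = receivedVersion;               // keep the version in sync with the sender
    return true;
}
```
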
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| ###### Sender
 | ||
| 
 | ||
| Client receiver connects:
 | ||
| 
 | ||
| The sender only syncs config object updates to a client which can access
 | ||
| the config object, in `ApiListener::SendRuntimeConfigObjects()`.
 | ||
| 
 | ||
| In addition to that, the client endpoint's zone is checked whether this zone may access
 | ||
| the config object.
 | ||
| 
 | ||
| Runtime updated object:
 | ||
| 
 | ||
| Only if the config object belongs to the `_api` package.
 | ||
| 
 | ||
| 
 | ||
| ###### Receiver
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
* Origin sender endpoint's zone is a child zone of the receiver.
 | ||
| * `api` feature does not accept config
 | ||
| * The received config object type does not exist (this is to prevent failures with older nodes and new object types).
 | ||
| 
 | ||
| Error handling:
 | ||
| 
 | ||
| * Log an error if `CreateObject` fails (only if the object does not already exist)
 | ||
* If the local object version is newer than the received version, the object will not be updated.
 | ||
| * Compare modified and original attributes and restore any type of change here.
 | ||
| 
 | ||
| 
 | ||
| #### config::DeleteObject <a id="technical-concepts-json-rpc-messages-config-deleteobject"></a>
 | ||
| 
 | ||
| > Location: `apilistener-configsync.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | config::DeleteObject
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key                 | Type        | Description
 | ||
| --------------------|-------------|------------------
 | ||
| name                | String      | Object name.
 | ||
| type                | String      | Object type name.
 | ||
| version             | Number      | Object version.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| **Event Sender:**
 | ||
| 
 | ||
| `ConfigObject::OnActiveChanged` (created or deleted) or `ConfigObject::OnVersionChanged` (updated)
 | ||
| call `DeleteConfigObject()`.
 | ||
| 
 | ||
| **Event Receiver:** `ConfigDeleteObjectAPIHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| ###### Sender
 | ||
| 
 | ||
| Runtime deleted object:
 | ||
| 
 | ||
| Only if the config object belongs to the `_api` package.
 | ||
| 
 | ||
| ###### Receiver
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 | ||
| 
 | ||
| Message updates will be dropped when:
 | ||
| 
 | ||
* Origin sender endpoint's zone is a child zone of the receiver.
 | ||
| * `api` feature does not accept config
 | ||
| * The received config object type does not exist (this is to prevent failures with older nodes and new object types).
 | ||
| * The object in question was not created at runtime, it does not belong to the `_api` package.
 | ||
| 
 | ||
| Error handling:
 | ||
| 
 | ||
| * Log an error if `DeleteObject` fails (only if the object does not already exist)
 | ||
| 
 | ||
| #### pki::RequestCertificate <a id="technical-concepts-json-rpc-messages-pki-requestcertificate"></a>
 | ||
| 
 | ||
| > Location: `jsonrpcconnection-pki.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | pki::RequestCertificate
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key           | Type          | Description
 | ||
| --------------|---------------|------------------
 | ||
| ticket        | String        | Own ticket, or as satellite in CA proxy from local store.
 | ||
| cert\_request | String        | Certificate request content from local store, optional.
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| Event Sender: `RequestCertificateHandler`
 | ||
| Event Receiver: `RequestCertificateHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| This is an anonymous request, and the number of anonymous clients can be configured
 | ||
| in the `api` feature.
 | ||
| 
 | ||
| Only valid certificate request messages are processed, and valid signed certificates
 | ||
| won't be signed again.
 | ||
| 
 | ||
| #### pki::UpdateCertificate <a id="technical-concepts-json-rpc-messages-pki-updatecertificate"></a>
 | ||
| 
 | ||
| > Location: `jsonrpcconnection-pki.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | pki::UpdateCertificate
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key                  | Type          | Description
 | ||
| ---------------------|---------------|------------------
 | ||
| status\_code         | Number        | Status code, 0=ok.
 | ||
| cert                 | String        | Signed certificate content.
 | ||
| ca                   | String        | Public CA certificate content.
 | ||
| fingerprint\_request | String        | Certificate fingerprint from the CSR.
 | ||
| 
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| **Event Sender:**
 | ||
| 
 | ||
| * When a client requests a certificate in `RequestCertificateHandler` and the satellite
 | ||
| already has a signed certificate, the `pki::UpdateCertificate` message is constructed and sent back.
 | ||
| * When the endpoint holding the master's CA private key (and TicketSalt private key) is able to sign
 | ||
| the request, the `pki::UpdateCertificate` message is constructed and sent back.
 | ||
| 
 | ||
| **Event Receiver:** `UpdateCertificateHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| Message updates are dropped when
 | ||
| 
 | ||
| * The origin sender is not in a parent zone of the receiver.
 | ||
| * The certificate fingerprint is in an invalid format.
 | ||
| 
 | ||
| #### log::SetLogPosition <a id="technical-concepts-json-rpc-messages-log-setlogposition"></a>
 | ||
| 
 | ||
| > Location: `apilistener.cpp` and `jsonrpcconnection.cpp`
 | ||
| 
 | ||
| ##### Message Body
 | ||
| 
 | ||
| Key       | Value
 | ||
| ----------|---------
 | ||
| jsonrpc   | 2.0
 | ||
| method    | log::SetLogPosition
 | ||
| params    | Dictionary
 | ||
| 
 | ||
| ##### Params
 | ||
| 
 | ||
| Key                 | Type          | Description
 | ||
| --------------------|---------------|------------------
 | ||
| log\_position       | Timestamp     | The endpoint's log position as UNIX timestamp.
 | ||
| 
 | ||
| 
 | ||
| ##### Functions
 | ||
| 
 | ||
| **Event Sender:**
 | ||
| 
 | ||
| During log replay to a client endpoint in `ApiListener::ReplayLog()`, each processed
 | ||
| file generates a message which updates the log position timestamp.
 | ||
| 
 | ||
| `ApiListener::ApiTimerHandler()` invokes a check to keep all connected endpoints and
 | ||
| their log position in sync during replay log.
 | ||
| 
 | ||
| **Event Receiver:** `SetLogPositionHandler`
 | ||
| 
 | ||
| ##### Permissions
 | ||
| 
 | ||
| The receiver will not process messages from not configured endpoints.
 |