icinga2/docs/icinga2-intro.md

269 lines
12 KiB
Markdown
Raw Normal View History

2013-09-25 13:51:48 +02:00
Icinga 2 is a network monitoring application that tries to improve upon
the success of Icinga 1.x while fixing some of its shortcomings. A few
frequently encountered issues are:
- Scalability problems in large monitoring setups
- Difficult configuration with dozens of "magic" tweaks and several
ways of defining services
- Code quality and the resulting inability to implement changes
without breaking add-ons
- Limited access to the runtime state of Icinga (e.g. for querying a
services state or for dynamically creating new services)
Fixing these issues would involve major breaking changes to the Icinga
1.x core and configuration syntax. Icinga users would likely experience
plenty of problems with the Icinga versions introducing these changes.
Many of these changes would likely break add-ons which rely on the NEB
API and other core internals.
From a developer standpoint this may be justifiable in order to get to a
better end-product. However, for (business) users spending time on
getting familiar with these changes for each new version may become
quite frustrating and may easily cause users to lose their confidence in
Icinga.
Nagios™ 4 is currently following this approach and it remains to be seen
how this fares with its users.
Instead the Icinga project will maintain two active development
branches. There will be one branch for Icinga 1.x which focuses on
improving the existing Icinga 1.x code base - just like it has been done
so far.
Independently from Icinga 1.x development on Icinga 2 will happen in a
separate branch and some of the long-term design goals will be outlined
in this document. Status updates for Icinga 2 will be posted on the
project website (www.icinga.org) as they become available.
Code Quality
============
Icinga 2 will not be using any code from the Icinga 1.x branch due to
the rampant code quality issues with the existing code base. However, an
important property of the Icinga development process has always been to
rely on proven technologies and Icinga 2 will be no exception.
A lot of effort has gone into designing a maintainable architecture for
Icinga 2 and making sure that algorithmic choices are in alignment with
our scalability goals for Icinga 2.
There are plans to implement unit tests for most Icinga 2 features in
order to make sure that changes to the code base do not break things
that were known to work before.
Language Choice
===============
Icinga 1.x is written in C and while in general C has quite a number of
advantages (e.g. performance and relatively easy portability to other
\*NIX- based platforms) some of its disadvantages show in the context of
a project that is as large as Icinga.
With a complex software project like Icinga an object-oriented design
helps tremendously with keeping things modular and making changes to the
existing code easier.
While it is true that you can write object-oriented software in C (the
Linux kernel is one of the best examples of how to do that) a truly
object-oriented language makes the programmers' life just a little bit
easier.
For Icinga 2 we have chosen C++ as the main language. This decision was
influenced by a number of criteria including performance, support on
different platforms and general user acceptability.
In general there is nothing wrong with other languages like Java, C\# or
Python; however - even when ignoring technical problems for just a
moment - in a community as conservative as the monitoring community
these languages seem out of place.
Knowing that users will likely want to run Icinga 2 on older systems
(which are still fully vendor-supported even for years to come) we will
make every effort to ensure that Icinga 2 can be built and run on
commonly used operating systems and refrain from using new and exotic
features like C++11.
Unlike Icinga 1.x there will be Windows support for Icinga 2. Some of
the compatibility features (e.g. the command pipe) which rely on \*NIX
features may not be supported on Windows but all new features will be
designed in such a way as to support \*NIX as well as Windows.
Configuration
=============
Icinga 1.x has a configuration format that is fully backwards-compatible
to the Nagios™ configuration format. This has the advantage of allowing
users to easily upgrade their existing Nagios™ installations as well as
downgrading if they choose to do so (even though this is generally not
the case).
The Nagios™ configuration format has evolved organically over time and
for the most part it does what its supposed to do. However this
evolutionary process has brought with it a number of problems that make
it difficult for new users to understand the full breadth of available
options and ways of setting up their monitoring environment.
Experience with other configuration formats like the one used by Puppet
has shown that it is often better to have a single "right" way of doing
things rather than having multiple ways like Nagios™ does (e.g. defining
host/service dependencies and parent/child relationships for hosts).
Icinga 2 tries to fix those issues by introducing a new object-based
configuration format that is heavily based on templates and supports
user-friendly features like freely definable macros.
External Interfaces
===================
While Icinga 1.x has easily accessible interfaces to its internal state
(e.g. status.dat, objects.cache and the command pipe) there is no
standards-based way of getting that information.
For example, using Icingas status information in a custom script
generally involves writing a parser for the status.dat format and there
are literally dozens of Icinga-specific status.dat parsers out there.
While Icinga 2 will support these legacy interfaces in order to make
migration easier and allowing users to use the existing CGIs and
whatever other scripts they may have Icinga 2 will focus on providing a
unified interface to Icingas state and providing similar functionality
to that provided by the command pipe in Icinga 1.x. The exact details
for such an interface are yet to be determined but this will likely be
an RPC interface based on one of the commonly used web-based remoting
technologies.
Icinga 1.x exports historical data using the IDO database interface
(Icinga Data Output). Icinga 2 will support IDO in a
backwards-compatible fashion in order to support icinga-web.
Additionally there will be a newly-designed backend for historical data
which can be queried using the built-in API when available. Effort will
be put into making this new data source more efficient for use with SLA
reporting.
Icinga 2 will also feature dynamic reconfiguration using the API which
means users can create, delete and update any configuration object (e.g.
hosts and services) on-the-fly. Based on the API there are plans to
implement a command-line configuration tool similar to what Pacemaker
has with "crm". Later on this API may also be used to implement
auto-discovery for new services.
The RPC interface may also be used to receive events in real-time, e.g.
when service checks are being executed or when a services state
changes. Some possible uses of this interface would be to export
performance data for services (RRD, graphite, etc.) or general log
information (logstash, graylog2, etc.).
Checks
======
In Icinga 2 services are the only checkable objects. Hosts only have a
calculated state and no check are ever run for them.
In order to maintain compatibility with the hundreds of existing check
plugins for Icinga 1.x there will be support for Nagios™-style checks.
The check interface however will be modular so that support for other
kinds of checks can be implemented later on (e.g. built-in checks for
commonly used services like PING, HTTP, etc. in order to avoid spawning
a process for each check).
Based on the availability of remote Icinga 2 instances the core can
delegate execution of service checks to them in order to support
large-scale distributed setups with a minimal amount of maintenance.
Services can be assigned to specific check instances using configuration
settings.
Notifications
=============
Event handlers and notifications will be supported similar to Icinga
1.x. Thanks to the dynamic configuration it is possible to easily adjust
the notification settings at runtime (e.g. in order to implement on-call
rotation).
Scalability
===========
Icinga 1.x has some serious scalability issues which explains why there
are several add-ons which try to improve the cores check performance.
One of these add-ons is mod\_gearman which can be used to distribute
checks to multiple workers running on remote systems.
A problem that remains is the performance of the core when processing
check results. Scaling Icinga 1.x beyond 25.000 services proves to be a
challenging problem and usually involves setting up a cascade of Icinga
1.x instances and dividing the service checks between those instances.
This significantly increases the maintenance overhead when updating the
configuration for such a setup.
Icinga 2 natively supports setting up multiple Icinga 2 instances in a
cluster to distribute work between those instances. Independent tasks
(e.g. performing service checks, sending notifications, updating the
history database, etc.) are implemented as components which can be
loaded for each instance. Configuration as well as program state is
automatically replicated between instances.
In order to support using Icinga 2 in a partially trusted environment
SSL is used for all network communication between individual instances.
Objects (like hosts and services) can be grouped into security domains
for which permissions can be specified on a per-instance basis (so e.g.
you can have a separate API or checker instance for a specific domain).
Agent-based Checks
==================
Traditionally most service checks have been performed actively, meaning
that check plugins are executed on the same server that is also running
Icinga. This works great for checking most network-based services, e.g.
PING and HTTP. However, there are a number of services which cannot be
checked remotely either because they are not network-based or because
firewall settings or network policies ("no unencrypted traffic")
disallow accessing these services from the network where Icinga is
running.
To solve this problem two add-ons have emerged, namely NRPE and NSCA.
NRPE can be thought of as a light-weight remote shell which allows the
execution of a restricted set of commands while supporting some
Nagios™-specific concepts like command timeouts. However unlike with the
design of commonly used protocols like SSH security in NRPE is merely an
afterthought.
In most monitoring setups all NRPE agents share the same secret key
which is embedded into the NRPE binary at compile time. This means that
users can extract this secret key from their NRPE agent binary and use
it to query sensitive monitoring information from other systems running
the same NRPE binary. NSCA has similar problems.
Based on Icinga 2s code for check execution there will be an agent
which can be used on \*NIX as well as on Windows platforms. The agent
will be using the same configuration format like Icinga 2 itself and
will support SSL and IPv4/IPv6 to communicate with Icinga 2.
Business Processes
==================
In most cases users dont care about the availability of individual
services but rather the aggregated state of multiple related services.
For example one might have a database cluster that is used for a web
shop. For an end-user the shop is available as long as at least one of
the database servers is working.
Icinga 1.x does not have any support for business processes out of the
box. There are several add-ons which implement business process support
for Icinga, however none of those are well-integrated into Icinga.
Icinga 2 will have native support for business processes which are built
right into the core and can be configured in a similar manner to
Nagios™-style checks. Users can define their own services based on
business rules which can be used as dependencies for other hosts or
services.
Logging
=======
Icinga 2 supports file-based logged as well as syslog (on \*NIX) and
event log (on Windows). Additionally Icinga 2 supports remote logging to
a central Icinga 2 instance.