<?xml version="1.0" encoding="utf-8"?>
<chapter>
<title>Pandora FMS advanced section</title>
<sect1><title>Pandora FMS High Availability features</title>
<para>
You can set up Pandora FMS for high availability (HA) in several scenarios:
<itemizedlist mark='bullet'>
<listitem>
<para>
<emphasis>Database Clustering for HA</emphasis>. You need to
set up a MySQL5 Cluster. The support forums and wiki have
information on how to do this; you only need to convert the DB
schema to MySQL Cluster compatible tables. This scenario has been
tested and works fine, but you need some advanced knowledge
of MySQL Cluster administration.
</para>
</listitem>
<listitem>
<para>
<emphasis>Multiple Pandora Consoles</emphasis>. This is easy:
you only need to set up another one. No locking problems or
incompatibilities have been detected in several Pandora FMS
deployments.
</para>
</listitem>
<listitem>
<para>
<emphasis>Multiple Pandora Data Servers for HA</emphasis>.
This is the most complex scenario: it needs no special Pandora
Server setup, but you need another tool to implement
network HA, such as VRRP or Keepalive. For Pandora Data Server
you need to set up two identical machines, with the same
public keys for all agents connecting to the server (and a
duplicated server SSH host key), and configure network HA to
point to one of them. If that one fails, VRRP or Keepalive
"promotes" the other server, and Pandora Agents will connect to
it for the next data packets. Nothing needs to change on either
Pandora Data Server; you only need to ensure that the
Pandora Server name is the same on both machines (in the Pandora
server setup, not in the system hostname).
</para>
</listitem>
<listitem>
<para>
<emphasis>Multiple Pandora Network Servers for HA</emphasis>.
This is easier. Set up multiple network servers on several
machines across your network (or all of them in the same
segment) and assign the modules to one of those servers. If that
server fails and other active network servers are marked as
"primary", the first available primary server automatically
launches the network module query. If several servers are
marked as "primary", any of them may launch the query.
</para>
</listitem>
<listitem>
<para>
<emphasis>Multiple Pandora Network Servers for load
balancing</emphasis>. Set up multiple network servers on
several machines across your network (or all of them in the
same segment) and assign agents and modules to different
servers, balancing the load between all available servers.
</para>
</listitem>
</itemizedlist>
</para>
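<para>
The failover rule for network servers described above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions, not Pandora's actual code: the
<literal>NetworkServer</literal> class, the
<literal>pick_server</literal> function and the server names are
hypothetical, and "first available" is taken to mean first in list
order.
</para>
<programlisting language="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass
class NetworkServer:
    """Hypothetical record for one network server (not a Pandora type)."""
    name: str
    primary: bool  # marked as "primary" in the server setup
    alive: bool    # True while the server is reporting as active


def pick_server(assigned, servers):
    """Return the name of the server that should run a module's query.

    The assigned server is used while it is alive; otherwise the first
    live server marked as primary takes over. Returns None if no server
    can take the query.
    """
    by_name = {s.name: s for s in servers}
    owner = by_name.get(assigned)
    if owner is not None and owner.alive:
        return owner.name
    for server in servers:
        if server.primary and server.alive:
            return server.name
    return None


servers = [
    NetworkServer("net-a", primary=False, alive=False),
    NetworkServer("net-b", primary=True, alive=True),
    NetworkServer("net-c", primary=True, alive=True),
]
print(pick_server("net-a", servers))  # net-a is down, so net-b takes over
servers[0].alive = True
print(pick_server("net-a", servers))  # net-a is back and keeps its module
```
]]></programlisting>
<para>
In this sketch the module returns to its assigned server as soon as
that server is alive again; whether a real deployment behaves the same
way should be checked against the server documentation.
</para>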
</sect1>
<sect1><title>Pandora virtual servers</title>
<para>
A special case for adding processing power is to run "virtual"
servers. A virtual server is another instance of the same server on
the same machine; use one when Pandora Server cannot process all
incoming information without too much delay. Pandora 1.2 uses a
limited number of threads to process information (this will change
in future versions), so you can install another instance of Pandora
Network or Pandora Data server (with a different data_in directory!)
to process more information on the same machine.
</para>
</sect1>
<sect1><title>Pandora Database design (and redesign from 1.1)</title>
<para>
The first Pandora versions, from 0.83 to 1.1, were based on a simple
idea: one piece of data, one database insertion. This was very easy
to develop and made searches, insertions and other operations easy
to program.
</para>
<para>
This had many advantages and one big problem: scalability. The
system had a limit on the number of modules it could handle
comfortably; beyond that number, management became too slow.
</para>
<para>
Solutions based on MySQL Cluster were difficult to deploy, came with
problems of their own, and did not offer a long-term solution
either.
</para>
<para>
Data compression based on interpolation, together with data purging,
made the database smaller, but this was not enough. Production
systems had a limit of about 100 agents with about ten modules each,
which is too low for large environments.
</para>
<para>
This problem was very important for Pandora's future, so we
changed the way Pandora stores its data. The new data management
system stores only "new" data: if a duplicate value enters the
system, it is not stored in the database. This is very useful for
keeping the database small, and it works for all Pandora data module
types: numerical, incremental, boolean and string.
</para>
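<para>
The idea of storing only "new" data can be sketched as follows. This
is a minimal illustration, not the Pandora implementation: the
<literal>ModuleStore</literal> class, its <literal>insert</literal>
method and the <literal>cpu_load</literal> module name are
hypothetical, and an in-memory dictionary stands in for the database.
A sample is assumed to be a duplicate when it equals the last value
stored for the same module.
</para>
<programlisting language="python"><![CDATA[
```python
from dataclasses import dataclass, field


@dataclass
class ModuleStore:
    """Keeps only the samples whose value differs from the last stored one."""

    # module id -> list of (timestamp, value) rows that were actually written
    rows: dict = field(default_factory=dict)

    def insert(self, module_id, timestamp, value):
        """Store the sample only if it changes the module's last stored value.

        Returns True when a row was written, False when it was skipped.
        """
        history = self.rows.setdefault(module_id, [])
        if history and history[-1][1] == value:
            return False  # duplicate of the last stored value: nothing to write
        history.append((timestamp, value))
        return True


store = ModuleStore()
# Six samples, one every 180 seconds; only three of them change the value.
samples = [(0, 10), (180, 10), (360, 10), (540, 12), (720, 12), (900, 10)]
written = [store.insert("cpu_load", ts, value) for ts, value in samples]
print(written)                 # [True, False, False, True, False, True]
print(store.rows["cpu_load"])  # [(0, 10), (540, 12), (900, 10)]
```
]]></programlisting>
<para>
Here six incoming samples produce three stored rows. The saving in a
real system depends on how often the module values actually change.
</para>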
<para>
Storing only new data solves part of the scalability problem by
reducing database usage considerably, by about 40%-70%. We also have
another answer to scalability problems: total segregation of
Pandora components and a built-in method to implement High
Availability solutions on them. You may have many Pandora
servers (network, data or SNMP), Pandora Consoles, and Pandora
Database (in a MySQL5 Cluster setup).
</para>
<para>
These changes affect how data is read. In the new version, if an
agent cannot communicate with Pandora and Pandora Server receives no
data from it, this "no data" has no graphical representation: the
module graph shows no changes, only a perfectly horizontal line.
When Pandora receives no new values, it assumes that nothing has
changed and that everything is as it was at the last notification.
</para>
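<para>
The reading side can be sketched in the same spirit. The hypothetical
<literal>expand_series</literal> function below rebuilds a
fixed-interval series from rows that record only value changes,
repeating the last stored value at every point. It is an assumption
about how such a graph could be drawn, not a description of the
Pandora Console code.
</para>
<programlisting language="python"><![CDATA[
```python
def expand_series(stored, start, end, step):
    """Rebuild a fixed-interval series from change-only rows.

    `stored` is a time-ordered list of (timestamp, value) pairs. Each
    output point repeats the most recent stored value at or before its
    timestamp, or None if nothing has been stored yet.
    """
    series = []
    index = 0
    last_value = None
    for ts in range(start, end + 1, step):
        # Consume every stored row that is not newer than this point.
        while index < len(stored) and stored[index][0] <= ts:
            last_value = stored[index][1]
            index += 1
        series.append((ts, last_value))
    return series


# Rows stored for a module whose value changed at 540 s and again at 900 s.
changing = expand_series([(0, 10), (540, 12), (900, 10)], 0, 900, 180)
print(changing)

# Only one row stored: the value never changed, or the agent went silent.
flat = expand_series([(0, 10)], 0, 540, 180)
print(flat)  # [(0, 10), (180, 10), (360, 10), (540, 10)]
```
]]></programlisting>
<para>
The second call shows why a silent agent and an unchanged value look
the same in a module graph: both produce a flat line.
</para>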
<para>
This graph, for example, shows the changes in data received every
180 seconds.
<graphic fileref="images/module_graph_full.jpg" scale="60" align="center"/>
This would be the equivalent graph for the same data, except for a
connection failure from approximately 05:55 to 15:29.
<graphic fileref="images/module_graph_peak.jpg" scale="60" align="center"/>
</para>
<para>
In Pandora 1.2 we introduce a new general agent graph to show
connectivity. It reflects accesses from modules to this agent. This
graph complements all other graphs by showing when the agent is
active and data is being received. This is an example of an agent
connecting regularly to the server:
<graphic fileref="images/access_graph_full.jpg" scale="65" align="center"/>
If this graph shows dips or gaps, there may be problems or slow
connections between the Pandora Agent and the Pandora Server. For
the previous example, this graph would look similar to this:
<graphic fileref="images/access_graph_peak.jpg" scale="65" align="center"/>
</para>
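<para>
A connectivity graph of this kind can be approximated by counting
agent contacts per time window. The sketch below is illustrative
only: the <literal>contacts_per_bucket</literal> function, the
one-hour window and the synthetic timestamps are assumptions, not
values taken from Pandora.
</para>
<programlisting language="python"><![CDATA[
```python
from collections import Counter


def contacts_per_bucket(contact_times, start, end, bucket):
    """Count agent contacts in each `bucket`-second window of [start, end)."""
    counts = Counter(
        (ts - start) // bucket for ts in contact_times if start <= ts < end
    )
    n_buckets = (end - start + bucket - 1) // bucket
    return [counts.get(i, 0) for i in range(n_buckets)]


# One contact every 180 s for four hours, with nothing received
# during the third hour (7200 s to 10800 s).
contacts = [ts for ts in range(0, 4 * 3600, 180) if not 7200 <= ts < 10800]
hourly = contacts_per_bucket(contacts, start=0, end=4 * 3600, bucket=3600)
print(hourly)  # [20, 20, 0, 20]
```
]]></programlisting>
<para>
The zero in the third window is the kind of gap that points to a
connectivity problem between agent and server.
</para>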
</sect1>
<sect1><title>Programmer's guide to Pandora architecture</title>
<para>
<graphic fileref="images/Pandora_NetworkServer_Diagram.png" scale="65" align="center"/>
<graphic fileref="images/Pandora_DataServer_Diagram.png" scale="65" align="center"/>
<graphic fileref="images/Pandora_SNMP_Diagram.png" scale="55" align="center"/>
<graphic fileref="images/pandora_ER.png" scale="50" align="center"/>
</para>
</sect1>
</chapter>