[openstack-dev] [FUEL] Zabbix in HA mode

Andrew Woodward xarses at gmail.com
Mon Jan 5 21:05:50 UTC 2015

On Tue, Nov 25, 2014 at 5:21 AM, Bartosz Kupidura
<bkupidura at mirantis.com> wrote:
> Hello All,
> I'm working on a Zabbix implementation which includes HA support.
> Zabbix server should be deployed on all controllers in HA mode.

This needs to be discouraged as much as putting mongo-db on the controllers.

> Currently we have dedicated role 'zabbix-server', which does not support more
> than one zabbix-server. Instead of this we will move monitoring solution (zabbix),
> as an additional component.

No, this must remain a separate role and cannot be forced onto the
controllers; the user should be discouraged from doing this. The
corosync code is quickly becoming granular enough to stand up a CRM
cluster elsewhere.

> We will introduce an additional role 'zabbix-monitoring', assigned to all servers with
> the lowest priority in the serializer (run puppet after all other roles) when zabbix is
> enabled.
> 'Zabbix-monitoring' role will be assigned automatically.

Seems like a good way to handle it, but would it work well for a plugin
that wants to be monitored (since they run after)?
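
To make the ordering concern concrete, here is a rough sketch. It is not
the actual Fuel serializer: the priority numbers, the 'example-plugin'
name, and the idea that plugin tasks are simply appended after all role
tasks are assumptions made for illustration.

```python
# Hypothetical sketch, NOT the real Fuel serializer.
# Priority numbers and 'example-plugin' are made up for illustration.
ROLE_PRIORITY = {
    "controller": 100,
    "compute": 200,
    "zabbix-monitoring": 900,  # lowest priority, so it runs last among roles
}


def deployment_order(roles, plugin_tasks):
    """Order roles by priority, then append plugin tasks afterwards.

    Assumption: plugins run after every role, including the
    monitoring role.
    """
    ordered_roles = sorted(roles, key=lambda role: ROLE_PRIORITY[role])
    return ordered_roles + list(plugin_tasks)


order = deployment_order(
    ["zabbix-monitoring", "compute", "controller"],
    ["example-plugin"],
)
print(order)
```

Under these assumptions 'zabbix-monitoring' is applied before the plugin
exists, so the plugin would need some other hook to register itself
with monitoring.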

> When zabbix component is enabled, we will install zabbix-server on all controllers
> in active-backup mode (pacemaker+haproxy).

Again, this should not be forced onto the controllers; that is very bad.


While there are development use cases for deploying monitoring on
combined controllers, and it can make use of the already existing
pacemaker cluster, this is the wrong direction to point users. There
are many reasons this is bad. For one, monitoring can become quite
loaded, and as we've seen, secondary load on the controllers can
collapse the entire control plane. Secondly, running monitoring on the
cluster may also result in the monitoring going offline if the cluster
does. From my own experience, not being able to see your monitoring is
nearly worse than having everything down, and it costs precious moments
against your downtime SLA.

HA Scaling:

Just like with controllers, our other HA components need to support a
scale of 1 to N. This is important as a cluster will need to scale, or
as the operator moves from POC to Production, they can deploy more
hardware. This also helps alleviate some of the "not enough nodes"
issues already mentioned in the thread.

