[openstack-dev] [fuel] What to do when a controller runs out of space

Alex Schultz aschultz at mirantis.com
Mon Oct 5 13:03:51 UTC 2015


On Mon, Oct 5, 2015 at 5:56 AM, Eugene Nikanorov
<enikanorov at mirantis.com> wrote:
> Ok,
>
> Project-wise:
> 1) Pacemaker is not under our company's control, so we can't assure its quality
> 2) It has a terrible UX
> 3) It is not reliable
>

I disagree with #1, as I do not think that should be a criterion for an
open-source project.  Considering Pacemaker is at the core of our
controller setup, I would argue that if these are in fact true we need
to be using something else.  I would agree that it has a terrible UX,
but all the clustering software I've used falls into this category.  I'd
like more information on how it is not reliable. Do we have numbers to
back up these claims?

> (3) is not an evaluation of the project itself, but just a logical consequence
> of (1) and (2).
> As part of the escalation team, I can say that it has cost our team thousands
> of man-hours of head-scratching, staring at Pacemaker logs whose value is
> usually slightly below zero.
>
> Most OpenStack services (in fact, ALL API servers) are stateless; they
> don't require any cluster management (also, they don't need to be moved in
> case of lack of space).
> Stateful services like neutron agents have state that is a function of the
> DB state, and they are able to synchronize it with the server without external
> "help".
>

So it's not an issue with moving services so much as being able to
stop the services when a condition is met. Have we tested all OS
services to ensure they function 100% when out of disk space?  I
would assume that Glance might have issues with image uploads if there
is no space to handle a request.
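
As a rough illustration of the kind of condition check I mean, here is a minimal Python sketch, assuming a fixed free-space threshold per filesystem. `low_on_space`, the 1 GiB threshold and the example mount points are hypothetical, not something Fuel or Pacemaker ships:

```python
import os
import shutil

# Hypothetical threshold; a real value would need tuning per deployment.
MIN_FREE_BYTES = 1 * 1024 ** 3  # 1 GiB


def low_on_space(path: str, min_free_bytes: int = MIN_FREE_BYTES) -> bool:
    """Return True if the filesystem holding `path` has less than
    `min_free_bytes` available."""
    usage = shutil.disk_usage(path)
    return usage.free < min_free_bytes


if __name__ == "__main__":
    # Example mount points (assumed); skip any that don't exist on this host.
    for mount in ("/", "/var/log", "/var/lib/mysql"):
        if os.path.exists(mount) and low_on_space(mount):
            print(f"{mount}: below free-space threshold, services should be stopped")
```

Whatever acts on the result (Pacemaker, a monitoring agent, or a cron job) is a separate decision; this only covers detecting the condition.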

> So now the use of Pacemaker can only be justified for cases where a service's
> clustering mechanism requires active monitoring (RabbitMQ, Galera).
> But even there, examples where we are better off without Pacemaker are all
> around.
>
> Thanks,
> Eugene.
>

After I sent this email, I had further discussions about the issues
that I'm facing, and they may not be completely related to disk space. I
think we might be relying on the expectation that the local RabbitMQ
is always available, but I need to look into that. Either way, I
believe we should still continue to discuss this issue, as we are
managing services in multiple ways on a single host. Additionally, I do
not believe that we really perform quality health checks on our
services.
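
For comparison, a minimal sketch of a very shallow check on the local RabbitMQ dependency, assuming the broker listens on the default AMQP port 5672 on localhost. `amqp_port_open` is a hypothetical helper; a successful TCP connect only proves the port accepts connections, which is exactly the gap a quality health check would need to close:

```python
import socket

# Assumption: RabbitMQ listens on the default AMQP port on the local host.
RABBITMQ_HOST = "127.0.0.1"
RABBITMQ_PORT = 5672


def amqp_port_open(host: str = RABBITMQ_HOST, port: int = RABBITMQ_PORT,
                   timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`.

    This is a shallow liveness probe: it does not speak AMQP, so a broker
    that accepts connections but cannot serve requests would still pass."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, unreachable host and timeouts.
        return False


if __name__ == "__main__":
    status = "reachable" if amqp_port_open() else "NOT reachable"
    print(f"local rabbitmq {status} on {RABBITMQ_HOST}:{RABBITMQ_PORT}")
```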

Thanks,
-Alex


>
> On Mon, Oct 5, 2015 at 1:34 PM, Sergey Vasilenko <svasilenko at mirantis.com>
> wrote:
>>
>>
>> On Mon, Oct 5, 2015 at 12:22 PM, Eugene Nikanorov
>> <enikanorov at mirantis.com> wrote:
>>>
>>> No Pacemaker for OS services, please.
>>> We'll be moving neutron agents out of Pacemaker control in 8.0; other
>>> OS services don't need it either.
>>
>>
>> Could you please provide your arguments?
>>
>>
>> /sv
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
