[openstack-dev] [Fuel] Stop deployment can break production cluster. How we should avoid it?

Igor Kalnitsky ikalnitsky at mirantis.com
Fri Jan 22 17:09:46 UTC 2016


Dmitry,

> We can mark a cluster 'operational' after successful deployment. And we
> can disable 'stop' button on this kind of clusters.

I think this is the best solution so far. Moreover, I don't know how to
fix it properly, since there are a lot of open questions about how this
button should behave at all.
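
To make the workaround concrete, here is a rough sketch of how I read it.
This is not Nailgun code: the Node/Cluster classes, the 'operational'
flag, the status strings, and the function names are all hypothetical and
only model the behaviour described in this thread (stop moves every
non-ready node back to bootstrap; adding a node reruns tasks on
controllers).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    name: str
    role: str
    status: str  # "ready", "deploying" or "bootstrap" in this sketch


@dataclass
class Cluster:
    nodes: List[Node] = field(default_factory=list)
    # Hypothetical flag, set once a deployment has finished successfully.
    operational: bool = False


class StopNotAllowed(Exception):
    """Raised when stop is requested on an operational cluster."""


def finish_deployment(cluster: Cluster) -> None:
    # A successful run leaves every node ready and marks the cluster operational.
    for node in cluster.nodes:
        node.status = "ready"
    cluster.operational = True


def start_deployment(cluster: Cluster, new_nodes: List[Node]) -> None:
    # Adding nodes reruns tasks on controllers, so they leave the ready state too.
    cluster.nodes.extend(new_nodes)
    for node in cluster.nodes:
        if node.role == "controller" or node in new_nodes:
            node.status = "deploying"


def stop_deployment(cluster: Cluster) -> None:
    # Proposed guard: refuse to stop once the cluster has been deployed.
    if cluster.operational:
        raise StopNotAllowed("stop is disabled on operational clusters")
    # Behaviour described in the thread: non-ready nodes go back to bootstrap.
    for node in cluster.nodes:
        if node.status != "ready":
            node.status = "bootstrap"
```

Without the guard, calling stop_deployment() while a node is being added
would send the controllers back to bootstrap, which is exactly the bug.
With it, stop still works during the very first deployment, when there is
nothing in production to lose.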

Taking all this into account, I propose that we either solve this issue
as a blueprint (so we can think through and cover all the edge cases in
the spec) or drop the stop button functionality altogether.

The latter, perhaps, may be a good solution. I don't know how often
anyone uses Stop deployment.


Bogdan,

> This is the critical issue. The *worst* of possible situations for
> cluster operations. I believe this should be covered by a dedicated
> bulletin issued, the stop action shall be disabled for all releases as
> emergency fix, and fixed by next maintenance updates.

It wasn't always the case. Some time ago we didn't execute any tasks
on controllers when adding new nodes. It has become the case, I assume,
since Fuel 8.0, when we started executing netconfig and other puppet
tasks on each deployment run.

So we need to investigate in which release we introduced re-execution
of some tasks on controllers, and only then think about bulletins.


Thanks,
Igor

On Fri, Jan 22, 2016 at 1:06 PM, Bogdan Dobrelya <bdobrelia at mirantis.com> wrote:
> On 22.01.2016 11:45, Dmitry Pyzhov wrote:
>> Guys,
>>
>> There is a tricky bug with our 'stop deployment'
>> feature: https://bugs.launchpad.net/fuel/+bug/1529691
>>
>> It cannot be fixed easily because it is a design flaw. By design we
>> cannot leave a node in an unpredictable state. So we move all nodes that
>> are not in the ready state back to bootstrap.
>>
>> But when a user adds a node and deploys the cluster, the system reruns
>> puppet on controllers. If the user presses the 'stop' button, the
>> controllers will be erased. The cluster will be destroyed. This is
>> definitely not expected behaviour.
>
> This is the critical issue. The *worst* of possible situations for
> cluster operations. I believe this should be covered by a dedicated
> bulletin issued, the stop action shall be disabled for all releases as
> emergency fix, and fixed by next maintenance updates.
>
>>
>> Taking into account that we are going to rewrite this feature in 9.0 and
>> we are close to HCF there is no value in major changes for this feature
>> in 8.0. Let's do a simple workaround.
>>
>> We can mark a cluster 'operational' after successful deployment. And we
>> can disable 'stop' button on this kind of clusters.
>>
>> Any concerns or other proposals?
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>
>
> --
> Best regards,
> Bogdan Dobrelya,
> Irc #bogdando
>


