[openstack-dev] [tripleo] ironic automated cleaning by default?

Dmitry Tantsur dtantsur at redhat.com
Wed Apr 25 14:55:46 UTC 2018


On 04/25/2018 04:26 PM, James Slagle wrote:
> On Wed, Apr 25, 2018 at 9:14 AM, Dmitry Tantsur <dtantsur at redhat.com> wrote:
>> Hi all,
>>
>> I'd like to restart conversation on enabling node automated cleaning by
>> default for the undercloud. This process wipes partitioning tables
>> (optionally, all the data) from overcloud nodes each time they move to
>> "available" state (i.e. on initial enrolling and after each tear down).
>>
>> We have had it disabled for a few reasons:
>> - it was not possible to skip time-consuming wiping of data from disks
>> - the way our workflows used to work required going between manageable and
>> available states several times
>>
>> However, having cleaning disabled has several issues:
>> - a configdrive left from a previous deployment may confuse cloud-init
>> - a bootable partition left from a previous deployment may take precedence
>> in some BIOSes
>> - a UEFI boot partition left from a previous deployment is likely to
>> confuse UEFI firmware
>> - apparently ceph does not work correctly without cleaning (I'll defer to
>> the storage team to comment)
>>
>> For these reasons we don't recommend having cleaning disabled, and I propose
>> to re-enable it.
>>
>> It has the following drawbacks:
>> - The default workflow will require another node boot, thus becoming several
>> minutes longer (incl. the CI)
>> - It will no longer be possible to easily restore a deleted overcloud node.
> 
> I'm trending towards -1, for the exact reasons you list as
> drawbacks. There has been no shortage of occurrences of users who have
> ended up with accidentally deleted overclouds. These are usually
> caused by user error or unintended/unpredictable Heat operations.
> Until we have a way to guarantee that Heat will never delete a node,
> or Heat is entirely out of the picture for Ironic provisioning, then
> I'd prefer that we didn't enable automated cleaning by default.
> 
> I believe we had done something with policy.json at one time to
> prevent node delete, but I don't recall if that protected from both
> user initiated actions and Heat actions. And even that was not enabled
> by default.
> 
> IMO, we need to keep "safe" defaults. Even if it means manually
> documenting that you should clean to prevent the issues you point out
> above. The alternative is to have no way to recover deleted nodes by
> default.

Well, it's not clear what is "safe" here: protecting people who explicitly delete
their stacks, or protecting people who don't realize that a previous deployment may
screw up their new one in a subtle way.
