[openstack-dev] [Fuel] Wipe of the nodes' disks
Andrew Woodward
xarses at gmail.com
Thu Jan 7 16:07:44 UTC 2016
On Tue, Dec 29, 2015 at 5:35 AM Sergii Golovatiuk <sgolovatiuk at mirantis.com>
wrote:
> Hi,
>
> Let me comment inline.
>
>
> On Mon, Dec 28, 2015 at 7:06 PM, Andrew Woodward <xarses at gmail.com> wrote:
>
>> In order to ensure that LVM can be configured as desired, it's necessary
>> to purge the existing LVM metadata and then reboot the node; otherwise
>> the partitioning commands will most likely fail on the next attempt,
>> because the old volumes will be initialized again before we can start
>> partitioning the node. Hence, when a node is removed from the
>> environment, it is supposed to have this data destroyed. Since it's a
>> running system, the most effective way was to blast the first 1 MB of
>> each partition (without many more reboots).
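For reference, a rough sketch of what "blast the first 1 MB of each
partition" amounts to (Python, illustrative only -- this is not the actual
erase_node agent, and the device enumeration is deliberately simplified):

import glob
import re

MIB = 1024 * 1024

def wipe_partition_heads(dry_run=True):
    # Enumerate partition device nodes such as /dev/sda1 or /dev/vdb2.
    # (Simplified: real code would read /proc/partitions and handle
    # cciss/nvme naming, md/dm devices, etc.)
    for part in sorted(glob.glob('/dev/sd*') + glob.glob('/dev/vd*')):
        if not re.search(r'\d+$', part):
            continue  # whole-disk device, skip; only partitions are touched
        if dry_run:
            print('would zero the first 1 MiB of %s' % part)
            continue
        with open(part, 'r+b') as dev:
            dev.write(b'\x00' * MIB)  # destroys FS/LVM signatures at the head

if __name__ == '__main__':
    wipe_partition_heads(dry_run=True)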
>>
>> As to the fallback to SSH, there are two places we use this process: the
>> node reboot (after Cobbler/IBP finishes) and the wipe we are discussing
>> here. Both exist for the odd occasions when a node fails to restart after
>> the MCO command. I don't think anyone has had much success figuring out
>> why this occurs, but I've seen nodes get stuck in provisioning and
>> removal in multiple 6.1 environments where the SSH fallback had been
>> broken. It would happen on roughly 1 in 20 nodes, seemingly at random.
>> With the SSH fallback in place I almost never see a node fail to reboot.
>>
>
> If we are talking about the 6.1-7.0 releases, there shouldn't be any
> problems with the mco reboot. The SSH fallback should be deprecated
> entirely.
>
As I noted, I've seen several 6.1 deployments where it was needed, so I'd
consider it still very much in use. It may also be the only way to deal with
a node whose MCO agent is dead, so IMO the fallback should be kept.
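To make the pattern concrete, here is a rough sketch of the fallback being
argued for (Python, illustrative only; the real Astute code drives
MCollective from Ruby, and the reboot command shown here is a placeholder):

import subprocess

def reboot_node(node):
    """Try to reach the node's MCO agent first; fall back to SSH if the
    agent is dead or the call times out."""
    try:
        # 'mco rpc rpcutil ping' is only a generic liveness check; the
        # actual reboot would go through the appropriate agent action.
        subprocess.check_call(
            ['mco', 'rpc', 'rpcutil', 'ping', '-I', node],
            timeout=30)
        return 'mco'
    except (OSError, subprocess.CalledProcessError,
            subprocess.TimeoutExpired):
        # MCO unreachable -- last resort: plain SSH as root.
        subprocess.check_call(['ssh', 'root@%s' % node, 'reboot'])
        return 'ssh'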
>> On Thu, Dec 24, 2015 at 6:28 AM Alex Schultz <aschultz at mirantis.com>
>> wrote:
>>
>>> On Thu, Dec 24, 2015 at 1:29 AM, Artur Svechnikov
>>> <asvechnikov at mirantis.com> wrote:
>>> > Hi,
>>> > We have faced an issue where nodes' disks are wiped after a stopped
>>> > deployment. It happens because of the node-removal logic (this is old
>>> > logic and, as I understand it, no longer relevant). That logic
>>> > contains a step which calls erase_node [0], and there is also another
>>> > method that wipes the disks [1]. AFAIK it was needed for a smooth
>>> > Cobbler provision and to ensure that nodes would not be booted from
>>> > disk when they shouldn't be. Instead of Cobbler we now use IBP from
>>> > fuel-agent, where the current partition table is wiped before the
>>> > provisioning stage. Using disk wiping as insurance that nodes will not
>>> > boot from disk doesn't seem like a good solution. I want to propose
>>> > not wiping the disks and simply unsetting the bootable flag on the
>>> > node disks.
>>>
>>
> Disks must be wiped, as the boot flag doesn't guarantee anything. If the
> boot flag is not set, the BIOS will ignore the device in the boot order;
> moreover, two partitions may carry the boot flag, or the operator may have
> configured the BIOS to skip the boot order.
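For illustration, here is a sketch of what "unset the bootable flag"
actually touches on an MBR-partitioned disk: one status byte per partition
entry, while the 446 bytes of boot code and all partition contents stay in
place (and GPT disks don't use this flag at all). Python, dry-run by
default, not the proposed implementation:

MBR_SIZE = 512
PT_OFFSET = 446    # partition table begins after the boot code
ENTRY_SIZE = 16
BOOTABLE = 0x80    # status byte value marking an entry as active/bootable

def clear_boot_flags(disk, dry_run=True):
    with open(disk, 'r+b') as dev:
        mbr = bytearray(dev.read(MBR_SIZE))
        if mbr[510:512] != b'\x55\xaa':
            print('%s: no MBR signature found' % disk)
            return
        for i in range(4):
            off = PT_OFFSET + i * ENTRY_SIZE
            if mbr[off] == BOOTABLE:
                print('%s: partition entry %d is marked bootable' % (disk, i))
                mbr[off] = 0x00
        if not dry_run:
            dev.seek(0)
            dev.write(bytes(mbr))  # boot code in bytes 0..445 is untouched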
>
> >
>>> > Please share your thoughts. Perhaps some other components rely on the
>>> > fact that disks are wiped after node removal or a stopped deployment.
>>> > If so, please let us know.
>>> >
>>> > [0]
>>> >
>>> https://github.com/openstack/fuel-astute/blob/master/lib/astute/nodes_remover.rb#L132-L137
>>> > [1]
>>> >
>>> https://github.com/openstack/fuel-astute/blob/master/lib/astute/ssh_actions/ssh_erase_nodes.rb
>>> >
>>>
>>> I thought the erase_node [0] mcollective action was the process that
>>> cleared a node's disks after its removal from an environment. When do
>>> we use ssh_erase_nodes? Is it a fallback mechanism for when mcollective
>>> fails? My understanding of the history is that we needed the partitions
>>> and data wiped so that LVM groups and other partition information would
>>> not interfere with the installation process the next time the node is
>>> provisioned. That might have been a side effect of Cobbler, and we
>>> should test whether it's still an issue for IBP.
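One way to run that test: LVM stamps a physical-volume label containing the
string "LABELONE" into one of the first four 512-byte sectors of a device,
so checking for leftovers after an IBP reprovision is straightforward
(sketch only; the device list below is just an example):

def has_stale_pv_label(device):
    # LVM writes its PV label into one of the first four 512-byte
    # sectors of the device.
    with open(device, 'rb') as dev:
        return b'LABELONE' in dev.read(4 * 512)

if __name__ == '__main__':
    for dev in ('/dev/sda', '/dev/sda1'):   # example devices only
        try:
            print('%s: stale LVM PV label: %s'
                  % (dev, has_stale_pv_label(dev)))
        except IOError as exc:
            print('%s: skipped (%s)' % (dev, exc))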
>>>
>>
> Since we no longer use classical provisioning, we have an mco connection
> all the time. During IBP it is available as part of the bootstrap, and
> after reboot mco is still present, so all actions should be done via mco.
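Agreed that the wipe itself should stay on the MCO path. Something along
these lines could drive it (Python wrapper around the mco CLI, purely
illustrative -- the action and argument names are assumptions; the real
interface is defined in mcagents/erase_node.rb and is driven from Astute's
Ruby code):

import subprocess

def erase_via_mco(node):
    # Assumed invocation of the erase_node agent; check
    # mcagents/erase_node.rb for the actual action and argument names.
    subprocess.check_call(
        ['mco', 'rpc', 'erase_node', 'erase_node',
         'reboot=true', '-I', node])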
>
>
>>
>>>
>>> Thanks,
>>> -Alex
>>>
>>> [0]
>>> https://github.com/openstack/fuel-astute/blob/master/mcagents/erase_node.rb
>>>
>>> > Best regards,
>>> > Svechnikov Artur
>>> >
--
Andrew Woodward
Mirantis
Fuel Community Ambassador
Ceph Community