[openstack-dev] [Fuel] Wiping node's disks on delete

Alex Schultz aschultz at mirantis.com
Fri Mar 25 14:00:49 UTC 2016


On Fri, Mar 25, 2016 at 7:32 AM, Dmitry Guryanov <dguryanov at mirantis.com>
wrote:

> Here is the bug which I'm trying to fix -
> https://bugs.launchpad.net/fuel/+bug/1538587.
>
> In VMs (set up with fuel-virtualbox) a kernel panic occurs every time you
> delete a node; the stack trace shows an error in the ext4 driver [1],
> the same as in the bug.
>
> Here is a patch - https://review.openstack.org/297669 . I've checked it
> with VirtualBox VMs and it works fine.
>
> I also propose not rebooting nodes in case of a kernel panic, so that we
> can catch possible errors, but maybe that's too dangerous before the release.
>
>
The panic is in there to prevent controllers from staying active with a bad
disk. If the file system on a controller goes read-only, the node stays in
the cluster and causes errors with the OpenStack deployment. The node erase
code tries to disable this prior to erasing the disk, so if it's not working
we need to fix that, not remove it.
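
For reference, a minimal sketch of that intended order (not the actual
erase_node.rb code; it assumes the panic is forced by the ext4
"errors=panic" mount option and that remounting with "errors=continue" is
enough to neutralize it before any wiping starts):

    # Sketch only: detect ext4 panic-on-error and relax it before erasing.
    import subprocess

    def panic_on_error_enabled(mountpoint="/"):
        # /proc/mounts lists the effective mount options per filesystem.
        with open("/proc/mounts") as mounts:
            for line in mounts:
                device, mnt, fstype, opts = line.split()[:4]
                if mnt == mountpoint and "errors=panic" in opts.split(","):
                    return True
        return False

    def relax_panic_on_error(mountpoint="/"):
        # Remount so an I/O error no longer forces an immediate kernel panic.
        subprocess.check_call(
            ["mount", "-o", "remount,errors=continue", mountpoint])

    if panic_on_error_enabled("/"):
        relax_panic_on_error("/")
    # Only after this point should the disk erase begin.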

Thanks,
-Alex


> [1]
> [13607.545119] EXT4-fs error (device dm-0) in
> ext4_reserve_inode_write:4928: IO failure
> [13608.157968] EXT4-fs error (device dm-0) in
> ext4_reserve_inode_write:4928: IO failure
> [13608.780695] EXT4-fs error (device dm-0) in
> ext4_reserve_inode_write:4928: IO failure
> [13609.471245] Aborting journal on device dm-0-8.
> [13609.478549] EXT4-fs error (device dm-0) in ext4_dirty_inode:5047: IO
> failure
> [13610.069244] EXT4-fs error (device dm-0) in ext4_dirty_inode:5047: IO
> failure
> [13610.698915] Kernel panic - not syncing: EXT4-fs (device dm-0): panic
> forced after error
> [13610.698915]
> [13611.060673] CPU: 0 PID: 8676 Comm: systemd-udevd Not tainted
> 3.13.0-83-generic #127-Ubuntu
> [13611.236566] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS
> VirtualBox 12/01/2006
> [13611.887198]  00000000fffffffb ffff88003b6e9a08 ffffffff81725992
> ffffffff81a77878
> [13612.527154]  ffff88003b6e9a80 ffffffff8171e80b ffffffff00000010
> ffff88003b6e9a90
> [13613.037061]  ffff88003b6e9a30 ffff88003b6e9a50 ffff8800367f2ad0
> 0000000000000040
> [13613.717119] Call Trace:
> [13613.927162]  [<ffffffff81725992>] dump_stack+0x45/0x56
> [13614.306858]  [<ffffffff8171e80b>] panic+0xc8/0x1e1
> [13614.767154]  [<ffffffff8125e7c6>] ext4_handle_error.part.187+0xa6/0xb0
> [13615.187201]  [<ffffffff8125eddb>] __ext4_std_error+0x7b/0x100
> [13615.627960]  [<ffffffff81244c64>] ext4_reserve_inode_write+0x44/0xa0
> [13616.007943]  [<ffffffff81247f80>] ? ext4_dirty_inode+0x40/0x60
> [13616.448084]  [<ffffffff81244d04>] ext4_mark_inode_dirty+0x44/0x1f0
> [13616.917611]  [<ffffffff8126f7f9>] ? __ext4_journal_start_sb+0x69/0xe0
> [13617.367730]  [<ffffffff81247f80>] ext4_dirty_inode+0x40/0x60
> [13617.747567]  [<ffffffff811e858a>] __mark_inode_dirty+0x10a/0x2d0
> [13618.088060]  [<ffffffff811d94e1>] update_time+0x81/0xd0
> [13618.467965]  [<ffffffff811d96f0>] file_update_time+0x80/0xd0
> [13618.977649]  [<ffffffff811511f0>] __generic_file_aio_write+0x180/0x3d0
> [13619.467993]  [<ffffffff81151498>] generic_file_aio_write+0x58/0xa0
> [13619.978080]  [<ffffffff8123c712>] ext4_file_write+0xa2/0x3f0
> [13620.467624]  [<ffffffff81158066>] ? free_hot_cold_page_list+0x46/0xa0
> [13621.038045]  [<ffffffff8115d400>] ? release_pages+0x80/0x210
> [13621.408080]  [<ffffffff811bdf5a>] do_sync_write+0x5a/0x90
> [13621.818155]  [<ffffffff810e52f6>] do_acct_process+0x4e6/0x5c0
> [13622.278005]  [<ffffffff810e5a91>] acct_process+0x71/0xa0
> [13622.597617]  [<ffffffff8106a3cf>] do_exit+0x80f/0xa50
> [13622.968015]  [<ffffffff811c041e>] ? ____fput+0xe/0x10
> [13623.337738]  [<ffffffff8106a68f>] do_group_exit+0x3f/0xa0
> [13623.738020]  [<ffffffff8106a704>] SyS_exit_group+0x14/0x20
> [13624.137447]  [<ffffffff8173659d>] system_call_fastpath+0x1a/0x1f
> [13624.518044] Rebooting in 10 seconds..
>
> On Tue, Mar 22, 2016 at 1:07 PM, Dmitry Guryanov <dguryanov at mirantis.com>
> wrote:
>
>> Hello,
>>
>> Here is the start of the discussion -
>> http://lists.openstack.org/pipermail/openstack-dev/2015-December/083021.html
>> . I subscribed to this mailing list later, so I can't reply in that thread.
>>
>> Currently we clear a node's disks in two places: first, before rebooting
>> into the bootstrap image [0], and second, just before provisioning in
>> fuel-agent [1].
>>
>> Erasing the first megabyte of disk data is meant to solve two problems:
>> the node should not boot from HDD after the reboot, and the new
>> partitioning scheme should overwrite the previous one.
>>
>> The first problem can be solved by zeroing the first 512 bytes of each
>> disk (not each partition). Only 446 bytes, to be precise, because the last
>> 66 bytes hold the partition table, see
>> https://wiki.archlinux.org/index.php/Master_Boot_Record .
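>>
>> As an illustration, a minimal sketch of that (plain Python, hypothetical
>> device names, not the actual Fuel code) could look like:
>>
>>     # Zero only the 446-byte boot code area of the MBR so the node stops
>>     # booting from HDD; bytes 446-511 (the partition table) stay intact.
>>     import os
>>
>>     BOOT_CODE_SIZE = 446
>>
>>     def clear_boot_code(disk_path):
>>         with open(disk_path, "r+b") as dev:
>>             dev.write(b"\x00" * BOOT_CODE_SIZE)
>>             dev.flush()
>>             os.fsync(dev.fileno())
>>
>>     for disk in ("/dev/sda", "/dev/sdb"):  # hypothetical disk list
>>         clear_boot_code(disk)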
>>
>> The second problem should be solved only after the reboot into bootstrap,
>> because if we bring a new node into the cluster from somewhere else and
>> boot it with the bootstrap image, it may have disks with existing
>> partitions, md devices and LVM volumes. All of these should be correctly
>> cleared before provisioning, not before the reboot, and fuel-agent already
>> does this in [1].
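>>
>> A rough sketch of that kind of cleanup (standard shell tools driven from
>> Python, hypothetical device names; the real logic is in fuel-agent [1]):
>>
>>     # Deactivate LVM and md devices first, then wipe the remaining
>>     # filesystem / RAID / LVM signatures from the whole disk.
>>     import subprocess
>>
>>     def clean_disk(disk_path):
>>         subprocess.call(["vgchange", "-an"])            # deactivate LVM VGs
>>         subprocess.call(["mdadm", "--stop", "--scan"])  # stop all md arrays
>>         subprocess.check_call(["wipefs", "-a", disk_path])
>>
>>     for disk in ("/dev/sda", "/dev/sdb"):  # hypothetical disk list
>>         clean_disk(disk)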
>>
>> I propose removing the erasure of the first 1 MB of each partition,
>> because it can lead to errors in FS kernel drivers and a kernel panic. The
>> existing workaround - rebooting in case of a kernel panic - is bad, because
>> the panic may occur just after clearing the first partition of the first
>> disk, and after the reboot the BIOS will read the MBR of the second disk
>> and boot from it instead of booting from the network. Let's just clear the
>> first 446 bytes of each disk.
>>
>>
>> [0]
>> https://github.com/openstack/fuel-astute/blob/master/mcagents/erase_node.rb#L162-L174
>> [1]
>> https://github.com/openstack/fuel-agent/blob/master/fuel_agent/manager.py#L194-L221
>>
>>
>> --
>> Dmitry Guryanov
>>
>
>
>
> --
> Dmitry Guryanov
>