[nova][os-brick] iSCSI multipath oddness during hard reboot

Tony Pearce tonyppe at gmail.com
Thu Oct 29 01:44:56 UTC 2020


On Wed, 28 Oct 2020 at 18:06, Lee Yarwood <lyarwood at redhat.com> wrote:

> On 28-10-20 17:35:59, Tony Pearce wrote:
> > Grant,
> >
> > As a guess I am suspecting your "fail_if_no_path" might be the issue
> > but I am not sure on the inner workings or mechanism at play during
> > the reboot or why it's getting stuck here for you. Your storage vendor
> > may have documentation to state what the multipath (and iscsid) config
> > should be from your host. Before changing config though I recommend
> > getting the root cause realised. /var/log/messages log could help.
>
> Did you mean queue_if_no_path?


Yes indeed, my apologies. I no longer have "queue_if_no_path" in my config.
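
For anyone following along, the usual way to bound that behaviour is
no_path_retry in multipath.conf rather than the bare queue_if_no_path
feature. An illustrative fragment only (the value is arbitrary; the right
setting should come from your storage vendor's documentation):

    defaults {
        # retry failed paths for 12 checker intervals, then fail I/O
        # back to the caller instead of queueing it forever
        # (which is what queue_if_no_path does)
        no_path_retry 12
    }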

Best regards,

Tony Pearce



On Wed, 28 Oct 2020 at 18:06, Lee Yarwood <lyarwood at redhat.com> wrote:

>
> > Also if you go into the multipath CLI "multipathd -k" and issue "show
> > config" you may see a "NETAPP" config there already. Depending on the IDs
> > your storage may be matching that rather than the default config within
> > multipath.conf FYI.
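
For example, something along these lines (untested; the grep context is
approximate) will show whether a built-in NETAPP device entry is what the
LUNs are actually matching:

    $ sudo multipathd show config | grep -A 10 'NETAPP'

If it is, any overrides need to go into a device { } section with the same
vendor/product strings; settings placed only in the defaults section may not
override a matching device entry.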
>
> So Nova will ask os-brick to try to disconnect volumes during a hard
> reboot of an instance and I suspect this is where things are getting
> stuck in your env if you're using queue_if_no_path.
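
If that is what is happening, a map that has lost all of its paths and is
still queueing can usually be un-wedged by hand, roughly like this (the map
name is a placeholder, take it from multipath -ll, and note this fails any
queued I/O):

    $ sudo multipath -ll | grep -B 3 queue_if_no_path    # find maps still queueing
    $ sudo dmsetup message <map-name> 0 fail_if_no_path  # stop queueing on that map
    $ sudo multipath -f <map-name>                       # then try to flush it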
>
> Assuming you're using the libvirt virt driver has the underlying domain
> for the instance been destroyed already?
>
> $ sudo virsh dominfo $instance_uuid
>
> If it has been then we might be able to cleanup the volume manually.
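
For reference, the manual side of that usually looks something like the
following sketch (target, portal and map names are placeholders; confirm them
against the session list first so a healthy volume isn't logged out by
mistake):

    $ sudo iscsiadm -m session                                    # list sessions and their sids
    $ sudo multipath -ll                                          # find the map backing the stale LUN
    $ sudo multipath -f <map-name>                                # flush that multipath map
    $ sudo iscsiadm -m node -T <target-iqn> -p <portal>:3260 -u   # log out of the stale session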
>
> Either way it might be useful to raise a bug for this against Nova and
> os-brick so we can take a look at the attempt to hard reboot in more
> detail.
>
> https://launchpad.net/nova/+filebug
>
> ^ Please use the template underneath the further information textbox once
> you've provided a title and if possible include the additional output
> somewhere for review.
>
> $ openstack server event list $instance_uuid
>
> ^ This will provide a list of actions and their associated request-ids.
> Using the request-id associated with the failing hard reboot can you
> then provide logs from the compute.
>
> $ zgrep -l $request-id /var/log/nova/*
>
> ^ Obviously this depends on how logging is enabled in your env but you
> hopefully get the idea.
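
Concretely, for one of the instances from the log excerpt further down, that
would look something like this (the request-id is a placeholder; use the one
shown in the event list output):

    $ openstack server event list c8079e85-4777-4615-9d5a-3d1151e11984
    $ zgrep -l 'req-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx' /var/log/nova/*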
>
> > On Wed, 28 Oct 2020 at 15:56, Grant Morley <grant at civo.com> wrote:
> >
> > > Hi Tony,
> > >
> > > We are using NetApp SolidFire for our storage. Instances seem to be
> > > in a normal "working" state before we try and reboot them.
> > >
> > > I haven't looked into `/usr/bin/rescan-scsi-bus.sh` but I will now so
> > > thanks for that.
> > >
> > > We are using multipath but kept it on the defaults so it looks like
> > > only 1 path is being used.
> > >
> > > I had a feeling it was down to heavily loaded compute causing the
> > > issue.
> > >
> > > The config for iscsi is also the defaults that openstack Ansible
> > > deployed.
> > >
> > > Thanks for your help.
> > >
> > > Grant
> > > On 28/10/2020 02:25, Tony Pearce wrote:
> > >
> > > Hi Grant, what storage are you using here? Is the instance in an
> > > apparently "working" state before you try and reboot it?
> > >
> > > Have you looked into `/usr/bin/rescan-scsi-bus.sh` ? Please see this
> > > reference link in the first instance: [1] "When ‘rescan-scsi-bus.sh -i’
> > > is run, script execute as well a LIP_RESET (ISSUE_LIP) which may cause
> > > a disruption in I/O on the server and even cause an outage in case of
> > > a system running on heavy load."
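
In other words, on a loaded host the safer invocation is the plain scan,
keeping -i (ISSUE_LIP) for a maintenance window. A quick sketch, assuming
rescan-scsi-bus.sh from sg3_utils:

    $ sudo rescan-scsi-bus.sh        # scan for new LUNs only, no LIP reset
    $ sudo rescan-scsi-bus.sh -r     # additionally remove devices that have disappeared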
> > >
> > > Are you using multipath? Some helpful commands:
> > >
> > > `tail -f /var/log/messages | grep multipath`
> > >
> > > `multipathd -k` = will go into multipath cli. Then while in the cli:
> > > show config
> > > show paths
> > >
> > > If the cli is accessible then you're likely using multipath even if 1
> > > path. Then the multipath.conf is taking effect even if it's a default
> > > config.
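
(On reasonably recent multipath-tools the same information is available
without the interactive shell, which is handy for attaching to a bug report,
e.g.:

    $ sudo multipathd show config
    $ sudo multipathd show paths
    $ sudo multipath -ll             # per-map topology, one line per path
)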
> > >
> > > Config files relating to iscsi storage:
> > > /etc/iscsi/iscsid.conf
> > > /etc/multipath/multipath.conf
> > >
> > > [1]
> > > https://www.thegeekdiary.com/when-to-use-rescan-scsi-bus-sh-i-lip-flag-in-centos-rhel/
> > >
> > > Regards,
> > >
> > > Tony Pearce
> > >
> > >
> > >
> > > On Wed, 28 Oct 2020 at 03:39, Grant Morley <grant at civo.com> wrote:
> > >
> > >> Hi all,
> > >>
> > >> We are seeing some oddness on a couple of our compute hosts that
> > >> seems to be related to iSCSI. On a couple of our hosts I am seeing
> > >> this error in the nova compute logs:
> > >>
> > >> 2020-10-27 18:56:14.814 31490 WARNING os_brick.initiator.connectors.iscsi
> > >> [req-8613ae69-1661-49cf-8bdc-6fec875d01ba - - - - -] Couldn't find iscsi
> > >> sessions because iscsiadm err: iscsiadm: could not read session targetname: 5
> > >> iscsiadm: could not find session info for session1707
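
A few things worth capturing on the affected host when that error appears (a
sketch only; the session id 1707 is taken from the error above):

    $ sudo iscsiadm -m session -P 3               # full session/connection/device detail
    $ ls /sys/class/iscsi_session/                # does session1707 still exist in sysfs?
    $ sudo iscsiadm -m session -r 1707 --rescan   # rescan just that session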
> > >>
> > >> That seems to also stop any instance on the compute host from being
> > >> able to reboot.  Reboots seem to get accepted but the instance never
> > >> completes and gets stuck in the reboot state:
> > >>
> > >> 2020-10-27 19:11:58.891 48612 INFO nova.compute.manager [-] [instance:
> > >> c8079e85-4777-4615-9d5a-3d1151e11984] During sync_power_state the instance
> > >> has a pending task (reboot_started_hard). Skip.
> > >> 2020-10-27 19:11:58.891 48612 INFO nova.compute.manager [-] [instance:
> > >> 31128f26-910d-411f-98e0-c95dd36f4f0f] During sync_power_state the instance
> > >> has a pending task (reboot_started_hard). Skip.
> > >>
> > >> Does anyone know of a way to resolve this without rebooting the
> > >> entire compute host? I can't see any other issues other than the fact
> > >> there is this iSCSI error which in turn seems to stop nova from
> > >> processing anything for any instance.
> > >>
> > >> Any advice would be much appreciated.
> > >>
> > >> Regards,
>
> --
> Lee Yarwood                 A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76
>