[nova][os-brick] iSCSI multipath oddness during hard reboot

Lee Yarwood lyarwood at redhat.com
Wed Oct 28 10:06:06 UTC 2020


On 28-10-20 17:35:59, Tony Pearce wrote:
> Grant,
> 
> As a guess I suspect your "fail_if_no_path" might be the issue, but I
> am not sure of the inner workings or mechanism at play during the reboot, or
> why it's getting stuck here for you. Your storage vendor may have
> documentation stating what the multipath (and iscsid) config should be
> on your host. Before changing config, though, I recommend identifying the
> root cause first. /var/log/messages could help.

Did you mean queue_if_no_path?
 
> Also if you go into the multipath CLI "multipathd -k" and issue "show
> config" you may see a "NETAPP" config there already. Depending on the IDs
> your storage may be matching that rather than the default config within
> multipath.conf FYI.

So Nova will ask os-brick to try to disconnect volumes during a hard
reboot of an instance, and I suspect this is where things are getting
stuck in your env if you're using queue_if_no_path.
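
For reference, this is the kind of multipath.conf device override where
queue_if_no_path usually comes from. This is only an illustrative sketch,
the vendor/product strings are placeholders and the recommended values
should come from the SolidFire docs and from what `show config` reports:

  # /etc/multipath/multipath.conf (illustrative only)
  devices {
      device {
          vendor        "SolidFir"   # placeholder, match what `show config` reports
          product       "SSD SAN"    # placeholder
          no_path_retry fail         # fail I/O when all paths are gone instead of queueing
          # features "1 queue_if_no_path"  # the setting that can leave I/O queued forever
      }
  }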

Assuming you're using the libvirt virt driver, has the underlying domain
for the instance already been destroyed?

$ sudo virsh dominfo $instance_uuid

If it has been, then we might be able to clean up the volume manually.
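
Very roughly, and only if the domain really is gone, the manual cleanup
would look something like the below. The map name, target IQN and portal
are placeholders you'd need to look up in your env first:

$ sudo multipath -ll                        # find the stale map backing the volume
$ sudo multipath -f <map_name>              # flush that map
$ sudo iscsiadm -m session                  # check which sessions are left
$ sudo iscsiadm -m node -T <target_iqn> -p <portal_ip>:3260 --logout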

Either way it might be useful to raise a bug for this against Nova and
os-brick so we can take a look at the attempt to hard reboot in more
detail.

https://launchpad.net/nova/+filebug

^ Please use the template underneath the further information textbox once
you've provided a title, and if possible include the additional output
somewhere for review.

$ openstack server event list $instance_uuid

^ This will provide a list of actions and their associated request-ids.
Using the request-id associated with the failing hard reboot, can you
then provide logs from the compute host?
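
If your openstackclient is new enough you should also be able to drill
into the failing action itself to confirm the request-id and any recorded
error, something like:

$ openstack server event show $instance_uuid $request_id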

$ zgrep -l $request-id /var/log/nova/*

^ Obviously this depends on how logging is configured in your env, but
hopefully you get the idea.
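
For example, if your logs end up in journald instead then something along
these lines should work, with the unit name depending on your deployment:

$ sudo journalctl -u nova-compute | grep $request_id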

> On Wed, 28 Oct 2020 at 15:56, Grant Morley <grant at civo.com> wrote:
> 
> > Hi Tony,
> >
> > We are using NetApp SolidFire for our storage. Instances seem to be in a
> > normal "working" state before we try and reboot them.
> >
> > I haven't looked into `/usr/bin/rescan-scsi-bus.sh`, but I will now, so
> > thanks for that.
> >
> > We are using multipath but kept it on the defaults so it looks like only 1
> > path is being used.
> >
> > I had a feeling it was down to heavily loaded compute causing the issue.
> >
> > The config for iSCSI is also the defaults that OpenStack-Ansible
> > deployed.
> >
> > Thanks for your help.
> >
> > Grant
> > On 28/10/2020 02:25, Tony Pearce wrote:
> >
> > Hi Grant, what storage are you using here? Is the instance in an
> > apparently "working" state before you try and reboot it?
> >
> > Have you looked into `/usr/bin/rescan-scsi-bus.sh` ? Please see this
> > reference link in the first instance: [1] "When ‘rescan-scsi-bus.sh -i’ is
> > run, script execute as well a LIP_RESET (ISSUE_LIP) which may cause a
> > disruption in I/O on the server and even cause an outage in case of a
> > system running on heavy load."
> >
> > Are you using multipath? Some helpful commands:
> >
> > `tail -f /var/log/messages | grep multipath`
> >
> > `multipathd -k` = this will drop you into the multipath CLI. Then while in the CLI:
> > show config
> > show paths
> >
> > If the CLI is accessible then you're likely using multipath, even if only 1
> > path is in use. The multipath.conf is then taking effect, even if it's the
> > default config.
> >
> > Config files relating to iscsi storage:
> > /etc/iscsi/iscsid.conf
> > /etc/multipath/multipath.conf
> >
> > [1]
> > https://www.thegeekdiary.com/when-to-use-rescan-scsi-bus-sh-i-lip-flag-in-centos-rhel/
> >
> > Regards,
> >
> > Tony Pearce
> >
> >
> >
> > On Wed, 28 Oct 2020 at 03:39, Grant Morley <grant at civo.com> wrote:
> >
> >> Hi all,
> >>
> >> We are seeing some oddness on a couple of our compute hosts that seems to
> >> be related to iSCSI. On those hosts I am seeing this error in the
> >> nova-compute logs:
> >>
> >> 2020-10-27 18:56:14.814 31490 WARNING os_brick.initiator.connectors.iscsi
> >> [req-8613ae69-1661-49cf-8bdc-6fec875d01ba - - - - -] Couldn't find iscsi
> >> sessions because iscsiadm err: iscsiadm: could not read session targetname:
> >> 5
> >> iscsiadm: could not find session info for session1707
> >>
> >> That also seems to stop any instance on the compute host from being able
> >> to reboot.  Reboots seem to get accepted, but they never complete and the
> >> instances get stuck in the reboot state:
> >>
> >> 2020-10-27 19:11:58.891 48612 INFO nova.compute.manager [-] [instance:
> >> c8079e85-4777-4615-9d5a-3d1151e11984] During sync_power_state the instance
> >> has a pending task (reboot_started_hard). Skip.
> >> 2020-10-27 19:11:58.891 48612 INFO nova.compute.manager [-] [instance:
> >> 31128f26-910d-411f-98e0-c95dd36f4f0f] During sync_power_state the instance
> >> has a pending task (reboot_started_hard). Skip.
> >>
> >> Does anyone know of a way to resolve this without rebooting the entire
> >> compute host? I can't see any other issues other than this iSCSI error,
> >> which in turn seems to stop Nova from processing anything for any
> >> instance.
> >>
> >> Any advice would be much appreciated.
> >>
> >> Regards,

-- 
Lee Yarwood                 A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76