<div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default"><font color="#666666" face="verdana, sans-serif">Also I am sending this as an FYI because I learned this the hard way :) "ISSUES WITH QUEUE_IF_NO_PATH FEATURE" [1]</font></div><div class="gmail_default"><font color="#666666" face="verdana, sans-serif"><br>[1] <a href="https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/dm_multipath/queueifnopath_issues">https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/dm_multipath/queueifnopath_issues</a> </font><br></div><div class="gmail_default"><font color="#666666" face="verdana, sans-serif"><br></font></div><div><div dir="ltr" class="gmail_signature"><div dir="ltr">Tony Pearce<br><br></div></div></div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, 29 Oct 2020 at 09:44, Tony Pearce <<a href="mailto:tonyppe@gmail.com">tonyppe@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div dir="ltr"><div class="gmail_default" style="font-family:verdana,sans-serif;color:rgb(102,102,102)"><div dir="ltr" class="gmail_attr">On Wed, 28 Oct 2020 at 18:06, Lee Yarwood <<a href="mailto:lyarwood@redhat.com" target="_blank">lyarwood@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 28-10-20 17:35:59, Tony Pearce wrote:<br>

> Grant,<br>

> <br>

> As a guess I am suspecting your "fail_if_no_path" might be the issue but I<br>

> am not sure on the inner workings or mechanism at play during the reboot or<br>

> why it's getting stuck here for you. Your storage vendor may have<br>

> documentation to state what the multipath (and iscsid) config should be<br>

> from your host. Before changing config though I recommend getting the root<br>

> cause realised. The /var/log/messages log could help.<br>

<br>

Did you mean queue_if_no_path?</blockquote><div><br></div><div>Yes indeed, my apologies. I no longer have "queue_if_no_path" in my config.</div><div><br></div><div>Best regards,</div></div><div><div dir="ltr"><div dir="ltr"><br></div><div dir="ltr">Tony Pearce<br><br></div></div></div><br></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Wed, 28 Oct 2020 at 18:06, Lee Yarwood <<a href="mailto:lyarwood@redhat.com" target="_blank">lyarwood@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On 28-10-20 17:35:59, Tony Pearce wrote:<br>

> Grant,<br>

> <br>

> As a guess I am suspecting your "fail_if_no_path" might be the issue but I<br>

> am not sure on the inner workings or mechanism at play during the reboot or<br>

> why it's getting stuck here for you. Your storage vendor may have<br>

> documentation to state what the multipath (and iscsid) config should be<br>

> from your host. Before changing config though I recommend getting the root<br>

> cause realised. The /var/log/messages log could help.<br>

<br>

Did you mean queue_if_no_path?<br>

<br>

> Also if you go into the multipath CLI "multipathd -k" and issue "show<br>

> config" you may see a "NETAPP" config there already. Depending on the IDs<br>

> your storage may be matching that rather than the default config within<br>

> multipath.conf FYI.<br>
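A minimal bash sketch for pulling a built-in vendor entry such as NETAPP out of `multipathd show config`, so it can be compared with what multipath.conf sets. It assumes the usual `device { ... }` layout with one `vendor` line per block. `device_sections_for_vendor` is a hypothetical helper name, and on older multipath-tools releases you may need `multipathd -k"show config"` instead.

```shell
# Print every "device { ... }" block whose vendor value matches the given
# regex. Reads `multipathd show config` output on stdin.
device_sections_for_vendor() {
    awk -v pat="$1" '
        $1 == "device" && $2 == "{" { inblk = 1; buf = ""; hit = 0 }
        inblk {
            buf = buf $0 "\n"                       # collect the whole block
            if ($1 == "vendor" && $2 ~ pat) hit = 1 # vendor line matches
            if ($1 == "}" && NF == 1) {             # end of this device block
                if (hit) printf "%s", buf
                inblk = 0
            }
        }
    '
}

# Usage: multipathd show config | device_sections_for_vendor NETAPP
```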

<br>

So Nova will ask os-brick to try to disconnect volumes during a hard<br>

reboot of an instance, and I suspect this is where things are getting<br>

stuck in your env if you're using queue_if_no_path.<br>

<br>

Assuming you're using the libvirt virt driver, has the underlying domain<br>

for the instance been destroyed already?<br>

<br>

$ sudo virsh dominfo $instance_uuid<br>

<br>

If it has been, then we might be able to clean up the volume manually.<br>
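A small bash sketch for reading the answer out of `virsh dominfo`, assuming its usual `State:` line. `domain_state` and `check_domain` are hypothetical helpers. A non-zero exit from `virsh dominfo` usually means libvirt no longer has a domain with that UUID, but it can also mean libvirtd itself is unreachable, so treat that case with some care.

```shell
# Print the value of the "State:" line from `virsh dominfo` output on stdin,
# e.g. "running" or "shut off".
domain_state() {
    awk -F':' '$1 == "State" { sub(/^[[:space:]]+/, "", $2); print $2 }'
}

# Hypothetical wrapper: report whether libvirt still knows the instance's
# domain. Needs sudo rights to run virsh on the compute host.
check_domain() {
    local uuid=$1 info
    if ! info=$(sudo virsh dominfo "$uuid" 2>/dev/null); then
        echo "virsh dominfo failed for $uuid (domain probably undefined)"
        return 1
    fi
    printf '%s\n' "$info" | domain_state
}

# Usage: check_domain "$instance_uuid"
```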

<br>

Either way, it might be useful to raise a bug for this against Nova and<br>

os-brick so we can take a look at the attempt to hard reboot in more<br>

detail.<br>

<br>

<a href="https://launchpad.net/nova/+filebug" rel="noreferrer" target="_blank">https://launchpad.net/nova/+filebug</a><br>

<br>

^ Please use the template underneath the further information textbox once<br>

you've provided a title and if possible include the additional output<br>

somewhere for review.<br>

<br>

$ openstack server event list $instance_uuid<br>

<br>

^ This will provide a list of actions and their associated request-ids.<br>

Using the request-id associated with the failing hard reboot, can you<br>

then provide logs from the compute host?<br>
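To pick out the relevant request-ids, a bash sketch that filters the value-formatted event list for `reboot` actions. It assumes that `-f value -c "Request ID" -c Action` prints the request-id first and the action second on each line; `reboot_request_ids` is a hypothetical helper.

```shell
# Read "<request-id> <action>" lines on stdin and print the request-ids of
# the reboot actions.
reboot_request_ids() {
    awk '$2 == "reboot" { print $1 }'
}

# Usage:
#   openstack server event list "$instance_uuid" -f value -c "Request ID" -c Action \
#       | reboot_request_ids
```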

<br>

$ zgrep -l "$request_id" /var/log/nova/*<br>

<br>

^ Obviously this depends on how logging is enabled in your env, but you<br>

hopefully get the idea.<br>
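The same search wrapped as a hypothetical bash helper, `logs_for_request`. Keeping the variable name free of hyphens and quoting it avoids the shell expanding `$request` and appending a literal `-id`. The `/var/log/nova` default is an assumption about where the logs live; zgrep also searches uncompressed files, so rotated `.gz` logs and the current log are covered together.

```shell
# List log files (plain or gzip-compressed) in a directory that mention a
# given request-id.
logs_for_request() {
    local request_id=$1 log_dir=${2:-/var/log/nova}
    zgrep -l "$request_id" "$log_dir"/*
}

# Usage: logs_for_request req-8613ae69-1661-49cf-8bdc-6fec875d01ba
```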

<br>

> On Wed, 28 Oct 2020 at 15:56, Grant Morley <<a href="mailto:grant@civo.com" target="_blank">grant@civo.com</a>> wrote:<br>

> <br>

> > Hi Tony,<br>

> ><br>

> > We are using NetApp SolidFire for our storage. Instances seem to be in a<br>

> > normal "working" state before we try and reboot them.<br>

> ><br>

> > I haven't looked into `/usr/bin/rescan-scsi-bus.sh`, but I will now, so<br>

> > thanks for that.<br>

> ><br>

> > We are using multipath but kept it on the defaults, so it looks like only 1<br>

> > path is being used.<br>

> ><br>

> > I had a feeling it was down to a heavily loaded compute host causing the issue.<br>

> ><br>

> > The config for iSCSI is also the default that OpenStack-Ansible<br>

> > deployed.<br>

> ><br>

> > Thanks for your help.<br>

> ><br>

> > Grant<br>

> > On 28/10/2020 02:25, Tony Pearce wrote:<br>

> ><br>

> > Hi Grant, what storage are you using here? Is the instance in an<br>

> > apparently "working" state before you try and reboot it?<br>

> ><br>

> > Have you looked into `/usr/bin/rescan-scsi-bus.sh`? Please see this<br>

> > reference link in the first instance: [1] "When ‘rescan-scsi-bus.sh -i’ is<br>

> > run, script execute as well a LIP_RESET (ISSUE_LIP) which may cause a<br>

> > disruption in I/O on the server and even cause an outage in case of a<br>

> > system running on heavy load."<br>

> ><br>

> > Are you using multipath? Some helpful commands:<br>

> ><br>

> > `tail -f /var/log/messages | grep multipath`<br>

> ><br>

> > `multipathd -k` will open the multipath CLI. Then, while in the CLI:<br>

> > show config<br>

> > show paths<br>

> ><br>

> > If the CLI is accessible, then you're likely using multipath even with only 1<br>

> > path, and multipath.conf is taking effect even if it's the default<br>

> > config.<br>
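If you would rather not use the interactive CLI, `multipathd show paths` can also be run directly and its output parsed. This bash sketch counts paths whose checker state is `ready`. It assumes a whitespace-separated table with a `chk_st` header column, which may differ between multipath-tools versions; `count_ready_paths` is a hypothetical helper. With only one ready path and queueing enabled, losing that path leaves I/O queued rather than failed.

```shell
# Count paths whose checker state (chk_st column) is "ready" in
# `multipathd show paths` output read from stdin.
count_ready_paths() {
    awk '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == "chk_st") col = i; next }
        col && $col == "ready" { n++ }
        END { print n + 0 }
    '
}

# Usage: multipathd show paths | count_ready_paths
```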

> ><br>

> > Config files relating to iscsi storage:<br>

> > /etc/iscsi/iscsid.conf<br>

> > /etc/multipath.conf<br>
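A quick bash sketch for checking whether a multipath config file enables queueing when all paths are lost, either through `features "1 queue_if_no_path"` or through `no_path_retry queue`. It only inspects the file, not built-in device defaults (which `show config` reveals), and ignores matches that come after a `#`. `config_queues_if_no_path` is a hypothetical helper, and `/etc/multipath.conf` in the usage line is the usual default location, which may differ on your hosts.

```shell
# Succeed if the given multipath config file appears to enable
# queue_if_no_path behaviour on a non-commented line.
config_queues_if_no_path() {
    grep -Eq '^[^#]*(queue_if_no_path|no_path_retry[[:space:]]+"?queue)' "$1"
}

# Usage: config_queues_if_no_path /etc/multipath.conf && echo "queueing enabled"
```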

> ><br>

> > [1]<br>

> > <a href="https://www.thegeekdiary.com/when-to-use-rescan-scsi-bus-sh-i-lip-flag-in-centos-rhel/" rel="noreferrer" target="_blank">https://www.thegeekdiary.com/when-to-use-rescan-scsi-bus-sh-i-lip-flag-in-centos-rhel/</a><br>

> ><br>

> > Regards,<br>

> ><br>

> > Tony Pearce<br>

> ><br>

> ><br>

> ><br>

> > On Wed, 28 Oct 2020 at 03:39, Grant Morley <<a href="mailto:grant@civo.com" target="_blank">grant@civo.com</a>> wrote:<br>

> ><br>

> >> Hi all,<br>

> >><br>

> >> We are seeing some oddness on a couple of our compute hosts that seems to<br>

> >> be related to iSCSI. On a couple of our hosts I am seeing this error in the<br>

> >> nova compute logs:<br>

> >><br>

> >> 2020-10-27 18:56:14.814 31490 WARNING os_brick.initiator.connectors.iscsi<br>

> >> [req-8613ae69-1661-49cf-8bdc-6fec875d01ba - - - - -] Couldn't find iscsi<br>

> >> sessions because iscsiadm err: iscsiadm: could not read session targetname:<br>

> >> 5<br>

> >> iscsiadm: could not find session info for session1707<br>
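One rough way to look for the broken session from the host side, assuming the error comes from reading the session's `targetname` attribute in sysfs under the standard `/sys/class/iscsi_session/sessionN` layout: list sessions whose `targetname` cannot be read, then compare them with the `[sid]` numbers shown by `iscsiadm -m session`. `unreadable_iscsi_sessions` is a hypothetical bash helper, and the base directory can be overridden for testing.

```shell
# Print the names of kernel iSCSI sessions whose "targetname" attribute
# cannot be read.
unreadable_iscsi_sessions() {
    local base=${1:-/sys/class/iscsi_session} s
    for s in "$base"/session*; do
        [ -e "$s" ] || continue              # glob matched nothing
        if ! cat "$s/targetname" >/dev/null 2>&1; then
            echo "${s##*/}"                  # e.g. "session1707"
        fi
    done
}

# Usage: unreadable_iscsi_sessions; iscsiadm -m session
```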

> >><br>

> >> That seems to also stop any instance on the compute host from being able<br>

> >> to reboot.  Reboots seem to get accepted but the instance never completes<br>

> >> and gets stuck in the reboot state:<br>

> >><br>

> >> 2020-10-27 19:11:58.891 48612 INFO nova.compute.manager [-] [instance:<br>

> >> c8079e85-4777-4615-9d5a-3d1151e11984] During sync_power_state the instance<br>

> >> has a pending task (reboot_started_hard). Skip.<br>

> >> 2020-10-27 19:11:58.891 48612 INFO nova.compute.manager [-] [instance:<br>

> >> 31128f26-910d-411f-98e0-c95dd36f4f0f] During sync_power_state the instance<br>

> >> has a pending task (reboot_started_hard). Skip.<br>
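To see how many instances are affected, a bash sketch that pulls the instance UUIDs out of these `sync_power_state` messages. It assumes each message sits on a single line in the real log (it is wrapped above by the mail client). `stuck_hard_reboots` is a hypothetical helper, and the log path in the usage comment is an assumption.

```shell
# Print the unique instance UUIDs that nova-compute skipped in
# sync_power_state because of a pending hard reboot. Reads log lines on stdin.
stuck_hard_reboots() {
    grep 'pending task (reboot_started_hard)' |
        sed -n 's/.*\[instance: \([0-9a-f-]*\)].*/\1/p' |
        sort -u
}

# Usage: zcat -f /var/log/nova/nova-compute.log* | stuck_hard_reboots
```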

> >><br>

> >> Does anyone know of a way to resolve this without rebooting the entire<br>

> >> compute host? I can't see any issues other than the fact that there is<br>

> >> this iSCSI error which in turn seems to stop nova from processing anything<br>

> >> for any instance.<br>

> >><br>

> >> Any advice would be much appreciated.<br>

> >><br>

> >> Regards,<br>

<br>

-- <br>

Lee Yarwood                 A5D1 9385 88CB 7E5F BE64  6618 BCA6 6E33 F672 2D76<br>

</blockquote></div></div></div></div>

</blockquote></div>