<div dir="ltr"><div>Hi Gorka,</div><div><br></div><div>Thanks!<br></div><div>I fixed the issue by <span class="gmail-HwtZe" lang="en"><span class="gmail-jCAhz gmail-ChMk0b"><span class="gmail-ryNqvb">adding the uxsock_timeout directive to the multipathd config: </span></span></span><br></div><div>uxsock_timeout 10000</div><div><br></div><div>I did this because I saw this error in the multipathd logs:</div><div>3624a93705842cfae35d7483200015fd8: map flushed</div><div>cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after 4.858561 secs</div><div><br></div><div><span class="gmail-HwtZe" lang="en"><span class="gmail-jCAhz gmail-ChMk0b"><span class="gmail-ryNqvb">Backups of large disks now work fine.</span></span></span></div><div><span class="gmail-HwtZe" lang="en"><span class="gmail-jCAhz gmail-ChMk0b"><span class="gmail-ryNqvb"><br></span></span></span></div><div>2. This happens because, although the first attempt timed out and returned exit code 1, the multipath device was in fact removed, so the following attempts failed with the error "is not a multipath device".</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">Tue, 14 Mar 2023 at 14:46, Gorka Eguileor <<a href="mailto:geguileo@redhat.com">geguileo@redhat.com</a>>:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">[Sending the email again as it seems it didn't reach the ML]<br>
<br>
<br>
On 13/03, Gorka Eguileor wrote:<br>
> On 11/03, Rishat Azizov wrote:<br>
> > Hi, Gorka,<br>
> ><br>
> > Thanks. I see multiple "multipath -f" calls. Logs in attachments.<br>
> ><br>
<br>
<br>
<br>
Hi,<br>
<br>
There are multiple things going on here:<br>
<br>
1. There is a bug in os-brick: disconnect_volume should not fail here,<br>
since it is being called with force=True and<br>
ignore_errors=True.<br>
<br>
The issue is that this call [1] is not wrapped in the<br>
ExceptionChainer context manager, and it should not even be a flush<br>
call; it should be a call to "multipathd remove map $map" instead.<br>
<br>
2. Because of the way the multipath code is written [2][3], the error we see about<br>
"3624a93705842cfae35d7483200015fce is not a multipath device" can mean 2<br>
different things: the device is not a multipath device, or an error happened.<br>
<br>
So we don't really know what happened without enabling more verbose<br>
multipathd log levels.<br>
<br>
3. The "multipath -f" call should not be failing in the first place,<br>
because the failure is happening on disconnecting the source volume,<br>
which has no data buffered to be written and therefore no reason to<br>
fail the flush (unless it's using a friendly name).<br>
<br>
It could be that the first flush fails with<br>
a timeout (maybe because there is an extend operation happening), but<br>
multipathd keeps trying to flush it in the background, and when it<br>
succeeds it removes the multipath device, which makes the following calls<br>
fail. I don't know whether that is what is happening here.<br>
<br>
If that's the case, we would need to change the retry from automatic<br>
[4] to manual and check between calls whether the device has already<br>
been removed.<br>
<br>
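As a rough illustration of the manual retry described in point 3, here is a<br>
minimal Python sketch. It is not os-brick code: the function name<br>
flush_multipath, its parameters, and the use of /dev/mapper/$map to detect<br>
whether the map still exists are assumptions made for the example.<br>
<br>
```python
import os
import subprocess
import time


def flush_multipath(name, attempts=3, interval=1.0,
                    run=subprocess.run, exists=os.path.exists,
                    sleep=time.sleep):
    """Flush a multipath map, retrying manually.

    Hypothetical helper, not part of os-brick. Returns True when the map
    is gone (flushed by us or removed in the background), else False.
    """
    # Assumption: the map is visible as /dev/mapper/<name>.
    dm_path = "/dev/mapper/" + name
    for attempt in range(attempts):
        # Check before every call: an earlier flush may have timed out
        # while multipathd still removed the map afterwards.
        if not exists(dm_path):
            return True
        result = run(["multipath", "-f", name],
                     capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if attempt + 1 != attempts:
            sleep(interval)
    # Last look after the final failed attempt.
    return not exists(dm_path)
```
<br>
The run, exists and sleep parameters are only there so the logic can be<br>
exercised without a real multipath device.<br>
<br>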
The first issue is definitely a bug, the 2nd one is something that could<br>
be changed in the deployment to try to get additional information on the<br>
failure, and the 3rd one could be a bug.<br>
<br>
I'll see if I can find someone who wants to work on the 1st and 3rd<br>
points.<br>
<br>
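To make the 1st point more concrete, this is a small Python sketch of the<br>
idea behind wrapping each cleanup step in a context manager that records<br>
failures instead of aborting. ErrorCollector and disconnect are hypothetical<br>
stand-ins written for this example; they are not os-brick's ExceptionChainer<br>
or its actual disconnect code.<br>
<br>
```python
import contextlib


class ErrorCollector:
    """Hypothetical stand-in illustrating the ExceptionChainer idea."""

    def __init__(self):
        self.errors = []

    @contextlib.contextmanager
    def context(self, ignore_errors, description):
        try:
            yield
        except Exception as exc:
            # Record the failure so it can be reported later.
            self.errors.append((description, exc))
            if not ignore_errors:
                raise


def disconnect(steps, ignore_errors=True):
    """Run every cleanup step; with ignore_errors a failure is recorded
    and the remaining steps still run."""
    collector = ErrorCollector()
    for description, step in steps:
        with collector.context(ignore_errors, description):
            step()
    return collector.errors
```
<br>
With ignore_errors=True a failing flush step is recorded and the remaining<br>
steps still run, which is the behaviour expected from a forced disconnect.<br>
<br>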
Cheers,<br>
Gorka.<br>
<br>
[1]: <a href="https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952" rel="noreferrer" target="_blank">https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952</a><br>
[2]: <a href="https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064" rel="noreferrer" target="_blank">https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064</a><br>
[3]: <a href="https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872" rel="noreferrer" target="_blank">https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872</a><br>
[4]: <a href="https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384" rel="noreferrer" target="_blank">https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384</a><br>
<br>
<br>
<br>
> ><br>
> > Thu, 9 Mar 2023 at 15:55, Gorka Eguileor <<a href="mailto:geguileo@redhat.com" target="_blank">geguileo@redhat.com</a>>:<br>
> ><br>
> > > On 06/03, Rishat Azizov wrote:<br>
> > > > Hi,<br>
> > > ><br>
> > > > It works with smaller volumes.<br>
> > > ><br>
> > > > multipath.conf is attached to this email.<br>
> > > ><br>
> > > > Cinder version - 18.2.0 Wallaby<br>
> > ><br>
<br>
</blockquote></div>