Hi Gorka and Rishat,

As discussed with Gorka, I will be working on the issues reported.

I've reported 2 bugs, for cases 1) and 3), since we aren't sure about case 2) yet.

Bug 1: https://bugs.launchpad.net/os-brick/+bug/2012251
Fix 1: https://review.opendev.org/c/openstack/os-brick/+/878045

Bug 2: https://bugs.launchpad.net/os-brick/+bug/2012352
Fix 2: https://review.opendev.org/c/openstack/os-brick/+/878242

I'm not 100% sure that the approach in Fix 2 is the best way to do it but it works with my test scenarios and reviews are always appreciated.

Thanks
Rajat Dhasmana

On Thu, Mar 16, 2023 at 5:45 PM Gorka Eguileor <geguileo@redhat.com> wrote:
On 16/03, Rishat Azizov wrote:
> Hi Gorka,
>
> Thanks!
> I fixed the issue by adding the uxsock_timeout directive to the multipathd config:
> uxsock_timeout 10000
>
> Because in multipathd logs I saw this error:
> 3624a93705842cfae35d7483200015fd8: map flushed
> cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after
> 4.858561 secs
>
> Now large disk backups work fine.
>
> 2. This happens because, even though the first attempt timed out and
> returned exit code 1, the multipath device was actually disconnected, so
> the following attempts ended with the error "is not a multipath device".
>

Hi,

That's a nice workaround until we fix it upstream!!
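
For anyone applying the same workaround, here is a sketch of where the
directive would typically go. It assumes the option belongs in the defaults
section of /etc/multipath.conf and that the value is in milliseconds (so
10000 is 10 seconds); please check multipath.conf(5) for your version, and
note that multipathd needs to re-read its configuration afterwards.

```
# /etc/multipath.conf (fragment; section placement is an assumption)
defaults {
    uxsock_timeout 10000
}
```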

Thanks for confirming my suspicions were right. This is the 3rd thing I
mentioned could be happening: the flush call failed, but it actually
removed the device.

We'll proceed to fix the flushing code in master.
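
A rough sketch of the "manual retry with a check in between" idea, not the
actual os-brick change. The function name and its parameters are
hypothetical; the caller passes in `flush` (for example a wrapper around
"multipath -f <map>") and `map_exists` (for example a check for the map's
device node), so nothing here depends on a real os-brick API.

```python
import time
from typing import Callable


def flush_with_manual_retry(
    flush: Callable[[], None],
    map_exists: Callable[[], bool],
    attempts: int = 3,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Flush a multipath map, checking for the map between attempts.

    Returns True if a flush call succeeded, and False if a flush call
    failed but the map turned out to be gone afterwards. Re-raises the
    last flush error if the map is still present after all attempts.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(1, attempts + 1):
        try:
            flush()
            return True
        except Exception:
            # The call may have timed out while multipathd carried on and
            # removed the map, so check before treating this as a failure.
            if not map_exists():
                return False
            if attempt == attempts:
                raise
            sleep(interval)
    raise AssertionError("unreachable")
```

With this shape, a timed-out first flush followed by a background removal
is treated as a completed disconnect instead of surfacing "is not a
multipath device" on the second attempt.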

Cheers,
Gorka.

>
> On Tue, Mar 14, 2023 at 14:46, Gorka Eguileor <geguileo@redhat.com> wrote:
>
> > [Sending the email again as it seems it didn't reach the ML]
> >
> >
> > On 13/03, Gorka Eguileor wrote:
> > > On 11/03, Rishat Azizov wrote:
> > > > Hi, Gorka,
> > > >
> > > > Thanks. I see multiple "multipath -f" calls. Logs in attachments.
> > > >
> >
> >
> >
> > Hi,
> >
> > There are multiple things going on here:
> >
> > 1. There is a bug in os-brick, because the disconnect_volume should not
> >    fail, since it is being called with force=True and
> >    ignore_errors=True.
> >
> >    The issue is that this call [1] is not wrapped in the
> >    ExceptionChainer context manager, and it should not even be a flush
> >    call, it should be a call to "multipathd remove map $map" instead.
> >
> > 2. The way the multipath code is written [2][3], the error we see about
> >    "3624a93705842cfae35d7483200015fce is not a multipath device" can mean
> >    2 different things: the device is not a multipath device, or an error
> >    happened.
> >
> >    So we don't really know what happened without enabling more verbose
> >    multipathd log levels.
> >
> > 3. The "multipath -f" call should not be failing in the first place,
> >    because the failure is happening on disconnecting the source volume,
> >    which has no data buffered to be written and therefore no reason to
> >    fail the flush (unless it's using a friendly name).
> >
> >    I don't know if it could be happening that the first flush fails with
> >    a timeout (maybe because there is an extend operation happening), but
> >    multipathd keeps trying to flush it in the background and when it
> >    succeeds it removes the multipath device, which makes following calls
> >    fail.
> >
> >    If that's the case we would need to change the retry from automatic
> >    [4] to manual and check between calls whether the device has been
> >    removed.
> >
> > The first issue is definitely a bug, the 2nd one is something that could
> > be changed in the deployment to try to get additional information on the
> > failure, and the 3rd one could be a bug.
> >
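
To illustrate point 1, here is a minimal sketch of the "collect errors and
keep going" pattern. `ErrorCollector` and `disconnect` are hypothetical,
simplified stand-ins written for this example; they are not os-brick's
ExceptionChainer or its disconnect_volume, and only show why a step that
is not wrapped can make a forced disconnect fail.

```python
import contextlib


class ErrorCollector:
    """Hypothetical, simplified stand-in for an exception-chaining helper."""

    def __init__(self):
        self.errors = []

    @contextlib.contextmanager
    def context(self, catch: bool, message: str):
        try:
            yield
        except Exception as exc:
            if not catch:
                raise  # not forcing: fail immediately
            # Forcing: remember the error and let the caller continue.
            self.errors.append((message, exc))


def disconnect(steps, force=False, ignore_errors=False):
    """Run (name, callable) cleanup steps, each inside the collector."""
    collector = ErrorCollector()
    for name, step in steps:
        with collector.context(force, f"{name} failed"):
            step()
    # Only report collected errors if the caller wants to hear about them.
    if collector.errors and not ignore_errors:
        raise collector.errors[0][1]
    return collector.errors
```

With force=True and ignore_errors=True every step runs and nothing is
raised; a step executed outside the `with` block would bypass both flags.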
> > I'll see if I can find someone who wants to work on the 1st and 3rd
> > points.
> >
> > Cheers,
> > Gorka.
> >
> > [1]:
> > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952
> > [2]:
> > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064
> > [3]:
> > https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872
> > [4]:
> > https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384
> >
> >
> >
> > > >
> > > > On Thu, Mar 9, 2023 at 15:55, Gorka Eguileor <geguileo@redhat.com> wrote:
> > > >
> > > > > On 06/03, Rishat Azizov wrote:
> > > > > > Hi,
> > > > > >
> > > > > > It works with smaller volumes.
> > > > > >
> > > > > > multipath.conf attached to this email.
> > > > > >
> > > > > > Cinder version - 18.2.0 Wallaby
> > > > >
> >
> >