[cinder] Error when creating backups from iscsi volume

Rajat Dhasmana rdhasman at redhat.com
Wed Mar 22 18:14:06 UTC 2023


Hi Gorka and Rishat,

As discussed with Gorka, I will be working on the issues reported.

I've reported 2 bugs, for cases 1) and 3), since we aren't sure about case 2) yet.

*Bug 1*: https://bugs.launchpad.net/os-brick/+bug/2012251
*Fix 1*: https://review.opendev.org/c/openstack/os-brick/+/878045

*Bug 2*: https://bugs.launchpad.net/os-brick/+bug/2012352
*Fix 2*: https://review.opendev.org/c/openstack/os-brick/+/878242
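As background for case 1): Gorka's analysis quoted below says disconnect_volume should not fail when it is called with force=True and ignore_errors=True, and that the map should be dropped with "multipathd remove map $map" rather than flushed. The following is a minimal Python sketch of that error-collecting pattern. It assumes force means "keep going after a failed step" and ignore_errors means "don't raise once cleanup is done". The names collect_errors, disconnect_multipath and run_cmd are hypothetical and do not come from os-brick's ExceptionChainer or disconnect_volume. This is not necessarily how Fix 1 is implemented.

```python
import contextlib
import subprocess
from typing import Callable, Iterator, List, Sequence

Runner = Callable[[Sequence[str]], None]


def run_cmd(cmd: Sequence[str]) -> None:
    """Run a command; raises subprocess.CalledProcessError on non-zero exit."""
    subprocess.run(list(cmd), check=True)


@contextlib.contextmanager
def collect_errors(errors: List[Exception], catch: bool) -> Iterator[None]:
    """Hypothetical stand-in for an exception chainer: when ``catch`` is
    true, record the exception and let the remaining cleanup continue."""
    try:
        yield
    except Exception as exc:
        if not catch:
            raise
        errors.append(exc)


def disconnect_multipath(map_name: str, force: bool = False,
                         ignore_errors: bool = False,
                         run: Runner = run_cmd) -> List[Exception]:
    """Hypothetical helper (not os-brick's disconnect_volume)."""
    errors: List[Exception] = []
    with collect_errors(errors, catch=force):
        # Ask multipathd to drop the map instead of flushing it with "multipath -f".
        run(["multipathd", "remove", "map", map_name])
    # Further cleanup steps would be wrapped in collect_errors() the same
    # way, so one failing step does not skip the others.
    if errors and not ignore_errors:
        raise errors[0]
    return errors
```

With a runner that always raises, calling disconnect_multipath("mpatha", force=True, ignore_errors=True, run=...) returns the recorded error instead of raising it.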

I'm not 100% sure that the approach in *Fix 2* is the best way to do it but
it works with my test scenarios and reviews are always appreciated.
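Case 3) in the quoted analysis suggests replacing the automatic retry with a manual one that checks between attempts whether the map has already been removed. The following is a rough Python sketch of that idea. It assumes multipath maps are visible as /dev/mapper/<name>. The names flush_with_manual_retry, map_exists and run_cmd are hypothetical. This is not necessarily the approach taken in Fix 2.

```python
import os
import subprocess
import time
from typing import Callable, Optional, Sequence


def map_exists(map_name: str) -> bool:
    # Assumption: multipath maps are visible as /dev/mapper/<map_name>.
    return os.path.exists(os.path.join("/dev/mapper", map_name))


def run_cmd(cmd: Sequence[str]) -> None:
    # Raises subprocess.CalledProcessError on a non-zero exit code.
    subprocess.run(list(cmd), check=True)


def flush_with_manual_retry(
    map_name: str,
    attempts: int = 3,
    interval: float = 1.0,
    exists: Callable[[str], bool] = map_exists,
    run: Callable[[Sequence[str]], None] = run_cmd,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Hypothetical helper: flush a multipath map, retrying by hand and
    checking before each attempt whether the map is already gone (a
    timed-out flush may still have removed it in the background)."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    last_exc: Optional[subprocess.CalledProcessError] = None
    for attempt in range(attempts):
        if not exists(map_name):
            return  # Already removed; nothing left to flush.
        try:
            run(["multipath", "-f", map_name])
            return
        except subprocess.CalledProcessError as exc:
            last_exc = exc
            if attempt + 1 < attempts:
                sleep(interval)
    # Out of attempts: only report failure if the map is really still there.
    if exists(map_name) and last_exc is not None:
        raise last_exc
```

Injecting exists, run and sleep keeps the sketch testable without real devices. The key design choice is that a failed flush only counts as a failure if the map is still present afterwards.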

Thanks
Rajat Dhasmana

On Thu, Mar 16, 2023 at 5:45 PM Gorka Eguileor <geguileo at redhat.com> wrote:

> On 16/03, Rishat Azizov wrote:
> > Hi Gorka,
> >
> > Thanks!
> > I fixed the issue by adding the uxsock_timeout directive to the multipathd
> > config:
> > uxsock_timeout 10000
> >
> > I did this because I saw this error in the multipathd logs:
> > 3624a93705842cfae35d7483200015fd8: map flushed
> > cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after 4.858561 secs
> >
> > Now large disk backups work fine.
> >
> > 2. This happens because, despite the first attempt timing out and exiting
> > with code 1, the multipath device was actually disconnected, so the
> > subsequent attempts failed with the error "is not a multipath device",
> > since the multipath device had already been removed.
> >
>
> Hi,
>
> That's a nice workaround until we fix it upstream!!
>
> Thanks for confirming my suspicions were right. This is the 3rd thing I
> mentioned might be happening: the flush call failed, but it actually
> removed the device.
>
> We'll proceed to fix the flushing code in master.
>
> Cheers,
> Gorka.
>
> >
> > On Tue, Mar 14, 2023 at 14:46, Gorka Eguileor <geguileo at redhat.com> wrote:
> >
> > > [Sending the email again as it seems it didn't reach the ML]
> > >
> > >
> > > On 13/03, Gorka Eguileor wrote:
> > > > On 11/03, Rishat Azizov wrote:
> > > > > Hi, Gorka,
> > > > >
> > > > > Thanks. I see multiple "multipath -f" calls. Logs are attached.
> > > > >
> > >
> > >
> > >
> > > Hi,
> > >
> > > There are multiple things going on here:
> > >
> > > 1. There is a bug in os-brick, because the disconnect_volume should not
> > >    fail, since it is being called with force=True and
> > >    ignore_errors=True.
> > >
> > >    The issue is that this call [1] is not wrapped in the
> > >    ExceptionChainer context manager, and it should not even be a flush
> > >    call; it should be a call to "multipathd remove map $map" instead.
> > >
> > > 2. Because of the way the multipath code is written [2][3], the error we
> > >    see about "3624a93705842cfae35d7483200015fce is not a multipath
> > >    device" can mean 2 different things: either it is not a multipath
> > >    device, or an error happened.
> > >
> > >    So we don't really know what happened without enabling more verbose
> > >    multipathd log levels.
> > >
> > > 3. The "multipath -f" call should not be failing in the first place,
> > >    because the failure is happening on disconnecting the source volume,
> > >    which has no data buffered to be written and therefore no reason to
> > >    fail the flush (unless it's using a friendly name).
> > >
> > >    I don't know if it could be that the first flush fails with a
> > >    timeout (maybe because there is an extend operation happening), but
> > >    multipathd keeps trying to flush it in the background, and when it
> > >    succeeds it removes the multipath device, which makes the following
> > >    calls fail.
> > >
> > >    If that's the case, we would need to change the retry from automatic
> > >    [4] to manual and check between attempts whether the device has
> > >    already been removed.
> > >
> > > The first issue is definitely a bug, the 2nd one is something that could
> > > be changed in the deployment to try to get additional information on the
> > > failure, and the 3rd one could be a bug.
> > >
> > > I'll see if I can find someone who wants to work on the 1st and 3rd
> > > points.
> > >
> > > Cheers,
> > > Gorka.
> > >
> > > [1]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952
> > > [2]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064
> > > [3]: https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872
> > > [4]: https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384
> > >
> > >
> > >
> > > > >
> > > > > On Thu, Mar 9, 2023 at 15:55, Gorka Eguileor <geguileo at redhat.com> wrote:
> > > > >
> > > > > > On 06/03, Rishat Azizov wrote:
> > > > > > > Hi,
> > > > > > >
> > > > > > > It works with smaller volumes.
> > > > > > >
> > > > > > > multipath.conf is attached to this email.
> > > > > > >
> > > > > > > Cinder version - 18.2.0 Wallaby
> > > > > >
> > >
> > >
>
>
>


More information about the openstack-discuss mailing list