[cinder] Error when creating backups from iscsi volume

Rishat Azizov rishat.azizov at gmail.com
Thu Mar 16 11:02:07 UTC 2023


Hi Gorka,

Thanks!
I fixed the issue by adding the uxsock_timeout directive to the multipathd config:
uxsock_timeout 10000
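
For anyone hitting the same thing, the directive goes in the defaults
section of /etc/multipath.conf (a sketch; the value is in milliseconds,
so 10000 gives multipathd 10 s to answer CLI commands, well above the
~4.9 s the flush took in the log below):

```
defaults {
    # CLI reply timeout in milliseconds; raised above the ~4.9 s the
    # "del map" command took according to the multipathd log
    uxsock_timeout 10000
}
```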

I did this because I saw this error in the multipathd logs:
3624a93705842cfae35d7483200015fd8: map flushed
cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after
4.858561 secs

Now large disk backups work fine.

2. This happens because, despite the timeout on the first attempt and its
exit code of 1, the multipath device was in fact disconnected, so the
following attempts failed with the error "is not a multipath device".
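
For what it's worth, the manual retry suggested in point 3 could be
sketched roughly like this (hypothetical helper names, not the actual
os-brick code): retry the flush, but check between attempts whether
multipathd already removed the map in the background, and treat that
as success:

```python
import time

def flush_and_remove(map_name, flush_fn, exists_fn, retries=3, interval=1.0):
    """Sketch of a manual retry for "multipath -f <map>".

    flush_fn(map_name)  -- runs the flush, raises on failure (timeout etc.)
    exists_fn(map_name) -- True while the dm map is still present
    """
    last_exc = None
    for _ in range(retries):
        if not exists_fn(map_name):
            # multipathd finished the flush in the background and removed
            # the map; retrying now would fail with "is not a multipath
            # device", so treat this as success.
            return
        try:
            flush_fn(map_name)
            return
        except Exception as exc:
            last_exc = exc
            time.sleep(interval)
    if exists_fn(map_name):
        raise last_exc
```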


On Tue, 14 Mar 2023 at 14:46, Gorka Eguileor <geguileo at redhat.com> wrote:

> [Sending the email again as it seems it didn't reach the ML]
>
>
> On 13/03, Gorka Eguileor wrote:
> > On 11/03, Rishat Azizov wrote:
> > > Hi, Gorka,
> > >
> > > Thanks. I see multiple "multipath -f" calls. Logs in attachments.
> > >
>
>
>
> Hi,
>
> There are multiple things going on here:
>
> 1. There is a bug in os-brick, because the disconnect_volume should not
>    fail, since it is being called with force=True and
>    ignore_errors=True.
>
>    The issue is that this call [1] is not wrapped in the
>    ExceptionChainer context manager. It should not even be a flush
>    call; it should be a call to "multipathd remove map $map" instead.
>
> 2. The way the multipath code is written [2][3], the error we see,
>    "3624a93705842cfae35d7483200015fce is not a multipath device", can
>    mean two different things: either it is not a multipath device or
>    an error happened.
>
>    So we don't really know what happened without enabling more verbose
>    multipathd log levels.
>
> 3. The "multipath -f" call should not be failing in the first place,
>    because the failure is happening on disconnecting the source volume,
>    which has no data buffered to be written and therefore no reason to
>    fail the flush (unless it's using a friendly name).
>
>    I don't know if it could be happening that the first flush fails with
>    a timeout (maybe because there is an extend operation happening), but
>    multipathd keeps trying to flush it in the background and when it
>    succeeds it removes the multipath device, which makes following calls
>    fail.
>
>    If that's the case we would need to change the retry from automatic
>    [4] to manual and check between calls whether the device has
>    already been removed.
>
> The first issue is definitely a bug, the 2nd one is something that could
> be changed in the deployment to try to get additional information on the
> failure, and the 3rd one could be a bug.
>
> I'll see if I can find someone who wants to work on the 1st and 3rd
> points.
>
> Cheers,
> Gorka.
>
> [1]:
> https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/connectors/iscsi.py#L952
> [2]:
> https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/multipath/main.c#L1063-L1064
> [3]:
> https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd870132522e65dd98e/libmultipath/devmapper.c#L867-L872
> [4]:
> https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5de207d3/os_brick/initiator/linuxscsi.py#L384
>
>
>
> > >
> > > On Thu, 9 Mar 2023 at 15:55, Gorka Eguileor <geguileo at redhat.com> wrote:
> > >
> > > > On 06/03, Rishat Azizov wrote:
> > > > > Hi,
> > > > >
> > > > > It works with smaller volumes.
> > > > >
> > > > > multipath.conf is attached to this email.
> > > > >
> > > > > Cinder version - 18.2.0 Wallaby
> > > >
>
>

