Hi Gorka and Rishat,

As discussed with Gorka, I will be working on the reported issues. I've reported 2 bugs, for cases 1) and 3), since we aren't sure about case 2) yet.

*Bug 1*: https://bugs.launchpad.net/os-brick/+bug/2012251
*Fix 1*: https://review.opendev.org/c/openstack/os-brick/+/878045

*Bug 2*: https://bugs.launchpad.net/os-brick/+bug/2012352
*Fix 2*: https://review.opendev.org/c/openstack/os-brick/+/878242

I'm not 100% sure that the approach in *Fix 2* is the best way to do it, but it works with my test scenarios, and reviews are always appreciated.

Thanks
Rajat Dhasmana

On Thu, Mar 16, 2023 at 5:45 PM Gorka Eguileor <geguileo@redhat.com> wrote:
On 16/03, Rishat Azizov wrote:
Hi Gorka,
Thanks! I fixed the issue by adding the uxsock_timeout directive to the multipathd config: uxsock_timeout 10000
I did this because in the multipathd logs I saw this error: 3624a93705842cfae35d7483200015fd8: map flushed cli cmd 'del map 3624a93705842cfae35d7483200015fd8' timeout reached after 4.858561 secs
Now large disk backups work fine.
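For reference, a minimal sketch of where that directive would go in multipath.conf (the value is a timeout in milliseconds according to the multipath.conf(5) man page; the exact default and supported range depend on your multipath-tools version, so verify against your installation):

```
defaults {
    # Timeout (ms) for CLI commands sent over the multipathd unix socket.
    # Raised from the default so long-running flushes of large devices
    # don't hit "timeout reached" in the logs.
    uxsock_timeout 10000
}
```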
2. This happens because, despite the timeout on the first attempt and the exit code 1, the multipath device was actually disconnected, so the subsequent attempts failed with the error "is not a multipath device", since the device was already gone.
Hi,
That's a nice workaround until we fix it upstream!!
Thanks for confirming my suspicions were right. This is the 3rd thing I mentioned was happening, flush call failed but it actually removed the device.
We'll proceed to fix the flushing code in master.
Cheers, Gorka.
On Tue, 14 Mar 2023 at 14:46, Gorka Eguileor <geguileo@redhat.com> wrote:
[Sending the email again as it seems it didn't reach the ML]
On 13/03, Gorka Eguileor wrote:
On 11/03, Rishat Azizov wrote:
Hi, Gorka,
Thanks. I see multiple "multipath -f" calls. Logs in attachments.
Hi,
There are multiple things going on here:
1. There is a bug in os-brick, because the disconnect_volume should not fail, since it is being called with force=True and ignore_errors=True.
The issue is that this call [1] is not wrapped in the ExceptionChainer context manager, and it should not even be a flush call; it should be a call to "multipathd remove map $map" instead.
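To illustrate the pattern being described, here is a simplified sketch of an "exception chainer". This is not the real os-brick implementation, just a stand-in showing how wrapping each cleanup step lets disconnect_volume() continue past a failed flush and honor force=True / ignore_errors=True:

```python
# Simplified sketch of the "exception chainer" pattern: each risky cleanup
# step runs inside a context that records failures instead of aborting, so
# the remaining cleanup still runs and the caller decides whether to raise.
# This is a hypothetical stand-in, NOT os-brick's actual ExceptionChainer.
from contextlib import contextmanager


class ExceptionChainer:
    """Collects exceptions from wrapped steps instead of aborting."""

    def __init__(self):
        self._exceptions = []

    @contextmanager
    def context(self, catch_exception, msg):
        try:
            yield
        except Exception as exc:
            if not catch_exception:
                raise
            # Record the failure so later cleanup steps still execute.
            self._exceptions.append((msg, exc))

    def __bool__(self):
        # Truthy if any wrapped step failed.
        return bool(self._exceptions)


def disconnect_volume(force=True, ignore_errors=True):
    exc = ExceptionChainer()
    with exc.context(force, 'flushing the multipath device failed'):
        # Stand-in for the flush / "multipathd remove map" call.
        raise RuntimeError('multipath -f timed out')
    with exc.context(force, 'removing device paths failed'):
        pass  # the remaining cleanup still runs despite the flush failure
    if exc and not ignore_errors:
        raise RuntimeError('disconnect_volume failed')
    return bool(exc)  # True: errors happened but were swallowed
```

With force=False the first failure propagates immediately; with force=True but ignore_errors=False, the accumulated failures are raised only after all cleanup steps have had a chance to run.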
2. The way the multipath code is written [2][3], the error we see, "3624a93705842cfae35d7483200015fce is not a multipath device", is ambiguous: either the device really is not a multipath device, or an error happened while handling it.
So we don't really know what happened without enabling more verbose multipathd log levels.
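As a sketch of what enabling more verbose multipathd logging could look like (the verbosity option is documented in the multipath.conf(5) man page; the exact value range and whether a reload is needed depend on the multipath-tools version, so treat this as an assumption to verify):

```
defaults {
    # Raise the daemon's log verbosity to distinguish "genuinely not a
    # multipath device" from "an internal error occurred".
    verbosity 4
}
```

After changing it, the configuration needs to be reloaded (for example with multipathd's reconfigure command) for the new level to take effect.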
3. The "multipath -f" call should not be failing in the first place, because the failure happens while disconnecting the source volume, which has no buffered data waiting to be written and therefore no reason for the flush to fail (unless it's using a friendly name).
It may be that the first flush fails with a timeout (maybe because an extend operation is in progress), but multipathd keeps trying to flush it in the background, and when it succeeds it removes the multipath device, which makes the following calls fail.
If that's the case, we would need to change the retry from automatic [4] to manual and check between attempts whether the device has already been removed.
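If the background flush turns out to be what's happening, the manual retry could look roughly like this. flush_map and map_exists are hypothetical stand-ins for the real "multipath -f" call and a device-map existence check; this is a sketch of the idea, not os-brick code:

```python
# Sketch of a manual retry: instead of blindly retrying the flush, check
# between attempts whether the map has already been removed (e.g. because a
# previously timed-out flush completed in the background) and treat that
# disappearance as success. flush_map and map_exists are hypothetical
# callables standing in for "multipath -f <map>" and a /dev/mapper check.
import time


def flush_with_manual_retry(flush_map, map_exists, attempts=3, interval=0):
    last_error = None
    for _ in range(attempts):
        if not map_exists():
            # A previous (timed-out) flush finished in the background:
            # the map is gone, so there is nothing left to flush.
            return True
        try:
            flush_map()
            return True
        except Exception as exc:
            last_error = exc
            time.sleep(interval)
    if map_exists():
        raise last_error
    return True
```

The key difference from an automatic retry decorator is the map_exists() check before each attempt, which turns "is not a multipath device" from a spurious error into a success condition.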
The first issue is definitely a bug, the 2nd one is something that could be changed in the deployment to try to get additional information on the failure, and the 3rd one could be a bug.
I'll see if I can find someone who wants to work on the 1st and 3rd points.
Cheers, Gorka.
[1]:
https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5...
[2]:
https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd87013...
[3]:
https://github.com/opensvc/multipath-tools/blob/db4804bc7393f2482448bdd87013...
[4]:
https://github.com/openstack/os-brick/blob/e15edf6c17449899ec8401c37482f7cb5...
On Thu, 9 Mar 2023 at 15:55, Gorka Eguileor <geguileo@redhat.com> wrote:
On 06/03, Rishat Azizov wrote:
> Hi,
>
> It works with smaller volumes.
>
> multipath.conf attached to this email.
>
> Cinder version - 18.2.0 Wallaby