[cinder][nova] Running parallel iSCSI/LVM c-vol backends is causing random failures in CI

Lee Yarwood lyarwood at redhat.com
Tue Mar 23 18:44:55 UTC 2021


On Tue, 23 Mar 2021 at 17:46, Gorka Eguileor <geguileo at redhat.com> wrote:
>
> On 09/03, Lee Yarwood wrote:
> > Hello all,
> >
> > I reported the following bug last week but I've yet to get any real
> > feedback after asking a few times in irc.
> >
> > Running parallel iSCSI/LVM c-vol backends is causing random failures in CI
> > https://bugs.launchpad.net/cinder/+bug/1917750
> >
> > AFAICT tgtadm is causing this behaviour. As I've stated in the bug,
> > with Fedora 32 and lioadm I don't see the WWN conflict between the
> > two backends. Does anyone know if using lioadm is an option on Focal?
> >
> > Thanks in advance,
> >
> > Lee
> >
> >
>
> Hi Lee,
>
> Sorry for the late reply.
>
> I started looking at the case some time ago but got "distracted" with
> some other issue.
>
> I am no expert on STGT, since I always work with LIO, but from what I
> could gather this seems to be caused by the combination of us:
>
> - Using the tgtadm helper
> - Having 2 different cinder-volume services running on 2 different hosts
>   (one in compute and another on controller).
> - Using the same volume_backend_name for both LVM backends.
>
> If we were running a single cinder-volume service with 2 backends this
> issue wouldn't happen (I checked).
>
> If we used a different volume_backend_name for each of the 2 services
> and used a volume type picking one of them for the operations, this
> wouldn't happen either.
>
> If we used LIO instead, this wouldn't happen.
>
> The cause is the automatic generation of the serial/WWN for volumes by
> STGT, which seems to be deterministic.  The first target created on a
> host will have a 60000000000000000e0000000001 prefix followed by the
> LUN number (the 3 before it that we see in the connection_info is just
> to state that the WWN is of NAA type).
>
> This means that the first volume exposed by STGT on any host will ALWAYS
> have the same WWN, and this will mess things up if we attach two such
> volumes to the same host, because the premise of a WWN is its
> uniqueness; everything in Cinder and OS-Brick assumes this, and that
> will not be changed.
>
> For LIO it seems that the generation of the serial/WWN is
> non-deterministic (or at least not the same on all hosts), so the issue
> won't happen in this specific deployment configuration.
>
> So the options to prevent this issue are to run both backends on the
> controller node, use different volume_backend_name and a volume type, or
> use LIO.

Thanks Gorka,

Just to copy my reply from the bug here.

I'm not entirely sure how using a different volume_backend_name would
help? As you say above the first target on both hosts would still have
the 60000000000000000e0000000001 prefix regardless of the name right?
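
To make the concern concrete, here is a minimal Python sketch. It assumes
the WWN is built only from the fixed first-target prefix quoted above plus
the LUN number; the helper name, the 4-hex-digit LUN suffix and the backend
names are all hypothetical, and this is not tgt's actual code.

```python
# Illustrative model only: this is NOT tgt's real implementation.
NAA_DESIGNATOR = "3"  # leading digit seen in connection_info (NAA-type id)
FIRST_TARGET_PREFIX = "60000000000000000e0000000001"  # prefix quoted in this thread


def sketch_wwn(lun: int) -> str:
    """Hypothetical helper: WWN of a LUN on the first target of a host."""
    # Assumption: the LUN number is appended as four hex digits.
    return f"{NAA_DESIGNATOR}{FIRST_TARGET_PREFIX}{lun:04x}"


# Two c-vol services on different hosts, each with its own
# volume_backend_name (both names are made up for this example).
backends = {
    "controller": "lvm-backend-a",
    "compute": "lvm-backend-b",
}

# The backend name never enters the computation, so the first LUN on the
# first target gets the same id on both hosts.
wwns = {host: sketch_wwn(lun=1) for host in backends}
collision = len(set(wwns.values())) == 1
print(wwns, "collision:", collision)
```

If that model is roughly right, only something that changes the inputs
(a different target index or LUN on one of the hosts) or a different id
generation scheme altogether would avoid the clash.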

Moving to a single-service multibackend approach would be best but,
given the required job changes etc., it isn't something I think we can
do in the short term.

Moving to lioadm is still my preferred short term solution to this,
with the following devstack change awaiting review:

cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu
https://review.opendev.org/c/openstack/devstack/+/779624

Cheers,

Lee



