[cinder][nova] Running parallel iSCSI/LVM c-vol backends is causing random failures in CI
Hello all,
I reported the following bug last week but I've yet to get any real feedback after asking a few times in irc.
Running parallel iSCSI/LVM c-vol backends is causing random failures in CI https://bugs.launchpad.net/cinder/+bug/1917750
AFAICT tgtadm is causing this behaviour. As I've stated in the bug with Fedora 32 and lioadm I don't see the WWN conflict between the two backends. Does anyone know if using lioadm is an option on Focal?
Thanks in advance,
Lee
On 2021-03-09 22:08:18 +0000 (+0000), Lee Yarwood wrote:
I reported the following bug last week but I've yet to get any real feedback after asking a few times in irc.
Running parallel iSCSI/LVM c-vol backends is causing random failures in CI https://bugs.launchpad.net/cinder/+bug/1917750
AFAICT tgtadm is causing this behaviour. As I've stated in the bug with Fedora 32 and lioadm I don't see the WWN conflict between the two backends. Does anyone know if using lioadm is an option on Focal?
https://docs.openstack.org/cinder/latest/admin/blockstorage-lio-iscsi-suppor... seems to indicate that you just need to set it in configuration. The package that document mentions looks like a distro package recommendation (does not exist under that name on PyPI) and the equivalent lib on PyPI is included in cinder's requirements.txt file, but I don't see either mentioned in the devstack source tree so maybe that needs to be installed for DevStack-based jobs to take advantage of it.
-- Jeremy Stanley
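A minimal Python sketch of what "just set it in configuration" could look like, assuming a cinder.conf in which the LVM backend has its own section listed in `enabled_backends`. The section name `lvmdriver-1` and the helper `set_target_helper` are illustrative assumptions, and `target_helper` is the option name I understand recent Cinder releases to use for the iSCSI target helper; confirm it against the configuration reference for the release in question.

```python
import configparser
import io


def set_target_helper(conf_text: str, backend: str, helper: str = "lioadm") -> str:
    """Return cinder.conf text with target_helper set for one backend section.

    Hypothetical helper. ASSUMPTIONS: the backend has its own section (as with
    enabled_backends) and the option is named target_helper.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep option names exactly as written
    parser.read_string(conf_text)
    if not parser.has_section(backend):
        raise KeyError(f"backend section [{backend}] not found")
    parser.set(backend, "target_helper", helper)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()


# Illustrative input; the section name lvmdriver-1 is an assumption.
example = """\
[DEFAULT]
enabled_backends = lvmdriver-1

[lvmdriver-1]
volume_driver = cinder.volume.drivers.lvm.LVMVolumeDriver
volume_backend_name = lvmdriver-1
target_helper = tgtadm
"""

print(set_target_helper(example, "lvmdriver-1"))
```

Only the one option in the named backend section changes; everything else in the file is written back as parsed (comments are not preserved by configparser).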
On 2021-03-09 22:18:12 +0000 (+0000), Jeremy Stanley wrote:
https://docs.openstack.org/cinder/latest/admin/blockstorage-lio-iscsi-suppor... seems to indicate that you just need to set it in configuration. The package that document mentions looks like a distro package recommendation (does not exist under that name on PyPI) and the equivalent lib on PyPI is included in cinder's requirements.txt file, but I don't see either mentioned in the devstack source tree so maybe that needs to be installed for DevStack-based jobs to take advantage of it.
Oh, and this would be the package to add from Ubuntu Focal: https://packages.ubuntu.com/focal/python3-rtslib-fb
-- Jeremy Stanley
On 2021-03-09 22:19:48 +0000 (+0000), Jeremy Stanley wrote:
Oh, and this would be the package to add from Ubuntu Focal:
Never mind, DevStack installs the projects with pip, so the one in cinder's requirements.txt should already be present. In that case, yeah, just set it in the config?
-- Jeremy Stanley
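To confirm Jeremy's point in a given environment, a small sketch like the following checks whether the PyPI `rtslib-fb` distribution is installed and importable in the Python environment cinder runs from. It assumes the import name is `rtslib_fb`; the helper names `module_available` and `dist_version` are hypothetical.

```python
import importlib.util
from importlib import metadata


def module_available(module_name: str) -> bool:
    """True if a top-level module can be imported in this environment."""
    return importlib.util.find_spec(module_name) is not None


def dist_version(dist_name: str):
    """Installed version of a pip distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# ASSUMPTION: the rtslib-fb distribution is imported as rtslib_fb.
print("rtslib_fb importable:", module_available("rtslib_fb"))
print("rtslib-fb version:", dist_version("rtslib-fb"))
```

Run it with the same interpreter that runs cinder-volume; a system Python and a virtualenv can give different answers.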
On Tue, 9 Mar 2021 at 22:24, Jeremy Stanley <fungi@yuggoth.org> wrote:
Nevermind, DevStack installs the projects with pip, so the one in cinder's requirements.txt should already be present. In that case, yeah, just set it in the config?
Yes, correct. My question was more to see if there were any known issues with using lioadm on Focal. Anyway, I've pushed the following WIP change for devstack to switch over to lioadm when using Focal:
WIP cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu https://review.opendev.org/c/openstack/devstack/+/779624
On Tue, Mar 9, 2021, at 2:40 PM, Lee Yarwood wrote:
Yes correct, my question was more to see if there were any known issues with using lioadm on Focal. Anyway I've pushed the following WIP change for devstack to switch over to lioadm when using Focal:
WIP cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu https://review.opendev.org/c/openstack/devstack/+/779624
https://blog.e0ne.info/post/using-openstack-cinder-with-lio-target/ says the major issue with switching in cinder has been figuring out upgrade testing of the change. I don't know what that entails or why it might be a problem though.
On Tuesday, 9 March 2021 23:40:08 CET Lee Yarwood wrote:
Yes correct, my question was more to see if there were any known issues with using lioadm on Focal. Anyway I've pushed the following WIP change for devstack to switch over to lioadm when using Focal:
WIP cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu https://review.opendev.org/c/openstack/devstack/+/779624
For the record, we use the default (tgt) on the main cinder gates, as well as a lioadm job (defined in cinder-tempest-plugin). If I read the code correctly, that change would break tgt for everyone on ubuntu.
Please raise this in the cinder meeting (Wednesday).
-- Luigi
On Tue, 2021-03-09 at 23:50 +0100, Luigi Toscano wrote:
For the record, we use the default (tgt) on the main cinder gates, as well as a lioadm job (defined in cinder-tempest-plugin). If I read the code correctly, that change would break tgt for everyone on ubuntu.
Please raise this in the cinder meeting (Wednesday).
I don't think we can wait that long. I'm pretty sure, from talking to Lee earlier today, that this is causing this error: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%...
I guess it's almost Wednesday already, but it's starting to cause issues in multiple project gates right before code freeze, so we need to address this before we end up rechecking things over and over on many patches.
On Wednesday, 10 March 2021 00:00:29 CET Sean Mooney wrote:
I don't think we can wait that long. I'm pretty sure, from talking to Lee earlier today, that this is causing this error: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Unable%20to%20detach%20the%20device%20from%20the%20live%20config%5C%22%20AND%20loglevel:%20ERROR
I guess it's almost Wednesday already, but it's starting to cause issues in multiple project gates right before code freeze, so we need to address this before we end up rechecking things over and over on many patches.
But then don't error out if someone tries to use tgtadm, which is what would happen if that patch was merged (if I didn't misread the code).
-- Luigi
On Tue, 9 Mar 2021 at 23:11, Luigi Toscano <ltoscano@redhat.com> wrote:
On Wednesday, 10 March 2021 00:00:29 CET Sean Mooney wrote:
I don't think we can wait that long. I'm pretty sure, from talking to Lee earlier today, that this is causing this error: http://logstash.openstack.org/#/dashboard/file/logstash.json?query=message:%5C%22Unable%20to%20detach%20the%20device%20from%20the%20live%20config%5C%22%20AND%20loglevel:%20ERROR
I guess it's almost Wednesday already, but it's starting to cause issues in multiple project gates right before code freeze, so we need to address this before we end up rechecking things over and over on many patches.
If M3 wasn't *tomorrow* I'd agree but I don't think anyone wants to change the default iSCSI target this close to feature freeze. I've also been unable to reproduce the above failure with a multi LVM/iSCSI + tgtadm setup so I'm not entirely confident about the switch to lioadm actually resolving that issue at the moment.
But then don't error out if someone tries to use tgtadm, which is what would happen if that patch was merged (if I didn't misread the code).
There's nothing stopping someone from declaring CINDER_ISCSI_HELPER=tgtadm to override the default, so I'm not sure what you're suggesting.
To that end, I've posted the following to ensure the single-host tgtadm jobs in the cinder-tempest-plugin use the correct target:
Set CINDER_ISCSI_HELPER explicitly for tgtadm job https://review.opendev.org/c/openstack/cinder-tempest-plugin/+/779697
If you know of any more, please let me know!
Cheers,
Lee
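A hypothetical Python rendering of the defaulting behaviour Lee describes, not the actual devstack change (which is shell): an explicitly set CINDER_ISCSI_HELPER always wins, so tgtadm jobs keep working as long as they set it, and only the fallback changes on Ubuntu. Treating an empty value as unset mirrors shell `${VAR:-default}` expansion; mapping every non-Ubuntu distro to `tgtadm` is a simplification based on Luigi's note that tgt is the current default, and ignores any other distro-specific logic devstack may have.

```python
from typing import Mapping


def resolve_iscsi_helper(env: Mapping[str, str], distro: str) -> str:
    """Pick the iSCSI target helper (hypothetical model of the proposed default).

    An explicit, non-empty CINDER_ISCSI_HELPER always wins. Otherwise Ubuntu
    falls back to lioadm and everything else keeps tgtadm (simplification).
    """
    explicit = env.get("CINDER_ISCSI_HELPER")
    if explicit:
        return explicit
    return "lioadm" if distro.lower() == "ubuntu" else "tgtadm"


print(resolve_iscsi_helper({}, "ubuntu"))                                 # lioadm
print(resolve_iscsi_helper({"CINDER_ISCSI_HELPER": "tgtadm"}, "ubuntu"))  # tgtadm
```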
On 09/03, Lee Yarwood wrote:
Hello all,
I reported the following bug last week but I've yet to get any real feedback after asking a few times in irc.
Running parallel iSCSI/LVM c-vol backends is causing random failures in CI https://bugs.launchpad.net/cinder/+bug/1917750
AFAICT tgtadm is causing this behaviour. As I've stated in the bug with Fedora 32 and lioadm I don't see the WWN conflict between the two backends. Does anyone know if using lioadm is an option on Focal?
Thanks in advance,
Lee
Hi Lee,
Sorry for the late reply. I started looking at the case some time ago but got "distracted" with some other issue.
I am no expert on STGT, since I always work with LIO, but from what I could gather this seems to be caused by the conjunction of us:
- Using the tgtadm helper
- Having 2 different cinder-volume services running on 2 different hosts (one in compute and another on controller).
- Using the same volume_backend_name for both LVM backends.
If we were running a single cinder-volume service with 2 backends, this issue wouldn't happen (I checked).
If we used a different volume_backend_name for each of the 2 services and used a volume type picking one of them for the operations, this wouldn't happen either.
If we used LIO instead, this wouldn't happen.
The cause is the automatic generation of the serial/WWN for volumes by STGT, which seems to be deterministic. The first target created on a host will have a 60000000000000000e0000000001 prefix followed by the LUN number (the 3 before it that we see in the connection_info just states that the WWN is of NAA type).
This means that the first volume exposed by STGT on any host will ALWAYS have the same WWN and will mess things up if we attach them to the same host, because the premise of a WWN is its uniqueness, and everything in Cinder and OS-Brick assumes this and will not be changed.
For LIO it seems that the generation of the serial/WWN is non-deterministic (or at least not the same on all hosts), so the issue won't happen in this specific deployment configuration.
So the options to prevent this issue are to run both backends on the controller node, use a different volume_backend_name and a volume type, or use LIO.
Cheers,
Gorka.
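To make Gorka's explanation concrete, here is a toy Python model (not STGT or LIO code) of why host-local, deterministic WWN generation breaks once one compute host attaches volumes from two cinder-volume hosts. Only the first-target prefix `60000000000000000e0000000001` and the leading NAA `3` come from the thread; how later targets and the LUN are encoded is an assumption, and the random generator merely stands in for "not the same on all hosts".

```python
import uuid
from collections import Counter

# Prefix quoted in the thread for the first target STGT creates on a host.
FIRST_TARGET_PREFIX = "60000000000000000e0000000001"


class StgtLikeHost:
    """Toy model of deterministic, host-local WWN generation (not STGT code).

    ASSUMPTION: the prefix for target N is the quoted prefix with its last
    10 hex digits replaced by N, and the LUN is appended as 2 hex digits.
    Only the N == 1 prefix comes from the thread; the rest is hypothetical.
    """

    def __init__(self):
        self.targets_created = 0

    def export(self, lun: int = 1) -> str:
        self.targets_created += 1
        prefix = FIRST_TARGET_PREFIX[:-10] + f"{self.targets_created:010x}"
        return "3" + prefix + f"{lun:02x}"  # leading 3: NAA type marker


class RandomSerialHost:
    """Stand-in for a generator that differs between hosts (LIO-like).

    Uses random UUIDs; this is NOT how LIO derives its serials.
    """

    def export(self, lun: int = 1) -> str:
        return "3" + uuid.uuid4().hex + f"{lun:02x}"


def duplicate_wwns(wwns):
    """WWNs that appear more than once among devices attached to one host."""
    return sorted(w for w, n in Counter(wwns).items() if n > 1)


# Two cinder-volume hosts each export their first volume, then a single
# compute host attaches both.
controller, compute = StgtLikeHost(), StgtLikeHost()
print(duplicate_wwns([controller.export(), compute.export()]))  # a list with one WWN

lio_a, lio_b = RandomSerialHost(), RandomSerialHost()
print(duplicate_wwns([lio_a.export(), lio_b.export()]))  # []
```

Within a single host the model still yields distinct WWNs, which matches Gorka's observation that one cinder-volume service with two backends does not hit the problem.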
On Tue, 23 Mar 2021 at 17:46, Gorka Eguileor <geguileo@redhat.com> wrote:
Hi Lee,
Sorry for the late reply.
I started looking at the case some time ago but got "distracted" with some other issue.
I am no expert on STGT, since I always work with LIO, but from what I could gather this seems to be caused by the conjunction of us:
- Using the tgtadm helper
- Having 2 different cinder-volume services running on 2 different hosts (one in compute and another on controller).
- Using the same volume_backend_name for both LVM backends.
If we were running a single cinder-volume service with 2 backends, this issue wouldn't happen (I checked).
If we used a different volume_backend_name for each of the 2 services and used a volume type picking one of them for the operations, this wouldn't happen either.
If we used LIO instead, this wouldn't happen.
The cause is the automatic generation of the serial/WWN for volumes by STGT, which seems to be deterministic. The first target created on a host will have a 60000000000000000e0000000001 prefix followed by the LUN number (the 3 before it that we see in the connection_info just states that the WWN is of NAA type).
This means that the first volume exposed by STGT on any host will ALWAYS have the same WWN and will mess things up if we attach them to the same host, because the premise of a WWN is its uniqueness, and everything in Cinder and OS-Brick assumes this and will not be changed.
For LIO it seems that the generation of the serial/WWN is non-deterministic (or at least not the same on all hosts), so the issue won't happen in this specific deployment configuration.
So the options to prevent this issue are to run both backends on the controller node, use a different volume_backend_name and a volume type, or use LIO.
Thanks Gorka,
Just to copy my reply from the bug here.
I'm not entirely sure how using a different volume_backend_name would help? As you say above, the first target on both hosts would still have the 60000000000000000e0000000001 prefix regardless of the name, right?
Moving to a single-service multibackend approach would be best, but given the required job changes etc. it isn't something I think we can do in the short term.
Moving to lioadm is still my preferred short-term solution to this, with the following devstack change awaiting reviews:
cinder: Default CINDER_ISCSI_HELPER to lioadm on Ubuntu https://review.opendev.org/c/openstack/devstack/+/779624
Cheers,
Lee
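One possible reading of how a distinct volume_backend_name helps, offered as a hedged sketch rather than Gorka's confirmed reasoning: the prefix is indeed the same on both hosts, but if a volume type pins a backend name that only one cinder-volume host reports, every test volume is exported from that single host, whose targets are numbered 1, 2, 3, and so on, so no two attached volumes share a WWN. The round-robin placement is a crude stand-in for the scheduler, the WWN format reuses the hypothetical encoding from a simplified model, and the host and backend names are made up.

```python
from collections import Counter
from itertools import cycle


def host_local_wwn(target_number: int, lun: int = 1) -> str:
    """Simplified deterministic WWN that depends only on a host-local counter.

    Hypothetical format; only the first target's prefix is quoted in the thread.
    """
    return "3" + "60000000000000000e" + f"{target_number:010x}" + f"{lun:02x}"


def attach_volumes(n_volumes, backends_by_host, requested_backend=None):
    """Create n volumes, export each, and return the WWNs one compute host sees.

    backends_by_host maps a c-vol host to its volume_backend_name. Without a
    requested backend, volumes are spread round-robin over all hosts (a crude
    scheduler stand-in); with one, only hosts reporting that name are eligible.
    """
    eligible = [host for host, name in backends_by_host.items()
                if requested_backend is None or name == requested_backend]
    targets_per_host = Counter()
    wwns = []
    for _, host in zip(range(n_volumes), cycle(eligible)):
        targets_per_host[host] += 1
        wwns.append(host_local_wwn(targets_per_host[host]))
    return wwns


def has_collision(wwns):
    return len(wwns) != len(set(wwns))


same_name = {"controller": "lvmdriver-1", "compute": "lvmdriver-1"}
distinct = {"controller": "lvm-controller", "compute": "lvm-compute"}

print(has_collision(attach_volumes(2, same_name)))                   # True
print(has_collision(attach_volumes(2, distinct, "lvm-controller")))  # False
```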
participants (6)
- Clark Boylan
- Gorka Eguileor
- Jeremy Stanley
- Lee Yarwood
- Luigi Toscano
- Sean Mooney