Re: [openstack-dev] [nova][cinder][qa] Should we enable multiattach in tempest-full?
---- On Tue, 02 Oct 2018 00:28:51 +0900 Matt Riedemann <mriedemos@gmail.com> wrote ----
On 10/1/2018 8:37 AM, Ghanshyam Mann wrote:
+1 on adding multiattach to the integrated job. It is always good to cover more features in the integrated gate instead of in separate jobs. These tests do not take much time, so it should be OK to add them to tempest-full [1]. We should mark only the really slow tests as 'slow'; everything else should be fine to run in tempest-full.
I thought adding tempest-slow to cinder had been merged, but it has not [2].
[1]http://logs.openstack.org/80/606880/2/check/nova-multiattach/7f8681e/job-out... [2]https://review.openstack.org/#/c/591354/2
Actually it will be enabled in both tempest-full and tempest-slow, because there is also a multiattach test marked as 'slow': TestMultiAttachVolumeSwap.
I'll push patches today.
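For anyone who hasn't dug into how this is wired up: roughly, two things control where these tests run -- the 'slow' attribute on the test and the multiattach feature flag in tempest.conf. A minimal illustrative sketch (not the actual tempest source; the test body here is just a placeholder):

    # Minimal illustrative sketch -- not the real tempest code; the test body
    # is a placeholder. It assumes a working tempest install/configuration.
    from tempest import config
    from tempest.api.compute import base
    from tempest.lib import decorators

    CONF = config.CONF


    class MultiattachSwapSketch(base.BaseV2ComputeAdminTest):

        @classmethod
        def skip_checks(cls):
            super(MultiattachSwapSketch, cls).skip_checks()
            # Only runs when the job sets
            # [compute-feature-enabled]/volume_multiattach = True in
            # tempest.conf, which is what "enabling multiattach in
            # tempest-full" boils down to.
            if not CONF.compute_feature_enabled.volume_multiattach:
                raise cls.skipException('Volume multi-attach is not available.')

        @decorators.attr(type='slow')
        def test_swap_multiattached_volume(self):
            # The 'slow' attribute is why this particular test only runs in
            # jobs that include slow tests, i.e. tempest-slow.
            pass

So flipping the feature flag on in tempest-full picks up the ordinary multiattach tests, while TestMultiAttachVolumeSwap keeps following the slow tests into tempest-slow, which matches what is described above.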
While reviewing your patch and checking the multiattach slow test on stable branches as part of the tempest-slow job, I found that the tempest-slow (tempest-multinode-full) job does not run on nova stable branches, even though nova's .zuul.yaml lists that job for stable branches. We can call this a bug on the Tempest side, because the tempest-slow job definition is set to run only on master [1]. I am trying to enable it for all stable branches [2].

I am getting a few failures on tempest-slow (tempest-multinode-full) for stable branches, which might take time to fix; until then let's keep nova-multiattach on the stable branches and remove it only on master.

[1] https://github.com/openstack/tempest/blob/a32467c4c515dff325e6b4b5ce7af24a0b...
[2] https://review.openstack.org/#/q/topic:tempest-multinode-slow-stable+(status...)

-gmann
On 12/10/2018 1:21 AM, Ghanshyam Mann wrote:
I am getting a few failures on tempest-slow (tempest-multinode-full) for stable branches, which might take time to fix; until then let's keep nova-multiattach on the stable branches and remove it only on master.
Bug https://bugs.launchpad.net/cinder/+bug/1807723/ is blocking removal of the nova-multiattach job from master. Something is going on with TestMultiAttachVolumeSwap when there are two hosts. That test is marked slow but runs in nova-multiattach, which also runs slow tests, and nova-multiattach is a single-node job.

With tempest change https://review.openstack.org/#/c/606978/, TestMultiAttachVolumeSwap gets run in the tempest-slow job which is multi-node, and as a result I'm seeing race failures in that test. I've put my notes into the bug, but I need some help from Cinder at this point. I thought I had initially identified a very obvious problem in nova, but now I think nova is working as designed (although very confusing) and we're hitting a race during the swap where deleting the attachment record for the volume/server we swapped *from* is failing saying the target is still active. The fact we used to run this on a single-node job likely masked some race issue.

As far as next steps, we could:

1. Move forward with removing nova-multiattach but skip TestMultiAttachVolumeSwap until bug 1807723 is fixed.

2. Try to work around bug 1807723 in Tempest by creating the multiattach volume and servers on the same host (by pinning them to an AZ).

3. Add some retry logic to Cinder and hope it is just a race failure when the volume is connected to servers across different hosts. Ultimately this is the best scenario, but I'm just not yet sure whether that is really the issue or whether something is really messed up in the volume backend when this fails, in which case retries wouldn't help. (A rough sketch of what such a retry might look like follows below.)

--
Thanks,
Matt
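To make option 3 a bit more concrete, here is a rough sketch of the kind of retry meant there -- purely illustrative, not actual cinder code; the exception type and the terminate_connection callable are hypothetical stand-ins for whatever the backend really raises and calls:

    # Rough illustration of option 3 only -- not cinder code. TargetStillActive
    # and terminate_connection are hypothetical stand-ins.
    import time


    class TargetStillActive(Exception):
        """Placeholder for the backend's 'target is still active' error."""


    def delete_attachment_with_retry(terminate_connection, attempts=3, delay=2):
        """Retry terminating the connection on 'target still active', on the
        theory that it's just a race with the other host finishing its detach."""
        for attempt in range(1, attempts + 1):
            try:
                return terminate_connection()
            except TargetStillActive:
                if attempt == attempts:
                    raise
                time.sleep(delay)

Whether that helps depends on the open question above: if the backend is genuinely left in a bad state rather than just racing, retries would only delay the same failure.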
On 12/10/2018 10:59 AM, Matt Riedemann wrote:
TestMultiAttachVolumeSwap gets run in the tempest-slow job which is multi-node, and as a result I'm seeing race failures in that test. I've put my notes into the bug, but I need some help from Cinder at this point. I thought I had initially identified a very obvious problem in nova, but now I think nova is working as designed (although very confusing) and we're hitting a race during the swap where deleting the attachment record for the volume/server we swapped *from* is failing saying the target is still active.
After more debugging, it looks like when deleting the servers, the volume in question that fails to delete isn't being properly detached by nova-compute, so the connection still exists when tempest tries to delete the volume, and then the delete fails. I'm not sure what is going on here; it's almost as if something is wrong in the DB and we're not finding the appropriate BDM during the server delete, so we never detach.

--
Thanks,
Matt
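For anyone trying to reproduce this, one way to check the "missing BDM" theory is to look directly at nova's block_device_mapping table for the server being deleted. A throwaway debugging sketch, assuming direct access to the nova database (the connection URL and server UUID are placeholders):

    # Throwaway debugging sketch, not part of any project. Assumes direct
    # access to the nova DB; the URL and UUID below are placeholders.
    import sqlalchemy as sa

    NOVA_DB_URL = 'mysql+pymysql://root:password@127.0.0.1/nova'
    SERVER_UUID = '00000000-0000-0000-0000-000000000000'

    engine = sa.create_engine(NOVA_DB_URL)
    with engine.connect() as conn:
        rows = conn.execute(
            sa.text('SELECT id, volume_id, deleted, connection_info '
                    'FROM block_device_mapping '
                    'WHERE instance_uuid = :uuid'),
            {'uuid': SERVER_UUID},
        )
        for row in rows:
            # If the volume's BDM is missing or already marked deleted here
            # while the attachment still exists on the cinder side, that would
            # line up with nova-compute never detaching on server delete.
            print(row)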