<div dir="ltr"><div>Thanks, Sean, for these replies. They make sense to me.<br></div><div><br></div><div>As I mentioned in my earlier reply, I ran the c9s jobs several times and confirmed that the issue</div><div>can be reproduced in c9s.</div><div># The attempts can be found here: <a href="https://review.opendev.org/c/openstack/heat/+/879014/1">https://review.opendev.org/c/openstack/heat/+/879014/1</a></div><div><br></div><div>The interesting finding was that the issue appears in c9s much less frequently than in Ubuntu.</div><div>(The issue was reproduced in c9s once, but I didn't hit it again during rechecks, while the Ubuntu jobs were<br></div><div> consistently blocked by the libvirt problem.)<br></div><div><br></div><div>I don't know what is causing that difference, but I'm sharing my observation in case it is</div><div>also interesting to other people.<br></div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Thu, Mar 30, 2023 at 8:18 PM Sean Mooney <<a href="mailto:smooney@redhat.com">smooney@redhat.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">On Thu, 2023-03-30 at 19:54 +0900, Takashi Kajinami wrote:<br>
> Thank you, Sylvain, for all these inputs !<br>
> <br>
> On Thu, Mar 30, 2023 at 7:10 PM Sylvain Bauza <<a href="mailto:sbauza@redhat.com" target="_blank">sbauza@redhat.com</a>> wrote:<br>
> <br>
> > <br>
> > <br>
> > On Thu, Mar 30, 2023 at 06:16, Takashi Kajinami <<a href="mailto:tkajinam@redhat.com" target="_blank">tkajinam@redhat.com</a>><br>
> > wrote:<br>
> > <br>
> > > Hello,<br>
> > > <br>
> > > <br>
> > > Since we migrated our jobs from Ubuntu Focal to Ubuntu Jammy, heat gate<br>
> > > jobs have<br>
> > > become very flaky. Further investigation revealed that the issue is<br>
> > > related to something<br>
> > > in libvirt from Ubuntu Jammy that prevents detaching devices from<br>
> > > instances[1].<br>
> > > <br>
> > > The same problem appears in different jobs[2] and we worked around the<br>
> > > problem by disabling<br>
> > > some affected jobs. In heat we also disabled some flaky tests but because<br>
> > > of this we no longer<br>
> > > run the basic scenario tests which deploy instance/volume/network in a<br>
> > > single stack, which means<br>
> > > we lost quite basic test coverage.<br>
> > > <br>
> > > My question is, is there anyone in the Nova team working on "fixing" this<br>
> > > problem ?<br>
> > > We might be able to implement some workaround (like checking the status of<br>
> > > the instances before<br>
> > > attempting to delete them) but this should be fixed on the libvirt side IMO, as<br>
> > > this looks like a "regression"<br>
> > > in Ubuntu Jammy.<br>
> > > Probably we should report a bug against the libvirt package in Ubuntu but<br>
> > > I'd like to hear some<br>
> > > thoughts from the nova team because they are more directly affected by<br>
> > > this problem.<br>
> > > <br>
> > > <br>
> > <br>
> > FWIW, we discussed it yesterday at our vPTG :<br>
> > <a href="https://etherpad.opendev.org/p/nova-bobcat-ptg#L289" rel="noreferrer" target="_blank">https://etherpad.opendev.org/p/nova-bobcat-ptg#L289</a><br>
> > <br>
> > Most of the problems come from volume detach. We also merged<br>
> > some Tempest changes to avoid cleaning up some volumes if the test was<br>
> > OK (thanks Dan for this). We also added more verifications to ask SSH to<br>
> > wait for a bit of time before calling the instance.<br>
> > Eventually, as you see in the etherpad, we didn't find any solution but<br>
> > we'll try to add a canary job that tests volume<br>
> > attaches/detaches multiple times.<br>
> > <br>
> <br>
> > We'll also continue to discuss the CI failures during every Nova weekly<br>
> > meeting (Tuesdays@1600UTC on #openstack-nova) and I'd like to ask for a<br>
> > cross-project session at the Vancouver pPTG for Tempest/Cinder/Nova and<br>
> > others.<br>
> > I'll leave it to other SMEs to reply on your other points, like c9s.<br>
> > <br>
> <br>
> It's good to hear that the issue is still getting attention. I'll catch up<br>
> on the discussion by reading the etherpad<br>
> and will try to attend follow-up discussions if possible, especially if I<br>
> can attend Vancouver vPTG.<br>
> <br>
> I know some changes have been proposed to check ssh-ability to work around<br>
> the problem (though<br>
> the comment in the vPTG session indicates that this does not fully solve the<br>
> problem) but it's still annoying<br>
> because we don't really block resource deletions based on instance status<br>
> (especially its internal status)<br>
> so we eventually need some solution here to avoid this problem, IMHO.<br>
> <br>
> <br>
> > <br>
> > > I'm now trying to set up a centos stream 9 job in Heat repo to see<br>
> > > whether this can be reproduced<br>
> > > if we use centos stream 9. I've been running that specific scenario test<br>
> > > in centos stream 9 jobs<br>
> > > in puppet repos but I've never seen this issue, so I suspect the issue is<br>
> > > really specific to libvirt<br>
> > > in Jammy.<br>
> > > <br>
> > <br>
> > <br>
> > Well, maybe I'm wrong, but no, we also have a centos9stream issue for<br>
> > volume detaches :<br>
> > <a href="https://bugs.launchpad.net/nova/+bug/1960346" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1960346</a><br>
> > <br>
> > <br>
> I just managed to launch a c9s job in heat but it seems the issue is<br>
> reproducible in c9s as well[1].<br>
<br>
ya, i replied in parallel. in my other reply i noted that we saw this issue<br>
first in c9s, then in ubuntu, and we also see this in our internal downstream<br>
ci.<br>
<br>
changing the distro we use for the devstack jobs won't help unless we downgrade libvirt and qemu to before the<br>
original change in libvirt was done, which would break other things.<br>
> I'll rerun the job a few more times to see how frequently the issue appears<br>
> in c9s compared to<br>
> ubuntu.<br>
> We do not run many tests in puppet jobs so that might be the reason I've<br>
> never hit it in<br>
> puppet jobs.<br>
> <br>
> [1] <a href="https://review.opendev.org/c/openstack/heat/+/879014" rel="noreferrer" target="_blank">https://review.opendev.org/c/openstack/heat/+/879014</a><br>
> <br>
> <br>
> > <br>
> > <br>
> > > [1] <a href="https://bugs.launchpad.net/nova/+bug/1998274" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1998274</a><br>
> > > [2] <a href="https://bugs.launchpad.net/nova/+bug/1998148" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nova/+bug/1998148</a><br>
> > > <br>
> > > Thank you,<br>
> > > Takashi<br>
> > > <br>
> > <br>
<br>
</blockquote></div>