[nova][ops] The need for healing instance info cache to base itself on neutron for its port list
Jean-Philippe Méthot
jp.methot at planethoster.info
Tue Oct 6 20:15:15 UTC 2020
> We did not backport it due to the DB migration bug, but it's fixed from Stein on upstream.
> Given we have not had issues backporting https://review.opendev.org/#/c/591607/ without
> https://review.opendev.org/#/c/614167/20 downstream, I think it would be reasonable to do upstream.
If it could be backported to Rocky and maybe even Queens, for those who still run Queens, I’m sure it would be strongly
appreciated (we at least would appreciate it, since we wouldn’t have to patch manually when we update packages).
>> Couldn’t it just have a configuration option to enable it? While I’m not convinced it can fix the root cause of our
>> problem, it could at least contribute to the stability of our and other people’s OpenStack clusters.
> So this is a subtle thing. It's not really a nova bug. It's an issue where invalid data is returned by neutron and that
> corrupts the nova database. The force refresh will heal nova if and only if the neutron issue that caused the problem in the
> first place is resolved. If the neutron issue is not fixed, then the force refresh will continue to force-update the nova
> networking info cache with incomplete data.
>
> So if you never have a neutron issue that returns invalid data, then you will never need this patch.
> If you do, say because you broke the neutron policy file, then this backport will fix the nova database only
> once the policy issue is corrected. We have had several large customers that have had issues with neutron due to
> misconfiguring the policy file or due to a third-party SDN controller that maintained port information in an external DB
> separate from neutron. In the case of the policy file customer, this self-healing worked once they corrected the issue.
> In the case of the SDN controller customer, it did not until the SDN vendor fixed the SDN controller's DB. Once it returned
> correct data again, the periodic task healed nova.
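
To make the quoted point concrete, here is a minimal Python sketch of a force refresh that rebuilds a cached port list from whatever the network service reports. It is an illustration only, not nova's actual code: `heal_instance_cache`, `broken_neutron`, `healthy_neutron`, and the dict-based cache are all hypothetical stand-ins. It shows why the refresh can only heal the cache once the source of the port list returns correct data again.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the info cache: instance ID -> list of port IDs.
Cache = Dict[str, List[str]]


def heal_instance_cache(
    cache: Cache,
    instance_id: str,
    list_ports: Callable[[str], List[str]],
) -> None:
    """Force-refresh one instance's cached port list from the network service."""
    # The cache is overwritten with whatever the network service reports,
    # so the result is only as good as that answer.
    cache[instance_id] = list(list_ports(instance_id))


def broken_neutron(instance_id: str) -> List[str]:
    # Hypothetical: simulates a misconfigured policy file hiding the ports.
    return []


def healthy_neutron(instance_id: str) -> List[str]:
    # Hypothetical: the network service reports both ports correctly.
    return ["port-a", "port-b"]


cache: Cache = {"vm-1": ["port-a"]}  # port-b was already lost from the cache

# While the network service returns bad data, the refresh makes things worse.
heal_instance_cache(cache, "vm-1", broken_neutron)
after_broken = list(cache["vm-1"])

# Once the underlying issue is fixed, the next refresh heals the cache.
heal_instance_cache(cache, "vm-1", healthy_neutron)
after_healthy = list(cache["vm-1"])

print(after_broken)
print(after_healthy)
```

In this toy model the first refresh leaves the instance with no ports at all, and only the second one, run against a corrected source, restores the full list.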
That’s interesting because we run a very basic neutron + openvswitch setup with default policies. Additionally,
we have tested the nova patch I mentioned earlier for a long while and it seemed to at least prevent the instances
from losing their port. Doesn’t that imply that neutron has consistently returned correct data in our setup in particular?
So our issue could be elsewhere? I could be wrong and it’s not a hill I’m willing to die on, I’m just pointing out my own
observations.
Jean-Philippe Méthot
Senior Openstack system administrator
Administrateur système Openstack sénior
PlanetHoster inc.
4414-4416 Louis B Mayer
Laval, QC, H7P 0G1, Canada
TEL : +1.514.802.1644 - Poste : 2644
FAX : +1.514.612.0678
CA/US : 1.855.774.4678
FR : 01 76 60 41 43
UK : 0808 189 0423