[openstack-dev] [Nova] [Neutron] heal_instance_info_cache_interval - Can we kill it?

Aaron Rosen aaronorosen at gmail.com
Thu May 29 00:47:21 UTC 2014


On Wed, May 28, 2014 at 7:39 AM, Assaf Muller <amuller at redhat.com> wrote:

>
>
> ----- Original Message -----
> > Hi,
> >
> > Sorry, somehow I missed this email. I don't think you want to disable
> > it, though we can definitely have it run less often. The issue with
> > disabling it is that if one of the notifications from neutron -> nova
> > never gets sent successfully (neutron-server is restarted before the
> > event is sent, or some other internal failure), Nova will never update
> > its cache if heal_instance_info_cache_interval is set to 0.
>
> The thing is, this periodic healing doesn't imply correctness either.
> In the case where you lose a notification and the compute node hosting
> the VM hosts a non-trivial number of VMs, it can take (with the default
> of 60 seconds) dozens of minutes to update the cache, since only one VM
> is updated per minute. I could understand the use of a sanity check if
> it were performed much more often, but as it is now it seems useless to
> me, since you can't really rely on it.
>

I agree with you. That's why we implemented the event callback, so that
the cache would be more up to date. Honestly, you can probably safely
disable heal_instance_info_cache_interval and things will be fine, as we
haven't seen many failures where events from neutron fail to send. If it
turns out events do get lost, we can definitely make the event
notification logic in neutron much more robust by persisting events to
the db and implementing retry logic on failure there, to help ensure
nova gets the notification.
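
For the record, a minimal sketch of what that persist-and-retry idea
could look like (save_pending_event and mark_event_sent are hypothetical
helpers, not existing neutron code; server_external_events.create is the
novaclient call the notifier drives today):

    import time

    MAX_RETRIES = 5

    def notify_nova_with_retry(db, nova_client, event):
        # Persist the event before sending, so a neutron-server restart
        # can't lose it (save_pending_event is a hypothetical helper).
        row = db.save_pending_event(event)
        for attempt in range(MAX_RETRIES):
            try:
                # Deliver via nova's os-server-external-events API.
                nova_client.server_external_events.create([event])
                db.mark_event_sent(row.id)  # hypothetical helper
                return
            except Exception:
                # Back off and retry; a periodic task could also re-send
                # any rows still marked pending after a crash.
                time.sleep(2 ** attempt)
        # Give up for now; the pending row lets a re-sender retry later.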

>
> What I'm trying to say is that with the inefficiency of the implementation,
> coupled with Neutron's default plugin's inability to cope with a "large"
> number of API calls, I feel like the disadvantages outweigh the
> advantages when it comes to the cache healing.
>

Right, the current heal_instance implementation has scaling issues, as
every compute node runs this task querying neutron, and the more compute
nodes you have, the more querying. Hopefully the nova v3 API will solve
this issue, though, as the networking information will no longer have to
live in nova as well. Someone interested in this network data can query
neutron directly, and we can avoid these types of caching issues
altogether :)
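
To illustrate what "query neutron directly" looks like for someone who
wants an instance's networking data today, here's a short example with
python-neutronclient (the credentials and instance UUID below are
placeholders):

    from neutronclient.v2_0 import client as neutron_client

    neutron = neutron_client.Client(username='admin',
                                    password='secret',
                                    tenant_name='admin',
                                    auth_url='http://127.0.0.1:5000/v2.0')

    # All ports attached to a given instance, straight from neutron --
    # no nova-side cache involved.
    instance_uuid = 'REPLACE-WITH-INSTANCE-UUID'
    ports = neutron.list_ports(device_id=instance_uuid)['ports']
    for port in ports:
        print(port['id'], port['fixed_ips'])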

>
> How would you feel about disabling it, optimizing the implementation
> (for example, by introducing a new networking_for_instance API verb
> to Neutron), then enabling it again?
>

I think this is a good idea; we should definitely implement something
like this so nova can get the same information with fewer API calls.
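
Just to sketch the shape such a verb might take (entirely hypothetical,
nothing like networking_for_instance exists in neutron today), the point
would be one round trip instead of separate list_ports/list_subnets/
list_floatingips calls per instance:

    # Hypothetical single call replacing several list_* round trips:
    #   GET /v2.0/networking-for-instance?device_id=<instance-uuid>
    info = neutron.networking_for_instance(device_id=instance_uuid)

    # One response carrying everything nova's info_cache needs:
    # {
    #     'ports': [...],         # with fixed_ips per port
    #     'subnets': [...],       # gateway, dns, and routes per subnet
    #     'floating_ips': [...],  # floating IPs associated to the ports
    # }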

>
> > The neutron->nova events help to ensure that the nova info_cache is
> > up to date sooner, by having neutron inform nova whenever a port's
> > data has changed (@Joe Gordon - this happens regardless of virt
> > driver).
> >
> > If you're using the libvirt virt driver, the neutron->nova events
> > will also be used to ensure that the networking is 'ready' before
> > the instance is powered on.
> >
> > Best,
> >
> > Aaron
> >
> > P.S.: we're working on making the heal_network call to neutron a lot
> > less expensive in the future as well.
> >
> >
> > On Tue, May 27, 2014 at 7:25 PM, Joe Gordon <joe.gordon0 at gmail.com> wrote:
> >
> > On Wed, May 21, 2014 at 6:21 AM, Assaf Muller <amuller at redhat.com> wrote:
> >
> > Dear Nova aficionados,
> >
> > Please make sure I understand this correctly:
> > Each nova compute instance selects a single VM out of all of the VMs
> > that it hosts, and every <heal_instance_info_cache_interval> seconds
> > queries Neutron for all of its networking information, then updates
> > Nova's DB.
> >
> > If the information above is correct, then I fail to see how that
> > is in any way useful. For example, for a compute node hosting 20 VMs,
> > it would take 20 minutes to update the last one. Seems unacceptable
> > to me.
> >
> > Considering Icehouse's Neutron to Nova notifications, my question
> > is if we can change the default to 0 (disable the feature), deprecate
> > it, then delete it in the K cycle. Is there a good reason not to do this?
> >
> > Based on the patch that introduced this function [0] you may be on to
> > something, but AFAIK unfortunately the neutron to nova notifications
> > only work in libvirt right now [1], so I don't think we can fully
> > deprecate this periodic task. That being said, turning it off by
> > default may be an option. Have you tried disabling this feature and
> > seeing what happens (in the gate and/or in production)?
> >
>
> We've disabled it in a scale lab and didn't observe any black holes forming
> or other catastrophes.
>
> >
> > [0] https://review.openstack.org/#/c/4269/
> > [1] https://wiki.openstack.org/wiki/ReleaseNotes/Icehouse
> >
> >
> > Assaf Muller, Cloud Networking Engineer
> > Red Hat
> >