[Openstack] problem with '_heal_instance_info_cache': am i the only one hitting this?

Nathanael Burton nathanael.i.burton at gmail.com
Fri Feb 6 22:46:57 UTC 2015


On Feb 2, 2015 9:44 AM, "Don Waterloo" <don.waterloo at gmail.com> wrote:
>
> I entered a bug as https://bugs.launchpad.net/nova/+bug/1413049. My
> 'patch' in there is not correct, so ignore that :)
>
> What I'm finding is that, about once or twice a day, I run into a race
> condition where _heal_instance_info_cache() is active and a new instance
> is created at the same time. The heal ends up overwriting the info cache
> with [], and this is never corrected, leaving an instance that is running
> OK but broken in the database.
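
The race described above has the shape of a classic lost update. The sketch below is a toy model only, not Nova's code: the names info_cache, heal_read, heal_write and build_instance are hypothetical, and the interleaving is forced by hand rather than produced by real threads. It assumes the heal takes its snapshot before the build has allocated any ports and writes it back afterwards.

```python
"""Toy model of a lost update between a periodic cache heal and a build."""
import json

# Hypothetical stand-in for the instance_info_caches table: uuid -> JSON text.
info_cache = {}


def heal_read(uuid, network_backend):
    """Heal, step 1: snapshot what the network service knows right now."""
    return list(network_backend.get(uuid, []))


def heal_write(uuid, snapshot):
    """Heal, step 2: persist the snapshot unconditionally."""
    info_cache[uuid] = json.dumps(snapshot)


def build_instance(uuid, network_backend, ports):
    """Build path: allocate ports, then record them in the cache."""
    network_backend[uuid] = ports
    info_cache[uuid] = json.dumps(ports)


def run_interleaving():
    """Force the unlucky ordering: heal reads, build completes, heal writes."""
    network_backend = {}
    uuid = "instance-1"
    info_cache.clear()
    snapshot = heal_read(uuid, network_backend)  # heal sees no ports yet
    build_instance(uuid, network_backend, [{"id": "port-a"}])
    heal_write(uuid, snapshot)  # the stale snapshot replaces the real data
    return info_cache[uuid]


if __name__ == "__main__":
    print(run_interleaving())  # prints []
```

In this model the instance's ports exist in the backend, yet the cache ends up holding an empty list, which matches the symptom reported here. Whether the real code path interleaves in exactly this way is an assumption.
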
>
> If you run
> mysql -e "select
> instances.host,instances.hostname,instances.uuid,instances.user_id from
> instance_info_caches,instances where network_info = '[]' and
> instances.deleted = 0 and instances.uuid =
> instance_info_caches.instance_uuid;" nova
>
> it should return nothing. For me, it shows the broken instances.
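
To make the logic of that check concrete, here is a small sketch that runs the same join against an in-memory SQLite database. The two tables are cut down to only the columns the query touches, and the sample rows are invented for illustration; the real Nova schema has many more columns and lives in MySQL.

```python
import sqlite3

# Same selection and join conditions as the mysql one-liner quoted above.
QUERY = """
SELECT instances.host, instances.hostname, instances.uuid, instances.user_id
FROM instance_info_caches, instances
WHERE network_info = '[]'
  AND instances.deleted = 0
  AND instances.uuid = instance_info_caches.instance_uuid
"""


def find_broken(conn):
    """Return live instances whose cached network info is an empty list."""
    return conn.execute(QUERY).fetchall()


def demo():
    """Build a tiny, made-up data set and run the check against it."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE instances (
            uuid TEXT, host TEXT, hostname TEXT, user_id TEXT, deleted INTEGER
        );
        CREATE TABLE instance_info_caches (
            instance_uuid TEXT, network_info TEXT
        );
        """
    )
    conn.executemany(
        "INSERT INTO instances VALUES (?, ?, ?, ?, ?)",
        [
            ("u1", "compute1", "vm-ok", "alice", 0),
            ("u2", "compute1", "vm-broken", "bob", 0),
            ("u3", "compute2", "vm-deleted", "carol", 1),
        ],
    )
    conn.executemany(
        "INSERT INTO instance_info_caches VALUES (?, ?)",
        [
            ("u1", '[{"id": "port-a"}]'),
            ("u2", "[]"),
            ("u3", "[]"),
        ],
    )
    rows = find_broken(conn)
    conn.close()
    return rows


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Only the live instance with an empty cache is reported; the deleted one is filtered out by the deleted = 0 condition, and the healthy one by the network_info comparison.
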
>
> And they are indeed broken; they often have multiple interfaces. If the
> user does a 'rebuild', the libvirt XML file ends up with no source
> bridges.
>
> I have the following set:
> reclaim_instance_interval = 0
> heal_instance_info_cache_interval = 20
> periodic_interval = 10
> image_cache_manager_interval = 10
> running_deleted_instance_poll_interval = 10
> instance_delete_interval = 10
> running_deleted_instance_action = reap
>
> Is no one else hitting this? This might be an unusual environment, since
> we create instances quite dynamically (maybe 500-1000/day, all from Heat,
> so many of them start at the same time).

Don,

In addition to the issue we were having with recreating missing info cache
data for instances, which I mentioned to you a few weeks ago (related to
bug https://bugs.launchpad.net/nova/+bug/1378459), I think we are also
seeing this behaviour. It occurs during periods of heavy elastic instance
creation. All networking gets set up correctly, but there is no info
cache. Within 10-60 minutes the info cache gets rebuilt by the periodic
task and everything is OK. Next week I'll try to test the review patch
that DIMS linked in your bug and see if I have any success.

Thanks,

Nate