[Openstack] problem with '_heal_instance_info_cache': am I the only one hitting this?

Don Waterloo don.waterloo at gmail.com
Mon Feb 2 14:34:21 UTC 2015


I filed a bug for this: https://bugs.launchpad.net/nova/+bug/1413049. The
'patch' in there is not correct, so please ignore it :)

What I'm finding is that, about once or twice a day, I run into a race
condition where _heal_instance_info_cache() is running at the same time as a
new instance is being created. The heal ends up overwriting the instance's
info cache with an empty network_info ([]), and this is never corrected,
leaving an instance that is running fine but broken in the database.

If you run the following against the nova database, it should return nothing:

mysql -e "select instances.host, instances.hostname, instances.uuid, instances.user_id
  from instance_info_caches, instances
  where network_info = '[]'
    and instances.deleted = 0
    and instances.uuid = instance_info_caches.instance_uuid;" nova

For me, it lists the broken instances.
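To make clear what the query checks, this sketch runs the same SELECT against an in-memory SQLite database. The two tables are trimmed to only the columns the query uses (the real Nova schema has many more, and the deployment above uses MySQL), and the sample rows are made up. The explicit JOIN is equivalent to the comma join in the original query.

```python
import sqlite3

# Minimal stand-in schema: only the columns the query touches.
SCHEMA = """
CREATE TABLE instances (
    uuid TEXT PRIMARY KEY, host TEXT, hostname TEXT,
    user_id TEXT, deleted INTEGER
);
CREATE TABLE instance_info_caches (
    instance_uuid TEXT, network_info TEXT
);
"""

# Same filter as the query in the post, written with an explicit JOIN.
QUERY = """
SELECT instances.host, instances.hostname, instances.uuid, instances.user_id
FROM instance_info_caches
JOIN instances ON instances.uuid = instance_info_caches.instance_uuid
WHERE instance_info_caches.network_info = '[]'
  AND instances.deleted = 0
"""


def find_empty_caches(conn):
    """Return live instances whose cached network_info is an empty list."""
    return conn.execute(QUERY).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO instances VALUES (?, ?, ?, ?, ?)",
    [
        ("uuid-ok", "cn1", "vm-ok", "u1", 0),       # healthy cache
        ("uuid-broken", "cn1", "vm-bad", "u1", 0),  # cache clobbered to []
        ("uuid-gone", "cn2", "vm-old", "u2", 1),    # deleted, so ignored
    ],
)
conn.executemany(
    "INSERT INTO instance_info_caches VALUES (?, ?)",
    [
        ("uuid-ok", '[{"id": "vif-1"}]'),
        ("uuid-broken", "[]"),
        ("uuid-gone", "[]"),
    ],
)
print(find_empty_caches(conn))  # [('cn1', 'vm-bad', 'uuid-broken', 'u1')]
```

Only the live instance with an empty cache comes back; the deleted row is filtered out by `deleted = 0`.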

And they really are broken: they often have multiple interfaces, and if the
user does a 'rebuild', the libvirt XML file ends up with no source bridges.

I have the following set in nova.conf:

reclaim_instance_interval = 0
heal_instance_info_cache_interval = 20
periodic_interval = 10
image_cache_manager_interval = 10
running_deleted_instance_poll_interval = 10
instance_delete_interval = 10
running_deleted_instance_action = reap

Is no one else hitting this? Ours might be an unusual environment, since we
create instances very dynamically (maybe 500-1000 a day, all from Heat, so a
lot of them start at once).
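A rough back-of-the-envelope number for why overlaps are plausible here: with heal_instance_info_cache_interval = 20, the heal task runs at most once every 20 seconds, so up to 86400 / 20 = 4320 times a day, against 500-1000 instance creations that arrive in bursts. This sketch parses a subset of the settings above with Python's configparser; putting them under a [DEFAULT] section is an assumption made for the sketch.

```python
import configparser

# A subset of the settings from the post; the [DEFAULT] section is assumed.
NOVA_CONF = """
[DEFAULT]
heal_instance_info_cache_interval = 20
periodic_interval = 10
"""

cfg = configparser.ConfigParser()
cfg.read_string(NOVA_CONF)

heal_every = cfg["DEFAULT"].getint("heal_instance_info_cache_interval")
# Upper bound on heal runs per day if the task fires at most once per interval.
runs_per_day = 24 * 60 * 60 // heal_every
print(runs_per_day)  # 4320
```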