[Openstack] nova-compute won't restart (on some nodes) after Grizzly upgrade

Jonathan Proulx jon at jonproulx.com
Sun Aug 11 12:17:31 UTC 2013


Hi Michael,

Thanks for the offer.  I'd be happy to paste up some compute logs if you
have an interest, but I got around the issue with:

virsh list --all

and then 'virsh undefine' for each deleted instance on each host.  I've
used hypervisors directly and high-level stuff like OpenStack (and
others), but I've never spent much time at the libvirt layer, so that was
a bit of new info for me; it came in from the operators list not long
after I sent my query here.
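
For the record, the cleanup amounted to roughly this on each affected
host (a sketch only -- it undefines every shut-off domain, so cross-check
the names against nova first if you have instances that are stopped but
not deleted):

  # list every domain libvirt knows about, running or shut off
  virsh list --all
  # undefine the shut-off ones (the Id column shows '-' for those,
  # assuming the usual 'virsh list' output format)
  for dom in $(virsh list --all | awk '$1 == "-" {print $2}'); do
      virsh undefine "$dom"
  done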

Thanks,
-Jon


On Wed, Aug 7, 2013 at 9:02 PM, Michael Still <mikal at stillhq.com> wrote:

> Jonathan,
>
> this would be easier to debug with a nova-compute log. Are you willing
> to post one somewhere that people could take a look at?
>
> Thanks,
> Michael
>
> On Thu, Aug 8, 2013 at 7:35 AM, Jonathan Proulx <jon at jonproulx.com> wrote:
> > Hi All,
> >
> > Apologies to those who saw this on the operators list earlier; there is
> > a bit of new info here, and having gotten no response there I thought
> > I'd take it to a wider audience...
> >
> >
> > I'm almost through my Grizzly upgrade.  I'd upgraded everything else
> > before upgrading nova-compute (Ubuntu 12.04 cloud archive packages).
> >
> > On most nodes the nova-compute service upgraded and restarted properly,
> > but on some it immediately exits with:
> >
> > CRITICAL nova [-] 'instance_type_memory_mb'
> >
> > It would seem like this is https://code.launchpad.net/bugs/1161022, but
> > the fix for that was released in March, and I've verified it is in the
> > packaged version I'm using.
> >
> > The referenced bug involves the DB migration only updating non-deleted
> > instances in the instance_system_metadata table; the patch skips the
> > lookups that are broken (and irrelevant) for deleted instances.
> >
> > Tracing the DB calls from the host shows it is trying to do lookups for
> > instances that were deleted last October, which is a bit surprising, as
> > it has run thousands of instances since then and it's not looking those
> > up.
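> >
> > For reference, a rough way to see which instances still have no
> > instance_type_memory_mb row (a sketch only, assuming the default 'nova'
> > database name and the stock grizzly instance_system_metadata schema):
> >
> >   # adjust db name / credentials to taste; the key column is
> >   # backtick-quoted because 'key' is a reserved word in MySQL
> >   mysql nova -e "SELECT i.uuid, i.deleted FROM instances i
> >     LEFT JOIN instance_system_metadata m
> >       ON m.instance_uuid = i.uuid AND m.\`key\` = 'instance_type_memory_mb'
> >     WHERE m.id IS NULL"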
> >
> > It is noteworthy that that is around the time I upgraded from Essex to
> > Folsom, so it's possible their state is weirder than most, having run
> > through that update.
> >
> > There were directories for the instances in question in
> > /var/lib/nova/instances, so I thought "Aha!" and moved them, but on
> > restart I still get the same failure and the same DB query for the old
> > instances.  Where is nova getting the idea it should look these up, and
> > how can I stop it?
> >
> > I've gone so far as to generate instance_type_<foo> entries in the
> > instance_system_metadata table for all instances ever on my deployment
> > (about 500k), but I still get only the cryptic "CRITICAL nova [-]
> > 'instance_type_memory_mb'" error and a failure to start, so clearly I'm
> > chasing the wrong problem somehow.
> >
> > Help?
> > -Jon
> >
> > _______________________________________________
> > Mailing list:
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> > Post to     : openstack at lists.openstack.org
> > Unsubscribe :
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >
>
>
>
> --
> Rackspace Australia
>