[Openstack] QEMU/KVM crash when mixing cpu_policy:dedicated and non-dedicated flavors?

Steve Gordon sgordon at redhat.com
Sat Sep 16 22:32:51 UTC 2017


----- Original Message -----
> From: "Tomas Brännström" <tomas.a.brannstrom at tieto.com>
> To: openstack at lists.openstack.org
> Sent: Friday, September 15, 2017 5:56:34 AM
> Subject: [Openstack] QEMU/KVM crash when mixing cpu_policy:dedicated and non-dedicated flavors?
> 
> Hi
> I just noticed a strange (?) issue when I tried to create an instance with
> a flavor with hw:cpu_policy=dedicated. The instance failed with error:
> 
> Unable to read from monitor: Connection reset by peer', u'code': 500,
> u'details': u'  File
> "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 1926, in
> _do_build_and_run_instance\n    filter_properties)
> File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py", line 2116,
> in _build_and_run_instance\n    instance_uuid=instance.uuid,
> reason=six.text_type(e))
> 
> And all other instances were shut down, even those living on another
> compute host than the new one was scheduled to. A quick googling reveals
> that this could be due to the hypervisor crashing (though why would it
> crash on unrelated compute hosts??).

Are there any more specific messages in the system logs or elsewhere? Check /var/log/libvirt/* in particular; while I suspect it is the original source of the above message, it may have some additional useful information earlier in the log.
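
For example, something along these lines on the affected compute hosts (paths are the usual defaults, adjust for your distro):

    # libvirtd's own log, for crashes/aborts around the time of the failure:
    grep -iE 'error|crash|abort' /var/log/libvirt/libvirtd.log

    # Per-guest QEMU logs, named after the libvirt domain:
    ls -lt /var/log/libvirt/qemu/
    tail -n 50 /var/log/libvirt/qemu/instance-*.log

    # And the nova-compute log for the build failure itself:
    grep -i 'Unable to read from monitor' /var/log/nova/nova-compute.log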

> 
> The only odd thing here that I can think of was that the existing instances
> did -not- use dedicated cpu policy -- can there be problems like this when
> attempting to mix dedicated and non-dedicated policies?

The main problem if you mix them *on the same node* is that Nova won't account for this properly when placing guests. The current design assumes that a node will be used either for "normal" instances (with CPU overcommit) or for "dedicated" instances (no CPU overcommit, CPU pinning), and that the two classes will be separated via the use of host aggregates and flavors. This in and of itself should not result in a QEMU crash, though it may eventually result in issues with respect to balancing of scheduling/placement decisions. If instances on other nodes went down at the same time I'd be looking for a broader issue: what is your storage and networking setup like?
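
For reference, a rough sketch of that separation using host aggregates with a Mitaka-era nova client (the aggregate, flavor, and host names and the "pinned" metadata key below are made up for illustration, and this assumes AggregateInstanceExtraSpecsFilter and NUMATopologyFilter are enabled in scheduler_default_filters in nova.conf):

    # Hosts reserved for pinned (dedicated) guests:
    nova aggregate-create cpu-pinned-hosts
    nova aggregate-set-metadata cpu-pinned-hosts pinned=true
    nova aggregate-add-host cpu-pinned-hosts compute-1

    # Hosts for normal, overcommitted guests:
    nova aggregate-create cpu-shared-hosts
    nova aggregate-set-metadata cpu-shared-hosts pinned=false
    nova aggregate-add-host cpu-shared-hosts compute-2

    # A flavor that requests pinning and only lands on the pinned aggregate:
    nova flavor-create m1.small.pinned auto 2048 20 2
    nova flavor-key m1.small.pinned set hw:cpu_policy=dedicated
    nova flavor-key m1.small.pinned set aggregate_instance_extra_specs:pinned=true

    # Existing non-dedicated flavors get steered to the shared aggregate:
    nova flavor-key m1.small set aggregate_instance_extra_specs:pinned=false

With something like that in place, dedicated and non-dedicated guests never share a compute node, so Nova's accounting stays consistent.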

-Steve

> This was with Mitaka.
> 
> /Tomas
> 
> _______________________________________________
> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to     : openstack at lists.openstack.org
> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> 

-- 
Steve Gordon,
Principal Product Manager,
Red Hat OpenStack Platform
