[Openstack] QEMU/KVM crash when mixing cpu_policy:dedicated and non-dedicated flavors?

Tomas Brännström tomas.a.brannstrom at tieto.com
Mon Sep 18 07:29:46 UTC 2017


We use Fuel for deployment, with a fairly simple network configuration
(Controller/Network node are the same) and OpenDaylight as the neutron
driver. However, we also have SR-IOV configured for some nics, and there
might be something interesting here.

The instance was created with an SR-IOV port, and in the logs I see
"Assigning a pci device without numa affinity toinstance
389109a4-540e-48d9-82b1-873b02cb4d31 which has numa topology". Then shortly
after creation fails and the hypervisor seems to crash.

So today I tried to create an instance without SR-IOV but with
hw:cpu_policy=dedicated, and it worked fine. Then I did the same but added an
SR-IOV port, and I got the same crash (though not across all nodes this
time...)
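For reference, the reproduction above can be sketched with the Mitaka-era CLI roughly as follows. This is only an illustration: the flavor, image, and network names are placeholders, and the SR-IOV network/port details will differ per deployment.

```shell
# Flavor with dedicated CPU pinning:
nova flavor-create pinned.small auto 2048 20 2
nova flavor-key pinned.small set hw:cpu_policy=dedicated

# Boot without SR-IOV -- this case worked fine:
nova boot --flavor pinned.small --image cirros \
    --nic net-id=<normal-net-id> test-no-sriov

# Pre-create an SR-IOV (direct) port and boot with it --
# this is the case that triggered the crash:
neutron port-create <sriov-net> --binding:vnic_type direct --name sriov-port
nova boot --flavor pinned.small --image cirros \
    --nic port-id=<sriov-port-id> test-sriov
```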

I assume we have some kind of misconfiguration somewhere, though the entire
hypervisor crashing doesn't seem correct either :-)

/Tomas

On 17 September 2017 at 00:32, Steve Gordon <sgordon at redhat.com> wrote:

> ----- Original Message -----
> > From: "Tomas Brännström" <tomas.a.brannstrom at tieto.com>
> > To: openstack at lists.openstack.org
> > Sent: Friday, September 15, 2017 5:56:34 AM
> > Subject: [Openstack] QEMU/KVM crash when mixing cpu_policy:dedicated and
> > non-dedicated flavors?
> >
> > Hi
> > I just noticed a strange (?) issue when I tried to create an instance with
> > a flavor with hw:cpu_policy=dedicated. The instance failed with error:
> >
> > Unable to read from monitor: Connection reset by peer', u'code': 500,
> > u'details': u'  File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py",
> > line 1926, in _do_build_and_run_instance\n    filter_properties)
> >   File "/usr/lib/python2.7/dist-packages/nova/compute/manager.py",
> > line 2116, in _build_and_run_instance\n    instance_uuid=instance.uuid,
> > reason=six.text_type(e))
> >
> > And all other instances were shut down, even those living on compute
> > hosts other than the one the new instance was scheduled to. Some quick
> > googling suggests that this could be due to the hypervisor crashing
> > (though why would it crash on unrelated compute hosts??).
>
> Are there any more specific messages in the system logs or elsewhere?
> Check /var/log/libvirt/* in particular; though I suspect it is the
> original source of the above message, it may have some additional useful
> information earlier.
>
> >
> > The only odd thing here that I can think of was that the existing instances
> > did -not- use dedicated cpu policy -- can there be problems like this when
> > attempting to mix dedicated and non-dedicated policies?
>
> The main problem if you mix them *on the same node* is that Nova won't
> account for this properly when placing guests; the current design assumes
> that a node will be used either for "normal" instances (with CPU
> overcommit) or "dedicated" instances (no CPU overcommit, pinning), and the
> two will be separated via the use of host aggregates and flavors. This in
> and of itself should not result in a QEMU crash, though it may eventually
> cause issues w.r.t. balancing of scheduling/placement decisions. If
> instances on other nodes went down at the same time I'd be looking for a
> broader issue: what is your storage and networking setup like?
>
> -Steve
>
> > This was with Mitaka.
> >
> > /Tomas
> >
> > _______________________________________________
> > Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> > Post to     : openstack at lists.openstack.org
> > Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> >
>
> --
> Steve Gordon,
> Principal Product Manager,
> Red Hat OpenStack Platform
>
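The separation Steve describes above is typically implemented with host aggregates plus matching flavor extra specs. A rough sketch, using the Mitaka-era CLI; the aggregate, host, flavor, and metadata names are all illustrative, and this assumes AggregateInstanceExtraSpecsFilter is enabled in the Nova scheduler:

```shell
# Dedicate some hosts to pinned instances via an aggregate:
nova aggregate-create pinned-agg
nova aggregate-set-metadata pinned-agg pinned=true
nova aggregate-add-host pinned-agg compute-1

# Everything else goes into a "shared" aggregate:
nova aggregate-create shared-agg
nova aggregate-set-metadata shared-agg pinned=false
nova aggregate-add-host shared-agg compute-2

# Key the flavors to their aggregates so the scheduler never mixes
# pinned and unpinned guests on the same node:
nova flavor-key pinned.small set hw:cpu_policy=dedicated
nova flavor-key pinned.small set aggregate_instance_extra_specs:pinned=true
nova flavor-key m1.small set aggregate_instance_extra_specs:pinned=false
```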

