[openstack-dev] [nova][neutron][SR-IOV] Hardware changes and shifting PCI addresses
Jay Pipes
jaypipes at gmail.com
Tue Sep 15 01:34:31 UTC 2015
On 09/10/2015 05:23 PM, Brent Eagles wrote:
> Hi,
>
> I was recently informed of a situation that came up when an engineer
> added an SR-IOV nic to a compute node that was hosting some guests that
> had VFs attached. Unfortunately, adding the card shuffled the PCI
> addresses causing some degree of havoc. Basically, the PCI addresses
> associated with the previously allocated VFs were no longer valid.
>
> I tend to consider this a non-issue. The expectation that hosts have
> relatively static hardware configuration (and kernel/driver configs for
> that matter) is the price you pay for having pets with direct hardware
> access. That being said, this did come as a surprise to some of those
> involved and I don't think we have any messaging around this or advice
> on how to deal with situations like this.
>
> So what should we do? I can't quite see altering OpenStack to deal with
> this situation (or even how that could work). Has anyone done any
> research into this problem, even if it is how to recover or extricate
> a guest that is no longer valid? It seems that at the very least we
> could use some stern warnings in the docs.
Hi Brent,
Interesting issue. We have code in the PCI tracker that ostensibly
handles this problem:
https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L145-L164
But the note from yjiang5 is telling:
# Pci properties may change while assigned because of
# hotplug or config changes. Although normally this should
# not happen.
# As the devices have been assigned to a instance, we defer
# the change till the instance is destroyed. We will
# not sync the new properties with database before that.
# TODO(yjiang5): Not sure if this is a right policy, but
# at least it avoids some confusion and, if
# we can add more action like killing the instance
# by force in future.
Basically, if the PCI device tracker notices that an instance is
assigned a PCI device whose address no longer appears among the PCI
device addresses returned from libvirt, it will (eventually, in the
_free_instance() method) remove the PCI device assignment from the
Instance object. It makes no attempt, however, to assign a new PCI device
that meets the original PCI device specification in the launch request.
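To make the deferred policy concrete, here is a rough toy model of the behaviour described above. It is not the Nova code: ToyPciTracker, PciDevice, sync() and free_instance() are hypothetical names invented for illustration, and the addresses are made up. It only assumes what the comment says, namely that a vanished device still assigned to an instance is kept until the instance is freed, and that nothing is re-assigned afterwards.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set


@dataclass
class PciDevice:
    # Hypothetical, stripped-down device record (not Nova's PciDevice object).
    address: str
    instance_uuid: Optional[str] = None  # None means the device is unassigned


class ToyPciTracker:
    """Hypothetical, simplified stand-in for the PCI device tracker."""

    def __init__(self, devices: List[PciDevice]) -> None:
        self.devices: Dict[str, PciDevice] = {d.address: d for d in devices}
        # Addresses that vanished from the host while still assigned.
        self.stale: Set[str] = set()

    def sync(self, host_addresses: Set[str]) -> None:
        """Reconcile tracked devices with the addresses the host reports."""
        for address in list(self.devices):
            if address in host_addresses:
                continue
            if self.devices[address].instance_uuid is None:
                # Unassigned and gone from the host: drop it right away.
                del self.devices[address]
            else:
                # Assigned and gone: defer the change until the instance
                # is destroyed, as the comment in the tracker describes.
                self.stale.add(address)
        for address in host_addresses:
            # Start tracking any address we have not seen before.
            self.devices.setdefault(address, PciDevice(address))

    def free_instance(self, instance_uuid: str) -> None:
        """Release the instance's devices; stale ones are dropped for good."""
        for address, dev in list(self.devices.items()):
            if dev.instance_uuid != instance_uuid:
                continue
            dev.instance_uuid = None
            if address in self.stale:
                self.stale.discard(address)
                del self.devices[address]
            # Note: no replacement device is picked for the instance.


# Adding a card shifts the VF addresses from bus 04 to bus 05.
tracker = ToyPciTracker([
    PciDevice("0000:04:10.0", instance_uuid="guest-1"),
    PciDevice("0000:04:10.1"),
])
tracker.sync({"0000:05:10.0", "0000:05:10.1"})
stale_after_sync = set(tracker.stale)
tracker.free_instance("guest-1")
```

In this toy run the assigned device at the old address lingers as stale after sync() and only disappears in free_instance(), while the new addresses sit unassigned the whole time.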
Should we handle this case and attempt a "hot re-assignment of a PCI
device"? Perhaps. Is it high priority? Not really, IMHO.
If you'd like to file a bug against Nova, that would be cool, though.
Best,
-jay