[Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf
Jay Pipes
jaypipes at gmail.com
Tue Jul 11 00:12:54 UTC 2017
Unfortunately, Eddie, I'm not entirely sure what is going on in your
situation. According to the code, a PCI device that no longer exists should be
removed from the pci_devices table once the PCI manager notices that the
device is no longer present on the local host...
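For readers following the thread, the reconciliation described above can be illustrated with a minimal Python sketch. It is a simplified, hypothetical model, not nova's actual PCI manager code. It assumes that tracked devices are keyed by PCI address and that the device list reported by the host is first filtered through the VID:PID whitelist (pci_passthrough_whitelist). The names TrackedDevice, filter_whitelisted and reconcile, the PCI addresses, and the SFP+ card's IDs (8086:10fb) are all made up for illustration.

```python
"""Hypothetical sketch of address-based PCI device reconciliation.

Simplified illustration only, NOT nova's actual code. Assumes:
  * tracked devices are keyed by PCI address, and
  * the host-reported device list is filtered through the VID:PID
    whitelist before comparison.
"""
from dataclasses import dataclass


@dataclass
class TrackedDevice:
    address: str        # PCI address, e.g. "0000:04:00.0" (hypothetical)
    vendor_id: str
    product_id: str
    status: str = "available"


def filter_whitelisted(reported, whitelist):
    """Keep only reported devices whose (vendor_id, product_id) is whitelisted."""
    allowed = {(w["vendor_id"], w["product_id"]) for w in whitelist}
    return [d for d in reported if (d["vendor_id"], d["product_id"]) in allowed]


def reconcile(tracked, reported, whitelist):
    """Mark free tracked devices as removed when their address is no longer
    among the whitelisted devices reported by the host.

    Returns the list of devices that were marked removed.
    """
    assignable = filter_whitelisted(reported, whitelist)
    new_addrs = {d["address"] for d in assignable}
    removed = []
    for dev in tracked:
        # Only free devices are dropped; devices still allocated to an
        # instance would need special handling, which this sketch omits.
        if dev.address not in new_addrs and dev.status == "available":
            dev.status = "removed"
            removed.append(dev)
    return removed


if __name__ == "__main__":
    tracked = [
        TrackedDevice("0000:03:00.0", "10de", "0ff3"),  # K420, kept
        TrackedDevice("0000:04:00.0", "1002", "68c8"),  # V4800, pulled out
    ]
    # An SFP+ card (illustrative IDs) now sits in the old GPU's slot.
    reported = [
        {"address": "0000:03:00.0", "vendor_id": "10de", "product_id": "0ff3"},
        {"address": "0000:04:00.0", "vendor_id": "8086", "product_id": "10fb"},
    ]
    whitelist = [{"vendor_id": "10de", "product_id": "0ff3"}]
    removed = reconcile(tracked, reported, whitelist)
    print([d.address for d in removed])  # ['0000:04:00.0']
```

In this toy run, the record at 0000:04:00.0 is dropped even though lspci would still show a card in that slot: the replacement card's VID:PID is not whitelisted, so its address never reaches the comparison.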
On 07/09/2017 08:36 PM, Eddie Yen wrote:
> Hi there,
>
> Is this information enough, or do you need anything else?
>
> Thanks,
> Eddie.
>
> 2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>
> Sorry,
>
> Here is an updated nova-compute log, captured after removing
> "1002:68c8" and restarting nova-compute:
> http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
>
> 2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>
> Hi Jay,
>
> Below are a few logs and some information you may want to check.
>
>
>
> I wrote the GPU information into nova.conf like this:
>
> pci_passthrough_whitelist = [{ "product_id":"0ff3",
> "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]
>
> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",
> "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8",
> "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
>
>
> Then I restarted the services.
>
> nova-compute log after inserting the new GPU device info into
> nova.conf and restarting the service:
> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>
> The strange thing is that the log shows the resource tracker only
> collecting information about the newly installed GPU, not the old one.
>
>
> But if I perform some actions on the instance that holds the old
> GPU, the tracker picks up both GPUs:
> http://paste.openstack.org/show/614658/
>
> The Nova database shows correct information for both GPUs:
> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>
>
>
> Next, I removed ID "1002:68c8" from nova.conf and from the compute
> node, and restarted the services.
>
> pci_passthrough_whitelist and pci_alias now contain only the
> "10de:0ff3" GPU info:
>
> pci_passthrough_whitelist = { "product_id":"0ff3",
> "vendor_id":"10de" }
>
> pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
> "device_type":"type-PCI", "name":"k420" }
>
>
> The nova-compute log shows the resource tracker reporting that the
> node only has the "10de:0ff3" PCI resource:
> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>
> But in the Nova database, "1002:68c8" still exists and stays in
> "Available" status, even though its "deleted" value is non-zero:
> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>
>
> Many thanks,
> Eddie.
>
> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>
> Uh wait,
>
> Is it possible that it still shows as available because a PCI
> device still exists at the same address?
>
> When I removed the GPU card, I replaced it with an SFP+
> network card in the same slot, so lspci shows the SFP+ card
> at the same address.
>
> But that still doesn't make sense, because these two cards
> definitely do not have the same VID:PID, and I specified the
> devices by VID:PID in nova.conf.
>
>
> I'll try to reproduce this issue and post the logs to this list.
>
> Thanks,
>
> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>
> Hmm, very odd indeed. Any way you can save the
> nova-compute logs from when you removed the GPU and
> restarted the nova-compute service and paste those logs
> to paste.openstack.org?
> Would be useful in tracking down this buggy behaviour...
>
> Best,
> -jay
>
> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>
> Hi Jay,
>
> The status of the "removed" GPU still shows as
> "Available" in the pci_devices table.
>
> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>
>
> Hi again, Eddie :) Answer inline...
>
> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>
> Hi everyone,
>
> I'm using OpenStack Mitaka version
> (deployed from Fuel 9.2)
>
> Currently, I have two different GPU card
> models installed.
>
> I wrote their information into pci_alias and
> pci_passthrough_whitelist in nova.conf on the
> Controller and on the Compute node
> (the node where the GPUs are installed),
> then restarted nova-api, nova-scheduler, and
> nova-compute.
>
> When I checked the database, both GPUs were
> registered in the
> pci_devices table.
>
> Then I removed one of the GPUs from the compute
> node, removed its
> information from nova.conf, and restarted the
> services.
>
> But when I checked the database again, the information
> for the removed card
> still existed in the pci_devices table.
>
> How can I fix this problem?
>
>
> So, when you removed the GPU from the compute
> node and restarted the
> nova-compute service, it *should* have noticed
> you had removed the
> GPU and marked that PCI device as deleted. At
> least, according to
> this code in the PCI manager:
>
> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>
> Question for you: what is the value of the
> status field in the
> pci_devices table for the GPU that you removed?
>
> Best,
> -jay
>
> p.s. If you really want to get rid of that
> device, simply remove
> that record from the pci_devices table. But,
> again, it *should* be
> removed automatically...
>
> _______________________________________________
> Mailing list:
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
> Post to : openstack at lists.openstack.org
> Unsubscribe :
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack