[Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf

Eddie Yen missile0407 at gmail.com
Tue Jul 11 00:17:50 UTC 2017


Roger that,

I am going to report this as a bug on the OpenStack Compute (Nova) Launchpad
to see what happens.

Anyway, thanks for your help; I really appreciate it.


Eddie.

2017-07-11 8:12 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:

> Unfortunately, Eddie, I'm not entirely sure what is going on with your
> situation. According to the code, the non-existing PCI device should be
> removed from the pci_devices table when the PCI manager notices the PCI
> device is no longer on the local host...
>
> On 07/09/2017 08:36 PM, Eddie Yen wrote:
>
>> Hi there,
>>
>> Is the information I provided enough, or do you need anything else?
>>
>> Thanks,
>> Eddie.
>>
>> 2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>>
>>     Sorry, here is the updated nova-compute log, taken after removing
>>     "1002:68c8" and restarting nova-compute:
>>     http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
>>
>>     2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>>
>>
>>         Hi Jay,
>>
>>         Below are a few logs and some information you may want to check.
>>
>>
>>
>>         I wrote the GPU information into nova.conf like this:
>>
>>         pci_passthrough_whitelist = [{ "product_id":"0ff3",
>>         "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]
>>
>>         pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",
>>         "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8",
>>         "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
>>
>>
>>         Then I restarted the services.
>>
>>         nova-compute log after adding the new GPU device info to nova.conf
>>         and restarting the service:
>>         http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>>
>>         The strange thing is that the log shows the resource tracker only
>>         collecting information for the newly added GPU, not the old one.
>>
>>
>>         But if I perform some action on the instance that uses the old GPU,
>>         the tracker picks up both GPUs.
>>         http://paste.openstack.org/show/614658/
>>
>>         The Nova database shows correct information for both GPUs:
>>         http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>>
>>
>>
>>         Next, I removed ID "1002:68c8" from nova.conf and the card from the
>>         compute node, and restarted the services.
>>
>>         pci_passthrough_whitelist and pci_alias now only keep the
>>         "10de:0ff3" GPU info.
>>
>>         pci_passthrough_whitelist = { "product_id":"0ff3",
>>         "vendor_id":"10de" }
>>
>>         pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
>>         "device_type":"type-PCI", "name":"k420" }
>>
>>
>>         The nova-compute log shows the resource tracker reporting that the
>>         node only has the "10de:0ff3" PCI resource:
>>         http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>>
>>         But in the Nova database, "1002:68c8" still exists and stays in
>>         "Available" status. Even the "deleted" value shows not zero.
>>         http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>>
>>
>>         Many thanks,
>>         Eddie.
>>
>>         2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>>
>>             Uh, wait,
>>
>>             Is it possible that it still shows as available because a PCI
>>             device still exists at the same address?
>>
>>             When I removed the GPU card, I replaced it with an SFP+ network
>>             card in the same slot, so lspci shows the SFP+ card at the same
>>             address.
>>
>>             But that still doesn't make sense, because the two cards
>>             definitely do not have the same VID:PID, and I specified the
>>             devices by VID:PID in nova.conf.
>>
>>
>>             I'll try to reproduce this issue and post a log to this list.
>>
>>             Thanks,
>>
>>             2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>>
>>                 Hmm, very odd indeed. Any way you can save the
>>                 nova-compute logs from when you removed the GPU and
>>                 restarted the nova-compute service and paste those logs
>>                 to paste.openstack.org?
>>                 Would be useful in tracking down this buggy behaviour...
>>
>>                 Best,
>>                 -jay
>>
>>                 On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>
>>                     Hi Jay,
>>
>>                     The status of the "removed" GPU still shows as
>>                     "Available" in pci_devices table.
>>
>>                     2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>>
>>
>>                          Hi again, Eddie :) Answer inline...
>>
>>                          On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>
>>                              Hi everyone,
>>
>>                              I'm using OpenStack Mitaka (deployed from Fuel 9.2).
>>
>>                              At present, I have two different models of GPU card
>>                              installed.
>>
>>                              I wrote their information into pci_alias and
>>                              pci_passthrough_whitelist in nova.conf on the
>>                              Controller and Compute nodes (Compute is the node
>>                              with the GPUs installed), then restarted nova-api,
>>                              nova-scheduler, and nova-compute.
>>
>>                              When I checked the database, both GPUs were
>>                              registered in the pci_devices table.
>>
>>                              Then I removed one of the GPUs from the compute
>>                              node, removed its information from nova.conf, and
>>                              restarted the services.
>>
>>                              But when I checked the database again, the
>>                              information for the removed card still existed in
>>                              the pci_devices table.
>>
>>                              How can I fix this problem?
>>
>>
>>                          So, when you removed the GPU from the compute node and
>>                          restarted the nova-compute service, it *should* have
>>                          noticed you had removed the GPU and marked that PCI
>>                          device as deleted. At least, according to this code in
>>                          the PCI manager:
>>
>>                          https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>
>>                          Question for you: what is the value of the status field
>>                          in the pci_devices table for the GPU that you removed?
>>
>>                          Best,
>>                          -jay
>>
>>                          p.s. If you really want to get rid of that device, simply
>>                          remove that record from the pci_devices table. But,
>>                          again, it *should* be removed automatically...
>>
>>                          _______________________________________________
>>                          Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>                          Post to     : openstack at lists.openstack.org
>>                          Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>
>>
>>
>>
>>
>>
>>
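
For reference, the behaviour expected in this thread (a device that the host
no longer reports should stop being listed as available) can be sketched
roughly as below. This is a simplified illustration in plain Python and not
Nova's actual code: the TrackedDevice class, the reconcile() function, the PCI
addresses, and the "removed" status string are all hypothetical, and matching
on address plus VID:PID is an assumption made only for this example.

```python
from dataclasses import dataclass


@dataclass
class TrackedDevice:
    """Hypothetical stand-in for one row of the pci_devices table."""
    address: str
    vendor_id: str
    product_id: str
    status: str = "available"


def reconcile(tracked, reported):
    """Flag tracked devices that the host no longer reports.

    `reported` is a list of dicts with address, vendor_id and product_id
    keys, standing in for what the hypervisor reports after whitelist
    filtering. Matching on all three fields is an assumption of this sketch.
    """
    present = {
        (d["address"], d["vendor_id"], d["product_id"]) for d in reported
    }
    for dev in tracked:
        key = (dev.address, dev.vendor_id, dev.product_id)
        if key not in present and dev.status == "available":
            # Only unallocated devices are flagged in this sketch.
            dev.status = "removed"
    return tracked


# Two GPUs were tracked; only the 10de:0ff3 card is still reported.
tracked = [
    TrackedDevice("0000:01:00.0", "10de", "0ff3"),  # hypothetical address
    TrackedDevice("0000:02:00.0", "1002", "68c8"),  # hypothetical address
]
reported = [
    {"address": "0000:01:00.0", "vendor_id": "10de", "product_id": "0ff3"},
]
statuses = {d.product_id: d.status for d in reconcile(tracked, reported)}
print(statuses)
```

Under these assumptions the "68c8" entry ends up flagged while "0ff3" stays
available, which is the outcome the thread expected to see in the database.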