[Openstack] [nova] Database does not delete PCI info after device is removed from host and nova.conf
Eddie Yen
missile0407 at gmail.com
Mon Jul 10 00:36:42 UTC 2017
Hi there,
Is the information I provided enough, or do you need anything else?
Thanks,
Eddie.
2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
> Sorry,
>
> Here is the updated nova-compute log, taken after removing "1002:68c8"
> and restarting nova-compute.
> http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
>
> 2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>
>> Hi Jay,
>>
>> Below are a few logs and some information you may want to check.
>>
>>
>>
>> I wrote the GPU information into nova.conf like this:
>>
>> pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de"
>> }, { "product_id":"68c8", "vendor_id":"1002" }]
>>
>> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":
>> "type-PCI", "name":"k420" }, { "product_id":"68c8", "vendor_id":"1002",
>> "device_type":"type-PCI", "name":"v4800" }]
>>
>> Then I restarted the services.
>>
>> nova-compute log after adding the new GPU device info to nova.conf and
>> restarting the service:
>> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>>
>> Strangely, the log shows that the resource tracker only collects
>> information about the newly added GPU, not the old one.
>>
>>
>> But if I perform some action on the instance that uses the old GPU, the
>> tracker picks up both GPUs.
>> http://paste.openstack.org/show/614658/
>>
>> The Nova database shows correct information for both GPUs:
>> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>>
>>
>>
>> Next, I removed ID "1002:68c8" from nova.conf and the card from the
>> compute node, and restarted the services.
>>
>> The pci_passthrough_whitelist and pci_alias now only keep the "10de:0ff3"
>> GPU info.
>>
>> pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
>>
>> pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
>> "device_type":"type-PCI", "name":"k420" }
>>
>> The nova-compute log shows the resource tracker reporting that the node
>> only has the "10de:0ff3" PCI resource:
>> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>>
>> But in the Nova database, "1002:68c8" still exists and stays in
>> "Available" status, even though its "deleted" value is not zero.
>> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>>
>>
>> Many thanks,
>> Eddie.
>>
>> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>>
>>> Uh wait,
>>>
>>> Is it possible that it still shows as available because a PCI device
>>> still exists at the same address?
>>>
>>> When I removed the GPU card, I replaced it with an SFP+ network card
>>> in the same slot.
>>> So when I run lspci, the SFP+ card sits at the same address.
>>>
>>> But that still doesn't make sense, because these two cards definitely
>>> don't have the same VID:PID,
>>> and I identify the devices by VID:PID in nova.conf.
>>>
>>>
>>> I'll try to reproduce this issue and post a log to this list.
>>>
>>> Thanks,
>>>
>>> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>>>
>>>> Hmm, very odd indeed. Any way you can save the nova-compute logs from
>>>> when you removed the GPU and restarted the nova-compute service and paste
>>>> those logs to paste.openstack.org? Would be useful in tracking down
>>>> this buggy behaviour...
>>>>
>>>> Best,
>>>> -jay
>>>>
>>>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>>>
>>>>> Hi Jay,
>>>>>
>>>>> The status of the "removed" GPU still shows as "Available" in the
>>>>> pci_devices table.
>>>>>
>>>>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>>>>>
>>>>>
>>>>> Hi again, Eddie :) Answer inline...
>>>>>
>>>>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>>>>
>>>>> Hi everyone,
>>>>>
>>>>> I'm using the OpenStack Mitaka release (deployed from Fuel 9.2).
>>>>>
>>>>> At present, I have two different models of GPU card installed.
>>>>>
>>>>> I wrote their information into pci_alias and
>>>>> pci_passthrough_whitelist in nova.conf on the Controller and
>>>>> Compute (the node where the GPUs are installed).
>>>>> Then I restarted nova-api, nova-scheduler, and nova-compute.
>>>>>
>>>>> When I checked the database, both GPUs were registered in the
>>>>> pci_devices table.
>>>>>
>>>>> Then I removed one of the GPUs from the compute node, removed its
>>>>> information from nova.conf, and restarted the services.
>>>>>
>>>>> But when I checked the database again, the information for the
>>>>> removed card still existed in the pci_devices table.
>>>>>
>>>>> How can I fix this problem?
>>>>>
>>>>>
>>>>> So, when you removed the GPU from the compute node and restarted the
>>>>> nova-compute service, it *should* have noticed you had removed the
>>>>> GPU and marked that PCI device as deleted. At least, according to
>>>>> this code in the PCI manager:
>>>>>
>>>>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>>>>
>>>>> Question for you: what is the value of the status field in the
>>>>> pci_devices table for the GPU that you removed?
>>>>>
>>>>> Best,
>>>>> -jay
>>>>>
>>>>> p.s. If you really want to get rid of that device, simply remove
>>>>> that record from the pci_devices table. But, again, it *should* be
>>>>> removed automatically...
>>>>>
>>>
>>
>
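For reference, the behaviour Jay describes above (a tracked device that
the host no longer offers should be marked as removed) can be sketched
roughly like this. It is only a minimal illustration, not Nova's actual
code: PciRecord, reconcile, the PCI addresses and the 8086:10fb NIC ID are
hypothetical, and it assumes that the whitelist is applied to the devices
reported by the host before addresses are compared.

```python
from dataclasses import dataclass


@dataclass
class PciRecord:
    """Simplified stand-in for one tracked PCI device (hypothetical)."""
    address: str
    vendor_id: str
    product_id: str
    status: str = "available"


def reconcile(tracked, reported, whitelist):
    """Mark tracked devices that the host no longer offers as "removed".

    A reported device only counts if its (vendor_id, product_id) pair is
    whitelisted. Devices that are still claimed by or allocated to an
    instance are left untouched and returned so the caller can warn.
    """
    assignable = {
        dev.address
        for dev in reported
        if (dev.vendor_id, dev.product_id) in whitelist
    }
    still_in_use = []
    for dev in tracked:
        if dev.address in assignable:
            continue
        if dev.status in ("claimed", "allocated"):
            still_in_use.append(dev)
        else:
            dev.status = "removed"
    return still_in_use


# Scenario from this thread: the 1002:68c8 GPU was swapped for a network
# card in the same slot, and only 10de:0ff3 remains whitelisted.
whitelist = {("10de", "0ff3")}
tracked = [
    PciRecord("0000:03:00.0", "10de", "0ff3"),
    PciRecord("0000:04:00.0", "1002", "68c8"),
]
reported = [
    PciRecord("0000:03:00.0", "10de", "0ff3"),
    PciRecord("0000:04:00.0", "8086", "10fb"),  # hypothetical SFP+ NIC
]
still_in_use = reconcile(tracked, reported, whitelist)
print([dev.status for dev in tracked])
```

Under that assumption, a different card sitting at the same address does
not keep the old record alive, because the new card never passes the
VID:PID whitelist.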