[Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf
Eddie Yen
missile0407 at gmail.com
Fri Jul 7 02:49:48 UTC 2017
Sorry,
Here is the updated nova-compute log, taken after removing "1002:68c8" and
restarting nova-compute.
http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
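
A minimal sketch of the check described in the quoted message below: list the
rows in pci_devices whose vendor/product ID is no longer whitelisted, together
with their status and deleted values. It uses an in-memory SQLite table as a
stand-in for the real Nova database. The column subset, the PCI addresses, and
the function name find_unlisted_devices are assumptions for illustration, not
Nova code.

```python
import sqlite3

# Assumed subset of the pci_devices columns; the real table has more.
SCHEMA = """
CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY,
    deleted INTEGER NOT NULL DEFAULT 0,
    compute_node_id INTEGER NOT NULL,
    address TEXT NOT NULL,
    vendor_id TEXT NOT NULL,
    product_id TEXT NOT NULL,
    status TEXT NOT NULL
)
"""


def find_unlisted_devices(conn, whitelisted):
    """Return rows whose (vendor_id, product_id) is not in `whitelisted`.

    Each row is (id, address, vendor_id, product_id, status, deleted), so
    the status and deleted columns can be inspected side by side.
    """
    rows = conn.execute(
        "SELECT id, address, vendor_id, product_id, status, deleted "
        "FROM pci_devices ORDER BY id"
    ).fetchall()
    return [row for row in rows if (row[2], row[3]) not in whitelisted]


def demo():
    """Build a toy table with both GPUs and check it against the new whitelist."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO pci_devices "
        "(id, deleted, compute_node_id, address, vendor_id, product_id, status) "
        "VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            # Addresses are made up for the example.
            (1, 0, 1, "0000:02:00.0", "10de", "0ff3", "available"),
            (2, 0, 1, "0000:03:00.0", "1002", "68c8", "available"),
        ],
    )
    # Only the "10de:0ff3" GPU remains in the whitelist.
    return find_unlisted_devices(conn, {("10de", "0ff3")})


if __name__ == "__main__":
    for row in demo():
        print(row)
```

Against a real deployment the same SELECT would be run on the Nova database
instead of the toy table; the Python filtering step stays the same.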
2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
> Hi Jay,
>
> Below are a few logs and some information you may want to check.
>
>
>
> I wrote the GPU information into nova.conf like this.
>
> pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de" },
> { "product_id":"68c8", "vendor_id":"1002" }]
>
> pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":
> "type-PCI", "name":"k420" }, { "product_id":"68c8", "vendor_id":"1002",
> "device_type":"type-PCI", "name":"v4800" }]
>
> Then I restarted the services.
>
> nova-compute log after adding the new GPU device info to nova.conf and
> restarting the service:
> http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
>
> The strange thing is that the log shows the resource tracker collecting
> information only for the newly added GPU, not the old one.
>
>
> But if I perform some action on the instance that uses the old GPU, the
> tracker picks up both GPUs.
> http://paste.openstack.org/show/614658/
>
> The Nova database shows correct information for both GPUs:
> http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
>
>
>
> Next, I removed the "1002:68c8" entry from nova.conf, removed the card from
> the compute node, and restarted the services.
>
> pci_passthrough_whitelist and pci_alias now keep only the "10de:0ff3" GPU info.
>
> pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
>
> pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":
> "type-PCI", "name":"k420" }
>
> The nova-compute log shows the resource tracker reporting that the node has
> only the "10de:0ff3" PCI resource:
> http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
>
> But in the Nova database, "1002:68c8" still exists and stays in "Available"
> status. Even the "deleted" value shows as not zero.
> http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
>
>
> Many thanks,
> Eddie.
>
> 2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
>
>> Uh wait,
>>
>> Is it possible that it still shows as available because a PCI device still
>> exists at the same address?
>>
>> When I removed the GPU card, I replaced it with an SFP+ network card
>> in the same slot.
>> So when I run lspci, the SFP+ card shows up at the same address.
>>
>> But that still doesn't make sense, because these two cards definitely
>> do not have the same VID:PID.
>> And I specified the devices by VID:PID in nova.conf.
>>
>>
>> I'll try to reproduce this issue and post a log to this list.
>>
>> Thanks,
>>
>> 2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>>
>>> Hmm, very odd indeed. Any way you can save the nova-compute logs from
>>> when you removed the GPU and restarted the nova-compute service and paste
>>> those logs to paste.openstack.org? Would be useful in tracking down
>>> this buggy behaviour...
>>>
>>> Best,
>>> -jay
>>>
>>> On 07/06/2017 08:54 PM, Eddie Yen wrote:
>>>
>>>> Hi Jay,
>>>>
>>>> The status of the "removed" GPU still shows as "Available" in the
>>>> pci_devices table.
>>>>
>>>> 2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
>>>>
>>>>
>>>> Hi again, Eddie :) Answer inline...
>>>>
>>>> On 07/06/2017 08:14 PM, Eddie Yen wrote:
>>>>
>>>> Hi everyone,
>>>>
>>>> I'm using OpenStack Mitaka (deployed from Fuel 9.2).
>>>>
>>>> At present, I have two different models of GPU card installed.
>>>>
>>>> I wrote their information into pci_alias and
>>>> pci_passthrough_whitelist in nova.conf on the Controller and Compute
>>>> nodes (the Compute node is the one with the GPUs installed).
>>>> Then I restarted nova-api, nova-scheduler, and nova-compute.
>>>>
>>>> When I checked the database, both GPUs were registered in the
>>>> pci_devices table.
>>>>
>>>> Now I have removed one of the GPUs from the compute node, removed its
>>>> information from nova.conf, and restarted the services.
>>>>
>>>> But when I check the database again, the information for the removed card
>>>> still exists in the pci_devices table.
>>>>
>>>> What can I do to fix this problem?
>>>>
>>>>
>>>> So, when you removed the GPU from the compute node and restarted the
>>>> nova-compute service, it *should* have noticed you had removed the
>>>> GPU and marked that PCI device as deleted. At least, according to
>>>> this code in the PCI manager:
>>>>
>>>> https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
>>>>
>>>> Question for you: what is the value of the status field in the
>>>> pci_devices table for the GPU that you removed?
>>>>
>>>> Best,
>>>> -jay
>>>>
>>>> p.s. If you really want to get rid of that device, simply remove
>>>> that record from the pci_devices table. But, again, it *should* be
>>>> removed automatically...
>>>>
>>
>
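
As a rough illustration of why a replacement card at the same PCI address
should not match the old whitelist entry, the sketch below matches devices by
vendor and product ID only. It accepts both forms of the whitelist value quoted
above (a JSON list and a single JSON object). This is a simplified matcher
written for illustration, not Nova's own whitelist code, and the SFP+ card IDs
and the PCI addresses are hypothetical.

```python
import json


def parse_specs(raw):
    """Parse a whitelist value that is either a JSON list or one JSON object."""
    value = json.loads(raw)
    return value if isinstance(value, list) else [value]


def is_whitelisted(device, specs):
    """Return True if the device matches every key of at least one spec."""
    return any(
        all(device.get(key) == expected for key, expected in spec.items())
        for spec in specs
    )


# The two whitelist forms from the thread.
BOTH_GPUS = (
    '[{ "product_id":"0ff3", "vendor_id":"10de" },'
    ' { "product_id":"68c8", "vendor_id":"1002" }]'
)
ONE_GPU = '{ "product_id":"0ff3", "vendor_id":"10de" }'

# Hypothetical devices: the remaining GPU, and an SFP+ card that took over
# the removed GPU's slot (IDs and addresses invented for the example).
K420 = {"vendor_id": "10de", "product_id": "0ff3", "address": "0000:02:00.0"}
SFP_CARD = {"vendor_id": "8086", "product_id": "10fb", "address": "0000:03:00.0"}

if __name__ == "__main__":
    specs = parse_specs(ONE_GPU)
    for device in (K420, SFP_CARD):
        print(device["address"], is_whitelisted(device, specs))
```

Because the specs contain only vendor_id and product_id, the address never
enters the comparison, which is the behaviour the VID:PID entries in nova.conf
are expected to give.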