<div dir="ltr">Roger that,<div><br></div><div>I may report this bug on the OpenStack Compute (Nova) Launchpad to see what happens.<br><br>Anyway, thanks for your help, I really appreciate it.<br><br><br>Eddie.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2017-07-11 8:12 GMT+08:00 Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Unfortunately, Eddie, I'm not entirely sure what is going on with your situation. According to the code, the non-existent PCI device should be removed from the pci_devices table when the PCI manager notices the PCI device is no longer on the local host...<span class=""><br>
<br>
On 07/09/2017 08:36 PM, Eddie Yen wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
Hi there,<br>
<br>
Is the information I sent enough, or do you need anything else?<br>
<br>
Thanks,<br>
Eddie.<br>
<br></span>
2017-07-07 10:49 GMT+08:00 Eddie Yen <<a href="mailto:missile0407@gmail.com" target="_blank">missile0407@gmail.com</a>>:<span class=""><br>
<br>
Sorry,<br>
<br>
Here is a fresh nova-compute log, taken after removing "1002:68c8" and restarting<br>
nova-compute.<br>
<a href="http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/" rel="noreferrer" target="_blank">http://paste.openstack.org/sho<wbr>w/qUCOX09jyeMydoYHc8Oz/</a><br>
<br>
2017-07-07 10:37 GMT+08:00 Eddie Yen <<a href="mailto:missile0407@gmail.com" target="_blank">missile0407@gmail.com</a>>:</span>
<div><div class="h5"><br>
<br>
Hi Jay,<br>
<br>
Below are a few logs and some information you may want to check.<br>
<br>
<br>
<br>
I wrote the GPU information into nova.conf like this:<br>
<br>
pci_passthrough_whitelist = [{ "product_id":"0ff3",<br>
"vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]<br>
<br>
pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",<br>
"device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8",<br>
"vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]<br>
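The VID:PID matching that these two options rely on can be sketched in Python as below. This is a simplified illustration, not Nova's whitelist code: the helper names (load_specs, matches, is_whitelisted, alias_names) are hypothetical, and a device is assumed to be a plain dict with vendor_id and product_id keys.<br>

```python
import json

# The option values from nova.conf above, copied as plain strings.
WHITELIST = ('[{ "product_id":"0ff3", "vendor_id":"10de"},'
             ' { "product_id":"68c8", "vendor_id":"1002"}]')
ALIAS = ('[{ "product_id":"0ff3", "vendor_id":"10de",'
         ' "device_type":"type-PCI", "name":"k420"},'
         ' { "product_id":"68c8", "vendor_id":"1002",'
         ' "device_type":"type-PCI", "name":"v4800"}]')


def load_specs(value):
    """Parse a JSON option value into a list of spec dicts.

    Accepts either a single object or a list of objects, since both
    forms are written in this thread (assumption for the sketch).
    """
    spec = json.loads(value)
    return [spec] if isinstance(spec, dict) else list(spec)


def matches(spec, device):
    """True if the device agrees with every ID key present in spec."""
    return all(device.get(key) == spec[key]
               for key in ("vendor_id", "product_id") if key in spec)


def is_whitelisted(device, whitelist=WHITELIST):
    """True if any whitelist entry matches the device's VID:PID."""
    return any(matches(spec, device) for spec in load_specs(whitelist))


def alias_names(device, alias=ALIAS):
    """Names of the pci_alias entries whose VID:PID match this device."""
    return [spec["name"] for spec in load_specs(alias) if matches(spec, device)]


if __name__ == "__main__":
    k420 = {"vendor_id": "10de", "product_id": "0ff3"}
    print(is_whitelisted(k420), alias_names(k420))  # True ['k420']
```

Under these assumptions, a card with any other VID:PID matches neither entry.<br>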
<br>
<br>
Then I restarted the services.<br>
<br>
nova-compute log after inserting the new GPU device info into nova.conf<br>
and restarting the service:<br>
<a href="http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/" rel="noreferrer" target="_blank">http://paste.openstack.org/sho<wbr>w/z015rYGXaxYhVoafKdbx/</a><br>
<br>
Strangely, the log shows that the resource tracker only collects<br>
information about the newly installed GPU, not the old one.<br>
<br>
<br>
But if I perform some actions on the instance that contains the old GPU, the<br>
tracker picks up both GPUs:<br>
<a href="http://paste.openstack.org/show/614658/" rel="noreferrer" target="_blank">http://paste.openstack.org/sho<wbr>w/614658/</a><br>
<br>
The Nova database shows correct information for both GPUs:<br>
<a href="http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/" rel="noreferrer" target="_blank">http://paste.openstack.org/sho<wbr>w/8JS0i6BMitjeBVRJTkRo/</a><br>
<br>
<br>
<br>
Next, I removed ID "1002:68c8" from nova.conf and from the compute node, and<br>
restarted the services.<br>
<br>
pci_passthrough_whitelist and pci_alias now keep only the<br>
"10de:0ff3" GPU info:<br>
<br>
pci_passthrough_whitelist = { "product_id":"0ff3",<br>
"vendor_id":"10de" }<br>
<br>
pci_alias = { "product_id":"0ff3", "vendor_id":"10de",<br>
"device_type":"type-PCI", "name":"k420" }<br>
<br>
<br>
The nova-compute log shows the resource tracker reporting that the node only has the<br>
"10de:0ff3" PCI resource:<br>
<a href="http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/" rel="noreferrer" target="_blank">http://paste.openstack.org/sho<wbr>w/VjLinsipne5nM8o0TYcJ/</a><br>
<br>
But in the Nova database, "1002:68c8" still exists and stays in<br>
"Available" status, even though its "deleted" value is non-zero:<br>
<a href="http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/" rel="noreferrer" target="_blank">http://paste.openstack.org/sho<wbr>w/SnJ8AzJYD6wCo7jslIc2/</a><br>
<br>
<br>
Many thanks,<br>
Eddie.<br>
<br>
2017-07-07 9:05 GMT+08:00 Eddie Yen <<a href="mailto:missile0407@gmail.com" target="_blank">missile0407@gmail.com</a>>:</div></div>
<span class=""><br>
<br>
Uh wait,<br>
<br>
Is it possible that it still shows as available because a PCI device<br>
still exists at the same address?<br>
<br>
When I removed the GPU card, I replaced it with an SFP+<br>
network card in the same slot,<br>
so when I run lspci, the SFP+ card shows up at the same address.<br>
<br>
But it still doesn't make any sense, because these two cards<br>
definitely do not have the same VID:PID,<br>
and I set the information as VID:PID in nova.conf.<br>
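One way to confirm which card sits at a given address is to read the IDs the Linux kernel exposes in sysfs, which are the same numeric [vendor:device] pairs that lspci -nn prints. A minimal Python sketch, assuming a Linux host with sysfs mounted at /sys; pci_ids and read_id are hypothetical helper names:<br>

```python
import os


def read_id(path):
    """Read a sysfs ID file such as '0x10de' and return '10de'."""
    with open(path) as f:
        value = f.read().strip().lower()
    return value[2:] if value.startswith("0x") else value


def pci_ids(root="/sys/bus/pci/devices"):
    """Map each PCI address under root to its 'vendor:device' string.

    Each entry under root is named after a PCI address and contains
    'vendor' and 'device' files holding hex IDs.
    """
    ids = {}
    for address in sorted(os.listdir(root)):
        dev_dir = os.path.join(root, address)
        vendor = read_id(os.path.join(dev_dir, "vendor"))
        device = read_id(os.path.join(dev_dir, "device"))
        ids[address] = vendor + ":" + device
    return ids
```

Running pci_ids() on the compute node before and after swapping the card should show the same address with a different VID:PID.<br>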
<br>
<br>
I'll try to reproduce this issue and post a log to this list.<br>
<br>
Thanks,<br>
<br>
2017-07-07 9:01 GMT+08:00 Jay Pipes <<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>>:</span>
<span class=""><br>
<br>
Hmm, very odd indeed. Any way you can save the<br>
nova-compute logs from when you removed the GPU and<br>
restarted the nova-compute service and paste those logs<br></span>
to <a href="http://paste.openstack.org" rel="noreferrer" target="_blank">paste.openstack.org</a>?<span class=""><br>
Would be useful in tracking down this buggy behaviour...<br>
<br>
Best,<br>
-jay<br>
<br>
On 07/06/2017 08:54 PM, Eddie Yen wrote:<br>
<br>
Hi Jay,<br>
<br>
The status of the "removed" GPU still shows as<br>
"Available" in the pci_devices table.<br>
<br>
2017-07-07 8:34 GMT+08:00 Jay Pipes <<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>>:</span><div><div class="h5"><br>
<br>
<br>
Hi again, Eddie :) Answer inline...<br>
<br>
On 07/06/2017 08:14 PM, Eddie Yen wrote:<br>
<br>
Hi everyone,<br>
<br>
I'm using OpenStack Mitaka<br>
(deployed from Fuel 9.2).<br>
<br>
At present, I have installed two different models<br>
of GPU card.<br>
<br>
I wrote this information into pci_alias and<br>
pci_passthrough_whitelist in nova.conf on the<br>
Controller and the Compute node<br>
(the node with the GPUs installed),<br>
then restarted nova-api, nova-scheduler, and<br>
nova-compute.<br>
<br>
When I checked the database, both GPUs were<br>
registered in the<br>
pci_devices table.<br>
<br>
Now I have removed one of the GPUs from the compute<br>
node, removed its<br>
information from nova.conf, and restarted the<br>
services.<br>
<br>
But when I check the database again, the information<br>
for the removed card<br>
still exists in the pci_devices table.<br>
<br>
How can I fix this problem?<br>
<br>
<br>
So, when you removed the GPU from the compute<br>
node and restarted the<br>
nova-compute service, it *should* have noticed<br>
you had removed the<br>
GPU and marked that PCI device as deleted. At<br>
least, according to<br>
this code in the PCI manager:<br>
<br>
<a href="https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183" rel="noreferrer" target="_blank">https://github.com/openstack/n<wbr>ova/blob/master/nova/pci/manag<wbr>er.py#L168-L183</a><br>
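The behaviour described above can be illustrated with a simplified Python sketch. It is not the code in nova/pci/manager.py: PciRecord, reconcile, keying records by PCI address, filtering reported devices through a VID:PID whitelist, and the rule that only free devices are marked removed are all assumptions made for illustration.<br>

```python
from dataclasses import dataclass


@dataclass
class PciRecord:
    """Hypothetical stand-in for one row of the pci_devices table."""
    address: str
    vendor_id: str
    product_id: str
    status: str = "available"


def reconcile(records, reported, whitelist):
    """Simplified reconciliation sketch (not Nova's code).

    records:   dict mapping a PCI address to the PciRecord already known
    reported:  list of dicts for the devices the host reports now
    whitelist: set of (vendor_id, product_id) pairs
    Returns the addresses whose records were marked as removed.
    """
    # Only whitelisted devices count, so a different card in the same
    # slot (different VID:PID) does not keep the old record alive.
    present = {
        dev["address"]: dev for dev in reported
        if (dev["vendor_id"], dev["product_id"]) in whitelist
    }
    removed = []
    for address, record in records.items():
        # In this sketch only free devices are marked removed.
        if address not in present and record.status == "available":
            record.status = "removed"
            removed.append(address)
    # Newly reported whitelisted devices get a fresh record.
    for address, dev in present.items():
        if address not in records:
            records[address] = PciRecord(
                address, dev["vendor_id"], dev["product_id"])
    return removed
```

For example, if the hypothetical address 0000:03:00.0 was recorded as 1002:68c8, the host now reports an 8086:10fb card there, and the whitelist contains only 10de:0ff3, reconcile() returns ["0000:03:00.0"] and that record's status becomes "removed".<br>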
<br>
Question for you: what is the value of the<br>
status field in the<br>
pci_devices table for the GPU that you removed?<br>
<br>
Best,<br>
-jay<br>
<br>
p.s. If you really want to get rid of that<br>
device, simply remove<br>
that record from the pci_devices table. But,<br>
again, it *should* be<br>
removed automatically...<br>
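If you do clean up by hand, a minimal sketch of the inspect-then-delete steps follows. It uses Python's sqlite3 module and a heavily reduced, hypothetical pci_devices schema only so that it runs standalone; against a real Nova database you would run equivalent SQL. Restricting the DELETE to rows whose deleted column is non-zero is a safety choice of this sketch, not something Nova requires.<br>

```python
import sqlite3


def find_rows(conn, vendor_id, product_id):
    """Return (id, address, status, deleted) rows for one VID:PID."""
    return conn.execute(
        "SELECT id, address, status, deleted FROM pci_devices"
        " WHERE vendor_id = ? AND product_id = ?",
        (vendor_id, product_id),
    ).fetchall()


def purge(conn, vendor_id, product_id):
    """Hard-delete soft-deleted rows for one VID:PID; return the count."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "DELETE FROM pci_devices"
            " WHERE vendor_id = ? AND product_id = ? AND deleted != 0",
            (vendor_id, product_id),
        )
    return cur.rowcount


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Hypothetical reduced schema; the real table has more columns.
    conn.execute("CREATE TABLE pci_devices (id INTEGER PRIMARY KEY,"
                 " address TEXT, vendor_id TEXT, product_id TEXT,"
                 " status TEXT, deleted INTEGER DEFAULT 0)")
    conn.execute("INSERT INTO pci_devices (address, vendor_id, product_id,"
                 " status, deleted) VALUES ('0000:03:00.0', '1002', '68c8',"
                 " 'available', 1)")
    print(find_rows(conn, "1002", "68c8"))
    print(purge(conn, "1002", "68c8"))  # 1
```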
<br>
_____________________________<wbr>__________________<br>
Mailing list:<br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k</a><br>
Post to : <a href="mailto:openstack@lists.openstack.org" target="_blank">openstack@lists.openstack.org</a><br>
</div></div><span class="">
Unsubscribe :<br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k</a><br>
</span>
<br>
<br>
<br>
<br>
<br>
<br>
</blockquote>
</blockquote></div><br></div>