<div dir="ltr">Roger that,<div><br></div><div>I may report this bug on the OpenStack Compute (Nova) Launchpad to see what happens.<br><br>Anyway, thanks for your help; I really appreciate it.<br><br><br>Eddie.</div></div><div class="gmail_extra"><br><div class="gmail_quote">2017-07-11 8:12 GMT+08:00 Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Unfortunately, Eddie, I'm not entirely sure what is going on with your situation. According to the code, the nonexistent PCI device should be removed from the pci_devices table when the PCI manager notices the PCI device is no longer on the local host...<span class=""><br>
<br>
On 07/09/2017 08:36 PM, Eddie Yen wrote:<br>
</span><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">
Hi there,<br>
<br>
Is this information sufficient, or do you need anything else?<br>
<br>
Thanks,<br>
Eddie.<br>
<br></span>
2017-07-07 10:49 GMT+08:00 Eddie Yen <<a href="mailto:missile0407@gmail.com" target="_blank">missile0407@gmail.com</a>>:<span class=""><br>
<br>
    Sorry,<br>
<br>
    Here is the updated nova-compute log after removing "1002:68c8" and<br>
    restarting nova-compute:<br>
    <a href="http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/</a><br>
<br>
    2017-07-07 10:37 GMT+08:00 Eddie Yen <<a href="mailto:missile0407@gmail.com" target="_blank">missile0407@gmail.com</a>>:</span><div><div class="h5"><br>
<br>
        Hi Jay,<br>
<br>
        Below are a few logs and some information you may want to check.<br>
<br>
<br>
<br>
        I wrote the GPU information into nova.conf like this:<br>
<br>
        pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]<br>
<br>
        pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8", "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]<br>
<br>
<br>
        Then I restarted the services.<br>
<br>
        nova-compute log after adding the new GPU device info to nova.conf<br>
        and restarting the service:<br>
        <a href="http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/</a><br>
<br>
        What's strange is that the log shows the resource tracker collecting<br>
        information only for the newly installed GPU, not the old one.<br>
<br>
<br>
        But if I perform some actions on the instance that contains the old<br>
        GPU, the tracker picks up both GPUs:<br>
        <a href="http://paste.openstack.org/show/614658/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/614658/</a><br>
<br>
        The Nova database shows correct information for both GPUs:<br>
        <a href="http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/</a><br>
<br>
<br>
<br>
        Next, I removed ID "1002:68c8" from nova.conf and from the compute<br>
        node, and restarted the services.<br>
<br>
        pci_passthrough_whitelist and pci_alias now contain only the<br>
        "10de:0ff3" GPU info:<br>
<br>
        pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }<br>
<br>
        pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }<br>
<br>
<br>
        The nova-compute log shows the resource tracker reporting that the<br>
        node has only the "10de:0ff3" PCI resource:<br>
        <a href="http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/</a><br>
<br>
        But in the Nova database, "1002:68c8" still exists and remains in<br>
        "Available" status, even though its "deleted" value is non-zero.<br>
        <a href="http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/" rel="noreferrer" target="_blank">http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/</a><br>
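A minimal sketch of one possible reading of the "deleted" observation, assuming the pci_devices table follows the common OpenStack soft-delete convention (deleted = 0 for live rows, non-zero for soft-deleted rows). The simplified schema, the PCI addresses, and the live_devices helper are hypothetical, and the example uses an in-memory SQLite database; it only illustrates the convention and does not establish what happened on this compute node.<br>

```python
import sqlite3

# Hypothetical, simplified pci_devices table. Assumption: soft-deleted rows
# keep their old status, and "deleted" holds a non-zero value (the row id).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE pci_devices ("
    " id INTEGER PRIMARY KEY, address TEXT, vendor_id TEXT,"
    " product_id TEXT, status TEXT, deleted INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO pci_devices (address, vendor_id, product_id, status)"
    " VALUES (?, ?, ?, ?)",
    [("0000:03:00.0", "10de", "0ff3", "available"),   # hypothetical addresses
     ("0000:04:00.0", "1002", "68c8", "available")],
)

# Soft delete the removed GPU: the row stays, status is untouched.
conn.execute("UPDATE pci_devices SET deleted = id WHERE product_id = '68c8'")


def live_devices(conn):
    """Return (vendor_id, product_id, status) for rows that are not soft-deleted."""
    return conn.execute(
        "SELECT vendor_id, product_id, status FROM pci_devices"
        " WHERE deleted = 0 ORDER BY id"
    ).fetchall()


# A raw query still shows both rows, including the "available" 68c8 row.
all_rows = conn.execute(
    "SELECT product_id, status, deleted FROM pci_devices ORDER BY id"
).fetchall()
print(all_rows)
# Filtering on deleted = 0 leaves only the 10de:0ff3 device.
print(live_devices(conn))
```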
<br>
<br>
        Many thanks,<br>
        Eddie.<br>
<br>
        2017-07-07 9:05 GMT+08:00 Eddie Yen <<a href="mailto:missile0407@gmail.com" target="_blank">missile0407@gmail.com</a>>:<br></div></div><span class=""><br>
<br>
            Uh wait,<br>
<br>
            Is it possible that it still shows as available because a PCI<br>
            device still exists at the same address?<br>
<br>
            When I removed the GPU card, I replaced it with an SFP+ network<br>
            card in the same slot, so when I run lspci, the SFP+ card shows<br>
            up at the same address.<br>
<br>
            But that still doesn't make sense, because these two cards<br>
            definitely do not have the same VID:PID, and I specified the<br>
            devices by VID:PID in nova.conf.<br>
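To make the VID:PID reasoning concrete, here is a hypothetical whitelist matcher in Python. It is not Nova's implementation: parse_whitelist and is_whitelisted are illustrative names, the PCI address is made up, and 8086:10fb merely stands in for the SFP+ card, whose real IDs are not given in this thread. It shows that a spec containing only vendor_id and product_id ignores the PCI address entirely.<br>

```python
import json

# Whitelist values copied from the nova.conf snippets in this thread.
BOTH_GPUS = ('[{ "product_id":"0ff3", "vendor_id":"10de"},'
             ' { "product_id":"68c8", "vendor_id":"1002"}]')
K420_ONLY = '{ "product_id":"0ff3", "vendor_id":"10de" }'


def parse_whitelist(raw):
    """Parse one whitelist value; accept a single JSON object or a list of them."""
    spec = json.loads(raw)
    return spec if isinstance(spec, list) else [spec]


def is_whitelisted(device, specs):
    """Hypothetical matcher: True if every key/value pair of at least one
    spec equals the corresponding field of the device."""
    return any(
        all(device.get(key) == value for key, value in spec.items())
        for spec in specs
    )


# Hypothetical devices that occupied the same slot (same PCI address) over time.
gpu = {"address": "0000:04:00.0", "vendor_id": "1002", "product_id": "68c8"}
sfp = {"address": "0000:04:00.0", "vendor_id": "8086", "product_id": "10fb"}

print(is_whitelisted(gpu, parse_whitelist(BOTH_GPUS)))  # True
print(is_whitelisted(gpu, parse_whitelist(K420_ONLY)))  # False
print(is_whitelisted(sfp, parse_whitelist(K420_ONLY)))  # False
```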
<br>
<br>
            I'll try to reproduce this issue and post the logs to this list.<br>
<br>
            Thanks,<br>
<br>
            2017-07-07 9:01 GMT+08:00 Jay Pipes <<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>>:</span><span class=""><br>
<br>
                Hmm, very odd indeed. Any way you can save the<br>
                nova-compute logs from when you removed the GPU and<br>
                restarted the nova-compute service and paste those logs<br></span>
                to <a href="http://paste.openstack.org" rel="noreferrer" target="_blank">paste.openstack.org</a>?<span class=""><br>
                Would be useful in tracking down this buggy behaviour...<br>
<br>
                Best,<br>
                -jay<br>
<br>
                On 07/06/2017 08:54 PM, Eddie Yen wrote:<br>
<br>
                    Hi Jay,<br>
<br>
                    The status of the "removed" GPU still shows as<br>
                    "Available" in the pci_devices table.<br>
<br>
                    2017-07-07 8:34 GMT+08:00 Jay Pipes<br>
                    <<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>>:<br></span><div><div class="h5"><br>
<br>
<br>
                         Hi again, Eddie :) Answer inline...<br>
<br>
                         On 07/06/2017 08:14 PM, Eddie Yen wrote:<br>
<br>
                             Hi everyone,<br>
<br>
                             I'm using OpenStack Mitaka (deployed from Fuel 9.2).<br>
<br>
                             Currently, I have two different GPU card models installed.<br>
<br>
                             I wrote their information into pci_alias and<br>
                             pci_passthrough_whitelist in nova.conf on the Controller and<br>
                             the Compute node (the node where the GPUs are installed), then<br>
                             restarted nova-api, nova-scheduler, and nova-compute.<br>
<br>
                             When I checked the database, both GPUs were registered in<br>
                             the pci_devices table.<br>
<br>
                             Then I removed one of the GPUs from the compute node, removed<br>
                             its information from nova.conf, and restarted the services.<br>
<br>
                             But when I checked the database again, the information for<br>
                             the removed card still existed in the pci_devices table.<br>
<br>
                             How can I fix this problem?<br>
<br>
<br>
                         So, when you removed the GPU from the compute node and restarted the<br>
                         nova-compute service, it *should* have noticed you had removed the<br>
                         GPU and marked that PCI device as deleted. At least, according to<br>
                         this code in the PCI manager:<br>
<br>
                    <a href="https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183" rel="noreferrer" target="_blank">https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183</a><br>
<br>
                         Question for you: what is the value of the status field in the<br>
                         pci_devices table for the GPU that you removed?<br>
<br>
                         Best,<br>
                         -jay<br>
<br>
                         p.s. If you really want to get rid of that device, simply remove<br>
                         that record from the pci_devices table. But, again, it *should* be<br>
                         removed automatically...<br>
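The behaviour Jay describes (a PCI device that disappears from the host gets marked as deleted) can be sketched as follows. This is a simplified, hypothetical model, not the code in nova/pci/manager.py: reconcile, the status strings, and the choice to keep allocated or claimed devices with a warning are assumptions made for illustration.<br>

```python
import logging

logging.basicConfig(level=logging.WARNING)
LOG = logging.getLogger("pci_sketch")

# Statuses for which a vanished device is kept rather than deleted (assumption).
IN_USE = {"allocated", "claimed"}


def reconcile(db_devices, host_addresses):
    """Hypothetical reconciliation of tracked vs. reported PCI devices.

    db_devices: dict mapping PCI address -> status, as stored in the DB.
    host_addresses: set of addresses the host reports now (already
        filtered through the whitelist).
    Returns a new dict; the input dict is left unchanged.
    """
    result = dict(db_devices)
    for address, status in db_devices.items():
        if address in host_addresses:
            continue  # still present, nothing to do
        if status in IN_USE:
            # An instance still holds it: keep the record and warn.
            LOG.warning("PCI device %s vanished while %s", address, status)
            continue
        result[address] = "deleted"
    # Devices the DB has never seen start out as available.
    for address in host_addresses - set(db_devices):
        result[address] = "available"
    return result


# Hypothetical addresses: the free device at 0000:04:00.0 is gone from the host.
before = {"0000:03:00.0": "allocated", "0000:04:00.0": "available"}
after = reconcile(before, {"0000:03:00.0"})
print(after)
```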
<br>
                         _______________________________________________<br>
                         Mailing list: <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack</a><br>
                         Post to     : <a href="mailto:openstack@lists.openstack.org" target="_blank">openstack@lists.openstack.org</a><br></div></div>
                         Unsubscribe : <a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack</a><br>
<br>
<br>
<br>
<br>
<br>
<br>
</blockquote>
</blockquote></div><br></div>