[Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf

Moshe Levi moshele at mellanox.com
Tue Jul 11 01:13:51 UTC 2017


Hi Eddie,


Looking at your nova database after the delete, it looks correct to me.

| created_at          | updated_at          | deleted_at          | deleted | id |
| 2017-06-21 00:56:06 | 2017-07-07 02:27:16 | NULL                |       0 |  2 |
| 2017-07-07 01:42:48 | 2017-07-07 02:13:14 | 2017-07-07 02:13:42 |       9 |  9 |
Note that the second row has a deleted_at timestamp and a non-zero deleted value (the id of the row). Nova does a soft delete, which just marks the row as deleted but does not actually remove it from the pci_devices table.
See [1] and [2]
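The soft-delete pattern can be sketched with a small standalone example (a minimal illustration using Python's sqlite3, not nova code; the column names follow the pci_devices rows above, everything else is made up):

```python
import sqlite3

# Minimal sketch of the soft-delete pattern: a "deleted" row stays in the
# table, with deleted set to the row's id and deleted_at filled in.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY,
    status TEXT,
    deleted INTEGER NOT NULL DEFAULT 0,
    deleted_at TEXT)""")
conn.execute("INSERT INTO pci_devices (id, status) VALUES (2, 'available')")
conn.execute("INSERT INTO pci_devices (id, status) VALUES (9, 'available')")

def soft_delete(conn, row_id):
    # Mark the row deleted instead of removing it (deleted = id, as oslo.db does).
    conn.execute(
        "UPDATE pci_devices SET deleted = id, deleted_at = datetime('now') "
        "WHERE id = ?", (row_id,))

soft_delete(conn, 9)

# Ordinary queries filter on deleted = 0, so only row 2 is still "live",
# while row 9 remains physically present in the table.
live = [r[0] for r in conn.execute(
    "SELECT id FROM pci_devices WHERE deleted = 0")]
total = conn.execute("SELECT COUNT(*) FROM pci_devices").fetchone()[0]
print(live, total)  # [2] 2
```

So a non-zero deleted value plus a deleted_at timestamp is exactly what a successful delete looks like in that table.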

There is a bug with pci_devices in a scenario where an allocated PCI device can be deleted, e.g. if pci.passthrough_whitelist is changed; commit [3] tries to resolve it.


[1] - https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/models.py#L142-L150
[2] - https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/models.py#L1411
[3] - https://review.openstack.org/#/c/426243/

From: Eddie Yen [mailto:missile0407 at gmail.com]
Sent: Tuesday, July 11, 2017 3:18 AM
To: Jay Pipes <jaypipes at gmail.com>
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf

Roger that,

I'm going to report this bug on the OpenStack Compute (Nova) Launchpad to see what happens.

Anyway, thanks for your help, really appreciated.


Eddie.

2017-07-11 8:12 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
Unfortunately, Eddie, I'm not entirely sure what is going on with your situation. According to the code, the non-existing PCI device should be removed from the pci_devices table when the PCI manager notices the PCI device is no longer on the local host...

On 07/09/2017 08:36 PM, Eddie Yen wrote:
Hi there,

Is the information enough already, or do you need additional items?

Thanks,
Eddie.

2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:

    Sorry,

    Here is the renewed nova-compute log after removing "1002:68c8" and
    restarting nova-compute:
    http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/

    2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:


        Hi Jay,

        Below are few logs and information you may want to check.



        I wrote the GPU information into nova.conf like this:

        pci_passthrough_whitelist = [{ "product_id":"0ff3",
        "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]

        pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de",
        "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8",
        "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]


        Then restart the services.

        nova-compute log when inserting the new GPU device info into
        nova.conf and restarting the service:
        http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/

        The strange thing is, the log shows that the resource tracker only
        collects information about the newly installed GPU, not the old one.


        But if I do some actions on the instance containing the old GPU,
        the tracker will see both GPUs:
        http://paste.openstack.org/show/614658/

        The Nova database shows correct information for both GPUs:
        http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/



        Then I removed ID "1002:68c8" from nova.conf and from the compute
        node, and restarted the services.

        The pci_passthrough_whitelist and pci_alias now keep only the
        "10de:0ff3" GPU info:

        pci_passthrough_whitelist = { "product_id":"0ff3",
        "vendor_id":"10de" }

        pci_alias = { "product_id":"0ff3", "vendor_id":"10de",
        "device_type":"type-PCI", "name":"k420" }


        The nova-compute log shows the resource tracker reporting that the
        node only has the "10de:0ff3" PCI resource:
        http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/

        But in the Nova database, "1002:68c8" still exists and stays in
        "Available" status, even though the "deleted" value is non-zero:
        http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/


        Many thanks,
        Eddie.

        2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:

            Uh wait,

            Is it possible that it still shows available if a PCI device
            still exists at the same address?

            Because when I removed the GPU card, I replaced it with an SFP+
            network card in the same slot, so when I type lspci the SFP+
            card stays at the same address.

            But it still doesn't make sense, because these two cards
            definitely do not have the same VID:PID, and I set the
            information as VID:PID in nova.conf.
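Eddie's point here can be illustrated with a toy sketch (this is not nova's actual whitelist-matching code; the SFP+ card's vendor/product IDs below are hypothetical, chosen only to differ from the GPU's):

```python
# Toy illustration: whitelist matching is by vendor/product ID, not by PCI
# address, so a different card in the same slot should not match an old entry.
whitelist = [{"vendor_id": "10de", "product_id": "0ff3"}]  # only the K420 remains

old_gpu  = {"address": "0000:05:00.0", "vendor_id": "1002", "product_id": "68c8"}
sfp_card = {"address": "0000:05:00.0", "vendor_id": "8086", "product_id": "10fb"}  # hypothetical IDs

def matches_whitelist(dev, whitelist):
    # A device matches if some entry's vendor_id and product_id both match.
    return any(dev["vendor_id"] == e["vendor_id"] and
               dev["product_id"] == e["product_id"]
               for e in whitelist)

print(matches_whitelist(old_gpu, whitelist))   # False
print(matches_whitelist(sfp_card, whitelist))  # False: same slot, different VID:PID
```

Neither the removed GPU nor its replacement matches the remaining whitelist entry, which is why the lingering "Available" row is surprising.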


            I'll try to reproduce this issue and post a log to this list.

            Thanks,

            2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:

                Hmm, very odd indeed. Any way you can save the
                nova-compute logs from when you removed the GPU and
                restarted the nova-compute service and paste those logs
                to paste.openstack.org?
                Would be useful in tracking down this buggy behaviour...

                Best,
                -jay

                On 07/06/2017 08:54 PM, Eddie Yen wrote:

                    Hi Jay,

                    The status of the "removed" GPU still shows as
                    "Available" in the pci_devices table.

                    2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:


                         Hi again, Eddie :) Answer inline...

                         On 07/06/2017 08:14 PM, Eddie Yen wrote:

                             Hi everyone,

                             I'm using OpenStack Mitaka version
                    (deployed from Fuel 9.2)

                             At present, I have installed two different
                     models of GPU cards.

                             And wrote this information into pci_alias and
                             pci_passthrough_whitelist in nova.conf on the
                     Controller and Compute
                              (the node with the GPUs installed).
                              Then restarted nova-api, nova-scheduler, and
                     nova-compute.

                             When I checked the database, both GPUs were
                     registered in the
                              pci_devices table.

                             Now I removed one of the GPUs from the compute
                     node, removed its
                              information from nova.conf, then restarted
                     the services.

                             But when I check the database again, the
                     information for the removed card
                              still exists in the pci_devices table.

                             What can I do to fix this problem?


                         So, when you removed the GPU from the compute
                    node and restarted the
                         nova-compute service, it *should* have noticed
                    you had removed the
                         GPU and marked that PCI device as deleted. At
                    least, according to
                         this code in the PCI manager:

                     https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
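The idea behind that sync step could be paraphrased roughly like this (an illustrative sketch with made-up names and simplified statuses, not the actual manager.py code):

```python
def sync_pci_devices(tracked, present_addresses):
    """Illustrative sketch: forget tracked devices that are no longer
    present on the host, unless an instance still holds them."""
    kept = []
    for dev in tracked:
        if dev["address"] in present_addresses:
            kept.append(dev)  # still physically present on the host
        elif dev["status"] in ("claimed", "allocated"):
            kept.append(dev)  # in use by an instance; cannot safely drop
        # otherwise: device vanished and is free -> drop it
        # (in nova this corresponds to soft-deleting the DB row)
    return kept

tracked = [
    {"address": "0000:05:00.0", "status": "available"},   # removed GPU
    {"address": "0000:06:00.0", "status": "allocated"},   # still in use
]
result = sync_pci_devices(tracked, {"0000:06:00.0"})
print([d["address"] for d in result])  # ['0000:06:00.0']
```

Under this logic, a free device that disappears from the host should be dropped on the next sync, which is why the lingering row is unexpected.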

                         Question for you: what is the value of the
                    status field in the
                         pci_devices table for the GPU that you removed?

                         Best,
                         -jay

                         p.s. If you really want to get rid of that
                    device, simply remove
                         that record from the pci_devices table. But,
                    again, it *should* be
                         removed automatically...

                         _______________________________________________
                         Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
                         Post to     : openstack at lists.openstack.org
                         Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack






-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack/attachments/20170711/ad713e80/attachment.html>


More information about the Openstack mailing list