[Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf
Moshe Levi
moshele at mellanox.com
Tue Jul 11 01:13:51 UTC 2017
Hi Eddie,
Looking at your nova database, the state after the delete looks correct to me.
| created_at          | updated_at          | deleted_at          | deleted | id |
| 2017-06-21 00:56:06 | 2017-07-07 02:27:16 | NULL                | 0       | 2  |
| 2017-07-07 01:42:48 | 2017-07-07 02:13:14 | 2017-07-07 02:13:42 | 9       | 9  |
Note that the second row has a deleted_at timestamp and a non-zero deleted value (set to the id of the row). Nova does a soft delete, which only marks the row as deleted; it does not actually remove it from the pci_devices table.
See [1] and [2]
There is a known bug with pci_devices in the scenario where an allocated PCI device can be deleted, e.g. when pci.passthrough_whitelist is changed; commit [3] tries to resolve it.
[1] - https://github.com/openstack/oslo.db/blob/master/oslo_db/sqlalchemy/models.py#L142-L150
[2] - https://github.com/openstack/nova/blob/master/nova/db/sqlalchemy/models.py#L1411
[3] - https://review.openstack.org/#/c/426243/
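For illustration, here is a minimal stand-alone sketch of the soft-delete convention described above (the real implementation is oslo.db's SoftDeleteMixin, see [1]); the class and field handling here are simplified stand-ins, not Nova code:

```python
from datetime import datetime

# Minimal sketch of the soft-delete pattern used by oslo.db's
# SoftDeleteMixin: rows are never removed, only marked as deleted.
class SoftDeleteRow:
    def __init__(self, id):
        self.id = id
        self.deleted = 0        # 0 means the row is "live"
        self.deleted_at = None

    def soft_delete(self):
        # Mark the row deleted: `deleted` is set to the row's id
        # (so a unique constraint involving `deleted` still allows
        # a fresh live row), and `deleted_at` records when.
        self.deleted = self.id
        self.deleted_at = datetime.utcnow()

row = SoftDeleteRow(id=9)
row.soft_delete()
assert row.deleted == 9 and row.deleted_at is not None
```

This matches the table above: the soft-deleted row with id 9 has deleted = 9 and a deleted_at timestamp, while the live row keeps deleted = 0.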
From: Eddie Yen [mailto:missile0407 at gmail.com]
Sent: Tuesday, July 11, 2017 3:18 AM
To: Jay Pipes <jaypipes at gmail.com>
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] [nova] Database not delete PCI info after device is removed from host and nova.conf
Roger that.
I'm going to report this bug on the OpenStack Compute (Nova) Launchpad to see what happens.
Anyway, thanks for your help; I really appreciate it.
Eddie.
2017-07-11 8:12 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
Unfortunately, Eddie, I'm not entirely sure what is going on in your situation. According to the code, the non-existent PCI device should be removed from the pci_devices table when the PCI manager notices the device is no longer on the local host...
On 07/09/2017 08:36 PM, Eddie Yen wrote:
Hi there,
Is the information already enough, or do you need additional items?
Thanks,
Eddie.
2017-07-07 10:49 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
Sorry,
here is the refreshed nova-compute log after removing "1002:68c8" and restarting nova-compute:
http://paste.openstack.org/show/qUCOX09jyeMydoYHc8Oz/
2017-07-07 10:37 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
Hi Jay,
Below are few logs and information you may want to check.
I wrote the GPU information into nova.conf like this:
pci_passthrough_whitelist = [{ "product_id":"0ff3", "vendor_id":"10de"}, { "product_id":"68c8", "vendor_id":"1002"}]
pci_alias = [{ "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420"}, { "product_id":"68c8", "vendor_id":"1002", "device_type":"type-PCI", "name":"v4800"}]
Then restart the services.
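As a side note, whitelist entries like these select devices by vendor_id/product_id, not by PCI address. A minimal sketch of that matching (the helper function and device dicts are hypothetical; Nova's real logic lives in nova/pci/devspec.py and handles more fields):

```python
import json

# The same whitelist value as in nova.conf above, parsed as JSON.
whitelist = json.loads(
    '[{ "product_id":"0ff3", "vendor_id":"10de"},'
    ' { "product_id":"68c8", "vendor_id":"1002"}]'
)

def is_whitelisted(dev, specs):
    # A device matches if any spec agrees on both vendor_id and product_id.
    return any(
        dev["vendor_id"] == s["vendor_id"] and dev["product_id"] == s["product_id"]
        for s in specs
    )

# Two hypothetical devices occupying the same slot at different times.
k420 = {"vendor_id": "10de", "product_id": "0ff3", "address": "0000:03:00.0"}
sfp  = {"vendor_id": "15b3", "product_id": "1013", "address": "0000:03:00.0"}

assert is_whitelisted(k420, whitelist)     # matched by VID:PID
assert not is_whitelisted(sfp, whitelist)  # same address, different VID:PID
```

So a replacement card in the same slot should not match an old whitelist entry unless it shares the same VID:PID.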
nova-compute log after inserting the new GPU device info into nova.conf and restarting the service:
http://paste.openstack.org/show/z015rYGXaxYhVoafKdbx/
Strangely, the log shows that the resource tracker only collects information for the newly added GPU, not the old one.
But if I perform some action on the instance that contains the old GPU, the tracker picks up both GPUs:
http://paste.openstack.org/show/614658/
The Nova database shows correct information for both GPUs:
http://paste.openstack.org/show/8JS0i6BMitjeBVRJTkRo/
Now I removed ID "1002:68c8" from nova.conf and from the compute node, and restarted the services.
The pci_passthrough_whitelist and pci_alias now only keep the "10de:0ff3" GPU info:
pci_passthrough_whitelist = { "product_id":"0ff3", "vendor_id":"10de" }
pci_alias = { "product_id":"0ff3", "vendor_id":"10de", "device_type":"type-PCI", "name":"k420" }
The nova-compute log shows the resource tracker reporting that the node only has the "10de:0ff3" PCI resource:
http://paste.openstack.org/show/VjLinsipne5nM8o0TYcJ/
But in the Nova database, "1002:68c8" still exists and stays in "Available" status, even though its "deleted" value is non-zero:
http://paste.openstack.org/show/SnJ8AzJYD6wCo7jslIc2/
Many thanks,
Eddie.
2017-07-07 9:05 GMT+08:00 Eddie Yen <missile0407 at gmail.com>:
Uh, wait.
Is it possible that it still shows available because a PCI device still exists at the same address?
When I removed the GPU card, I replaced it with an SFP+ network card in the same slot, so when I run lspci the SFP+ card sits at the same address.
But that still doesn't make sense, because the two cards definitely don't have the same VID:PID, and I set the information as VID:PID in nova.conf.
I'll try to reproduce this issue and post a log to this list.
Thanks,
2017-07-07 9:01 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
Hmm, very odd indeed. Any way you can save the nova-compute logs from when you removed the GPU and restarted the nova-compute service, and paste those logs to paste.openstack.org?
Would be useful in tracking down this buggy behaviour...
Best,
-jay
On 07/06/2017 08:54 PM, Eddie Yen wrote:
Hi Jay,
The status of the "removed" GPU still shows as "Available" in the pci_devices table.
2017-07-07 8:34 GMT+08:00 Jay Pipes <jaypipes at gmail.com>:
Hi again, Eddie :) Answer inline...
On 07/06/2017 08:14 PM, Eddie Yen wrote:
Hi everyone,
I'm using the OpenStack Mitaka version (deployed from Fuel 9.2).
At present, I have two different models of GPU card installed, and I wrote their information into pci_alias and pci_passthrough_whitelist in nova.conf on the Controller and the Compute node (the node with the GPUs installed).
Then I restarted nova-api, nova-scheduler, and nova-compute.
When I checked the database, both GPUs' info was registered in the pci_devices table.
Now I removed one of the GPUs from the compute node, removed its information from nova.conf, and restarted the services.
But when I check the database again, the information for the removed card still exists in the pci_devices table.
How can I fix this problem?
So, when you removed the GPU from the compute node and restarted the nova-compute service, it *should* have noticed you had removed the GPU and marked that PCI device as deleted. At least, according to this code in the PCI manager:
https://github.com/openstack/nova/blob/master/nova/pci/manager.py#L168-L183
Question for you: what is the value of the status field in the pci_devices table for the GPU that you removed?
Best,
-jay
p.s. If you really want to get rid of that device, simply remove that record from the pci_devices table. But, again, it *should* be removed automatically...
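If you do end up cleaning the record by hand, a soft delete following Nova's own convention is gentler than a hard DELETE. Below is only a sketch against an in-memory SQLite stand-in for the pci_devices table (a subset of hypothetical columns; real deployments use MySQL and you should back up the database first):

```python
import sqlite3
from datetime import datetime

# Stand-in for nova's pci_devices table (simplified subset of columns).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE pci_devices (
    id INTEGER PRIMARY KEY, address TEXT, vendor_id TEXT,
    product_id TEXT, status TEXT, deleted INTEGER, deleted_at TEXT)""")
conn.execute("INSERT INTO pci_devices VALUES "
             "(9, '0000:03:00.0', '1002', '68c8', 'available', 0, NULL)")

# Soft-delete the stale row the way Nova marks deletions: deleted = id,
# plus a deleted_at timestamp, only for rows that are still "live".
conn.execute(
    "UPDATE pci_devices SET deleted = id, deleted_at = ? "
    "WHERE vendor_id = '1002' AND product_id = '68c8' AND deleted = 0",
    (datetime.utcnow().isoformat(),))

deleted, = conn.execute(
    "SELECT deleted FROM pci_devices WHERE id = 9").fetchone()
assert deleted == 9  # row is now marked deleted, not removed
```

The `deleted = 0` guard keeps the UPDATE from touching rows that were already soft-deleted.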
_______________________________________________
Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
Post to     : openstack at lists.openstack.org
Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack