[Nova][Keystone] Instance boot fail if user has many roles
Hi Nova and Keystone folks, I'd like to get your feedback for the bug[1] and the commit[2]. We use Keystone as company internal role management system and hit the issue frequently. The commit is to change instance_system_metadata table schema, but it could be too big change only for the long role list. Another idea I can imagine is to introduce "store_boot_roles" boolean configuration to store the boot_roles or not. IIUC, the boot_roles is used only by a specific vendor data in the metadata api. If there is another good value to store from Keystone side, I'm happy to use the value instead of role name, too. 1. https://bugs.launchpad.net/nova/+bug/2075100 2. https://review.opendev.org/c/openstack/nova/+/925163 thank you, Masahito
On Tue, 2024-08-13 at 17:39 +0900, Masahito Muroi wrote:
Hi Nova and Keystone folks,
I'd like to get your feedback for the bug[1] and the commit[2]. We use Keystone as company internal role management system and hit the issue frequently.
The commit is to change instance_system_metadata table schema, but it could be too big change only for the long role list. changing the instnace_system_metadata schema form nvarchar 255 to text is not really an option.
it woudl be a very large data migration for exsting user as you would be moving the sting data form bign stored inline in the table to being stored out of line as seperate files on disk that woudl signifcalty impact the performance so that approch is not viable IMO
Another idea I can imagine is to introduce "store_boot_roles" boolean configuration to store the boot_roles or not. IIUC, the boot_roles is used only by a specific vendor data in the metadata api.
that is also not really a good option as this data is used form muplile location and reuqiring you sett the same config option on the compute conductor scheduler and api the same is very error prone.
If there is another good value to store from Keystone side, I'm happy to use the value instead of role name, too.
im not actully sure why we are storign the roles at all in the instance_system_metadata. the roles are not an atribute of the vm they are an atirbute of the user token used to create the vm as such i woudl argure we likely shoudl not be storing them at all since that data could be stale if we ever read it again later. since we cant rely on it to be current we shoudl not store it. however im not sure why we orginally stored this so woe woudl need to investiagte that.
1. https://bugs.launchpad.net/nova/+bug/2075100 2. https://review.opendev.org/c/openstack/nova/+/925163
thank you, Masahito
On Tue, 2024-08-13 at 10:47 +0100, smooney@redhat.com wrote:
On Tue, 2024-08-13 at 17:39 +0900, Masahito Muroi wrote:
Hi Nova and Keystone folks,
I'd like to get your feedback for the bug[1] and the commit[2]. We use Keystone as company internal role management system and hit the issue frequently.
The commit is to change instance_system_metadata table schema, but it could be too big change only for the long role list. changing the instnace_system_metadata schema form nvarchar 255 to text is not really an option.
it woudl be a very large data migration for exsting user as you would be moving the sting data form bign stored inline in the table to being stored out of line as seperate files on disk that woudl signifcalty impact the performance so that approch is not viable IMO
Another idea I can imagine is to introduce "store_boot_roles" boolean configuration to store the boot_roles or not. IIUC, the boot_roles is used only by a specific vendor data in the metadata api.
that is also not really a good option as this data is used form muplile location and reuqiring you sett the same config option on the compute conductor scheduler and api the same is very error prone.
If there is another good value to store from Keystone side, I'm happy to use the value instead of role name, too.
im not actully sure why we are storign the roles at all in the instance_system_metadata.
the roles are not an atribute of the vm they are an atirbute of the user token used to create the vm as such i woudl argure we likely shoudl not be storing them at all since that data could be stale if we ever read it again later. since we cant rely on it to be current we shoudl not store it.
however im not sure why we orginally stored this so woe woudl need to investiagte that. ah it was added for dynmic vendor metadata https://github.com/openstack/nova/commit/6d8b58dc6f1cbda8d664b3487674f870494... we cant just drop that without deprecation but there are a few things we can do.
first of all we can wrap https://github.com/openstack/nova/blob/7399728e8937c31b7e92ed8fa7f9ce860c9a3... in an if that check if dynmic vendor data is enabled by checking len(CONF.api.vendordata_dynamic_targets) > 1 https://docs.openstack.org/nova/latest/configuration/config.html#api.vendord... second instead of doing {'boot_roles': ','.join(context.roles)} we can add a boot_roles_count key and use that to add pagination by adding one boot_role_<index> key per role that will allow you to have arbitary number of roles with one key (boot_role_0) per role and its very easy to detect and fall back to the boot_roles key if boot_roles_count is not presnet. that minimise the upgrade impact we coudl also do an online data migration when the instace.system_metadata is next read or witten too. that will avoid the large database hit on upgrade by spreadign the migration across new usage of the upgraded code path.
1. https://bugs.launchpad.net/nova/+bug/2075100 2. https://review.opendev.org/c/openstack/nova/+/925163
thank you, Masahito
Hi thank you for the feedback.
first of all we can wrap https://github.com/openstack/nova/blob/7399728e8937c31b7e92ed8fa7f9ce860c9a3... in an if that check if dynmic vendor data is enabled by checking len(CONF.api.vendordata_dynamic_targets) > 1 https://docs.openstack.org/nova/latest/configuration/config.html#api.vendord...
second instead of doing {'boot_roles': ','.join(context.roles)}
we can add a boot_roles_count key and use that to add pagination by adding one boot_role_<index> key per role that will allow you to have arbitary number of roles with one key (boot_role_0) per role and its very easy to detect and fall back to the boot_roles key if boot_roles_count is not presnet.
That makes sense to me. Let me update the commit. thank you, Masahito -----Original Message----- From: <smooney@redhat.com> To: "Masahito Muroi"<masahito.muroi@lycorp.co.jp>; <openstack-discuss@lists.openstack.org>; Cc: Sent: 2024/08/13(火) 18:57 (GMT+09:00) Subject: Re: [Nova][Keystone] Instance boot fail if user has many roles On Tue, 2024-08-13 at 10:47 +0100, smooney@redhat.com wrote:
On Tue, 2024-08-13 at 17:39 +0900, Masahito Muroi wrote:
Hi Nova and Keystone folks,
I'd like to get your feedback for the bug[1] and the commit[2]. We use Keystone as company internal role management system and hit the issue frequently.
The commit is to change instance_system_metadata table schema, but it could be too big change only for the long role list. changing the instnace_system_metadata schema form nvarchar 255 to text is not really an option.
it woudl be a very large data migration for exsting user as you would be moving the sting data form bign stored inline in the table to being stored out of line as seperate files on disk that woudl signifcalty impact the performance so that approch is not viable IMO
Another idea I can imagine is to introduce "store_boot_roles" boolean configuration to store the boot_roles or not. IIUC, the boot_roles is used only by a specific vendor data in the metadata api.
that is also not really a good option as this data is used form muplile location and reuqiring you sett the same config option on the compute conductor scheduler and api the same is very error prone.
If there is another good value to store from Keystone side, I'm happy to use the value instead of role name, too.
im not actully sure why we are storign the roles at all in the instance_system_metadata.
the roles are not an atribute of the vm they are an atirbute of the user token used to create the vm as such i woudl argure we likely shoudl not be storing them at all since that data could be stale if we ever read it again later. since we cant rely on it to be current we shoudl not store it.
however im not sure why we orginally stored this so woe woudl need to investiagte that. ah it was added for dynmic vendor metadata https://github.com/openstack/nova/commit/6d8b58dc6f1cbda8d664b3487674f870494... we cant just drop that without deprecation but there are a few things we can do.
first of all we can wrap https://github.com/openstack/nova/blob/7399728e8937c31b7e92ed8fa7f9ce860c9a3... in an if that check if dynmic vendor data is enabled by checking len(CONF.api.vendordata_dynamic_targets) > 1 https://docs.openstack.org/nova/latest/configuration/config.html#api.vendord... second instead of doing {'boot_roles': ','.join(context.roles)} we can add a boot_roles_count key and use that to add pagination by adding one boot_role_<index> key per role that will allow you to have arbitary number of roles with one key (boot_role_0) per role and its very easy to detect and fall back to the boot_roles key if boot_roles_count is not presnet. that minimise the upgrade impact we coudl also do an online data migration when the instace.system_metadata is next read or witten too. that will avoid the large database hit on upgrade by spreadign the migration across new usage of the upgraded code path.
1. https://bugs.launchpad.net/nova/+bug/2075100 2. https://protect2.fireeye.com/v1/url?k=31323334-501d0a38-3133420b-454441504e31-45bfb03284bc255b&q=1&e=57e40200-28ef-405e-b7e7-e6df13c155e5&u=https%3A%2F%2Freview.opendev.org%2Fc%2Fopenstack%2Fnova%2F%2B%2F925163
thank you, Masahito
participants (2)
-
Masahito Muroi
-
smooney@redhat.com