Open Stack

Wed Jun 26 07:51:00 UTC 2019

On Tue, Jun 25, 2019 at 10:05 PM, Matt Riedemann <mriedemos at gmail.com> 
wrote:
> There are still quite a few TODOs in the code [1][2][3] from a Kilo-era 
> blueprint [4]. At this point I'm pretty sure you can't start up the 
> nova-compute service without having a ComputeNode record with the host 
> and hypervisor_hostname fields set (we don't set the 
> ComputeNode.service_id anywhere anymore as far as I can tell, except 
> in some ComputeNode RPC compat code [5]).
> 
> I've stumbled across all of this code before, but was looking at it 
> again today because I have a very simple change I need to make which 
> is going from a ComputeNode object and getting the related 
> nova-compute Service object for that node.
> 
> Looking at the code one might think this is reasonable:
> 
> service = objects.Service.get_by_id(ctxt, compute_node.service_id)
> 
> But compute_node.service_id is likely None. Or how about:
> 
> service = objects.Service.get_by_compute_host(ctxt, compute_node.host)
> 
> But ComputeNode.host is also nullable (though likely should have a 
> value as noted above).
> 
> This is a long way of me saying this code is all gross and we should 
> clean it up, which means making sure all of this Kilo-era compat code 
> for old records is no longer necessary. That in turn means all of those 
> records should have been migrated by now, but how should we check?
> 
> I *think* this might just be as simple as a "nova-status upgrade 
> check" check which scans the cells looking for (non-deleted) 
> compute_nodes records where host is NULL and report an error if any 
> are found. I believe the recovery action for an operator that hits 
> this is to delete the busted compute_nodes record and restart the 
> nova-compute service so a new compute node record is created. I would 
> really think that anything this scan would find would be orphaned 
> compute_nodes records that could just be deleted since another 
> compute_nodes record probably already exists for the same 
> hypervisor_hostname value. IOW, I don't think we need an online data 
> migration routine for this.
> 
> Hopefully at least one person (Sylvain) can agree with me here and 
> the plan of action I've put forth.

Your plan makes sense to me too.

gibi

> 
> [1] 
> https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/db/sqlalchemy/models.py#L123
> [2] 
> https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L150
> [3] 
> https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L263
> [4] 
> https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode
> [5] 
> https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8f58/nova/objects/compute_node.py#L118
> 
> --
> 
> Thanks,
> 
> Matt
> 

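The "nova-status upgrade check" scan proposed in the quoted message can be sketched as follows. This is a minimal illustration, not nova's implementation: it assumes a hypothetical, heavily simplified compute_nodes table held in one in-memory SQLite database per cell, and it assumes soft-deleted rows carry a non-zero `deleted` value. The names `check_compute_node_hosts`, `CheckResult` and `_make_cell` are made up for this example.

```python
import sqlite3
from dataclasses import dataclass, field

# Hypothetical, simplified stand-in for a cell's compute_nodes table;
# the real nova schema has many more columns.
SCHEMA = """
CREATE TABLE compute_nodes (
    id INTEGER PRIMARY KEY,
    host TEXT,                          -- nullable, as in the Kilo-era schema
    hypervisor_hostname TEXT,
    deleted INTEGER NOT NULL DEFAULT 0  -- assumed: 0 means not deleted
)
"""


@dataclass
class CheckResult:
    ok: bool
    details: list = field(default_factory=list)


def check_compute_node_hosts(cells):
    """Scan every cell for non-deleted compute_nodes rows with a NULL host.

    `cells` maps a cell name to an open sqlite3 connection (hypothetical).
    Any hit makes the check fail.
    """
    details = []
    for cell_name, conn in cells.items():
        rows = conn.execute(
            "SELECT id, hypervisor_hostname FROM compute_nodes "
            "WHERE deleted = 0 AND host IS NULL"
        ).fetchall()
        for node_id, hv_hostname in rows:
            # Does a healthy record already exist for the same hypervisor?
            # If so, the NULL-host row is probably an orphan safe to delete.
            (replacements,) = conn.execute(
                "SELECT COUNT(*) FROM compute_nodes "
                "WHERE deleted = 0 AND host IS NOT NULL "
                "AND hypervisor_hostname = ?",
                (hv_hostname,),
            ).fetchone()
            hint = "likely orphaned" if replacements else "no replacement record"
            details.append(
                f"cell {cell_name}: compute_nodes.id={node_id} "
                f"({hv_hostname}) has NULL host, {hint}"
            )
    return CheckResult(ok=not details, details=details)


def _make_cell(rows):
    # Build a throwaway in-memory "cell database" for the demo.
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany(
        "INSERT INTO compute_nodes (host, hypervisor_hostname, deleted) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return conn


cell1 = _make_cell([
    ("cmp1", "cmp1.example", 0),  # healthy record
    (None, "cmp1.example", 0),    # NULL host, same hypervisor -> orphan
    (None, "old.example", 3),     # soft-deleted, ignored by the scan
])
cell2 = _make_cell([("cmp2", "cmp2.example", 0)])

result = check_compute_node_hosts({"cell1": cell1, "cell2": cell2})
print(result.ok)  # False
for line in result.details:
    print(line)
```

The check only reports; it does not delete anything, matching the suggestion that the operator remove the bad compute_nodes record and restart nova-compute. Flagging whether a healthy record already exists for the same hypervisor_hostname is meant to support the expectation that anything found is an orphan.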

[nova] Can we drop the kilo era ComputeNode host/service_id compat code now?
