On Tue, Jun 25, 2019 at 10:05 PM, Matt Riedemann <mriedemos@gmail.com> wrote:
There are still quite a few TODOs in the code [1][2][3] from a Kilo-era blueprint [4]. At this point I'm pretty sure you can't start up the nova-compute service without ending up with a ComputeNode record that has the host and hypervisor_hostname fields set (we don't set ComputeNode.service_id anywhere anymore as far as I can tell, except in some ComputeNode RPC compat code [5]).
I've stumbled across all of this code before, but was looking at it again today because I have a very simple change to make: going from a ComputeNode object to the related nova-compute Service object for that node.
Looking at the code one might think this is reasonable:
service = objects.Service.get_by_id(ctxt, compute_node.service_id)
But compute_node.service_id is likely None. Or how about:
service = objects.Service.get_by_compute_host(ctxt, compute_node.host)
But ComputeNode.host is also nullable (though likely should have a value as noted above).
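A minimal sketch of a lookup that guards against both nullable fields might look like the following. FakeComputeNode and the get_by_host / get_by_id callables are hypothetical stand-ins for the real ComputeNode object and Service lookups, so this is an illustration under those assumptions and not nova code.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional


@dataclass
class FakeComputeNode:
    """Hypothetical stand-in for a ComputeNode; both fields may be None."""
    host: Optional[str] = None
    service_id: Optional[int] = None


def find_service(
    node: FakeComputeNode,
    get_by_host: Callable[[str], Any],
    get_by_id: Callable[[int], Any],
) -> Any:
    """Return the service for a node, or None if neither field is set.

    Prefers the host field (expected to be set on current records) and
    only falls back to service_id for old-style records.
    """
    if node.host is not None:
        return get_by_host(node.host)
    if node.service_id is not None:
        return get_by_id(node.service_id)
    # Neither field is set: likely an orphaned record.
    return None
```

Passing the lookups in as callables keeps the sketch independent of any particular API; in real code they would be whatever Service getters are appropriate.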
This is a long way of saying that this code is all gross and we should clean it up. That means making sure all of this Kilo-era compat code for old records is no longer necessary, which in turn means all of those records should have been migrated by now. But how should we check?
I *think* this might be as simple as a new check in "nova-status upgrade check" that scans the cells looking for (non-deleted) compute_nodes records where host is NULL and reports an error if any are found. I believe the recovery action for an operator that hits this is to delete the busted compute_nodes record and restart the nova-compute service so a new compute node record is created. I would really think that anything this scan finds would be an orphaned compute_nodes record that could just be deleted, since another compute_nodes record probably already exists for the same hypervisor_hostname value. IOW, I don't think we need an online data migration routine for this.
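The scan described above could be sketched roughly as follows. This is only an illustration using in-memory SQLite databases as stand-ins for cell databases; the function names, the reduced table schema, and the assumption that a non-deleted row has deleted = 0 are all assumptions of this sketch, not the actual nova-status implementation.

```python
import sqlite3


def make_demo_cell(rows):
    """Build a throwaway in-memory 'cell' database (demo helper only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE compute_nodes ("
        "id INTEGER PRIMARY KEY, host TEXT, "
        "hypervisor_hostname TEXT, deleted INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO compute_nodes (host, hypervisor_hostname, deleted) "
        "VALUES (?, ?, ?)",
        rows,
    )
    return conn


def count_hostless_compute_nodes(conn):
    """Count non-deleted compute_nodes rows whose host is NULL."""
    row = conn.execute(
        "SELECT COUNT(*) FROM compute_nodes "
        "WHERE deleted = 0 AND host IS NULL"
    ).fetchone()
    return row[0]


def find_hostless(cells):
    """Map cell name -> count of offending rows, omitting clean cells."""
    results = {}
    for name, conn in cells.items():
        count = count_hostless_compute_nodes(conn)
        if count:
            results[name] = count
    return results


if __name__ == "__main__":
    cells = {
        # One good row, one hostless row, one soft-deleted hostless row.
        "cell1": make_demo_cell(
            [("cn1", "cn1", 0), (None, "cn1", 0), (None, "cn2", 5)]
        ),
        "cell2": make_demo_cell([("cn3", "cn3", 0)]),
    }
    offenders = find_hostless(cells)
    if offenders:
        print("FAILURE: compute_nodes records without a host:", offenders)
    else:
        print("SUCCESS: no compute_nodes records without a host")
```

An empty result would correspond to the check passing; any non-empty result is what the operator would fix by deleting the offending records and restarting nova-compute.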
Hopefully at least one person (Sylvain) can agree with me here and the plan of action I've put forth.
Your plan makes sense to me too.
gibi
[1] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8...
[2] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8...
[3] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8...
[4] https://blueprints.launchpad.net/nova/+spec/detach-service-from-computenode
[5] https://github.com/openstack/nova/blob/91647a9b711a8102c79bb17c6b4dff24ad6f8...