[nova] compute's reaction to finding unmanaged VMs
Hi Nova Team,

How does Nova relate to having unmanaged libvirt VMs alongside VMs managed by nova-compute? Is this considered supported, unsupported, or somewhere in between? I know about the warning in [1] and the compute startup time exception in [2].

Colleagues working on a major OpenStack version uplift in a downstream OpenStack distro started encountering the above exception. This distro utilizes some non-Nova-managed VMs (in particular, to orchestrate the OpenStack deployment itself). To reduce the deployment footprint, these VMs are co-located with Nova-managed VMs.

AFAIU, this exception was introduced by the stable-compute-uuid blueprint [3]. With that in mind, I found this behavior:

* Before the blueprint was implemented, Nova only logged a warning if it found an unmanaged VM.
* After stable-compute-uuid, the compute service refuses to start in the same case.
* If I make sure that the stable UUID is already present, then it seems to fall back to the earlier warning-only behavior.

If I let nova-compute create the stable UUID itself, this leads to somewhat surprising behavior:

* If nova-compute finds an unmanaged VM during its first startup, it refuses to start.
* If it finds an unmanaged VM only during its second (or subsequent) startup (when the stable UUID has already been generated), it starts and logs a warning only.

This made me think that the distro should pre-generate the compute UUID. Then I found in the stable-compute-uuid blueprint (and in the docs [4]) that this approach is clearly supported (a minimal sketch of what I mean by pre-generating is at the end of this mail).

It seems the compute service does not look at the metadata present in the libvirt domain definitions, and I started wondering why. AFAICT, the exception introduced by the stable-compute-uuid blueprint only really applies if Nova finds VMs that contain Nova metadata but are unknown to the compute service starting up.

And this line of thought led me to my original question and these:

* Could/should nova-compute inspect the metadata of libvirt VMs at startup and act differently if an unexpected VM has (or doesn't have) Nova metadata? Would this be considered a meaningful Nova feature? Or should the distro just pre-generate the stable UUID and ignore the warning (as it did before the blueprint)?
* Does Nova currently not look at the libvirt domain definition metadata simply because nobody has implemented this yet? Or is there another reason?
* Is it considered supported (and to what level) to have non-Nova-managed VMs co-located with Nova-managed VMs?

Thanks in advance,
Bence Romsics (rubasov on irc)

[1] https://github.com/openstack/nova/blob/54b65d5bf2b23fa8a4612fd3adddc8751192a...
[2] https://github.com/openstack/nova/blob/54b65d5bf2b23fa8a4612fd3adddc8751192a...
[3] https://specs.openstack.org/openstack/nova-specs/specs/2023.1/implemented/st...
[4] https://docs.openstack.org/nova/latest/admin/compute-node-identification.htm...
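P.S. A minimal sketch of what I mean by pre-generating the compute UUID, run once before the first start of nova-compute. It assumes the default state_path of /var/lib/nova and the compute_id file name described in [4]; both are assumptions on my side, so please double-check them against your deployment.

    # Hypothetical pre-provisioning step for the stable compute UUID.
    # Path and file name assume the defaults described in [4].
    import uuid
    from pathlib import Path

    compute_id_file = Path("/var/lib/nova/compute_id")  # assumed default location
    if not compute_id_file.exists():
        compute_id_file.write_text(str(uuid.uuid4()))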
How does Nova relate to having unmanaged libvirt VMs alongside VMs managed by nova-compute? Is this considered supported, unsupported, or somewhere in between?
Unrelated to stable-compute-uuid, we do not support this at all, in any way. Nova expects (and has always expected) to be the only thing managing the VMs on a libvirt instance, full stop. --Dan
On 04/08/2025 15:14, Dan Smith wrote:
How does Nova relate to having unmanaged libvirt VMs alongside VMs managed by nova-compute? Is this considered supported, unsupported, or somewhere in between?

Unrelated to stable-compute-uuid, we do not support this at all, in any way. Nova expects (and has always expected) to be the only thing managing the VMs on a libvirt instance, full stop.
There is one untested caveat to that. If you use vcpu_pin_set, cpu_dedicated_set and/or cpu_shared_set to define which cores are available to Nova, and you adjust the host reserved RAM/disk/hugepage values, there was the capability to run additional host-level VMs on the compute nodes for things like a vRouter for networking or other infra-level use cases. I don't really know of anyone who has actually done that since circa 2015. This type of deployment was most common in installers that used "seed VMs" to do the deployment, which could be shut down once the cloud was deployed and you were not performing day-2 operations like update/upgrade. In general you should run those VMs on a separate host that is not a Nova compute node, such as the controller hosts.

The other common example was providing infra-level VNFs like routing, load balancing, VPNs or firewalling as VMs on the computes, which are then consumed by OpenStack itself. Again, ideally you would not run those VMs on the compute nodes unless you can run them as Nova instances; they should be moved to dedicated networker nodes if possible. Where the logical network switch for the OpenStack VMs runs in a separate VM, as in the early days of vRouter (there was one network backend that used a VM for the vswitch, but I don't recall exactly which), it is not possible to move that VM to a separate host, but that type of integration is not really supported upstream by the Nova project.

With that context in mind, Dan is absolutely right that for Nova-provisioned VMs, nothing other than Nova is allowed to interact with them. We do not document or test this colocation use case, even though very old installers sometimes did it, because it is not generally a use case we want to support in Nova. The capability exists, but any issues encountered by using this partitioning approach are not upstream Nova bugs. So strictly speaking it has never been officially supported; it has not been stated as unsupported in the docs because there were existing deployments in production that made it work, but you are going outside the scope of upstream-supported use cases if you attempt it.
On Mon, Aug 4, 2025 at 4:15 PM Dan Smith <dms@danplanet.com> wrote:
How does Nova relate to having unmanaged libvirt VMs alongside VMs managed by nova-compute? Is this considered supported, unsupported, or somewhere in between?
Unrelated to stable-compute-uuid, we do not support this at all, in any way. Nova expects (and has always expected) to be the only thing managing the VMs on a libvirt instance, full stop.
I think this is an example of something Nova never intended to support, the colocation of VMs, but also never prevented. And therefore users out there started to rely on this capability. Now we have started breaking that capability with the compute_uuid change. But we don't necessarily need to break it with that change. We could have the compute_uuid change verify that no Nova VMs are running on the host (and therefore prevent a host rename) while still keeping the old behavior of allowing non-Nova VMs to run on the host. We would simply need to explicitly check whether the VMs reported by libvirt have Nova metadata or not. I'm not sure what we would lose with this simple change. (A minimal sketch of the check is below.)

gibi
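A minimal sketch of the check I have in mind, using the libvirt Python bindings. The Nova metadata namespace URI is an assumption based on what the libvirt driver writes today, so treat it as illustrative only, not as the actual implementation.

    # Sketch: treat a libvirt domain as Nova-managed only if it carries a
    # metadata element in the Nova XML namespace; everything else is left alone.
    import libvirt

    NOVA_NS = "http://openstack.org/xmlns/libvirt/nova/1.0"  # assumed namespace URI

    def has_nova_metadata(dom):
        try:
            dom.metadata(libvirt.VIR_DOMAIN_METADATA_ELEMENT, NOVA_NS)
            return True
        except libvirt.libvirtError:
            # No metadata element in the Nova namespace: not a Nova-managed VM.
            return False

    conn = libvirt.open("qemu:///system")
    unmanaged = [d.name() for d in conn.listAllDomains() if not has_nova_metadata(d)]
    print(unmanaged)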
I think this is an example of nova never intended to support the collocation of VMs but never prevented either.
Never prevented it for sure, but we have told people time and again not to do it on principle. I believe it has always been the design that Nova assumes full control of the node and previous attempts to avoid things like reaping unknown VMs as deleted were purely to prevent accidental data loss.
And therefore users out there started to rely on this capability. Now we started breaking that capability with the compute_uuid change. But we don't necessarily need to break it with that change. We could have both the compute_uuid change verifying that no nova VMs are running on the host and therefore preventing host rename, and still keeping the old behavior of allowing non nova VMs to run on the host. We would simply need to explicitly check the VMs reported by libvirt if they have nova metadata or not. I'm not sure what we would lose with this simple change.
It's certainly true that we could do this, but IMHO it would also open a can of worms in the form of blessing the idea that we allow other VMs alongside our instances. If that regresses in the future, we'll be expected to fix it. If other requests to change behavior for non-Nova instances come along, the same assumption-to-fix may be made. I'm not -2 but my preference would be to not open ourselves to that future. --Dan
On 14/08/2025 14:34, Dan Smith wrote:
I think this is an example of nova never intended to support the collocation of VMs but never prevented either.

Never prevented it for sure, but we have told people time and again not to do it on principle. I believe it has always been the design that Nova assumes full control of the node and previous attempts to avoid things like reaping unknown VMs as deleted were purely to prevent accidental data loss.
It's a little more subtle than that. When I first started contributing to Nova I had a very similar conversation with many different cores about this. You are correct that we have always discouraged this type of colocation. It was, however, considered "allowed"; I don't want to say supported, since we never tested it, but things like Contrail vRouter I believe were first released as VNFs that ran in a VM to provide networking to the other Nova VMs. This was in the Havana/Icehouse time frame, so I have lost all memory of the details. You are very much correct that this only really came up after that, in the context of the periodic task that reaps running instances we did not believe should be on the host. We had extensive conversations about whether checking the libvirt metadata for the Nova XML namespace was enough to determine whether a VM was Nova-managed. The upshot of that was yes in theory, but no, we did not want to commit to that.
And therefore users out there started to rely on this capability. Now we started breaking that capability with the compute_uuid change. But we don't necessarily need to break it with that change. We could have both the compute_uuid change verifying that no nova VMs are running on the host and therefore preventing host rename, and still keeping the old behavior of allowing non nova VMs to run on the host. We would simply need to explicitly check the VMs reported by libvirt if they have nova metadata or not. I'm not sure what we would lose with this simple change.

It's certainly true that we could do this, but IMHO it would also open a can of worms in the form of blessing the idea that we allow other VMs alongside our instances. If that regresses in the future, we'll be expected to fix it. If other requests to change behavior for non-Nova instances come along, the same assumption-to-fix may be made.
The rules for doing this in the past were always effectively the following. You must adjust https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res... https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res... https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res... and https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res... to account for the usage of the other VMs. In general you should avoid using https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.res... and prefer https://docs.openstack.org/nova/latest/configuration/config.html#DEFAULT.vcp... or the newer https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu... and https://docs.openstack.org/nova/latest/configuration/config.html#compute.cpu... to define which cores Nova can use for its VMs. You must never use cores listed in the *_set config options for non-Nova-managed VMs. You may not use any device allowed by https://docs.openstack.org/nova/latest/configuration/config.html#pci.device_... for a non-Nova-managed VM; the same applies to the generic mdev feature and to device passthrough in general, e.g. pmem. In other words, all devices managed by Nova should be exclusively managed by it.

We have always said that while this should be possible to do, it is not officially supported by the upstream project, and we would not build out new features to make it more supported in the future. All of the above has only ever been true for libvirt and Xen as far as I am aware, although I have only ever seen this done with libvirt.
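To make that concrete, the partitioning described above would look roughly like this in nova.conf. The values are made up purely for illustration; only the option names come from the links above, and the reserved amounts have to match whatever the non-Nova VMs actually consume on your host.

    [DEFAULT]
    # reserve what the non-Nova VMs consume so the scheduler never hands it out
    reserved_host_memory_mb = 8192
    reserved_host_disk_mb = 51200

    [compute]
    # cores Nova may use for its guests; everything outside these sets is left
    # for the host OS and the non-Nova VMs and must never be reused by Nova
    cpu_dedicated_set = 4-15
    cpu_shared_set = 2-3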
I'm not -2 but my preference would be to not open ourselves to that future.
I'm not opposed to strengthening our statement about this to say it is explicitly not supported and you do so at your own risk. That is effectively the sentiment of what has always been communicated in this regard. The last time I had more than a throwaway conversation about this was with cdent at the most recent Denver or Vancouver PTG, in the context of supporting Kubernetes and/or Zun on the same compute node as Nova: effectively, would it be possible to deploy both at the same time and partition the host resources so that some could be used for containers, or something like KubeVirt (I don't think that existed then) via Kubernetes or Zun, and the rest could be used for Nova. We discussed these config options in some detail, but I think the overall recommendation was that it's better to just have a pool of bare-metal hosts in Ironic and use some for k8s and some for Nova as needed, i.e. partition at the host level rather than within the host. There was some interest in this general idea from folks who wanted to run OpenStack on Kubernetes as well. I do tend to agree that while it is possible to do, it was never really intended to be something we would develop further in the future.
participants (4)
- Balazs Gibizer
- Bence Romsics
- Dan Smith
- Sean Mooney