[nova][libvirt][qemu] Windows 11 Nested Virtualization Boot Loop on OpenStack (Emerald Rapids Hosts)
Hi community,

We would like to report an issue we are encountering with nested virtualization on Windows 11 instances running on OpenStack.

On our OpenStack platform, a Windows 11 VM becomes unable to boot after enabling Virtual Machine Platform/Hyper-V inside the VM and performing a hard reboot (the VM does not show a BSOD, but it fails to boot into the OS and gets stuck in a boot loop at the TianoCore logo). This issue does not occur on Windows 10 under the same conditions.

System:
* Compute node CPU: Intel Xeon Gold 6538Y+ (Emerald Rapids) with nested virtualization enabled
* OpenStack 2024.2
* libvirt 8.0.0
* QEMU API 8.0.0
* QEMU hypervisor 6.2.0

Initially, our Nova CPU configuration was:
* cpu_mode=host-model
* cpu_model_extra_flags=+vmx,-hypervisor,-xsaves

According to virsh dumpxml, host-model maps to Icelake-Server. We tested several cpu_mode=custom configurations and observed the following:
* cpu_models=Icelake-Server-noTSX → error (boot loop)
* cpu_models=Icelake-Server → error (boot loop)
* cpu_models=Broadwell-noTSX-IBRS → works
* cpu_models=Cascadelake-Server-noTSX → works

The working models correspond to the preferred CPU models recommended by QEMU: https://qemu-project.gitlab.io/qemu/system/qemu-cpu-models.html#preferred-cp...

We would like to ask:
1. Has anyone encountered this issue with Windows 11 + nested virtualization on Icelake/Emerald Rapids hosts?
2. Are there known root causes explaining why newer CPU models fail while preferred (older) models work?
3. Is there a recommended fix other than switching to a preferred older CPU model?

Best regards,
Hai Pham
Is this bug <https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2106812> related to your report?
-- Thu.
I think my previous message didn’t explain the actual problem clearly enough, so it might have caused some confusion.

The bug you mentioned doesn’t seem related to what we’re seeing. In our case, the issue happens specifically when Nova’s cpu_models is set to Icelake variants (Icelake-Server-noTSX or Icelake-Server). With those models, Windows 11 VMs go into a boot loop after enabling nested virtualization features like Hyper-V or WSL. Windows 10 works fine, and Windows 11 also works fine if we use older CPU models like Broadwell or Cascadelake.

The part about our compute nodes using Emerald Rapids CPUs was just for full system context; it’s not the root cause.

Update: I suspect this might be a QEMU-side issue, so I’ve also opened a report on the QEMU GitLab. If anyone wants to follow it: https://gitlab.com/qemu-project/qemu/-/issues/3215
Still Hai Pham Thanh here. I’m replying from a different account because my company email couldn’t log in to respond.

I think my previous message didn’t explain the actual problem clearly enough, so it might have caused some confusion.

The bug you mentioned doesn’t seem related to what we’re seeing. In our case, the issue happens specifically when Nova’s cpu_models is set to Icelake variants (Icelake-Server-noTSX or Icelake-Server). With those models, Windows 11 VMs go into a boot loop after enabling nested virtualization features like Hyper-V or WSL. Windows 10 works fine, and Windows 11 also works fine if we use older CPU models like Broadwell or Cascadelake.

The part about our compute nodes using Emerald Rapids CPUs was just to give full context; it’s not the cause.

Update: I suspect this might be a QEMU-side issue, so I’ve also opened a report on the QEMU GitLab. If anyone wants to follow it: https://gitlab.com/qemu-project/qemu/-/issues/3215
Windows 11 is a complicated case, because it requires a TPM device (a feature partly supported by OpenStack) and secure boot (requires Barbican).

Thu.
On 21/11/2025 04:23, Hoai-Thu Vuong wrote:
Is this bug <https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2106812> related to your report?

Good question.

I have also seen this issue without OpenStack, with plain libvirt VMs and nested virt. The only Windows thing in my house is my "gaming PC", which is a QEMU VM on Pop!_OS 22.04 LTS with a passthrough GPU. I see the same issue if I try to enable Docker/Hyper-V/WSL in Windows 11 in that environment. The host has an Intel(R) Core(TM) i7-14700K, but the host detects it as Broadwell-noTSX-IBRS when I run `virsh capabilities`, which sounds very similar to https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2106812
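To check which model a host is detected as, the `<host><cpu>` element of the `virsh capabilities` output can be read programmatically. The following is a minimal sketch, not an official tool: the sample XML is a trimmed, illustrative stand-in for real output (its feature list is made up), and `host_cpu_summary` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Illustrative, heavily trimmed stand-in for `virsh capabilities` output.
# The model name comes from the message above; the features are made up.
SAMPLE_CAPABILITIES = """
<capabilities>
  <host>
    <cpu>
      <arch>x86_64</arch>
      <model>Broadwell-noTSX-IBRS</model>
      <vendor>Intel</vendor>
      <feature name='vmx'/>
      <feature name='avx2'/>
    </cpu>
  </host>
</capabilities>
"""


def host_cpu_summary(capabilities_xml: str) -> tuple[str, list[str]]:
    """Return the detected host CPU model and its sorted extra feature names."""
    root = ET.fromstring(capabilities_xml)
    cpu = root.find("./host/cpu")
    if cpu is None:
        raise ValueError("no <host><cpu> element found")
    model = cpu.findtext("model", default="unknown")
    features = sorted(
        f.get("name") for f in cpu.findall("feature") if f.get("name")
    )
    return model, features


if __name__ == "__main__":
    model, features = host_cpu_summary(SAMPLE_CAPABILITIES)
    print(f"detected model: {model}")
    print(f"extra features: {', '.join(features)}")
```

On a real host, the XML string could instead be captured from `virsh capabilities` (for example via `subprocess.run`) and passed to the same function.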
On Thu, Nov 20, 2025 at 9:23 PM Hai Pham Thanh <haipt43@fpt.com> wrote:
1. Has anyone encountered this issue with Windows 11 + nested virtualization on Icelake/Emerald Rapids hosts?
2. Are there known root causes explaining why newer CPU models fail while preferred (older) models work?
The way this works in libvirt is that it has a CPU map of all the CPU models it knows about, and it finds the newest model whose features are a subset of the features detected on the host and reports that as the host model. Intel had a number of hardware CVEs related to transactional memory (TSX) and removed those feature flags in later generations; they also removed some things like AVX512 in some SKUs. As a result, if your libvirt/QEMU does not currently have maps for the new model and you have a newer CPU that has those flags removed, it ends up matching a much older model than you would expect.

It's unfortunate that this leaves a lot of performance on the table, but the only way to work around this without updating libvirt/QEMU is to find the newest model that works, then use the Nova config option to add in the other CPU flags that are common to all the hosts you want to support live migration to.

It looks like https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2106812 was recently fixed in July, so seeing if you can update to a version that actually supports your hardware would be a good first step.

I don't think this is a Nova bug, but if you can reproduce it with a libvirt that reports your default CPU model as the same generation that you have, we could explore more.

Normally when we enable nested virt we recommend that you use host-passthrough to avoid some of these issues, but that gives up the ability to live migrate unless all your hosts use the same CPU SKU, so the host-model approach or the custom approach should also work, modulo bugs.
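The matching described above can be illustrated with a small sketch. It follows only the simplified description in this message (pick the newest known model whose features are all present on the host); libvirt's real logic is more involved. The CPU map, its feature sets, and the host feature set are hypothetical and heavily reduced, chosen only to show how a missing TSX flag pushes the match back to an older model.

```python
# Hypothetical, heavily simplified CPU map, ordered oldest to newest.
# Real libvirt maps contain far more models and features.
CPU_MAP = [
    ("Broadwell-noTSX", {"sse4.2", "avx", "avx2"}),
    ("Skylake-Server", {"sse4.2", "avx", "avx2", "avx512f", "rtm", "hle"}),
    ("Icelake-Server",
     {"sse4.2", "avx", "avx2", "avx512f", "rtm", "hle", "avx512vnni"}),
]


def newest_matching_model(host_features, cpu_map=CPU_MAP):
    """Return the newest model whose features are a subset of the host's."""
    best = None
    for name, required in cpu_map:
        # A model only matches if the host offers every feature it requires.
        if required <= host_features:
            best = name  # later entries are newer, so keep overwriting
    return best


if __name__ == "__main__":
    # Assumed newer host: has AVX-512 but no TSX flags (rtm, hle).
    newer_host = {"sse4.2", "avx", "avx2", "avx512f", "avx512vnni"}
    print(newest_matching_model(newer_host))  # Broadwell-noTSX
```

Because the assumed host lacks `rtm` and `hle`, every model in this toy map that still lists them is rejected, and the match falls back to the oldest entry even though the host is newer than all of them.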
3. Is there a recommended fix other than switching to a preferred older CPU model?
If live migration is not a concern, just use cpu_mode=host-passthrough, but I think the best way is to update your libvirt/QEMU if possible in your environment.
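If host-passthrough is not an option because live migration is needed, the fallback mentioned earlier in this reply (a custom model that works, plus the flags common to all hosts) could be sketched as follows. This is an illustration under assumptions, not a recommended configuration: the per-host feature sets are invented, the helper names are hypothetical, and only the option names `cpu_mode`, `cpu_models`, and `cpu_model_extra_flags` and the model name come from the thread.

```python
import configparser
import io


def common_extra_flags(host_feature_sets, model_features):
    """Flags present on every host that the chosen model does not include."""
    if not host_feature_sets:
        raise ValueError("at least one host feature set is required")
    common = set.intersection(*host_feature_sets)
    return sorted(common - model_features)


def render_libvirt_section(cpu_model, extra_flags):
    """Render a [libvirt] section for nova.conf as text."""
    config = configparser.ConfigParser()
    config["libvirt"] = {
        "cpu_mode": "custom",
        "cpu_models": cpu_model,
        "cpu_model_extra_flags": ",".join(extra_flags),
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # Invented feature sets for two compute hosts and for the chosen model.
    hosts = [
        {"avx2", "vmx", "pcid", "avx512f"},
        {"avx2", "vmx", "pcid"},
    ]
    model_features = {"avx2"}
    flags = common_extra_flags(hosts, model_features)
    print(render_libvirt_section("Cascadelake-Server-noTSX", flags))
```

Taking the intersection keeps only flags that every host can provide, which is what allows guests to move between those hosts.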
participants (4)
- Hai Pham Thanh
- Hoai-Thu Vuong
- phamthanhai2001@gmail.com
- Sean Mooney