[ops][glance][nova] scheduling problem because of ImagePropertiesFilter
We have just finished the update of our cloud to Rocky, but we are seeing a strange issue with images that have the property hypervisor_type='QEMU': the scheduling of instances created with such images fails because of the ImagePropertiesFilter [*]. Removing that property from the image, the scheduling works. We also tried changing hypervisor_type='QEMU' --> hypervisor_type='qemu', but this didn't help.

Any suggestions?

Thanks, Massimo

[*] 2019-07-17 13:52:58.148 13312 INFO nova.filters [req-1863bef0-0326-46d1-a836-436227e91eef 6e3b136d578f4292a5c03b16f443ab3d d27fe2becea94a3e980fb9f66e2f291a - default default] Filtering removed all hosts for the request with instance ID '63810b60-76e4-4e76-a1c3-e4d3932c002e'. Filter results: ['AggregateMultiTenancyIsolation: (start: 49, end: 37)', 'AggregateInstanceExtraSpecsFilter: (start: 37, end: 34)', 'RetryFilter: (start: 34, end: 34)', 'AvailabilityZoneFilter: (start: 34, end: 34)', 'ComputeFilter: (start: 34, end: 32)', 'ComputeCapabilitiesFilter: (start: 32, end: 32)', 'ImagePropertiesFilter: (start: 32, end: 0)']
On 7/17/19 8:14 AM, Massimo Sgaravatto wrote:
We have just finished the update of our cloud to Rocky but we are seeing a strange issue with images with property: hypervisor_type='QEMU'
The scheduling of instances created with such images fails because of the ImagePropertiesFilter [*]
Removing that property from the image, the scheduling works. We also tried changing hypervisor_type='QEMU' --> hypervisor_type='qemu', but this didn't help.
Any suggestions?
Commit e792d50efadb36437e82381f4c84d738cee25dfd in Ocata changed the image metadata that the ImagePropertiesFilter pays attention to:

diff --git a/nova/scheduler/filters/image_props_filter.py b/nova/scheduler/filters/image_props_filter.py
index 06def5c769..521a6816a3 100644
--- a/nova/scheduler/filters/image_props_filter.py
+++ b/nova/scheduler/filters/image_props_filter.py
@@ -43,9 +43,9 @@ class ImagePropertiesFilter(filters.BaseHostFilter):

     def _instance_supported(self, host_state, image_props,
                             hypervisor_version):
-        img_arch = image_props.get('architecture', None)
-        img_h_type = image_props.get('hypervisor_type', None)
-        img_vm_mode = image_props.get('vm_mode', None)
+        img_arch = image_props.get('hw_architecture')
+        img_h_type = image_props.get('img_hv_type')
+        img_vm_mode = image_props.get('hw_vm_mode')
         checked_img_props = (
             arch.canonicalize(img_arch),
             hv_type.canonicalize(img_h_type),

Looks like 'img_hv_type' is the metadata key you need to use.

If that works, please put up a patch to the Glance "useful image properties" docs [0], we seem to be out of date on this issue.

[0] https://opendev.org/openstack/glance/src/branch/master/doc/source/admin/usef...
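For example, moving an existing image over to the new key should be something along these lines with the OpenStack client (the image UUID is a placeholder):

  # set the key the Ocata+ filter actually reads
  openstack image set --property img_hv_type=qemu <image-uuid>
  # drop the old key so the two can't disagree
  openstack image unset --property hypervisor_type <image-uuid>

cheers, brian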
This seems indeed to be the problem (I will file an issue for the documentation). Thanks a lot!

If I remove/fix that property on an image, I am now able to start new instances using that image. The problem is with instances created BEFORE removing the property: I am not able to migrate them (using nova migrate), unless I remove the ImagePropertiesFilter from the scheduler filters. Moreover, if a user creates a snapshot of one of these instances, the snapshot gets created with this wrong property.

Is there some clean way to remove/change the problematic property from the relevant instances?

Thanks, Massimo
On 7/19/2019 8:26 AM, Massimo Sgaravatto wrote:
Is there some clean way to remove/change the problematic property from the relevant instances?
Not really externally. For the snapshot images you should be able to update those properties on the image using the glance API (and CLIs), but for existing instances the image metadata properties are stored in the instance_system_metadata table with an image_ prefix, so you'd have to update those.

I would have thought there would be a translation shim for the rename of that property in nova though... https://review.opendev.org/#/c/202675/ doesn't explain at all why that was done.

Note that the old hypervisor_type image meta key should be translated to img_hv_type in the object code:

https://github.com/openstack/nova/blob/d5c67a3d954ddb571645886a23a0f251ae7dd...
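For the snapshot images specifically, clearing the bad key should be something like this with the CLI (the UUID is a placeholder):

  openstack image unset --property hypervisor_type <snapshot-image-uuid>

--
Thanks,
Matt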
On 7/22/2019 4:10 PM, Matt Riedemann wrote:
Note that the old hypervisor_type image meta key should be translated to img_hv_type in the object code:
https://github.com/openstack/nova/blob/d5c67a3d954ddb571645886a23a0f251ae7dd...
In other words, it smells like a nova bug to me that it's not handling that renamed image property key.

--
Thanks,
Matt
I was wrong: the alias works! I.e.:

img_hv_type=xyz

is equivalent to:

hypervisor_type=xyz

The problem is when xyz is 'qemu'. This used to work with Ocata and now (Rocky) it is not working anymore in "my" cloud. It works instead if I use kvm.

This is weird, since in

https://docs.openstack.org/nova/rocky/admin/configuration/schedulers.html

it is reported that:

<verbatim> qemu is used for both QEMU and KVM hypervisor types </verbatim>

Thanks, Massimo
On 7/23/2019 8:12 AM, Massimo Sgaravatto wrote:
The problem is when xyz is 'qemu'. This used to work with Ocata and now (Rocky) is not working anymore in "my" cloud. It works instead if I use kvm.
Hmm, just to be clear: you used to use hypervisor_type=QEMU and that no longer works, and the problem isn't hypervisor_type vs img_hv_type (the key), but the value QEMU vs qemu. Which isn't working in Rocky? QEMU or qemu?

What is the hypervisor_type on your nodes when listing them out of the API? (openstack hypervisor list --long should give you that output). Do you have a mix of QEMU vs qemu on some nodes? This sort of sounds like bug 1818092.

--
Thanks,
Matt
Sorry, I was not clear.

When running Ocata I had as a property of some images:

hypervisor_type='QEMU'

and this worked.

Now in Rocky:

hypervisor_type='QEMU' --> doesn't work (i.e. all hypervisors are excluded by ImagePropertiesFilter)
hypervisor_type='qemu' --> doesn't work (i.e. all hypervisors are excluded by ImagePropertiesFilter)
hypervisor_type='kvm' --> works

"openstack hypervisor list --long" reports "QEMU" as Hypervisor Type for all compute nodes.

Thanks again for your help

Cheers, Massimo
On 7/23/2019 8:57 AM, Massimo Sgaravatto wrote:
Now in Rocky:

hypervisor_type='QEMU' --> doesn't work (i.e. all hypervisors are excluded by ImagePropertiesFilter)
hypervisor_type='qemu' --> doesn't work (i.e. all hypervisors are excluded by ImagePropertiesFilter)
hypervisor_type='kvm' --> works
"openstack hypervisor list --long" reports "QEMU" as Hypervisor Type for all compute nodes
Apparently the filter doesn't use the ComputeNode.hypervisor_type field (which is what you see in the API/CLI output) to compare the img_hv_type property; it relies on some ComputeNode.supported_instances tuples which are reported differently by the driver.

Can you enable debug in the scheduler so we could see this output when you get the NoValidHost?

https://github.com/openstack/nova/blob/stable/rocky/nova/scheduler/filters/i...

Did you upgrade libvirt/qemu as well when you upgraded these nodes to Rocky? I wonder if the supported instance hypervisor type reported by the virt driver is kvm now rather than qemu even though the hypervisor type reported in the API shows QEMU.

FWIW this is the virt driver code that reports that supported_instances information for the compute node that's used by the scheduler filter:

https://github.com/openstack/nova/blob/stable/rocky/nova/virt/libvirt/driver...
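Enabling debug should just be a small change in nova.conf on the host running nova-scheduler (restart the service afterward), something like:

  [DEFAULT]
  debug = True

--
Thanks,
Matt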
This [*] is what appears in nova-scheduler after having enabled the debug.

We performed a "yum update" so, yes, we also updated libvirt (now we are running v. 4.5.0).

Thanks, Massimo

[*]

2019-07-23 16:44:34.849 12561 DEBUG nova.scheduler.filters.image_props_filter [req-52638278-51b7-4768-836a-f70d8a8b016a ab573ba3ea014b778193b6922ffffe6d ee1865a76440481cbcff08544c7d580a - default default] Instance contains properties ImageMetaProps(hw_architecture=<?>,hw_auto_disk_config=<?>,hw_boot_menu=<?>,hw_cdrom_bus=<?>,hw_cpu_cores=<?>,hw_cpu_max_cores=<?>,hw_cpu_max_sockets=<?>,hw_cpu_max_threads=<?>,hw_cpu_policy=<?>,hw_cpu_realtime_mask=<?>,hw_cpu_sockets=<?>,hw_cpu_thread_policy=<?>,hw_cpu_threads=<?>,hw_device_id=<?>,hw_disk_bus=<?>,hw_disk_type=<?>,hw_firmware_type=<?>,hw_floppy_bus=<?>,hw_ipxe_boot=<?>,hw_machine_type=<?>,hw_mem_page_size=<?>,hw_numa_cpus=<?>,hw_numa_mem=<?>,hw_numa_nodes=<?>,hw_pointer_model=<?>,hw_qemu_guest_agent=<?>,hw_rescue_bus=<?>,hw_rescue_device=<?>,hw_rng_model=<?>,hw_scsi_model=<?>,hw_serial_port_count=<?>,hw_video_model=<?>,hw_video_ram=<?>,hw_vif_model=<?>,hw_vif_multiqueue_enabled=<?>,hw_vm_mode=<?>,hw_watchdog_action=<?>,img_bdm_v2=<?>,img_bittorrent=<?>,img_block_device_mapping=<?>,img_cache_in_nova=<?>,img_compression_level=<?>,img_config_drive=<?>,img_hide_hypervisor_id=<?>,img_hv_requested_version=<?>,img_hv_type='qemu',img_linked_clone=<?>,img_mappings=<?>,img_owner_id=<?>,img_root_device_name=<?>,img_signature=<?>,img_signature_certificate_uuid=<?>,img_signature_hash_method=<?>,img_signature_key_type=<?>,img_use_agent=<?>,img_version=<?>,os_admin_user=<?>,os_command_line=<?>,os_distro=<?>,os_require_quiesce=<?>,os_secure_boot=<?>,os_skip_agent_inject_files_at_boot=<?>,os_skip_agent_inject_ssh=<?>,os_type=<?>,traits_required=<?>) that are not provided by the compute node supported_instances [[u'i686', u'kvm', u'hvm'], [u'x86_64', u'kvm', u'hvm']] or hypervisor version 2012000 do not match _instance_supported /usr/lib/python2.7/site-packages/nova/scheduler/filters/image_props_filter.py:103

(an identical entry follows at 16:44:34.852)
On 7/23/2019 9:50 AM, Massimo Sgaravatto wrote:
This [*] is what appears in nova-scheduler after having enabled the debug.
We performed a "yum update" so, yes, we also updated libvirt (now we are running v. 4.5.0)
Yeah, at this point I'm not sure what's going on, but the driver is reporting kvm now and your image is requesting qemu, so that's why the hosts are getting filtered out. I'm not sure why the upgrade of libvirt/qemu would change what the driver is reporting, but it's a bit lower level than I'd know about off hand. Maybe some of the Red Hat nova devs would know more about this or have seen it before.

--
Thanks,
Matt
On 7/23/19 8:14 AM, Matt Riedemann wrote:
Yeah at this point I'm not sure what's going on but the driver is reporting kvm now and your image is requesting qemu so that's why the hosts are getting filtered out. I'm not sure why the upgrade of libvirt/qemu would change what the driver is reporting now, but it's a bit lower level than I'd know about off hand. Maybe some of the Red Hat nova devs would know more about this or have seen it before.
I'm not sure whether this is related, but this thread reminded me of a change that landed in Rocky where we started filtering hypervisor capabilities by the configured CONF.libvirt.virt_type:

https://review.opendev.org/531347

I didn't see mention so far of how CONF.libvirt.virt_type has been configured in this deployment. Is it set to 'kvm' or 'qemu'? If it's set to 'kvm', that would cause 'qemu' capabilities to be filtered out, when they would not have been prior to Rocky.

Apologies if this was an unrelated tangent.
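A quick way to check on a compute node would be something like (assuming the stock /etc/nova/nova.conf path):

  grep -n virt_type /etc/nova/nova.conf

Cheers, -melanie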
Melanie: I think this is indeed the problem!

But then, if I am not wrong, the note in:

https://docs.openstack.org/nova/rocky/admin/configuration/schedulers.html

<verbatim> Note

qemu is used for both QEMU and KVM hypervisor types. </verbatim>

should be removed. I can open a bug if you agree ...

And maybe this is something worth mentioning in the release notes?

Thanks again for your help!

Cheers, Massimo
On 7/24/2019 1:37 AM, Massimo Sgaravatto wrote:
Melanie: I think this is indeed the problem!
Massimo - just to confirm, your [libvirt]/virt_type is "kvm" rather than "qemu", correct? If so, then yeah, what Melanie found is the problem and was a regression in behavior for the ImagePropertiesFilter in Rocky, and there should be a fix to the docs and likely a release note - though the release note is tricky since the regression was introduced in Rocky.

--
Thanks,
Matt
On 7/24/2019 9:34 AM, Matt Riedemann wrote:
Massimo - just to confirm, your [libvirt]/virt_type is "kvm" rather than "qemu" correct? If so, then yeah what Melanie found is the problem and was regression in behavior for the ImagePropertiesFilter in Rocky and there should be a fix to the docs and likely a release note - though the release note is tricky since the regression was introduced in Rocky.
Note that the glance description of the hypervisor_type property is also wrong:

https://docs.openstack.org/glance/latest/admin/useful-image-properties.html

"The hypervisor type. Note that qemu is used for both QEMU and KVM hypervisor types."

Given https://review.opendev.org/#/c/531328/ was abandoned, if the API is still showing QEMU for the hypervisor type (which Massimo confirmed it is) even though the node is configured with virt_type=kvm, and that's what the ImagePropertiesFilter is going to use, I think we'd be justified in reverting https://review.opendev.org/#/c/531347/ since it's totally confusing to operators if the API is showing the hypervisor_type as QEMU but the scheduler is filtering on "kvm".

We can change the docs but that feels like papering over the issue to me. What would the docs say? Something like, "Since the 18.0.0 Rocky release, the hypervisor_type value for libvirt nodes should match the configured [libvirt]/virt_type value"? That won't fix any existing images with their properties embedded in an instance's system_metadata, which could prevent you from being able to migrate those instances anywhere during the upgrade - you'd have to essentially do some database surgery in the instance_system_metadata table to fix the image_hypervisor_type value to match whatever the virt_type value is on that node.

Alternatively we could try to put some targeted compat code in the ImagePropertiesFilter where, if the hypervisor_type is QEMU/qemu but the node is reporting kvm, we let it slide and accept that host?
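For completeness, that surgery would be something like this (totally untested sketch - back up the nova DB first and sanity check the matching rows with a SELECT before updating anything):

  mysql nova -e "UPDATE instance_system_metadata SET value='kvm' WHERE \`key\`='image_hypervisor_type' AND value IN ('QEMU','qemu') AND deleted=0;"

--
Thanks,
Matt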
On Wed, 2019-07-24 at 09:47 -0500, Matt Riedemann wrote:

I think we'd be justified in reverting https://review.opendev.org/#/c/531347/ since it's totally confusing to operators if the API is showing the hypervisor_type as QEMU but the scheduler is filtering on "kvm".

the API and the filters should both see the hypervisor type as QEMU regardless of whether the virt type is qemu or kvm. kvm is just an acceleration mechanism for QEMU; it is not a separate hypervisor. we stopped progressing https://review.opendev.org/#/c/531328/ as it was a breaking api change that leaked config info via the api, e.g. the virt_type. we should be consistent however and always report qemu, both via the api and to the image properties filter for hypervisor_type. we could have a separate image metadata property for forcing kvm or qemu, but as documented in glance, hypervisor_type should be qemu whether the tcg backend is used or the kvm backend. if we wanted to support kvm specifically i would suggest supporting vm_mode=hvm
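e.g. an image that wants full virtualization could already ask for it explicitly with something like (the image uuid is a placeholder):

  openstack image set --property vm_mode=hvm <image-uuid>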
Correct. In my environment [libvirt]/virt_type is "kvm" (and this was the case also before updating to Rocky).

Thanks, Massimo
On 7/24/2019 9:51 AM, Massimo Sgaravatto wrote:
Correct. In my environment [libvirt]/virt_type is "kvm" (and this was the case also before updating to Rocky)
Please report a bug either way. I've also added comments in the bug associated with the patch that regressed the behavior [1].

[1] https://bugs.launchpad.net/nova/+bug/1195361/comments/22

--
Thanks,
Matt
On Wed, Jul 24, 2019 at 4:57 PM Matt Riedemann <mriedemos@gmail.com> wrote:
Please report a bug either way. I've also added comments in the bug associated with the patch that regressed the behavior [1].
[1] https://bugs.launchpad.net/nova/+bug/1195361/comments/22
https://bugs.launchpad.net/nova/+bug/1837756

Thanks, Massimo
On 7/24/2019 10:47 AM, Massimo Sgaravatto wrote:
https://bugs.launchpad.net/nova/+bug/1837756
Thanks, Massimo
Here is the revert: https://review.opendev.org/#/c/672559/

--
Thanks,
Matt
Here is the revert: https://review.opendev.org/#/c/672559/
We tried to apply this patch on some compute nodes of our cloud and it works (both scheduling of new instances and migration of previously created VMs).

Thanks a lot for your help!
On Wed, 2019-07-24 at 16:51 +0200, Massimo Sgaravatto wrote:

Correct. In my environment [libvirt]/virt_type is "kvm" (and this was the case also before updating to Rocky)

the expected behavior in this case is that the hypervisor_type will be qemu; if we are now reporting it as kvm, that is a bug.
participants (5)

- Brian Rosmaita
- Massimo Sgaravatto
- Matt Riedemann
- melanie witt
- Sean Mooney