All VMs fail when --max exceeds available resources

Albert Braden Albert.Braden at synopsys.com
Wed Nov 20 19:02:06 UTC 2019


I was experimenting in our dev cluster with CPU pinning and filters. After I was done I ran Ansible to put everything back the way it was, but the scheduler is now broken in two ways and I can't find the problem in my config. The first symptom is that if I use --max to create more VMs than the hypervisors can support, all of them go to ERROR. Before I changed things, --max would fill the hypervisors and only the extra VMs would go to ERROR. I'll email separately about the other problem; this is already getting long.
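For reference, the reproduce case is roughly the following CLI call (the flavor, image, and network names here are placeholders, not the ones from our cluster):

```shell
# Ask nova to boot up to 14 VMs in one request; the flavor is sized so
# that the three hypervisors can only hold ~9 of them.
openstack server create --flavor m1.large --image ubuntu-18.04 \
    --network dev-net --min 1 --max 14 test-vm
```

With --min 1 --max 14, the old behavior was to build as many as fit and only ERROR the remainder; now the whole batch errors out.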

When I look at the logs, I see the "Starting to schedule" line with the list of instance UUIDs, and then "Attempting to claim resources" log entries.

Hosts are selected; they are starting with 0 instances:

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:56.880 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] [instance: ef1f7493-f792-4c3b-bf50-8e68b3d553ac] Selected host: (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 120696MB disk: 986112MB io_ops: 0 instances: 0 _consume_selected_host /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:346

The scheduler starts with the correct number of HVs:

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:56.888 1409571 DEBUG nova.filters [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Starting with 3 host(s) get_filtered_objects /usr/lib/python2.7/dist-packages/nova/filters.py:70

I see the HV being weighed and the number of instances on each HV increasing from 0:

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:57.106 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Filtered [(us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: 85880MB disk: 880640MB io_ops: 1 instances: 2, (us01odc-dev1-hv001, us01odc-dev1-hv001.internal.synopsys.com) ram: 89976MB disk: 932864MB io_ops: 0 instances: 1, (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 89976MB disk: 929792MB io_ops: 1 instances: 1] _get_sorted_hosts /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:435

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:57.107 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Weighed [WeighedHost [host: (us01odc-dev1-hv001, us01odc-dev1-hv001.internal.synopsys.com) ram: 89976MB disk: 932864MB io_ops: 0 instances: 1, weight: 2.40576402895], WeighedHost [host: (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 89976MB disk: 929792MB io_ops: 1 instances: 1, weight: 1.11693447844], WeighedHost [host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: 85880MB disk: 880640MB io_ops: 1 instances: 2, weight: 0.961725168625]] _get_sorted_hosts /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:454

The number of instances goes beyond the HV capacity:

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:57.602 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Weighed [WeighedHost [host: (us01odc-dev1-hv001, us01odc-dev1-hv001.internal.synopsys.com) ram: 59256MB disk: 876544MB io_ops: 1 instances: 2, weight: 1.17395901159], WeighedHost [host: (us01odc-dev1-hv003, us01odc-dev1-hv003.internal.synopsys.com) ram: 59256MB disk: 873472MB io_ops: 2 instances: 2, weight: 0.435549629144], WeighedHost [host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: 55160MB disk: 824320MB io_ops: 2 instances: 3, weight: 0.292945361349]] _get_sorted_hosts /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:454

Hosts are still being selected even though they are already way over capacity (note the negative RAM):

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:58.696 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] [instance: 579dd4e2-d5d5-445f-905f-84cfd93146f6] Selected host: (us01odc-dev1-hv002, us01odc-dev1-hv002.internal.synopsys.com) ram: -6280MB disk: 711680MB io_ops: 4 instances: 5 _consume_selected_host /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:346

Then we start to see the warnings:

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:58.817 1409571 WARNING nova.scheduler.client.report [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Unable to submit allocation for instance 5de08185-1ef2-4c92-8a19-5f09ec27be71 (409 {"errors": [{"status": 409, "request_id": "req-6cdd0b7a-bdbd-486f-9792-20840ea4a72e", "code": "placement.undefined_code", "detail": "There was a conflict when trying to complete your request.\n\n Unable to allocate inventory: Unable to create allocation for 'MEMORY_MB' on resource provider 'f20fa03d-18f4-486b-9b40-ceaaf52dabf8'. The requested amount would exceed the capacity.  ", "title": "Conflict"}]})
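To make the expected behavior concrete, here is a toy sketch (not nova code; host sizes and the ~30 GB-per-instance flavor are inferred from the RAM deltas in the logs above). In this sketch the scheduler stops assigning to a host once a claim would exceed capacity, so only the overflow instances fail, which is what --max used to do:

```python
class Host:
    def __init__(self, name, free_ram_mb):
        self.name = name
        self.free_ram_mb = free_ram_mb  # scheduler's in-memory view


def schedule(hosts, num_instances, ram_per_instance_mb):
    """Return (placed, failed) instance index lists.

    Expected behavior: once the best host can no longer fit an
    instance, the claim is rejected (placement's 409) and that
    instance fails -- the already-placed ones are left alone.
    """
    placed, failed = [], []
    for i in range(num_instances):
        # Weigh hosts by remaining RAM and pick the best one.
        best = max(hosts, key=lambda h: h.free_ram_mb)
        if best.free_ram_mb >= ram_per_instance_mb:
            best.free_ram_mb -= ram_per_instance_mb  # claim accepted
            placed.append(i)
        else:
            failed.append(i)  # capacity exceeded: only this VM errors
    return placed, failed


# Three empty hypervisors at 120696 MB each (the value hv003 reports
# above); each instance consumes 30720 MB, matching 120696 -> 89976.
hosts = [Host("hv001", 120696), Host("hv002", 120696), Host("hv003", 120696)]
placed, failed = schedule(hosts, num_instances=14, ram_per_instance_mb=30720)
print(len(placed), len(failed))  # 9 placed, 5 failed
```

What the logs show instead is the in-memory view being consumed past zero (ram: -6280MB), placement rejecting the allocations with 409s, and then _cleanup_allocations deleting all fourteen, including the nine that would have fit.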

And then they all fail and their allocations are deleted:

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.057 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Unable to successfully claim against any host. _schedule /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:242

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.058 1409571 DEBUG nova.scheduler.filter_scheduler [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Cleaning up allocations for [u'ef1f7493-f792-4c3b-bf50-8e68b3d553ac', u'80afcfd7-ce25-4fc4-8b0d-b581a31b87bd', u'22cf9509-39e8-456a-b32c-e950cc597266', u'd6fc0b66-cd44-410d-96c9-590c22f1e21b', u'713a12fb-8bbf-4eac-ae02-68cb007fa34e', u'30c70c4d-1447-4481-bdc5-816835180ac6', u'be955f43-ac56-4531-ada8-16ce966211c7', u'b91820f6-775e-4abb-b28e-5b9b065819c2', u'83472954-2197-4e65-b7e7-1fe892f28458', u'd862bd74-494b-4c29-94ad-b80fc6c113e8', u'80a69dba-8011-403f-87fb-c39ef17ba467', u'c440d8d1-4da2-4c27-af9f-dd5afa19d083', u'5b301d69-a4f1-4393-8d67-4286c9873490', u'579dd4e2-d5d5-445f-905f-84cfd93146f6'] _cleanup_allocations /usr/lib/python2.7/dist-packages/nova/scheduler/filter_scheduler.py:299

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.124 1409571 INFO nova.scheduler.client.report [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Deleted allocation for instance ef1f7493-f792-4c3b-bf50-8e68b3d553ac

us01odc-dev1-ctrl3:/var/log/nova/nova-scheduler.log:2019-11-19 11:39:59.190 1409571 INFO nova.scheduler.client.report [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Deleted allocation for instance 80afcfd7-ce25-4fc4-8b0d-b581a31b87bd

And the errors:

us01odc-dev1-ctrl1:/var/log/nova/nova-conductor.log:2019-11-19 11:40:00.201 1801903 WARNING nova.scheduler.utils [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] Failed to compute_task_build_instances: No valid host was found. There are not enough hosts available.

us01odc-dev1-ctrl1:/var/log/nova/nova-conductor.log:2019-11-19 11:40:00.205 1801903 WARNING nova.scheduler.utils [req-892201d0-6b04-4e92-a347-00285725fbed 2cb6757679d54a69803a5b6e317b3a93 474ae347d8ad426f8118e55eee47dcfd - default 7d3a4deab35b434bba403100a6729c81] [instance: 5b301d69-a4f1-4393-8d67-4286c9873490] Setting instance to ERROR state.: NoValidHost_Remote: No valid host was found. There are not enough hosts available.


More information about the openstack-discuss mailing list