[Openstack-operators] How to tune scheduling for "Insufficient compute resources" (race conditions ?)

Massimo Sgaravatto massimo.sgaravatto at gmail.com
Wed Nov 30 14:56:49 UTC 2016


Hi all

I have a problem with scheduling in our Mitaka cloud.
When there are a lot of requests for new instances, some of them
fail with "Failed to compute_task_build_instances: Exceeded maximum
number of retries". The underlying failures are "Insufficient compute
resources: Free memory 2879.50 MB < requested 8192 MB" [*].

But there are compute nodes with enough memory that could serve such
requests.

In the conductor log I also see messages reporting that "Function
'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted
interval by xxx sec" [**]


My understanding is that:

- VM a is scheduled to a certain compute node
- the scheduler chooses the same compute node for VM b before the info for
that compute node is updated (so the 'size' of VM a is not taken into
account)
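
To make the suspected sequence concrete, here is a minimal Python sketch.
It is only a toy model of the race I have in mind, not nova code: the
names (Host, pick_host, claim, simulate), the two hosts and their memory
figures are all made up for illustration. The scheduler picks a host from
a cached view of free memory, and the compute node then checks the
request against its real free memory.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical compute node with its *actual* free memory in MB."""
    name: str
    free_mb: float


def make_hosts():
    # Illustrative numbers only (not taken from the real cloud).
    return {
        "node1": Host("node1", 11071.5),
        "node2": Host("node2", 9000.0),
    }


def pick_host(view, requested_mb):
    """Scheduler side: choose the host with the most free RAM
    according to the (possibly stale) view, or None if none fits."""
    candidates = {name: free for name, free in view.items() if free >= requested_mb}
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


def claim(host, requested_mb):
    """Compute side: the claim fails if the real free memory is too low."""
    if host.free_mb < requested_mb:
        return False
    host.free_mb -= requested_mb
    return True


def simulate(requests_mb, hosts, refresh_view_between_requests):
    """Schedule each request in turn and return (chosen host, claim succeeded)."""
    view = {h.name: h.free_mb for h in hosts.values()}
    results = []
    for requested_mb in requests_mb:
        if refresh_view_between_requests:
            # Up-to-date view: earlier claims are taken into account.
            view = {h.name: h.free_mb for h in hosts.values()}
        name = pick_host(view, requested_mb)
        ok = name is not None and claim(hosts[name], requested_mb)
        results.append((name, ok))
    return results


# Stale view: VM a and VM b both land on node1, and VM b's claim fails.
stale = simulate([8192, 8192], make_hosts(), refresh_view_between_requests=False)
# Fresh view: VM b is sent to node2 instead.
fresh = simulate([8192, 8192], make_hosts(), refresh_view_between_requests=True)
```

With the stale view the second request is sent to a node that no longer
has room, even though another node could have served it. That is the
pattern I think I am seeing.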

Does this make sense, or am I totally wrong?

Any hints about how to cope with such scenarios, besides increasing
scheduler_max_attempts?

scheduler_default_filters is set to:

scheduler_default_filters = AggregateInstanceExtraSpecsFilter,AggregateMultiTenancyIsolation,RetryFilter,AvailabilityZoneFilter,RamFilter,CoreFilter,AggregateRamFilter,AggregateCoreFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter


Thanks a lot, Massimo

[*]

2016-11-30 15:10:20.233 25140 WARNING nova.scheduler.utils [req-ec8c0bdc-b413-4cab-b925-eb8f11212049 840c96b6fb1e4972beaa3d30ade10cc7 d27fe2becea94a3e980fb9f66e2f291a - - -] Failed to compute_task_build_instances: Exceeded maximum number of retries. Exceeded max scheduling attempts 5 for instance 314eccd0-fc73-446f-8138-7d8d3c8644f7. Last exception: Insufficient compute resources: Free memory 2879.50 MB < requested 8192 MB.
2016-11-30 15:10:20.233 25140 WARNING nova.scheduler.utils [req-ec8c0bdc-b413-4cab-b925-eb8f11212049 840c96b6fb1e4972beaa3d30ade10cc7 d27fe2becea94a3e980fb9f66e2f291a - - -] [instance: 314eccd0-fc73-446f-8138-7d8d3c8644f7] Setting instance to ERROR state.


[**]

2016-11-30 15:10:48.873 25128 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.08 sec
2016-11-30 15:10:54.372 25142 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.33 sec
2016-11-30 15:10:54.375 25140 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.32 sec
2016-11-30 15:10:54.376 25129 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.30 sec
2016-11-30 15:10:54.381 25138 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.24 sec
2016-11-30 15:10:54.381 25139 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.28 sec
2016-11-30 15:10:54.382 25143 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.24 sec
2016-11-30 15:10:54.385 25141 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 9.11 sec
2016-11-30 15:11:01.964 25128 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 3.09 sec
2016-11-30 15:11:05.503 25142 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.13 sec
2016-11-30 15:11:05.506 25138 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.12 sec
2016-11-30 15:11:05.509 25139 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.13 sec
2016-11-30 15:11:05.512 25141 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.13 sec
2016-11-30 15:11:05.525 25143 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.14 sec
2016-11-30 15:11:05.526 25140 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.15 sec
2016-11-30 15:11:05.529 25129 WARNING oslo.service.loopingcall [-] Function 'nova.servicegroup.drivers.db.DbDriver._report_state' run outlasted interval by 1.15 sec