[Openstack-operators] [nova] Looking for feedback on a spec to limit max_count in multi-create requests
mriedemos at gmail.com
Fri Oct 6 21:43:33 UTC 2017
I've been chasing something weird I was seeing in devstack when creating
hundreds of instances in a single request: past some threshold, things
blew up unexpectedly during scheduling and all instances were put into
ERROR state. Given the environment I was running in, this shouldn't have
been happening, and today we figured out what was actually going on. To
summarize: we retry scheduling requests on RPC timeout, but the earlier
request keeps running, so you can end up with scheduler_max_attempts
greenthreads concurrently trying to schedule the same 1000 instances,
which melts your scheduler.
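To make the pile-up concrete, here is a rough back-of-the-envelope model in Python. It is a hypothetical sketch, not nova's scheduler code: it assumes each scheduling attempt costs a fixed amount of time per instance, that the caller stops waiting after an RPC timeout and retries up to scheduler_max_attempts times, and that a timed-out attempt keeps running on the scheduler side. The function names, the 60 second timeout, and the 0.5 second per-instance cost are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Attempt:
    start: float  # when the caller sent this attempt (seconds)
    end: float    # when the scheduler finishes it (seconds)


def simulate_retries(num_instances, per_instance_cost, rpc_timeout, max_attempts):
    """Return every scheduling attempt the scheduler ends up running.

    Hypothetical model, not nova code:
    - one request asks the scheduler to place num_instances instances;
    - one attempt takes num_instances * per_instance_cost seconds;
    - the caller retries after rpc_timeout seconds without cancelling
      the attempt that is still running on the scheduler.
    """
    duration = num_instances * per_instance_cost
    attempts = []
    for i in range(max_attempts):
        start = i * rpc_timeout  # each retry fires when the previous wait times out
        attempts.append(Attempt(start, start + duration))
        if duration <= rpc_timeout:
            break  # the reply arrived in time, so no retry is sent
    return attempts


def peak_concurrency(attempts):
    """Largest number of attempts running on the scheduler at the same moment."""
    events = []
    for a in attempts:
        events.append((a.start, 1))
        events.append((a.end, -1))
    # Sort by time; at equal times -1 sorts before +1, so an attempt that
    # finishes exactly when another starts is not counted as overlapping.
    events.sort()
    current = peak = 0
    for _, delta in events:
        current += delta
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    big = simulate_retries(1000, 0.5, rpc_timeout=60, max_attempts=3)
    print(len(big), peak_concurrency(big))  # 3 3
    print(len(big) * 1000)  # 3000 placements attempted for 1000 requested
    small = simulate_retries(10, 0.5, rpc_timeout=60, max_attempts=3)
    print(len(small), peak_concurrency(small))  # 1 1
```

Under these assumptions, a 1000-instance request takes 500 seconds per attempt, so every attempt outlives the timeout and all three end up running on the scheduler at once, tripling the work. A 10-instance request answers within the timeout and is never retried, which is why the problem only shows up past some request size.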
I've started a spec which goes into the details of the actual issue:
It also proposes a solution, but I'm not convinced it's the best one, so
it lists some alternatives as well.
I'm really interested in operator feedback on this because I assume that
people are dealing with stuff like this in production already, and have
had to come up with ways to solve it.