[Openstack-operators] [nova] Looking for feedback on a spec to limit max_count in multi-create requests

Matt Riedemann mriedemos at gmail.com
Fri Oct 6 21:43:33 UTC 2017


I've been chasing something weird I was seeing in devstack when creating 
hundreds of instances in a single request: past some limit, things 
blow up in an unexpected way during scheduling and all of the instances 
are put into ERROR state. Given the environment I was running in, this 
shouldn't have been happening, and today we figured out what was 
actually going on. To summarize, we retry scheduling requests on RPC 
timeout, so you can have up to scheduler_max_attempts greenthreads 
running concurrently, each trying to schedule 1000 instances, and melt 
your scheduler.
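
To make the pile-up concrete, here is a toy model of the retry behavior 
described above. It is not nova code: the function name and the numbers 
are made up for illustration, and it assumes that an attempt keeps 
running on the scheduler for a fixed time even after the caller has 
timed out and sent a retry.

```python
def peak_concurrent_attempts(work_seconds, rpc_timeout, max_attempts):
    """Toy model: peak number of scheduling attempts running at once.

    Assumptions (hypothetical, for illustration only):
    - every attempt takes work_seconds on the scheduler;
    - the caller gives up after rpc_timeout and retries immediately;
    - the scheduler keeps working on attempts the caller abandoned.
    """
    starts = []
    for attempt in range(max_attempts):
        starts.append(attempt * rpc_timeout)
        if work_seconds <= rpc_timeout:
            break  # The attempt finished in time, so no retry is sent.

    # Each attempt occupies the scheduler from its start until it finishes.
    intervals = [(s, s + work_seconds) for s in starts]

    # The peak overlap always occurs at the moment some attempt starts.
    peak = 0
    for start, _ in intervals:
        running = sum(1 for s, e in intervals if s <= start < e)
        peak = max(peak, running)
    return peak


if __name__ == "__main__":
    # A small request finishes before the timeout: one attempt only.
    print(peak_concurrent_attempts(30, 60, 3))
    # A huge request outlives every timeout: all attempts overlap.
    print(peak_concurrent_attempts(200, 60, 3))
```

In this model a request that takes longer than the timeout is never 
cancelled, only duplicated, so the scheduler ends up doing the same 
expensive work several times over at the moment it is least able to.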

I've started a spec which goes into the details of the actual issue:

https://review.openstack.org/#/c/510235/

It also proposes a solution, but I don't feel it's the greatest 
solution, so there are also some alternatives in there.

I'm really interested in operator feedback on this because I assume that 
people are dealing with stuff like this in production already, and have 
had to come up with ways to solve it.

-- 

Thanks,

Matt


