[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

Dan Smith dms at danplanet.com
Mon May 22 22:12:46 UTC 2017


> Whoa, but that's after 10 tries (by default).  And if e.g. it bounced
> because the instance is too big for the host, but other, smaller
> instances come in and succeed in the meantime, that could wind up being
> stretched indefinitely.  Doesn't sound like a complete answer to this issue.

No dude, remember, this is all assuming that claiming with placement
eliminates 100% of the resource races :)

The _only_ things left to reschedule for are (a) straight-up 100% fail
compute host misconfigurations and (b) anything that fails some
percentage of the time and will actually be resolved by trying a
different host (e.g. baseline 40% ironic IPMI failbots).
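
To put a rough number on case (b): a back-of-the-envelope sketch, assuming
each host fails independently with the same probability (an assumption for
illustration, real failures are often correlated). The function name and
the attempt counts are hypothetical, not nova settings.

```python
def success_probability(per_host_failure: float, attempts: int) -> float:
    """Chance that at least one of `attempts` boots succeeds.

    Assumes every attempt lands on a different host and that hosts fail
    independently with probability `per_host_failure`.
    """
    if not 0.0 <= per_host_failure <= 1.0:
        raise ValueError("per_host_failure must be between 0 and 1")
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    # All attempts fail with probability p**n; anything else is a success.
    return 1.0 - per_host_failure ** attempts


if __name__ == "__main__":
    # Illustrative 40% per-host failure rate, as in the example above.
    for n in (1, 2, 3):
        print(n, round(success_probability(0.4, n), 3))
```

Under those assumptions a single retry on a different host already moves
the odds a long way, which is why this class of failure is the one worth
retrying for.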

> Today you can limit the set of compute hosts to try by specifying an
> "availability zone".  Perhaps the answer here is to support some kind of
> "exclude these hosts" list to a "fresh" deploy.
> 
> But is the cure worse than the disease?

I (and I think others) would argue that expecting the user to know that
they should try a different AZ is not reasonable UX. A rebuild of an
instance that failed to boot can/should exclude the original host on the
rebuild attempt. Reschedules already do this today, so it's not that
hard; it just requires some plumbing.
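
A minimal sketch of the idea, not nova code: every name here (pick_host,
rebuild_excluding_failed, try_boot) is hypothetical and only illustrates
"rebuild, but skip the host that already failed".

```python
from typing import Callable, Iterable, Optional, Set


def pick_host(candidates: Iterable[str], excluded: Set[str]) -> Optional[str]:
    """Return the first candidate host not in `excluded`, or None."""
    for host in candidates:
        if host not in excluded:
            return host
    return None


def rebuild_excluding_failed(
    candidates: Iterable[str],
    failed_host: str,
    try_boot: Callable[[str], bool],
) -> Optional[str]:
    """Make one rebuild attempt on any host other than `failed_host`.

    `try_boot` is a stand-in for whatever actually boots the instance;
    it returns True on success. Returns the host used, or None.
    """
    host = pick_host(candidates, excluded={failed_host})
    if host is not None and try_boot(host):
        return host
    return None


if __name__ == "__main__":
    hosts = ["compute1", "compute2", "compute3"]
    # Pretend compute1 failed the original boot and everything else works.
    print(rebuild_excluding_failed(hosts, "compute1", lambda h: True))
```

The user never has to name an AZ or a host; the exclusion is carried
along with the rebuild request.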

--Dan


