[openstack-dev] [nova] I think nova behaves poorly when booting multiple instances

John Garbutt john at johngarbutt.com
Fri May 29 12:27:17 UTC 2015


On 27 May 2015 at 23:36, Robert Collins <robertc at robertcollins.net> wrote:
> On 26 May 2015 at 03:37, Chris Friesen <chris.friesen at windriver.com> wrote:
>>
>> Hi all,
>>
>> I've just opened a bug around booting multiple instances at once, and it was
>> suggested on IRC that I mention it here to broaden the discussion around the
>> ideal behaviour.
>>
>> The bug is at:  https://bugs.launchpad.net/nova/+bug/1458122
>>
>> Basically the problem is this:
>>
>> When booting up instances, nova allows the user to specify a "min count" and
>> a "max count".  So logically, this request should be considered successful
>> if at least "min count" instances can be booted.
>>
>> Currently, if the user has quota space for "max count" instances, then nova
>> will try to create them all. If any of them can't be scheduled, then the
>> creation of all of them will be aborted and they will all be put into an
>> error state.

The new quota ideas we discussed should make other options for this a
lot simpler, I think:
https://review.openstack.org/#/c/182445/
But let's skip over that for now...

>> Arguably, if nova was able to schedule at least "min count" instances (which
>> defaults to 1) then it should continue on with creating those instances that
>> it was able to schedule. Only if nova cannot create at least "min count"
>> instances should nova actually consider the request as failed.
>>
>> Also, I think that if nova can't schedule "max count" instances, but can
>> schedule at least "min count" instances, then it shouldn't put the
>> unscheduled ones into an error state--it should just delete them.
>
> I think taking successfully provisioned VMs and rolling them back is
> poor when the user's request was strictly met; I'm in favour of your
> proposals.
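
The difference between the two behaviours can be sketched in Python. This is a hypothetical model and not Nova code. The function names are illustrative, and scheduling is reduced to a list of booleans with one entry per requested instance (max_count entries in total). Quota checks are ignored. The sketch also assumes that when fewer than min_count instances can be placed, the proposal would still put the whole request into ERROR, as happens today.

```python
from typing import List, Tuple

# Hypothetical model of the behaviours discussed in bug 1458122.
# schedulable[i] is True if the scheduler found a host for instance i.
# This is NOT Nova's implementation; names are illustrative only.


def current_behaviour(schedulable: List[bool]) -> Tuple[int, int]:
    """All-or-nothing: if any instance fails to schedule, every
    instance in the request goes to ERROR. Returns (active, error)."""
    if all(schedulable):
        return len(schedulable), 0
    return 0, len(schedulable)


def proposed_behaviour(schedulable: List[bool],
                       min_count: int = 1) -> Tuple[int, int, int]:
    """Succeed if at least min_count instances were placed, and delete
    (rather than error) the ones that could not be placed.
    Returns (active, error, deleted)."""
    placed = sum(schedulable)
    unplaced = len(schedulable) - placed
    if placed >= min_count:
        return placed, 0, unplaced
    # Assumption: below min_count the whole request fails as it does today.
    return 0, len(schedulable), 0


if __name__ == "__main__":
    # max_count=5, min_count=2, only three instances can be placed.
    request = [True, True, True, False, False]
    print(current_behaviour(request))
    print(proposed_behaviour(request, min_count=2))
```

Consider a request with max_count=5 and min_count=2 in which only three instances can be placed. The current behaviour leaves all five in ERROR. The proposal keeps three running and deletes the other two, which is exactly the visibility question raised below.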

The problem here is having a nice way to explicitly tell the users
what worked and what didn't. Currently, putting the instance into an
error state is the "good" way to tell the user that the build
failed. Deleting them doesn't have the same visibility; it can look
like they just vanished.

We do have a (straw man) proposed solution for this. See the Task API
discussion here:
https://etherpad.openstack.org/p/YVR-nova-error-handling

Given this also impacts discussions around cancelling operations like
live-migrate, I would love for a sub group to form and push forward
the important work on building a "Task API". I think Andrew Laski has
committed to writing up a backlog spec for the current proposal (which
has gained a lot of support), so it could then be taken on by others
who want to move this forward. Do you fancy getting involved with
that?


Having said all that, I am very tempted to say we should deprecate the
"min_count" parameter in the API, keep the current behaviour for old
version requests, and maybe even remove the "max_count" parameter. We
could look to Heat to do a much better job of this kind of
orchestration. This is very much in the spirit of:
http://docs.openstack.org/developer/nova/devref/project_scope.html#no-more-orchestration


Either way, given the impact of the bug fix (i.e. it touches the
API, and would probably need a microversion bump), I think it would
be great to actually write up your proposal as a nova-spec (backlog or
targeted at Liberty; either is cool). I think a spec review would
be a great way to reach a good agreement on the best approach here.


Chris, does that sound like an approach that would work for you?


Thanks,
John
