[openstack-dev] Optionally force instances to "stay put" on resize
Sean Dague
sdague at linux.vnet.ibm.com
Mon Feb 18 12:40:41 UTC 2013
It's worth noting that the rationale for adding
"allow_resize_to_same_host" wasn't as an option that people would use.
It was added for Tempest gate testing; otherwise we can't test resizes on
a single-node OpenStack.
My argument in the review is that resize is an end user operation, done
by a user on their self provisioned VMs. The common case for resize is
to increase the size of a VM.
The "force" option means that very quickly, in any real environment,
where people are scheduling full machines, the resize action stops
working (as people can only grow on a single machine), and admin
intervention is needed to readjust the topology so users can resize again.
Pinning in this way gets dangerous really quickly, as you've moved
scheduler policy out of the scheduler, and made some new pathological
conditions.
I think that is antithetical to the computing model we are building, and
it opens up a brand new failure condition.
I get the need for hints, and if there was a "prefer" mode, I'd be ok
with this. But "force" just creates something new and brittle.
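As a rough illustration of the difference, here is a toy sketch in plain
Python. It is not Nova code: the Host class, the pick_host function, and the
"force" / "prefer" mode names are all hypothetical, and free_ram_mb is assumed
to mean the RAM a host could give to the resized instance. It only shows that
a hard constraint fails once the current host is full, while a soft
preference falls back to another host.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Host:
    name: str
    free_ram_mb: int  # RAM this host could give to the resized instance


def pick_host(hosts: List[Host], current: str, new_ram_mb: int,
              mode: str) -> Optional[str]:
    """Return the name of the host chosen for a resize, or None if none fits."""
    # Only hosts with enough room for the new flavor are candidates.
    candidates = [h for h in hosts if h.free_ram_mb >= new_ram_mb]
    if mode == "force":
        # Hard constraint: only the current host is acceptable.
        candidates = [h for h in candidates if h.name == current]
    elif mode == "prefer":
        # Soft constraint: the current host sorts first, others stay eligible.
        candidates.sort(key=lambda h: h.name != current)
    else:
        raise ValueError("unknown mode: %r" % mode)
    return candidates[0].name if candidates else None


hosts = [Host("node1", 2048), Host("node2", 8192)]
print(pick_host(hosts, "node1", 4096, "force"))   # None: resize fails
print(pick_host(hosts, "node1", 4096, "prefer"))  # node2: falls back
```

With "force", the user is stuck until an admin moves the instance; with
"prefer", the same request still succeeds, just on a different host.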
I'd say it's much better to come up with a proposal on how hypervisors
could influence the scheduler, and bring that to design summit. Resize
in OpenStack isn't the same as resize in oVirt, it is the allocation of
a new flavor, which in the general case means new CPU counts, new memory
sizes, new disk sizes.
Either way, I really think this is too late and too rushed to make a
change like this. Come up with a less brittle proposal for summit, and
let's discuss it there.
-Sean
On 02/18/2013 05:53 AM, John Garbutt wrote:
> This reminds me again of the differences between Migrate and
> Live-Migrate API calls.
> I think having the ability, in both cases, to do scheduler hints makes
> a lot of sense.
>
> I am thinking about admins and maintenance, rather than end-users.
>
> So +1 to most of Alex's points.
>
> John
>
> On 15 February 2013 17:21, Alex Glikson <GLIKSON at il.ibm.com> wrote:
>> IMO, the desired behavior of 'resize' is:
>> - user should be able to influence the expected 'downtime', i.e., whether it
>> should be done dynamically on the same host (zero downtime), using 'live'
>> migration (close to zero downtime), or non-live. Ideally, there should be
>> also an API to determine which modes are supported.
>> - user should be able to influence the placement, similarly to instance
>> provisioning, meaning that either scheduler hints should be persisted and
>> used during 'resize', or the user should be able to specify scheduler hints
>> when applying resize (or both). In particular, it might make sense to have a
>> dedicated weight function preferring to keep the instance on the same host,
>> if possible.
>> - optionally, it might make sense to have a filter (to be specified by the
>> admin) that would prevent migration of instances with certain
>> characteristics (which would apply during resize)
>>
>> The combination of the above would determine whether it can or will be on
>> the same host (transparently to the user).
>>
>> Having said that, as a short-term measure, making "resize_to_same_host" more
>> flexible certainly sounds like a step in the right direction.
>>
>> Regards,
>> Alex
>>
>>
>>
>>
>> From: Michael J Fork <mjfork at us.ibm.com>
>> To: openstack-dev at lists.openstack.org,
>> Date: 15/02/2013 06:08 PM
>> Subject: [openstack-dev] Optionally force instances to "stay put" on
>> resize
>> ________________________________
>>
>>
>>
>> The patch for the configurable-resize-placement blueprint
>> (https://blueprints.launchpad.net/nova/+spec/configurable-resize-placement)
>> has generated a discussion on the review boards and needed to be brought to
>> the mailing list for broader feedback.
>>
>> tl;dr would others find useful the addition of a new config option
>> "resize_to_same_host" with values "allow", "require", "forbid" that
>> deprecates "allow_resize_to_same_host" (functionality equivalent to "allow"
>> and "forbid" in "resize_to_same_host")? Existing use cases and default
>> behaviors are retained unchanged. The new use case, "resize_to_same_host
>> = require", retains the exact same external API semantics and would make it
>> such that no user actions can cause a VM migration (and the network traffic
>> with it). An administrator can still perform a manual migration that would
>> allow a subsequent resize to succeed. This patch would be most useful in
>> environments with 1GbE or with large ephemeral disks.
>>
>> Blueprint Description
>>
>>> Currently OpenStack has a boolean "allow_resize_to_same_host" config
>>> option that constrains
>>> placement during resize. When this value is false, the ignore_hosts option
>>> is passed to the scheduler.
>>> When this value is true, no options are passed to the scheduler and the
>>> current host can be
>>> considered. In some use cases - e.g. PowerVM - a third option of "require
>>> same host" is desirable.
>>
>>> This blueprint will deprecate the "allow_resize_to_same_host" config
>>> option and replace it with
>>> "resize_to_same_host" that supports 3 values - allow, forbid, require.
>>> Allow is equivalent to true in the
>>> current use case (i.e. no scheduler hint, current host is considered),
>>> forbid to false in current use case
>>> (i.e. the ignore_hosts scheduler hint is set), and require forces the same
>>> host through the use of the
>>> force_hosts scheduler hint.
>>
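The mapping described in the blueprint can be sketched as follows. This is a
standalone illustration, not the code under review: the function names and
the dict shape are assumptions, and only the option values and the
ignore_hosts / force_hosts hint names come from the blueprint text above.

```python
from typing import Dict, List


def from_legacy_flag(allow_resize_to_same_host: bool) -> str:
    """Translate the deprecated boolean into the proposed three-valued option."""
    return "allow" if allow_resize_to_same_host else "forbid"


def scheduler_hints_for_resize(resize_to_same_host: str,
                               current_host: str) -> Dict[str, List[str]]:
    """Build the (hypothetical) hints passed to the scheduler on resize."""
    if resize_to_same_host == "allow":
        # No hint: the current host is considered like any other.
        return {}
    if resize_to_same_host == "forbid":
        # Exclude the current host from consideration.
        return {"ignore_hosts": [current_host]}
    if resize_to_same_host == "require":
        # Restrict scheduling to the current host only.
        return {"force_hosts": [current_host]}
    raise ValueError("invalid resize_to_same_host: %r" % resize_to_same_host)


print(scheduler_hints_for_resize("require", "node1"))
print(scheduler_hints_for_resize(from_legacy_flag(False), "node1"))
```
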
>> To avoid incorrectly paraphrasing others, the review comments against the
>> change are below in their entirety followed by my comments to those
>> concerns. The question we are looking to answer - would others find this
>> function useful and / or believe that OpenStack should have this option?
>>
>> Comments from https://review.openstack.org/#/c/21139/:
>>
>>> I still think this is a bad idea. The only reason the flag was there in
>>> the first place was so we could
>>> run tempest on devstack in the gate and test resize. Semantically this
>>> changes the meaning of resize
>>> in a way that I don't think should be done.
>>
>>> I understand what the patch does, and I even think it appears to be
>>> functionally correct based on
>>> what the intention appears to be. However, I'm not convinced that the
>>> option is a useful addition.
>>>
>>> First, it really just doesn't seem in the spirit of OpenStack or "cloud"
>>> to care this much about where
>>> the instance goes like this. The existing option was only a hack for
>>> testing, not something expected
>>> for admins to care about.
>>>
>>> If this really *is* something admins need to care about, I'd like to
>>> better understand why. Further, if
>>> that's the case, I'm not sure a global config option is the right way to
>>> go about it. I think it may make
>>> more sense to have this be API driven. I'd like to see some thoughts from
>>> others on this point.
>>
>>> "I completely agree with the "spirit of cloud" argument. I further think
>>> that exposing anything via the
>>> API that would support this (i.e. giving the users control or even
>>> indication of where their instance lands)
>>> is a dangerous precedent to set.
>>>
>>> I tend to think that this use case is so small and specialized, that it
>>> belongs in some other sort of policy
>>> implementation, and definitely not as yet-another-config-option to be
>>> exposed to the admins. That, or in
>>> some other project entirely :)"
>>
>> and my response to those concerns:
>>
>>> I agree this is not an 80% use case, or probably even that popular in the
>>> other 20%, but resize today
>>> is the only user facing API that can trigger the migration of a VM to a
>>> new machine. In some environments,
>>> this network traffic is undesirable - especially on 1GbE - and may want to be
>>> explicitly controlled by an
>>> Administrator. In this implementation, an Admin can still invoke a
>>> migration manually to allow the resize to
>>> succeed. I would point to the Island work by Sina as an example, they
>>> wrote an entire Cinder driver
>>> designed to minimize network traffic.
>>>
>>> I agree with the point above that exposing this on an end-user API is not
>>> correct, users should not know
>>> or care where this goes. However, as the cloud operator, I should be able
>>> to have that level of control
>>> and this puts it in their hands.
>>>
>>> Obviously this option would need to be documented to allow administrators to
>>> decide if they need to change it,
>>> but it certainly wouldn't be default. Expectation is that it would be of use
>>> in smaller installations or enterprise
>>> use cases more often than service providers.
>>>
>>> Additionally, it continues to honor the existing resize API contract.
>>
>> An additional use case - beyond 1GbE - is if an environment uses large
>> ephemeral disks.
>>
>> Would others find this function useful and / or believe that OpenStack
>> should have this option? Again, the API contract is unchanged and it gives
>> a cloud operator an additional level of control over the movement of
>> instances. It would not be the default behavior, but rather enabled by an
>> administrator depending on their specific use cases and requirements and the
>> environment they are in.
>>
>> Thanks.
>>
>> Michael
>>
>> -------------------------------------------------
>> Michael Fork
>> OpenStack Architect, Cloud Solutions and OpenStack Development
>> IBM Systems & Technology Group
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>
>
>
--
Sean Dague
IBM Linux Technology Center
email: sdague at linux.vnet.ibm.com
alt-email: sldague at us.ibm.com