Open Stack

Fri Apr 25 21:15:40 UTC 2014

Hi Stackers,

When recently digging into the new server group v3 API extension
introduced in Icehouse, I was struck with a bit of cognitive dissonance
that I can't seem to shake. While I understand and support the idea
behind the feature (affinity and anti-affinity scheduling hints), I
can't help but feel the implementation is half-baked and results in a
very awkward user experience.

The use case here is very simple: 

Alice wants to launch an instance and make sure that the instance does
not land on a compute host that contains other instances of that type.

The current user experience is that the user creates a "server group"
like so:

nova server-group-create $GROUP_NAME --policy=anti-affinity

and then, when the user wishes to launch an instance and make sure it
doesn't land on a host with another of that instance type, the user does
the following:

nova boot --group $GROUP_UUID ...

There are myriad problems with the above user experience and
implementation. Let me explain them.

1. The user isn't creating a "server group" when they issue a nova
server-group-create call. They are creating a policy and calling it a
group. Cognitive dissonance results from this mismatch.

2. There's no way to add an existing server to this "group". What this
means is that the user effectively needs to have planned their
environment and policy before ever launching a VM. To see why this
is a problem, consider the following:

 - User creates three VMs that have high I/O utilization
 - User then wants to launch three more VMs of the same kind and make
sure they don't end up on the same hosts as the others

No can do, since the first three VMs weren't started using a --group
scheduler hint.

3. There's no way to remove members from the group.

4. There's no way to manually add members to the server group

5. The act of telling the scheduler to place instances near or away from
some other instances has been hidden behind the server group API, which
means that users doing a nova help boot will see a --group option that
doesn't make much sense, as it doesn't describe the scheduling policy
activity.

Proposal
--------

I propose to scrap the server groups API entirely and replace it with a
simpler way to accomplish the same basic thing.

Create two new options to nova boot:

 --near-tag <TAG>
and
 --not-near-tag <TAG>

The first would tell the scheduler to place the new VM near other VMs
having a particular "tag". The latter would tell the scheduler to place
the new VM *not* near other VMs with a particular tag.
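
To make the intended semantics concrete, here is a minimal sketch of how
a scheduler might narrow down candidate hosts for the two options. This
is not Nova code: the function name filter_hosts, the host-to-tags
mapping, and the reading of "near" as "on the same compute host" are all
assumptions made purely for illustration.

```python
def filter_hosts(host_tags, near_tag=None, not_near_tag=None):
    """Return the sorted host names that satisfy the requested tag hints.

    host_tags maps a host name to the set of tags carried by the servers
    already running on that host. This mapping is a hypothetical
    structure for the sketch, not an actual Nova object.
    """
    candidates = []
    for host, tags in host_tags.items():
        # --near-tag: keep only hosts that already run a tagged server.
        if near_tag is not None and near_tag not in tags:
            continue
        # --not-near-tag: drop hosts that already run a tagged server.
        if not_near_tag is not None and not_near_tag in tags:
            continue
        candidates.append(host)
    return sorted(candidates)


# Hypothetical environment: three hosts, two of them running tagged servers.
host_tags = {
    "compute1": {"db"},
    "compute2": {"web"},
    "compute3": set(),
}

print(filter_hosts(host_tags, near_tag="db"))      # ['compute1']
print(filter_hosts(host_tags, not_near_tag="db"))  # ['compute2', 'compute3']
```

In this sketch a --near-tag request with no matching host returns an
empty list. Whether a real scheduler should fail the boot or fall back
to any host in that case is a design question the blueprint would need
to answer.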

What is a "tag"? Well, currently, since the Compute API doesn't have a
concept of a single string tag, the tag could be a key=value pair that
would be matched against the server extra properties.
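
As a rough illustration of the key=value matching, assuming the server
extra properties are available as a plain dictionary of strings (the
helper names parse_tag and server_has_tag are made up for this sketch):

```python
def parse_tag(tag):
    """Split a 'key=value' tag string into a (key, value) pair."""
    key, sep, value = tag.partition("=")
    if not sep or not key:
        raise ValueError("expected key=value, got %r" % tag)
    return key, value


def server_has_tag(properties, tag):
    """Return True if the server's properties contain the key=value tag.

    properties is assumed to be a dict of the server's extra properties;
    the real storage format may differ.
    """
    key, value = parse_tag(tag)
    return properties.get(key) == value


properties = {"role": "db", "tier": "prod"}
print(server_has_tag(properties, "role=db"))   # True
print(server_has_tag(properties, "role=web"))  # False
```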

Once a real user-controlled simple string tags system is added to the
Compute API, a "tag" would be just that, a simple string that may be
attached or detached from some object (in this case, a server object).

How does this solve all the issues highlighted above? In order, it
solves the issues like so:

1. There's no need to have any "server group" object any more. Servers
have a set of tags (key/value pairs in the v2/v3 API) that may be used to
identify a type of server. The activity of launching an instance would
now have options for the user to indicate their affinity preference,
which removes the cognitive dissonance that happens due to the user
needing to know what a server group is (a policy, not a group).

2. Since there is no more need to maintain a separate server group
object, if a user launched 3 instances and then wanted to make sure that
3 new instances don't end up on the same hosts, all the user needs to do
is tag the existing instances with a tag, and issue a call to:

 nova boot --not-near-tag $TAG ...

and the affinity policy is applied properly.

3. Removal of members of the "server group" is no longer an issue.
Simply untag a server to remove it from the set of servers you wish to
use in applying some affinity policy.

4. Likewise, since there's no server group object, relating an existing
server to another is as simple as placing a tag on the server.

5. The act of applying affinity policies is now directly related to the
act of launching instances, which is where it should be.
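
The scenario in point 2 (tag servers that are already running, then
place new ones away from them) could be sketched as follows. The data
structures, host names, and the workload=high-io tag are hypothetical
and only serve to show that nothing has to be decided before the first
instances are launched.

```python
# Three servers launched earlier without any scheduler hint.
# (Hypothetical in-memory model, not the Nova data model.)
servers = {
    "vm1": {"host": "compute1", "properties": {}},
    "vm2": {"host": "compute2", "properties": {}},
    "vm3": {"host": "compute3", "properties": {}},
}
all_hosts = ["compute1", "compute2", "compute3",
             "compute4", "compute5", "compute6"]


def tag_server(server, key, value):
    """Attach a key=value tag to a server after it has been launched."""
    server["properties"][key] = value


def hosts_with_tag(servers, key, value):
    """Return the set of hosts running at least one server with the tag."""
    return {s["host"] for s in servers.values()
            if s["properties"].get(key) == value}


# Step 1: tag the existing instances after the fact.
for name in ("vm1", "vm2", "vm3"):
    tag_server(servers[name], "workload", "high-io")

# Step 2: a boot with --not-near-tag workload=high-io would only
# consider hosts that do not run a tagged server.
avoid = hosts_with_tag(servers, "workload", "high-io")
eligible = [h for h in all_hosts if h not in avoid]
print(eligible)  # ['compute4', 'compute5', 'compute6']
```

Untagging a server (point 3) would simply remove it from the set that
hosts_with_tag returns, so no separate membership API is needed in this
model.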

I'll type up a real blueprint spec for this, but wanted to throw the
idea out there, since it's something that struck me recently when I
tried to explain the new server groups feature.

Thoughts and feedback welcome,
-jay


[openstack-dev] [nova] Proposal: remove the server groups feature
