[openstack-dev] [scheduler] APIs for Smart Resource Placement - Updated Instance Group Model and API extension model - WIP Draft

Mike Spreitzer mspreitz at us.ibm.com
Fri Oct 11 20:35:48 UTC 2013


I'll be at the summit too.  Available Nov 4 if we want to do some prep then.  It will be my first summit, so I am not sure how overbooked my summit time will be.

Regards,
Mike



From:   Sylvain Bauza <sylvain.bauza at bull.net>
To:     OpenStack Development Mailing List 
<openstack-dev at lists.openstack.org>, 
Cc:     Mike Spreitzer/Watson/IBM at IBMUS
Date:   10/11/2013 08:19 AM
Subject:        Re: [openstack-dev] [scheduler] APIs for Smart Resource 
Placement - Updated Instance Group Model and API extension model - WIP 
Draft



Long-story short, sounds like we do have the same concerns here in 
Climate.

I'll be present at the Summit, any chance to do an unconference meeting between all parties?

Thanks,
-Sylvain

On 11/10/2013 08:25, Mike Spreitzer wrote:
Regarding Alex's question of which component does holistic infrastructure 
scheduling, I hesitate to simply answer "heat".  Heat is about 
orchestration, and infrastructure scheduling is another matter.  I have 
attempted to draw pictures to sort this out, see 
https://docs.google.com/drawings/d/1Y_yyIpql5_cdC8116XrBHzn6GfP_g0NHTTG_W4o0R9U 
and 
https://docs.google.com/drawings/d/1TCfNwzH_NBnx3bNz-GQQ1bRVgBpJdstpu0lH_TONw6g 
.  In those you will see that I identify holistic infrastructure 
scheduling as separate functionality from infrastructure orchestration 
(the main job of today's heat engine) and also separate from software 
orchestration concerns.  However, I also see a close relationship between 
holistic infrastructure scheduling and heat, as should be evident in those pictures too. 

Alex made a remark about the needed inputs, and I agree but would like to 
expand a little on the topic.  One thing any scheduler needs is knowledge 
of the amount, structure, and capacity of the hosting thingies (I wish I 
could say "resources", but that would be confusing) onto which the 
workload is to be scheduled.  Scheduling decisions are made against 
available capacity.  I think the most practical way to determine available 

capacity is to separately track raw capacity and current (plus already 
planned!) allocations from that capacity, finally subtracting the latter 
from the former. 
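
To make the arithmetic concrete, here is a minimal sketch in Python; the dict keys and function name are purely illustrative, not any existing Nova or scheduler structure:

    # Illustrative only: capacities and allocations are plain dicts keyed by
    # resource class; none of these names come from actual Nova code.
    def available_capacity(raw, current_allocs, planned_allocs):
        """available = raw - (current allocations + planned allocations)"""
        avail = dict(raw)                       # e.g. {"vcpus": 32, "ram_mb": 131072}
        for alloc in current_allocs + planned_allocs:
            for resource, amount in alloc.items():
                avail[resource] = avail.get(resource, 0) - amount
        return avail

    # 32 raw vCPUs, 8 currently allocated, 4 more already planned -> 20 available.
    available_capacity({"vcpus": 32}, [{"vcpus": 8}], [{"vcpus": 4}])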

In Nova, for example, sensing raw capacity is handled by the various 
nova-compute agents reporting that information.  I think a holistic 
infrastructure scheduler should get that information from the various 
individual services (Nova, Cinder, etc) that it is concerned with 
(presumably they have it anyway). 

A holistic infrastructure scheduler can keep track of the allocations it 
has planned (regardless of whether they have been executed yet).  However, there may also be allocations that did not originate in the holistic 
infrastructure scheduler.  The individual underlying services should be 
able to report (to the holistic infrastructure scheduler, even if lowly 
users are not so authorized) all the allocations currently in effect.  An 
accurate union of the current and planned allocations is what we want to 
subtract from raw capacity to get available capacity. 
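
As a sketch of what that union might look like (the allocation "id" field here is an assumption I am making for illustration; the services would need some stable identifier so a planned allocation that has since been executed is not counted twice):

    # Sketch only: merge allocations planned by the holistic scheduler with the
    # allocations reported by the underlying services, de-duplicating by id.
    def effective_allocations(planned_by_scheduler, reported_by_services):
        merged = {a["id"]: a for a in reported_by_services}   # current, authoritative
        for alloc in planned_by_scheduler:
            merged.setdefault(alloc["id"], alloc)             # still-pending plans
        return list(merged.values())

The result of that union is what gets subtracted from raw capacity, as in the sketch above.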

If there is a long delay between planning and executing an allocation, 
there can be nasty surprises from competitors --- if there are any 
competitors.  Actually, there can be nasty surprises anyway.  Any 
scheduler should be prepared for nasty surprises, and react by some 
sensible retrying.  If nasty surprises are rare, we are pretty much done. 
If nasty surprises due to the presence of competing managers are common, 
we may be able to combat the problem by changing the long delay to a short one --- by moving the allocation execution earlier into a stage that is 
only about locking in allocations, leaving all the other work involved in 
creating virtual resources to later (perhaps Climate will be good for 
this).  If the delay between planning and executing an allocation is short and there are many nasty surprises due to competing managers, then you 
have too much competition between managers --- don't do that. 
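
Here is roughly what I mean by sensible retrying; plan_placement, try_allocate, and refresh_capacity_view are hypothetical helpers, not existing APIs:

    # Sketch of an optimistic place-then-retry loop.  A failed try_allocate()
    # is the "nasty surprise": a competitor consumed the capacity first.
    def place_with_retry(request, max_attempts=3):
        for _ in range(max_attempts):
            placement = plan_placement(request)   # plan against available capacity
            if try_allocate(placement):           # lock in the allocation
                return placement
            refresh_capacity_view()               # re-read capacity and re-plan
        raise RuntimeError("no feasible placement after %d attempts" % max_attempts)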

Debo wants a simpler Nova-centric story.  OK, how about the following. 
This is for the first step in the roadmap, where scheduling decisions are 
still made independently for each VM instance.  For the client/service 
interface, I think we can do this with a simple clean two-phase interface 
when traditional software orchestration is in play, and a one-phase interface 
when slick new software orchestration is used.  Let me outline the 
two-phase flow.  We extend the Nova API with CRUD operations on VRTs 
(top-level groups).  For example, the CREATE operation takes a definition 
of a top-level group and all its nested groups, definitions (excepting 
stuff like userdata) of all the resources (only VM instances, for now) 
contained in those groups, all the relationships among those 
groups/resources, and all the applications of policy to those groups, 
resources, and relationships.  This is a REST-style interface; the CREATE 
operation takes a definition of the thing (a top-level group and all that 
it contains) being created; the UPDATE operation takes a revised 
definition of the whole thing.  Nova records the presented information; 
the familiar stuff is stored essentially as it is today (but marked as 
being in some new sort of tentative state), and the grouping, 
relationship, and policy stuff is stored according to a model like the one Debo & Yathi wrote.  The CREATE operation returns a UUID for the newly 
created top-level group.  The invocation of the top-level group CRUD is a 
single operation and it is the first of the two phases.  In the second 
phase of a CREATE flow, the client creates individual resources with the 
same calls as are used today, except that each VM instance create call is 
augmented with a pointer into the policy information.  That pointer 
consists of (1) the UUID of the relevant top-level group and (2) the name 
used within that group to identify the resource now being created. 
(Obviously we would need resources to be named uniquely among all the 
things ultimately contained anywhere in the same top-level group.  That 
could be done, e.g., with path names and a requirement only that siblings 
have distinct names.  Or we could simply require that names be unique 
without mandating any particular structure.  We could call them IDs rather than names.)  The way Nova handles a VM-create call can now be enhanced to reference and use the policy information that is associated with the newly passed policy pointer. 
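
To illustrate the two phases (the endpoint name os-instance-groups, the payload shape, and the policy_ref hint on the server-create call are all hypothetical, not an existing Nova extension; NOVA, auth_headers, flavor_id, and image_id stand in for the usual endpoint and credentials):

    import requests

    # Phase 1: CREATE the top-level group (nested groups, member definitions,
    # relationships, and policy applications) and get back its UUID.
    group_def = {
        "name": "web-tier",
        "members": [{"name": "web-0", "flavor": "m1.small"},
                    {"name": "web-1", "flavor": "m1.small"}],
        "policies": [{"type": "anti-affinity", "applies_to": ["web-0", "web-1"]}],
    }
    resp = requests.post(NOVA + "/os-instance-groups",
                         json={"group": group_def}, headers=auth_headers)
    group_uuid = resp.json()["group"]["id"]

    # Phase 2: create each VM with today's call, augmented with the policy
    # pointer: the top-level group UUID plus the member name within that group.
    server = {"name": "web-0", "flavorRef": flavor_id, "imageRef": image_id,
              "policy_ref": {"group_id": group_uuid, "member": "web-0"}}
    requests.post(NOVA + "/servers", json={"server": server}, headers=auth_headers)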

The UPDATE flow is similar: first UPDATE the top-level group, then update 
individual resources. 

For the definition of a top-level group and all that it contains we need 
some language.  I think the obvious answer is an extended version of the 
HOT language.  Which is why I have proposed such an extension.  It is not 
because I am confused about what the heat engine should do, it is because 
I want something else (the policy-informed scheduler) to have an input 
language with sufficient content.  This is the role played by "HOT+" in 
the first of my two pictures cited above.  The same sort of language is 
needed in the first step of the roadmap, where it is only Nova that is 
policy-informed and scheduling is not yet joint --- but at this early step of the roadmap the resources+policy language is input to Nova rather than 
to a separate holistic infrastructure scheduler. 

Regards, 
Mike 

_______________________________________________
OpenStack-dev mailing list
OpenStack-dev at lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

