[openstack-dev] [Congress][Delegation] Initial workflow design

Yathiraj Udupi (yudupi) yudupi at cisco.com
Fri Feb 27 19:18:53 UTC 2015


Hi,

It would be good to simplify the PoC scenario.  In terms of policies and other constraints related to VM placement, I think we agree that not all policies/constraints have to originate from a policy framework such as Congress.  The existing placement logic in the default Nova scheduler, or in the Nova solver scheduler, will add its own set of constraints to the placement calculations.

My idea would be to make the Nova placement engine (e.g., the solver scheduler) talk to Congress to get the Datalog rules or the translated LP constraints, based on the policies defined for a particular tenant/user.  This of course needs to be worked out in terms of translation logic, constraint specifications, etc.  This workflow would be part of the scheduling/placement workflow invoked by the Nova boot-instance API call (for the initial placement).

As the next phase, for the migration scenario, Congress can periodically check whether any violations/warnings have been triggered (that is, whether the corresponding tables are populated, as you show in your example).  If so, it triggers migrations, which have to go through another round of placement decisions to find the best destinations without violating the policies and other existing constraints.
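
A minimal sketch of one such periodic check, to make the control flow concrete. Nothing here is an existing Congress or Nova API: read_warnings, solve_placement and migrate are hypothetical callables standing in for the warning-table lookup, the placement engine, and the migration call.

```python
def run_check_cycle(read_warnings, solve_placement, migrate):
    """One periodic check: re-place and migrate only if the warning table is non-empty.

    All three arguments are hypothetical stand-ins:
      read_warnings()          -> list of rows in the warning table
      solve_placement(rows)    -> (current, proposed), two {vm: host} dicts
      migrate(vm, src, dst)    -> performs one migration
    """
    warnings = read_warnings()
    if not warnings:
        return []  # no violations, so no placement round is needed

    current, proposed = solve_placement(warnings)

    # Only VMs whose proposed host differs from the current one need to move.
    moves = [
        (vm, current[vm], host)
        for vm, host in sorted(proposed.items())
        if current[vm] != host
    ]
    for vm, src, dst in moves:
        migrate(vm, src, dst)
    return moves
```

The placement round is skipped entirely when the warning table is empty, which matches the "if so, then trigger migrations" ordering described above.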

Happy to discuss more and simplify a PoC scenario.

Thanks,
Yathi.


On 2/27/15, 6:40 AM, "ruby.krishnaswamy at orange.com" <ruby.krishnaswamy at orange.com> wrote:

My first suggestion: why don't we set up a call together with Ramki, Yathi and Debo as soon as possible?

- How to go forward concretely with the 8 steps for the PoC (details within each step),

    o Including nova integration points

- Identify "friction points" in the details above to resolve beyond the PoC


Tim: Where does the rest of the program originate?  I’m not saying the entire LP program is generated from the Datalog constraints; some of it is generated by the solver independent of the Datalog.  In the text, I gave the example of defining hMemUse[j].
Tim: The VM-placement engine does 2 things: (i) translates Datalog to LP and (ii) generates additional LP constraints.  (Both work items could leverage any constraints that are built in to a specific solver, e.g. the solver-scheduler.)  The point is that there are 2 distinct, conceptual origins of the LP constraints: those that represent the Datalog and those that codify the domain.
Tim: Each domain-specific solver can do whatever it wants, so it's not clear to me what the value of choosing a modeling language actually is, unless we want to build a library of common functionality that makes the construction of domain-specific engines (wrappers) easier.  I'd prefer to spend our energy understanding whether the proposed workflow/interface works for a couple of different domain-specific policy engines OR to flesh this one out and build it.




=> Is the value of choosing a modeling language related to how best to automate translation from Datalog constraints to LP?

    o We could look for one uniform way of generating the LP, rather than "some of it is generated by the VM-placement engine solver independent of the Datalog".

    o Datalog imposes most constraints (== policies)

    o Two constraints are not "policies":

        - A VM is allocated to only one host.

        - Host capacity is not exceeded.

            . Over-subscription

=> Otherwise, what was your suggestion?  As follows?

    o Use (and extend) the framework that the nova-solver-scheduler currently implements (itself using PuLP). This framework specifies an API for writing constraints and cost functions (in a domain-specific way). The framework would be modified:

        - To read data in from the DSE

        - To obtain the cost function from Datalog (e.g. minimize Y[host1]…)

        - To obtain Datalog constraints (e.g. the <75% memory allocation constraint for hosts of the special zone)

    o Do we need to specify the "format" for this? It will likely be a string of the form (?)

        - "hMemUse[0] - 0.75*hMemCap[0] < 100*y[0]", "Memory allocation constraint on Host 0",
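
A small sketch of what emitting constraints in that string "format" could look like, one (expression, description) pair per host. The function name and its parameters are hypothetical, and the variable names hMemUse, hMemCap and y are simply copied from the example string. The sketch writes "<=" rather than "<" because LP solvers work with non-strict inequalities.

```python
def memory_constraints(num_hosts, threshold=0.75, big_m=100):
    """Build (expression, description) string pairs, one per host.

    Hypothetical helper: the tuple layout mirrors the example string in
    the mail, it is not a format defined by Congress or the solver-scheduler.
    """
    return [
        (
            f"hMemUse[{j}] - {threshold}*hMemCap[{j}] <= {big_m}*y[{j}]",
            f"Memory allocation constraint on Host {j}",
        )
        for j in range(num_hosts)
    ]


if __name__ == "__main__":
    for expression, description in memory_constraints(2):
        print(f"{description}: {expression}")
```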









=> From your doc (page 5, section 4)


warning(id) :-
    nova:host(id, name, service, zone, memory_capacity),
    legacy:special_zone(zone),
    ceilometer:statistics(id, "memory", avg, count, duration,
                          durstart, durend, max, min, period,
                          perstart, perend, sum, unit),
    avg > 0.75 * memory_capacity


Notice that this is a soft constraint, identified by the use of warning instead of error.  When compiling to LP, the VM-placement engine will attempt to minimize the number of rows in the warning table.  That is, for each possible row r it will create a variable Y[r] and assign it True if the row is a warning and False otherwise.
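
To illustrate "minimize the number of rows in the warning table", here is a brute-force sketch for a tiny instance. It is not an LP and not how the VM-placement engine is specified to work: it enumerates every assignment, treats host capacity as a hard constraint, and counts Y[j] as True when a special-zone host exceeds the 75% threshold. The function name and argument layout are assumptions made for the example.

```python
from itertools import product


def min_warning_placement(v_mem_use, h_mem_cap, special, threshold=0.75):
    """Try every VM-to-host assignment and keep the one with the fewest warnings.

    v_mem_use: memory used by each VM
    h_mem_cap: memory capacity of each host
    special:   set of host indices that belong to the special zone
    Returns (warning_count, assignment, y), or None if nothing fits.
    """
    hosts = range(len(h_mem_cap))
    best = None
    for assign in product(hosts, repeat=len(v_mem_use)):
        # Memory used on each host under this assignment.
        h_mem_use = [0.0] * len(h_mem_cap)
        for vm, host in enumerate(assign):
            h_mem_use[host] += v_mem_use[vm]

        # Hard constraint: host capacity is not exceeded.
        if any(used > cap for used, cap in zip(h_mem_use, h_mem_cap)):
            continue

        # Soft constraint: one indicator Y[j] per host, True when it is a warning row.
        y = [j in special and h_mem_use[j] > threshold * h_mem_cap[j] for j in hosts]
        if best is None or sum(y) < best[0]:
            best = (sum(y), assign, y)
    return best
```

Because the warning is soft, an assignment with Y[j] True is still feasible; it is only less preferred than one with fewer True entries.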


The policy (warning): when will it be evaluated? Should this be done periodically? Then, if the table has even one True entry, the action should be to generate the LP, solve it, trigger the migrations, etc.

=> The "LP" cannot be generated at the time the VM-placement engine receives the policy snippet.


Ruby


From: Tim Hinrichs [mailto:thinrichs at vmware.com]
Sent: Thursday, February 26, 2015 19:17
To: OpenStack Development Mailing List (not for usage questions)
Subject: Re: [openstack-dev] [Congress][Delegation] Initial workflow design

Inline.

From: "ruby.krishnaswamy at orange.com" <ruby.krishnaswamy at orange.com>
Reply-To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Date: Wednesday, February 25, 2015 at 8:53 AM
To: "OpenStack Development Mailing List (not for usage questions)" <openstack-dev at lists.openstack.org>
Subject: Re: [openstack-dev] [Congress][Delegation] Initial workflow design

Hi Tim, All,


1)      Step 3: The VM-placement engine is also a "datalog engine", right?

When are policies delegated: when they are inserted? Or, once the VM-placement engine has registered itself, are all policies given to it?




“In our example, this would mean the domain-specific policy engine executes the following API call over the DSE”

=> "domain-agnostic" ….



Done.


2)      Step 4:



Ok

But finally: if Congress will likely “delegate”



Not sure what you’re suggesting here.


3)      Step 5:  Compilation of subpolicy to LP in VM-placement engine



For the PoC, it is likely that the LP program (in PuLP or some other modeling language) is *not* completely generated by the compiler/translator.

=> Right?

Where does the rest of the program originate?  I’m not saying the entire LP program is generated from the Datalog constraints; some of it is generated by the solver independent of the Datalog.  In the text, I gave the example of defining hMemUse[j].


     You also indicate that some category of constraints must be added ("the LP solver doesn't know what the relationship between assign[i][j], hMemUse[j], and vMemUse[i] actually is, so the VM-placement engine must also include constraints").
     Must these constraints be written "explicitly"?  (e.g. max_ram_allocation etc., which are constraints used in the solver-scheduler's package).

The VM-placement engine does 2 things: (i) translates Datalog to LP and (ii) generates additional LP constraints.  (Both work items could leverage any constraints that are built in to a specific solver, e.g. the solver-scheduler.)  The point is that there are 2 distinct, conceptual origins of the LP constraints: those that represent the Datalog and those that codify the domain.
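
As an illustration of the second origin (constraints that codify the domain), the sketch below checks the two domain facts mentioned in this thread against a 0/1 assignment matrix: each VM sits on exactly one host, and host capacity is not exceeded. It assumes the natural reading hMemUse[j] = sum over i of assign[i][j] * vMemUse[i]; the function names are made up for the example and do not come from the solver-scheduler.

```python
def host_mem_use(assign, v_mem_use):
    """Assumed relationship: hMemUse[j] = sum_i assign[i][j] * vMemUse[i]."""
    num_hosts = len(assign[0])
    return [
        sum(assign[i][j] * v_mem_use[i] for i in range(len(assign)))
        for j in range(num_hosts)
    ]


def domain_constraints_hold(assign, v_mem_use, h_mem_cap):
    """Check the two domain constraints that do not come from Datalog policy."""
    # Each row of assign is one VM; exactly one entry must be 1.
    one_host_each = all(sum(row) == 1 for row in assign)
    # Memory used on each host must stay within that host's capacity.
    within_capacity = all(
        used <= cap for used, cap in zip(host_mem_use(assign, v_mem_use), h_mem_cap)
    )
    return one_host_each and within_capacity
```

In an LP these would be stated as constraints over the assign variables instead of being checked after the fact; the check form is used here only to keep the example free of solver dependencies.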



         So which "parts" will be generated:
            Cost function
            Constraint from policy: memory usage < 75%

         Then the rest should be "filled in"?

         Could we agree on an intermediate "modeling language"?
            @Yathi: do you think we could use something like AMPL? Is it proprietary?


    A detail: the example “Y[host1] = hMemUse[host1] > 0.75 * hMemCap[host1]”


=> This needs to be changed to a linear form (if mi - Mi > 0 then Yi = 1 else Yi = 0), so something like (mi - Mi) < 100 * yi
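
A short sketch of why that linear form behaves like the indicator, written with "<=" as an LP would need. The constraint m - M <= big_m * y only forces y = 1 when m exceeds M; the "else y = 0" half comes from the objective minimizing y. The constant 100 from the example is treated as an assumed upper bound on m - M, and the function name is invented for the illustration.

```python
def smallest_feasible_y(m, M, big_m=100):
    """Return the smallest binary y satisfying m - M <= big_m * y.

    This mimics what a minimizing solver would pick for the indicator.
    Returns None when big_m is too small to bound m - M, in which case
    the linearization is invalid for these values.
    """
    for y in (0, 1):
        if m - M <= big_m * y:
            return y
    return None
```

For instance, with M = 75, a usage of 60 yields y = 0 and a usage of 80 yields y = 1, while a usage of 300 has no feasible y because 225 exceeds the assumed bound of 100.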

Each domain-specific solver can do whatever it wants, so it's not clear to me what the value of choosing a modeling language actually is, unless we want to build a library of common functionality that makes the construction of domain-specific engines (wrappers) easier.  I'd prefer to spend our energy understanding whether the proposed workflow/interface works for a couple of different domain-specific policy engines OR to flesh this one out and build it.



4)      Step 6: This is completely internal to the VM-placement engine (and we could say this is “transparent” to Congress)



We should allow configuration of a solver (this itself could be a policy ☺ )

How should the solver API be invoked?



Could the domain-specific placement engine send (action_handler: data) out on the DSE?



I had always envisioned the solver being just a library of code—not an entity that sits on the DSE itself.

3)   Step 7: Perform the migrations (according to the assignments computed in step 6)

     This part invokes the OpenStack API (to perform migrations).
     May we suppose that there are services implementing "action handlers"?
     Such a service could listen on the DSE and execute the action.


That interface is supposed to exist by the Kilo release.  I’ll check up on the progress.


5)      Nova tables to use
Policy:
warning(id) :-
    nova:host(id, name, service, zone, memory_capacity),
    legacy:special_zone(zone),
    ceilometer:statistics(id, "memory", avg, count, duration,
                          durstart, durend, max, min, period,
                          perstart, perend, sum, unit),
    avg > 0.75 * memory_capacity
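
Read as a join, the rule above selects hosts in a special zone whose average memory use exceeds 75% of capacity. A rough Python rendering, under simplifying assumptions: nova:host rows are 5-tuples in the column order shown, legacy:special_zone is a set of zone names, and the ceilometer:statistics "memory" rows are reduced to a dict from id to avg. None of this reflects how Congress actually stores or evaluates tables.

```python
def warning(nova_hosts, special_zones, memory_avg):
    """Return the ids for which the warning rule would produce a row.

    nova_hosts:    iterable of (id, name, service, zone, memory_capacity)
    special_zones: set of zone names (stand-in for legacy:special_zone)
    memory_avg:    {id: avg} (stand-in for the "memory" statistics rows)
    """
    return [
        host_id
        for (host_id, name, service, zone, memory_capacity) in nova_hosts
        if zone in special_zones                          # legacy:special_zone(zone)
        and host_id in memory_avg                         # a statistics row exists
        and memory_avg[host_id] > 0.75 * memory_capacity  # avg > 0.75 * capacity
    ]
```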



I believe that ceilometer gives usage of VMs and not hosts. The host table (ComputeNode table) should give current used capacity.



Good to know.


6)      One of the issues highlighted in OpenStack (scheduler) and also elsewhere (e.g. the Omega scheduler by Google) is:

Reading "host utilization" state from the database, DB (nova:host table) updates, and the overhead of keeping in-memory state up to date.

=> This is expensive, and the current nova-scheduler does face this issue (many blogs/discussions).

      While the first goal is a PoC, this will likely become a concern in terms of adoption.



So you’re saying we won’t have fresh enough data to make policy decisions?  If the data changes so frequently that we can’t get an accurate view, then I’m guessing we shouldn’t be migrating based on that data anyway.

Could you point me to some of these discussions?


7)      While in this document you have changed the "example" policy, could we pin down the set of policies for the PoC (server under-utilization?)


=> As a reference

Sure.  The only reason I chose this policy was because it doesn’t have aggregation.  I’m guessing we’ll want to avoid aggregation for the POC because we don’t yet have it in Congress, and it complicates the problem of translating Datalog to LP substantially.

Tim

Ruby

From: Yathiraj Udupi (yudupi) [mailto:yudupi at cisco.com]
Sent: Tuesday, February 24, 2015 20:01
To: OpenStack Development Mailing List (not for usage questions); Tim Hinrichs
Cc: Debo Dutta (dedutta)
Subject: Re: [openstack-dev] [Congress][Delegation] Initial workflow design

Hi Tim,

Thanks for your updated doc on delegation from Congress to a domain-specific policy engine; in this case, you are planning to build an LP-based VM-placement engine as the domain-specific policy engine.
I agree that your main goal is to first get the delegation interface sorted out.  That will be good, so that external services (like the Solver Scheduler) can also easily integrate with the delegation model.

From the Solver Scheduler point of view, we would actually like to start working on a PoC effort to integrate Congress and the Solver Scheduler.
We believe that, rather than deferring this effort to the long term, trying some early integration now would add value to both the Solver Scheduler effort and the Congress effort: most of the LP solver work for VM placement is readily available now in the Solver Scheduler, and we need to spend some time thinking about translating your domain-agnostic policy into constraints that the Solver Scheduler can use.

I will definitely need your help with the Congress interfaces, and I hope you will share your early interfaces for delegation so I can start the integration effort from the Solver Scheduler side.
I will reach out to you for some initial help with integration w.r.t. Congress, and will also keep you posted about progress on our side.

Thanks,
Yathi.



On 2/23/15, 11:28 AM, "Tim Hinrichs" <thinrichs at vmware.com> wrote:


Hi all,



I made a heavy editing pass of the Delegation google doc, incorporating many of your comments and my latest investigations into VM-placement.  I left the old stuff in place at the end of the doc and put the new stuff at the top.



My goal was to propose an end-to-end workflow for a PoC that we could put together quickly to help us explore the delegation interface.  We should iterate on this design until we have something that we think is workable.   And by all means pipe up if you think we need a totally different starting point to begin the iteration.



(BTW I'm thinking of the integration with solver-scheduler as a long-term solution to VM-placement, once we get the delegation interface sorted out.)



https://docs.google.com/document/d/1ksDilJYXV-5AXWON8PLMedDKr9NpS8VbT0jIy_MIEtI/edit#



Tim

_________________________________________________________________________________________________________________________



Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc

pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler

a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,

Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci.



This message and its attachments may contain confidential or privileged information that may be protected by law;

they should not be distributed, used or copied without authorisation.

If you have received this email in error, please notify the sender and delete this message and its attachments.

As emails may be altered, Orange is not liable for messages that have been modified, changed or falsified.

Thank you.
