[openstack-dev] [Congress] Re: Placement and Scheduling via Policy

Tim Hinrichs thinrichs at vmware.com
Tue Dec 16 18:24:37 UTC 2014

[Adding openstack-dev to this thread.  For those of you just joining… We started kicking around ideas for how we might integrate a special-purpose VM placement engine into Congress.]

Kudva: responses inline.

On Dec 16, 2014, at 6:25 AM, Prabhakar Kudva <kudva at us.ibm.com<mailto:kudva at us.ibm.com>> wrote:


I am very interested in this.

So, it looks like there are two parts to this:
1. Policy analysis when there are a significant mix of logical and builtin predicates (i.e.,
runtime should identify a solution space when there are arithmetic operators). This will
require linear programming/ILP type solvers.  There might be a need to have a function
in runtime.py that specifically deals with this (Tim?)

I think it’s right that we expect there to be a mix of builtins and standard predicates.  But what we’re considering here is having the linear solver be treated as if it were a domain-specific policy engine.  So that solver wouldn’t be embedded into the runtime.py necessarily.  Rather, we’d delegate part of the policy to that domain-specific policy engine.

2. Enforcement. That is with a large number of constraints in place for placement and
scheduling, how does the policy engine communicate and enforce the placement
constraints to nova scheduler.

I would imagine that we could delegate either enforcement or monitoring or both.  Eventually we want enforcement here, but monitoring could be useful too.

And yes you’re asking the right questions.  I was trying to break the problem down into pieces in my bullet (1) below.  But I think there is significant overlap in the questions we need to answer whether we’re delegating monitoring or enforcement.

Both of these require some form of mathematical analysis.

Would be happy and interested to discuss more on these lines.

Maybe take a look at how I tried to breakdown the problem into separate questions in bullet (1) below and see if that makes sense.



From:        Tim Hinrichs <thinrichs at vmware.com<mailto:thinrichs at vmware.com>>
To:        "ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>" <ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>>
Cc:        "Ramki Krishnan (ramk at Brocade.com<mailto:ramk at Brocade.com>)" <ramk at Brocade.com<mailto:ramk at Brocade.com>>, Gokul B Kandiraju/Watson/IBM at IBMUS, Prabhakar Kudva/Watson/IBM at IBMUS
Date:        12/15/2014 12:09 PM
Subject:        Re: Placement and Scheduling via Policy

[Adding Prabhakar and Gokul, in case they are interested.]

1) Ruby, thinking about the solver as taking 1 matrix of [vm, server] and returning another matrix helps me understand what we’re talking about—thanks.  I think you’re right that once we move from placement to optimization problems in general we’ll need to figure out how to deal with actions.  But if it’s a placement-specific policy engine, then we can build VM-migration into it.

It seems to me that the only part left is figuring out how to take an arbitrary policy, carve off the placement-relevant portion, and create the inputs the solver needs to generate that new matrix.  Some thoughts...

- My gut tells me that the placement-solver should basically say “I enforce policies having to do with the schema nova:location.”  This way the Congress policy engine knows to give it policies relevant to nova:location (placement).  If we do that, I believe we can carve off the right sub theory.

- That leaves taking a Datalog policy where we know nova:location is important and converting it to the input language required by a linear solver.  We need to remember that the Datalog rules may reference tables from other services like Neutron, Ceilometer, etc.  I think the key will be figuring out what class of policies we can actually do that for reliably.  Cool—a concrete question.

2) We can definitely wait until January on this.  I’ll be out of touch starting Friday too; it seems we all get back early January, which seems like the right time to resume our discussions.  We have some concrete questions to answer, which was what I was hoping to accomplish before we all went on holiday.

Happy Holidays!

On Dec 15, 2014, at 5:53 AM, <ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>> <ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>> wrote:

Hi Tim

1) Is there any more data the solver needs?  Seems like it needs something about CPU-load for each VM.
2) Which solver should we be using?  What does the linear program that we feed it look like?  How do we translate the results of the linear solver into a collection of ‘migrate_VM’ API calls?”

  Question (2) seems to me the first to address, in particular:
     “how to prepare the input (variables, constraints, goal) and invoke the solver”
=>  We need rules that represent constraints to give the solver (e.g. a technical constraint that a VM should not be assigned to more than one server or that more than maximum resource (cpu / mem …) of a server cannot be assigned.

     “how to translate the results of the linear solver into a collection of API calls”:
=>  The output from the “solver” will give the new placement plan (respecting the constraints in input)?
o    E.g. a table of [vm, server, true/false]
=>  Then this depends on how “action” is going to be implemented in Congress (whether an external solver is used or not)
o    Is the action presented as the “final” DB rows that the system must produce as a result of the actions?
o    E.g. if current vm table is [vm3, host4] and the recomputed row says [vm3, host6], then the action is to move vm3 to host6?

     “how will the solver be invoked”?
=>  When will the optimization call be invoked?
=>  Is it “batched”, e.g. periodically invoke Congress to compute new assignments?

  Which solver to use:
     http://www.coin-or.org/projects/<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.coin-2Dor.org_projects_&d=AAMFAw&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=B6BWd4kFfgOzAREgThxkmTZKy7dDXE2-eBAmL0PBK7s&m=3lvgeryw4T-aWafrSZZG96NcydtHt6HnT_6vKookx6U&s=01_9grcy8VGwbKRXcqhFRex3N0XIoCBzOimWFwXYI58&e=> and http://www.coin-or.org/projects/PuLP.xml<https://urldefense.proofpoint.com/v2/url?u=http-3A__www.coin-2Dor.org_projects_PuLP.xml&d=AAMFAw&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=B6BWd4kFfgOzAREgThxkmTZKy7dDXE2-eBAmL0PBK7s&m=3lvgeryw4T-aWafrSZZG96NcydtHt6HnT_6vKookx6U&s=RRiv5ZWCQwWguBZIsIXzCA4_otY4Gr7aeFmFMRB4ZZQ&e=>
      I think it may be useful to pass through an interface (e.g. LP modeler to generate LP files in standard formats accepted by prevalent solvers)

  The mathematical program:
      We can (Orange) contribute to writing down in an informal way the program for this precise use case, if this can wait until January.
      Perhaps the objective is to may be “minimize the number of servers whose usage is less than 50%”, since the original policy “Not more than 1 server of type1 to have a load under 50%” need not necessarily have a solution.

    This may help to derive the “mappings” from Congress (rules to program equations, intermediary tables to program variables)?

For “migration” use case: it may be useful to add some constraint representing cost of migration, such that the solver computes the new assignment plan such that the maximum migration cost is not exceeded. To start with, perhaps number of migrations?

I will be away from the end of the week until 5th January. I will also discuss with colleagues to see how we can formalize contribution (congress+nfv poc).


De : Tim Hinrichs [mailto:thinrichs at vmware.com]
Envoyé : vendredi 12 décembre 2014 19:41
Cc : Ramki Krishnan (ramk at Brocade.com<mailto:ramk at Brocade.com>)
Objet : Re: Placement and Scheduling via Policy

There’s a ton of good stuff here!

So if we took Ramki’s initial use case and combined it with Ruby’s HA constraint, we’d have something like the following policy.

// anti-affinity
error (server, VM1, VM2) :-
            same_ha_group(VM1, VM2),
            nova:location(VM1, server),
            nova:location(VM2, server)

// server-utilization
error(server) :-
ceilometer:average_utilization(server, “cpu-util”, avg),
avg < 50

As a start, this seems plenty complex to me.  anti-affinity is great b/c it DOES NOT require a sophisticated solver; server-utilization is great because it DOES require a linear solver.

Data the solver needs:
- Ceilometer: cpu-utilization for all the servers
- Nova: data as to where each VM is located
- Policy: high-availability groups

1) Is there any more data the solver needs?  Seems like it needs something about CPU-load for each VM.
2) Which solver should we be using?  What does the linear program that we feed it look like?  How do we translate the results of the linear solver into a collection of ‘migrate_VM’ API calls?

Maybe another few emails and then we set up a phone call.


On Dec 11, 2014, at 1:33 AM, <ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>> <ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>> wrote:


A) First a small extension to the use case that Ramki proposes

   - Add high availability constraint.
  - Assuming server-a and server-b are of same size and same failure model.
    [Later: Assumption of identical failure rates can be loosened.
     Instead of considering only servers as failure domains, can introduce other failure domains ==> not just an anti-affinity policy but a calculation from 99,99.. requirement to VM placements, e.g.
  - For an exemplary maximum usage scenario, 53 physical servers could be under peak utilization (100%), 1 server (server-a) could be under  partial utilization (50%) with 2 instances of type large.3 and 1 instance of type large.2, and 1 server (server-b) could be under partial utilization (37.5%) with 3 instances of type large.2.
    Call VM.one.large2 as the large2 VM in server-a
    Call VM.two.large2 as one of the large2 VM in server-b

  - VM.one.large2 and VM.two.large2
  - When one of the large.3 instances mapped to server-a is deleted from physical server type 1, Policy 1 will be violated, since the overall utilization of server-a falls to 37,5%.

  - Various new placements(s) are described below

    VM.two.large2 must not be moved. Moving VM.two.large2 breaks non-affinity constraint.

    error (server, VM1, VM2) :-
                node (VM1, server1),
                node (VM2, server2),
                same_ha_group(VM1, VM2),
                equal(server1, server2);

   1) New placement 1: Move 2 instances of large.2 to server-a. Overall
   utilization of server-a - 50%. Overall utilization of server-b -

   2) New placement 2: Move 1 instance of large.3 to server-b. Overall
   utilization of server-a - 0%. Overall utilization of server-b -

   3) New placement 3: Move 3 instances of large.2 to server-a. Overall
   utilization of server-a - 62.5%. Overall utilization of server-b -

   New placements 2 and 3 could be considered optimal, since they
   achieve maximal bin packing and open up the door for turning off
   server-a or server-b and maximizing energy efficiency.

   But new placement 3 breaks client policy.

BTW: what happens if a given situation does not allow the policy violation to be removed?

B) Ramki’s original use case can itself be extended:

  Adding additional constraints to the previous use case due to cases such as:

-       Server heterogeneity

-       CPU “pinning”

-       “VM groups” (and allocation

-       Application interference

-       Refining on the statement “instantaneous energy consumption can be approximately measured using an overall utilization metric, which is a combination of CPU utilization, memory usage, I/O usage, and network usage”

Let me know if this will interest you. Some (e.g. application interference) will need some time. E.G; benchmarking / profiling to class VMs etc.

C) New placement plan execution

-       In Ramki’s original use case, violation is detected at events such as VM delete.
While certainly this by itself is sufficiently complex, we may need to consider other triggering cases (periodic or when multiple VMs are deleted/added)
-       In this case, it may not be sufficient to compute the new placement plan that brings the system to a configuration that does not break policy, but also add other goals

D) Let me know if a use case such as placing “video conferencing servers” (geographically distributed clients) would suit you (multi site scenario)

=>  Or is it too premature?


De : Tim Hinrichs [mailto:thinrichs at vmware.com]
Envoyé : mercredi 10 décembre 2014 19:44
Cc : Ramki Krishnan (ramk at Brocade.com<mailto:ramk at Brocade.com>)
Objet : Re: Placement and Scheduling via Policy

Hi Ruby,

Whatever information you think is important for the use case is good.  Section 3 from one of the docs Ramki sent you covers his use case.

>From my point of view, the keys things for the use case are…

- The placement policy (i.e. the conditions under which VMs require migration).

- A description of how we want to compute what specific migrations should be performed (a sketch of (i) the information that we need about current placements, policy violations, etc., (2) what systems/algorithms/etc. can utilize that input to figure out what migrations to perform.

I think we want to focus on the end-user/customer experience (write a policy, and watch the VMs move around to obey that policy in response to environment changes) and then work out the details of how to implement that experience.  That’s why I didn’t include things like delays, asynchronous/synchronous, architecture, applications, etc. in my 2 bullets above.


On Dec 10, 2014, at 8:55 AM, <ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>> <ruby.krishnaswamy at orange.com<mailto:ruby.krishnaswamy at orange.com>> wrote:

Hi Ramki, Tim

By a “format” for describing use cases, I meant to ask what sets of information to provide, for example,
-       what granularity in description of use case?
-       a specific placement policy (and perhaps citing reasons for needing such policy)?
-       Specific applications
-       Requirements on the placement manager itself (delay, …)?
o   Architecture as well
-       Specific services from the placement manager (using Congress), such as,
o   Violation detection (load, security, …)
-       Adapting (e.g. context-aware) of policies used

In any case I will read the documents that Ramki has sent to not resend similar things.


De : Ramki Krishnan [mailto:ramk at Brocade.com]
Envoyé : mercredi 10 décembre 2014 16:59
Cc : Norival Figueira; Pierre Ettori; Alex Yip; dilikris at in.ibm.com<mailto:dilikris at in.ibm.com>
Objet : RE: Placement and Scheduling via Policy

Hi Tim,

This sounds like a plan. It would be great if you could add the links below to the Congress wiki. I am all for discussing this in the openstack-dev mailing list and at this point this discussion is completely open.

IRTF NFVRG Research Group: https://trac.tools.ietf.org/group/irtf/trac/wiki/nfvrg<https://urldefense.proofpoint.com/v2/url?u=https-3A__trac.tools.ietf.org_group_irtf_trac_wiki_nfvrg&d=AAMFAw&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=B6BWd4kFfgOzAREgThxkmTZKy7dDXE2-eBAmL0PBK7s&m=R82SMwEX_3O32-8F5eqMQ8Y6wuHt9WhjmMg6rr-4gWs&s=X---GnOf7YwhOGKMWYa8Mh52VtmO-2imfuZdKLEY39M&e=>

IRTF NFVRG draft on NFVIaaS placement/scheduling (includes system analysis for the PoC we are thinking): https://datatracker.ietf.org/doc/draft-krishnan-nfvrg-policy-based-rm-nfviaas/?include_text=1<https://urldefense.proofpoint.com/v2/url?u=https-3A__datatracker.ietf.org_doc_draft-2Dkrishnan-2Dnfvrg-2Dpolicy-2Dbased-2Drm-2Dnfviaas_-3Finclude-5Ftext-3D1&d=AAMFAw&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=B6BWd4kFfgOzAREgThxkmTZKy7dDXE2-eBAmL0PBK7s&m=R82SMwEX_3O32-8F5eqMQ8Y6wuHt9WhjmMg6rr-4gWs&s=nE7Xheq0TcCDN98mFIOG_VvMsmfBeIDNDVVFV1HpJx0&e=>

IRTF NFVRG draft on Policy Architecture and Framework (looking forward to your comments and thoughts): https://datatracker.ietf.org/doc/draft-norival-nfvrg-nfv-policy-arch/?include_text=1<https://urldefense.proofpoint.com/v2/url?u=https-3A__datatracker.ietf.org_doc_draft-2Dnorival-2Dnfvrg-2Dnfv-2Dpolicy-2Darch_-3Finclude-5Ftext-3D1&d=AAMFAw&c=Sqcl0Ez6M0X8aeM67LKIiDJAXVeAw-YihVMNtXt-uEs&r=B6BWd4kFfgOzAREgThxkmTZKy7dDXE2-eBAmL0PBK7s&m=R82SMwEX_3O32-8F5eqMQ8Y6wuHt9WhjmMg6rr-4gWs&s=lBet00H8iO1igDZNEMUGaryHWutkg8abBbL5VG8pjyk&e=>

Hi Ruby,

Looking forward to your use cases.


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20141216/09d5f322/attachment-0001.html>

More information about the OpenStack-dev mailing list