[openstack-dev] [nova] Configure overcommit policy

John Garbutt john at johngarbutt.com
Thu Nov 14 14:55:10 UTC 2013


On 13 November 2013 14:51, Khanh-Toan Tran
<khanh-toan.tran at cloudwatt.com> wrote:
> Well, I don't know what John means by "modify the over-commit calculation in
> the scheduler", so I cannot comment.

I was talking about this code:
https://github.com/openstack/nova/blob/master/nova/scheduler/filters/core_filter.py#L64

But I am not sure that's what you want.
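
For anyone who does not want to click through, the calculation I mean is
roughly the following. This is a simplified standalone sketch, not a copy
of the nova source: the function name, the argument names and the 16.0
default are my assumptions for illustration.

```python
# Simplified, standalone sketch of a CPU over-commit check.
# NOT the nova source; names and the default ratio are assumptions.

def host_passes(vcpus_total, vcpus_used, instance_vcpus,
                cpu_allocation_ratio=16.0):
    """Return True if the host can take an instance needing instance_vcpus."""
    if not vcpus_total:
        # Assumption: fail open when the host reports no vCPU count.
        return True
    # Over-commit limit: physical vCPUs scaled by the allocation ratio.
    vcpu_limit = vcpus_total * cpu_allocation_ratio
    return (vcpu_limit - vcpus_used) >= instance_vcpus
```

With a ratio of 1.0 (no over-commit), a host with 8 vCPUs and 6 in use
would accept a 2 vCPU instance but reject a 4 vCPU one.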

> The idea of choosing a free host for Hadoop on the fly is rather complicated
> and involves several operations, namely: (1) ensuring the host never gets
> past 100% CPU load; (2) identifying a host that already has a Hadoop VM
> running on it, or is already at 100% CPU commitment; (3) releasing the host
> from 100% CPU commitment once the Hadoop VM stops; (4) possibly preventing
> other applications from using the host (to conserve the host's resources).
>
> - You'll need (1) because otherwise your Hadoop VM would run short of
> resources once the host gets overloaded.
> - You'll need (2) because you don't want to restrict a new host while one of
> your 100% CPU committed hosts still has free resources.
> - You'll need (3) because otherwise your host would be restricted forever,
> and that is no longer "on the fly".
> - You may need (4) because otherwise it'd be a waste of resources.
>
> The problem with changing CPU overcommit on the fly is that while your Hadoop
> VM is still running, someone else can add another VM to the same host with a
> higher CPU overcommit (e.g. 200%), violating (1) and thus affecting your
> Hadoop VM as well.
> Putting the host in an aggregate can give you (1) and (2). (4)
> is done by AggregateInstanceExtraSpecsFilter. However, it does not give you
> (3), which can be done with pCloud.

Step 1: use flavors so nova can tell the two workloads apart, and
configure them differently

Step 2: find capacity for your workload given your current cloud usage

At the moment, most of our solutions involve reserving bits of your
cloud capacity for different workloads, generally using host
aggregates.
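
To make the aggregate approach concrete, here is a minimal sketch of the
matching idea: a flavor carries extra specs, an aggregate carries metadata,
and a host is only a candidate if its aggregate's metadata agrees with the
flavor. The dict layout, the "workload" key and the host names are
hypothetical, and the real AggregateInstanceExtraSpecsFilter is more
involved than this exact-match comparison.

```python
# Illustrative sketch only: exact-match of flavor extra specs against
# aggregate metadata. Data layout and key names are hypothetical.

def aggregate_matches(extra_specs, metadata):
    """True if every flavor extra spec appears in the aggregate metadata."""
    return all(metadata.get(key) == value
               for key, value in extra_specs.items())


def candidate_hosts(flavor, aggregates):
    """Return the sorted hosts of all aggregates that match the flavor."""
    hosts = set()
    for aggregate in aggregates:
        if aggregate_matches(flavor["extra_specs"], aggregate["metadata"]):
            hosts.update(aggregate["hosts"])
    return sorted(hosts)


# Two reserved slices of the cloud, one per workload.
aggregates = [
    {"metadata": {"workload": "hadoop"}, "hosts": ["node1", "node2"]},
    {"metadata": {"workload": "general"}, "hosts": ["node3"]},
]
hadoop_flavor = {"name": "hadoop.large",
                 "extra_specs": {"workload": "hadoop"}}
general_flavor = {"name": "general.small",
                  "extra_specs": {"workload": "general"}}
```

Note that in this sketch a flavor with no extra specs matches every
aggregate, so keeping other workloads off the reserved hosts means giving
every flavor a spec.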

Claiming capacity back from other workloads is a bit trickier. I don't
think you have defined where that capacity comes back from. Maybe you
want to give some workloads a higher priority over the constrained CPU
resources? But you will probably starve the little people out at
random, which seems bad. Maybe you want a concept of "spot instances"
that can use your "spare capacity" until you need it, at which point
you can just kill them?
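
To show what I mean by the "spot instances" idea, here is a rough sketch
of picking which instances to kill when capacity is needed back. Nothing
like this exists in nova as far as this thread goes; the "spot" flag, the
dict layout and the largest-first policy are all made up for illustration.

```python
# Hypothetical sketch: choose spot instances to terminate so that a
# given number of vCPUs is freed. The policy and fields are assumptions.

def pick_victims(instances, vcpus_needed):
    """Return names of spot instances to kill, or None if not enough."""
    spot = sorted((inst for inst in instances if inst["spot"]),
                  key=lambda inst: inst["vcpus"], reverse=True)
    victims = []
    freed = 0
    for inst in spot:
        if freed >= vcpus_needed:
            break
        victims.append(inst["name"])
        freed += inst["vcpus"]
    if freed < vcpus_needed:
        # Killing every spot instance still would not free enough vCPUs.
        return None
    return victims


instances = [
    {"name": "a", "vcpus": 2, "spot": True},
    {"name": "b", "vcpus": 4, "spot": True},
    {"name": "c", "vcpus": 8, "spot": False},  # never a candidate
]
```

Largest-first keeps the number of killed instances small, but it is only
one possible policy; oldest-first or cheapest-first would be just as easy
to plug in.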

But maybe I am misunderstanding your use case; it's not totally clear to me.

John


