[openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

Alex Xu soulxu at gmail.com
Fri Jan 9 15:14:59 UTC 2015


2015-01-09 22:22 GMT+08:00 Sylvain Bauza <sbauza at redhat.com>:

>
> On 09/01/2015 14:58, Alex Xu wrote:
>
>
>
> 2015-01-09 17:17 GMT+08:00 Sylvain Bauza <sbauza at redhat.com>:
>
>>
>> On 09/01/2015 09:01, Alex Xu wrote:
>>
>> Hi, All
>>
>>  There is a bug when running Nova with Ironic:
>> https://bugs.launchpad.net/nova/+bug/1402658
>>
>>  The case is simple: one baremetal node with 1024MB of RAM, then boot two
>> instances with a 512MB RAM flavor. Both instances will be scheduled to the
>> same baremetal node.
>>
>>  The problem is that on the scheduler side the IronicHostManager consumes
>> all of the node's resources, regardless of how much the instance actually
>> uses. On the compute node side, however, the ResourceTracker does not
>> consume resources that way; it consumes them just as it would for a normal
>> virtual instance. The ResourceTracker then updates the resource usage once
>> the instance's resources are claimed, so the scheduler sees free resources
>> on that node and tries to schedule another new instance onto it.
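To make the mismatch concrete, here is a minimal Python sketch (illustrative
only, not the actual Nova code; the node_state and flavor objects and their
fields are assumptions) of the two consumption behaviours:

    def ironic_host_manager_consume(node_state):
        # Scheduler side (IronicHostManager-style): a single instance
        # exhausts the whole node, whatever the flavor asked for.
        node_state.free_ram_mb = 0
        node_state.free_disk_mb = 0
        node_state.vcpus_used = node_state.vcpus_total

    def resource_tracker_claim(node_state, flavor):
        # Compute side (ResourceTracker-style): subtract only the flavor's
        # request, so a 1024MB node still looks half-free after a 512MB
        # claim and the scheduler happily picks it a second time.
        node_state.free_ram_mb -= flavor['memory_mb']
        node_state.free_disk_mb -= flavor['root_gb'] * 1024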
>>
>>  I took a look at this: there is the NumInstancesFilter, which limits how
>> many instances can be scheduled to one host. So can we just use this
>> filter to achieve the goal? The maximum number of instances is configured
>> by the option 'max_instances_per_host'; we could make the virt driver
>> report how many instances it supports. The Ironic driver could simply
>> report max_instances_per_host=1, and the libvirt driver could report
>> max_instances_per_host=-1, meaning no limit. Then we could just remove the
>> IronicHostManager and make the scheduler side simpler. Does that make
>> sense, or are there more traps?
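Roughly, the check could look like the existing NumInstancesFilter, but with
the limit reported per host by the virt driver instead of a single global
option. This is a sketch only; the max_instances_per_host attribute on
host_state is an assumed field, not an existing one:

    class PerHostNumInstancesFilter(object):
        """Sketch: NumInstancesFilter-like check with a per-host limit."""

        def host_passes(self, host_state, filter_properties):
            # Hypothetical driver-reported field: Ironic would report 1,
            # libvirt would report -1 (no limit).
            limit = getattr(host_state, 'max_instances_per_host', -1)
            if limit < 0:
                return True
            return host_state.num_instances < limit

With something along these lines, the Ironic-specific consumption logic on
the scheduler side would no longer be needed for this case.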
>>
>>  Thanks in advance for any feedback and suggestions.
>>
>>
>>
>>  Mmm, I think I disagree with your proposal. Let me explain why as best
>> I can:
>>
>> tl;dr: any proposal other than claiming at the scheduler level tends to be
>> wrong.
>>
>> The ResourceTracker should only be a module for providing stats about
>> compute nodes to the Scheduler.
>> How the Scheduler consumes these resources when making a decision
>> should be a Scheduler-only concern.
>>
>
>  Agreed, but we can't implement this for now, for the reason you describe
> below.
>
>
>>
>> Here, the problem is that the decision-making is also shared with the
>> ResourceTracker because of the claiming system managed by the context
>> manager when booting an instance. That means we have two distinct
>> decision makers for validating a resource.
>>
>>
>  Totally agreed! This is the root cause.
>
>
>>  Let's set realism aside for a moment and discuss what a decision could
>> mean for something other than a compute node. OK, let's say a volume.
>> Provided that *something* reported the volume statistics to the Scheduler,
>> it would be the Scheduler that decides whether a volume manager can accept
>> a volume request. There is no sense in re-validating the Scheduler's
>> decision on the volume manager, beyond perhaps some error handling.
>>
>> We know that the current model is somewhat racy with Ironic because there
>> is a two-stage validation (see [1]). I'm not in favor of making the model
>> more complex, but rather of putting all the claiming logic in the
>> scheduler, which is a longer path to win, but a safer one.
>>
>
>  Yeah, I have thought about adding the same resource consumption on the
> compute manager side, but it's ugly because we would be implementing
> Ironic's resource consuming method in two places. If we move the claiming
> into the scheduler, things become easy: we can just provide an extension
> point for different consuming methods (if I understand the IRC discussion
> correctly). Since Gantt will be a standalone service, validating a resource
> shouldn't be spread across different services. So I agree with you.
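As a sketch of what such an extension point might look like once claiming
lives in the scheduler (entirely hypothetical, not an existing Nova or Gantt
interface):

    class ResourceConsumer(object):
        """Hypothetical per-driver hook for consuming host resources."""

        def consume(self, host_state, flavor):
            raise NotImplementedError

    class VirtConsumer(ResourceConsumer):
        def consume(self, host_state, flavor):
            # Virtual instances consume only what the flavor requests.
            host_state.free_ram_mb -= flavor['memory_mb']

    class BaremetalConsumer(ResourceConsumer):
        def consume(self, host_state, flavor):
            # A baremetal node is consumed as a whole, whatever the flavor.
            host_state.free_ram_mb = 0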
>
>  But for now, as you said, this is a long-term plan. We can't provide
> different resource consumption on the compute manager side now, and we
> also can't move the claiming into the scheduler now. So the method I
> proposed is easier for now; at least we won't have different resource
> consumption behaviour between the scheduler (IronicHostManager) and the
> compute node (ResourceTracker) for Ironic, and Ironic can work fine.
>
>  The method I propose has one small problem: when the whole node is
> allocated, we would still see some free resources if the flavor's resources
> are less than the baremetal node's resources. But this can be addressed by
> exposing the maximum instance count through the hypervisor API (the number
> of running instances is already exposed), so users would know why they
> can't allocate more instances. And if we can configure the maximum instance
> count per node, that sounds useful for operators too :)
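For illustration, the report could look something like this (the
'max_instances' key is hypothetical and not part of the current virt driver
stats):

    def node_stats(node, running_instances, is_baremetal):
        # Running instance counts are already exposed; the hypothetical
        # 'max_instances' field would tell an operator why a node that
        # still shows free RAM refuses another instance.
        return {
            'memory_mb': node.memory_mb,
            'memory_mb_used': node.memory_mb_used,
            'running_vms': running_instances,
            'max_instances': 1 if is_baremetal else -1,  # hypothetical
        }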
>
>
>
> I think that if you don't want to wait for the claiming system to happen
> in the Scheduler, then at least you need to fix the current way of using
> the ResourceTracker, like what Jay Pipes is working on in his spec.
>

I'm on the same page as you guys now :)


>
>
>
> -Sylvain
>
>
>> -Sylvain
>>
>> [1]  https://bugs.launchpad.net/nova/+bug/1341420
>>
>>  Thanks
>> Alex
>>
>>