[openstack-dev] [Nova][Ironic] Question about scheduling two instances to same baremetal node

Sylvain Bauza sbauza at redhat.com
Fri Jan 9 14:21:13 UTC 2015


On 09/01/2015 15:07, Murray, Paul (HP Cloud) wrote:
>
> >There is a bug when running nova with ironic:
> >https://bugs.launchpad.net/nova/+bug/1402658
>
> I filed this bug – it has been a problem for us.
>
> >The problem is that on the scheduler side the IronicHostManager will
> >consume all the resources for that node, no matter how much resource
> >the instance actually uses. But on the compute node side, the
> >ResourceTracker won't consume resources like that; it consumes them
> >as it would for a normal virtual instance. Once the instance's
> >resources are claimed, the ResourceTracker updates the resource
> >usage, so the scheduler will see some free resource on that node and
> >will try to schedule another new instance to it.
>
> You have summed up the problem nicely, i.e. the resource availability
> is calculated incorrectly for ironic nodes.
>
> >I took a look at this: there is a NumInstancesFilter that limits how
> >many instances can be scheduled to one host. So can we just use this
> >filter to achieve the goal? The maximum number of instances is
> >configured by the option 'max_instances_per_host', so we could make
> >the virt driver report how many instances it supports: the ironic
> >driver would report max_instances_per_host=1, and the libvirt driver
> >would report max_instances_per_host=-1, meaning no limit. Then we
> >could remove the IronicHostManager and make the scheduler side
> >simpler. Does that make sense, or are there more traps?
>
> Makes sense, but it solves the wrong problem. The problem is what you
> said above, i.e. the resource availability is calculated incorrectly
> for ironic nodes.
>
> The right solution would be to fix the resource tracker. The RAM
> resource on an ironic node has different allocation behavior from a
> regular node: the test to see whether a new instance fits is the same,
> but instead of deducting the requested amount to get the remaining
> availability, it should simply return 0. This should be dealt with in
> the new resource objects ([2] below), either by having a different
> version of the resource object for ironic nodes (certainly doable and
> the most sensible option – resources should be presented according to
> the resources on the host), or by having the RAM resource object cater
> for the difference in its calculations.
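
To make that idea concrete, here is a minimal, untested sketch of an
Ironic-specific resource object whose remaining availability drops to
zero once anything is placed on the node. The class names are invented
for illustration; they are not taken from [2] or from the local fix
mentioned below.

    class RamResource(object):
        """RAM accounting as done for a regular (virtualized) host."""

        def __init__(self, total_mb):
            self.total_mb = total_mb
            self.used_mb = 0

        def fits(self, requested_mb):
            # The fit test is the same for both kinds of host.
            return requested_mb <= self.total_mb - self.used_mb

        def consume(self, requested_mb):
            # Regular behavior: deduct only what the instance asked for.
            self.used_mb += requested_mb

        @property
        def free_mb(self):
            return self.total_mb - self.used_mb


    class IronicRamResource(RamResource):
        """RAM accounting for a whole-node (baremetal) host: the first
        instance exhausts the node, whatever its flavor asked for."""

        def consume(self, requested_mb):
            self.used_mb = self.total_mb
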
>
> I have a local fix for this that I was too shy to propose upstream 
> because it’s a bit hacky and will hopefully be obsolete soon. I could 
> share it if you like.
>
> Paul
>
> [2] https://review.openstack.org/#/c/127609/
>

Agreed, I think that [2] will help a lot. Until it's done, are we really
sure we want to fix the bug? It can be worked around by creating flavors
that are at least half the size of the compute nodes, and I would really
rather not add more tech debt.
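
As a concrete (hypothetical) illustration of that workaround, assuming
python-novaclient and a 1024MB baremetal node, a flavor that takes the
whole node guarantees a second instance can never fit, even with the
broken accounting; the names, sizes and credentials below are made up:

    from novaclient import client

    # Placeholder credentials/endpoint.
    nova = client.Client('2', 'admin', 'secret', 'admin',
                         'http://keystone.example.com:5000/v2.0')

    # Any flavor asking for more than half of the node's 1024MB of RAM
    # (here, all of it) means two instances can never be packed onto
    # the same node by the scheduler.
    nova.flavors.create(name='baremetal-1024', ram=1024, vcpus=1, disk=10)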

-Sylvain

> From: Sylvain Bauza <sbauza at redhat.com>
> Date: 9 January 2015 at 09:17
> Subject: Re: [openstack-dev] [Nova][Ironic] Question about scheduling 
> two instances to same baremetal node
> To: "OpenStack Development Mailing List (not for usage questions)"
> <openstack-dev at lists.openstack.org>
>
> On 09/01/2015 09:01, Alex Xu wrote:
>
>     Hi, All
>
>     There is a bug when running nova with ironic:
>     https://bugs.launchpad.net/nova/+bug/1402658
>
>     The case is simple: one baremetal node with 1024MB of RAM, then
>     boot two instances with a 512MB RAM flavor.
>
>     Those two instances will be scheduled to the same baremetal node.
>
>     The problem is that on the scheduler side the IronicHostManager
>     will consume all the resources for that node, no matter how much
>     resource the instance actually uses. But on the compute node side,
>     the ResourceTracker won't consume resources like that; it consumes
>     them as it would for a normal virtual instance. Once the
>     instance's resources are claimed, the ResourceTracker updates the
>     resource usage, so the scheduler will see some free resource on
>     that node and will try to schedule another new instance to it.
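
To make the mismatch concrete with the 1024MB example above, here is a
rough, simplified sketch of the two accounting behaviors; this is not
the actual Nova code:

    # Scheduler side (IronicHostManager-style): the first 512MB claim
    # consumes the whole node, so a second 512MB request is rejected.
    node_free_ram_mb = 1024
    node_free_ram_mb = 0                  # whole node consumed
    assert not 512 <= node_free_ram_mb

    # Compute side (ResourceTracker-style): only the requested 512MB is
    # deducted, so the next resource update reports 512MB free and the
    # scheduler happily sends a second instance to the same node.
    tracked_free_ram_mb = 1024
    tracked_free_ram_mb -= 512            # normal virtual-instance accounting
    assert 512 <= tracked_free_ram_mb
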
>
>     I took a look at this: there is a NumInstancesFilter that limits
>     how many instances can be scheduled to one host. So can we just
>     use this filter to achieve the goal? The maximum number of
>     instances is configured by the option 'max_instances_per_host', so
>     we could make the virt driver report how many instances it
>     supports: the ironic driver would report max_instances_per_host=1,
>     and the libvirt driver would report max_instances_per_host=-1,
>     meaning no limit. Then we could remove the IronicHostManager and
>     make the scheduler side simpler. Does that make sense, or are
>     there more traps?
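
A minimal sketch of what this filter-based approach could look like.
This is not the real NumInstancesFilter code, and the per-host
'max_instances_per_host' reporting assumed below is exactly the part
the proposal would still have to add to the virt drivers:

    class PerHostNumInstancesFilter(object):
        """Pass a host only if it still has room for one more instance.

        Assumes each host reports its own limit (1 for an ironic node,
        -1 meaning 'unlimited' for a libvirt host)."""

        def host_passes(self, host_state, filter_properties):
            limit = getattr(host_state, 'max_instances_per_host', -1)
            if limit < 0:
                return True
            return host_state.num_instances < limit
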
>
>     Thanks in advance for any feedback and suggestions.
>
> Mmm, I think I disagree with your proposal. Let me explain why as best
> I can:
>
> tl;dr: Any proposal that does not claim at the scheduler level tends
> to be wrong
>
> The ResourceTracker should only be a module that provides stats about
> compute nodes to the Scheduler.
> How the Scheduler consumes these resources to make a decision should
> only be a Scheduler concern.
>
> Here, the problem is that the decision making is also shared with the
> ResourceTracker because of the claiming system managed by the context
> manager when booting an instance. That means we have two distinct
> decision makers validating a resource.
>
> Let's set realism aside for a moment and discuss what a decision could
> mean for something other than a compute node. OK, let's say a volume.
> Provided that *something* reported the volume statistics to the
> Scheduler, it would be the Scheduler that decides whether a volume
> manager can accept a volume request. There is no sense in re-validating
> the Scheduler's decision on the volume manager, beyond maybe doing some
> error handling.
>
> We know that the current model is kind of racy with Ironic because
> there is a 2-stage validation (see [1]). I'm not in favor of making
> the model more complex, but rather of putting all the claiming logic
> in the scheduler, which is a longer path to win, but a safer one.
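
For what it's worth, a very rough sketch of what "claiming in the
scheduler" means for this particular bug; the names are invented, and a
real implementation would also have to be safe across multiple
scheduler workers, not just threads:

    import threading

    class SchedulerSideClaims(object):
        """Deduct resources when a host is selected, instead of waiting
        for the compute/ironic side to run its own claim later."""

        def __init__(self):
            self._lock = threading.Lock()

        def claim(self, host_state, requested_ram_mb, is_baremetal):
            with self._lock:
                if requested_ram_mb > host_state.free_ram_mb:
                    return False
                if is_baremetal:
                    # A baremetal node is all-or-nothing: the first
                    # claim exhausts it, so a concurrent request can
                    # never land on the same node.
                    host_state.free_ram_mb = 0
                else:
                    host_state.free_ram_mb -= requested_ram_mb
                return True
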
>
> -Sylvain
>
> [1] https://bugs.launchpad.net/nova/+bug/1341420
>
>
>     Thanks
>
>     Alex