[openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

Sylvain Bauza sbauza at redhat.com
Wed Aug 13 17:40:10 UTC 2014

On 13/08/2014 18:40, Brian Elliott wrote:
> On Aug 12, 2014, at 5:21 AM, Nikola Đipanov <ndipanov at redhat.com> wrote:
>> Hey Nova-istas,
>> While I was hacking on [1] I was considering how to approach the fact
>> that we now need to track one more thing (NUMA node utilization) in our
>> resources. I went with - "I'll add it to compute nodes table" thinking
>> it's a fundamental enough property of a compute host that it deserves to
>> be there, although I was considering Extensible Resource Tracker at one
>> point (ERT from now on - see [2]) but looking at the code - it did not
>> seem to provide anything I desperately needed, so I went with keeping it
>> simple.
>> So fast-forward a few days, and I caught myself solving a problem that I
>> kept thinking ERT should have solved - but apparently hasn't, and I
>> think it is fundamentally a broken design without it - so I'd really
>> like to see it re-visited.
>> The problem can be described by the following lemma (if you take 'lemma'
>> to mean 'a sentence I came up with just now' :)):
>> """
>> Due to the way scheduling works in Nova (roughly: pick a host based on
>> stale(ish) data, rely on claims to trigger a re-schedule), _same exact_
>> information that scheduling service used when making a placement
>> decision, needs to be available to the compute service when testing the
>> placement.
>> """
> Correct
>> This is not the case right now, and the ERT does not propose any way to
>> solve it - (see how I hacked around needing to be able to get
>> extra_specs when making claims in [3], without hammering the DB). The
>> result will be that any resource that we add and needs user supplied
>> info for scheduling an instance against it, will need a buggy
>> re-implementation of gathering all the bits from the request that
>> scheduler sees, to be able to work properly.
> Agreed, ERT does not attempt to solve this problem of ensuring RT has an identical set of information for testing claims.  I don’t think it was intended to.
> ERT does solve the issue of bloat in the RT with adding just-one-more-thing to test usage-wise.  It gives a nice hook for inserting your claim logic for your specific use case.

I think Nikola and I agreed that ERT is not responsible for this
design. That said, I can speak on behalf of Nikola...
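
To make the lemma quoted above concrete, here is a minimal sketch of the
"schedule on stale data, claim on live data, re-schedule on failure" loop.
It is not Nova code: Request, schedule(), claim() and place() are names
made up for illustration, and a plain vCPU count stands in for whatever
resource is being tracked. The point is that the same Request object is
used on both sides.

```python
from dataclasses import dataclass


@dataclass
class Request:
    """What the user asked for; the same object goes to both sides."""
    vcpus: int


class ClaimFailed(Exception):
    """Raised on the compute side when the live check rejects the host."""


def schedule(request, snapshot):
    """Pick the first host whose (possibly stale) free vCPU count fits."""
    for host, free_vcpus in snapshot.items():
        if free_vcpus >= request.vcpus:
            return host
    raise RuntimeError("no valid host")


def claim(request, host, live):
    """Re-test the placement on the compute side against live data."""
    if live[host] < request.vcpus:
        raise ClaimFailed(host)
    live[host] -= request.vcpus


def place(request, snapshot, live):
    """Schedule, claim, and re-schedule when the claim fails."""
    candidates = dict(snapshot)  # do not mutate the caller's snapshot
    while True:
        host = schedule(request, candidates)
        try:
            claim(request, host, live)
            return host
        except ClaimFailed:
            # The scheduler's view was stale: drop the host and retry.
            del candidates[host]
```

If claim() had to rebuild the request from a different source than
schedule() used, the two checks could disagree even on fresh data, which
is the concern raised in the lemma.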

>> This is obviously a bigger concern when we want to allow users to pass
>> data (through image or flavor) that can affect scheduling, but still a
>> huge concern IMHO.
> I think passing additional data through to compute just wasn’t a problem that ERT aimed to solve.  (Paul Murray?)  That being said, coordinating the passing of any extra data required to test a claim that is *not* sourced from the host itself would be a very nice addition.  You are working around it with some caching in your flavor db lookup use case, although one could of course cook up a cleaner patch to pass such data through on the “build this” request to the compute.

Indeed, and that's why I think the problem can be resolved with two
separate changes:
1. Filters need to look at what ERT is giving them; that's what
isolate-scheduler-db is trying to do (see my patches [2.3 and 2.4] in
the previous emails).
2. Some extra user request data needs to be checked in the test() method
of ERT plugins (where claims are done), so I provided a WIP patch for
discussion: https://review.openstack.org/#/c/113936/
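
As a rough illustration of point 2, a plugin whose test() also receives
the user-supplied part of the request could look like the sketch below.
This is not the WIP patch and not the real ERT base class: the class
name, the method signatures and the "numa_nodes" key are all assumptions
made up for this example.

```python
class NumaNodesResource:
    """Hypothetical ERT-style plugin; not the real Nova plugin interface."""

    def __init__(self, total_nodes):
        self.total_nodes = total_nodes
        self.used_nodes = 0

    def test(self, extra_specs, limits):
        """Return None if the claim fits, else a human-readable reason.

        extra_specs is the user-supplied data (e.g. from the flavor) that
        the scheduler also saw; "numa_nodes" is an invented key.
        """
        wanted = int(extra_specs.get("numa_nodes", 0))
        limit = limits.get("numa_nodes", self.total_nodes)
        free = limit - self.used_nodes
        if wanted > free:
            return "requested %d NUMA nodes, only %d free" % (wanted, free)
        return None

    def add_instance(self, extra_specs):
        """Record the usage once the claim has been accepted."""
        self.used_nodes += int(extra_specs.get("numa_nodes", 0))
```

The design choice being discussed is only the extra argument: the claim
is tested with the request data passed in, rather than looked up again
from the DB on the compute node.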

>> As I see that there are already BPs proposing to use this IMHO broken
>> ERT ([4] for example), which will surely add to the proliferation of
>> code that hacks around these design shortcomings in what is already a
>> messy, but also crucial (for perf as well as features) bit of Nova code.
>> I propose to revert [2] ASAP since it is still fresh, and see how we can
>> come up with a cleaner design.
> I think the ERT is forward-progress here, but am willing to review patches/specs on improvements/replacements.

Sure, your comments are welcome on https://review.openstack.org/#/c/113373/
You can find an example there where the TypeAffinity filter is modified to
look at HostState, and where ERT is used both for updating HostState and
for claiming resources.
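
For readers who do not want to open the review, the idea can be sketched
as follows. This is a simplified stand-in, not the code under review:
HostState here is a toy class, the instance_type_ids attribute is an
assumption, and the filter is assumed to mean "only one instance type
per host".

```python
class HostState:
    """Toy in-memory view of a host, as the scheduler would hold it."""

    def __init__(self, host, instance_type_ids=()):
        self.host = host
        # Hypothetical attribute: instance types already on the host,
        # kept up to date by a resource plugin instead of a DB query.
        self.instance_type_ids = set(instance_type_ids)


class TypeAffinityLikeFilter:
    """Passes hosts that run no instance of a different type."""

    def host_passes(self, host_state, filter_properties):
        wanted = filter_properties["instance_type"]["id"]
        other_types = host_state.instance_type_ids - {wanted}
        return not other_types
```

The filter only reads what is already in HostState, so no DB call is
needed at scheduling time.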

>> Would like to hear opinions on this, before I propose the patch tho!
>> Thanks all,
>> Nikola
>> [1] https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
>> [2] https://review.openstack.org/#/c/109643/
>> [3] https://review.openstack.org/#/c/111782/
>> [4] https://review.openstack.org/#/c/89893
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
