[openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

Nikola Đipanov ndipanov at redhat.com
Thu Aug 14 08:07:31 UTC 2014


On 08/13/2014 07:40 PM, Sylvain Bauza wrote:
> 
> On 13/08/2014 18:40, Brian Elliott wrote:
>> On Aug 12, 2014, at 5:21 AM, Nikola Đipanov <ndipanov at redhat.com> wrote:
>>> The problem can be described by the following lemma (if you take 'lemma'
>>> to mean 'a sentence I came up with just now' :)):
>>>
>>> """
>>> Due to the way scheduling works in Nova (roughly: pick a host based on
>>> stale(ish) data, rely on claims to trigger a re-schedule), the _exact
>>> same_ information that the scheduling service used when making a
>>> placement decision needs to be available to the compute service when
>>> testing the placement.
>>> """
>> Correct
>>
>>> This is not the case right now, and the ERT does not propose any way to
>>> solve it - (see how I hacked around needing to be able to get
>>> extra_specs when making claims in [3], without hammering the DB). The
>>> result will be that any resource that we add that needs user-supplied
>>> info for scheduling an instance against it will need a buggy
>>> re-implementation of gathering all the bits from the request that
>>> the scheduler sees, to be able to work properly.
>> Agreed, ERT does not attempt to solve this problem of ensuring RT has
>> an identical set of information for testing claims.  I don’t think it
>> was intended to.
>>
>> ERT does solve the issue of bloat in the RT with adding
>> just-one-more-thing to test usage-wise.  It gives a nice hook for
>> inserting your claim logic for your specific use case.
> 
> I think Nikola and I agreed on the fact that ERT is not responsible for
> this design. That said I can talk on behalf of Nikola...
> 

Right - the hooks, however, hook into a piece of code that has not been
designed with this kind of extensibility in mind (to put it politely),
and exposing these hooks so that people can add functionality that is
broken by design is just asking for technical debt to accumulate more
quickly.

> 
>>> This is obviously a bigger concern when we want to allow users to pass
>>> data (through image or flavor) that can affect scheduling, but still a
>>> huge concern IMHO.
>> I think passing additional data through to compute just wasn’t a
>> problem that ERT aimed to solve.  (Paul Murray?)  That being said,
>> coordinating the passing of any extra data required to test a claim
>> that is *not* sourced from the host itself would be a very nice
>> addition.  You are working around it with some caching in your flavor
>> db lookup use case, although one could of course cook up a cleaner
>> patch to pass such data through on the “build this” request to the
>> compute.

The problem is - it would not only be a nice addition - it is
_necessary_ in order to be able to write code that is race free, as we all
agreed on previously when I stated my lemma above :). We can try to
disagree on that, but if we end up agreeing - I find it hard to imagine
someone would defend keeping the ERT. The result would be that we will
still have things like my caching hack and things like [1] popping up
_in addition_ to ERT extensions that don't need user data, and those
that do but don't know it and end up introducing races. All of this is
just bad.

[1] https://review.openstack.org/#/c/77800/
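
To make the lemma concrete, here is a minimal Python sketch of the flow it
describes. Everything in it (Host, required_units, schedule, the "units"
extra spec) is made up for illustration and is not Nova code. The only point
is that the scheduler-side check and the compute-side claim derive the
requirement from the same request, and that a failed claim is what triggers
the re-schedule.

```python
# Illustrative only: none of these names are Nova APIs.
from dataclasses import dataclass


class ClaimFailed(Exception):
    """Raised by a host when a claim does not fit its real usage."""


def required_units(request):
    # Hypothetical user-supplied value (think flavor extra_specs).
    return int(request["extra_specs"].get("units", 1))


@dataclass
class Host:
    name: str
    total: int
    used: int = 0  # authoritative usage, known only to the host itself

    def claim(self, request):
        # The claim derives the requirement from the same request data the
        # scheduler used; otherwise the two checks can disagree.
        needed = required_units(request)
        if self.used + needed > self.total:
            raise ClaimFailed(self.name)
        self.used += needed


def schedule(request, stale_view, hosts, max_attempts=3):
    """Pick a host from a stale view, claim on it, retry elsewhere on failure."""
    needed = required_units(request)
    tried = set()
    for _ in range(max_attempts):
        candidates = [
            name for name, (total, used) in stale_view.items()
            if name not in tried and used + needed <= total
        ]
        if not candidates:
            break
        name = candidates[0]
        tried.add(name)
        try:
            hosts[name].claim(request)
            return name
        except ClaimFailed:
            continue  # the scheduler's data was stale, so re-schedule
    return None
```

If claim() had to guess the requirement instead of reading it from the
request, the scheduler could pass a host that the claim then accepts or
rejects for a different amount, which is the kind of race described above.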

> 
> Indeed, and that's why I think the problem can be resolved thanks to 2
> different things:
> 1. Filters need to look at what ERT is giving them, that's what
> isolate-scheduler-db is trying to do (see my patches [2.3 and 2.4] in
> the previous emails).
> 2. Some extra user request data needs to be checked in the test() method
> of ERT plugins (where claims are done), so I provided a WIP patch for
> discussing it: https://review.openstack.org/#/c/113936/
> 
> 

Several shortcomings were discussed on the review, so I won't repeat them
here - but I agree - it's a nice start.
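
For the sake of discussion, a plugin whose test() is handed the user request
could look roughly like the sketch below. The class name, the test() and
add_instance() signatures and the "example:units" key are all assumptions
made for illustration. This is neither the ERT plugin interface as merged
nor what the WIP patch does.

```python
# Hypothetical sketch; the class and method signatures are assumptions,
# not the ERT plugin interface.
class ExampleResource:
    """Tracks a countable resource and tests claims against it."""

    def __init__(self, total):
        self.total = total
        self.used = 0

    def _requested(self, request):
        # User-supplied data (e.g. flavor extra_specs) travels with the
        # request instead of being looked up again on the compute host.
        return int(request.get("extra_specs", {}).get("example:units", 0))

    def test(self, request, limit=None):
        """Return None if the claim fits, else a reason string."""
        limit = self.total if limit is None else limit
        requested = self._requested(request)
        if self.used + requested > limit:
            return "wanted %d, only %d free" % (requested, limit - self.used)
        return None

    def add_instance(self, request):
        # Record the usage once the claim has been accepted.
        self.used += self._requested(request)
```

The design choice being illustrated is only that test() works from the
request it is given, so it does not need a DB lookup or a cache to find out
what the user asked for.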

>>> I see that there are already BPs proposing to use this IMHO broken
>>> ERT ([4] for example), which will surely add to the proliferation of
>>> code that hacks around these design shortcomings in what is already a
>>> messy, but also crucial (for perf as well as features) bit of Nova code.
>>>
>>> I propose to revert [2] ASAP since it is still fresh, and see how we can
>>> come up with a cleaner design.
>>>
>> I think the ERT is forward-progress here, but am willing to review
>> patches/specs on improvements/replacements.

Even though I disagree with several design decisions in addition to the
problem we are discussing here (and feel mildly guilty for not bringing
them up sooner), I would be happy to help with a base-line of things
that need to be fixed, and no more, before we can add it back.

I can see us keeping it, but not allowing any new resource extensions in
before refactoring, but I am not sure I see the real win in that. I am
of course open to hear other proposals that acknowledge the brokenness.

N.


