[openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

Nikola Đipanov ndipanov at redhat.com
Thu Aug 14 07:21:44 UTC 2014

On 08/13/2014 06:05 PM, Sylvain Bauza wrote:
> Le 13/08/2014 12:21, Sylvain Bauza a écrit :
>> Le 12/08/2014 22:06, Sylvain Bauza a écrit :
>>> Le 12/08/2014 18:54, Nikola Đipanov a écrit :
>>>> On 08/12/2014 04:49 PM, Sylvain Bauza wrote:
>>>>> (sorry for reposting, missed 2 links...)
>>>>> Hi Nikola,
>>>>> Le 12/08/2014 12:21, Nikola Đipanov a écrit :
>>>>>> Hey Nova-istas,
>>>>>> While I was hacking on [1] I was considering how to approach the fact
>>>>>> that we now need to track one more thing (NUMA node utilization)
>>>>>> in our
>>>>>> resources. I went with - "I'll add it to compute nodes table"
>>>>>> thinking
>>>>>> it's a fundamental enough property of a compute host that it
>>>>>> deserves to
>>>>>> be there, although I was considering  Extensible Resource Tracker
>>>>>> at one
>>>>>> point (ERT from now on - see [2]) but looking at the code - it did
>>>>>> not
>>>>>> seem to provide anything I desperately needed, so I went with
>>>>>> keeping it
>>>>>> simple.
>>>>>> So fast-forward a few days, and I caught myself solving a problem
>>>>>> that I
>>>>>> kept thinking ERT should have solved - but apparently hasn't, and I
>>>>>> think it is fundamentally a broken design without it - so I'd really
>>>>>> like to see it re-visited.
>>>>>> The problem can be described by the following lemma (if you take
>>>>>> 'lemma'
>>>>>> to mean 'a sentence I came up with just now' :)):
>>>>>> """
>>>>>> Due to the way scheduling works in Nova (roughly: pick a host
>>>>>> based on
>>>>>> stale(ish) data, rely on claims to trigger a re-schedule), _same
>>>>>> exact_
>>>>>> information that scheduling service used when making a placement
>>>>>> decision, needs to be available to the compute service when
>>>>>> testing the
>>>>>> placement.
>>>>>> """
>>>>>> This is not the case right now, and the ERT does not propose any
>>>>>> way to
>>>>>> solve it - (see how I hacked around needing to be able to get
>>>>>> extra_specs when making claims in [3], without hammering the DB). The
>>>>>> result will be that any resource that we add and needs user supplied
>>>>>> info for scheduling an instance against it, will need a buggy
>>>>>> re-implementation of gathering all the bits from the request that
>>>>>> scheduler sees, to be able to work properly.
>>>>> Well, ERT does provide a plugin mechanism for testing resources at the
>>>>> claim level. This is the plugin responsibility to implement a test()
>>>>> method [2.1] which will be called when test_claim() [2.2]
>>>>> So, provided this method is implemented, a local host check can be
>>>>> done
>>>>> based on the host's view of resources.
>>>> Yes - the problem is there is no clear API to get all the needed
>>>> bits to
>>>> do so - especially the user supplied one from image and flavors.
>>>> On top of that, in current implementation we only pass a hand-wavy
>>>> 'usage' blob in. This makes anyone wanting to use this in conjunction
>>>> with some of the user supplied bits roll their own
>>>> 'extract_data_from_instance_metadata_flavor_image' or similar which is
>>>> horrible and also likely bad for performance.
>>> I see your concern where there is no interface for user-facing
>>> resources like flavor or image metadata.
>>> I also think indeed that the big 'usage' blob is not a good choice
>>> for long-term vision.
>>> That said, I don't think as we say in French to throw the bath
>>> water... ie. the problem is with the RT, not the ERT (apart the
>>> mention of third-party API that you noted - I'll go to it later below)
>>>>>> This is obviously a bigger concern when we want to allow users to
>>>>>> pass
>>>>>> data (through image or flavor) that can affect scheduling, but
>>>>>> still a
>>>>>> huge concern IMHO.
>>>>> And here is where I agree with you : at the moment, ResourceTracker
>>>>> (and
>>>>> consequently Extensible RT) only provides the view of the resources
>>>>> the
>>>>> host is knowing (see my point above) and possibly some other resources
>>>>> are missing.
>>>>> So, whatever your choice of going with or without ERT, your patch [3]
>>>>> still deserves it if we want not to lookup DB each time a claim goes.
>>>>>> As I see that there are already BPs proposing to use this IMHO broken
>>>>>> ERT ([4] for example), which will surely add to the proliferation of
>>>>>> code that hacks around these design shortcomings in what is already a
>>>>>> messy, but also crucial (for perf as well as features) bit of Nova
>>>>>> code.
>>>>> Two distinct implementations of that spec (ie. instances and flavors)
>>>>> have been proposed [2.3] [2.4] so reviews are welcome. If you see the
>>>>> test() method, it's no-op thing for both plugins. I'm open to comments
>>>>> because I have the stated problem : how can we define a limit on
>>>>> just a
>>>>> counter of instances and flavors ?
>>>> Will look at these - but none of them seem to hit the issue I am
>>>> complaining about, and that is that it will need to consider other
>>>> request data for claims, not only data available for on instances.
>>>> Also - the fact that you don't implement test() in flavor ones tells me
>>>> that the implementation is indeed racy (but it is racy atm as well) and
>>>> two requests can indeed race for the same host, and since no claims are
>>>> done, both can succeed. This is I believe (at least in case of single
>>>> flavor hosts) unlikely to happen in practice, but you get the idea.
>>> Agreed, these 2 patches probably require another iteration, in
>>> particular how we make sure that it won't be racy. So I need another
>>> run to think about what to test() for these 2 examples.
>>> Another patch has to be done for aggregates, but it's still WIP so
>>> not mentioned here.
>>> Anyway, as discussed during today's meeting, these 2 patches will not
>>> be based on ERT because of the risk it goes for Juno so let's scope
>>> them out of this thread.
>>>>>> I propose to revert [2] ASAP since it is still fresh, and see how
>>>>>> we can
>>>>>> come up with a cleaner design.
>>>>>> Would like to hear opinions on this, before I propose the patch tho!
>>>>> IMHO, I think the problem is more likely that the regular RT misses
>>>>> some
>>>>> information for each host so it requires to handle it on a
>>>>> case-by-case
>>>>> basis, but I don't think ERT either increases complexity or creates
>>>>> another issue.
>>>> RT does not miss info about the host, but about the particular request
>>>> which we have to fish out of different places like image_metadata
>>>> extra_specs etc, yet - it can't really work without them. This is
>>>> definitely a RT issue that is not specific to ERT.
>>> +1, I agree with you, that's an API issue for RT : how do we pass out
>>> user-defined metrics ?
>>> I still need to figure out which kind of usecases are requiring such
>>> examples, albeit the NUMA usecase of course.
>> After a night and morning of thinking about this problem, I'm writing
>> down some ideas that could help fixing the problem (although I don't
>> think it's a long-term scenario, rather a workaround) :
>> Considering you need the metadata of the flavor asked by user when
>> claiming a resource :
>> 1. Compute Manager knows request_spec when building an instance, so it
>> could pass out request_spec as paramater (optional or not) to
>> rt.instance_claim() in the claim context
>> 2. As ERT does, when claiming, it calls the ERT plugin .test() method
>> so with request_spec as extra parameter
>> 3. On the ERT side, a plugin is responsible for calling (and caching)
>> all flavors metadata, or even only the key you need to compare
>> 4. Scheduler knows flavors metadata thanks to ERT plugin and blueprint
>> isolate-scheduler-db so it makes decisions on the same subset as the RT
>> 5. When claiming in ERT plugin test(), it extracts the flavor_id,
>> looks at the internal in-memory representation of flavors to get the
>> metadata, and calls the NUMA _test_numa_topology() method for checking
>> wrt this flavor metadata
>> The idea is that so filters and claims are doing separate decisions on
>> the same source of knowledge, here being ERT.
>> -Sylvain
> Just made a draft over my thoughts here :
> https://review.openstack.org/113936

I have commented on the review itself in more detail, but in short -
this solution won't work for me, and thus in the general case as well, as

1) we need a way to persist the request data (and not make it a horrible
performance bottleneck), as caching won't cut it.

2) We also need a way to use it in every periodic task run.

None of this is only ERT problem, they are problems of the RT itself,
but as I have said in the thread already - adding a plugin API on top of
something that is broken is just asking for all sorts of trouble.

Still leaning towards revert really - anything else would be short
sighted. I would be happy to help come up with a baseline of things to
fix before we can add it back, to avoid deferring it needlesly due to
scope creep.


> -S
>>>> However, I still see several issues with the current ERT
>>>> implementation,
>>>> but the one I am getting at here, and why I think we should revert it,
>>>> is that we are building a 3rd party plugin API that is tightly coupled
>>>> to an obviously flawed internal API (RT and Claims).
>>>> We have no policy AFIK about what guarantees we provide to 3rd party
>>>> plugins, but if we are going to be breaking them all the time, or in
>>>> this case providing very limited usefulness - I see little value in the
>>>> current implementation of ERT, and see issues with it staying, which
>>>> means future work will be based on it.
>>>> N.
>>> *This* is to me IMHO the biggest problem with ERT : if we say that we
>>> externalize an API for 3rd-party plugins, we need to make sure that
>>> everything is handled.
>>> That said, instead of reverting the whole patch, could we just
>>> consider to only accept changes that wouldn't require user-facing
>>> metrics ?
>>> The existing VCPU implementation still sounds good to me, so we can
>>> just consider a clear communication on what is acceptable and what
>>> not (ie. a docstring in the plugin API or so., plus -2/-1 reviews if
>>> needed)
>>> -Sylvain
>>>>> Thanks,
>>>>> -Sylvain
>>>>>> Thanks all,
>>>>>> Nikola
>>>>>> [1]
>>>>>> https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement
>>>>>> [2] https://review.openstack.org/#/c/109643/
>>>>>> [3] https://review.openstack.org/#/c/111782/
>>>>>> [4] https://review.openstack.org/#/c/89893
>>>>>> _______________________________________________
>>>>>> OpenStack-dev mailing list
>>>>>> OpenStack-dev at lists.openstack.org
>>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>>> [2.1]
>>>>> https://github.com/openstack/nova/blob/master/nova/compute/resources/__init__.py#L75
>>>>> [2.2]
>>>>> https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L134
>>>>> [2.3] https://review.openstack.org/112578
>>>>> [2.4] https://review.openstack.org/113373
>>>>> _______________________________________________
>>>>> OpenStack-dev mailing list
>>>>> OpenStack-dev at lists.openstack.org
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>> _______________________________________________
>>>> OpenStack-dev mailing list
>>>> OpenStack-dev at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

More information about the OpenStack-dev mailing list