[openstack-dev] [Nova] Concerns around the Extensible Resource Tracker design - revert maybe?

Sylvain Bauza sbauza at redhat.com
Wed Aug 13 16:05:11 UTC 2014


Le 13/08/2014 12:21, Sylvain Bauza a écrit :
>
> Le 12/08/2014 22:06, Sylvain Bauza a écrit :
>>
>> Le 12/08/2014 18:54, Nikola Đipanov a écrit :
>>> On 08/12/2014 04:49 PM, Sylvain Bauza wrote:
>>>> (sorry for reposting, missed 2 links...)
>>>>
>>>> Hi Nikola,
>>>>
>>>> Le 12/08/2014 12:21, Nikola Đipanov a écrit :
>>>>> Hey Nova-istas,
>>>>>
>>>>> While I was hacking on [1] I was considering how to approach the fact
>>>>> that we now need to track one more thing (NUMA node utilization) 
>>>>> in our
>>>>> resources. I went with - "I'll add it to compute nodes table" 
>>>>> thinking
>>>>> it's a fundamental enough property of a compute host that it 
>>>>> deserves to
>>>>> be there, although I was considering  Extensible Resource Tracker 
>>>>> at one
>>>>> point (ERT from now on - see [2]) but looking at the code - it did 
>>>>> not
>>>>> seem to provide anything I desperately needed, so I went with 
>>>>> keeping it
>>>>> simple.
>>>>>
>>>>> So fast-forward a few days, and I caught myself solving a problem 
>>>>> that I
>>>>> kept thinking ERT should have solved - but apparently hasn't, and I
>>>>> think it is fundamentally a broken design without it - so I'd really
>>>>> like to see it re-visited.
>>>>>
>>>>> The problem can be described by the following lemma (if you take 
>>>>> 'lemma'
>>>>> to mean 'a sentence I came up with just now' :)):
>>>>>
>>>>> """
>>>>> Due to the way scheduling works in Nova (roughly: pick a host 
>>>>> based on
>>>>> stale(ish) data, rely on claims to trigger a re-schedule), _same 
>>>>> exact_
>>>>> information that scheduling service used when making a placement
>>>>> decision, needs to be available to the compute service when 
>>>>> testing the
>>>>> placement.
>>>>> """
>>>>>
>>>>> This is not the case right now, and the ERT does not propose any 
>>>>> way to
>>>>> solve it - (see how I hacked around needing to be able to get
>>>>> extra_specs when making claims in [3], without hammering the DB). The
>>>>> result will be that any resource that we add and needs user supplied
>>>>> info for scheduling an instance against it, will need a buggy
>>>>> re-implementation of gathering all the bits from the request that
>>>>> scheduler sees, to be able to work properly.
>>>> Well, ERT does provide a plugin mechanism for testing resources at the
>>>> claim level. This is the plugin responsibility to implement a test()
>>>> method [2.1] which will be called when test_claim() [2.2]
>>>>
>>>> So, provided this method is implemented, a local host check can be 
>>>> done
>>>> based on the host's view of resources.
>>>>
>>>>
>>> Yes - the problem is there is no clear API to get all the needed 
>>> bits to
>>> do so - especially the user supplied one from image and flavors.
>>> On top of that, in current implementation we only pass a hand-wavy
>>> 'usage' blob in. This makes anyone wanting to use this in conjunction
>>> with some of the user supplied bits roll their own
>>> 'extract_data_from_instance_metadata_flavor_image' or similar which is
>>> horrible and also likely bad for performance.
>>
>> I see your concern where there is no interface for user-facing 
>> resources like flavor or image metadata.
>> I also think indeed that the big 'usage' blob is not a good choice 
>> for long-term vision.
>>
>> That said, I don't think as we say in French to throw the bath 
>> water... ie. the problem is with the RT, not the ERT (apart the 
>> mention of third-party API that you noted - I'll go to it later below)
>>>>> This is obviously a bigger concern when we want to allow users to 
>>>>> pass
>>>>> data (through image or flavor) that can affect scheduling, but 
>>>>> still a
>>>>> huge concern IMHO.
>>>> And here is where I agree with you : at the moment, ResourceTracker 
>>>> (and
>>>> consequently Extensible RT) only provides the view of the resources 
>>>> the
>>>> host is knowing (see my point above) and possibly some other resources
>>>> are missing.
>>>> So, whatever your choice of going with or without ERT, your patch [3]
>>>> still deserves it if we want not to lookup DB each time a claim goes.
>>>>
>>>>
>>>>> As I see that there are already BPs proposing to use this IMHO broken
>>>>> ERT ([4] for example), which will surely add to the proliferation of
>>>>> code that hacks around these design shortcomings in what is already a
>>>>> messy, but also crucial (for perf as well as features) bit of Nova 
>>>>> code.
>>>> Two distinct implementations of that spec (ie. instances and flavors)
>>>> have been proposed [2.3] [2.4] so reviews are welcome. If you see the
>>>> test() method, it's no-op thing for both plugins. I'm open to comments
>>>> because I have the stated problem : how can we define a limit on 
>>>> just a
>>>> counter of instances and flavors ?
>>>>
>>> Will look at these - but none of them seem to hit the issue I am
>>> complaining about, and that is that it will need to consider other
>>> request data for claims, not only data available for on instances.
>>>
>>> Also - the fact that you don't implement test() in flavor ones tells me
>>> that the implementation is indeed racy (but it is racy atm as well) and
>>> two requests can indeed race for the same host, and since no claims are
>>> done, both can succeed. This is I believe (at least in case of single
>>> flavor hosts) unlikely to happen in practice, but you get the idea.
>>
>> Agreed, these 2 patches probably require another iteration, in 
>> particular how we make sure that it won't be racy. So I need another 
>> run to think about what to test() for these 2 examples.
>> Another patch has to be done for aggregates, but it's still WIP so 
>> not mentioned here.
>>
>> Anyway, as discussed during today's meeting, these 2 patches will not 
>> be based on ERT because of the risk it goes for Juno so let's scope 
>> them out of this thread.
>>
>>
>>>>
>>>>> I propose to revert [2] ASAP since it is still fresh, and see how 
>>>>> we can
>>>>> come up with a cleaner design.
>>>>>
>>>>> Would like to hear opinions on this, before I propose the patch tho!
>>>> IMHO, I think the problem is more likely that the regular RT misses 
>>>> some
>>>> information for each host so it requires to handle it on a 
>>>> case-by-case
>>>> basis, but I don't think ERT either increases complexity or creates
>>>> another issue.
>>>>
>>> RT does not miss info about the host, but about the particular request
>>> which we have to fish out of different places like image_metadata
>>> extra_specs etc, yet - it can't really work without them. This is
>>> definitely a RT issue that is not specific to ERT.
>>
>> +1, I agree with you, that's an API issue for RT : how do we pass out 
>> user-defined metrics ?
>> I still need to figure out which kind of usecases are requiring such 
>> examples, albeit the NUMA usecase of course.
>>
>
> After a night and morning of thinking about this problem, I'm writing 
> down some ideas that could help fixing the problem (although I don't 
> think it's a long-term scenario, rather a workaround) :
>
> Considering you need the metadata of the flavor asked by user when 
> claiming a resource :
> 1. Compute Manager knows request_spec when building an instance, so it 
> could pass out request_spec as paramater (optional or not) to 
> rt.instance_claim() in the claim context
> 2. As ERT does, when claiming, it calls the ERT plugin .test() method 
> so with request_spec as extra parameter
> 3. On the ERT side, a plugin is responsible for calling (and caching) 
> all flavors metadata, or even only the key you need to compare
> 4. Scheduler knows flavors metadata thanks to ERT plugin and blueprint 
> isolate-scheduler-db so it makes decisions on the same subset as the RT
> 5. When claiming in ERT plugin test(), it extracts the flavor_id, 
> looks at the internal in-memory representation of flavors to get the 
> metadata, and calls the NUMA _test_numa_topology() method for checking 
> wrt this flavor metadata
>
>
> The idea is that so filters and claims are doing separate decisions on 
> the same source of knowledge, here being ERT.
>
> -Sylvain
>
>

Just made a draft over my thoughts here : 
https://review.openstack.org/113936

-S
>
>
>>>
>>> However, I still see several issues with the current ERT 
>>> implementation,
>>> but the one I am getting at here, and why I think we should revert it,
>>> is that we are building a 3rd party plugin API that is tightly coupled
>>> to an obviously flawed internal API (RT and Claims).
>>>
>>> We have no policy AFIK about what guarantees we provide to 3rd party
>>> plugins, but if we are going to be breaking them all the time, or in
>>> this case providing very limited usefulness - I see little value in the
>>> current implementation of ERT, and see issues with it staying, which
>>> means future work will be based on it.
>>>
>>> N.
>>
>> *This* is to me IMHO the biggest problem with ERT : if we say that we 
>> externalize an API for 3rd-party plugins, we need to make sure that 
>> everything is handled.
>>
>> That said, instead of reverting the whole patch, could we just 
>> consider to only accept changes that wouldn't require user-facing 
>> metrics ?
>> The existing VCPU implementation still sounds good to me, so we can 
>> just consider a clear communication on what is acceptable and what 
>> not (ie. a docstring in the plugin API or so., plus -2/-1 reviews if 
>> needed)
>>
>> -Sylvain
>>
>>>> Thanks,
>>>> -Sylvain
>>>>
>>>>> Thanks all,
>>>>>
>>>>> Nikola
>>>>>
>>>>> [1]
>>>>> https://blueprints.launchpad.net/nova/+spec/virt-driver-numa-placement 
>>>>>
>>>>> [2] https://review.openstack.org/#/c/109643/
>>>>> [3] https://review.openstack.org/#/c/111782/
>>>>> [4] https://review.openstack.org/#/c/89893
>>>>>
>>>>> _______________________________________________
>>>>> OpenStack-dev mailing list
>>>>> OpenStack-dev at lists.openstack.org
>>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>> [2.1]
>>>> https://github.com/openstack/nova/blob/master/nova/compute/resources/__init__.py#L75 
>>>>
>>>>
>>>> [2.2]
>>>> https://github.com/openstack/nova/blob/master/nova/compute/claims.py#L134 
>>>>
>>>> [2.3] https://review.openstack.org/112578
>>>> [2.4] https://review.openstack.org/113373
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenStack-dev mailing list
>>>> OpenStack-dev at lists.openstack.org
>>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>> _______________________________________________
>>> OpenStack-dev mailing list
>>> OpenStack-dev at lists.openstack.org
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




More information about the OpenStack-dev mailing list