[openstack-dev] [nova] Boston Forum session recap - claims in the scheduler (or conductor)

Sylvain Bauza sbauza at redhat.com
Fri May 19 10:53:41 UTC 2017



On 19/05/2017 12:19, John Garbutt wrote:
> On 19 May 2017 at 10:03, Sylvain Bauza <sbauza at redhat.com> wrote:
>>
>>
>> On 19/05/2017 10:02, Sylvain Bauza wrote:
>>>
>>>
>>> On 19/05/2017 02:55, Matt Riedemann wrote:
>>>> The etherpad for this session is here [1]. The goal for this session was
>>>> to inform operators and get feedback on the plan for what we're doing
>>>> with moving claims from the computes to the control layer (scheduler or
>>>> conductor).
>>>>
>>>> We mostly talked about retries, which also came up in the cells v2
>>>> session that Dan Smith led [2] and will recap later.
>>>>
>>>> Without getting into too many details, in the cells v2 session we came
>>>> to a compromise on build retries and said that we could pass hosts down
>>>> to the cell so that the cell-level conductor could retry if needed (even
>>>> though we expect doing claims at the top will fix the majority of
>>>> reasons you'd have a reschedule in the first place).
>>>>
>>>
>>> And during that session, we said that since cell-local conductors (when
>>> there is a reschedule) can't upcall the global (for all cells)
>>> scheduler, we agreed to have the conductor call the Placement API for
>>> allocations.
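(For context, "doing the claim in conductor" basically means writing the
instance's allocations to Placement before casting to the cell. Very
roughly, and only as a sketch with made-up identifiers rather than the
actual Nova code, the REST shape of such a claim is:

    # Sketch only: claiming resources by writing allocations to Placement.
    # Endpoint URL, token and resource amounts are made up; in Nova this
    # goes through the scheduler report client, and the exact payload
    # format depends on the Placement microversion.
    import requests

    PLACEMENT = "http://placement.example.com"   # hypothetical endpoint
    TOKEN = "hypothetical-auth-token"
    instance_uuid = "11111111-2222-3333-4444-555555555555"
    compute_rp_uuid = "aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee"

    payload = {
        "allocations": [{
            "resource_provider": {"uuid": compute_rp_uuid},
            "resources": {"VCPU": 2, "MEMORY_MB": 4096, "DISK_GB": 20},
        }]
    }
    resp = requests.put(
        "%s/allocations/%s" % (PLACEMENT, instance_uuid),
        json=payload, headers={"X-Auth-Token": TOKEN})
    claimed = (resp.status_code == 204)   # 204 means the claim succeeded
)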
>>>
>>>
>>>> During the claims in the scheduler session, a new wrinkle came up,
>>>> which is that the hosts the scheduler returns to the top-level
>>>> conductor may be in different cells. So if we have two cells, A and B,
>>>> with hosts x and y in cell A and host z in cell B, we can't send z to A
>>>> for retries, or x or y to B for retries. So we need some kind of
>>>> post-filter/weigher step that groups hosts by cell so they can be sent
>>>> to the appropriate cell for retries as necessary.
>>>>
>>>
>>> That's already proposed for reviews in
>>> https://review.openstack.org/#/c/465175/
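For illustration, the grouping idea is roughly of this shape (a sketch with
hypothetical attribute names, not the actual code under review):

    # Sketch: group the scheduler's selected hosts by cell so that each
    # cell-level conductor only ever receives same-cell alternates for
    # retries. host.cell_uuid is a hypothetical attribute for illustration.
    import collections

    def group_hosts_by_cell(selected_hosts):
        by_cell = collections.defaultdict(list)
        for host in selected_hosts:
            by_cell[host.cell_uuid].append(host)
        return dict(by_cell)

    # With hosts x, y in cell A and z in cell B this yields
    # {A: [x, y], B: [z]}, so z is never handed to cell A as a retry
    # target and x/y are never handed to cell B.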
>>>
>>>
>>>> There was also some side discussion asking if we somehow regressed
>>>> pack-first strategies by using Placement in Ocata. John Garbutt and Dan
>>>> Smith have the context on this (I think), so I'm hoping they can
>>>> clarify whether we really need to fix something in Ocata at this point,
>>>> or whether this is more a case of closing a loophole.
>>>>
>>>
>>> The problem is that the scheduler isn't cell-aware when trying to find
>>> a destination for an instance; it just uses weights for packing.
>>>
>>> So, for example, say I have N hosts and 2 cells: the highest-weighted
>>> host could be in cell1 while the second could be in cell2. Then, even if
>>> the operator configures the weighers for packing, a RequestSpec with
>>> num_instances=2 could push one instance into cell1 and the other into
>>> cell2.
>>>
>>> From a scheduler point of view, I think we could add a CellWeigher that
>>> would help pack instances within the same cell. Anyway, that's not
>>> related to the claims series, so we could hopefully backport it to
>>> Ocata.
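To make that concrete, here is a very naive sketch of what such a weigher
could look like; the cell_uuid attribute on the host state and the "favour
the cell with the most candidate hosts" heuristic are only assumptions for
illustration, not an actual design:

    # Naive CellWeigher sketch: give hosts in the cell with the most
    # candidate hosts a higher weight, so a multi-create request tends to
    # stay within one cell. Not real Nova code.
    import collections

    from nova.scheduler import weights

    class CellWeigher(weights.BaseHostWeigher):

        def weigh_objects(self, weighed_obj_list, weight_properties):
            # obj.obj is the HostState; cell_uuid is assumed to be set on it.
            per_cell = collections.Counter(
                obj.obj.cell_uuid for obj in weighed_obj_list)
            return [per_cell[obj.obj.cell_uuid] for obj in weighed_obj_list]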
>>>
>>
>> Melanie actually made a good point about the current logic based on the
>> `host_subset_size` config option. If you leave it defaulted to 1, in
>> theory every instance coming through the scheduler gets a list of hosts
>> sorted by weight and only picks the first one (i.e. packing all the
>> instances onto the same host), which is good for packing (except of
>> course for a user request that fills the whole host, where spreading
>> across multiple hosts would be better).
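For anyone not familiar with the option, the fuzzy select it controls is
roughly this (simplified, not a verbatim copy of the filter scheduler):

    # Simplified host_subset_size logic: sort best-first by weight, then
    # pick randomly among the top N to reduce collisions between
    # concurrent scheduler workers racing for the same "best" host.
    import random

    def pick_host(weighed_hosts, host_subset_size=1):
        # weighed_hosts is assumed to already be sorted best-first.
        subset = weighed_hosts[:max(1, host_subset_size)]
        return random.choice(subset)

    # host_subset_size=1 keeps strict packing (always the single best
    # host); a larger value trades a bit of packing for fewer races.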
>>
>> So, while I began deprecating that option because I thought the race
>> condition would be fixed by conductor claims, I think we should keep it
>> for the time being until we clearly identify whether it's still necessary.
>>
>> Everything I said above remains valid, though. In a world where the top
>> 2 weighed hosts are returned, we could send instances from the same user
>> request to different cells, but that narrows the problem down to
>> multi-instance boots, which is far less impactful.
> 
> FWIW, I think we need to keep this.
> 
> If you have *lots* of contention when picking your host, increasing
> host_subset_size should help reduce that contention (and maybe help
> increase the throughput). I haven't written a simulator to test it
> out, but it feels like we will still need to keep the fuzzy select.
> That might just be a different way of saying the same thing Mel was
> saying, not sure.
> 

Yup, agreed. Thanks to Mel's point, I'm providing a new revision that no
longer removes this conf opt.

Melanie, very good point!

-Sylvain

> Thanks,
> johnthetubaguy
> 


