[openstack-dev] [nova] Stein PTG summary
jaypipes at gmail.com
Thu Sep 27 18:16:51 UTC 2018
On 09/27/2018 11:15 AM, Eric Fried wrote:
> On 09/27/2018 07:37 AM, Matt Riedemann wrote:
>> On 9/27/2018 5:23 AM, Sylvain Bauza wrote:
>>> On Thu, Sep 27, 2018 at 2:46 AM Matt Riedemann <mriedemos at gmail.com
>>> <mailto:mriedemos at gmail.com>> wrote:
>>> On 9/26/2018 5:30 PM, Sylvain Bauza wrote:
>>> > So, during this day, we also discussed NUMA affinity, and we agreed
>>> > that we could possibly use nested resource providers for NUMA cells
>>> > in Stein, but given we don't yet have a specific Placement API
>>> > query, NUMA affinity should still use the NUMATopologyFilter.
>>> > That said, when looking at how to use this filter for vGPUs, it
>>> > looks to me like I'd need to provide a new version of the NUMACell
>>> > object and modify the virt.hardware module. Are we also accepting
>>> > this (given it's a temporary change), or should we wait for the
>>> > Placement API support?
>>> > Folks, what are your thoughts?
>>> I'm pretty sure we've said several times already that modeling NUMA
>>> in Placement is not something for which we're holding up the
>>> extraction.
>>> It's not an extraction question. It's just about knowing whether the
>>> Nova folks would accept us modifying some o.vo object and module for
>>> a temporary time until the Placement API has some new query
>>> parameter. Whether Placement is extracted or not isn't really the
>>> problem; it's more about the time it will take for this query
>>> parameter ("numbered request groups to be in the same subtree") to
>>> be implemented in the Placement API.
>>> The real problem we have with vGPUs is that if we don't have NUMA
>>> affinity, performance would be around 10% worse for vGPUs (if the
>>> pGPU isn't on the same NUMA cell as the pCPU). Not sure large
>>> operators would accept that :(
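For context, the kind of allocation-candidates query being discussed could be sketched roughly as follows. Note this is purely illustrative: the `same_subtree` parameter name and the resource classes chosen here are assumptions, since no such query parameter existed in the Placement API at the time of this thread.

```python
from urllib.parse import urlencode

# Hypothetical granular request: group 1 (guest CPUs/RAM on one NUMA
# cell) and group 2 (a vGPU) must come from providers in the same
# subtree of the provider tree. 'same_subtree' is an illustrative,
# not-yet-existing parameter.
params = {
    "resources1": "VCPU:8,MEMORY_MB:16384",
    "resources2": "VGPU:1",
    "same_subtree": "1,2",
    "group_policy": "isolate",
}
query = "GET /allocation_candidates?" + urlencode(params)
print(query)
```

Until something like this lands server-side, the affinity decision has to stay in the NUMATopologyFilter on the nova side, which is exactly the trade-off discussed above.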
>> I don't know how close we are to having whatever we need for modeling
>> NUMA in the placement API, but I'll go out on a limb and assume we're
>> not close.
> True story. We've been talking about ways to do this since (at least)
> the Queens PTG, but haven't even landed on a decent design, let alone
> talked about getting it specced, prioritized, and implemented. Since
> full NRP support was going to be a prerequisite in any case, and our
> Stein plate is full, Train is the earliest we could reasonably expect to
> get the placement support going, let alone the nova side. So yeah...
>> Given that, if we have to do something within nova for NUMA affinity
>> for vGPUs via the NUMATopologyFilter, then I'd be OK with that since
>> it's short term like you said (although our "short term" workarounds
>> tend to last for many releases). Anyone that cares about NUMA today
>> already has to enable the scheduler filter anyway.
> +1 to this ^
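For reference, the filter Matt refers to is enabled in nova.conf scheduler configuration along these lines (the filter list shown is a minimal illustrative example, not a recommended production set):

```ini
[filter_scheduler]
# NUMATopologyFilter must appear in the enabled filter list for
# NUMA-aware scheduling; the other filters are just a common baseline.
enabled_filters = ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,NUMATopologyFilter
```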
Or, I don't know, maybe don't do anything and deal with the (maybe) 10%
performance impact from the cross-NUMA main memory <-> CPU hit for
post-processing of already parallel-processed GPU data.
In other words, like I've mentioned in numerous specs and in person, I
really don't think this is a major problem and is mostly something we're
making a big deal about for no real reason.