[nova] Mempage fun

Balázs Gibizer balazs.gibizer at ericsson.com
Tue Jan 8 08:54:39 UTC 2019



On Mon, Jan 7, 2019 at 6:32 PM, Stephen Finucane <sfinucan at redhat.com> 
wrote:
> We've been looking at a patch that landed some months ago and have
> spotted some issues:
> 
> https://review.openstack.org/#/c/532168
> 
> In summary, that patch is intended to make the memory check for
> instances memory pagesize aware. The logic it introduces looks
> something like this:
> 
>    If the instance requests a specific pagesize
>       (#1) Check if each host cell can provide enough memory of the
>       pagesize requested for each instance cell
>    Otherwise
>       If the host has hugepages
>          (#2) Check if each host cell can provide enough memory of the
>          smallest pagesize available on the host for each instance 
> cell
>       Otherwise
>          (#3) Check if each host cell can provide enough memory for
>          each instance cell, ignoring pagesizes
> 
> This also has the side-effect of allowing instances with hugepages and
> instances with a NUMA topology but no hugepages to co-exist on the
> same host, because the latter will now be aware of hugepages and won't
> consume them. However, there are a couple of issues with this:
> 
>    1. It breaks overcommit for instances without a pagesize request
>       running on hosts with different pagesizes. This is because we
>       don't allow overcommit for hugepages, but case (#2) above means
>       we are now reusing the same functions previously used for actual
>       hugepage checks to check for regular 4k pages.
>    2. It doesn't fix the issue when non-NUMA instances exist on the
>       same host as NUMA instances with hugepages. The non-NUMA
>       instances don't run through any of the code above, meaning
>       they're still not pagesize aware.
> 
> We could probably fix issue (1) by modifying those hugepage functions
> we're using to allow overcommit via a flag that we pass for case (#2).
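Such a flag-based tweak might look roughly like this (purely a sketch of the idea; the real Nova functions, signatures, and the 1.5 default ratio are assumptions here):

```python
# Sketch of the proposed fix for issue (1): the hugepage-style check
# gains a flag so that, when it is really validating ordinary small
# pages in case (#2), the usual RAM overcommit ratio still applies.
# Names and the default ratio are illustrative assumptions.

def memory_fits(free_kib: int, requested_kib: int,
                allow_overcommit: bool,
                ram_allocation_ratio: float = 1.5) -> bool:
    if allow_overcommit:
        # Case (#2) with plain small pages: honour the normal RAM
        # overcommit ratio, as the pre-patch non-pagesize path did.
        return free_kib * ram_allocation_ratio >= requested_kib
    # Genuine hugepage request (case #1): never overcommit hugepages.
    return free_kib >= requested_kib
```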
> We can mitigate issue (2) by advising operators to split hosts into
> aggregates for 'hw:mem_page_size' set or unset (in addition to
> 'hw:cpu_policy' set to dedicated or shared/unset). I need to check but
> I think this may be the case in some docs (sean-k-mooney said Intel
> used to do this. I don't know about Red Hat's docs or upstream). In
> addition, we actually did call that out in the original spec:
> 
> https://specs.openstack.org/openstack/nova-specs/specs/juno/approved/virt-driver-large-pages.html#other-deployer-impact
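The aggregate split could be set up along these lines (the aggregate name and metadata key here are illustrative choices, not a documented convention):

```shell
# Group the hosts reserved for explicit-pagesize instances
openstack aggregate create hugepage-hosts
openstack aggregate set --property pinned-pages=true hugepage-hosts
openstack aggregate add host hugepage-hosts compute0

# Steer matching flavors there via AggregateInstanceExtraSpecsFilter,
# alongside the actual pagesize request
openstack flavor set \
    --property aggregate_instance_extra_specs:pinned-pages=true \
    --property hw:mem_page_size=large m1.hugepage
```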
> 
> However, if we're doing that for non-NUMA instances, one would have to
> question why the patch is necessary/acceptable for NUMA instances. For
> what it's worth, a longer-term fix would be to start tracking
> hugepages in a non-NUMA-aware way too, but that's a lot more work and
> doesn't fix the issue now.
> 
> As such, my question is this: should we look at fixing issue (1) and
> documenting issue (2), or should we revert the thing wholesale while
> we work on a solution that could, e.g., let us track hugepages via
> placement and resolve issue (2) too?

If you feel that fixing (1) is pretty simple, then I suggest doing that 
and documenting the limitation of (2) while we think about a proper 
solution.

gibi

> 
> Thoughts?
> Stephen
> 
> 



