[nova] [placement] Mempage fun

Sean Mooney smooney at redhat.com
Mon Jan 7 19:19:46 UTC 2019


On Mon, 2019-01-07 at 17:32 +0000, Stephen Finucane wrote:
> We've been looking at a patch that landed some months ago and have
> spotted some issues:
> 
> https://review.openstack.org/#/c/532168
> 
> In summary, that patch is intended to make the memory check for
> instances memory pagesize aware. The logic it introduces looks
> something like this:
> 
>    If the instance requests a specific pagesize
>       (#1) Check if each host cell can provide enough memory of the
>       pagesize requested for each instance cell
>    Otherwise
>       If the host has hugepages
>          (#2) Check if each host cell can provide enough memory of the
>          smallest pagesize available on the host for each instance cell
>       Otherwise
>          (#3) Check if each host cell can provide enough memory for
>          each instance cell, ignoring pagesizes
> 
> This also has the side-effect of allowing instances with hugepages and
> instances with a NUMA topology but no hugepages to co-exist on the same
> host, because the latter will now be aware of hugepages and won't
> consume them. However, there are a couple of issues with this:
> 
>    1. It breaks overcommit for instances without pagesize request
>       running on hosts with different pagesizes. This is because we don't
>       allow overcommit for hugepages, but case (#2) above means we are now
>       reusing the same functions previously used for actual hugepage
>       checks to check for regular 4k pages
>    2. It doesn't fix the issue when non-NUMA instances exist on the same
>       host as NUMA instances with hugepages. The non-NUMA instances don't
>       run through any of the code above, meaning they're still not
>       pagesize aware
> 
> We could probably fix issue (1) by modifying those hugepage functions
> we're using to allow overcommit via a flag that we pass for case (#2).
> We can mitigate issue (2) by advising operators to split hosts into
> aggregates for 'hw:mem_page_size' set or unset (in addition to
> 'hw:cpu_policy' set to dedicated or shared/unset). I need to check but
> I think this may be the case in some docs (sean-k-mooney said Intel
> used to do this. I don't know about Red Hat's docs or upstream). In
> addition, we did actually called that out in the original spec:
> 
> 
> https://specs.openstack.org/openstack/nova-specs/specs/juno/approved/virt-driver-large-pages.html#other-deployer-impact
> 
> However, if we're doing that for non-NUMA instances, one would have to
> question why the patch is necessary/acceptable for NUMA instances. For
> what it's worth, a longer-term fix would be to start tracking hugepages in a
> non-NUMA aware way too but that's a lot more work and doesn't fix the
> issue now.
> 
> As such, my question is this: should we look at fixing issue (1) and
> documenting issue (2), or should we revert the thing wholesale until we
> work on a solution that could e.g. let us track hugepages via placement
> and resolve issue (2) too.
For what it's worth, the review in question (https://review.openstack.org/#/c/532168)
actually attempts to implement option 1 from https://bugs.launchpad.net/nova/+bug/1439247
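
To restate the logic Stephen summarised above in rough Python (an
illustrative sketch only, not the actual nova code; the data model here
is made up):

    # illustrative sketch only -- not the real nova code. a host cell is
    # modelled as a dict of {pagesize_kb: free_pages}; memory is in MiB.

    def has_pages(host_pages, pagesize_kb, memory_mb):
        # can this cell supply memory_mb using only pagesize_kb pages?
        return host_pages.get(pagesize_kb, 0) * pagesize_kb >= memory_mb * 1024

    def cell_fits(host_pages, host_memory_mb, requested_pagesize_kb,
                  instance_memory_mb, ram_allocation_ratio):
        if requested_pagesize_kb:
            # case (#1): explicit pagesize request, no overcommit
            return has_pages(host_pages, requested_pagesize_kb,
                             instance_memory_mb)
        if any(ps > 4 for ps in host_pages):
            # case (#2): host has hugepages, so the smallest pagesize is
            # checked via the same no-overcommit code path used for real
            # hugepage checks -- this is what breaks issue 1, since 4K
            # pages should still honour ram_allocation_ratio
            return has_pages(host_pages, min(host_pages),
                             instance_memory_mb)
        # case (#3): plain memory check, overcommit applies
        return host_memory_mb * ram_allocation_ratio >= instance_memory_mb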

The first time I tried to fix issue 2 was with my proposal for the
AggregateTypeExtraSpecsAffinityFilter:
https://review.openstack.org/#/c/183876/4/specs/liberty/approved/aggregate-flavor-extra-spec-affinity-filter.rst
which became the out-of-tree AggregateInstanceTypeFilter after 3 cycles of trying to get it upstream:

https://github.com/openstack/nfv-filters/blob/master/nfv_filters/nova/scheduler/filters/aggregate_instance_type_filter.py

The AggregateTypeExtraSpecsAffinityFilter (later AggregateInstanceTypeFilter) was a filter we developed
specifically to enforce separation of instances that use explicit memory pages from those that do not,
to cater for DPDK's hugepage requirements, and to enforce separation of pinned and unpinned guests.
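
Roughly, the separation it enforced looks like this (a simplified
sketch assuming two-way matching semantics; the real filter linked
above is more involved):

    # simplified sketch of the two-way match the filter enforced; the
    # names and exact semantics are approximations, not the real code.

    def host_passes(aggregate_metadata, flavor_extra_specs):
        # a host whose aggregate declares a key only accepts flavors
        # that set the same key to the same value, and vice versa, so
        # hugepage hosts and non-hugepage guests never mix.
        keys = set(aggregate_metadata) | set(flavor_extra_specs)
        return all(aggregate_metadata.get(k) == flavor_extra_specs.get(k)
                   for k in keys)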

We finally got approval to publish a blog on the topic in January of 2017:
https://software.intel.com/en-us/blogs/2017/01/04/filter-by-host-aggregate-metadata-or-by-image-extra-specs
based on the content in the second version of the spec:
https://review.openstack.org/#/c/314097/12/specs/newton/approved/aggregate-instance-type-filter.rst
This filter was used in a semi-production 4G trial deployment, in addition to lab use with some partners
I was working with at the time, but we decided to stop supporting it on the assumption that placement
would solve it :)

A lot of the capabilities of the out-of-tree filter could likely be achieved with some extensions to
placement, but they are not supported by placement today. I have raised the topic in the past of
required traits on a resource provider that need to be present in the request for an allocation to be
made against that resource provider. Similarly, I have raised the idea of forbidden traits on a
resource provider that eliminate the resource provider as a candidate if they are present in the request.

This is an inverse relationship to the required and forbidden traits we have today, but it is what the
filter we implemented in 2015 did before placement, using aggregate metadata. I think there is a
generalised problem statement here that would be a legitimate use case for placement, beyond simply
tracking hugepages (or, preferably, memory of all page sizes) in placement.
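
To make that concrete (purely hypothetical semantics, not something
placement supports today):

    # hypothetical semantics, not implemented by placement today: a
    # provider declares traits that must (or must not) appear in the
    # request itself for the provider to remain a candidate -- the
    # inverse of today's required/forbidden traits on the request side.

    def provider_is_candidate(provider_required, provider_forbidden,
                              request_traits):
        return (provider_required <= request_traits and
                not (provider_forbidden & request_traits))

    # e.g. a host tagged provider_required={'CUSTOM_HUGEPAGE_HOST'} would
    # only be a candidate for requests that explicitly ask for that trait.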

I would be in favor of fixing oversubscription (issue 1) this cycle, as that is clearly a bug, with a
short-term solution which we could backport, and then exploring addressing both issues 1 and 2 with
placement, or by re-proposing the out-of-tree filter if placement deemed it out of scope. That said, I
too am interested to hear what others think, especially the placement folks. You can just use host
aggregates and the existing filters to address issue 2, but it is really easy to get wrong and it is
not very well documented that it is required.
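
For reference, the kind of split I mean with the existing
AggregateInstanceExtraSpecsFilter looks something like this
(illustrative aggregate names and metadata keys):

    aggregate hugepage-hosts:    metadata pagesize=large
        flavors: hw:mem_page_size=large
                 aggregate_instance_extra_specs:pagesize=large
    aggregate small-page-hosts:  metadata pagesize=small
        flavors: no hw:mem_page_size
                 aggregate_instance_extra_specs:pagesize=small

Note that every flavor needs the aggregate_instance_extra_specs key: a
flavor without it will still schedule to either aggregate, which is one
of the ways this is easy to get wrong.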

> 
> Thoughts?
> Stephen
> 
> 



