[ops][nova][placement] NUMA topology vs non-NUMA workloads

Tim Bell Tim.Bell at cern.ch
Fri May 31 07:51:03 UTC 2019


Chris,

In the CERN setup, I think there are dedicated cells for NUMA-optimised configurations (but one of the engineers on the team could confirm, to be sure).

    Q: How important, in your cloud, is it to co-locate guests needing a
       NUMA topology with guests that do not? A review of documentation
       (upstream and vendor) shows differing levels of recommendation on
       this, but in many cases the recommendation is to not do it.
    
A: no co-location currently
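
[Editor's note: the dedicated-hosts pattern described above is usually built with host aggregates plus matching flavor extra specs. A rough sketch, assuming the AggregateInstanceExtraSpecsFilter is enabled in nova.conf and using hypothetical aggregate/flavor/host names:]

```
# Sketch only; names are hypothetical. Requires
# AggregateInstanceExtraSpecsFilter in nova.conf's enabled_filters.
openstack aggregate create --property pinned=true numa-hosts
openstack aggregate add host numa-hosts compute01
# NUMA/pinned flavors target the aggregate; non-NUMA flavors would carry
# pinned=false against a second aggregate holding the remaining hosts.
openstack flavor set numa.large \
  --property hw:cpu_policy=dedicated \
  --property aggregate_instance_extra_specs:pinned=true
```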

    Q: If your answer to the above is "we must be able to do that": How
       important is it that your cloud be able to pack workloads as tight
       as possible? That is: If there are two NUMA nodes and each has 2
       VCPU free, should a non-NUMA workload demanding 4 VCPUs be able
       to land there? Or would you prefer that not happen?

A: not applicable
    
    Q: If the answer to the first question is "we can get by without
       that" is it satisfactory to be able to configure some hosts as NUMA
       aware and others as not, as described in the "NUMA topology with
       RPs" spec [1]? In this set up some non-NUMA workloads could end up
       on a NUMA host (unless otherwise excluded by traits or aggregates),
       but only when there was contiguous resource available.

A: I think this would be OK

Tim
-----Original Message-----
From: Chris Dent <cdent+os at anticdent.org>
Reply-To: "openstack-discuss at lists.openstack.org" <openstack-discuss at lists.openstack.org>
Date: Thursday, 30 May 2019 at 14:57
To: "OpenStack-discuss at lists.openstack.org" <OpenStack-discuss at lists.openstack.org>
Subject: [ops][nova][placement] NUMA topology vs non-NUMA workloads

    
    This message is primarily addressed at operators, and of those,
    operators who are interested in effectively managing and mixing
    workloads that care about NUMA with workloads that do not. There are
    some questions within, after some background to explain the issue.
    
    At the PTG, Nova and Placement developers made a commitment to more
    effectively manage NUMA topologies within Nova and Placement. On the
    placement side this resulted in a spec which proposed several
    features that would enable more expressive queries when requesting
    allocation candidates (places for workloads to go), resulting in
    fewer late scheduling failures.
    
    At first there was one spec that discussed all the features. This
    morning it was split in two because one of the features is proving
    hard to resolve. Those two specs can be found at:
    
    * https://review.opendev.org/658510 (has all the original discussion)
    * https://review.opendev.org/662191 (the less contentious features split out)
    
    After much discussion, we would prefer to not do the feature
    discussed in 658510. Called 'can_split', it would allow specified
    classes of resource (notably VCPU and memory) to be split across
    multiple numa nodes when each node can only contribute a portion of
    the required resources and where those resources are modelled as
    inventory on the NUMA nodes, not the host at large.
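
    [Editor's note: for concreteness, the shape of the proposal was an
    extra query parameter on GET /allocation_candidates naming the
    splittable resource classes. The parameter name and syntax were
    still under review in the spec, so treat this sketch as an
    assumption, not settled API:]

```python
# Illustrative only: the 'can_split' parameter name and syntax come from
# the in-review spec (658510) and may change; treat them as assumptions.
from urllib.parse import urlencode

def allocation_candidates_url(resources, can_split=()):
    """Build a placement GET /allocation_candidates query string."""
    params = {
        "resources": ",".join(f"{rc}:{amt}" for rc, amt in resources.items())
    }
    if can_split:
        # Proposed: list the resource classes whose requested amounts may be
        # satisfied by summing inventory from several NUMA-node providers.
        params["can_split"] = ",".join(can_split)
    # Keep ':' and ',' readable rather than percent-encoded.
    return "/allocation_candidates?" + urlencode(params, safe=":,")

print(allocation_candidates_url({"VCPU": 4, "MEMORY_MB": 2048},
                                can_split=("VCPU", "MEMORY_MB")))
# → /allocation_candidates?resources=VCPU:4,MEMORY_MB:2048&can_split=VCPU,MEMORY_MB
```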
    
    While this is a good idea in principle it turns out (see the spec)
    to cause many issues that require changes throughout the ecosystem,
    for example enforcing pinned cpus for workloads that would normally
    float. It's possible to make the changes, but it would require
    additional contributors to join the effort, both in terms of writing
    the code and understanding the many issues.
    
    So the questions:
    
    * How important, in your cloud, is it to co-locate guests needing a
       NUMA topology with guests that do not? A review of documentation
       (upstream and vendor) shows differing levels of recommendation on
       this, but in many cases the recommendation is to not do it.
    
    * If your answer to the above is "we must be able to do that": How
       important is it that your cloud be able to pack workloads as tight
       as possible? That is: If there are two NUMA nodes and each has 2
       VCPU free, should a non-NUMA workload demanding 4 VCPUs be able
       to land there? Or would you prefer that not happen?
    
    * If the answer to the first question is "we can get by without
       that" is it satisfactory to be able to configure some hosts as NUMA
       aware and others as not, as described in the "NUMA topology with
       RPs" spec [1]? In this set up some non-NUMA workloads could end up
       on a NUMA host (unless otherwise excluded by traits or aggregates),
       but only when there was contiguous resource available.
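
    [Editor's note: the traits-based exclusion mentioned above can be
    expressed today with a custom trait on the NUMA hosts' resource
    providers and a forbidden-trait flavor extra spec. A sketch, with
    hypothetical trait and flavor names:]

```
# Hypothetical names; trait commands need the osc-placement plugin.
openstack --os-placement-api-version 1.6 trait create CUSTOM_NUMA_HOST
# Note: 'trait set' replaces the provider's whole trait list, so include
# any traits the provider should keep.
openstack resource provider trait set --trait CUSTOM_NUMA_HOST $RP_UUID
# Keep non-NUMA flavors off the NUMA hosts via a forbidden trait.
openstack flavor set generic.large \
  --property trait:CUSTOM_NUMA_HOST=forbidden
```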
    
    This latter question articulates the current plan unless responses
    to this message indicate it simply can't work or legions of
    assistance show up. Note that even if we don't do can_split, we'll
    still be enabling significant progress with the other features
    described in the second spec [2].
    
    Thanks for your help in moving us in the right direction.
    
    [1] https://review.opendev.org/552924
    [2] https://review.opendev.org/662191
    -- 
    Chris Dent                       ٩◔̯◔۶           https://anticdent.org/
    freenode: cdent


