[openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata

Jay Pipes jaypipes at gmail.com
Wed Jan 17 13:22:29 UTC 2018


On 01/16/2018 08:19 PM, Zhenyu Zheng wrote:
> Thanks for the info, so it seems we are not going to implement aggregate 
> overcommit ratio in placement at least in the near future?

As @edleafe alluded to, we will not be adding functionality to the 
placement service to associate an overcommit ratio with an aggregate. 
This was/is buggy functionality that we do not wish to bring forward 
into the placement modeling system.

Reasons the current functionality is poorly architected and buggy 
(mentioned in @melwitt's footnote):

1) If a nova-compute service's CONF.cpu_allocation_ratio is different 
from the host aggregate's cpu_allocation_ratio metadata value, which 
value should be considered by the AggregateCoreFilter?

2) If a nova-compute service is associated with multiple host 
aggregates, and those aggregates have different cpu_allocation_ratio 
metadata values, which one should be used by the 
AggregateCoreFilter?
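
To make the two questions above concrete, here is a minimal sketch in Python. The host ratio, aggregate names and metadata values are made up for illustration, and the helper functions are hypothetical; this is not Nova's actual filter code. It only shows that a single host can end up with several competing ratios and no obvious rule for picking one.

```python
# Hypothetical data, for illustration only; not Nova's actual objects.
host_conf_ratio = 16.0  # stands in for CONF.cpu_allocation_ratio on one host

# Metadata of the host aggregates that this one host belongs to.
# Values are strings here on the assumption that metadata is stored as text.
aggregate_metadata = {
    "agg-general": {"cpu_allocation_ratio": "4.0"},
    "agg-batch": {"cpu_allocation_ratio": "8.0"},
}


def candidate_ratios(conf_ratio, aggregates):
    """Collect every distinct ratio that could plausibly apply to the host."""
    ratios = {conf_ratio}
    for metadata in aggregates.values():
        value = metadata.get("cpu_allocation_ratio")
        if value is not None:
            ratios.add(float(value))
    return sorted(ratios)


def is_ambiguous(conf_ratio, aggregates):
    """True when more than one distinct ratio competes for the same host."""
    return len(candidate_ratios(conf_ratio, aggregates)) > 1


print(candidate_ratios(host_conf_ratio, aggregate_metadata))
print(is_ambiguous(host_conf_ratio, aggregate_metadata))
```

Any rule that picks a winner here (minimum, maximum, "aggregate beats CONF") is a policy decision that the operator has to know about, which is the root of the problem.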

The bottom line for me is that the AggregateCoreFilter has been used as 
a crutch to solve a **configuration management problem**.

The configuration management system (Puppet, etc.) should be setting 
the nova-compute service's CONF.cpu_allocation_ratio option *correctly*. 
Having the admin set the HostAggregate metadata cpu_allocation_ratio 
value instead is error-prone for the reasons listed above.
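
As a rough sketch of what "let configuration management own the ratio" could look like, the Python below renders a per-host nova.conf fragment from a single inventory. The host names, the HOST_RATIOS table and the render_nova_conf_fragment helper are all hypothetical; a real deployment would do this in Puppet, Ansible or similar rather than in an ad hoc script. The only assumption taken from Nova itself is that cpu_allocation_ratio is an option in the [DEFAULT] section of nova.conf.

```python
import configparser
import io

# Hypothetical inventory: the ratio each host should get, decided once,
# in the configuration management layer. Host names are made up.
HOST_RATIOS = {
    "compute-01": {"cpu_allocation_ratio": 16.0},
    "compute-02": {"cpu_allocation_ratio": 4.0},
}


def render_nova_conf_fragment(hostname):
    """Return the [DEFAULT] fragment for one host as INI text."""
    parser = configparser.ConfigParser()
    for option, value in HOST_RATIOS[hostname].items():
        parser["DEFAULT"][option] = str(value)
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


print(render_nova_conf_fragment("compute-01"))
```

With this shape there is exactly one source of truth per host, so neither of the two ambiguities listed earlier can arise.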

Incidentally, this same design flaw is the reason that availability 
zones are so poorly defined in Nova. There is actually no such thing as 
an availability zone in Nova. Instead, an AZ is merely a metadata tag 
(or a CONF option! :( ) that may or may not exist against a host 
aggregate. There's lots of spaghetti in Nova due to the decision to use 
host aggregate metadata for availability zone information, which should 
have always been the domain of a **configuration management system** to 
set. [*]

In the Placement service, we have the concept of aggregates, too. 
However, in Placement, an aggregate (note: not "host aggregate") is 
merely a grouping mechanism for resource providers. Placement aggregates 
do not have any attributes themselves -- they merely represent the 
relationship between resource providers. Placement aggregates suffer 
from neither of the above listed design flaws because they are not 
buckets for metadata.
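
A small Python sketch of that distinction follows. The class, provider names and numbers are hypothetical, and this is not the Placement service's code. It assumes the commonly documented capacity calculation of (total - reserved) * allocation_ratio, with the ratio stored on each resource provider's inventory and the aggregate holding nothing but membership.

```python
from dataclasses import dataclass


@dataclass
class Inventory:
    """Hypothetical per-provider inventory for one resource class (VCPU)."""
    total: int
    reserved: int
    allocation_ratio: float
    used: int = 0

    def capacity(self):
        # Assumed formula: overcommit is applied per provider, not per group.
        return int((self.total - self.reserved) * self.allocation_ratio)

    def can_fit(self, requested):
        return self.used + requested <= self.capacity()


# Provider name -> VCPU inventory. All values are made up.
providers = {
    "compute-01": Inventory(total=8, reserved=0, allocation_ratio=16.0, used=120),
    "compute-02": Inventory(total=8, reserved=0, allocation_ratio=4.0, used=30),
}

# An aggregate is only a set of members; it carries no metadata at all.
aggregates = {"agg-a": {"compute-01", "compute-02"}}


def providers_with_capacity(aggregate, requested_vcpus):
    """Members of the aggregate that can still fit the request."""
    return sorted(
        name
        for name in aggregates[aggregate]
        if providers[name].can_fit(requested_vcpus)
    )


print(providers_with_capacity("agg-a", 4))
```

Because the ratio has a single home on the provider, two providers in the same aggregate can carry different ratios without any conflict.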

ok </rant>.

Best,
-jay

[*] Note the assumption on line 97 here:

https://github.com/openstack/nova/blob/master/nova/availability_zones.py#L96-L100

> On Wed, Jan 17, 2018 at 5:24 AM, melanie witt <melwittt at gmail.com> wrote:
> 
>     Hello Stackers,
> 
>     This is a heads up to any of you using the AggregateCoreFilter,
>     AggregateRamFilter, and/or AggregateDiskFilter in the filter
>     scheduler. These filters have effectively allowed operators to set
>     overcommit ratios per aggregate rather than per compute node in <=
>     Newton.
> 
>     Beginning in Ocata, there is a behavior change where aggregate-based
>     overcommit ratios will no longer be honored during scheduling.
>     Instead, overcommit values must be set on a per compute node basis
>     in nova.conf.
> 
>     Details: as of Ocata, instead of considering all compute nodes at
>     the start of scheduler filtering, an optimization has been added to
>     query resource capacity from placement and prune the compute node
>     list with the result *before* any filters are applied. Placement
>     tracks resource capacity and usage and does *not* track aggregate
>     metadata [1]. Because of this, placement cannot consider
>     aggregate-based overcommit and will exclude compute nodes that do
>     not have capacity based on per compute node overcommit.
> 
>     How to prepare: if you have been relying on per aggregate
>     overcommit, during your upgrade to Ocata, you must change to using
>     per compute node overcommit ratios in order for your scheduling
>     behavior to stay consistent. Otherwise, you may notice increased
>     NoValidHost scheduling failures as the aggregate-based overcommit is
>     no longer being considered. You can safely remove the
>     AggregateCoreFilter, AggregateRamFilter, and AggregateDiskFilter
>     from your enabled_filters and you do not need to replace them with
>     any other core/ram/disk filters. The placement query takes care of
>     the core/ram/disk filtering instead, so CoreFilter, RamFilter, and
>     DiskFilter are redundant.
> 
>     Thanks,
>     -melanie
> 
>     [1] Placement has been a clean slate for resource management. Prior
>     to placement, there were conflicts between the different methods for
>     setting overcommit ratios that were never addressed, such as, "which
>     value to take if a compute node has overcommit set AND the aggregate
>     has it set? Which takes precedence?" And, "if a compute node is in
>     more than one aggregate, which overcommit value should be taken?"
>     So, the ambiguities were not something that was desirable to bring
>     forward into placement.
> 
>     __________________________________________________________________________
>     OpenStack Development Mailing List (not for usage questions)
>     Unsubscribe:
>     OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>     <http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe>
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>     <http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev>
> 


