[openstack-dev] [nova] heads up to users of Aggregate[Core|Ram|Disk]Filter: behavior change in >= Ocata

Jay Pipes jaypipes at gmail.com
Thu Jan 18 20:49:09 UTC 2018


On 01/18/2018 03:06 PM, Logan V. wrote:
> We have used aggregate based scheduler filters since deploying our
> cloud in Kilo. This explains the unpredictable scheduling we have seen
> since upgrading to Ocata. Before this post, was there some indication
> I missed that these filters can no longer be used? Even now reading
> the Ocata release notes[1] or checking the filter scheduler docs[2] I
> cannot find any indication that AggregateCoreFilter,
> AggregateRamFilter, and AggregateDiskFilter are useless in Ocata+. If
> I missed something I'd like to know where it is so I can avoid that
> mistake again!

We failed to provide a release note about it. :( That's our fault and I 
apologize.

> Just to make sure I understand correctly, given this list of filters
> we used in Newton:
> AggregateInstanceExtraSpecsFilter,AggregateNumInstancesFilter,AggregateCoreFilter,AggregateRamFilter,RetryFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter
> 
> I should remove AggregateCoreFilter, AggregateRamFilter, and RamFilter
> from the list because they are no longer useful, and replace them with
> the appropriate nova.conf settings instead, correct?

Yes, correct.
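
For example, a minimal sketch of what that change could look like (option
names as of Ocata; the ratio values are purely illustrative -- use whatever
you currently carry on your aggregates), set on each compute node and on
the scheduler host respectively:

    # /etc/nova/nova.conf on each compute node: per-compute overcommit,
    # replacing the old aggregate metadata values (illustrative numbers)
    [DEFAULT]
    cpu_allocation_ratio = 16.0
    ram_allocation_ratio = 1.5
    disk_allocation_ratio = 1.0

    # /etc/nova/nova.conf on the scheduler host: trimmed filter list,
    # without AggregateCoreFilter, AggregateRamFilter or RamFilter --
    # placement already does the core/ram/disk capacity check
    [filter_scheduler]
    enabled_filters = AggregateInstanceExtraSpecsFilter,AggregateNumInstancesFilter,RetryFilter,AvailabilityZoneFilter,ComputeFilter,ComputeCapabilitiesFilter,ImagePropertiesFilter,ServerGroupAntiAffinityFilter,ServerGroupAffinityFilter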

> What about AggregateInstanceExtraSpecsFilter and
> AggregateNumInstancesFilter? Do these still work?

Yes.

Best,
-jay

> Thanks
> Logan
> 
> [1] https://docs.openstack.org/releasenotes/nova/ocata.html
> [2] https://docs.openstack.org/ocata/config-reference/compute/schedulers.html
> 
> On Wed, Jan 17, 2018 at 7:57 AM, Sylvain Bauza <sbauza at redhat.com> wrote:
>>
>>
>> On Wed, Jan 17, 2018 at 2:22 PM, Jay Pipes <jaypipes at gmail.com> wrote:
>>>
>>> On 01/16/2018 08:19 PM, Zhenyu Zheng wrote:
>>>>
>>>> Thanks for the info. So it seems we are not going to implement aggregate-based
>>>> overcommit ratios in placement, at least in the near future?
>>>
>>>
>>> As @edleafe alluded to, we will not be adding functionality to the
>>> placement service to associate an overcommit ratio with an aggregate. This
>>> was/is buggy functionality that we do not wish to bring forward into the
>>> placement modeling system.
>>>
>>> Reasons the current functionality is poorly architected and buggy
>>> (mentioned in @melwitt's footnote):
>>>
>>> 1) If a nova-compute service's CONF.cpu_allocation_ratio is different from
>>> the host aggregate's cpu_allocation_ratio metadata value, which value should
>>> be considered by the AggregateCoreFilter filter?
>>>
>>> 2) If a nova-compute service is associated with multiple host aggregates,
>>> and those aggregates contain different values for their cpu_allocation_ratio
>>> metadata value, which one should be used by the AggregateCoreFilter?
>>>
>>> The bottom line for me is that the AggregateCoreFilter has been used as a
>>> crutch to solve a **configuration management problem**.
>>>
>>> Instead of the configuration management system (Puppet, etc) setting
>>> nova-compute service CONF.cpu_allocation_ratio options *correctly*, having
>>> the admin set the HostAggregate metadata cpu_allocation_ratio value is
>>> error-prone for the reasons listed above.
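
As a concrete illustration of flaw 2 above: nothing stops an operator from
doing the following (aggregate and host names are made up), after which the
"winning" ratio for the shared host is simply undefined:

    openstack aggregate create agg-general
    openstack aggregate create agg-dense
    openstack aggregate set --property cpu_allocation_ratio=4.0 agg-general
    openstack aggregate set --property cpu_allocation_ratio=16.0 agg-dense
    openstack aggregate add host agg-general compute-01
    openstack aggregate add host agg-dense compute-01
    # compute-01 is now in both aggregates; which cpu_allocation_ratio
    # the AggregateCoreFilter picks is ambiguous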
>>>
>>
>> Well, the main reason people started using AggregateCoreFilter and the
>> other aggregate filters is that pre-Newton it was literally impossible to
>> assign different allocation ratios to different computes, except by
>> grouping them in aggregates and using those filters.
>> Now that ratios are per-compute, there is no need to keep those filters,
>> except if you leave the computes' nova.conf untouched so that the values
>> default to the scheduler's. The extreme use case would be "I have 1000+
>> computes and I just want to apply specific ratios to only one or two",
>> but even then I'd second Jay and say "config management is the solution
>> to your problem".
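
Even in that corner case, the override is just a per-host nova.conf value
pushed by whatever tool you use. A rough sketch with crudini, assuming it
is available on the host (the ratio value is illustrative):

    # on the one or two special compute nodes only
    crudini --set /etc/nova/nova.conf DEFAULT cpu_allocation_ratio 2.0
    # then restart the nova-compute service (service name varies by distro)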
>>
>>
>>>
>>> Incidentally, this same design flaw is the reason that availability zones
>>> are so poorly defined in Nova. There is actually no such thing as an
>>> availability zone in Nova. Instead, an AZ is merely a metadata tag (or a
>>> CONF option! :( ) that may or may not exist against a host aggregate.
>>> There's lots of spaghetti in Nova due to the decision to use host aggregate
>>> metadata for availability zone information, which should have always been
>>> the domain of a **configuration management system** to set. [*]
>>>
>>
>> IMHO, that's not exactly the root cause of the spaghetti code for AZs. I
>> rather like the idea of seeing an availability zone as just a user-visible
>> aggregate, because it keeps things simple to understand.
>> The spaghetti code is rather due to the transitive relationship between an
>> aggregate, a compute and an instance being misunderstood: we introduced
>> the notion of an "instance AZ", which is a mistake. Instances shouldn't
>> have a field saying "here is my AZ"; they should rather carry a flag
>> saying "what did the user request as the AZ?" (None being a choice).
>>
>>
>>> In the Placement service, we have the concept of aggregates, too. However,
>>> in Placement, an aggregate (note: not "host aggregate") is merely a grouping
>>> mechanism for resource providers. Placement aggregates do not have any
>>> attributes themselves -- they merely represent the relationship between
>>> resource providers. Placement aggregates suffer from neither of the above
>>> listed design flaws because they are not buckets for metadata.
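
For the curious, a rough sketch of what that association looks like against
the placement REST API. The list-of-UUIDs payload shape follows the early
microversions, and the endpoint/token variables are placeholders, so treat
the details as assumptions to verify against your placement version:

    # associate a resource provider with two placement aggregates;
    # the aggregate is nothing more than these UUID associations --
    # no metadata, no allocation ratios
    curl -s -X PUT "$PLACEMENT_URL/resource_providers/$RP_UUID/aggregates" \
      -H "X-Auth-Token: $TOKEN" \
      -H "OpenStack-API-Version: placement 1.1" \
      -H "Content-Type: application/json" \
      -d "[\"$AGG_UUID_1\", \"$AGG_UUID_2\"]"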
>>>
>>> ok </rant>.
>>>
>>> Best,
>>> -jay
>>>
>>> [*] Note the assumption on line 97 here:
>>>
>>>
>>> https://github.com/openstack/nova/blob/master/nova/availability_zones.py#L96-L100
>>>
>>>> On Wed, Jan 17, 2018 at 5:24 AM, melanie witt <melwittt at gmail.com
>>>> <mailto:melwittt at gmail.com>> wrote:
>>>>
>>>>      Hello Stackers,
>>>>
>>>>      This is a heads up to any of you using the AggregateCoreFilter,
>>>>      AggregateRamFilter, and/or AggregateDiskFilter in the filter
>>>>      scheduler. These filters have effectively allowed operators to set
>>>>      overcommit ratios per aggregate rather than per compute node in <=
>>>>      Newton.
>>>>
>>>>      Beginning in Ocata, there is a behavior change where aggregate-based
>>>>      overcommit ratios will no longer be honored during scheduling.
>>>>      Instead, overcommit values must be set on a per compute node basis
>>>>      in nova.conf.
>>>>
>>>>      Details: as of Ocata, instead of considering all compute nodes at
>>>>      the start of scheduler filtering, an optimization has been added to
>>>>      query resource capacity from placement and prune the compute node
>>>>      list with the result *before* any filters are applied. Placement
>>>>      tracks resource capacity and usage and does *not* track aggregate
>>>>      metadata [1]. Because of this, placement cannot consider
>>>>      aggregate-based overcommit and will exclude compute nodes that do
>>>>      not have capacity based on per compute node overcommit.
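
Roughly, that pre-filtering step boils down to a placement query like the
following (resource amounts are illustrative, the endpoint/token are
placeholders, and the resources query parameter is the microversion 1.4
syntax, so verify against your deployment):

    # ask placement for providers with enough capacity; per-compute
    # allocation ratios are already baked into the reported capacity
    curl -s "$PLACEMENT_URL/resource_providers?resources=VCPU:2,MEMORY_MB:4096,DISK_GB:20" \
      -H "X-Auth-Token: $TOKEN" \
      -H "OpenStack-API-Version: placement 1.4"
    # only the providers returned here are handed to the filters, so
    # aggregate metadata never enters the capacity decision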
>>>>
>>>>      How to prepare: if you have been relying on per aggregate
>>>>      overcommit, during your upgrade to Ocata, you must change to using
>>>>      per compute node overcommit ratios in order for your scheduling
>>>>      behavior to stay consistent. Otherwise, you may notice increased
>>>>      NoValidHost scheduling failures as the aggregate-based overcommit is
>>>>      no longer being considered. You can safely remove the
>>>>      AggregateCoreFilter, AggregateRamFilter, and AggregateDiskFilter
>>>>      from your enabled_filters and you do not need to replace them with
>>>>      any other core/ram/disk filters. The placement query takes care of
>>>>      the core/ram/disk filtering instead, so CoreFilter, RamFilter, and
>>>>      DiskFilter are redundant.
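
In practice the migration is: read the ratios you had set as aggregate
metadata, then push the same values into each member compute's nova.conf
(the aggregate name and ratio below are made up):

    # see what ratio metadata the aggregate currently carries
    openstack aggregate show prod-racks -c properties
    # then, on every compute node in that aggregate, set the same value
    # in nova.conf and restart nova-compute:
    #   [DEFAULT]
    #   cpu_allocation_ratio = 4.0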
>>>>
>>>>      Thanks,
>>>>      -melanie
>>>>
>>>>      [1] Placement has been a clean slate for resource management.
>>>>      Prior to placement, there were conflicts between the different
>>>>      methods for setting overcommit ratios that were never addressed,
>>>>      such as: "which value should be taken if a compute node has
>>>>      overcommit set AND the aggregate has it set? Which takes
>>>>      precedence?" and "if a compute node is in more than one aggregate,
>>>>      which overcommit value should be taken?" Those ambiguities were
>>>>      not something we wanted to bring forward into placement.