[openstack-dev] [nova] [placement] [operators] Optional resource asking or not?

Sylvain Bauza sbauza at redhat.com
Wed Jan 25 15:03:45 UTC 2017



On 25/01/2017 05:10, Matt Riedemann wrote:
> On 1/24/2017 2:57 PM, Matt Riedemann wrote:
>> On 1/24/2017 2:38 PM, Sylvain Bauza wrote:
>>>
>>> It's literally 2 days before FeatureFreeze and we ask operators to
>>> change their cloud right now? That looks difficult to me and, like I
>>> said in multiple places by email, we have a ton of assertions saying
>>> it's acceptable to not have all the filters enabled.
>>>
>>> -Sylvain
>>>
>>
>> I'm not sure why feature freeze in two days is going to make a huge
>> amount of difference here. Most large production clouds are probably
>> nowhere near trunk (I'm assuming most are on Mitaka or older at this
>> point just because of how deployments seem to tail the oldest supported
>> stable branch). Or are you mainly worried about deployment tooling
>> projects, like TripleO, needing to deal with this now?
>>
>> Anyone upgrading to Ocata is going to have to read the release notes and
>> assess the upgrade impacts regardless of when we make this change, be
>> that Ocata or Pike.
>>
>> Sylvain, are you suggesting that for Ocata if, for example, the
>> CoreFilter isn't in the list of enabled scheduler filters, we don't make
>> the request for VCPU when filtering resource providers, but we also log
>> a big fat warning in the n-sch logs saying we're going to switch over in
>> Pike and that cpu_allocation_ratio needs to be configured because the
>> CoreFilter is going to be deprecated in Ocata and removed in Pike?
>>
>> [1]
>> https://specs.openstack.org/openstack/nova-specs/specs/ocata/approved/resource-providers-scheduler-db-filters.html#other-deployer-impact
>>
>>
>>
> 
> To recap the discussion we had in IRC today, we're moving forward with
> the original plan of the *filter scheduler* always requesting VCPU,
> MEMORY_MB and DISK_GB* regardless of the enabled filters. The main
> reason being there isn't a clear path forward on straddling releases to
> deprecate or make decisions based on the enabled filters and provide a
> warning that makes sense.
> 
> For example, we can't deprecate the filters (at least yet) because the
> *caching scheduler* is still using them (it's not using placement yet).
> And if we logged a warning if you don't have the CoreFilter in
> CONF.filter_scheduler.enabled_filters, for example, but we don't want
> you to have it in that list, then what are you supposed to do? i.e. the
> goal is to not have the legacy primitive resource filters enabled for
> the filter scheduler in Pike, so you get into this weird situation of
> whether or not you have them enabled before Pike, and in what
> cases do you log a warning that makes sense. So we agreed at this point
> it's just simpler to say that if you don't enable these filters today,
> you're going to need to configure the appropriate allocation ratio
> configuration option prior to upgrading to Ocata. That will be in the
> upgrade section of the release notes and we can probably also work it
> into the placement devref as a deployment note. We can also work this
> into the nova-status upgrade check CLI.
> 
> *DISK_GB is special since we might have a flavor that's not specifying
> any disk or a resource provider with no DISK_GB allocations if the
> instances are all booted from volumes.
> 
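
For reference, the allocation ratio options Matt mentions are the
existing nova.conf settings; a minimal illustrative snippet (the values
shown are just the traditional defaults, adjust them to your deployment):

    [DEFAULT]
    # With the legacy filters no longer consulted, these ratios are what
    # the placement query will effectively rely on.
    cpu_allocation_ratio = 16.0
    ram_allocation_ratio = 1.5
    disk_allocation_ratio = 1.0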

Update on that agreement: I made the necessary modification to the
proposal [1] so that we no longer verify the enabled filters. We now send
a request to the Placement API built by introspecting the flavor, and we
get back a list of potential destinations.
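
Roughly speaking, the request derives the resource amounts from the
flavor. A simplified sketch (not the actual patch code, names are
illustrative):

    # Simplified sketch: derive resource amounts from a flavor and build
    # the query sent to the Placement API.
    def resources_from_flavor(vcpus, memory_mb, root_gb, ephemeral_gb,
                              swap_mb):
        resources = {
            'VCPU': vcpus,
            'MEMORY_MB': memory_mb,
            # DISK_GB can legitimately be zero, e.g. boot-from-volume.
            'DISK_GB': root_gb + ephemeral_gb + (swap_mb + 1023) // 1024,
        }
        # Zero-valued amounts are not requested from placement.
        return {rc: amt for rc, amt in resources.items() if amt}

    # e.g. resources=VCPU:2,MEMORY_MB:4096,DISK_GB:21
    query = ','.join('%s:%d' % (rc, amt) for rc, amt in
                     sorted(resources_from_flavor(2, 4096, 20, 0,
                                                  1024).items()))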

When I began working on that modification, I knew there was a functional
test about server groups that needed changes to match our agreement. I
consequently made that change in a separate patch [2] as a prerequisite
for [1].

I then spotted a problem that we hadn't identified when discussing: when
checking a destination, the legacy filters for CPU, RAM and disk don't
verify the maximum capacity of the host; they only multiply the total
size by the allocation ratio, so our proposal works for them. When using
the placement service, however, the same request can fail because the DB
call that returns the destinations also verifies a specific field named
max_unit [3].
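
To make the difference concrete, here is a self-contained sketch of the
two checks (simplified, not the actual nova code):

    # Simplified sketch of the two capacity checks.
    def legacy_filter_fits(requested, total, used, allocation_ratio):
        # CoreFilter/RamFilter-style check: only the ratio-scaled total
        # capacity matters.
        return used + requested <= total * allocation_ratio

    def placement_fits(requested, total, used, allocation_ratio, max_unit):
        # Placement additionally rejects any single request bigger than
        # max_unit, even if the ratio-scaled capacity would allow it.
        return (requested <= max_unit
                and used + requested <= total * allocation_ratio)

    # Example: 8 cores with a 16.0 ratio gives 128 VCPUs of capacity, but
    # if max_unit is 8, a 10-VCPU flavor passes the legacy check and is
    # rejected by placement.
    print(legacy_filter_fits(10, 8, 0, 16.0))          # True
    print(placement_fits(10, 8, 0, 16.0, max_unit=8))  # False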

Consequently, the proposal we agreed on does not give feature parity
between Newton and Ocata. Even if you follow our instructions, you can
still get a different result from a placement perspective in Ocata than
you would have gotten in Newton.

Technically speaking, the functional test is the canary in the coal
mine: it now gets NoValidHost where it was working previously.

After that, I'm stuck. We could discuss for a while whether all of that
is sane or not, but the fact is there is a discrepancy.

Honestly, I don't know what to do, other than accept that we're now so
close to Feature Freeze that it's becoming an all-or-nothing situation.
The only silver bullet I have left would be to treat a placement failure
as non-blocking and fall back to calling the full list of nodes for
Ocata. I know that sucks, but I don't see how to unblock us in time to
get [1] landed before tomorrow.

-Sylvain (exhausted, tired and nervous).

[1] https://review.openstack.org/#/c/417961/
[2] https://review.openstack.org/#/c/425185/
[3]
https://github.com/openstack/nova/blob/c9eb9530314d047f5013941ebcfd5ef0192a9dc3/nova/objects/resource_provider.py#L615


