[openstack-dev] [nova] Order of n-api (placement) and n-sch upgrades for Ocata
Matt Riedemann
mriedem at linux.vnet.ibm.com
Thu Jan 19 16:00:49 UTC 2017
On 1/19/2017 9:43 AM, Sylvain Bauza wrote:
>
>
> On 19/01/2017 16:27, Matt Riedemann wrote:
>> Sylvain and I were talking about how he's going to work placement
>> microversion requests into his filter scheduler patch [1]. He needs to
>> make requests to the placement API with microversion 1.4 [2] or later
>> for resource provider filtering on specific resource classes like VCPU
>> and MEMORY_MB.
>>
>> The question was what happens if microversion 1.4 isn't available in the
>> placement API, i.e. the nova-scheduler is running Ocata code now but the
>> placement service is running Newton still.
>>
>> Our rolling upgrades doc [3] says:
>>
>> "It is safest to start nova-conductor first and nova-api last."
>>
>> But since placement is bundled with n-api that would cause issues since
>> n-sch now depends on the n-api code.
>>
>> If you package the placement service separately from the nova-api
>> service then this is probably not an issue. You can still roll out n-api
>> last and restart it last (for control services), and just make sure that
>> placement is upgraded before nova-scheduler (we need to be clear about
>> that in [3]).
>>
>> But do we have any other issues if they are not packaged separately? Is
>> it possible to install the new code, but still only restart the
>> placement service before nova-api? I believe it is, but want to ask this
>> out loud.
>>
>> I think we're probably OK here, but I want to make sure everyone is
>> aware and can think about this, as we're a week from feature freeze.
>> We also need to look into devstack/grenade, because I'm fairly certain
>> that we upgrade n-sch *before* placement in a grenade run, which will
>> make any issues here very obvious in [1].
>>
>> [1] https://review.openstack.org/#/c/417961/
>> [2] http://docs.openstack.org/developer/nova/placement.html#filter-resource-providers-having-requested-resource-capacity
>> [3] http://docs.openstack.org/developer/nova/upgrade.html#rolling-upgrade-process
>>
>>
>
> I thought out loud in the nova channel about the following possibility:
> since we always ask operators to upgrade n-cpus *AFTER* upgrading the
> other services, we could let nova-scheduler tolerate a Newton placement
> service *UNLESS* there are Ocata computes.
>
> In more technical terms, the scheduler getting a response from the
> placement service is a hard requirement for Ocata. That said, if the
> response code is a 400 with a message saying that the schema is
> incorrect, the scheduler would check the max version of all the
> computes and then:
>  - either the max version is Newton, and it falls back to
>    ComputeNodeList.get_all() to get the list of nodes
>  - or the max version is Ocata (at least one node is upgraded), and
>    it throws a NoValidHosts
>
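A minimal sketch of the decision described above, in plain Python. It
only illustrates the proposal and is not code from the patch:
candidate_hosts, its parameters, and the "newton"/"ocata" strings are
hypothetical, NoValidHost is a local stand-in for the scheduler's
exception, and treating any other status as a failure is an assumption.

```python
class NoValidHost(Exception):
    """Local stand-in for the scheduler's 'no valid host' error."""


def candidate_hosts(status, placement_hosts, max_compute_release, get_all_nodes):
    """Pick the host list per the proposal (all names are hypothetical).

    status: HTTP status of the microversion 1.4 request to placement.
    placement_hosts: hosts returned by placement when status is 200.
    max_compute_release: newest release among computes, "newton" or "ocata".
    get_all_nodes: callable standing in for ComputeNodeList.get_all().
    """
    if status == 200:
        return placement_hosts
    if status == 400 and max_compute_release == "newton":
        # Placement is still Newton and no compute is upgraded yet:
        # fall back to the full list of compute nodes.
        return get_all_nodes()
    # Either at least one compute is Ocata, or placement gave no usable
    # answer (assumption: any other status is treated as a failure).
    raise NoValidHost("placement cannot serve the 1.4 request")
```
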
> That way, the upgrade path would be:
> 1/ upgrade your conductor
> 2/ upgrade all your other services except n-cpus (we could upgrade and
> restart n-sch before n-api and that would still work; the reverse
> order would be fine too)
> 3/ do a rolling upgrade of your n-cpus
>
> I think we would then keep the existing upgrade path, and the placement
> service would still be mandatory for Ocata.
>
> Thoughts?
> -Sylvain
>
>
I don't like basing the n-sch decision on the service version of the
computes, because the computes will keep trying to connect to the
placement service until it's available, but not fail. That doesn't
really mean that placement is new enough for the scheduler to use the
1.4 microversion.

So IMO we either charge forward as planned and make it clear in the docs
that for Ocata, the placement service must be upgraded *before*
nova-scheduler, or we punt and provide a fallback to just pulling all
compute nodes from the database if we can't make the 1.4 request to
placement. Given my original post here, I'd prefer to charge forward
unless it becomes clear that is not going to work, or is at least going
to be very painful.
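
For the fallback option, here is a rough sketch of how the scheduler
could decide whether 1.4 is usable before falling back. It assumes the
placement root URL returns a version document carrying min_version and
max_version strings; the function names and the two callables are
hypothetical, and this is not taken from the filter scheduler patch.

```python
def parse_microversion(text):
    """Turn "1.4" into (1, 4) so that 1.10 compares greater than 1.4."""
    major, minor = text.split(".")
    return int(major), int(minor)


def supports_microversion(version_doc, wanted="1.4"):
    """Return True if any advertised version range includes `wanted`.

    Assumed document shape:
    {"versions": [{"id": "v1.0", "min_version": "1.0", "max_version": "1.4"}]}
    """
    want = parse_microversion(wanted)
    for entry in version_doc.get("versions", []):
        low = parse_microversion(entry["min_version"])
        high = parse_microversion(entry["max_version"])
        if low <= want <= high:
            return True
    return False


def hosts_for_request(version_doc, query_placement, load_all_compute_nodes):
    """Use placement when 1.4 is available, else pull every compute node.

    query_placement and load_all_compute_nodes are hypothetical callables
    standing in for the placement request and the database lookup.
    """
    if supports_microversion(version_doc, "1.4"):
        return query_placement()
    return load_all_compute_nodes()
```

Comparing parsed tuples instead of raw strings matters once the minor
version reaches two digits.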
--
Thanks,
Matt Riedemann