[openstack-dev] [nova] [placement] Ocata upgrade procedure and problems when it's optional in Newton
Sylvain Bauza
sbauza at redhat.com
Tue Jan 10 13:49:01 UTC 2017
Aloha folks,
Recently, I was discussing with TripleO folks. Disclaimer: I don't think
it's only a TripleO-related discussion but rather a larger one for all
our deployers.
So, the question I was asked was how to upgrade from Newton to Ocata
for the Placement API when the deployer is not yet using the Placement
API in Newton (because it was optional in Newton).
The quick answer was to say "easy, just upgrade the service and run the
Placement API *before* the scheduler upgrade". That's because we're
working on a change that makes the scheduler call the Placement API
instead of fetching all the compute nodes [1].
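
To make that concrete, here is a rough, hand-wavy sketch of the idea
behind [1] (this is NOT the actual patch; the function name, the
microversion and the credentials below are all illustrative): the
scheduler asks Placement which resource providers can satisfy the
request, instead of pulling every ComputeNode from the DB:

    from urllib import parse

    from keystoneauth1 import identity
    from keystoneauth1 import session as ks_session

    # Illustrative credentials; a real deployment reads these from the
    # [placement] section of nova.conf.
    auth = identity.Password(
        auth_url='http://controller:5000/v3',
        username='nova', password='secret', project_name='service',
        user_domain_name='Default', project_domain_name='Default')
    sess = ks_session.Session(auth=auth)

    def get_filtered_rps(resources):
        # e.g. resources = {'VCPU': 2, 'MEMORY_MB': 2048, 'DISK_GB': 20}
        qs = parse.urlencode({'resources': ','.join(
            '%s:%d' % (rc, amt) for rc, amt in sorted(resources.items()))})
        resp = sess.get('/resource_providers?%s' % qs,
                        endpoint_filter={'service_type': 'placement'},
                        headers={'OpenStack-API-Version': 'placement 1.4'})
        return [rp['uuid'] for rp in resp.json()['resource_providers']]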
That said, I then thought about something else: wait, the Newton compute
nodes work with the Placement API, cool. Cool, but what if the Placement
API isn't deployed, since it's optional in Newton? Then the Newton
computes simply stop calling the Placement API, thanks to a nice
decorator [2] (which is fine by me).
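
For those who haven't looked at it, [2] has roughly the following shape
(paraphrased from memory, see the link for the real code): every
reporting call in the report client is wrapped so that, when no
Placement endpoint is configured or reachable, the call quietly becomes
a no-op plus a warning:

    import functools

    from keystoneauth1 import exceptions as ks_exc
    from oslo_log import log as logging

    LOG = logging.getLogger(__name__)

    def safe_connect(f):
        @functools.wraps(f)
        def wrapper(self, *args, **kwargs):
            try:
                return f(self, *args, **kwargs)
            except ks_exc.EndpointNotFound:
                LOG.warning('The placement API endpoint was not found.')
            except ks_exc.MissingAuthPlugin:
                LOG.warning('No authentication information found for '
                            'the placement API.')
            except ks_exc.ConnectFailure:
                LOG.warning('Placement API service is not responding.')
        return wrapper

That's why a Newton compute without Placement runs just fine - and also
why it silently reports nothing.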
Then, imagine the problem for the upgrade: the deployers who aren't
running the Placement API in Newton would need to *first* deploy the
(Newton or Ocata) Placement service, then SIGHUP all the Newton compute
nodes so they start reporting their resources (and creating the
inventories), then wait a few minutes until all the inventories are
reported, and only then upgrade all the services (except the compute
nodes, of course) to Ocata, including the scheduler service.
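
If we go that way, deployers would also need some way to verify that
every compute has shown up in Placement before they pull the trigger. A
crude sketch (again illustrative, not an existing tool - it assumes an
authenticated keystoneauth1 session like the one above, and it relies
on the resource provider name matching the compute node hostname):

    import time

    def wait_for_inventories(sess, compute_hostnames, timeout=600):
        missing = set(compute_hostnames)
        deadline = time.time() + timeout
        while missing and time.time() < deadline:
            resp = sess.get('/resource_providers',
                            endpoint_filter={'service_type': 'placement'})
            names = {rp['name'] for rp in resp.json()['resource_providers']}
            missing -= names
            if missing:
                time.sleep(10)
        if missing:
            raise RuntimeError('Computes still missing from Placement: %s'
                               % ', '.join(sorted(missing)))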
The above looks like a different upgrade policy, right?
- Either we say you need to run the Newton Placement service *before*
upgrading - in which case the Placement service is not really optional
for Newton, right?
- Or we say you need to run the Ocata Placement service and then
restart the compute nodes *before* upgrading the other services - and
that's a very different situation from the current upgrade model.
For example, I know it's not strictly a Nova matter, but most of our
deployers distinguish between "controller" and "compute" services, i.e.
all the Nova services except the computes run on a single machine (or a
few). In that case, the "controller" upgrade happens in one pass: all
the services are upgraded and restarted at the same stage. Asking those
deployers to follow a very different procedure looks difficult.
Anyway, I think we need to consider that carefully and probably find
some solutions. For example, we could imagine (disclaimer #2: these are
probably silly solutions, but they're the ones I'm thinking of right
now):
- a DB migration that creates the inventories and allocations before
upgrading (i.e. not asking the computes to register themselves with the
Placement API). That would be terrible because it's a data migration, I
know...
- giving the scheduler a backwards-compatible behaviour in [1], i.e.
trying to call the Placement API to get the list of resource providers,
and falling back to fetching all the ComputeNodes if that's not
possible (see the sketch after this list). But that would mean the
Placement API is still optional for Ocata :/
- merging the scheduler change that calls the Placement API [1] in a
point release after we deliver Ocata (while still making the Placement
API mandatory for Ocata), so that we could be sure all the computes are
reporting their status to Placement by the time we restart the
scheduler for the point release.
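
For the second option, the fallback could look like this (my words, not
the patch in [1]; hosts_from_rp_uuids() and hosts_from_compute_nodes()
are made-up helpers for illustration):

    from keystoneauth1 import exceptions as ks_exc

    from nova import objects

    def get_all_host_states(context, resources):
        try:
            # New path: only the hosts whose resource providers can
            # satisfy the request (see the get_filtered_rps() sketch).
            return hosts_from_rp_uuids(context, get_filtered_rps(resources))
        except (ks_exc.EndpointNotFound, ks_exc.MissingAuthPlugin,
                ks_exc.ConnectFailure):
            # Old path: Placement isn't deployed (yet), so behave like
            # Newton and load every ComputeNode from the database.
            return hosts_from_compute_nodes(
                objects.ComputeNodeList.get_all(context))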
Thoughts?
-Sylvain
[1] https://review.openstack.org/#/c/417961/
[2] https://github.com/openstack/nova/blob/180e6340a595ec047c59365465f36fed7a669ec3/nova/scheduler/client/report.py#L40-L67