[openstack-dev] [neutron] Neutron rolling upgrade - are we there yet?
ihrachys at redhat.com
Thu Oct 15 15:23:11 UTC 2015
Thanks a lot for caring about upgrades!
There are a lot of good points below. As you noted, surprisingly, we seem to have rolling upgrades working for the RPC layer. Before we complicate the database workflow with the oslo.versionedobjects transition heavy-lifting, I would like us to spend cycles on making sure rolling upgrades work not just by happy accident, but are also covered by appropriate gating (I mean grenade).
I also feel that upgrades are in many ways not only a technical issue, but a cultural one too. You need reviewers who are aware of all the moving parts, and of how a seemingly innocent change can break the flow. That’s why I plan to start a devref page specifically about upgrades, where we can lay out which scenarios we should support and which we should not (e.g. we have plenty of compatibility code in agents to handle the old-controller scenario, which should not be supported); how all the pieces interact and behave during the transition; and what to look for during reviews. Hopefully, once such a page is up and read by folks, we will be able to have a more meaningful conversation about our upgrade strategy.
> On 14 Oct 2015, at 20:10, Korzeniewski, Artur <artur.korzeniewski at intel.com> wrote:
> Hi all,
> I would like to gather all upgrade activities in Neutron in one place, in order to summarize the current status and future activities on rolling upgrades in Mitaka.
If you think it’s worth it, we can start up a new etherpad page to gather upgrade ideas and things to do.
> 1. RPC versioning
> a. It is already implemented in Neutron.
> b. TODO: To have the rolling upgrade we have to implement the RPC version pinning in conf.
> i. I’m not a big fan of this solution, but we can work out a better idea if needed.
As Dan pointed out, and as I think Miguel was thinking about, we can have the pin defined by the agents in the cluster. Actually, we can have a per-agent pin.
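To make the per-agent pinning idea concrete, here is a minimal pure-Python sketch. The helper names are made up for illustration; in practice the chosen cap would be handed to oslo.messaging, e.g. via RPCClient’s version_cap argument.

```python
# A minimal sketch of server-side version pinning, assuming the server can
# learn each agent's reported RPC version (e.g. from agent state reports).
# Function names here are hypothetical, not existing Neutron API.

def pick_version_cap(agent_versions):
    """Return the lowest RPC version reported by live agents.

    During a rolling upgrade, capping outgoing messages to this version
    keeps a new server compatible with not-yet-upgraded agents.
    """
    def as_tuple(version):
        return tuple(int(part) for part in version.split('.'))
    return min(agent_versions, key=as_tuple)

def per_agent_caps(agent_state_reports):
    """Per-agent pinning variant: keep a cap per agent host instead of
    one global cap, so already-upgraded agents get full-version messages."""
    return {report['host']: report['rpc_version']
            for report in agent_state_reports}
```

The global cap is the simplest scheme, but it downgrades traffic to every agent; the per-agent map avoids that at the cost of tracking state per host.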
> c. Possible unit/functional tests to catch RPC version incompatibilities between RPC revisions.
> d. TODO: Multi-node Grenade job to have rolling upgrades covered in CI.
That is not something a unit or functional test can catch.
As you mentioned, we already have the grenade project, which is designed to test upgrades. To validate RPC compatibility on rolling upgrade we would need a so-called ‘partial’ job (where different components run different versions; in the case of neutron that would mean a new controller and old agents). Such a job is present in the nova gate and validates RPC compatibility.
As far as I know, Russell Bryant was looking into introducing the job for neutron, but was blocked by the ongoing grenade refactoring to support partial upgrades ‘the right way’ (using multinode setups). I think we should check with the grenade folks on that; I have heard the start of Mitaka was the ETA for this work to complete.
> 2. Message content versioning – versioned objects
> a. TODO: implement oslo.versionedobjects in the Mitaka cycle. The interesting entities to be implemented: network, subnet, port, security groups…
Though we haven’t touched base neutron resources yet, we introduced an oslo.versionedobjects-based NeutronObject class during Liberty as part of the QoS effort. I plan to expand on that work during Mitaka.
The existing code for QoS resources can be found at:
> b. Will OVO have impact on vendor plugins?
It surely can have a significant impact, but hopefully the dict compat layer should make the transition smoother:
> c. Be strict on changes in version objects in code review, any change in object structure should increment the minor (backward-compatible) or major (breaking change) RPC version.
That’s assuming we have a clear mapping of objects onto current RPC interfaces, which is not obvious. Another problem we would need to solve is core resource extensions (currently available in ml2 only), like qos or port_security, that modify resources based on controller configuration.
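To illustrate the review rule in (c), here is a plain-Python sketch of the pattern (this is not the real oslo.versionedobjects API, whose fields, registry and serialization are richer): adding a field is a minor bump recorded in the version changelog, and a compat hook strips the new field when serializing for peers pinned to an older version.

```python
# Illustrative sketch only; class and field names are invented.

class PortObject(object):
    # Version changelog, kept next to the version string for reviewers:
    # 1.0: initial version
    # 1.1: added 'qos_policy_id' (backward-compatible -> minor bump)
    VERSION = '1.1'

    def __init__(self, id, name, qos_policy_id=None):
        self.id = id
        self.name = name
        self.qos_policy_id = qos_policy_id

    def obj_make_compatible(self, primitive, target_version):
        """Downgrade a serialized dict for a peer pinned to an older version."""
        if target_version < (1, 1):
            primitive.pop('qos_policy_id', None)
        return primitive

    def obj_to_primitive(self, target_version=(1, 1)):
        primitive = {'id': self.id, 'name': self.name,
                     'qos_policy_id': self.qos_policy_id}
        return self.obj_make_compatible(primitive, target_version)
```

A reviewer’s job then reduces to checking that any structural change comes with the right bump (minor for additive, major for breaking) and a matching compat branch.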
> d. Indirection API – message from newer format should be translated to older version by neutron server.
For QoS, we used a new object agnostic subscriber mechanism to propagate changes applied to QoS objects into agents: http://docs.openstack.org/developer/neutron/devref/rpc_callbacks.html
It is also expected to downgrade objects based on agent version (note this is not implemented yet, but should be ready during Mitaka):
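A rough sketch of that downgrade-on-push idea (helper names are hypothetical; the real logic would live in the rpc_callbacks mechanism): the server serializes each pushed object at the version the receiving agent reported, so old agents never see fields they cannot parse.

```python
# Hypothetical server-side registry of what each agent can consume,
# populated from agent state reports: host -> {'QosPolicy': '1.0', ...}
OBJECT_VERSIONS_BY_AGENT = {}

def register_agent(host, supported_versions):
    OBJECT_VERSIONS_BY_AGENT[host] = supported_versions

def version_for_push(host, obj_name, server_version):
    """Pick the version to serialize at when pushing obj_name to host.

    Unknown agents get the server's native version; known agents get
    min(agent version, server version) so the payload stays parseable.
    """
    agent_version = OBJECT_VERSIONS_BY_AGENT.get(host, {}).get(obj_name)
    if agent_version is None:
        return server_version

    def as_tuple(version):
        return tuple(int(part) for part in version.split('.'))
    return min(agent_version, server_version, key=as_tuple)
```

The object’s own compat code (obj_make_compatible-style hooks) then does the actual field stripping once the target version is known.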
> 3. Database migration
> a. Online schema migration was done in Liberty release, any work left to do?
Nothing specific, maybe a bug or two here and there.
> b. TODO: Online data migration to be introduced in Mitaka cycle.
> i. Online data migration can be done during normal operation on the data.
> ii. There should be also the script to invoke the data migration in the background.
> c. Currently the contract phase is doing the data migration. But since the contract phase should be run offline, we should move the data migration to preceding step. Also the contract phase should be blocked if there is still relevant data in removed entities.
Yes, we definitely need a stop mechanism first, and only then play with data migrations. I don’t think we can consider data migration before we have a way to hide the bloody migration details behind abstract resources (read: versioned objects). Realistically, data migration is too far off at the moment to put on the TODO list, but we should definitely look forward to it.
> i. Contract phase can be executed online, if there is all new code running in setup.
I am not sure how that’s possible. Do you think it’s realistic to expect the controller to perform all the checks the db usually does (constraints?) while the schema is not enforced?
> d. The other strategy is to not drop tables, alter names or remove the columns from the DB – what’s in, it’s in. We should put more attention on code reviews, merge only additive changes and avoid questionable DB modification.
I don’t like that approach. It suggests there is no way back if we screw something up. Having a short, offline contract phase seems like a reasonable approach to me. Anyway, it can be reconsidered after we have solved the elephant in the room (the data migration problem).
> e. The Neutron server should be updated first, in order to do data translation between old format into new schema. When doing this, we can be sure that old data would not be inserted into old DB structures.
To my taste, that’s ^ the clearest way to go.
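The translate-on-write part of (e) might look like this (field names invented): once the upgraded server normalizes every row it touches, no new data lands in the deprecated structures, and the eventual contract has less to clean up.

```python
# Illustrative sketch of lazy old-to-new format translation in the server.

def normalize_port_row(row):
    """Translate an old-format row dict to the new format on read/write.

    Hypothetical example: the old schema kept a flat 'qos_rate' column;
    the new schema nests it under a 'qos' sub-dict.
    """
    row = dict(row)  # never mutate the caller's copy
    if 'qos_rate' in row and 'qos' not in row:
        row['qos'] = {'max_kbps': row.pop('qos_rate')}
    return row
```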
> I have performed the manual Kilo to Liberty upgrade, both operationally and by code review of the RPC APIs. All is working fine.
> We can have some discussion in the cross-project session, or we can also review any issues with Neutron upgrade in Friday’s unplugged session.
I will be more than happy to sit with folks interested in our upgrade story and go write a plan for Mitaka.
Please ping me on irc (ihrachys), and we will figure out how to sync effectively and push the effort forward. (btw I am located in the Czech Republic, so we should be in the same time zone).