[openstack-dev] [neutron] Neutron rolling upgrade - are we there yet?
Ihar Hrachyshka
ihrachys at redhat.com
Wed Oct 21 12:20:16 UTC 2015
Hi folks,
I see there is significant interest in neutron upgrade strategy. I suggest we meet at the summit on Friday during the ‘unplugged’ track in a small group and come up with a plan for Mitaka and beyond. I am starting to think the work ahead of us is quite enormous, and some coordination is due. Maybe we’ll need to form a dedicated subteam to track the effort in Mitaka.
I started an etherpad to track upgrade strategy discussions at:
https://etherpad.openstack.org/p/neutron-upgrade-strategy
Note that Artur already added the track to unplugged etherpad:
https://etherpad.openstack.org/p/mitaka-neutron-unplugged-track
See you in Tokyo,
Ihar
> On 19 Oct 2015, at 11:03, Miguel Angel Ajo <mangelajo at redhat.com> wrote:
>
> Rossella Sblendido wrote:
>> Hello Artur,
>>
>> thanks for starting this thread. See inline please.
>>
>> On 10/15/2015 05:23 PM, Ihar Hrachyshka wrote:
>>> Hi Artur,
>>>
>>> thanks a lot for caring about upgrades!
>>>
>>> There are a lot of good points below. As you noted, surprisingly, we seem to have rolling upgrades working for the RPC layer. Before we go into complicating the database workflow with the oslo.versionedobjects transition heavy-lifting, I would like us to spend cycles on making sure rolling upgrades work not just by surprise, but are also covered with appropriate gating (I mean grenade).
>>
>> +1 agreed that the first step is to have test coverage then we can go on improving the process :)
>>
>>>
>>> I also feel that upgrades are in lots of ways not only a technical issue, but a cultural one too. Reviewers should be aware of all the moving parts, and of how a seemingly innocent change can break the flow. That’s why I plan to start a devref page specifically about upgrades, where we could lay out which scenarios we should support and which we should not (f.e. we have plenty of compatibility code in agents that handles the old-controller scenario, which should not be supported); how all the pieces interact and behave during a transition; and what to look for during reviews. Hopefully, once such a page is up and read by folks, we will be able to have a more meaningful conversation about our upgrade strategy.
>>>
>>>> On 14 Oct 2015, at 20:10, Korzeniewski, Artur <artur.korzeniewski at intel.com> wrote:
>>>>
>>>> Hi all,
>>>>
>>>> I would like to gather all upgrade activities in Neutron in one place, in order to summarize the current status and future activities on rolling upgrades in Mitaka.
>>>>
>>>
>>> If you think it’s worth it, we can start up a new etherpad page to gather upgrade ideas and things to do.
>>>
>>>>
>>>>
>>>> 1. RPC versioning
>>>>
>>>> a. It is already implemented in Neutron.
>>>>
>>>> b. TODO: To have the rolling upgrade we have to implement the RPC version pinning in conf.
>>>>
>>>> i. I’m not a big fan of this solution, but we can work out a better idea if needed.
>>>
>>> As Dan pointed out, and as I think Miguel was thinking too, we can have the pin defined by the agents in the cluster. Actually, we can have a per-agent pin.
>>
>> I am not a big fan either, mostly because the pinning is a manual task. Anyway, looking at the patch Dan linked, https://review.openstack.org/#/c/233289/ ... if we remove the manual step I can become a fan of this approach :)
>>
> Yes, the minimum implementation we could agree on initially was pinning. A direct request of objects from agents
> to neutron-server includes the requested version, so that's always OK; the complicated part is notification of object
> changes via fanout.
>
> In that case, I'm thinking of including the supported object versions in agent status reports, so neutron-server can
> decide at runtime which versions to send (in some cases it may need to send several versions in parallel). I'm
> long overdue to upload the strategy to the rpc callbacks devref, but it will be along those lines.
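A minimal sketch of the negotiation idea above: agents advertise the object versions they can consume in their status reports, and the server computes the set of versions it must fan out (possibly several in parallel). All names here are illustrative, not the actual Neutron API.

```python
def versions_to_send(agent_reports, obj_name):
    """Return the distinct versions of obj_name the server must publish.

    agent_reports maps agent id -> {object name: max supported version}.
    In the worst case the server fans out several versions in parallel.
    """
    return {report[obj_name] for report in agent_reports.values()
            if obj_name in report}

reports = {
    'agent-1': {'QosPolicy': '1.0'},  # old agent, still on 1.0
    'agent-2': {'QosPolicy': '1.1'},  # upgraded agent
}
print(sorted(versions_to_send(reports, 'QosPolicy')))  # ['1.0', '1.1']
```

Once every agent reports 1.1, the set collapses to a single version and the server can stop sending the old format, with no manual pin removal.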
>
>>>
>>>>
>>>> c. Possible unit/functional tests to catch RPC version incompatibilities between RPC revisions.
>>>>
>>>> d. TODO: Multi-node Grenade job to have rolling upgrades covered in CI.
>>>
>>> That is not something for the unit or functional test level.
>>>
>>> As you mentioned, we already have grenade project that is designed to test upgrades. To validate RPC compatibility on rolling upgrade we would need so called ‘partial’ job (when different components are running with different versions; in case of neutron it would mean a new controller and old agents). The job is present in nova gate and validates RPC compatibility.
>>>
>>> As far as I know, Russell Bryant was looking into introducing the job for neutron, but was blocked by ongoing grenade refactoring to support partial upgrades ‘the right way’ (using multinode setups). I think that we should check with grenade folks on that matter, I have heard start of Mitaka was ETA for this work to complete.
>>>
>>>>
>>>> 2. Message content versioning – versioned objects
>>>>
>>>> a. TODO: implement oslo.versionedobjects in the Mitaka cycle. The interesting entities to be implemented: network, subnet, port, security groups…
>>>
>>> Though we haven’t touched base neutron resources in Liberty, we introduced oslo.versionedobjects based NeutronObject class during Liberty as part of QoS effort. I plan to expand on that work during Mitaka.
> ++
>>>
>>> The existing code for QoS resources can be found at:
>>>
>>> https://github.com/openstack/neutron/tree/master/neutron/objects
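A stripped-down illustration of the pattern those QoS objects follow. The real classes subclass NeutronObject (built on oslo.versionedobjects); this self-contained sketch only shows the core idea of a VERSION string plus a downgrade hook, and the field names are made up.

```python
class QosPolicy:
    # 1.0: initial version
    # 1.1: added 'shared' field (backward-compatible -> minor bump)
    VERSION = '1.1'

    def __init__(self, name, shared=False):
        self.name = name
        self.shared = shared

    def obj_make_compatible(self, primitive, target_version):
        # Downgrade hook: strip fields the target version does not know about.
        if tuple(int(x) for x in target_version.split('.')) < (1, 1):
            primitive.pop('shared', None)

    def obj_to_primitive(self, target_version=VERSION):
        # Serialize for the wire, downgraded to what the peer understands.
        primitive = {'name': self.name, 'shared': self.shared}
        self.obj_make_compatible(primitive, target_version)
        return primitive

policy = QosPolicy('gold', shared=True)
print(policy.obj_to_primitive('1.0'))  # {'name': 'gold'}
```

An old agent pinned to 1.0 never sees the 'shared' field, while a 1.1 agent gets the full primitive, which is what makes mixed-version clusters workable.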
>>>
>>>>
>>>> b. Will OVO have impact on vendor plugins?
>>>
>>> It surely can have significant impact, but hopefully the dict compat layer should make the transition smoother:
>>>
>>> https://github.com/openstack/neutron/blob/master/neutron/objects/base.py#L50
>
> Correct.
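Roughly, a dict compat layer of the kind linked above lets plugin code written against the old dict-based API keep working against the new objects during the transition. This sketch is illustrative only; the actual shim in neutron/objects/base.py differs in detail.

```python
class DictCompatMixin:
    """Expose object attributes through dict-style access."""

    def __getitem__(self, key):
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key)

    def get(self, key, default=None):
        return getattr(self, key, default)

class Port(DictCompatMixin):
    def __init__(self, port_id, mac):
        self.id = port_id
        self.mac = mac

port = Port('p1', 'fa:16:3e:00:00:01')
# Old-style dict access still works on the new object:
print(port['mac'])          # fa:16:3e:00:00:01
print(port.get('missing'))  # None
```

Vendor plugins can thus be migrated field access by field access rather than in one big rewrite.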
>>>
>>>>
>>>> c. Be strict about changes to versioned objects in code review: any change in object structure should increment the minor (backward-compatible) or major (breaking change) version.
>>>
>>> That’s assuming we have a clear mapping of objects onto current RPC interfaces, which is not obvious. Another problem we would need to solve is core resource extensions (currently available in ml2 only), like qos or port_security, that modify resources based on controller configuration.
>>>
>>>>
>>>> d. Indirection API – messages in a newer format should be translated to an older version by neutron-server.
>>>
>>> For QoS, we used a new object-agnostic subscriber mechanism to propagate changes applied to QoS objects to agents: http://docs.openstack.org/developer/neutron/devref/rpc_callbacks.html
>>>
>>> It is expected to downgrade objects based on agent version (note it’s not implemented yet, but will surely be ready during Mitaka):
>>>
>>> https://github.com/openstack/neutron/blob/master/neutron/api/rpc/handlers/resources_rpc.py#L142
> Yes, that's exactly what I was talking about above. It has object retrieval for agents, where they can specify a version,
> but subscription/notifications are the complicated part.
>
>
>>>
>>>>
>>>> 3. Database migration
>>>>
>>>> a. Online schema migration was done in Liberty release, any work left to do?
>>>
>>> Nothing specific, maybe a bug or two here and there.
>>>
>>>>
>>>> b. TODO: Online data migration to be introduced in Mitaka cycle.
>>>>
>>>> i. Online data migration can be done during normal operation on the data.
>>>>
>>>> ii. There should be also the script to invoke the data migration in the background.
>>>>
>>>> c. Currently the contract phase is doing the data migration. But since the contract phase should run offline, we should move the data migration to a preceding step. Also, the contract phase should be blocked if there is still relevant data in the entities being removed.
>>>
>>> Yes, we definitely need a stop mechanism first, then play with data migrations. I don’t think we can consider data migration before we have a way to hide the bloody migration details behind abstract resources (read: versioned objects). Realistically, data migration is too far off at the moment to consider as a todo step. But we should definitely look forward to it.
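The batched background migration plus the contract-phase guard described above could look roughly like this. sqlite3 is used only to keep the sketch self-contained; real Neutron migrations run through alembic, and the table and column names here are made up.

```python
import sqlite3

conn = sqlite3.connect(':memory:')
conn.executescript("""
    CREATE TABLE ports_old (id TEXT PRIMARY KEY, mac TEXT);
    CREATE TABLE ports_new (id TEXT PRIMARY KEY, mac TEXT);
    INSERT INTO ports_old VALUES ('p1', 'aa'), ('p2', 'bb'), ('p3', 'cc');
""")

def migrate_batch(conn, batch_size=2):
    """Copy one batch of unmigrated rows; returns number migrated (0 when done)."""
    rows = conn.execute(
        "SELECT id, mac FROM ports_old "
        "WHERE id NOT IN (SELECT id FROM ports_new) LIMIT ?",
        (batch_size,)).fetchall()
    conn.executemany("INSERT INTO ports_new VALUES (?, ?)", rows)
    return len(rows)

def contract_allowed(conn):
    """The offline contract phase must be blocked while old rows remain."""
    (remaining,) = conn.execute(
        "SELECT COUNT(*) FROM ports_old "
        "WHERE id NOT IN (SELECT id FROM ports_new)").fetchone()
    return remaining == 0

while migrate_batch(conn):   # runs in the background, batch by batch
    pass
print(contract_allowed(conn))  # True once everything is migrated
```

Running the copy in small batches is what keeps the migration "online": normal API traffic continues between batches, and dropping the old table is refused until the guard returns True.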
>>>
>>>>
>>>> i. The contract phase can be executed online, if all new code is running in the setup.
>>>
>>> I am not sure how that’s possible. Do you think it’s realistic to expect the controller to resolve a lot of the checks that the db usually does (constraints?) while the schema is not enforced?
>>>
>>>>
>>>> d. The other strategy is to not drop tables, alter names, or remove columns from the DB – what’s in, it’s in. We should pay more attention in code reviews, merge only additive changes, and avoid questionable DB modifications.
>>>>
>>>
>>> I don’t like that approach. It suggests there is no way back if we screw something up. Having a short contract phase which is offline seems to me like a reasonable approach. Anyway, it can be reconsidered after we have the elephant in the room solved (the data migration problem).
>>>
>>>> e. The Neutron server should be updated first, in order to translate data from the old format into the new schema. That way, we can be sure that old data would not be inserted into old DB structures.
>>>>
>>>
>>> To my taste, that’s ^ the clearest way to go.
>
> Correct.
>>>
>>>>
>>>>
>>>> I have performed a manual Kilo to Liberty upgrade, both operationally and via code review of the RPC APIs. All is working fine.
>>>>
>>>> We can have some discussion on cross-project session [7] or we can also review any issues with Neutron upgrade in Friday’s unplugged session [8].
>>>
>>> I will be more than happy to sit with folks interested in our upgrade story and go write a plan for Mitaka.
>>
>> I am interested too, and I am based in Italy (same time zone, yippee)
>>
>> cheers,
>>
>> Rossella
>
> Ping me for discussion, it's a topic I'm interested in too.
>>
>>>
>>> Please ping me on irc (ihrachys), and we will figure out how to sync effectively and push the effort forward. (btw I am located in the Czech Republic, so we should be in the same time zone).
>>>
>>> Regards,
>>> Ihar
>>>
>>>
>>>
>>> __________________________________________________________________________
>>> OpenStack Development Mailing List (not for usage questions)
>>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>>
>>
>
>