[neutron] Switching the ML2 driver in-place from linuxbridge to OVN for an existing Cloud
Hello openstack-discuss and neutron experts,

I'd like to ask for your input and discussion on the idea of changing the ML2 driver for an existing cloud - read: changing a tire while still riding the bike. I would like to find out whether it's feasible to switch from the trusted Linuxbridge driver to the more modern SDN stack of OVN - in place, with all existing user networks, subnets, security groups and (running) instances in the database already. I know there is a push to migrate from OVS to OVN and a clear migration path is documented (https://docs.openstack.org/neutron/latest/ovn/migration.html), but those two backends are much more similar in their data plane.

And to get this out of the way: I am not asking to be able to do this without downtime, interruptions, migrations or any active orchestration of the process. I just want to know all the possible options apart from setting up a new cloud and asking folks to migrate all of their things over...

1) Are the data models of the user-managed resources abstracted (enough) from the ML2 driver used? Would the composition of a router, a network, some subnets, a few security groups and a few instances in a project just result in a different instantiation of packet-handling components, but be otherwise transparent to the user?

2) What could be possible migration strategies? It might be a little early to think about the actual migration steps, but I'd like to consider more than a full cloud shutdown followed by a cold start with a modified neutron config using the OVN ML2 driver. I know there is more to this than just the network and getting a virtual layer 2 network: there are DHCP, DNS and the metadata service, and last but not least the gateway / router and the security groups. But if OVN were to use VXLAN (as Linuxbridge currently does), could the shift from one implementation to the other potentially be done node by node? Or project by project, by changing the network agents over to nodes already running OVN?
If not, is there any other hot cut-over approach that would avoid having to shut down all the instances and would only cause them some network downtime (until the next DHCP renewal or similar)? Has anybody ever done something similar, or heard about this being done anywhere? Thanks for your time and input, Christian
Hi, On Monday, 22 August 2022 at 22:32:35 CEST, Christian Rohmann wrote:
Hello openstack-discuss and neutron experts,
I'd like to ask for your input and discussion on the idea of changing the ML2 driver for an existing cloud, read: changing a tire while still riding the bike.
I would like to find out whether it's feasible to switch from the trusted Linuxbridge driver to the more modern SDN stack of OVN - in place, with all existing user networks, subnets, security groups and (running) instances in the database already. I know there is a push to migrate from OVS to OVN and a clear migration path is documented (https://docs.openstack.org/neutron/latest/ovn/migration.html), but those two backends are much more similar in their data plane.
And to get this out of the way: I am not asking to be able to do this without downtime, interruptions, migrations or any active orchestration of the process. I just want to know all the possible options apart from setting up a new cloud and asking folks to migrate all of their things over...
1) Are the data models of the user-managed resources abstracted (enough) from the ML2 driver used? Would the composition of a router, a network, some subnets, a few security groups and a few instances in a project just result in a different instantiation of packet-handling components, but be otherwise transparent to the user?
Yes, the data models are the same, so all networks, routers and subnets will stay the same but will be implemented differently by the different backend. The only significant difference may be the network types, as OVN works mostly with Geneve tunnel networks, while with the LB backend you are using VXLAN, if I understand your email correctly.
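For illustration, the difference would show up mainly in the ML2 type configuration rather than in the user-facing API objects. A rough sketch of the relevant ml2_conf.ini sections (option names as documented for Neutron ML2; the VNI ranges are placeholders, not recommendations):

```ini
# Today: ML2/linuxbridge with VXLAN tenant networks (sketch)
[ml2]
mechanism_drivers = linuxbridge
tenant_network_types = vxlan

[ml2_type_vxlan]
vni_ranges = 1:65536

# After a switch to ML2/OVN, Geneve is the usual tenant network type (sketch):
# [ml2]
# mechanism_drivers = ovn
# tenant_network_types = geneve
#
# [ml2_type_geneve]
# vni_ranges = 1:65536
# max_header_size = 38
```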
2) What could be possible migration strategies?
It might be a little early to think about the actual migration steps, but I'd like to consider more than a full cloud shutdown followed by a cold start with a modified neutron config using the OVN ML2 driver. I know there is more to this than just the network and getting a virtual layer 2 network: there are DHCP, DNS and the metadata service, and last but not least the gateway / router and the security groups. But if OVN were to use VXLAN (as Linuxbridge currently does), could the shift from one implementation to the other potentially be done node by node? Or project by project, by changing the network agents over to nodes already running OVN?
Even if you keep VXLAN networks with the OVN backend (support for that is really quite limited), you will not be able to have tunnels established between nodes with different backends, so there will be no connectivity between VMs on hosts with different backends.
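To make that split concrete: before planning anything, one could classify hosts by which backend their agents run. A small self-contained sketch - the CSV is made-up sample data standing in for the output of `openstack network agent list -f csv -c Host -c "Agent Type"`; the hostnames and agent-type strings are assumptions:

```shell
# Hypothetical sample data standing in for the agent list of a half-migrated cloud.
cat > agents.csv <<'EOF'
Host,Agent Type
compute1,Linux bridge agent
compute2,Linux bridge agent
compute3,OVN Controller agent
EOF

# Tag each host by backend class; per the above, VMs on "lxb" hosts will
# have no tunnel connectivity to VMs on "ovn" hosts.
awk -F, 'NR > 1 { print ($2 ~ /OVN/ ? "ovn" : "lxb"), $1 }' agents.csv | sort
```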
If not, is there any other hot cut-over approach that would avoid having to shutdown all the instances, but only cause them some network downtime (until the next DHCP renew or similar?)
TBH, I don't know of any way to do that. We never tried or tested a migration from the LB to the OVN backend. The only currently supported migration is from ML2/OVS to ML2/OVN, and it depends on the TripleO framework.
Has anybody ever done something similar or heard about this being done anywhere?
I don't know of anyone who has done that, but if there is someone, I would be happy to hear how it was done and how it went :)
Thanks for your time and input,
Christian
-- Slawek Kaplonski Principal Software Engineer Red Hat
Thanks Slawek for your quick response! On 23/08/2022 07:47, Slawek Kaplonski wrote:
1) Are the data models of the user-managed resources abstracted (enough) from the ML2 driver used? Would the composition of a router, a network, some subnets, a few security groups and a few instances in a project just result in a different instantiation of packet-handling components, but be otherwise transparent to the user? Yes, the data models are the same, so all networks, routers and subnets will stay the same but will be implemented differently by the different backend. The only significant difference may be the network types, as OVN works mostly with Geneve tunnel networks, while with the LB backend you are using VXLAN, if I understand your email correctly.
That is reassuring. Yes, we currently use VXLAN. But even with the same type of tunneling, I suppose the networks and their IDs will not align to form a proper layer 2 domain - not even talking about all the other services like DHCP or metadata. See my next question about the idea of at least allowing some gradual switchover.
2) What could be possible migration strategies?
[...] Or project by project, by changing the network agents over to nodes already running OVN? Even if you keep VXLAN networks with the OVN backend (support for that is really quite limited), you will not be able to have tunnels established between nodes with different backends, so there will be no connectivity between VMs on hosts with different backends.
I was more thinking of moving all of a project's resources to network nodes (and hypervisors) which already run OVN. So split the cloud into two classes of machines: one set unchanged, running Linuxbridge, and the other in OVN mode. To migrate "a project", all agents of that project's routers and networks would be changed over to agents running on OVN-powered nodes... So this would still be a hard cut-over, but limited to a single project - as an alternative to replacing all of the network agents on all nodes and for all projects at the same time. Wouldn't that work - in theory - or am I missing something obvious here?
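Just to sketch the idea with the classic agent-scheduling commands (an untested sketch; the agent ID and router name are placeholders, and whether routers can be picked up on the OVN side at all is exactly the open question, since OVN schedules gateways via gateway chassis in its southbound database rather than via L3 agents):

```shell
# Evict the project's router from the L3 agent on a Linuxbridge node.
# <old-l3-agent-id> and my-project-router are placeholders.
openstack network agent remove router <old-l3-agent-id> my-project-router
# On the OVN side there is no classic L3 agent to add the router to:
# gateway placement is handled via gateway chassis in the OVN SB DB.
```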
Has anybody ever done something similar or heard about this being done anywhere? I don't know about anyone who did that but if there is someone, I would be happy to hear about how it was done and how it went :)
We will certainly share our story - if we live to talk about it ;-) Thanks again, with kind regards, Christian
Hi Christian, In my experience, it is possible to perform an in-place migration from ML2/LXB -> ML2/OVN, albeit with a shutdown or hard reboot of the instance(s) to complete the VIF plugging and some other needed operations. I have a very rough outline of the required steps if you’re interested, but they’re geared towards an openstack-ansible based deployment. I’ll try to put a writeup together in the next week or two demonstrating the process in a multi-node environment; the only one I have done recently was an all-in-one. James Denton Rackspace Private Cloud From: Christian Rohmann <christian.rohmann@inovex.de> Date: Monday, August 29, 2022 at 4:10 AM To: Slawek Kaplonski <skaplons@redhat.com>, openstack-discuss@lists.openstack.org Subject: Re: [neutron] Switching the ML2 driver in-place from linuxbridge to OVN for an existing Cloud
Hey James, I am really sorry that I am only getting back to you now. On 29/08/2022 19:54, James Denton wrote:
In my experience, it is possible to perform in-place migration from ML2/LXB -> ML2/OVN, albeit with a shutdown or hard reboot of the instance(s) to complete the VIF plugging and some other needed operations. I have a very rough outline of required steps if you’re interested, but they’re geared towards an openstack-ansible based deployment. I’ll try to put a writeup together in the next week or two demonstrating the process in a multi-node environment; the only one I have done recently was an all-in-one.
James Denton
Rackspace Private Cloud
Thanks for replying - I'd really love to see your outline / list of steps. BTW, we are actively working on switching to openstack-ansible, so that would suit us well. We also came to the conclusion that a shutdown of all instances might be required. The question is whether that has to happen all at once, or whether it could be done on a project-by-project basis. Our cloud is small enough to still make this feasible, but I suppose this topic is or will become more important to other, larger clouds as well. Regards Christian
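If it does work project by project, the cut-over for one project's instances might look roughly like this (an untested sketch; it assumes the hosting nodes have already been reconfigured for ML2/OVN, and "demo" is a placeholder project name):

```shell
# Stop and start (rather than soft-reboot) every instance in the project,
# so that the port binding is redone and the VIF is re-plugged through
# the new mechanism driver.
for vm in $(openstack server list --project demo -f value -c ID); do
    openstack server stop "$vm"
    # ... wait until the server reports SHUTOFF ...
    openstack server start "$vm"
done
```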
Hi Christian, I documented this a few months ago here: https://www.jimmdenton.com/migrating-lxb-to-ovn/. It’s heavily geared towards OpenStack-Ansible, but you can probably extrapolate the steps for a vanilla deployment or other deployment tool. The details will vary. Highly recommend testing this in a lab environment that mirrors production, if possible. -- James Denton Principal Architect Rackspace Private Cloud - OpenStack james.denton@rackspace.com From: Christian Rohmann <christian.rohmann@inovex.de> Date: Wednesday, November 23, 2022 at 4:27 AM To: James Denton <james.denton@rackspace.com>, Slawek Kaplonski <skaplons@redhat.com>, openstack-discuss@lists.openstack.org Subject: Re: [neutron] Switching the ML2 driver in-place from linuxbridge to OVN for an existing Cloud
participants (3)
-
Christian Rohmann
-
James Denton
-
Slawek Kaplonski