[openstack-ansible] Recommended branch for a production environment?
Hello,

We have deployed OpenStack-Ansible in a test environment and were wondering which repository branch is recommended for a production deployment that will be integrated with a separate Ceph cluster.

1. We are thinking about pinning to stable/zed. Is that recommended over the master branch?
2. Are bugs ironed out on master and stable/zed at the same cadence?
3. Additionally, is Debian 11 a better alternative than Ubuntu 22.04 for target hosts? We noticed Ubuntu 22.04 support was added only recently, whereas Debian 11 has been supported for quite some time, which leads us to believe its stability/integration may be more mature at this point.

Any suggestions will be appreciated.

Thank you,

--
Roger
Hi Roger,

Let me try to answer your questions.

1. We do not recommend using master in production environments, as it is the current development branch. I would suggest using either Zed or Yoga, unless you are going to use OVN as the network driver; if you are going to use OVN, then Zed is the only choice. Yoga comes into the picture because it is the first SLURP release, so you can upgrade between SLURP releases, which is N+2 (from Yoga straight to Antelope), while Zed is non-SLURP, so its supported upgrade path is only N+1 (which is also to Antelope). One more thing: when I talk about releases, I do not suggest checking the git repository out to the stable/$release branch, but rather picking a tagged version from that branch. For example, the latest tag on the Zed release as of today is 26.0.1, so I would suggest checking the repository out to whatever the latest 26.x.x tag is at deployment time.

2. We usually backport bug fixes to the affected stable branches and maintain them throughout their lifecycle. We are all human, so it does happen that we forget to backport some of them, so don't be shy to ping us if needed ;)

3. Ubuntu is historically better tested and used by more active contributors than Debian, and the majority of CI tests for OpenStack services run on Ubuntu rather than Debian. While I would say both are supported and should work well, Ubuntu is still the better-tested choice and thus a somewhat safer bet as of today.

On Wed, 8 Feb 2023 at 04:16, Roger Rivera <roger.riverac@gmail.com> wrote:
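Dmitriy's tag-pinning advice can be sketched in a couple of shell commands. This is a hedged illustration: the repository URL and the 26.0.1 tag match what is mentioned above, but the tag list below is a hypothetical stand-in, so check the real tags yourself at deployment time.

```shell
# Sketch: pin to the newest Zed-series (26.x.x) tag rather than the moving
# stable/zed branch. The clone/tag-listing steps are shown as comments;
# the version sort below runs on a hypothetical tag list.
#   git clone https://opendev.org/openstack/openstack-ansible /opt/openstack-ansible
#   cd /opt/openstack-ansible
#   tags=$(git tag --list '26.*')
tags='26.0.0
26.0.1'
latest=$(printf '%s\n' "$tags" | sort -V | tail -n 1)
echo "checkout target: $latest"   # then: git checkout "$latest"
```

`sort -V` orders version strings numerically, so the last line is the newest tag in the series.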
On 2023-02-08 07:12:39 +0100 (+0100), Dmitriy Rabotyagov wrote: [...]
Why Yoga has came to the picture - as it's first SLURP release, so you are able to upgrade between SLURP releases, which is N+2 (so from Yoga to Antelope), while Zed is non-SLURP, so supported upgrade path will be only N+1, but that is also to Antelope. [...]
Slight correction: Yoga is only a "dress rehearsal" for SLURP. The first official SLURP release will be 2023.1 (Antelope). We're enforcing SLURP testing for upgrades from Yoga to 2023.1 (Antelope) so that we can make sure the model is working in time for 2024.1 upgrades, but we haven't officially declared Yoga as SLURP. That said, it's probably okay to pretend it is, as long as you keep in mind that the community isn't promising users it's going to work.

--
Jeremy Stanley
Hello Jeremy,

Thank you for expanding on this. We are now in a better position to make accurate decisions.

Regards,
Roger

On Wed, Feb 8, 2023 at 9:42 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
--
Roger Rivera
Hello,

We have deployed an openstack-ansible cluster to test it on_metal with OVN and defined *dedicated gateway hosts* connecting to the external network with the *network-gateway_hosts* host group. Unfortunately, we are not able to connect to the external/provider networks. It seems that traffic wants to reach external networks via the hypervisor nodes and not the gateway hosts.

Any suggestions on changes needed to our configuration will be highly appreciated.

Environment:
- OpenStack Antelope
- Ubuntu 22 on all hosts
- 3 infra hosts - 1xNIC (ens1)
- 2 compute hosts - 1xNIC (ens1)
- 2 gateway hosts - 2xNIC (ens1 internal, ens2 external)
- No Linux bridges are created.

The gateway hosts are the only ones physically connected to the external network, via physical interface ens2. Therefore, we need all external provider network traffic to traverse these gateway hosts.

Tenant networks work fine and VMs can talk to each other. However, when a VM is spawned with a floating IP on the external network, it is unable to reach the outside network.

Relevant content from the openstack-ansible configuration files:

=.=.=.=.=.=.=.= openstack_user_config.yml =.=.=.=.=.=.=.=
```
...
provider_networks:
  - network:
      container_bridge: "br-mgmt"
      container_type: "veth"
      container_interface: "ens1"
      ip_from_q: "management"
      type: "raw"
      group_binds:
        - all_containers
        - hosts
      is_management_address: true
  - network:
      container_bridge: "br-vxlan"
      container_type: "veth"
      container_interface: "ens1"
      ip_from_q: "tunnel"
      #type: "vxlan"
      type: "geneve"
      range: "1:1000"
      net_name: "geneve"
      group_binds:
        - neutron_ovn_controller
  - network:
      container_bridge: "br-flat"
      container_type: "veth"
      container_interface: "ens1"
      type: "flat"
      net_name: "flat"
      group_binds:
        - neutron_ovn_controller
  - network:
      container_bridge: "br-vlan"
      container_type: "veth"
      container_interface: "ens1"
      type: "vlan"
      range: "101:300,401:500"
      net_name: "vlan"
      group_binds:
        - neutron_ovn_controller
  - network:
      container_bridge: "br-storage"
      container_type: "veth"
      container_interface: "ens1"
      ip_from_q: "storage"
      type: "raw"
      group_binds:
        - glance_api
        - cinder_api
        - cinder_volume
        - nova_compute
...
compute-infra_hosts:
  inf1:
    ip: 172.16.0.1
  inf2:
    ip: 172.16.0.2
  inf3:
    ip: 172.16.0.3
compute_hosts:
  cmp4:
    ip: 172.16.0.21
  cmp3:
    ip: 172.16.0.22
network_hosts:
  inf1:
    ip: 172.16.0.1
  inf2:
    ip: 172.16.0.2
  inf3:
    ip: 172.16.0.3
network-gateway_hosts:
  net1:
    ip: 172.16.0.31
  net2:
    ip: 172.16.0.32
```

=.=.=.=.=.=.=.= user_variables.yml =.=.=.=.=.=.=.=
```
---
debug: false
install_method: source
rabbitmq_use_ssl: False
haproxy_use_keepalived: False
...
neutron_plugin_type: ml2.ovn
neutron_plugin_base:
  - neutron.services.ovn_l3.plugin.OVNL3RouterPlugin
neutron_ml2_drivers_type: geneve,vlan,flat
neutron_ml2_conf_ini_overrides:
  ml2:
    tenant_network_types: geneve
...
```

=.=.=.=.=.=.=.= env.d/neutron.yml =.=.=.=.=.=.=.=
```
component_skel:
  neutron_ovn_controller:
    belongs_to:
      - neutron_all
  neutron_ovn_northd:
    belongs_to:
      - neutron_all
container_skel:
  neutron_agents_container:
    contains: {}
    properties:
      is_metal: true
  neutron_ovn_northd_container:
    belongs_to:
      - network_containers
    contains:
      - neutron_ovn_northd
```

=.=.=.=.=.=.=.= env.d/nova.yml =.=.=.=.=.=.=.=
```
component_skel:
  nova_compute_container:
    belongs_to:
      - compute_containers
      - kvm-compute_containers
      - lxd-compute_containers
      - qemu-compute_containers
    contains:
      - neutron_ovn_controller
      - nova_compute
    properties:
      is_metal: true
```

=.=.=.=.=.=.=.= group_vars/network_hosts =.=.=.=.=.=.=.=
```
openstack_host_specific_kernel_modules:
  - name: "openvswitch"
    pattern: "CONFIG_OPENVSWITCH"
```

The nodes layout is like this:

[image: image.png]

Any guidance on what we have wrong or how to improve this configuration will be appreciated. We need to make external traffic for VMs go out via the gateway nodes and not the compute/hypervisor nodes.

Thank you.

Roger
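As a diagnostic aid (not from the thread itself): in an ML2/OVN deployment, north-south traffic exits via chassis whose Open vSwitch `external_ids` carry `enable-chassis-as-gw` in `ovn-cms-options`. A hedged sketch for checking this on each host, assuming the OVS CLI is installed there:

```shell
# Hedged diagnostic sketch: on each of net1/net2/cmp3/cmp4, check whether
# the local chassis is registered as an OVN gateway. In the layout above,
# only the net* hosts should report enable-chassis-as-gw.
if command -v ovs-vsctl >/dev/null 2>&1; then
    cms_options=$(ovs-vsctl --if-exists get Open_vSwitch . external_ids:ovn-cms-options)
else
    cms_options="(ovs-vsctl not installed on this machine)"
fi
echo "ovn-cms-options: ${cms_options}"
```

If a compute host reports `enable-chassis-as-gw`, OVN may schedule router gateway ports there, which would match the symptom of traffic leaving via the hypervisors.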
Hi,

I think this is a known issue which should be fixed by the following patch:
https://review.opendev.org/c/openstack/openstack-ansible/+/892540

In the meantime you should be able to work around the issue by creating an /etc/openstack_deploy/env.d/nova.yml file with the following content:

    nova_compute_container:
      belongs_to:
        - compute_containers
        - kvm-compute_containers
        - qemu-compute_containers
      contains:
        - neutron_sriov_nic_agent
        - neutron_ovn_controller
        - nova_compute
      properties:
        is_metal: true

You might also need to remove the computes from the inventory using /opt/openstack-ansible/scripts/inventory-manage.py -r cmp03

They will be re-added the next time you run openstack-ansible or dynamic-inventory.py. Removing them is needed to ensure that they are not part of the ovn-gateway-related group. You might also need to stop the ovn-gateway service on these computes manually, but I'm not 100% sure about that.

On Sat, Sep 2, 2023, 17:47 Roger Rivera <roger.riverac@gmail.com> wrote:
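The inventory cleanup Dmitriy describes can be sketched for both computes in this thread's layout (hostnames cmp3/cmp4 are taken from the configuration above; the script path is the standard deploy-host location, so adjust if yours differs):

```shell
# Hedged sketch: drop the compute hosts from the generated inventory so
# they are re-created without the ovn-gateway group membership on the
# next openstack-ansible run.
inventory_manage=/opt/openstack-ansible/scripts/inventory-manage.py
hosts_to_reset="cmp3 cmp4"
for host in $hosts_to_reset; do
    if [ -x "$inventory_manage" ]; then
        "$inventory_manage" -r "$host"   # remove the stale entry
    else
        echo "would remove $host (inventory-manage.py not found on this machine)"
    fi
done
```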
Hello,

We are noticing two issues with these changes:

1. The overrides in the file /etc/openstack_deploy/env.d/nova.yml are not being honored:

    nova_compute_container:
      belongs_to:
        - compute_containers
        - kvm-compute_containers
        - qemu-compute_containers
      contains:
        - neutron_sriov_nic_agent
        - neutron_ovn_controller
        - nova_compute
      properties:
        is_metal: true

The following block continues to be populated with compute nodes in /etc/openstack_deploy/openstack_inventory.json after deleting and recreating the inventory file with /opt/openstack-ansible/scripts/inventory-manage.py:

    "neutron_ovn_gateway": {
        "children": [],
        "hosts": [
            "cmp3",
            "cmp4",
            "net1",
            "net2"
        ]
    },

2. After changing group_binds for provider_networks in openstack_user_config.yml to neutron_ovn_gateway instead of the previous neutron_ovn_controller, openstack-ansible still wants to create network mappings for the compute nodes, which are not part of the neutron_ovn_gateway host group:

=.=.=.=.=.=.=.=.=
TASK [os_neutron : Setup Network Provider Bridges] **********
fatal: [cmp4]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 1\n\nThe error appears to be in '/etc/ansible/roles/os_neutron/tasks/providers/setup_ovs_ovn.yml': line 55, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Setup Network Provider Bridges\n  ^ here\n"}
=.=.=.=.=.=.=.=.=

I'll dig deeper to see if I can find anything that helps, but any assistance will be appreciated.

Thanks

On Sat, Sep 2, 2023 at 12:08 PM Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
--
Roger Rivera
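To confirm whether the override took effect, the generated inventory can be inspected directly. A minimal sketch, using a hypothetical excerpt that mirrors the JSON structure shown above (in practice, load the real file on the deploy host):

```python
import json

# Hypothetical excerpt mirroring /etc/openstack_deploy/openstack_inventory.json;
# replace with open("/etc/openstack_deploy/openstack_inventory.json") in practice.
inventory_json = '''
{
  "neutron_ovn_gateway": {"children": [], "hosts": ["cmp3", "cmp4", "net1", "net2"]},
  "network-gateway_hosts": {"children": [], "hosts": ["net1", "net2"]}
}
'''

inventory = json.loads(inventory_json)
gateway_hosts = set(inventory["neutron_ovn_gateway"]["hosts"])
intended = set(inventory["network-gateway_hosts"]["hosts"])
strays = sorted(gateway_hosts - intended)
print(strays)  # compute hosts that should not carry the gateway role
```

With the data above this prints `['cmp3', 'cmp4']`, i.e. exactly the hosts that keep being re-added to the gateway group.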
Hello Dmitriy,

I appreciate you taking the time and effort to answer my questions. It has been really clarifying to learn the differences between branches. I've worked with Git workflows where master is usually the stable branch and a separate development branch carries the newer commits.

In our case, and with the intention to use OVN, we should be good setting our environment up with the stable/zed branch, especially considering that Linux Bridge was moved to experimental.

Also, knowing that Ubuntu has been battle-tested with OpenStack-Ansible is a helpful piece of information. When we started this testing project, Debian 11 was supported and Ubuntu 22.04 was experimental, hence we thought Debian was the project's main focus of attention and support.

Our main focus is stability, hence the interest in better understanding the best branch and operating system combinations.

Again, thank you very much.

Best regards,
Roger

On Wed, Feb 8, 2023 at 1:12 AM Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
--
Roger Rivera
participants (3)
- Dmitriy Rabotyagov
- Jeremy Stanley
- Roger Rivera