Hello, 

We are noticing two issues with these changes:

1. The overrides in /etc/openstack_deploy/env.d/nova.yml are not being honored:
 
nova_compute_container:
    belongs_to:
      - compute_containers
      - kvm-compute_containers
      - qemu-compute_containers
    contains:
      - neutron_sriov_nic_agent
      - neutron_ovn_controller
      - nova_compute
    properties:
      is_metal: true
 

The following block is still populated with compute nodes in /etc/openstack_deploy/openstack_inventory.json, even after deleting and recreating the inventory file with /opt/openstack-ansible/scripts/inventory-manage.py:

"neutron_ovn_gateway": {
    "children": [],
    "hosts": [
        "cmp3",
        "cmp4",
        "net1",
        "net2"
    ]
},



2. After changing the group_binds of the provider_networks in openstack_user_config.yml from neutron_ovn_controller to neutron_ovn_gateway, openstack-ansible still tries to create network mappings on the compute nodes, which are not part of the neutron_ovn_gateway host group:

=.=.=.=.=.=.=.=.=
TASK [os_neutron : Setup Network Provider Bridges] **********************************************************************************************************************************************************************************************************************************************************************************************

fatal: [cmp4]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: list object has no element 1\n\nThe error appears to be in '/etc/ansible/roles/os_neutron/tasks/providers/setup_ovs_ovn.yml': line 55, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Setup Network Provider Bridges\n  ^ here\n"}

=.=.=.=.=.=.=.=.=
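For reference, this is roughly what the changed provider_networks entries look like. It is a sketch reconstructed from the original openstack_user_config.yml quoted further down, and it assumes that only the flat and vlan entries were switched to neutron_ovn_gateway, while the geneve tunnel network still binds to neutron_ovn_controller (the compute nodes still need the tunnel). All other keys are copied unchanged from the original config.

```
provider_networks:
    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "ens1"
        ip_from_q: "tunnel"
        type: "geneve"
        range: "1:1000"
        net_name: "geneve"
        group_binds:
          # assumed unchanged: computes and gateways both need the tunnel
          - neutron_ovn_controller
    - network:
        container_bridge: "br-flat"
        container_type: "veth"
        container_interface: "ens1"
        type: "flat"
        net_name: "flat"
        group_binds:
          # previously neutron_ovn_controller
          - neutron_ovn_gateway
    - network:
        container_bridge: "br-vlan"
        container_type: "veth"
        container_interface: "ens1"
        type: "vlan"
        range: "101:300,401:500"
        net_name: "vlan"
        group_binds:
          # previously neutron_ovn_controller
          - neutron_ovn_gateway
```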

I'll keep digging to see if I can find anything that helps, but any assistance would be appreciated.

Thanks


On Sat, Sep 2, 2023 at 12:08 PM Dmitriy Rabotyagov <noonedeadpunk@gmail.com> wrote:
Hi,

I think this is a known issue, which should be fixed by the following patch:
https://review.opendev.org/c/openstack/openstack-ansible/+/892540

In the meantime, you should be able to work around the issue by creating the /etc/openstack_deploy/env.d/nova.yml file with the following content:

container_skel:
  nova_compute_container:
    belongs_to:
      - compute_containers
      - kvm-compute_containers
      - qemu-compute_containers
    contains:
      - neutron_sriov_nic_agent
      - neutron_ovn_controller
      - nova_compute
    properties:
      is_metal: true

You might also need to remove the compute hosts from the inventory using /opt/openstack-ansible/scripts/inventory-manage.py -r cmp3 (and likewise for cmp4).

They will be re-added the next time you run openstack-ansible or dynamic-inventory.py. Removing them is needed to ensure that they are no longer part of the ovn-gateway related group.
You might also need to stop the ovn-gateway service on these compute hosts manually, but I'm not 100% sure about that.

On Sat, Sep 2, 2023, 17:47 Roger Rivera <roger.riverac@gmail.com> wrote:
Hello,

We have deployed an openstack-ansible cluster to test an on_metal deployment with OVN, and defined dedicated gateway hosts connected to the external network using the network-gateway_hosts host group. Unfortunately, we are not able to reach the external/provider networks: traffic appears to go out via the hypervisor nodes rather than the gateway hosts.

Any suggestions on changes needed to our configuration will be highly appreciated.

Environment:
- OpenStack Antelope
- Ubuntu 22 on all hosts
- 3 infra hosts: 1x NIC (ens1)
- 2 compute hosts: 1x NIC (ens1)
- 2 gateway hosts: 2x NIC (ens1 internal, ens2 external)
- No Linux bridges are created.

The gateway hosts are the only ones physically connected to the external network, via the physical interface ens2. Therefore, all external provider network traffic needs to traverse these gateway hosts.

Tenant networks work fine and VMs can talk to each other. However, when a VM is spawned with a floating IP on the external network, it is unable to reach the outside network.

Relevant content from openstack-ansible configuration files:


=.=.=.=.=.=.=.=
openstack_user_config.yml
=.=.=.=.=.=.=.=
```
...
provider_networks:
    - network:
        container_bridge: "br-mgmt"
        container_type: "veth"
        container_interface: "ens1"
        ip_from_q: "management"
        type: "raw"
        group_binds:
          - all_containers
          - hosts
        is_management_address: true
    - network:
        container_bridge: "br-vxlan"
        container_type: "veth"
        container_interface: "ens1"
        ip_from_q: "tunnel"
        #type: "vxlan"
        type: "geneve"
        range: "1:1000"
        net_name: "geneve"
        group_binds:
          - neutron_ovn_controller
    - network:
        container_bridge: "br-flat"
        container_type: "veth"
        container_interface: "ens1"
        type: "flat"
        net_name: "flat"
        group_binds:
          - neutron_ovn_controller
    - network:
        container_bridge: "br-vlan"
        container_type: "veth"
        container_interface: "ens1"
        type: "vlan"
        range: "101:300,401:500"
        net_name: "vlan"
        group_binds:
          - neutron_ovn_controller
    - network:
        container_bridge: "br-storage"
        container_type: "veth"
        container_interface: "ens1"
        ip_from_q: "storage"
        type: "raw"
        group_binds:
          - glance_api
          - cinder_api
          - cinder_volume
          - nova_compute
 
...

compute-infra_hosts:
  inf1:
    ip: 172.16.0.1
  inf2:
    ip: 172.16.0.2
  inf3:
    ip: 172.16.0.3

compute_hosts:
  cmp4:
    ip: 172.16.0.21
  cmp3:
    ip: 172.16.0.22

network_hosts:
  inf1:
    ip: 172.16.0.1
  inf2:
    ip: 172.16.0.2
  inf3:
    ip: 172.16.0.3

network-gateway_hosts:
  net1:
    ip: 172.16.0.31
  net2:
    ip: 172.16.0.32

```


=.=.=.=.=.=.=.=
user_variables.yml
=.=.=.=.=.=.=.=
```
---
debug: false
install_method: source
rabbitmq_use_ssl: False
haproxy_use_keepalived: False
...
neutron_plugin_type: ml2.ovn
neutron_plugin_base:
  - neutron.services.ovn_l3.plugin.OVNL3RouterPlugin

neutron_ml2_drivers_type: geneve,vlan,flat
neutron_ml2_conf_ini_overrides:
  ml2:
    tenant_network_types: geneve

...
```

=.=.=.=.=.=.=.=
env.d/neutron.yml
=.=.=.=.=.=.=.=
```
component_skel:
  neutron_ovn_controller:
    belongs_to:
      - neutron_all
  neutron_ovn_northd:
    belongs_to:
      - neutron_all

container_skel:
  neutron_agents_container:
    contains: {}
    properties:
      is_metal: true
  neutron_ovn_northd_container:
    belongs_to:
      - network_containers
    contains:
      - neutron_ovn_northd

```

=.=.=.=.=.=.=.=
env.d/nova.yml
=.=.=.=.=.=.=.=
```
component_skel:
  nova_compute_container:
    belongs_to:
      - compute_containers
      - kvm-compute_containers
      - lxd-compute_containers
      - qemu-compute_containers
    contains:
      - neutron_ovn_controller
      - nova_compute
    properties:
      is_metal: true
```

=.=.=.=.=.=.=.=
group_vars/network_hosts
=.=.=.=.=.=.=.=
```
openstack_host_specific_kernel_modules:
  - name: "openvswitch"
    pattern: "CONFIG_OPENVSWITCH"
```

The node layout is as follows:

[image.png: node layout diagram]


Any guidance on what we have wrong, or on how to improve this configuration, would be appreciated. We need external VM traffic to go out via the gateway nodes, not via the compute/hypervisor nodes.

Thank you.

Roger


--
Roger Rivera