Hi,

I can live without metadata since my custom DHCP server provides the IP, at least for now while testing.

I have now been testing a bit more. I opened up all the security group rules completely.

I do have:
  * br-ex1 -> VLAN-tagged, bond0, physnet1.
  * br-ex2 -> flat, physnet2.

What I am looking for:
  * I can use my networks defined in hardware (br-ex1, bond0, physnet1), so that my own DHCP server hands out IP addresses, but I do not want to use any OpenStack metadata for these VLAN networks
  * I can use my direct flat br-ex2/physnet2 network, where I have my own DHCP server and no OpenStack metadata is used
  * I can use OpenStack flat networks inside projects, and possibly use routers to connect to br-ex1 and br-ex2 in case this kind of behaviour is needed.

Current situation when sniffing from interfaces: 
  * BOOTP now goes through without problems (packets can be seen on bond0, br-int, qvoaccb1b23-a0, the HW router and the HW DHCP server)
  * ICMP can be seen on qvoaccb1b23-a0 but not on bond0.
  * ARP requests can be seen on qvoaccb1b23-a0 but not on bond0.
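Since traffic shows up on qvoaccb1b23-a0 but never reaches bond0, it may help to trace a synthetic packet through the OVS flow tables to see exactly which rule drops it. A minimal sketch, assuming a recent Open vSwitch; the MAC and IP values below are placeholders, not taken from my environment:

```shell
# Trace an ARP request entering br-int from the instance port; the output
# lists every flow table the packet traverses and the final action taken.
# Replace the source MAC and the IPs with real values from your setup.
ovs-appctl ofproto/trace br-int \
  "in_port=qvoaccb1b23-a0,arp,arp_op=1,dl_src=fa:16:3e:00:00:01,dl_dst=ff:ff:ff:ff:ff:ff,arp_spa=10.10.11.100,arp_tpa=10.10.11.1"
```

If the trace ends in a drop before the packet is resubmitted towards int-br-ex1, the dropping flow's cookie can be matched against the flows installed by neutron-openvswitch-agent.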

  Thanks a lot!

---- Additional information ----

--- My ml2_conf.ini looks like this (it might be that I am doing something that is not allowed): ---
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vlan
mechanism_drivers = openvswitch,l2population
extension_drivers = port_security

[ml2_type_vlan]
network_vlan_ranges = physnet1:10:1000

[ml2_type_flat]
flat_networks = physnet2

[ml2_type_vxlan]
vni_ranges = 1:1000
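This part is not shown above, but it is worth double-checking: ml2_conf.ini only defines the physnets on the server side, while the physnet-to-bridge mapping lives in the agent configuration. A sketch of the matching section, assuming it is openvswitch_agent.ini in this kolla-ansible deployment and the bridge names from my setup:

```ini
[ovs]
bridge_mappings = physnet1:br-ex1,physnet2:br-ex2
```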

--- The OpenStack network is created like this: ---
openstack network create --external --provider-physical-network physnet1 --provider-segment 11 --provider-network-type vlan testnet

openstack subnet create --no-dhcp --allocation-pool start=10.10.11.100,end=10.10.11.150 --network testnet --subnet-range 10.10.11.0/24 --gateway 10.10.11.1 testnet-subnet
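The flat br-ex2/physnet2 case would be created the same way; a sketch, where the network name and the addressing are placeholders I made up, not values from my setup:

```shell
# Flat external network on physnet2 (br-ex2); Neutron DHCP disabled so the
# external DHCP server hands out the addresses. Names and ranges are examples.
openstack network create --external \
  --provider-physical-network physnet2 \
  --provider-network-type flat flatnet

openstack subnet create --no-dhcp \
  --allocation-pool start=192.168.2.100,end=192.168.2.150 \
  --network flatnet --subnet-range 192.168.2.0/24 \
  --gateway 192.168.2.1 flatnet-subnet
```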

---- Open vSwitch sees the network like this: ----
c12fde89-8778-4c42-9126-0790aea84547
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-ex1
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port bond0
            Interface bond0
        Port phy-br-ex1
            Interface phy-br-ex1
                type: patch
                options: {peer=int-br-ex1}
        Port br-ex1
            Interface br-ex1
                type: internal
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-ex2
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port phy-br-ex2
            Interface phy-br-ex2
                type: patch
                options: {peer=int-br-ex2}
        Port br-ex2
            Interface br-ex2
                type: internal
        Port eno2
            Interface eno2
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
        Port snooper0
            Interface snooper0
        Port tap5a71d857-11
            tag: 1
            Interface tap5a71d857-11
                type: internal
        Port int-br-ex1
            Interface int-br-ex1
                type: patch
                options: {peer=phy-br-ex1}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port qvoaccb1b23-a0 (This is my cirros test instance)
            tag: 7
            Interface qvoaccb1b23-a0
        Port int-br-ex2
            Interface int-br-ex2
                type: patch
                options: {peer=phy-br-ex2}


ovs-ofctl dump-flows br-int (partial output, just to show that VLAN translation is done)
cookie=0xd41cc81493c76645, duration=1644.771s, table=0, n_packets=78, n_bytes=18928, idle_age=51, priority=3,in_port=1,dl_vlan=11 actions=mod_vlan_vid:5,resubmit(,58)

ovs-ofctl dump-flows br-ex1 (partial output, just to show that VLAN translation is done)
cookie=0x9977a33c3076d47, duration=1595.358s, table=0, n_packets=92, n_bytes=18880, priority=4,in_port="phy-br-ex1",dl_vlan=5 actions=mod_vlan_vid:11,NORMAL
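The translation flows above look correct, so the symptom (packets on qvoaccb1b23-a0 but not on bond0) might come from a lower-priority drop rule or from stale MAC learning. A couple of checks I can run, assuming the standard neutron-openvswitch-agent flow layout:

```shell
# Any packet on br-ex1 that does not match an explicit mod_vlan_vid rule
# falls through to a default drop installed by the agent; list those rules.
ovs-ofctl dump-flows br-ex1 | grep -E 'drop|NORMAL'

# The "starts working after ~15 minutes" behaviour smells like MAC/FDB
# aging somewhere in the path; the OVS learned-MAC table can be inspected:
ovs-appctl fdb/show br-ex1
```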



On Tue, 12 Mar 2024 at 10:49, Sławek Kapłoński <skaplons@redhat.com> wrote:

Hi,


On Monday, 11 March 2024 at 11:57:44 CET, Mika Saari wrote:

> Hi,
>
>   I am using the latest kolla-ansible installation. The network is of the
> openvswitch type. My question is related to my provider network of VLAN type
> (external HW router, external DHCP, ...). The network is directly connected
> to the hardware: HW switch <-tagged-> bond0 <-> br-ex1 <-> openvswitch.
>
>   When I create a new instance (cirros) on this provider network, the
> instance cannot connect to the metadata server during boot.


The metadata server typically runs inside the Neutron router's namespace or, in the case of isolated networks, in the network's DHCP namespace created by the DHCP agent.

It will not be available if you have an external DHCP server and instances plugged directly into the provider network (which I assume is not connected to any router in Neutron).

You will need to use a config drive instead of the metadata server in that case.
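In case it is useful, the config drive is usually requested at boot time; a sketch, with the flavor/image/network names as placeholders:

```shell
# Boot with a config drive: cloud-init then reads the metadata from an
# attached disk image instead of contacting 169.254.169.254.
openstack server create --flavor m1.tiny --image cirros \
  --network testnet --config-drive true cirros-test
```

It can also be forced for all instances with force_config_drive = true in nova.conf.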

There is ongoing work [1] to provide "distributed metadata", which would be served directly by the neutron-openvswitch-agent, but it is not completed yet (and there has been no recent progress on it).


> The DHCP client
> is trying to fetch the address, and I can sniff the data on bond0, br-ex1,
> br-int and br-tun; the DHCP request leaves the hardware and the DHCP server
> sends the reply, but the reply won't go through back to the instance.


Where are those replies dropped? On bond0, or somewhere in Open vSwitch?

Did you try disabling port_security for the port (or allowing all ingress traffic in your security group) to make sure the SGs aren't dropping those replies?
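To rule security groups out completely, the port can be stripped of its SGs and have port security disabled; a sketch (the port ID is a placeholder, to be looked up e.g. with `openstack port list`):

```shell
# Remove all security groups and disable port security on the instance port.
openstack port set --no-security-group --disable-port-security <port-id>
```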


> If I
> force a static IP on this cirros instance and try pinging the network, the
> ICMP doesn't leave bond0. After around 15 minutes the network suddenly
> starts to work, the DHCP reply goes through and the instance gets the
> dynamic IP. After that the whole network works fluently.
>
>   Any ideas what the problem might be?
>
>   Thank you very much!


[1] https://review.opendev.org/q/topic:%22distributed_metadata_data_path%22


--
Slawek Kaplonski
Principal Software Engineer
Red Hat