New instance, provider VLAN network, network unpredictable
Hi,

I am using the latest kolla-ansible installation, with networking of the openvswitch type. My question is related to my provider network of VLAN type (external HW router, external DHCP, ...). The network is directly connected to the hardware through HW switch <-tagged-> bond0 <-> br-ex1 <-> openvswitch.

When I create a new instance (cirros) on this provider network, the instance cannot connect to the metadata server during boot. DHCP tries to fetch an address, and I can sniff the traffic on bond0, br-ex1, br-int and br-tun: the DHCP request leaves the hardware and the DHCP server sends the reply, but the reply won't make it back to the instance. If I force a static IP on this cirros instance and try pinging the network, the ICMP doesn't leave bond0. After around 15 minutes the network suddenly starts to work, the DHCP reply goes through, and the instance gets its dynamic IP. After this the whole network works fluently.

Any ideas what might be the problem?

Thank you very much!
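(The DHCP exchange on each hop can be watched with plain tcpdump, roughly like this; the filter and interface list are just examples:)
---
# DHCP request/reply on the physical bond and on the provider bridge
tcpdump -eni bond0 'port 67 or port 68'
tcpdump -eni br-ex1 'port 67 or port 68'
---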
Hi,

On 3/11/24 6:57 AM, Mika Saari wrote:
Hi,
I am using the latest kolla-ansible installation, with networking of the openvswitch type. My question is related to my provider network of VLAN type (external HW router, external DHCP, ...). The network is directly connected to the hardware through HW switch <-tagged-> bond0 <-> br-ex1 <-> openvswitch.
When I create a new instance (cirros) on this provider network, the instance cannot connect to the metadata server during boot. DHCP tries to fetch an address, and I can sniff the traffic on bond0, br-ex1, br-int and br-tun: the DHCP request leaves the hardware and the DHCP server sends the reply, but the reply won't make it back to the instance. If I force a static IP on this cirros instance and try pinging the network, the ICMP doesn't leave bond0. After around 15 minutes the network suddenly starts to work, the DHCP reply goes through, and the instance gets its dynamic IP. After this the whole network works fluently.
Any ideas what might be the problem?
I can't help much with Kolla, but if you have a provider network and the subnet you created has DHCP enabled, you will need to enable isolated metadata in the DHCP agent in order for the instance to access it:

enable_isolated_metadata = True
force_metadata = True

https://docs.openstack.org/neutron/2023.2/admin/deploy-ovs-provider.html

-Brian
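In a kolla-ansible deployment those options go into the DHCP agent config; a minimal sketch (the override path assumes the default node_custom_config location):
---
# /etc/kolla/config/neutron/dhcp_agent.ini
[DEFAULT]
enable_isolated_metadata = True
force_metadata = True
---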
Hi,

On Monday, 11 March 2024 at 11:57:44 CET, Mika Saari wrote:
Hi,
I am using the latest kolla-ansible installation, with networking of the openvswitch type. My question is related to my provider network of VLAN type (external HW router, external DHCP, ...). The network is directly connected to the hardware through HW switch <-tagged-> bond0 <-> br-ex1 <-> openvswitch.
When I create a new instance (cirros) on this provider network, the instance cannot connect to the metadata server during boot.
The metadata server typically runs inside a Neutron router's namespace or, in the case of isolated networks, in the network's DHCP namespace created by the DHCP agent. It will not be available if you have an external DHCP server and instances plugged directly into the provider network (which I assume is not connected to any router in Neutron). You will need to use config-drive instead of the metadata server in such a case. There is ongoing work [1] to provide "distributed metadata", which would then be served directly by the neutron-openvswitch-agent, but it is not completed yet (and there has been no recent progress on it).
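Config-drive is just a flag on server create, for example (image, flavor and names below are placeholders):
---
openstack server create --image cirros --flavor m1.tiny --network testnet --config-drive True test-vm
---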
DHCP tries to fetch an address, and I can sniff the traffic on bond0, br-ex1, br-int and br-tun: the DHCP request leaves the hardware and the DHCP server sends the reply, but the reply won't make it back to the instance.
Where are those replies dropped? On bond0 or somewhere in Open vSwitch? Did you try disabling port_security for the port (or allowing all ingress traffic in your security group) to make sure that SGs aren't dropping those replies?
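Disabling port security on the port can be done from the CLI (the port ID is a placeholder; security groups have to be removed from the port first):
---
openstack port set --no-security-group --disable-port-security <port-id>
---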
If I force a static IP on this cirros instance and try pinging the network, the ICMP doesn't leave bond0. After around 15 minutes the network suddenly starts to work, the DHCP reply goes through, and the instance gets its dynamic IP. After this the whole network works fluently.
Any ideas what might be the problem?
Thank you very much!
[1] https://review.opendev.org/q/topic:%22distributed_metadata_data_path%22

--
Slawek Kaplonski
Principal Software Engineer
Red Hat
Hi,

I can live without metadata since the custom DHCP server provides the IP, at least for now while just testing.

I have now been testing a bit more. I totally opened up all the security group rules.

I do have:
* br-ex1 -> VLAN tagged, bond0, physnet1
* br-ex2 -> flat, physnet2

What I am looking for:
* I can use my networks defined in hardware (br-ex1, bond0, physnet1), so that my own DHCP gives out IP addresses, but I do not want to use any metadata from OpenStack for these VLAN networks
* I can use my direct flat br-ex2/physnet2, where I have my own DHCP and no OpenStack metadata is used
* I can use OpenStack flat networks inside projects, and possibly use routers to connect to br-ex1 and br-ex2 in case this kind of behaviour is needed

Current situation when sniffing the interfaces:
* BOOTP now goes through without problems (packets can be seen on bond0, br-int, qvoaccb1b23-a0, the HW router and the HW DHCP server)
* ICMP can be seen on qvoaccb1b23-a0 but not on bond0
* ARP requests can be seen on qvoaccb1b23-a0 but not on bond0

Thanks a lot!

---- additional information ----

My ml2_conf.ini looks like this (it might be that I am now doing something which is not allowed):
---
[ml2]
type_drivers = flat,vlan,vxlan
tenant_network_types = vlan
mechanism_drivers = openvswitch,l2population
extension_drivers = port_security

[ml2_type_vlan]
network_vlan_ranges = physnet1:10:1000

[ml2_type_flat]
flat_networks = physnet2

[ml2_type_vxlan]
vni_ranges = 1:1000
---

The OpenStack network is created like this:
---
openstack network create --external --provider-physical-network physnet1 --provider-segment 11 --provider-network-type vlan testnet
openstack subnet create --no-dhcp --allocation-pool start=10.10.11.100,end=10.10.11.150 --network testnet --subnet-range 10.10.11.0/24 --gateway 10.10.11.1 testnet-subnet
---

Open vSwitch sees the network like this:
---
c12fde89-8778-4c42-9126-0790aea84547
    Manager "ptcp:6640:127.0.0.1"
        is_connected: true
    Bridge br-ex1
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port bond0
            Interface bond0
        Port phy-br-ex1
            Interface phy-br-ex1
                type: patch
                options: {peer=int-br-ex1}
        Port br-ex1
            Interface br-ex1
                type: internal
    Bridge br-tun
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port br-tun
            Interface br-tun
                type: internal
        Port patch-int
            Interface patch-int
                type: patch
                options: {peer=patch-tun}
    Bridge br-ex2
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port phy-br-ex2
            Interface phy-br-ex2
                type: patch
                options: {peer=int-br-ex2}
        Port br-ex2
            Interface br-ex2
                type: internal
        Port eno2
            Interface eno2
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        fail_mode: secure
        datapath_type: system
        Port br-int
            Interface br-int
                type: internal
        Port snooper0
            Interface snooper0
        Port tap5a71d857-11
            tag: 1
            Interface tap5a71d857-11
                type: internal
        Port int-br-ex1
            Interface int-br-ex1
                type: patch
                options: {peer=phy-br-ex1}
        Port patch-tun
            Interface patch-tun
                type: patch
                options: {peer=patch-int}
        Port qvoaccb1b23-a0 (this is my cirros test instance)
            tag: 7
            Interface qvoaccb1b23-a0
        Port int-br-ex2
            Interface int-br-ex2
                type: patch
                options: {peer=phy-br-ex2}
---

ovs-ofctl dump-flows br-int (partial, only to show that the VLAN translation is done):
---
cookie=0xd41cc81493c76645, duration=1644.771s, table=0, n_packets=78, n_bytes=18928, idle_age=51, priority=3,in_port=1,dl_vlan=11 actions=mod_vlan_vid:5,resubmit(,58)
---

ovs-ofctl dump-flows br-ex1 (partial, only to show that the VLAN translation is done):
---
cookie=0x9977a33c3076d47, duration=1595.358s, table=0, n_packets=92, n_bytes=18880, priority=4,in_port="phy-br-ex1",dl_vlan=5 actions=mod_vlan_vid:11,NORMAL
---
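(To pin down where OVS itself drops the ICMP/ARP, the flows can also be traced packet by packet; the in_port number and MAC below are placeholders:)
---
# find the OpenFlow port number of the instance port
ovs-ofctl show br-int | grep qvoaccb1b23-a0
# trace a broadcast ARP from the instance port through br-int
ovs-appctl ofproto/trace br-int in_port=<ofport>,dl_src=<vm-mac>,dl_dst=ff:ff:ff:ff:ff:ff,arp
---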
Hi again,

The problem was my way of creating the network. Totally my misunderstanding. I have now created the subnet with DHCP enabled and an allocation pool. My own HW-based DHCP probably won't do anything now, but the network is working. The instance got a different IP from the Neutron allocation than from the HW-based DHCP, and I assume that was the problem. Now that the allocated address is the same as the real IP on the instance, everything seems to be working.

Thanks a lot for the answers!
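For reference, the working subnet create is roughly the earlier command with DHCP left enabled (a sketch, same ranges assumed):
---
openstack subnet create --dhcp --allocation-pool start=10.10.11.100,end=10.10.11.150 --network testnet --subnet-range 10.10.11.0/24 --gateway 10.10.11.1 testnet-subnet
---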
Hi,

On Saturday, 16 March 2024 at 11:33:39 CET, Mika Saari wrote:
Hi again,
The problem was my way of creating the network. Totally my misunderstanding. I have now created the subnet with DHCP enabled and an allocation pool. My own HW-based DHCP probably won't do anything now, but the network is working. The instance got a different IP from the Neutron allocation than from the HW-based DHCP, and I assume that was the problem. Now that the allocated address is the same as the real IP on the instance, everything seems to be working.
If your instance got a different IP address than the one allocated in Neutron, then the problem was "port_security". Even if you added SG rules to allow all ingress and egress traffic, it would still block traffic sent from the VM with unknown IP/MAC addresses (this is the anti-spoofing mechanism). To avoid that you can configure the same IP address inside your VM as is allocated in the Neutron DB, or disable port_security on that port (it can be done through the API), or add your new IP address (different than the one in the Neutron DB) to the "allowed_address_pairs" list for that port; that way Neutron will know that this other IP address is also fine and traffic from it should be allowed.
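All three options can be done from the CLI, for example (the port ID and addresses are placeholders):
---
# see which fixed IP Neutron allocated to the port
openstack port show <port-id> -c fixed_ips
# allow an extra address on the port, so anti-spoofing accepts it
openstack port set --allowed-address ip-address=10.10.11.x <port-id>
# or disable port security entirely (security groups must be removed first)
openstack port set --no-security-group --disable-port-security <port-id>
---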
--
Slawek Kaplonski
Principal Software Engineer
Red Hat
participants (3):
- Brian Haley
- Mika Saari
- Sławek Kapłoński