[Openstack-operators] [Openstack] Strange: lost physical connectivity to compute hosts when using native (ryu) openflow interface

Gustavo Randich gustavo.randich at gmail.com
Tue May 2 19:26:53 UTC 2017


Hi Kevin, here is some information about this issue:

- if the network outage lasts less than ~1 minute, then connectivity to
host and instances is automatically restored without problem

- otherwise:

    - upon outage, "ovs-vsctl show" reports "is_connected: true" on all
      bridges (br-ex / br-int / br-tun)

    - after about 1 minute, "ovs-vsctl show" no longer shows "is_connected:
      true" on any bridge

    - upon restoring the physical interface (ending the outage):

        - "ovs-vsctl show" again reports "is_connected: true" on all bridges
          (br-ex / br-int / br-tun)

        - access to host and VMs is NOT restored, although some pings are
          sporadically answered by the host (~1 out of 20)


- to restore connectivity, we:


      - execute "ifdown br-ex; ifup br-ex" -> access to host is restored,
but not to VMs


      - restart neutron-openvswitch-agent -> access to VMs is restored
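
For anyone who wants to watch the "is_connected" state per bridge while reproducing this, here is a minimal sketch in Python that parses the text printed by "ovs-vsctl show". The function name controller_status and the SAMPLE text are made up for illustration (the br-int entry is hypothetical, added only to show a bridge without the flag), and the parsing assumes the layout shown elsewhere in this thread, where "is_connected: true" appears under a bridge's Controller entry.

```python
import re

# Hypothetical sample, modelled on the "ovs-vsctl show" excerpt in this thread.
# The br-int entry is invented to show a bridge whose controller is not connected.
SAMPLE = '''
    Bridge br-ex
        Controller "tcp:127.0.0.1:6633"
            is_connected: true
        Port "vlan171"
            tag: 171
            Interface "vlan171"
                type: internal
    Bridge br-int
        Controller "tcp:127.0.0.1:6633"
'''


def controller_status(show_output):
    """Return {bridge_name: bool}, True if 'is_connected: true' is listed for it.

    Assumption: any 'is_connected: true' line seen after a 'Bridge' line
    belongs to that bridge. Lines seen before the first bridge (for example
    under a Manager entry) are ignored.
    """
    status = {}
    bridge = None
    for line in show_output.splitlines():
        stripped = line.strip()
        match = re.match(r'Bridge "?([^"\s]+)"?$', stripped)
        if match:
            bridge = match.group(1)
            status[bridge] = False
        elif bridge is not None and stripped == "is_connected: true":
            status[bridge] = True
    return status


if __name__ == "__main__":
    for name, connected in controller_status(SAMPLE).items():
        print(name, "connected" if connected else "NOT connected")
```

On a real host the input could come from subprocess.check_output(["ovs-vsctl", "show"], text=True). Note that, as described above, "is_connected: true" came back after the outage while the hosts and VMs were still unreachable, so this check alone does not show that connectivity has recovered.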

Thank you!




On Fri, Apr 28, 2017 at 5:07 PM, Kevin Benton <kevin at benton.pub> wrote:

> With the network down, does ovs-vsctl show that it is connected to the
> controller?
>
> On Fri, Apr 28, 2017 at 2:21 PM, Gustavo Randich <
> gustavo.randich at gmail.com> wrote:
>
>> Exactly, we access via a tagged interface, which is part of br-ex
>>
>> # ip a show vlan171
>> 16: vlan171: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue
>> state UNKNOWN group default qlen 1
>>     link/ether 8e:14:8d:c1:1a:5f brd ff:ff:ff:ff:ff:ff
>>     inet 10.171.1.240/20 brd 10.171.15.255 scope global vlan171
>>        valid_lft forever preferred_lft forever
>>     inet6 fe80::8c14:8dff:fec1:1a5f/64 scope link
>>        valid_lft forever preferred_lft forever
>>
>> # ovs-vsctl show
>>     ...
>>     Bridge br-ex
>>         Controller "tcp:127.0.0.1:6633"
>>             is_connected: true
>>         Port "vlan171"
>>             tag: 171
>>             Interface "vlan171"
>>                 type: internal
>>     ...
>>
>>
>> On Fri, Apr 28, 2017 at 3:03 PM, Kevin Benton <kevin at benton.pub> wrote:
>>
>>> Ok, that's likely not the issue then. I assume the way you access each
>>> host is via an IP assigned to an OVS bridge or an interface that somehow
>>> depends on OVS?
>>>
>>> On Apr 28, 2017 12:04, "Gustavo Randich" <gustavo.randich at gmail.com>
>>> wrote:
>>>
>>>> Hi Kevin, we are using the default listen address of loopback interface:
>>>>
>>>> # grep -r of_listen_address /etc/neutron
>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini:#of_listen_address = 127.0.0.1
>>>>
>>>>
>>>>         tcp/127.0.0.1:6640 -> ovsdb-server /etc/openvswitch/conf.db
>>>> -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock
>>>> --private-key=db:Open_vSwitch,SSL,private_key
>>>> --certificate=db:Open_vSwitch,SSL,certificate
>>>> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir
>>>> --log-file=/var/log/openvswitch/ovsdb-server.log
>>>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
>>>>
>>>> Thanks
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 28, 2017 at 5:00 AM, Kevin Benton <kevin at benton.pub> wrote:
>>>>
>>>>> Are you using an of_listen_address value of an interface being brought
>>>>> down?
>>>>>
>>>>> On Apr 25, 2017 17:34, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> (using Mitaka / Ubuntu 16 / Neutron DVR / OVS / VXLAN / l2_population)
>>>>>>
>>>>>> This sounds very strange (to me): recently, after a switch outage, we
>>>>>> lost connectivity to all our Mitaka hosts. We had to enter via iLO host by
>>>>>> host and restart networking service to regain access. Then restart
>>>>>> neutron-openvswitch-agent to regain access to VMs.
>>>>>>
>>>>>> At first glance we thought it was a problem with the Linux NIC driver
>>>>>> of the hosts not detecting link state correctly.
>>>>>>
>>>>>> Then we reproduced the issue simply by bringing down the physical
>>>>>> interfaces for around 5 minutes, then up again. Same issue.
>>>>>>
>>>>>> And then... we found that if, instead of using the native (ryu) OpenFlow
>>>>>> interface in the Neutron Open vSwitch agent, we used ovs-ofctl, the
>>>>>> problem disappeared.
>>>>>>
>>>>>> Any clue?
>>>>>>
>>>>>> Thanks in advance.
>>>>>>
>>>>>>
>>>>>>
>>>>
>>
>