[Openstack-operators] [Openstack] Strange: lost physical connectivity to compute hosts when using native (ryu) openflow interface

Gustavo Randich gustavo.randich at gmail.com
Wed May 31 14:11:47 UTC 2017


Hi Kevin, I confirm that applying the patch fixes the problem.

Sorry for the inconvenience.
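
For anyone who finds this thread later: a minimal sketch, not the patch itself, of checking a bridge's fail mode by hand and setting it to secure, which is the setting our br-ex lacked. It uses the standard ovs-vsctl get-fail-mode and set-fail-mode commands. The function name and the OVS_VSCTL override variable are made up for illustration.

```shell
#!/bin/sh
# Sketch: make sure an OVS bridge uses the "secure" fail mode.
# OVS_VSCTL is a hypothetical override so the function can be dry-run
# against a stub; by default it calls the real ovs-vsctl.
OVS_VSCTL="${OVS_VSCTL:-ovs-vsctl}"

ensure_secure_fail_mode() {
    bridge="$1"
    # get-fail-mode prints "secure", "standalone", or an empty line if unset
    current="$($OVS_VSCTL get-fail-mode "$bridge")" || return 1
    if [ "$current" = "secure" ]; then
        echo "$bridge: fail mode already secure"
    else
        $OVS_VSCTL set-fail-mode "$bridge" secure || return 1
        echo "$bridge: fail mode changed from '${current:-unset}' to secure"
    fi
}

# Usage (as root on the compute host):
#   ensure_secure_fail_mode br-ex
```

Changing the fail mode on a bridge that carries management traffic may interrupt connectivity until the agent reinstalls its flows, so it is probably safer to try this from an out-of-band console first.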


On Tue, May 30, 2017 at 9:36 PM, Kevin Benton <kevin at benton.pub> wrote:

> Do you have that patch already in your environment? If not, can you
> confirm it fixes the issue?
>
> On Tue, May 30, 2017 at 9:49 AM, Gustavo Randich <
> gustavo.randich at gmail.com> wrote:
>
>> While dumping OVS flows as you suggested, we finally found the cause of
>> the problem: our br-ex OVS bridge lacked the secure fail mode configuration.
>>
>> Maybe the issue is related to this:
>> https://bugs.launchpad.net/neutron/+bug/1607787
>>
>> Thank you
>>
>>
>> On Fri, May 26, 2017 at 6:03 AM, Kevin Benton <kevin at benton.pub> wrote:
>>
>>> Sorry about the long delay.
>>>
>>> Can you dump the OVS flows before and after the outage? This will let us
>>> know if the flows Neutron set up are getting wiped out.
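
One way to take the before/after flow dumps suggested above, as a rough sketch: it calls ovs-ofctl dump-flows and strips the counters and timers that change on every dump, so that a plain diff shows only flows that were added or removed. The function name, the OVS_OFCTL override variable and the /tmp paths are made up for illustration.

```shell
#!/bin/sh
# Sketch: save a comparable snapshot of a bridge's OpenFlow flows.
# OVS_OFCTL is a hypothetical override so the function can be dry-run
# against a stub; by default it calls the real ovs-ofctl.
OVS_OFCTL="${OVS_OFCTL:-ovs-ofctl}"

snapshot_flows() {
    bridge="$1"
    outfile="$2"
    # Strip per-flow counters and timers, then sort, so that two
    # snapshots differ only when flows were added or removed.
    $OVS_OFCTL dump-flows "$bridge" \
        | sed -E 's/(duration|n_packets|n_bytes|idle_age|hard_age)=[^,]*, *//g' \
        | sort > "$outfile"
}

# Usage (as root on the compute host):
#   snapshot_flows br-ex /tmp/br-ex.before
#   ... reproduce the outage ...
#   snapshot_flows br-ex /tmp/br-ex.after
#   diff /tmp/br-ex.before /tmp/br-ex.after
```

If a bridge only accepts OpenFlow 1.3, ovs-ofctl may need "-O OpenFlow13" added to the dump-flows call.
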
>>>
>>> On Tue, May 2, 2017 at 12:26 PM, Gustavo Randich <
>>> gustavo.randich at gmail.com> wrote:
>>>
>>>> Hi Kevin, here is some information about this issue:
>>>>
>>>> - if the network outage lasts less than ~1 minute, connectivity to the
>>>> host and instances is restored automatically
>>>>
>>>> - otherwise:
>>>>
>>>>     - at the start of the outage, "ovs-vsctl show" reports
>>>>       "is_connected: true" on all bridges (br-ex / br-int / br-tun)
>>>>
>>>>     - after ~1 minute, "ovs-vsctl show" no longer shows
>>>>       "is_connected: true" on any bridge
>>>>
>>>>     - after the physical interface is restored (outage fixed):
>>>>
>>>>         - "ovs-vsctl show" again reports "is_connected: true" on all
>>>>           bridges (br-ex / br-int / br-tun)
>>>>
>>>>         - access to host and VMs is NOT restored, although some pings
>>>>           are sporadically answered by the host (~1 out of 20)
>>>>
>>>> - to restore connectivity, we:
>>>>
>>>>     - execute "ifdown br-ex; ifup br-ex" -> access to the host is
>>>>       restored, but not to VMs
>>>>
>>>>     - restart neutron-openvswitch-agent -> access to VMs is restored
>>>> Thank you!
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, Apr 28, 2017 at 5:07 PM, Kevin Benton <kevin at benton.pub> wrote:
>>>>
>>>>> With the network down, does ovs-vsctl show that it is connected to the
>>>>> controller?
>>>>>
>>>>> On Fri, Apr 28, 2017 at 2:21 PM, Gustavo Randich <
>>>>> gustavo.randich at gmail.com> wrote:
>>>>>
>>>>>> Exactly, we access via a tagged interface, which is part of br-ex
>>>>>>
>>>>>> # ip a show vlan171
>>>>>> 16: vlan171: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1
>>>>>>     link/ether 8e:14:8d:c1:1a:5f brd ff:ff:ff:ff:ff:ff
>>>>>>     inet 10.171.1.240/20 brd 10.171.15.255 scope global vlan171
>>>>>>        valid_lft forever preferred_lft forever
>>>>>>     inet6 fe80::8c14:8dff:fec1:1a5f/64 scope link
>>>>>>        valid_lft forever preferred_lft forever
>>>>>>
>>>>>> # ovs-vsctl show
>>>>>>     ...
>>>>>>     Bridge br-ex
>>>>>>         Controller "tcp:127.0.0.1:6633"
>>>>>>             is_connected: true
>>>>>>         Port "vlan171"
>>>>>>             tag: 171
>>>>>>             Interface "vlan171"
>>>>>>                 type: internal
>>>>>>     ...
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 28, 2017 at 3:03 PM, Kevin Benton <kevin at benton.pub>
>>>>>> wrote:
>>>>>>
>>>>>>> Ok, that's likely not the issue then. I assume the way you access
>>>>>>> each host is via an IP assigned to an OVS bridge or an interface that
>>>>>>> somehow depends on OVS?
>>>>>>>
>>>>>>> On Apr 28, 2017 12:04, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Kevin, we are using the default listen address, the loopback
>>>>>>>> interface:
>>>>>>>>
>>>>>>>> # grep -r of_listen_address /etc/neutron
>>>>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini:#of_listen_address = 127.0.0.1
>>>>>>>>
>>>>>>>>
>>>>>>>>         tcp/127.0.0.1:6640 -> ovsdb-server
>>>>>>>> /etc/openvswitch/conf.db -vconsole:emer -vsyslog:err -vfile:info
>>>>>>>> --remote=punix:/var/run/openvswitch/db.sock
>>>>>>>> --private-key=db:Open_vSwitch,SSL,private_key
>>>>>>>> --certificate=db:Open_vSwitch,SSL,certificate
>>>>>>>> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir
>>>>>>>> --log-file=/var/log/openvswitch/ovsdb-server.log
>>>>>>>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
>>>>>>>>
>>>>>>>> Thanks
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On Fri, Apr 28, 2017 at 5:00 AM, Kevin Benton <kevin at benton.pub>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Are you using an of_listen_address value of an interface being
>>>>>>>>> brought down?
>>>>>>>>>
>>>>>>>>> On Apr 25, 2017 17:34, "Gustavo Randich" <
>>>>>>>>> gustavo.randich at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> (using Mitaka / Ubuntu 16 / Neutron DVR / OVS / VXLAN /
>>>>>>>>>> l2_population)
>>>>>>>>>>
>>>>>>>>>> This sounds very strange (to me): recently, after a switch
>>>>>>>>>> outage, we lost connectivity to all our Mitaka hosts. We had to log in via
>>>>>>>>>> iLO, host by host, and restart the networking service to regain access, then
>>>>>>>>>> restart neutron-openvswitch-agent to regain access to the VMs.
>>>>>>>>>>
>>>>>>>>>> At first glance we thought the hosts' Linux NIC driver was
>>>>>>>>>> not detecting link state correctly.
>>>>>>>>>>
>>>>>>>>>> Then we reproduced the issue simply by bringing the physical
>>>>>>>>>> interfaces down for around 5 minutes, then up again. Same issue.
>>>>>>>>>>
>>>>>>>>>> And then we found that if we use ovs-ofctl instead of the native
>>>>>>>>>> (ryu) OpenFlow interface in the Neutron Open vSwitch agent, the
>>>>>>>>>> problem disappears.
>>>>>>>>>>
>>>>>>>>>> Any clue?
>>>>>>>>>>
>>>>>>>>>> Thanks in advance.
>>>>>>>>>>
>>>>>>>>>>