[Openstack-operators] [Openstack] Strange: lost physical connectivity to compute hosts when using native (ryu) openflow interface

Gustavo Randich gustavo.randich at gmail.com
Tue May 30 16:49:01 UTC 2017


While dumping OVS flows as you suggested, we finally found the cause of the
problem: our br-ex OVS bridge lacked the secure fail mode configuration.

Maybe the issue is related to this bug:
https://bugs.launchpad.net/neutron/+bug/1607787
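
For anyone hitting the same symptom, this is roughly how it can be checked
and fixed (a sketch; bridge name br-ex as in our deployment):

```shell
# Show the current fail mode; empty output means the default, "standalone",
# in which OVS falls back to acting as a plain learning switch (replacing the
# controller-installed flows) when the OpenFlow controller is unreachable.
ovs-vsctl get-fail-mode br-ex

# Switch to secure fail mode so br-ex keeps its flows while the connection
# to the neutron-openvswitch-agent controller is down.
ovs-vsctl set-fail-mode br-ex secure
```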

Thank you


On Fri, May 26, 2017 at 6:03 AM, Kevin Benton <kevin at benton.pub> wrote:

> Sorry about the long delay.
>
> Can you dump the OVS flows before and after the outage? This will let us
> know if the flows Neutron set up are getting wiped out.
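>
> A minimal way to capture this (a sketch; bridge names as in your setup,
> and -O OpenFlow13 because the native interface runs the bridges with
> OpenFlow 1.3):

```shell
# Dump the flow tables on each bridge before the outage...
for br in br-ex br-int br-tun; do
    ovs-ofctl -O OpenFlow13 dump-flows "$br" > "/tmp/${br}-before.txt"
done

# ...then again after the outage, and compare:
for br in br-ex br-int br-tun; do
    ovs-ofctl -O OpenFlow13 dump-flows "$br" > "/tmp/${br}-after.txt"
    diff "/tmp/${br}-before.txt" "/tmp/${br}-after.txt"
done
```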
>
> On Tue, May 2, 2017 at 12:26 PM, Gustavo Randich <
> gustavo.randich at gmail.com> wrote:
>
>> Hi Kevin, here is some information about this issue:
>>
>> - if the network outage lasts less than ~1 minute, connectivity to the
>> host and instances is restored automatically without problems
>>
>> - otherwise:
>>
>> - upon outage, "ovs-vsctl show" reports "is_connected: true" in all
>> bridges (br-ex / br-int / br-tun)
>>
>> - after about 1 minute, "ovs-vsctl show" no longer shows "is_connected:
>> true" on any bridge
>>
>> - upon restoring the physical interface (fixing the outage):
>>
>>        - "ovs-vsctl show" again reports "is_connected: true" in all
>> bridges (br-ex / br-int / br-tun)
>>
>>        - access to the host and VMs is NOT restored, although the host
>> sporadically answers some pings (~1 in 20)
>>
>>
>> - to restore connectivity, we:
>>
>>       - execute "ifdown br-ex; ifup br-ex" -> access to the host is
>> restored, but not to the VMs
>>
>>       - restart neutron-openvswitch-agent -> access to the VMs is restored
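>>
>> In one place, the full recovery sequence (service name as on our Ubuntu
>> 16.04 nodes):

```shell
# Bounce the external bridge's interface configuration -> restores host access
ifdown br-ex && ifup br-ex

# Restart the agent so it reprograms the OpenFlow rules -> restores VM access
systemctl restart neutron-openvswitch-agent
```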
>>
>> Thank you!
>>
>>
>>
>>
>> On Fri, Apr 28, 2017 at 5:07 PM, Kevin Benton <kevin at benton.pub> wrote:
>>
>>> With the network down, does ovs-vsctl show that it is connected to the
>>> controller?
>>>
>>> On Fri, Apr 28, 2017 at 2:21 PM, Gustavo Randich <
>>> gustavo.randich at gmail.com> wrote:
>>>
>>>> Exactly, we access via a tagged interface, which is part of br-ex
>>>>
>>>> # ip a show vlan171
>>>> 16: vlan171: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue
>>>> state UNKNOWN group default qlen 1
>>>>     link/ether 8e:14:8d:c1:1a:5f brd ff:ff:ff:ff:ff:ff
>>>>     inet 10.171.1.240/20 brd 10.171.15.255 scope global vlan171
>>>>        valid_lft forever preferred_lft forever
>>>>     inet6 fe80::8c14:8dff:fec1:1a5f/64 scope link
>>>>        valid_lft forever preferred_lft forever
>>>>
>>>> # ovs-vsctl show
>>>>     ...
>>>>     Bridge br-ex
>>>>         Controller "tcp:127.0.0.1:6633"
>>>>             is_connected: true
>>>>         Port "vlan171"
>>>>             tag: 171
>>>>             Interface "vlan171"
>>>>                 type: internal
>>>>     ...
>>>>
>>>>
>>>> On Fri, Apr 28, 2017 at 3:03 PM, Kevin Benton <kevin at benton.pub> wrote:
>>>>
>>>>> Ok, that's likely not the issue then. I assume the way you access each
>>>>> host is via an IP assigned to an OVS bridge or an interface that somehow
>>>>> depends on OVS?
>>>>>
>>>>> On Apr 28, 2017 12:04, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Kevin, we are using the default listen address, the loopback
>>>>>> interface:
>>>>>>
>>>>>> # grep -r of_listen_address /etc/neutron
>>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini:#of_listen_address =
>>>>>> 127.0.0.1
>>>>>>
>>>>>>
>>>>>>         tcp/127.0.0.1:6640 -> ovsdb-server /etc/openvswitch/conf.db
>>>>>> -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock
>>>>>> --private-key=db:Open_vSwitch,SSL,private_key
>>>>>> --certificate=db:Open_vSwitch,SSL,certificate
>>>>>> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir
>>>>>> --log-file=/var/log/openvswitch/ovsdb-server.log
>>>>>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
>>>>>>
>>>>>> Thanks
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 28, 2017 at 5:00 AM, Kevin Benton <kevin at benton.pub>
>>>>>> wrote:
>>>>>>
>>>>>>> Are you using an of_listen_address value of an interface being
>>>>>>> brought down?
>>>>>>>
>>>>>>> On Apr 25, 2017 17:34, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> (using Mitaka / Ubuntu 16 / Neutron DVR / OVS / VXLAN /
>>>>>>>> l2_population)
>>>>>>>>
>>>>>>>> This sounds very strange (to me): recently, after a switch outage,
>>>>>>>> we lost connectivity to all our Mitaka hosts. We had to log in via
>>>>>>>> iLO, host by host, and restart the networking service to regain
>>>>>>>> access, then restart neutron-openvswitch-agent to regain access to
>>>>>>>> the VMs.
>>>>>>>>
>>>>>>>> At first glance we thought it was a problem with the NIC linux
>>>>>>>> driver of the hosts not detecting link state correctly.
>>>>>>>>
>>>>>>>> Then we reproduced the issue simply by bringing the physical
>>>>>>>> interfaces down for around 5 minutes, then up again. Same issue.
>>>>>>>>
>>>>>>>> And then... we found that if, instead of the native (ryu) OpenFlow
>>>>>>>> interface in the Neutron Open vSwitch agent, we used ovs-ofctl, the
>>>>>>>> problem disappeared.
>>>>>>>>
>>>>>>>> Any clue?
>>>>>>>>
>>>>>>>> Thanks in advance.
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>>> Post to     : openstack at lists.openstack.org
>>>>>>>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>>>
>>>>>>>>
>>>>>>
>>>>
>>>
>>
>