[Openstack-operators] [Openstack] Strange: lost physical connectivity to compute hosts when using native (ryu) openflow interface

Kevin Benton kevin at benton.pub
Fri May 26 09:03:23 UTC 2017


Sorry about the long delay.

Can you dump the OVS flows before and after the outage? This will let us
know if the flows Neutron set up are getting wiped out.
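
If it helps, something like this should be enough to capture and compare
them (bridge names taken from your "ovs-vsctl show" output; the
-O OpenFlow13 flag is only a guess in case the bridges are restricted to
OpenFlow 1.3 by the native driver):

# for BR in br-int br-tun br-ex; do ovs-ofctl -O OpenFlow13 dump-flows $BR > /tmp/$BR.before; done
  ... reproduce the outage ...
# for BR in br-int br-tun br-ex; do ovs-ofctl -O OpenFlow13 dump-flows $BR > /tmp/$BR.after; done
# diff /tmp/br-int.before /tmp/br-int.after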

On Tue, May 2, 2017 at 12:26 PM, Gustavo Randich <gustavo.randich at gmail.com>
wrote:

> Hi Kevin, here is some information about this issue:
>
> - if the network outage lasts less than ~1 minute, connectivity to the
> host and instances is restored automatically without problems
>
> - otherwise:
>
> - upon outage, "ovs-vsctl show" reports "is_connected: true" on all
> bridges (br-ex / br-int / br-tun)
>
> - after about ~1 minute, "ovs-vsctl show" no longer shows "is_connected:
> true" on any bridge (a quick way to watch this is sketched after this list)
>
> - upon restoring the physical interface (fixing the outage):
>
>       - "ovs-vsctl show" again reports "is_connected: true" on all bridges
> (br-ex / br-int / br-tun)
>
>       - access to the host and VMs is NOT restored, although the host
> sporadically answers some pings (~1 out of 20)
>
>
> - to restore connectivity, we:
>
>       - execute "ifdown br-ex; ifup br-ex" -> access to the host is restored,
> but not to the VMs
>
>       - restart neutron-openvswitch-agent -> access to the VMs is restored
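>
> A quick way to watch the controller connection state per bridge while
> reproducing this (a minimal sketch; it simply polls the same Controller
> records that "ovs-vsctl show" summarizes, at an arbitrary 5-second
> interval):
>
> # watch -n 5 'ovs-vsctl list controller | grep -E "target|is_connected"'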
>
> Thank you!
>
>
>
>
> On Fri, Apr 28, 2017 at 5:07 PM, Kevin Benton <kevin at benton.pub> wrote:
>
>> With the network down, does "ovs-vsctl show" report that it is connected
>> to the controller?
>>
>> On Fri, Apr 28, 2017 at 2:21 PM, Gustavo Randich <
>> gustavo.randich at gmail.com> wrote:
>>
>>> Exactly, we access via a tagged interface, which is part of br-ex
>>>
>>> # ip a show vlan171
>>> 16: vlan171: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue
>>> state UNKNOWN group default qlen 1
>>>     link/ether 8e:14:8d:c1:1a:5f brd ff:ff:ff:ff:ff:ff
>>>     inet 10.171.1.240/20 brd 10.171.15.255 scope global vlan171
>>>        valid_lft forever preferred_lft forever
>>>     inet6 fe80::8c14:8dff:fec1:1a5f/64 scope link
>>>        valid_lft forever preferred_lft forever
>>>
>>> # ovs-vsctl show
>>>     ...
>>>     Bridge br-ex
>>>         Controller "tcp:127.0.0.1:6633"
>>>             is_connected: true
>>>         Port "vlan171"
>>>             tag: 171
>>>             Interface "vlan171"
>>>                 type: internal
>>>     ...
>>>
>>>
>>> On Fri, Apr 28, 2017 at 3:03 PM, Kevin Benton <kevin at benton.pub> wrote:
>>>
>>>> Ok, that's likely not the issue then. I assume the way you access each
>>>> host is via an IP assigned to an OVS bridge or an interface that somehow
>>>> depends on OVS?
>>>>
>>>> On Apr 28, 2017 12:04, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>> wrote:
>>>>
>>>>> Hi Kevin, we are using the default listen address, the loopback
>>>>> interface:
>>>>>
>>>>> # grep -r of_listen_address /etc/neutron
>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini:#of_listen_address =
>>>>> 127.0.0.1
>>>>>
>>>>>
>>>>>         tcp/127.0.0.1:6640 -> ovsdb-server /etc/openvswitch/conf.db
>>>>> -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock
>>>>> --private-key=db:Open_vSwitch,SSL,private_key
>>>>> --certificate=db:Open_vSwitch,SSL,certificate
>>>>> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir
>>>>> --log-file=/var/log/openvswitch/ovsdb-server.log
>>>>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
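>>>>>
>>>>> For reference, if we ever needed to pin it explicitly, it would look
>>>>> something like this (a sketch of the [ovs] section in
>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini, with the values left at
>>>>> what I understand to be the defaults):
>>>>>
>>>>> [ovs]
>>>>> of_listen_address = 127.0.0.1
>>>>> of_listen_port = 6633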
>>>>>
>>>>> Thanks
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, Apr 28, 2017 at 5:00 AM, Kevin Benton <kevin at benton.pub>
>>>>> wrote:
>>>>>
>>>>>> Are you using an of_listen_address value that is bound to an interface
>>>>>> being brought down?
>>>>>>
>>>>>> On Apr 25, 2017 17:34, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> (using Mitaka / Ubuntu 16 / Neutron DVR / OVS / VXLAN /
>>>>>>> l2_population)
>>>>>>>
>>>>>>> This sounds very strange (to me): recently, after a switch outage,
>>>>>>> we lost connectivity to all our Mitaka hosts. We had to log in via iLO,
>>>>>>> host by host, and restart the networking service to regain access, then
>>>>>>> restart neutron-openvswitch-agent to regain access to the VMs.
>>>>>>>
>>>>>>> At first glance we thought it was a problem with the hosts' Linux NIC
>>>>>>> driver not detecting link state correctly.
>>>>>>>
>>>>>>> Then we reproduced the issue by simply bringing the physical interfaces
>>>>>>> down for around 5 minutes and then up again. Same issue.
>>>>>>>
>>>>>>> And then... we found that if, instead of the native (ryu) OpenFlow
>>>>>>> interface in the Neutron Open vSwitch agent, we use ovs-ofctl, the problem
>>>>>>> disappears.
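>>>>>>>
>>>>>>> For completeness, the switch boils down to something like this in
>>>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini, followed by a restart of
>>>>>>> neutron-openvswitch-agent (a sketch; "of_interface" is the option name as
>>>>>>> of Mitaka):
>>>>>>>
>>>>>>> [ovs]
>>>>>>> # was: of_interface = native
>>>>>>> of_interface = ovs-ofctl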
>>>>>>>
>>>>>>> Any clue?
>>>>>>>
>>>>>>> Thanks in advance.
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Mailing list: http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>> Post to     : openstack at lists.openstack.org
>>>>>>> Unsubscribe : http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack
>>>>>>>
>>>>>>>
>>>>>
>>>
>>
>