[Openstack-operators] [Openstack] Strange: lost physical connectivity to compute hosts when using native (ryu) openflow interface
Kevin Benton
kevin at benton.pub
Wed May 31 00:36:16 UTC 2017
Do you have that patch already in your environment? If not, can you confirm
it fixes the issue?
On Tue, May 30, 2017 at 9:49 AM, Gustavo Randich <gustavo.randich at gmail.com>
wrote:
> While dumping OVS flows as you suggested, we finally found the cause of
> the problem: our br-ex OVS bridge lacked the secure fail mode configuration.
>
> Maybe the issue is related to this bug:
> https://bugs.launchpad.net/neutron/+bug/1607787
>
> Thank you
>
>
> On Fri, May 26, 2017 at 6:03 AM, Kevin Benton <kevin at benton.pub> wrote:
>
>> Sorry about the long delay.
>>
>> Can you dump the OVS flows before and after the outage? This will let us
>> know if the flows Neutron set up are getting wiped out.
>>
>> On Tue, May 2, 2017 at 12:26 PM, Gustavo Randich <
>> gustavo.randich at gmail.com> wrote:
>>
>>> Hi Kevin, here is some information about this issue:
>>>
>>> - if the network outage lasts less than ~1 minute, connectivity to the
>>>   host and instances is restored automatically without problems
>>>
>>> - otherwise:
>>>
>>>   - upon outage, "ovs-vsctl show" reports "is_connected: true" on all
>>>     bridges (br-ex / br-int / br-tun)
>>>
>>>   - after about 1 minute, "ovs-vsctl show" ceases to show
>>>     "is_connected: true" on every bridge
>>>
>>>   - upon restoring the physical interface (fixing the outage):
>>>
>>>     - "ovs-vsctl show" again reports "is_connected: true" on all
>>>       bridges (br-ex / br-int / br-tun)
>>>
>>>     - access to host and VMs is NOT restored, although some pings are
>>>       sporadically answered by the host (~1 out of 20)
>>>
>>>   - to restore connectivity, we:
>>>
>>>     - execute "ifdown br-ex; ifup br-ex" -> access to the host is
>>>       restored, but not to VMs
>>>
>>>     - restart neutron-openvswitch-agent -> access to VMs is restored
>>>
>>> Thank you!
>>>
>>>
>>>
>>>
>>> On Fri, Apr 28, 2017 at 5:07 PM, Kevin Benton <kevin at benton.pub> wrote:
>>>
>>>> With the network down, does ovs-vsctl show that it is connected to the
>>>> controller?
>>>>
>>>> On Fri, Apr 28, 2017 at 2:21 PM, Gustavo Randich <
>>>> gustavo.randich at gmail.com> wrote:
>>>>
>>>>> Exactly, we access via a tagged interface, which is part of br-ex
>>>>>
>>>>> # ip a show vlan171
>>>>> 16: vlan171: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UNKNOWN group default qlen 1
>>>>>     link/ether 8e:14:8d:c1:1a:5f brd ff:ff:ff:ff:ff:ff
>>>>>     inet 10.171.1.240/20 brd 10.171.15.255 scope global vlan171
>>>>>        valid_lft forever preferred_lft forever
>>>>>     inet6 fe80::8c14:8dff:fec1:1a5f/64 scope link
>>>>>        valid_lft forever preferred_lft forever
>>>>>
>>>>> # ovs-vsctl show
>>>>> ...
>>>>>     Bridge br-ex
>>>>>         Controller "tcp:127.0.0.1:6633"
>>>>>             is_connected: true
>>>>>         Port "vlan171"
>>>>>             tag: 171
>>>>>             Interface "vlan171"
>>>>>                 type: internal
>>>>> ...
>>>>>
>>>>>
>>>>> On Fri, Apr 28, 2017 at 3:03 PM, Kevin Benton <kevin at benton.pub>
>>>>> wrote:
>>>>>
>>>>>> Ok, that's likely not the issue then. I assume the way you access
>>>>>> each host is via an IP assigned to an OVS bridge or an interface that
>>>>>> somehow depends on OVS?
>>>>>>
>>>>>> On Apr 28, 2017 12:04, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Kevin, we are using the default listen address, the loopback
>>>>>>> interface:
>>>>>>>
>>>>>>> # grep -r of_listen_address /etc/neutron
>>>>>>> /etc/neutron/plugins/ml2/openvswitch_agent.ini:#of_listen_address = 127.0.0.1
>>>>>>>
>>>>>>>
>>>>>>> tcp/127.0.0.1:6640 -> ovsdb-server /etc/openvswitch/conf.db
>>>>>>> -vconsole:emer -vsyslog:err -vfile:info --remote=punix:/var/run/openvswitch/db.sock
>>>>>>> --private-key=db:Open_vSwitch,SSL,private_key
>>>>>>> --certificate=db:Open_vSwitch,SSL,certificate
>>>>>>> --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --no-chdir
>>>>>>> --log-file=/var/log/openvswitch/ovsdb-server.log
>>>>>>> --pidfile=/var/run/openvswitch/ovsdb-server.pid --detach --monitor
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Fri, Apr 28, 2017 at 5:00 AM, Kevin Benton <kevin at benton.pub>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Are you using an of_listen_address value of an interface being
>>>>>>>> brought down?
>>>>>>>>
>>>>>>>> On Apr 25, 2017 17:34, "Gustavo Randich" <gustavo.randich at gmail.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> (using Mitaka / Ubuntu 16 / Neutron DVR / OVS / VXLAN /
>>>>>>>>> l2_population)
>>>>>>>>>
>>>>>>>>> This sounds very strange (to me): recently, after a switch outage,
>>>>>>>>> we lost connectivity to all our Mitaka hosts. We had to log in via
>>>>>>>>> iLO, host by host, and restart the networking service to regain
>>>>>>>>> access, then restart neutron-openvswitch-agent to regain access to
>>>>>>>>> the VMs.
>>>>>>>>>
>>>>>>>>> At first glance we thought it was a problem with the hosts' Linux
>>>>>>>>> NIC driver not detecting link state correctly.
>>>>>>>>>
>>>>>>>>> Then we reproduced the issue simply by bringing the physical
>>>>>>>>> interfaces down for around 5 minutes, then up again. Same issue.
>>>>>>>>>
>>>>>>>>> And then we found that if we use ovs-ofctl instead of the native
>>>>>>>>> (ryu) OpenFlow interface in the Neutron Open vSwitch agent, the
>>>>>>>>> problem disappears.
>>>>>>>>>
>>>>>>>>> Any clue?
>>>>>>>>>
>>>>>>>>> Thanks in advance.
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>
>>>>
>>>
>>
>
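Regarding the request earlier in this thread to dump the OVS flows before and
after the outage: a minimal sketch of one way to make the two dumps comparable
is given here. It assumes the default "ovs-ofctl dump-flows" output format;
the helper names normalize_flows and snapshot_flows and the /tmp file paths
are hypothetical and not taken from the thread. Counters and timers change on
every dump, so they are stripped before comparing.

```shell
#!/bin/sh
# Sketch only: normalize "ovs-ofctl dump-flows" output so that two dumps
# taken at different times can be compared with diff.

# normalize_flows (hypothetical helper): reads a flow dump on stdin, drops
# the reply header line, removes per-flow counters and timers, strips the
# leading indentation, and sorts the remaining rules.
normalize_flows() {
    grep -v '_FLOW reply' |
        sed -E \
            -e 's/(duration|n_packets|n_bytes|idle_age|hard_age)=[^,]*, //g' \
            -e 's/^ +//' |
        sort
}

# snapshot_flows (hypothetical helper): save a normalized dump of one bridge.
#   $1 = bridge name (for example br-ex), $2 = output file
snapshot_flows() {
    ovs-ofctl dump-flows "$1" | normalize_flows > "$2"
}

# Example usage (run as root on the compute host):
#   snapshot_flows br-ex /tmp/br-ex.before    # before the outage
#   snapshot_flows br-ex /tmp/br-ex.after     # after the link is restored
#   diff -u /tmp/br-ex.before /tmp/br-ex.after
```

If the diff shows the Neutron flows gone from br-ex after the outage, it may
also be worth checking the bridge's fail mode with "ovs-vsctl get-fail-mode
br-ex", since the missing secure fail mode turned out to be the cause here.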