[ops][neutron] After an upgrade to OpenStack Queens, Neutron is unable to communicate properly with RabbitMQ

Jean-Philippe Méthot jp.methot at planethoster.info
Wed Jun 5 19:31:32 UTC 2019


Hi,

Thank you for your reply. There is no firewall. However, we eventually figured out that we were running out of TCP sockets. On a related note, we are still having issues, but only with metadata served through Neutron. It seems that nova-api refuses the connection with an HTTP 500 error when the metadata agent tries to connect to it. This is a completely different issue and may be related more to Nova than to Neutron, so this may well not be the right thread to discuss it.
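
For anyone who hits the same symptom, here is a minimal sketch of one way to check for TCP socket pressure on a Linux controller. It is not the method used in this thread, which does not say how the exhaustion was detected or fixed. It assumes the standard Linux /proc/net/tcp layout (socket state as a hex code in the fourth column); the function names are made up for illustration.

```python
from collections import Counter

# Hex state codes used in the "st" column of /proc/net/tcp on Linux.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        state_code = fields[3].upper()
        counts[TCP_STATES.get(state_code, state_code)] += 1
    return counts


def count_local_sockets(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Sum the per-state counts over the IPv4 and IPv6 tables."""
    total = Counter()
    for path in paths:
        try:
            with open(path) as handle:
                total += count_tcp_states(handle.read())
        except OSError:
            continue  # table not present (for example, not on Linux)
    return total


if __name__ == "__main__":
    for state, number in count_local_sockets().most_common():
        print(f"{state:12s} {number}")
```

A very large TIME_WAIT or ESTABLISHED count compared with the available local port range would be one hint that sockets are running short, although which limit actually applies depends on the host configuration.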

Best regards,

Jean-Philippe Méthot
Openstack system administrator
Administrateur système Openstack
PlanetHoster inc.




> On 5 June 2019 at 15:09, Brian Haley <haleyb.dev at gmail.com> wrote:
> 
> On 6/5/19 1:01 PM, Jean-Philippe Méthot wrote:
>> Hi,
>> We had a Pike OpenStack setup that we upgraded to Queens earlier this week. It is an infrastructure of 30 compute nodes, with 2 controller nodes and 2 network nodes, using Open vSwitch for networking. Since the upgrade to Queens, neutron-server on the controller nodes has been unable to contact the openvswitch agents through RabbitMQ. RabbitMQ is clustered across both controller nodes and has been logging the following error when neutron-server connections fail:
>> =ERROR REPORT==== 5-Jun-2019::18:50:08 ===
>> closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d):
>> missed heartbeats from client, timeout: 60s
>> The neutron-server logs show this error:
>> 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer
>> 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: RecoverableConnectionError: <RecoverableConnectionError: unknown error>
> 
> Are there possibly any firewall rules getting in the way?  Connection reset by peer usually means the other end has sent a TCP Reset, which wouldn't happen if the permissions were wrong.
> 
> As a test, does this connect?
> 
> $ telnet controller1 5672
> Trying $IP...
> Connected to controller1.
> Escape character is '^]'.
> 
> -Brian
> 
> 
>> The relevant service version numbers are as follows:
>> rabbitmq-server-3.6.5-1.el7.noarch
>> openstack-neutron-12.0.6-1.el7.noarch
>> python2-oslo-messaging-5.35.4-1.el7.noarch
>> RabbitMQ does not show any alerts. It also has plenty of memory and a high enough file limit. The login user and credentials are fine, as they are used by other OpenStack services that can contact RabbitMQ without issues.
>> I’ve tried optimizing RabbitMQ, upgrading, downgrading, increasing timeouts in the Neutron services, etc., to no avail. I find myself at a loss and would appreciate any ideas on where to go from here.
>> Best regards,
>> Jean-Philippe Méthot
>> Openstack system administrator
>> Administrateur système Openstack
>> PlanetHoster inc.
