[ops][neutron] After an upgrade to OpenStack Queens, Neutron is unable to communicate properly with RabbitMQ
jp.methot at planethoster.info
Wed Jun 5 19:31:32 UTC 2019
Thank you for your reply. There is no firewall. However, we ended up figuring out that we were running out of TCP sockets. On a related note, we are still having issues, but only with metadata served through Neutron. It seems that nova-api is rejecting the connection with an HTTP 500 error when the metadata-agent tries to connect to it. This is a completely different issue and may be more related to Nova than to Neutron, so this may well not be the right thread to discuss it.
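For anyone who lands on this thread with the same symptom, here is a minimal sketch of one way to see how many TCP sockets a host has in each state, assuming a Linux host where /proc/net/tcp is readable. The function names are hypothetical, this is not the exact check we ran, and it only covers IPv4 (IPv6 sockets are listed in /proc/net/tcp6).

```python
from collections import Counter

# Hex state codes as they appear in the "st" column of /proc/net/tcp.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per TCP state from the text of /proc/net/tcp."""
    counts = Counter()
    # The first line is the column header, so skip it.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) < 4:
            continue  # ignore blank or truncated lines
        # fields: slot, local address, remote address, state, ...
        state = fields[3].upper()
        counts[TCP_STATES.get(state, state)] += 1
    return counts


def read_tcp_states(path="/proc/net/tcp"):
    """Read the kernel's IPv4 TCP socket table and count states (Linux only)."""
    with open(path) as handle:
        return count_tcp_states(handle.read())
```

Calling read_tcp_states() on a controller and comparing the totals against the local port range and file descriptor limits should give a rough idea of whether sockets are being exhausted; a large TIME_WAIT or CLOSE_WAIT count is usually the first thing to look at.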
Openstack system administrator
> On June 5, 2019, at 15:09, Brian Haley <haleyb.dev at gmail.com> wrote:
> On 6/5/19 1:01 PM, Jean-Philippe Méthot wrote:
>> We had a Pike OpenStack setup that we upgraded to Queens earlier this week. It is a 30-compute-node infrastructure with 2 controller nodes and 2 network nodes, using Open vSwitch for networking. Since we upgraded to Queens, neutron-server on the controller nodes has been unable to contact the openvswitch agents through RabbitMQ. RabbitMQ is clustered across both controller nodes and has been giving us the following error when neutron-server connections fail:
>> =ERROR REPORT==== 5-Jun-2019::18:50:08 ===
>> closing AMQP connection <0.23859.0> (10.30.0.11:53198 -> 10.30.0.11:5672 - neutron-server:1170:ccf11f31-2b3b-414e-ab19-5ee2cf5dd15d):
>> missed heartbeats from client, timeout: 60s
>> The neutron-server logs show this error:
>> 2019-06-05 18:50:33.132 1169 ERROR oslo.messaging._drivers.impl_rabbit [req-17167988-c6f2-475e-8b6a-90b92777e03a - - - - -] [b7684919-c98b-402e-90c3-59a0b5eccd1f] AMQP server on controller1:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds.: error: [Errno 104] Connection reset by peer
>> 2019-06-05 18:50:33.217 1169 ERROR oslo.messaging._drivers.impl_rabbit [-] [bd6900e0-ab7b-4139-920c-a456d7df023b] AMQP server on controller1:5672 is unreachable: <RecoverableConnectionError: unknown error>. Trying again in 1 seconds.: RecoverableConnectionError: <RecoverableConnectionError: unknown error>
> Are there possibly any firewall rules getting in the way? Connection reset by peer usually means the other end has sent a TCP Reset, which wouldn't happen if the permissions were wrong.
> As a test, does this connect?
> $ telnet controller1 5672
> Trying $IP...
> Connected to controller1.
> Escape character is '^]'.
>> The relevant service version numbers are as follows:
>> RabbitMQ does not show any alerts. It also has plenty of memory and a high enough file limit. The login user and credentials are fine, as they are used by other OpenStack services that can contact RabbitMQ without issues.
>> I have tried tuning RabbitMQ, upgrading, downgrading, increasing timeouts in the Neutron services, etc., to no avail. I find myself at a loss and would appreciate any ideas on where to go from here.
>> Best regards,
>> Jean-Philippe Méthot
>> Openstack system administrator
>> PlanetHoster inc.