[openstack-dev] [neutron][fwaas] Investigation into debian/l3/wsgi/fwaas error

Nate Johnston nate.johnston at redhat.com
Thu Jun 21 14:48:05 UTC 2018


[bringing a side email conversation onto the main mailing list]

I have been looking into the issue with neutron_fwaas having an error
when running under the neutron-l3-agent on Debian when using wsgi.
Here's what I have tracked it down to at this point.  I am going to lay
it all out there, including points that you already know, because I am
going to bring in another party or two at this point.

To make sure we are on solid ground, let me restate what are the
parameters of the problem:

1. The error does not occur when neutron_fwaas is disabled.
2. The error does not occur if wsgi is in use.  If standard eventlet is
used, the error is not observed.
3. The error only occurs on debian; centos and ubuntu do not manifest
the problem.

As the neutron-l3-agent loads, it is trying to initialize the fwaas_v2
driver.  The driver initializes without incident, and then proceeds to
attempt to fetch firewall groups.  Note that you do not need to exercise
tempest to see this behavior; it is visible in the logs without anything
else going on.  Running pdb, I was able to trace the attempt to send the
message deep into the RPC transmission code; I saw very little there to
be suspicious of.

    2018-06-20 21:06:34.761 915007 DEBUG neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-] Fetch firewall groups from plugin get_firewall_groups_for_project /usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py:44
    ...
    2018-06-20 21:07:05.551 915007 ERROR neutron.common.rpc [-] Timeout in RPC method get_firewall_groups_for_project. Waiting for 10 seconds before next attempt. If the server is not down, consider increasing the rpc_response_timeout option as Neutron server(s) may be overloaded and unable to respond quickly enough.:
    oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 8616e98dd8d943eea1dcf99c04bd2be6

You can see, the RPC message goes into the ether and does not return.  This
results in the stacktraces in neutron-l3-agent.log.  This example is from a
later transaction.


        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 [-] FWaaS router add RPC info call failed for 8c13b5d7-7b93-4b91-ae4c-c387abe96734: oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 44851518f5ee4d40a2cdbcabc27e3c92
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 Traceback (most recent call last):
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 324, in get
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     return self._queues[msg_id].get(block=True, timeout=timeout)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 313, in get
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     return waiter.wait()
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/eventlet/queue.py", line 141, in wait
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     return get_hub().switch()
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/eventlet/hubs/hub.py", line 294, in switch
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     return self.greenlet.switch()
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 queue.Empty
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 During handling of the above exception, another exception occurred:
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 Traceback (most recent call last):
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py", line 292, in add_router
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     self._process_router_update(context, new_router)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py", line 256, in _process_router_update
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     fwg_list = self.fwplugin_rpc.get_firewall_groups_for_project(ctx)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/neutron_fwaas/services/firewall/agents/l3reference/firewall_l3_agent_v2.py", line 45, in get_firewall_groups_for_project
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     host=self.host)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 185, in call
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     time.sleep(wait)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 220, in __exit__
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     self.force_reraise()
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_utils/excutils.py", line 196, in force_reraise
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     six.reraise(self.type_, self.value, self.tb)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/six.py", line 686, in reraise
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     raise value
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/neutron/common/rpc.py", line 162, in call
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     return self._original_context.call(ctxt, method, **kwargs)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_messaging/rpc/client.py", line 174, in call
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     retry=self.retry)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_messaging/transport.py", line 131, in _send
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     timeout=timeout, retry=retry)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 559, in send
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     retry=retry)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 548, in _send
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     result = self._waiter.wait(msg_id, timeout)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 440, in wait
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     message = self.waiters.get(msg_id, timeout=timeout)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2   File "/usr/lib/python3/dist-packages/oslo_messaging/_drivers/amqpdriver.py", line 328, in get
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2     'to message ID %s' % msg_id)
        2018-06-20 21:23:24.772 919205 ERROR neutron_fwaas.services.firewall.agents.l3reference.firewall_l3_agent_v2 oslo_messaging.exceptions.MessagingTimeout: Timed out waiting for a reply to message ID 44851518f5ee4d40a2cdbcabc27e3c92

So it seems the RPC message for get_firewall_groups_for_project gets
lost. I can tell that there is no shortage of AMQP messaging occurring
between the neutron-l3-agent and RabbitMQ.  Looking at the RabbitMQ
error log, I am concerned that the message is not even really being
transmitted, because RabbitMQ is registering premature disconnections
from only the neutron_l3_agent at times that correspond to this testing.

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.16171.21> (127.0.0.1:47942 -> 127.0.0.1:5671 - neutron-l3-agent:804510:ecc4ca7d-361f-49f7-94f3-9b4a07d102fc):
        client unexpectedly closed TCP connection

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.16110.21> (127.0.0.1:47934 -> 127.0.0.1:5671 - neutron-l3-agent:804510:8974dfd0-cb22-4925-bf95-ed5dcd905e41):
        client unexpectedly closed TCP connection

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.15410.21> (127.0.0.1:47814 -> 127.0.0.1:5671 - neutron-l3-agent:804510:ce915f0c-e46d-442f-aba2-5c204589fb0f):
        client unexpectedly closed TCP connection

        =WARNING REPORT==== 20-Jun-2018::19:42:41 ===
        closing AMQP connection <0.15388.21> (127.0.0.1:47810 -> 127.0.0.1:5671 - neutron-l3-agent:804510:ab9b2193-199b-4312-9eca-2fbfc6cf27cd):
        client unexpectedly closed TCP connection

I will continue to debug the issue tomorrow.  I see no lonkage at this
point with any of the previously listed constraints on this scenario.
So I am going to copy Brian Haley for his L3 expertise, as well as the 3
FWAaaS cores to see if this directs their thoughts in any particular
direction.  I hope to continue the investigation tomorrow.

Thanks,

Nate Johnston
njohnston



More information about the OpenStack-dev mailing list