[Openstack] Neutron glitches and floatingip status DOWN
Matt Davis
mattd5574 at gmail.com
Thu May 14 22:04:37 UTC 2015
Hi George,
Thanks for the suggestions. I've tried a few things, but unfortunately I
had already restarted all of the neutron services on all of the nodes, so I
was unable to do a before/after comparison. I did do a before/after
comparison with a floatingip-associate command. I did the following:
1) Dump the output of 'ip address' and 'iptables -S' on the network node
and the compute node containing the VM.
2) Associate the floating ip with a running VM.
3) Dump the output of 'ip address' and 'iptables -S' and diff against the
originals.
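Concretely, the capture looked roughly like this on each host (file names
and IDs below are just placeholders):

# before associating, on the network node and the compute node:
ip address > /tmp/ip.before
iptables -S > /tmp/iptables.before

# associate the floating IP:
neutron floatingip-associate <floatingip-id> <port-id>

# capture again and diff:
ip address > /tmp/ip.after
iptables -S > /tmp/iptables.after
diff -u /tmp/ip.before /tmp/ip.after
diff -u /tmp/iptables.before /tmp/iptables.after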
I'm not seeing any change in any of the output, which seems wrong to me.
The l3 agent log is extremely long, but it only contains a few types of
entries. Instead of dumping thousands of repeating lines into this email,
I've patched together one line for each type of entry I was able to find.
The only entries I see are the following:
2015-05-14 21:22:03.910 25800 INFO neutron.agent.l3_agent
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] L3 agent started
Command: ['ip', '-o', 'link', 'show', 'br-ex']
Exit code: 0
Stdout: '6: br-ex: <BROADCAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state
UNKNOWN mode DEFAULT group default \\ link/ether f4:ce:46:81:bf:1a brd
ff:ff:ff:ff:ff:ff\n'
Stderr: '' execute
/usr/lib/python2.7/dist-packages/neutron/agent/linux/utils.py:75
2015-05-14 21:22:30.794 25800 DEBUG neutron.openstack.common.lockutils
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] Semaphore / lock released
"_rpc_loop" inner
/usr/lib/python2.7/dist-packages/neutron/openstack/common/lockutils.py:252
2015-05-14 21:22:31.793 25800 DEBUG neutron.openstack.common.lockutils
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] Got semaphore "l3-agent"
lock
/usr/lib/python2.7/dist-packages/neutron/openstack/common/lockutils.py:168
2015-05-14 21:22:31.794 25800 DEBUG neutron.openstack.common.lockutils
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] Got semaphore / lock
"_rpc_loop" inner
/usr/lib/python2.7/dist-packages/neutron/openstack/common/lockutils.py:248
2015-05-14 21:22:31.794 25800 DEBUG neutron.agent.l3_agent
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] Starting RPC loop for 0
updated routers _rpc_loop
/usr/lib/python2.7/dist-packages/neutron/agent/l3_agent.py:823
2015-05-14 21:22:31.794 25800 DEBUG neutron.agent.l3_agent
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] RPC loop successfully
completed _rpc_loop
/usr/lib/python2.7/dist-packages/neutron/agent/l3_agent.py:840
2015-05-14 21:34:47.918 25800 DEBUG neutron.agent.l3_agent
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] Starting _sync_routers_task
- fullsync:False _sync_routers_task
/usr/lib/python2.7/dist-packages/neutron/agent/l3_agent.py:861
2015-05-14 21:35:03.789 25800 DEBUG neutron.openstack.common.rpc.amqp
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] UNIQUE_ID is
16d55615cce34d49a82e52d47b0b0518. _add_unique_id
/usr/lib/python2.7/dist-packages/neutron/openstack/common/rpc/amqp.py:342
2015-05-14 21:35:33.790 25800 DEBUG neutron.openstack.common.rpc.amqp
[req-8436d571-b12e-4218-b13b-e5dddb461370 None] Making asynchronous cast on
q-plugin... cast
/usr/lib/python2.7/dist-packages/neutron/openstack/common/rpc/amqp.py:583
Nothing above the INFO or DEBUG level.
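(For what it's worth, I was checking for higher-severity entries with
something like the following, assuming the default Ubuntu log location:

grep -E 'WARNING|ERROR|CRITICAL|TRACE' /var/log/neutron/l3-agent.log

and it came back empty.)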
-Matt
On Thu, May 14, 2015 at 4:51 AM, George Mihaiescu <lmihaiescu at gmail.com>
wrote:
> Hi Matt,
>
> The L3 agent is in charge of implementing NAT rules inside the qrouter
> namespace, and it probably failed while the Neutron API was down.
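> For example, you can look at the NAT rules directly inside the router
> namespace (the router UUID below is a placeholder):
>
> ip netns list | grep qrouter
> ip netns exec qrouter-<router-uuid> iptables -t nat -S
> ip netns exec qrouter-<router-uuid> ip address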
>
> I would dump the iptables rules, restart the agent(s) on the network node
> and compare the iptables and 'ip address' output from before and after.
>
> Also, enabling debug and verbose in neutron.conf before restarting the
> agent(s) should surface any remaining errors.
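> Roughly, in the [DEFAULT] section of /etc/neutron/neutron.conf:
>
> debug = True
> verbose = True
>
> and then restart the agent, e.g. on Ubuntu:
>
> service neutron-l3-agent restart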
>
> George
> On 14 May 2015 01:16, "Matt Davis" <mattd5574 at gmail.com> wrote:
>
>> Hi all,
>>
>> I've been diagnosing a problem on my icehouse install (ubuntu with a
>> 3-node galera cluster as the database backend) and I've gotten the system
>> into a bad state. The glitch I've been chasing is that the neutron API
>> becomes unresponsive for a few minutes approximately every half hour before
>> returning to normal. Nothing obvious in the logs (no warnings, errors, or
>> critical output seems correlated with the failure). The request goes in
>> and I get no response back. After 5 minutes or so, it returns to normal.
>>
>> The second problem is that while I reconfigured the system to debug
>> (removing proxy layers, connecting directly to a single galera cluster
>> node, etc.), I think I broke something having to do with floating IPs. Now
>> when I connect a floating IP to a VM, the IP shows up as "DOWN" instead
>> of "ACTIVE" and I'm unable to ping it. Notes:
>>
>> 1) The underlying VM port is active and works as expected. I can
>> connect to the fixed IP from within the VM's virtual network. The VM can
>> connect to the outside world.
>> 2) Existing VMs with existing floating IPs work as expected.
>> 3) If I create a new VM and try to apply an existing floating IP to it
>> (one that was working on a previous VM), the status for that floating IP
>> remains "ACTIVE" but I'm unable to ping it.
>> 4) All of the security groups for all of the VMs are the same.
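>> (For reference, I've been checking status with commands along these
>> lines, where the IDs are placeholders:
>>
>> neutron floatingip-list
>> neutron floatingip-show <floatingip-id>
>> neutron port-show <port-id>)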
>>
>> Floating IP manipulation doesn't seem to produce a lot of debugging
>> content in the logs, so it's difficult to trace this one. I don't know if
>> the neutron API glitches are related or if the floating IP problem is a
>> second issue that I created in the process of debugging.
>>
>> Any idea where I should look?
>>
>> Thanks,
>>
>> -Matt
>>