Open Stack

Thu Apr 10 08:46:15 UTC 2014

Steps to debug.

  1.  Understand where exactly the problem lies
     *   Are you not able to reach the floating ip of instances?
        *   First start a continuous ping from a machine outside OpenStack to the floating ip.
        *   Go to the network node. Find the router interface that attaches your external network to br-ex (the external bridge; you should see it in bridge_mappings, mapped to the physical network that has no vlan id range in its corresponding network_vlan_ranges entry).
        *   Note: This interface might not be in the network node's default namespace. It would exist inside the namespace that was created for your router. The namespace for your router would normally be named something like 'qrouter-<router_id>', and you can list the namespaces using the 'ip netns' command.
        *   Do 'tcpdump -lennvi <interface_name>'. You have to execute tcpdump inside the namespace mentioned above. You can do that with 'ip netns exec <namespace id> tcpdump -lennvi <interface_name>'.
        *   In your tcpdump do you see the ping requests arriving?
           *   No?
              *   If you do not see them, it might be that the physical network interface (say eth3) attached to br-ex is not in promiscuous mode or is not up.
              *   In that case run 'ip link set <physical_interface> up' and 'ip link set <physical_interface> promisc on'.
           *   Yes?
              *   Go on to the next step. Find the network interface attaching your router (the external router) to your instance's network. Again, it will be inside the same network namespace, and you do the tcpdump there.
              *   Here you should see the same ping requests, except that the destination ip should be the instance's private ip and not the floating ip. If this is not happening, the problem lies in your neutron l3 agent and/or firewall driver.
                 *   If this too is happening, move on to the next question below.
     *   Are the instances not able to reach each other through their private ips?
        *   This could mean that your instance is also not able to reach its gateway router, the router that is responsible for floating ip mapping and inter-subnet connectivity.
        *   To check this, start a continuous ping from one of the instances in openstack to the gateway router interface for that subnet.
        *   Start tracing where your packets are dropped using tcpdump. Below is the list of interfaces to look at, in order from instance to router.
           *   The tap device attached to the instance. You can find this in the openstack dashboard page of the network.
           *   'int-br-eth1'
           *   'phy-br-eth1'. At this interface the ping packets should carry a vlan tag (if you are using vlan mode).
           *   eth1 (I am assuming that your physnet is bridged to br-eth1 and eth1 is attached to br-eth1). Here the packets should carry the vlan id that was assigned to the openstack network when you created it.
           *   eth1 of the network node, then 'phy-br-eth1' and 'int-br-eth1' of the network node, then the interface of the router in the instance's network.
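
The capture commands from the steps above can be sketched as a small Python helper (a rough illustration, assuming Python 3.8+ for shlex.join). It only builds and prints the command lines; it does not run them. ROUTER_ID, EXTERNAL_IF and INTERNAL_IF are placeholders I made up, so substitute the values from 'ip netns' and from your own router, and replace eth3 with whatever interface is attached to your br-ex.

```python
import shlex


def router_namespace(router_id):
    """Namespace name for a router, following the 'qrouter-<router_id>' pattern."""
    return "qrouter-" + router_id


def tcpdump_in_namespace(router_id, interface):
    """argv for running tcpdump on an interface inside the router's namespace."""
    return ["ip", "netns", "exec", router_namespace(router_id),
            "tcpdump", "-lennvi", interface]


def enable_physical_interface(physical_interface):
    """argv lists that bring the br-ex uplink up and set promiscuous mode."""
    return [
        ["ip", "link", "set", physical_interface, "up"],
        ["ip", "link", "set", physical_interface, "promisc", "on"],
    ]


# Placeholder values for illustration only; they are not from this thread.
ROUTER_ID = "ROUTER_ID"
EXTERNAL_IF = "EXTERNAL_IF"   # router interface facing br-ex
INTERNAL_IF = "INTERNAL_IF"   # router interface facing the instance network

commands = [
    tcpdump_in_namespace(ROUTER_ID, EXTERNAL_IF),
    tcpdump_in_namespace(ROUTER_ID, INTERNAL_IF),
] + enable_physical_interface("eth3")

# Print the commands so they can be reviewed and then run by hand as root.
for argv in commands:
    print(shlex.join(argv))
```

Printing instead of executing keeps this safe to run anywhere; the commands themselves need root on the network node.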

I agree it is too cryptic and would not make sense at first look, but if you study the way the neutron openvswitch agent works, you will see the flow I have mentioned above. If you could tell me where exactly your packet goes missing, I could find a possible reason and a solution to prevent outages.

There is, however, another way to debug: run ovs-ofctl dump-flows on br-int and br-eth1 on both the compute and network nodes. But this assumes that all flows are correctly programmed.
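
When reading the dump-flows output, the packet counters are usually the quickest hint. Here is a minimal sketch in Python that lists flows whose action is drop and whose counter is non-zero. The SAMPLE text is something I made up in the general shape that 'ovs-ofctl dump-flows br-int' prints; the exact fields can differ between Open vSwitch versions, so treat the parsing as an assumption and adjust it to your output.

```python
import re

# Hypothetical sample output, for illustration only.
SAMPLE = """\
NXST_FLOW reply (xid=0x4):
 cookie=0x0, duration=3600.1s, table=0, n_packets=120, n_bytes=11760, idle_age=1, priority=3,in_port=1,dl_vlan=101 actions=mod_vlan_vid:1,NORMAL
 cookie=0x0, duration=3600.1s, table=0, n_packets=42, n_bytes=4116, idle_age=0, priority=2,in_port=1 actions=drop
 cookie=0x0, duration=3600.1s, table=0, n_packets=0, n_bytes=0, idle_age=3600, priority=1 actions=NORMAL
"""


def dropping_flows(dump_text):
    """Return (n_packets, match) for every drop flow that has matched packets."""
    hits = []
    for line in dump_text.splitlines():
        line = line.strip()
        if " actions=" not in line:
            continue  # header line or anything that is not a flow entry
        head, actions = line.rsplit(" actions=", 1)
        counter = re.search(r"n_packets=(\d+)", head)
        if counter is None:
            continue
        n_packets = int(counter.group(1))
        match = head.split()[-1]  # last token before 'actions=' is the match
        if actions.strip() == "drop" and n_packets > 0:
            hits.append((n_packets, match))
    return hits


for n_packets, match in dropping_flows(SAMPLE):
    print(n_packets, "packets dropped by flow:", match)
```

Running it twice against fresh output a few seconds apart, while the ping is going, shows whether a drop counter is still increasing.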

Thank you,

Ageeleshwar K

________________________________
From: Akshat Kansal [akshatknsl at gmail.com]
Sent: Thursday, April 10, 2014 1:26 PM
To: Robert van Leeuwen
Cc: openstack at lists.openstack.org
Subject: Re: [Openstack] quantum openvswitch agent on compute nodes stops working.

Thanks Robert,

Yes, other components still work; openvswitch works fine as no flows are dropped.
I do not even see any errors in the logs, but the agent still stops working.

Also, after a restart it starts working fine, so I don't suspect the free space for the rabbit message queue to be the problem.

Regards
Akshat

On Thu, Apr 10, 2014 at 11:23 AM, Robert van Leeuwen <Robert.vanLeeuwen at spilgames.com> wrote:
> I am facing an issue where all of a sudden the quantum openvswitch agent stops working and all the VMs lose
> connectivity and even the provisioning fails.
>
> Also, I want to understand what the role of the quantum openvswitch agent is.
>
> Any pointer will be helpful.

The agent sets up the Openvswitch flows (you can see them with ovs-ofctl dump-flows).
I think it also creates the interfaces to be patched into the vms.

What do the openvswitch logs say? Do other components still work?

I think I saw something similar when rabbitmq did not have enough space (it needs at least 1GB free space).
You would be able to connect to rabbitmq (so no errors in the logs) but it stopped processing messages.

Cheers,
Robert van Leeuwen
