[openstack-dev] [Openstack-dev][Neutron] Handling of ovs command errors

Kyle Mestery (kmestery) kmestery at cisco.com
Mon Nov 25 21:10:25 UTC 2013


On Nov 25, 2013, at 12:36 PM, Salvatore Orlando <sorlando at nicira.com> wrote:
> 
> Thanks Kyle,
> 
> More comments inline.
> 
> Salvatore
> 
> 
> On 25 November 2013 16:03, Kyle Mestery (kmestery) <kmestery at cisco.com> wrote:
> On Nov 25, 2013, at 8:28 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
> >
> > Hi,
> >
> > I've been recently debugging some issues I've had with the OVS agent, and I found out that in many  cases (possibly every case) the code just logs errors from ovs-vsctl and ovs-ofctl without taking any action in the control flow.
> >
> > For instance, the routine which should do the wiring for a port, port_bound [1], does not react in any way if it fails to configure the local vlan, which I guess means the port would not be able to send/receive any data.
> >
> > I'm pretty sure there's a good reason for this which I'm missing at the moment. I am asking because I see a pretty large number of ALARM_CLOCK errors returned by OVS commands in gate logs (see bug [2]), and I'm not sure whether it's ok to handle them as the OVS agent is doing nowadays.
> >
> Thanks for bringing this up Salvatore. It looks like the underlying run_vstcl [1] provides an ability to raise exceptions on errors, but this is not used by most of the callers of run_vsctl. Do you think we should be returning the exceptions back up the stack to callers to handle? I think that may be a good first step.
> 
> I think it makes sense to start to handle errors; as they often happen in the agent's rpc loop simply raising will probably just cause the agent to crash.
> I looked again at the code and it really seems it's silently ignoring errors from ovs command.
> This actually makes sense in some cases. For instance the l3 agent might remove a qr-xxx or qg-xxx port while the l2 agent is in the middle of its iteration.
> 
> There are however cases in which the exception must be handled.
> In cases like the ALARM_CLOCK error, either a retry mechanism or marking the port for re-syncing at the next iteration might make sense.
> Other error cases might be unrecoverable; for instance when a port disappears. In that case it seems reasonable to put the relevant neutron port in ERROR state, so that the user is aware that the port anymore.
> 
I think it makes sense to address these things. Want me to file a bug?

> Thanks,
> Kyle
> 
> [1] https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ovs_lib.py#L52
> 
> > Regards,
> > Salvatore
> >
> > [1] https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L599
> > [2] https://bugs.launchpad.net/neutron/+bug/1254520
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev





More information about the OpenStack-dev mailing list