[openstack-dev] [Openstack-dev][Neutron] Handling of ovs command errors
sorlando at nicira.com
Mon Nov 25 18:36:00 UTC 2013
More comments inline.
On 25 November 2013 16:03, Kyle Mestery (kmestery) <kmestery at cisco.com> wrote:
> On Nov 25, 2013, at 8:28 AM, Salvatore Orlando <sorlando at nicira.com> wrote:
> > Hi,
> > I've recently been debugging some issues I've had with the OVS agent,
> > and I found out that in many cases (possibly every case) the code just
> > logs errors from ovs-vsctl and ovs-ofctl without taking any action in the
> > control flow.
> > For instance, the routine which should do the wiring for a port,
> > port_bound, does not react in any way if it fails to configure the
> > local vlan, which I guess means the port would not be able to send/receive
> > any data.
> > I'm pretty sure there's a good reason for this which I'm missing at the
> > moment. I am asking because I see a pretty large number of ALARM_CLOCK
> > errors returned by OVS commands in gate logs (see the bug linked below),
> > and I'm not sure whether it's ok to handle them as the OVS agent is doing
> > nowadays.
> Thanks for bringing this up, Salvatore. It looks like the underlying
> run_vsctl provides an ability to raise exceptions on errors, but this
> is not used by most of the callers of run_vsctl. Do you think we should be
> returning the exceptions back up the stack to callers to handle? I think
> that may be a good first step.
I think it makes sense to start handling errors; as they often happen in
the agent's rpc loop, simply raising will probably just cause the agent to
I looked again at the code and it really seems it's silently ignoring
errors from ovs commands.
This actually makes sense in some cases. For instance the l3 agent might
remove a qr-xxx or qg-xxx port while the l2 agent is in the middle of its
There are however cases in which the exception must be handled.
In cases like the ALARM_CLOCK error, either a retry mechanism or marking
the port for re-syncing at the next iteration might make sense.
Other error cases might be unrecoverable; for instance when a port
disappears. In that case it seems reasonable to put the relevant neutron
port in ERROR state, so that the user is aware that the port is no longer usable.
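
To make the distinction above a bit more concrete, here is a rough sketch of
how the agent loop could treat the two kinds of failure. This is not Neutron
code: TransientOvsError, PortGoneError, wire_port, process_ports and the
configure callable are all hypothetical names I made up for illustration, and
I'm assuming the OVS wrapper would raise distinct exceptions for "retry might
help" (e.g. ALARM_CLOCK) and "port has disappeared".

```python
import time


class TransientOvsError(Exception):
    """Hypothetical: a failure worth retrying, e.g. an ALARM_CLOCK timeout."""


class PortGoneError(Exception):
    """Hypothetical: the port no longer exists, so retrying cannot help."""


def wire_port(port_id, configure, attempts=3, delay=0.0):
    """Try to wire a port; return 'ok', 'resync' or 'error'.

    configure is any callable that runs the OVS commands for the port and
    raises one of the exceptions above on failure (an assumption, since the
    current code only logs).
    """
    for attempt in range(attempts):
        try:
            configure(port_id)
            return "ok"
        except PortGoneError:
            # Unrecoverable: the caller should put the port in ERROR state.
            return "error"
        except TransientOvsError:
            # Wait before the next attempt, unless this was the last one.
            if attempt < attempts - 1:
                time.sleep(delay)
    # Retries exhausted: leave the port for the next loop iteration.
    return "resync"


def process_ports(port_ids, configure, attempts=3):
    """One loop iteration; returns (ports_to_resync, ports_in_error)."""
    resync, errored = set(), set()
    for port_id in port_ids:
        outcome = wire_port(port_id, configure, attempts=attempts)
        if outcome == "resync":
            resync.add(port_id)
        elif outcome == "error":
            errored.add(port_id)
    return resync, errored
```

The point of returning the two sets rather than raising is that nothing
escapes the rpc loop: the caller can retry the first set at the next
iteration and report ERROR status to the server for the second.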
> > Regards,
> > Salvatore
> > 
> >  https://bugs.launchpad.net/neutron/+bug/1254520
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev