[openstack-dev] [Openstack-dev][Neutron] Handling of ovs command errors

Salvatore Orlando sorlando at nicira.com
Mon Nov 25 18:36:00 UTC 2013


Thanks Kyle,

More comments inline.

Salvatore


On 25 November 2013 16:03, Kyle Mestery (kmestery) <kmestery at cisco.com> wrote:

> On Nov 25, 2013, at 8:28 AM, Salvatore Orlando <sorlando at nicira.com>
> wrote:
> >
> > Hi,
> >
> > I've been recently debugging some issues I've had with the OVS agent,
> and I found out that in many cases (possibly every case) the code just
> logs errors from ovs-vsctl and ovs-ofctl without taking any action in the
> control flow.
> >
> > For instance, the routine which should do the wiring for a port,
> port_bound [1], does not react in any way if it fails to configure the
> local vlan, which I guess means the port would not be able to send/receive
> any data.
> >
> > I'm pretty sure there's a good reason for this which I'm missing at the
> moment. I am asking because I see a pretty large number of ALARM_CLOCK
> errors returned by OVS commands in gate logs (see bug [2]), and I'm not
> sure whether it's ok to handle them as the OVS agent is doing nowadays.
> >
> Thanks for bringing this up Salvatore. It looks like the underlying
> run_vsctl [1] provides an ability to raise exceptions on errors, but this
> is not used by most of the callers of run_vsctl. Do you think we should be
> returning the exceptions back up the stack to callers to handle? I think
> that may be a good first step.
>

I think it makes sense to start handling these errors. Since they often
happen in the agent's RPC loop, simply raising would probably just cause the
agent to crash.
I looked at the code again, and it really does seem to silently ignore
errors from OVS commands.
This actually makes sense in some cases. For instance, the l3 agent might
remove a qr-xxx or qg-xxx port while the l2 agent is in the middle of its
iteration.
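
To make the "raise, but do not crash the loop" idea concrete, here is a
minimal sketch in Python. It is not taken from the Neutron code:
OvsCommandError, process_ports and wire_port are hypothetical names, and the
sketch only assumes that the wiring routine raises on failure instead of
just logging.

```python
import logging

LOG = logging.getLogger(__name__)


class OvsCommandError(Exception):
    """Hypothetical exception raised when an OVS command fails."""


def process_ports(ports, wire_port):
    """Wire each port and collect failures instead of letting them escape.

    wire_port is a hypothetical callable that raises OvsCommandError on
    failure. Returns the set of ports that could not be wired.
    """
    failed = set()
    for port in ports:
        try:
            wire_port(port)
        except OvsCommandError as exc:
            # Catch per port so that one failure does not abort the loop.
            LOG.error("Failed to wire port %s: %s", port, exc)
            failed.add(port)
    return failed
```

The caller gets back the failed ports and can decide what to do with them,
rather than the error being logged and forgotten.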

There are, however, cases in which the exception must be handled.
For errors like ALARM_CLOCK, either a retry mechanism or marking the port
for re-syncing at the next iteration might make sense.
Other errors might be unrecoverable, for instance when a port disappears.
In that case it seems reasonable to put the relevant neutron port in ERROR
state, so that the user is aware that the port is no longer usable.
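
The three outcomes above (retry, re-sync at the next iteration, ERROR state)
could be sketched roughly as follows. Every name here is hypothetical and
made up for illustration: TransientOvsError stands in for failures such as
ALARM_CLOCK, PortGoneError for a port that has disappeared, and the returned
strings are placeholders for whatever the agent would actually do.

```python
import time


class TransientOvsError(Exception):
    """Hypothetical: a failure that may succeed if tried again."""


class PortGoneError(Exception):
    """Hypothetical: the port no longer exists, so retrying is pointless."""


def wire_with_policy(port, wire_port, retries=3, delay=0.0):
    """Return 'ok', 'resync' or 'error' for a single port.

    wire_port is a hypothetical callable that raises one of the two
    exceptions above on failure.
    """
    for attempt in range(retries):
        try:
            wire_port(port)
            return "ok"
        except TransientOvsError:
            # Wait before the next attempt, but not after the last one.
            if attempt < retries - 1:
                time.sleep(delay)
        except PortGoneError:
            # Unrecoverable: the caller should put the port in ERROR state.
            return "error"
    # All retries failed: leave the port for the next agent iteration.
    return "resync"
```

Whether retries belong inside the iteration or the port should go straight
to the re-sync list is a design choice; the sketch only shows that the two
error classes can be told apart and handled differently.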

>
> Thanks,
> Kyle
>
> [1]
> https://github.com/openstack/neutron/blob/master/neutron/agent/linux/ovs_lib.py#L52
>
> > Regards,
> > Salvatore
> >
> > [1]
> https://github.com/openstack/neutron/blob/master/neutron/plugins/openvswitch/agent/ovs_neutron_agent.py#L599
> > [2] https://bugs.launchpad.net/neutron/+bug/1254520
> > _______________________________________________
> > OpenStack-dev mailing list
> > OpenStack-dev at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>