[openstack-dev] [Neutron] L3 agent rescheduling issue

Assaf Muller amuller at redhat.com
Thu Jun 4 17:24:37 UTC 2015



----- Original Message -----
> One reason for not sending the heartbeat from a separate greenthread could be
> that the agent is already doing it [1].
> The currently proposed patch addresses the issue blindly: before declaring
> an agent dead, it waits a bit longer because the agent could be stuck doing
> other work. In that case I would probably make the multiplier (currently 2x)
> configurable.
> 
> The reason the state report does not occur is probably that both it and
> the resync procedure are periodic tasks. If I got it right, they're both
> executed as eventlet greenthreads, but one at a time. Perhaps adding an
> initial delay to the full sync task would ensure that the first thing an
> agent does when it comes up is send a heartbeat to the server?

There's a patch that is related to this issue:
https://review.openstack.org/#/c/186584/

I left a comment there: at least to me, it makes a lot of sense to insert
a report_state call in the after_start method, right after the agent initializes
but before it performs the first full sync. That is, right here before line 560:
https://github.com/openstack/neutron/blob/master/neutron/agent/l3/agent.py#L560

That should help with *some* of the issues discussed in this thread, but not all.
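
To make the ordering concrete, here is a rough sketch of the idea. It is not
the actual Neutron code: SketchAgent, is_agent_down, the simulated clock and
the 75-second threshold are all assumptions made up for illustration. It shows
a heartbeat being sent before the first full sync, and why that only helps
when the sync finishes within the down-time window.

```python
AGENT_DOWN_TIME = 75  # seconds; assumed threshold, for illustration only


class SketchAgent:
    """Hypothetical stand-in for an L3 agent; not Neutron's actual class."""

    def __init__(self):
        self.clock = 0              # simulated time, in seconds
        self.last_heartbeat = None  # time of the most recent state report
        self.events = []            # order in which the steps ran

    def report_state(self):
        # The real agent would send an RPC heartbeat to neutron-server here.
        self.last_heartbeat = self.clock
        self.events.append("report_state")

    def full_sync(self, duration):
        # Stands in for the initial full resync, which blocks for `duration`.
        self.clock += duration
        self.events.append("full_sync")

    def after_start(self, sync_duration):
        # Heartbeat first, so the server hears from the agent before the sync.
        self.report_state()
        self.full_sync(sync_duration)


def is_agent_down(agent, agent_down_time=AGENT_DOWN_TIME):
    # Server-side view: the agent counts as dead once its last heartbeat
    # is older than the threshold.
    return agent.clock - agent.last_heartbeat > agent_down_time


fast = SketchAgent()
fast.after_start(sync_duration=60)   # sync fits inside the window
slow = SketchAgent()
slow.after_start(sync_duration=120)  # sync outlasts the window
print(fast.events, is_agent_down(fast), is_agent_down(slow))
```

In this toy model the early heartbeat keeps the agent alive through a short
sync, but an agent whose sync runs past the threshold is still marked dead,
which is the part this change would not fix.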

> 
> On the other hand, while doing the initial full resync, is the agent able to
> process updates? If not perhaps it makes sense to have it down until it
> finishes synchronisation.
> 
> Salvatore
> 
> [1]
> http://git.openstack.org/cgit/openstack/neutron/tree/neutron/agent/l3/agent.py#n587
> 
> On 4 June 2015 at 16:16, Kevin Benton < blak111 at gmail.com > wrote:
> 
> Why don't we put the agent heartbeat into a separate greenthread on the agent
> so it continues to send updates even when it's busy processing changes?
> On Jun 4, 2015 2:56 AM, "Anna Kamyshnikova" < akamyshnikova at mirantis.com >
> wrote:
> 
> Hi, neutrons!
> 
> Some time ago I discovered a bug in L3 agent rescheduling [1]. When there
> are a lot of resources and agent_down_time is not big enough, neutron-server
> starts marking L3 agents as dead. The same issue was discovered and
> fixed for DHCP agents. I proposed a change similar to the ones made
> for DHCP agents. [2]
> 
> There is no consensus on this bug and the proposed change, so I want to ask
> developers whether it is worth continuing work on this patch or not.
> 
> [1] - https://bugs.launchpad.net/neutron/+bug/1440761
> [2] - https://review.openstack.org/171592
> 
> --
> Regards,
> Ann Kamyshnikova
> Mirantis, Inc
> 
> __________________________________________________________________________
> OpenStack Development Mailing List (not for usage questions)
> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


