[neutron] detecting l3-agent readiness

Mohammed Naser mnaser at vexxhost.com
Wed Mar 22 19:27:20 UTC 2023



From: Rodolfo Alonso Hernandez <ralonsoh at redhat.com>
Date: Monday, March 20, 2023 at 12:09 PM
To: Jan Horstmann <J.Horstmann at mittwald.de>
Cc: Mohammed Naser <mnaser at vexxhost.com>, felix.huettner at mail.schwarz, skaplons at redhat.com, openstack-discuss at lists.openstack.org
Subject: Re: [neutron] detecting l3-agent readiness
Hello:

I think I'm repeating myself here but we have two approaches to solve this problem:
* The easiest one, at least for the L3 agent, is to report an INFO level log message before and after the full sync. Any tool could parse those messages to detect when the sync has finished. You can propose a patch to the Neutron repository.

I’ve kicked this off with this:

https://review.opendev.org/c/openstack/neutron/+/878248 fix: add log message for periodic_sync_routers_task fullsync [NEW]

* https://bugs.launchpad.net/neutron/+bug/2011422: a more elaborate way to report the agent status. That could provide the start flag, the revived flag, the sync processing flag and many other ones that could be defined only for this specific agent.
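
For the first approach (log markers), a minimal sketch of what an external readiness check could look like. The marker strings below are hypothetical placeholders; the actual wording depends on whatever the Neutron patch ends up logging, so the two patterns would need to be adjusted to match it.

```python
import re
from typing import Iterable

# Hypothetical marker texts: the real wording depends on the Neutron patch.
SYNC_START = re.compile(r"INFO .*Starting fullsync")
SYNC_DONE = re.compile(r"INFO .*Finished fullsync")


def full_sync_completed(lines: Iterable[str]) -> bool:
    """Return True if the most recent full sync seen in the log has finished."""
    in_progress = False
    completed = False
    for line in lines:
        if SYNC_START.search(line):
            # A new full sync invalidates any earlier completion.
            in_progress = True
            completed = False
        elif SYNC_DONE.search(line) and in_progress:
            in_progress = False
            completed = True
    return completed
```

A probe could feed this function the agent's log lines and treat a `False` result as "not ready yet".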

Regards.

On Mon, Mar 20, 2023 at 4:33 PM Jan Horstmann <J.Horstmann at mittwald.de> wrote:
On Wed, 2023-03-15 at 16:10 +0000, Felix Hüttner wrote:
> Hi,
>
> > Subject: Re: [neutron] detecting l3-agent readiness
> >
> > Hi,
> >
> > On Monday, 13 March 2023 16:35:43 CET Felix Hüttner wrote:
> > > Hi Mohammed,
> > >
> > > > Subject: [neutron] detecting l3-agent readiness
> > > >
> > > > Hi folks,
> > > >
> > > > I'm working on improving the stability of rollouts when using Kubernetes as a control
> > plane, specifically around the L3 agent. I have not found a clear way to
> > detect in the code path where the L3 agent has finished its initial sync.
> > > >
> > >
> > > We built such a solution here: https://gitlab.com/yaook/images/neutron-l3-agent/-/blob/devel/files/startup_wait_for_ns.py
> > > Basically we check against the Neutron API which routers should be on the node and
> > then validate that all keepalived processes are up and running.
> >
> > That would work only for HA routers. If you also have routers which aren't "ha", this
> > method may fail.
> >
>
> Yep, since we only have HA routers that works fine for us. But I guess it should also work for non-HA routers without too much adaptation (maybe just check for namespaces instead of keepalived).
>
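
The namespace idea mentioned above could be sketched roughly as follows. This assumes the usual `qrouter-<router id>` namespace naming and takes the text output of `ip netns list` as input; the function name and parameters are made up for illustration. A namespace that exists is not necessarily fully configured, and DVR modes use additional namespaces, so this is only a coarse check.

```python
from typing import Iterable, Set


def missing_router_namespaces(expected_router_ids: Iterable[str],
                              netns_output: str) -> Set[str]:
    """Return the router IDs that have no qrouter-<id> namespace.

    netns_output is the text printed by `ip netns list`; each line starts
    with the namespace name and may be followed by "(id: N)".
    """
    present = set()
    for line in netns_output.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("qrouter-"):
            present.add(parts[0][len("qrouter-"):])
    return set(expected_router_ids) - present
```

An empty result would mean every expected router has a namespace on the node.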

Instead of counting processes I have been using the l3 agent's
`configurations.routers` field to determine its readiness.
From my understanding comparing this number with the number of active
routers hosted by the agent should be a good indicator of its sync
status.
Using two api calls for this is inherently racy, but could be a
sufficient workaround for environments with a moderate number of
router events.
So a simple test snippet for the sync status of all agents could be:

```
import sys

import openstack

client = openstack.connection.Connection(
   ...
)

# For each L3 agent in dvr_snat or legacy mode, compare the number of
# admin-up routers hosted by it with the router count the agent reports.
l3_agent_synced = [
    len([
        router
        for router in client.network.agent_hosted_routers(agent)
        if router.is_admin_state_up
    ]) <= client.network.get_agent(agent).configuration["routers"]
    for agent in client.network.agents()
    if agent.agent_type == "L3 agent"
    and agent.configuration["agent_mode"] in ("dvr_snat", "legacy")
]

# Exit non-zero if at least one agent reports fewer routers than it hosts.
if not all(l3_agent_synced):
    sys.exit(1)
```
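
Since the two API calls are racy, one possible mitigation is to require several consecutive successful checks before treating the agent as synced. The helper below is only a sketch under that assumption; `wait_until_stable` and all of its parameters are hypothetical names, and `check` stands for any callable that returns the boolean sync result.

```python
import time
from typing import Callable


def wait_until_stable(check: Callable[[], bool],
                      required_successes: int = 3,
                      interval: float = 5.0,
                      timeout: float = 300.0,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.monotonic) -> bool:
    """Poll check() until it succeeds required_successes times in a row.

    Returns False if the timeout expires first.
    """
    deadline = clock() + timeout
    streak = 0
    while True:
        # A failed check resets the streak of consecutive successes.
        streak = streak + 1 if check() else 0
        if streak >= required_successes:
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.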

Please let me know if I am way off with this approach :)


> > >
> > > > Am I missing it somewhere or is the architecture built in a way that doesn't really
> > answer that question?
> > > >
> > >
> > > Adding an option in the Neutron API would be a lot nicer. But I guess that also applies
> > to L2 and DHCP agents.
> > >
> > >
> > > > Thanks
> > > > Mohammed
> > > >
> > > >
> > > > --
> > > > Mohammed Naser
> > > > VEXXHOST, Inc.
> > >
> > > --
> > > Felix Huettner
> > > This e-mail may contain confidential content and is intended only for use
> > by the intended recipient. If you are not the intended recipient,
> > please inform the sender immediately and delete this e-mail.
> > Information on data protection can be found here: https://www.datenschutz.schwarz
> > >
> >
> >
> > --
> > Slawek Kaplonski
> > Principal Software Engineer
> > Red Hat
>
> --
> Felix Huettner

--
Jan Horstmann