On Tue, Jun 20, 2023 at 5:37 PM Rodolfo Alonso Hernandez <ralonsoh@redhat.com> wrote:
Hello Frode:

The local IDL cache is not updated during a transaction but after it. In order to make this transaction aware of the previous operations that are being stacked in it, you should make the scheduler conditional on the command results. In other words: this is not a transactional database operation. The "write" commands are stored in the transaction while the "read" operations use the local cache, which is *not* updated. In a nutshell, the result is the same. Once you commit the "write" operations, the DB server will push the updates to the local caches.

It is true that the local caches are updated after the transaction, but the in-memory representation is also updated as the OVS Python IDL library applies each operation (see references). That is kind of the whole point of the IDL.

But as I said, we need to be careful about which lookups are done in the Neutron context and which are done in the ovsdbapp run_idl context.

You can also look at the two reviews that do this and their functional tests if you need more proof :)

Better yet, wait until tomorrow and you'll have two reviewable changes that do this and thereby fix the issue.

--
Frode Nordahl 


Regards.


On Tue, Jun 20, 2023 at 3:53 PM Frode Nordahl <frode.nordahl@canonical.com> wrote:
On Tue, Jun 20, 2023 at 3:42 PM Rodolfo Alonso Hernandez
<ralonsoh@redhat.com> wrote:
>
> Hello:
>
> Luis, thanks for the update, the links and sharing the WIP implementation.
>
> Frode, how does this method improve the LRP scheduling? The callback is doing the same logic as before. Furthermore, if we are calculating the GW chassis for the LRP within the same DB transaction, that means the local IDL cache won't be updated until the end of this transaction. Sorry, I might be missing something.

The essence of the change, once it is finished, is that the scheduler
code will be called while ovsdbapp is applying the transaction (i.e.
from within run_idl). At that point, the IDL is updated for every
operation performed (ref [0][1][2]), which means the operations will
affect each other.

What currently happens is that the Neutron code adds in-flight
operations to the ovsdbapp transaction (which is essentially a stack
of operations to perform), without taking into account what's in that
transaction.
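
To illustrate the difference, here is a toy model in plain Python. It
does not use the ovsdbapp or OVS IDL APIs; the function names and the
dictionary standing in for the IDL's in-memory view are made up for
illustration, and the scheduler is only a simplified least-loaded pick.

```python
from collections import Counter


def least_loaded(chassis_load):
    """Return the chassis with the fewest gateway ports (ties broken by name)."""
    return min(sorted(chassis_load), key=lambda name: chassis_load[name])


def schedule_before_apply(cache, ports):
    """Decide every placement up front, reading a view that never changes.

    This mimics stacking operations onto a transaction without taking
    into account what is already in that transaction.
    """
    return [(port, least_loaded(cache)) for port in ports]


def schedule_while_applying(cache, ports):
    """Decide each placement as its operation is applied.

    The working view is updated after every operation, so later
    decisions see the effect of earlier ones.
    """
    view = dict(cache)  # hypothetical stand-in for the in-memory IDL state
    placements = []
    for port in ports:
        chassis = least_loaded(view)
        view[chassis] += 1
        placements.append((port, chassis))
    return placements


cache = {"chassis-a": 0, "chassis-b": 0, "chassis-c": 0}
ports = ["lrp-1", "lrp-2", "lrp-3"]

stale = schedule_before_apply(cache, ports)
live = schedule_while_applying(cache, ports)

print(Counter(chassis for _, chassis in stale))  # every port on one chassis
print(Counter(chassis for _, chassis in live))   # ports spread over all three
```

In the first variant every decision reads the same unchanged view, so
all ports land on the same chassis; in the second, each decision sees
the previous ones and the ports are spread out.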

I'm in the process of updating the current proposal in line with the
discussion I've had with Terry on that review, so you'll have
something to look at within the next day or so.

0: https://github.com/openvswitch/ovs/blob/e3ba0be48ca457ab3a1c9f1e3522e82218eca0f9/python/ovs/db/idl.py#L1316
1: https://github.com/openvswitch/ovs/blob/e3ba0be48ca457ab3a1c9f1e3522e82218eca0f9/python/ovs/db/idl.py#L1400
2: https://github.com/openvswitch/ovs/blob/1f47d73996b0c565f9ce035c899a042f2ea394a6/python/ovs/db/idl.py#L2083

--
Frode Nordahl

> Regards.
>
> On Tue, Jun 20, 2023 at 3:02 PM Frode Nordahl <frode.nordahl@canonical.com> wrote:
>>
>> Hello, Rodolfo,
>>
>> I have relevant information on one of the points discussed below, so
>> just wanted to chime in.
>>
>> On Tue, Jun 20, 2023 at 12:44 PM Rodolfo Alonso Hernandez
>> <ralonsoh@redhat.com> wrote:
>>
>> [ snip ]
>>
>> > *** OVN L3 scheduler issue ***
>> > This issue has been reproduced in an environment with more than 5 chassis with gateway ports. The router GW ports are assigned to the GW chassis using a manual scheduler implemented in Neutron (the default one is ``OVNGatewayLeastLoadedScheduler``). If one of the chassis is stopped, the GW ports should be re-assigned to the other GW chassis. This is happening but all ports fall under the same one; this re-scheduling should share the ports among the other active chassis.
>> > * Action item: I'll open a LP bug and investigate this issue.
>>
>> Background on why this is happening can be found in [5], where a solution is also being worked on.
>>
>> 5: https://review.opendev.org/c/openstack/neutron/+/874760
>>
>> --
>> Frode Nordahl
>>
>> > *** Size of the OVN SB "HA_Chassis_Group" table ***
>> > The OVN SB "HA_Chassis_Group" table grows indefinitely with each operation that creates a router and assigns a new external gateway (external network). This table never shrinks.
>> > * Action item: I'll open a LP bug, investigate this issue and if this is a core OVN issue, report it.
>> >
>> > *** Live migration with ML2/OVN ***
>> > This is a common topic and not only for ML2/OVN. The migration time depends on many factors (memory size, applications running, network BW, etc.) that could slow down the migration and trigger a communication gap during this process.
>> > * Action item: to create better documentation, both in Nova and Neutron, about the migration process, what has been done to improve it (for example, the OVN multiple port binding) and what factors will affect the migration.
>> >
>> > *** ML2/OVN IPv6 DVR ***
>> > This spec was approved during the last cycle [1]. The implementation [2] is under review.
>> > * Action item: to review the patch (for Neutron reviewers)
>> > * Action item: to implement the necessary tempest tests (for the feature developers)
>> >
>> > *** BGP with ML2/OVS, exposing address blocks ***
>> > This user has successfully deployed Neutron with ML2/OVS and n-d-r. This user is currently making public a certain set of FIPs. However, for other VMs without FIPs, the goal is to make the router GW port IP address public, using the address blocks functionality; this is not working according to the user.
>> > * Action item: (for this user) to create a LP bug describing the architecture of the deployment, the configuration used and the API commands used to reproduce this issue.
>> >
>> > *** Metadata service (any backend) ***
>> > Neutron is in charge of deploying the Metadata service on the compute nodes. Each time the metadata HTTP server is called, it requests from the Neutron API the instance and tenant ID [3]. This method implies a RPC call. In "busy" compute nodes, where the VMs are created and destroyed very fast, this RPC communication is a bottleneck.
>> > * Action item: open a LP bug to implement the same ``CacheBackedPluginApi`` used in the OVS agent. This cached RPC class creates a set of subscriptions to the needed resources ("ports" in this case). The Neutron API will send the updated port info, which is then cached locally; that makes the RPC request unnecessary if the resources are stored locally.
>> >
>> > *** ML2/OVN + Ironic nodes ***
>> > This user has deployed ML2/OVN with Ironic nodes, and is using ovn-bgp-agent with the eVPN driver to make public the private ports (IP and MACs) to the Ironic node ports. More information in [4].
>> >
>> > *** BGP acceleration in ML2/OVN ***
>> > Many questions related to this topic, both with DPDK and HW offload. I would refer (once the link is available) to the talk "Enabling multi-cluster connectivity using dynamic routing via BGP in Openstack" given by Christophe Fontaine during this PTG. You'll find it very interesting how this new implementation moves all the packet processing to the OVS datapath (removing any Linux Bridge / iptables processing). The example provided in the talk refers to the use of DPDK.
>> >
>> >
>> > I hope this PTG was interesting for you! Don't hesitate to use the usual channels that are the mailing list and IRC. Remember we have the weekly Neutron meeting every Tuesday at 1400UTC.
>> >
>> > Regards.
>> >
>> > [1]https://specs.openstack.org/openstack/neutron-specs/specs/2023.1/ovn-ipv6-dvr.html
>> > [2]https://review.opendev.org/c/openstack/neutron/+/867513
>> > [3]https://github.com/openstack/neutron/blob/cbb89fdb1414a1b3a8e8b3a9a4154ef627bb9d1a/neutron/agent/metadata/agent.py#L89
>> > [4]https://ltomasbo.wordpress.com/2021/06/25/openstack-networking-with-evpn/
>> >
>>
>>
>> --
>> Frode Nordahl
>>


--
Frode Nordahl