On Mon, Apr 15, 2019 at 7:48 PM Bence Romsics <bence.romsics@gmail.com> wrote:
Hi Heat and Neutron folks,

I'm wondering how could I improve support for extra routes for one of
our users. In this email I'd like to get your input (from both Heat
and Neutron folks) in which direction should I start before I sit down
to write the relevant blueprints and RFEs. This will be a bit long,
please bear with me.

As you know a neutron router can have extra static routing table
entries as the 'routes' attribute like this (please see the api-ref in
[1]):

$ openstack router show router1 -f value -c routes
destination='10.1.0.0/24', gateway='10.0.0.11'
destination='10.2.0.0/24', gateway='10.0.0.12'
...

To my best knowledge the whole set of extra routes must be updated at
once which easily leads to lost updates if multiple clients modify
this attribute concurrently (imagine client A and B working
concurrently, for example both reads routes [r1], client A writes [r1,
r2], client B writes [r1, r3]; depending on timing either r2 or r3
will be lost).

Heat has an unsupported resource OS::Neutron::ExtraRoute [2] to front
Neutron extra routes. For example:

heat_template_version: '2015-04-30'
resources:
  Extra_route_1:
    type: OS::Neutron::ExtraRoute
    properties:
      destination: '10.1.0.0/24'
      nexthop: '10.0.0.11'
      router_id: 'd544b5c5-d4d2-4b52-aaa8-ef4b9fa3e154'

The Heat resource represents a single extra route out of the many in
the router's routes attribute. If you have multiple
OS::Neutron::ExtraRoute resources Heat by default handles them
concurrently leading to the lost update problem. You can easily
reproduce the problem just by having 10 extra routes in a template.
The stack creation will succeed, but the router in fact will have only
6-7 (the exact number changes based on timing) extra routes. AFAIU
this is exactly why this was marked unsupported. (Please let me know
if you know other reasons.)


I think this had been discussed on several occasions[1] before and one of the
suggestions from the neutron team was to use compare-and-swap update api [2]
to avoid race conditions.

[1] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131020.html
 [2] https://bugs.launchpad.net/neutron/+bug/1703234

As a workaround I suggested our user a hack to serialize the
OS::Neutron::ExtraRoute resources by 'depends_on'.

But which way should I go if I want to fix this properly?

way #1 - changes in Heat only: Introduce a new Heat resource
OS::Neutron::ExtraRoutes (please notice the plural) which can
represent the whole routes attribute of a router (i.e. multiple extra
routes in a single resource). This will not solve the design
shortcomings of the neutron extraroute API, but it can at least
express what can be done in the Neutron API as is. Lost updates of
multiple stacks editing the same router's routes are still possible.
 
way #2 - changes in Heat and Neutron: Introduce a new Neutron
extension with a better API (either extraroutes as a top level
resource, each object with a router_id or an action map a'la
add/remove_router_interface to edit extra routes on the server side).
This new API would work on the same DB tables and backends as the old.
The old API would be kept for backwards compatibility. But if a new
user exclusively uses the new API it can work concurrently. Then
introduce a new Heat resource (or change the existing unsupported one)
to use this new Neutron API. This is obviously more work than the
first option.

Which one do you prefer and why? Or let me know please if I missed a
better alternative.

In this thread I hope to get a cross-project agreement of this between
Heat and Neutron so I can go ahead and open the relevant Heat and
Neutron blueprints where we can focus only on each projects' internal
questions.

Thanks in advance,
Bence Romsics
irc: rubasov

[1] https://developer.openstack.org/api-ref/network/v2/?expanded=update-router-detail#update-router
[2] https://docs.openstack.org/heat/latest/template_guide/unsupported.html#OS::Neutron::ExtraRoute



--
Regards,
Rabi Mishra