[heat][neutron] improving extraroute support
Hi Heat and Neutron folks, I'm wondering how could I improve support for extra routes for one of our users. In this email I'd like to get your input (from both Heat and Neutron folks) in which direction should I start before I sit down to write the relevant blueprints and RFEs. This will be a bit long, please bear with me. As you know a neutron router can have extra static routing table entries as the 'routes' attribute like this (please see the api-ref in [1]): $ openstack router show router1 -f value -c routes destination='10.1.0.0/24', gateway='10.0.0.11' destination='10.2.0.0/24', gateway='10.0.0.12' ... To my best knowledge the whole set of extra routes must be updated at once which easily leads to lost updates if multiple clients modify this attribute concurrently (imagine client A and B working concurrently, for example both reads routes [r1], client A writes [r1, r2], client B writes [r1, r3]; depending on timing either r2 or r3 will be lost). Heat has an unsupported resource OS::Neutron::ExtraRoute [2] to front Neutron extra routes. For example: heat_template_version: '2015-04-30' resources: Extra_route_1: type: OS::Neutron::ExtraRoute properties: destination: '10.1.0.0/24' nexthop: '10.0.0.11' router_id: 'd544b5c5-d4d2-4b52-aaa8-ef4b9fa3e154' The Heat resource represents a single extra route out of the many in the router's routes attribute. If you have multiple OS::Neutron::ExtraRoute resources Heat by default handles them concurrently leading to the lost update problem. You can easily reproduce the problem just by having 10 extra routes in a template. The stack creation will succeed, but the router in fact will have only 6-7 (the exact number changes based on timing) extra routes. AFAIU this is exactly why this was marked unsupported. (Please let me know if you know other reasons.) As a workaround I suggested our user a hack to serialize the OS::Neutron::ExtraRoute resources by 'depends_on'. But which way should I go if I want to fix this properly? way #1 - changes in Heat only: Introduce a new Heat resource OS::Neutron::ExtraRoutes (please notice the plural) which can represent the whole routes attribute of a router (i.e. multiple extra routes in a single resource). This will not solve the design shortcomings of the neutron extraroute API, but it can at least express what can be done in the Neutron API as is. Lost updates of multiple stacks editing the same router's routes are still possible. way #2 - changes in Heat and Neutron: Introduce a new Neutron extension with a better API (either extraroutes as a top level resource, each object with a router_id or an action map a'la add/remove_router_interface to edit extra routes on the server side). This new API would work on the same DB tables and backends as the old. The old API would be kept for backwards compatibility. But if a new user exclusively uses the new API it can work concurrently. Then introduce a new Heat resource (or change the existing unsupported one) to use this new Neutron API. This is obviously more work than the first option. Which one do you prefer and why? Or let me know please if I missed a better alternative. In this thread I hope to get a cross-project agreement of this between Heat and Neutron so I can go ahead and open the relevant Heat and Neutron blueprints where we can focus only on each projects' internal questions. Thanks in advance, Bence Romsics irc: rubasov [1] https://developer.openstack.org/api-ref/network/v2/?expanded=update-router-d... [2] https://docs.openstack.org/heat/latest/template_guide/unsupported.html#OS::N...
On Mon, Apr 15, 2019 at 7:48 PM Bence Romsics <bence.romsics@gmail.com> wrote:
Hi Heat and Neutron folks,
I'm wondering how could I improve support for extra routes for one of our users. In this email I'd like to get your input (from both Heat and Neutron folks) in which direction should I start before I sit down to write the relevant blueprints and RFEs. This will be a bit long, please bear with me.
As you know a neutron router can have extra static routing table entries as the 'routes' attribute like this (please see the api-ref in [1]):
$ openstack router show router1 -f value -c routes destination='10.1.0.0/24', gateway='10.0.0.11' destination='10.2.0.0/24', gateway='10.0.0.12' ...
To my best knowledge the whole set of extra routes must be updated at once which easily leads to lost updates if multiple clients modify this attribute concurrently (imagine client A and B working concurrently, for example both reads routes [r1], client A writes [r1, r2], client B writes [r1, r3]; depending on timing either r2 or r3 will be lost).
Heat has an unsupported resource OS::Neutron::ExtraRoute [2] to front Neutron extra routes. For example:
heat_template_version: '2015-04-30' resources: Extra_route_1: type: OS::Neutron::ExtraRoute properties: destination: '10.1.0.0/24' nexthop: '10.0.0.11' router_id: 'd544b5c5-d4d2-4b52-aaa8-ef4b9fa3e154'
The Heat resource represents a single extra route out of the many in the router's routes attribute. If you have multiple OS::Neutron::ExtraRoute resources Heat by default handles them concurrently leading to the lost update problem. You can easily reproduce the problem just by having 10 extra routes in a template. The stack creation will succeed, but the router in fact will have only 6-7 (the exact number changes based on timing) extra routes. AFAIU this is exactly why this was marked unsupported. (Please let me know if you know other reasons.)
I think this had been discussed on several occasions[1] before and one of the suggestions from the neutron team was to use compare-and-swap update api [2] to avoid race conditions. [1] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131020.html [2] https://bugs.launchpad.net/neutron/+bug/1703234 As a workaround I suggested our user a hack to serialize the
OS::Neutron::ExtraRoute resources by 'depends_on'.
But which way should I go if I want to fix this properly?
way #1 - changes in Heat only: Introduce a new Heat resource OS::Neutron::ExtraRoutes (please notice the plural) which can represent the whole routes attribute of a router (i.e. multiple extra routes in a single resource). This will not solve the design shortcomings of the neutron extraroute API, but it can at least express what can be done in the Neutron API as is. Lost updates of multiple stacks editing the same router's routes are still possible.
way #2 - changes in Heat and Neutron: Introduce a new Neutron
extension with a better API (either extraroutes as a top level resource, each object with a router_id or an action map a'la add/remove_router_interface to edit extra routes on the server side). This new API would work on the same DB tables and backends as the old. The old API would be kept for backwards compatibility. But if a new user exclusively uses the new API it can work concurrently. Then introduce a new Heat resource (or change the existing unsupported one) to use this new Neutron API. This is obviously more work than the first option.
Which one do you prefer and why? Or let me know please if I missed a better alternative.
In this thread I hope to get a cross-project agreement of this between Heat and Neutron so I can go ahead and open the relevant Heat and Neutron blueprints where we can focus only on each projects' internal questions.
Thanks in advance, Bence Romsics irc: rubasov
[1] https://developer.openstack.org/api-ref/network/v2/?expanded=update-router-d... [2] https://docs.openstack.org/heat/latest/template_guide/unsupported.html#OS::N...
-- Regards, Rabi Mishra
On Mon, Apr 15, 2019 at 4:39 PM Rabi Mishra <ramishra@redhat.com> wrote:
I think this had been discussed on several occasions[1] before and one of the suggestions from the neutron team was to use compare-and-swap update api [2] to avoid race conditions.
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131020.html [2] https://bugs.launchpad.net/neutron/+bug/1703234
Thank you for your quick answer. I did read that previous discussion and I did not properly understand why compare-and-swap was recommended. To my understanding compare-and-swap allows a client to detect a race with another, but it would not eliminate the races. All clients would have to retry until success and with the current OS::Neutron::ExtraRoute we do actively generate those races. That's why I went in the directions mentioned in my original mail. Am I missing something?
On 15/04/19 11:01 AM, Bence Romsics wrote:
On Mon, Apr 15, 2019 at 4:39 PM Rabi Mishra <ramishra@redhat.com> wrote:
I think this had been discussed on several occasions[1] before and one of the suggestions from the neutron team was to use compare-and-swap update api [2] to avoid race conditions.
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131020.html [2] https://bugs.launchpad.net/neutron/+bug/1703234
Thank you for your quick answer. I did read that previous discussion and I did not properly understand why compare-and-swap was recommended. To my understanding compare-and-swap allows a client to detect a race with another, but it would not eliminate the races. All clients would have to retry until success and with the current OS::Neutron::ExtraRoute we do actively generate those races. That's why I went in the directions mentioned in my original mail. Am I missing something?
Yes, you have to do it in a loop. So instead of just: * read the current status * add the new route to the data * write back the whole thing you'd need to do: * read the current status and remember the ETag * add the new route to the data * write back the whole thing with If-Match * if the write is rejected with status code 412, go back to the beginning IMHO the best place to implement this would be in neutronclient/openstacksdk and not Heat, because the problem is inherent to the API and exists everywhere. However, I wouldn't reject a fix in Heat. cheers, Zane.
Hi,
Wiadomość napisana przez Zane Bitter <zbitter@redhat.com> w dniu 15.04.2019, o godz. 19:02:
On 15/04/19 11:01 AM, Bence Romsics wrote:
On Mon, Apr 15, 2019 at 4:39 PM Rabi Mishra <ramishra@redhat.com> wrote:
I think this had been discussed on several occasions[1] before and one of the suggestions from the neutron team was to use compare-and-swap update api [2] to avoid race conditions.
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131020.html [2] https://bugs.launchpad.net/neutron/+bug/1703234 Thank you for your quick answer. I did read that previous discussion and I did not properly understand why compare-and-swap was recommended. To my understanding compare-and-swap allows a client to detect a race with another, but it would not eliminate the races. All clients would have to retry until success and with the current OS::Neutron::ExtraRoute we do actively generate those races. That's why I went in the directions mentioned in my original mail. Am I missing something?
Yes, you have to do it in a loop. So instead of just:
* read the current status * add the new route to the data * write back the whole thing
you'd need to do:
* read the current status and remember the ETag * add the new route to the data * write back the whole thing with If-Match * if the write is rejected with status code 412, go back to the beginning
IMHO the best place to implement this would be in neutronclient/openstacksdk and not Heat, because the problem is inherent to the API and exists everywhere. However, I wouldn't reject a fix in Heat.
IMO option 2 provided by Bence is the best one here. This should be fixed in neutron API by making extra routes first class API objects and treat them as router interfaces. DB layer is „ready” for that IMO so it should be handled easy. All other ways it will just be kind of workaround of the issue. Even if we will fix it in OpenStack SDK and/or neutron client there may be someone who is using API in some other way and will still hit the same issue.
cheers, Zane.
— Slawek Kaplonski Senior software engineer Red Hat
On 15/04/19 1:23 PM, Slawomir Kaplonski wrote:
Hi,
Wiadomość napisana przez Zane Bitter <zbitter@redhat.com> w dniu 15.04.2019, o godz. 19:02:
On 15/04/19 11:01 AM, Bence Romsics wrote:
On Mon, Apr 15, 2019 at 4:39 PM Rabi Mishra <ramishra@redhat.com> wrote:
I think this had been discussed on several occasions[1] before and one of the suggestions from the neutron team was to use compare-and-swap update api [2] to avoid race conditions.
[1] http://lists.openstack.org/pipermail/openstack-dev/2018-June/131020.html [2] https://bugs.launchpad.net/neutron/+bug/1703234 Thank you for your quick answer. I did read that previous discussion and I did not properly understand why compare-and-swap was recommended. To my understanding compare-and-swap allows a client to detect a race with another, but it would not eliminate the races. All clients would have to retry until success and with the current OS::Neutron::ExtraRoute we do actively generate those races. That's why I went in the directions mentioned in my original mail. Am I missing something?
Yes, you have to do it in a loop. So instead of just:
* read the current status * add the new route to the data * write back the whole thing
you'd need to do:
* read the current status and remember the ETag * add the new route to the data * write back the whole thing with If-Match * if the write is rejected with status code 412, go back to the beginning
IMHO the best place to implement this would be in neutronclient/openstacksdk and not Heat, because the problem is inherent to the API and exists everywhere. However, I wouldn't reject a fix in Heat.
IMO option 2 provided by Bence is the best one here. This should be fixed in neutron API by making extra routes first class API objects and treat them as router interfaces. DB layer is „ready” for that IMO so it should be handled easy. All other ways it will just be kind of workaround of the issue. Even if we will fix it in OpenStack SDK and/or neutron client there may be someone who is using API in some other way and will still hit the same issue.
Oh, for sure. If Neutron is willing to do that then that is definitely the best solution. (As you say, there are other non-Python clients that also need to implement the If-Match thing to work reliably with the current API.) Double points for not making another "extension" API and just saying this is either something Neutron can do or not. - ZB
Hi All, On Mon, Apr 15, 2019 at 8:08 PM Zane Bitter <zbitter@redhat.com> wrote:
On 15/04/19 1:23 PM, Slawomir Kaplonski wrote:
IMO option 2 provided by Bence is the best one here. This should be fixed in neutron API by making extra routes first class API objects and treat them as router interfaces. DB layer is „ready” for that IMO so it should be handled easy. All other ways it will just be kind of workaround of the issue. Even if we will fix it in OpenStack SDK and/or neutron client there may be someone who is using API in some other way and will still hit the same issue.
Oh, for sure. If Neutron is willing to do that then that is definitely the best solution.
I'm inclined to take that as a shared agreement to prefer way #2 over the other alternatives brought up in this thread. That is first class extra routes in neutron and adapt the heat resource to that. I am willing to take up both the Neutron and Heat sides of this work in the Train cycle if the projects agree this is a good idea to work on and it is worth the review cycles. As a next step I'll write up a Neutron RFE. And at least open a placeholder Heat blueprint - which I'll be able to specify better when the first class extraroute Neutron API is agreed on. Please expect these coming still before the PTG. I'll be attending the Summit and the PTG in Denver so we can discuss these in person if we feel that's needed at that time. Thanks, Bence
Hi,
Wiadomość napisana przez Bence Romsics <bence.romsics@gmail.com> w dniu 16.04.2019, o godz. 10:48:
Hi All,
On Mon, Apr 15, 2019 at 8:08 PM Zane Bitter <zbitter@redhat.com> wrote:
On 15/04/19 1:23 PM, Slawomir Kaplonski wrote:
IMO option 2 provided by Bence is the best one here. This should be fixed in neutron API by making extra routes first class API objects and treat them as router interfaces. DB layer is „ready” for that IMO so it should be handled easy. All other ways it will just be kind of workaround of the issue. Even if we will fix it in OpenStack SDK and/or neutron client there may be someone who is using API in some other way and will still hit the same issue.
Oh, for sure. If Neutron is willing to do that then that is definitely the best solution.
I'm inclined to take that as a shared agreement to prefer way #2 over the other alternatives brought up in this thread. That is first class extra routes in neutron and adapt the heat resource to that.
I am willing to take up both the Neutron and Heat sides of this work in the Train cycle if the projects agree this is a good idea to work on and it is worth the review cycles. As a next step I'll write up a Neutron RFE. And at least open a placeholder Heat blueprint - which I'll be able to specify better when the first class extraroute Neutron API is agreed on. Please expect these coming still before the PTG. I'll be attending the Summit and the PTG in Denver so we can discuss these in person if we feel that's needed at that time.
Thx a lot for working on this Bence :)
Thanks, Bence
— Slawek Kaplonski Senior software engineer Red Hat
FYI: I just managed to upload the first version of the Neutron RFE and spec. All review is welcome. RFE: https://bugs.launchpad.net/neutron/+bug/1826396 Spec: https://review.opendev.org/655680 The Heat blueprint is coming soon too. -- Bence
And here's the Heat story and spec: https://storyboard.openstack.org/#!/story/2005522 https://review.opendev.org/655892 -- Bence
Hi All, Some of you may not be aware yet that a new concern was raised regarding the extraroute improvement plans just after the last neutron session was closed on the PTG. It seems we have a tradeoff between the support for the use case of tracking multiple needs for the same extra route or keeping the virtual router abstraction as simple as it was in the past. I'm raising the question of this tradeoff here in the mailing list because this (I hope) seems to be the last cross-project question of this topic. If we could find a cross-project consensus on this I could continue making progress inside each project without need for further cross-project coordination. Please help me find this consensus. I don't want to unnecessarily repeat arguments already made. I think the question is clearly formulated in the comments of patch sets 5, 6 and 8 of the below neutron-spec: https://review.opendev.org/655680 Improve Extraroute API All opinions, comments, questions are welcome there. Thanks in advance, Bence (rubasov)
Hi Bence, I posted my comment there (in PS6).
On 21 May 2019, at 10:17, Bence Romsics <bence.romsics@gmail.com> wrote:
Hi All,
Some of you may not be aware yet that a new concern was raised regarding the extraroute improvement plans just after the last neutron session was closed on the PTG.
It seems we have a tradeoff between the support for the use case of tracking multiple needs for the same extra route or keeping the virtual router abstraction as simple as it was in the past.
I'm raising the question of this tradeoff here in the mailing list because this (I hope) seems to be the last cross-project question of this topic. If we could find a cross-project consensus on this I could continue making progress inside each project without need for further cross-project coordination. Please help me find this consensus.
I don't want to unnecessarily repeat arguments already made. I think the question is clearly formulated in the comments of patch sets 5, 6 and 8 of the below neutron-spec:
https://review.opendev.org/655680 Improve Extraroute API
All opinions, comments, questions are welcome there.
Thanks in advance, Bence (rubasov)
— Slawek Kaplonski Senior software engineer Red Hat
participants (4)
-
Bence Romsics
-
Rabi Mishra
-
Slawomir Kaplonski
-
Zane Bitter