[all] [oslo.messaging] Interest in collaboration on a NATS driver
Hello everyone,
Before continuing on: yes, this is kind of a massive effort, but it doesn't have to be. It would be very cool to get a replacement for RabbitMQ, as I'm probably not the only one not satisfied with it. I've proposed a very bare POC in [1]; it's a long way from being finished, but at least some basic devstack is passing.
NATS [2] is a cloud-native, scalable messaging system that supports the one-to-many and pub-sub methods that we can use to implement it as an oslo.messaging driver.
This would make OpenStack easier to deploy in a highly available fashion, reduce outages related to RabbitMQ, free up the memory and CPU used by RabbitMQ (it's insane when using clustering) and embrace a more cloud-native approach for our software that runs the cloud. Alternatives are also welcome :)
The POC has a lot of things that could be improved, for example:
• Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could)
• Handle reconnects or interruptions (for example, resubscribe to topics)
• Timeouts need to be implemented and handled
• Investigate maximum message payload size
• Find or maintain a NATS python library that doesn't use async like the official one does
• Add a lot of testing
• Clean up everything noted as TODO in the POC code
Now, I couldn't possibly pull this off myself without some collaboration with all of you. Even though I'm very motivated to just dig in, work on this for the rest of the year and migrate our test fleet to it, I am unfortunately (like everyone else) juggling a lot of balls at the same time.
If anybody, or any company, out there would be interested in collaborating on a project to bring this support and maintain it, feel free to reach out. I'm hoping somebody will bite, but at least I've put it out there for all of you.
Best regards
Tobias
[1] https://review.opendev.org/c/openstack/oslo.messaging/+/848338
[2] https://nats.io
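The first TODO (retries and acknowledgements in the library) can be illustrated with a minimal, transport-agnostic sketch. It assumes nothing about the POC [1] or any NATS client: `ReliableSender`, `DedupReceiver` and the `send_and_wait_ack` hook are hypothetical names. The idea it shows is that over a bus that does not persist messages, at-least-once delivery needs the sender to retry until it sees an ack, and the receiver to deduplicate by message id so a lost ack does not cause the work to run twice.

```python
import itertools
import time
import uuid
from dataclasses import dataclass, field
from typing import Callable, Set


class AckTimeout(Exception):
    """Raised when no acknowledgement arrives after all attempts."""


@dataclass
class ReliableSender:
    # Hypothetical transport hook: publish `msg` and return True only if the
    # receiver's ack came back within `timeout` seconds.
    send_and_wait_ack: Callable[[dict, float], bool]
    retries: int = 3        # extra attempts after the first one
    timeout: float = 1.0    # seconds to wait for each ack
    backoff: float = 0.1    # base delay, doubled after every failed attempt

    def send(self, payload: dict) -> str:
        # One id for all attempts, so the receiver can drop redeliveries.
        msg = {"id": uuid.uuid4().hex, "payload": payload}
        for attempt in range(self.retries + 1):
            if self.send_and_wait_ack(msg, self.timeout):
                return msg["id"]
            if attempt < self.retries:
                time.sleep(self.backoff * (2 ** attempt))
        raise AckTimeout(f"no ack for {msg['id']} after {self.retries + 1} attempts")


@dataclass
class DedupReceiver:
    handler: Callable[[dict], None]
    seen: Set[str] = field(default_factory=set)

    def on_message(self, msg: dict) -> bool:
        # Run the handler once per id, but ack every copy so retries stop.
        if msg["id"] not in self.seen:
            self.seen.add(msg["id"])
            self.handler(msg["payload"])
        return True


# Usage: a fake transport that delivers every attempt but loses the first ack.
received = []
receiver = DedupReceiver(handler=received.append)
attempts = itertools.count()


def flaky_transport(msg: dict, timeout: float) -> bool:
    acked = receiver.on_message(msg)
    return acked and next(attempts) > 0


sender = ReliableSender(flaky_transport, backoff=0.0)
sender.send({"method": "ping"})
print(received)  # [{'method': 'ping'}]
```

The message is delivered twice but handled once. A real driver would also need to expire entries from `seen` (this sketch lets it grow without bound) and decide what a caller sees when `AckTimeout` is raised.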
On Mon, 2022-08-29 at 13:46 +0000, Tobias Urdin wrote:
Hello everyone,
Before continuing on: yes, this is kind of a massive effort, but it doesn't have to be. It would be very cool to get a replacement for RabbitMQ, as I'm probably not the only one not satisfied with it. I've proposed a very bare POC in [1]; it's a long way from being finished, but at least some basic devstack is passing.
+1 for actually doing this. I looked at adding NATS support briefly in the past but never found the time to actually try doing it. I think it is a very interesting alternative to consider moving forward.
Hi Tobias,
Good to see RMQ alternatives appearing. A couple of questions from me.
On Mon, 29 Aug 2022 at 15:47, Tobias Urdin <tobias.urdin@binero.com> wrote:
• Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could)
What do you mean? Is NATS only a router? (I have not used this technology yet.)
• Find or maintain a NATS python library that doesn't use async like the official one does
Why is async a bad thing? For messaging it's the right thing.
Finally, have you considered just trying out ZeroMQ?
I mean, NATS is probably overkill for OpenStack services, since the majority of them stay static on the hosts they control (think nova-compute and neutron agents, which are also the pain points that operators want to ease). NATS seems to me to cater to a different use case. I might be wrong because I have read only the front page, but that is the feeling I have.
Cheers,
Radek
-yoctozepto
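One way to use an asyncio-only client from synchronous service code is to run its event loop in a dedicated background thread and hand coroutines to it with `asyncio.run_coroutine_threadsafe`. The sketch below assumes that approach; `SyncBridge` and `fake_publish` are hypothetical names, `fake_publish` merely stands in for an async client coroutine, and whether such a bridge behaves correctly inside eventlet-monkey-patched OpenStack services is not addressed here.

```python
import asyncio
import threading


class SyncBridge:
    """Run coroutines on a private event loop in a background thread."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._thread = threading.Thread(target=self._loop.run_forever, daemon=True)
        self._thread.start()

    def call(self, coro, timeout=None):
        # Schedule the coroutine on the loop thread and block the caller until
        # it finishes; raises concurrent.futures.TimeoutError if it takes too long.
        future = asyncio.run_coroutine_threadsafe(coro, self._loop)
        return future.result(timeout)

    def close(self) -> None:
        # Stop the loop from its own thread, wait for the thread, then close.
        self._loop.call_soon_threadsafe(self._loop.stop)
        self._thread.join()
        self._loop.close()


# Stand-in for an async client method such as a publish call.
async def fake_publish(subject: str, data: bytes) -> int:
    await asyncio.sleep(0.01)
    return len(data)


bridge = SyncBridge()
print(bridge.call(fake_publish("demo.subject", b"hello"), timeout=1))  # 5
bridge.close()
```

The trade-off is an extra thread per process and a hop across threads for every call, in exchange for not having to maintain a separate synchronous client library.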
On Mon, 2022-08-29 at 18:33 +0200, Radosław Piliszek wrote:
Hi Tobias,
Good to see RMQ alternatives appearing. A couple of questions from me.
On Mon, 29 Aug 2022 at 15:47, Tobias Urdin <tobias.urdin@binero.com> wrote:
• Do retries and acknowledgements in the library (since NATS does NOT persist messages like RabbitMQ could)
What do you mean? Is NATS only a router? (I have not used this technology yet.)
No, but if you want distributed persistence, it's part of the optional stream API (JetStream): https://docs.nats.io/nats-concepts/jetstream
They describe when to use Core NATS or JetStream here:
https://docs.nats.io/using-nats/developer/develop_jetstream#when-to-use-stre...
https://docs.nats.io/using-nats/developer/develop_jetstream#when-to-use-core...
I think the POC is just using Core NATS currently.
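With Core NATS only, a driver has plain subjects plus queue groups (subscribers sharing a queue group name split the messages between them) to express oslo.messaging's three delivery patterns: any one server of a topic, one named server, or every server (fanout). The sketch below is a hypothetical naming scheme, not what the POC [1] does; `Target` here is a simplified stand-in for oslo.messaging's class, and the `exchange.topic[.server|.fanout]` layout and `exchange-topic` queue group names are assumptions. It rejects server names containing dots, so FQDN hostnames would need an escaping scheme this sketch does not attempt.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# '.' separates subject tokens, '*' and '>' are wildcards; whitespace is
# not allowed either. None of these may appear inside a single token.
_FORBIDDEN = set(". *>\t\r\n")


def _token(value: str) -> str:
    if not value or _FORBIDDEN & set(value):
        raise ValueError(f"invalid subject token: {value!r}")
    return value


def _server_token(server: str) -> str:
    # 'fanout' would collide with the broadcast subject in this scheme.
    if server == "fanout":
        raise ValueError("'fanout' is reserved in this naming scheme")
    return _token(server)


@dataclass(frozen=True)
class Target:
    # Simplified stand-in for oslo.messaging's Target, not the real class.
    topic: str
    exchange: str = "openstack"
    server: Optional[str] = None
    fanout: bool = False


def publish_subject(t: Target) -> str:
    """Subject a client publishes to for a given target."""
    base = f"{_token(t.exchange)}.{_token(t.topic)}"
    if t.fanout:
        return f"{base}.fanout"                    # every server gets a copy
    if t.server:
        return f"{base}.{_server_token(t.server)}"  # one specific server
    return base                                    # any one server of the topic


def server_subscriptions(exchange: str, topic: str,
                         server: str) -> List[Tuple[str, Optional[str]]]:
    """(subject, queue_group) pairs one RPC server would subscribe to."""
    base = f"{_token(exchange)}.{_token(topic)}"
    return [
        (base, f"{exchange}-{topic}"),              # queue group: load-balanced
        (f"{base}.{_server_token(server)}", None),  # direct messages
        (f"{base}.fanout", None),                   # broadcasts
    ]


print(publish_subject(Target(topic="compute", server="cn1")))
# openstack.compute.cn1
```

Because Core NATS drops messages nobody is subscribed to, a cast to `openstack.compute` while no compute service is connected is simply lost, which is why the acknowledgement and retry work listed in the POC TODOs matters.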
• Find or maintain a NATS python library that doesn't use async like the official one does
Why is async a bad thing? For messaging it's the right thing.
Finally, have you considered just trying out ZeroMQ?
ZeroMQ used to be supported in the past, but then it was removed. If I understand correctly, it only supported notifications or RPC but not both; I don't recall which, but perhaps I'm misremembering on that point.
I mean, NATS is probably overkill for OpenStack services since the majority of them stay static on the hosts they control (think nova-compute, neutron agents - and these are also the pain points that operators want to ease).
It's not any more overkill than RabbitMQ is. I also don't know what you mean when you say "majority of them stay static on the hosts they control".
NATS is intended as a cloud-native, horizontally scalable message bus, which is exactly what OpenStack needs IMO.
On Mon, 29 Aug 2022 at 20:03, Sean Mooney <smooney@redhat.com> wrote:
Finally, have you considered just trying out ZeroMQ?
ZeroMQ used to be supported in the past, but then it was removed. If I understand correctly, it only supported notifications or RPC but not both; I don't recall which, but perhaps I'm misremembering on that point.
I believe it would be better suited for RPC than notifications, at least in the simplest form.
I mean, NATS is probably overkill for OpenStack services since the majority of them stay static on the hosts they control (think nova-compute, neutron agents - and these are also the pain points that operators want to ease).
It's not any more overkill than RabbitMQ is.
True that. Probably.
I also don't know what you mean when you say "majority of them stay static on the hosts they control".
NATS is intended as a cloud-native, horizontally scalable message bus, which is exactly what OpenStack needs IMO.
NATS seems to be tweaked for "come and go" situations, which are an exception in the OpenStack world, not the rule (at least in my view). I mean, one normally expects to have a preset number of hypervisors, not them coming and going (which, I agree, is a nice vision; a proper NATS driver, with more awareness in the client projects I believe, could be an enabler for more dynamic clouds).
Cheers,
Radek
-yoctozepto
On 8/29/22 13:20, Radosław Piliszek wrote:
On Mon, 29 Aug 2022 at 20:03, Sean Mooney <smooney@redhat.com> wrote:
Finally, have you considered just trying out ZeroMQ?
ZeroMQ used to be supported in the past, but then it was removed. If I understand correctly, it only supported notifications or RPC but not both; I don't recall which, but perhaps I'm misremembering on that point.
I believe it would be better suited for RPC than notifications, at least in the simplest form.
I believe it was the opposite. ZeroMQ could handle fire-and-forget notifications fine, but when you started trying to coordinate with the receivers the way oslo.messaging does it was missing features that had to be implemented in the o.m driver. Once you added everything needed for a fully functional OpenStack messaging driver you ended up negating a lot of the benefits of ZeroMQ. It just wasn't a good fit for the messaging model we use. Caveat: I am not a messaging expert so I'm just regurgitating the things I've heard from the messaging experts I've discussed this with.
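The gap described here, between fire-and-forget messages and the coordination oslo.messaging needs for RPC, can be made concrete: a call needs a reply path, a way to match replies to callers, and a timeout (also one of the POC TODOs). NATS request/reply does this with a unique per-request inbox subject (clients conventionally prefix these with `_INBOX.`). The sketch below reproduces that idea with a toy, synchronous in-memory bus rather than the NATS client API; `InMemoryBus`, `rpc_call` and `serve` are hypothetical names.

```python
import queue
import uuid
from collections import defaultdict
from typing import Callable, Dict, List


class InMemoryBus:
    """Toy stand-in for Core NATS: fire-and-forget, no persistence."""

    def __init__(self) -> None:
        self._subs: Dict[str, List[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, subject: str, cb: Callable[[dict], None]) -> None:
        self._subs[subject].append(cb)

    def unsubscribe(self, subject: str) -> None:
        self._subs.pop(subject, None)

    def publish(self, subject: str, msg: dict) -> None:
        # A message nobody is subscribed to is silently dropped.
        for cb in list(self._subs.get(subject, ())):
            cb(msg)


class RPCTimeout(Exception):
    pass


def rpc_call(bus: InMemoryBus, subject: str, method: str, args: dict,
             timeout: float = 1.0):
    # A unique reply subject per call also serves as the correlation id.
    inbox = f"_INBOX.{uuid.uuid4().hex}"
    replies: "queue.Queue[dict]" = queue.Queue()
    bus.subscribe(inbox, replies.put)
    try:
        bus.publish(subject, {"method": method, "args": args, "reply_to": inbox})
        try:
            reply = replies.get(timeout=timeout)
        except queue.Empty:
            raise RPCTimeout(f"no reply from {subject} within {timeout}s") from None
    finally:
        bus.unsubscribe(inbox)  # late replies will now be dropped
    if "error" in reply:
        raise RuntimeError(reply["error"])
    return reply["result"]


def serve(bus: InMemoryBus, subject: str, endpoints: Dict[str, Callable]) -> None:
    def on_request(msg: dict) -> None:
        fn = endpoints.get(msg["method"])
        if fn is None:
            reply = {"error": f"unknown method {msg['method']!r}"}
        else:
            reply = {"result": fn(**msg["args"])}
        bus.publish(msg["reply_to"], reply)

    bus.subscribe(subject, on_request)


bus = InMemoryBus()
serve(bus, "openstack.compute", {"add": lambda a, b: a + b})
print(rpc_call(bus, "openstack.compute", "add", {"a": 2, "b": 3}))  # 5
```

Even this toy version needs per-call subscriptions, cleanup and error propagation on top of plain publish/subscribe; a real driver would add the retry, acknowledgement and reconnect handling listed in the original POC TODOs.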
Note that there is an existing spec related to this: https://review.opendev.org/c/openstack/oslo-specs/+/692784
In general I think we were in agreement on adding a NATS driver, but there were some roadblocks. I don't think any implementation work was really done as a result of that. I'm also not sure how much of that discussion is still relevant; for example, would we still require a forked library?
On Thu, 1 Sept 2022 at 19:15, Ben Nemec <openstack@nemebean.com> wrote:
Note that there is an existing spec related to this: https://review.opendev.org/c/openstack/oslo-specs/+/692784
In general I think we were in agreement on adding a NATS driver, but there were some roadblocks. I don't think any implementation work was really done as a result of that. I'm also not sure how much of that discussion is still relevant - for example, would we still require a forked library?
There was, but it used a long-gone library: https://review.opendev.org/c/openstack/oslo.messaging/+/680629
I summarised it all in comments on: https://review.opendev.org/c/openstack/oslo.messaging/+/848338
Anyway, as I mentioned in the other part of this thread, the next step would be to revive the spec. I guess this is best done as a new change by Tobias (? well, since he started this). We could borrow some insight from the old spec, though.
Radek
-yoctozepto
participants (4)
- Ben Nemec
- Radosław Piliszek
- Sean Mooney
- Tobias Urdin