Subject: Re: [Trove] State of the Trove service tenant deployment model
On Tue, Jan 22, 2019 at 07:29:25PM +1300, Zane Bitter wrote:
Last time I heard (which was probably mid-2017), the Trove team had implemented encryption for messages on the RabbitMQ bus. IIUC each DB being managed had its own encryption keys, so that would theoretically prevent both snooping and spoofing of messages. That's the good news.
The bad news is that AFAIK it's still using a shared RabbitMQ bus, so attacks like denial of service are still possible if you can extract the shared credentials from the VM. Not sure about replay attacks; I haven't actually investigated the implementation.
cheers, Zane.
Excellent - many thanks for the confirmation.
Cheers, Michael
Hello Michael and Zane,

Sorry for the late reply. I believe Zane is referring to a video from 2017 [0]. Yes, messages from Trove instances are encrypted and the keys are kept in the Trove DB. It is still a shared message bus, but it can be a message bus dedicated to Trove only, separated from the message bus shared by the other OpenStack services.

DDoS attacks are also mentioned in the video as a potential threat, but there are very few details and possible solutions. Recently we had some internal discussion about this threat within the Trove team. Maybe we could use the RabbitMQ flow-control mechanisms mentioned in [1,2,3]?

Another point: I'm wondering whether this is a problem only in Trove, or something other services would be interested in as well?

Best, Darek

[0] https://youtu.be/dzvcKlt3Lx8
[1] https://www.rabbitmq.com/flow-control.html
[2] http://www.rabbitmq.com/blog/2012/04/17/rabbitmq-performance-measurements-pa...
[3] https://tech.labs.oliverwyman.com/blog/2013/08/31/controlling-fast-producers...
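To make the spoofing protection Zane describes concrete: per-instance keys mean a compromised guest can only forge messages for itself, not for other tenants' databases. A minimal sketch of that idea using HMAC signing follows; all names here are hypothetical, and Trove's actual implementation (key storage in its DB, oslo.messaging integration) differs.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-instance key store standing in for the Trove DB:
# each managed database instance gets its own random signing key.
keys = {
    "instance-a": secrets.token_bytes(32),
    "instance-b": secrets.token_bytes(32),
}

def sign(instance_id: str, payload: bytes) -> bytes:
    """Attach an HMAC so the controller can detect spoofed messages."""
    return hmac.new(keys[instance_id], payload, hashlib.sha256).digest()

def verify(instance_id: str, payload: bytes, mac: bytes) -> bool:
    """Constant-time check: a guest holding only its own key cannot
    forge a valid MAC for another instance's messages."""
    expected = hmac.new(keys[instance_id], payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)

mac = sign("instance-a", b"backup-completed")
assert verify("instance-a", b"backup-completed", mac)
# The same message fails verification under another instance's key.
assert not verify("instance-b", b"backup-completed", mac)
```

Note that signing alone prevents spoofing but not snooping; Trove additionally encrypts the payloads, and neither measure stops a guest with valid bus credentials from flooding the broker, which is the remaining DoS concern.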
We tried to solve it as a cross-project issue for a while and then everyone gave up. Lots of projects have the same problem: Trove, Sahara, Magnum, etc.

Other than just control messages, there is also the issue of version skew between guest agents and controllers, and how to do rolling upgrades. It's messy today.

I'd recommend at this point maybe just running Kubernetes across the VMs and pushing the guest agents/workload to them. You can still drive it via an OpenStack API, but doing rolling upgrades of guest agents or MySQL containers or whatever is way simpler for operators to handle. We should embrace k8s as part of the solution rather than trying to reimplement it, IMO.

Thanks, Kevin
Is there any documentation written down from this discussion? I would really like to read more about the problem and any ideas for possible solutions.

Your recommendation about k8s sounds interesting, but I'm not sure I understand it fully. Would you like to have a k8s cluster for all tenants, on top of VMs, to handle Trove instances? And is upgrade a different problem than a DDoS attack on the message bus?

Best, Darek
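For the flow-control question: RabbitMQ's credit-based flow control throttles fast publishers inside the broker, but the same effect can be approximated on the producer side. Below is an illustrative token-bucket limiter, not RabbitMQ's actual mechanism and not Trove code; it only sketches how a guest agent's publish rate onto a shared bus could be capped.

```python
import time

class TokenBucket:
    """Producer-side rate limiter: once a guest agent drains its bucket,
    further sends are refused until tokens refill, capping the rate of
    messages it can push onto the shared bus."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate          # tokens added per second
        self.capacity = capacity  # maximum burst size
        self.tokens = capacity
        self.last = time.monotonic()

    def try_send(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False

bucket = TokenBucket(rate=10.0, capacity=5.0)
# A tight loop of 100 attempts: only roughly the initial burst
# (about the bucket capacity) gets through immediately.
sent = sum(bucket.try_send() for _ in range(100))
```

One caveat: a limiter enforced inside the guest agent can be removed by an attacker who controls the VM, so any real defense against a hostile guest has to live broker-side (per-user rate limits, dedicated vhosts) rather than in the agent.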
It's probably captured in summit notes from 3-5 years ago. Nothing specific I can point at without going through a lot of archaeology.

Yeah, deploying the users' databases on top of Kubernetes in VMs would, I think, be easier to upgrade than pure VMs with a pile of debs/rpms inside.

It's tangentially related to the message bus stuff. If you solve the DDoS attack issue with the message bus, you still have the upgrade problem. But depending on how you choose to solve the communications channel issues, you can make other issues such as upgrades easier, harder, or impossible to solve.

Thanks, Kevin
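For illustration, the rolling-upgrade story being described here is roughly what a Kubernetes Deployment provides out of the box. The manifest below is a hypothetical sketch (the image name, labels, and resource names are made up, not actual Trove artifacts): bumping the image tag rolls the agent fleet one pod at a time.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trove-guest-agent          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: trove-guest-agent
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1            # upgrade one agent at a time
      maxSurge: 1
  template:
    metadata:
      labels:
        app: trove-guest-agent
    spec:
      containers:
      - name: agent
        # Changing this tag triggers the rolling upgrade of the fleet.
        image: example.org/trove-guest-agent:2.0
```

This is the contrast with the pure-VM model: the version-skew and upgrade orchestration that is "messy today" becomes a declarative field change that Kubernetes reconciles.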
On Tuesday, 22 January 2019 21:24:55 CET Fox, Kevin M wrote:
We tried to solve it as a cross-project issue for a while and then everyone gave up. Lots of projects have the same problem: Trove, Sahara, Magnum, etc.
I would say that Sahara does not have the same problem. There was a discussion about using a local agent in 2014, but it died out. Sahara communicates with each node over SSH, so there is no (ab)use of the message bus.

Ciao
--
Luigi
Yeah, it has its own set of problems. The implementation chosen differs from the one chosen by Trove, but in a way that's even worse for operators: each advanced service (Trove, Magnum, Sahara) has to be learned independently by operators, in terms of how to secure it, debug it, upgrade it, etc. Adding each one to the cluster increases the cognitive burden significantly, so it's rare to see more than one of them deployed in a single OpenStack cloud.

Thanks, Kevin
On Wed, Jan 23, 2019 at 9:27 AM Fox, Kevin M <Kevin.Fox@pnnl.gov> wrote:
I'd recommend at this point to maybe just run kubernetes across the vms and push the guest agents/workload to them.
This sounds like overkill to me. Currently, different projects in OpenStack are solving this issue in different ways; e.g. Octavia uses a two-way-SSL-authenticated API between the controller service and the amphora (the VM running an HTTP server inside), Magnum uses a heat-container-agent that communicates with Heat via API, etc. However, Trove chose yet another option, which has brought a lot of discussion over a long time.

In the current situation, I don't think it's feasible for every project to converge on one common solution, but Trove can learn from other projects to solve its own problem.

Cheers, Lingxian Kong
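As a concrete reference point for the Octavia approach: two-way SSL means the controller refuses any connection from a peer that cannot present a certificate signed by its CA. A minimal sketch of the server-side setup with Python's standard `ssl` module is below; the helper name is hypothetical and this is not Octavia's actual code, just the general mTLS pattern it relies on.

```python
import ssl

def build_mtls_server_context(certfile=None, keyfile=None, cafile=None):
    """Server-side TLS context that REQUIRES a client certificate
    (two-way SSL), similar in spirit to the controller<->amphora channel.
    Paths are optional here only so the sketch runs without real certs."""
    ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if certfile:
        ctx.load_cert_chain(certfile, keyfile)  # server's own identity
    if cafile:
        ctx.load_verify_locations(cafile)       # CA that signed client certs
    # Reject any client that does not present a valid certificate.
    ctx.verify_mode = ssl.CERT_REQUIRED
    return ctx

ctx = build_mtls_server_context()
assert ctx.verify_mode == ssl.CERT_REQUIRED
```

Compared with a shared message bus, this gives each guest a point-to-point, per-instance credential, so a compromised VM yields no bus-wide credentials to abuse.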
Octavia is a slightly easier case, in that each tenant's load balancer doesn't have to be a different version of the software being deployed, unlike Trove users' selection of MySQL 5 vs MySQL 10 or PostgreSQL, or Sahara's choice of Hadoop, etc. Add the permutations of the user's version vs the guest agent's version, and so on; let's not even get into rpms vs debs, which image-building tool you use to stamp out the images, test frameworks, etc.

It's simpler for a developer not to deal with it and just call it an operator's problem. But taken as a whole, including the operators' problems, it's way more complex to deal with that way.

Just my $0.02.

Thanks, Kevin
On 23/01/19 9:09 AM, Darek Król wrote:
DDOS attacks are also mentioned in the video as a potential threat but there is very little details and possible solutions.
Yes, in fact that was me asking the question in that video :)
Hey all,

Glad to see someone actually running Trove in production and giving feedback to the community; thanks for doing that, BTW.

IIRC, back in 2017 we had a remote discussion during the PTG and were planning to adopt the Octavia solution. @huntxu drafted a spec, https://review.openstack.org/#/c/553679/, but as far as I know he will not continue this work.

Best Wishes.

Fan Zhang
participants (6)
- Darek Król
- Fox, Kevin M
- Lingxian Kong
- Luigi Toscano
- Zane Bitter
- Zhang Fan