[openstack-dev] [oslo][nova] Messaging: everything can talk to everything, and that is a bad thing

Flavio Percoco flavio at redhat.com
Thu Mar 24 12:02:48 UTC 2016


On 22/03/16 17:20 -0400, Adam Young wrote:
>On 03/22/2016 09:15 AM, Flavio Percoco wrote:
>
>    On 21/03/16 21:43 -0400, Adam Young wrote:
>
>        I had a good discussion with the Nova folks in IRC today.
>
>        My goal was to understand what could talk to what, and the short
>        answer, according to dansmith, is:
>
>        " any node in nova land has to be able to talk to the queue for any
>        other one for the most part: compute->compute, compute->conductor,
>        conductor->compute, api->everything. There might be a few exceptions,
>        but not worth it, IMHO, in the current architecture."
>
>        Longer conversation is here:
>        http://eavesdrop.openstack.org/irclogs/%23openstack-nova/
>        %23openstack-nova.2016-03-21.log.html#t2016-03-21T17:54:27
>
>        Right now, the message queue is a nightmare.  All sorts of sensitive
>        information flows over the message queue: Tokens (including admin) are
>        the most obvious.  Every piece of audit data. All notifications and all
>        control messages.
>
>        Before we continue down the path of "anything can talk to anything" can
>        we please map out what needs to talk to what, and why?  Many of the use
>        cases seem to be based on something that should be kicked off by the
>        conductor, such as "migrate, resize, live-migrate" and it sounds like
>        there are plans to make that happen.
>
>        So, let's assume we can get to the point where, if node 1 needs to talk
>        to node 2, it will do so only via the conductor.  With that in place,
>        we can put an access control rule in place:
>
>
>    I don't think this is going to scale well. Eventually, this will require
>    evolving the conductor to some sort of message scheduler, which is
>    pretty much what the message bus is supposed to do.
>
>
>I'll limit this to what happens with Rabbit and Qpid (AMQP 1.0) and leave 0mq
>out of it for now.  I'll use Rabbit as shorthand for both, but the rules are
>the same for Qpid.

Sorry for the pedantic nitpick, but it's not Qpid. I'm afraid calling it Qpid
will just confuse people about what we're really talking about here. The amqp1
driver is based on the AMQP 1.0 protocol, which is brokerless. The library used
in oslo.messaging is qpid-proton (a.k.a. Proton). Qpid is just the name of the
Apache Foundation family of projects these belong to (including Qpidd, the old
broker, which we no longer support in oslo.messaging).
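
For anyone following along at home, the driver is picked by the transport URL
scheme in oslo.messaging; a minimal sketch with made-up host and credentials:

    from oslo_config import cfg
    import oslo_messaging

    # 'rabbit://' selects the RabbitMQ driver, 'amqp://' selects the
    # AMQP 1.0 (Proton-based) driver discussed above.  Host, port and
    # credentials here are placeholders.
    transport = oslo_messaging.get_transport(
        cfg.CONF, 'amqp://user:password@messaging-host:5672/')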


>For, say, a migrate operation, the call goes to API, controller, and eventually
>down to one of the compute nodes.  Source? Target?  I don't know the code well
>enough to say, but let's say it is the source.  It sends an RPC message to the
>target node.  The message goes to the central broker right now, and then back
>down to the target node.  Meanwhile, the source node has set up a reply queue,
>and that queue name has gone into the message.  The target machine responds by
>getting a reference to the response queue and sending a message.  This message
>goes up to the broker, and then down to the source node.
>
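
For reference, here is roughly what that round trip looks like through the
oslo.messaging API. This is a minimal sketch, not Nova's actual code; the
topic, server and method names are made up:

    from oslo_config import cfg
    import oslo_messaging

    transport = oslo_messaging.get_transport(cfg.CONF)
    # Address the RPC at a specific server on a topic.
    target = oslo_messaging.Target(topic='compute', server='target-node')
    client = oslo_messaging.RPCClient(transport, target)
    # call() publishes the request and waits on a reply queue; the reply
    # queue name travels inside the request message, which is exactly the
    # hop described above.
    result = client.call({}, 'do_something', arg='value')
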
>A man in the middle could sit there and also read off the queue.  It could
>modify a message, substitute its own response queue, and happily transfer
>things back and forth.
>
>So, we have the HMAC proposal, which then puts crypto and key distribution all
>over the place.  Yes, it would guard against a MITM attack, but the cost in
>complexity and processor time is high.
>
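
To make the cost concrete: the signing itself is the easy part. A stdlib-only
sketch, leaving out key distribution, rotation and replay protection (the
genuinely hard parts):

    import hashlib
    import hmac

    def sign(key, payload):
        # Per-message HMAC over the serialized payload (both bytes).
        return hmac.new(key, payload, hashlib.sha256).hexdigest()

    def verify(key, payload, signature):
        # Constant-time comparison on the receiving side.
        return hmac.compare_digest(sign(key, payload), signature)
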
>
>Rabbit does not have a very flexible ACL scheme: basically, a regex per Rabbit
>user.  However, we could easily spin up a new queue for direct node-to-node
>communication that did meet an ACL regex.  For example, if we said that the
>regex was that the node could only read/write queues that have its name in
>them, then to make a request and response queue between node-1 and node-2 we
>could create the queues
>
>
>node-1-node-2
>node-1-node-2-<uuid>-reply
>
>
>So, instead of a single request queue, there are two queues.  And the conductor
>could tell the target node: start listening on this queue.
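
A quick, purely illustrative sketch of that per-node regex against the
proposed queue names (the uuid value is made up):

    import re

    def acl_regex(node):
        # "Only queues that have the node's name in them", per the rule above.
        return re.compile(r'^.*\b%s\b.*$' % re.escape(node))

    pattern = acl_regex('node-1')
    assert pattern.match('node-1-node-2')
    assert pattern.match('node-1-node-2-6f1b2a3c-reply')
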
>
>
>Or, we could pass the message through the conductor.  The request message goes
>from node-1 to the conductor, where the conductor validates the business logic
>of the message, then puts it into the message queue for node-2.  Responses can
>then go directly back from node-2 to node-1 the way they do now.
>
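
A rough sketch of what such a conductor-side relay could look like as an
oslo.messaging endpoint; the class, method and business-logic check are all
hypothetical, not existing Nova code:

    import oslo_messaging

    class NodeRelayEndpoint(object):
        """Hypothetical conductor-side relay between two compute nodes."""

        def __init__(self, transport, is_allowed):
            self.transport = transport
            # is_allowed is a callable holding the business-logic check,
            # e.g. "is a migration between these two nodes in progress?"
            self.is_allowed = is_allowed

        def relay(self, ctxt, source_node, target_node, method, kwargs):
            if not self.is_allowed(source_node, target_node):
                raise RuntimeError('relay not permitted')
            target = oslo_messaging.Target(topic='compute',
                                           server=target_node)
            client = oslo_messaging.RPCClient(self.transport, target)
            return client.call(ctxt, method, **kwargs)
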
>Or...we could set up a direct socket between the two nodes, with the socket
>setup info going over the broker.  Or we could use a web server, or send it
>over SNMP, or SMTP, or TFTP.  There are many ways to get the messages from
>node to node.
>
>If we are going to use the message broker to do this, we should at least make
>it possible to secure it, even if it is not the default approach.
>
>It might be possible to use a broker-specific technology to optimize this, but
>I am not a Rabbit expert.  Maybe there is some way of filtering messages?
>
>
>
>
>        1.  Compute nodes can only read from the queue
>        compute.<name>-novacompute-<index>.localdomain
>        2.  Compute nodes can only write to response queues in the RPC vhost
>        3.  Compute nodes can only write to notification queues in the
>        notification vhost.
>
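
A sketch of how rules 1-3 might translate into RabbitMQ-style per-user
permission regexes (one user per compute node; the queue-name patterns are
illustrative only):

    import re

    def compute_node_permissions(node_name):
        # RabbitMQ permissions are three regexes: configure, write, read.
        # (Rules 2 and 3 also involve separate vhosts, which a single set of
        # regexes cannot express; this only sketches the queue-name side.)
        read = r'^compute\.%s$' % re.escape(node_name)     # rule 1
        write = r'^(reply_.*|notifications\..*)$'          # rules 2 and 3
        configure = r'^$'                                  # create nothing
        return configure, write, read
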
>        I know that with AMQP, we should be able to identify the writer of a
>        message.  This means that each compute node should have its own user.
>        I have identified how to do that for Rabbit and Qpid.  I assume for 0mq
>        it would make sense to use ZAP (http://rfc.zeromq.org/spec:27), but I'd
>        rather the 0mq maintainers chime in here.
>
>
>
>    NOTE: Gentle reminder that qpidd has been removed from oslo.messaging.
>
>
>Yes, but the Qpid family includes Proton, which implements AMQP 1.0, and I did
>a proof of concept with it last summer.  It supports encryption and
>authentication over GSSAPI and is, I think, the best option for securing
>messaging in an OpenStack deployment at the moment.
>

++

>
>
>    I think you can configure rabbit, amqp1 and other technologies to do what
>    you're suggesting here without much trouble. TBH, I'm not sure how many
>    changes would be required in Nova (or even oslo.messaging), but I'd dare
>    to say none are required.
>
>
>        I think it is safe (and sane) to have the same user on the compute node
>        communicate with Neutron, Nova, and Ceilometer.  This will avoid a
>        false sense of security: if one is compromised, they are all going to
>        be compromised.  Plan accordingly.
>
>        Beyond that, we should have a separate message broker user for each
>        component that is a client of the broker.
>
>        Applications that run on top of the cloud, and that do not get presence
>        on the compute nodes, should have their own vhost.  I see Sahara on my
>        TripleO deploy, but I assume there are others.  Either they each get
>        their own vhost, or the apps should share one separate from the
>        RPC/notification vhosts we currently have.  Even Heat might fall into
>        this category.
>
>        Note that those application users can be allowed to read from the
>        notification queues if necessary.  They just should not be using the
>        same vhost for their own traffic.
>
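
For what it's worth, keeping RPC and notification traffic on separate vhosts
is already expressible with two transport URLs; a sketch with made-up vhost
and credential names:

    from oslo_config import cfg
    import oslo_messaging

    # RPC traffic goes to one vhost, notifications to another.
    rpc_transport = oslo_messaging.get_transport(
        cfg.CONF, 'rabbit://nova:secret@broker:5672/rpc')
    notification_transport = oslo_messaging.get_notification_transport(
        cfg.CONF, 'rabbit://nova:secret@broker:5672/notifications')
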
>        Please tell me if/where I am blindingly wrong in my analysis.
>
>
>
>    I guess my question is: Have you identified things that need to be changed
>    in any of the projects for this to be possible? Or is it a pure deployment
>    recommendation/decision?
>
>
>There are certainly deployment changes we need to make that help.  And we can
>likely make it such that the compute nodes can only read from their own
>appropriate queues.  However, without changing the queue naming scheme, I can't
>see how to control who can write to where.  Right now, it's a free-for-all.
>
>
>
>    I'd argue that any changes (assuming changes are required) are likely to
>    happen in specific projects (Nova, Neutron, etc.) and that once this
>    scenario is supported, it'll remain a deployment choice to follow it or
>    not. If I want my undercloud services to use a single vhost and a single
>    user, I must be able to do that. The proposal in this email complicates
>    deployments significantly, despite making sense from a security
>    standpoint.
>
>So, nothing I am saying prevents that.  OTOH, there is insufficient support in
>the RPC approach to do a more secure ACL.
>
>
>
>
>    One more thing. Depending on the messaging technology, having different
>    virtual hosts may have an impact on performance when running under huge
>    loads, given that the data will be partitioned differently and, therefore,
>    written/read differently. I don't have good data at hand about this, sorry.
>
>
>So, I think that performance can be optimized in many ways, including having
>multiple brokers involved in a deployment.  I've seen architecture diagrams to
>that effect, but have not had to put it into production myself.


Including several brokers will complicate maintenance and deployments as much as
adding a gazillion vhosts and a user per node. This is possible, but I don't
believe the benefit is enough to make the complexity worth it.

In a brokerless environment, it might be simpler to achieve this, given that
it's easier to set the rules in the message router and that such an environment
allows for p2p communication when needed.

Cheers,
Flavio

-- 
@flaper87
Flavio Percoco