[openstack-dev] Request for comments for a possible solution

Mike Kolesnik mkolesni at redhat.com
Wed Jan 14 14:19:18 UTC 2015


Hi Mathieu, 

Please see comments inline. 

Regards, 
Mike 

----- Original Message -----

> Hi Mike,

> After reviewing your latest patch [1], I think a possible solution could
> be to add a new entry to the fdb RPC message.
> This entry would specify whether the port is multi-bound or not.
> The new fdb message would look like this:
> {net_id:
>     {port:
>         {agent_ip:
>             {mac, ip, multi-bound}
>         }
>     }
>     network_type:
>         vxlan,
>     segment_id:
>         id
> }

> When the multi-bound option is set, the ARP responder would be
> provisioned, but the underlying module (OVS or kernel vxlan) would be
> provisioned to flood the packet to every tunnel concerned by this overlay
> segment, and not only to the tunnel to the agent that is supposed to host the port.
> In the LB world, this means not adding an fdb entry for the MAC of the
> multi-bound port, whereas in the OVS world, it means not adding a flow that
> sends traffic matching the MAC of the multi-bound port to only one
> tunnel port, but instead to every tunnel port of this overlay segment.

So let me see if I understand your suggestion correctly.

You suggest that, instead of not sending the FDB, we do send it along with an optional third parameter?

Mind you that FDBs are sent as a list, so for example an l2pop message would look like:
{
    '61a00edd-018e-4923-9524-df91b3f3083b': {
        'ports': {
            '30.0.0.2': [
                ['00:00:00:00:00:00', '0.0.0.0'],
                ['00:00:00:00:12:34', '10.0.0.1']
            ],
            '30.0.0.1': [
                ['00:00:00:00:00:00', '0.0.0.0'],
                ['00:00:00:00:56:78', '10.1.1.1']
            ]
        },
        'network_type': u'vxlan',
        'segment_id': 1
    }
}

So the parameter you suggest adding would be at index 2 of each FDB list?

I'm not sure it can be optional then; otherwise these messages could be quite hard to decode.

Also, you suggest that each agent will know what to do according to this parameter?
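
To make the decoding concern concrete, here is a minimal sketch of how a receiver could accept both the current two-element entries and entries carrying a flag at index 2. The helper name decode_fdb_entry, the flag position, and the default of False are assumptions for illustration only; this is not Neutron's actual l2pop code.

```python
# Hypothetical helper, not Neutron's actual l2pop code.
def decode_fdb_entry(entry):
    """Return (mac, ip, multi_bound) from a 2- or 3-element FDB entry."""
    if len(entry) == 2:
        mac, ip = entry
        # Flag absent: assume a normally bound port.
        return mac, ip, False
    if len(entry) == 3:
        mac, ip, multi_bound = entry
        return mac, ip, bool(multi_bound)
    raise ValueError("unexpected FDB entry length: %d" % len(entry))


# Same shape as the l2pop message shown earlier, with an assumed flag
# appended at index 2 of one entry.
fdb_entries = {
    '61a00edd-018e-4923-9524-df91b3f3083b': {
        'ports': {
            '30.0.0.2': [
                ['00:00:00:00:00:00', '0.0.0.0'],
                ['00:00:00:00:12:34', '10.0.0.1', True],
            ],
        },
        'network_type': 'vxlan',
        'segment_id': 1,
    },
}

decoded = [
    decode_fdb_entry(entry)
    for net in fdb_entries.values()
    for entries in net['ports'].values()
    for entry in entries
]
```

Every consumer of the message would need this kind of length check, which is why a positional optional element is harder to handle than a new key.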

> This way, traffic to a multi-bound port will behave as unknown unicast traffic.
> The first packet will be flooded to every tunnel, and the local bridge will learn the
> correct tunnel for the following packets based on which tunnel received the
> answer.
> Once learning occurs with the first ingress packet, the following packets would be
> sent to the correct tunnel and not flooded anymore.

IIUC, we would still need to send all the nodes on which the HA port is scheduled; this just adds to that and moves the decision regarding the FDB to the agent level.

The FDB is then only needed for populating the ARP responder?
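
As a rough illustration of the flood-then-learn behaviour discussed here, the following toy model shows a bridge flooding to every tunnel until it has seen a packet from the destination MAC, then using only the learned tunnel. The class, method, and tunnel names are made up for the example; it models neither OVS flows nor the kernel vxlan fdb.

```python
class LearningBridge:
    """Toy flood-and-learn model; tunnel names are illustrative only."""

    def __init__(self, tunnels):
        self.tunnels = set(tunnels)
        self.mac_table = {}  # learned MAC -> tunnel it was last seen on

    def output_tunnels(self, dst_mac):
        """Return the set of tunnels a packet to dst_mac is sent on."""
        if dst_mac in self.mac_table:
            return {self.mac_table[dst_mac]}
        # Unknown unicast: flood to every tunnel of the segment.
        return set(self.tunnels)

    def receive(self, src_mac, in_tunnel):
        """Learn the source MAC of a packet arriving on in_tunnel."""
        self.mac_table[src_mac] = in_tunnel


bridge = LearningBridge(['tun-30.0.0.1', 'tun-30.0.0.2'])
router_mac = '00:00:00:00:12:34'

first = bridge.output_tunnels(router_mac)    # nothing learned yet: flood
bridge.receive(router_mac, 'tun-30.0.0.2')   # reply arrives from the master
later = bridge.output_tunnels(router_mac)    # only the learned tunnel
```

After a failover, the next packet received from the new master would overwrite the learned entry in the same way.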

> I've tested this with linuxbridge and it works fine. Based on a code overview,
> this should work correctly with OVS too. I'll test it ASAP.

> I know that the DVR team already added such a flag to the RPC messages, but they reverted
> it in later patches. I would be very interested in having their opinion on
> this proposal.
> It seems that DVR ports could also use this flag. This would result in having
> the ARP responder activated for DVR ports too.

> This shouldn't need a bump in RPC versioning since this flag would be
> optional, so there shouldn't be any issue with backward compatibility.

I'm not sure it's backwards compatible, since you're actually changing the structure of the RPC message, so it's hard to predict how old agents will react.
It's not adding a new key-value pair, it's modifying each FDB's list.
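
A small sketch of the kind of breakage I have in mind, assuming (purely for illustration) that an older agent unpacks every entry as a strict [mac, ip] pair. The function old_agent_parse is hypothetical; whether any real agent parses entries this way would need to be checked in the code.

```python
# Hypothetical pre-change parser that assumes two-element [mac, ip] entries.
def old_agent_parse(entries):
    return [(mac, ip) for mac, ip in entries]


old_format = [['00:00:00:00:12:34', '10.0.0.1']]
new_format = [['00:00:00:00:12:34', '10.0.0.1', True]]  # assumed extra flag

parsed_old = old_agent_parse(old_format)

try:
    old_agent_parse(new_format)
    failed = False
except ValueError:
    # Unpacking three values into two names raises ValueError.
    failed = True
```

A new dictionary key next to 'network_type' would simply be ignored by such a parser, whereas an extra list element is not.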

> Regards,

> Mathieu

> [1] https://review.openstack.org/#/c/141114/2

> On Sun, Dec 21, 2014 at 12:14 PM, Narasimhan, Vivekanandan <
> vivekanandan.narasimhan at hp.com > wrote:

> > Hi Mike,
> 

> > Just one comment [Vivek]
> 

> > -----Original Message-----
> 
> > From: Mike Kolesnik [mailto: mkolesni at redhat.com ]
> 
> > Sent: Sunday, December 21, 2014 11:17 AM
> 
> > To: OpenStack Development Mailing List (not for usage questions)
> 
> > Cc: Robert Kukura
> 
> > Subject: Re: [openstack-dev] [Neutron][L2Pop][HA Routers] Request for
> > comments for a possible solution
> 

> > Hi Mathieu,
> 

> > Comments inline
> 

> > Regards,
> 
> > Mike
> 

> > ----- Original Message -----
> 
> > > Mike,
> 
> > >
> 
> > > I'm not even sure that your solution works without being able to bind
> 
> > > a router HA port to several hosts.
> 
> > > What's happening currently is that you :
> 
> > >
> 
> > > > 1. create the router on two l3agents.
> 
> > > > 2. those l3agents trigger sync_routers() on the l3plugin.
> 
> > > 3. l3plugin.sync_routers() will trigger
> > > l2plugin.update_port(host=l3agent).
> 
> > > 4. ML2 will bind the port to the host mentioned in the last
> > > update_port().
> 
> > >
> 
> > > From a l2pop perspective, this will result in creating only one tunnel
> 
> > > to the host lastly specified.
> 
> > > I can't find any code that forces that only the master router binds
> 
> > > its router port. So we don't even know if the host which binds the
> 
> > > router port is hosting the master router or the slave one, and so if
> 
> > > l2pop is creating the tunnel to the master or to the slave.
> 
> > >
> 
> > > Can you confirm that the above sequence is correct? or am I missing
> 
> > > something?
> 

> > Are you referring to the alternative solution?
> 

> > In that case it seems that you're correct so that there would need to be
> > awareness of the master router at some level there as well.
> 
> > I can't say for sure as I've been thinking on the proposed solution with no
> > FDBs so there would be some issues with the alternative that need to be
> > ironed out.
> 

> > >
> 
> > > Without the capacity to bind a port to several hosts, l2pop won't be
> 
> > > able to create tunnel correctly, that's the reason why I was saying
> 
> > > that a prerequisite for a smart solution would be to first fix the bug
> 
> > > :
> 
> > > https://bugs.launchpad.net/neutron/+bug/1367391
> 
> > >
> 
> > > > DVR had the same issue. Their workaround was to create a new
> 
> > > > port_binding table that manages the capacity for one DVR port to be
> 
> > > > bound to several hosts.
> 
> > > > As mentioned in bug 1367391, this adds technical debt in ML2,
> 
> > > > which has to be tackled as a priority from my POV.
> 

> > I agree that this would simplify the work, but even without this bug fixed we
> > can achieve either solution.
> 

> > We already have knowledge of the agents hosting a router, so this is
> > completely doable without waiting for a fix for bug 1367391.
> 

> > Also, from my understanding, bug 1367391 is targeted at DVR only, not at
> > HA router ports.
> 

> > [Vivek] Currently yes, but Bob's concept embraces all replicated ports and
> > so HA router ports will play into it :)
> 

> > --
> 
> > Thanks,
> 

> > Vivek
> 

> > >
> 
> > >
> 
> > > On Thu, Dec 18, 2014 at 6:28 PM, Mike Kolesnik < mkolesni at redhat.com >
> > > wrote:
> 
> > > > Hi Mathieu,
> 
> > > >
> 
> > > > Thanks for the quick reply, some comments inline..
> 
> > > >
> 
> > > > Regards,
> 
> > > > Mike
> 
> > > >
> 
> > > > ----- Original Message -----
> 
> > > >> Hi mike,
> 
> > > >>
> 
> > > >> thanks for working on this bug :
> 
> > > >>
> 
> > > >> On Thu, Dec 18, 2014 at 1:47 PM, Gary Kotton < gkotton at vmware.com >
> > > >> wrote:
> 
> > > >> >
> 
> > > >> >
> 
> > > >> > On 12/18/14, 2:06 PM, "Mike Kolesnik" < mkolesni at redhat.com > wrote:
> 
> > > >> >
> 
> > > >> >>Hi Neutron community members.
> 
> > > >> >>
> 
> > > >> >>I wanted to query the community about a proposal of how to fix HA
> 
> > > >> >>routers not working with L2Population (bug 1365476[1]).
> 
> > > >> >>This bug is important to fix especially if we want to have HA
> 
> > > >> >>routers and DVR routers working together.
> 
> > > >> >>
> 
> > > >> >>[1] https://bugs.launchpad.net/neutron/+bug/1365476
> 
> > > >> >>
> 
> > > >> >>What's happening now?
> 
> > > >> >>* HA routers use distributed ports, i.e. the port with the same
> 
> > > >> >>IP & MAC
> 
> > > >> >> details is applied on all nodes where an L3 agent is hosting
> 
> > > >> >>this router.
> 
> > > >> >>* Currently, the port details have a binding pointing to an
> 
> > > >> >>arbitrary node
> 
> > > >> >> and this is not updated.
> 
> > > >> >>* L2pop takes this "potentially stale" information and uses it to
> 
> > > >> >>create:
> 
> > > >> >> 1. A tunnel to the node.
> 
> > > >> >> 2. An FDB entry that directs traffic for that port to that node.
> 
> > > >> >> 3. If ARP responder is on, ARP requests will not traverse the
> > > >> >> network.
> 
> > > >> >>* Problem is, the master router wouldn't necessarily be running
> 
> > > >> >>on the
> 
> > > >> >> reported agent.
> 
> > > >> >> This means that traffic would not reach the master node but
> 
> > > >> >>some arbitrary
> 
> > > >> >> node where the router master might be running, but might be in
> 
> > > >> >>another
> 
> > > >> >> state (standby, fail).
> 
> > > >> >>
> 
> > > >> >>What is proposed?
> 
> > > >> >>Basically the idea is not to do L2Pop for HA router ports that
> 
> > > >> >>reside on the tenant network.
> 
> > > >> >>Instead, we would create a tunnel to each node hosting the HA
> 
> > > >> >>router so that the normal learning switch functionality would
> 
> > > >> >>take care of switching the traffic to the master router.
> 
> > > >> >
> 
> > > >> > In Neutron we just ensure that the MAC address is unique per
> > > >> > network.
> 
> > > >> > Could a duplicate MAC address cause problems here?
> 
> > > >>
> 
> > > > >> Gary, AFAIU, from a Neutron POV, there is only one port, which is
> 
> > > > >> the router port, which is plugged twice. One time per port.
> 
> > > > >> I think that the capacity to bind a port to several hosts is also a
> 
> > > > >> prerequisite for a clean solution here. This will be provided by
> 
> > > > >> patches for this bug:
> 
> > > >> https://bugs.launchpad.net/neutron/+bug/1367391
> 
> > > >>
> 
> > > >>
> 
> > > >> >>This way no matter where the master router is currently running,
> 
> > > >> >>the data plane would know how to forward traffic to it.
> 
> > > >> >>This solution requires changes on the controller only.
> 
> > > >> >>
> 
> > > >> >>What's to gain?
> 
> > > >> >>* Data plane only solution, independent of the control plane.
> 
> > > >> >>* Lowest failover time (same as HA routers today).
> 
> > > >> >>* High backport potential:
> 
> > > >> >> * No APIs changed/added.
> 
> > > >> >> * No configuration changes.
> 
> > > >> >> * No DB changes.
> 
> > > >> >> * Changes localized to a single file and limited in scope.
> 
> > > >> >>
> 
> > > >> >>What's the alternative?
> 
> > > >> >>An alternative solution would be to have the controller update
> 
> > > >> >>the port binding on the single port so that the plain old L2Pop
> 
> > > >> >>happens and notifies about the location of the master router.
> 
> > > >> >>This basically negates all the benefits of the proposed solution,
> 
> > > >> >>but is wider.
> 
> > > >> >>This solution depends on the report-ha-router-master spec which
> 
> > > >> >>is currently in the implementation phase.
> 
> > > >> >>
> 
> > > >> >>It's important to note that these two solutions don't collide and
> 
> > > >> >>could be done independently. The one I'm proposing just makes
> 
> > > > >> >>more sense from an HA viewpoint because of its benefits which
> 
> > > >> >>fit the HA methodology of being fast & having as little outside
> 
> > > >> >>dependency as possible.
> 
> > > >> >>It could be done as an initial solution which solves the bug for
> 
> > > >> >>mechanism drivers that support normal learning switch (OVS), and
> 
> > > >> >>later kept as an optimization to the more general, controller
> 
> > > >> >>based, solution which will solve the issue for any mechanism
> 
> > > >> >>driver working with L2Pop (Linux Bridge, possibly others).
> 
> > > >> >>
> 
> > > >> >>Would love to hear your thoughts on the subject.
> 
> > > >>
> 
> > > > >> You will have to clearly update the doc to mention that deployments
> 
> > > > >> with Linuxbridge+l2pop are not compatible with HA.
> 
> > > >
> 
> > > > Yes this should be added and this is already the situation right now.
> 
> > > > However if anyone would like to work on a LB fix (the general one or
> 
> > > > some specific one) I would gladly help with reviewing it.
> 
> > > >
> 
> > > >>
> 
> > > >> Moreover, this solution is downgrading the l2pop solution, by
> 
> > > >> disabling the ARP-responder when VMs want to talk to a HA router.
> 
> > > >> This means that ARP requests will be duplicated to every overlay
> 
> > > >> tunnel to feed the OVS Mac learning table.
> 
> > > > >> This is something that we were trying to avoid with l2pop, but maybe
> 
> > > > >> this is acceptable.
> 
> > > >
> 
> > > > Yes basically you're correct, however this would be only limited to
> 
> > > > those tunnels that connect to the nodes where the HA router is
> 
> > > > > hosted, so we would still limit the amount of traffic that is sent
> > > > > across the underlay.
> 
> > > >
> 
> > > > Also bear in mind that ARP is actually good (at least in OVS case)
> 
> > > > since it helps the VM locate on which tunnel the master is, so once
> 
> > > > it receives the ARP response it records a flow that directs the
> 
> > > > traffic to the correct tunnel, so we just get hit by the one ARP
> 
> > > > > broadcast, but it's sort of a necessary evil in order to locate the
> > > > > master.
> 
> > > >
> 
> > > >>
> 
> > > >> I know that ofagent is also using l2pop, I would like to know if
> 
> > > >> ofagent deployment will be compatible with the workaround that you
> 
> > > >> are proposing.
> 
> > > >
> 
> > > > I would like to know that too, hopefully someone from OFagent can
> 
> > > > shed some light.
> 
> > > >
> 
> > > >>
> 
> > > >> My concern is that, with DVR, there are at least two major features
> 
> > > >> that are not compatible with Linuxbridge.
> 
> > > >> Linuxbridge is not running in the gate. I don't know if anybody is
> 
> > > >> running a 3rd party testing with Linuxbridge deployments. If
> 
> > > >> anybody does, it would be great to have it voting on gerrit!
> 
> > > >>
> 
> > > >> But I really wonder what is the future of linuxbridge compatibility?
> 
> > > >> should we keep on improving OVS solution without taking into
> 
> > > >> account the linuxbridge implementation?
> 
> > > >
> 
> > > > I don't know actually, but my capability is to fix it for OVS the
> 
> > > > best way possible.
> 
> > > > > As I said, the situation for LB won't become worse than it already
> 
> > > > > is; legacy routers would still function as always. This fix also
> 
> > > > > will not block fixing LB in any other way since it can be easily
> 
> > > > > adjusted (if necessary) to work only for supporting mechanisms (OVS AFAIK).
> 
> > > >
> 
> > > > Also if anyone is willing to pick up the glove and implement the
> 
> > > > general controller based fix, or something more focused on LB I will
> 
> > > > happily help review what I can.
> 
> > > >
> 
> > > >>
> 
> > > >> Regards,
> 
> > > >>
> 
> > > >> Mathieu
> 
> > > >>
> 
> > > >> >>
> 
> > > >> >>Regards,
> 
> > > >> >>Mike
> 
> > > >> >>
> 
> > > >> >>_______________________________________________
> 
> > > >> >>OpenStack-dev mailing list
> 
> > > >> >> OpenStack-dev at lists.openstack.org
> 
> > > >> >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> > > >> >
> 
> > > >> >
> 
> > > >> > _______________________________________________
> 
> > > >> > OpenStack-dev mailing list
> 
> > > >> > OpenStack-dev at lists.openstack.org
> 
> > > >> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> > > >>
> 
> > > >> _______________________________________________
> 
> > > >> OpenStack-dev mailing list
> 
> > > >> OpenStack-dev at lists.openstack.org
> 
> > > >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> > > >>
> 
> > > >
> 
> > > > _______________________________________________
> 
> > > > OpenStack-dev mailing list
> 
> > > > OpenStack-dev at lists.openstack.org
> 
> > > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> > >
> 
> > > _______________________________________________
> 
> > > OpenStack-dev mailing list
> 
> > > OpenStack-dev at lists.openstack.org
> 
> > > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> > >
> 

> > _______________________________________________
> 
> > OpenStack-dev mailing list
> 
> > OpenStack-dev at lists.openstack.org
> 
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

> > _______________________________________________
> 
> > OpenStack-dev mailing list
> 
> > OpenStack-dev at lists.openstack.org
> 
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 

> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev