Open Stack

Tue Nov 27 11:50:31 UTC 2012

On 11/27/2012 06:06 AM, Mark McClain wrote:
> All-
>
> I wanted to continue the discussion from today's meeting about the L3 agents.  The two proposed solutions take different approaches, so I think we first should agree on what we're trying to solve: scaling or availability or both.
>
> Nachi and Yong call their proposal "scheduler", but it is really a horizontal scale-out model.  Scheduling is the means they've chosen to distribute the load.  While their solution scales out horizontally, it does not fully address availability.  Gary's proposal fronts the L3 services with a load-balancing-like service.  It addresses availability by using an active/standby setup, but does not cover what happens when vertical scaling maxes out due to too many tenant networks and/or routers to fit on a physical node.
>
> I think the answer is to do both by incorporating a combination of the two proposals.  The L3 and DHCP agents are different enough that we may not be able to find a universal solution and that is ok.
>
> Lastly, deployers have different SLAs and may even have different SLAs for different tenants, so we need to make sure we have a foundation for vendors and deployers to meet their varying SLAs.
>
> Thoughts?

It would be great if we could have a universal solution. I feel that due 
to the different roles of the services this will be very challenging to 
achieve. I'll try and explain in more detail below.

*DHCP agents*:
At the moment each DHCP agent is able to allocate the IP address for a 
specific MAC address. Each agent has this information as it is received 
from the notifications from the Quantum service. The problems with the 
DHCP agent are as follows (please feel free to add or remove):
i. For each network providing DHCP services (currently only 
implemented by dnsmasq in Quantum) a dnsmasq process is created by the 
agent. This is problematic when the number of networks is large.
ii. When interacting with Nova, firewall rules are created to enable the 
traffic to arrive from the DHCP server to the VM. This is problematic if 
the DHCP agent terminates and the VM wishes to renew an IP.

Originally I suggested that we use a load balancer to distribute the 
traffic amongst the DHCP agents. Sadly this is not relevant for two reasons:
i. HAProxy does not have UDP support. UDP support would have enabled a 
virtual IP address for the DHCP server => no changes to the Nova rules. 
The load balancer would have detected if agents were down and redirected 
traffic to agents that are up.
ii. It does not address point #ii above. I suggested having a flag or 
configuration variable for each agent that indicates a list of networks 
that the agent can service.  This will enable the agent to limit the 
resources that can be consumed on a specific host. Naturally the devil 
is in the details on how one can go about this if it is relevant.

I think that if we had the "supported list of networks" configurable for 
the DHCP agents then the vendor can deploy as many DHCP agents as she/he 
wishes. I would prefer that this information is not on the Quantum 
service but locally on the agents. This will offer a solution for scale 
and high availability of DHCP resources.
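
A minimal sketch of the per-agent list described above, to make the idea 
concrete. Everything here is hypothetical: `supported_networks` is not an 
existing Quantum option, and the class and function names are made up for 
illustration. The agent simply ignores notifications for networks that are 
not in its local list.

```python
from dataclasses import dataclass, field


@dataclass
class DhcpAgentSketch:
    """Hypothetical agent that only serves networks named in its local config."""
    host: str
    # Hypothetical local setting; an empty set means "serve every network".
    supported_networks: frozenset = frozenset()
    # Networks for which this agent would currently run a dnsmasq process.
    active: set = field(default_factory=set)

    def should_handle(self, network_id: str) -> bool:
        return not self.supported_networks or network_id in self.supported_networks

    def on_network_create(self, network_id: str) -> bool:
        """Handle a notification; return True if dnsmasq would be started."""
        if not self.should_handle(network_id):
            return False
        self.active.add(network_id)
        return True


def agents_serving(agents, network_id):
    """List the hosts whose local config covers the given network."""
    return [a.host for a in agents if a.should_handle(network_id)]
```

Listing the same network on more than one agent gives redundancy for that 
network, while keeping each list short bounds the number of dnsmasq 
processes on a host.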

The only problem is ensuring that the DHCP traffic gets to the VM :). I 
do not think that it is feasible to update the hosts each time with a 
rule for a new DHCP agent that is added. One option to consider is to 
rewrite the source IP of the traffic sent from the DHCP agent. This is 
essentially what is done by a load balancer.
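
To illustrate the source-rewrite idea, here is a toy model, not real packet 
handling. The `Packet` type, both functions, and the addresses used with 
them are assumptions made up for this sketch. The point is only that the 
host keeps one fixed rule for a shared address, no matter how many agents 
are added.

```python
from dataclasses import dataclass, replace

DHCP_SERVER_PORT = 67  # UDP port that DHCP servers send replies from


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str


def rewrite_source(packet, virtual_ip):
    """Rewrite the source of DHCP server replies to the shared virtual IP."""
    if packet.src_port == DHCP_SERVER_PORT:
        return replace(packet, src_ip=virtual_ip)
    return packet


def host_allows(packet, allowed_dhcp_ip):
    """Stand-in for the per-host rule: accept DHCP replies from one address only.

    Non-DHCP traffic is outside the scope of this sketch and is passed through.
    """
    return packet.src_port != DHCP_SERVER_PORT or packet.src_ip == allowed_dhcp_ip
```

A reply sent from an agent's own address fails `host_allows` for the shared 
address, while the same reply passed through `rewrite_source` is accepted.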

*L3 agents*:
Problems here are:
i. HA - what if an L3 agent goes down?
ii. Scale - how can we deploy a number of L3 agents?
iii. Amount of firewall rules

In the first case if the L3 agent goes down then someone accessing a 
floating IP will be unable to access that IP. This is something that is 
critical for anyone running a cloud.

I have thought about a number of options but each has its shortcomings:
i. L3 agents to run VRRP. This will enable L3 agents to work in an 
active/backup pair. This requires a number of changes to the agent. Each 
agent will have the same configuration, enabling them to treat inbound 
and outbound traffic.
ii. Transparent load balancing - HAProxy does not support this.
iii. Having the agents rewrite the destination MAC address of the 
default GW (the L3 agent). This solves outbound traffic but inbound is 
problematic.
iv. Running L3 agents on each host could ensure that the traffic 
generated on those hosts has floating IPs. This would require us to 
change the implementation of the L3 agents to only build firewall rules 
for devices on the HOST.
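
For option i, the active/backup behaviour can be sketched as a simple 
election. This is not VRRP itself, only the idea behind it: among the 
agents that are still alive, the one with the highest priority is active. 
The class, the function, and the tie-break on name are assumptions made up 
for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class L3AgentSketch:
    """Hypothetical L3 agent taking part in an active/backup pair."""
    name: str
    priority: int
    alive: bool = True


def elect_master(agents) -> Optional[L3AgentSketch]:
    """Return the live agent with the highest priority, or None if all are down.

    Ties are broken on the agent name, an arbitrary choice for this sketch.
    """
    live = [a for a in agents if a.alive]
    if not live:
        return None
    return max(live, key=lambda a: (a.priority, a.name))
```

Because both agents carry the same configuration, the backup can take over 
inbound and outbound traffic as soon as it wins the election.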

None of the above deals with the firewall rules. This is something that 
can be addressed in a similar way to the DHCP agent, with each L3 agent 
specifically indicating which routers it will support (this is already 
implemented when namespaces are not supported).

Thanks
Gary

>
> mark
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev

[openstack-dev] [Quantum] continuing todays discussion about the l3 agents