<div dir="ltr"><div>I dunno.  Quantum doesn't have real layer3 yet.  nova-network does.  And spanning tree collisions are still a credible risk in a cloud infrastructure.<br><br></div>I'm hesitant to recommend quantum to folks until it has actual layer3 support.<br>

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Sep 20, 2013 at 2:14 PM, Jonathan Proulx <span dir="ltr"><<a href="mailto:jon@jonproulx.com" target="_blank">jon@jonproulx.com</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">On Thu, Sep 19, 2013 at 11:21 PM, Lorin Hochstein<br>

<<a href="mailto:lorin@nimbisservices.com">lorin@nimbisservices.com</a>> wrote:<br>

><br>

><br>

> I'd be really interested to hear what these pain points were. Were these just due to quantum/neutron, or because you were migrating from nova-network to quantum?<br>

><br>

<br>

</div>A bit of both.  My short opinion is if you have an existing deploy<br>

with nova-network don't change yet, but if you're building fresh go<br>

with neutron/quantum to save the inevitable disruptive transition but<br>

be sure you test at scale before you go live.<br>

<br>

I've been holding off on the public ranting because I'm still not<br>

quite sure how much of the issue was me not being ready for the<br>

transition and how much was quantum not being ready.  I'll attempt to<br>

tell the whole story here and let others judge.<br>

<br>

Background / History:<br>

<br>

I've been running a one rack 60 physical node OpenStack deployment<br>

since July 2012 at MIT CSAIL (<a href="http://www.csail.mit.edu" target="_blank">http://www.csail.mit.edu</a>).  Base<br>

operating system is Ubuntu 12.04LTS using puppet for configuration<br>

management (puppetlabs/openstack modules).  We started with Essex and<br>

have been tracking the cloudarchive for Folsom and Grizzly.  Typically<br>

the cloud is heavily utilized  and very near resource capacity.  900 -<br>

1000 instances is typical with some projects liking to start or stop<br>

several hundred at a time.  Networking was nova-network using<br>

flat-dhcp and multihost with a typical setup using an rfc1918 private<br>

network NAT'ed to the public network and available floating IPs.<br>

<br>

Essex was essentially the "Alpha" period of our deployment.  Since we<br>

weren't promising continuous operation yet and were having issues with<br>

some race conditions in the Essex scheduler I took the Folsom upgrade<br>

as soon as the cloudarchive packages were available, before the puppet<br>

modules had been fully updated.  Unsurprisingly I ran into some<br>

upgrade issues both with adapting configs and some legitimate bugs<br>

that required hand clean up of some database tables.  With all that no<br>

running instances were harmed, and the whole upgrade and debug took<br>

maybe 20-25 hours.<br>

<br>

This started our "Beta" phase where we opened the cloud to everyone in<br>

the lab, but with warning stickers all over it saying future<br>

disruptive changes were likely.<br>

<br>

I work in a group of 8, but am currently the only one doing OpenStack<br>

infrastructure stuff (a couple of others are users of the cloud and<br>

provide some operational support like creating new projects and quota<br>

updates), so any times mentioned are for work done by one person,<br>

and the longer the time the greater the sleep deprivation and the lower<br>

the efficiency of the work done.<br>

<br>

The Plan:<br>

<br>

This was planned to be the transition from "Beta" to "General<br>

Availability" and involve more fairly major reconfigurations to meet<br>

needs we'd identified in the first year of operations.  Specifically<br>

reconfiguring the networking to connect instances directly to existing<br>

lab vlans both to get NAT out of the picture and allow easier<br>

migration of legacy applications ("pets") into the newer cloud<br>

resource, moving from the simple cinder Linux LVM backend we'd used in<br>

testing to a vendor specific SAN backend to leverage existing<br>

enterprise storage, and combining host aggregates and instance types<br>

to provide different scheduling zones for computationally intensive<br>

compute instances and more over schedulable web app and testing<br>

instances.<br>

<br>

I planned the reconfig in two main phases, upgrade in place with<br>

nova-network followed by the transition to quantum for networking.  I<br>

scheduled a week of total downtime with all instances off line.<br>

<br>

Phase One - straight upgrade:<br>

<br>

This was about as uneventful for me as what Joe described.  I did find<br>

that the introduction of nova-conductor was a serious bottleneck<br>

at first but it was trivially solved by launching some nova-conductor<br>

instances within the cloud.  To meet my aggregate scheduler needs I<br>

also grabbed core_filter.py from Havana because I personally needed<br>

the AggregateCoreFilter which was a simple drop in and it worked, and<br>

for my EqualLogic SAN I needed to back port eqlx.py from<br>

<a href="https://review.openstack.org/#/c/43944" target="_blank">https://review.openstack.org/#/c/43944</a> which took a little more work,<br>

but both of those are highly site specific needs.<br>

<br>

Phase Two - nova-network -> quantum upgrade:<br>

<br>

The plan here was to deploy quantum using open-vswitch plugin with<br>

vlan based provider networks and gre based project-private networks.<br>

The initial provider network was the same vlan the old floating IPs<br>

had been on, additional provider vlans were to be added later (and<br>

since have been) for legacy app migration.  We didn't have a use plan<br>

for the user created gre networks but it was easy to provide, and some<br>

projects are using them now.<br>

<br>

I had initially wanted to use my existing non-openstack DHCP<br>

infrastructure, and rather wish I could since all of my ongoing<br>

troubles with quantum centre on the dhcp-agent, but obviously if<br>

OpenStack doesn't control the DHCP it can't do fixed address<br>

assignment or even tell what IP an instance is assigned.<br>

<br>

I'd initially marked the provider network as external, since it is<br>

a publicly addressable network with a proper router outside openstack.<br>

 I'm still not sure if it's strictly necessary but I couldn't get dhcp<br>

to work until I made it an internal network.  There was also a bit of<br>

confusion around my ovs bridges and which ports got attached to which<br>

bridge.  I was getting rather sleep deprived at this point so my notes<br>

about how things got from point A to point B are not so good.  Once I<br>

tracked down all the missing bits to get things plumbed together<br>

properly, I went back to update the docs and the steps I missed for<br>

adding interfaces to bridges were in fact in the Network<br>

Administrators Guide which I'd been using,  hence my reluctance to<br>

complain too loudly.<br>

<br>

Problems with running quantum:<br>

<br>

I'm a bit more sure of what didn't work after getting quantum set up.<br>

<br>

Most of my quantum issues are scaling issues of various sorts, which<br>

really surprised me since I don't think my current scale is very<br>

large.  That may be the problem.  In proof of concept and developer<br>

size systems the scaling doesn't matter and at really large scale<br>

horizontal scale out is already required.  At my size my single<br>

controller node (dual socket hex-core with 48G ram) typically runs at<br>

about 10-20% capacity now (was 5% pre quantum) with peaks under<br>

extreme load brushing 50%.<br>

<br>

The first issue that became apparent was that some instances would be<br>

assigned multiple quantum ports<br>

(<a href="https://bugs.launchpad.net/ubuntu/+bug/1160442" target="_blank">https://bugs.launchpad.net/ubuntu/+bug/1160442</a>).  The bug report shows<br>

this happens at 128 instances concurrently started but not at 64.  I<br>

was seeing it starting around 10.  The bug reporter has 8 compute<br>

hosts, my theory is it was worse for me because I had more<br>

quantum-clients running in parallel.  I applied the patch that closed<br>

that bug which provides the following in the ovs-agent config<br>

(defaults in comments, my settings uncommented):<br>

<br>

# Maximum number of SQL connections to keep open in a QueuePool in SQLAlchemy<br>

# sqlalchemy_pool_size = 5<br>

sqlalchemy_pool_size = 24<br>

# sqlalchemy_max_overflow = 10<br>

sqlalchemy_max_overflow = 48<br>

# Example sqlalchemy_pool_timeout = 30<br>

sqlalchemy_pool_timeout = 2<br>

<br>
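As a rough illustration of what those pool settings mean (a sketch, not taken from the patch itself): SQLAlchemy's QueuePool holds up to pool_size connections open and allows up to max_overflow temporary extras, so one process can have at most pool_size + max_overflow connections checked out at once, and a caller waits up to pool_timeout seconds for a free one before giving up.  The helper name below is made up for the example.<br>

```python
# Sketch only: max_pool_connections is a hypothetical helper, not part of
# quantum or SQLAlchemy. It encodes the QueuePool rule that one process can
# hold at most pool_size + max_overflow connections at the same time.

def max_pool_connections(pool_size, max_overflow):
    """Upper bound on simultaneous connections for one QueuePool."""
    return pool_size + max_overflow


# Defaults shown in the comments of the config above.
default_ceiling = max_pool_connections(pool_size=5, max_overflow=10)

# The tuned values from the config above.
tuned_ceiling = max_pool_connections(pool_size=24, max_overflow=48)

print("default ceiling:", default_ceiling)  # 15
print("tuned ceiling:", tuned_ceiling)      # 72
```

With these numbers the ceiling goes from 15 to 72 connections per process, so it's worth checking that the database's own connection limit leaves room for that.<br>

<br>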

I believe this solved the multiple port problem, but then at a slightly<br>

higher scale but still <50 concurrent starts quantum port creations<br>

would time out and the abortive instances would go into error state.<br>

This seemed to be caused by serialization in both keystone and<br>

quantum-server.  In the upgrade keystone had been reconfigured to use<br>

PKI tokens stored in MySQL; moving back to UUID tokens in memcache<br>

helped a lot but not enough.<br>

<br>

Peter Feiner's blog post on parallel performance at<br>

<a href="http://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performance" target="_blank">http://blog.gridcentric.com/bid/318277/Boosting-OpenStack-s-Parallel-Performance</a><br>

got me most of the rest of the way.  Particularly the multi worker<br>

patches for keystone and quantum-server, which I took from his links.<br>

A multiserver patch for quantum-server is under review at<br>

<a href="https://review.openstack.org/#/c/37131" target="_blank">https://review.openstack.org/#/c/37131</a>.  I'm beginning to worry it won't<br>

make it into Havana, there's also a review for the keystone-all piece<br>

at <a href="https://review.openstack.org/#/c/42967/" target="_blank">https://review.openstack.org/#/c/42967/</a> which I believe is being<br>

held for Icehouse.<br>

<br>

At this point I could start hundreds of instances and they would all<br>

have the proper number of quantum ports assigned and end in "Active"<br>

state, but many (most) of them would never get their address via<br>

DHCP.  After some investigation I saw that while quantum had the MAC<br>

and IP assignments for the ports dnsmasq never got them.  The<br>

incidence of this fault seemed to vary not only with the number of<br>

concurrent starts but also with the number of running instances.<br>

After much hair pulling I was able to mitigate this somewhat by<br>

increasing the default DHCP lease time from 2min to 30min (which is<br>

much more reasonable IMHO) and by increasing "agent_downtime" in<br>

quantum.conf from a default of 5sec to 60sec, though even at this we<br>

still occasionally see this crop up but it's infrequent enough I've<br>

been telling people to "try again and hope it's better in the next<br>

release".<br>

<br>
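A back-of-envelope sketch of why the lease time matters at this scale, assuming clients renew at half the lease time (the RFC 2131 default for T1) and that renewals are spread evenly.  The function name and the 1000-instance figure are just for illustration, and this doesn't claim to be the whole cause of the fault.<br>

```python
# Sketch only: renewals_per_second is a hypothetical helper for estimating
# steady-state DHCP renewal traffic. Assumption: every client renews at
# t1_fraction of its lease (0.5 is the RFC 2131 default) and renewals are
# evenly spread over time.

def renewals_per_second(instances, lease_seconds, t1_fraction=0.5):
    """Average number of DHCP renewals per second across all instances."""
    renew_interval = lease_seconds * t1_fraction
    return instances / renew_interval


short_lease_rate = renewals_per_second(1000, 120)    # 2 minute lease
long_lease_rate = renewals_per_second(1000, 1800)    # 30 minute lease

print(f"2 min lease:  {short_lease_rate:.1f} renewals/sec")   # 16.7
print(f"30 min lease: {long_lease_rate:.1f} renewals/sec")    # 1.1
```

So under these assumptions, going from a 2 minute to a 30 minute lease cuts steady-state renewal traffic by about a factor of 15.<br>

<br>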

Another small issue we're having is that if you assign a specific<br>

fixed IP to an instance then tear it down, you need to wait until<br>

its dhcp lease expires before you can launch another instance with<br>

that IP.<br>

<br>

Looking Forward:<br>

<br>

I'm optimistic about Havana; it should have most (but certainly not<br>

all) of what I needed to club into Grizzly to make it go, and there seems to<br>

have been significant work around dhcp lease updates which I hope<br>

makes things better there at least in terms of immediately releasing<br>

IPs if nothing else.<br>

<br>

Conclusion:<br>

<br>

Blame Lorin, he asked :)<br>

<br>

-Jon<br>

<br>

_______________________________________________<br>

OpenStack-operators mailing list<br>

<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" target="_blank">http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators</a><br>

</blockquote></div><br></div>