<div dir="ltr">><span style="font-size:12.8000001907349px">The biggest disconnect in the model seems to be that Neutron assumes</span><br style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">you want self service networking. Most of these deploys don't. Or even</span><br style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">more importantly, they live in an organization where that is never</span><br style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">going to be an option.</span><br style="font-size:12.8000001907349px"><br style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">>Neutron provider networks is close, except it doesn't provide for</span><br style="font-size:12.8000001907349px"><span style="font-size:12.8000001907349px">floating IP / NAT.</span><div><span style="font-size:12.8000001907349px"><br></span></div><div><span style="font-size:12.8000001907349px">Why don't shared networks work in these cases? The workflow here would be that there is an admin tenant responsible for creating the networks and setting up the Neutron router, floating IP pools, etc. Tenants would then attach their VMs to the shared networks.</span></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 11, 2015 at 5:59 AM, Sean Dague <span dir="ltr"><<a href="mailto:sean@dague.net" target="_blank">sean@dague.net</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">The last couple of days I was at the Operators Meetup acting as the Nova<br>

rep for the meeting. All the sessions were quite nicely recorded to<br>

etherpads here - <a href="https://etherpad.openstack.org/p/PHL-ops-meetup" target="_blank">https://etherpad.openstack.org/p/PHL-ops-meetup</a><br>

<br>

There was both a specific Nova session -<br>

<a href="https://etherpad.openstack.org/p/PHL-ops-nova-feedback" target="_blank">https://etherpad.openstack.org/p/PHL-ops-nova-feedback</a> as well as a<br>

bunch of relevant pieces of information in other sessions.<br>

<br>

This is an attempt at a summary; anyone else who was in<br>

attendance, please feel free to correct me if I'm interpreting something<br>

incorrectly. There was a lot of content there, so this is in no way a<br>

comprehensive list, just the highlights that I think make the most<br>

sense for the Nova team.<br>

<br>

=========================<br>

 Nova Network -> Neutron<br>

=========================<br>

<br>

This remains listed as the #1 issue from the Operator Community on<br>

their burning issues list<br>

(<a href="https://etherpad.openstack.org/p/PHL-ops-burning-issues" target="_blank">https://etherpad.openstack.org/p/PHL-ops-burning-issues</a> L18). During<br>

the tags conversation we straw polled the audience<br>

(<a href="https://etherpad.openstack.org/p/PHL-ops-tags" target="_blank">https://etherpad.openstack.org/p/PHL-ops-tags</a> L45) and about 75% of<br>

attendees were over on Neutron already. However, those on Nova Network<br>

were disproportionately the largest clusters and longest standing<br>

OpenStack users.<br>

<br>

Of those on nova-network about 1/2 had no interest in being on<br>

Neutron (<a href="https://etherpad.openstack.org/p/PHL-ops-nova-feedback" target="_blank">https://etherpad.openstack.org/p/PHL-ops-nova-feedback</a><br>

L24). Some of the primary reasons were the following:<br>

<br>

- Complexity concerns - neutron has a lot more moving parts<br>

- Performance concerns - nova multihost means there is very little<br>

  between guests and the fabric, which is really important for the HPC<br>

  workload use case for OpenStack.<br>

- Don't want OVS - OVS adds additional complexity and performance<br>

  concerns. Many large sites are moving off OVS back to Linux bridge<br>

  with neutron because they are hitting OVS scaling limits (especially<br>

  if on UDP) - (<a href="https://etherpad.openstack.org/p/PHL-ops-OVS" target="_blank">https://etherpad.openstack.org/p/PHL-ops-OVS</a> L142)<br>

<br>

The biggest disconnect in the model seems to be that Neutron assumes<br>

you want self service networking. Most of these deploys don't. Or even<br>

more importantly, they live in an organization where that is never<br>

going to be an option.<br>

<br>

Neutron provider networks is close, except it doesn't provide for<br>

floating IP / NAT.<br>

<br>

Going forward: I think the gap analysis probably needs to be revisited<br>

with some of the vocal large deployers. I think we assumed the<br>

functional parity gap was closed with DVR, but it's not clear that in its<br>

current form it actually meets the n-net multihost users' needs.<br>

<br>

===================<br>

 EC2 going forward<br>

===================<br>

<br>

Having a sustainable EC2 is of high interest to the operator<br>

community. Many large deploys have some users that were using AWS<br>

prior to using OpenStack, or currently are using both. They have<br>

preexisting tooling for that.<br>

<br>

There didn't seem to be any objection to the approach of an external<br>

proxy service for this function -<br>

(<a href="https://etherpad.openstack.org/p/PHL-ops-nova-feedback" target="_blank">https://etherpad.openstack.org/p/PHL-ops-nova-feedback</a> L111). Mostly<br>

the question is timing, and the fact that no one has validated the<br>

stackforge project. The fact that we landed everything people need to<br>

run this in Kilo is good, as these production deploys will be able to<br>

test it for their users when they upgrade.<br>

<br>

============================<br>

 Burning Nova Features/Bugs<br>

============================<br>

<br>

Hierarchical Projects Quotas<br>

----------------------------<br>

<br>

Hugely desired feature by the operator community<br>

(<a href="https://etherpad.openstack.org/p/PHL-ops-nova-feedback" target="_blank">https://etherpad.openstack.org/p/PHL-ops-nova-feedback</a> L116). Missed<br>

Kilo. This made everyone sad.<br>

<br>

Action: we should queue this up as an early Liberty priority item.<br>

<br>

Out of sync Quotas<br>

------------------<br>

<br>

<a href="https://etherpad.openstack.org/p/PHL-ops-nova-feedback" target="_blank">https://etherpad.openstack.org/p/PHL-ops-nova-feedback</a> L63<br>

<br>

The quotas code is quite racy (this is a known issue if you look at<br>

the bug tracker). It was actually marked as a top soft spot during<br>

last fall's bug triage -<br>

<a href="http://lists.openstack.org/pipermail/openstack-dev/2014-September/046517.html" target="_blank">http://lists.openstack.org/pipermail/openstack-dev/2014-September/046517.html</a><br>

<br>

There is an operator proposed spec for an approach here -<br>

<a href="https://review.openstack.org/#/c/161782/" target="_blank">https://review.openstack.org/#/c/161782/</a><br>

<br>

Action: we should make a solution here a top priority for enhanced<br>

testing and fixing in Liberty. Addressing this would remove a lot of<br>

pain from ops.<br>

<br>

Reporting on Scheduler Fails<br>

----------------------------<br>

<br>

Apparently, some time recently, we stopped logging scheduler fails<br>

above DEBUG, and that behavior also snuck back into Juno<br>

(<a href="https://etherpad.openstack.org/p/PHL-ops-nova-feedback" target="_blank">https://etherpad.openstack.org/p/PHL-ops-nova-feedback</a> L78). This<br>

has made tracking down the root cause of failures far more difficult.<br>

<br>

Action: this should hopefully be a quick fix we can get in for Kilo<br>

and backport.<br>

<br>

=============================<br>

 Additional Interesting Bits<br>

=============================<br>

<br>

Rabbit<br>

------<br>

<br>

There was a whole session on Rabbit -<br>

<a href="https://etherpad.openstack.org/p/PHL-ops-rabbit-queue" target="_blank">https://etherpad.openstack.org/p/PHL-ops-rabbit-queue</a><br>

<br>

Rabbit is a top operational concern for most large sites. Almost all<br>

sites have a "restart everything that talks to rabbit" script because<br>

during Rabbit HA operations queues tend to blackhole.<br>

<br>

All other queue systems OpenStack supports are worse than Rabbit (from<br>

experience in that room).<br>

<br>

oslo.messaging < 1.6.0 was a significant regression in dependability<br>

from the incubator code. It now seems to be getting better, but there are<br>

still a lot of issues. (L112)<br>

<br>

Operators *really* want the concept in<br>

<a href="https://review.openstack.org/#/c/146047/" target="_blank">https://review.openstack.org/#/c/146047/</a> landed. (I asked them to<br>

provide such feedback in gerrit).<br>

<br>

Nova Rolling Upgrades<br>

---------------------<br>

<br>

Most people really like the concept, but we couldn't find anyone who had<br>

used it yet. Because Neutron doesn't support it, they had to do big<br>

bang upgrades anyway.<br>

<br>

Galera Upstream Testing<br>

-----------------------<br>

<br>

The majority of deploys run with Galera MySQL. There was a question<br>

about whether or not we could get that into the upstream testing pipeline<br>

as that's the common case.<br>

<span class="HOEnZb"><font color="#888888"><br>

<br>

        -Sean<br>

<br>

--<br>

Sean Dague<br>

<a href="http://dague.net" target="_blank">http://dague.net</a><br>

<br>


</font></span></blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature"><div>Kevin Benton</div></div>

</div>