[openstack-dev] [nova][ironic] A couple feature freeze exception requests

Jim Rollenhagen jim at jimrollenhagen.com
Mon Aug 1 21:20:13 UTC 2016


Yes, I know this is stupid late for these.

I'd like to request two exceptions to the non-priority feature freeze,
for a couple of features in the Ironic driver. These were not requested
at the normal time because I thought they were nowhere near ready.

Multitenant networking
======================

Ironic's top feature request for around 2 years now has been to make
networking safe for multitenant use, as opposed to a flat network
(including control plane access!) for all tenants. We've been working on
a solution for 3 cycles now, and finally have the Ironic pieces of it
done, after a heroic effort to finish things up this cycle.

There's just one patch left to make it work, in the virt driver in Nova.
That is here: https://review.openstack.org/#/c/297895/

It's important to note that this patch actually fixes some dead code
that was merged before this feature was done, and is only ~50 lines,
half of which are comments/reno.

Reviewers on this unearthed a problem on the ironic side, which I expect
to be fixed in the next couple of days:
https://review.openstack.org/#/q/topic:bug/1608511

We also have CI for this feature in ironic, and I have a depends-on
testing all of this as a whole: https://review.openstack.org/#/c/347004/

Per Matt's request, I'm also adding that job to Nova's experimental
queue: https://review.openstack.org/#/c/349595/

A couple folks from the ironic team have also done some manual testing
of this feature, with the nova code in, using real switches.

Merging this patch would bring a *huge* win for deployers and operators,
and I don't think it's very risky. It'll be ready to go sometime this
week, once that ironic chain is merged.

Multi-compute usage via a hash ring
===================================

One of the major problems with the ironic virt driver today is that we
don't support running multiple nova-compute daemons with the ironic driver
loaded: each compute service manages all ironic nodes, so the services
stomp on each other.

There's currently a hack in the ironic virt driver to
kind of make this work, but instance locking still isn't done:
https://github.com/openstack/ironic/blob/master/ironic/nova/compute/manager.py

That is also holding back removing the pluggable compute manager in nova:
https://github.com/openstack/nova/blob/master/nova/conf/service.py#L64-L69

And as someone that runs a deployment using this hack, I can tell you
first-hand that it doesn't work well.

We (the ironic and nova community) have been working on fixing this for
2-3 cycles now, trying to find a solution that isn't terrible and
doesn't break existing use cases. We've been conflating it with how we
schedule ironic instances and keep managing to find a big wedge with
each approach. The best approach we've found involves duplicating the
compute capabilities and affinity filters in ironic.

Some of us were talking at the nova midcycle and decided we should try
the hash ring approach (like ironic uses to shard nodes between
conductors) and see how it works out, even though people have said in
the past that wouldn't work. I did a proof of concept last week, and
started playing with five compute daemons in a devstack environment.
Two nerd-snipey days later, I had a fully working solution with unit
tests, passing CI. That is here:
https://review.openstack.org/#/c/348443/
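For readers unfamiliar with the technique: the idea is that each compute
daemon uses a consistent-hash ring to claim only the subset of ironic nodes
that hash to it, so no two daemons manage the same node. Below is a minimal
sketch of that idea, assuming made-up host names; it is not the actual code
in the review above or ironic's own hash ring implementation.

```python
import bisect
import hashlib


class HashRing(object):
    """Minimal consistent-hash ring sketch (illustrative only, not
    ironic's implementation). Each compute host is hashed onto the
    ring at several points; a node is managed by the host owning the
    first ring point at or after the node's own hash."""

    def __init__(self, hosts, points_per_host=32):
        # Sorted list of (hash value, host) pairs forming the ring.
        self._ring = []
        for host in hosts:
            for i in range(points_per_host):
                self._ring.append((self._hash('%s-%d' % (host, i)), host))
        self._ring.sort()
        self._keys = [k for k, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode('utf-8')).hexdigest(), 16)

    def get_host(self, node_uuid):
        """Return the compute host responsible for the given node."""
        idx = bisect.bisect(self._keys, self._hash(node_uuid))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(['compute1', 'compute2', 'compute3'])
# Each daemon would manage only nodes where get_host(node) == its own name.
owner = ring.get_host('some-node-uuid')
```

Two properties make this attractive here: with a single compute host the
ring assigns every node to that host, matching today's single-compute
behavior; and when a host leaves, only the nodes it owned get reassigned,
rather than reshuffling the whole cluster.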

We'll need to work on CI for this with multiple compute services. That
shouldn't be crazy difficult, but I'm not sure we'll have it done this
cycle (and it might get interesting trying to test computes joining and
leaving the cluster).

It also needs some testing at scale, which is hard to do in the upstream
gate, but I'll be doing my best to ship this downstream as soon as I
can, and iterating on any problems we see there.

It's a huge win for operators, for only a few hundred lines (some of
which will be pulled out to oslo next cycle, as it's copied from
ironic). The single-compute mode would still be recommended while we
iron out any issues here; that mode is well understood, and this change
behaves the same in that case. We have a couple of nova cores on board
with helping get this through, and I think it's totally doable.

Thanks for hearing me out,

// jim



