[openstack-dev] [nova][ironic] A couple feature freeze exception requests
mriedem at linux.vnet.ibm.com
Tue Aug 2 15:33:05 UTC 2016
On 8/1/2016 4:20 PM, Jim Rollenhagen wrote:
> Yes, I know this is stupid late for these.
> I'd like to request two exceptions to the non-priority feature freeze,
> for a couple of features in the Ironic driver. These were not requested
> at the normal time as I thought they were nowhere near ready.
> Multitenant networking
> Ironic's top feature request for around 2 years now has been to make
> networking safe for multitenant use, as opposed to a flat network
> (including control plane access!) for all tenants. We've been working on
> a solution for 3 cycles now, and finally have the Ironic pieces of it
> done, after a heroic effort to finish things up this cycle.
> There's just one patch left to make it work, in the virt driver in Nova.
> That is here: https://review.openstack.org/#/c/297895/
> It's important to note that this actually fixes some dead code we pushed
> on before this feature was done, and is only ~50 lines, half of which
> are comments/reno.
> Reviewers on this unearthed a problem on the ironic side, which I expect
> to be fixed in the next couple of days:
> We also have CI for this feature in ironic, and I have a depends-on
> testing all of this as a whole: https://review.openstack.org/#/c/347004/
> Per Matt's request, I'm also adding that job to Nova's experimental
> queue: https://review.openstack.org/#/c/349595/
> A couple folks from the ironic team have also done some manual testing
> of this feature, with the nova code in, using real switches.
> Merging this patch would bring a *huge* win for deployers and operators,
> and I don't think it's very risky. It'll be ready to go sometime this
> week, once that ironic chain is merged.
I've reviewed this one and it looks good to me. It depends on
python-ironicclient>=1.5.0, for which Jim has a global-requirements (g-r)
bump up as a dependency.
And the gate-tempest-dsvm-ironic-multitenant-network-nv job is testing
this and passing on the test patch in ironic (and that job is in the
nova experimental queue now).
The upgrade procedure had some people scratching their heads in IRC this
week so I've stated that we need clear documentation there, which will
probably live here:
Since Ironic isn't in here:
But the docs in the Ironic repo say that Nova should be upgraded first
when going from Juno to Kilo, so it's definitely important to get those
docs updated for upgrades from Mitaka to Newton; Jim said he'd do
that this cycle.
Given how long people have been asking for this in Ironic, that the Ironic
team has made it a priority to get it working on their side, and that there
is already CI and only a small change in Nova, I'm OK with giving a
non-priority FFE for this.
> Multi-compute usage via a hash ring
> One of the major problems with the ironic virt driver today is that we
> don't support running multiple nova-compute daemons with the ironic driver
> loaded, because each compute service manages all ironic nodes and the
> services stomp on each other.
> There's currently a hack in the ironic virt driver to
> kind of make this work, but instance locking still isn't done:
> That is also holding back removing the pluggable compute manager in nova:
> And as someone who runs a deployment using this hack, I can tell you
> first-hand that it doesn't work well.
> We (the ironic and nova community) have been working on fixing this for
> 2-3 cycles now, trying to find a solution that isn't terrible and
> doesn't break existing use cases. We've been conflating it with how we
> schedule ironic instances and keep managing to find a big wedge with
> each approach. The best approach we've found involves duplicating the
> compute capabilities and affinity filters in ironic.
> Some of us were talking at the nova midcycle and decided we should try
> the hash ring approach (like ironic uses to shard nodes between
> conductors) and see how it works out, even though people have said in
> the past that it wouldn't work. I did a proof of concept last week, and
> started playing with five compute daemons in a devstack environment.
> Two nerd-snipey days later and I had a fully working solution, with unit
> tests, passing CI. That is here:
> We'll need to work on CI for this with multiple compute services. That
> shouldn't be crazy difficult, but I'm not sure we'll have it done this
> cycle (and it might get interesting trying to test computes joining and
> leaving the cluster).
> It also needs some testing at scale, which is hard to do in the upstream
> gate, but I'll be doing my best to ship this downstream as soon as I
> can, and iterating on any problems we see there.
> It's a huge win for operators, for only a few hundred lines (some of
> which will be pulled out to oslo next cycle, as it's copied from
> ironic). The single compute mode would still be recommended while we
> iron out any issues here, and that mode is well-understood (as this will
> behave the same in that case). We have a couple of nova cores on board
> with helping get this through, and I think it's totally doable.
I'm much less familiar with this one and haven't reviewed the series
yet, but I know that Jim, Jay and Dan were talking about this at the
midcycle and it seems this is a breakthrough of sorts on a working
solution to this problem, so I'm open to an FFE on this iff Jay and Dan
are happy with it (as sponsors more or less).
I'd be happier if there was a multi-node CI job to verify this because
I've blocked lesser things on CI for FFEs this cycle.
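
For anyone following along who hasn't looked at how a hash ring shards
work between services, here is a minimal Python sketch of the general
idea. This is not the code in Jim's series; the names (HashRing,
nodes_for), the replica count, and the choice of hash function are all
assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Toy consistent hash ring mapping node IDs to compute hosts.

    Hypothetical illustration only, not the nova or ironic implementation.
    """

    def __init__(self, hosts, replicas=32):
        if not hosts:
            raise ValueError("at least one host is required")
        # Each host gets several points on the ring to even out the split.
        self._ring = sorted(
            (self._hash(f"{host}-{i}"), host)
            for host in hosts
            for i in range(replicas)
        )
        self._keys = [key for key, _ in self._ring]

    @staticmethod
    def _hash(value):
        # Any stable hash works here; sha256 is an arbitrary choice.
        return int(hashlib.sha256(value.encode("utf-8")).hexdigest(), 16)

    def get_host(self, node_id):
        # Pick the first ring point after the node's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(node_id)) % len(self._keys)
        return self._ring[idx][1]


def nodes_for(host, hosts, node_ids):
    """Return the node IDs that `host` would manage, given all live hosts."""
    ring = HashRing(hosts)
    return [n for n in node_ids if ring.get_host(n) == host]
```

Every compute service that builds the ring from the same list of live
hosts arrives at the same answer, so each node has exactly one owner.
When a host leaves, only the nodes it owned move to another host, which
is the property that makes computes joining and leaving tolerable.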
> Thanks for hearing me out,
> // jim