Open Stack

Fri Mar 25 19:20:00 UTC 2016

On 03/24/2016 09:35 AM, Matt Riedemann wrote:
> We have another mitaka-rc-potential bug [1] due to a regression when
> detaching SR-IOV interfaces in the libvirt driver.
>
> There were two NFV CIs that ran on the original change [2].
>
> Both failed with the same devstack setup error [3][4].
>
> So it sucks that we have a regression, it sucks that no one watched for
> those CI results before approving the change, and it really sucks in
> this case since it was specifically reported from mellanox for sriov
> which failed in [4]. But it happens.
>
> What I'd like to know is, have the CI problems been fixed? There is a
> change up to fix the regression [5] and this time the Mellanox CI check
> is passing [6]. The Intel NFV CI hasn't reported, but with the mellanox
> one also testing the suspend scenario, it's probably good enough.

From the commit message of the original patch that introduced the
regression:

"This fix was tested on a real environment containing the above type of 
VMs. test_driver.test_detach_sriov_ports was slightly modified so that 
the VIF from which data is sent to _detach_pci_devices will contain the 
correct SRIOV values (pci_slot, vlan and hw_veb VIF type)"

I'm not sure if the above statement could ever have been true 
considering the AttributeError that occurred in the bug...

In any case, I think that it's pretty clear that the CI systems for NFV
and PCI have been less than reliable at functionally testing the PCI-
and NFV-specific features in Nova.

This isn't trying to put down the people that work on those systems -- I
know firsthand that it can be difficult to build and maintain CI
systems that report in to upstream, and I appreciate the effort that
goes into this.

But, going forward, I think we need to do something as a concerned 
community.

How about this for a proposal?

1) We establish a joint lab environment containing heterogeneous
hardware, to which every interested hardware vendor must contribute
equipment.

2) The OpenStack Foundation and the hardware vendors each foot some 
portion of the bill to hire 2 or more systems administrators to maintain 
this lab environment.

3) The upstream Infrastructure team works with the hired systems
administrators to create a single CI system that can spawn functional
test jobs on the lab hardware and report results back to upstream Gerrit.

Given the will to do this, I think the benefits of more trusted testing 
results for the PCI and SR-IOV/NFV areas would more than make up for the 
cost.

Best,
-jay

[openstack-dev] [nova] The same SRIOV / NFV CI failures missed a regression, why?