[openstack-dev] [nova] [infra] The same SRIOV / NFV CI failures missed a regression, why?

Monty Taylor mordred at inaugust.com
Sat Mar 26 15:26:34 UTC 2016

On 03/25/2016 03:52 PM, Jeremy Stanley wrote:
> On 2016-03-25 16:33:44 -0400 (-0400), Jay Pipes wrote:
> [...]
>> What I'm proposing isn't using or needing a custom OpenStack
>> deployment. There's nothing non-standard at all about the PCI or
>> NFV stuff besides the hardware required to functionally test it.
> What you _are_ talking about though is maintaining physical servers
> in a data center running an OpenStack environment (and if you want
> it participating in gating/preventing changes from merging you need
> more than one environment so we don't completely shut down
> development when one of them collapses). This much has been a
> challenge for the TripleO team, such that the jobs running for them
> are still not voting on their changes.
>> What we're talking about here is using the same upstream Infra
>> Puppet modules, installed on a long-running server in a lab that
>> can interface with upstream Gerrit, respond to new change events
>> in the Gerrit stream, and trigger devstack-gate[-like] builds on
>> some bare-metal gear.
> It's possible I'm misunderstanding you... you're talking about
> maintaining a deployment of OpenStack with specific hardware to be
> able to run these jobs in, right? That's not as trivial an effort as
> it sounds, and I'm skeptical "a couple of operators" is sufficient
> to sustain such an endeavor.

Two things:

- There is no current concept of "a long-lived machine running that we 
run devstack on from time to time" - everything in Infra is designed 
around using OpenStack APIs to get compute resources. So if we want to 
run jobs on hardware in this lab, as it stands right now, that hardware 
would need to be provided by Ironic+Nova.

Last time we did the math (and Jim can maybe correct my numbers), I 
believe such an env would need at least 83 Ironic nodes to keep up with 
demand similar to what our VM environments handle. And as Jeremy said, 
we'd need at least 2 envs for redundancy - so when seeking funding, 
approximately 200 machines is likely about right.

- zuul v3 does introduce the concept of statically available resources 
that can be checked out of nodepool - specifically to address the 
question of people wanting to use long-lived servers as test resources 
for things. The number of machines needed is still likely to stay the 
same - but once we have zuul v3 out, it might reduce the need for the 
operators to operate two 100-node Ironic-based OpenStack clouds. (This 
implies that help with zuul v3 might be seen as an accelerant to this 
project.)

Also keep in mind, if/when resources are sought out, that every 
underlying OS config would double the amount of resources. So if we got 
2 sets of 100 nodes to start with, and started running NFV config'd 
devstack tests on them on Ubuntu Trusty, and then our friends at Red Hat 
request that we test the same on a RH-based distro, the cost for that 
would be an additional 100 nodes in each DC.
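
To make the sizing arithmetic above concrete, here is a rough sketch. It 
assumes the node count scales linearly with the number of environments and 
with the number of OS configs, which is my reading of the numbers above 
rather than a measured model. The function name and parameters are made up 
for illustration.

```python
def nodes_needed(nodes_per_env: int, envs: int = 2, os_configs: int = 1) -> int:
    """Estimate total bare-metal nodes to fund.

    Assumption (illustrative only): every environment needs a full set of
    nodes for each OS config, so the total is a simple product.
    """
    if min(nodes_per_env, envs, os_configs) < 1:
        raise ValueError("all inputs must be at least 1")
    return nodes_per_env * envs * os_configs


if __name__ == "__main__":
    # Minimum from the estimate above: 83 nodes in each of 2 envs.
    print(nodes_needed(83))                 # 166, rounded up to ~200 for funding
    # Rounded figure: 2 envs of 100 nodes, one OS config.
    print(nodes_needed(100))                # 200
    # Adding a second OS config costs another 100 nodes in each DC.
    print(nodes_needed(100, os_configs=2))  # 400
```

The product form is only a first approximation. If the test load were 
split across distros rather than duplicated, the real number would be 
lower.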

>> Is that something that is totally out of the question for the
>> upstream Infra team to be a guide for?
> We've stated in the past that we're willing to accept this level of
> integration as long as our requirements for redundancy/uptime are
> met. We mostly just don't want to see issues with the environment
> block development for projects relying on it because it's the only
> place those jobs can run, so multiple environments in different data
> centers would be a necessity (right now our gating jobs are able to
> run in any of 9 regions from 6 providers, which mitigates this
> risk).
