[OpenStack-Infra] Touching base; Airship CI cluster

Gorshunov, Roman roman.gorshunov at att.com
Fri Jan 3 16:29:53 UTC 2020


Hello Christopher, Clark, OpenStack-Infra,

Thank you for your help.

Moving the conversation to the OpenStack-Infra mailing list.

Answers:
The available OS images are OK for us. We primarily target Ubuntu 18.04 as a base image.

> For sizing our current test VMs are 8vcpu, 8GB RAM, 80GB of disk.
This is the VM we would be running: 64 GB RAM, 200 GB block storage, 16 vCPUs, plus nested virtualization.

> Is it possible that AirShip could run these test jobs on 3 node multinode jobs using our existing nested-virt flavors?
No, we can’t. We really need to run nested VMs inside one very large VM to simulate end-to-end bare-metal deployments (with PXE booting, etc.). Each nested VM would have up to 16 GB of RAM.
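
As a rough illustration of the sizing gap, here is a minimal sketch that counts how many 16 GB nested VMs fit into each flavor. The flavor sizes are the ones quoted in this thread; HOST_OVERHEAD_GB is an assumed figure for the outer VM's own OS and services, not a measured one.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    vcpus: int
    ram_gb: int
    disk_gb: int


# Sizes quoted in this thread.
STANDARD = Flavor(vcpus=8, ram_gb=8, disk_gb=80)
AIRSHIP = Flavor(vcpus=16, ram_gb=64, disk_gb=200)

NESTED_VM_RAM_GB = 16  # upper bound mentioned for a nested VM
HOST_OVERHEAD_GB = 8   # ASSUMPTION: RAM kept back for the outer VM itself


def nested_vm_capacity(flavor, guest_ram_gb=NESTED_VM_RAM_GB,
                       overhead_gb=HOST_OVERHEAD_GB):
    """Return how many guests of guest_ram_gb fit in the flavor's RAM."""
    usable = flavor.ram_gb - overhead_gb
    return max(usable // guest_ram_gb, 0)


if __name__ == "__main__":
    for name, flavor in (("standard", STANDARD), ("airship", AIRSHIP)):
        print(name, nested_vm_capacity(flavor))
```

Under these assumptions a standard 8 GB node cannot host even one 16 GB nested VM, which is why splitting the job across three standard nodes does not help here.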

> Airship could choose to accept the risk of having a single provider, but I would caution against this.
We currently have one provider and plan to have two soon.

> The other policy thing we should consider is if these resources would be available to the large pool of users or if they should be considered airship specific.
We would prefer them to be Airship-specific, primarily because only we would be running those huge VMs, and other providers do not offer this option at the moment, so we cannot use their hardware.

Please, let’s set up a call at a time that is convenient for you.

--
Roman Gorshunov
Principal, Systems Engineering

From: Christopher Price <christopher.price at est.tech> 
Sent: Thursday, December 12, 2019 11:00 AM
To: Gorshunov, Roman <roman.gorshunov at att.com>
Cc: paye600 at gmail.com
Subject: Re: Touching base

Hi Roman,

I have been in touch with a few people, and it seems there is a way to solve this across the groups. However, there is some thinking and effort involved in making it happen.
We should set up a call with the AirShip team and some of the infra team to discuss and align on objectives. First, though, I think it’s worth having a conversation in the AirShip community to understand what is needed. Here is a basic rundown from Clark; sorry for the long mail with Q&A included:

On 12/6/19 4:55 AM, Christopher Price wrote:
> Hey Clark,
> 
> Reaching out to chat about getting some of the more unique AirShip gating use-cases set up in openinfra.
> 
> Some text below in my question to Thierry for context, but to summarize:
> - Airship needs to do "huge VM" based testing with nested virtualization enabled for gating in opendev
> - A few companies are willing to put up some hardware to support that - maintenance and support from the companies
> - I'd like to bring the "what does infra expect" topic forward to see what needs to happen and what we need to ensure to get the gating in place for these items
> 
> I think the plan is to use something like airskiff or airship-in-a-bottle as a method of doing deploy testing for gating.
> Requires an Ubuntu 16.04 VM (minimum 4vCPU/20GB RAM/32GB disk) to run.
> https://airship-treasuremap.readthedocs.io/en/latest/airskiff.html
> https://github.com/airshipit/airship-in-a-bottle

We currently build images for Ubuntu Xenial, Ubuntu Bionic, CentOS 7, 
CentOS 8, Fedora 29, Fedora 30, opensuse 15, Debian Stretch, Debian 
Buster, and more. The image here shouldn't be an issue.

For sizing our current test VMs are 8vcpu, 8GB RAM, 80GB of disk. Zuul 
supports native multinode testing. What we've tried to do is push 
towards running tests on distributed multinode setups rather than 
singular large VMs. A major advantage of this is in many cases what we 
are producing is distributed software that needs to operate in a 
distributed manner and we are able to test that effectively.

Nested virt is likely the biggest hurdle to sort out. Despite it finally 
being enabled by default on very recent Linux kernels what we see in 
production is much older and flakier. Typical nested virt experience 
from our existing cloud providers is that it will work for some time 
then a kernel update will get pushed out. The "middle" VMs will get this 
update first and start crashing until our cloud providers update the 
base hypervisor kernels as well.
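
For anyone checking a candidate host, here is a minimal sketch that reads the KVM module's "nested" parameter from sysfs. It only reports whether nested virt is switched on for the kvm_intel or kvm_amd module; it says nothing about how stable it is, which is the real concern above. The function names are made up for this example.

```python
from pathlib import Path

# Standard sysfs locations for the KVM "nested" module parameter.
KVM_NESTED_PARAMS = (
    Path("/sys/module/kvm_intel/parameters/nested"),
    Path("/sys/module/kvm_amd/parameters/nested"),
)


def parse_nested_flag(text):
    """The kernel reports the flag as Y/N or 1/0 depending on version."""
    return text.strip() in {"Y", "y", "1"}


def nested_virt_enabled(paths=KVM_NESTED_PARAMS):
    """Return True if any readable KVM module reports nested=on."""
    for path in paths:
        try:
            if parse_nested_flag(path.read_text()):
                return True
        except OSError:
            # Module not loaded, or not a Linux/KVM host.
            continue
    return False


if __name__ == "__main__":
    print("nested virt enabled:", nested_virt_enabled())
```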

What we have done to try and improve nested virt reliability is add 
nested-virt specific labels to nodepool. Teams running tests that 
trigger these nested virt crashes can use these labels to actively work 
with our cloud providers to debug and fix these issues.

Is it possible that AirShip could run these test jobs on 3 node 
multinode jobs using our existing nested-virt flavors?

> 
> I'd like to set up a call amongst stakeholders or on an AirShip dev call in the not too distant future to outline:
>             What do hardware hosting companies need to provide

Our "contributing resources" document, 
https://docs.openstack.org/infra/system-config/contribute-cloud.html, 
should give a good overview of what is required. Short version is an 
OpenStack cloud where we can run some longer lived cloud resources as 
well as the test nodes themselves.

>             Are there any specific needs on the infra team (policies or changes) that we should be aware of

We require publicly addressable IP addresses for each node because we 
run a globally distributed control plane. These addresses can be IPv6 
addresses, but we do need at least one IPv4 address for the in cloud 
mirror node.
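
A prospective provider could sanity-check an address plan against this requirement with something like the sketch below. It uses Python's standard ipaddress module; the node names and the check_addressing helper are hypothetical, and reading "at least one IPv4 address" as "at least one public IPv4 address" is an assumption.

```python
import ipaddress


def is_public(addr):
    """True if the address is globally routable (not private/reserved)."""
    return ipaddress.ip_address(addr).is_global


def check_addressing(test_nodes, mirror_addrs):
    """Return a list of problems; an empty list means the plan looks OK.

    test_nodes maps a (hypothetical) node name to its list of addresses.
    mirror_addrs is the list of addresses on the in-cloud mirror node.
    """
    problems = []
    for name, addrs in test_nodes.items():
        # IPv4 or IPv6 is fine for test nodes, as long as it is public.
        if not any(is_public(a) for a in addrs):
            problems.append(f"{name}: no publicly addressable IP")
    # ASSUMPTION: the mirror's IPv4 address must itself be public.
    if not any(is_public(a) and ipaddress.ip_address(a).version == 4
               for a in mirror_addrs):
        problems.append("mirror: needs at least one public IPv4 address")
    return problems
```

For example, a node with only a 10.0.0.0/8 address, or a mirror with only IPv6 addresses, would each be reported as a problem.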

Policy wise, for the OpenStack project we've maintained that for gating, 
resources need to come from at least two different providers. From our 
experience clouds come and go (due to planned and unplanned outages) and 
being able to fall back on redundant resources is particularly important 
for gating. Airship could choose to accept the risk of having a single 
provider, but I would caution against this.
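
To put a rough number on the single-provider risk, here is a back-of-the-envelope sketch. The 95% availability figure is purely illustrative and not taken from any real provider, and the calculation assumes provider outages are independent, which may not hold in practice.

```python
def combined_availability(per_provider):
    """Probability that at least one provider is up.

    per_provider is a list of availabilities in [0, 1].
    ASSUMPTION: outages are independent across providers.
    """
    p_all_down = 1.0
    for availability in per_provider:
        p_all_down *= (1.0 - availability)
    return 1.0 - p_all_down


if __name__ == "__main__":
    # Illustrative numbers only.
    print("one provider: ", combined_availability([0.95]))
    print("two providers:", combined_availability([0.95, 0.95]))
```

With these made-up figures, the chance that gating has no cloud at all drops from 5% with one provider to 0.25% with two.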

The other policy thing we should consider is if these resources would be 
available to the large pool of users or if they should be considered 
airship specific. In the past we've tried to do project specific 
resources and they tended to be less reliable. This doesn't necessarily 
make it a bad idea, but I think having "pressure" from the greater whole 
helps ensure things run smoothly and problems are caught quickly.

Assuming AirShip is able to test today as described above we could add 
in these new resources to the pool to expand quotas and provide more 
resources to Airship (and possibly the whole).

>             What is required by the AirShip devs to ensure they can direct their jobs to the right machines etc..

Nodepool would provide node labels that identify the resources and Zuul 
job configuration would consume those labels. Configuration is how we 
express this.
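
As a toy model of that relationship (not Nodepool's or Zuul's actual code or configuration format), the sketch below treats each provider as advertising a set of labels and each job as requesting a list of labels. All provider and label names are hypothetical.

```python
# Hypothetical providers and the labels each one can supply.
PROVIDER_LABELS = {
    "provider-a": {"ubuntu-bionic", "airship-large-nested"},
    "provider-b": {"ubuntu-bionic"},
}


def providers_for(label, provider_labels=PROVIDER_LABELS):
    """Return the sorted names of providers that can supply a label."""
    return sorted(name for name, labels in provider_labels.items()
                  if label in labels)


def unsatisfiable(requested_labels, provider_labels=PROVIDER_LABELS):
    """Return the requested labels that no provider can supply."""
    return [label for label in requested_labels
            if not providers_for(label, provider_labels)]


if __name__ == "__main__":
    print(providers_for("ubuntu-bionic"))
    print(providers_for("airship-large-nested"))
    print(unsatisfiable(["airship-large-nested", "centos-9"]))
```

In this model a job asking for the hypothetical "airship-large-nested" label can only land on the provider that offers it, which is also how a single-provider label becomes a single point of failure.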

> 
> Hopefully we can have something in place soonish, as the AirShip team wants to be in Beta for their 2.0 release early next year.
> 
> Any help you can provide would be appreciated.

I think it would be helpful to get the discussion onto the infra mailing 
list, openstack-infra at lists.openstack.org, as other team members may 
have input as well.

Happy to do a call as well (perhaps we can coordinate that on the 
mailing list?). I'll be around the next two weeks pre holidays. Major 
time conflicts are Mondays 1600-1700 UTC and Tuesdays 1600-1700 UTC 
and 1900-2000 UTC. My working hours are generally 1600-0100 UTC, but 
I could do a one-off 1500 UTC call if that helps.

I have not been able to follow up, as my weeks have been swallowed by end-of-year activities. Your help in determining what to do next would be appreciated. If you want to move forward with this information, feel free to do so; this doesn’t have to pivot on me. 😊

/ Chris
