[OpenStack-Infra] Touching base; Airship CI cluster
Clark Boylan <cboylan at sapwetik.org>
Mon Jan 6 17:36:10 UTC 2020

On Fri, Jan 3, 2020, at 8:29 AM, Gorshunov, Roman wrote:
> Hello Christopher, Clark, OpenStack-Infra,
> 
> Thank you for your help.
> 
> Moving conversation to OpenStack-Infra mailing list.
> 
> Answers:
> OS images available are OK for us. We primarily target Ubuntu 18.04 as 
> a base image.
> 
> > For sizing our current test VMs are 8vcpu, 8GB RAM, 80GB of disk.
> This is the VM we would be running: 64 GB Ram, 200 GB block storage, 16 
> VCpu, + nested virtualization
> 
> > Is it possible that AirShip could run these test jobs on 3
> > node multinode jobs using our existing nested-virt flavors?
> No, we can’t. We really need to run nested VMs inside a very big VM to 
> simulate end-to-end baremetal deployments (with PXE booting, etc). 
> Nested VMs would be up to 16GB RAM.    
As mentioned in the earlier email, nested virt is one of my big concerns with this, as it hasn't been very stable for us in the past. Coupled with potentially having a single provider of CI resources, this could lead to very flaky testing with no alternatives.
Thinking out loud here: could you test the PXE provisioning independently of the configuration management and workload of the PXE-provisioned resources? Then you could avoid nested virt when PXE booting and use qemu emulation, which should be more reliable. I believe this is how Ironic does their testing. Then we can take advantage of our existing abilities to run multinode testing to check configuration management and workloads, assuming the PXE boots succeeded (because this is tested separately).
I expect this would give you more reliable testing over time, as you avoid the nested virt problem entirely. This may also make the test environment fit into existing resources, allowing you to take advantage of multi-cloud availability should problems arise in any single test node provider cloud.
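
To make the sizing gap concrete, the figures quoted in this thread can be compared directly. The following is only a rough sketch in Python: the class and function names are made up for illustration, and only the numbers (8 vCPU / 8GB / 80GB test nodes, the 16 vCPU / 64GB / 200GB request, 16GB nested VMs, 3-node jobs) come from the messages here. Note that pooling three nodes is an optimistic upper bound, since RAM on separate nodes cannot be combined into one VM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Flavor:
    """Illustrative container for a VM size; not a Nodepool or OpenStack type."""
    vcpus: int
    ram_gb: int
    disk_gb: int


# Figures taken from the thread.
EXISTING_NODE = Flavor(vcpus=8, ram_gb=8, disk_gb=80)
REQUESTED_VM = Flavor(vcpus=16, ram_gb=64, disk_gb=200)
NESTED_VM_RAM_GB = 16   # "Nested VMs would be up to 16GB RAM"
MULTINODE_COUNT = 3     # the proposed 3 node multinode job


def pooled(node: Flavor, count: int) -> Flavor:
    """Sum the resources of `count` identical nodes (an optimistic upper bound)."""
    return Flavor(node.vcpus * count, node.ram_gb * count, node.disk_gb * count)


def shortfall(requested: Flavor, available: Flavor) -> dict:
    """Return how much of each resource is missing (0 means enough)."""
    fields = ("vcpus", "ram_gb", "disk_gb")
    return {f: max(0, getattr(requested, f) - getattr(available, f)) for f in fields}


pool = pooled(EXISTING_NODE, MULTINODE_COUNT)
gap = shortfall(REQUESTED_VM, pool)
# A single nested VM must fit inside one node, not inside the pooled total.
nested_vm_fits_one_node = NESTED_VM_RAM_GB <= EXISTING_NODE.ram_gb

print("pooled 3-node resources:", pool)
print("shortfall vs requested VM:", gap)
print("16GB nested VM fits on one existing node:", nested_vm_fits_one_node)
```

Under these assumptions, vCPU and disk are covered by three existing nodes, but RAM is not, and a single 16GB nested VM would not fit on an 8GB node. Splitting PXE testing from workload testing, as suggested above, is one way to shrink the per-node requirement.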
> 
> > Airship could choose to accept the risk of having a single provider, but I would caution against this.
> We currently have one provider, planning to have two providers soon.
> 
> > The other policy thing we should consider is if these resources would
> > be available to the large pool of users or if they should be
> > considered airship specific.
> We would prefer them to be Airship-specific, primarily because only we 
> would be running those huge VMs, and other providers do not have this 
> option at the moment for us to be able to utilize their hardware.
> 
> Please, let’s set up a call for us at the time you would find convenient.
Does 16:00UTC Thursday January 9, 2020 work for you? If so we should be able to use our asterisk server, https://wiki.openstack.org/wiki/Infrastructure/Conferencing, room 6001.
Let me know if you cannot connect to that easily and we can set up a jitsi meet (https://meet.jit.si/) instead.
> 
> --
> Roman Gorshunov
> Principal, Systems Engineering
> 
> From: Christopher Price <christopher.price at est.tech> 
> Sent: Thursday, December 12, 2019 11:00 AM
> To: Gorshunov, Roman <roman.gorshunov at att.com>
> Cc: paye600 at gmail.com
> Subject: Re: Touching base
> 
> Hi Roman,
> 
> I have been in touch with a few people and it seems there is a way to 
> solve this across the groups, however there is some thinking and effort 
> involved in making it happen.
> We should set up a call with the AirShip team and some of the infra team to 
> discuss and align on objectives, however I think first it’s worth 
> having a conversation in the AirShip community to understand what is 
> needed.  Here is a basic rundown from Clark, sorry for the long mail 
> with Q&A included:
> 
> On 12/6/19 4:55 AM, Christopher Price wrote:
> > Hey Clark,
> > 
> > Reaching out to chat about getting some of the more unique AirShip gating use-cases set up in openinfra.
> > 
> > Some text below in my question to Thierry for context, but to summarize:
> > - Airship needs to do "huge VM" based testing with nested virtualization enabled for gating in opendev
> > - A few companies are willing to put up some hardware to support that - maintenance and support from the companies
> > - I'd like to bring the "what does infra expect" topic forward to see what needs to happen and what we need to ensure to get the gating in place for these items
> > 
> > I think the plan is to use something like airskiff or airship-in-a-bottle as a method of doing deploy testing for gating.
> > Requires an Ubuntu 16.04 VM (minimum 4vCPU/20GB RAM/32GB disk) to run.
> > https://airship-treasuremap.readthedocs.io/en/latest/airskiff.html
> > https://github.com/airshipit/airship-in-a-bottle
> 
> We currently build images for Ubuntu Xenial, Ubuntu Bionic, CentOS 7, 
> CentOS 8, Fedora 29, Fedora 30, opensuse 15, Debian Stretch, Debian 
> Buster, and more. The image here shouldn't be an issue.
> 
> For sizing our current test VMs are 8vcpu, 8GB RAM, 80GB of disk. Zuul 
> supports native multinode testing. What we've tried to do is push 
> towards running tests on distributed multinode setups rather than 
> singular large VMs. A major advantage of this is in many cases what we 
> are producing is distributed software that needs to operate in a 
> distributed manner and we are able to test that effectively.
> 
> Nested virt is likely the biggest hurdle to sort out. Despite it finally 
> being enabled by default on very recent Linux kernels what we see in 
> production is much older and flakier. Typical nested virt experience 
> from our existing cloud providers is that it will work for some time 
> then a kernel update will get pushed out. The "middle" VMs will get this 
> update first and start crashing until our cloud providers update the 
> base hypervisor kernels as well.
> 
> What we have done to try and improve nested virt reliability is add 
> nested-virt specific labels to nodepool. Teams running tests that 
> trigger these nested virt crashes can use these labels to actively work 
> with our cloud providers to debug and fix these issues.
> 
> Is it possible that AirShip could run these test jobs on 3 node 
> multinode jobs using our existing nested-virt flavors?
> 
> > 
> > I'd like to set up a call amongst stakeholders or on an AirShip dev call in the not too distant future to outline:
> >             What do hardware hosting companies need to provide
> 
> Our "contributing resources" document, 
> https://docs.openstack.org/infra/system-config/contribute-cloud.html, 
> should give a good overview of what is required. Short version is an 
> OpenStack cloud where we can run some longer lived cloud resources as 
> well as the test nodes themselves.
> 
> >             Are there any specific needs on the infra team (policies or changes) that we should be aware of
> 
> We require publicly addressable IP addresses for each node because we 
> run a globally distributed control plane. These addresses can be IPv6 
> addresses, but we do need at least one IPv4 address for the in cloud 
> mirror node.
> 
> Policy wise for the OpenStack project we've maintained that for gating 
> resources need to come from at least two different providers. From our 
> experience clouds come and go (due to planned and unplanned outages) and 
> being able to fall back on redundant resources is particularly important 
> for gating. Airship could choose to accept the risk of having a single 
> provider, but I would caution against this.
> 
> The other policy thing we should consider is if these resources would be 
> available to the large pool of users or if they should be considered 
> airship specific. In the past we've tried to do project specific 
> resources and they tended to be less reliable. This doesn't necessarily 
> make it a bad idea, but I think having "pressure" from the greater whole 
> helps ensure things run smoothly and problems are caught quickly.
> 
> Assuming AirShip is able to test today as described above we could add 
> in these new resources to the pool to expand quotas and provide more 
> resources to Airship (and possibly the whole).
> 
> >             What is required by the AirShip devs to ensure they can direct their jobs to the right machines etc..
> 
> Nodepool would provide node labels that identify the resources and Zuul 
> job configuration would consume those labels. Configuration is how we 
> express this.
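
The relationship between labels, providers, and jobs described above can be pictured with a small toy model. This is only an illustrative sketch in Python, not Nodepool or Zuul configuration and not their APIs: every provider, label, and job name below is hypothetical, and the model assumes that all nodes for a job come from a single provider.

```python
from typing import Dict, List, Set

# Hypothetical table of which labels each provider can supply.
PROVIDER_LABELS: Dict[str, Set[str]] = {
    "general-cloud-a": {"ubuntu-bionic"},
    "general-cloud-b": {"ubuntu-bionic", "nested-virt-ubuntu-bionic"},
    "airship-cloud": {"airship-large-nested-virt"},
}

# Hypothetical jobs and the labels of the nodes each one asks for.
JOB_NODESETS: Dict[str, List[str]] = {
    "airship-multinode": ["ubuntu-bionic", "ubuntu-bionic", "ubuntu-bionic"],
    "airship-baremetal-sim": ["airship-large-nested-virt"],
}


def providers_for(job: str) -> List[str]:
    """Return the providers able to supply every label the job requests."""
    wanted = set(JOB_NODESETS[job])
    return sorted(
        name for name, labels in PROVIDER_LABELS.items() if wanted <= labels
    )


for job in JOB_NODESETS:
    candidates = providers_for(job)
    redundancy = "redundant" if len(candidates) >= 2 else "single provider"
    print(f"{job}: {candidates} ({redundancy})")
```

In this model the job that uses a generally available label can run on more than one provider, while the job tied to a project-specific label depends on a single provider, which is the risk raised elsewhere in this thread.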
> 
> > 
> > Hopefully we can have something in place soonish as the AirShip team want to be in Beta for their 2.0 release early next year.
> > 
> > Any help you can provide would be appreciated.
> 
> I think it would be helpful to get the discussion onto the infra mailing 
> list, mailto:openstack-infra at lists.openstack.org, as other team members may 
> have input as well.
> 
> Happy to do a call as well (perhaps we can coordinate that on the 
> mailing list?). I'll be around the next two weeks pre holidays. Major 
> time conflicts are Mondays 1600-1700UTC and Tuesdays 1600-1700UTC 
> and 1900-2000UTC. My working hours are generally 
> 1600-0100UTC but I could do a one off 1500UTC call if that helps.
> 
> I have not been able to follow up as my weeks have become strangled 
> with end of year activities.  Your help on determining what to do next 
> would be appreciated, and if you want to move forward with this 
> information feel free to do so, this doesn’t have to pivot on me.  😊
> 
> / Chris
> 
> ...
>  