[openstack-dev] [TripleO] How do the CI clouds work?
Derek Higgins
derekh at redhat.com
Thu Dec 18 12:30:57 UTC 2014
On 18/12/14 08:48, Steve Kowalik wrote:
> Hai,
>
> I am finding myself at a loss at explaining how the CI clouds that run
> the tripleo jobs work from end-to-end. I am clear that we have a tripleo
> deployment running on those racks, with a seed, a HA undercloud and
> overcloud, but then I'm left with a number of questions, such as:
Yup, this is correct. From a CI point of view, all that is relevant is
the overcloud and a set of baremetal test env hosts. The seed and
undercloud are there because we used tripleo to deploy the thing in the
first place.
>
> How do we run the testenv images on the overcloud?
nodepool talks to our overcloud to create an instance where the jenkins
jobs run. This "jenkins node" is where we build the images. Jenkins
doesn't manage and isn't aware of the testenv hosts.
The entry point for jenkins to run tripleo ci is toci_gate_test.sh; at
the end of this script you'll see a call to testenv-client[1].
testenv-client talks to gearman (an instance on our overcloud, a
different gearman instance to what infra have), and gearman responds
with a json file representing one of the testenvs that have been
registered with it.
testenv-client then runs the command "./toci_devtest.sh" and passes in
the json file (via $TE_DATAFILE). To prevent 2 CI jobs using the same
testenv, the testenv is now "locked" until toci_devtest exits. The
jenkins node now has all the relevant IPs and MAC addresses to talk to
the testenv.
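
The hand-off described above can be sketched in Python. This is only an
illustration under stated assumptions, not the real testenv-client code:
the EnvPool class is a hypothetical stand-in for the gearman instance,
and the contents of the testenv dict are made up. Only the TE_DATAFILE
variable and the "locked until the command exits" behaviour come from
the description above.

```python
import json
import os
import subprocess
import tempfile
import threading


class EnvPool:
    """Hypothetical stand-in for the gearman instance that hands out testenvs."""

    def __init__(self, testenvs):
        self._free = list(testenvs)
        self._lock = threading.Lock()

    def acquire(self):
        # Hand out one registered testenv; it stays "locked" until released.
        with self._lock:
            if not self._free:
                raise RuntimeError("no free testenv")
            return self._free.pop()

    def release(self, testenv):
        with self._lock:
            self._free.append(testenv)

    def free_count(self):
        with self._lock:
            return len(self._free)


def run_with_testenv(pool, command):
    """Run command with TE_DATAFILE pointing at a JSON description of a testenv."""
    testenv = pool.acquire()
    try:
        # Write the testenv description to a temporary JSON file.
        with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
            json.dump(testenv, f)
            datafile = f.name
        try:
            env = dict(os.environ, TE_DATAFILE=datafile)
            return subprocess.run(command, env=env).returncode
        finally:
            os.unlink(datafile)
    finally:
        # The testenv only becomes available again once the command has exited.
        pool.release(testenv)
```

In the setup described here, the command would be ["./toci_devtest.sh"],
and the pool would live in a separate service rather than in the same
process.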
>
> How do the testenv images interact with the nova-compute machines in
> the overcloud?
The images are built on instances in this cloud. The MAC address of eth1
on the seed for the testenv has been registered with neutron on the
overcloud, so its IP is known (it's in the json file we got in
$TE_DATAFILE). All traffic to the other instances in the CI testenv is
routed through the seed; its eth2 shares an ovs bridge with eth1 of the
other VMs in the same testenv.
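
As a rough illustration of how a job could pick those addresses out of
the file, here is a small Python sketch. The key names "seed-ip" and
"node-macs", and the space-separated MAC list, are assumptions made for
the example; check a real $TE_DATAFILE for the actual field names.

```python
import json
import os


def load_testenv(path=None):
    """Load the testenv description from path, or from $TE_DATAFILE."""
    path = path or os.environ["TE_DATAFILE"]
    with open(path) as f:
        return json.load(f)


def seed_and_macs(testenv):
    """Return the seed IP and the list of node MAC addresses."""
    # Hypothetical keys: "seed-ip" holds the seed's address, "node-macs"
    # holds the node MAC addresses as one space-separated string.
    seed_ip = testenv["seed-ip"]
    macs = testenv.get("node-macs", "").split()
    return seed_ip, macs
```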
>
> Are the machines running the testenv images meant to be long-running,
> or are they recycled after n number of runs?
They are long running and in theory shouldn't need to be recycled. In
practice they get recycled sometimes, for one of 2 reasons:
1. The image needs to be updated (e.g. to increase the amount of RAM on
the libvirt domains they host)
2. One is experiencing a problem, in which case I usually do a "nova
rebuild" on it. This doesn't happen very frequently: we currently have
15 TE hosts on rh1, 7 of which have an uptime over 80 days, while the
others are new HW that was added last week. Problems we have encountered
in the past that caused a rebuild include a TE host losing its IP, or
https://bugs.launchpad.net/tripleo/+bug/1335926
https://bugs.launchpad.net/tripleo/+bug/1314709
>
> Cheers,
No problem. I tried to document this at one stage here[2], but feel free
to add more, point out where it's lacking, or ask questions here and
I'll attempt to answer.
thanks,
Derek.
[1]
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/toci_gate_test.sh?id=3d86dd4c885a68eabddb7f73a6dbe6f3e75fde64#n69
[2]
http://git.openstack.org/cgit/openstack-infra/tripleo-ci/tree/docs/TripleO-ci.rst