[openstack-dev] [tripleo] CI is currently down: 2 blockers

Emilien Macchi emilien at redhat.com
Fri Sep 16 00:19:10 UTC 2016


So here's an update on the current situation:

Master / Newton
gate-tripleo-ci-centos-7-ovb-nonha
gate-tripleo-ci-centos-7-ovb-ha
These 2 jobs are supposed to pass, but some runs are timing out in the RH1 cloud.
In order to reduce the timeouts, Ben ran:
heat-manage purge_deleted 3
nova-manage db archive_deleted_rows --verbose --max_rows 1000000
sudo mysqlcheck -o -A
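
For anyone repeating this cleanup on another cloud, the three commands
above could be wrapped as shown here. This is a minimal sketch, not what
Ben actually ran: the DRY_RUN variable and the run_step and
cleanup_cloud_db helpers are hypothetical names introduced for
illustration, and by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch only: wraps the three cleanup commands quoted above.
# DRY_RUN, run_step and cleanup_cloud_db are hypothetical names.
# DRY_RUN defaults to 1, so nothing is executed unless DRY_RUN=0 is set.
DRY_RUN="${DRY_RUN:-1}"

run_step() {
    if [ "$DRY_RUN" = "1" ]; then
        # Print the command instead of running it.
        echo "would run: $*"
    else
        "$@"
    fi
}

cleanup_cloud_db() {
    # Purge Heat stacks that were deleted more than 3 days ago.
    run_step heat-manage purge_deleted 3
    # Move soft-deleted Nova rows into the shadow tables.
    run_step nova-manage db archive_deleted_rows --verbose --max_rows 1000000
    # Optimize the tables of all MySQL databases.
    run_step sudo mysqlcheck -o -A
}

cleanup_cloud_db
```

Running it as-is lists the three commands; setting DRY_RUN=0 on a host
with the Heat and Nova admin tools installed would execute them for real.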

gate-tripleo-ci-centos-7-nonha-multinode
We merged the revert: https://review.openstack.org/#/c/370250/
At the time I'm writing this email, the job is still non-voting:
https://review.openstack.org/#/c/371133/
But hopefully Infra will merge this patch soon to bring it back in the gate.


stable/mitaka and stable/liberty
gate-tripleo-ci-centos-7-ovb-nonha works fine.
gate-tripleo-ci-centos-7-ovb-ha is broken because Galera was updated
in EPEL (and TripleO Mitaka still deploys EPEL).
I have 2 patches in order to fix the situation:
1) Fix Galera configuration to work with recent EPEL (kudos to Damien
for his help): https://review.openstack.org/#/c/371029/
2) (not required but good to have) Disable EPEL in tripleoclient
https://review.openstack.org/#/c/369559/ - I would understand if
people -1 this patch and I have no strong opinion about it.

I hope 1) will pass CI so we can just move forward.

It's end of day for me, but if someone can monitor
http://tripleo.org/cistatus.html during Friday morning and make sure
everything is still running fine, we would appreciate it. Also, please
report any bug related to CI and set the ci & alert tags.

Thanks, and let's keep focusing on the Newton release!

On Thu, Sep 15, 2016 at 11:26 AM, Emilien Macchi <emilien at redhat.com> wrote:
> On Wed, Sep 14, 2016 at 10:13 PM, Emilien Macchi <emilien at redhat.com> wrote:
>> Hi,
>>
>> Just a heads-up before end of day:
>>
>> 1) The multinode job is failing 80% of the time. James and I made some
>> attempts to revert or fix things, but we have been unsuccessful so
>> far.
>> Everything is documented here: https://bugs.launchpad.net/tripleo/+bug/1623606
>
> We found out that https://review.openstack.org/#/c/368760/ is breaking
> us, so we will revert it and work on it again later.
>
>> 2) ovb jobs are timing out during NetworkDeployment because
>> 99-refresh-completed is not signaling to Heat due to instance-id being
>> detected as null by os-apply-config.
>> James proposed a revert: https://review.openstack.org/#/c/370250/
>> But the patch can't be merged because of 1).
>
> We are going to merge James's revert; we think it will bring back OVB jobs.
>
> To merge the reverts, we need to disable voting on multinode jobs:
> https://review.openstack.org/#/c/370922/
>
> Please do not merge anything today (except the 2 reverts) until our
> situation becomes more stable. Probably tonight or tomorrow.
> Once the situation is better, I or someone else in the team will give an
> update here.
>
> Thanks for your understanding,
>
>> I'll continue to work on it tomorrow, but if you're able to jump in and
>> make progress on it, please do: this downtime is very critical at this
>> stage of the cycle.
>>
>> Any help is highly welcome.
>>
>> Thanks,
>> --
>> Emilien Macchi
>
>
>
> --
> Emilien Macchi



-- 
Emilien Macchi


