[openstack-dev] [Trove] Trove-Gate timeouts

Denis Makogon dmakogon at mirantis.com
Mon Feb 17 07:29:34 UTC 2014


Hi, Craig.

Yes, I thought about the configuration test suites.
For now, maybe the core team should extend the gate running time.
But for the tempest tests I would suggest excluding some tests (the
longest ones) from the 'gate' group.
We need to deal with it ASAP, because the gate has been failing for four or five days.

Best regards
Denis Makogon.

Sent from an iPad


On Mon, Feb 17, 2014 at 6:33 AM, Craig Vyvial <cp16net at gmail.com> wrote:

> Trovesters,
>
> One reason for the longer-running tests is that for the configuration
> groups I added the creation of a new instance. This is to test that a new instance
> will be created with a configuration group applied. This might be causing
> the run to be a little longer, but I am surprised that it's still taking over an
> hour to run through everything.
>
> -Craig Vyvial
>
>
> On Sun, Feb 16, 2014 at 12:25 AM, Mirantis <dmakogon at mirantis.com> wrote:
>
>> Hello, Mathew.
>>
>> I'm seeing the same issues with the gate.
>> I also tried to find out why the gate job is failing. First I ran into an issue
>> related to a cinder installation failure in devstack. But then I found the same
>> problem you described. The best option is to increase the job time limit.
>> Thanks for the research. I hope the gate will be fixed in the easiest way
>> and as quickly as possible.
>>
>> Best regards
>> Denis Makogon.
>> Sent from an iPad
>>
>> On 16 Feb 2014, at 00:46, "Lowery, Mathew" <mlowery at ebay.com> wrote:
>>
>>  Hi all,
>>
>>  *Issue #1: Jobs that need more than one hour*
>>
>>  Of the last 30 Trove-Gate <https://rdjenkins.dyndns.org/job/Trove-Gate/> builds (spanning three days), 7 have failed due to a Jenkins job-level
>> timeout (not a proboscis timeout). These jobs had no failed tests when the
>> timeout occurred.
>>
>>  Not having access to the job config to see what the job looks like, I
>> used the console output to guess what was going on. It appears that a
>> Jenkins plugin named boot-hpcloud-vm<https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L181> is
>> booting a VM and running the commands given, including redstack int-tests.
>> From the console output, it states that it was supplied with an
>> ssh_shell_timeout="7200". This is passed down to another library called
>> net-ssh-simple<https://github.com/busyloop/net-ssh-simple/blob/e3834f259a47606bfb06a487ca701fc20dbad8a5/lib/net/ssh/simple.rb#L632>.
>> net-ssh-simple has two timeouts: an idle timeout and an operation timeout.
>>
>>  In the latest boot-hpcloud-vm<https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L182>,
>> ssh_shell_timeout is passed down to net-ssh-simple for both the idle
>> timeout and the operation timeout. But in older versions of
>> boot-hpcloud-vm<https://github.com/mrhoades/boot-hpcloud-vm/blob/9260e957d6c54142c33dd9e9632b86e17fd5c02f/models/boot_vm_concurrent.rb#L141>,
>> ssh_shell_timeout is passed down to net-ssh-simple for only the idle
>> timeout, leaving a default operation timeout of 3600. This is why I believe
>> these jobs are failing after exactly one hour.
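
The interaction between the two timeouts described above can be modelled with a small simulation. This is only a sketch: it does not call net-ssh-simple or boot-hpcloud-vm, the function name `first_timeout` and its parameters are hypothetical, and the assumed semantics (the idle deadline resets on every line of output, the operation deadline is fixed from the start of the command) are an interpretation of the description in this thread. The values 7200 and 3600 are the ones quoted above.

```python
from typing import Iterable, Optional


def first_timeout(output_times: Iterable[float],
                  finish_time: float,
                  idle_timeout: float,
                  operation_timeout: float) -> Optional[str]:
    """Return which timeout fires first, or None if the command finishes.

    output_times: seconds since start at which the command printed output.
    finish_time: seconds since start at which the command would exit.
    Assumed semantics (not taken from the library source): the idle
    deadline moves forward on every output, the operation deadline never moves.
    """
    last_activity = 0.0
    for t in sorted(output_times) + [finish_time]:
        idle_deadline = last_activity + idle_timeout
        deadline = min(idle_deadline, operation_timeout)
        if t > deadline:
            return "idle" if idle_deadline < operation_timeout else "operation"
        last_activity = t
    return None


# A hypothetical 65-minute test run that prints something every minute.
chatty_run = range(60, 3900, 60)

# Older plugin: only the idle timeout is raised, operation timeout stays at 3600.
print(first_timeout(chatty_run, 3900, idle_timeout=7200, operation_timeout=3600))

# Latest plugin: ssh_shell_timeout is used for both timeouts.
print(first_timeout(chatty_run, 3900, idle_timeout=7200, operation_timeout=7200))
```

Under these assumptions, a run that keeps producing output never trips the idle timeout, so raising only the idle timeout cannot help a job that needs more than an hour.
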
>>
>>  FYI: Here are the jobs that failed due to the Jenkins job-level timeout
>> (and had no test failures when the timeout occurred) along with their
>> associated patch sets:
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2532/console (
>> http://review.openstack.org/73786)
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2530/console (
>> http://review.openstack.org/73736)
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2517/console (
>> http://review.openstack.org/63789)
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2514/console (
>> https://review.openstack.org/50944)
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2513/console (
>> https://review.openstack.org/50944)
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2504/console (
>> https://review.openstack.org/73147)
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2503/console (
>> https://review.openstack.org/73147)
>>
>>   *Suggested action items:*
>>
>>    - If it is acceptable to have jobs that run over one hour, then
>>    install the latest boot-hpcloud-vm plugin for Jenkins, which will
>>    make the operation timeout match the idle timeout.
>>
>>
>>  *Issue #2: The running time of all jobs is 1 hr 1 min*
>>
>>  While the Jenkins job-level timeout will end the job after one hour, it
>> also appears to keep every job running for a minimum of one hour.  To be
>> more precise, the timeout (or minimum running time) occurs on the part of
>> the Jenkins job that runs commands on the VM; the VM provision (which takes
>> about one minute) is excluded from this timeout which is why the running
>> time of all jobs is around 1 hr 1 min<https://rdjenkins.dyndns.org/job/Trove-Gate/buildTimeTrend>.
>> A sampling of console logs showing the time the int-tests completed and
>> when the timeout kicks in:
>>
>>  https://rdjenkins.dyndns.org/job/Trove-Gate/2531/console (00:01:03
>> wasted)
>>
>> *04:51:12* COMMAND_0: echo refs/changes/36/73736/2
>>
>> ...
>>
>> *05:50:10*     335.41     proboscis.case.MethodTest (test_instance_created)
>> *05:50:10*     194.05     proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
>> *05:51:13* **************************************
>> *05:51:13* ****** STDERR-BEGIN ******
>>
>>
>>  https://rdjenkins.dyndns.org/job/Trove-Gate/2521/console (00:06:44
>> wasted)
>>
>> *21:11:44* COMMAND_0: echo refs/changes/89/63789/13
>>
>> ...
>>
>> *22:05:00*     195.11     proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
>> *22:05:00*     186.89     proboscis.case.MethodTest (test_resize_down)
>> *22:11:44* **************************************
>> *22:11:44* ****** STDERR-BEGIN ******
>>
>>
>> https://rdjenkins.dyndns.org/job/Trove-Gate/2518/consoleFull (00:06:01
>> wasted)
>>
>> *17:46:59* COMMAND_0: echo refs/changes/02/64302/20
>>
>> ...
>>
>> *18:40:57*     210.03     proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
>> *18:40:57*     187.89     proboscis.case.MethodTest (test_resize_down)
>> *18:46:58* **************************************
>> *18:46:58* ****** STDERR-BEGIN ******
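
The "wasted" figures and the roughly one-hour command phase can be re-derived from the console timestamps quoted in the three samples. The sketch below is only arithmetic on those timestamps; the helper name `elapsed` and the tuple layout are made up for illustration, and the midnight rollover handling is an assumption that none of these samples actually exercise.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Difference between two HH:MM:SS console timestamps.

    Assumes the two stamps are less than 24 hours apart and adds a day
    if the end stamp is earlier than the start stamp (midnight rollover).
    """
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):
        delta += timedelta(days=1)
    return delta


# (build, COMMAND_0 line, last test report line, STDERR-BEGIN line),
# copied from the console excerpts quoted in this thread.
samples = [
    ("2531", "04:51:12", "05:50:10", "05:51:13"),
    ("2521", "21:11:44", "22:05:00", "22:11:44"),
    ("2518", "17:46:59", "18:40:57", "18:46:58"),
]

for build, started, tests_done, cut_off in samples:
    print(build,
          "wasted:", elapsed(tests_done, cut_off),
          "command phase:", elapsed(started, cut_off))
```

For all three builds the command phase comes out within a second or two of one hour, regardless of when the tests finished, which matches the observation that the timeout also behaves like a minimum running time.
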
>>
>>
>> *Suggested action items:*
>>
>>
>>    - Given that the minimum running time is one hour, I assume the
>>    problem is in the net-ssh-simple library. This needs more investigation.
>>
>>
>>
>> *Issue #3: Jenkins console log line timestamps different between full and truncated views*
>>
>>
>>  I assume this is due to JENKINS-17779<https://issues.jenkins-ci.org/browse/JENKINS-17779>.
>>
>>  *Suggested action items:*
>>
>>    - Upgrade the timestamper plugin<https://wiki.jenkins-ci.org/display/JENKINS/Timestamper>.
>>
>> _______________________________________________
>> OpenStack-dev mailing list
>> OpenStack-dev at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>