[openstack-dev] [Trove] Trove-Gate timeouts

Craig Vyvial cp16net at gmail.com
Mon Feb 17 04:33:05 UTC 2014


Trovesters,

One reason for the longer running test was that for the configuration
groups i added a creation of a new instance. This is to test a new instance
will be created with a configuration group applied. This might be causing
the run to be a little longer but i am surprised that its taking over an
hour to run through everything still.

-Craig Vyvial


On Sun, Feb 16, 2014 at 12:25 AM, Mirantis <dmakogon at mirantis.com> wrote:

> Hello, Mathew.
>
> I'm seeing same issues with the gate.
> I also tried to found out why gate job is failing. First ran into issue
> related to cinder installation failure in devstack. But then I found same
> problem as you described. The best option is to increase job time range.
> Thanks for such research. I hope gate will be fixed in the easiest way and
> for the shortest period of time.
>
> Best regards
> Denis Makogon.
> Sent from an iPad
>
> 16 февр. 2014, в 00:46, "Lowery, Mathew" <mlowery at ebay.com> написал(а):
>
>  Hi all,
>
>  *Issue #1: Jobs that need more than one hour*
>
>  Of the last 30 Trove-Gate <https://rdjenkins.dyndns.org/job/Trove-Gate/>builds (spanning three days), 7 have failed due to a Jenkins job-level
> timeout (not a proboscis timeout). These jobs had no failed tests when the
> timeout occurred.
>
>  Not having access to the job config to see what the job looks like, I
> used the console output to guess what was going on. It appears that a
> Jenkins plugin named boot-hpcloud-vm<https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L181> is
> booting a VM and running the commands given, including redstack int-tests.
> From the console output, it states that it was supplied with an
> ssh_shell_timeout="7200". This is passed down to another library called
> net-ssh-simple<https://github.com/busyloop/net-ssh-simple/blob/e3834f259a47606bfb06a487ca701fc20dbad8a5/lib/net/ssh/simple.rb#L632>.
> net-ssh-simple has two timeouts: an idle timeout and an operation timeout.
>
>  In the latest boot-hpcloud-vm<https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L182>,
> ssh_shell_timeout is passed down to net-ssh-simple for both the idle
> timeout and the operation timeout. But in older versions of
> boot-hp-cloud-vm<https://github.com/mrhoades/boot-hpcloud-vm/blob/9260e957d6c54142c33dd9e9632b86e17fd5c02f/models/boot_vm_concurrent.rb#L141>,
> ssh_shell_timeout is passed down to net-ssh-simple for only the idle
> timeout, leaving a default operation timeout of 3600. This is why I believe
> these jobs are failing after exactly one hour.
>
>  FYI: Here are the jobs that failed due to the Jenkins job-level timeout
> (and had no test failures when the timeout occurred) along with their
> associated patch sets:
> https://rdjenkins.dyndns.org/job/Trove-Gate/2532/console (
> http://review.openstack.org/73786)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2530/console (
> http://review.openstack.org/73736)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2517/console (
> http://review.openstack.org/63789)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2514/console (
> https://review.openstack.org/50944)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2513/console (
> https://review.openstack.org/50944)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2504/console (
> https://review.openstack.org/73147)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2503/console (
> https://review.openstack.org/73147)
>
>   *Suggested action items:*
>
>    - If it is acceptable to have jobs that run over one hour, then
>    install the latest boot-hpcloud-vm plugin for Jenkins which will increase
>    the make the operation timeout match the idle timeout.
>
>
>  *Issue #2: The running time of all jobs is 1 hr 1 min*
>
>  While the Jenkins job-level timeout will end the job after one hour, it
> also appears to keep every job running for a minimum of one hour.  To be
> more precise, the timeout (or minimum running time) occurs on the part of
> the Jenkins job that runs commands on the VM; the VM provision (which takes
> about one minute) is excluded from this timeout which is why the running
> time of all jobs is around 1 hr 1 min<https://rdjenkins.dyndns.org/job/Trove-Gate/buildTimeTrend>.
> A sampling of console logs showing the time the int-tests completed and
> when the timeout kicks in:
>
>  https://rdjenkins.dyndns.org/job/Trove-Gate/2531/console (00:01:03
> wasted)
>
> *04:51:12* COMMAND_0: echo refs/changes/36/73736/2
>
> ...
>
> *05:50:10*     335.41     proboscis.case.MethodTest (test_instance_created)*05:50:10*     194.05     proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)*05:51:13* ***************************************05:51:13* ****** STDERR-BEGIN ******
>
>
>  https://rdjenkins.dyndns.org/job/Trove-Gate/2521/console (00:06:44
> wasted)
>
> *21:11:44* COMMAND_0: echo refs/changes/89/63789/13
>
> ...
>
> *22:05:00*     195.11     proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)*22:05:00*     186.89     proboscis.case.MethodTest (test_resize_down)*22:11:44* ***************************************22:11:44* ****** STDERR-BEGIN ******
>
>
> https://rdjenkins.dyndns.org/job/Trove-Gate/2518/consoleFull (00:06:01
> wasted)
>
> *17:46:59* COMMAND_0: echo refs/changes/02/64302/20
>
> ...
>
> *18:40:57*     210.03     proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)*18:40:57*     187.89     proboscis.case.MethodTest (test_resize_down)*18:46:58* ***************************************18:46:58* ****** STDERR-BEGIN ******
>
>
> *Suggested action items:*
>
>
>    -
>
>    Given that the minimum running time is one hour, I assume the problem is in the net-ssh-simple library. Needs more investigation.
>
>
>
> *Issue #3: Jenkins console log line timestamps different between full and truncated views*
>
>
>  I assume this is due to JENKINS-17779<https://issues.jenkins-ci.org/browse/JENKINS-17779>
> .
>
>  *Suggested action items:*
>
>    - Upgrade the timestamper plugin<https://wiki.jenkins-ci.org/display/JENKINS/Timestamper>
>    .
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://lists.openstack.org/pipermail/openstack-dev/attachments/20140216/1dc66129/attachment.html>


More information about the OpenStack-dev mailing list