[openstack-dev] [Trove] Trove-Gate timeouts
Craig Vyvial
cp16net at gmail.com
Mon Feb 17 04:33:05 UTC 2014
Trovesters,
One reason for the longer-running test is that, for configuration groups, I
added the creation of a new instance. This tests that a new instance is
created with a configuration group applied. This might be making the run a
little longer, but I am surprised that it's still taking over an hour to run
through everything.
-Craig Vyvial
On Sun, Feb 16, 2014 at 12:25 AM, Mirantis <dmakogon at mirantis.com> wrote:
> Hello, Mathew.
>
> I'm seeing the same issues with the gate.
> I also tried to find out why the gate job is failing. At first I ran into an
> issue related to a cinder installation failure in devstack, but then I found
> the same problem you described. The best option is to increase the job's time
> limit. Thanks for such thorough research. I hope the gate will be fixed in the
> simplest way and as quickly as possible.
>
> Best regards
> Denis Makogon.
> Sent from an iPad
>
> On 16 Feb 2014, at 00:46, "Lowery, Mathew" <mlowery at ebay.com> wrote:
>
> Hi all,
>
> *Issue #1: Jobs that need more than one hour*
>
> Of the last 30 Trove-Gate <https://rdjenkins.dyndns.org/job/Trove-Gate/> builds (spanning three days), 7 have failed due to a Jenkins job-level
> timeout (not a proboscis timeout). These jobs had no failed tests when the
> timeout occurred.
>
> Not having access to the job config to see what the job looks like, I
> used the console output to guess what was going on. It appears that a
> Jenkins plugin named boot-hpcloud-vm <https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L181> is
> booting a VM and running the commands given, including redstack int-tests.
> From the console output, it states that it was supplied with an
> ssh_shell_timeout="7200". This is passed down to another library called
> net-ssh-simple <https://github.com/busyloop/net-ssh-simple/blob/e3834f259a47606bfb06a487ca701fc20dbad8a5/lib/net/ssh/simple.rb#L632>.
> net-ssh-simple has two timeouts: an idle timeout and an operation timeout.
>
> In the latest boot-hpcloud-vm <https://github.com/mrhoades/boot-hpcloud-vm/blob/2272770b0ce54752eabb84229dc8939d79b2be50/models/boot_vm_concurrent.rb#L182>,
> ssh_shell_timeout is passed down to net-ssh-simple for both the idle
> timeout and the operation timeout. But in older versions of
> boot-hpcloud-vm <https://github.com/mrhoades/boot-hpcloud-vm/blob/9260e957d6c54142c33dd9e9632b86e17fd5c02f/models/boot_vm_concurrent.rb#L141>,
> ssh_shell_timeout is passed down to net-ssh-simple for only the idle
> timeout, leaving a default operation timeout of 3600. This is why I believe
> these jobs are failing after exactly one hour.
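The difference between the two plugin versions can be modelled in a few lines. This is a hypothetical Python sketch, not the plugin's Ruby code: the names SshTimeouts, resolve_timeouts, first_timeout and passes_operation are invented for illustration, and the only figures taken from the thread are ssh_shell_timeout="7200" and the 3600-second default operation timeout. It shows why a run longer than an hour with steady output is cut off by the older plugin but not by the latest one.

```python
from dataclasses import dataclass
from typing import Optional

# Default operation timeout in net-ssh-simple, as reported in this thread.
DEFAULT_OPERATION_TIMEOUT = 3600  # seconds


@dataclass
class SshTimeouts:
    idle: float       # max seconds with no output before the command is aborted
    operation: float  # max total seconds the command may run


def resolve_timeouts(ssh_shell_timeout: float, passes_operation: bool) -> SshTimeouts:
    """Hypothetical model of how the plugin forwards ssh_shell_timeout.

    passes_operation=False mimics the older plugin (idle timeout only);
    passes_operation=True mimics the latest plugin (both timeouts).
    """
    operation = ssh_shell_timeout if passes_operation else DEFAULT_OPERATION_TIMEOUT
    return SshTimeouts(idle=ssh_shell_timeout, operation=operation)


def first_timeout(run_seconds: float, longest_silence: float,
                  t: SshTimeouts) -> Optional[str]:
    """Return which limit would abort the command, or None if it finishes.

    Simplification: the real library enforces both limits concurrently;
    here the idle limit is simply checked first.
    """
    if longest_silence > t.idle:
        return "idle"
    if run_seconds > t.operation:
        return "operation"
    return None


if __name__ == "__main__":
    run, silence = 70 * 60, 120  # a 70-minute int-test run with short gaps in output
    for label, flag in (("older plugin", False), ("latest plugin", True)):
        t = resolve_timeouts(7200, passes_operation=flag)
        print(label, t, first_timeout(run, silence, t))
```

With the older behaviour the 70-minute run is aborted by the operation limit, while with the latest behaviour it completes. This matches the reasoning above only under the stated assumption about the 3600-second default.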
>
> FYI: Here are the jobs that failed due to the Jenkins job-level timeout
> (and had no test failures when the timeout occurred) along with their
> associated patch sets:
> https://rdjenkins.dyndns.org/job/Trove-Gate/2532/console (http://review.openstack.org/73786)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2530/console (http://review.openstack.org/73736)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2517/console (http://review.openstack.org/63789)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2514/console (https://review.openstack.org/50944)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2513/console (https://review.openstack.org/50944)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2504/console (https://review.openstack.org/73147)
> https://rdjenkins.dyndns.org/job/Trove-Gate/2503/console (https://review.openstack.org/73147)
>
> *Suggested action items:*
>
> - If it is acceptable to have jobs that run over one hour, then
> install the latest boot-hpcloud-vm plugin for Jenkins, which will make
> the operation timeout match the idle timeout.
>
>
> *Issue #2: The running time of all jobs is 1 hr 1 min*
>
> While the Jenkins job-level timeout will end the job after one hour, it
> also appears to keep every job running for a minimum of one hour. To be
> more precise, the timeout (or minimum running time) occurs on the part of
> the Jenkins job that runs commands on the VM; the VM provision (which takes
> about one minute) is excluded from this timeout which is why the running
> time of all jobs is around 1 hr 1 min <https://rdjenkins.dyndns.org/job/Trove-Gate/buildTimeTrend>.
> A sampling of console logs showing the time the int-tests completed and
> when the timeout kicks in:
>
> https://rdjenkins.dyndns.org/job/Trove-Gate/2531/console (00:01:03
> wasted)
>
> *04:51:12* COMMAND_0: echo refs/changes/36/73736/2
>
> ...
>
> *05:50:10* 335.41 proboscis.case.MethodTest (test_instance_created)
> *05:50:10* 194.05 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
> *05:51:13* **************************************
> *05:51:13* ****** STDERR-BEGIN ******
>
>
> https://rdjenkins.dyndns.org/job/Trove-Gate/2521/console (00:06:44
> wasted)
>
> *21:11:44* COMMAND_0: echo refs/changes/89/63789/13
>
> ...
>
> *22:05:00* 195.11 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
> *22:05:00* 186.89 proboscis.case.MethodTest (test_resize_down)
> *22:11:44* **************************************
> *22:11:44* ****** STDERR-BEGIN ******
>
>
> https://rdjenkins.dyndns.org/job/Trove-Gate/2518/consoleFull (00:06:01
> wasted)
>
> *17:46:59* COMMAND_0: echo refs/changes/02/64302/20
>
> ...
>
> *18:40:57* 210.03 proboscis.case.MethodTest (test_instance_returns_to_active_after_resize)
> *18:40:57* 187.89 proboscis.case.MethodTest (test_resize_down)
> *18:46:58* **************************************
> *18:46:58* ****** STDERR-BEGIN ******
>
>
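The "wasted" figures above are the gap between the last proboscis result line and the STDERR-BEGIN banner. The Python sketch below recomputes them from the timestamps copied out of the three console excerpts. It also measures the time from COMMAND_0 to STDERR-BEGIN, which checks the one-hour minimum directly. The helper name elapsed is invented for illustration. The sketch assumes each run is shorter than 24 hours and that the displayed timestamps are accurate, which Issue #3 suggests may not always hold.

```python
from datetime import datetime, timedelta


def elapsed(start: str, end: str) -> timedelta:
    """Time between two HH:MM:SS console timestamps (assumes < 24 h apart)."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    if delta < timedelta(0):  # the run crossed midnight
        delta += timedelta(days=1)
    return delta


# (build, COMMAND_0 time, last test-result time, STDERR-BEGIN time),
# copied from the three console excerpts quoted above.
SAMPLES = [
    (2531, "04:51:12", "05:50:10", "05:51:13"),
    (2521, "21:11:44", "22:05:00", "22:11:44"),
    (2518, "17:46:59", "18:40:57", "18:46:58"),
]

if __name__ == "__main__":
    for build, started, tests_done, stderr_begin in SAMPLES:
        print(f"build {build}: tests done after {elapsed(started, tests_done)}, "
              f"idle for {elapsed(tests_done, stderr_begin)}, "
              f"session ended after {elapsed(started, stderr_begin)}")
```

For all three builds the session ends within one second of an hour after COMMAND_0, even though the tests finish at different times. That is the "minimum running time" pattern described above.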
> *Suggested action items:*
>
> - Given that the minimum running time is one hour, I assume the problem is in the net-ssh-simple library. This needs more investigation.
>
>
> *Issue #3: Jenkins console log line timestamps different between full and truncated views*
>
>
> I assume this is due to JENKINS-17779 <https://issues.jenkins-ci.org/browse/JENKINS-17779>.
>
> *Suggested action items:*
>
> - Upgrade the timestamper plugin <https://wiki.jenkins-ci.org/display/JENKINS/Timestamper>.
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>