[openstack-qa] The service logs are not saved on job timeout

Attila Fazekas afazekas at redhat.com
Thu Jun 13 07:01:15 UTC 2013


Actually, I am looking for a once-and-for-all solution.

The fixtures solution is similar to what I would implement first, and
I like this type of fixture usage.
There will probably be more of it when we start talking about pluggability.
But because there is no mocking in Tempest, fixtures are probably not the best and simplest
solution for resource management.

IMHO this is the most we can do according to the 80/20 rule [1].
The main purpose of the clihlt VM image is to let the other threads use the CPU, memory and storage.
Another possible trick might be thin provisioning of the volumes; usually we need to write less than we allocate.

The Tempest-specific thing here is that we have a big config file, which will get bigger anyway.
(Attribute-based test selection is one thing that can make it smaller, but that is another topic.)

So the timeout- and plugin-related settings should get a dedicated section like [GLOBAL] or [DEFAULT].

One of the biggest advantages of parallel execution is that we can have test cases which otherwise could add more than
256 sec to the gate time.

+ measurement - ceilometer,...
+ auto scaling - heat,...
+ instance validation - nova (cinder, glance, networking, ...) 

So 512 sec would probably be the default timeout for an individual test case,
if we do not want to manage the timeouts per test case.
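
A default timeout kept in a dedicated section could be sketched roughly like this. It uses the standard-library configparser only as a stand-in for the Tempest config handling, and the option name test_timeout and the section names [compute] and [volume] are made up for illustration.

```python
import configparser

# Hypothetical config fragment: option and section names are assumptions,
# not actual Tempest settings.
SAMPLE = """
[DEFAULT]
test_timeout = 512

[compute]
build_interval = 10

[volume]
test_timeout = 1024
"""

parser = configparser.ConfigParser()
parser.read_string(SAMPLE)

# Options in [DEFAULT] are visible from every other section,
# so one value covers all test cases unless a section overrides it.
default_timeout = parser.getint("DEFAULT", "test_timeout")
compute_timeout = parser.getint("compute", "test_timeout")
volume_timeout = parser.getint("volume", "test_timeout")
```

With this layout only the sections that really need a different value carry their own timeout, so the per test case management stays optional.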


If any issue affects multiple test cases, we can still hit the job timeout.
If the alarm signal handler is misused by any imported library, it could lead to anything, including deadlocks.
Fixtures does not use it fully correctly [2], because it does not consider threading.
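
One facet of the threading problem can be shown with a minimal Python sketch. This is not code from fixtures or Tempest, and the helper names are made up for illustration. CPython only allows the main thread to install signal handlers, so an alarm-based timeout that is set up from a worker thread fails with ValueError. SIGALRM is available only on Unix-like systems.

```python
import signal
import threading


def install_alarm_handler():
    """Try to install a SIGALRM handler; return the error text or None."""
    try:
        previous = signal.signal(signal.SIGALRM, lambda signum, frame: None)
    except ValueError as exc:
        # Raised when the caller is not the main thread.
        return str(exc)
    if previous is not None:
        # Restore whatever handler was installed before.
        signal.signal(signal.SIGALRM, previous)
    return None


def run_in_worker(func):
    """Run func in a separate thread and return its result."""
    results = []
    worker = threading.Thread(target=lambda: results.append(func()))
    worker.start()
    worker.join()
    return results[0]


# The worker thread is refused, so it gets an error message back.
worker_error = run_in_worker(install_alarm_handler)
```

Python-level signal handlers also always run in the main thread, whichever thread is actually stuck, so an alarm cannot be assumed to interrupt the test that timed out.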

It looks like paramiko uses threading, but does not use the alarm.
I did not read all of the Python code used by Tempest, so I cannot be sure nothing can happen.

Probably we can take this risk, but a warning must be in the code or documentation.


This [3] is one place where I could add a bigger hammer.
I would like to solve it at least to this extent, but it is not a once-and-for-all solution.

What are the obstacles to solving it on the Jenkins side?

[1] http://en.wikipedia.org/wiki/Pareto_principle
[2] https://en.wikipedia.org/wiki/Greatest_common_divisor
[3] https://github.com/openstack-infra/devstack-gate/blob/master/devstack-vm-gate.sh#L214


----- Original Message -----
> From: "Monty Taylor" <mordred at inaugust.com>
> To: openstack-qa at lists.openstack.org
> Sent: Wednesday, June 12, 2013 4:59:57 PM
> Subject: Re: [openstack-qa] The service logs are not saved on job timeout
> 
> 
> 
> On 06/12/2013 07:38 AM, James E. Blair wrote:
> > Attila Fazekas <afazekas at redhat.com> writes:
> > 
> >> The service logs are not saved on job timeout
> >> http://logs.openstack.org/23739/6/check/gate-tempest-devstack-vm-postgres-full/12385/logs/
> >>
> >> How can I help to solve this kind of issues ?
> >>
> >> How it is handled nowadays ?
> > 
> > The job timeout is enforced by Jenkins -- it forcibly aborts the job
> > when the timeout is reached.  Copying the service logs into a location
> > where they can be saved is part of the job.
> > 
> > The best solution would be for Tempest itself to enforce a timeout so
> > that the job could continue as normal.  Perhaps that's something to look
> > into with test repository.
> 
> there is a timeout fixture that we use in nova:
> 
> https://github.com/openstack/nova/blob/master/nova/test.py#L202-209
> 
> then we set default timeouts in .testr.conf:
> 
> https://github.com/openstack/nova/blob/master/.testr.conf#L4
> 
> Doing this has the benefit that a test being timed out shows up as a
> failed test with logging info and whatnot. It also runs as a fixture, so
> cleanups and the like also run as expected, which will be a nicer
> behavior for when we start running tempest against contributed existing
> clouds for refstack.
> 
> > It may be possible to configure a Jenkins post-build action to copy the
> > logs, however, I'm not certain that would work, and the plugin that does
> > that is frankly a little ridiculous (it scans the entire console log
> > with regexes to decide if it should run) and very complicated -- I'd
> > rather keep these job configurations as simple as possible.
> > 
> > In the mean time, you may be able to reproduce locally.  Jeremy Stanley
> > is working on improving documentation around how to exactly reproduce
> > the gating jobs:
> > 
> >   https://review.openstack.org/#/c/32661/

Cool.
I would like to know more about https://jenkins.openstack.org/, because
I am interested in adding test cases which do not really make sense on a single node.
I have several concepts for solving this.


> > -Jim
> > 
> > _______________________________________________
> > openstack-qa mailing list
> > openstack-qa at lists.openstack.org
> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-qa
> > 
> 
> 


