[openstack-dev] [Neutron][Infra] Post processing of gate hooks on job timeouts

Assaf Muller assaf at redhat.com
Mon Apr 11 18:47:52 UTC 2016


On Mon, Apr 11, 2016 at 1:56 PM, Clark Boylan <cboylan at sapwetik.org> wrote:
> On Mon, Apr 11, 2016, at 10:52 AM, Jakub Libosvar wrote:
>> On 04/11/2016 06:41 PM, Clark Boylan wrote:
>> > On Mon, Apr 11, 2016, at 03:07 AM, Jakub Libosvar wrote:
>> >> Hi,
>> >>
>> >> recently we hit an issue in Neutron with tests getting stuck [1]. As a
>> >> side effect we discovered that logs are not collected properly, which
>> >> makes it hard to find the root cause. The logs are missing because we
>> >> send SIGKILL to whatever gate hook is running when we hit the global
>> >> timeout per gate job [2]. This gives the running process no time to
>> >> perform any post-processing. In the post_gate_hook function in Neutron,
>> >> we collect logs from the /tmp directory, compress them and move them to
>> >> /opt/stack/logs so that they are exposed.
>> >>
>> >> I have in mind two solutions to which I'd like to get feedback before
>> >> sending patches.
>> >>
>> >> 1) In Neutron, we execute tests in post_gate_hook (dunno why). But even
>> >> if we moved test execution into gate_hook, post_gate_hook won't be
>> >> triggered when tests get stuck [3]. So the solution I propose here is
>> >> to terminate gate_hook N minutes before the global timeout and still
>> >> execute post_gate_hook (with a timeout) as a post-processing routine.
>> >>
>> >> 2) The second proposal is to let timeout-wrapped commands know they are
>> >> about to be killed. We can send, let's say, SIGTERM instead of SIGKILL
>> >> and send SIGKILL after a certain amount of time. Example: we send
>> >> SIGTERM 3 minutes before the global timeout, giving 'command' these 3
>> >> minutes to handle the SIGTERM signal.
>> >>
>> >>  timeout -s 15 -k 3m $((REMAINING_TIME-3))m bash -c "command"
>> >>
>> >> With the 2nd approach we can trap the signal that kills the running test
>> >> suite and collect logs with the same functions we currently have.
>> >>
>> >>
>> >> I would personally go with second option but I want to hear if anybody
>> >> has a better idea about post processing in gate jobs or if there is
>> >> already a tool we can use to collect logs.
>> >>
>> >> Thanks,
>> >> Kuba
>> >
>> > Devstack gate already does a "soft" timeout [0], then proceeds to clean
>> > up (part of which is collecting logs) [1], then Jenkins does the "hard"
>> > timeout [2]. Why aren't we collecting the required log files as part of
>> > the existing cleanup?
>> This existing cleanup doesn't support hooks. Neutron tests produce a lot
>> of logs by default stored in /tmp/dsvm-<job_name> so we need to compress
>> and move them to /opt/stack/logs in order to get them collected by [1].
>
> My suggestion would be to stop writing these log files to /tmp and
> instead write them to the log dir where they will be automagically
> compressed and collected.

Yeah that's what I'm doing here https://review.openstack.org/#/c/303594/.
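
For reference, the second proposal quoted above (SIGTERM first, SIGKILL
later) could look roughly like the sketch below. It is only an illustration
and not what the patch does: SRC_DIR and DEST_DIR are hypothetical temporary
directories standing in for /tmp/dsvm-<job_name> and /opt/stack/logs,
collect_logs is a made-up helper name, and the durations are shortened to
seconds so it finishes quickly. Note that GNU timeout treats a duration
without a suffix as seconds.

```shell
#!/usr/bin/env bash
# Hypothetical stand-ins for /tmp/dsvm-<job_name> and /opt/stack/logs.
SRC_DIR=$(mktemp -d)
DEST_DIR=$(mktemp -d)
export SRC_DIR DEST_DIR
echo "test output" > "$SRC_DIR/test.log"

# Send SIGTERM after 1 second; send SIGKILL 5 seconds after that if the
# command is still running.
timeout -s TERM -k 5 1 bash -c '
    # Hypothetical helper: archive the logs, then exit.
    collect_logs() {
        tar -czf "$DEST_DIR/logs.tar.gz" -C "$SRC_DIR" .
        exit 0
    }
    trap collect_logs TERM

    # Stand-in for a stuck test suite. It runs in the background so that
    # "wait" is interrupted and the trap fires as soon as SIGTERM arrives.
    sleep 30 &
    wait $!
'
status=$?

# timeout exits with 124 when the time limit was reached.
echo "timeout exit status: $status"
```

Running the long command in the background and waiting on it matters here:
bash only runs a trap handler after the current foreground command returns,
so a handler behind a truly stuck foreground process might never get its
chance before SIGKILL arrives.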

>
>>
>> >
>> > [0]
>> > https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/devstack-vm-gate-wrap.sh#n569
>> > [1]
>> > https://git.openstack.org/cgit/openstack-infra/devstack-gate/tree/devstack-vm-gate-wrap.sh#n594
>> > [2]
>> > https://git.openstack.org/cgit/openstack-infra/project-config/tree/jenkins/jobs/devstack-gate.yaml#n325
>> >
>> > Clark
>> >
>> > __________________________________________________________________________
>> > OpenStack Development Mailing List (not for usage questions)
>> > Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>> >


