[openstack-dev] [neutron][infra] Functional job failure rate at 100%

Thierry Carrez thierry at openstack.org
Wed Aug 9 16:13:17 UTC 2017


Thanks for this nice detective work, Jakub and Daniel! This disabled
test job was making me nervous with respect to the Pike release.

Now I suspect that this kernel regression is affecting Ubuntu Xenial
users for previous releases of OpenStack, too, so I hope this will get
fixed in Ubuntu soon enough.
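
For anyone wanting to check whether a Xenial host is running the suspect
build, here is a minimal sketch. It assumes the regression is tied to the
4.4.0-89 kernel tagged in Daniel's note below; the thread does not say
whether other builds are affected, so only an exact match is flagged. The
function names are made up for illustration.

```python
import platform
import re

# Assumption: the suspect build is the one tagged Ubuntu-4.4.0-89.112 in this
# thread. Other builds may or may not be affected; that is not established here.
SUSPECT_BUILD = (4, 4, 0, 89)


def parse_kernel_release(release):
    """Return (major, minor, patch, abi) for '4.4.0-89-generic', else None."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)-(\d+)", release)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_suspect_kernel(release):
    """True only when the release string matches the suspect build exactly."""
    return parse_kernel_release(release) == SUSPECT_BUILD


if __name__ == "__main__":
    running = platform.release()  # same string as `uname -r`
    if is_suspect_kernel(running):
        print("%s matches the suspect 4.4.0-89 build" % running)
    else:
        print("%s does not match the suspect 4.4.0-89 build" % running)
```

A non-match only means the host is not on that exact build; it says nothing
about whether the conntrack bug is present or absent.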

Daniel Alvarez Sanchez wrote:
> Some more info added to Jakub's excellent report :)
> 
> 
> New kernel Ubuntu-4.4.0-89.112 was
> tagged 9 days ago (07/31/2017) [0].
> 
> From a quick look, the only commit around this function is [1].
> 
> [0] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=64de31ed97a03ec1b86fd4f76e445506dce55b02
> [1] http://kernel.ubuntu.com/git/ubuntu/ubuntu-xenial.git/commit/?id=2ad4caea651e1cc0fc86111ece9f9d74de825b78
> 
> On Wed, Aug 9, 2017 at 3:29 PM, Jakub Libosvar <jlibosva at redhat.com> wrote:
> 
>     Daniel Alvarez and I spent some time looking at it and the culprit was
>     finally found.
> 
>     tl;dr
> 
>     The kernel on the test machines was updated to a version with a bug in
>     creating conntrack entries, which makes the functional tests get stuck.
>     More info at [4].
> 
>     For now, I sent a patch [5] to disable the jobs that create
>     conntrack entries manually; its commit message still needs an update.
>     Once it merges, we can switch the functional job back to voting to
>     avoid regressions.
> 
>     Is it possible to switch the image used for the Jenkins machines back
>     to the older version? Any other ideas on how to deal with the kernel bug?
> 
>     Thanks
>     Jakub
> 
>     [5] https://review.openstack.org/#/c/492068/1
> 
>     On 07/08/2017 11:52, Jakub Libosvar wrote:
>     > Hi all,
>     >
>     > as per grafana [1] the functional job is broken. Looking at logstash [2],
>     > it has been failing consistently since 2017-08-03 16:27. I didn't find
>     > any particular patch in Neutron that could cause it.
>     >
>     > The culprit is that ovsdb starts misbehaving [3] and then we retry calls
>     > indefinitely. We still use openvswitch 2.5.2, as before. I opened
>     > a bug [4] and started investigating; I'll post my findings there.
>     >
>     > I think at this point there is no reason to run "recheck" on your patches.
>     >
>     > Thanks,
>     > Jakub
>     >
>     > [1] http://grafana.openstack.org/dashboard/db/neutron-failure-rate?panelId=7&fullscreen
>     > [2] http://bit.ly/2vdKMwy
>     > [3] http://logs.openstack.org/14/488914/8/check/gate-neutron-dsvm-functional-ubuntu-xenial/75d7482/logs/openvswitch/ovsdb-server.txt.gz
>     > [4] https://bugs.launchpad.net/neutron/+bug/1709032
>     >
> 
> 
>     __________________________________________________________________________
>     OpenStack Development Mailing List (not for usage questions)
>     Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>     http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
> 
> 
> 
> 
> 


-- 
Thierry Carrez (ttx)


