[all][neutron][qa] Gate status: tempest-full-py3, tempest-slow-py3 and a few more jobs are broken: "Do not recheck"
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
-gmann
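For anyone who prefers a terminal to the dashboard links above, a rough way to poll the same build results is the Zuul REST API. This is only a sketch: the endpoint and JSON field names are assumptions based on the public Zuul web API, and jq is assumed to be installed; the dashboards stay the authoritative view.

# Poll the last 10 results for each affected job via the Zuul API (assumed endpoint/fields).
for job in tempest-full-py3 tempest-slow-py3; do
    echo "== ${job} =="
    curl -s "https://zuul.opendev.org/api/tenant/openstack/builds?job_name=${job}&limit=10" \
        | jq -r '.[] | "\(.end_time)  \(.result)  \(.ref_url)"'
done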
On Tue, 20 Jul 2021, 16:59 -0500, Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
FYI all: nova-multi-cell, nova-live-migration, and nova-ceph-multistore are also affected:
https://zuul.opendev.org/t/openstack/builds?job_name=nova-multi-cell
https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration
https://zuul.opendev.org/t/openstack/builds?job_name=nova-ceph-multistore
And here's what the error trace looks like, for anyone who hasn't seen it yet:
Error while executing command: HttpException: 500, Request Failed: internal server error while processing your request.
++ functions-common:oscwrap:2349 : return 1
+ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : NET_ID=
+ lib/neutron_plugins/services/l3:create_neutron_initial_network:215 : die_if_not_set 215 NET_ID 'Failure creating NET_ID for private 0b1d94f08f194eb5b7679c47f91f6d4c'
+ functions-common:die_if_not_set:216 : local exitcode=0
[Call Trace]
./stack.sh:1300:create_neutron_initial_network
/opt/stack/devstack/lib/neutron_plugins/services/l3:215:die_if_not_set
/opt/stack/devstack/functions-common:223:die
[ERROR] /opt/stack/devstack/functions-common:215 Failure creating NET_ID for private 0b1d94f08f194eb5b7679c47f91f6d4c
exit_trap: cleaning up child processes
Error on exit
*** FINISHED ***
-melwitt
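A note on reading that trace: die_if_not_set simply aborts stack.sh when the variable it is handed is empty, which here means the network create call returned no ID. A simplified sketch of what the helper does, not the exact code from devstack's functions-common:

# Simplified sketch of devstack's die_if_not_set (functions-common); the real
# helper has extra bookkeeping, but this is the idea.
function die_if_not_set {
    local exitcode=$?        # exit code of the command that set the variable
    local line=$1; shift     # caller's line number, for the error message
    local var=$1; shift      # name of the variable that must be non-empty
    if [[ $exitcode -ne 0 || -z "${!var}" ]]; then
        die "$line" "$*"     # devstack's die() prints the call trace and exits
    fi
}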
All of them are the same issue. It happened during the devstack run with ML2/OVN.
-- amotoki
On Wed, Jul 21, 2021 at 8:30 AM melanie witt <melwittt@gmail.com> wrote:
On Tue, 20 Jul 2021, 16:59 -0500, Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
FYI all:
nova-multi-cell, nova-live-migration, and nova-ceph-multistore are also affected:
https://zuul.opendev.org/t/openstack/builds?job_name=nova-multi-cell
https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration
https://zuul.opendev.org/t/openstack/builds?job_name=nova-ceph-multistore
And here's what the error trace looks like, for anyone who hasn't seen it yet:
Error while executing command: HttpException: 500, Request Failed: internal server error while processing your request.
++ functions-common:oscwrap:2349 : return 1
+ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : NET_ID=
+ lib/neutron_plugins/services/l3:create_neutron_initial_network:215 : die_if_not_set 215 NET_ID 'Failure creating NET_ID for private 0b1d94f08f194eb5b7679c47f91f6d4c'
+ functions-common:die_if_not_set:216 : local exitcode=0
[Call Trace]
./stack.sh:1300:create_neutron_initial_network
/opt/stack/devstack/lib/neutron_plugins/services/l3:215:die_if_not_set
/opt/stack/devstack/functions-common:223:die
[ERROR] /opt/stack/devstack/functions-common:215 Failure creating NET_ID for private 0b1d94f08f194eb5b7679c47f91f6d4c
exit_trap: cleaning up child processes
Error on exit
*** FINISHED ***
-melwitt
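Amotoki's point above that these are all ML2/OVN runs can be double-checked on any devstack node by listing the Neutron agents: with ML2/OVS you see the Open vSwitch/L3/DHCP agents, while ML2/OVN shows OVN controller entries instead (exact agent names vary by release). A minimal check, assuming the standard devstack-admin entry in clouds.yaml:

# Show the running Neutron agents to see which backend is actually in use.
openstack --os-cloud devstack-admin network agent list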
My (I guess obvious) suggestion is for Neutron to actually gate on the OVN variants of these jobs, as DevStack now does. Otherwise the testing is leaky by design. Please amend.
-yoctozepto
On Wed, Jul 21, 2021 at 1:40 AM Akihiro Motoki <amotoki@gmail.com> wrote:
All of them are the same issue. It happened during the devstack run with ML2/OVN.
-- amotoki
On Wed, Jul 21, 2021 at 8:30 AM melanie witt <melwittt@gmail.com> wrote:
On Tue, 20 Jul 2021, 16:59 -0500, Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
FYI all:
nova-multi-cell, nova-live-migration, and nova-ceph-multistore are also affected:
https://zuul.opendev.org/t/openstack/builds?job_name=nova-multi-cell
https://zuul.opendev.org/t/openstack/builds?job_name=nova-live-migration
https://zuul.opendev.org/t/openstack/builds?job_name=nova-ceph-multistore
And here's what the error trace looks like, for anyone who hasn't seen it yet:
Error while executing command: HttpException: 500, Request Failed: internal server error while processing your request.
++ functions-common:oscwrap:2349 : return 1
+ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : NET_ID=
+ lib/neutron_plugins/services/l3:create_neutron_initial_network:215 : die_if_not_set 215 NET_ID 'Failure creating NET_ID for private 0b1d94f08f194eb5b7679c47f91f6d4c'
+ functions-common:die_if_not_set:216 : local exitcode=0
[Call Trace]
./stack.sh:1300:create_neutron_initial_network
/opt/stack/devstack/lib/neutron_plugins/services/l3:215:die_if_not_set
/opt/stack/devstack/functions-common:223:die
[ERROR] /opt/stack/devstack/functions-common:215 Failure creating NET_ID for private 0b1d94f08f194eb5b7679c47f91f6d4c
exit_trap: cleaning up child processes
Error on exit
*** FINISHED ***
-melwitt
Hi,
I checked the recent neutron logs and found the same error in the neutron-ovn-tempest-slow job in the check queue [1] of a neutron change merged a couple of hours ago [0]. I don't see such errors in other changes merged recently (801068 and 801076). 779310 is a child commit of 776701, so it hit the same error, but the two are not directly related. I believe reverting https://review.opendev.org/c/openstack/neutron/+/776701 will fix the issue. I will update https://review.opendev.org/c/openstack/neutron/+/801478.
The reason this issue happened is a configuration mismatch between the tempest-slow/full jobs used globally (in most projects) and those used in the neutron gate. The neutron gate has two types of tempest-slow jobs: neutron-ovs-tempest-slow and neutron-ovn-tempest-slow. The former is voting but the latter is non-voting. The tempest-slow job used globally is the same as the latter (neutron-ovn-tempest-slow) from the point of view of devstack configuration. This is why the neutron gate could not detect the problem before merging the commit which triggered the error.
[0] https://review.opendev.org/c/openstack/neutron/+/776701
[1] https://zuul.opendev.org/t/openstack/build/5344d9bdec9346738499114c10a81aff/...
Thanks,
Akihiro Motoki (amotoki)
On Wed, Jul 21, 2021 at 7:04 AM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
-gmann
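For anyone following along who has not proposed a revert before, the rough local workflow is sketched below; in practice the Gerrit web UI "Revert" button does the same thing. The SHA is a placeholder, not the actual commit of 776701, and git-review is assumed to be installed and configured for the opendev Gerrit.

# Rough sketch of proposing a revert by hand (placeholder SHA).
git clone https://opendev.org/openstack/neutron
cd neutron
git revert <sha-of-the-offending-commit>
git review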
Hi,
On Wednesday, 21 July 2021 01:30:25 CEST Akihiro Motoki wrote:
Hi,
I checked the recent neutron logs and found the same error in the neutron-ovn-tempest-slow job in the check queue [1] of a neutron change merged a couple of hours ago [0]. I don't see such errors in other changes merged recently (801068 and 801076). 779310 is a child commit of 776701, so it hit the same error, but the two are not directly related. I believe reverting https://review.opendev.org/c/openstack/neutron/+/776701 will fix the issue. I will update https://review.opendev.org/c/openstack/neutron/+/801478.
The reason this issue happened is a configuration mismatch between the tempest-slow/full jobs used globally (in most projects) and those used in the neutron gate. The neutron gate has two types of tempest-slow jobs: neutron-ovs-tempest-slow and neutron-ovn-tempest-slow. The former is voting but the latter is non-voting. The tempest-slow job used globally is the same as the latter (neutron-ovn-tempest-slow) from the point of view of devstack configuration. This is why the neutron gate could not detect the problem before merging the commit which triggered the error.
Thx for the analysis and the fast approval of that revert. I just proposed https://review.opendev.org/c/openstack/neutron/+/801598 to make that neutron job voting and gating, to avoid such issues in the future.
[0] https://review.opendev.org/c/openstack/neutron/+/776701
[1] https://zuul.opendev.org/t/openstack/build/5344d9bdec9346738499114c10a81aff/... og/controller/logs/screen-q-svc.txt#1993
Thanks, Akihiro Motoki (amotoki)
On Wed, Jul 21, 2021 at 7:04 AM Ghanshyam Mann <gmann@ghanshyammann.com>
wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
-gmann
-- Slawek Kaplonski Principal Software Engineer Red Hat
---- On Wed, 21 Jul 2021 01:53:59 -0500 Slawek Kaplonski <skaplons@redhat.com> wrote ----
Hi,
On Wednesday, 21 July 2021 01:30:25 CEST Akihiro Motoki wrote:
Hi,
I checked the recent neutron logs and found the same error in the neutron-ovn-tempest-slow job in the check queue [1] of a neutron change merged a couple of hours ago [0]. I don't see such errors in other changes merged recently (801068 and 801076). 779310 is a child commit of 776701, so it hit the same error, but the two are not directly related. I believe reverting https://review.opendev.org/c/openstack/neutron/+/776701 will fix the issue. I will update https://review.opendev.org/c/openstack/neutron/+/801478.
The reason this issue happened is a configuration mismatch between the tempest-slow/full jobs used globally (in most projects) and those used in the neutron gate. The neutron gate has two types of tempest-slow jobs: neutron-ovs-tempest-slow and neutron-ovn-tempest-slow. The former is voting but the latter is non-voting. The tempest-slow job used globally is the same as the latter (neutron-ovn-tempest-slow) from the point of view of devstack configuration. This is why the neutron gate could not detect the problem before merging the commit which triggered the error.
Thx for the analysis and the fast approval of that revert. I just proposed https://review.opendev.org/c/openstack/neutron/+/801598 to make that neutron job voting and gating, to avoid such issues in the future.
+1, making the OVN job, which is the default in devstack, voting will help.
-gmann
[0] https://review.opendev.org/c/openstack/neutron/+/776701
[1] https://zuul.opendev.org/t/openstack/build/5344d9bdec9346738499114c10a81aff/... og/controller/logs/screen-q-svc.txt#1993
Thanks, Akihiro Motoki (amotoki)
On Wed, Jul 21, 2021 at 7:04 AM Ghanshyam Mann <gmann@ghanshyammann.com>
wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
-gmann
-- Slawek Kaplonski Principal Software Engineer Red Hat
Hi all,
The patch which reverts the cause of the failure [1] landed half an hour ago. I believe the tempest-full/tempest-slow related jobs have recovered now.
The patch to make neutron-ovn-tempest-slow voting in neutron [2] is in the gate now.
Thanks,
Akihiro Motoki (irc: amotoki)
[1] https://review.opendev.org/c/openstack/neutron/+/801478
[2] https://review.opendev.org/c/openstack/neutron/+/801598/
On Wed, Jul 21, 2021 at 11:18 PM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
---- On Wed, 21 Jul 2021 01:53:59 -0500 Slawek Kaplonski <skaplons@redhat.com> wrote ----
Hi,
On Wednesday, 21 July 2021 01:30:25 CEST Akihiro Motoki wrote:
Hi,
I checked the recent neutron logs and found the same error in the neutron-ovn-tempest-slow job in the check queue [1] of a neutron change merged a couple of hours ago [0]. I don't see such errors in other changes merged recently (801068 and 801076). 779310 is a child commit of 776701, so it hit the same error, but the two are not directly related. I believe reverting https://review.opendev.org/c/openstack/neutron/+/776701 will fix the issue. I will update https://review.opendev.org/c/openstack/neutron/+/801478.
The reason this issue happened is a configuration mismatch between the tempest-slow/full jobs used globally (in most projects) and those used in the neutron gate. The neutron gate has two types of tempest-slow jobs: neutron-ovs-tempest-slow and neutron-ovn-tempest-slow. The former is voting but the latter is non-voting. The tempest-slow job used globally is the same as the latter (neutron-ovn-tempest-slow) from the point of view of devstack configuration. This is why the neutron gate could not detect the problem before merging the commit which triggered the error.
Thx for the analysis and the fast approval of that revert. I just proposed https://review.opendev.org/c/openstack/neutron/+/801598 to make that neutron job voting and gating, to avoid such issues in the future.
+1, making the OVN job, which is the default in devstack, voting will help.
-gmann
[0] https://review.opendev.org/c/openstack/neutron/+/776701
[1] https://zuul.opendev.org/t/openstack/build/5344d9bdec9346738499114c10a81aff/... og/controller/logs/screen-q-svc.txt#1993
Thanks, Akihiro Motoki (amotoki)
On Wed, Jul 21, 2021 at 7:04 AM Ghanshyam Mann <gmann@ghanshyammann.com>
wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
-gmann
-- Slawek Kaplonski Principal Software Engineer Red Hat
---- On Wed, 21 Jul 2021 20:27:59 -0500 Akihiro Motoki <amotoki@gmail.com> wrote ----
Hi all,
The patch which reverts the cause of the failure [1] landed half an hour ago. I believe the tempest-full/tempest-slow related jobs have recovered now.
Yeah, the tempest-full/tempest-slow/nova jobs are all passing again on this error. We can recheck the failing patches now.
-gmann
The patch to make neutron-ovn-tempest-slow voting in neutron [2] is in the gate now.
Thanks, Akihiro Motoki (irc: amotoki)
[1] https://review.opendev.org/c/openstack/neutron/+/801478 [2] https://review.opendev.org/c/openstack/neutron/+/801598/
On Wed, Jul 21, 2021 at 11:18 PM Ghanshyam Mann <gmann@ghanshyammann.com> wrote:
---- On Wed, 21 Jul 2021 01:53:59 -0500 Slawek Kaplonski <skaplons@redhat.com> wrote ----
Hi,
On Wednesday, 21 July 2021 01:30:25 CEST Akihiro Motoki wrote:
Hi,
I checked the recent neutron logs and found the same error in the neutron-ovn-tempest-slow job in the check queue [1] of a neutron change merged a couple of hours ago [0]. I don't see such errors in other changes merged recently (801068 and 801076). 779310 is a child commit of 776701, so it hit the same error, but the two are not directly related. I believe reverting https://review.opendev.org/c/openstack/neutron/+/776701 will fix the issue. I will update https://review.opendev.org/c/openstack/neutron/+/801478.
The reason this issue happened is a configuration mismatch between the tempest-slow/full jobs used globally (in most projects) and those used in the neutron gate. The neutron gate has two types of tempest-slow jobs: neutron-ovs-tempest-slow and neutron-ovn-tempest-slow. The former is voting but the latter is non-voting. The tempest-slow job used globally is the same as the latter (neutron-ovn-tempest-slow) from the point of view of devstack configuration. This is why the neutron gate could not detect the problem before merging the commit which triggered the error.
Thx for the analysis and the fast approval of that revert. I just proposed https://review.opendev.org/c/openstack/neutron/+/801598 to make that neutron job voting and gating, to avoid such issues in the future.
+1, making the OVN job, which is the default in devstack, voting will help.
-gmann
[0] https://review.opendev.org/c/openstack/neutron/+/776701
[1] https://zuul.opendev.org/t/openstack/build/5344d9bdec9346738499114c10a81aff/... og/controller/logs/screen-q-svc.txt#1993
Thanks, Akihiro Motoki (amotoki)
On Wed, Jul 21, 2021 at 7:04 AM Ghanshyam Mann <gmann@ghanshyammann.com>
wrote:
Hello Everyone,
For the past 2-3 hours, tempest-full-py3, tempest-slow-py3 and a few more jobs have been failing consistently in the create_neutron_initial_network() method:
"++ lib/neutron_plugins/services/l3:create_neutron_initial_network:214 : oscwrap --os-cloud devstack-admin --os-region RegionOne network create --project a6b71b541987471595bf5e38f5fbe264 private"
- https://zuul.openstack.org/builds?job_name=tempest-full-py3
- https://zuul.openstack.org/builds?job_name=tempest-slow-py3
Strangely, the 'tempest-integrated-storage'|'compute' jobs are passing even though the configuration is almost the same.
Slaweq reported the bug below and is trying a neutron revert to find the root cause. He will check it tomorrow morning.
- https://bugs.launchpad.net/neutron/+bug/1936983
Until we find the root cause/fix, do not recheck on these failures.
-gmann
-- Slawek Kaplonski Principal Software Engineer Red Hat
participants (5):
- Akihiro Motoki
- Ghanshyam Mann
- melanie witt
- Radosław Piliszek
- Slawek Kaplonski