[neutron] CI update
Hi, Due to some very urgent matters yesterday, I wasn't able to run the Neutron CI meeting as usual. Fortunately we don't have many new issues in our CI. There is only one new issue, in our scenario jobs, which I wanted to discuss [1]. It's impacting the Ironic gates, but I noticed it in the Neutron CI as well. See [2] or [3] for example. I'm not sure about the Ironic jobs, but in Neutron I saw it mostly (or only, I'm not sure) in the multinode jobs. [1] https://bugs.launchpad.net/neutron/+bug/1944201 [2] https://e36beaa2ff297ebe7d5f-5944c3d62ed334b8cdf50b534c246731.ssl.cf5.rackcdn.com/805849/9/check/neutron-ovs-tempest-dvr-ha-multinode-full/f83fa96/compute1/logs/screen-q-agt.txt [3] https://storage.bhs.cloud.ovh.net/v1/AUTH_dcaab5e32b234d56b626f72581e3644c/zuul_opendev_logs_88f/803045/12/check/neutron-ovs-tempest-slow/88f8bb7/job-output.txt -- Slawek Kaplonski Principal Software Engineer Red Hat
Greetings Slawek, Ironic has, for now, disabled the firewall code path in our jobs, since it is not a critical item for us to run against integration tests. But I've seen mention of the #opendev folks observing the same failure causing the entire OpenStack gate to reset and re-run jobs. Given the hit counts from the logs where the error is being observed, it seems more likely that this is a pan-OpenStack gate issue causing instability at this time. -Julia On Wed, Sep 22, 2021 at 3:04 AM Slawek Kaplonski <skaplons@redhat.com> wrote:
Hi Slawek, Thanks for the summary. Regarding https://bugs.launchpad.net/neutron/+bug/1944201: I'm not sure if it is related to the number of hosts; there are some failures in singlenode jobs as well: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A... example: https://9f672e0630f459ee81cb-e4093b1756a9a5a7c7d28e6575b4af7f.ssl.cf5.rackcd... Lajos (lajoskatona) Slawek Kaplonski <skaplons@redhat.com> wrote (on Wed, Sep 22, 2021, 12:10):
Hi On Wed, Sep 22, 2021 at 03:30:08PM +0200, Lajos Katona wrote:
Hi Slawek, Thanks for the summary. Regarding https://bugs.launchpad.net/neutron/+bug/1944201: I'm not sure if it is related to the number of hosts; there are some failures in singlenode jobs as well: http://logstash.openstack.org/#dashboard/file/logstash.json?query=message%3A...
example: https://9f672e0630f459ee81cb-e4093b1756a9a5a7c7d28e6575b4af7f.ssl.cf5.rackcd...
Ok. So it happens on all types of jobs where neutron-ovs-agent is used :/
Hello folks: I replied in [1]. I think we could have a potential problem when executing a DB change that implies an OF controller re-initialization. Most of the time we don't have problems, but as we see in the CI, sometimes we do. I'll push a patch to add a retry decorator on the methods that trigger this OF controller restart. Regards. [1] https://bugs.launchpad.net/neutron/+bug/1944201/comments/4 On Wed, Sep 22, 2021 at 4:36 PM Slawek Kaplonski <skaplons@redhat.com> wrote:
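[Editor's note: Rodolfo's proposed fix is a retry decorator on the methods that trigger the OF controller restart. As a minimal sketch of that idea only — the names below are hypothetical and the actual merged neutron patch may look quite different:]

```python
import functools
import time


class DatapathInvalid(Exception):
    """Stand-in for the transient 'Datapath Invalid' error seen in the CI logs."""


def retry_on_of_error(retries=3, delay=0.5):
    """Retry a call that may hit the OF controller mid-reinitialization.

    Hypothetical sketch: a transient error raised right after an OF
    controller restart is retried a few times instead of crashing the
    agent; the last failure is re-raised.
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(1, retries + 1):
                try:
                    return func(*args, **kwargs)
                except DatapathInvalid:
                    if attempt == retries:
                        raise
                    time.sleep(delay)
        return wrapper
    return decorator
```

The point of the decorator approach is that only the few entry points that can race with the controller restart need to be wrapped, rather than adding retry logic inside every flow-manipulation method.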
Hi, On Wed, Sep 22, 2021 at 04:55:34PM +0200, Rodolfo Alonso Hernandez wrote:
Hello folks:
I replied in [1]. I think we could have a potential problem when executing a DB change that implies an OF controller re-initialization. Most of the time we don't have problems, but as we see in the CI, sometimes we do.
I'll push a patch to add a retry decorator on the methods that trigger this OF controller restart.
I see that patch https://review.opendev.org/c/openstack/neutron/+/810592 is already merged. Thanks Rodolfo for the quick fix. @Julia: can you check if the Ironic jobs are OK now?
Regards.
[1]https://bugs.launchpad.net/neutron/+bug/1944201/comments/4
Hi Slawek, I just pushed a patch to revert the change we did to work around the issue in Ironic [1]. Thanks! [1] https://review.opendev.org/c/openstack/ironic/+/810973 On Sat, Sep 25, 2021 at 09:04, Slawek Kaplonski <skaplons@redhat.com> wrote:
-- Att[]'s, Iury Gregory Melo Ferreira MSc in Computer Science at UFCG Part of the ironic-core and puppet-manager-core team in OpenStack Software Engineer at Red Hat Czech Social: https://www.linkedin.com/in/iurygregory E-mail: iurygregory@gmail.com
Hi, Thanks. Unfortunately it looks like the problem is still there: https://f17f24740cbe042b8de3-2bd0a90c56ef0ba69bd0efa56ba93067.ssl.cf1.rackcdn.com/810973/1/check/ironic-tempest-ipa-partition-pxe_ipmitool/a36b93f/controller/logs/screen-q-agt.txt We will need to investigate it on Monday. On Saturday, 25 September 2021, 09:10:04 CEST, Iury Gregory wrote:
Hi Slawek,
I just pushed a patch to revert the change we did to workaround the issue in Ironic [1]. Thanks!
[1] https://review.opendev.org/c/openstack/ironic/+/810973
Hi, On Sun, Sep 26, 2021 at 09:14:07AM +0200, Slawek Kaplonski wrote:
Hi,
Thanks. Unfortunately it looks like the problem is still there: https://f17f24740cbe042b8de3-2bd0a90c56ef0ba69bd0efa56ba93067.ssl.cf1.rackcdn.com/810973/1/check/ironic-tempest-ipa-partition-pxe_ipmitool/a36b93f/controller/logs/screen-q-agt.txt
We will need to investigate it on Monday.
Ok, it looks like I was too quick in saying that the patch didn't help. It seems that in the job https://zuul.opendev.org/t/openstack/build/a36b93faefde49dc8ac0b3426ec0281d/... it failed for a different reason, and neutron-ovs-agent didn't actually crash there.
participants (5)
- Iury Gregory
- Julia Kreger
- Lajos Katona
- Rodolfo Alonso Hernandez
- Slawek Kaplonski