nova-ovs-hybrid-plug CI jobs failing for Nova
Hi,

Over the weekend it appears that the nova-ovs-hybrid-plug CI jobs have been failing for Nova, I think because of this error:

"Failed to contact the endpoint at https://200.225.47.11/image for discovery. Fallback to using that endpoint as the base url."

https://zuul.opendev.org/t/openstack/build/fcc678f62f584bbb9726821f5992ae90 is an example of such a failure. The CI job then executes:

openstack --os-compute-api-version 2.74 server create --image --flavor --nic net-id= --host np0041269081 --wait evacuate-test

which is invalid because it doesn't have an argument to the --image flag (the --flavor value and net-id= are empty as well).

I am unsure who owns 200.225.47.11. Is it possible it's offline?

Thanks,
Michael
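The malformed command is what you get when the values that should follow --image, --flavor and net-id= come back empty, presumably because the image endpoint above could not be reached. A minimal sketch, assuming a Python wrapper builds the command (build_server_create_cmd is a hypothetical name, not part of the job), of failing fast instead of issuing a command with missing arguments:

```python
import shlex


def build_server_create_cmd(image_id: str, flavor_id: str, net_id: str,
                            host: str, name: str) -> list[str]:
    """Return argv for `openstack server create`, refusing empty IDs.

    Hypothetical helper: the real job script is not shown in this thread.
    """
    required = {"image": image_id, "flavor": flavor_id, "net-id": net_id}
    missing = [label for label, value in required.items() if not value]
    if missing:
        # Without this check an empty lookup produces "--image --flavor ...".
        raise ValueError(f"empty value(s) for: {', '.join(missing)}")
    return [
        "openstack", "--os-compute-api-version", "2.74",
        "server", "create",
        "--image", image_id,
        "--flavor", flavor_id,
        "--nic", f"net-id={net_id}",
        "--host", host,
        "--wait", name,
    ]


# What a naively formatted command line looks like when every lookup is empty:
template = "server create --image {image} --flavor {flavor} --nic net-id={net}"
print(shlex.split(template.format(image="", flavor="", net="")))
# ['server', 'create', '--image', '--flavor', '--nic', 'net-id=']
```

Returning an argv list (rather than a formatted string) also means each value stays a single argument even if it contains spaces.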
On 2025-06-30 05:36:00 +1000 (+1000), Michael Still wrote:
[...]
> I am unsure who owns 200.225.47.11. Is it possible it's offline?
It's a 2-node job, and that's the address of the controller node according to:

https://zuul.opendev.org/t/openstack/build/fcc678f62f584bbb9726821f5992ae90/...

Maybe check the logs for services collected from the controller to see if something crashed or failed to start?

--
Jeremy Stanley
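Checking the controller's service logs can be made a little more mechanical. A minimal sketch, assuming the logs have been downloaded from the Zuul build page as plain text files (the file names and the failure patterns below are assumptions, not something the job defines), that lists lines which look like crashes or startup failures:

```python
import re
import sys
from typing import Iterable

# Markers that usually indicate a crashed or failing service. This list is an
# assumption for illustration; adjust it to the log format being read.
_FAILURE_PATTERNS = (
    re.compile(r"\bERROR\b"),
    re.compile(r"Traceback \(most recent call last\)"),
)


def find_failures(lines: Iterable[str]) -> list[tuple[int, str]]:
    """Return (line_number, text) for every line matching a failure marker."""
    hits = []
    for number, line in enumerate(lines, start=1):
        if any(pattern.search(line) for pattern in _FAILURE_PATTERNS):
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Usage: python find_failures.py <downloaded-log-file> [...]
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, text in find_failures(handle):
                print(f"{path}:{number}: {text}")
```

The match on ERROR is deliberately case-sensitive so that ordinary lowercase mentions of "error" in informational messages are not reported.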
Thanks for the pointer. I think it's clear that I am not an expert on this job...

However, I note that both the original failed run and another one log this:

Warning: Permanently added '200.225.47.19' (ED25519) to the list of known hosts.
rsync: [sender] link_stat "/var/lib/zuul/builds/12aacd4951a64c0abef5cfe1bf925c2e/work/ca-bundle.pem" failed: No such file or directory (2)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1338) [sender=3.2.7]

That seems like it might be what is stopping the compute node from starting? I certainly cannot see any logs from nova in the compute node's zuul output. I'll continue to dig.

Michael

On Mon, Jun 30, 2025 at 6:15 AM Jeremy Stanley <fungi@yuggoth.org> wrote:
[...]
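The rsync exit status in Michael's log (code 23) means a partial transfer due to an error, and the error here is simply that the source file ca-bundle.pem did not exist. A minimal sketch, assuming a Python wrapper around rsync (push_file is a hypothetical name, not a helper the job uses), that checks the source first so a missing file is reported as a failure of whatever step should have produced it:

```python
import os
import subprocess

# A subset of the exit values documented in the rsync(1) man page.
RSYNC_EXIT_CODES = {
    0: "Success",
    1: "Syntax or usage error",
    11: "Error in file I/O",
    12: "Error in rsync protocol data stream",
    23: "Partial transfer due to error",
    24: "Partial transfer due to vanished source files",
    30: "Timeout in data send/receive",
}


def describe_rsync_exit(code: int) -> str:
    """Map an rsync exit status to its documented meaning."""
    return RSYNC_EXIT_CODES.get(code, f"unrecognised rsync exit code {code}")


def push_file(src: str, dest: str) -> None:
    """Hypothetical helper: copy one file with rsync, checking the source first."""
    if not os.path.isfile(src):
        # Report the missing input up front: the real failure is the earlier
        # step that should have created the file, not the copy itself.
        raise FileNotFoundError(f"{src} was never produced; check earlier steps")
    result = subprocess.run(["rsync", "-a", src, dest], check=False)
    if result.returncode != 0:
        raise RuntimeError(describe_rsync_exit(result.returncode))
```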
So, looking at the logs, devstack did not complete on the controller, so it did not even start running on the compute node.

The way these jobs work is that we fully deploy the all-in-one controller host, then we copy the data related to the TLS or Ceph secrets to the subnodes, and finally run devstack on those. Once all devstack tasks are complete, we proceed to do the tempest config and run the tests.

The controller failed when creating the initial neutron networks, which is not really surprising given that q-svc (the neutron server) was not deployed:

2025-06-29 09:01:01.801 | Error while executing command: HttpException: 500, Request Failed: internal server error while processing your request.
2025-06-29 09:01:01.140 | ++ lib/neutron_plugins/services/l3:create_neutron_initial_network:202 : oscwrap --os-cloud devstack --os-region RegionOne network create private -f value -c id

My guess is this is related to the eventlet removal. It looks like q-svc has been replaced with the neutron-api WSGI application

https://zuul.opendev.org/t/openstack/build/fcc678f62f584bbb9726821f5992ae90/...

which has a bunch of SQL errors

https://zuul.opendev.org/t/openstack/build/fcc678f62f584bbb9726821f5992ae90/...

almost as if the schema had not been applied properly. Ah yep:

https://zuul.opendev.org/t/openstack/build/fcc678f62f584bbb9726821f5992ae90/...

When it was run, it failed here:

https://zuul.opendev.org/t/openstack/build/fcc678f62f584bbb9726821f5992ae90/...

The hybrid plug job is currently testing with Debian 12, and I guess it's using Python 3.11 as a result. The Python version might be unrelated, but it is a delta from the normal Ubuntu 24.04 jobs.

The error seems to be:

sqlalchemy.exc.OperationalError: (pymysql.err.OperationalError) (1273, "Unknown collation: 'utf8mb4_0900_as_cs'")

so it looks like the version of MariaDB/MySQL that Debian 12 is shipping may not support that collation. So this is a regression introduced by https://github.com/openstack/neutron/commit/8990dd598f84b9da976d72672c6fd603...
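The deployment order described here is fail-fast: a controller failure stops the run before the subnodes are touched, which is why no nova logs appear for the compute node. A minimal sketch of that sequencing, assuming each stage can be modelled as a callable that reports success (the stage names and run_stages are illustrative, not the real Zuul/Ansible playbooks):

```python
from typing import Callable

# A stage is a human-readable name plus an action returning True on success.
Stage = tuple[str, Callable[[], bool]]


def run_stages(stages: list[Stage]) -> list[str]:
    """Run stages in order and stop at the first failure.

    Returns the names of the stages that actually ran, so a controller
    failure explains why later subnode stages left no logs at all.
    """
    ran = []
    for name, action in stages:
        ran.append(name)
        if not action():
            print(f"stage {name!r} failed; skipping the remaining stages")
            break
    return ran


stages = [
    ("deploy all-in-one controller", lambda: False),  # e.g. neutron DB migration fails
    ("copy TLS/Ceph data to subnodes", lambda: True),
    ("run devstack on subnodes", lambda: True),
    ("configure tempest and run tests", lambda: True),
]
print(run_stages(stages))  # only the controller stage ran
```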
Debian 12 is one of the tested runtimes, so neutron cannot unconditionally use the utf8mb4_0900_as_cs collation this cycle. It would be better to use utf8mb4_bin as the case-sensitive collation instead; that is much older than the new _0900_ variants.

On 29/06/2025 23:16, Michael Still wrote:
[...]
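The suggestion above is to use utf8mb4_bin outright. If a project instead wanted to keep the newer collation only on servers that support it, a conditional choice is one alternative. A minimal sketch, assuming the server's utf8mb4 collation names have already been fetched with the query below (pick_case_sensitive_collation and the fallback order are hypothetical, not neutron's actual fix):

```python
# Newer MySQL 8.0 collation used by the regressing change, and the older
# case-sensitive collation suggested as a replacement in this thread.
PREFERRED = "utf8mb4_0900_as_cs"
FALLBACK = "utf8mb4_bin"

# Standard INFORMATION_SCHEMA query (MySQL and MariaDB) listing the
# collations a server actually supports for utf8mb4.
AVAILABLE_COLLATIONS_SQL = (
    "SELECT COLLATION_NAME FROM information_schema.COLLATIONS "
    "WHERE CHARACTER_SET_NAME = 'utf8mb4'"
)


def pick_case_sensitive_collation(available: set[str]) -> str:
    """Hypothetical helper: prefer the _0900_ collation only if the server has it."""
    if PREFERRED in available:
        return PREFERRED
    if FALLBACK in available:
        return FALLBACK
    raise LookupError("server offers no known case-sensitive utf8mb4 collation")


# A server that rejects the _0900_ name (error 1273) falls back to utf8mb4_bin:
print(pick_case_sensitive_collation({"utf8mb4_bin", "utf8mb4_general_ci"}))
# utf8mb4_bin
```

The trade-off is that the same schema would then compare strings slightly differently depending on the database server, which is one reason to prefer the single, widely available utf8mb4_bin.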
participants (3)
- Jeremy Stanley
- Michael Still
- Sean Mooney