[openstack-ansible] Weird install failure
Hi all,

I think I'm close to getting things working at this point, but I'm seeing a weird install failure during openstack.osa.setup_openstack:

FAILED - RETRYING: [ostk-controller-heat-api-container-7d1e8599 -> ostk-controller-utility-container-db5068d8]: Add service users to roles (1 retries left).Result was:
  attempts: 5
  censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'
  changed: false
  retries: 6
Using module file /etc/ansible/ansible_collections/openstack/cloud/plugins/modules/role_assignment.py
Pipelining is enabled.
<ostk-controller> ESTABLISH SSH CONNECTION FOR USER: root
<ostk-controller> SSH: ansible.cfg set ssh_args: (-C)(-o)(ControlMaster=auto)(-o)(ControlPersist=300)
<ostk-controller> SSH: ANSIBLE_HOST_KEY_CHECKING/host_key_checking disabled: (-o)(StrictHostKeyChecking=no)
<ostk-controller> SSH: ANSIBLE_REMOTE_PORT/remote_port/ansible_port set: (-o)(Port=22)
<ostk-controller> SSH: ansible_password/ansible_ssh_password not set: (-o)(KbdInteractiveAuthentication=no)(-o)(PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey)(-o)(PasswordAuthentication=no)
<ostk-controller> SSH: ANSIBLE_REMOTE_USER/remote_user/ansible_user/user/-u set: (-o)(User="root")
<ostk-controller> SSH: ANSIBLE_TIMEOUT/timeout set: (-o)(ConnectTimeout=5)
<ostk-controller> SSH: Set ssh_extra_args: (-o)(UserKnownHostsFile=/dev/null)(-o)(StrictHostKeyChecking=no)(-o)(ServerAliveInterval=64)(-o)(ServerAliveCountMax=1024)(-o)(Compression=no)(-o)(TCPKeepAlive=yes)(-o)(VerifyHostKeyDNS=no)(-o)(ForwardX11=no)(-o)(ForwardAgent=yes)(-T)
<ostk-controller> SSH: found only ControlPersist; added ControlPath: (-o)(ControlPath="/root/.ansible/cp/d3dfd6cdfc")
<ostk-controller> SSH: EXEC ssh -vvvvv -C -o ControlMaster=auto -o ControlPersist=300 -o StrictHostKeyChecking=no -o Port=22 -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=5 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ServerAliveInterval=64 -o ServerAliveCountMax=1024 -o Compression=no -o TCPKeepAlive=yes -o VerifyHostKeyDNS=no -o ForwardX11=no -o ForwardAgent=yes -T -o 'ControlPath="/root/.ansible/cp/d3dfd6cdfc"' ostk-controller 'sudo lxc-attach --clear-env --name ostk-controller-utility-container-db5068d8 -- su - root -c '"'"'/bin/sh -c '"'"'"'"'"'"'"'"'/openstack/venvs/utility-30.1.0.dev46/bin/python && sleep 0'"'"'"'"'"'"'"'"''"'"''
<ostk-controller> rc=1, stdout and stderr censored due to no log
<ostk-controller> Failed to connect to the host via ssh: <error censored due to no log>
failed: [ostk-controller-heat-api-container-7d1e8599 -> ostk-controller-utility-container-db5068d8(192.168.100.117)] (item=None) =>
  attempts: 5
  censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'
  changed: false
fatal: [ostk-controller-heat-api-container-7d1e8599 -> {{ _service_setup_host }}]: FAILED! =>
  censored: 'the output has been hidden due to the fact that ''no_log: true'' was specified for this result'
  changed: false

These logs came from running ansible with -vvvvv as a parameter. Looking at the failed ssh, this root causes pretty straightforwardly:

ssh -vvvvv -C -o ControlMaster=auto -o ControlPersist=300 -o StrictHostKeyChecking=no -o Port=22 -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=5 -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -o ServerAliveInterval=64 -o ServerAliveCountMax=1024 -o Compression=no -o TCPKeepAlive=yes -o VerifyHostKeyDNS=no -o ForwardX11=no -o ForwardAgent=yes -T -o 'ControlPath="/root/.ansible/cp/d3dfd6cdfc"' ostk-controller 'sudo lxc-attach --clear-env --name ostk-controller-utility-container-db5068d8 -- su - root -c '"'"'/bin/sh -c '"'"'"'"'"'"'"'"'/openstack/venvs/utility-30.1.0.dev46/bin/python && sleep 0'"'"'"'"'"'"'"'"''"'"''

If I manually ssh to ostk-controller and lxc-attach -n ostk-controller-utility-container-db5068d8, then run the command explicitly:

root@ostk-controller-utility-container-db5068d8:~# /openstack/venvs/utility-30.1.0.dev46/bin/python && sleep 0
Python 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
KeyboardInterrupt
exit()

So basically it's invoking an interactive python session, which of course times out, and after 5 retries the installation aborts. Something is obviously missing somewhere, most likely a parameter after the python command which specifies a python script to be run.

Any ideas?

Thanks all,
Sarah
Hi,

It would be very nice if you could include at least the name of the failing task, for context. But I think what you were after is just a red herring: no_log is supplied to the task, so you won't see the real output or command regardless of verbosity.

If you're trying to install from master, I think this is the bug you are facing:
https://bugs.launchpad.net/openstack-ansible/+bug/2103512

And the fix for it has just merged as well:
https://opendev.org/openstack/openstack-ansible-plugins/commit/7b579a69eb225...

I think you should be able to pull a newer version of the collection with the fix included by re-running bootstrap-ansible.sh.

Also, if I have guessed correctly where the issue happens, you should be able to quickly reproduce it and see the actual error by running this:

openstack-ansible openstack.osa.heat -e _service_setup_nolog=False --tags common-service

On Sat, 22 Mar 2025, 00:09 Sarah Thompson, <plodger@gmail.com> wrote:
[...]
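A small sketch may help show why the bare python invocation in the log is probably not the problem. The log states that pipelining is enabled, and in that mode Ansible is understood to send the module source to the remote interpreter over standard input instead of passing a script path. A Python interpreter started with no arguments and a non-terminal stdin simply runs whatever it reads from there. The example below is only an illustration under that assumption: `run_via_stdin` and both payload strings are hypothetical stand-ins, not anything taken from Ansible or openstack-ansible.

```python
import subprocess
import sys


def run_via_stdin(source: str) -> subprocess.CompletedProcess:
    """Run `source` by piping it to a bare interpreter (no script argument)."""
    return subprocess.run(
        [sys.executable],      # same shape as the logged command: python, no args
        input=source,          # the program arrives on stdin, as with pipelining
        capture_output=True,
        text=True,
        timeout=30,
    )


# Hypothetical payload standing in for a module that succeeds.
ok = run_via_stdin("print('module ran from stdin')\n")

# Hypothetical payload standing in for a module that fails.
failed = run_via_stdin("raise SystemExit(1)\n")

print(ok.returncode, ok.stdout.strip())
print(failed.returncode)
```

Run by hand on a terminal, the same command opens an interactive prompt because stdin is a TTY, which matches what was observed inside the container. Under this reading, the rc=1 in the log would come from the module itself failing, and the real error stays hidden until no_log is turned off as suggested in the reply.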
participants (2)
- Dmitriy Rabotyagov
- Sarah Thompson