[kayobe][victoria] no module named docker - deploy fail after deploy successful
I had a successful deployment of OpenStack Victoria via Kayobe, with an all-in-one node running the controller and compute roles. I then wanted to add 2 controller nodes to make 3 controllers and one compute node. The 2 additional controllers use a different interface naming scheme, so I needed to modify the inventory. I checked this Ansible documentation to figure out the changes I'd need to make [1]. On the first try I misunderstood the layout, because Kayobe tried to configure the new nodes' interfaces with incorrect names.
In my first try I used this inventory layout:
Move the existing configuration:
[kayobe config] / Inventory / * > [kayobe config] / Inventory / Ciscohosts /
Create a new directory from a copy of the existing one:
[kayobe config] / Inventory / Ciscohosts / > [kayobe config] / Inventory / Otherhosts /
Otherhosts had its own hosts file containing the 2 new controllers, and a group_vars/controllers network interface configuration matching those two hosts. It didn't achieve the desired result, however, so I rechecked the Ansible doc [1] and decided to do it another way, as follows:
In my second try I first reversed the inventory change:
Delete the new dir: [kayobe config] / Inventory / Otherhosts /
Move back the config: [kayobe config] / Inventory / Ciscohosts / * > [kayobe config] / Inventory /
Delete the empty dir: [kayobe config] / Inventory / Ciscohosts /
Then create host_vars for the two individual hosts (a sketch of what such a file might contain follows below):
[kayobe config] / Inventory / host_vars / cnode2
[kayobe config] / Inventory / host_vars / cnode3
And updated the single hosts inventory file, [kayobe config] / Inventory / hosts.
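For anyone reading along, a host_vars file of this kind only needs to set the per-network interface variables for that particular host, overriding what group_vars provides. The snippet below is a minimal sketch, not taken from this deployment: it assumes Kayobe's "<network>_interface" variable convention, and the network and interface names are placeholders to be matched against your own networks.yml.

# inventory/host_vars/cnode2 -- illustrative sketch only.
# The network names (provision_oc_net, internal_net, external_net) and the
# interface names are placeholders, not values from this thread.
provision_oc_net_interface: eno1
internal_net_interface: eno2
external_net_interface: eno3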
This seemed to work fine: “kayobe overcloud host configure” was successful and the hosts' interfaces were set up as I desired.
The issue came when running “kayobe overcloud service deploy”: it failed with "/usr/bin/python3", "-c", "import docker" = ModuleNotFoundError: No module named 'docker' on all three nodes, whereas previously the deployment had been successful for the all-in-one node. I do not know whether this task had run or been skipped before, but the task runs against the "baremetal" group, and controllers and compute are in this group, so I assume it had run successfully in previous deployments. That is the weird thing, because no other changes have been made apart from those described here.
Verbose error output: [3]
After the above error, I reverted the inventory back to the “working” state, which basically meant updating the inventory hosts file to remove the 2 controllers, and removing the whole host_vars directory. After doing this, however, the same error is still seen: "/usr/bin/python3", "-c", "import docker" = ModuleNotFoundError.
I logged into the host and ran the import manually on the CLI, and I see the same output. What I don't understand is why this error is occurring now, after previous successful deployments.
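In case it helps with debugging, here is a small ad-hoc playbook sketch (hypothetical, not something from this thread) that does roughly what the failing precheck does: it reports which Python interpreter Ansible discovers on each host and whether that interpreter can import the docker SDK. It assumes it is run against the same inventory kolla-ansible uses, where the "baremetal" group mentioned above exists.

---
# check_docker_sdk.yml -- hypothetical diagnostic playbook.
- hosts: baremetal          # the group the failing precheck targets
  gather_facts: true
  tasks:
    - name: Show the Python interpreter Ansible will use on this host
      debug:
        var: ansible_python.executable

    - name: Try importing the docker SDK with that interpreter
      command: "{{ ansible_python.executable }} -c 'import docker'"
      changed_when: false
      ignore_errors: true   # keep going so every host reports a result

The idea is simply to compare the interpreter reported here with wherever the docker SDK package was actually installed (for example, inside a virtualenv).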
To try to resolve or work around this issue, I have tried the following, to no avail:
- recreating virtual environments on the all-in-one node
- recreating virtual environments on the ACH
- deleting the [kolla config] directory
- deleting the .ansible and /tmp/ caches
- turning off pipelining
After doing the above I needed to run the control host bootstrap and host configure again before the service deploy; however, the same error persisted, and none of the above steps worked around it.
As a test, I decided to turn off this task in the playbook [4], so the yml file now runs as follows: [2]. This results in a (maybe pseudo-) successful deployment again, in the sense that it deploys without failure because that task does not run.
After this deployed successfully once again, as it had previously, I added the two controller nodes using the host_vars approach, and I was then able to deploy successfully with HA controllers. Well, it is successful apart from a Designate issue caused by Designate already having the configuration [5]. I can log in to the Horizon dashboard and, under system information, I can see all three controllers there.
Could I ask the community for help with:
1. Regarding the Kayobe inventory, is there anything wrong with the second attempt, as far as Kayobe is concerned?
2. Has anyone come across this docker issue (or something similar, in this context of failing after previously being successful) and can suggest a cause?
I repeatedly get these odd issues where successful deployments later fail. This often occurs after making a config change and then rolling it back, but the rollback does not return the deployment to a working state. The fix/workaround for me in these cases is to run “kayobe overcloud service destroy --yes-i-really-really-mean-it” and then re-deploy the host.
[1] Best Practices — Ansible Documentation: https://docs.ansible.com/ansible/2.8/user_guide/playbooks_best_practices.html
[2] Modified "Checking docker SDK version" task: https://pastebin.com/xXJFHc0N
[3] Verbose output of TASK [prechecks : Checking docker SDK version]: https://pastebin.com/ZmLs4LUZ
[4] /home/cv-user/kayobe-victoria/venvs/kolla-ansible/share/kolla-ansible/ansible/roles/prechecks/tasks/package_checks.yml
[5] TASK [designate : Update DNS pools]: https://pastebin.com/USiCAXpT
Kind regards,
Tony Pearce
On Thu, 1 Jul 2021 at 16:31, Mark Goddard <mark@stackhpc.com> wrote:

On Tue, 29 Jun 2021 at 12:03, Tony Pearce <tonyppe@gmail.com> wrote:
> [original message quoted in full; trimmed]
> The issue came when running "kayobe overcloud service deploy": it failed with ModuleNotFoundError: No module named 'docker' on all three nodes [...]
Perhaps you somehow lost this file: https://opendev.org/openstack/kayobe-config/src/branch/master/etc/kayobe/inv...
> Perhaps you somehow lost this file:
> https://opendev.org/openstack/kayobe-config/src/branch/master/etc/kayobe/inv...

I thought that too, but I did not change those files at all. I also duplicated the values from that file and pasted them in above the network config for the individual nodes; it did not have any effect on the outcome, so I removed them again.

In the end, I was able to fix this issue by deleting the nodes and redeploying from the same ACH.

Tony Pearce
participants (2): Mark Goddard, Tony Pearce