[kayobe][victoria] no module named docker - deploy fail after deploy successful
Tony Pearce
tonyppe at gmail.com
Tue Jun 29 11:02:29 UTC 2021
I had a successful deployment of OpenStack Victoria via Kayobe with an
all-in-one node running the controller and compute roles. I then wanted to add
2 controller nodes to make 3 controllers and one compute. The 2 additional
controllers have different interface naming, so I needed to modify the
inventory. I checked the Ansible documentation [1] to figure out the changes
I’d need to make. On the first try I evidently misunderstood the layout, because
Kayobe tried to configure the new nodes' interfaces with incorrect naming.
In my first try I used this inventory layout:
Move the existing configuration:
[kayobe config] / Inventory / *
to
[kayobe config] / Inventory / Ciscohosts /
Create a new directory from a copy of the existing one:
[kayobe config] / Inventory / Ciscohosts /
to
[kayobe config] / Inventory / Otherhosts /
The Otherhosts directory had its own hosts file with the 2 controllers, and
group_vars/controllers with the network interface configuration for these two
hosts. This didn't achieve the desired result, so I rechecked the
Ansible doc [1] and decided to do this another way, as follows:
In my second try I first reversed the inventory change:
Delete the new dir: [kayobe config] / Inventory / Otherhosts /
Move back the config: [kayobe config] / Inventory / Ciscohosts / * >
[kayobe config] / Inventory /
Delete the empty dir: [kayobe config] / Inventory / Ciscohosts /
Then create host_vars for the two individual hosts:
[kayobe config] / Inventory / host_vars / cnode2
[kayobe config] / Inventory / host_vars / cnode3
And updated the single hosts inventory file [kayobe config] / Inventory /
hosts
This seemed to work fine: the “kayobe overcloud host configure” was
successful and the hosts' interfaces were set up as I desired.
The issue came when running “kayobe overcloud service deploy”, which failed
on all three nodes with "/usr/bin/python3", "-c", "import docker" =
ModuleNotFoundError: No module named 'docker', where previously the
deployment had been successful for the all-in-one node. I do not know whether
this task had run or been skipped before, but the task runs against the
"baremetal" group, and controllers and compute are in this group, so I assume
that it had run successfully in previous deployments. This is the weird thing,
because no other changes have been made apart from those described here.
Verbose error output: [3]
After the above error, I reverted the inventory back to the “working”
state, which basically meant updating the inventory hosts file to remove the 2
controllers and removing the whole host_vars directory. After doing
this, however, the same error is still seen: "/usr/bin/python3", "-c", "import
docker" = ModuleNotFoundError.
I logged into the host and tried to run this manually on the CLI and I see
the same output. What I don’t understand is why this error is occurring now
after previous successful deployments.
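For anyone wanting to repeat the manual check, a minimal Python sketch along these lines can show whether a given interpreter is able to import a module. The function name module_importable and the list of interpreters are illustrative assumptions, not part of Kayobe or kolla-ansible; the only thing taken from the failing task is the `<interpreter> -c "import docker"` invocation.

```python
import subprocess
import sys


def module_importable(interpreter, module):
    """Return True if `interpreter -c "import <module>"` exits with status 0."""
    try:
        result = subprocess.run(
            [interpreter, "-c", f"import {module}"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            check=False,
        )
    except OSError:
        # The interpreter path does not exist or is not executable.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    # Hypothetical list: the system interpreter named in the error message,
    # plus whichever interpreter runs this script (e.g. one from a virtualenv).
    for interpreter in ("/usr/bin/python3", sys.executable):
        status = "ok" if module_importable(interpreter, "docker") else "MISSING"
        print(f"{interpreter}: docker {status}")
```

If the interpreters disagree, that would suggest the docker module is installed in one Python environment but not in the one the precheck task uses; it does not by itself explain why that changed.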
To try to resolve or work around this issue I have tried the following, to no avail:
- recreating virtual environments on all-in-one node
- recreating virtual environments on ACH
- deleting the [kolla config] directory
- deleting .ansible and /tmp/ caches
- turning off pipelining
After doing the above I needed to run the control host bootstrap and host
configure before service deploy. However, the same error persisted and I
could not work around it with any of the above steps.
As a test, I decided to turn off this task in the playbook [4]; the modified
yml file reads as follows: [2]. This results in a (maybe pseudo) successful
deployment again, in the sense that it deploys without failure because that
task does not run.
After this deployed successfully once again, as it had previously,
I added the two controller nodes using the “host_vars” and was then
able to successfully deploy again with HA controllers. Well, it is
successful apart from a Designate issue caused by Designate already having the
config [5]. I can log in to the Horizon dashboard, and under system
information I can see all three controllers there.
Could I ask the community for help with:
1. Regarding the kayobe inventory, is anything wrong with the 2nd attempt
   in line with Kayobe?
2. Has anyone come across this docker issue (or similar within this context
   of failing after being successful) and can suggest?
I repeatedly get these odd issues where successful deployments then fail in
the future. This often occurs after making a config change and then rolling
back but the roll back does not return to a working deployment state. The
fix/workaround for me in these cases is to “kayobe overcloud service
destroy --yes-i-really-really-mean-it” and also re-deploy the host.
[1] Best Practices, Ansible Documentation
<https://docs.ansible.com/ansible/2.8/user_guide/playbooks_best_practices.html>
[2] Modified "Checking docker SDK version" task, Pastebin.com
<https://pastebin.com/xXJFHc0N>
[3] TASK [prechecks : Checking docker SDK version], Pastebin.com
<https://pastebin.com/ZmLs4LUZ>
[4] /home/cv-user/kayobe-victoria/venvs/kolla-ansible/share/kolla-ansible/ansible/roles/prechecks/tasks/package_checks.yml
[5] TASK [designate : Update DNS pools], Pastebin.com
<https://pastebin.com/USiCAXpT>
Kind regards,