I had a successful deployment of Openstack Victoria via Kayobe with an all-in-one node running controller and compute roles. I wanted to then add 2 controller nodes to make 3 controllers and one compute. The 2 additional controllers have a different interface naming so I needed to modify the inventory. I checked this ansible documentation to figure out the changes I’d need to make [1]. The first try, I misunderstood the layout because kayobe tried to configure the new nodes interfaces with incorrect naming. 

In my first try I tried this inventory layout: 


Move existing configuration:


[kayobe config] / Inventory / * 

To

[kayobe config] / Inventory / Ciscohosts / 


Create new from copy of existing:[kayobe config] / Inventory / Ciscohosts / > 

[kayobe config] / Inventory / Otherhosts / 


The Otherhosts had its own host file with the 2 controllers and group_vars/controllers network interface configuration as per these two hosts. But anyway it didnt achieve the desired result so I rechecked the ansible doc [1] and decided to do this another way as follows: 


In my second try I first reversed the inventory change:


Delete the new dir: [kayobe config] / Inventory / Otherhosts / 

Move back the config: [kayobe config] / Inventory / Ciscohosts / * > [kayobe config] / Inventory / 

Delete the empty dir: [kayobe config] / Inventory / Ciscohosts /


Then create host_vars for the two individual hosts:

[kayobe config] / Inventory / host_vars / cnode2

[kayobe config] / Inventory / host_vars / cnode3


And updated the single hosts inventory file [kayobe config] / Inventory / hosts


This seemed to work fine, the “kayobe overcloud host configure” was successful and the hosts interfaces were set up as I desired. 


The issue came when doing the “kayobe overcloud service deploy” and failed with "/usr/bin/python3", "-c", "import docker” = ModuleNotFoundError: No module named 'docker' for all three nodes, where previously it (the deployment) had been successful for the all-in-one node. I do not know if this task had run or skipped before but the task is run against "baremetal" group and controllers and compute are in this group so I assume that it had been ran successfully in previous deployments and this is the weird thing because no other changes have been made apart from as described here.

Verbose error output: [3]


After the above error, I reverted the inventory back to the “working” state, which is basically to update the inventory hosts and removed the 2 controllers. As well as remove the whole host_vars directory. After doing this however, the same error is still seen /usr/bin/python3", "-c", "import docker” = ModuleNotFoundError. 


I logged into the host and tried to run this manually on the CLI and I see the same output. What I don’t understand is why this error is occurring now after previous successful deployments. 


To try and resolve /workaround this issue I have tried to no avail:
- recreating virtual environments on all-in-one node

- recreating virtual environments on ACH

- deleting the [kolla config] directory

- deleting .ansible and /tmp/ caches

- turning off pipelining


After doing the above I needed to do the control host bootstrap and host configure before service deploy however the same error persisted and I could not work around it with any of the above steps being performed.


As a test, I decided to turn off this task in the playbook [4] and the yml file runs as follows: [2]. This results in a (maybe pseudo) successful deployment again, in a sense that it deploys without failure because that task does not run. 

After this was successful in deploying once again as it had previously had been, I added the two controller nodes using the “host_vars” and then I was able to successfully deploy again with HA controllers. Well, it is successful apart from Designate issue due to Designate already having the config [5]. I can log in to the horizon dashboard and under system information I can see all three controllers there.


Could I ask the community for help with:

  1. Regarding the kayobe inventory, is anything wrong with the 2nd attempt in line with Kayobe?

  2. Has anyone come across this docker issue (or similar within this context of failing after being successful) and can suggest? 


I repeatedly get these odd issues where successful deployments then fail in the future. This often occurs after making a config change and then rolling back but the roll back does not return to a working deployment state. The fix/workaround for me in these cases is to “kayobe overcloud service destroy --yes-i-really-really-mean-it” and also re-deploy the host.


[1] Best Practices — Ansible Documentation

[2] modified Checking docker SDK version# command: "{{ ansible_python.execut - Pastebin.com

[3] TASK [prechecks : Checking docker SDK version] ********************************* - Pastebin.com

[4] /home/cv-user/kayobe-victoria/venvs/kolla-ansible/share/kolla-ansible/ansible/roles/prechecks/tasks/package_checks.yml

[5] TASK [designate : Update DNS pools] ******************************************** - Pastebin.com


Kind regards,