[TripleO] Unable to deploy Overcloud Machines

Yatin Karel ykarel at redhat.com
Tue Dec 28 11:27:52 UTC 2021


Hi Anirudh,

On Tue, Dec 28, 2021 at 4:32 PM Anirudh Gupta <anyrude10 at gmail.com> wrote:

> Hi Yatin,
>
> Thanks a lot for your help. I am deleting the stack and running the
> overcloud deploy command as a process.
>
> Changing the NTP settings helped me proceed further.
>
The current result looks good; the deployment has moved ahead a lot. Going
by the latest logs/error, I think you still have issues with NTP[1]. You can
confirm this by running the "chronyc sources" command on the overcloud
nodes. The current issue should clear once NTP is working. Once you have
checked/fixed NTP, you can rerun the same overcloud deploy command and it
should move forward.

[1] https://bugs.launchpad.net/tripleo/+bug/1955414
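
For the NTP check, here is a minimal sketch that can be run on each
overcloud node. It is not taken from your logs, and the has_synced_source
helper name is my own (hypothetical). It relies on "chronyc sources" marking
the currently selected source with "^*" at the start of the line:

```shell
#!/bin/sh
# has_synced_source is a hypothetical helper name, not part of chrony.
# It reads "chronyc sources" output on stdin and succeeds only if some
# line starts with "^*", the marker chronyc uses for the source it is
# currently synchronised to.
has_synced_source() {
    grep -q '^\^\*'
}

# Only attempt the live check where chronyc is actually installed.
if command -v chronyc >/dev/null 2>&1; then
    if chronyc sources | has_synced_source; then
        echo "NTP: synchronised to a source"
    else
        echo "NTP: no selected source, check the configured NTP servers"
    fi
fi
```

If it reports no selected source, the NTP servers configured for the
deployment are probably not reachable from that node.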

>
> But it seems the issues are not ending here.
>
> I would require some more help from you in order to deploy this.
>
> *Issue:*
>
> FATAL | Check Keystone service status | undercloud | item=heat-cfn |
> error={"ansible_job_id": "687227427425.307276", "ansible_loop_var":
> "tripleo_keystone_resources_service_async_result_item", "attempts": 1,
> "changed": false, "extra_data": {"data": null, "details": "The request you
> have made requires authentication.", "response":
> "{\"error\":{\"code\":401,\"message\":\"The request you have made requires
> authentication.\",\"title\":\"Unauthorized\"}}\n"}, "finished": 1, "msg":
> "Failed to list services: Client Error for url:
> http://10.10.30.222:5000/v3/services, The request you have made requires
> authentication.",
> "tripleo_keystone_resources_service_async_result_item": {"ansible_job_id":
> "687227427425.307276", "ansible_loop_var":
> "tripleo_keystone_resources_data", "changed": true, "failed": false,
> "finished": 0, "results_file": "/root/.ansible_async/687227427425.307276",
> "started": 1, "tripleo_keystone_resources_data": {"key": "heat-cfn",
> "value": {"endpoints": {"admin": "http://10.10.30.222:8000/v1",
> "internal": "http://10.10.30.222:8000/v1", "public": "
> http://10.10.30.222:8000/v1"}, "region": "regionOne", "service":
> "cloudformation", "users": {"heat-cfn": {"password":
> "3f3tHhxhna1CpRVPMjF7po49F"}}}}}}
>
>
> PFA the ansible.log file.
>
> Thanks for your help and patience.
>
> Regards
> Anirudh Gupta
>
> On Tue, Dec 28, 2021 at 2:28 PM Yatin Karel <ykarel at redhat.com> wrote:
>
>> Hi Anirudh,
>>
>> I am not sure what is causing this issue, and the shared log file is
>> incomplete. I believe you ran the command against the same overcloud
>> deployment that was failing earlier (when docker-ha.yaml was not passed).
>> If so, to rule out the already-deployed environment as the cause, you
>> can delete the overcloud and then redeploy with the correct environment
>> files, as used in the last run.
>>
>> One possible reason I found for the password expiration is that the time
>> is not in sync on the overcloud nodes. So it would be good to check that
>> as well and fix it (by using correct NTP sources) before attempting
>> redeployment.
>>
>> Thanks and regards
>> Yatin Karel
>>
>>
>>
>> On Tue, Dec 28, 2021 at 2:03 PM Anirudh Gupta <anyrude10 at gmail.com>
>> wrote:
>>
>>> Hi Yatin & Team
>>>
>>> Thanks for your response.
>>>
>>> When I executed the command as below, the installation moved ahead and
>>> encountered another error.
>>>
>>> openstack overcloud deploy --templates \
>>>     -r /home/stack/templates/roles_data.yaml \
>>>     -e /home/stack/templates/node-info.yaml \
>>>     -e environment.yaml \
>>>     -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
>>>     -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
>>>     -e /home/stack/containers-prepare-parameter.yaml
>>>
>>> *Issue:*
>>> The error was: keystoneauth1.exceptions.http.Unauthorized: *The
>>> password is expired and needs to be changed for user*:
>>> 4f7d1dbf58574e64af9e359cb98ccbbc. (HTTP 401) (Request-ID:
>>> req-b29aa655-e3ec-4d4b-8ada-397f9a132582)
>>>
>>> I am attaching the ansible.logs for your reference. It would be a great
>>> help if you could suggest some pointers to resolve this issue.
>>>
>>> Regards
>>> Anirudh Gupta
>>>
>>> On Tue, Dec 28, 2021 at 11:13 AM Yatin Karel <ykarel at redhat.com> wrote:
>>>
>>>> Hi Anirudh,
>>>>
>>>> As said, order is important here: docker-ha.yaml should be followed by
>>>> podman.yaml. Parameters in an environment file override the same
>>>> parameters from environment files passed before it, and that makes the
>>>> deployment use podman instead of docker. The parameter that makes this
>>>> switch is "ContainerCli".
>>>>
>>>>
>>>> Thanks and regards
>>>> Yatin Karel
>>>>
>>>> On Tue, Dec 28, 2021 at 10:59 AM Anirudh Gupta <anyrude10 at gmail.com>
>>>> wrote:
>>>>
>>>>> If this is a docker-ha issue, then that has also been tried.
>>>>>
>>>>> Since this is CentOS 8, there is no docker available. If I pass
>>>>> docker-ha.yaml, it gives the following error:
>>>>>
>>>>> FATAL | Pull
>>>>> undercloud.ctlplane.localdomain:8787/tripleotraincentos8/centos-binary-cinder-volume:current-tripleo
>>>>> image | overcloud-controller-1 | error={"changed": true, "cmd": "docker
>>>>> pull
>>>>> undercloud.ctlplane.localdomain:8787/tripleotraincentos8/centos-binary-cinder-volume:current-tripleo",
>>>>> "delta": "0:00:00.005932", "end": "2021-12-27 12:42:33.927484", "msg":
>>>>> "non-zero return code", "rc": 127, "start": "2021-12-27 12:42:33.921552",
>>>>> "stderr": "/bin/sh: docker: command not found", "stderr_lines":
>>>>> ["/bin/sh: docker: command not found"], "stdout": "", "stdout_lines": []}
>>>>>
>>>>> Regards
>>>>> Anirudh Gupta
>>>>>
>>>>> On Tue, Dec 28, 2021 at 10:26 AM Yatin Karel <ykarel at redhat.com>
>>>>> wrote:
>>>>>
>>>>>> Hi Anirudh,
>>>>>>
>>>>>> Sorry, which timer? Timer adjustment is not needed for the issue you
>>>>>> are seeing. If you mean the overcloud deploy timeout, overcloud deploy
>>>>>> provides the --timeout option for that. The best option for now is to
>>>>>> try docker-ha and podman in order, as suggested earlier.
>>>>>>
>>>>>>
>>>>>> Thanks and Regards
>>>>>> Yatin Karel
>>>>>>
>>>>>> On Tue, Dec 28, 2021 at 10:12 AM Anirudh Gupta <anyrude10 at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks Yatin for your response.
>>>>>>>
>>>>>>> Please suggest how this timer can be increased, or any other steps
>>>>>>> that need to be followed to rectify this.
>>>>>>>
>>>>>>> Regards
>>>>>>> Anirudh Gupta
>>>>>>>
>>>>>>> On Tue, Dec 28, 2021 at 10:08 AM Yatin Karel <ykarel at redhat.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi Anirudh,
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Dec 27, 2021 at 9:39 PM Anirudh Gupta <anyrude10 at gmail.com>
>>>>>>>> wrote:
>>>>>>>> >
>>>>>>>> > Hi Team,
>>>>>>>> >
>>>>>>>> > I am trying to deploy TripleO Train with 3 controllers and 1 compute
>>>>>>>> > node. For overcloud images, I have a registry server on the
>>>>>>>> > undercloud only.
>>>>>>>> >
>>>>>>>> > I executed the following command to deploy overcloud
>>>>>>>> >
>>>>>>>> > openstack overcloud deploy --templates \
>>>>>>>> >     -r /home/stack/templates/roles_data.yaml \
>>>>>>>> >     -e /home/stack/templates/node-info.yaml \
>>>>>>>> >     -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml \
>>>>>>>> >     -e /home/stack/containers-prepare-parameter.yaml
>>>>>>>> >
>>>>>>>> > The command ran for around 1.5 hours. Initially the stack was created
>>>>>>>> > successfully, and after that Ansible tasks ran for 45 minutes. It then
>>>>>>>> > gave the following error on overcloud-controller-0:
>>>>>>>> >
>>>>>>>> > 2021-12-27 11:12:27,507 p=181 u=mistral n=ansible | 2021-12-27
>>>>>>>> 11:12:27.506838 | 525400b1-b522-2a06-ea9d-00000000356f |         OK | Debug
>>>>>>>> output for task: Start containers for step 2 | overcloud-novacompute-0 |
>>>>>>>> result={
>>>>>>>> >     "changed": false,
>>>>>>>> >     "failed_when_result": false,
>>>>>>>> >     "start_containers_outputs.stdout_lines | default([]) |
>>>>>>>> union(start_containers_outputs.stderr_lines | default([]))": [
>>>>>>>> >
>>>>>>>> "f206c31a781641313aa4a0499c62475efc335de6faea785cd4e855dc32ebb571",
>>>>>>>> >         "",
>>>>>>>> >         "Info: Loading facts",
>>>>>>>> >         "Notice: Compiled catalog for
>>>>>>>> overcloud-novacompute-0.localdomain in environment production in 0.05
>>>>>>>> seconds",
>>>>>>>> >         "Info: Applying configuration version '1640604309'",
>>>>>>>> >         "Notice:
>>>>>>>> /Stage[main]/Tripleo::Profile::Base::Neutron::Ovn_metadata_agent_wrappers/Tripleo::Profile::Base::Neutron::Wrappers::Haproxy[ovn_metadata_haproxy_process_wrapper]/File[/var/lib/neutron/ovn_metadata_haproxy_wrapper]/ensure:
>>>>>>>> defined content as '{md5}5bb050ca70c01981975efad9d8f81f2d'",
>>>>>>>> >         "Info:
>>>>>>>> Tripleo::Profile::Base::Neutron::Wrappers::Haproxy[ovn_metadata_haproxy_process_wrapper]:
>>>>>>>> Unscheduling all events on
>>>>>>>> Tripleo::Profile::Base::Neutron::Wrappers::Haproxy[ovn_metadata_haproxy_process_wrapper]",
>>>>>>>> >         "Info: Creating state file
>>>>>>>> /var/lib/puppet/state/state.yaml",
>>>>>>>> >         "Notice: Applied catalog in 0.01 seconds",
>>>>>>>> >         "Changes:",
>>>>>>>> >         "            Total: 1",
>>>>>>>> >         "Events:",
>>>>>>>> >         "          Success: 1",
>>>>>>>> >         "Resources:",
>>>>>>>> >         "          Changed: 1",
>>>>>>>> >         "      Out of sync: 1",
>>>>>>>> >         "          Skipped: 7",
>>>>>>>> >         "            Total: 8",
>>>>>>>> >         "Time:",
>>>>>>>> >         "             File: 0.00",
>>>>>>>> >         "   Transaction evaluation: 0.01",
>>>>>>>> >         "   Catalog application: 0.01",
>>>>>>>> >         "   Config retrieval: 0.09",
>>>>>>>> >         "         Last run: 1640604309",
>>>>>>>> >         "            Total: 0.01",
>>>>>>>> >          "Version:",
>>>>>>>> >         "           Config: 1640604309",
>>>>>>>> >         "           Puppet: 5.5.10",
>>>>>>>> >         "Error executing ['podman', 'container', 'exists',
>>>>>>>> 'nova_compute_init_log']: returned 1",
>>>>>>>> >         "Did not find container with \"['podman', 'ps', '-a',
>>>>>>>> '--filter', 'label=container_name=nova_compute_init_log', '--filter',
>>>>>>>> 'label=config_id=tripleo_step2', '--format', '{{.Names}}']\" - retrying
>>>>>>>> without config_id",
>>>>>>>> >         "Did not find container with \"['podman', 'ps', '-a',
>>>>>>>> '--filter', 'label=container_name=nova_compute_init_log', '--format',
>>>>>>>> '{{.Names}}']\"",
>>>>>>>> >         "Error executing ['podman', 'container', 'exists',
>>>>>>>> 'create_haproxy_wrapper']: returned 1",
>>>>>>>> >         "Did not find container with \"['podman', 'ps', '-a',
>>>>>>>> '--filter', 'label=container_name=create_haproxy_wrapper', '--filter',
>>>>>>>> 'label=config_id=tripleo_step2', '--format', '{{.Names}}']\" - retrying
>>>>>>>> without config_id",
>>>>>>>> >         "Did not find container with \"['podman', 'ps', '-a',
>>>>>>>> '--filter', 'label=container_name=create_haproxy_wrapper', '--format',
>>>>>>>> '{{.Names}}']\""
>>>>>>>> >     ]
>>>>>>>> > }
>>>>>>>>
>>>>>>>> This is not the actual error; the actual error is: puppet-user: Error:
>>>>>>>> /Stage[main]/Tripleo::Profile::Base::Rabbitmq/Rabbitmq_policy[ha-all@/]:
>>>>>>>> Could not evaluate: Command is still failing after 180 seconds expired!"
>>>>>>>>
>>>>>>>> >
>>>>>>>> > I am also attaching ansible.log file for more information.
>>>>>>>> >
>>>>>>>> > Note: On CentOS 8, there is no docker, so I didn't pass
>>>>>>>> > docker-ha.yaml
>>>>>>>>
>>>>>>>> To enable HA with podman in Train on CentOS 8, you need to pass both
>>>>>>>> docker-ha.yaml and podman.yaml, in that order (*order is important
>>>>>>>> here*):
>>>>>>>>
>>>>>>>>     -e /usr/share/openstack-tripleo-heat-templates/environments/docker-ha.yaml \
>>>>>>>>     -e /usr/share/openstack-tripleo-heat-templates/environments/podman.yaml
>>>>>>>>
>>>>>>>> This way you will have a deployment with HA and podman. I agree the
>>>>>>>> docker-ha name is confusing alongside podman, but it has to be passed
>>>>>>>> to get the required deployment. Also, with Ussuri and later, HA is
>>>>>>>> turned on by default, so those releases may work even without passing
>>>>>>>> docker-ha.yaml, but for Train at least it is needed.
>>>>>>>> >
>>>>>>>> > Can someone please help in resolving my issue
>>>>>>>> >
>>>>>>>> As per your requirement I would suggest running with the above
>>>>>>>> config.
>>>>>>>>
>>>>>>>> > Regards
>>>>>>>> > Anirudh Gupta
>>>>>>>>
>>>>>>>> Thanks and Regards
>>>>>>>> Yatin Karel
>>>>>>>>
>>>>>>>
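
One more note for the rerun: since later -e files override earlier ones, it
is worth double-checking the order of the environment files before launching
the deploy. A rough sketch of such a check (env_order_ok is a hypothetical
helper of mine, not a TripleO command) could be:

```shell
#!/bin/sh
# env_order_ok is a hypothetical helper, not part of TripleO.
# It succeeds only if docker-ha.yaml appears in the argument list and
# podman.yaml appears somewhere after it.
env_order_ok() {
    ha_pos=0
    podman_pos=0
    i=0
    for arg in "$@"; do
        i=$((i + 1))
        case "$arg" in
            */docker-ha.yaml) ha_pos=$i ;;
            */podman.yaml)    podman_pos=$i ;;
        esac
    done
    [ "$ha_pos" -gt 0 ] && [ "$podman_pos" -gt "$ha_pos" ]
}

# Example call with the same two environment files used in this thread.
ENVS=/usr/share/openstack-tripleo-heat-templates/environments
if env_order_ok -e "$ENVS/docker-ha.yaml" -e "$ENVS/podman.yaml"; then
    echo "environment file order looks fine"
fi
```

It only looks at the file names in the argument list; it does not inspect
what the environment files contain.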

Thanks and Regards
Yatin Karel