[tripleO] Customised Cinder-Volume fails at 'Paunch 5' during overcloud deployment
Hello all,
I am trying to deploy RHOSP 16.1 (based on the 'train' release) for certification purposes. I have built a container for our cinder driver and am trying to deploy it. The deployment runs almost to the end and fails at the stage where it tries to configure Pacemaker. Here is the last message:
"Info: Applying configuration version '1609231063'",
"Notice: /Stage[main]/Pacemaker::Corosync/File_line[pcsd_bind_addr]/ensure: created",
"Info: /Stage[main]/Pacemaker::Corosync/File_line[pcsd_bind_addr]: Scheduling refresh of Service[pcsd]",
"Info: /Stage[main]/Pacemaker::Service/Service[pcsd]: Unscheduling all events on Service[pcsd]",
"Info: Class[Pacemaker::Corosync]: Unscheduling all events on Class[Pacemaker::Corosync]",
"Notice: /Stage[main]/Tripleo::Profile::Pacemaker::Cinder::Volume_bundle/Pacemaker::Resource::Bundle[openstack-cinder-volume]/Pcmk_bundle[openstack-cinder-volume]: Dependency Pcmk_property[property-overcloud-controller-0-cinder-volume-role] has failures: true",
"Info: Creating state file /var/lib/puppet/state/state.yaml",
"Notice: Applied catalog in 382.92 seconds",
"Changes:",
" Total: 1",
"Events:",
" Success: 1",
" Failure: 2",
" Total: 3",
I have verified that all packages in my container image (Pacemaker, Corosync, libqb, and pcs) are installed with the same versions as on the overcloud controller. But it seems that something is still missing, because deployment with the default openstack-cinder-volume image completes successfully.
Can anyone help with debugging this? Let me know if more info is needed.
Thanks in advance,
Igal
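Editorial sketch: the Puppet output quoted in the message above names the dependency that failed, but it is easy to miss in a long log. The following is a minimal Python sketch, not part of TripleO or Puppet, that pulls out "Dependency ... has failures" notices and the "Events:" counts from output in the quoted format. The function names `failed_dependencies` and `event_summary` are hypothetical, and the regular expressions assume the exact line format shown in this thread.

```python
import re

# Matches Puppet notices of the form
# "Notice: <resource path>: Dependency <Type[title]> has failures: true".
DEPENDENCY_FAILURE = re.compile(
    r"Notice: (?P<resource>\S+): Dependency (?P<dependency>\S+) has failures: true"
)


def failed_dependencies(log_text):
    """Return (skipped_resource, failed_dependency) pairs found in the output."""
    return [
        (match.group("resource"), match.group("dependency"))
        for match in DEPENDENCY_FAILURE.finditer(log_text)
    ]


def event_summary(log_text):
    """Return the counters listed after the "Events:" marker as a dict."""
    parts = log_text.split('"Events:"', 1)
    if len(parts) < 2:
        return {}
    # Each counter is a quoted string such as " Failure: 2".
    return {name: int(count) for name, count in re.findall(r'"\s*(\w+): (\d+)"', parts[1])}
```

Applied to the log above, the first function would point at Pcmk_property[property-overcloud-controller-0-cinder-volume-role], which is the resource to investigate; the skipped Pcmk_bundle is only a consequence of it.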
On Thu, Dec 31, 2020 at 5:26 AM Igal Katzir <ikatzir@infinidat.com> wrote:
> I have verified that all packages on my container-image (Pacemaker, Corosync, libqb, and pcs) are installed with same versions as the overcloud-controller.
Hi Igal,
Thank you for checking these package versions and stating they match the ones installed on the overcloud node. This rules out one of the common reasons for failures when trying to run a customized cinder-volume container image.
> But seems that something is still missing, because deployment with the default openstack-cinder-volume image completes successfully.
This is also good to know.
> Can anyone help with debugging this? Let me know if more info needed.
More info is needed, but it's hard to predict exactly where to look for the root cause of the failure. I'd start by looking in the cinder log file to determine whether the cinder-volume service is even trying to start. Look for /var/log/containers/cinder/cinder-volume.log on the node where pacemaker is trying to run the service. Are there logs indicating the service is trying to start? Or maybe the service is launched, but fails early during startup?
Another possibility is that podman fails to launch the container itself. If that's happening, then check for errors in /var/log/messages. One source of this type of failure is that you've specified a container bind mount, but the source directory doesn't exist (docker would auto-create the source directory, but podman does not).
You specifically mentioned RHOSP, so if you need additional support then I recommend opening a support case with Red Hat. That will provide a forum for posting private data, such as details of your overcloud deployment and full sosreports.
Alan
> Thanks in advance, Igal
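Editorial sketch: the bind-mount pitfall Alan describes (podman does not create a missing source directory) can be checked on the host before the container is started. This is a minimal Python sketch under stated assumptions: the function name `missing_bind_sources` is hypothetical, and the volume specs are assumed to be in the usual `source:destination[:options]` form. The actual list of mounts for the cinder-volume container has to come from your own deployment.

```python
import os


def missing_bind_sources(volumes):
    """Return host source paths from 'src:dst[:options]' specs that do not exist."""
    missing = []
    for spec in volumes:
        source = spec.split(":", 1)[0]
        # Only absolute paths are bind mounts; a bare name is a named volume.
        if source.startswith("/") and not os.path.exists(source):
            missing.append(source)
    return missing
```

For example, calling `missing_bind_sources(["/var/lib/cinder:/var/lib/cinder"])` on the controller returns an empty list when the directory exists and `["/var/lib/cinder"]` when it does not. The path in this example is only an illustration.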
Hello Alan,
Thanks for your reply!
I am afraid that the reason for my deployment failure might be related to the environment file I use to configure my cinder backend. The configuration is quite similar to https://github.com/Infinidat/tripleo-deployment-configs/blob/dev/RHOSP15/cin...
So I wonder whether it is possible to run a deployment where I tell TripleO to use my customized container, using containers-prepare-parameter.yaml, but without the environment file cinder-infinidat-config.yaml, and then configure the backend and start the cinder-volume services manually?
Or must I have a minimum config like the ones I find in '/usr/share/openstack-tripleo-heat-templates/deployment/cinder/' (for other vendors)? If I do need such a cinder-volume-VENDOR-puppet.yaml config to be integrated during overcloud deployment, where is the documentation that explains how to construct it? Do I need to use cinder-base.yaml as a template?
When searching the web for "cinder-volume-container-puppet.yaml" I found the Git page of overcloud-resource-registry-puppet.j2.yaml <https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-resource-registry-puppet.j2.yaml> and also https://opendev.org/openstack/tripleo-heat-templates/../deployment <https://opendev.org/openstack/tripleo-heat-templates/src/commit/fffdcf0f3059a4f1146ec533f51a65442a105092/deployment>, but they are not very explanatory.
I have opened a case with Red Hat as well, and they are checking who from their R&D could help, since it is out of the scope of support.
Regards,
Igal
On Thu, Dec 31, 2020 at 9:15 PM Alan Bishop <abishop@redhat.com> wrote:
-- Regards, *Igal Katzir* Cell +972-54-5597086 Interoperability Team *INFINIDAT*
On Mon, Jan 4, 2021 at 5:31 AM Igal Katzir <ikatzir@infinidat.com> wrote:
> Hello Alan, Thanks for your reply!
> I am afraid that the reason for my deployment failure might be concerned with the environment file I use to configure my cinder backend. The configuration is quite similar to https://github.com/Infinidat/tripleo-deployment-configs/blob/dev/RHOSP15/cin... So I wonder if it is possible to run a deployment where I tell 'TripleO' to use my customize container, using containers-prepare-parameter.yaml, but without the environment file cinder-infinidat-config.yaml, and configure the backend / start cinder-volume services manually?
No, your cinder-infinidat-config.yaml file looks fine. It's responsible for getting TripleO to configure cinder to use your driver, and that phase was completed successfully prior to the deployment failure.
> Or I must have a minimum config as I find in: '/usr/share/openstack-tripleo-heat-templates/deployment/cinder/' (for other vendors)? If I do need such a cinder-volume-VENDOR-puppet.yaml config to be integrated during overcloud deployment, where is documentation that explains how to construct this? Do I need to use cinder-base.yaml as a template? When looking at the web for "cinder-volume-container-puppet.yaml" I found the Git Page of overcloud-resource-registry-puppet.j2.yaml <https://github.com/openstack/tripleo-heat-templates/blob/master/overcloud-resource-registry-puppet.j2.yaml> and found also https://opendev.org/openstack/tripleo-heat-templates/../deployment <https://opendev.org/openstack/tripleo-heat-templates/src/commit/fffdcf0f3059a4f1146ec533f51a65442a105092/deployment> but it is not so explanatory.
Your cinder-infinidat-config.yaml uses a low-level puppet mechanism for configuring what's referred to as a "custom" block storage backend. This is perfectly fine. If you want better integration with TripleO (and puppet) then you'll need to develop 3 separate patches, 1 each in puppet-cinder, puppet-tripleo and tripleo-heat-templates. Undertaking that would be a good future goal, but isn't necessary in order for you to get past your current deployment issue.
> I have opened a case with RedHat as well and they are checking who from their R&D could help since it's out of the scope of support.
I think you're starting to see responses from Red Hat that should help identify and resolve the problem.
Alan
> Regards, Igal
Just an update on this issue: the problem was fixed after I removed the following line from my Dockerfile:
'RUN pip install --no-cache-dir -U setuptools'
Apparently, this prevented the /usr/sbin/pcs command, which is required during overcloud deployment, from running.
Another question I have is about restarting a container. If I have rebuilt the openstack-cinder-volume image on my overcloud controller and want to test something, how can I start the container from the new image? Do I need to redeploy the entire overcloud? (That doesn't make sense.)
Thanks for the help,
Igal
On Mon, Jan 4, 2021 at 6:44 PM Alan Bishop <abishop@redhat.com> wrote:
-- Regards, *Igal Katzir* Cell +972-54-5597086 Interoperability Team *INFINIDAT*
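Editorial sketch: the root cause Igal reports in the last message (a `pip install -U setuptools` step in the Dockerfile leaving /usr/sbin/pcs unable to run) is the kind of breakage a smoke test inside the image could catch before a full overcloud deployment. The following is a minimal Python sketch, assuming it is run inside the customized container image. The helper name `command_works` is hypothetical; `pcs --version` is used only because it is a cheap invocation that exercises the pcs entry point.

```python
import subprocess


def command_works(argv, timeout=30):
    """Return True if the command can be started and exits with status 0."""
    try:
        result = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # Covers a missing or non-executable binary as well as a hang.
        return False
    return result.returncode == 0
```

Running `command_works(["/usr/sbin/pcs", "--version"])` after each image build, and failing the build when it returns False, would flag a broken pcs without waiting for the Pacemaker stage of the deployment. This only checks that the command starts and exits cleanly; it does not prove that every pcs subcommand used by the deployment works.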
participants (2): Alan Bishop, Igal Katzir