[tripleo] Deprecating tripleo-heat-templates firstboot
Soon nova will be switched off by default on the undercloud, and all overcloud deployments will effectively be deployed-server based (either provisioned manually or via the baremetal provision command). This means that the docs for running firstboot scripts[1] will no longer work, and neither will our collection of firstboot scripts[2]. In this email I'm going to propose what we could do about this situation; if there are still unresolved issues by the PTG, it might be worth having a short session on it.

The baremetal provisioning implementation already uses cloud-init cloud-config internally for user creation and key injection[3], so I'm going to propose an enhancement to the baremetal provisioning yaml format so that custom cloud-config instructions can be included either inline or as a file path.

I think it is worth going through each firstboot script[2] and deciding what its fate should be (other than being deprecated in Ussuri):

https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/first...
This has no parameters, so it could be converted to a standalone cloud-config file, but should it? Can this be achieved with kernel args? Does it require a reboot anyway, and so can be done with extraconfig?

https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/first...
I'm not sure why this is implemented as first boot. It seems to consume the parameter NetConfigDataLookup and transform it into the format os-net-config needs for the file /etc/os-net-config/mapping.yaml. It looks like this functionality should be moved to where os-net-config is actually invoked, and the NetConfigDataLookup parameter should be officially supported.
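If this logic moves to where os-net-config is invoked, it amounts to "pick this host's entry from NetConfigDataLookup and write its nicN entries out as interface_mapping". The sketch below assumes the lookup format from the tripleo docs rather than from this thread (per-node keys holding nicN entries given as MAC addresses or interface names, optionally with a dmiString/id pair for DMI matching); the function names are hypothetical, and discovering local MACs or reading DMI strings is left to the caller.

```python
import json

# Keys used for DMI-based matching; everything else in an entry is a nicN mapping.
DMI_KEYS = {"dmiString", "id"}


def select_interface_mapping(lookup, local_macs, dmi_values=None):
    """Return the interface mapping for this host, or None if nothing matches.

    lookup: NetConfigDataLookup-style dict (assumed format, see lead-in)
    local_macs: MAC addresses present on this host, any case
    dmi_values: optional {dmi string name: value}, e.g. {"system-uuid": "..."}
    """
    macs = {mac.lower() for mac in local_macs}
    dmi_values = dmi_values or {}
    for entry in lookup.values():
        nics = {k: v for k, v in entry.items() if k not in DMI_KEYS}
        # Match on a DMI string (e.g. system UUID or product name) when one is given...
        dmi_match = (
            "dmiString" in entry
            and "id" in entry
            and dmi_values.get(entry["dmiString"]) == entry["id"]
        )
        # ...or on any listed MAC address being present on this host.
        mac_match = any(str(v).lower() in macs for v in nics.values())
        if dmi_match or mac_match:
            return nics
    return None


def render_mapping(interface_mapping):
    """Render content for /etc/os-net-config/mapping.yaml (JSON is also valid YAML)."""
    return json.dumps({"interface_mapping": interface_mapping}, indent=2)
```

Entries are checked in dictionary order and the first match wins; a real implementation would need to decide how to handle a host matching more than one entry.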
https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/firstboot/userdata_dev_rsync.yaml
I suggest deleting this and including a cloud-config version in the baremetal provisioning docs.

https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/firstboot/userdata_heat_admin.yaml
Delete this; there is already an abstraction for this built into the baremetal provisioning format[4].

https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/firstboot/userdata_root_password.yaml
Delete this and include it as an example in the baremetal provisioning docs.

https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/firstboot/userdata_timesync.yaml
Maybe this could be converted to an extraconfig/all_nodes script[5], but it would be better if this sort of thing could be implemented as an ansible role or playbook. Are there any plans for an extraconfig mechanism which uses plain ansible semantics?

cheers

[1] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features...
[2] https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/first...
[3] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansi...
[4] https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansi...
    https://opendev.org/openstack/tripleo-ansible/src/branch/master/tripleo_ansi...
[5] https://opendev.org/openstack/tripleo-heat-templates/src/branch/master/extra...
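The proposed provisioning yaml enhancement (custom cloud-config given inline or as a file path, combined with the cloud-config the provisioning already generates for users and keys) could look roughly like the sketch below, using the root password item as the example payload. The key names `cloud_config` and `cloud_config_file`, the list-concatenating merge policy, and the stand-in `users` entry are all hypothetical; to stay stdlib-only the file loader expects a JSON body (also valid YAML) where a real implementation would use a YAML parser.

```python
import json
from pathlib import Path

HEADER = "#cloud-config"


def load_cloud_config_file(path):
    """Load custom cloud-config from a file (hypothetical 'cloud_config_file' key).

    Assumption: the body is JSON, which is also valid YAML; a real
    implementation would parse YAML instead.
    """
    lines = Path(path).read_text().splitlines()
    if lines and lines[0].strip() == HEADER:
        lines = lines[1:]  # drop the "#cloud-config" marker line
    body = "\n".join(lines).strip()
    return json.loads(body) if body else {}


def merge_cloud_config(base, extra):
    """Return a copy of base updated with extra; lists under the same key are concatenated."""
    merged = dict(base)
    for key, value in extra.items():
        if isinstance(merged.get(key), list) and isinstance(value, list):
            merged[key] = merged[key] + value
        else:
            merged[key] = value
    return merged


def build_user_data(generated, instance):
    """Combine provisioning-generated cloud-config with an instance's custom cloud-config.

    'cloud_config' (inline mapping) and 'cloud_config_file' (path) are
    hypothetical key names for the proposed provisioning yaml extension.
    """
    custom = {}
    if "cloud_config_file" in instance:
        custom = merge_cloud_config(custom, load_cloud_config_file(instance["cloud_config_file"]))
    if "cloud_config" in instance:
        custom = merge_cloud_config(custom, instance["cloud_config"])
    merged = merge_cloud_config(generated, custom)
    return HEADER + "\n" + json.dumps(merged, indent=2) + "\n"


# The userdata_root_password case as cloud-config ("changeme" is a placeholder).
root_password = {"chpasswd": {"list": "root:changeme", "expire": False}}
```

For example, `build_user_data({"users": [{"name": "heat-admin"}]}, {"cloud_config": root_password})` yields user-data that keeps the generated user and adds the `chpasswd` section. Concatenating lists is one possible merge policy; whether custom cloud-config should instead replace generated keys would need deciding.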
On Tue, 2020-05-26 at 11:32 +1200, Steve Baker wrote:
If we had gone the other way around, and done the Heat stack with "dummy" server resources before deploying baremetal, we could have done this seamlessly, i.e. passed these cloud-configs based on the stack to the baremetal provisioning yaml extension you mention below. But that train departed long ago... Do we need to add some deprecation and/or validation? Something that ensures we stop the deployment if one of the resources OS::TripleO::NodeAdminUserData, OS::TripleO::NodeTimesyncUserData, OS::TripleO::NodeUserData or OS::TripleO::{{role.name}}::NodeUserData is defined in the resource registry, with a pointer to docs on how to move it to the baremetal provisioning yaml or extraconfig.
++
Maybe this could be done using this module in ansible instead: https://docs.ansible.com/ansible/latest/modules/modprobe_module.html#modprob...
I agree, I have been thinking about moving this for a while actually.
+1
+1
+1
On 27/05/20 5:30 am, Harald Jensås wrote:
I think if OS::TripleO::*Server is not mapped to OS::Nova::Server, then the deployment should halt with a message if OS::TripleO::NodeUserData or OS::TripleO::{{role.name}}::NodeUserData are mapped to something other than userdata_default.yaml.

As for OS::TripleO::NodeTimesyncUserData, it looks like this functionality is duplicated by deployment/timesync/chrony-baremetal-ansible.yaml, which is mapped to OS::TripleO::Services::Timesync and included in every role, but NodeTimesyncUserData was added recently to handle some early config timestamp issues:
https://opendev.org/openstack/tripleo-heat-templates/commit/eafe3908535ec766...
https://bugs.launchpad.net/tripleo/+bug/1776869
Maybe this becomes less of an issue with no other config tasks happening at first boot; I've tagged in Alex for his thoughts. One option could be to enable and configure chrony during the overcloud-full image build, then document how to disable it or change the NTP servers in cloud-config?
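The halt condition proposed above (servers not resolving to OS::Nova::Server, plus NodeUserData mapped to anything other than userdata_default.yaml) could be checked against a resource registry dict along these lines. This is a sketch, not existing tripleo code: the function names and the message text are hypothetical, role server keys are assumed to alias OS::TripleO::Server (so aliases are followed), and a registry with no server keys is treated as Nova-backed.

```python
import fnmatch

NOVA_SERVER = "OS::Nova::Server"
DEFAULT_USERDATA = "userdata_default.yaml"
USERDATA_PATTERNS = ("OS::TripleO::NodeUserData", "OS::TripleO::*::NodeUserData")


def _is_server_key(key):
    # Matches OS::TripleO::Server and OS::TripleO::<Role>Server, but not
    # deeper keys such as OS::TripleO::Services::<Name>.
    return key.startswith("OS::TripleO::") and key.endswith("Server") and key.count("::") == 2


def _resolve(registry, name):
    """Follow registry aliases, e.g. OS::TripleO::ControllerServer -> OS::TripleO::Server."""
    seen = set()
    while name in registry and name not in seen:
        seen.add(name)
        name = registry[name]
    return name


def firstboot_errors(registry):
    """Return one message per NodeUserData mapping that would silently stop running."""
    servers = [k for k in registry if _is_server_key(k)]
    if all(_resolve(registry, k) == NOVA_SERVER for k in servers):
        return []  # still Nova-provisioned: firstboot userdata keeps working
    errors = []
    for key, value in sorted(registry.items()):
        if any(fnmatch.fnmatchcase(key, p) for p in USERDATA_PATTERNS):
            if not value.endswith(DEFAULT_USERDATA):
                errors.append(
                    f"{key} is mapped to {value}, but firstboot userdata is not run "
                    "for deployed servers; move it to the baremetal provisioning "
                    "yaml or extraconfig."
                )
    return errors
```

A deployment step would then stop with these messages whenever the list is non-empty.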
On Tue, May 26, 2020 at 3:34 PM Steve Baker <sbaker@redhat.com> wrote:
We now configure NTP/chrony during the host_prep_tasks phase, so that should be sufficient. The original issue we were attempting to fix with that was the read-only errors out of docker. We later learned that that issue was likely caused by a docker-puppet.py issue where we copied files that we had mounted read-only. I believe we've cleaned up the ordering of things within the regular deployment such that this firstboot is no longer required, so it's likely safe to delete. However, I would say it might be beneficial to include basic ntp or hwclock functionality in the new provisioning system on the off chance a user needs to do those prior to any configuration on the host.
On Thu, May 28, 2020 at 4:58 PM Alex Schultz <aschultz@redhat.com> wrote:
Slight correction: we need to be able to have cloud-init's NTP servers configurable during provisioning. The issue arises when a host's time is in the future and a file is written out that will later be used by a container (or mounted into one). The container engines tend to fail horribly. I would recommend including NTP servers (or pools) in the initial provisioning, and highly recommend users configure them. We'll sync up later as part of the deployment, but we want the hardware's time corrected as early as possible.
On 29/05/20 11:13 am, Alex Schultz wrote:
OK, it sounds like we should configure a running timesync in the overcloud-full image with a default pool, and document what cloud-config is required to customize the pool servers or switch it off. This cloud-config can be included in the baremetal provisioning yaml.
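The cloud-config to document could be generated roughly as below. It uses cloud-init's `ntp` module keys (`enabled`, `ntp_client`, `servers`, `pools`) to point chrony at custom servers or pools, and a `runcmd` entry to switch it off. Assumptions to flag: the image runs chrony as the `chronyd` systemd unit, and the helper names are hypothetical; how the result is passed in depends on the provisioning yaml extension discussed earlier in the thread.

```python
import json


def timesync_cloud_config(servers=None, pools=None, disable=False):
    """Sketch of cloud-config to customize or switch off image-baked time sync."""
    if disable:
        # Assumption: chrony runs as the 'chronyd' systemd unit in the image.
        return {"runcmd": [["systemctl", "disable", "--now", "chronyd"]]}
    ntp = {"enabled": True, "ntp_client": "chrony"}
    if servers:
        ntp["servers"] = list(servers)
    if pools:
        ntp["pools"] = list(pools)
    return {"ntp": ntp}


def to_user_data(config):
    """Serialize as cloud-config user-data (JSON is also valid YAML)."""
    return "#cloud-config\n" + json.dumps(config, indent=2) + "\n"
```

For example, `to_user_data(timesync_cloud_config(pools=["0.pool.ntp.org"]))` produces a document that keeps chrony running but points it at that pool; if neither servers nor pools are given, cloud-init falls back to its own defaults for the distro.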
participants (3)
- Alex Schultz
- Harald Jensås
- Steve Baker