[TripleO] Xena PTG session summaries
On Mon, Apr 26, 2021 at 7:37 PM Marios Andreou <marios@redhat.com> wrote:
Hello folks,
I sent out some stats and links on our PTG meetup with http://lists.openstack.org/pipermail/openstack-discuss/2021-April/021999.htm... already, but, as a couple of different people asked me about it, I took the time to write a summary for each session today. Of course you can find all etherpad links and recordings via https://etherpad.opendev.org/p/tripleo-ptg-xena (which seems to be down right now, but I have backups; if it isn't resolved by tomorrow I can try sharing that content somewhere else).
Below is a (very) concise summary of the main points in each session and I hope that is useful to someone (especially since it took far longer than I expected ;)). Please reply here (or ping privately if you prefer) for any glaring omissions or obvious issues that should be revised (I originally intended to put this in an etherpad for easier collaboration but, as I wrote above, etherpad.opendev.org seems down right now, at least for me).
etherpad.opendev.org is back now, so I pasted the summaries into https://etherpad.opendev.org/p/tripleo-ptg-xena-summaries and linked it via our agenda etherpad. Please help me to capture the "glaring omissions or obvious issues" I have missed in the summary of your or others' sessions. regards, marios
regards, marios
MON:
* https://etherpad.opendev.org/p/tripleo-ptg-retrospective
Retrospective of the Wallaby cycle - there are some community and team level 'headlines' on the main items worked on during this cycle on the etherpad. Some identified ideas for improvement include targeting another older branch for end-of-life (likely Queens), improving upstream documentation (especially removal of stale content), and creating a tag in Launchpad for teams so we can more easily identify which squad is currently assigned.
* Topic: Plan/Swift removal update Presentation link: https://drive.google.com/file/d/1igOW4XuAbU55Tat73DwLqO4UGZu8MiNi/view?usp=s...
An update on the work completed in the Wallaby cycle to remove the Swift service and the deployment plan (which is no longer used as part of our deployments) from the undercloud. From Wallaby onward there is no undercloud Swift by default. There may be a revision of the spec https://opendev.org/openstack/tripleo-specs/commit/e83d8aba3a950da83a33c23bc... as the original plan didn't explicitly consider removal of the deployment plan.
* https://etherpad.opendev.org/p/tripleo-ephemeral-heat
Update on the ephemeral heat work (i.e. no permanent heat process on the undercloud). There has been very strong progress made in this cycle and there are still some outstanding patches https://review.opendev.org/q/topic:%22ephemeral-heat%22+(status:open) to be merged. The goal is to make this the default in Xena deployments and backport to Wallaby as optional. Besides the main feature, some related planned work includes consolidation of the python-tripleoclient "overcloud deploy" and "tripleo deploy" (e.g. standalone) commands. Note that this work depends on the tripleo-network-v2 work (next session below).
* https://etherpad.opendev.org/p/tripleo-network-v2
Update on the network ports v2 work (moving network port creation out of the heat stack) https://opendev.org/openstack/tripleo-specs/src/branch/master/specs/wallaby/... - again good progress on this during Wallaby but there is still some ongoing work there https://review.opendev.org/q/topic:%22network-data-v2%22+(status:open) . The goal for Xena is to make this the default (i.e. no node/networking config in deploy-steps-playbook.yaml). One main area of work for Xena in this topic is integration of the baremetal network config in the overcloud deployment (i.e. allow a single command).
* https://etherpad.opendev.org/p/tripleo-ceph-xena
Update from the ceph team about the main work items completed in Wallaby including the tripleo-ceph-client and tripleo-ceph in place of ceph-ansible for RBD ( https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ce... and https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ce... ). The main work planned for Xena is to continue trying to achieve feature parity with ceph-ansible - including resolving cephadm blockers, Ganesha & https://specs.openstack.org/openstack/tripleo-specs/specs/wallaby/tripleo-ce... . One major consideration is how to move ceph creation/config outside of the heat stack - some parts such as pools, keyrings and haproxy config will have to remain as part of the tripleo deployment. Note that this work depends on the network ports v2 work (previous session above).
TUE:
* https://etherpad.opendev.org/p/tripleo-xena-whole-disk-images
A proposal to move to whole disk images instead of the current overcloud-full.qcow2 + overcloud-full.initrd + overcloud-full.vmlinuz. There were many compelling arguments made for the proposal, including: with the overcloud-full.qcow2 partition image, as of CentOS 8.4 grub2 no longer supports UEFI boot; there will be much less for ironic-python-agent to do during deployment with a single disk image; there will be just one file to distribute (vs 3); and we will no longer need to define and build a separate 'hardened' image (and can also remove the related CI jobs). One of the main technical issues that needs to be addressed first is growing the partition for /var, which is where we are storing containers and config for deployment.
* https://etherpad.opendev.org/p/tripleo-xena-drop-healthchecks
Proposal to drop the container health checks since they are using deployment resources but aren't providing value. There was no pushback against this proposal and the details are being discussed in the newly posted spec @ https://review.opendev.org/c/openstack/tripleo-specs/+/787535.
* https://etherpad.opendev.org/p/ci-tripleo-repos
Proposal to consolidate the various ways and places that tripleo-ci is using to configure the repos in the CI jobs. There is a spec proposed @ https://review.opendev.org/c/openstack/tripleo-specs/+/772442 - some of the work here is split into sub items which are ongoing (tripleo-get-hash there https://review.opendev.org/c/openstack/tripleo-ci/+/784392). The main outstanding blocking item here is to agree on the common data format for the various personas (upstream, downstream and product) that we need to support, e.g. https://github.com/mwhahaha/rhos-bootstrap/blob/main/versions/centos.yaml vs https://review.opendev.org/c/openstack/tripleo-repos/+/785593/1/tripleo_repo...
* openstack tempest skiplist https://docs.google.com/presentation/d/1aCiV35IYNhPV7SRmi4_A9vkIjZ89pwfC4VvL...
Update on the tempest skiplist effort during Wallaby to consolidate the skipped Tempest tests in a central location, with the ability to specify particular jobs and/or branches for which specific skips will apply.
* https://etherpad.opendev.org/p/tripleo-next
One of the main items discussed here was the 'first principles' proposal at https://review.opendev.org/c/openstack/tripleo-specs/+/786980 - these are meant to guide us when discussing changes to our deployment tooling and architecture. The proposal will merge in Xena specs once we've reached consensus on the review. Another topic discussed in this session was an update on exploratory work to replace "heat & ansible" in our deployment tooling with 'something else' - some ongoing work here is at https://github.com/cloudnull/director & https://github.com/mwhahaha/task-core. More info and pointers (also discussed Kube/OCP with an operator to deploy tripleo) on the etherpad.
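To make the tempest skiplist idea from the session above a little more concrete, here is a minimal sketch of skips kept in one place and filtered per job and branch. The SkipEntry layout, the skips_for helper and all test and job names are assumptions made up for illustration; they are not the actual skiplist format or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SkipEntry:
    """One skipped test; an empty jobs/branches list means 'applies everywhere'."""
    test: str
    reason: str
    jobs: List[str] = field(default_factory=list)
    branches: List[str] = field(default_factory=list)


def skips_for(entries, job, branch):
    """Return the test names that should be skipped for this job and branch."""
    selected = []
    for entry in entries:
        job_ok = not entry.jobs or job in entry.jobs
        branch_ok = not entry.branches or branch in entry.branches
        if job_ok and branch_ok:
            selected.append(entry.test)
    return selected


# Hypothetical entries, for illustration only.
SKIPLIST = [
    SkipEntry("tempest.api.example.test_a", "hypothetical bug 1"),
    SkipEntry("tempest.api.example.test_b", "hypothetical bug 2",
              jobs=["job-one"]),
    SkipEntry("tempest.api.example.test_c", "hypothetical bug 3",
              branches=["stable/wallaby"]),
]

print(skips_for(SKIPLIST, job="job-two", branch="master"))
```

Treating an empty filter as "applies everywhere" keeps the common case (skip on all jobs and branches) short while still allowing narrow, job-specific skips.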
WED:
* https://etherpad.opendev.org/p/tripleo-xena-inventory-script
This was a proposal to remove the "tripleo-ansible-inventory script" @ https://github.com/openstack/tripleo-common/blob/ccd990b58b6583dda3a0e0f3413... and instead generate the inventory from the deployment data (e.g. metalsmith or user data from deployed-server deployments). The consensus reached was that instead of removing it we should use it in a better way, for example make sure static inventories are generated and exported to known locations (especially for the ephemeral heat case) and re-used.
* https://etherpad.opendev.org/p/vf-ui-output
This was an update from the validations squad about the main items worked on during Wallaby (integrated the validation framework into the component CI pipelines, enabled the standalone job in upstream check/gate and increased adoption especially by the upgrades squad). This was followed by discussions for planned Xena work, including changes in the UI/CLI (e.g. jq queries can be handled better, and various other UI improvements - more on the etherpad). Some of the other topics raised here were to make the validations themselves component aware (run all validations related to a given component) and discussion around the requirement for a molecule test on all validation additions (especially the example of mocking out OpenStack services like keystone); the compromise could be to instead use a standalone job for such cases.
* https://etherpad.opendev.org/p/Validation-Framework-Next-Generation
In this session the validations squad introduced ideas for the future direction of the validation framework. Some of the main proposals are to move the validations repos (validations-common and validations-libs) out of tripleo governance but still within openstack and establish a new validations project (discussion but no clear consensus on this point), and to re-merge the two repos into one consolidated validations repo and fix up the CLI (see previous session) - more items and other considerations on the etherpad.
* https://etherpad.opendev.org/p/tripleo-frr-integration
Update on Wallaby progress from the cross-squad team looking at FRR/BGP integration in the tripleo deployment (https://opendev.org/openstack/tripleo-specs/src/branch/master/specs/wallaby/...). Some of the main items discussed for Xena work included how we might approximate some part of this feature in upstream CI (high resource requirements - downstream CI has 9 nodes) and backport considerations (no backport to upstream/train).
* https://etherpad.opendev.org/p/update-upgrade-consolidation
In this session the upgrades squad outlined their proposal for consolidation of the minor update and major upgrade workflows - without any blockers or objections coming out of the discussion. One of the main considerations was around how we can decouple the operating system updates/upgrades from the tripleo container upgrade - one action item is to de-containerize those containers that are tied to the kernel version (ABI) such as libvirt and openvswitch.
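As an illustration of the "static inventories exported to known locations" outcome from the inventory session above, the sketch below builds an Ansible-style inventory dict from deployment data and writes it to a predictable path so later steps can re-use it. The node data, the build_inventory and export_inventory helpers and the "<stack>-inventory.json" file name are all hypothetical; this is not how tripleo-ansible-inventory is implemented.

```python
import json
import tempfile
from pathlib import Path


def build_inventory(nodes):
    """Group hosts by role into an Ansible-style inventory dict."""
    inventory = {}
    for node in nodes:
        group = inventory.setdefault(node["role"], {"hosts": {}})
        group["hosts"][node["hostname"]] = {"ansible_host": node["ip"]}
    return inventory


def export_inventory(inventory, directory, stack="overcloud"):
    """Write the inventory to a predictable path and return that path."""
    target = Path(directory) / f"{stack}-inventory.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(inventory, indent=2, sort_keys=True))
    return target


# Hypothetical deployment data, for illustration only.
NODES = [
    {"hostname": "controller-0", "role": "Controller", "ip": "192.0.2.10"},
    {"hostname": "compute-0", "role": "Compute", "ip": "192.0.2.20"},
    {"hostname": "compute-1", "role": "Compute", "ip": "192.0.2.21"},
]

with tempfile.TemporaryDirectory() as workdir:
    path = export_inventory(build_inventory(NODES), workdir)
    reloaded = json.loads(path.read_text())
    print(path.name, sorted(reloaded))
```

Because the file name is derived only from the stack name, anything that runs after the deployment can find the inventory without needing a running heat process to query.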
THU:
* https://etherpad.opendev.org/p/policy-popup-xena-ptg
In this session the security squad gave an update on progress during Wallaby on Role Based Access Control (RBAC) - many services have completed implementation (Keystone, Nova, Ironic - more on the etherpad). Then there was a discussion around potential integration points during the tripleo deployment, for example https://review.opendev.org/c/openstack/tripleo-heat-templates/+/781571/7/env... . One of the considerations was around how we can test this in CI (possibly the standalone job is a good fit) as well as the use of multiple clouds.yaml files for project specific operations during the deployment (with the root clouds.yaml having the system-admin profile).
* https://etherpad.opendev.org/p/centos-stream-9-upstream
In this session the CI squad led a discussion around CentOS Stream 9 (possibly coming Apr/May) and what we should consider/prepare for with respect to upstream CI. Some of the main changes and discussion items included NetworkManager and firewalld replacing iptables, and the ansible version (2.11/2.12?/?). Mainly this effort is blocked on the actual 9-stream release and getting the relevant nodepool node. Another main discussion point here was whether we would support both stream-8 and stream-9 on particular branches - consensus here is that Wallaby has both 8/9 and Xena can have only 9 - but this is all dependent on when 9 becomes available with respect to when Xena is released.
* https://etherpad.opendev.org/p/os-migrate
This session was an update from the upgrades squad around the os-migrate tool ( https://github.com/os-migrate/os-migrate ) - which aims to 'copy' your openstack deployment and in particular the end-user workloads (i.e. user data, VMs etc, but not the control plane) onto new hardware, as an alternative to the in-place upgrade. More information and slides @ https://docs.google.com/presentation/d/1UYGOI89MBLHLpS89mPp0VK1yvTYtb2BamUL_...