Greetings,


Thanks to all the Red Hatters who attended the OpenStack Summit and PTG in Shanghai!

A special thanks to those who presented their topics and discussed work items with the folks in attendance.  As the current PTL for TripleO, I will do my best here to summarize those conversations and the items others should be aware of.


Over the course of Thursday and Friday, roughly 7-10 folks discussed the identified topics [1]; my raw notes are available as well [2].  My apologies if I did not accurately represent your topic here; please feel free to correct me.


Thursday, Giulio Fidente: Edge ( SDS storage )

Giulio walked us through some background work regarding support for storage in remote sites / edge deployments.  Working through support for Cinder was straightforward enough, with no real collaboration required. Support for Ceph copy-on-write for Nova guests was also added, with the Glance image pushed to the remote sites.  Where Giulio needed input was around having to change the control plane configuration for Glance for each remote site [3]. That control plane update would force operators to put the cloud into maintenance mode for a stack update, and it was determined this could not be avoided at this time. It was noted that the TripleO simplification project is reworking puppet apply; please help us achieve that by reviewing the following two review topics [4][5]. Thanks Giulio!


Thursday, Giulio Fidente: Virtual IP / Storage

Giulio walked us through some challenges with hosting a shared file system on remote/edge sites using Manila.  The idea is to use NFS-Ganesha with CephFS. The proposal was that Ganesha and pacemaker would be managed in the control plane, but there was an open question regarding the virtual IP on the edge sites.  This was an interesting conversation that ended with a suggestion from Kevin Carter to use a host-only local route on the edge to properly route the IP. This seemed to everyone to be a very clever solution to the problem :)  Thanks Giulio, thanks Kevin!
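
For illustration only, here is a minimal sketch of what such a host-only route could look like when applied with Ansible. The inventory group, interface name and VIP address are made-up placeholders, not the actual implementation discussed in the session:

- hosts: edge_nodes                # hypothetical inventory group for one edge site
  become: true
  vars:
    ganesha_vip: 192.0.2.10        # placeholder for the VIP managed by pacemaker
  tasks:
    - name: Add a host-only (/32) route so the VIP is reachable from this node
      command: "ip route replace {{ ganesha_vip }}/32 dev br-ex"
      changed_when: true

The point is simply that the VIP is routed locally on each edge node rather than having to be advertised across sites.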


Thursday, Martin Schuppert: Nova CellV2 multicell

Martin walked the group through the current status and plans for the multicell implementation.

Background: Nova cells are used to help scale a cloud and partition it in such a way that the messaging queue sits closer to the computes; a cell is essentially RabbitMQ, Galera, the Nova conductor, the VNC proxy and a number of compute nodes.  This architecture is already in use, but with only the single default cell; Pike was the release that switched to cells v2.


The work started in Stein and continued through Train, using a similar approach to DCN.  Key points of the design are that there is one Heat stack per cell, initially created from an export of the central stack, and that more of the deployment is driven by Ansible.  Two different architectures were noted: all cells in one Heat stack [6], and one that splits the cell controllers and computes into different Heat stacks, with multiple stacks on the edge sites [7].   The development work for updates is complete; upgrades are still a WIP.
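
As a rough illustration of the "export of the central stack" step, and assuming the workflow described in the cells v2 deploy guide [6], something like the following could drive the export from Ansible (the cell name, paths and file names are placeholders):

- hosts: undercloud
  tasks:
    - name: Export control plane data used to create the cell stack
      shell: |
        source ~/stackrc
        openstack overcloud cell export cell1 -o cell1-ctrl-input.yaml
      args:
        chdir: /home/stack
        executable: /bin/bash
        creates: /home/stack/cell1-ctrl-input.yaml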


Plans for the future include integrating TLS everywhere and enabling storage in the cell (Cinder, Ceph, Glance).  Tony Breeds pointed out that this architecture should just work for multiarch, but he would like the team's help and advice in designing and creating a test environment.

Please review the following patches [13].

Thanks Martin!!


We tried to get more folks to switch their topics to Thursday but were not able to.  On to Friday.


Friday, Edge ( DCN ) roadmap: David Paterson

This conversation was informally walked through on Thursday, mainly with Arkady and Giulio, and was followed up on Friday in a joint edge session regarding booting edge nodes. Several questions were raised on Thursday regarding the networking and connectivity for edge sites as it relates to provisioning.  Validations were discussed as a way to address the minimum requirements for booting edge nodes. David did not end up presenting here, but was available at the joint session. See the “edge booting” section later in the document for details.


Friday, Backup and Restore: Carlos Camacho

The project started in Newton. Initially the backup consisted of a database dump and a defined set of files backed up for a defined set of use cases.  In the field it was discovered that customers have many different kinds of deployments, and the feature did not work well for all the intended use cases.   The improved plan is to move to full disk-image backups utilizing ReaR [8]. Carlos also noted that customers are now trying to use (or misuse) this feature to perform baremetal-to-virt migrations.   One of the issues with the current solution is that it is not clear how services behave after a backup and restore, e.g. Ceph OSDs and mons. Wes Hayutin noted that we have an opportunity to test the full-image backup and restore solution by moving to a more image-based internal CI system currently being designed by Jesse Pretorius and others.
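
For context on what the ReaR-based approach looks like, here is a minimal, hypothetical example of laying down a ReaR configuration with Ansible and kicking off a full backup; the host group and backup location are placeholders, and this is not the exact configuration the backup/restore work uses:

- hosts: controllers               # hypothetical group of nodes to back up
  become: true
  tasks:
    - name: Write a minimal ReaR configuration (full disk image to an NFS share)
      copy:
        dest: /etc/rear/local.conf
        content: |
          OUTPUT=ISO
          BACKUP=NETFS
          BACKUP_URL=nfs://backup.example.com/export/rear

    - name: Create the rescue image and the backup
      command: rear -d -v mkbackup
      changed_when: true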

Thanks Carlos!!


Friday, Failure Domains: Kevin Carter

Unfortunately Kevin was in high demand across PTG events and was unable to present this topic.  It should be discussed at a mid-cycle (virtual or in person) and written up as a blueprint.   Essentially Kevin is proposing that, in large deployments, some number or percentage of nodes be allowed to fail without failing the entire deployment.  If a few non-critical nodes fail in a large-scale deployment, TripleO should be better able to handle that, report back and move on. It was pointed out to me that there is a related customer bug as well.
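
To make the idea concrete, Ansible already exposes a knob along these lines; the following is only an illustration of the concept, not Kevin's actual proposal, and the group name and percentages are placeholders:

- hosts: Compute                   # hypothetical large role with many nodes
  max_fail_percentage: 5           # tolerate up to 5% of the nodes failing this play
  serial: "25%"                    # roll through the nodes in batches
  tasks:
    - name: Placeholder for a real deployment step
      command: /bin/true

The blueprint would still need to cover how TripleO reports back which nodes were left behind and how critical nodes are excluded from any such tolerance.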

Thanks Kevin!!


Friday, Cross project:  Edge Booting: Julia Kreger

You can find notes on this session here [9].  I will only summarize the questions raised in the earlier edge (DCN) topic.  With regard to when TripleO needs to support Redfish, there were no immediate or extremely urgent requests (please correct me if I do not have the correct information there).  Redfish, IMHO, did seem to be a nice improvement compared to IPMI.

This was my first introduction to Redfish, and I was of course curious what steps we would have to take in order to CI it.  Luckily, after some doc diving, I found several helpful links that include steps for setting Redfish up with our own OVB tooling [19] ( hooray \0/ ).  Links can be found here [10], and it seems like others have done some hard work to make that possible, so thank you!!
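
As a taste of what that could look like, and assuming the dynamic Redfish emulator described in the sushy-tools docs [10] (the exact flags should be checked there), running an emulated BMC in front of OVB/Nova instances is roughly:

- hosts: bmc_host                  # hypothetical node that will act as the Redfish BMC
  become: true
  tasks:
    - name: Install sushy-tools, which provides the Redfish emulator
      pip:
        name: sushy-tools

    - name: Start the emulator backed by an OpenStack cloud (the OVB instances)
      shell: >
        nohup sushy-emulator --os-cloud ovb-cloud
        --interface 0.0.0.0 --port 8000
        > /tmp/sushy-emulator.log 2>&1 &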

Thank you Julia!!


Friday, Further TripleO Ansible Integration: Wes Hayutin

The idea here would be to allow the TripleO project to govern how TripleO is deployed with Ansible as an operator.  The TripleO project would ship Ansible modules and roles that directly import python-tripleoclient to provide Ansible-to-CLI parity [12].  A new repo, perhaps called tripleo-operator-ansible, would host these modules and roles and include the same requirements and features as tripleo-ansible: linting, molecule tests, and auto-generated documentation.   This could tie in well with an initiative from Dan Macpherson to ship Ansible playbooks as part of our OSP documentation. Julia Kreger noted that we should not ignore the Ansible OpenStackSDK modules for part of the deployment process, which is a very valid point.  Most everyone at the PTG agreed this was a good direction moving forward and would help consolidate the public and internal tooling around TripleO's CLI in Ansible.
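
To illustrate the shape of what was discussed (everything below is hypothetical: the role and variable names do not exist yet), an operator's playbook might eventually look like:

- hosts: undercloud
  tasks:
    - name: Deploy the overcloud through a role that wraps python-tripleoclient
      include_role:
        name: tripleo_operator_overcloud_deploy      # hypothetical role name
      vars:
        overcloud_stack_name: overcloud              # hypothetical variables
        overcloud_environment_files:
          - /home/stack/templates/network-environment.yaml

The key property is CLI parity: such a role or module would call the same python-tripleoclient code paths as the CLI, rather than reimplementing them.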

Thanks Dan, Julia!!


Friday, TLS-E standalone gate: Ade Lee

Ade Lee walked us through a proposal to help test and CI TLS upstream, which has been very difficult to date (I can personally vouch for this).  The proposal uses a two-node setup upstream, with one node acting as the IPA server and the other as a TripleO standalone deployment. The keystone team is setting the right example for other projects and teams that are finding it difficult to keep outside patches from breaking their code: find a way to get something voting and gating upstream, even if it is not installed and deployed in exactly the same way customers may use it.
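
Mechanically, that two-node layout maps naturally onto a Zuul nodeset; the following is only a hedged sketch with placeholder names and labels, not the actual job definition under review [14]:

- nodeset:
    name: tripleo-standalone-ipa       # hypothetical nodeset name
    nodes:
      - name: ipa-server               # node running the FreeIPA server
        label: centos-7                # placeholder node label
      - name: standalone               # node running the TripleO standalone deploy
        label: centos-7
    groups:
      - name: ipa
        nodes:
          - ipa-server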

Please help by reviewing the keystone / security teams' patches here [14].

Thanks Ade!!


Friday, Octavia tempest plugin support in TripleO-CI: Chandan Kumar

Chandan was off fighting battles with the infra team and other projects.  Here are some of his notes:

* Have an RDO third-party standalone job with the full Octavia tempest run, triggered against Octavia patches from Stein onwards (FS062).

* Look into a third-party multinode job for the Queens and Rocky releases (FS038).

* Add support for the Octavia tempest plugin in os_tempest.

We certainly should have a conversation offline regarding these topics. I'll note the TripleO-CI community meeting or the #tripleo meeting on Tuesdays are good ways to continue collaborating here.

Thanks Chandan!!


Friday, Replace clouds.yaml with an openrc ansible module: Chandan Kumar

Open Question:  is this module [15] from the openstack-ansible project something we can reuse in TripleO via tripleo-ansible?
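
If it is reusable, the consumption side would presumably be as simple as including the role to render openrc/clouds.yaml on the target host. A minimal, hypothetical example follows; the exact role name and its variables would need to be confirmed against the project [15]:

- hosts: undercloud
  roles:
    # Role shipped by the openstack-ansible-openstack_openrc repo [15]; name
    # and variables are assumptions to be verified before reuse via
    # tripleo-ansible.
    - role: openstack_openrc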


Friday, Zuul jobs and ansible roles to handle rpm packaging: Chandan Kumar

The background and context can be found here:

* https://pagure.io/zuul-distro-jobs - a collection of Ansible roles to deal with RPM packaging

* https://opendev.org/openstack/tripleo-quickstart-extras/src/branch/master/roles/build-test-packages - builds RPMs for the projects referenced via Depends-On in the commit message

Proposal: move this tooling to zuul-jobs (see the sketch after this list).

* Make the RPM packaging more generic for CentOS/Fedora/RHEL and move it to zuul-jobs

* Move the mock- and rpmbuild-related roles to the zuul-jobs repo

* Add a mention of third-party Zuul jobs to the main zuul-jobs documentation

* Buildset registry: set up an HTTP server and start the job

* Details are here
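
For the "move it to zuul-jobs" part, the end state would presumably be a generic, reusable job definition; the following is purely illustrative, with hypothetical job, playbook and nodeset names:

- job:
    name: build-distro-rpms                  # hypothetical job name
    description: >
      Hypothetical generic job: build RPMs for the change under test (and
      its Depends-On changes) on CentOS/Fedora/RHEL nodes, using the
      mock/rpmbuild roles once they live in zuul-jobs.
    run: playbooks/build-distro-rpms.yaml    # placeholder playbook path
    nodeset: centos-8-node                   # placeholder nodeset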

Thanks Chandan!! This is indeed a very interesting and powerful proposal. We should definitely continue this conversation with the broader community.


Did you make it all the way down here?  Well done! I should add an easter egg :)


Links:


[1] https://etherpad.openstack.org/p/tripleo-ussuri-topics

[2] https://etherpad.openstack.org/p/tripleo-ptg-ussuri

[3] https://blueprints.launchpad.net/tripleo/+spec/split-controlplane-glance-cache

[4] https://review.opendev.org/#/q/topic:disable/paunch+(status:open+OR+status:merged)

[5] https://review.opendev.org/#/q/topic:deconstruct/container-puppet+(status:merged)

[6]  https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_basic.html

[7] https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_advanced.html

https://docs.openstack.org/project-deploy-guide/tripleo-docs/latest/features/deploy_cellv2_routed.html

[8] https://access.redhat.com/solutions/2115051

[9] https://etherpad.openstack.org/p/PVG-ECG-PTG

[10] https://github.com/openstack/sushy 

 https://docs.openstack.org/sushy/latest/contributor/index.html#contributing

https://docs.openstack.org/sushy-tools/latest/

https://docs.openstack.org/sushy-tools/latest/user/dynamic-emulator.html#systems-resource-driver-openstack

[12] https://hackmd.io/caRlGha7SueZxDRcyq9eGA?both


[13]  https://review.opendev.org/#/q/topic:cellv2+(status:open+OR+status:merged)

[14] https://review.opendev.org/#/q/status:open+project:openstack/tripleo-heat-templates+branch:master+topic:add_standalone_tls

[15] https://opendev.org/openstack/openstack-ansible-openstack_openrc

[16] https://etherpad.openstack.org/p/PVG-keystone-forum-policy

[17] https://datko.pl/zuul.pdf

[18] https://github.com/openstack/tripleo-heat-templates/blob/master/README.rst#service-testing-matrix

[19] https://github.com/openstack/openstack-virtual-baremetal




Thanks all!!

Wes Hayutin 

TripleO-PTL