[tripleo] Victoria TripleO PTG summary

Wesley Hayutin whayutin at redhat.com
Tue Jun 9 18:24:30 UTC 2020


Greetings,

Thanks to everyone who attended the OpenStack PTG last week!  A special
thanks to those who presented their topics and discussed work items with
the folks in attendance.  As you know, the event was hosted virtually over
video conference and was packed with great topics and conversations.  As
the current PTL for TripleO, I will do my best here to summarize those
conversations and the items others should be made aware of.  To review the
topics and discussions, please follow the links here [1].

The event was recorded; however, the OpenStack Foundation has not yet made
any of the videos publicly available.


Monday June 1st:

Retrospective:

The TripleO project started with a retrospective of the Ussuri cycle.  I
attempted to use OpenStack’s Storyboard for the process, but had to fall
back to an etherpad for usability reasons.  Keep trying, Storyboard is
getting there.  The good news is that the good things outweighed the bad
things [2], and the ideas for improvement were focused on making things
faster \o/

TripleO Operator Ansible Status: by Alex Schultz

Alex gave a nice overview of his hard work throughout Ussuri to make
TripleO Operator Ansible a reality.  TripleO Operator Ansible is now the
official way to execute TripleO commands via ansible, and the upstream CI
and consultants in the field are all consolidating around the tool.  The
history of reviews that made this happen can be found at [4]; while Alex
completed a lot of the work himself, he also attracted a number of
contributors who carried a large share of it.  One point Alex wanted to
emphasize: while the TripleO Operator roles are meant to be executed by
customers and consultants, tripleo-ansible is NOT meant to be exposed or
called directly.  Slides are available here [3].  Thank you Alex!
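
For anyone who has not tried the collection yet, here is a rough,
unofficial sketch of what driving a TripleO action through
tripleo-operator-ansible can look like.  It assumes ansible-playbook and
the tripleo.operator collection are installed on the undercloud; the role
name below matches the collection docs at the time of writing, but verify
it for your release.

#!/usr/bin/env python3
"""Sketch: wrap a TripleO action in a tripleo-operator-ansible role call.

Assumptions: ansible-playbook and the tripleo.operator collection are
installed; the role name is illustrative and should be checked against the
collection documentation for your release."""
import subprocess
import tempfile

PLAYBOOK = """
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Install the undercloud through the operator role
      ansible.builtin.import_role:
        name: tripleo.operator.tripleo_undercloud_install
"""

with tempfile.NamedTemporaryFile("w", suffix=".yml", delete=False) as handle:
    handle.write(PLAYBOOK)
    playbook_path = handle.name

# The operator role wraps the corresponding TripleO command, so the caller
# never reaches into tripleo-ansible internals directly.
subprocess.run(["ansible-playbook", "-i", "localhost,", playbook_path],
               check=True)

The same pattern extends to the overcloud deployment and day-two roles in
the collection, which is what makes it attractive as the single interface
for CI and field automation.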

The Future of python-tripleoclient: by Rabi Mishra

Rabi led a very interesting conversation about the steps the project would
have to take to further simplify the stack of projects used in a TripleO
deployment.  Currently a client call passes through a number of layers
(tripleo-operator-ansible, python-tripleoclient, tripleo-ansible
playbooks/modules, and the tripleo-common library), which is complex and
not an ideal user experience when it comes to logs and resolving bugs.
Out of the gate Rabi discussed breaking python-tripleoclient down into
something more basic and moving more functionality into ansible modules.
The proposal was to get rid of the CLI, or replace it with a very simple
one, and move all the logic in tripleoclient to ansible playbooks/modules.
The top-level playbooks would map directly to current CLI actions and
would live in the tripleo-ansible repo.  tripleo-operator-ansible could
also change to use those playbooks directly and transparently under the
hood.  Details from the session can be found here [5].  Thanks Rabi!
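
To make that proposal a bit more concrete, here is a purely hypothetical
sketch of what a thinned-out client could look like: argument parsing
only, with everything else dispatched to a top-level playbook.  The
playbook names and the mapping below are mine for illustration and are not
what was agreed in the session.

#!/usr/bin/env python3
"""Hypothetical sketch of a "thin" tripleoclient: the CLI only parses
arguments and dispatches to a top-level tripleo-ansible playbook.
Playbook names are illustrative, not the actual repo layout."""
import argparse
import subprocess

# Illustrative mapping of CLI actions to top-level playbooks.
ACTION_TO_PLAYBOOK = {
    "undercloud-install": "cli-undercloud-install.yaml",
    "overcloud-deploy": "cli-overcloud-deploy.yaml",
}


def main():
    parser = argparse.ArgumentParser(prog="tripleo")
    parser.add_argument("action", choices=sorted(ACTION_TO_PLAYBOOK))
    parser.add_argument("--extra-vars", default="",
                        help="key=value pairs handed to ansible")
    args = parser.parse_args()

    cmd = ["ansible-playbook", "-i", "localhost,", "-c", "local",
           ACTION_TO_PLAYBOOK[args.action]]
    if args.extra_vars:
        cmd += ["-e", args.extra_vars]
    # All of the real logic lives in the playbooks and modules; the client
    # only decides which playbook to run and with which variables.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()

With a layout like this, tripleo-operator-ansible could call the same
playbooks directly, which is exactly the transparency Rabi described.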

Ansible Strategies & us: by Alex Schultz

Alex was up again to let us know what he has been doing to make Ansible
more performant.  Ansible offers several different kinds of “strategies”
that control how tasks are executed across multiple hosts.  The strategies
are pluggable, and Alex has built a custom strategy, currently called
“TripleO Free”, that can be used across some but not all of TripleO’s
tasks [7].  The performance enhancement is spectacular, reducing a 30 node
deployment from almost 2 hours to under 50 minutes.  Well done!!!  I’ll
note the strategy name will be changed to garner more community support,
and the performance gains are not as pronounced with fewer nodes.  A
standard CI-like deployment (4-5 nodes) can expect to see about 20 minutes
cut off the deployment time.

Slides of Alex’s presentation can be found here [6].  Very well done!
Thanks Alex!
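
For anyone curious how a strategy plugin hangs together, below is a
minimal Python skeleton loosely modeled on the idea (the real tripleo_free
plugin lives in tripleo-ansible and does considerably more).  Once the
file is on the configured strategy plugin path, a play opts in with
strategy: <plugin name>.

"""Minimal skeleton of a custom Ansible strategy plugin, loosely modeled
on the free-style strategy described above.  The real TripleO plugin adds
per-host failure handling and scheduling logic; this sketch only
delegates."""
from ansible.plugins.strategy.free import StrategyModule as FreeStrategyModule


class StrategyModule(FreeStrategyModule):
    """Free-style strategy: each host works through its tasks as fast as
    it can instead of waiting for every host to finish the current task."""

    def run(self, iterator, play_context):
        # A real implementation would customize scheduling and failure
        # handling here so that one slow or failed host does not hold up
        # the rest of the deployment.  This sketch behaves exactly like
        # the built-in "free" strategy.
        return super(StrategyModule, self).run(iterator, play_context)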

Mistral has been removed, so what is left to do? By Kevin Carter

Kevin hit us next with what to expect now that Mistral has been removed
and what steps we need to take next to make the community successful.
I’ll note the Mistral container is still on the undercloud but inactive,
and workflow processing has been converted directly to ansible.  There is
still some cleanup to do in tripleo-common and some rpm dependencies to
prune.

A link to the conversation is available [8], and Kevin’s presentation is
available [9].  Thank you Kevin!

TripleO Operator Pipelines By Emilien Macchi

Emilien walked us through what it would take to further consolidate on
TripleO Operator Ansible based CI pipelines in TripleO, OSP and at
customer sites.  We reviewed how to break down the full workflow of a
deployment and day-two operations in CI.  The goal here is to replace as
much CI as possible with TripleO Operator Ansible so that we have a
standardized ansible interface to TripleO in any CI or customer
environment.  The TripleO Operator roles ship in Ussuri and should be
backwards compatible with earlier releases.  There will be a major push
upstream to further integrate TripleO Operator Ansible into every CI job.
Notes and comments are published here [10].  Thanks a million!



[1] https://etherpad.opendev.org/p/tripleo-ptg-victoria

[2] https://etherpad.opendev.org/p/tripleo-ptg-retrospective

[3]
https://docs.google.com/presentation/d/1Oxs-sflnJd5KIMoY6e_mlfU2zK4QWqWSG7AhJLpqTMo/edit#slide=id.p3

[4] https://etherpad.opendev.org/p/tripleo-operator-ansible

[5] https://etherpad.opendev.org/p/tripleo-future-of-tripleoclient

[6]
https://docs.google.com/presentation/d/19mr3HyyYUUGcwbRHsy2-k-F4WNCYCoFvSPzdbCTgw-A/edit?usp=sharing

[7] https://review.opendev.org/#/q/topic:strategy-improvements

[8] https://etherpad.opendev.org/p/tripleo-remove-mistral

[9]
https://docs.google.com/presentation/d/1iir7FA6YwBxRoU_SZJQ2H9Enbak4GAfBw73wU3udFkQ/edit#slide=id.g861ce413bb_0_701

[10] https://etherpad.opendev.org/p/tripleo-operator-pipelines


Tuesday June 2nd

CI updates: by Wes Hayutin

In the CI update I mostly covered the new upstream Component Pipeline.
The component pipeline has three major goals: first, enable us to release
at any time with working components of OpenStack; second, break a large
problem down into smaller problems; and third, reduce the time to debug
and fix.  The presentation covers monolithic vs. component builds, the
workflow, and the testing and monitoring of the pipeline.  The
presentation is available here [11].

I also noted that the upstream CI executed 268,805 deployments of TripleO
in the Ussuri cycle, and third party CI executed another 132,853
deployments.  Not too shabby.  Details are here [12].  Thanks easter bunny!

IPv6 and DCN (routed-networks) in upstream CI by Harald Jensas

Harald kicked off the next topic about utilizing more advanced networking
with OVB in our third party TripleO CI.  The proposal is to update the CI
with multiple network segments.  Harald has been the primary maintainer of
OVB (OpenStack Virtual Baremetal) and looks to be wrapping up this feature
[13].  Documentation for the feature can be found here [14] and notes from
the discussion are posted here [15].  Thank you Harald!

Enable network isolation by default by Harald Jensas

Harald continued the networking discussion by highlighting common mistakes
made in the field with network isolation settings and TripleO.  When
customers or consultants accidentally forget to include the
network-isolation settings, Heat can be destructive to the production
environment and delete networks during an update or upgrade.  The
discussion pointed to merged patches that already address the issue, but
catching the mistake earlier in the process was still a concern and led to
discussion around additional validations.  The goal also shifted toward
making network-isolation more approachable for our customers; a spec will
be written to improve the customer experience here.  Notes can be found
here [16].  Thanks Harald!
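
As a purely hypothetical illustration of the kind of early check that was
discussed (the names and the heuristic are mine, not the merged patches or
the upcoming spec), a validation could simply warn before the stack update
when a previously isolated deployment is about to run without a
network-isolation environment file:

"""Hypothetical pre-update check in the spirit of the validations
discussed here.  The filenames are common defaults only; a real validation
would inspect the existing stack or plan rather than take a boolean
argument."""
import sys

NET_ISO_MARKERS = ("network-isolation.yaml", "network-environment.yaml")


def check_network_isolation(environment_files, previously_isolated):
    """Return False and warn if network isolation appears to be dropped."""
    has_net_iso = any(env.endswith(NET_ISO_MARKERS)
                      for env in environment_files)
    if previously_isolated and not has_net_iso:
        print("WARNING: this deployment previously used network isolation "
              "but no network-isolation environment file was passed; "
              "continuing could delete overcloud networks during the "
              "update.", file=sys.stderr)
        return False
    return True


check_network_isolation(["/home/stack/templates/node-info.yaml"],
                        previously_isolated=True)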


 Future deprecation of tripleo-validations by Cedric Jeanneret

Cedric led us through the current status and future of TripleO
Validations.  The deprecated validations do not have clear ownership, and
testing each validation has proved to be difficult.  Enter the solution:
the validation team’s new framework, in which the validation service is
clearly delineated from the validations themselves.  We discussed
ownership, packaging and CI; the entire workflow of packaging and testing
now has clear ownership and fits very neatly into the component pipeline.
Please read through the details of the discussion, as this will impact
several projects with a clean, exciting way to validate each service in
OpenStack [17].  Thank you Cedric!!
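
As a small, hedged illustration of the separation between the framework
and the validations themselves, running a validation is just a call into
the framework CLI; the validation name below is one of the long-standing
examples and may differ on your install:

"""Sketch: drive the validation framework from a script.  The validation
name is an example; list what is actually available on the host first."""
import subprocess

# Show the validations the framework knows about on this host.
subprocess.run(["openstack", "tripleo", "validator", "list"], check=True)

# Run a single validation by name.
subprocess.run(["openstack", "tripleo", "validator", "run",
                "--validation", "check-ram"], check=True)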

Container Image Build v2 by Emilien Macchi and Kevin Carter

Kevin and Emilien have been putting in a lot of extra hours revamping the
container build system for TripleO and have produced a much improved
system in record time.  TripleO will benefit from smaller containers,
faster builds and the flexibility to handle upstream and downstream builds
easily.  If you haven’t seen the presentation, please do have a look [18].
Notes on the topic can be found here [19].  Get involved if you can keep
up.  Thanks Emilien, Kevin!

Ceph Integration w/ cephadm by Francesco Pantano, Giulio Fidente, John
Fulton

The storage trio walked us through the details of cephadm and the
ramifications of replacing ceph-ansible.  We discussed a wide range of
topics here, including which features should be built into TripleO vs.
handled directly by cephadm, such as scale up/down, updates and upgrades.
The team walked us through how they dissected the deployment and injected
cephadm as a proof of concept, showing that everything works well and
proving we’re in good hands on the storage front.  There is a lot of
detail in the notes, so please have a read through here [20].  Thank you
Francesco, Giulio (aka Bob Dylan), John!

 Removal of Heat and Swift from Undercloud by Rabi Mishra

Rabi continued from his earlier topic regarding the noble effort to
further simplify the OpenStack deployment by removing Heat and Swift.
Rabi articulately described how Heat is currently used in the latest
release and what would have to be done to remove both Heat and Swift,
walking us through Heat resources, extra config, IPAM, etc. and how each
could potentially be replaced.  I personally really enjoyed this
particular topic and how we can move forward making OpenStack less
complex.  This is an important topic and you should review Rabi’s clear
strategy here [21].  Thanks Rabi!

Database migrations - can we make them more friendly, or can we do them a
better way? By Jesse Pretorius

Jesse led us next and spoke to the hard and complex problem of database
migrations across OpenStack projects.  This session was more of a
brainstorming exercise in discovering creative solutions to complex
problems.  Unfortunately the group felt the problem came down to OpenStack
governance itself and more uniformly enforcing migration details across
projects.  It was concluded that getting all the projects to agree on a
standard for migrations would be quite the uphill climb.  Notes on the
subject can be found here [22].  Thanks Jesse!



[11] https://drive.google.com/file/d/1rAohZ01BDFGBOjI3kS9jy-n-P1weQVIX/view

[12] https://etherpad.opendev.org/p/tripleo-ptg-victoria-ci-updates

[13]
https://review.opendev.org/#/q/topic:ipv6-support+(status:open+OR+status:merged)+project:openstack/openstack-virtual-baremetal

[14]
https://openstack-virtual-baremetal.readthedocs.io/en/latest/deploy/quintupleo.html#quintupleo-and-routed-networks

[15]
https://etherpad.opendev.org/p/tripleo-ipv6-and-routed-networks-in-upstream-ci

[16] https://etherpad.opendev.org/p/tripleo-enable-net-iso-by-default

[17] https://etherpad.opendev.org/p/tripleo-validations-future

[18]
https://docs.google.com/presentation/d/1l2RzL-hJ-fT9jzi2A7s8ladEumOvy8tDy7DP0XqnqNE/edit#slide=id.p3

[19] https://etherpad.opendev.org/p/tripleo-container-image-tooling-victoria

[20] https://etherpad.opendev.org/p/tripleo-ceph

[21] https://etherpad.opendev.org/p/tripleo-heat-swift-removal-undercloud

[22] https://etherpad.opendev.org/p/tripleo-ptg-victoria-better-db-sync


Wednesday June 3rd


Speeding up deployments, updates and upgrades by Jesse Pretorius

Jesse had quite a few suggestions and proposals on how we might speed up
updates and upgrades.  Some of the highlights were building on top of
Alex’s ansible strategy improvements, avoiding skipped tasks, avoiding
unnecessary reboots, etc.  A lot was discussed, so please refer to the
etherpad for details [23].  Thank you Jesse!

Running validations from within a container by Cedric Jeanneret

Cedric continued to walk us through the very near future of validations.
Cedric wanted to discuss the delivery of validations and the implications
of using a container to host them.  We discussed older non-containerized
versions of TripleO as well as delivering the validations via an ansible
collection or a container.  There are multiple use cases for validations,
which led the group to not consolidate on a single delivery mechanism;
more discussion was needed at the end of this topic.  Notes are here [24].
Thanks Cedric!


 Auto --limit scale-up of Compute nodes by Luke Short

Luke walked us through his auto scale-up spec [25] in the following
presentation [26].  Essentially this customizes the scale-up process to
better match the ansible configuration for forked processes, making
scale-up quicker.  Luke was able to perform a 10 node scale-up in 20
minutes.  Nice presentation Luke!
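
To illustrate the mechanics rather than the spec itself: the --limit
option already exists on the deploy command, and the proposal essentially
computes that limit (and a matching ansible fork count) automatically for
the nodes being added.  A hand-rolled version of the idea might look like
the sketch below; apart from the --limit flag itself, the details are
hypothetical.

"""Hand-rolled sketch of the auto --limit idea: on scale-up, restrict the
ansible run to the nodes being added instead of reconfiguring every node.
Node names and the command tail are examples; only --limit is a real
flag."""
new_nodes = ["compute-3", "compute-4"]        # nodes added in this scale-up
limit = ",".join(new_nodes + ["undercloud"])  # keep the undercloud in scope

cmd = [
    "openstack", "overcloud", "deploy",
    "--templates",
    "--limit", limit,
]
print("Would run:", " ".join(cmd))
# In practice the client would also raise the ansible fork count to match
# the number of hosts in the limit so the new nodes configure in parallel.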

TripleO usability enhancements:  by Wes Hayutin

Initially there were not a lot of suggestions for this topic prior to the
PTG; however, once we got started with usability improvements they started
to roll in.  Definitely check the etherpad [27], but I’ll list some here:


   - Fix “FAILED - RETRYING in stack status” (fixed already in
     https://review.opendev.org/#/c/725665/)
   - Network-isolation user experience (linked back to Harald’s topic)
   - Chem raised a number of update / upgrade improvements on line 25 of [27]
   - Eliminate the need for customers to remove deprecated services from
     roles_data during upgrades
   - Improved logging
   - Block commands that require a previous action
   - Prompt the user prior to dangerous actions
   - Keep simplifying, e.g. Rabi’s proposals

Check the etherpad for more details [27].

Improvements in TLS Everywhere / CI by Ronelle Landy, Ade Lee

At the last PTG in Shanghai, Ade and I proposed a job that would set up
IPA and a standalone deployment of TripleO and configure them to work
together upstream to check and gate changes to TLS.  I’m happy to report
that Ade and Ronelle got the job done!!  The presentation is available
here [29] and notes are on the etherpad [28].  This should go a very long
way toward preventing TLS related bugs across upstream and internal
testing.  Thanks Ade, thanks Ronelle!



[23] https://etherpad.opendev.org/p/tripleo-ptg-victoria-speedups

[24] https://etherpad.opendev.org/p/tripleo-validation-container

[25]
https://review.opendev.org/#/c/727768/1/specs/victoria/auto_scale_up_compute_nodes.rst

[26] https://drive.google.com/file/d/1c2D67QZ5UGBYRONogZJKMief5wJxQAIt/view

[27] https://etherpad.opendev.org/p/tripleo-usability-enhancements

[28] https://etherpad.opendev.org/p/tripleo-tls-everywhere-ci

[29]
https://docs.google.com/presentation/d/1gruDzHIjZtPtUUYSRGIrJynTOt9bZP3W7TLfd80H9ws/edit#slide=id.p


Thursday June 4th

Config-download 2.0 by Luke Short

Luke walked us through a proposal to build on config-download and what
steps could be taken to further simplify the deployment using Ansible,
e.g. using idempotent tasks, static playbooks, etc.  The source of truth
with regards to customer environments was a tough nut to crack here, and
it was difficult to see a very clear and backwards compatible method to
approach this.  Notes are available here [30].

Ansible logging within tripleoclient  by Cedric Jeanneret

Cedric proposed improvements to TripleO logging here [31] and walked the
group through the spec.  Logging is certainly an area we all want to see
improved, and one we all have opinions about, so this was a lively
conversation.  Cedric pointed out there are two main types of steps we
need to go after: any ansible task and any tripleo CLI command called need
to be logged together in a human readable way.  Very good points were made
and the conversation will continue in the TripleO spec [31]; session notes
are available here [32].  Thanks Cedric!
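
As a very rough sketch of the single human-readable log the spec aims for
(the file name and format below are assumptions, not what the spec
mandates), the CLI layer and whatever drives the ansible run would simply
share one logger:

"""Rough sketch of unified deployment logging: the CLI layer and the
ansible runner write to the same logger.  The path and format are
illustrative assumptions, not what the spec prescribes."""
import logging


def get_deployment_logger(logfile="tripleo-deploy.log"):
    logger = logging.getLogger("tripleo.deploy")
    logger.setLevel(logging.DEBUG)
    handler = logging.FileHandler(logfile)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s [%(levelname)s] %(message)s"))
    logger.addHandler(handler)
    return logger


log = get_deployment_logger()
# Both kinds of steps named in the discussion land in the same file:
log.info("cli: 'openstack overcloud deploy' started")
log.info("ansible: TASK [tripleo_firewall : Manage firewall rules] changed")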

Transitioning the underlying CentOS/RHEL - how can we improve this process? By
Jesse

Jesse spoke to the challenges facing upgrades with regards to the mix of
host RHEL versions in a deployment.  Several finer points were made about
the nuances of the upgrade with regards to HA, pacemaker, libvirt and
ironic versions.  It was a good troubleshooting session and a thoughtful
dialogue with the group.  Read the details here [33].  Thanks Jesse!

 VxFlexOS integration within TripleO  by Jean Pierre Roquesalane/Rajini
Karthik

The TripleO team answered questions and walked the Dell team through the
best practices for integrating a third-party service with TripleO.
Details are here [34].

TripleO CI - audit coverage for neutron, ovn, ovs, octavia  by Brent
Eagles, Slawek Kaplonski, Wes Hayutin

This session was about reemphasizing the importance of the network
scenarios and workflows that are critical to TripleO’s success in the
field.  Over the past year or so Brent and others have done a great job
adding additional upstream coverage, but it’s now time to make sure this
is everyone’s job as well.  We discussed the challenges with upstream
jobs, the neutron project and TripleO.  We also sketched out some ideas on
where to build on top of Brent and Slawek’s successes to reach greater
upstream coverage of critical network features and workflows.

Notes are available [35].  Really appreciate Brent and Slawek making time
to attend, thank you!!


tripleo-validations package future by Cedric Jeanneret

Last but not least, Cedric led a discussion on packaging for validations
and how we can align responsibility for validations across rpms, git repos
and CI.  We spoke to how the packaging is related to the component
pipeline, and to designing this carefully so that the validations team and
other projects can work together and independently with clear lines.  Good
stuff here, details are at [36].  Thanks Cedric!

openstack tripleo deploy (standalone) for multinode by James Slagle

Please read through the blueprint and specs proposed by James with regards
to utilizing the standalone deployment for multinode overcloud deployments
[37].  Thanks James!


[30] https://etherpad.opendev.org/p/tripleo-ptg-victoria-config-download-two

[31]
https://review.opendev.org/#/c/733652/1/specs/victoria/ansible-logging-tripleoclient.rst

[32] https://etherpad.opendev.org/p/tripleo-ptg-victoria-ansible-logging

[33] https://etherpad.opendev.org/p/tripleo-ptg-victoria-distro-transition

[34] https://etherpad.opendev.org/p/tripleo-ptg-victororia-VxFlexOS

[35] https://etherpad.opendev.org/p/tripleo-network-coverage-audit

[36] https://etherpad.opendev.org/p/tripleo-validations-future

[37] https://etherpad.opendev.org/p/tripleo-ptg-tripleo-deploy

Did you make it?

A special and sincere thank you (I know I never sound sincere) to both
Emilien and Alex for helping me with my PTL responsibilities throughout
the cycle!!

See you next PTG :)