[openstack-dev] [kolla] PTG Summary

Paul Bourke paul.bourke at oracle.com
Thu Mar 8 14:30:53 UTC 2018

Hi all,

Here's my summary of the various topics we discussed during the PTG. 
There were one or two I had to step out for but hopefully this serves as 
an overall recap. Please refer to the main etherpad[0] for more details 
and links to the session specific pads.

build.py script refactor
* I think there was little debate that we need this. However, discussion 
moved fairly quickly towards whether there are changes we can make to our 
images that would not require maintaining such a large build script in the 
first place.
* loci images are making good progress and are already in use by 
   * By moving the start scripts from the kolla images into 
kolla-ansible we can decouple ourselves from these images and open the 
possibility of consuming images from other sources such as loci.

* Do a poc of externalising start scripts (started under 

plugin split from main images
* Plugins continue to be a contentious issue in Kolla
* The current approach of installing all available plugins 'out of the 
box' is not working for certain users.
* Sam Betts had a good example of why this is not working for them, but I 
don't feel I can summarise it properly. I will reach out to him to clarify.
* We didn't reach a conclusion on this; it seems there are pros and cons 
to each approach. Needs further discussion and possibly some pocs.

ansible "--check" and "--diff" mode
* Operators would like to see some dry run like features in kolla-ansible.
* Would like to see the return of something like genconfig, where 
configs can be generated ahead of time and diffed/reviewed before deploy.
* Also some general discussion in this session on management and scaling 
difficulties with kolla.
* Inventory management needs to be more flexible.
* Operations are too slow once you hit about 200 nodes; operators are 
finding they have to use manual trickery to divide up their inventories.
* A lot of operations take place when very little has changed config wise.

* No specific actions came out of this at this time. I think we'd need 
more time on this topic to determine specific work items that can make 
improvements here.
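
As a rough illustration of the "generate ahead of time, then diff/review" 
idea above, here is a minimal sketch using only the Python standard 
library. It is not an existing kolla-ansible feature; the function name 
and the two directory arguments are hypothetical, and it simply assumes 
the currently deployed and the freshly generated configs are available as 
two directory trees of text files.

```python
import difflib
from pathlib import Path


def diff_config_trees(deployed_dir, generated_dir):
    """Return unified-diff text for every file that differs between two trees.

    Both arguments are hypothetical directory paths: one holding the configs
    currently deployed, one holding configs generated ahead of a deploy.
    """
    old_root, new_root = Path(deployed_dir), Path(generated_dir)

    # Relative paths of all regular files present in either tree.
    rel_paths = sorted(
        {p.relative_to(old_root) for p in old_root.rglob("*") if p.is_file()}
        | {p.relative_to(new_root) for p in new_root.rglob("*") if p.is_file()}
    )

    chunks = []
    for rel in rel_paths:
        old_file, new_file = old_root / rel, new_root / rel
        # A file missing on one side is treated as empty, so it shows up
        # as fully added or fully removed.
        old_lines = (
            old_file.read_text().splitlines(keepends=True)
            if old_file.is_file() else []
        )
        new_lines = (
            new_file.read_text().splitlines(keepends=True)
            if new_file.is_file() else []
        )
        # unified_diff yields nothing when the two files are identical.
        chunks.append("".join(difflib.unified_diff(
            old_lines,
            new_lines,
            fromfile="deployed/" + rel.as_posix(),
            tofile="generated/" + rel.as_posix(),
        )))
    return "".join(chunks)
```

An empty return value means nothing changed, which is also the kind of 
signal that could be used to skip work when very little has changed 
config wise.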

Database backup & recovery
* Interesting topic, all in agreement kolla should provide some 
functionality in this area.
* Discussion around which areas of responsibility fall on kolla vs. the 
operator. E.g. 'kolla should allow for regular database backups, how 
those are restored is beyond project scope'
* yankcrime has done some ground work on this as well as a poc.
* Good documentation is important here.

* Review yankcrime's poc and provide feedback
* Form a spec detailing what mechanism we want to use to trigger 
backups, etc.

* All seem in agreement that the issues and work seen in migrating to 
ceph-ansible currently outweigh the benefits.
* Decided to stick with improving kolla ceph for now, with bluestore 
support being a priority.

* Write a blueprint to add support for bluestore 
* Update docs to better inform operators on why they may or may not want 
to use kolla ceph vs the alternatives.

Prometheus support for monitoring
* There have been some previous attempts to add a monitoring stack in 
Kolla, though none have come to fruition.
* Oracle are looking at prometheus and what it will take to integrate 
it into Kolla to fill this gap.

* Write spec to detail how this will work.
* Do the work.

self health check support
* This had some crossover with the monitoring discussion.
* Kolla has some checks in the form of our 'sanity checks', but these 
are underutilised and not implemented for every service. Tempest or 
rally would be a better fit here.

* Remove the sanity check code from kolla-ansible - it's not fit for 
purpose and our assumption is no one is using it.
* Make contact with the self healing SIG, and see if we can help here. 
They may have recommendations for us.
* Make a spec for this.

destroy service & node
* Several aspects to this:
   * We would like to be able to remove an individual service as part of 
kolla-ansible destroy
   * It is not clear what best practice is to remove a control node in Kolla
   * Likewise for compute
* This could be automated but documentation would go a long way here also.

* Clearly document how to remove a control/compute node from a kolla 

integrate with docker-compose
* This is something Jeffrey is working on so we didn't have much to 
contribute in the way of discussion.

* Review and provide feedback on https://review.openstack.org/538581

Implement rolling upgrade for all core projects
* Started by defining the 'terms of engagement', i.e. what do we mean by 
rolling upgrade in kolla, what we currently have vs. what projects 
support, etc.
* There are two efforts under way here, 1) supporting online upgrade for 
all core projects that support it, 2) supporting FFU(offline) upgrade in 
* lujinluo is working on a way to do online FFU in Kolla.
* Testing - we need gates to test upgrade.

* Finish implementation of rolling upgrade for all projects that support 
it in Rocky
* Improve documentation around this and upgrades in general for Kolla
* Spec in Rocky for FFU and associated efforts
* Begin looking at what would be required for upgrade gates in Kolla

* mgoddard gave us an overview of the project, what it is and potential 
cross over / collaboration areas with kolla.
* In short, Kayobe adds the pieces to kolla-ansible required to build an 
end-to-end OpenStack deployment tool, along the lines of TripleO
* There's lots of good info on this on 

* None at this time.

HAProxy config customisation (customize non-openstack service conf)
* Discussion continues on the best way to handle non ini style config 
customisation in kolla.
* Similar to the plugins we have lots of ideas but each comes with pros 
and cons so it's not yet clear which is the right approach.
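
To illustrate why ini style config is the easy case, here is a minimal 
sketch of a key-by-key override merge using Python's standard 
configparser. This is only an illustration, not kolla-ansible's own 
merging code, and the function name is hypothetical. A format such as 
haproxy.cfg, where directives can repeat within a section, has no 
equally unambiguous key-level merge, which is part of what makes this 
topic hard.

```python
import configparser
import io


def merge_ini(base_text, override_text):
    """Merge two ini documents; values in override_text win key by key."""
    # interpolation=None keeps literal '%' characters in values untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(base_text)
    # Reading a second source updates existing keys and adds new ones,
    # leaving every other section and key from the base as it was.
    parser.read_string(override_text)
    out = io.StringIO()
    parser.write(out)
    return out.getvalue()
```

For example, merging a base containing "debug = False" under [DEFAULT] 
with an override that only sets "debug = True" changes that one key and 
leaves the other sections in the base alone.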

[0] https://etherpad.openstack.org/p/kolla-rocky-ptg-planning
