On Mon, 4 Mar 2019 at 17:52, Ben Nemec <openstack@nemebean.com> wrote:
On 3/4/19 9:16 AM, Alex Schultz wrote:
On Mon, Mar 4, 2019 at 6:11 AM Dan Prince <dprince@redhat.com> wrote:
On Fri, 2019-03-01 at 15:43 -0700, Alex Schultz wrote:
On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:
Recently we've been cleaning house in some of the TripleO supported services.
We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.
For other services (Designate, for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.
It's the long-term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support we had, when we drop any testing we might as well mark them deprecated, since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial, so unless a service is actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known-good services.
Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in those projects and the packages weren't kept up to date, so the puppet module CI was broken. No one noticed until we went to update some unrelated things and found out they were broken. The same thing happens in TripleO too, where a breakage in a less-than-supported service takes time away from more important work. The cost to keep these things working is > 0.
Agree the cost isn't zero. But it also isn't high. And there is value to a project having a deep bench of services from which to choose and try out. The existence of at least some "niche" services in TripleO provides some value to our users, and perhaps even an argument to use TripleO, as being able to try out these services would be considered a feature. Even partially implemented ones (no HA support, for example) may still have value in some cases.
So I gave it some thought, and rather than just deprecating for removal, could we instead mark them as experimental and treat them as such? Yes, you're right that folks might want to try these services; however, there is no clear definition of a service that should always work vs. a service that might work. From an end-user perspective, if they see that something like Congress is defined, try to consume it, and find out it doesn't work or isn't configured correctly, that is a poor experience. I also don't think someone who is new to TripleO and wants to try out a service will be able to figure out why it's not working; they'll just think "TripleO doesn't work". Can we move services which we have no guarantee are working (no testing/no owners) into an experimental/ folder to indicate that the service may or may not work?
As someone who wrote the templates for a now-deprecated service, I like the idea of them living on in some form. On the other hand, in the course of writing the Designate templates they were broken multiple times by TripleO changes to the service interfaces. If a service isn't being tested regularly, I suspect there's little chance of it continuing to work long-term without _someone_ looking after it.
Heck, Designate _is_ in the gate right now and it still broke recently in real deployments with separate control and compute nodes. Without someone paying attention to it I don't know how that would ever have been found or fixed.
I think my recommendation would be to keep James's maintainer requirement even for experimental services, but maybe instead of gating on them just have a periodic job that runs with them enabled once a night and emails the maintainer of record if it fails. That way they can't block other work and aren't consuming much in the way of CI resources, but they can be maintained with minimal effort. It might encourage more people to sign up as maintainers if they know breakages in the service aren't going to force them to drop everything to unblock the gate.
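Roughly, I'm picturing a dedicated periodic pipeline with an smtp failure reporter, something like the sketch below. This assumes a timer trigger connection and an smtp connection are already configured in Zuul; the pipeline name, schedule, and addresses are made up, and routing failures to a per-service maintainer of record would need more plumbing than this shows.

  - pipeline:
      name: periodic-experimental
      description: Nightly runs of jobs for experimental services.
      manager: independent
      trigger:
        timer:
          - time: '0 2 * * *'   # run once a night
      failure:
        smtp:
          to: maintainer-of-record@example.com   # placeholder address
          from: zuul@example.org                 # placeholder address
          subject: Nightly experimental service job failed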
In the kolla project we run some of the service-specific jobs only when relevant files have changed, using Zuul's files/irrelevant-files configuration syntax. This can be combined with a periodic job to catch code rot.

Mark
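For illustration, the files/irrelevant-files approach Mark describes might look roughly like this in a project's Zuul config; the job name and path regexes here are hypothetical, not real TripleO definitions.

  - project:
      check:
        jobs:
          - tripleo-ci-scenario-designate:     # hypothetical job name
              files:                           # only run when these paths change
                - ^deployment/designate/.*$
                - ^environments/.*designate.*$
      periodic:
        jobs:
          - tripleo-ci-scenario-designate      # nightly run to catch code rot

irrelevant-files works the same way but inverted: the job is skipped when a change only touches the listed paths.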
Or maybe that will just result in all the periodic jobs failing indefinitely, but if that happens then you know the maintainer isn't maintaining anymore and you can deprecate the service.
I'm also not sure how much burden that would put on the CI squad to set up such jobs. That's another discussion we'd need to have.
I just spent the time to "flatten" many of these services thinking they would stay for a while. Many of us are willing to chip in to keep some of these, I think.
[0] https://review.rdoproject.org/r/#/c/19006/
[1] https://review.rdoproject.org/r/#/c/18830/
Rather than debate these things ad hoc on some of the various reviews, I figured it was worth asking here. Do we have criteria for when it is appropriate to deprecate a service that is implemented and fully working? Is it costing us that much in terms of CI and resources to keep a few of these services around?
Do you have a definition of "fully implemented"? Some of the services that were added have never actually been tested. Designate was only recently covered by testing. Things like Congress have never been tested (e.g. via tempest); we've only done an install with no actual service verification. I would say Designate might be closer to fully implemented, but Tacker/Congress would not be considered implemented.
Given that we've previously been asked to reduce our CI footprint, it's hard to say whether it's really costing that much, because the answer would be yes if it has even the slightest impact. The fewer services we support, the fewer scenarios we have to maintain, the less complex our deployments are, and the fewer resources they consume.
For the services we agree to keep, we could always run them in a lower-bandwidth CI framework, something like periodic jobs. Understood, these would occasionally get broken, but the upstream feedback loop would at least exist and the services could stay. And we'd still be able to reduce our CI resources as well.
Thanks, -Alex
Dan