On Mon, 4 Mar 2019 at 17:52, Ben Nemec <openstack@nemebean.com> wrote:
On 3/4/19 9:16 AM, Alex Schultz wrote:
On Mon, Mar 4, 2019 at 6:11 AM Dan Prince <dprince@redhat.com> wrote:
On Fri, 2019-03-01 at 15:43 -0700, Alex Schultz wrote:
On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:
Recently we've been cleaning house in some of the TripleO supported services.
We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.
For other services (Designate, for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.
It's the long-term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support we had, when we drop any testing we might as well mark them deprecated, since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial, so unless a service is actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known-good services.
Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in those projects and the packages weren't kept up to date, so the puppet module CI was broken. No one noticed until we went to update some unrelated things and found out they were broken. The same thing happens in TripleO too, where a breakage in a less-than-supported service takes time away from more important work. The cost to keep these things working is > 0.
Agree the cost isn't zero. But it also isn't high. And there is value to a project having a deep bench of services from which to choose and try out. The existence of at least some "niche" services in TripleO provides some value to our users, and perhaps even an argument to use TripleO, as being able to try out these services would be considered a feature. Even partially implemented ones (no HA support, for example) may still have value in some cases.
So I gave it some thought, and rather than just deprecating for removal, could we instead mark them as experimental and treat them as such? Yes, you're right that folks might want to try these services; however, there is no clear definition of a service that should always work vs. a service that might work. From an end-user perspective, if they see that something like Congress is defined, try to consume it, and find out it doesn't work or isn't configured correctly, that is a poor experience. I also don't think someone who is new to TripleO and wants to try out a service will be able to figure out why it's not working; they'll just think "TripleO doesn't work". Can we move services which we have no guarantee are working (no testing/no owners) into an experimental/ folder to indicate that the service may or may not work?
As someone who wrote the templates for a now-deprecated service, I like the idea of them living on in some form. On the other hand, in the course of writing the Designate templates they were broken multiple times by TripleO changes to the service interfaces. If a service isn't being tested regularly, I suspect there's little chance of it continuing to work long-term without _someone_ looking after it.
Heck, Designate _is_ in the gate right now and it still broke recently in real deployments with separate control and compute nodes. Without someone paying attention to it I don't know how that would ever have been found or fixed.
I think my recommendation would be to keep James's maintainer requirement even for experimental services, but maybe instead of gating on them just have a periodic job that runs with them enabled once a night and emails the maintainer of record if it fails. That way they can't block other work and aren't consuming much in the way of CI resources, but they can be maintained with minimal effort. It might encourage more people to sign up as maintainers if they know breakages in the service aren't going to force them to drop everything to unblock the gate.
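Roughly, I'm picturing a dedicated periodic pipeline with an smtp failure reporter, something like the sketch below. This assumes a timer trigger connection and an smtp connection are already configured in Zuul; the pipeline name, schedule, and addresses are made up, and routing failures to a per-service maintainer of record would need more plumbing than this shows.

  - pipeline:
      name: periodic-experimental
      description: Nightly runs of jobs for experimental services.
      manager: independent
      trigger:
        timer:
          - time: '0 2 * * *'   # run once a night
      failure:
        smtp:
          to: maintainer-of-record@example.com   # placeholder address
          from: zuul@example.org                 # placeholder address
          subject: Nightly experimental service job failed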
In the kolla project we run some of the service-specific jobs only when relevant files have changed, using Zuul's files/irrelevant-files configuration syntax. This can be combined with a periodic job to catch code rot.

Mark
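For illustration, the files/irrelevant-files approach Mark describes might look roughly like this in a project's Zuul config; the job name and path regexes here are hypothetical, not real TripleO definitions.

  - project:
      check:
        jobs:
          - tripleo-ci-scenario-designate:     # hypothetical job name
              files:                           # only run when these paths change
                - ^deployment/designate/.*$
                - ^environments/.*designate.*$
      periodic:
        jobs:
          - tripleo-ci-scenario-designate      # nightly run to catch code rot

irrelevant-files works the same way but inverted: the job is skipped when a change only touches the listed paths.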
Or maybe that will just result in all the periodic jobs failing indefinitely, but if that happens then you know the maintainer isn't maintaining anymore and you can deprecate the service.
I'm also not sure how much burden that would put on the CI squad to set up such jobs. That's another discussion we'd need to have.
I just spent the time to "flatten" many of these services thinking they would stay for a while. Many of us are willing to chip in to keep some of these, I think.
[0] https://review.rdoproject.org/r/#/c/19006/
[1] https://review.rdoproject.org/r/#/c/18830/
Rather than debate these things ad hoc on some of the various reviews, I figured it was worth asking here. Do we have criteria for when it is appropriate to deprecate a service that is implemented and fully working? Is it costing us that much in terms of CI and resources to keep a few of these services around?
Do you have a definition of "fully implemented"? Some of the services that were added have never actually been tested. Designate was only recently covered by testing. Things like Congress have never been tested (e.g. via tempest); we've only done an install with no actual service verification. I would say Designate might be closer to fully implemented, but Tacker/Congress would not be considered implemented.
Given that we've previously been asked to reduce our CI footprint, it's hard to say whether it's really costing that much, because the answer would be yes if it has even the slightest impact. The fewer services we support, the fewer scenarios we have to maintain, the less complex our deployments are, and the fewer resources they consume.
For the services we agree to keep, we could always run them in a lower-bandwidth CI framework, something like periodic jobs. Understood, these would occasionally get broken, but the upstream feedback loop would at least exist and the services could stay. And we'd still be able to reduce our CI resources as well.
Thanks, -Alex
Dan