On Mon, 4 Mar 2019 at 17:52, Ben Nemec <openstack@nemebean.com> wrote:


On 3/4/19 9:16 AM, Alex Schultz wrote:
> On Mon, Mar 4, 2019 at 6:11 AM Dan Prince <dprince@redhat.com> wrote:
>>
>> On Fri, 2019-03-01 at 15:43 -0700, Alex Schultz wrote:
>>> On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:
>>>> Recently we've been cleaning house in some of the TripleO
>>>> supported services.
>>>>
>>>> We removed MongoDB as RDO was also dropping it. I guess we needed to
>>>> follow suit as our CI is also based on the packages there.
>>>>
>>>> For other services (Designate for example), if the RDO packages exist
>>>> and we already have support, do we really need to deprecate them?
>>>> Having the ability to deploy some of the lesser used but still active
>>>> OpenStack projects with our deployment framework is nice for developers
>>>> and users alike, especially when you want to try out new services.
>>>>
>>>
>>> It's the long term maintenance of them to ensure they continue to work
>>> (packaging/promotions/requirement syncing). If no one is watching them
>>> and making sure they still work, I'm not sure it's worth saying they
>>> are "supported". Much like the baremetal support that we had, when we
>>> drop any testing we might as well mark them deprecated since there is
>>> no way to know if they still "work" the next day.  Adding and
>>> maintaining services is non-trivial so unless it's actively used, I
>>> don't think it's necessarily a bad thing to trim our "supported" list
>>> to a set of known good services.
>>>
>>> Just in the last two or three weeks I've had to go address packaging
>>> problems with Vitrage[0] and Tacker[1] because requirements changed in
>>> the project and the packages weren't kept up to date so the puppet
>>> module CI was broken.  No one noticed this was broken until we went to
>>> go update some unrelated things and found out that they were broken.
>>> The same thing happens in TripleO too where a breakage in a less than
>>> supported service takes away time for more important work.  The cost
>>> to keep these things working is > 0.
>>
>> Agree the cost isn't zero. But it also isn't high. And there is value
>> to a project having a deep bench of services from which to choose and
>> try out. The existence of at least some "niche" services in TripleO
>> provides some value to our users and perhaps even an argument to use
>> TripleO, as it would be considered a feature to be able to try out these
>> services. Even partially implemented ones may still have value in some
>> cases (no HA support, for example).
>>
>
> So I gave it some thought and rather than just deprecating for
> removal, could we instead mark them as experimental and treat them as
> such?  Yes you're right that folks might want to try these services,
> however there is no clear definition of a service that should always
> work vs a service that might work.  From an end user perspective, if
> they see that something like Congress is defined and they try to
> consume it only to find out it doesn't work or isn't configured
> correctly, then that is a poor experience.  I also don't think someone
> who is new to TripleO and wants to try out a service will be able to
> figure out why it's not working; they'll just think "TripleO doesn't
> work".  Can we move services which we have no guarantee to be
> working (no testing/no owners) to an /experimental/ folder to indicate
> the service may or may not work?

As someone who wrote the templates for a now-deprecated service I like
the idea of them living on in some format. On the other hand, in the
course of writing the Designate templates they were broken multiple
times by TripleO changes to the service interfaces. If a service isn't
being tested regularly I suspect there's little chance of it continuing
to work long-term without _someone_ looking after it.

Heck, Designate _is_ in the gate right now and it still broke recently
in real deployments with separate control and compute nodes. Without
someone paying attention to it I don't know how that would ever have
been found or fixed.

I think my recommendation would be to keep James's maintainer
requirement for even experimental services, but maybe instead of gating
on them just have a periodic job that runs with them enabled once a
night and emails the maintainer of record if it fails. That way they
can't block other work and aren't consuming much in the way of CI
resources, but they can be maintained with minimal effort. It might
encourage more people to sign up as maintainers if they know breakages
in the service aren't going to force them to drop everything to unblock
the gate.

In the kolla project we run some of the service-specific jobs only when relevant files have changed, using Zuul's files/irrelevant-files configuration syntax. This can be combined with a periodic job to catch code rot.
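To make the files/irrelevant-files behaviour concrete, here is a rough Python approximation of how such a filter decides whether a job runs for a given change. It is a sketch, not Zuul's own code: it assumes the patterns are regular expressions matched from the start of each changed path (modelled here with Python's re.match), and the TripleO paths used below are hypothetical examples.

```python
import re
from typing import Iterable, Sequence


def job_should_run(changed_files: Sequence[str],
                   files: Iterable[str] = (),
                   irrelevant_files: Iterable[str] = ()) -> bool:
    """Rough approximation of a files / irrelevant-files job filter.

    Assumption: each pattern is a regex matched from the start of the
    path (re.match), not a glob.
    """
    files_res = [re.compile(p) for p in files]
    irrelevant_res = [re.compile(p) for p in irrelevant_files]

    def matches_any(path: str, regexes) -> bool:
        return any(r.match(path) for r in regexes)

    # 'files': run only if at least one changed path matches a pattern.
    if files_res and not any(matches_any(f, files_res) for f in changed_files):
        return False

    # 'irrelevant-files': skip only if *every* changed path is irrelevant.
    if irrelevant_res and changed_files and all(
            matches_any(f, irrelevant_res) for f in changed_files):
        return False

    return True
```

For example, a hypothetical Designate scenario job restricted with files=[r"^deployment/designate/.*"] would run for a change touching deployment/designate/designate-api.yaml and be skipped for a change that only edits README.rst, while the periodic job still exercises it nightly regardless of what changed.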
Mark


Or maybe that will just result in all the periodic jobs failing
indefinitely, but if that happens then you know the maintainer isn't
maintaining anymore and you can deprecate the service.
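The nightly-job proposal above, including the idea that a job failing indefinitely signals an unmaintained service, could be triaged with something like the following Python sketch. Everything here is hypothetical: the ExperimentalService record, the maintainer addresses, and the 14-night DEPRECATION_THRESHOLD are illustrative assumptions, and actually collecting periodic job results and sending the mail are left out.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Hypothetical threshold: consecutive failed nights before proposing deprecation.
DEPRECATION_THRESHOLD = 14


@dataclass
class ExperimentalService:
    name: str
    maintainer: str  # maintainer of record (hypothetical email address)
    # Nightly periodic-job results, oldest first; True means the job passed.
    nightly_results: List[bool] = field(default_factory=list)


def consecutive_failures(results: List[bool]) -> int:
    """Count failures at the end of the history, stopping at the last pass."""
    count = 0
    for passed in reversed(results):
        if passed:
            break
        count += 1
    return count


def triage(services: List[ExperimentalService]) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Return (maintainer, subject) notifications and services to propose for deprecation."""
    notifications: List[Tuple[str, str]] = []
    deprecate: List[str] = []
    for svc in services:
        if not svc.nightly_results or svc.nightly_results[-1]:
            continue  # no run yet, or last night passed: nobody is emailed
        failures = consecutive_failures(svc.nightly_results)
        notifications.append(
            (svc.maintainer, f"[periodic] {svc.name} failed ({failures} night(s) in a row)"))
        if failures >= DEPRECATION_THRESHOLD:
            deprecate.append(svc.name)
    return notifications, deprecate
```

The design choice is that a failure only ever produces an email, never a gate block; the deprecation list is just a prompt for a human decision once a job has been red for long enough.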

I'm also not sure how much burden that would put on the CI squad to set
up such jobs. That's another discussion we'd need to have.

>
>
>> I just spent the time to "flatten" many of these services thinking they
>> would stay for a while. Many of us are willing to chip in to keep some
>> of these I think.
>>
>>>
>>> [0] https://review.rdoproject.org/r/#/c/19006/
>>> [1] https://review.rdoproject.org/r/#/c/18830/
>>>
>>>> Rather than debate these things ad-hoc on some of the various
>>>> reviews, I figured it was worth asking here. Do we have criteria for
>>>> when it is appropriate to deprecate a service that is implemented and
>>>> fully working? Is it costing us that much in terms of CI and resources
>>>> to keep a few of these services around?
>>>>
>>>
>>> Do you have a definition of "fully implemented"?  Some of the services
>>> that have been added were added but never actually tested. Designate
>>> only recently was covered with testing.  Things like Congress have
>>> never been tested (like via tempest) and we've only done an install
>>> but no actual service verification.  I would say Designate might be
>>> closer to fully implemented but Tacker/Congress would not be
>>> considered implemented.
>>>
>>> Given that we've previously been asked to reduce our CI footprint, I
>>> think it's hard to ask whether it's really costing that much, because
>>> the answer is yes if it has even the slightest impact.  The fewer
>>> services we support, the fewer scenarios we need, the less complex
>>> our deployments are and the fewer resources they consume.
>>
>> For the services we agree to keep we could always run them in a lower
>> bandwidth CI framework. Something like periodic jobs. Understood these
>> would occasionally get broken but the upstream feedback loop would at
>> least exist and the services could stay. And we'd still be able to
>> reduce our CI resources as well.
>>
>>>
>>> Thanks,
>>> -Alex
>>>
>>>> Dan
>>>>
>>>>
>>
>