[TripleO] criteria for deprecating services

Dan Prince

1 Mar 2019

2:20 p.m.

Recently we've been cleaning house in some of the TripleO supported services.

We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.

For other services (Designate for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.

Rather than debate these things ad hoc on some of the various reviews, I figured it was worth asking here. Do we have criteria for when it is appropriate to deprecate a service that is implemented and fully working? Is it costing us that much in terms of CI and resources to keep a few of these services around?

Dan


Alex Schultz

1 Mar

2:43 p.m.

On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:

...

Recently we've been cleaning house in some of the TripleO supported services.

We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.

For other services (Designate for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.

It's the long term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support that we had, when we drop any testing we might as well mark them deprecated since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial so unless it's actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known good services.

Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in the project and the packages weren't kept up to date so the puppet module CI was broken. No one noticed this was broken until we went to go update some unrelated things and found out that they were broken. The same thing happens in TripleO too where a breakage in a less than supported service takes away time for more important work. The cost to keep these things working is > 0.

[0] https://review.rdoproject.org/r/#/c/19006/
[1] https://review.rdoproject.org/r/#/c/18830/

...

Rather than debate these things ad hoc on some of the various reviews, I figured it was worth asking here. Do we have criteria for when it is appropriate to deprecate a service that is implemented and fully working? Is it costing us that much in terms of CI and resources to keep a few of these services around?

Do you have a definition of "fully implemented"? Some of the services that have been added were added but never actually tested. Designate only recently was covered with testing. Things like Congress have never been tested (like via tempest) and we've only done an install but no actual service verification. I would say Designate might be closer to fully implemented but Tacker/Congress would not be considered implemented.

Given that we've previously been asked to reduce our CI footprint, I think it's hard to ask whether it's really costing that much, because the answer would be yes if it has even the slightest impact. The fewer services we support, the fewer scenarios we have to have, the less complex our deployments are, and the fewer resources they consume.

Thanks,
-Alex


Tony Breeds

3 Mar

4:01 p.m.

On Fri, Mar 01, 2019 at 03:43:18PM -0700, Alex Schultz wrote:

...

On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:

...
Recently we've been cleaning house in some of the TripleO supported services.

We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.

For other services (Designate for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.

It's the long term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support that we had, when we drop any testing we might as well mark them deprecated since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial so unless it's actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known good services.

Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in the project and the packages weren't kept up to date so the puppet module CI was broken. No one noticed this was broken until we went to go update some unrelated things and found out that they were broken. The same thing happens in TripleO too where a breakage in a less than supported service takes away time for more important work. The cost to keep these things working is > 0.

[0] https://review.rdoproject.org/r/#/c/19006/ [1] https://review.rdoproject.org/r/#/c/18830/

I don't really want to distract from the general topic, on which I don't have any strong opinions ... BUT ...

For this case, is there a CI job that could have caught this failure? In general, on OpenStack infrastructure we have the periodic pipeline so that once a day we can check the general health of a project. This gives us a) an indication that things are broken and b) when (within 24 hours) they broke. Could we do something similar in RDO?

Granted, this doesn't mean people will pay attention and fix things, but it does help with backtracking.

Another thought: I know we have some automatic jobs that rebuild RDO packages when the requirements team bumps things in u-c (upper-constraints). Could we run something similar that looks at the requirements of a service and checks that they're reflected in the .spec?

Yours Tony.
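To make the last suggestion concrete, here is a minimal Python sketch of such a check. It is only an illustration under stated assumptions, not an existing RDO tool: the function names are hypothetical, the PyPI-to-RPM naming rule ("python3-" plus the normalized project name) is an assumed convention, and only the presence of a dependency is checked, not its version bounds.

```python
import re


def parse_requirements(text):
    """Return {project_name: version_specifier} from requirements.txt-style text."""
    reqs = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()   # drop trailing comments
        line = line.split(";", 1)[0].strip()   # drop environment markers
        if not line:
            continue
        match = re.match(r"([A-Za-z0-9._-]+)\s*(.*)", line)
        if match:
            reqs[match.group(1).lower()] = match.group(2).strip()
    return reqs


def parse_spec_requires(text):
    """Return the set of package names on 'Requires:' lines of a .spec file."""
    names = set()
    for line in text.splitlines():
        # re.match anchors at the line start, so 'BuildRequires:' is ignored.
        match = re.match(r"\s*Requires:\s*(\S+)", line)
        if match:
            names.add(match.group(1).lower())
    return names


def rpm_name(project_name):
    """Assumed naming rule (hypothetical): oslo.config -> python3-oslo-config."""
    return "python3-" + project_name.lower().replace("_", "-").replace(".", "-")


def missing_from_spec(requirements_text, spec_text):
    """List requirements that have no matching Requires: entry in the spec."""
    required = parse_requirements(requirements_text)
    packaged = parse_spec_requires(spec_text)
    return sorted(name for name in required if rpm_name(name) not in packaged)
```

Run once a day against each service's requirements file and spec, a non-empty result from missing_from_spec would be the signal that the package has drifted from the project's requirements.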

Emilien Macchi

4:18 p.m.

(as the one who has been dropping stuff lately)

To emphasize what Alex said: Yes, there is a cost in maintaining all these services, and we would like to spend our time on doing something other than maintaining the world, and refocus on what really matters.

I agree very much that it's a shame to deprecate Designate. As we know, it's an important service and it was freshly added to TripleO. However, if there is no team to maintain its integration then it's always going to be the same folks who will maintain its integration (packaging, puppet-tripleo, THT, containers, etc).

I believe that some of us are exhausted from supporting that amount of YAML when we know less than 50% is actually deployed & supported in production. Of course we don't have all the numbers but it's a guess from our bug reports and feature requests. I also believe these people who tirelessly maintain CI & packaging might want to reduce the time they spend debugging these unused (or less used) services and have cycles to actually simplify TripleO and think about the next steps.

What I propose is that we continue to collect feedback from our users and people who deploy TripleO. And we deal with the services case by case. For Designate, I agree it's not ideal to deprecate it upstream and maybe we can give it one more chance (one cycle), to see if we have potential users. For Congress, I haven't seen any user to be honest. Same for Tacker. Same for ODL. And it will be the same for Docker in the Train cycle.

Thanks for running this discussion Dan, I hope we can find some consensus.

-- Emilien Macchi

James Slagle

4 Mar

4:43 a.m.

On Sun, Mar 3, 2019 at 7:22 PM Emilien Macchi <emilien@redhat.com> wrote:

...

(as the one who has been dropping stuff lately)

To emphasize what Alex said: Yes, there is a cost in maintaining all these services, and we would like to spend our time on doing something other than maintaining the world, and refocus on what really matters.

I agree very much that it's a shame to deprecate Designate. As we know, it's an important service and it was freshly added to TripleO. However, if there is no team to maintain its integration then it's always going to be the same folks who will maintain its integration (packaging, puppet-tripleo, THT, containers, etc).

I believe that some of us are exhausted from supporting that amount of YAML when we know less than 50% is actually deployed & supported in production. Of course we don't have all the numbers but it's a guess from our bug reports and feature requests. I also believe these people who tirelessly maintain CI & packaging might want to reduce the time they spend debugging these unused (or less used) services and have cycles to actually simplify TripleO and think about the next steps.

What I propose is that we continue to collect feedback from our users and people who deploy TripleO. And we deal with the services case by case. For Designate, I agree it's not ideal to deprecate it upstream and maybe we can give it one more chance (one cycle), to see if we have potential users. For Congress, I haven't seen any user to be honest. Same for Tacker. Same for ODL. And it will be the same for Docker in the Train cycle.

I'd propose something slightly different. We start a spreadsheet/wiki/git doc/whatever with a list of all the services and specific maintainers. Each service must have at least one maintainer. For those services that don't have a maintainer, they are deprecated.

That's the only requirement...there must be at least one specific person who has volunteered to maintain the service, which includes all t-h-t, puppet, packaging, and CI (if applicable). If the service is not maintained then it's deprecated and will then be removed.

That way, there is a very straightforward path to avoid service deprecation...maintain it.

Also, folks who generally maintain CI and packaging and keep things running smoothly for the rest of us don't have to worry about individual services that may or may not be maintained. If a service is causing issues and it's documented as being maintained (but it's actually not), then anyone can propose its deprecation as a way to come to consensus on its actual state.

--
-- James Slagle
--
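The maintainer rule proposed above is simple enough to check mechanically. The following Python sketch assumes a plain-text registry with one "service: maintainer, maintainer" entry per line; that file format, the function names, and the maintainer names in the usage note are all hypothetical, since the thread leaves the storage format open ("spreadsheet/wiki/git doc/whatever").

```python
def parse_registry(text):
    """Parse 'service: maintainer-a, maintainer-b' lines into a dict of lists."""
    registry = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        service, _, people = line.partition(":")
        registry[service.strip()] = [p.strip() for p in people.split(",") if p.strip()]
    return registry


def unmaintained(registry):
    """Return the services with no listed maintainer (deprecation candidates)."""
    return sorted(service for service, people in registry.items() if not people)
```

For example, a registry containing "designate: maintainer-a" and an empty "congress:" entry would yield ["congress"] from unmaintained(), which is the list that would be proposed for deprecation under this rule.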

Emilien Macchi

5:59 a.m.

On Mon, Mar 4, 2019 at 7:43 AM James Slagle <james.slagle@gmail.com> wrote:

[...]

I'd propose something slightly different. We start a spreadsheet/wiki/git doc/whatever with a list of all the services and specific maintainers. Each service must have at least one maintainer. For those services that don't have a maintainer, they are deprecated.

I like the idea of ownership.

...

That's the only requirement...there must be at least one specific person who has volunteered to maintain the service, which includes all t-h-t, puppet, packaging, and CI (if applicable). If the service is not maintained then it's deprecated and will then be removed.

That way, there is a very straightforward path to avoid service deprecation...maintain it.

Also, folks who generally maintain CI and packaging and keep things running smoothly for the rest of us don't have to worry about individual services that may or may not be maintained. If a service is causing issues and it's documented as being maintained (but it's actually not), then anyone can propose its deprecation as a way to come to consensus on its actual state.

Thanks, I think this is a sane proposal.

-- Emilien Macchi

Dan Prince

5:11 a.m.

On Fri, 2019-03-01 at 15:43 -0700, Alex Schultz wrote:

...

On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:

...
Recently we've been cleaning house in some of the TripleO supported services.

We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.

For other services (Designate for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.

It's the long term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support that we had, when we drop any testing we might as well mark them deprecated since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial so unless it's actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known good services.

Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in the project and the packages weren't kept up to date so the puppet module CI was broken. No one noticed this was broken until we went to go update some unrelated things and found out that they were broken. The same thing happens in TripleO too where a breakage in a less than supported service takes away time for more important work. The cost to keep these things working is > 0.

Agree the cost isn't zero. But it also isn't high. And there is value to a project having a deep bench of services from which to choose and try out. The existence of at least some "niche" services in TripleO provides some value to our users and perhaps even an argument to use TripleO as it would be considered a feature to be able to try out these services. Perhaps even partially implemented ones in some cases still have value (no HA support for example).

I just spent the time to "flatten" many of these services thinking they would stay for a while. Many of us are willing to chip in to keep some of these I think.

...

[0] https://review.rdoproject.org/r/#/c/19006/ [1] https://review.rdoproject.org/r/#/c/18830/

...
Rather than debate these things ad hoc on some of the various reviews, I figured it was worth asking here. Do we have criteria for when it is appropriate to deprecate a service that is implemented and fully working? Is it costing us that much in terms of CI and resources to keep a few of these services around?

Do you have a definition of "fully implemented"? Some of the services that have been added were added but never actually tested. Designate only recently was covered with testing. Things like Congress have never been tested (like via tempest) and we've only done an install but no actual service verification. I would say Designate might be closer to fully implemented but Tacker/Congress would not be considered implemented.

Given that we've previously been asked to reduce our CI footprint, I think it's hard to ask whether it's really costing that much, because the answer would be yes if it has even the slightest impact. The fewer services we support, the fewer scenarios we have to have, the less complex our deployments are, and the fewer resources they consume.

For the services we agree to keep we could always run them in a lower bandwidth CI framework. Something like periodic jobs. Understood these would occasionally get broken but the upstream feedback loop would at least exist and the services could stay. And we'd still be able to reduce our CI resources as well.


Emilien Macchi

6:02 a.m.

On Mon, Mar 4, 2019 at 8:17 AM Dan Prince <dprince@redhat.com> wrote:

...

Agree the cost isn't zero. But it also isn't high. And there is value to a project having a deep bench of services from which to choose and try out. The existence of at least some "niche" services in TripleO provides some value to our users and perhaps even an argument to use TripleO as it would be considered a feature to be able to try out these services. Perhaps even partially implemented ones in some cases still have value (no HA support for example).

Maybe some middle ground here would be to move these services into tht/deployment/unsupported (if our community decided to NOT deprecate them). What do you think?

Again my goal isn't to block innovation but to let our team refocus on other subjects that matter.

-- Emilien Macchi

Alex Schultz

7:16 a.m.

On Mon, Mar 4, 2019 at 6:11 AM Dan Prince <dprince@redhat.com> wrote:

...

On Fri, 2019-03-01 at 15:43 -0700, Alex Schultz wrote:

...
On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:

...
Recently we've been cleaning house in some of the TripleO supported services.

We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.

For other services (Designate for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.

It's the long term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support that we had, when we drop any testing we might as well mark them deprecated since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial so unless it's actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known good services.

Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in the project and the packages weren't kept up to date so the puppet module CI was broken. No one noticed this was broken until we went to go update some unrelated things and found out that they were broken. The same thing happens in TripleO too where a breakage in a less than supported service takes away time for more important work. The cost to keep these things working is > 0.

Agree the cost isn't zero. But it also isn't high. And there is value to a project having a deep bench of services from which to choose and try out. The existence of at least some "niche" services in TripleO provides some value to our users and perhaps even an argument to use TripleO as it would be considered a feature to be able to try out these services. Perhaps even partially implemented ones in some cases still have value (no HA support for example).

So I gave it some thought and rather than just deprecating for removal, could we instead mark them as experimental and treat them as such? Yes, you're right that folks might want to try these services, however there is no clear definition of a service that should always work vs a service that might work. From an end user perspective if they see that something like Congress is defined and they try and consume it only to find out it doesn't work or isn't configured correctly then that is a poor experience. I also don't think someone who is new to TripleO who wants to try out a service will likely be able to figure out why it's not working and just think "TripleO doesn't work". Can we move services which we have no guarantee to be working (no testing/no owners) to a /experimental/ folder to indicate the service may or may not work?


Ben Nemec

9:51 a.m.

On 3/4/19 9:16 AM, Alex Schultz wrote:

...

On Mon, Mar 4, 2019 at 6:11 AM Dan Prince <dprince@redhat.com> wrote:

...
On Fri, 2019-03-01 at 15:43 -0700, Alex Schultz wrote:

...
On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:

...
Recently we've been cleaning house in some of the TripleO supported services.

We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.

For other services (Designate for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.

It's the long term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support that we had, when we drop any testing we might as well mark them deprecated since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial so unless it's actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known good services.

Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in the project and the packages weren't kept up to date so the puppet module CI was broken. No one noticed this was broken until we went to go update some unrelated things and found out that they were broken. The same thing happens in TripleO too where a breakage in a less than supported service takes away time for more important work. The cost to keep these things working is > 0.

Agree the cost isn't zero. But it also isn't high. And there is value to a project having a deep bench of services from which to choose and try out. The existence of at least some "niche" services in TripleO provides some value to our users and perhaps even an argument to use TripleO as it would be considered a feature to be able to try out these services. Perhaps even partially implemented ones in some cases still have value (no HA support for example).

So I gave it some thought and rather than just deprecating for removal, could we instead mark them as experimental and treat them as such? Yes, you're right that folks might want to try these services, however there is no clear definition of a service that should always work vs a service that might work. From an end user perspective if they see that something like Congress is defined and they try and consume it only to find out it doesn't work or isn't configured correctly then that is a poor experience. I also don't think someone who is new to TripleO who wants to try out a service will likely be able to figure out why it's not working and just think "TripleO doesn't work". Can we move services which we have no guarantee to be working (no testing/no owners) to a /experimental/ folder to indicate the service may or may not work?

As someone who wrote the templates for a now-deprecated service I like the idea of them living on in some format. On the other hand, in the course of writing the Designate templates they were broken multiple times by TripleO changes to the service interfaces. If a service isn't being tested regularly I suspect there's little chance of it continuing to work long-term without _someone_ looking after it. Heck, Designate _is_ in the gate right now and it still broke recently in real deployments with separate control and compute nodes. Without someone paying attention to it I don't know how that would ever have been found or fixed.

I think my recommendation would be to keep James's maintainer requirement for even experimental services, but maybe instead of gating on them just have a periodic job that runs with them enabled once a night and emails the maintainer of record if it fails. That way they can't block other work and aren't consuming much in the way of CI resources, but they can be maintained with minimal effort. It might encourage more people to sign up as maintainers if they know breakages in the service aren't going to force them to drop everything to unblock the gate.

Or maybe that will just result in all the periodic jobs failing indefinitely, but if that happens then you know the maintainer isn't maintaining anymore and you can deprecate the service.

I'm also not sure how much burden that would put on the CI squad to set up such jobs. That's another discussion we'd need to have.
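The nightly-job idea above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the sender address, the data shapes (a dict of pass/fail results per service, a dict of maintainer addresses), and the 14-run threshold are all assumptions, and the sketch only builds the messages rather than sending them.

```python
from email.message import EmailMessage


def failure_notices(results, maintainers, sender="ci-noreply@example.org"):
    """Build one email per failed service that has a maintainer of record.

    results: {service: True if the nightly job passed, else False}
    maintainers: {service: [email addresses]}
    """
    notices = []
    for service, passed in sorted(results.items()):
        owners = maintainers.get(service, [])
        if passed or not owners:
            continue
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = ", ".join(owners)
        msg["Subject"] = f"Periodic job failed for {service}"
        msg.set_content(f"The nightly periodic job with {service} enabled failed.\n")
        notices.append(msg)
    return notices


def stale_services(history, threshold=14):
    """Return services whose last `threshold` nightly runs all failed.

    history: {service: [bool, ...]} ordered oldest to newest.
    """
    stale = []
    for service, runs in sorted(history.items()):
        recent = runs[-threshold:]
        if len(recent) == threshold and not any(recent):
            stale.append(service)
    return stale
```

Under these assumptions, failure_notices covers the "email the maintainer of record" step, while stale_services captures the fallback: a service that keeps failing with no fix becomes a candidate for deprecation.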


Mark Goddard

5 Mar

12:01 a.m.

On Mon, 4 Mar 2019 at 17:52, Ben Nemec <openstack@nemebean.com> wrote:

...

On 3/4/19 9:16 AM, Alex Schultz wrote:

...
On Mon, Mar 4, 2019 at 6:11 AM Dan Prince <dprince@redhat.com> wrote:

...
On Fri, 2019-03-01 at 15:43 -0700, Alex Schultz wrote:

...
On Fri, Mar 1, 2019 at 3:24 PM Dan Prince <dprince@redhat.com> wrote:

...
Recently we've been cleaning house in some of the TripleO supported services.

We removed MongoDB as RDO was also dropping it. I guess we needed to follow suit as our CI is also based on the packages there.

For other services (Designate for example), if the RDO packages exist and we already have support, do we really need to deprecate them? Having the ability to deploy some of the lesser-used but still active OpenStack projects with our deployment framework is nice for developers and users alike, especially when you want to try out a new service.

It's the long term maintenance of them to ensure they continue to work (packaging/promotions/requirement syncing). If no one is watching them and making sure they still work, I'm not sure it's worth saying they are "supported". Much like the baremetal support that we had, when we drop any testing we might as well mark them deprecated since there is no way to know if they still "work" the next day. Adding and maintaining services is non-trivial so unless it's actively used, I don't think it's necessarily a bad thing to trim our "supported" list to a set of known good services.

Just in the last two or three weeks I've had to go address packaging problems with Vitrage[0] and Tacker[1] because requirements changed in the project and the packages weren't kept up to date so the puppet module CI was broken. No one noticed this was broken until we went to go update some unrelated things and found out that they were broken. The same thing happens in TripleO too where a breakage in a less than supported service takes away time for more important work. The cost to keep these things working is > 0.

Agree the cost isn't zero. But it also isn't high. And there is value to a project having a deep bench of services from which to choose and try out. The existence of at least some "niche" services in TripleO provides some value to our users and perhaps even an argument to use TripleO as it would be considered a feature to be able to try out these services. Perhaps even partially implemented ones in some cases still have value (no HA support for example).

So I gave it some thought and rather than just deprecating for removal, could we instead mark them as experimental and treat them as such? Yes you're right that folks might want to try these services, however there is no clear definition of a service that should always work vs a service that might work. From an end user perspective if they see that something like Congress is defined and they try and consume it only to find out it doesn't work or isn't configured correctly then that is a poor experience. I also don't think someone who is new to TripleO who wants to try out a service will likely be able to figure out why it's not working and just think "TripleO doesn't work". Can we move services which we have no guarantee to be working (no testing/no owners) to a /experimental/ folder to indicate the service may or may not work?

As someone who wrote the templates for a now-deprecated service I like the idea of them living on in some format. On the other hand, in the course of writing the Designate templates they were broken multiple times by TripleO changes to the service interfaces. If a service isn't being tested regularly I suspect there's little chance of it continuing to work long-term without _someone_ looking after it.

Heck, Designate _is_ in the gate right now and it still broke recently in real deployments with separate control and compute nodes. Without someone paying attention to it I don't know how that would ever have been found or fixed.

I think my recommendation would be to keep James's maintainer requirement for even experimental services, but maybe instead of gating on them just have a periodic job that runs with them enabled once a night and emails the maintainer of record if it fails. That way they can't block other work and aren't consuming much in the way of ci resources, but they can be maintained with minimal effort. It might encourage more people to sign up as maintainers if they know breakages in the service aren't going to force them to drop everything to unblock the gate.

In the kolla project we run some of the service-specific jobs only when relevant files have changed, using Zuul's files/irrelevant-files configuration syntax. This can be combined with a periodic job to catch code rot.

Mark
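The file-matching rule mentioned above can be modelled roughly as follows. This is a sketch in Python rather than Zuul's own YAML job syntax, and it covers only the decision rule as commonly described: a job with `files` runs when at least one changed file matches a pattern, and a job with `irrelevant-files` is skipped when every changed file matches. The function name `job_should_run`, the patterns and the file paths are all made up for illustration, and Zuul's real matcher may differ in detail.

```python
import re
from typing import Iterable, Optional, Sequence


def job_should_run(
    changed_files: Iterable[str],
    files: Optional[Sequence[str]] = None,
    irrelevant_files: Optional[Sequence[str]] = None,
) -> bool:
    """Simplified model of a files/irrelevant-files job filter (not Zuul code)."""
    changed = list(changed_files)

    if files:
        # "files": require at least one changed file to match a pattern.
        patterns = [re.compile(p) for p in files]
        if not any(p.search(f) for f in changed for p in patterns):
            return False

    if irrelevant_files:
        # "irrelevant-files": skip only if *every* changed file matches.
        patterns = [re.compile(p) for p in irrelevant_files]
        if changed and all(any(p.search(f) for p in patterns) for f in changed):
            return False

    return True


# Hypothetical patterns for a service-specific job.
designate_files = [r"^deployment/designate/"]
docs_only = [r"\.rst$", r"^doc/"]

print(job_should_run(["deployment/designate/api.yaml"], files=designate_files))
print(job_should_run(["README.rst"], irrelevant_files=docs_only))
```

A change that touches nothing under the service's files never triggers the service job, which is why a periodic run is still needed to catch breakage caused by unrelated changes.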

...

Or maybe that will just result in all the periodic jobs failing indefinitely, but if that happens then you know the maintainer isn't maintaining anymore and you can deprecate the service.

I'm also not sure how much burden that would put on the ci squad to set up such jobs. That's another discussion we'd need to have.
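To make the nightly-periodic-job proposal above concrete, here is a small Python sketch of the bookkeeping it implies: notify the maintainer of record when the latest run failed, and flag a service as a deprecation candidate once it has failed for a long unbroken streak. Everything here is an assumption for illustration: the `ServiceStatus` type, the `review_periodic_results` function, the 14-run threshold and the data layout are invented, and nothing in the thread specifies them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ServiceStatus:
    service: str
    maintainer: str  # empty string when no maintainer is on record
    notify: bool  # latest periodic run failed
    deprecation_candidate: bool  # failing streak reached the threshold


def review_periodic_results(
    history: Dict[str, List[bool]],
    maintainers: Dict[str, str],
    max_consecutive_failures: int = 14,  # hypothetical policy value
) -> List[ServiceStatus]:
    """history maps a service to its run results, oldest first (True = pass)."""
    statuses = []
    for service, runs in history.items():
        # Count failures backwards from the most recent run.
        streak = 0
        for passed in reversed(runs):
            if passed:
                break
            streak += 1
        statuses.append(
            ServiceStatus(
                service=service,
                maintainer=maintainers.get(service, ""),
                notify=streak > 0,
                deprecation_candidate=streak >= max_consecutive_failures,
            )
        )
    return statuses
```

Because the check never gates a change, a failure here only produces a notification and, eventually, a deprecation flag, which matches the intent that these services cannot block other work.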

...
...
I just spent the time to "flatten" many of these services thinking they would stay for a while. Many of us are willing to chip in to keep some of these I think.

...
[0] https://review.rdoproject.org/r/#/c/19006/
[1] https://review.rdoproject.org/r/#/c/18830/

...
Rather than debate these things ad-hoc on some of the various reviews I figured it was worth asking here. Do we have criteria for when it is appropriate to deprecate a service that is implemented and fully working? Is it costing us that much in terms of CI and resources to keep a few of these services around?

Do you have a definition of "fully implemented"? Some of the services that have been added were added but never actually tested. Designate only recently was covered with testing. Things like Congress have never been tested (like via tempest) and we've only done an install but no actual service verification. I would say Designate might be closer to fully implemented but Tacker/Congress would not be considered implemented.

Given that we've previously been asked to reduce our CI footprint, I think it's hard to ask whether it's really costing that much, because the answer would be yes if it has even the slightest impact. The fewer services we support, the fewer scenarios we need, the less complex our deployments are, and the fewer resources they consume.

For the services we agree to keep we could always run them in a lower bandwidth CI framework. Something like periodic jobs. Understood these would occasionally get broken but the upstream feedback loop would at least exist and the services could stay. And we'd still be able to reduce our CI resources as well.

...
Thanks, -Alex

...
Dan

participants (7)

Alex Schultz
Ben Nemec
Dan Prince
Emilien Macchi
James Slagle
Mark Goddard
Tony Breeds