[Telemetry][Requirements][TC] When did we stop caring about backwards compatibility
Hi all,

I'd like to bring the conversation to a wider audience as the current state is really not sustainable. Since I've got involved with Telemetry project early this year, I honestly think our gates have been blocked more than not and most of that has been due to requirements bumps where the dependencies break backwards compatibility and that is not obvious, like major version bump (pyparsing being the current example [0] [1]), or are just not tested before bumped (like the recent Sphinx/WSME episode [2]).

While I do recognize Dr. Harbott's concerns around gnocchi [1] I'd like to point out that a) it's still currently our only supported time series database for metrics and b) providing alternative solution has not been an exactly smooth road either [3]. Unfortunately gnocchi is still very tightly coupled in our Telemetry testing (something I'm working on) and not having an operational gate makes it very difficult to even work on those tests across the Aodh and Telemetry-tempest-plugin repos.

We have been capping and blacklisting broken dependencies for years, so I'd like to hear what has changed recently so that we do not care anymore but prefer "the latest" over the "latest working". In my understanding the upper constraints was supposed to be "the latest dependencies you can stand up a functional and working cloud with" not "the latest dependencies existing regardless of what they break".

Back to Dr. Harbott's remarks of gnocchi being an acceptable requirement [1], without a working gate we cannot even deprecate it. And yet we would need an appropriate deprecation period before removing it. Should we be forced to rip out our only TS database (gnocchi) to be able to focus on developing support for the alternative (Prometheus) I would really like to see TC resolution for this so we can point the impacted users to it, avoiding all the blame falling to the Telemetry project team.

I would also like to see the Telemetry job being in the Requirements gate so we get warnings when something is about to break, rather than a jammed gate because of an upper constraints bump.

Thanks,
Erno "jokke" Kuvaja

[0] https://github.com/pyparsing/pyparsing/issues/504
[1] https://review.opendev.org/c/openstack/requirements/+/898699
[2] https://review.opendev.org/c/openstack/requirements/+/896018
[3] https://review.opendev.org/c/openstack/requirements/+/891291
Hello,

I just want to comment on the pyparsing pinning and say that the pin has been removed in the Gnocchi master branch. Let me know if you need it removed on any stable branches as well; it was only pinned to make verification of error messages in testing pass.

Best regards
Tobias
On Fri, 3 Nov 2023 at 07:11, Erno Kuvaja <ekuvaja@redhat.com> wrote:
> Hi all,
>
> I'd like to bring the conversation to a wider audience as the current state is really not sustainable. Since I've got involved with Telemetry project early this year, I honestly think our gates have been blocked more than not and most of that has been due to requirements bumps where the dependencies break backwards compatibility and that is not obvious, like major version bump (pyparsing being the current example [0] [1]), or are just not tested before bumped (like the recent Sphinx/WSME episode [2]).
The "Sphinx/WSME episode" was a human error in risk and impact assessment. As an individual that contributed to that, I apologise and have made workflow changes to try to avoid it in the future. I will not speak to the long term stability of the telemetry gate.
> While I do recognize Dr. Harbott's concerns around gnocchi [1] I'd like to point out that a) it's still currently our only supported time series database for metrics and b) providing alternative solution has not been an exactly smooth road either [3]. Unfortunately gnocchi is still very tightly coupled in our Telemetry testing (something I'm working on) and not having an operational gate makes it very difficult to even work on those tests across the Aodh and Telemetry-tempest-plugin repos.
>
> We have been capping and blacklisting broken dependencies for years, so I'd like to hear what has changed recently so that we do not care anymore but prefer "the latest" over the "latest working". In my understanding the upper constraints was supposed to be "the latest dependencies you can stand up a functional and working cloud with" not "the latest dependencies existing regardless of what they break".
First, some terminology that I'll use.

*direct* dependency: A library directly used by OpenStack. For example, anything in global-requirements.txt

*indirect* dependency: A library pulled in by a direct dependency. For example, anything in upper-constraints.txt and *NOT* in global-requirements.txt

OpenStack project: Hopefully that's obvious :) Code in the openstack org on opendev.org

In this case pyparsing is a direct dependency, as is gnocchiclient. gnocchi itself is ... something else, not because of history but because it isn't listed in either place and instead, as far as I can tell, is pulled in directly from github via required_projects [1], and then installed from git. I'm actually going to ignore that and treat gnocchi as an indirect dependency.

We have been pinning *direct* dependencies of OpenStack when an OpenStack project itself was broken. For example, ceilometer's gate was failing because of this pyparsing change.

I tried to find an example similar to this, where there is a *direct* dependency that is incompatible with the requirements of an *indirect* dependency. I couldn't find one. That doesn't mean it hasn't happened, just pointing out that this is a somewhat atypical scenario.
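As a rough illustration of the terminology above, here is a minimal Python sketch that sorts package names into *direct* and *indirect* by comparing the two files. The file contents, version numbers and the name `some-transitive-lib` are made up for the example, and the name parsing is deliberately naive (it does not normalise `-` versus `_`, for instance), so treat it as a sketch rather than how the requirements tooling actually does it.

```python
import re

# Naive match for the package name at the start of a requirements line.
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def package_names(requirements_text):
    """Return the lower-cased package names found in requirements-style text."""
    names = set()
    for raw_line in requirements_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        match = NAME_RE.match(line)
        if match:
            names.add(match.group(0).lower())
    return names


def classify(global_requirements_text, upper_constraints_text):
    """Split names into direct (listed in global-requirements) and
    indirect (listed in upper-constraints but not in global-requirements)."""
    direct = package_names(global_requirements_text)
    constrained = package_names(upper_constraints_text)
    return {"direct": sorted(direct), "indirect": sorted(constrained - direct)}


# Hypothetical file contents, for illustration only.
GLOBAL_REQUIREMENTS = """\
pyparsing>=2.1.0  # hypothetical entry
gnocchiclient>=3.3.1  # hypothetical entry
"""
UPPER_CONSTRAINTS = """\
pyparsing===3.1.1
gnocchiclient===7.0.8
some-transitive-lib===1.2.3
"""

result = classify(GLOBAL_REQUIREMENTS, UPPER_CONSTRAINTS)
print(result)
```

With this sample data gnocchi shows up in neither list, which matches the "something else" situation described above.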
> Back to Dr. Harbott's remarks of gnocchi being an acceptable requirement [1], without a working gate we cannot even deprecate it. And yet we would need an appropriate deprecation period before removing it. Should we be forced to rip out our only TS database (gnocchi) to be able to focus on developing support for the alternative (Prometheus) I would really like to see TC resolution for this so we can point the impacted users to it, avoiding all the blame falling to the Telemetry project team.
I can't see anywhere that the acceptability of gnocchi was questioned. There is this [2], where the method of installation of gnocchi was questioned. For example, modify https://opendev.org/openstack/ceilometer/src/branch/master/devstack/plugin.s... to install gnocchi in a venv, and then start services from there. It's possible that there could be some incompatibilities, but if all the communications to/from gnocchi are via the REST API and not RPC they're very, very small.

The change only modified upper-constraints and ignored global-requirements, which creates additional work for the requirements team and, I believe, is what is meant by "Note also that a pin as proposed here causes a lot of maintenance effort as it has to be manually kept in place for every weekly requirements update." [3]
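To illustrate why a pin that lives only in upper-constraints needs manual upkeep, here is a small Python sketch. It assumes, as a simplification, that the weekly update just picks the newest release of each package that the caps in global-requirements allow. The function names, the release list and the cap value are hypothetical, and the real tooling is more involved than this.

```python
def version_key(version):
    """Turn '3.1.1' into (3, 1, 1) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def allowed(version, cap):
    """True if there is no cap, or the version is below the exclusive cap."""
    return cap is None or version_key(version) < version_key(cap)


def weekly_update(available, caps):
    """Pick the newest allowed release of each package.

    Existing pins in upper-constraints are deliberately ignored here:
    only the caps (standing in for global-requirements) are consulted.
    """
    return {
        name: max(
            (v for v in versions if allowed(v, caps.get(name))),
            key=version_key,
        )
        for name, versions in available.items()
    }


# Hypothetical release list, for illustration only.
available = {"pyparsing": ["3.0.9", "3.1.0", "3.1.1"]}

# Pin only in upper-constraints: the next regeneration moves past it.
uncapped = weekly_update(available, caps={})

# Cap also recorded in global-requirements: the regeneration respects it.
capped = weekly_update(available, caps={"pyparsing": "3.1.0"})

print(uncapped)
print(capped)
```

In the first call nothing tells the update about the pin, so someone has to put it back by hand each week; in the second the cap travels with the input the update already reads.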
> I would also like to see the Telemetry job being in the Requirements gate so we get warnings when something is about to break, rather than a jammed gate because of an upper constraints bump.
We can do that, assuming there is a reliable representative job. There is an example here: https://review.opendev.org/c/openstack/requirements/+/831334

I think a reasonable summary of this is:

* One team [requirements] has made mistakes which have impacted another team [telemetry], raising developer frustration.
  NOTE: This is NOT explicit anywhere in the extended conversation and is my personal take-away.
* Because this resulted in a non-functional gate for the telemetry team, there was a mismatched sense of urgency, also raising developer frustration.
* There are some, albeit small, technical issues with the requested change, which were not communicated well.
* The *indirect* dependency (gnocchi) has an ... elaborate ... history with OpenStack, which possibly contributed.

The action item I see from this is for the requirements team to form and document a statement about how compatibility between direct and indirect dependencies is handled. This document should be shared for feedback.

This is a regrettable series of events, and an excellent reminder to be excellent towards each other.

Tony.

[1] https://opendev.org/openstack/telemetry-tempest-plugin/src/branch/master/.zu...
[2] https://meetings.opendev.org/irclogs/%23openstack-requirements/%23openstack-...
[3] https://review.opendev.org/c/openstack/requirements/+/898699/1#message-cc8f3...
On 2023-11-06 16:00:43 -0600 (-0600), Tony Breeds wrote:
> [...]
> gnocchi itself is ... something else not because of history but because it isn't listed in either place and instead, as far as I can tell, is pulled in directly from github via required_projects [1], and then installed from git.
> [...]
We typically reserve that installation method for additional jobs which consume specific dependencies from source in order to be able to depends-on pull requests and the like. Maybe one takeaway here is that Gnocchi should be included in the global requirements/constraints lists, installed from released packages like any normal direct Python language dependency, and then any jobs which absolutely need to install from source in order to test unreleased Gnocchi can override that.

--
Jeremy Stanley
On Mon, 6 Nov 2023 at 17:38, Jeremy Stanley <fungi@yuggoth.org> wrote:
> We typically reserve that installation method for additional jobs which consume specific dependencies from source in order to be able to depends-on pull requests and the like. Maybe one takeaway here is that Gnocchi should be included in the global requirements/constraints lists, installed from released packages like any normal direct Python language dependency, and then any jobs which absolutely need to install from source in order to test unreleased Gnocchi can override that.
Indeed. I think that's another concrete action item.

Yours
Tony.
participants (4)
- Erno Kuvaja
- Jeremy Stanley
- Tobias Urdin
- Tony Breeds