[tc][telemetry][gnocchi] The future of Gnocchi in OpenStack
Zane Bitter
zbitter at redhat.com
Fri Aug 28 14:48:53 UTC 2020
On 28/08/20 8:36 am, Adrian Turjak wrote:
> Hey OpenStackers,
>
> We're currently in the process of discussing what to do with OpenStack's
> reliance on Gnocchi, and at present it is looking like we are most
> likely to just fork it back under a new name (currently Farfalle to
> stick with the pasta theme).
>
> The discussion is mostly happening here:
> https://review.opendev.org/#/c/744592/
>
> But for those running Gnocchi in prod, this is likely something you may
> want to know about and we'd like to hear from you.
>
> A bit of history: Gnocchi started off as a new backend for Ceilometer in
> OpenStack, and eventually became the de facto API for telemetry samples
> when that was removed from Ceilometer (as backed by MongoDB). Gnocchi
> was eventually spun off outside of OpenStack, but still essentially
> remained our API for telemetry despite not being an official part of
> OpenStack anymore.
I think a large part of the issue here is that there are multiple
reasons for wanting (small-t) telemetry from OpenStack, and historically
because of reasons they have all been conflated into one Thing with the
result that sometimes one use case wins. At least 3 that I can think of are:
1) Monitoring the OpenStack infrastructure by the operator, including
feeding into business processes like reporting, capacity planning &c.
2) Billing
3) Monitoring user resources by the user/application, either directly or
via other OpenStack services like Heat or Senlin.
For the first, you just want to be able to dump data into a TSDB of the
operator's choice. Since all of the reporting requirements are
business-specific anyway, it's up to the operator to decide how they
want to store the data and how they want to interact with it. It appears
that this may have been the theory behind the Gnocchi split.
On the other hand, for the third one you really need an official
OpenStack API with all of the attendant stability guarantees, because it
is part of OpenStack's user interface.
The second lands somewhere in between; AIUI CloudKitty is written to
support multiple back-ends, with OpenStack Telemetry being the primary
one. So it needs a fairly stable API because it's consumed by other
OpenStack projects, but it's ultimately operator-facing.
As I have argued before, when we are thinking about road maps we need to
think of these as different use cases, and they're different enough that
they are probably best served by at least two separate tools.
Mohammed has made a compelling argument in the past that Prometheus is
more or less the industry standard for the first use case, and we should
just export metrics to that directly in the OpenStack services, rather
than going through the Ceilometer collector.
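To make the "export directly from the service" idea concrete, here is a
rough sketch of what a service-side exporter could look like. It is not
taken from any OpenStack project: the Gauge class, the render() and
make_handler() helpers and the metric name are all made up for
illustration, and it uses only the Python standard library to emit the
Prometheus text exposition format. A real service would more likely use
an existing Prometheus client library.

```python
from dataclasses import dataclass, field
from http.server import BaseHTTPRequestHandler


@dataclass
class Gauge:
    """Hypothetical in-process gauge; not an OpenStack or Prometheus API."""
    name: str
    help_text: str
    # Maps a sorted tuple of (label, value) pairs to the latest sample.
    samples: dict = field(default_factory=dict)

    def set(self, value, **labels):
        self.samples[tuple(sorted(labels.items()))] = float(value)


def _escape(value):
    # Label values escape backslash, double quote and newline.
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n"))


def render(gauges):
    """Render gauges in the Prometheus text exposition format."""
    lines = []
    for gauge in gauges:
        lines.append(f"# HELP {gauge.name} {gauge.help_text}")
        lines.append(f"# TYPE {gauge.name} gauge")
        for labels, value in gauge.samples.items():
            label_str = ",".join(
                f'{key}="{_escape(str(val))}"' for key, val in labels
            )
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{gauge.name}{suffix} {value}")
    return "\n".join(lines) + "\n"


def make_handler(gauges):
    """Build an HTTP handler that serves the gauges at /metrics."""
    class MetricsHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            if self.path != "/metrics":
                self.send_error(404)
                return
            body = render(gauges).encode("utf-8")
            self.send_response(200)
            self.send_header(
                "Content-Type", "text/plain; version=0.0.4; charset=utf-8"
            )
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    return MetricsHandler
```

The service would update its gauges as part of its normal work and hand
make_handler(...) to an http.server.HTTPServer; Prometheus then scrapes
/metrics on its own schedule, with no collector in between.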
I don't know what should be done about the third, but I do know that
currently Telemetry is breaking Heat's gate and people are seriously
discussing disabling the Telemetry-related tests, which I assume would
mean deprecating the resources. Monasca offers an alternative, but some
distributors and operators prefer to avoid it because it brings the
whole Java ecosystem along for the ride (managing the Python one is
already hard enough).
cheers,
Zane.