But for those running Gnocchi in prod, this is likely something you may
want to know about and we'd like to hear from you.
Hello, everyone!
Here at Selectel we use Gnocchi as a backend for Ceilometer: we gather
various metrics from virtual machines and show them to our customers as graphs
in the control panel. In this scenario we rely on Gnocchi's Keystone auth
support and the nearly standard mappings for instances, volumes, ports, etc.
provided out of the box.
We also use Gnocchi as a secondary target for our home-grown billing system.
Billing measures are gathered from different OpenStack and custom APIs,
go through the charging engine and are then POSTed to the Gnocchi API in
batches. Here again we need to be able to fetch measures with project-scoped
and domain-scoped tokens on the customer side in the control panel, so that we
can separate scopes for resellers (domain owners) and their clients (project
owners).
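
To illustrate the batching step, here is a minimal Python sketch of how
charged measures could be grouped into a request body for Gnocchi's batch
endpoint (POST /v1/batch/metrics/measures). This is not our actual billing
code: the record layout, the function name and the metric IDs are made up for
the example, and the HTTP call with a Keystone token is left out.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone


def build_batch_payload(records):
    """Group (metric_id, timestamp, value) records by metric ID.

    The result maps each metric ID to a list of
    {"timestamp": ..., "value": ...} objects, which is the body shape
    expected by POST /v1/batch/metrics/measures.
    """
    payload = defaultdict(list)
    for metric_id, ts, value in records:
        payload[metric_id].append(
            {"timestamp": ts.isoformat(), "value": float(value)}
        )
    return dict(payload)


# Hypothetical metric IDs and values, for illustration only.
records = [
    ("metric-a", datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc), 1.5),
    ("metric-a", datetime(2020, 1, 1, 0, 5, tzinfo=timezone.utc), 2),
    ("metric-b", datetime(2020, 1, 1, 0, 0, tzinfo=timezone.utc), 10),
]

payload = build_batch_payload(records)
# This JSON string would be sent with Content-Type: application/json
# and an X-Auth-Token header carrying the Keystone token.
body = json.dumps(payload)
```

Grouping by metric ID first means one request can carry measures for many
metrics at once, which is why batching suits a periodic billing run.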
The third way we consume the Gnocchi API is through OpenStack Watcher, in its
strategy for balancing load in our regions. Here we use host metrics as well as
virtual machine metrics.
What we like in Gnocchi:
- the API is clean and easy to use, and the object model is universal, which
lets us use it in different scenarios;
- it is fast enough for our use cases;
- it can store metrics for a long period of time on a Ceph backend with no
performance penalty, which is useful for the billing case.
What we do not like:
- server-side aggregations do not work the way one would expect; the API
and CLI for them are very hard to use, and we stopped trying to use them;
- it is very CPU and disk IO intensive: the platforms are hot like hell 24/7
while processing no more than 1k metrics per second;
- deadlocks sometimes happen in the Redis incoming measures storage, preventing
measures for certain metrics from being processed.
Our plans for the near future:
- try to switch Watcher to the Grafana backend so that it can use the same
Prometheus metrics we rely on for alerting and capacity planning;
- continue using Gnocchi only for VM metrics, switching the billing system to
something more reliable in terms of missing points on graphs.
Speaking of VM metrics, it would probably be great to be able to keep using
the Gnocchi API for customer-facing features, as it works well with the
OpenStack object model, authentication and everything. But Gnocchi's TSDB is
not the best on the market. If it were switched to VictoriaMetrics, which
provides a Prometheus-compatible API and works amazingly well with Grafana, we
would be able to gather metrics with node/libvirt exporters, have Prometheus
remote-write them to VictoriaMetrics, and consume them via
Grafana/Alertmanager or the Gnocchi API, depending on the scenario.
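
As a rough illustration of the consuming side, the Python sketch below builds
a Prometheus-style range query URL that a VictoriaMetrics instance could serve
through /api/v1/query_range. The host name, the metric name, the label and the
function name are assumptions made up for this example and are not taken from
our setup.

```python
from urllib.parse import urlencode


def query_range_url(base_url, promql, start, end, step):
    """Build a Prometheus HTTP API range-query URL.

    start and end are Unix timestamps in seconds, step is the
    resolution in seconds.
    """
    params = urlencode(
        {"query": promql, "start": start, "end": end, "step": step}
    )
    return f"{base_url.rstrip('/')}/api/v1/query_range?{params}"


# Hypothetical host, metric name and label value.
url = query_range_url(
    "http://vm.example:8428/",
    'rate(libvirt_domain_info_cpu_time_seconds_total{domain="instance-1"}[5m])',
    start=1600000000,
    end=1600003600,
    step=60,
)
```

A Gnocchi-compatible API layer on top of this would still have to map
OpenStack resource IDs and Keystone scopes onto label filters like the one
above.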
--