[Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

Daniele Venzano daniele.venzano at eurecom.fr
Thu Feb 12 11:24:12 UTC 2015

Unfortunately, I can only confirm the sorry state of Ceilometer.
We tried it on a very small setup (6 compute nodes) and ran into so many issues that we dropped it and created our own solution, based on a mix of scripts that read from the nova/neutron DB, iptables and collectd data. We need no more collection agents than the ones we already run for system monitoring.
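For illustration only: the message does not say how those scripts read the iptables data. A minimal sketch, assuming traffic is accounted with iptables rules and collected by parsing the text printed by `iptables -L -v -x -n` (running the command needs root and is left out here). The function name `bytes_per_chain` is hypothetical.

```python
def bytes_per_chain(iptables_output: str) -> dict[str, int]:
    """Sum the byte counters of all rules, grouped by chain.

    Hypothetical helper. Expects the output of `iptables -L -v -x -n`:
    -v adds the counters, -x prints them as exact integers, -n skips DNS.
    """
    totals: dict[str, int] = {}
    chain = None
    for line in iptables_output.splitlines():
        fields = line.split()
        if not fields:
            continue  # blank line between chains
        if fields[0] == "Chain":
            # e.g. "Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)";
            # the policy counters in this header are deliberately ignored.
            chain = fields[1]
            totals.setdefault(chain, 0)
            continue
        if chain is None or fields[0] == "pkts":
            continue  # column header line, or text before the first chain
        # Rule lines start with the packet and byte counters. Without -x the
        # counters may read like "96K"; such lines fail isdigit() and are skipped.
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            totals[chain] += int(fields[1])
    return totals
```

Summing per chain only makes sense if each tenant (or router) gets its own chain or rule set; that layout is an assumption of this sketch, not something stated in the message.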

We tried the version in Havana and, later, in Icehouse. For starters, the documentation suggested MySQL as the default backend. MySQL lasts just a few days and then breaks down under the size of the tables. We tried MongoDB, but were still not satisfied with its performance, even on such a small cluster.
Then there is the metering agent. It is yet another daemon, not integrated in Neutron, and there is no documentation about what it actually measures. What if I have multiple routers? Ingress or egress? From which point of view?
The same applies to Cinder: it requires an external agent (to be run via cron!).

Some metrics were not recorded, and we couldn't understand why. Again, there was no documentation and no tooling to help us figure out whether we were just missing a config option somewhere in nova-compute or whether there was some other problem with the KVM/libvirt versions.
And even when we had some data and wanted to generate just a proof-of-concept report with some information about tenant resource usage, we ran into problems with the API. The fact that no one had bothered to write a simple proof-of-concept script that uses the API to actually do something useful was really off-putting.

We had to dig into libvirt to understand what some of the metrics actually mean.
We found that we could read those same metrics from our (more efficient, well-known) monitoring system.

For some time we ran just the agents and aggregated the data in an Elasticsearch instance through the UDP msgpack pipeline (more bugs: the message format is inconsistent, and different agents generate different fields, in slightly different formats).
It works, but for our needs it was just too much work. Most of the data is already available from other sources with well-known APIs.

Ah, also there is a long-standing open bug: Sahara and Ceilometer cannot be used together. And we use Sahara.

I opened bugs for some of these issues, but I have since lost interest.

In the end, I think it really depends on what kind of data you need and what (developer) resources you can throw at the problem.
Unless things changed dramatically in Juno, Ceilometer will not work out of the box. You will lose time because of the non-existent documentation, you will have to develop code and scripts anyway, and finally you will have to build something between your billing system and the Ceilometer API, because to the best of my knowledge there is nothing that uses it.

eBay has the resources to do all that. We don't.

-----Original Message-----
From: George Shuklin [mailto:george.shuklin at gmail.com] 
Sent: Thursday 12 February 2015 02:59
To: openstack-operators at lists.openstack.org
Subject: Re: [Openstack-operators] [Ceilometer] Real world experience with Ceilometer deployments - Feedback requested

Ceilometer is in a sad state.

1. The collector leaks memory. We ran it on the same host as Mongo, and it grabbed 29 GB out of 32, leaving Mongo with less than a gig of memory available.
2. The metering agent causes a huge load on neutron-server, growing as O(n) in metering rules and tenants. A few bugs reported, one bugfix in review.
3. The metering agent simply does not work on installations with multiple network nodes. 
It expects all routers to be on the same host. Whether that has been fixed, I don't know; we have our own crude fix.
4. Many rough edges. Ceilometer is much less tested than Nova. Sometimes it throws a traceback and skips counting. A fresh example: if a metadata name contains a '.', Ceilometer traces on it and does not count it in Glance usage.
5. Very slow at reports (using Mongo's mapreduce).

Overall feeling: barely usable, but, given my experience with cloud billing, not the worst thing I have seen in my life.

About load: apart from reporting and the memory leaks, it uses a rather small amount of resources.

On 02/11/2015 09:37 PM, Maish Saidel-Keesing wrote:
> Is Ceilometer ready for prime time?
> I would be interested in hearing from people who have deployed 
> OpenStack clouds with Ceilometer, and their experience. Some of the 
> topics I am looking for feedback on are:
> - Database Size
> - MongoDB management, Sharding, replica sets etc.
> - Replication strategies
> - Database backup/restore
> - Overall usability
> - Gripes, pains and problems (things to look out for)
> - Possible replacements for Ceilometer that you have used instead
> If you are willing to share - I am sure it will be beneficial to the 
> whole community.
> Thanks in Advance
> With best regards,
> Maish Saidel-Keesing
> Platform Architect
> Cisco
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators
