[Openstack-operators] scaling gnocchi metricd

Mike Lowe jomlowe at iu.edu
Tue Mar 28 19:55:07 UTC 2017


I recently got into trouble with a large backlog. What I found was that at some point the backlog got too large for gnocchi to function effectively.  When using ceph, the list of metric objects is kept in an omap object, which normally is a quick and efficient way to store this list.  However, at some point the list grows too large to be managed by the leveldb which implements the omap k/v store.  I finally moved to SSDs to get enough IOPS for leveldb/omap to function.  What I’m guessing is that, if you are using ceph, the increased number of metrics grabbed per pass reduced the number of times a now-expensive operation is performed.  Indications are that the new bluestore should make omap scale much better, but it isn’t slated to go stable for a few months, with the release of Luminous.
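
To make that guess concrete, here is a rough back-of-the-envelope sketch. It is not gnocchi code: it assumes a simplified model in which every pass lists the pending metrics once (the expensive omap operation) and then processes at most one block of them. The function name and the numbers (10,000 pending metrics, block sizes of 4 and 400) are illustrative assumptions only.

```python
def passes_needed(pending_metrics: int, block_size: int) -> int:
    """Passes required to drain a backlog under the simplified model.

    Assumption (not taken from the gnocchi source): each pass performs one
    listing of the pending metrics and then processes up to block_size of them.
    """
    if pending_metrics < 0:
        raise ValueError("pending_metrics must not be negative")
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Ceiling division with integers only, so large backlogs stay exact.
    return -(-pending_metrics // block_size)


if __name__ == "__main__":
    backlog = 10_000  # hypothetical number of metrics waiting to be processed
    for block_size in (4, 400):  # illustrative values, not known defaults
        listings = passes_needed(backlog, block_size)
        print(f"block size {block_size}: {listings} listings to drain {backlog} metrics")
```

Under this model the number of listings falls in proportion to the block size, which would explain why a larger block helps once each listing has become slow. It ignores new measures arriving while the backlog drains, so treat it as an intuition aid rather than a prediction.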


> On Mar 28, 2017, at 2:28 PM, Ionut Biru - Fleio <ionut at fleio.com> wrote:
> 
> Hello,
> 
> I have a cloud under administration. My setup is fairly basic: I deployed OpenStack using OpenStack-Ansible, and I'm currently on Newton and planning to upgrade to Ocata.
> 
> I'm having a problem with gnocchi metricd falling behind on processing metrics.
> 
> Gnocchi config: https://paste.xinu.at/f73A/
> 
> If I use the default number of workers (the CPU count), the value of "storage/total number of measures to process" keeps growing; last time I had 300k in the queue. It seems that the tasks are not rescheduled in order to process them all in time: metricd processes a couple of metrics right after they are received from ceilometer, and after that they are kept in the queue. I only have 10 compute nodes with about 70 instances.
> 
> In order to process the backlog I had to set workers to a very high number (100) and keep restarting metricd, but this method is very CPU and memory intensive. Luckily, I found another method that works quite well.
> 
> https://git.openstack.org/cgit/openstack/gnocchi/tree/gnocchi/cli.py?h=stable/3.1#n154
> 
> I have modified TASKS_PER_WORKER and BLOCK_SIZE to 400 and now metricd keeps processing them.
> 
> I'm not sure yet whether this is a bug or not, but my question is: how do you guys scale gnocchi metricd in order to process a lot of resources and metrics?