[Openstack-operators] scaling gnocchi metricd

Mike Lowe jomlowe at iu.edu
Thu Mar 30 18:09:28 UTC 2017


I think it was somewhere around the 2M mark.

> On Mar 30, 2017, at 8:33 AM, Alex Krzos <akrzos at redhat.com> wrote:
> 
> On Tue, Mar 28, 2017 at 3:55 PM, Mike Lowe <jomlowe at iu.edu <mailto:jomlowe at iu.edu>> wrote:
>> I recently got into trouble with a large backlog. What I found was that at
>> some point the backlog got too large for Gnocchi to function effectively.
>> When using Ceph, the list of metric objects is kept in an omap object,
>> which is normally a quick and efficient way to store this list. However,
>> at some point the list grows too large to be managed by the leveldb that
>> implements the omap k/v store.
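>> 
>> For reference, here is a rough, untested sketch of how one might count the
>> keys in that omap object with the rados Python bindings; the pool name
>> ("metrics") and object name ("measure") are assumptions, so check your
>> Gnocchi/Ceph configuration before trusting the numbers:
>> 
>>     # Count the omap keys on the object that tracks unprocessed measures.
>>     # Pool and object names below are placeholders for this sketch.
>>     import rados
>> 
>>     cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
>>     cluster.connect()
>>     ioctx = cluster.open_ioctx('metrics')   # assumed pool name
>> 
>>     count = 0
>>     start_after = ""
>>     while True:
>>         with rados.ReadOpCtx() as op:
>>             it, _ret = ioctx.get_omap_keys(op, start_after, 1000)
>>             ioctx.operate_read_op(op, "measure")   # assumed object name
>>             keys = [k for k, _ in it]
>>         if not keys:
>>             break
>>         count += len(keys)
>>         start_after = keys[-1]
>> 
>>     print("omap keys: %d" % count)
>>     ioctx.close()
>>     cluster.shutdown()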
> 
> Can you share the number of keys that were stored in the omap object
> when this became a problem?
> 
> Thanks,
> 
> Alex
> 
>> I finally moved to some SSDs to get enough IOPS for
>> leveldb/omap to function. My guess is that if you are using Ceph,
>> the increased number of metrics grabbed per pass reduces the number of
>> times a now-expensive operation is performed. Indications are that the new
>> BlueStore should make omap scale much better, but it isn't slated to go
>> stable for a few months, with the release of Luminous.
>> 
>> 
>> On Mar 28, 2017, at 2:28 PM, Ionut Biru - Fleio <ionut at fleio.com> wrote:
>> 
>> Hello,
>> 
>> I have a cloud under administration. My setup is fairly basic: I have
>> deployed OpenStack using OpenStack-Ansible, I'm currently on Newton, and
>> I'm planning to upgrade to Ocata.
>> 
>> I'm having a problem with Gnocchi metricd falling behind on processing
>> metrics.
>> 
>> Gnocchi config: https://paste.xinu.at/f73A/
>> 
>> With the default number of workers (the CPU count), the "storage/total
>> number of measures to process" counter keeps growing; last time I had 300k
>> in the queue. It seems that the tasks are not rescheduled quickly enough to
>> process them all in time: metricd processes a couple of metrics right after
>> they arrive from Ceilometer, and after that the rest just sit in the queue,
>> even though I only have 10 compute nodes with about 70 instances.
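>> 
>> (In case it helps anyone else watching that counter: below is a small,
>> untested sketch that polls it via python-gnocchiclient. The auth values
>> and the exact layout of the returned dict are assumptions, so adjust them
>> to your deployment.)
>> 
>>     # Poll the Gnocchi status endpoint and print the processing backlog.
>>     from keystoneauth1 import loading, session
>>     from gnocchiclient.v1 import client
>> 
>>     loader = loading.get_plugin_loader('password')
>>     auth = loader.load_from_options(
>>         auth_url='http://controller:5000/v3',   # placeholder values
>>         username='admin', password='secret',
>>         project_name='admin',
>>         user_domain_id='default', project_domain_id='default')
>>     sess = session.Session(auth=auth)
>>     gnocchi = client.Client(session=sess)
>> 
>>     status = gnocchi.status.get()
>>     # Expected to look roughly like:
>>     # {'storage': {'summary': {'metrics': ..., 'measures': ...}}}
>>     print(status['storage']['summary'])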
>> 
>> To keep up, I had to set the number of workers very high (100) and keep
>> restarting metricd so the backlog would get processed, but this method is
>> very CPU and memory intensive. Luckily I found another method that works
>> quite well:
>> 
>> https://git.openstack.org/cgit/openstack/gnocchi/tree/gnocchi/cli.py?h=stable/3.1#n154
>> 
>> I have modified TASKS_PER_WORKER and BLOCK_SIZE to 400, and now metricd
>> keeps up with processing.
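>> 
>> In other words, the local patch just bumps the two module-level constants,
>> roughly like this (the stock defaults are much smaller; double-check the
>> values in your own checkout before copying this):
>> 
>>     # gnocchi/cli.py (stable/3.1) -- local tweak, not an upstream change.
>>     TASKS_PER_WORKER = 400   # tasks each metricd worker grabs per pass
>>     BLOCK_SIZE = 400         # scheduling block size per pass
>> 
>> The inline comments are my reading of what those constants control, not
>> documented behavior.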
>> 
>> I'm not sure yet whether this is a bug, but my question is: how do you
>> scale Gnocchi metricd to process a large number of resources and metrics?
>> 
>> _______________________________________________
>> OpenStack-operators mailing list
>> OpenStack-operators at lists.openstack.org
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators