[Openstack-operators] scaling gnocchi metricd

Alex Krzos akrzos at redhat.com
Thu Mar 30 13:33:58 UTC 2017


On Tue, Mar 28, 2017 at 3:55 PM, Mike Lowe <jomlowe at iu.edu> wrote:
> I recently got into trouble with a large backlog. What I found was that at
> some point the backlog got too large for gnocchi to function effectively.
> When using ceph, the list of metric objects is kept in an omap object, which
> is normally a quick and efficient way to store this list.  However, at some
> point the list grows too large to be managed by the leveldb that implements
> the omap k/v store.

Can you share how many keys were stored in the omap object
when this became a problem?
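
If it helps, here is a minimal sketch of one way to get that number, by
counting the lines printed by `rados listomapkeys`. The pool name
("gnocchi") and the object name ("measure") in the commented example are
assumptions about a typical deployment, not something confirmed in this
thread, and the helper names are made up for illustration.

```python
import subprocess


def count_keys(listing: str) -> int:
    """Count non-empty lines (one per omap key) in `rados listomapkeys` output."""
    return sum(1 for line in listing.splitlines() if line.strip())


def omap_key_count(pool: str, obj: str) -> int:
    """Run `rados -p POOL listomapkeys OBJ` and count the keys it prints."""
    result = subprocess.run(
        ["rados", "-p", pool, "listomapkeys", obj],
        check=True,           # raise if the rados command fails
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout to str
    )
    return count_keys(result.stdout)


# Example call; pool and object names are assumptions, adjust as needed:
# print(omap_key_count("gnocchi", "measure"))
```

Listing a very large omap is itself an expensive operation, so this is
probably best run sparingly on a cluster that is already struggling.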

Thanks,

Alex

>  I finally moved to some SSDs to get enough IOPS for
> leveldb/omap to function.  What I’m guessing is that, if you are using ceph,
> the increased number of metrics grabbed per pass reduced the number of times
> a now-expensive operation is performed.  Indications are that the new
> bluestore should make omap scale much better, but it isn’t slated to go
> stable for a few months, with the release of Luminous.
>
>
> On Mar 28, 2017, at 2:28 PM, Ionut Biru - Fleio <ionut at fleio.com> wrote:
>
> Hello,
>
> I have a cloud under administration. My setup is fairly basic: I deployed
> OpenStack using OpenStack-Ansible. I'm currently on Newton and planning to
> upgrade to Ocata.
>
> I'm having a problem with gnocchi metricd falling behind on processing
> metrics.
>
> Gnocchi config: https://paste.xinu.at/f73A/
>
> If I use the default number of workers (the CPU count), the number of
> "storage/total number of measures to process" keeps growing; last time I had
> 300k in the queue. It seems that the tasks are not rescheduled in order to
> process them all in time: metricd processes a couple of metrics after they
> are received from ceilometer, and after that they are kept in the queue. I
> only have 10 compute nodes with about 70 instances.
>
> In order to process them I had to set the number of workers very high (100)
> and keep restarting metricd, but this method is very CPU and memory
> intensive. Luckily I found another method that works quite well.
>
> https://git.openstack.org/cgit/openstack/gnocchi/tree/gnocchi/cli.py?h=stable/3.1#n154
>
> I have modified TASKS_PER_WORKER and BLOCK_SIZE to 400 and now metricd keeps
> processing them.
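
The effect described above, where a larger batch per pass lets the queue
drain, can be illustrated with a toy model. This is only a sketch: it is
not gnocchi's scheduler logic, the function name is hypothetical, and the
numbers are made up for illustration rather than measured.

```python
def simulate_backlog(initial, arrivals_per_pass, batch_per_pass, passes):
    """Return the backlog after each pass of a toy fixed-batch processor.

    Each pass, `arrivals_per_pass` new items are queued and at most
    `batch_per_pass` items are processed.
    """
    backlog = initial
    history = []
    for _ in range(passes):
        backlog += arrivals_per_pass
        backlog -= min(backlog, batch_per_pass)
        history.append(backlog)
    return history


# Small batch: arrivals outpace processing, so the backlog keeps growing.
print(simulate_backlog(1000, 100, 64, 3))   # [1036, 1072, 1108]
# Larger batch: the same arrival rate now drains the backlog.
print(simulate_backlog(1000, 100, 400, 3))  # [700, 400, 100]
```

In this model the backlog only shrinks once the batch handled per pass
exceeds what arrives between passes, which would be consistent with the
queue growing under the defaults and draining after the change.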
>
> I'm not sure yet if it is a bug or not, but my question is: how do you guys
> scale gnocchi metricd in order to process a lot of resources and metrics?
>
> _______________________________________________
> OpenStack-operators mailing list
> OpenStack-operators at lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators