[Openstack-operators] scaling gnocchi metricd

Alex Krzos akrzos at redhat.com
Wed Mar 29 12:55:27 UTC 2017


On Wed, Mar 29, 2017 at 2:10 AM, Ionut Biru - Fleio <ionut at fleio.com> wrote:
> I'm not using influxdb, just basic configuration generated by openstack
> ansible, which enables file storage by default.
>

Oops, I homed in on the influxdb portion of the config.  I see the
driver is set to file.
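
For anyone following along, a file-backed storage section in
gnocchi.conf looks something like this (the basepath shown is just the
common default; yours may differ):

    [storage]
    driver = file
    file_basepath = /var/lib/gnocchi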

>
> The reason for bumping those values was to process a lot of measures,
> and 400 seemed like a high number at the time.
>

I was actually referring to the assigned values of 16 for
TASKS_PER_WORKER and 4 for BLOCK_SIZE.  My assumption is that someone
did some analysis or testing to arrive at those values.  That is not
to discount that a higher value has clearly helped your situation; it
sounds like it is absolutely worth investigating what the optimal
setting is here and what the trade-offs of bumping it are.

Let's gather a few more data points about your environment for future
reference.  You had 10 workers, so 10 * 16 = 160 total tasks you could
handle per metric-processing wake-up period, and that could not
sustain your 70 instances.  Do you know how many metrics you have per
instance, and whether the instances have other resources that would
create more metrics (NICs, volumes, disks)?  Did you have more than
one machine hosting metricd workers? (That adds to the capacity.)  Is
the setup bare metal or some sort of virtualized cloud?
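
For anyone who wants to play with the arithmetic, here is a quick
back-of-envelope in Python.  The host count and processing delay below
are assumptions (one metricd host, the 60s default delay), not
measurements from this cloud:

    # Back-of-envelope capacity model (my sketch, not gnocchi's actual
    # scheduler): how many processing tasks all metricd workers can
    # pick up per wake-up cycle, and per hour.
    hosts = 1                  # machines running metricd (assumption)
    workers_per_host = 10      # worker count from this thread
    tasks_per_worker = 16      # the TASKS_PER_WORKER constant in cli.py
    processing_delay = 60      # metric_processing_delay in seconds (default)

    tasks_per_cycle = hosts * workers_per_host * tasks_per_worker
    tasks_per_hour = tasks_per_cycle * 3600 // processing_delay
    print(tasks_per_cycle)     # 160, the figure above
    print(tasks_per_hour)      # 9600; backlog grows whenever ingest exceeds this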

>
> I did use the below values without any impact
>
> metric_processing_delay = 0
>

I have seen values of less than 10s here actually become detrimental,
though I did not have the time to root-cause why that occurs.

> metric_reporting_delay = 1
>
> metric_cleanup_delay = 10
>
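
For reference, those knobs live under the [metricd] section of
gnocchi.conf.  A sketch with illustrative values (not a recommendation,
given the sub-10s caveat above):

    [metricd]
    workers = 10
    metric_processing_delay = 30
    metric_reporting_delay = 10
    metric_cleanup_delay = 300
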
>
> I'm open to applying any configuration change to my setup that would
> resolve my issue without the code modification I made.
>

The tunings I know of trade system resources for higher capacity.
Given that, I'd like to understand what, if anything, was traded when
you bumped those constants in the code base.  Do you happen to have
any additional telemetry collected on your cloud that can help
characterize Gnocchi's resource consumption after your changes?  Also,
what are you using to monitor the Gnocchi backlog?  I use a collectd
plugin available here
(https://github.com/akrzos/collectd-gnocchi-status)
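
If you want something lighter than collectd, a minimal poll of the
status API looks roughly like this.  The endpoint URL and the exact
JSON keys are assumptions from memory, and no keystone auth is shown
(add an X-Auth-Token header if your API is secured):

    # Minimal backlog monitor sketch: poll gnocchi's /v1/status
    # endpoint and print the number of measures waiting to be processed.
    import time
    import requests

    GNOCCHI = "http://localhost:8041"  # assumption: local, unauthenticated API

    while True:
        summary = requests.get(GNOCCHI + "/v1/status").json()["storage"]["summary"]
        print("measures to process: %(measures)s across %(metrics)s metrics"
              % summary)
        time.sleep(60)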

> ________________________________
> From: Alex Krzos <akrzos at redhat.com>
> Sent: Tuesday, March 28, 2017 8:19:58 PM
> To: Ionut Biru - Fleio
> Cc: openstack-operators at lists.openstack.org
> Subject: Re: [Openstack-operators] scaling gnocchi metricd
>
> This is interesting, thanks for sharing.  I assume you're using an
> influxdb storage driver, correct?  I have also wondered if there is a
> specific reason for the TASKS_PER_WORKER and BLOCK_SIZE values.
>
> Also did you have to adjust your metric_processing_delay?
>
>
> Alex Krzos | Performance Engineering
> Red Hat
> Desk: 919-754-4280
> Mobile: 919-909-6266
>
>
> On Tue, Mar 28, 2017 at 3:28 PM, Ionut Biru - Fleio <ionut at fleio.com> wrote:
>> Hello,
>>
>>
>> I have a cloud under administration. My setup is fairly basic: I
>> deployed OpenStack using OpenStack-Ansible, I'm currently on Newton,
>> and I'm planning to upgrade to Ocata.
>>
>>
>> I'm having a problem with gnocchi metricd falling behind on processing
>> metrics.
>>
>>
>> Gnocchi config: https://paste.xinu.at/f73A/
>>
>>
>> When I use the default number of workers (CPU count), the
>> "storage/total number of measures to process" value keeps growing;
>> last time I had 300k in the queue. It seems the tasks are not being
>> rescheduled fast enough to process them all in time: metricd processes
>> a couple of metrics right after they arrive from ceilometer, and the
>> rest are left in the queue. I only have 10 compute nodes with about 70
>> instances.
>>
>>
>> In order to keep up I had to raise workers to a very high number
>> (100) and keep restarting metricd so everything got processed, but
>> that method is very CPU- and memory-intensive. Luckily I found another
>> method that works quite well.
>>
>>
>>
>> https://git.openstack.org/cgit/openstack/gnocchi/tree/gnocchi/cli.py?h=stable/3.1#n154
>>
>>
>> I have modified TASKS_PER_WORKER and BLOCK_SIZE to 400 and now
>> metricd keeps up with processing.
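
To illustrate why that helps, here is a toy backlog model (entirely my
sketch, not gnocchi code; the ~10 measure batches per instance per
cycle is a guess, hence my questions above):

    # Toy model: backlog evolution when ingest outpaces per-cycle capacity.
    def backlog_after(cycles, ingest_per_cycle, capacity_per_cycle, start=0):
        backlog = start
        for _ in range(cycles):
            backlog = max(0, backlog + ingest_per_cycle - capacity_per_cycle)
        return backlog

    # 70 instances * ~10 batches/cycle (a guess) vs. the 160-task cap:
    print(backlog_after(60, 700, 160))    # 32400: grows by 540 every cycle
    # with the constants bumped (e.g. 10 workers * 400 tasks per cycle):
    print(backlog_after(60, 700, 4000))   # 0: the backlog can drain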
>>
>>
>> I'm not sure yet whether this is a bug, but my question is: how do you
>> scale gnocchi metricd to process a lot of resources and metrics?
>>
>>


