[openstack-dev] [gnocchi] per-sack vs per-metric locking tradeoffs

Julien Danjou julien at danjou.info
Fri Apr 28 07:48:54 UTC 2017


On Thu, Apr 27 2017, gordon chung wrote:

> so as we transition to the bucket/shard/sack framework for incoming 
> writes, we've set up locks on the sacks so we only have one process 
> handling any given sack. this allows us to remove the per-metric locking 
> we had previously.

yay!

> the issue i've noticed now is that if we only have per-sack locking,
> metric-based actions can affect sack-level processing. for example:
>
> scenario 1:
> 1. delete metric, locks sack to delete single metric,
> 2. metricd processor attempts to process entire sack but can't, so skips.

Yes, I wrote that in a review somewhere. We need to rework step 1 so
that deletion happens while we already hold the sack lock to process
its metrics. We might want to merge the janitor into the worker, I imagine.
Currently a janitor can grab metrics and do dumb things like:
- metric1 from sackA
- metric2 from sackB
- metric3 from sackA

and do 3 separate lock+delete rounds, locking sackA twice -_-
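
A minimal sketch of the alternative, grouping the pending deletions by
sack so that each sack is locked only once. Everything named here is a
hypothetical stand-in, not Gnocchi's actual code: sack_of maps a metric
to its sack, sack_locks holds one lock per sack (threading.Lock plays
the role of the real distributed lock), and delete_fn does the deletion.

```python
import threading
from collections import defaultdict


def group_by_sack(metric_ids, sack_of):
    """Group metric ids by the sack they belong to, preserving order."""
    groups = defaultdict(list)
    for metric_id in metric_ids:
        groups[sack_of(metric_id)].append(metric_id)
    return dict(groups)


def delete_metrics(metric_ids, sack_of, sack_locks, delete_fn):
    """Delete metrics, taking each sack lock at most once.

    Returns (number_of_lock_acquisitions, skipped_metric_ids).
    Sacks whose lock is already held are skipped, not waited on.
    """
    acquisitions = 0
    skipped = []
    for sack, metrics in group_by_sack(metric_ids, sack_of).items():
        lock = sack_locks[sack]
        if not lock.acquire(blocking=False):
            # Someone else is working on this sack; retry on a later pass.
            skipped.extend(metrics)
            continue
        acquisitions += 1
        try:
            for metric_id in metrics:
                delete_fn(metric_id)
        finally:
            lock.release()
    return acquisitions, skipped
```

With the three metrics above, this takes two locks (one for sackA, one
for sackB) instead of three.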

> scenario 2:
> 1. API request passes in 'refresh' param so they want all unaggregated 
> metrics to be processed on demand and returned.
> 2. API locks 1 or more sacks to process 1 or more metrics
> 3. metricd processor attempts to process entire sack but can't, so
> skips. potentially multiple sacks unprocessed in the current cycle.
>
> scenario 3:
> same as scenario 2 but metricd processor locks first, and either blocks
> API process OR doesn't allow API to guarantee 'all measures processed'.

Yes, I'm even more worried about scenario 3. We should probably add a
safeguard timeout parameter there, set by the admin.
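
A minimal sketch of such a safeguard, assuming the API side simply
gives up on the refresh when it cannot get the sack lock in time. The
function name and the timeout parameter are hypothetical, and
threading.Lock stands in for whatever distributed lock is really used.

```python
import threading


def refresh_with_timeout(sack_lock, process_fn, timeout):
    """Run process_fn under sack_lock, waiting at most `timeout` seconds.

    Returns True if the measures were processed, False if the lock could
    not be obtained in time (the caller can then return an error or
    serve the data without the 'all measures processed' guarantee).
    """
    if not sack_lock.acquire(timeout=timeout):
        return False
    try:
        process_fn()
    finally:
        sack_lock.release()
    return True
```

The API request is then blocked by metricd for at most `timeout`
seconds, whatever the processing interval is.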

> i imagine these scenarios are not critical unless a very large 
> processing interval is defined or if for some unfortunate reason, the 
> metric-based actions are perfectly timed to lock out background processing.
>
> alternatively, this could be solved by keeping per-metric locks in 
> addition to per-sack locks. this would effectively double the number of 
> active locks we have so instead of each metricd worker having a single 
> per-sack lock, it will also have a per-metric lock for whatever metric 
> it may be publishing at the time.

If we have a timeout set for scenario 3, I'm not that worried. I guess
the worst thing is that people would be unhappy with the API spending
time doing computation anyway, so we'd need to rework how refresh works
or add an ability to disable it.

-- 
Julien Danjou
// Free Software hacker
// https://julien.danjou.info