[openstack-dev] [gnocchi] new measures backlog scheduling

gordon chung gord at live.ca
Tue Apr 18 13:38:35 UTC 2017



On 18/04/17 08:44 AM, Julien Danjou wrote:

>
> Live upgrade never has been supported in Gnocchi, so I don't see how
> that's a problem. It'd be cool to support it for sure, but we're far
> from having been able to implement it at any point in time in the past.
> So it's not a new issue or anything like that. I really don't see
> a problem with loading the number of sacks at startup.
>

it's a problem if you don't do a full shutdown of every single gnocchi
service. my main concern is that this is not a 'lose data' situation
like when you screw up any of the other options. this will corrupt your
storage. i'll ignore the live upgrade discussion for now to not get
sidetracked.


>
> I think it's worth it only if you use replicas – and I don't think 2 is
> enough, I'd try 3 at least, and make it configurable. It'll reduce
> lock contention a lot (e.g. by 17x in my previous example).

i could get the same reduction in lock contention if i added basic
partitioning :P
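
(for illustration only, a rough sketch of what basic partitioning could
look like: every worker sorts the group membership the same way and takes
a contiguous slice of the sacks, so no two workers compete for the same
sack lock. `partition_sacks`, the worker names and the sack numbering are
made up for this example, this is not gnocchi code.)

```python
def partition_sacks(num_sacks, workers, me):
    """Return the sack ids worker `me` would process.

    Hypothetical helper: assumes every worker sees the same membership
    list, so all of them compute the same boundaries independently.
    """
    members = sorted(workers)      # same order on every worker
    idx = members.index(me)        # raises ValueError if `me` is unknown
    count = len(members)
    # integer boundaries: slices are disjoint and cover every sack
    start = idx * num_sacks // count
    end = (idx + 1) * num_sacks // count
    return list(range(start, end))


if __name__ == "__main__":
    group = ["w0", "w1", "w2"]
    for name in group:
        print(name, partition_sacks(8, group, name))
```

with 8 sacks and 3 workers this gives w0 sacks 0-1, w1 sacks 2-4 and w2
sacks 5-7. the catch is that the boundaries shift whenever a worker joins
or leaves, so most workers end up with a different slice.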

> As far as I'm concerned, since the number of replicas is configurable,
> you can add a knob that would set replicas=number_of_metricd_worker that
> would implement the current behaviour you implemented – every worker
> tries to grab every sack.

do we want it configurable? tbh, would anyone configure it or know
how to configure it? even for us, we're just guessing somewhat. lol, i'm
going to leave it static for now.

>
> We're not leveraging the re-balancing aspect of hashring, that's true.
> We could probably use any dumber system to spread sacks across workers.
> We could stick to the good ol' "len(sacks) / len(workers in the group)".
>
> But I think there's a couple of things down the road that may help us:
> Using the hashring makes sure worker X does not jump from sacks [A, B,
> C], to [W, X, Y, Z] but just to [A, B] or [A, B, C, X]. That should
> minimize lock contention when bringing up/down new workers. I admit it's
> a very marginal win, but… it comes free with it.
> Also, I envision a push based approach in the future (to replace the
> metricd_processing_delay) which will require workers to register to
> sacks. Making sure the rebalancing does not shake everything but is
> rather smooth will also reduce workload around that. Again, it comes
> free.
>

this is not really free. choosing hashring means we will have idle
workers and the added complexity of figuring out what each of the other
agents in the group looks like. it's a trade-off (especially considering
how few keys we have relative to nodes) which is why i brought up the
question.
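
(to make the trade-off concrete, here is a minimal hand-rolled consistent
hash ring. it is a sketch under assumptions, not the hashring library
gnocchi would actually use: `HashRing`, `sacks_per_worker`, the md5-based
placement, the vnode count and the "sack-N" keys are all invented for the
example.)

```python
import bisect
import hashlib


def _hash(key):
    # md5 is used only to spread points on the ring, not for security
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring with virtual nodes (illustrative only)."""

    def __init__(self, nodes, vnodes=32):
        self._ring = sorted(
            (_hash("%s-%d" % (node, i)), node)
            for node in nodes for i in range(vnodes))
        self._points = [point for point, _ in self._ring]
        self._num_nodes = len(set(nodes))

    def owners(self, key, replicas=1):
        """Walk clockwise from the key and collect distinct nodes."""
        wanted = min(replicas, self._num_nodes)
        start = bisect.bisect(self._points, _hash(key))
        found = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == wanted:
                    break
        return found


def sacks_per_worker(ring, num_sacks, replicas=1):
    """Map worker -> set of sacks; workers with no sack are absent (idle)."""
    assignment = {}
    for sack in range(num_sacks):
        for worker in ring.owners("sack-%d" % sack, replicas):
            assignment.setdefault(worker, set()).add(sack)
    return assignment


if __name__ == "__main__":
    workers = ["w%d" % i for i in range(4)]
    before = sacks_per_worker(HashRing(workers), 16)
    after = sacks_per_worker(HashRing(workers + ["w4"]), 16)
    for name in workers + ["w4"]:
        print(name, sorted(before.get(name, ())), sorted(after.get(name, ())))
```

when a worker joins, existing workers only hand sacks over to the
newcomer and never swap sacks among themselves, which is the smooth
rebalancing. the cost is that with few sacks the spread can be uneven and
a worker can end up with nothing. setting replicas to the number of
workers gets back the "every worker tries every sack" behaviour.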

i'll be honest, i'll probably still switch back to hashring... but i
want to make sure we're not thinking hashring only.

-- 
gord

