[openstack-dev] [gnocchi] new measures backlog scheduling

gordon chung gord at live.ca
Tue Nov 15 00:13:04 UTC 2016



On 14/11/16 05:53 PM, John Dickinson wrote:
>
> I'm stepping in to an area here (gnocchi) that I know very little about,
> so please forgive me where I mess up.
>
> First, as a practical note, stuff in Swift will be /much/ better when
> you spread it across the entire namespace. It's a lot better to have data
> stored in many containers instead of putting all data into just one
> container. Spreading the data out takes advantage of Swift's scaling
> characteristics and makes users and ops happier.

thanks for the tip John! good to know the proposal might actually have 
additional benefits.

>
> Second, at the risk of overgeneralizing[1], you may want to consider
> using the consistent hashing code from Ironic and Nova (which is being
> discussed as a new oslo library). Consistent hashing gives you the nice
> property of being able to change the number of buckets you're hashing
> data into without having to rehash most of the existing data. Think of
> it this way, if you hash into two buckets and use even/odd (i.e. last
> bit) to determine which bucket data goes into, then when you need a
> third bucket you have to switch to MOD 3 and two-thirds of your existing
> data will move into a different bucket. That's bad, and it gets even
> worse as you add more and more buckets. With consistent hashing, you can
> get the property that if you add 1% more buckets, you'll only move about
> 1% of the existing data to a different bucket.

is the consistent hashing code == hashring code? we actually jacked that 
from Ironic and used it in Ceilometer already. :) but that's a good idea 
to leverage it in this case if we proceed. i just noticed jd adding it 
to tooz[1].
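
to make the even/odd vs MOD 3 example above concrete, here is a minimal
sketch of a hash ring. it is a toy written for this illustration, not the
Ironic/Ceilometer/tooz code; HashRing, get_bucket, moved_fraction, the
bucket names and the replicas=100 default are all made-up assumptions.

```python
import bisect
import hashlib


def _hash(key):
    # Stable 128-bit integer hash of a string (md5 for spread, not security).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent hash ring with `replicas` virtual points per bucket."""

    def __init__(self, buckets, replicas=100):
        # Each bucket is placed at many pseudo-random points on the ring.
        self._ring = sorted(
            (_hash("%s-%d" % (b, i)), b)
            for b in buckets
            for i in range(replicas)
        )
        self._points = [p for p, _ in self._ring]

    def get_bucket(self, key):
        # Pick the first ring point after the key's hash, wrapping around.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def moved_fraction(assign_before, assign_after, keys):
    # Share of keys whose bucket differs between the two assignments.
    moved = sum(1 for k in keys if assign_before(k) != assign_after(k))
    return moved / len(keys)


keys = ["metric-%d" % i for i in range(10000)]

# Modulo placement: growing from 2 to 3 buckets.
mod_moved = moved_fraction(
    lambda k: _hash(k) % 2, lambda k: _hash(k) % 3, keys)

# Hash ring placement: growing from 2 to 3 buckets.
ring2 = HashRing(["b0", "b1"])
ring3 = HashRing(["b0", "b1", "b2"])
ring_moved = moved_fraction(ring2.get_bucket, ring3.get_bucket, keys)

print("modulo moved: %.2f" % mod_moved)
print("ring moved:   %.2f" % ring_moved)
```

with the modulo scheme roughly two-thirds of the keys should change bucket,
while on the ring only the keys claimed by the new bucket move (about a
third here, since the third bucket takes about a third of the ring). the
exact ring figure depends on the number of virtual points per bucket.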

>
> So going even further into my ignorance of the gnocchi problem space, I
> could imagine that there may be some benefit of being able to change the
> number of hash buckets over time based on how many items, how many
> workers are processing them, rate of new metrics, etc. If there's
> benefit to changing the number of hash buckets over time, then looking
> at consistent hashing is probably worth the time. If there is no benefit
> to changing the number of hash buckets over time, then a simple
> |hash(data) >> (hash_len - log2(num_hash_buckets))| or |hash(data)
> MOD num_hash_buckets| is probably sufficient.

that's a good point. in Ceilometer, iirc we don't ever change the number 
of buckets so we arguably didn't need it. in this case, i imagine we'd 
need to consider the overhead of creating a lot of buckets vs letting users
attempt to figure out what the right number of buckets is for their system.
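
for the fixed-bucket-count case, the two simple schemes mentioned above
could look roughly like the sketch below. the function names and the
choice of md5 are assumptions for illustration only, not anything from
gnocchi. the shift variant keeps the top log2(num_buckets) bits of the
hash, i.e. it shifts right by hash_len - log2(num_buckets), so it only
works when the bucket count is a power of two.

```python
import hashlib

HASH_BITS = 128  # size of an md5 digest in bits


def _hash(key):
    # Stable 128-bit integer hash of a string (md5 for spread, not security).
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def bucket_by_mod(key, num_buckets):
    # Works for any positive bucket count.
    return _hash(key) % num_buckets


def bucket_by_shift(key, num_buckets):
    # Keep only the top log2(num_buckets) bits of the hash.
    bits = num_buckets.bit_length() - 1
    if num_buckets < 1 or (1 << bits) != num_buckets:
        raise ValueError("num_buckets must be a power of two")
    return _hash(key) >> (HASH_BITS - bits)


if __name__ == "__main__":
    for name in ("metric-a", "metric-b", "metric-c"):
        print(name, bucket_by_mod(name, 6), bucket_by_shift(name, 8))
```

either way, changing num_buckets later reshuffles most keys, which is the
trade-off against the hash ring approach.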

thanks for all the info! glad your filter is picking up keywords in 
email body.

[1] https://review.openstack.org/#/c/397264/

cheers,
-- 
gord


