[openstack-dev] [gnocchi] new measures backlog scheduling
me at not.mn
Mon Nov 14 22:53:18 UTC 2016
On 14 Nov 2016, at 13:57, gordon chung wrote:
> one issue i've noticed with our 'backlog' scheduling is that we register
> all our new measures in a single folder/filestore object. this folder or
> object in most production cases can grow quite large (tens/hundreds of
> thousands). so that we don't load it all into memory, the drivers only
> grab the first x items and process them. unfortunately, we don't control
> the ordering of the returned items so it is dependent on the ordering
> the backend returns. for Ceph, it returns in what i guess is some
> alphanumeric order. the file driver i believe returns based on how the
> filesystem indexes files. i have no idea how swift ordering behaves. the
Listings in Swift are lexicographically sorted by the object name.
> result of this is that we may starve some new measures from being
> processed because they keep getting pushed back by more recent measures
> if fewer agents are deployed.
> with that said, this isn't a huge issue because measures can be
> processed on demand using the refresh parameter, but it's not ideal.
> i was thinking, to better handle processing while minimising the effects
> of a driver's natural indexing, we can hash our new measures into
> buckets based on metric_id. Gnocchi would hash all incoming metrics into
> 100? buckets and metricd agents would divide up these buckets and loop
> through them. this would ensure we have smaller buckets to deal with and
> therefore less chance that metrics get pushed back and starved. that said, it
> will add additional requirements of 100? folders/filestore objects
> rather than 1. it will also mean we may be making significantly more
> smaller fetches vs single (possibly) giant fetch.
> to extend this, we could also hash into project_id groups and thus allow
> some projects to have more workers and thus more performant queries?
> this might be too product tailored. :)
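The bucketing proposed above could be sketched roughly as follows. This is an illustrative sketch, not Gnocchi code: the bucket count of 100 comes from the "100?" in the proposal, and the function names are mine. One deliberate choice worth noting: a stable hash is used instead of Python's built-in hash(), which is salted per process and would scatter metrics into different buckets after every agent restart.

```python
import hashlib

NUM_BUCKETS = 100  # hypothetical; the proposal floats "100?" as the count


def bucket_for(metric_id: str) -> int:
    # Stable hash (md5) so a metric always lands in the same bucket,
    # across processes and restarts.
    digest = hashlib.md5(metric_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_BUCKETS


def buckets_for_agent(agent_index: int, num_agents: int) -> list:
    # Each metricd agent claims a disjoint slice of the buckets and
    # loops over only those, keeping every per-bucket listing small.
    return [b for b in range(NUM_BUCKETS) if b % num_agents == agent_index]
```

With something like this, each bucket maps to its own folder/filestore object, so any single listing stays small regardless of the backend's natural ordering.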
I'm stepping into an area here (gnocchi) that I know very little about, so please forgive me where I mess up.
First, as a practical note, stuff in Swift will perform *much* better when you spread it across the entire namespace. It's a lot better to store data in many containers instead of putting all of it into just one container. Spreading the data out takes advantage of Swift's scaling characteristics and makes users and ops happier.
Second, at the risk of overgeneralizing, you may want to consider using the consistent hashing code from Ironic and Nova (which is being discussed as a new oslo library). Consistent hashing gives you the nice property of being able to change the number of buckets you're hashing data into without having to rehash most of the existing data. Think of it this way: if you hash into two buckets and use even/odd (i.e. the last bit) to determine which bucket data goes into, then when you need a third bucket you have to switch to MOD 3, and two-thirds of your existing data will move into a different bucket. That's bad, and it gets even worse as you add more and more buckets. With consistent hashing, you get the property that if you add 1% more buckets, only about 1% of the existing data moves to a different bucket.
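That rehashing argument can be demonstrated with a toy ring. This is a simplified sketch for illustration only (no virtual replicas, unlike the Ironic/Nova code mentioned above; all names are invented): it counts how many items change bucket when one bucket is added on a ring versus under plain MOD.

```python
import bisect
import hashlib


def _h(key: str) -> int:
    # Stable 128-bit hash for placing both buckets and items on the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring: each item belongs to the first bucket
    at or after its hash position, wrapping around at the end."""

    def __init__(self, buckets):
        self._ring = sorted((_h(b), b) for b in buckets)
        self._keys = [k for k, _ in self._ring]

    def bucket(self, item: str) -> str:
        idx = bisect.bisect(self._keys, _h(item)) % len(self._keys)
        return self._ring[idx][1]


items = ["metric-%d" % i for i in range(1000)]
before = HashRing(["bucket-%d" % i for i in range(100)])
after = HashRing(["bucket-%d" % i for i in range(101)])  # add one bucket

# How many items land in a different bucket after the resize?
moved = sum(1 for it in items if before.bucket(it) != after.bucket(it))
# Compare with naive modulo bucketing going from MOD 100 to MOD 101:
mod_moved = sum(1 for it in items if _h(it) % 100 != _h(it) % 101)
```

On the ring only the items falling on the new bucket's arc move (on the order of 1% here), while switching from MOD 100 to MOD 101 reshuffles nearly everything.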
So going even further into my ignorance of the gnocchi problem space, I could imagine that there may be some benefit to being able to change the number of hash buckets over time based on how many items there are, how many workers are processing them, the rate of new metrics, etc. If there's benefit to changing the number of hash buckets over time, then looking at consistent hashing is probably worth the time. If there is no benefit to changing the number of hash buckets over time, then a simple `hash(data) >> (hash_len - log2(num_hash_buckets))` (for a power-of-two bucket count) or `hash(data) MOD num_hash_buckets` is probably sufficient.
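For completeness, here is a sketch of those two simple schemes. It's a hedged illustration, not anything from Gnocchi: the choice of md5 as the stable hash and the power-of-two restriction on the top-bits variant are my assumptions.

```python
import hashlib

HASH_LEN = 128  # md5 produces a 128-bit digest


def stable_hash(data: bytes) -> int:
    return int(hashlib.md5(data).hexdigest(), 16)


def bucket_mod(data: bytes, num_buckets: int) -> int:
    # hash(data) MOD num_buckets: works for any bucket count,
    # but resizing reshuffles nearly all existing assignments.
    return stable_hash(data) % num_buckets


def bucket_top_bits(data: bytes, num_buckets: int) -> int:
    # Keep only the top log2(num_buckets) bits of the hash;
    # requires num_buckets to be a power of two.
    k = num_buckets.bit_length() - 1
    if 1 << k != num_buckets:
        raise ValueError("num_buckets must be a power of two")
    return stable_hash(data) >> (HASH_LEN - k)
```

Either one is a few lines and perfectly adequate when the bucket count is fixed for the life of the deployment.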
I've got a nice hammer. Your problem sure looks like a nail to me.
> OpenStack Development Mailing List (not for usage questions)