Open Stack

Mon Nov 14 21:57:39 UTC 2016

hi,

one issue i've noticed with our 'backlog' scheduling is that we register
all our new measures in a single folder/filestore object. in most
production cases this folder or object can grow quite large (tens or
hundreds of thousands of entries). so that we don't load it all into
memory, the drivers only grab the first x items and process them.
unfortunately, we don't control the ordering of the returned items, so
it depends on the order in which the backend returns them. Ceph returns
them in what i guess is some alphanumeric order. the file driver, i
believe, returns them based on how the filesystem indexes files. i have
no idea how Swift ordering behaves. the result is that, when fewer
agents are deployed, some new measures may be starved because more
recent measures keep pushing them back.
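to make the starvation concrete, here's a rough simulation. it assumes the backend lists keys in plain lexicographic order (my guess for Ceph above) and that each round the agents only get through the first `batch_size` keys; `process_rounds` and the `metric-...` key names are made up for illustration, not anything in Gnocchi.

```python
from typing import Iterable, List, Set


def process_rounds(initial: Iterable[str], arrivals: List[List[str]],
                   batch_size: int) -> Set[str]:
    """Simulate a driver that only processes the first `batch_size` keys
    of the backlog each round.

    initial:  keys already in the backlog before the first round.
    arrivals: arrivals[r] are the keys added at the start of round r.
    Returns the keys still unprocessed after the last round.
    """
    backlog = set(initial)
    for new_keys in arrivals:
        backlog.update(new_keys)
        # assumption: the backend lists keys in lexicographic order.
        # a real driver would only ask for the first `batch_size` keys;
        # sorting the whole set here just mimics that listing order.
        batch = sorted(backlog)[:batch_size]
        backlog.difference_update(batch)
    return backlog


# 'metric-f' arrives first, but every later key ('metric-00', 'metric-01',
# 'metric-10', ...) sorts ahead of it, so it never makes the cut.
arrivals = [[f"metric-{r}{i}" for i in range(2)] for r in range(5)]
print(process_rounds(["metric-f"], arrivals, batch_size=2))  # {'metric-f'}
```

with one more slot per round (`batch_size=3`) the same backlog drains completely, which is the "fewer agents" effect in miniature.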

with that said, this isn't a huge issue because measures can be
processed on demand using the refresh parameter, but it's not ideal.

i was thinking that, to better handle processing while minimising the
effects of a driver's natural indexing, we could hash our new measures
into buckets based on metric_id. Gnocchi would hash all incoming
metrics into some number of buckets (100?) and the metricd agents would
divide up these buckets and loop through them. this would ensure we
have smaller buckets to deal with and therefore less chance that
metrics get pushed back and starved. that said, it would require 100(?)
folders/filestore objects rather than 1. it would also mean we may be
making significantly more, smaller fetches vs. a single (possibly)
giant fetch.
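a minimal sketch of the bucketing idea, assuming a fixed bucket count of 100 and a static round-robin split of buckets across agents. `NUM_BUCKETS`, `bucket_for` and `buckets_for_agent` are hypothetical names, and md5 is just one stable way to spread keys. the main point is that the hash has to be stable across processes, because the API side picks the bucket and metricd reads it back.

```python
import hashlib
from typing import List

NUM_BUCKETS = 100  # hypothetical; "100?" is just the number floated above


def bucket_for(metric_id: str, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a metric_id to a bucket using a hash that is stable across processes.

    python's built-in hash() is salted per process for str, so the API
    workers and each metricd agent could disagree; a digest avoids that.
    md5 is used only to spread keys evenly, not for security.
    """
    digest = hashlib.md5(metric_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_buckets


def buckets_for_agent(agent_index: int, num_agents: int,
                      num_buckets: int = NUM_BUCKETS) -> List[int]:
    """Static round-robin split: agent k owns buckets k, k + n, k + 2n, ..."""
    if not 0 <= agent_index < num_agents:
        raise ValueError("agent_index must be in [0, num_agents)")
    return list(range(agent_index, num_buckets, num_agents))


# each agent walks only its own buckets, fetching the first x items of each
owned = buckets_for_agent(1, 3)
print(owned[:3], len(owned))  # [1, 4, 7] 33
```

a static split like this assumes a known, fixed number of agents; if agents come and go, ownership would need to be recomputed.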

to extend this, we could also hash into project_id groups and thus
allow some projects to have more workers and therefore more performant
queries? this might be too product-tailored. :)
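one rough way the project_id extension could look, assuming a two-tier split where a configured set of projects goes into a "premium" group and everything else into "default", and each group gets its own 100 buckets and its own number of agents. the group names, `premium_projects`, `bucket_key`, `plan_agents` and `owned_buckets` are all hypothetical.

```python
import hashlib
from typing import Dict, List, Set, Tuple

NUM_BUCKETS = 100  # hypothetical, per group


def group_for_project(project_id: str, premium_projects: Set[str]) -> str:
    """hypothetical two-tier grouping: listed projects get their own pool."""
    return "premium" if project_id in premium_projects else "default"


def bucket_key(project_id: str, metric_id: str, premium_projects: Set[str],
               num_buckets: int = NUM_BUCKETS) -> Tuple[str, int]:
    """Place a new measure in (project group, metric bucket)."""
    digest = hashlib.md5(metric_id.encode("utf-8")).hexdigest()
    return (group_for_project(project_id, premium_projects),
            int(digest, 16) % num_buckets)


def plan_agents(agents_per_group: Dict[str, int]) -> Dict[int, Tuple[str, int, int]]:
    """Map each global agent index to (group, index within group, group size)."""
    plan = {}
    next_index = 0
    # sort by group name so every agent computes the same plan
    for group, count in sorted(agents_per_group.items()):
        for local in range(count):
            plan[next_index] = (group, local, count)
            next_index += 1
    return plan


def owned_buckets(plan: Dict[int, Tuple[str, int, int]], agent_index: int,
                  num_buckets: int = NUM_BUCKETS) -> Tuple[str, List[int]]:
    """The group an agent serves and its round-robin share of that group's buckets."""
    group, local, count = plan[agent_index]
    return group, list(range(local, num_buckets, count))


plan = plan_agents({"premium": 3, "default": 1})
group, buckets = owned_buckets(plan, 2)
print(group, buckets[:3])  # premium [1, 4, 7]
```

here the premium group's buckets are shared by three agents while the single default agent walks all 100 of its own, which is where the "more workers" for some projects would come from.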

thoughts?

cheers,
-- 
gord

[openstack-dev] [gnocchi] new measures backlog scheduling