[openstack-dev] [gnocchi] new measures backlog scheduling
gord at live.ca
Mon Nov 14 21:57:39 UTC 2016
One issue I've noticed with our 'backlog' scheduling is that we register
all our new measures in a single folder/filestore object. In most
production cases this folder or object can grow quite large (tens or
hundreds of thousands of items). So that we don't load it all into
memory, the drivers only grab the first x items and process them.
Unfortunately, we don't control the ordering of the returned items, so
it depends on the order in which the backend returns them. Ceph returns
them in what I guess is some alphanumeric order; the file driver, I
believe, returns them based on how the filesystem indexes files; I have
no idea how Swift ordering behaves. The result is that some new measures
may be starved: if too few agents are deployed to keep up, they keep
getting pushed back by more recent measures that happen to be listed
ahead of them.
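To make the starvation effect concrete, here is a small simulation. It is
a sketch, not Gnocchi code: `simulate`, `batch_size` and the random hex
measure names are hypothetical, and it assumes the backend lists items in
sorted name order (roughly what I guessed above for Ceph) and that each
pass only takes the first `batch_size` items.

```python
import random


def simulate(rounds, batch_size, arrivals_per_round, seed=0):
    """Simulate a backlog where each processing pass only takes the first
    `batch_size` names in the order the backend lists them (assumed here
    to be sorted order). Returns the unprocessed backlog (name -> arrival
    round) and how many rounds each processed measure waited."""
    rng = random.Random(seed)
    backlog = {}
    waited = []
    for r in range(rounds):
        for _ in range(arrivals_per_round):
            # Hypothetical measure names: random 128-bit hex, like UUID keys.
            backlog["%032x" % rng.getrandbits(128)] = r
        # Listing order is decided by the backend, not by arrival time.
        for name in sorted(backlog)[:batch_size]:
            waited.append(r - backlog.pop(name))
    return backlog, waited


if __name__ == "__main__":
    backlog, waited = simulate(rounds=50, batch_size=10, arrivals_per_round=12)
    print("still waiting:", len(backlog))
    print("oldest still waiting arrived in round", min(backlog.values()))
    print("longest wait among processed measures:", max(waited))
```

With 12 measures arriving per pass but only 10 taken, the backlog grows
by 2 each pass, and which measures get left behind is decided purely by
how their names sort, not by how long they have been waiting.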
With that said, this isn't a huge issue, because measures can be
processed on demand using the refresh parameter, but it's not ideal.
I was thinking that, to better handle processing while minimising the
effects of a driver's natural ordering, we could hash our new measures
into buckets based on metric_id. Gnocchi would hash all incoming metrics
into some number of buckets (100?), and the metricd agents would divide
up these buckets and loop through them. This would give us smaller
buckets to deal with, and therefore less chance that metrics get pushed
back and starved. That said, it would require 100? folders/filestore
objects rather than 1. It would also mean making significantly more,
smaller fetches instead of a single (possibly giant) fetch.
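A minimal sketch of the bucketing idea, assuming metric ids are UUIDs and
that 100 buckets are split round-robin across agents. `bucket_for`,
`buckets_for_agent` and `NUM_BUCKETS` are hypothetical names, not existing
Gnocchi APIs. A stable hash (SHA-256 here) is used instead of Python's
built-in `hash()`, because `hash()` on strings is salted per process and
the API servers and metricd agents all need to agree on a metric's bucket.

```python
import hashlib
import uuid

# Hypothetical: the "100?" from the proposal; a tunable, not a settled value.
NUM_BUCKETS = 100


def bucket_for(metric_id: uuid.UUID, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a metric to a bucket with a stable hash so that every API server
    and metricd agent computes the same bucket for the same metric."""
    digest = hashlib.sha256(metric_id.bytes).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets


def buckets_for_agent(agent_index: int, num_agents: int,
                      num_buckets: int = NUM_BUCKETS) -> list:
    """Divide the buckets among agents round-robin: agent i owns buckets
    i, i + num_agents, i + 2 * num_agents, ..."""
    return list(range(agent_index, num_buckets, num_agents))


if __name__ == "__main__":
    metric_id = uuid.uuid4()
    print("metric", metric_id, "-> bucket", bucket_for(metric_id))
    for agent in range(3):
        print("agent", agent, "owns", len(buckets_for_agent(agent, 3)), "buckets")
```

Each agent would then loop over only its own buckets, taking the first x
items from each, so a busy bucket can only delay the metrics that hash
into it. The cost is the one mentioned above: many small listings instead
of one large one.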
To extend this, we could also hash into project_id groups, allowing some
projects to have more workers and therefore more performant queries?
This might be too product-tailored. :)
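Purely as an illustration of how the project_id variant could look:
projects are hashed into a small number of groups, and agents are split
across groups in proportion to per-group weights, so a heavily weighted
group gets more workers. `project_group`, `allocate_agents` and the
weights are assumptions for this sketch, not anything Gnocchi provides.

```python
import hashlib


def project_group(project_id: str, num_groups: int) -> int:
    """Stable mapping of a project to one of `num_groups` groups."""
    digest = hashlib.sha256(project_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_groups


def allocate_agents(weights: dict, total_agents: int) -> dict:
    """Split `total_agents` across groups in proportion to `weights`
    (largest-remainder rounding), using integer arithmetic only."""
    total_weight = sum(weights.values())
    alloc, remainders = {}, {}
    for group, weight in weights.items():
        alloc[group], remainders[group] = divmod(total_agents * weight,
                                                 total_weight)
    # Hand the agents lost to rounding to the groups with the largest remainders.
    leftover = total_agents - sum(alloc.values())
    for group in sorted(remainders, key=remainders.get, reverse=True)[:leftover]:
        alloc[group] += 1
    return alloc


if __name__ == "__main__":
    # Hypothetical weights: group 0 gets 3x the share of the other groups.
    print(allocate_agents({0: 3, 1: 1, 2: 1}, total_agents=10))
    # prints {0: 6, 1: 2, 2: 2}
```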