[openstack-dev] [nova] [placement] placement api request analysis
cdent+os at anticdent.org
Wed Jan 25 17:11:36 UTC 2017
I've started looking into what kind of request load the placement
API can expect when both the scheduler and the resource tracker are
talking to it. I think this is important to do now before we have
things widely relying on this stuff so we can give some reasonable
advice on deployment options and expected traffic.
I'm working with a single node devstack, which should make the math
nice and easy.
Unfortunately, doing this really ended up being more of an audit of
where the resource tracker is doing more than it ought to be. What
follows ends up being a rambling exploration of areas that _may_ be
bugs. I've marked paragraphs that have things that maybe ought to change
with #B<some number>. It appears that the resource tracker is doing
a lot of extra work that it doesn't need to do (even before the
advent of the placement API). There's already one fix in progress
(for B2) but the others need some discussion as I'm not sure of the
ramifications. I'd like some help deciding what's going on before I
make random bug reports.
When the compute node starts it makes two requests to create the
resource provider that represents that compute, at which point it
also requests the aggregates for that resource provider, to update
its local map of aggregate associations.
It then updates inventory for the resource provider, twice; the
first one is a conflict (probably because the generation is out of
date). After that, every 60s or so, five requests are made:
These requests are returning the same data each time (so far).
The request to get aggregates happens twice on every cycle, because
it happens each time we ensure the resource provider is present in
our local map of resource providers. Aggregates are checked each time
because if we don't there's no other clean way for an operator to
associate aggregates and have them quickly picked up.
The request to inventories is checking whether inventory has
changed. This is happening as a result of the regular call to
'update_available_resource' passing through the _update method.
That same method is also calling _init_compute_node, which will
_also_ think about updating the inventory and thus do the aggregates
check from _ensure_resource_provider. That seems redundant. Perhaps
we should only call update_resource_stats from _update and not from
_init_compute_node as they are both called from the same method in
the resource tracker.
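The redundancy described there can be sketched with a toy model (this is a hypothetical class with an invented counter, not the actual nova resource tracker code; the method names just follow the ones mentioned above):

```python
# Toy model of the call pattern described above: both _init_compute_node
# and _update run in the same periodic cycle and each ends up pushing
# resource stats (and thus the aggregates check) to placement.

class FakeTracker:
    def __init__(self):
        self.stats_updates = 0  # counts calls that would hit the placement API

    def update_resource_stats(self):
        # each call implies the aggregates check in _ensure_resource_provider
        self.stats_updates += 1

    def _init_compute_node(self):
        self.update_resource_stats()  # first update in the cycle

    def _update(self):
        self.update_resource_stats()  # second, redundant update

    def update_available_resource(self):
        # both helpers run on every periodic cycle
        self._init_compute_node()
        self._update()

tracker = FakeTracker()
tracker.update_available_resource()
# two placement-facing updates per cycle even though nothing changed
```

Dropping the call from one of the two paths, as suggested above, would halve that per-cycle traffic.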
That same method also reguarly calls '_update_usage_from_instances'
which calls 'remove_deleted_instances' with a potentially empty list
of instances. That method gets the allocations for this compute
So before we've added any VMs we're at 5000 requests per minute in a
1000 node cluster.
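The arithmetic behind that figure, assuming the five requests per node per 60 seconds observed above:

```python
# Back-of-the-envelope request rate: 5 requests per compute node per
# minute, scaled to a hypothetical 1000 node cluster.
requests_per_node_per_min = 5
nodes = 1000
total_per_min = requests_per_node_per_min * nodes  # 5000 requests/minute
print(total_per_min)
```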
Adding in the fix at https://review.openstack.org/#/c/424305/
reduces a lot of that churn by avoiding an update from _update when
not necessary, reducing to three requests every 60s when there are
no servers. The remaining requests are from the call to
_init_compute_node at #B1 above.
Creating a Server
When we create a server there are seven total requests, with these
involved with the actual instance:
(allocations are done by comparing with what's there, if anything)
The others are what _update does.
After that the three requests grow to four per 60s:
The new GET to /placement/allocations is happening when the
resource tracker calls _update_usage_from_instance, which is always
being called because is_new_instance is always true in that method,
even when the instance is not "new". This is happening because the
tracked_instances dict is _always_ getting cleared before
_update_usage_from_instance is called. Which is weird, because
it appears that it is that method's job to update tracked_instances.
If I remove the clear() the GET on /placement/allocations goes away,
but I'm not sure what else this will break. The addition of that line
was a long time ago, in this change (I think):
With the clear() gone, the calls in _init_compute_node are the only
remaining repeats. If we take those out, there is only the call
to check that allocations are equivalent between placement and the
compute node (every 60s):
Doing this will break aggregate association updates, though, so that's
no good.
After all these changes when I add a second server, after a few
requests (as above) things settle back to the one GET for the
resource provider allocations.
So we're currently at one request per compute node per 60s, with the
changes I've made, or 5 without.
Deleting one server deletes the expected allocations, does the
aggregates and inventories GETs once for each server that had
existed, and then one GET every 60s for the allocations.
If I delete all the servers there is one series of aggregates and
inventories GETs and then we settle back to the one GET every 60s.
The good news is that the filtering added by bauzas is working as
expected and desired when talking to the placement API. The medium
news is that the resource tracker produces the desired data in the
database with the current code in master. The somewhat bad news is
that the resource tracker is making far more GET requests than it
really needs to. 5+n per 60s where n == the number of instances
currently in use by the compute node.
I think a next step would be for someone else who is familiar with
the resource tracker to have a look over what tracked_instances
really means, and the when and why of
self.scheduler_client.update_resource_stats being called to see
which can be skipped.
And we should merge https://review.openstack.org/#/c/424305/
If you made it this far, thanks!
[1] This should probably be looked into, but I skipped doing so in
this exploration to avoid getting distracted.
[2] The compute node is authoritative about its allocations. If the
placement API has data that doesn't match the local truth, the
allocations are removed.
[3] So going back to our theoretical 1000 node deployment: if we
assume there are 10 instances on every compute node (for the sake of
easy math) we've got 15,000 read requests per 60s without ever
accounting for any changes in the number of instances. Or, with a
nice smooth spread of periodic jobs, about 250 requests per second.
That's not much, but I'm pretty sure we've yet to do any real
performance checking of the placement API. It also leaves out any
accounting for the operations from the scheduler itself, or the write
operations that the resource tracker does when things change.
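The numbers in that estimate work out as follows, assuming the 5 + n requests per node per 60s pattern measured above:

```python
# Footnote arithmetic: 1000 compute nodes, 10 instances each,
# 5 + n GET requests per node per 60 seconds.
nodes = 1000
instances_per_node = 10
per_node_per_min = 5 + instances_per_node   # 15 requests per node per minute
total_per_min = per_node_per_min * nodes    # 15,000 requests per minute
smoothed_rps = total_per_min / 60           # ~250 requests per second
print(total_per_min, smoothed_rps)  # 15000 250.0
```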
[4] Which is kinda where I hoped to get today, but didn't. Maybe
another day soon.
Chris Dent ¯\_(ツ)_/¯ https://anticdent.org/
freenode: cdent tw: @anticdent