[openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

Chris Dent cdent+os at anticdent.org
Mon Nov 5 11:52:58 UTC 2018


On Sun, 4 Nov 2018, Jay Pipes wrote:

> Now that we have generation markers protecting both providers and consumers, 
> we can rely on those generations to signal to the scheduler report client 
> that it needs to pull fresh information about a provider or consumer. So, 
> there's really no need to automatically and blindly refresh any more.

I agree with this ^.
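
For concreteness, here's a minimal sketch of the consumer half of
that, written against the placement HTTP API (the body shape follows
placement microversion 1.28; the endpoint URL, token, and the
caller's reconciliation step are simplified assumptions):

    # Sketch: write allocations blindly with the consumer generation
    # we already hold; a 409 means our view is stale, and only then
    # do we GET fresh data.
    import requests

    PLACEMENT = 'http://placement.example.com'   # assumed endpoint
    HEADERS = {'X-Auth-Token': 'admin-token',    # assumed auth
               'OpenStack-API-Version': 'placement 1.28'}

    def put_allocations(consumer_uuid, allocations, project_id,
                        user_id, consumer_generation):
        resp = requests.put(
            '%s/allocations/%s' % (PLACEMENT, consumer_uuid),
            json={'allocations': allocations,
                  'project_id': project_id,
                  'user_id': user_id,
                  'consumer_generation': consumer_generation},
            headers=HEADERS)
        if resp.status_code == 409:
            # Another writer got there first. Refresh and let the
            # caller reconcile and retry with the fresh generation.
            return requests.get(
                '%s/allocations/%s' % (PLACEMENT, consumer_uuid),
                headers=HEADERS).json()
        resp.raise_for_status()
        return None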

I've been trying to tease out the issues in this thread and on the
associated review [1], and I've decided that much of my confusion
comes from the fact that we keep referring to the thing in the
resource tracker as a "cache" and then talk about either trusting it
more or not having it at all, and I think that's misleading. To me a
"cache" has multiple clients and some need for reconciliation and
invalidation amongst them. The thing in the resource tracker lives in
one process and changes to it are synchronized; it's merely a data
structure.

Some words follow where I try to tease things out a bit more (mostly
for my own sake, but if it helps other people, great). At the very
end there's a short list of suggested todos for us to consider.

What we have is a data structure which represents the resource
tracker's and virt driver's current view of the providers and
"associates" (aggregates, traits and the like) they are aware of. We
maintain a boundary between the RT and the virt driver, which means
there's "updating" going on that is sometimes a bit fussy to resolve
(cf. recent adjustments to allocation ratio handling).
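
To make the "data structure, not a cache" distinction concrete,
here's a deliberately tiny sketch of the shape of the thing;
LocalProviderView and its methods are hypothetical stand-ins, not
nova's actual classes:

    # One process, mutation under a lock, no cross-process
    # invalidation: a data structure, not a cache.
    import threading

    class LocalProviderView(object):
        """The RT's in-process view of providers and associates."""

        def __init__(self):
            self._lock = threading.Lock()
            self._inventories = {}   # rp_uuid -> inventory dict

        def update_from_virt(self, rp_uuid, inventory):
            # The virt driver is the source of truth for inventory;
            # the RT records it and reports whether anything changed,
            # since only a change warrants a write to placement.
            with self._lock:
                changed = self._inventories.get(rp_uuid) != inventory
                self._inventories[rp_uuid] = inventory
                return changed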

In the old way, every now and again we get a bunch of info from
placement to confirm that our view is right and try to reconcile
things.

What we're considering moving towards is only doing that "get a
bunch of info from placement" when we fail to write to placement
because of a generation conflict.

Thus we should only read from placement:

* at compute node startup
* when a write fails

And we should only write to placement:

* at compute node startup
* when the virt driver tells us something has changed

Is that right? If it is not right, can we do that? If not, why not?
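
As a sanity check, here's that policy as control flow; the method
names and the GenerationConflict exception are hypothetical
stand-ins for whatever the report client actually exposes:

    # Sketch of the proposed read/write policy. All names here are
    # hypothetical, not nova's real API.

    class GenerationConflict(Exception):
        """Stand-in for a 409 generation conflict from placement."""

    def on_startup(report_client):
        report_client.read_from_placement()      # read: startup
        report_client.write_to_placement()       # write: startup

    def on_periodic(report_client, virt_driver):
        if not virt_driver.something_changed():
            return                               # nothing to report
        try:
            report_client.write_to_placement()   # write: a change
        except GenerationConflict:
            report_client.read_from_placement()  # read: write failed
            report_client.write_to_placement()   # retry, fresh view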

Because generations change on every modification, they guard against
us making changes in ignorance and allow us to write blindly, GETting
only when we fail. We've got this everywhere now, let's use it. So,
for example, even if something else besides the compute is adding
traits, it's cool. We'll fail when we (the compute) try to clobber.
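
That trait case, sketched against the microversion 1.6 traits
endpoint (URL, token, and the caller's merge-and-retry step are
assumptions):

    # Sketch: replace a provider's traits using the generation we
    # last saw. If someone else changed the provider meanwhile,
    # placement answers 409 rather than letting us clobber them.
    import requests

    PLACEMENT = 'http://placement.example.com'   # assumed endpoint
    HEADERS = {'X-Auth-Token': 'admin-token',    # assumed auth
               'OpenStack-API-Version': 'placement 1.6'}

    def replace_traits(rp_uuid, traits, last_seen_generation):
        resp = requests.put(
            '%s/resource_providers/%s/traits' % (PLACEMENT, rp_uuid),
            json={'resource_provider_generation': last_seen_generation,
                  'traits': traits},
            headers=HEADERS)
        if resp.status_code == 409:
            # We tried to clobber; refetch and hand back the fresh
            # generation so the caller can merge and retry.
            fresh = requests.get(
                '%s/resource_providers/%s' % (PLACEMENT, rp_uuid),
                headers=HEADERS).json()
            return fresh['generation']
        resp.raise_for_status()
        return resp.json()['resource_provider_generation']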

Elsewhere in the thread several other topics were raised. A lot of
them boil down to "what are we actually trying to do in the
periodics?". As is often the case (and appropriately so), what we're
trying to do has evolved and accreted in an organic fashion, and it
is probably time for us to re-evaluate and make sure we're doing the
right stuff. The first step is writing that down. That aspect has
always been pretty obscure or tribal knowledge to me, and I presume
it is for others too. So doing a legit audit of that code and its
goals is something we should do.

Mohammed's comments about allocations getting out of sync are
important. I agree with him that it would be excellent if we could
go back to self-healing those, especially because of the "wait for
the computes to automagically populate everything" part he mentions.
However, that aspect, while related to this, is not quite the same
thing. The management of allocations and the management of
inventories (and "associates") is happening from different angles.

And finally, even if we turn off these refreshes to lighten the
load, placement still needs to be capable of dealing with frequent
requests, so we have something to fix there. We need to do the
analysis to find out where the cost is and implement some solutions;
a first measurement step is sketched after the list below. At the
moment we don't know where the cost is. It could be:

* In the database server
* In the python code that marshals the data around those calls to
   the database
* In the python code that handles the WSGI interactions
* In the web server that is talking to the python code
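
For the measurement step mentioned above, a minimal sketch of a WSGI
timing middleware that separates time spent inside the python
application (database round trips included) from whatever the web
server in front adds; logger configuration is assumed:

    # Log wall time spent inside the WSGI app, to be compared with
    # the web server's access-log timings and the database's
    # slow-query log; the gaps tell us which layer eats the time.
    import logging
    import time

    LOG = logging.getLogger(__name__)

    class TimingMiddleware(object):
        def __init__(self, app):
            self.app = app

        def __call__(self, environ, start_response):
            start = time.time()
            try:
                return self.app(environ, start_response)
            finally:
                # NB: with a streaming body this fires before the
                # body is consumed; good enough for a first pass.
                LOG.info('%s %s took %.4fs',
                         environ.get('REQUEST_METHOD'),
                         environ.get('PATH_INFO'),
                         time.time() - start)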

belmoreira's document [2] suggests some avenues of investigation
(most CPU time is in user space and not waiting) but we'd need a bit
more information to plan any concrete next steps:

* what's the web server and which wsgi configuration?
* where's the database, if it's different what's the load there?

I suspect there's a lot we can do to make our code more correct and
efficient. And beyond that there is a great deal of standard
run-of-the-mill server-side caching and ETag handling that we could
implement if necessary. That is: treat placement like a web app that
needs to be optimized in the usual ways.
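
For instance, provider generations would make a cheap ETag
validator. A sketch, assuming a handler that receives the request
headers and a provider dict (both hypothetical):

    # Ordinary conditional-GET handling: the generation changes
    # whenever the provider changes, so a matching If-None-Match
    # lets us skip serialization entirely and answer 304.
    import json

    def get_provider(req_headers, provider):
        etag = '"%s-%d"' % (provider['uuid'], provider['generation'])
        if req_headers.get('If-None-Match') == etag:
            return 304, {'ETag': etag}, b''
        body = json.dumps(provider).encode('utf-8')
        return 200, {'ETag': etag,
                     'Content-Type': 'application/json'}, body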

As Eric suggested at the start of the thread, this kind of
investigation is expected and normal. We've not done something
wrong. "Make it work, make it correct, make it fast" is the process.
We're oscillating somewhere between steps 2 and 3.

So in terms of actions:

* I'm pretty well situated to do some deeper profiling and
   benchmarking of placement to find the elbows in that.

* It seems like Eric and Jay are probably best situated to define
   and refine what should really be going on with the resource
   tracker and other actions on the compute-node.

* We need to have further discussion and investigation on
   allocations getting out of sync. Volunteers?

What else?

[1] https://review.openstack.org/#/c/614886/
[2] https://docs.google.com/document/d/1d5k1hA3DbGmMyJbXdVcekR12gyrFTaj_tJdFwdQy-8E/edit

-- 
Chris Dent                       ٩◔̯◔۶           https://anticdent.org/
freenode: cdent                                         tw: @anticdent
