<div dir="ltr"><blockquote style="margin:0 0 0 40px;border:none;padding:0px"></blockquote><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Thus we should only read from placement:<br>* at compute node startup<br>* when a write fails<br>And we should only write to placement:<br>* at compute node startup<br>* when the virt driver tells us something has changed  </blockquote><div><br></div><div>I agree with this. </div><div><br></div><div>We could also provide an interface that lets operators or other projects force nova to pull fresh information from placement into its cache, in order to avoid predictable conflicts.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Is that right? If it is not right, can we do that? If not, why not?  </blockquote><div><br></div><div>I have the same question.</div><div>Could the periodic refresh strategy now become an optional optimization for smaller clouds?<br></div><div><br></div><div><div class="gmail_quote"><div dir="ltr">On Mon, 5 Nov 2018 at 20:53, Chris Dent <<a href="mailto:cdent%2Bos@anticdent.org">cdent+os@anticdent.org</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On Sun, 4 Nov 2018, Jay Pipes wrote:<br>

<br>

> Now that we have generation markers protecting both providers and consumers, <br>

> we can rely on those generations to signal to the scheduler report client <br>

> that it needs to pull fresh information about a provider or consumer. So, <br>

> there's really no need to automatically and blindly refresh any more.<br>

<br>

I agree with this ^.<br>

<br>

I've been trying to tease out the issues in this thread and on the<br>

associated review [1] and I've decided that much of my confusion<br>

comes from the fact that we refer to a thing in the resource tracker<br>

as a "cache", and talk about either trusting it more or not having it at<br>

all, and I think that's misleading. To me a "cache" has multiple<br>

clients and there's some need for reconciliation and invalidation<br>

amongst them. The thing that's in the resource tracker is in one<br>

process, changes to it are synchronized; it's merely a data structure.<br>

<br>

Some words follow where I try to tease things out a bit more (mostly<br>

for my own sake, but if it helps other people, great). At the very<br>

end there's a short list of suggested todos for us to consider.<br>

<br>

What we have is a data structure which represents the resource<br>

tracker and virtdriver's current view of what providers and<br>

associates it is aware of. We maintain a boundary between the RT and<br>

the virtdriver that means there's "updating" going on that sometimes<br>

is a bit fussy to resolve (cf. recent adjustments to allocation<br>

ratio handling).<br>

<br>

In the old way, every now and again we get a bunch of info from<br>

placement to confirm that our view is right and try to reconcile<br>

things.<br>

<br>

What we're considering moving towards is only doing that "get a<br>

bunch of info from placement" when we fail to write to placement<br>

because of a generation conflict.<br>

<br>

Thus we should only read from placement:<br>

<br>

* at compute node startup<br>

* when a write fails<br>

<br>

And we should only write to placement:<br>

<br>

* at compute node startup<br>

* when the virt driver tells us something has changed<br>

<br>

Is that right? If it is not right, can we do that? If not, why not?<br>

<br>

Because generations change often, they guard against us making<br>

changes in ignorance and allow us to write blindly and only GET when<br>

we fail. We've got this everywhere now, let's use it. So, for<br>

example, even if something else besides the compute is adding<br>

traits, it's cool. We'll fail when we (the compute) try to clobber.<br>
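<br>

As a rough illustration of the "write blindly, only GET when we fail"<br>
pattern described above, here is a minimal Python sketch. The names<br>
FakePlacement, ProviderView and GenerationConflict are hypothetical and<br>
invented for this example; they are not nova or placement APIs, and the<br>
in-memory object only stands in for the real service.<br>

```python
class GenerationConflict(Exception):
    """Raised when a write carries a stale provider generation."""


class FakePlacement:
    """In-memory stand-in for the placement service (hypothetical)."""

    def __init__(self):
        self.generation = 0
        self.traits = set()
        self.reads = 0  # counts GETs so the read pattern is visible

    def get_provider(self):
        self.reads += 1
        return self.generation, set(self.traits)

    def put_traits(self, generation, traits):
        # Reject the write if the caller's view is out of date.
        if generation != self.generation:
            raise GenerationConflict()
        self.traits = set(traits)
        self.generation += 1
        return self.generation


class ProviderView:
    """Local view that reads only at startup and after a failed write."""

    def __init__(self, placement):
        self.placement = placement
        self.refresh()  # read at startup

    def refresh(self):
        self.generation, self.traits = self.placement.get_provider()

    def add_traits(self, new_traits, retries=3):
        for _ in range(retries):
            wanted = self.traits | set(new_traits)
            try:
                # Write blindly, trusting the generation to protect us.
                self.generation = self.placement.put_traits(
                    self.generation, wanted)
                self.traits = wanted
                return
            except GenerationConflict:
                self.refresh()  # read only because the write failed
        raise GenerationConflict()
```

In this sketch a trait added by some other writer is not clobbered: the<br>
compute's write fails once, the view is refreshed, and the retry merges<br>
its change into the current state.<br>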

<br>

Elsewhere in the thread several other topics were raised. A lot of<br>

that boils down to "what are we actually trying to do in the<br>

periodics?". As is often the case (and appropriately so) what we're<br>

trying to do has evolved and accreted in an organic fashion and it<br>

is probably time for us to re-evaluate and make sure we're doing the<br>

right stuff. The first step is writing that down. That aspect has<br>

always been pretty obscure or tribal to me, and I presume for others too.<br>

So doing a legit audit of that code and the goals is something we<br>

should do.<br>

<br>

Mohammed's comments about allocations getting out of sync are<br>

important. I agree with him that it would be excellent if we could<br>

go back to self-healing those, especially because of the "wait for<br>

the computes to automagically populate everything" part he mentions.<br>

However, that aspect, while related to this, is not quite the same<br>

thing. The management of allocations and the management of<br>

inventories (and "associates") is happening from different angles.<br>

<br>

And finally, even if we turn off these refreshes to lighten the<br>

load, placement still needs to be capable of dealing with frequent<br>

requests, so we have something to fix there. We need to do the<br>

analysis to find out where the cost is and implement some solutions.<br>

At the moment we don't know where it is. It could be:<br>

<br>

* In the database server<br>

* In the python code that marshals the data around those calls to<br>

   the database<br>

* In the python code that handles the WSGI interactions<br>

* In the web server that is talking to the python code<br>
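<br>

One way to separate the Python-side cost from the web server cost is to<br>
call the WSGI application in-process under a profiler, so that no web<br>
server or network is involved. The sketch below uses only the standard<br>
library (cProfile, pstats, wsgiref). The app function is a hypothetical<br>
placeholder, not placement's real application object, and<br>
profile_requests is a name invented for this example.<br>

```python
import cProfile
import io
import pstats
from wsgiref.util import setup_testing_defaults


def app(environ, start_response):
    """Hypothetical WSGI app standing in for the service under test."""
    body = b",".join(str(i).encode() for i in range(1000))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


def profile_requests(wsgi_app, count=100):
    """Call the app in-process and return statuses plus a profile report."""
    statuses = []

    def start_response(status, headers, exc_info=None):
        statuses.append(status)

    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(count):
        environ = {}
        setup_testing_defaults(environ)  # fills in a minimal GET request
        b"".join(wsgi_app(environ, start_response))  # consume the body
    profiler.disable()

    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(10)  # top 10 entries
    return statuses, out.getvalue()
```

Comparing a report like this with timings taken through the real web<br>
server, and with the database's own load figures, would help narrow down<br>
which of the layers listed above dominates.<br>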

<br>

belmoreira's document [2] suggests some avenues of investigation<br>

(most CPU time is in user space and not waiting) but we'd need a bit<br>

more information to plan any concrete next steps:<br>

<br>

* what's the web server and which wsgi configuration?<br>

* where's the database? If it's on a separate server, what's the load there?<br>

<br>

I suspect there's a lot we can do to make our code more correct and<br>

efficient. And beyond that there is a great deal of standard<br>

run-of-the-mill server-side caching and etag handling that we could<br>

implement if necessary. That is: treat placement like a web app that<br>

needs to be optimized in the usual ways.<br>
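<br>

For readers unfamiliar with etag handling, here is a minimal Python<br>
sketch of a conditional GET. The function names make_etag and<br>
conditional_get are hypothetical, and the sketch simplifies real HTTP<br>
behaviour: it compares a single strong validator and ignores weak<br>
validators and comma-separated If-None-Match lists.<br>

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Derive a quoted strong validator from the response body."""
    return '"%s"' % hashlib.sha256(body).hexdigest()


def conditional_get(body: bytes, if_none_match=None):
    """Return (status, headers, payload) for a GET with optional validator."""
    etag = make_etag(body)
    headers = [("ETag", etag)]
    if if_none_match == etag:
        # The client already holds this representation; send no body.
        return "304 Not Modified", headers, b""
    return "200 OK", headers, body
```

Hashing the body, as done here, only saves bandwidth and client-side<br>
parsing, because the server still builds the full response. Reducing<br>
server-side work would need a validator that is cheap to obtain, which<br>
is a design question this sketch does not try to answer.<br>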

<br>

As Eric suggested at the start of the thread, this kind of<br>

investigation is expected and normal. We've not done something<br>

wrong. Make it, make it correct, make it fast is the process.<br>

We're oscillating somewhere between 2 and 3.<br>

<br>

So in terms of actions:<br>

<br>

* I'm pretty well situated to do some deeper profiling and<br>

   benchmarking of placement to find the elbows in that.<br>

<br>

* It seems like Eric and Jay are probably best situated to define<br>

   and refine what should really be going on with the resource<br>

   tracker and other actions on the compute-node.<br>

<br>

* We need to have further discussion and investigation on<br>

   allocations getting out of sync. Volunteers?<br>

<br>

What else?<br>

<br>

[1] <a href="https://review.openstack.org/#/c/614886/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/614886/</a><br>

[2] <a href="https://docs.google.com/document/d/1d5k1hA3DbGmMyJbXdVcekR12gyrFTaj_tJdFwdQy-8E/edit" rel="noreferrer" target="_blank">https://docs.google.com/document/d/1d5k1hA3DbGmMyJbXdVcekR12gyrFTaj_tJdFwdQy-8E/edit</a><br>

<br>

-- <br>

Chris Dent                       ٩◔̯◔۶           <a href="https://anticdent.org/" rel="noreferrer" target="_blank">https://anticdent.org/</a><br>

freenode: cdent                                         tw: @anticdent<br>


</blockquote></div></div></div>