<div dir="ltr"><div dir="ltr"><div>Thanks, Eric, for the patch.<br></div><div><div>This will help keep placement calls under control.</div><div><br></div><div>Belmiro</div></div><div><br></div></div></div><br><div class="gmail_quote"><div dir="ltr">On Sun, Nov 4, 2018 at 1:01 PM Jay Pipes <<a href="mailto:jaypipes@gmail.com">jaypipes@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">On 11/02/2018 03:22 PM, Eric Fried wrote:<br>

> All-<br>

> <br>

> Based on a (long) discussion yesterday [1] I have put up a patch [2]<br>

> whereby you can set [compute]resource_provider_association_refresh to<br>

> zero and the resource tracker will never* refresh the report client's<br>

> provider cache. Philosophically, we're removing the "healing" aspect of<br>

> the resource tracker's periodic and trusting that placement won't<br>

> diverge from whatever's in our cache. (If it does, it's because the op<br>

> hit the CLI, in which case they should SIGHUP - see below.)<br>

> <br>

> *except:<br>

> - When we initially create the compute node record and bootstrap its<br>

> resource provider.<br>

> - When the virt driver's update_provider_tree makes changes,<br>

> update_from_provider_tree reflects them in the cache as well as pushing<br>

> them back to placement.<br>

> - If update_from_provider_tree fails, the cache is cleared and gets<br>

> rebuilt on the next periodic.<br>

> - If you send SIGHUP to the compute process, the cache is cleared.<br>

> <br>

> This should dramatically reduce the number of calls to placement from<br>

> the compute service. Like, to nearly zero, unless something is actually<br>

> changing.<br>

> <br>

> Can I get some initial feedback as to whether this is worth polishing up<br>

> into something real? (It will probably need a bp/spec if so.)<br>

> <br>

> [1]<br>

> <a href="http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03" rel="noreferrer" target="_blank">http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03</a><br>

> [2] <a href="https://review.openstack.org/#/c/614886/" rel="noreferrer" target="_blank">https://review.openstack.org/#/c/614886/</a><br>

> <br>

> ==========<br>

> Background<br>

> ==========<br>

> In the Queens release, our friends at CERN noticed a serious spike in<br>

> the number of requests to placement from compute nodes, even in a<br>

> stable-state cloud. Given that we were in the process of adding a ton of<br>

> infrastructure to support sharing and nested providers, this was not<br>

> unexpected. Roughly, what was previously:<br>

> <br>

>   @periodic_task:<br>

>       GET /resource_providers/$compute_uuid<br>

>       GET /resource_providers/$compute_uuid/inventories<br>

> <br>

> became more like:<br>

> <br>

>   @periodic_task:<br>

>       # In Queens/Rocky, this would still just return the compute RP<br>

>       GET /resource_providers?in_tree=$compute_uuid<br>

>       # In Queens/Rocky, this would return nothing<br>

>       GET /resource_providers?member_of=...&required=MISC_SHARES...<br>

>       for each provider returned above:  # i.e. just one in Q/R<br>

>           GET /resource_providers/$compute_uuid/inventories<br>

>           GET /resource_providers/$compute_uuid/traits<br>

>           GET /resource_providers/$compute_uuid/aggregates<br>

> <br>

> In a cloud the size of CERN's, the load wasn't acceptable, so at the<br>

> time CERN worked around the problem by disabling refreshing entirely.<br>

> (The fact that this seems to have worked for them is an encouraging sign<br>

> for the proposed code change.)<br>

> <br>

> We're not actually making use of most of that information, but it sets<br>

> the stage for things that we're working on in Stein and beyond, like<br>

> multiple VGPU types, bandwidth resource providers, accelerators, NUMA,<br>

> etc., so removing/reducing the amount of information we look at isn't<br>

> really an option strategically.<br>

<br>

I support your idea of getting rid of the periodic refresh of the cache <br>

in the scheduler report client. Much of that was added in order to <br>

emulate the original way the resource tracker worked.<br>

<br>

Most of the behaviour in the original resource tracker (and some of the <br>

code still in there for dealing with (surprise!) PCI passthrough devices <br>

and NUMA topology) was due to doing allocations on the compute node (the <br>

whole claims stuff). We needed to always be syncing the state of the <br>

compute_nodes and pci_devices tables in the cell database with whatever <br>

usage information was being created/modified on the compute nodes [0].<br>

<br>

All of the "healing" code that's in the resource tracker was basically <br>

to deal with "soft delete", migrations that didn't complete or work <br>

properly, and, again, to handle allocations becoming out-of-sync because <br>

the compute nodes were responsible for allocating (as opposed to the <br>

current situation we have where the placement service -- via the <br>

scheduler's call to claim_resources() -- is responsible for allocating <br>

resources [1]).<br>

<br>

Now that we have generation markers protecting both providers and <br>

consumers, we can rely on those generations to signal to the scheduler <br>

report client that it needs to pull fresh information about a provider <br>

or consumer. So, there's really no need to automatically and blindly <br>

refresh any more.<br>

<br>
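To make that concrete, here is a rough sketch of the idea. It is not the <br>
actual report client code: FakePlacement, CachingReportClient and <br>
GenerationConflict are made-up names for illustration, and the in-memory <br>
"service" only stands in for placement, where a stale generation would <br>
surface as an HTTP 409 on the write. The point is that the client only <br>
re-reads provider state when a write is rejected.<br>

```python
from dataclasses import dataclass, field


class GenerationConflict(Exception):
    """Raised when a write carries a stale provider generation."""


@dataclass
class FakePlacement:
    """Hypothetical in-memory stand-in for the placement service."""
    generation: int = 0
    inventory: dict = field(default_factory=dict)
    get_calls: int = 0  # counts how often the client had to re-read state

    def get_provider(self):
        self.get_calls += 1
        return self.generation, dict(self.inventory)

    def put_inventory(self, generation, inventory):
        # Reject the write if the caller's view of the provider is stale.
        if generation != self.generation:
            raise GenerationConflict()
        self.inventory = dict(inventory)
        self.generation += 1
        return self.generation


class CachingReportClient:
    """Hypothetical client that refreshes its cache only on conflict."""

    def __init__(self, placement):
        self.placement = placement
        # One read to bootstrap the cache; no periodic refresh afterwards.
        self.cached_generation, self.cached_inventory = placement.get_provider()

    def set_inventory(self, inventory, max_attempts=2):
        for _ in range(max_attempts):
            try:
                self.cached_generation = self.placement.put_inventory(
                    self.cached_generation, inventory)
                self.cached_inventory = dict(inventory)
                return True
            except GenerationConflict:
                # Someone else changed the provider: pull fresh state, retry.
                self.cached_generation, self.cached_inventory = (
                    self.placement.get_provider())
        return False
```

In a steady-state cloud this costs zero extra reads; an out-of-band change <br>
costs one refresh, paid only when the next write hits the conflict.<br>

<br>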

Best,<br>

-jay<br>

<br>

[0] We always need to be syncing those tables because they, <br>

unlike the placement database's data modeling, couple both inventory AND <br>

usage in the same table structure...<br>

<br>

[1] again, except for PCI devices and NUMA topology, because of the <br>

tight coupling in place with the different resource trackers those types <br>

of resources use...<br>

<br>


</blockquote></div>