[openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

Belmiro Moreira moreira.belmiro.email.lists at gmail.com
Mon Nov 5 09:16:26 UTC 2018


Thanks Eric for the patch.
This will help keep placement calls under control.

Belmiro


On Sun, Nov 4, 2018 at 1:01 PM Jay Pipes <jaypipes at gmail.com> wrote:

> On 11/02/2018 03:22 PM, Eric Fried wrote:
> > All-
> >
> > Based on a (long) discussion yesterday [1] I have put up a patch [2]
> > whereby you can set [compute]resource_provider_association_refresh to
> > zero and the resource tracker will never* refresh the report client's
> > provider cache. Philosophically, we're removing the "healing" aspect of
> > the resource tracker's periodic and trusting that placement won't
> > diverge from whatever's in our cache. (If it does, it's because the op
> > hit the CLI, in which case they should SIGHUP - see below.) A rough
> > sketch of the gating follows the exception list below.
> >
> > *except:
> > - When we initially create the compute node record and bootstrap its
> > resource provider.
> > - When the virt driver's update_provider_tree makes changes,
> > update_from_provider_tree reflects them in the cache and pushes them
> > back to placement.
> > - If update_from_provider_tree fails, the cache is cleared and gets
> > rebuilt on the next periodic.
> > - If you send SIGHUP to the compute process, the cache is cleared.
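> >
> > A minimal sketch of that gating (illustrative names only, not the
> > actual patch):
> >
> >   import time
> >
> >   _last_refresh = {}  # rp_uuid -> time of last association refresh
> >
> >   def associations_stale(rp_uuid, refresh_interval):
> >       """Should we re-fetch this provider's aggregates, traits and
> >       sharing associations? refresh_interval comes from
> >       [compute]resource_provider_association_refresh; 0 disables
> >       refreshing entirely (modulo the exceptions above)."""
> >       if refresh_interval == 0:
> >           return False  # never consider the cache stale
> >       last = _last_refresh.get(rp_uuid, 0)
> >       return (time.time() - last) > refresh_interval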
> >
> > This should dramatically reduce the number of calls to placement from
> > the compute service. Like, to nearly zero, unless something is actually
> > changing.
> >
> > Can I get some initial feedback as to whether this is worth polishing up
> > into something real? (It will probably need a bp/spec if so.)
> >
> > [1]
> >
> http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
> > [2] https://review.openstack.org/#/c/614886/
> >
> > ==========
> > Background
> > ==========
> > In the Queens release, our friends at CERN noticed a serious spike in
> > the number of requests to placement from compute nodes, even in a
> > stable-state cloud. Given that we were in the process of adding a ton of
> > infrastructure to support sharing and nested providers, this was not
> > unexpected. Roughly, what was previously:
> >
> >   @periodic_task:
> >       GET /resource_providers/$compute_uuid
> >       GET /resource_providers/$compute_uuid/inventories
> >
> > became more like:
> >
> >   @periodic_task:
> >       # In Queens/Rocky, this would still just return the compute RP
> >       GET /resource_providers?in_tree=$compute_uuid
> >       # In Queens/Rocky, this would return nothing
> >       GET /resource_providers?member_of=...&required=MISC_SHARES...
> >       for each provider returned above:  # i.e. just one in Q/R
> >           GET /resource_providers/$provider_uuid/inventories
> >           GET /resource_providers/$provider_uuid/traits
> >           GET /resource_providers/$provider_uuid/aggregates
> >
> > In a cloud the size of CERN's, the load wasn't acceptable. But at the
> > time, CERN worked around the problem by disabling refreshing entirely.
> > (The fact that this seems to have worked for them is an encouraging sign
> > for the proposed code change.)
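> >
> > For a rough sense of scale (back-of-envelope with made-up numbers,
> > not CERN's actual figures):
> >
> >   # Hypothetical steady-state load on placement from the periodic:
> >   nodes = 1000         # compute nodes in the deployment
> >   period = 60          # seconds between periodic task runs
> >   calls_before = 2     # GET provider + GET inventories
> >   calls_after = 2 + 3  # in_tree + member_of queries, plus
> >                        # inventories/traits/aggregates for the one
> >                        # provider returned in Queens/Rocky
> >   print(calls_before * nodes / period)  # ~33 requests/sec
> >   print(calls_after * nodes / period)   # ~83 requests/sec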
> >
> > We're not actually making use of most of that information, but it sets
> > the stage for things that we're working on in Stein and beyond, like
> > multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
> > etc., so removing/reducing the amount of information we look at isn't
> > really an option strategically.
>
> I support your idea of getting rid of the periodic refresh of the cache
> in the scheduler report client. Much of that was added in order to
> emulate the original way the resource tracker worked.
>
> Most of the behaviour in the original resource tracker (and some of the
> code still in there for dealing with (surprise!) PCI passthrough devices
> and NUMA topology) was due to doing allocations on the compute node (the
> whole claims stuff). We needed to always be syncing the state of the
> compute_nodes and pci_devices table in the cell database with whatever
> usage information was being created/modified on the compute nodes [0].
>
> All of the "healing" code that's in the resource tracker was basically
> to deal with "soft delete", migrations that didn't complete or work
> properly, and, again, to handle allocations becoming out-of-sync because
> the compute nodes were responsible for allocating (as opposed to the
> current situation we have where the placement service -- via the
> scheduler's call to claim_resources() -- is responsible for allocating
> resources [1]).
>
> Now that we have generation markers protecting both providers and
> consumers, we can rely on those generations to signal to the scheduler
> report client that it needs to pull fresh information about a provider
> or consumer. So, there's really no need to automatically and blindly
> refresh any more.
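>
> A sketch of what that looks like against the placement API
> (illustrative only; auth and microversion handling omitted):
>
>   import requests
>
>   def put_inventories(base, rp_uuid, generation, inventories):
>       """Generation-guarded inventory update. A 409 means someone
>       else changed the provider, so the cache is stale: re-fetch
>       rather than refreshing on a timer."""
>       resp = requests.put(
>           '%s/resource_providers/%s/inventories' % (base, rp_uuid),
>           json={'resource_provider_generation': generation,
>                 'inventories': inventories})
>       if resp.status_code == 409:
>           fresh = requests.get('%s/resource_providers/%s/inventories'
>                                % (base, rp_uuid)).json()
>           return False, fresh['resource_provider_generation']
>       resp.raise_for_status()
>       return True, resp.json()['resource_provider_generation']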
>
> Best,
> -jay
>
> [0] We always need to be syncing those tables because those tables,
> unlike the placement database's data modeling, couple both inventory AND
> usage in the same table structure...
>
> [1] again, except for PCI devices and NUMA topology, because of the
> tight coupling in place with the different resource trackers those types
> of resources use...
>
>