[openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

Eric Fried openstack at fried.cc
Tue Nov 6 23:53:28 UTC 2018


I do intend to respond to all the excellent discussion on this thread,
but right now I just want to offer an update on the code:

I've split the effort apart into multiple changes starting at [1]. A few
of these are ready for review.

One opinion was that a specless blueprint would be appropriate. If
there's consensus on this, I'll spin one up.

[1] https://review.openstack.org/#/c/615606/

On 11/5/18 03:16, Belmiro Moreira wrote:
> Thanks Eric for the patch.
> This will help keep placement calls under control.
> 
> Belmiro
> 
> 
> On Sun, Nov 4, 2018 at 1:01 PM Jay Pipes <jaypipes at gmail.com> wrote:
> 
>     On 11/02/2018 03:22 PM, Eric Fried wrote:
>     > All-
>     >
>     > Based on a (long) discussion yesterday [1] I have put up a patch [2]
>     > whereby you can set [compute]resource_provider_association_refresh to
>     > zero and the resource tracker will never* refresh the report client's
>     > provider cache. Philosophically, we're removing the "healing" aspect of
>     > the resource tracker's periodic and trusting that placement won't
>     > diverge from whatever's in our cache. (If it does, it's because the op
>     > hit the CLI, in which case they should SIGHUP - see below.)
>     >
>     > *except:
>     > - When we initially create the compute node record and bootstrap its
>     > resource provider.
>     > - When the virt driver's update_provider_tree makes changes,
>     > update_from_provider_tree reflects them in the cache as well as
>     > pushing them back to placement.
>     > - If update_from_provider_tree fails, the cache is cleared and gets
>     > rebuilt on the next periodic.
>     > - If you send SIGHUP to the compute process, the cache is cleared.
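>     >
>     > (Purely for illustration -- the names here are made up rather than the
>     > actual report client code -- with the option set to zero, the staleness
>     > check conceptually reduces to:
>     >
>     >   import time
>     >
>     >   def associations_stale(last_refresh, refresh_interval_secs):
>     >       # refresh_interval_secs mirrors
>     >       # [compute]resource_provider_association_refresh; zero means the
>     >       # cached traits/aggregates/etc. are never considered stale.
>     >       if not refresh_interval_secs:
>     >           return False
>     >       return (time.time() - last_refresh) > refresh_interval_secs
>     >
>     > so, outside the exceptions above, the periodic stops hitting placement
>     > for association data.)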
>     >
>     > This should dramatically reduce the number of calls to placement from
>     > the compute service. Like, to nearly zero, unless something is
>     > actually changing.
>     >
>     > Can I get some initial feedback as to whether this is worth polishing
>     > up into something real? (It will probably need a bp/spec if so.)
>     >
>     > [1] http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
>     > [2] https://review.openstack.org/#/c/614886/
>     >
>     > ==========
>     > Background
>     > ==========
>     > In the Queens release, our friends at CERN noticed a serious spike in
>     > the number of requests to placement from compute nodes, even in a
>     > stable-state cloud. Given that we were in the process of adding a ton of
>     > infrastructure to support sharing and nested providers, this was not
>     > unexpected. Roughly, what was previously:
>     >
>     >   @periodic_task:
>     >       GET /resource_providers/$compute_uuid
>     >       GET /resource_providers/$compute_uuid/inventories
>     >
>     > became more like:
>     >
>     >   @periodic_task:
>     >       # In Queens/Rocky, this would still just return the compute RP
>     >       GET /resource_providers?in_tree=$compute_uuid
>     >       # In Queens/Rocky, this would return nothing
>     >       GET /resource_providers?member_of=...&required=MISC_SHARES...
>     >       for each provider returned above:  # i.e. just one in Q/R
>     >           GET /resource_providers/$provider_uuid/inventories
>     >           GET /resource_providers/$provider_uuid/traits
>     >           GET /resource_providers/$provider_uuid/aggregates
>     >
>     > In a cloud the size of CERN's, the load wasn't acceptable. But at the
>     > time, CERN worked around the problem by disabling refreshing entirely.
>     > (The fact that this seems to have worked for them is an encouraging sign
>     > for the proposed code change.)
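>     >
>     > (Back-of-envelope, with made-up round numbers rather than CERN's actual
>     > figures, the pattern above works out to something like:
>     >
>     >   computes = 10000        # hypothetical fleet size
>     >   period_secs = 60        # assumed periodic interval
>     >   gets_per_cycle = 2 + 3  # in_tree + sharing queries, plus
>     >                           # inventories/traits/aggregates per provider
>     >   reqs_per_sec = computes * gets_per_cycle / period_secs  # ~833
>     >
>     > i.e. hundreds of placement requests per second in steady state with
>     > nothing actually changing.)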
>     >
>     > We're not actually making use of most of that information, but it sets
>     > the stage for things that we're working on in Stein and beyond, like
>     > multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
>     > etc., so removing/reducing the amount of information we look at isn't
>     > really an option strategically.
> 
>     I support your idea of getting rid of the periodic refresh of the cache
>     in the scheduler report client. Much of that was added in order to
>     emulate the original way the resource tracker worked.
> 
>     Most of the behaviour in the original resource tracker (and some of the
>     code still in there for dealing with (surprise!) PCI passthrough devices
>     and NUMA topology) was due to doing allocations on the compute node (the
>     whole claims stuff). We needed to always be syncing the state of the
>     compute_nodes and pci_devices tables in the cell database with whatever
>     usage information was being created/modified on the compute nodes [0].
> 
>     All of the "healing" code that's in the resource tracker was basically
>     to deal with "soft delete", migrations that didn't complete or work
>     properly, and, again, to handle allocations becoming out-of-sync because
>     the compute nodes were responsible for allocating (as opposed to the
>     current situation we have where the placement service -- via the
>     scheduler's call to claim_resources() -- is responsible for allocating
>     resources [1]).
> 
>     Now that we have generation markers protecting both providers and
>     consumers, we can rely on those generations to signal to the scheduler
>     report client that it needs to pull fresh information about a provider
>     or consumer. So, there's really no need to automatically and blindly
>     refresh any more.
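> 
>     As a rough sketch of what relying on the generations looks like (plain
>     requests against the placement REST API; endpoint, auth and microversion
>     handling are omitted, and this is not the report client itself):
> 
>       import requests
> 
>       PLACEMENT = "https://placement.example.com"  # hypothetical endpoint
> 
>       def put_inventories(sess, rp_uuid, inventories, cached_generation):
>           # Send the full inventory set along with the generation we
>           # believe the provider is at.
>           resp = sess.put(
>               "%s/resource_providers/%s/inventories" % (PLACEMENT, rp_uuid),
>               json={"resource_provider_generation": cached_generation,
>                     "inventories": inventories})
>           if resp.status_code == 409:
>               # Generation conflict: someone else (e.g. an operator via
>               # the CLI) changed the provider behind our back. Only now do
>               # we need to refetch it, rather than refreshing blindly
>               # every periodic.
>               return False
>           resp.raise_for_status()
>           return True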
> 
>     Best,
>     -jay
> 
>     [0] We always need to be syncing those tables because those tables,
>     unlike the placement database's data modeling, couple both inventory AND
>     usage in the same table structure...
> 
>     [1] again, except for PCI devices and NUMA topology, because of the
>     tight coupling in place with the different resource trackers those types
>     of resources use...
> 