[openstack-dev] [nova][placement] Placement requests and caching in the resource tracker

Mohammed Naser mnaser at vexxhost.com
Sun Nov 4 10:11:59 UTC 2018


On Fri, Nov 2, 2018 at 9:32 PM Matt Riedemann <mriedemos at gmail.com> wrote:
>
> On 11/2/2018 2:22 PM, Eric Fried wrote:
> > Based on a (long) discussion yesterday [1] I have put up a patch [2]
> > whereby you can set [compute]resource_provider_association_refresh to
> > zero and the resource tracker will never* refresh the report client's
> > provider cache. Philosophically, we're removing the "healing" aspect of
> > the resource tracker's periodic and trusting that placement won't
> > diverge from whatever's in our cache. (If it does, it's because the op
> > hit the CLI, in which case they should SIGHUP - see below.)
> >
> > *except:
> > - When we initially create the compute node record and bootstrap its
> > resource provider.
> > - When the virt driver's update_provider_tree makes a change,
> > update_from_provider_tree reflects them in the cache as well as pushing
> > them back to placement.
> > - If update_from_provider_tree fails, the cache is cleared and gets
> > rebuilt on the next periodic.
> > - If you send SIGHUP to the compute process, the cache is cleared.
> >
> > This should dramatically reduce the number of calls to placement from
> > the compute service. Like, to nearly zero, unless something is actually
> > changing.
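
For reference, the knob being discussed is the existing
[compute]/resource_provider_association_refresh option in nova.conf
(default 300 seconds today); the 0 value is what [2] would add, so the
snippet below is a sketch of the proposed opt-out, not something that
works on current master:

    [compute]
    # How often (seconds) the resource tracker refreshes the report
    # client's cache of aggregate/trait/sharing-provider associations.
    # Under the proposal in [2], 0 would mean "never refresh": the cache
    # is only rebuilt on startup, on update_from_provider_tree failure,
    # or on SIGHUP, per the exceptions listed above.
    resource_provider_association_refresh = 0
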
> >
> > Can I get some initial feedback as to whether this is worth polishing up
> > into something real? (It will probably need a bp/spec if so.)
> >
> > [1]
> > http://eavesdrop.openstack.org/irclogs/%23openstack-nova/%23openstack-nova.2018-11-01.log.html#t2018-11-01T17:32:03
> > [2]https://review.openstack.org/#/c/614886/
> >
> > ==========
> > Background
> > ==========
> > In the Queens release, our friends at CERN noticed a serious spike in
> > the number of requests to placement from compute nodes, even in a
> > stable-state cloud. Given that we were in the process of adding a ton of
> > infrastructure to support sharing and nested providers, this was not
> > unexpected. Roughly, what was previously:
> >
> >   @periodic_task:
> >       GET /resource_providers/$compute_uuid
> >       GET /resource_providers/$compute_uuid/inventories
> >
> > became more like:
> >
> >   @periodic_task:
> >       # In Queens/Rocky, this would still just return the compute RP
> >       GET /resource_providers?in_tree=$compute_uuid
> >       # In Queens/Rocky, this would return nothing
> >       GET /resource_providers?member_of=...&required=MISC_SHARES...
> >       for each provider returned above:  # i.e. just one in Q/R
> >           GET /resource_providers/$provider_uuid/inventories
> >           GET /resource_providers/$provider_uuid/traits
> >           GET /resource_providers/$provider_uuid/aggregates
> >
> > In a cloud the size of CERN's, the load wasn't acceptable. But at the
> > time, CERN worked around the problem by disabling refreshing entirely.
> > (The fact that this seems to have worked for them is an encouraging sign
> > for the proposed code change.)
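
To put rough numbers on the "not acceptable" load: counting the calls in
the sketches above (2 per periodic before, 5 per periodic after for a
single-provider compute), the fleet-wide steady-state rate scales like
this. The node count and periodic interval below are made-up
placeholders, not CERN's actual figures:

    # Back-of-the-envelope placement request rate from the compute fleet.
    # All inputs are illustrative assumptions, not measured values.
    computes = 10000             # hypothetical fleet size
    periodic_interval = 60.0     # assumed periodic task spacing, seconds

    requests_before = 2          # RP GET + inventories GET
    requests_after = 2 + 3 * 1   # in_tree + sharing queries, then
                                 # inventories/traits/aggregates per provider

    for label, per_run in (("before", requests_before),
                           ("after", requests_after)):
        rate = computes * per_run / periodic_interval
        print("%s: ~%d placement requests/second" % (label, rate))
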
> >
> > We're not actually making use of most of that information, but it sets
> > the stage for things that we're working on in Stein and beyond, like
> > multiple VGPU types, bandwidth resource providers, accelerators, NUMA,
> > etc., so removing/reducing the amount of information we look at isn't
> > really an option strategically.
>
> A few random points from the long discussion that should probably be
> re-posed here for wider thought:
>
> * There was probably a lot of discussion about why we needed to do this
> caching and stuff in the compute in the first place. What has changed
> that we no longer need to aggressively refresh the cache on every
> periodic? I thought initially it came up because people really wanted
> the compute to be fully self-healing to any external changes, including
> hot plugging resources like disk on the host to automatically reflect
> those changes in inventory. Similarly, external user/service
> interactions with the placement API which would then be automatically
> picked up by the next periodic run - is that no longer a desire, and/or
> how was the decision made previously that simply requiring a SIGHUP in
> that case wasn't sufficient/desirable?
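
On the SIGHUP mechanics: clearing a cache on SIGHUP is a pretty small
amount of machinery. A generic illustration of the pattern (this is not
nova's actual handler, just a sketch of "signal drops the cache, the
next periodic repopulates it"):

    import signal

    class ProviderCache(object):
        """Stand-in for the report client's provider tree cache."""
        def __init__(self):
            self.providers = {}

        def clear(self):
            self.providers = {}

    cache = ProviderCache()

    def _handle_sighup(signum, frame):
        # Drop whatever we think placement looks like; the next periodic
        # task rebuilds the view from fresh GETs.
        cache.clear()

    signal.signal(signal.SIGHUP, _handle_sighup)
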
>
> * I believe I made the point yesterday that we should probably not
> refresh by default, and let operators opt-in to that behavior if they
> really need it, i.e. they are frequently making changes to the
> environment, potentially by some external service (I could think of
> vCenter doing this to reflect changes from vCenter back into
> nova/placement), but I don't think that should be the assumed behavior
> by most and our defaults should reflect the "normal" use case.
>
> * I think I've noted a few times now that we don't actually use the
> provider aggregates information (yet) in the compute service. Nova host
> aggregate membership has been mirrored to placement since Rocky [1] but
> that happens in the API, not in the compute. The only thing I can think of
> that relied on resource provider aggregate information in the compute is
> the shared storage providers concept, but that's not supported (yet)
> [2]. So do we need to keep retrieving aggregate information when nothing
> in compute uses it yet?
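
For anyone following along, the Rocky mirroring in [1] is the API service
pushing nova host aggregate membership into placement when aggregates
change, roughly the call below (body shape per the placement aggregates
API; the generation field needs a recent microversion, and the UUIDs are
placeholders):

    PUT /resource_providers/$compute_rp_uuid/aggregates
    {
        "aggregates": ["$nova_aggregate_uuid"],
        "resource_provider_generation": 5
    }
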
>
> * Similarly, why do we need to get traits on each periodic? The only
> in-tree virt driver I'm aware of that *reports* traits is the libvirt
> driver for CPU features [3]. Otherwise I think the idea behind getting
> the latest traits is so the virt driver doesn't overwrite any traits set
> externally on the compute node root resource provider. I think that
> still stands and is probably OK, even though we have generations now
> which should keep us from overwriting if we don't have the latest
> traits, but I wanted to bring it up since it's related to the "why do we
> need provider aggregates in the compute?" question.
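
On the generation point: the compute replaces the provider's trait set
with a PUT that carries the generation it last saw, and placement refuses
the write if that generation is stale, which is what keeps a stale
compute-side view from clobbering externally-set traits. Roughly (trait
names here are just examples):

    PUT /resource_providers/$compute_rp_uuid/traits
    {
        "traits": ["HW_CPU_X86_AVX2", "CUSTOM_SOMETHING_SET_BY_OPERATOR"],
        "resource_provider_generation": 7
    }

    -> 200 if generation 7 is still current (placement bumps it)
    -> 409 Conflict if the provider changed underneath us, in which case
       we need to re-GET before retrying
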
>
> * Regardless of what we do, I think we should probably *at least* make
> that refresh associations config allow 0 to disable it so CERN (and
> others) can avoid the need to continually forward-porting code to
> disable it.
>
> [1]
> https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/placement-mirror-host-aggregates.html
> [2] https://bugs.launchpad.net/nova/+bug/1784020
> [3]
> https://specs.openstack.org/openstack/nova-specs/specs/rocky/implemented/report-cpu-features-as-traits.html
>
> --
>
> Thanks,
>
> Matt
>



-- 
Mohammed Naser — vexxhost
-----------------------------------------------------
D. 514-316-8872
D. 800-910-1726 ext. 200
E. mnaser at vexxhost.com
W. http://vexxhost.com


