[openstack-dev] [nova] [placement] resource providers update 44
mriedemos at gmail.com
Wed Dec 13 03:24:40 UTC 2017
On 12/8/2017 7:57 AM, Chris Dent wrote:
> I'm feeling pressure from the (excellent and welcomed) Weekly Owl
> and Keystone update:
> to enhance the rp and placement update brand, to maximize market
> penetration, global influence, and reader participation, but instead
> I'll stick to habit.
> # Most Important
> The three main themes (nested providers, alternate hosts, migration)
> continue to be the main priorities.
> A thing to be aware of, now that some of nested has merged on the
> placement side, is that the nova side is underway and the work
> surrounding the ProviderTree concept is fairly penetrating. It will
> supersede the "get_inventory" method in the virt drivers, and thus far
> not all the virt drivers have get_inventory methods. See
> way down inside the nested-resource-providers topic for a bit of
> # What's Changed
> Microversion 1.14 has merged (causing a mighty microversion conflict
> pileup behind it) meaning that some aspects of nested providers are in
> the placement API. Hierarchies of providers can be created, and trees
> of results can be returned:
I didn't help with the reviews when this merged, but did ask some
questions about the "in_tree" filter functionality added in a review today:
It would be helpful for me, and I'll assume others, if we had something
more in the API reference docs with example provider trees and what the
results would be when listing providers with specific nodes in that
tree, i.e. define what a "provider tree" is - where does it stop? I
don't know if that's good to bake into the API reference parameter
description itself or create its own doc page to link to for an
explanation and example usage.
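
As an illustration of the kind of example that could go in the docs, here is a minimal sketch of the "in_tree" semantics. It assumes a "provider tree" means every provider that shares the same root provider, so the tree stops at the root (typically the compute node). The Provider class and the root_of() and in_tree() helpers are hypothetical names for this sketch, not placement code.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Provider:
    """Hypothetical stand-in for a resource provider record."""
    uuid: str
    parent_uuid: Optional[str] = None


def root_of(providers: Dict[str, Provider], uuid: str) -> str:
    """Walk parent links until reaching a provider with no parent."""
    current = providers[uuid]
    while current.parent_uuid is not None:
        current = providers[current.parent_uuid]
    return current.uuid


def in_tree(providers: Dict[str, Provider], uuid: str) -> List[str]:
    """Return the uuids of every provider sharing a root with `uuid`.

    Assumption: this mirrors listing providers with ?in_tree=<uuid>.
    """
    root = root_of(providers, uuid)
    return sorted(u for u in providers if root_of(providers, u) == root)


# Two compute nodes, each the root of its own tree.
providers = {p.uuid: p for p in [
    Provider("cn1"),
    Provider("numa0", parent_uuid="cn1"),
    Provider("pf0", parent_uuid="numa0"),
    Provider("cn2"),
    Provider("numa1", parent_uuid="cn2"),
]}

print(in_tree(providers, "pf0"))   # ['cn1', 'numa0', 'pf0']
print(in_tree(providers, "cn2"))   # ['cn2', 'numa1']
```

Under that assumption, asking for any node in a tree (root, middle or leaf) returns the whole tree rather than only the subtree below that node, which is the sort of thing worth spelling out in the reference.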
> A fix was merged that defaults the accept header, if not set, to
> application/json. This means that whereas in the past an error
> response could inadvertently be HTML, it will now be JSON, structured
> (partially, critically the 'code' field is not there as we've stumbled
> trying to create standards) according to the errors guideline:
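
For readers unfamiliar with the idea, defaulting the accept header can be sketched as a small piece of WSGI middleware. This is only an illustration of the technique under the assumption of a plain WSGI stack; default_accept_middleware and echo_accept are hypothetical names and this is not the change that merged.

```python
def default_accept_middleware(app, default="application/json"):
    """Wrap a WSGI app so requests with no Accept header get a default."""
    def wrapped(environ, start_response):
        # WSGI exposes the Accept request header as HTTP_ACCEPT.
        # A missing or empty value is treated as "not set".
        if not environ.get("HTTP_ACCEPT"):
            environ["HTTP_ACCEPT"] = default
        return app(environ, start_response)
    return wrapped


def echo_accept(environ, start_response):
    """Toy app that returns whatever Accept value it was handed."""
    body = environ["HTTP_ACCEPT"].encode("utf-8")
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


app = default_accept_middleware(echo_accept)
```

With the wrapper in place, anything downstream that formats an error can rely on an Accept value being present and choose JSON unless the client explicitly asked for something else.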
> Eric made the compute node be more alert when it encounters an error
> condition when getting or creating resource providers:
> This led to the discovery that in grenade placement was not being
> properly stopped and restarted over the upgrade transition:
> I mention all this because it's quite likely that latent bugs with
> talking to placement (from nova) in grenade will be exposed. Be on
> the lookout. If you find something weird, report a bug, and if
> possible, fix it.
> # Help Wanted
> (unchanged from last week, no new data, yet)
> A takeaway from summit is that we need, where possible, benchmarking
> info from people who are making the transition from old methods of
> scheduling to the newer allocation_candidate driven modes. While
> detailed numbers will be most useful, even anecdotal summaries of
> "woot it's way better" or "hmmm, no it seems worse" are useful.
> # Docs
> Quite a few docs improvements have merged. Others need more review:
> * https://review.openstack.org/#/c/512215/
> Add create inventories doc for placement
> * https://review.openstack.org/#/c/523007/
> Add x-openstack-request-id in API ref
> * https://review.openstack.org/#/c/521541/
> Add 'Location' parameters in API ref
> * https://review.openstack.org/#/c/511342/
> add API reference for create inventory
> # Nested Providers
> The nested-resource-providers stack has grown a long tail of changes
> for managing nested providers rooted on a compute node:
> As mentioned above this has impact for virt drivers.
> The current spec for nested providers
> doesn't really cover the ProviderTree and inventory management plans
> that are currently being implemented in that long tail. That makes it
> a bit harder to review than it might otherwise be. We may not need a spec
> but a sort of explanatory overview may help provide some context on
> what needs to happen. A lot of the work that is in progress feels like
> it is working to a design where the use cases are not entirely obvious.
> There's a danger this can lead to an implementation that is somewhat
> divorced from reality. There's no evidence as yet that this is
> happening, but there's also none that it's not.
> ## Alternate Hosts
> Having the scheduler request and use alternate hosts:
> This has come unstuck and is moving along, but needs continued eyes.
I pushed a change to run ironic CI on superconductor mode again because
the ironic team had to disable that in Pike due to their multinode job
failing at a high rate if it couldn't do reschedules.
That's dependent on Ed's change to use the alternate hosts in the
reschedule loop when building an instance. The ironic multinode job
failed with a RescheduledException when it apparently tried to hit the
scheduler (and/or API DB) again, so something is amiss there. I linked
the job failure n-cpu logs into the relevant patches but didn't get to
the point of tracking the request ID through the scheduler logs to see
if we log the alternates chosen for that request (there would only be 1
alternate in a 2-node job). Or it could just be a simple bug in the nova
patch, idk, but it needs investigation.
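
For context, the behavior expected from the reschedule loop can be sketched roughly as below: the selected host is tried first, then each alternate in order, without going back to the scheduler. This is a simplified model under that assumption; build_with_alternates, BuildFailed and NoValidHost are hypothetical names for the sketch and do not reflect the actual nova code.

```python
class BuildFailed(Exception):
    """Hypothetical: raised when a build attempt on one host fails."""


class NoValidHost(Exception):
    """Hypothetical: raised when every candidate host has been tried."""


def build_with_alternates(hosts, build):
    """Try the selected host, then each alternate, in order.

    `hosts` is the selected host followed by its alternates, as chosen
    once up front. `build` is a callable that takes a host and raises
    BuildFailed on failure.
    """
    failures = []
    for host in hosts:
        try:
            return build(host)
        except BuildFailed as exc:
            # Remember why this host failed and move on to the next
            # alternate instead of asking the scheduler again.
            failures.append((host, str(exc)))
    raise NoValidHost("exhausted hosts: %r" % failures)
```

In a 2-node job `hosts` would hold exactly two entries, so a single failed build should land on the one alternate and only fail outright if that attempt fails too.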
> ## Migration allocations
> Do allocation "doubling" using the migration uuid for the consumer for
> one half. This is also very close:
> The concept of migration allocations is what drove the work to enable
> the POST /allocations handling now at microversion 1.13, so we have
> the option to start using that power. Dan helpfully left comments in
> the code to indicate where it could be done. Do we want to consider
> getting that in before the end of queens, to avoid some racing?
Yes, we want to use POST /allocations during cold/live migrate. The
sooner we get that done in Queens the better (flush out any side effects
as early as possible). We already require placement 1.14 now so we might
as well just start working on this change. I'd say I'll start working on
it but I've been saying that to too many things lately. I did leave the
blueprint open for the time being to account for this work.
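
A rough sketch of what a single POST /allocations body for a move could look like is below. It assumes the request is keyed by consumer uuid, with the migration uuid holding the source host allocation and the instance uuid holding the destination host allocation, so both are written in one request. build_move_payload is a hypothetical helper, and the exact body shape should be checked against the API reference for the microversion in use.

```python
import json


def build_move_payload(instance_uuid, migration_uuid, source_rp, dest_rp,
                       resources, project_id, user_id):
    """Build one request body covering both consumers of a move.

    Assumption: the body maps each consumer uuid to its allocations,
    project_id and user_id.
    """
    def consumer(rp_uuid):
        return {
            "allocations": {rp_uuid: {"resources": dict(resources)}},
            "project_id": project_id,
            "user_id": user_id,
        }

    return {
        # The migration record holds the source host allocation.
        migration_uuid: consumer(source_rp),
        # The instance holds the destination host allocation.
        instance_uuid: consumer(dest_rp),
    }


payload = build_move_payload(
    instance_uuid="instance-uuid",
    migration_uuid="migration-uuid",
    source_rp="source-rp-uuid",
    dest_rp="dest-rp-uuid",
    resources={"VCPU": 2, "MEMORY_MB": 2048},
    project_id="project-id",
    user_id="user-id",
)
print(json.dumps(payload, indent=2, sort_keys=True))
```

The point of sending both consumers together is that the two writes succeed or fail as a unit, which is what closes the race window between separate per-consumer updates.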
> ## Misc Traits, Shared, Etc Cleanups
> There's a stack of code that's not attached to a blueprint, starting
> that fixes up a lot of things related to traits, sharing providers,
> test additions and fixes to those tests. At the moment they are a bug
> But that is not the only bug they are addressing. Some of the above
> probably appear in the list below too.
> # Other
> This week nothing new is added to the "other" list. I've simply copied
> over the previous week's list with anything that's been merged or
> abandoned removed. A fair amount has been merged, that's cool.
> * https://review.openstack.org/#/c/522002/
> skip authentication on root URI
> * https://review.openstack.org/#/c/522407/
> Add aggregates check in allocation candidates
> * https://review.openstack.org/#/c/519462/
> Log options at debug when starting API services under wsgi
> (Make any sense to split this into placement and nova versions? One
> seems easier than the other)
> * https://review.openstack.org/#/c/506175/
> VMware: implement get_inventory() driver method
> * https://review.openstack.org/#/c/508555/
> Re-use existing ComputeNode on ironic rebalance
This is approved now and the backport to stable/pike is proposed.
> Proper error handling by _ensure_resource_provider
> (This is already approved for master, but there are backports.)
> * https://review.openstack.org/#/q/topic:bp/placement-osc-plugin
> Build the placement osc plugin
Andrey pushed a new version of the bottom change (allocations support);
need to get back to reviewing that.
> * https://review.openstack.org/#/c/511936/
> Neutron's placement client
> * https://review.openstack.org/#/c/521640/
> cache-related headers for placement
Approved. Get your rebase buttons ready.
> * https://review.openstack.org/#/q/topic:bp/request-traits-in-nova
> request traits in nova
> * https://review.openstack.org/#/c/513041/
> Extract instance allocation removal code
> * https://review.openstack.org/#/c/493865/
> cover migration cases with functional tests
> * https://review.openstack.org/#/c/501252/
> doc: note that custom resources are not fully supported
> * https://review.openstack.org/#/c/494206/
> Remove the Pike migration code for flavor migration
I've started a nova-status patch to check for this migration:
Needs tests which I hope to add sometime later this week. I think we
should backport that also since we would have done it in Pike if we had
it then, but a lot of this stuff came in late too.
> * https://review.openstack.org/#/c/512497/
> Refactor placement version check
> * https://review.openstack.org/#/q/topic:bp/add-support-for-vgpu
> Add support for VGPU
> * https://review.openstack.org/#/q/topic:placement_schema_separation
> Put the json schema in their own directory
> # End
> Your prize this week is one of those sticky octopuses that you throw
> at the wall, and it rolls down. Except dressed as Santa Claus.