[nova] review guide for the bandwidth patches

Sean Mooney smooney at redhat.com
Mon Jan 21 11:31:24 UTC 2019

On Mon, 2019-01-21 at 10:45 +0000, Balázs Gibizer wrote:
> On Fri, Jan 18, 2019 at 7:40 PM, Dan Smith <dms at danplanet.com> wrote:
> > >  * There will be a second new microversion (probably in Train) that will
> > >  enable move operations for server having resource aware ports. This
> > >  microversion split will allow us not to block the create / delete
> > >  support for the feature in Stein.
> > > 
> > >  * The new microversions will act as a feature flag in the code. This
> > >  will allow merging single use cases (e.g.: server create with one ovs
> > >  backed resource aware port) and functionally verifying it before the
> > >  whole generic create use case is ready and enabled.
> > > 
> > >  * A nova-manage command will be provided to heal the port allocations
> > >  without moving the servers if there is enough resource inventory
> > >  available for it on the current host. This tool will only work online
> > >  as it will call neutron and placement APIs.
> > > 
> > >  * Server move operations with the second new microversion will
> > >  automatically heal the server allocation.
> > 
> > I wasn't on this call, so apologies if I'm missing something important.
> > 
> > Having a microversion that allows move operations for an instance
> > configured with one of these ports seems really terrible to me. What
> > exactly is the point of that? To distinguish between Stein and Train
> > systems purely because Stein didn't have time to finish the feature?
> I think in Stein we have time to finish the boot / delete use case of 
> the feature but most probably do not have time to finish the move use 
> cases. I believe that the boot / delete use case is already useful for 
> end users. There are plenty of features in nova that are enabled before 
> supporting all the cases, like move operations with NUMA.
That is true; however, NUMA in particular was an oversight, not a design choice.
As is the case with macvtap SR-IOV, NUMA was intended to support live migration
from its introduction, even if that support is only now being completed. Even
without Artom's work, NUMA has always supported cold migration; the same is true
of CPU pinning, hugepages and PCI/SR-IOV passthrough.
> > 
> > IMHO, we should really avoid abusing microversions for that sort of
> > thing. I would tend to err on the side of "if it's not ready, then it's
> > not ready" for Stein, but I'm sure the desire to get this in (even if
> > partially) is too strong for that sort of restraint.
> Why is it an abuse of the microversion to use it to signal that a new
> use case is supported? I'm confused. I was asked to use microversions
> to signal that a feature is ready. So I'm not sure why, in the case of a
> feature (feature == one or more use cases), it is OK to use a
> microversion but not OK when a use case (e.g. boot/delete) is completed.
Dan can speak for himself, but I would assume it is because the microversion does
not signal that the use case is supported; it merely signals that the codebase could
support it. Because move operations can be disabled via config, or may not be
supported by the selected hypervisor (e.g. ironic), the presence of the microversion
alone is not enough to determine that the use case is supported.

Unlike neutron extensions, microversions are not advertised individually and can't
be enabled only when the deployment is configured to support a feature.
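To make the point concrete, here is a minimal sketch of how a client would gate a
request on the compute API's advertised maximum microversion. The two microversion
numbers are hypothetical placeholders (none had been assigned for this feature), and
the two deployment flags stand in for facts a client cannot learn from the version
document. The check only shows that the code base understands the request, not that
the deployment allows the move.

```python
from typing import Tuple

# Hypothetical placeholders, NOT the microversions actually assigned
# to the bandwidth feature.
BANDWIDTH_BOOT_DELETE = "2.70"
BANDWIDTH_MOVE = "2.71"


def parse_version(v: str) -> Tuple[int, int]:
    """Turn a compute microversion string like '2.65' into (2, 65)."""
    major, minor = v.split(".")
    return int(major), int(minor)


def api_supports(max_version: str, required: str) -> bool:
    """True if the server's advertised max microversion covers `required`.

    The compute API advertises a single min/max version range for the
    whole API, so this comparison carries no per-feature or
    per-deployment information.
    """
    return parse_version(max_version) >= parse_version(required)


def move_may_work(max_version: str, move_ops_enabled: bool,
                  driver_supports_move: bool) -> bool:
    """The microversion is necessary but not sufficient.

    `move_ops_enabled` (policy/config) and `driver_supports_move`
    (e.g. ironic cannot migrate) are hypothetical stand-ins for
    deployment facts the microversion does not expose.
    """
    return (api_supports(max_version, BANDWIDTH_MOVE)
            and move_ops_enabled
            and driver_supports_move)
```

Comparing parsed integer tuples rather than strings keeps "2.10" correctly above "2.9".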

> > 
> > Can we not return 403 in Stein, since moving instances is disable-able
> > anyway, and just make it work in Train? Having a new microversion with a
> > description of "nothing changed except we finished a feature so you can
> > do this very obscure thing now" seems like we're just using them as an
> I think "nothing is changed" would not be true. Some operation (e.g. 
> server move) that was rejected before (or even accepted but caused 
> unintentional resource overallocation) now works properly.

Since minimum bandwidth was previously best effort, any overallocation was
not a bug or unintentional; it was allowed by design, given that we initially
planned to delegate bandwidth management to the SDN controller.

As Matt pointed out, the APIs for creating QoS rules and policies are admin-only,
as are most of the move operations. A tenant could have chosen to apply the
QoS policy, but the admin had to create it in the first place.

> Isn't it the "you can do this very obscure thing now" documentation of a
> microversion that makes the new API behavior discoverable?
> > experimental feature flag, which was definitely not the intent. I know
> > returning 403 for "you can't do this right now" isn't *as* discoverable,
> > but you kinda have to handle 403 for operations that could be disabled
> > anyway, so...
> The boot / delete use case would not be experimental, that would be 
> final.
> 403 is a client error but in this case, in Stein, move operations would 
> not be implemented yet. So for me that error is not a client error 
> (i.e. there is no way a client can fix it) but a server error, like
> HTTP 501.
A 501 "Not Implemented" would be a valid error code to use with the new microversion
that declares support for bandwidth-based scheduling. However, resize does not return
501 today, nor do shelve/unshelve.

The same is true of migrate and live migrate.

As such, returning 501 for older microversions would be incorrect, as it changes the set
of response codes that existing clients should expect from those endpoints. While I
agree it is not a client error, being consistent with existing behavior could be
preferable, as clients presumably know how to deal with it.
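The compromise can be sketched as a status-code choice keyed on the requested
microversion. This is a minimal illustration, not nova code: the function name and the
placeholder microversion are hypothetical, and 403 is used for older microversions
because, as Dan notes, clients already have to handle it for move operations that can
be disabled.

```python
from typing import Tuple

# Hypothetical placeholder for the microversion that adds bandwidth-aware
# boot/delete; not the number actually assigned.
BANDWIDTH_BOOT_DELETE = "2.70"


def parse_version(v: str) -> Tuple[int, int]:
    """Turn a compute microversion string like '2.65' into (2, 65)."""
    major, minor = v.split(".")
    return int(major), int(minor)


def move_rejection_code(requested_version: str) -> int:
    """Status code for rejecting a move of a server with bandwidth-aware
    ports while move support is not implemented yet (hypothetical helper).

    At or above the new microversion, 501 Not Implemented can be used,
    since that microversion can document the new code. Older microversions
    keep 403 Forbidden so existing clients see no new response code.
    """
    if parse_version(requested_version) >= parse_version(BANDWIDTH_BOOT_DELETE):
        return 501
    return 403
```

Tying the new code to the new microversion keeps the older endpoints' documented set of
responses unchanged.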

> Cheers,
> gibi
> > 
> > --Dan
