[openstack-dev] [nova][placement] Trying to summarize bp/glance-image-traits scheduling alternatives for rebuild
melwittt at gmail.com
Wed May 2 23:11:18 UTC 2018
On Wed, 2 May 2018 17:45:37 -0500, Matt Riedemann wrote:
> On 5/2/2018 5:39 PM, Jay Pipes wrote:
>> My personal preference is to add less technical debt and go with a
>> solution that checks if image traits have changed in nova-api and if so,
>> simply refuse to perform a rebuild.
> So, what if when I created my server, the image I used, let's say
> image1, had required trait A and that fit the host.
> Then some external service removes (or somehow changes) trait A from the
> compute node resource provider (because people can and will do this,
> there are a few vmware specs up that rely on being able to manage traits
> out of band from nova), and then I rebuild my server with image2 that
> has required trait A. That would match the original trait A in image1
> and we'd say, "yup, lgtm!" and do the rebuild even though the compute
> node resource provider wouldn't have trait A anymore.
> Having said that, it could technically happen before traits if the
> operator changed something on the underlying compute host which
> invalidated instances running on that host, but I'd think if that
> happened the operator would be migrating everything off the host and
> disabling it from scheduling before making whatever that kind of change
> would be, let's say they change the hypervisor or something less drastic
> but still image property invalidating.
This is a scenario I was thinking about too. In the land of software
licenses, this would be analogous to removing a license from a compute
host, say. The instance is already there but should we let a rebuild
proceed that is going to violate the image traits currently supported by
that host? Do we potentially prolong the life of that instance by
letting it be re-imaged?
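To make that scenario concrete, here is a minimal sketch of an image-vs-image comparison. The helper name and the CUSTOM_TRAIT_A trait are made up for illustration; this is not nova code. It shows that comparing only the old and new image traits approves the rebuild even though the host no longer exposes the trait.

```python
def image_traits_changed(old_required, new_required):
    """Return True if the two images require different sets of traits."""
    return set(old_required) != set(new_required)


# Hypothetical data for the scenario described above.
image1_required = {"CUSTOM_TRAIT_A"}   # image used at server create
image2_required = {"CUSTOM_TRAIT_A"}   # image requested for the rebuild
host_traits_now = set()                # trait A was removed out of band

# An API-level "did the image traits change?" check lets the rebuild through...
api_would_allow = not image_traits_changed(image1_required, image2_required)

# ...but the compute node resource provider can no longer satisfy the image.
host_still_ok = image2_required <= host_traits_now
```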
I'm late to this thread but I finally went through the replies and my
thought is, we should do a pre-flight check to verify with placement
whether the image traits requested are 1) supported by the compute host
the instance is residing on and 2) consistent with the already-existing
allocations, instead of making an assumption based on "last image" vs
"new image" and artificially blocking a rebuild that should be valid.
I can imagine scenarios where a user is trying to do a rebuild
that their cloud admin says should be perfectly valid on their
hypervisor, but it's getting rejected because old image traits != new
image traits. It seems like unnecessary user and admin pain.
It doesn't seem correct to reject the request if the current compute
host can fulfill it, and if I understood correctly, we have placement
APIs we can call from the conductor to verify the image traits requested
for the rebuild can be fulfilled. Is there a reason not to do that?
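A rough sketch of what such a pre-flight check could look like, under some assumptions: required image traits are stored as image properties of the form trait:<NAME>=required (my reading of the blueprint), and the caller has already fetched the body of GET /resource_providers/{uuid}/traits from placement for the instance's compute node. The function names are hypothetical, and the allocation side of the check is not covered here.

```python
TRAIT_PREFIX = "trait:"


def required_image_traits(image_properties):
    """Collect the trait names an image marks as required."""
    return {
        key[len(TRAIT_PREFIX):]
        for key, value in image_properties.items()
        if key.startswith(TRAIT_PREFIX) and value == "required"
    }


def missing_traits(image_properties, provider_traits_body):
    """Return the required image traits the current host does not expose.

    provider_traits_body is the parsed JSON body of the placement
    traits listing for the compute node resource provider.
    """
    available = set(provider_traits_body.get("traits", []))
    return required_image_traits(image_properties) - available


# Hypothetical example data.
image2_props = {"trait:HW_CPU_X86_AVX2": "required", "os_distro": "ubuntu"}
placement_body = {
    "resource_provider_generation": 5,
    "traits": ["HW_CPU_X86_SSE42"],
}

missing = missing_traits(image2_props, placement_body)
```

An empty result would mean the current host can still fulfill the new image's traits, so the rebuild could proceed regardless of what the previous image required; a non-empty result is the case to reject.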