[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

Sean Dague sean at dague.net
Mon May 22 19:36:19 UTC 2017


On 05/22/2017 02:45 PM, James Penick wrote:
<snip>
>  
> 
>     I recognize that large Ironic users expressed their concerns about
>     IPMI/BMC communication being unreliable and not wanting to have
>     users manually retry a baremetal instance launch. But, on this
>     particular point, I'm of the opinion that Nova just do one thing and
>     do it well. Nova isn't an orchestrator, nor is it intending to be a
>     "just continually try to get me to this eventual state" system like
>     Kubernetes.
> 
> 
> Kubernetes is a larger orchestration platform that provides autoscale. I
> don't expect Nova to provide autoscale, but 
> 
> I agree that Nova should do one thing and do it really well, and in my
> mind that thing is reliable provisioning of compute resources.
> Kubernetes does autoscale among other things. I'm not asking for Nova to
> provide Autoscale, I -AM- asking OpenStack's compute platform to
> provision a discrete compute resource reliably. This means overcoming
> common and simple error cases. As a deployer of OpenStack I'm trying to
> build a cloud that wraps the chaos of infrastructure, and present a
> reliable facade. When my users issue a boot request, I want to see it
> fulfilled. I don't expect it to be a 100% guarantee across any possible
> failure, but I expect (and my users demand) that my "Infrastructure as a
> service" API make reasonable accommodation to overcome common failures. 

Right, I think this hits my major queasiness with throwing the baby out with
the bathwater here. I feel like Nova's job is to give me a compute when
asked for computes. Yes, like malloc, things could fail. But honestly if
Nova can recover from that scenario, it should try to. The baremetal and
affinity cases are pretty good instances where Nova can catch and
recover, and not just export that complexity up.

It would make me sad to just export that complexity to users, and
instead of handling those cases internally make every SDK, App, and
simple script build their own retry loop.
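
To make that concrete, here is a rough sketch of the kind of loop every
client would end up carrying. It is illustrative only: `boot_once` and
`BootFailed` are hypothetical stand-ins for whatever a given SDK or
script does to request a server and detect a failed build. They are not
Nova or novaclient APIs, and the attempt count and delay are arbitrary.

```python
import time


class BootFailed(Exception):
    """Hypothetical error: the boot request ended in a failed state."""


def boot_with_retry(boot_once, max_attempts=3, delay_seconds=5.0, sleep=time.sleep):
    """Call boot_once() until it succeeds or max_attempts is used up.

    boot_once is a hypothetical callable that requests one server and
    returns it on success, or raises BootFailed on a failed build.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return boot_once()
        except BootFailed as exc:
            last_error = exc
            # Pause between attempts, but not after the final one.
            if attempt < max_attempts:
                sleep(delay_seconds)

    # Every attempt failed; surface the last failure to the caller.
    raise last_error
```

Each client would also have to pick its own attempt count and delay, and
decide which failures are worth retrying at all, which is exactly the
duplication I'd rather keep inside Nova.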

	-Sean

-- 
Sean Dague
http://dague.net


