[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

Jay Pipes jaypipes at gmail.com
Tue May 23 15:52:06 UTC 2017


On 05/22/2017 03:36 PM, Sean Dague wrote:
> On 05/22/2017 02:45 PM, James Penick wrote:
> <snip>
>>
>>      I recognize that large Ironic users expressed their concerns about
>>      IPMI/BMC communication being unreliable and not wanting to have
>>      users manually retry a baremetal instance launch. But, on this
>>      particular point, I'm of the opinion that Nova should just do one thing and
>>      do it well. Nova isn't an orchestrator, nor is it intending to be a
>>      "just continually try to get me to this eventual state" system like
>>      Kubernetes.
>>
>> Kubernetes is a larger orchestration platform that provides autoscale. I
>> don't expect Nova to provide autoscale, but
>>
>> I agree that Nova should do one thing and do it really well, and in my
>> mind that thing is reliable provisioning of compute resources.
>> Kubernetes does autoscale among other things. I'm not asking for Nova to
>> provide Autoscale, I -AM- asking OpenStack's compute platform to
>> provision a discrete compute resource reliably. This means overcoming
>> common and simple error cases. As a deployer of OpenStack I'm trying to
>> build a cloud that wraps the chaos of infrastructure, and present a
>> reliable facade. When my users issue a boot request, I want to see it
>> fulfilled. I don't expect it to be a 100% guarantee across any possible
>> failure, but I expect (and my users demand) that my "Infrastructure as a
>> service" API make reasonable accommodation to overcome common failures.
> 
> Right, I think this hits my major queasiness with throwing the baby out with
> the bathwater here. I feel like Nova's job is to give me a compute when
> asked for computes. Yes, like malloc, things could fail. But honestly if
> Nova can recover from that scenario, it should try to. The baremetal and
> affinity cases are pretty good instances where Nova can catch and
> recover, and not just export that complexity up.
> 
> It would make me sad to just export that complexity to users, and
> instead of handling those cases internally, make every SDK, App, and
> simple script build their own retry loop.

If Heat were more widely deployed, would you feel this way? Would you 
reconsider having Heat as one of those "basic compute services" in 
OpenStack, then?

This is, unfortunately, one of the main problems stemming from OpenStack 
not having a *single* public API, with projects implementing parts of 
that single public API. You know, the thing I started arguing for about 
6 years ago.

If we had one single public porcelain API, we wouldn't even need to have 
this conversation. People wouldn't even know we'd changed implementation 
details behind the scenes and were doing retries at a slightly higher 
level than before. Oh well... we live and learn (maybe).

Best,
-jay


