[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

Jay Pipes jaypipes at gmail.com
Mon May 22 17:54:13 UTC 2017

Hi Ops,

I need your feedback on a very important direction we would like to 
pursue. I realize that there were Forum sessions about this topic at the 
summit in Boston and that some decisions were reached there.

I'd like to revisit those decisions and explain why I'd like your support 
for getting rid of the automatic reschedule behaviour entirely in Nova 
for Pike.

== The current situation and why it sucks ==

Nova currently attempts to "reschedule" instances when any of the 
following events occur:

a) the "claim resources" process that occurs on the nova-compute worker 
results in the chosen compute node exceeding its own capacity

b) between the time a compute node was chosen by the scheduler and the 
time the instance is spawned on it, another process launched an instance 
that would violate an affinity constraint

c) an "unknown" exception occurs during the spawn process. In practice, 
this is really only seen when the Ironic baremetal node that was chosen 
by the scheduler turns out to be unreliable (IPMI issues, BMC failures, 
etc) and wasn't able to launch the instance. [1]

The logic for handling these reschedules makes the Nova conductor, 
scheduler and compute worker code very complex. With the new cellsv2 
architecture in Nova, child cells are not able to communicate with the 
Nova scheduler (and thus cannot "ask for a reschedule").

We (the Nova team) would like to get rid of the automated rescheduling 
behaviour that Nova currently exposes because we could eliminate a large 
amount of complexity (which leads to bugs) from the already-complicated 
dance of communication that occurs between internal Nova components.

== What we would like to do ==

With the move of the resource claim to the Nova scheduler [2], we can 
entirely eliminate the a) class of Reschedule causes.

This leaves class b) and c) causes of Rescheduling.

For class b) causes, we should be able to solve this issue when the 
placement service understands affinity/anti-affinity (maybe 
Queens/Rocky). Until then, we propose that instead of raising a 
Reschedule when an affinity constraint is violated at the last minute by 
a racing scheduler decision, we simply set the instance to an ERROR 
state.

Personally, I have only ever seen anti-affinity/affinity use cases in 
relation to NFV deployments, and in every NFV deployment of OpenStack 
there is a VNFM or MANO solution that is responsible for the 
orchestration of instances belonging to various service function chains. 
I think it is reasonable to expect the MANO system to be responsible for 
attempting a re-launch of an instance that was set to ERROR due to a 
last-minute affinity violation.

**Operators, do you agree with the above?**

Finally, for class c) Reschedule causes, I do not believe that we should 
be attempting automated rescheduling when "unknown" errors occur. I just 
don't believe this is something Nova should be doing.

I recognize that large Ironic users expressed their concerns about 
IPMI/BMC communication being unreliable and not wanting to have users 
manually retry a baremetal instance launch. But, on this particular 
point, I'm of the opinion that Nova should just do one thing and do it 
well. Nova isn't an orchestrator, nor is it intended to be a "just 
continually try to get me to this eventual state" system like Kubernetes.

If we removed Reschedule for class c) failures entirely, large Ironic 
deployers would have to train users to manually retry a failed launch or 
would need to write a simple retry mechanism into whatever client/UI 
that they expose to their users.
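
For illustration only, such a client-side retry could look roughly like 
the sketch below. The function name and the launch, get_status and 
delete callables are hypothetical placeholders for whatever client 
library a deployer's UI already uses; nothing here is an existing Nova 
or Ironic API. The only assumption taken from Nova is that a server 
ends up in either the ACTIVE or the ERROR status.

```python
import time
from typing import Callable


def launch_with_retry(
    launch: Callable[[], str],          # hypothetical: starts a boot, returns an instance id
    get_status: Callable[[str], str],   # hypothetical: returns the server status string
    delete: Callable[[str], None],      # hypothetical: deletes a failed instance
    max_attempts: int = 3,
    poll_interval: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Launch an instance, starting over if it lands in ERROR."""
    for _ in range(max_attempts):
        instance_id = launch()
        status = get_status(instance_id)
        # Poll until the instance reaches a terminal status.
        while status not in ("ACTIVE", "ERROR"):
            sleep(poll_interval)
            status = get_status(instance_id)
        if status == "ACTIVE":
            return instance_id
        # Clean up the failed instance before asking for a new one.
        delete(instance_id)
    raise RuntimeError(f"launch failed after {max_attempts} attempts")
```

The retry lives entirely in the client: each attempt is a brand new 
boot request, so Nova itself never has to reschedule anything.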

**Ironic operators, would the above decision force you to abandon Nova 
as the multi-tenant BMaaS facility?**

Thanks in advance for your consideration and feedback.


[1] This really does not occur with any frequency for hypervisor virt 
drivers, since the exceptions those hypervisors throw are caught by the 
nova-compute worker and handled without raising a Reschedule.

