[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

Jonathan Proulx jon at csail.mit.edu
Mon May 22 19:00:09 UTC 2017


On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote:
:On Mon, May 22, 2017 at 10:54 AM, Jay Pipes <jaypipes at gmail.com> wrote:
:
:> Hi Ops,
:>
:> Hi!
:
:
:>
:> For class b) causes, we should be able to solve this issue when the
:> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
:> Until then, we propose that, instead of raising a Reschedule when an
:> affinity constraint is violated at the last minute by a racing scheduler
:> decision, we simply set the instance to an ERROR state.
:>
:> Personally, I have only ever seen anti-affinity/affinity use cases in
:> relation to NFV deployments, and in every NFV deployment of OpenStack there
:> is a VNFM or MANO solution that is responsible for the orchestration of
:> instances belonging to various service function chains. I think it is
:> reasonable to expect the MANO system to be responsible for attempting a
:> re-launch of an instance that was set to ERROR due to a last-minute
:> affinity violation.
:>
:
:
:> **Operators, do you agree with the above?**
:>
:
:I do not. My affinity and anti-affinity use cases reflect the need to build
:large applications across failure domains in a datacenter.
:
:Anti-affinity: Most anti-affinity use cases relate to the ability to
:guarantee that instances are scheduled across failure domains, others
:relate to security compliance.
:
:Affinity: Hadoop/big data deployments have affinity use cases, where nodes
:processing data need to be in the same rack as the nodes which house the
:data. This is a common setup for large Hadoop deployers.

James describes my use case as well.

I would also rather see a reschedule. If we're having a really bad day
and hit max retries, then set the instance to ERROR.
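
Roughly, the behaviour I'd prefer looks like the sketch below. This is
only an illustration and not nova's actual code: AffinityViolation,
build_with_reschedule, make_claim and the max_attempts default are all
made-up names and values, and the "claim" callback stands in for the
late affinity check on the compute host.

```python
class AffinityViolation(Exception):
    """Hypothetical error: the late check found the host breaks group policy."""


def build_with_reschedule(candidate_hosts, claim, max_attempts=3):
    """Try hosts in order and return (state, host, attempts).

    A host that fails the late affinity check triggers a reschedule to the
    next candidate. Only when max_attempts is used up (or there are no more
    candidates) does the instance end up in ERROR.
    """
    attempts = 0
    for host in candidate_hosts:
        if attempts >= max_attempts:
            break
        attempts += 1
        try:
            claim(host)
        except AffinityViolation:
            continue  # lost the race on this host; reschedule to the next one
        return ("ACTIVE", host, attempts)
    return ("ERROR", None, attempts)


def make_claim(taken_hosts):
    """Build a fake claim function that rejects hosts already holding a
    member of the same anti-affinity group (illustrative only)."""
    def claim(host):
        if host in taken_hosts:
            raise AffinityViolation(host)
    return claim


# One lost race: the second host is used and nobody sees an ERROR.
print(build_with_reschedule(["h1", "h2", "h3"], make_claim({"h1"})))
# Really bad day: every attempt loses, so the instance goes to ERROR.
print(build_with_reschedule(["h1", "h2", "h3", "h4"],
                            make_claim({"h1", "h2", "h3"})))
```

The point is that a single lost race costs one extra attempt rather than
a failed boot that the user or an external orchestrator has to retry.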

-Jon


