[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality

Jonathan Proulx jon at csail.mit.edu
Mon May 22 19:53:12 UTC 2017


To be clear on my view of the whole proposal:

Most of the rescheduling I've seen and want is of type "A", where the
claim exceeds the host's actual resources.  At least I think they are
type "A" and not type "C" (unknown).

The exact case is that I oversubscribe RAM (1.5x).  My users typically
over-claim, so this is OK (my worst case is a hypervisor using only 10%
of its claimed RAM).  But there are some hotspots where proportional
utilization is high, so libvirt won't start more VMs because the host
really doesn't have the memory.
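
For anyone who wants the arithmetic spelled out, here is a toy sketch
in Python (the host size and guest numbers are made up; only the 1.5x
ratio matches my setup):

    # Toy numbers for one oversubscribed hypervisor.
    physical_ram_mb = 256 * 1024               # what the box actually has
    ratio = 1.5                                # oversubscription factor
    schedulable_mb = physical_ram_mb * ratio   # what the scheduler claims against

    claimed_mb = 370 * 1024   # RAM booked by all flavors on this host
    used_mb = 250 * 1024      # RAM the guests actually touch

    # The scheduler still sees headroom against the ratio...
    print("scheduler sees free:", schedulable_mb - claimed_mb, "MB")  # ~14 GB
    # ...but the host is a hotspot: a 12 GB guest passes the ratio check
    # while only ~6 GB is really free, so libvirt refuses to start it.
    print("physically free:", physical_ram_mb - used_mb, "MB")        # ~6 GB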

If that's solved (or will be by the time reschedule goes away), the
cases I've actually experienced would be covered.

The anti-affinity use cases are currently the most important to me of
the affinity-scheduling types, and I haven't (to my knowledge) seen
collisions in that direction.  So I could live with that race, because
for me it is uncommon (though I imagine that for others, where positive
affinity matters, the race may get lost more frequently).
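
For reference, the race in question looks roughly like this toy model
(not nova code): two scheduler workers filter on the group's hosts
before either one's claim is recorded, so both can pick the same host.

    # Toy model of the anti-affinity race; not nova code.
    group_hosts = set()   # hosts already holding members of the server group

    def pick_host(candidates):
        # Each racing request filters on group_hosts *as it sees them now*.
        return next(h for h in candidates if h not in group_hosts)

    # Two requests for the same anti-affinity group run before either
    # one's host choice is recorded:
    choice_a = pick_host(["host1", "host2"])
    choice_b = pick_host(["host1", "host2"])
    group_hosts.update([choice_a, choice_b])

    # Both picked host1, so the second boot trips the late check on the
    # compute node (today a Reschedule; under the proposal, an ERROR).
    print(choice_a, choice_b)   # host1 host1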

-Jon

On Mon, May 22, 2017 at 03:00:09PM -0400, Jonathan Proulx wrote:
:On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote:
::On Mon, May 22, 2017 at 10:54 AM, Jay Pipes <jaypipes at gmail.com> wrote:
::
::> Hi Ops,
::>
::> Hi!
::
::
::>
::> For class b) causes, we should be able to solve this issue when the
::> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
::> Until then, we propose that instead of raising a Reschedule when an
::> affinity constraint was last-minute violated due to a racing scheduler
::> decision, that we simply set the instance to an ERROR state.
::>
::> Personally, I have only ever seen anti-affinity/affinity use cases in
::> relation to NFV deployments, and in every NFV deployment of OpenStack there
::> is a VNFM or MANO solution that is responsible for the orchestration of
::> instances belonging to various service function chains. I think it is
::> reasonable to expect the MANO system to be responsible for attempting a
::> re-launch of an instance that was set to ERROR due to a last-minute
::> affinity violation.
::>
::
::
::> **Operators, do you agree with the above?**
::>
::
::I do not. My affinity and anti-affinity use cases reflect the need to build
::large applications across failure domains in a datacenter.
::
::Anti-affinity: Most anti-affinity use cases relate to the ability to
::guarantee that instances are scheduled across failure domains, others
::relate to security compliance.
::
::Affinity: Hadoop/Big data deployments have affinity use cases, where nodes
::processing data need to be in the same rack as the nodes which house the
::data. This is a common setup for large hadoop deployers.
:
:James describes my use case as well.
:
:I would also rather see a reschedule, if we're having a really bad day
:and reach max retries then see ERR
:
:-Jon
