[Openstack-operators] [nova][ironic][scheduler][placement] IMPORTANT: Getting rid of the automated reschedule functionality
Jay Pipes
jaypipes at gmail.com
Mon May 22 19:55:30 UTC 2017
On 05/22/2017 03:53 PM, Jonathan Proulx wrote:
> To be clear on my view of the whole proposal
>
> most of my Rescheduling that I've seen and want are of type "A" where
> claim exceeds resources. At least I think they are type "A" and not
> "C" unknown.
>
> The exact case is that I over subsribe RAM (1.5x) my users typically over
> claim so this is OK (my worst case is a hypervisor using only 10% of
> claimed RAM). But there are some hotspots where propertional
> utilization is high so libvirt won't start more VMs becasue it really
> doesn't have the memory.
>
> If that's solved (or will be at the time reschedule goes away), teh
> cases I've actually experienced would be solved.
>
> The anit-affinity use cases are currently most important to be of the
> affinity scheduling and I haven't (to my knowlege) seen collisions in
> that direction. So I could live with that race becuase for me it is
> uncommon (though I imagine for others where positive affinity is
> important teh race may get lost mroe frequently)
Thanks for the feedback, Jon.
For the record, affinity really doesn't have much of a race condition at
all. It's really only anti-affinity that has much of a chance of
last-minute violation.
Best,
-jay
> On Mon, May 22, 2017 at 03:00:09PM -0400, Jonathan Proulx wrote:
> :On Mon, May 22, 2017 at 11:45:33AM -0700, James Penick wrote:
> ::On Mon, May 22, 2017 at 10:54 AM, Jay Pipes <jaypipes at gmail.com> wrote:
> ::
> ::> Hi Ops,
> ::>
> ::> Hi!
> ::
> ::
> ::>
> ::> For class b) causes, we should be able to solve this issue when the
> ::> placement service understands affinity/anti-affinity (maybe Queens/Rocky).
> ::> Until then, we propose that instead of raising a Reschedule when an
> ::> affinity constraint was last-minute violated due to a racing scheduler
> ::> decision, that we simply set the instance to an ERROR state.
> ::>
> ::> Personally, I have only ever seen anti-affinity/affinity use cases in
> ::> relation to NFV deployments, and in every NFV deployment of OpenStack there
> ::> is a VNFM or MANO solution that is responsible for the orchestration of
> ::> instances belonging to various service function chains. I think it is
> ::> reasonable to expect the MANO system to be responsible for attempting a
> ::> re-launch of an instance that was set to ERROR due to a last-minute
> ::> affinity violation.
> ::>
> ::
> ::
> ::> **Operators, do you agree with the above?**
> ::>
> ::
> ::I do not. My affinity and anti-affinity use cases reflect the need to build
> ::large applications across failure domains in a datacenter.
> ::
> ::Anti-affinity: Most anti-affinity use cases relate to the ability to
> ::guarantee that instances are scheduled across failure domains, others
> ::relate to security compliance.
> ::
> ::Affinity: Hadoop/Big data deployments have affinity use cases, where nodes
> ::processing data need to be in the same rack as the nodes which house the
> ::data. This is a common setup for large hadoop deployers.
> :
> :James describes my use case as well.
> :
> :I would also rather see a reschedule, if we're having a really bad day
> :and reach max retries then see ERR
> :
> :-Jon
>
More information about the OpenStack-operators
mailing list