<div dir="ltr">I have to agree with James.<div><br></div><div>My affinity and anti-affinity rules have nothing to do with NFV. Anti-affinity is almost always a failure-domain solution. I'm not sure we have users actually choosing affinity (if they did, it would likely be for network-speed reasons and/or some badly architected need, or perceived need, for coupling).</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 22, 2017 at 12:45 PM, James Penick <span dir="ltr"><<a href="mailto:jpenick@gmail.com" target="_blank">jpenick@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr"><br><div class="gmail_extra"><br><div class="gmail_quote">On Mon, May 22, 2017 at 10:54 AM, Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hi Ops,<br>

<br></blockquote><div>Hi!</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

<br>

For class b) causes, we should be able to solve this issue once the placement service understands affinity/anti-affinity (maybe in Queens/Rocky). Until then, we propose that, instead of raising a Reschedule when a racing scheduler decision causes a last-minute affinity violation, we simply set the instance to an ERROR state.<br>

<br>

Personally, I have only ever seen anti-affinity/affinity use cases in relation to NFV deployments, and in every NFV deployment of OpenStack there is a VNFM or MANO solution that is responsible for the orchestration of instances belonging to various service function chains. I think it is reasonable to expect the MANO system to be responsible for attempting a re-launch of an instance that was set to ERROR due to a last-minute affinity violation.<br></blockquote><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

**Operators, do you agree with the above?**<br></blockquote><div> </div></span><div>I do not. My affinity and anti-affinity use cases reflect the need to build large applications across failure domains in a datacenter.</div><div><br></div><div>Anti-affinity: Most anti-affinity use cases relate to the ability to guarantee that instances are scheduled across failure domains; others relate to security compliance.</div><div><br></div><div>Affinity: Hadoop/big data deployments have affinity use cases, where nodes processing data need to be in the same rack as the nodes that house the data. This is a common setup for large Hadoop deployers.</div><span class=""><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

I recognize that large Ironic users expressed their concerns about IPMI/BMC communication being unreliable and about not wanting users to manually retry a baremetal instance launch. But, on this particular point, I'm of the opinion that Nova should just do one thing and do it well. Nova isn't an orchestrator, nor is it intended to be a "just continually try to get me to this eventual state" system like Kubernetes.<br></blockquote><div><br></div></span><div>I agree that Nova should do one thing and do it really well, and in my mind that thing is reliable provisioning of compute resources. Kubernetes is a larger orchestration platform that does autoscale, among other things. I'm not asking for Nova to provide autoscale; I -AM- asking OpenStack's compute platform to provision a discrete compute resource reliably. This means overcoming common and simple error cases. As a deployer of OpenStack, I'm trying to build a cloud that wraps the chaos of infrastructure and presents a reliable facade. When my users issue a boot request, I want to see it fulfilled. I don't expect a 100% guarantee against every possible failure, but I expect (and my users demand) that my "Infrastructure as a Service" API make reasonable accommodation to overcome common failures. </div><span class=""><div> <br></div><div> <br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">

If we removed Reschedule for class c) failures entirely, large Ironic deployers would have to train users to manually retry a failed launch or would need to write a simple retry mechanism into whatever client/UI that they expose to their users.<br>

<br>

**Ironic operators, would the above decision force you to abandon Nova as the multi-tenant BMaaS facility?**<br>

<br></blockquote><div><br></div></span><div>I just glanced at one of my production clusters and found around 7K users defined, many of whom use OpenStack on a daily basis. When they issue a boot call, they expect that request to be honored. From their perspective, if they call AWS, they get what they ask for. If you remove reschedules, you're not just breaking the expectations of a single deployer, but also those of my thousands of engineers who rely on OpenStack every day to manage their stack.</div><div><br></div><div>I don't have an "I'll take my football and go home" mentality. But if you remove the ability of the compute provisioning API to present a reliable facade over infrastructure, I have to go write something else, or patch reschedules back in. Then it's even harder for me to get and stay current with OpenStack.</div><div><br></div><div>During the summit the agreement was, if I recall, that reschedules would happen within a cell, and not between the parent and the cell. That was completely acceptable to me.</div><span class="HOEnZb"><font color="#888888"><div><br></div><div>-James</div><div> </div></font></span></div></div></div>

<br>______________________________<wbr>_________________<br>

OpenStack-operators mailing list<br>

<a href="mailto:OpenStack-operators@lists.openstack.org">OpenStack-operators@lists.<wbr>openstack.org</a><br>

<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators" rel="noreferrer" target="_blank">http://lists.openstack.org/<wbr>cgi-bin/mailman/listinfo/<wbr>openstack-operators</a><br>

<br></blockquote></div><br></div>