<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Tahoma;
panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0cm;
margin-bottom:.0001pt;
font-size:12.0pt;
font-family:"Times New Roman","serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-reply;
font-family:"Calibri","sans-serif";
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:612.0pt 792.0pt;
margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Hi Chris,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Yep, Conductor can definitely make life better here by breaking the create sequence up into smaller steps.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">We could probably do something fairly simple ahead of that (if anyone is interested) by distinguishing capacity failures from other build errors and giving each its own retry limit. (I always like to keep simple fixes moving through while waiting for the big changes to land.)<o:p></o:p></span></p>
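<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">A minimal Python sketch of that interim fix. It assumes each failed attempt raises an exception that says whether it was a capacity rejection. CapacityError, BuildError, RETRY_LIMITS and build_with_retries are hypothetical names used only for illustration. They are not existing Nova code or config options, and the limits of 10 and 3 are placeholders.<o:p></o:p></p>
<pre>
```python
# Minimal sketch: separate retry limits for capacity failures and other
# build errors. All names and limit values here are hypothetical
# placeholders, not Nova's actual exceptions or config options.


class CapacityError(Exception):
    """Hypothetical: the host's resource tracker rejected the claim."""


class BuildError(Exception):
    """Hypothetical: any other build failure (bad image, port quota, ...)."""


# Capacity rejections are cheap and quick to detect, so they get a much
# larger budget than other build errors.
RETRY_LIMITS = {"capacity": 10, "other": 3}


def classify(exc):
    """Return the retry category whose budget this failure consumes."""
    return "capacity" if isinstance(exc, CapacityError) else "other"


def build_with_retries(attempt_build, limits=RETRY_LIMITS):
    """Call attempt_build() until it succeeds, giving up once any one
    category has failed as many times as its limit allows.

    attempt_build is a zero-argument callable standing in for one
    schedule-and-build attempt on a newly chosen host.
    """
    failures = {category: 0 for category in limits}
    while True:
        try:
            return attempt_build()
        except (CapacityError, BuildError) as exc:
            category = classify(exc)
            failures[category] += 1
            if failures[category] >= limits[category]:
                raise  # this category's budget is used up
```
</pre>
<p class="MsoNormal">Each category only consumes its own budget. A run of quick capacity rejections therefore no longer uses up the attempts reserved for genuine build errors, and vice versa.<o:p></o:p></p>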
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Phil<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif"">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif""> Chris Behrens [mailto:cbehrens@codestud.com]
<br>
<b>Sent:</b> 22 May 2013 18:48<br>
<b>To:</b> OpenStack Development Mailing List<br>
<b>Subject:</b> Re: [openstack-dev] Running multiple filter schedulers in parallel<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<p class="MsoNormal">On May 22, 2013, at 7:01 AM, "Day, Phil" &lt;<a href="mailto:philip.day@hp.com">philip.day@hp.com</a>&gt; wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal"><br>
<br>
<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Thanks Chris,</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">Yep, we do have scheduler_host_subset_size set (I think that was one of our contributions), and we are getting the kick-back from the compute nodes when the resource tracker kicks in, so all of that is working as it should. I’ve been a bit wary of bumping the retry count too high, as we’ve also seen a number of requests bouncing through hosts because of other errors (such as the Quantum port quota issue), but I might look at bumping it up a tad.</span><o:p></o:p></p>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Yeah. I think we need some better logic based on the type of exception that occurs. If the exception comes from the resource tracker, you may want to retry forever (maybe not forever, but a lot more than 3 times; although maybe forever is OK, since the worst case is that you loop through all your hosts and they're all full :). The resource tracker is doing its job properly and kicking the request back quickly. But other exceptions can occur later in the build, possibly from things like bad images. For those you probably want to retry a few times just to make sure, but not for very long, because the build is likely never going to succeed and you want to make sure you don't have an instance sitting in 'building' forever.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">As I said, I think this can get better with conductor. If we have conductor responsible for building, we can break up the process to make things easier. We can do things like:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">[conductor]<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">def _get_host_from_scheduler():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;[query scheduler and return a host we've not tried with this request]<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">def _allocate_networks():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;return nwinfo<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">def _tell_compute_to_assign_instance():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;[on the compute side, this does resource_tracker checking and assigns instance['host']]<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">def _tell_compute_to_build():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;[compute downloads image and builds]<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">def _tell_compute_to_run_instance():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Then the build instance logic could be:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">with try_2_times():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;_allocate_networks()<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">with try_2_times():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;with try_forever():<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_get_host_from_scheduler()<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;_tell_compute_to_assign_instance()<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;_tell_compute_to_build()<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;_tell_compute_to_run_instance()<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Something like that, if you know what I mean. This would also happen to address one of the other issues with retries: today they deallocate and reallocate networks every time you have to retry due to a resource tracker failure.<o:p></o:p></p>
</div>
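<div>
<p class="MsoNormal">The flow above can be sketched in runnable Python under a few stated assumptions. A Python "with" block cannot re-execute its body, so try_2_times() and try_forever() are expressed here as a hypothetical retry() helper. The conductor object, its methods (allocate_networks, get_host_from_scheduler, assign_instance, build, run_instance) and the three exception classes are hypothetical stand-ins for the RPC calls named above, not existing Nova APIs. "Forever" is taken to mean "until the scheduler runs out of untried hosts".<o:p></o:p></p>
</div>
<pre>
```python
# Sketch of the conductor-driven build flow. Every name below is a
# hypothetical stand-in, not an existing Nova API. Python's "with"
# statement cannot re-run its body, so the retry blocks become a helper.


class NoValidHost(Exception):
    """Hypothetical: the scheduler has no untried host left for this request."""


class ResourceClaimFailed(Exception):
    """Hypothetical: the compute node's resource tracker rejected the claim."""


class BuildFailed(Exception):
    """Hypothetical: a later build step failed (bad image, etc.)."""


def retry(fn, attempts, retry_on):
    """Call fn() up to attempts times (None means no fixed limit),
    retrying only when it raises one of the retry_on exception types."""
    tried = 0
    while True:
        try:
            return fn()
        except retry_on:
            tried += 1
            if attempts is not None and tried >= attempts:
                raise


def build_instance(conductor):
    # Networks are allocated once, outside the host loop, so a resource
    # tracker rejection no longer forces a deallocate/reallocate cycle.
    # Placeholder policy: up to 2 attempts on any error.
    nwinfo = retry(conductor.allocate_networks, 2, (Exception,))

    def claim_host():
        # Claim failures are retried without a fixed limit; the loop ends
        # when the scheduler raises NoValidHost, which is not retried.
        host = conductor.get_host_from_scheduler()
        conductor.assign_instance(host)
        return host

    def place_and_build():
        host = retry(claim_host, None, (ResourceClaimFailed,))
        conductor.build(host, nwinfo)
        conductor.run_instance(host)
        return host

    # Later build failures (bad image, etc.) get only 2 attempts in total.
    return retry(place_and_build, 2, (BuildFailed,))
```
</pre>
<div>
<p class="MsoNormal">Splitting the claim from the build is what lets the two failure types have different retry budgets. It also keeps the network allocation out of the retry loop.<o:p></o:p></p>
</div>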
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D"> </span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:11.0pt;font-family:"Calibri","sans-serif";color:#1F497D">I’m also thinking about a small mod to the scheduler_host_subset_size code that adds a scheduler_host_subset_offset, so that you could, for example, have one scheduler picking from the top 10 hosts and another picking from hosts 11-20. That won’t guarantee there’s never an overlap, but I think it would reduce it considerably. It would also mean that if you do lose a scheduler, at most scheduler_host_subset_size hosts stop being scheduled to.</span><o:p></o:p></p>
</div>
</div>
</blockquote>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">That's an interesting thought, although I'd like to make things "just work" without having to explicitly configure something like this. :)<o:p></o:p></p>
</div>
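<div>
<p class="MsoNormal">For reference, here is a minimal sketch of the subset-with-offset selection proposed in the quoted paragraph. It assumes weighed_hosts is already sorted best-first and picks at random within the chosen slice, in the spirit of scheduler_host_subset_size. scheduler_host_subset_offset is only the option proposed in this thread, not an existing setting. Falling back to the top of the list when the offset slice is empty is an assumption of the sketch.<o:p></o:p></p>
</div>
<pre>
```python
import random

# Sketch of host selection with a subset size plus the proposed
# (hypothetical) subset offset. weighed_hosts is assumed to be sorted
# best-first, as after the filter scheduler's weighing step.


def pick_host(weighed_hosts, subset_size, subset_offset=0, rng=random):
    """Pick a host at random from weighed_hosts[offset:offset + size].

    Assumption: if the offset slice is empty (fewer hosts than the
    offset), fall back to the top of the list. Returns None if there
    are no hosts at all.
    """
    if not weighed_hosts:
        return None
    size = max(1, subset_size)
    subset = weighed_hosts[subset_offset:subset_offset + size]
    if not subset:
        subset = weighed_hosts[:size]
    return rng.choice(subset)
```
</pre>
<div>
<p class="MsoNormal">With a subset size of 10, a scheduler using offset 0 would draw from hosts 1-10 and one using offset 10 would draw from hosts 11-20, matching the example in the quoted paragraph.<o:p></o:p></p>
</div>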
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">- Chris<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</div>
</div>
</body>
</html>