<div dir="ltr"><div class="gmail_extra"><div><div class="gmail_signature"><div dir="ltr"><div>Hi,</div><div><br></div><div>a comment about 'report as full' vs 'remove from inventory'</div><div><br></div></div></div></div><div class="gmail_quote">On Mon, Jan 2, 2017 at 7:53 PM, Jay Pipes <span dir="ltr"><<a href="mailto:jaypipes@gmail.com" target="_blank">jaypipes@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Great questions, Vlad. Comments inline.<span class="gmail-"><br>
<br>
On 12/30/2016 11:40 AM, Vladyslav Drok wrote:<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Hi all!<br>
<br>
There is a long standing problem of resources reporting in ironic virt<br>
driver.<br>
</blockquote>
<br></span>
That would be an understatement :)<span class="gmail-"><br>
<br>
> It's described in a couple of bugs I've found - [0], [1].<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
Switching to placement API will make things better, but still there are<br>
some problems there. For example, there are cases when ironic needs to<br>
say "this node is not available", and it reports the<br>
vcpus=memory_mb=local_gb as 0 in this case. Placement API does not allow<br>
0s, so in [2] it is proposed to remove inventory records in this case.<br>
</blockquote>
<br></span>
Correct.<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">
But the whole logic here [3] seems not that obvious to me, so I'd like<br>
to discuss when do we need to report 0s to placement API. I'm thinking<br>
about the following (copy-pasted from my comment on [2]):<br>
<br></span>
* If there is an instance_uuid on the node, no matter what<span class="gmail-"><br>
provision/power state it's in, consider the resources as used. In<br>
case it's an orphan, an admin will need to take some manual action<br>
anyway.<br>
</span></blockquote>
<br>
The single source of truth for Ironic instances is the Ironic database. If Ironic's database says that a node is consumed by an instance, then it should be considered by Nova to be consumed.<br></blockquote><div><br></div><div>Well, it is nova that marks the node as consumed by setting the instance_uuid field on it :) The question is when the right time to remove it is (see my next comment below). Currently it is removed before teardown/undeploy, so a node in the CLEANING state already has no instance_uuid.</div><div><br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
* If there is no instance_uuid and a node is in cleaning/clean wait<span class="gmail-"><br>
after tear down, it is a part of normal node lifecycle, report all<br>
resources as used. This means we need a way to determine if it's a<br>
manual or automated clean.<br>
</span></blockquote>
<br>
I don't see a need to determine manual vs. automated clean. The node is in a cleaning state; therefore the inventory of resources on that node is not available for a consumer of those resources to consume. So, the inventory should be deleted in Nova. This inventory should be re-added if and when the node is in a state that a consumer can grab it.<br>
<br></blockquote><div><br></div><div>There is a difference between "removing the resource from the available pool" and "declaring the resource fully consumed". The end result for scheduling is the same (nothing gets scheduled to those resources), but I am worried about cloud-wide monitoring mechanisms that may start alerting about hypervisors disappearing or total cloud capacity going down even though everything is operating normally.</div><div><br></div><div>IMO, during the happy path for a nova instance on an ironic node (node available -> nova deploys -> node active -> nova undeploys -> node available, with all intermediate *ing / *_wait states), the node should be reported as "fully consumed by instance", since cleaning in this case is a standard part of a healthy node lifecycle. Only when something outside the happy path happens (maintenance, a deploy or cleaning error) should the node be removed from overall cloud capacity. This is why we might have to differentiate between automated cleaning (happy path) and manual cleaning (usually some manual recovery from an error). For the same reason I'd also suggest removing the instance_uuid from the ironic node at the end of cleaning, which should make it clearer which stage the node is in right now.</div><div> </div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">
* If there is no instance_uuid, and a node:<br>
o has a bad power state or<br>
o is in maintenance<br>
o or actually in any other case, consider it unavailable, report<span class="gmail-"><br>
available resources = used resources = 0. Provision state does<br>
not matter in this logic, all cases that we wanted to take into<br>
account are described in the first two bullets.<br>
</span></blockquote>
<br>
Correct. If there is no instance UUID for the node, that means there's no allocation for it. If there's no allocation for the node, its inventory can and should be deleted if the node cannot be consumed by an instance (for whatever reason).<br>
<br>
Best,<br>
-jay<br>
<br>
<blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"><span class="gmail-">
Any thoughts?<br>
<br>
[0]. <a href="https://bugs.launchpad.net/nova/+bug/1402658" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nov<wbr>a/+bug/1402658</a><br>
[1]. <a href="https://bugs.launchpad.net/nova/+bug/1637449" rel="noreferrer" target="_blank">https://bugs.launchpad.net/nov<wbr>a/+bug/1637449</a><br>
[2]. <a href="https://review.openstack.org/414214" rel="noreferrer" target="_blank">https://review.openstack.org/4<wbr>14214</a><br>
[3]. <a href="https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262" rel="noreferrer" target="_blank">https://github.com/openstack/n<wbr>ova/blob/1506c36b4446f6ba1487a<wbr>2d68e4b23cb3fca44cb/nova/virt/<wbr>ironic/driver.py#L262</a><br>
<br>
Happy holidays to everyone!<br>
-Vlad<br>
<br>
<br></span>
______________________________<wbr>______________________________<wbr>______________<br>
OpenStack Development Mailing List (not for usage questions)<br>
Unsubscribe: <a href="http://OpenStack-dev-request@lists.openstack.org?subject:unsubscribe" rel="noreferrer" target="_blank">OpenStack-dev-request@lists.op<wbr>enstack.org?subject:unsubscrib<wbr>e</a><br>
<a href="http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev" rel="noreferrer" target="_blank">http://lists.openstack.org/cgi<wbr>-bin/mailman/listinfo/openstac<wbr>k-dev</a><br>
<br>
</blockquote>
<br>
</blockquote></div><div class="gmail_extra"><br></div><div class="gmail_extra">Cheers,</div><div class="gmail_extra"><br></div>Dr. Pavlo Shchelokovskyy<div>Senior Software Engineer</div><div>Mirantis Inc</div><div><a href="http://www.mirantis.com/" target="_blank">www.mirantis.com</a></div></div></div>
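<div><br></div><div>P.S. To make the "fully consumed" vs "removed from inventory" distinction from my reply above more concrete, here is a rough Python sketch of the decision I have in mind. It is only an illustration, not the actual nova ironic driver code: the Node class, the Report enum, the classify() function and the automated_clean flag are all hypothetical names, the grouping of provision states is my assumption, and how to tell automated from manual cleaning is deliberately left open. Checking maintenance and error states before instance_uuid is also a choice of mine and is open to discussion.</div>

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Report(Enum):
    AVAILABLE = "available"            # inventory present, nothing consumed
    FULLY_CONSUMED = "fully_consumed"  # inventory present, all of it reported as used
    REMOVED = "removed"                # inventory deleted, node not counted as capacity


# Assumed grouping of ironic provision states (illustrative only).
HAPPY_PATH_BUSY = frozenset({"deploying", "wait call-back", "active", "deleting"})
CLEANING = frozenset({"cleaning", "clean wait"})
ERROR_STATES = frozenset({"deploy failed", "clean failed", "error"})


@dataclass
class Node:
    """Hypothetical, minimal view of an ironic node."""
    provision_state: str
    instance_uuid: Optional[str] = None
    maintenance: bool = False
    # Assumption: the caller already knows whether the current clean is automated.
    automated_clean: bool = False


def classify(node: Node) -> Report:
    """Decide how a node's resources would be reported under the proposal."""
    # Anything outside the happy path drops out of cloud capacity.
    if node.maintenance or node.provision_state in ERROR_STATES:
        return Report.REMOVED
    # A node tied to an instance, or in the middle of deploy/undeploy, is in use.
    if node.instance_uuid is not None or node.provision_state in HAPPY_PATH_BUSY:
        return Report.FULLY_CONSUMED
    # Automated cleaning is part of the healthy lifecycle; manual cleaning is not.
    if node.provision_state in CLEANING:
        return Report.FULLY_CONSUMED if node.automated_clean else Report.REMOVED
    if node.provision_state == "available":
        return Report.AVAILABLE
    # Any other state (e.g. not yet made available) cannot be consumed.
    return Report.REMOVED
```

<div>With this, a node going through available, deploy, active, undeploy and automated cleaning never leaves the capacity totals, so monitoring would only see a change when classify() returns Report.REMOVED.</div>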