[openstack-dev] [ironic] [nova] Ironic virt driver resources reporting

Pavlo Shchelokovskyy pshchelokovskyy at mirantis.com
Tue Jan 3 11:31:51 UTC 2017


Hi,

a comment about 'report as full' vs 'remove from inventory'

On Mon, Jan 2, 2017 at 7:53 PM, Jay Pipes <jaypipes at gmail.com> wrote:

> Great questions, Vlad. Comments inline.
>
> On 12/30/2016 11:40 AM, Vladyslav Drok wrote:
>
>> Hi all!
>>
>> There is a long standing problem of resources reporting in ironic virt
>> driver.
>>
>
> That would be an understatement :)
>
>> It's described in a couple of bugs I've found - [0], [1].
>
>> Switching to placement API will make things better, but still there are
>> some problems there. For example, there are cases when ironic needs to
>> say "this node is not available", and it reports the
>> vcpus=memory_mb=local_gb as 0 in this case. Placement API does not allow
>> 0s, so in [2] it is proposed to remove inventory records in this case.
>>
>
> Correct.
>
>> But the whole logic here [3] seems not that obvious to me, so I'd like
>> to discuss when do we need to report 0s to placement API. I'm thinking
>> about the following (copy-pasted from my comment on [2]):
>>
>>   * If there is an instance_uuid on the node, no matter what
>>     provision/power state it's in, consider the resources as used. In
>>     case it's an orphan, an admin will need to take some manual action
>>     anyway.
>>
>
> The single source of truth for Ironic instances is the Ironic database. If
> Ironic's database says that a node is consumed by an instance, then it
> should be considered by Nova to be consumed.
>

Well, it is nova that marks the node as consumed by setting the
instance_uuid field on the node :) The question is when the right time to
remove it is... (see my next comment below). Currently it is removed before
teardown/undeploy, so a node in CLEANING state already has no
instance_uuid set.


>>   * If there is no instance_uuid and a node is in cleaning/clean wait
>>     after tear down, it is a part of normal node lifecycle, report all
>>     resources as used. This means we need a way to determine if it's a
>>     manual or automated clean.
>>
>
> I don't see a need to determine manual vs. automated clean. The node is in
> a clean state; therefore the inventory of resources on that node are not
> available for a consumer of those resources to consume. So, the inventory
> should be deleted in Nova. This inventory should be re-added if and when
> the node is in a state that a consumer can grab it.
>
>
There is a difference between "removing the resource from inventory" and
"declaring the resource fully consumed". The end result for scheduling is
the same (nothing gets scheduled to those resources), but I am worried
about cloud-wide monitoring mechanisms that may start alerting about
hypervisors disappearing or total cloud capacity going down even though
everything is operating normally.
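
To illustrate the monitoring concern with a toy model (this is not nova or
placement code; the Node class, the state names and both functions are made
up for the example): free capacity comes out the same either way, but the
total that a monitoring system sees only shrinks in the "remove" case.

```python
# Illustration only: a toy model, not nova or placement code.
from dataclasses import dataclass


@dataclass
class Node:
    vcpus: int
    state: str  # one of "available", "active", "cleaning" in this toy model


def capacity_if_removed(nodes):
    """Cleaning nodes are dropped from inventory while they clean."""
    total = sum(n.vcpus for n in nodes if n.state != "cleaning")
    used = sum(n.vcpus for n in nodes if n.state == "active")
    return total, used, total - used


def capacity_if_full(nodes):
    """Cleaning nodes stay in inventory but count as fully consumed."""
    total = sum(n.vcpus for n in nodes)
    used = sum(n.vcpus for n in nodes if n.state in ("active", "cleaning"))
    return total, used, total - used


nodes = [Node(8, "available"), Node(8, "active"), Node(8, "cleaning")]
print(capacity_if_removed(nodes))  # (16, 8, 8)  total drops while cleaning
print(capacity_if_full(nodes))     # (24, 16, 8) total stays stable
```

Both tuples are (total, used, free); the scheduler only cares about the last
value, while capacity alerts typically watch the first.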

IMO during the happy path for a nova instance on an ironic node (node
available -> nova does deploy -> node active -> nova does undeploy -> node
is available again, with all intermediate *ing / *_wait states) the node
should be reported as "fully consumed by instance", as cleaning in this case
is a standard part of a healthy node lifecycle. Only when something outside
the happy path happens (maintenance, deploy or cleaning error) should the
node be removed from overall cloud capacity. This is why we might have to
differentiate between automated cleaning (happy path) and manual cleaning
(usually some manual recovery from an error). For the same reason I'd also
suggest removing the instance_uuid from the ironic node only at the end of
cleaning, which should make it clearer which stage the node is in right now.
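
A rough sketch of that rule, purely as an illustration and not the actual
driver logic: the function name, the return labels and the manual_clean flag
are hypothetical (nothing in this thread says how manual cleaning would be
detected), and the state strings are assumed to match ironic's provision
state names.

```python
# Hypothetical sketch of the proposal above; not code from nova or ironic.

# Assumed provision states a node passes through on the happy path while it
# is tied up by an instance (deploy, active, undeploy, automated cleaning).
HAPPY_PATH_BUSY_STATES = frozenset({
    "deploying", "wait call-back", "active",
    "deleting", "cleaning", "clean wait",
})


def report_for(provision_state, maintenance=False, manual_clean=False):
    """Return how a node would be reported: available, consumed or remove."""
    # Anything outside the happy path drops out of cloud capacity.
    if maintenance or manual_clean:
        return "remove"
    if provision_state == "available":
        return "available"
    if provision_state in HAPPY_PATH_BUSY_STATES:
        return "consumed"
    # Error states (e.g. deploy or cleaning failures) and anything unknown.
    return "remove"


print(report_for("clean wait"))                   # consumed
print(report_for("cleaning", manual_clean=True))  # remove
print(report_for("clean failed"))                 # remove
```

The catch-all at the end is deliberate: any state not explicitly listed is
treated as unavailable rather than as schedulable.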


>>   * If there is no instance_uuid, and a node:
>>       o has a bad power state or
>>       o is in maintenance
>>       o or actually in any other case, consider it unavailable, report
>>         available resources = used resources = 0. Provision state does
>>         not matter in this logic, all cases that we wanted to take into
>>         account are described in the first two bullets.
>>
>
> Correct. If there is no instance UUID for the node, that means there's no
> allocation for it. If there's no allocation for the node, its inventory can
> and should be deleted if the node cannot be consumed by an instance (for
> whatever reason).
>
> Best,
> -jay
>
>> Any thoughts?
>>
>> [0]. https://bugs.launchpad.net/nova/+bug/1402658
>> [1]. https://bugs.launchpad.net/nova/+bug/1637449
>> [2]. https://review.openstack.org/414214
>> [3]. https://github.com/openstack/nova/blob/1506c36b4446f6ba1487a2d68e4b23cb3fca44cb/nova/virt/ironic/driver.py#L262
>>
>> Happy holidays to everyone!
>> -Vlad
>>
>>
>> __________________________________________________________________________
>> OpenStack Development Mailing List (not for usage questions)
>> Unsubscribe: OpenStack-dev-request at lists.openstack.org?subject:unsubscribe
>> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>>
>>

Cheers,

Dr. Pavlo Shchelokovskyy
Senior Software Engineer
Mirantis Inc
www.mirantis.com