Open Stack

Tue May 31 17:40:32 UTC 2016

On 05/31/2016 01:06 PM, Chris Dent wrote:
> On Tue, 31 May 2016, Jay Pipes wrote:
>> Kinda. What the compute node needs is an InventoryList object
>> containing all inventory records for all resource classes both local
>> to it as well as associated to it via any aggregate-resource-pool
>> mapping.
>
> Okay, that mostly makes sense. A bit different from what I've proved
> out so far, but plenty of room to make it go that way.

Understood, and not a problem. I will provide more in-depth code
examples in code review comments.

>> The SQL for generating this InventoryList is the following:
>
> Presumably this would be a method on the InventoryList object
> itself?

InventoryList.get_by_compute_node() would be my suggestion. :)

>> We can deal with multiple shared storage pools per aggregate at a
>> later time. Just take the first resource provider in the list of
>> inventory records returned from the above SQL query that corresponds
>> to the DISK_GB resource class and that is the resource provider you will
>> deduct from.
>
> So this seems rather fragile and pretty user-hostile. We're creating an
> opportunity for people to easily replace their existing bad tracking of
> disk usage with a different style of bad tracking of disk usage.

I'm not clear why the new way of tracking disk usage would be "bad 
tracking"? The new way is correct -- i.e. the total amount of DISK_GB 
will be correct instead of multiplied by the number of compute nodes 
using that shared storage.
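
To make the double counting concrete, here is a minimal sketch with made-up numbers. The node names and the pool size are assumptions chosen for illustration, not values from any real deployment:

```python
# Hypothetical setup: three compute nodes all backed by one 1000 GB shared pool.
SHARED_POOL_GB = 1000
compute_nodes = ["node1", "node2", "node3"]

# Old style: every compute node reports the shared pool as its own local disk,
# so summing per-node totals multiplies the pool by the number of nodes.
old_total = sum(SHARED_POOL_GB for _ in compute_nodes)

# New style: a single DISK_GB inventory record lives on the pool's resource
# provider, so the total is counted exactly once.
new_total = SHARED_POOL_GB

print(old_total, new_total)  # prints "3000 1000"
```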

> If we assign two different shared disk resource pools to the same
> aggregate we've got a weird situation (unless we explicitly order
> the resource providers by something).

Sure, but I'm saying that, for now, this isn't something I think we need 
to be concerned about. Deployers cannot *currently* have multiple shared 
storage pools used for providing VM ephemeral disk resources. So, there 
is no danger -- outside of a deployer deliberately sabotaging things -- 
for a compute node to have >1 DISK_GB inventory record if we have a 
standard process for deployers that use shared storage to create their 
resource pools for DISK_GB and assign compute nodes to that resource pool.

> Maybe that's fine, for now, but it seems like something we need to be
> aware of, not only for ourselves, but in the documentation when we tell people how
> to start using resource pools: Oh, by the way, for now, just
> associate one shared disk pool to an aggregate.

Sure, absolutely.

>> Assume only a single resource provider of DISK_GB. It will be either a
>> compute node's resource provider ID or a resource pool's resource
>> provider ID.
>
> ✔
>
>> For this initial work, my idea was to have some code that, on creation
>> of a resource pool and its association with an aggregate, if that
>> resource pool has an inventory record with resource_class of DISK_GB
>> then remove any inventory records with DISK_GB resource class for any
>> compute node's (local) resource provider ID associated with that
>> aggregate. This way we ensure the existing behaviour that a compute
>> node either has local disk or it uses shared storage, but not both.
>
> So let me translate that to make sure I get it:
>
> * node X exists, has inventory of DISK_GB
> * node X is in aggregate Y
> * resource pool A is created
> * two possible paths now: first associating aggregate to pool or
>    first adding inventory to the pool
> * in either case, when aggregate Y is associated, if the pool has
>    DISK_GB, traverse the nodes in aggregate Y and drop the disk
>    inventory

Correct.
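
The steps above could be sketched roughly as follows. This is an in-memory illustration only: `associate_pool` and the dict-based inventory layout are hypothetical and are not nova's actual objects or API.

```python
DISK_GB = "DISK_GB"


def associate_pool(inventories, aggregate_nodes, pool_id):
    """Scrub local DISK_GB inventory when a disk-providing pool is associated.

    inventories: dict of provider id -> {resource class: total} (hypothetical)
    aggregate_nodes: provider ids of the compute nodes in the aggregate
    Returns the list of node ids whose local DISK_GB record was removed.
    """
    # Only act if the pool actually has a DISK_GB inventory record.
    if DISK_GB not in inventories.get(pool_id, {}):
        return []
    scrubbed = []
    for node_id in aggregate_nodes:
        # Drop the node's local DISK_GB record, if it has one.
        if inventories.get(node_id, {}).pop(DISK_GB, None) is not None:
            scrubbed.append(node_id)
    return scrubbed


inventories = {
    "node_x": {"DISK_GB": 500, "VCPU": 8},
    "pool_a": {"DISK_GB": 2000},
}
scrubbed = associate_pool(inventories, ["node_x"], "pool_a")
```

Note that the sketch has to visit every node in the aggregate on each association, and nothing in it records what was removed.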

> So, effectively, any time we associate an aggregate we need to
> inspect its nodes?

Yeah.... good point. :(

I suppose the alternative would be to "deal" with the multiple resource 
providers by just having the resource tracker pick whichever one appears 
first for a resource class (and order by the resource provider ID...). 
This might actually be a better alternative long-term, since then all we 
would need to do is change the ordering logic to take into account 
multiple resource providers of the same resource class instead of 
dealing with all this messy validation and conversion.
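
A minimal sketch of that "pick whichever appears first" rule, assuming inventory records carry a provider ID and a resource class. The `Inventory` tuple and `pick_providers` function are hypothetical stand-ins, not the real InventoryList object:

```python
from collections import namedtuple

# Hypothetical, simplified stand-in for an inventory record.
Inventory = namedtuple(
    "Inventory", ["resource_provider_id", "resource_class", "total"])


def pick_providers(inventory_list):
    """Return one inventory record per resource class.

    Records are ordered by resource provider ID and the first record seen
    for each resource class wins.
    """
    chosen = {}
    for inv in sorted(inventory_list, key=lambda i: i.resource_provider_id):
        # setdefault keeps the first record per class and ignores later ones.
        chosen.setdefault(inv.resource_class, inv)
    return chosen


records = [
    Inventory(7, "DISK_GB", 500),    # e.g. a compute node's local disk
    Inventory(3, "DISK_GB", 2000),   # e.g. a shared storage pool
    Inventory(7, "VCPU", 8),
]
picked = pick_providers(records)
```

Handling multiple providers of the same resource class differently later on would then only mean changing the sort key.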

> What happens if we ever disassociate an aggregate from a resource pool?
> Do the nodes in the aggregate have some way to get their local Inventory
> back or are we going to assume that the switch to shared is one way?

OK, yeah, you've sold me that my solution isn't good. By just allowing 
multiple providers and picking the "first" that appears, we limit 
ourselves to just needing to do the scrubbing of compute node local 
DISK_GB inventory records -- which we can do in an online data migration 
-- and we don't have to worry about the disassociate/associate aggregate 
problems.

> In my scribbles when I was thinking this through (that led to the
> start of this thread) I had imagined that rather than finding both
> the resource pool and compute node resource providers when finding
> available disk we'd instead see if there was a resource pool, use it
> if it was there, and if not, just use the compute node. Therefore if
> the resource pool was ever disassociated, we'd be back to where we
> were before without needing to reset the state in the artifact
> world.

That would work too, yes. And seems simpler to reason about... but has 
the potential of leaving bad inventory records in the inventories table 
for "local" DISK_GB resources that never will be used.
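
For comparison, the pool-first lookup described in the quoted text might look like the sketch below. `disk_provider` and the dict layout are hypothetical names used only for illustration:

```python
def disk_provider(compute_node_rp, pool_rps, inventories):
    """Pick the provider to deduct DISK_GB from.

    Use the first associated resource pool that has DISK_GB inventory and
    fall back to the compute node's own provider if there is none.
    """
    for rp_id in pool_rps:
        if "DISK_GB" in inventories.get(rp_id, {}):
            return rp_id
    return compute_node_rp


inventories = {
    "node_x": {"DISK_GB": 500},   # local record is left in place
    "pool_a": {"DISK_GB": 2000},
}
with_pool = disk_provider("node_x", ["pool_a"], inventories)
without_pool = disk_provider("node_x", [], inventories)
```

Disassociating the pool needs no state reset here, but the `node_x` DISK_GB record remains in the table unused while the pool is associated.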

Best,
-jay

[openstack-dev] [nova] [placement] aggregates associated with multiple resource providers